Joachim Worringen
Guest
|
Posted:
Fri Dec 23, 2005 9:15 am Post subject:
Re: Auto Parallelization |
Greg Lindahl wrote:
| Quote: | In article <40psofF1bnqiqU2@individual.net>,
Jan Vorbrüggen <jvorbrueggen-not@mediasec.de> wrote:
and I haven't heard yet that
profile-based feedback is used to drive parallelization, although that
might have happened before to some degree.
It's a standard feature -- the main feedback is whether or not a loop
has enough work to make parallelization worthwhile.
|
It's a little different in this case, as the compiler looks not only
at loops but also at arbitrary branches. With hardware support,
multiple branches are executed in parallel, and each path's data is
kept in a private memory buffer. It's a sort of extreme
speculative execution.
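To make the idea concrete, here is a rough software sketch of both-path speculation with private buffers. It is only an illustration and not the actual hardware or compiler mechanism: the function `speculate`, the dict standing in for memory, and the use of Python threads are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def speculate(condition, taken, not_taken, memory):
    """Run both branch bodies speculatively, then commit only the correct one.

    Hypothetical helper: `memory` is a dict standing in for shared state.
    Each body may read `memory` but must write only to its private buffer.
    """
    def run(body):
        buffer = {}            # private write buffer for this speculative path
        body(memory, buffer)   # reads shared state, writes only to the buffer
        return buffer

    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_taken = pool.submit(run, taken)
        fut_not_taken = pool.submit(run, not_taken)
        outcome = condition(memory)   # resolve the branch while both paths run
        winner = fut_taken if outcome else fut_not_taken
        result = winner.result()
        # Real hardware would cancel the losing thread here; this sketch
        # simply lets it finish and throws its buffer away.

    memory.update(result)      # commit the winning path's writes
    return outcome


memory = {"x": 7, "y": 0}
speculate(
    lambda m: m["x"] % 2 == 1,                  # branch condition
    lambda m, buf: buf.update(y=m["x"] * 3),    # "taken" path
    lambda m, buf: buf.update(y=m["x"] // 2),   # "not taken" path
    memory,
)
```

Since neither path touches shared state until the branch is resolved, dropping the wrong path costs nothing beyond the wasted work.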
However, this development mostly targets embedded, mobile
systems, and not HPC. The previous generation of this multi-core
hardware will appear in cell phones next year, IIRC.
Joachim |
|
Joachim Worringen
Guest
|
Posted:
Fri Dec 23, 2005 9:15 am Post subject:
Re: Auto Parallelization |
Paul Gotch wrote:
| Quote: | I suspect it's not quite that. NEC have been researching coarse grain
speculative threading for years. The idea being that you speculatively
execute ahead of branches on both execution paths then cancel the incorrect
thread as soon as you know if the branch was actually taken or not.
NEC were looking at implementing an existing architecture using this
technique but they obviously came to the conclusion that they could get
better performance with some compiler support probably to add hinting
instructions. The same way as you can get better usage of caches with
carefully used prefetch instructions, which architecturally are just NOPs.
I suspect NEC have combined this with conventional feedback directed
optimisation such that you don't execute down both paths of all branches,
only ones you can't predict with high certainty, and also doing conventional
autovectorisation of some loops to vector units.
|
Fully correct, except that this technique is not (yet)
applied to SX vector CPUs, but to multi-core CPUs for embedded
applications. Probably a bigger market...
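The selective part Paul describes, forking only on branches that cannot be predicted with high certainty, could be sketched roughly as below. This is a guess at the general shape and not NEC's actual heuristic: the function name, the 0.9 threshold, and the profile counts are all made up for illustration.

```python
def should_fork_both_paths(taken, not_taken, threshold=0.9):
    """Decide from profile counts whether a branch is worth speculating on.

    Hypothetical heuristic: a branch whose dominant direction occurs in
    fewer than `threshold` of the profiled executions counts as hard to
    predict, so both paths are forked. The threshold is an assumed value.
    """
    total = taken + not_taken
    if total == 0:
        return False   # never executed in the profile run: don't spend threads
    bias = max(taken, not_taken) / total
    return bias < threshold


# Invented profile data: branch name -> (times taken, times not taken)
profile = {
    "loop_exit": (990, 10),     # heavily biased, ordinary prediction suffices
    "parse_flag": (520, 480),   # close to a coin flip, fork both paths
}
plan = {name: should_fork_both_paths(t, n) for name, (t, n) in profile.items()}
```

With these made-up numbers only `parse_flag` would be forked, and `loop_exit` is left to conventional branch prediction.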
Joachim |
|