Not enough parallelism in programming

JJ
Posted: Sat Aug 27, 2005 12:15 am    Post subject: Re: Not enough parallelism in programming

anonymous wrote:
Quote:
JJ wrote:
What Rupert refers to will be presented at cpa2005 in a few weeks.

"New Transputer Design for FPGAs", although FPGAs are just a way to
prototype the architecture and not the main idea presented.

In effect a collection of 4 way threaded light PEs "ALT"ing into an 8
way threaded (or interleaved), hashed RLDRAM. These PEs themselves look
quite a bit like sparcs with register sliding reg cache per thread but
that's not important either, in fact any ISA would work here even x86
:-/

The real idea is that threaded cpus with conventional cache DRAM
hierarchy are even worse than single threaded cpus, putting even more
pressure on the cache and hence more threads, less locality. They can
in theory churn through more ops though but only if they can be fed and
yes, it would be a nightmare to synchronize these threads through
120ns DRAM.

The right way to help multithreaded cpus is to multithread the memory
too, which can be done with RLDRAM (8 way issue every 2.5ns rate) and
effectively gives each PE a mem access of a few instruction slots
across the entire DRAM array hence no data cache. It's not raw
bandwidth or latency that matters, it's the rate of uncorrelated memory
issues that can be dispensed across multiple banks on behalf of a number of
cooperating (or not) threads. This leads into a hashing MMU and
protected objects done in the clock cycles. Concurrency support for
occam etc provided with protected objects.

For the application layer I am inclined to see if occam and Objective C
make more sense rather than C++, but that's another paper.

transputer2 at yahoo ....
johnjakson at usa ...

RLDRAM reads as a better replacement for Direct Ram Bus DRAM for my
VLIW SMP MPP FORTH theory. Thank you.

At least someone is paying attention. Micron isn't though, the only way to
get their attention is to talk in terms of networking. Rambus is also
talking up XDR2 for 4? way interleaving too but with conventional
horrid latencies. At least you can prototype RLDRAM at 75% of best speed
with high end FPGA at 300MHz access rates and you could even fake the
memory model in SRAM by imposing artificial bank constraints with
timers. That will allow you to do low end prototyping with Spartan SRAM
boards and change the banking ratio to anything you like. There is also
(Fast Cycle) FCDRAM but I think it's the same networking architecture
story. Micron FAE even told me they had been asked to increase banking
interleave to 128 by some customers, that would be sweet, almost no
collisions. And next year or '07 they have RLDRAM3 coming with all numbers
trimmed 25%, sub 2ns memory issue rates, 15ns latency if you can drive
a 533MHz DDR bus.
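
The idea of faking a banked memory with per-bank timers, and the remark that 128 banks would give almost no collisions, can be tried out in software first. The following is only a rough sketch in Python: the function name, the uniformly random bank choice (standing in for a hashed address) and the busy time of 8 issue slots are all assumptions made for illustration, not figures from any RLDRAM data sheet.

```python
import random

def count_stalls(num_banks, num_requests, busy_cycles=8, seed=1):
    """Count cycles lost when a request hits a bank that is still busy.

    Hypothetical model: one request is issued per cycle, each bank stays
    busy for busy_cycles after an access, and the bank is picked at
    random as a stand-in for a hashed address.
    """
    rng = random.Random(seed)        # fixed seed so runs are repeatable
    free_at = [0] * num_banks        # per-bank timer: cycle when the bank is free again
    cycle = 0
    stalls = 0
    for _ in range(num_requests):
        bank = rng.randrange(num_banks)
        if free_at[bank] > cycle:    # artificial bank constraint: wait for the bank
            stalls += free_at[bank] - cycle
            cycle = free_at[bank]
        free_at[bank] = cycle + busy_cycles
        cycle += 1                   # next request is issued one cycle later
    return stalls

if __name__ == "__main__":
    for banks in (8, 32, 128):
        print(banks, "banks:", count_stalls(banks, 10000), "stall cycles")
```

Changing num_banks plays the role of changing the banking ratio on the prototype board; the model says nothing about real device timing.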

johnjakson at usa ...
transputer2 at yahoo ....

Hank Oredson
Posted: Sat Aug 27, 2005 12:16 am    Post subject: Re: Not enough parallelism in programming

"Nick Maclaren" <nmm1@cus.cam.ac.uk> wrote in message
news:denn8l$3c5$1@gemini.csx.cam.ac.uk...
Quote:
In article <s71irxsr2ah.fsf@beryl.CS.Berkeley.EDU>,
David Gay <dgay@beryl.CS.Berkeley.EDU> wrote:

2) The threads have to be such that they ARE restartable. That
imposes a pretty considerable constraint on the programming paradigm,
and cannot practically be done in any third generation von Neumann
language, without imposing draconian restrictions.

I think you're missing the bit about "hardware support" here. The
conflicts are defined in terms of virtual addresses, and the processor
fully supports restoring itself to an earlier state. There are a few
minor (well ok, not so minor) issues of course:
- can the hardware really rollback arbitrarily far?
- what about aliasing of virtual addresses to each other?
- what about I/O? (you mentioned that one :-))

No, I didn't. You are right that they aren't minor - in particular,
it is very likely that a thread will update a large proportion of
its data before a conflict is discovered. That was the point of my
remark about checkpoints.

There are also non-memory CPU states (e.g. IEEE 754 modes and flags),
signals and numerous other 'hidden' data. The existence of these is
why checkpoint/restart for arbitrary programs has failed dismally
every single time it has been introduced over a period of 35+ years.
I fail to see that this approach is all that different - if it claims
to be able to handle arbitrary programs :-(

Chris's paper talks about how some of these (and your next point) are
resolved in a real body of code; it's definitely worth a read.

Yes, I intend to. As I said, database codes are one area where this
might work pretty well.


This whole discussion reminds me of DBMS work on concurrency.

One might need locking at db, table, row, field level.
There might be multiple pending updates at each level.
The update / commit / rollback paradigm has been well-explored.
Recovery without transaction loss is almost always required.
Etc.

Perhaps there are some lessons from that world that apply
to the lower level issues under discussion in this thread ?
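
For readers who have not met the update / commit / rollback paradigm, here is a minimal sketch of the optimistic variant in Python. It is an illustration only: the class names are made up, there is a single versioned value rather than db, table, row and field levels, and it runs sequentially, so the commit step is not itself made atomic.

```python
class VersionedCell:
    """A value with a version counter, standing in for a row or field."""
    def __init__(self, value):
        self.value = value
        self.version = 0

class Transaction:
    """Works on a private copy and only publishes it if nobody else committed."""
    def __init__(self, cell):
        self.cell = cell
        self.start_version = cell.version  # version seen when the work began
        self.pending = cell.value          # private working copy

    def update(self, fn):
        self.pending = fn(self.pending)    # invisible to others until commit

    def commit(self):
        if self.cell.version != self.start_version:
            return False                   # conflict: roll back by discarding pending
        self.cell.value = self.pending
        self.cell.version += 1
        return True

if __name__ == "__main__":
    cell = VersionedCell(10)
    t1, t2 = Transaction(cell), Transaction(cell)
    t1.update(lambda v: v + 1)
    t2.update(lambda v: v * 2)
    print(t1.commit(), t2.commit(), cell.value)  # second commit is refused
```

The refused transaction has to be restarted from the new value, which is the software analogue of the restartable threads discussed earlier in the thread.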

"Beavis ... heh heh ... he said "thread" ... heh heh."

--

... Hank

http://home.earthlink.net/~horedson
http://home.earthlink.net/~w0rli

Rupert Pigott
Posted: Sat Aug 27, 2005 12:16 am    Post subject: Re: Not enough parallelism in programming

Sander Vesik wrote:
Quote:
Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:

There is certainly a possibility that someone will make a breakthrough
where hundreds of people have failed before, but I am not holding my
breath waiting for it. I know of NO approach that has a significant
chance of success until and unless we are prepared to abandon the
serial von Neumann paradigm.


But are we prepared? A large part of the programmers manage to output
really bad code for even 'serial von Neumann' and tend to cope best
with very simplistic locking strategies if parallelism in any way is
needed. And much prefer the 'lock it all' strategy, esp after they can't
debug the other ones.

My feeling is that we are *not* prepared because we are using
von-Neumann tools to solve || problems. The fact that embedded
and hardware types are so different in their approach to SW
speaks volumes to me.

Many of the problems I have seen with "serial von-Neumann" have
been a result of people trying to cram inherently parallel
constructs into sequential models. Hardware types and || types
tend to feel very frustrated by the von-Neumann tools they are
forced to use to model and implement inherently parallel systems.

People talk about the expressive power of C/C++. Sure, it's great
for sequential stuff, but it provides precisely zero expression
for real world constructs such as a simple timeout on I/O. This
is *not* an isolated one-in-a-million situation, it crops up all
the time and yet it simply is not part of the language model.
Instead we have a bunch of libraries (eg: Pthreads) that require
the toolchain to work outside of the language spec to implement
correctly... That is asking for trouble, right ?

That is a gap that needs to be filled. You can't reasonably expect
people to build a moon-rocket with an Adze and a plank of wood.
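
To make the timeout example concrete, here is a small sketch of a read with a timeout. It is written in Python rather than C, and the helper name read_with_timeout is made up for illustration. The point it shows is the one made above: the timeout is expressed through a library call (select), not through anything in the language itself.

```python
import select
import socket

def read_with_timeout(sock, timeout_s, max_bytes=1024):
    """Return data from sock, or None if nothing arrives within timeout_s."""
    readable, _, _ = select.select([sock], [], [], timeout_s)
    if not readable:
        return None                 # timed out: the caller decides what to do next
    return sock.recv(max_bytes)

if __name__ == "__main__":
    a, b = socket.socketpair()      # two connected sockets for a local demonstration
    print(read_with_timeout(a, 0.05))   # nothing sent yet
    b.sendall(b"ping")
    print(read_with_timeout(a, 0.05))
    a.close()
    b.close()
```

In occam the same thing would be an ALT over a channel and a timer, which is part of the language model rather than a library.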

Hell, I'm not saying anything new and I get the feeling that I
am preaching to the choir. :(

Cheers,
Rupert

glen herrmannsfeldt
Posted: Sat Aug 27, 2005 12:16 am    Post subject: Re: Not enough parallelism in programming

Nick Maclaren wrote:

(snip)

Quote:
There are also non-memory CPU states (e.g. IEEE 754 modes and flags),
signals and numerous other 'hidden' data. The existence of these is
why checkpoint/restart for arbitrary programs has failed dismally
every single time it has been introduced over a period of 35+ years.
I fail to see that this approach is all that different - if it claims
to be able to handle arbitrary programs :-(

I remember reading about Checkpoint/Restart in OS/360 many years ago,
long before I had any programs where it would be useful.

Well, I learned a lot about OS design from reading IBM manuals such
as Supervisor Services and Macro Instructions, and other manuals
related to details about OS/360. I even read the PCP (Fixed Task
Supervisor) Program Logic Manual which also may have described
Checkpoint/Restart.

I don't think I have ever known anyone to actually use it, though.

-- glen

Joe Seigh
Posted: Sat Aug 27, 2005 5:34 am    Post subject: Re: Not enough parallelism in programming

Andi Kleen wrote:
Quote:
MitchAlsup@aol.com writes:


All synchronization in current machines is memory based. Memory latency
remains rather constant as processor frequency scales to ever higher
numbers. So 10 years ago when CPUs were at 200 MHz and main memory was
150 ns away, one could conceivably perform 30 instructions between
synchronization events (and 300 instructions between synchronizations
is vastly more reasonable). Now we have 3GHz machines with 120ns memory
access times.
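
The figures quoted here can be turned into clock cycles per memory access with one line of arithmetic. The sketch below uses Python and a made-up helper name, and assumes roughly one instruction per cycle only for the purpose of comparing the two eras.

```python
def cycles_per_memory_access(clock_mhz, latency_ns):
    """Clock cycles that elapse during one memory access (MHz * ns / 1000)."""
    return clock_mhz * latency_ns / 1000

then_cycles = cycles_per_memory_access(200, 150)   # 200 MHz CPU, 150 ns memory
now_cycles = cycles_per_memory_access(3000, 120)   # 3 GHz CPU, 120 ns memory

if __name__ == "__main__":
    print(then_cycles, now_cycles, now_cycles / then_cycles)
```

So one trip to memory costs about 30 cycles in the older case and about 360 in the newer one, a factor of 12 more work forgone per synchronization.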


Hmm - but don't some cache line coherency protocols with more states
than MESI (like MOESI) allow cache line transfer between CPUs without
going through memory? In that case the 2x120ns latency would be
avoided and replaced with the presumably shorter CPU<->CPU latency
time. Of course the dirty cache line would still need to be eventually
written back to memory. But that write wouldn't be on the critical path
and it could be done slowly in the background and just turn into an easier
"bandwidth problem" compared to a hard "latency problem".

With suitable instructions other tricks might be possible, e.g. the
new MONITOR/MWAIT on x86 look like they could theoretically optimize
much more in this space. e.g. it could tell the other CPU to toss
you over the cache line as soon as it has changed.

Some of the lock-free stuff tolerates stale data so you could reduce
cache traffic even further with a slightly different protocol that
lets the software have more control over invalidating the cache lines.


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Andi Kleen
Posted: Sat Aug 27, 2005 7:07 am    Post subject: Re: Not enough parallelism in programming

Joe Seigh <jseigh_01@xemaps.com> writes:
Quote:

Some of the lock-free stuff tolerates stale data so you could reduce
cache traffic even further with a slightly different protocol that
lets the software have more control over invalidating the cache lines.

There is already the CLFLUSH instruction on x86 that allows you
to flush a cache line. Apparently it helps some MPI setups.

-Andi

Nick Maclaren
Posted: Sat Aug 27, 2005 3:09 pm    Post subject: Re: Not enough parallelism in programming

In article <1125098795.067104.304130@g43g2000cwa.googlegroups.com>,
Rupert Pigott <darkboong@hotmail.com> wrote:
Quote:
Joe Seigh wrote:

It's worse than that. Posix pthreads has no formal semantics.
So there's no way to formally prove correctness of implementation.

Nick and others have gnashed their teeth muchly on this point. :)

That is definitely a matter of public record :-)

Quote:
They're working on thread support in the C++ language (not the
C language). They're working on a memory model for C++.
I'm not sure how it will work out.

Good ! Let's hope for the best, eh ? :)

Yes. I have joined the BSI C++ panel specifically for this matter.
When the relevant person produces a draft, I shall try to help to
make it tight enough to use without constraining optimisation too
much. This might be tricky ....


Regards,
Nick Maclaren.

Nick Maclaren
Posted: Sat Aug 27, 2005 3:14 pm    Post subject: Re: Not enough parallelism in programming

In article <oRMPe.138087$5N3.38120@bgtnsc05-news.ops.worldnet.att.net>,
Stephen Fuld <s.fuld@PleaseRemove.att.net> wrote:
Quote:


What new directions have been proposed? If I missed one, please
remind me.

How about the one discussed by Chris Colohan in his post? I don't claim
that it is radically new, but I haven't seen anything like it before, and,
if it is successful, it seems to offer a way to get there (or at least closer to
there) from here (where here is existing sequential programs).

I had heard of such approaches. For the reasons I gave in a previous
posting, I am not convinced that it can be made to work in general.

My point isn't that it isn't a useful tool for the back-end (i.e. the
implementation of the parallelism), but that it doesn't help much for
the front-end (i.e. the identification of parallelism and detection
of aliasing). I can see no way to do that that does not involve
requiring programmers to change the paradigms that they use, and 40
years of research and development has failed to get anywhere with
starting from the serial von Neumann paradigm.


Regards,
Nick Maclaren.

Nick Maclaren
Posted: Sat Aug 27, 2005 3:39 pm    Post subject: Re: Not enough parallelism in programming

In article <2TMPe.679967$cg1.123073@bgtnsc04-news.ops.worldnet.att.net>,
Stephen Fuld <s.fuld@PleaseRemove.att.net> wrote:
Quote:
"Nick Maclaren" <nmm1@cus.cam.ac.uk> wrote in message
news:denksk$s1d$1@gemini.csx.cam.ac.uk...

Bitter experience is that, even when that is possible, it leads to
very poor use of parallelism and low scalability. Unless you design
in parallelism (often by NOT designing in seriality), you usually
end up with no more than a factor of 2-4 improvement. Few practical
programmers will bother to modify their programs for that level of
improvement, as it is easier to wait a couple of years.

Two comments.

First, for a start, a factor of 2-4 may be good enough. We are talking here
about "typical" (i.e. not highly tuned HPC) applications taking advantage of
a 2-4 core implementation. If it gets typical programmers started and shows
them, through that experience, what kinds of things work well and what kinds
of things cause problems (through the feedback mechanism that Chris
proposed), that will lead to more improvement.

Well, that has been said more times than I care to recall, and has
proved to be the converse of the truth (in both respects) every time.
I am almost certain, for reasons given below, that it is false.

Quote:
Second, while in the past one could wait a couple of years and expect a 2-4
times improvement from process, etc., that time seems to be past. I expect
that will cause more emphasis on exploiting the parallelism for general
applications.

That is possibly true, but we have no historical record to guide us.
I am dubious, but not certain that it is false.

Quote:
Again, I don't claim an advantage for the kinds of programs you deal with as
they have already had the "easy" parallelism (and often much of the "hard"
parallelism) already built in.

I was actually referring more to 'general' programs, which I actually
have more experience with (perhaps surprisingly!) There are actually
several fundamental reasons why factors of 2-4 aren't enough:

1) That is not far away from the range that can be obtained by
simply changing the specification slightly, splitting an application
in multiple parts and so on. And those approaches are much easier.
So it will apply only when that has not been done.

2) Adding the type of parallelism described is VERY error-prone
(for the reasons I gave), especially when implemented by the average
software house (with a high turnover of monkeys). Even when I extend
my own code, I often make a mistake, forget a constraint, and introduce
a bug. You CAN'T document everything!

3) The solution to a failure is often to switch off the parallelism
and specify a higher performance computer. While this MAY be getting
less easy, there is still a factor of 2-4 between the slowest and
fastest systems of a particular type based on a single process.

4) In the commercial arena, multiple CPUs have been widespread for
25 years and near-universal for 15. My remark about bitter experience
is based on a LOT of evidence.

5) Related to that, all experience is that the way to use 2-4 cores
for a 'normal' personal or server system is to use them to reduce
context switching and allow the kernel and I/O to run in parallel with
the application processes. Something that is often missed is that
this approach could EASILY be extended to use up to (say) 16 cores,
with very little impact on application code (but a large impact on
standards, especially POSIX).


An explanation of the last. Let us say that we change the I/O design
from a synchronous, copying one to a streaming, asynchronous one.
This would immediately enable the use of extra cores for speeding up
I/O - and, if it were done at every level, that would use a LOT for
the very high I/O commercial codes, networking etc.

The best thing is that it would be largely transparent to the simple
applications, because it can be implemented behind both Fortran and C
I/O. Indeed, I have done it, in both cases :-) It would need a
major change to most TCP/IP implementations, but almost no change to
the specification. And so on.
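
As a rough illustration of putting asynchronous, streaming I/O behind an ordinary write interface, here is a sketch in Python. It is not the Fortran or C implementation mentioned above; the class name AsyncWriter and its methods are made up for the example, and it covers only the output side with a single worker thread.

```python
import io
import queue
import threading

class AsyncWriter:
    """Looks like a plain writer to the caller, but a worker thread does the output."""
    def __init__(self, sink):
        self._sink = sink
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain)
        self._worker.start()

    def write(self, data):
        self._queue.put(data)       # returns at once; the caller does not wait for the device

    def close(self):
        self._queue.put(None)       # sentinel: no more data will follow
        self._worker.join()         # wait until everything queued has been written

    def _drain(self):
        while True:
            data = self._queue.get()
            if data is None:
                break
            self._sink.write(data)  # the slow part runs off the caller's thread

if __name__ == "__main__":
    sink = io.StringIO()
    writer = AsyncWriter(sink)
    for i in range(5):
        writer.write(str(i))
    writer.close()
    print(sink.getvalue())
```

The application code only sees write and close, which is why the change can stay largely transparent; the queue preserves the order of the writes.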


Regards,
Nick Maclaren.

Stephen Fuld
Posted: Sat Aug 27, 2005 4:15 pm    Post subject: Re: Not enough parallelism in programming

"Nick Maclaren" <nmm1@cus.cam.ac.uk> wrote in message
news:depfsl$ms8$1@gemini.csx.cam.ac.uk...
Quote:
In article <2TMPe.679967$cg1.123073@bgtnsc04-news.ops.worldnet.att.net>,
Stephen Fuld <s.fuld@PleaseRemove.att.net> wrote:
"Nick Maclaren" <nmm1@cus.cam.ac.uk> wrote in message
news:denksk$s1d$1@gemini.csx.cam.ac.uk...

Bitter experience is that, even when that is possible, it leads to
very poor use of parallelism and low scalability. Unless you design
in parallelism (often by NOT designing in seriality), you usually
end up with no more than a factor of 2-4 improvement. Few practical
programmers will bother to modify their programs for that level of
improvement, as it is easier to wait a couple of years.

Two comments.

First, for a start, a factor of 2-4 may be good enough. We are talking here
about "typical" (i.e. not highly tuned HPC) applications taking advantage of
a 2-4 core implementation. If it gets typical programmers started and shows
them, through that experience, what kinds of things work well and what kinds
of things cause problems (through the feedback mechanism that Chris
proposed), that will lead to more improvement.

Well, that has been said more times than I care to recall, and has
proved to be the converse of the truth (in both respects) every time.
I am almost certain, for reasons given below, that it is false.

Second, while in the past one could wait a couple of years and expect a 2-4
times improvement from process, etc., that time seems to be past. I expect
that will cause more emphasis on exploiting the parallelism for general
applications.

That is possibly true, but we have no historical record to guide us.
I am dubious, but not certain that it is false.

Again, I don't claim an advantage for the kinds of programs you deal with as
they have already had the "easy" parallelism (and often much of the "hard"
parallelism) already built in.

I was actually referring more to 'general' programs, which I actually
have more experience with (perhaps surprisingly!) There are actually
several fundamental reasons why factors of 2-4 aren't enough:

1) That is not far away from the range that can be obtained by
simply changing the specification slightly, splitting an application
in multiple parts and so on. And those approaches are much easier.
So it will apply only when that has not been done.

OK, so even accepting that, it only "delays the inevitable" for a while
(assuming current trends continue).

Quote:
2) Adding the type of parallelism described is VERY error-prone
(for the reasons I gave), especially when implemented by the average
software house (with a high turnover of monkeys). Even when I extend
my own code, I often make a mistake, forget a constraint, and introduce
a bug. You CAN'T document everything!

Right. That is why I like Chris' approach. As I understand it, it still
guarantees correctness (at least to the same level as the single thread
program), and gives feedback about where the proposed increased parallelism
failed. Comparing the before and after change reports should indicate a
problem and where it might be.

Quote:
3) The solution to a failure is often to switch off the parallelism
and specify a higher performance computer. While this MAY be getting
less easy, there is still a factor of 2-4 between the slowest and
fastest systems of a particular type based on a single process.

But, again, pending more information, with Chris' approach it isn't an "all
or nothing" thing to turn on or off parallelism. You leave in the
directives where it helps, and remove them where the report says it is a
loss. It seems to be pretty easy to "play around" and move the directives
around to evaluate performance, all the time still guaranteeing
correctness.

Quote:
4) In the commercial arena, multiple CPUs have been widespread for
25 years and near-universal for 15. My remark about bitter experience
is based on a LOT of evidence.

Yes, for servers, where inter task parallelism is pretty easy, but not for
desktops. Whereas today there are relatively few existing multi-processor
desktops, within say 5 years, I expect there to be very few uniprocessor
desktops around.

--
- Stephen Fuld
e-mail address disguised to prevent spam

Alexander Terekhov
Posted: Sat Aug 27, 2005 4:15 pm    Post subject: Re: Not enough parallelism in programming

Nick Maclaren wrote:
[...]
Quote:
When the relevant person produces a draft,

Don't wait.

http://jupiter.robustserver.com/pipermail/cpp-threads_decadentplace.org.uk

regards,
alexander.

P.S. http://www.hpl.hp.com/personal/Hans_Boehm/tmp/tremblant.pdf

<quote>

It has become increasingly clear that any specification will have
an unavoidable impact on compiler optimization. Some currently
common compiler optimizations need to be adapted to ensure thread
safety. But this also reinforces the urgency for thread support in
C++: Current implementations make it much harder than it should be
to write correct multithreaded code.

Our progress has been slowed by both the technical difficulties of
defining a memory model that is compatible with a high performance
atomics library, and by disagreements about the atomics library
itself.

</quote>

Stephen Fuld
Posted: Sat Aug 27, 2005 4:15 pm    Post subject: Re: Not enough parallelism in programming

"Nick Maclaren" <nmm1@cus.cam.ac.uk> wrote in message
news:depeek$k5g$1@gemini.csx.cam.ac.uk...
Quote:
In article <oRMPe.138087$5N3.38120@bgtnsc05-news.ops.worldnet.att.net>,
Stephen Fuld <s.fuld@PleaseRemove.att.net> wrote:


What new directions have been proposed? If I missed one, please
remind me.

How about the one discussed by Chris Colohan in his post? I don't claim
that it is radically new, but I haven't seen anything like it before, and,
if it is successful, it seems to offer a way to get there (or at least closer to
there) from here (where here is existing sequential programs).

I had heard of such approaches. For the reasons I gave in a previous
posting, I am not convinced that it can be made to work in general.

Sorry, I saw your response to Chris only after I posted the previous.

Quote:
My point isn't that it isn't a useful tool for the back-end (i.e. the
implementation of the parallelism), but that it doesn't help much for
the front-end (i.e. the identification of parallelism and detection
of aliasing). I can see no way to do that that does not involve
requiring programmers to change the paradigms that they use, and 40
years of research and development has failed to get anywhere with
starting from the serial von Neumann paradigm.

I like the idea of the feedback mechanism he discussed as a "learning
mechanism". Hopefully, programmers will get some easy to use report that
indicates where parallel limiting performance occurred, and what to do about
it. As they proceed, they may be able to anticipate and thus internalize the
results, which would lead incrementally to better programming.

--
- Stephen Fuld
e-mail address disguised to prevent spam

Kees van Reeuwijk
Posted: Sat Aug 27, 2005 10:22 pm    Post subject: Re: Not enough parallelism in programming

Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:

Quote:
In article <1h1vp39.sn58yjk7bf7qN%reeuwijk@few.vu.nl>,
Kees van Reeuwijk <reeuwijk@few.vu.nl> wrote:
Another important model is divide-and-conquer, and its degenerate case
farmer-worker.

Yes. I omitted those because you can fairly regard them as low
communication (i.e. trivial) cases. But why communicate if you
have nothing to say? :-)

It is still necessary to communicate task input and output data. Beyond
toy programs, that may be a large amount of data.
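
A minimal farmer-worker sketch in Python shows where the task input and output data travel. The function names and the sum-of-squares task are made up for illustration. Because it uses threads in one process, the "communication" here is only the passing of references; on a distributed machine each chunk and each result would have to be copied, which is the cost referred to above.

```python
from concurrent.futures import ThreadPoolExecutor

def work(chunk):
    """Worker: receives its input data and returns its output data."""
    return sum(x * x for x in chunk)

def farm(data, num_workers=4, chunk_size=1000):
    """Farmer: split the input, hand chunks to workers, combine the results."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(work, chunks))  # inputs go out, outputs come back
    return sum(partial_sums)

if __name__ == "__main__":
    print(farm(list(range(10000))))
```

The workers never talk to each other, only to the farmer, which is why this counts as the degenerate case of divide-and-conquer.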

Nick Maclaren
Posted: Sat Aug 27, 2005 11:45 pm    Post subject: Re: Not enough parallelism in programming

In article <43107EC3.459BA34@web.de>,
Alexander Terekhov <terekhov@web.de> wrote:
Quote:

Nick Maclaren wrote:
[...]
When the relevant person produces a draft,

Don't wait.

http://jupiter.robustserver.com/pipermail/cpp-threads_decadentplace.org.uk

Thanks. I will take a look when I get a moment.

< Hans Boehm quote>
Quote:

It has become increasingly clear that any specification will have
an unavoidable impact on compiler optimization. Some currently
common compiler optimizations need to be adapted to ensure thread
safety. But this also reinforces the urgency for thread support in
C++: Current implementations make it much harder than it should be
to write correct multithreaded code.

Yeah. That sort of paragraph makes me feel that he understands the
issues.


Regards,
Nick Maclaren.

Nick Maclaren
Posted: Sun Aug 28, 2005 12:00 am    Post subject: Re: Not enough parallelism in programming

In article <vL%Pe.140436$5N3.125056@bgtnsc05-news.ops.worldnet.att.net>,
Stephen Fuld <s.fuld@PleaseRemove.att.net> wrote:
Quote:

2) Adding the type of parallelism described is VERY error-prone
(for the reasons I gave), especially when implemented by the average
software house (with a high turnover of monkeys). Even when I extend
my own code, I often make a mistake, forget a constraint, and introduce
a bug. You CAN'T document everything!

Right. That is why I like Chris' approach. As I understand it, it still
guarantees correctness (at least to the same level as the single thread
program), and gives feedback about where the proposed increased parallelism
failed. Comparing the before and after change reports should indicate a
problem and where it might be.

And that is one of the things that I don't believe. Even ignoring
non-memory interactions, there is a BIG problem with partial update
of aggregates. If two threads update two fields in the same structure,
it can invalidate the structure, but the hardware will think that they
are unrelated.

This doesn't just affect contiguous structures, incidentally, but
even ones like Dirichlet tessellations, B-trees and so on. Update
those in parallel, and one thread will be certain to invalidate the
invariants assumed by the other while the former's update is taking
place, leading to corruption. And, of course, the effect is a race
condition of an evil form ....
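
The two-fields-in-one-structure case can be shown with a toy model in Python. Everything in it is an assumption made for illustration: the record layout, the invariant start <= end, and above all a conflict detector that compares write sets only, field by field. A scheme that also tracks reads could behave differently; the sketch only shows why address-level checks and structure-level invariants are not the same thing.

```python
def writes_conflict(write_set_a, write_set_b):
    """Toy detector: two updates conflict only if they write the same field."""
    return bool(set(write_set_a) & set(write_set_b))

def invariant_holds(rec):
    """The structure is valid only while start <= end."""
    return rec["start"] <= rec["end"]

record = {"start": 0, "end": 10}

# Each update is valid against the record as it stood when its thread began.
update_a = {"start": 8}   # fine on its own: 8 <= 10
update_b = {"end": 5}     # fine on its own: 0 <= 5

conflict = writes_conflict(update_a, update_b)  # disjoint fields, nothing reported
record.update(update_a)
record.update(update_b)
broken = not invariant_holds(record)            # record is now start=8, end=5

if __name__ == "__main__":
    print("conflict reported:", conflict, "invariant broken:", broken)
```

No conflict is reported, yet the combined result is a structure that neither thread would have produced on its own.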

Quote:
But, again, pending more information, with Chris' approach it isn't an "all
or nothing" thing to turn on or off parallelism. You leave in the
directives where it helps, and remove them where the report says it is a
loss. It seems to be pretty easy to "play around" and move the directives
around to evaluate performance, all the time still guaranteeing
correctness.

No, that works only with the detected problems. I am referring to the
more serious (and VERY common) ones where the result of parallelisation
is unrepeatable wrong answers. It is very common for binary chop to
fail to help - i.e. the problem appears and disappears more-or-less
independently of WHICH bit of code you serialise.

Quote:
4) In the commercial arena, multiple CPUs have been widespread for
25 years and near-universal for 15. My remark about bitter experience
is based on a LOT of evidence.

Yes, for servers, where inter task parallelism is pretty easy, but not for
desktops. Whereas today there are relatively few existing multi-processor
desktops, within say 5 years, I expect there to be very few uniprocessor
desktops around.

The situation is no different there. Most desktops nowadays are used
to run GUIs, often to run an application that accesses the network.
There is a LOT of inter-task parallelisation that is feasible by a
redesign of the components (as I said, in the network case, without
even a change of specification).

That isn't possible for GUIs without fairly radical changes to the
specification, but I flatly don't believe that his approach will work
for GUIs. They are so unspeakably disgusting that even switching
on quite modest optimisation often causes them to fail in the almost
unlocatable way I describe above. Been there - been defeated by
that :-(


Regards,
Nick Maclaren.

All times are GMT.