| Author |
Message |
Sander Vesik
Guest
|
Posted:
Fri Aug 26, 2005 6:40 pm Post subject:
Re: Not enough parallelism in programming |
|
|
Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
| Quote: |
There is certainly a possibility that someone will make a breakthrough
where hundreds of people have failed before, but I am not holding my
breath waiting for it. I know of NO approach that has a significant
chance of success until and unless we are prepared to abandon the
serial von Neumann paradigm.
|
But are we prepared? A large part of the progarmmers manage to output
really bad code for even 'serial von Neumann' and tend to cope best
with very simplistic locking strategies if paralellism in any way is
needed. And much prefer the 'lock it all' strategy, esp after they can't
debug the other ones.
| Quote: |
Regards,
Nick Maclaren.
|
--
Sander
+++ Out of cheese error +++ |
|
| Back to top |
|
 |
Stephen Fuld
Guest
|
Posted:
Fri Aug 26, 2005 9:49 pm Post subject:
Re: Not enough parallelism in programming |
|
|
"Greg Lindahl" <lindahl@pbm.com> wrote in message
news:430e0315$1@news.meer.net...
| Quote: | In article <hHkPe.133637$5N3.100234@bgtnsc05-news.ops.worldnet.att.net>,
Stephen Fuld <s.fuld@PleaseRemove.att.net> wrote:
I'd like to explore this further. ISTM that there are only a few
languages
that have any explicit way to declare parallelism and that they differ in
the "level" of parallelism they express.
After 30 years of research, there are more than just a few which have
been proposed, experimented with, and spilled much ink on papers about
the results, etc.
|
Yes, but, as demonstrated by some of the posts in this thread, there is some
new research in different directions.
| Quote: | I'm not sure what you hope to accomplish with a comp.arch discussion
of such a huge body of work.
|
Pretty much what has happened. :-) (I don't claim credit for that, as most
appear to be responses to the OP, not mine, but effect is still good.) Some
information on current status of old ideas and some information and pointers
to newer research.
--
- Stephen Fuld
e-mail address disguised to prevent spam |
|
| Back to top |
|
 |
Nick Maclaren
Guest
|
Posted:
Fri Aug 26, 2005 10:05 pm Post subject:
Re: Not enough parallelism in programming |
|
|
In article <LKHPe.678440$cg1.572020@bgtnsc04-news.ops.worldnet.att.net>,
Stephen Fuld <s.fuld@PleaseRemove.att.net> wrote:
| Quote: |
After 30 years of research, there are more than just a few which have
been proposed, experimented with, and spilled much ink on papers about
the results, etc.
Yes, but, as demonstrated by some of the posts in this thread, there is some
new research in different directions.
|
New research, perhaps. Different directions, no. I haven't seen
anything that isn't work on approaches that haven't been flogged
to death several times before[*].
We know where we can get starting from third-generation von Neumann
languages, and it isn't anywhere very useful. C and C++ may be
hopeless, but Fortran isn't much better. We know where we can get
with improved paradigms (including strictly constrained variants
of Fortran and even C or C++), and it is a lot further, but only
about as far as we can get with the more abstract programming
languages.
We know that it isn't generally feasible to get to those paradigms
starting from a large, complex program that is written to a serial
von Neumann design. And we know the reluctance of the programming
community (often the beancounters, rather than the programmers) to
move away from serial von Neumann.
There is certainly a possibility that someone will make a breakthrough
where hundreds of people have failed before, but I am not holding my
breath waiting for it. I know of NO approach that has a significant
chance of success until and unless we are prepared to abandon the
serial von Neumann paradigm.
What new directions have been proposed? If I missed one, please
remind me.
[*] Yes, they have all the characteristics of zombie mules.
Regards,
Nick Maclaren. |
|
| Back to top |
|
 |
Nick Maclaren
Guest
|
Posted:
Fri Aug 26, 2005 10:52 pm Post subject:
Re: Not enough parallelism in programming |
|
|
In article <ucloe7lvmqc.fsf@cilento.stampede.cs.cmu.edu>,
Chris Colohan <colohan+@cs.cmu.edu> wrote:
| Quote: |
I hate to put a plug in for my own work, but there is an interesting
point in the middle. The choice is not just "explicit parallelism"
(with all of the pain of creating parallel programs) or "fully
automatic parallelism" (with no benefits if your compiler is unable to
find suitable threads).
|
Well, yes. That is essentially what OpenMP does, and has been a
standard method for at least 25 years. What it does is to decouple
the issue of identifying parallelism (and its converse, aliasing)
and say that it is someone else's problem. Note that I am NOT saying
that isn't justifiable, merely that it shouldn't be claimed to be a
complete solution.
| Quote: | There has been a whole load of work in the past 15 years or so on
speculative parallel architectures. The main work in this area
started with the Multiscalar architecture out of Wisconsin, and other
projects in the area include Hydra (Stanford), IACOMA (Illinois), RAW
(MIT), and STAMPede (my project at CMU). In these architectures you
mark thread boundaries in a sequential program (written in any
language, including C or even assembly), and the machine will execute
the code in parallel, while maintaining the original sequential
semantics. It does this by detecting data dependences which exist
between your threads and either partially or completely restarting
threads which consume mis-speculated values.
|
Right. The basic approach is much older than 15 years, but the one
of detecting aliasing and restarting when it is encountered is fairly
new. However, it has three obvious fundamental problems:
1) There has to be a clean definition of aliasing, and it has
to be detectable. Despite what you say above, you CAN'T do it in C,
for reasons I describe in my Objects diatribe (please ask for a copy
if you want). You might be able to do it in a suitable variant of C,
if you have the stomache to define one :-(
2) The threads have to be such that they ARE restartable. That
imposes a pretty considerable constraint on the programming paradigm,
and cannot practically be done in any third generation von Neumann
language, without imposing draconian restrictions.
3) It will be effective only if such aliasing is sparse. The
behaviour of such retry algorithms is well understood, and they decay
catastrophically as the number of conflicts increases. Again, this
imposes constraints on the programming paradigm.
| Quote: | Under the thread-level speculation (TLS) programming model the
programmer starts with a sequential program, and simply marks where
new threads should be spawned. They run it on a TLS machine, and they
get a parallel _and correct_ execution. At first, the performance may
not be that good, since many threads will restart due to dependences.
|
I am sorry, but I don't believe that bald statement. You must add
extra constraints or mechanisms to ensure that threads ARE restartable.
If you take a checkpoint, you are multiplying the store requirements
(even excluding the evil problem of non-memory side-effects, which
are so critical for real programs).
How do you ensure that you can restore a thread's state to its initial
one when restarting it? Memory, file states, CPU flags, signals and
all?
| Quote: | The TLS machine can provide profile feedback which lists the most
frequent dependences, and the programmer can use this to optimize
their program -- they can either modify the code or they can change
their thread spawn points to avoid dependences. Using this method a
programmer can _gradually_ add parallelism to their program: first
they add threads, and then they only have to change the parts of their
code which are frequently executed _and_ have frequent dependences
between the threads.
|
Bitter experience is that, even when that is possible, it leads to
very poor use of parallelism and low scalability. Unless you design
in parallelism (often by NOT designing in seriality), you usually
end up with no more than a factor of 2-4 improvement. Few practical
programmers will bother to modify their programs for that level of
improvement, as it is easier to wait a couple of years.
There are problem areas where this is not true, but not all that many.
| Quote: | I have been working on applying these techniques to database
transactions for my thesis, and will be presenting a paper on this at
VLDB next week. If you are interested, the paper (and a tech report
on the hardware aspects) are linked off of my home page:
www.colohan.com
|
Ah. Now, THAT I can believe. Typical database designs (well, good
ones, anyway) have well-defined aliasing rules and their primitive
operations are designed to be retriable. Also many of their uses
have fairly sparse aliasing in practice.
Regards,
Nick Maclaren. |
|
| Back to top |
|
 |
David Gay
Guest
|
Posted:
Fri Aug 26, 2005 11:17 pm Post subject:
Re: Not enough parallelism in programming |
|
|
nmm1@cus.cam.ac.uk (Nick Maclaren) writes:
| Quote: | In article <ucloe7lvmqc.fsf@cilento.stampede.cs.cmu.edu>,
Chris Colohan <colohan+@cs.cmu.edu> wrote:
There has been a whole load of work in the past 15 years or so on
speculative parallel architectures. The main work in this area
started with the Multiscalar architecture out of Wisconsin, and other
projects in the area include Hydra (Stanford), IACOMA (Illinois), RAW
(MIT), and STAMPede (my project at CMU). In these architectures you
mark thread boundaries in a sequential program (written in any
language, including C or even assembly), and the machine will execute
the code in parallel, while maintaining the original sequential
semantics. It does this by detecting data dependences which exist
between your threads and either partially or completely restarting
threads which consume mis-speculated values.
Right. The basic approach is much older than 15 years, but the one
of detecting aliasing and restarting when it is encountered is fairly
new. However, it has three obvious fundamental problems:
1) There has to be a clean definition of aliasing, and it has
to be detectable. Despite what you say above, you CAN'T do it in C,
for reasons I describe in my Objects diatribe (please ask for a copy
if you want). You might be able to do it in a suitable variant of C,
if you have the stomache to define one :-(
2) The threads have to be such that they ARE restartable. That
imposes a pretty considerable constraint on the programming paradigm,
and cannot practically be done in any third generation von Neumann
language, without imposing draconian restrictions.
|
I think you're missing the bit about "hardware support" here. The
conflicts are defined in terms of virtual addresses, and the processor
fully supports restoring itself to an earlier state. There are a few
minor (well ok, not so minor) issues of course:
- can the hardware really rollback arbitrarily far?
- what about aliasing of virtual addresses to each other?
- what about I/O? (you mentioned that one :-))
Chris's paper talks about how some of these (and your next point) are
resolved in a real body of code; it's definitely worth a read.
| Quote: | 3) It will be effective only if such aliasing is sparse. The
behaviour of such retry algorithms is well understood, and they decay
catastrophically as the number of conflicts increases. Again, this
imposes constraints on the programming paradigm.
|
--
David Gay
dgay@acm.org |
|
| Back to top |
|
 |
Nick Maclaren
Guest
|
Posted:
Fri Aug 26, 2005 11:32 pm Post subject:
Re: Not enough parallelism in programming |
|
|
In article <s71irxsr2ah.fsf@beryl.CS.Berkeley.EDU>,
David Gay <dgay@beryl.CS.Berkeley.EDU> wrote:
| Quote: |
2) The threads have to be such that they ARE restartable. That
imposes a pretty considerable constraint on the programming paradigm,
and cannot practically be done in any third generation von Neumann
language, without imposing draconian restrictions.
I think you're missing the bit about "hardware support" here. The
conflicts are defined in terms of virtual addresses, and the processor
fully supports restoring itself to an earlier state. There are a few
minor (well ok, not so minor) issues of course:
- can the hardware really rollback arbitrarily far?
- what about aliasing of virtual addresses to each other?
- what about I/O? (you mentioned that one :-))
|
No, I didn't. You are right that they aren't minor - in particular,
it is very likely that a thread will update a large proportion of
its data before a conflict is discovered. That was the point of my
remark about checkpoints.
There are also non-memory CPU states (e.g. IEEE 754 modes and flags),
signals and numerous other 'hidden' data. The existence of these is
why checkpoint/restart for arbitrary programs has failed dismally
every single time it has been introduced over a period of 35+ years.
I fail to see that this approach is all that different - if it claims
to be able to handle arbitrary programs :-(
| Quote: | Chris's paper talks about how some of these (and your next point) are
resolved in a real body of code; it's definitely worth a read.
|
Yes, I intend to. As I said, database codes are one area where this
might work pretty well.
Regards,
Nick Maclaren. |
|
| Back to top |
|
 |
David Gay
Guest
|
Posted:
Fri Aug 26, 2005 11:53 pm Post subject:
Re: Not enough parallelism in programming |
|
|
nmm1@cus.cam.ac.uk (Nick Maclaren) writes:
| Quote: | In article <s71irxsr2ah.fsf@beryl.CS.Berkeley.EDU>,
David Gay <dgay@beryl.CS.Berkeley.EDU> wrote:
2) The threads have to be such that they ARE restartable. That
imposes a pretty considerable constraint on the programming paradigm,
and cannot practically be done in any third generation von Neumann
language, without imposing draconian restrictions.
I think you're missing the bit about "hardware support" here. The
conflicts are defined in terms of virtual addresses, and the processor
fully supports restoring itself to an earlier state. There are a few
minor (well ok, not so minor) issues of course:
- can the hardware really rollback arbitrarily far?
- what about aliasing of virtual addresses to each other?
- what about I/O? (you mentioned that one :-))
No, I didn't. You are right that they aren't minor - in particular,
it is very likely that a thread will update a large proportion of
its data before a conflict is discovered. That was the point of my
remark about checkpoints.
There are also non-memory CPU states (e.g. IEEE 754 modes and flags),
signals and numerous other 'hidden' data.
|
Well I would presume that those ones would be handled by an in-CPU
implementation. I'm more worried about the ones outside the CPU, which
I think could all be reasonably categorised as I/O.
--
David Gay
dgay@acm.org |
|
| Back to top |
|
 |
Stephen Fuld
Guest
|
Posted:
Sat Aug 27, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
"Nick Maclaren" <nmm1@cus.cam.ac.uk> wrote in message
news:deni41$m5i$1@gemini.csx.cam.ac.uk...
| Quote: | In article <LKHPe.678440$cg1.572020@bgtnsc04-news.ops.worldnet.att.net>,
Stephen Fuld <s.fuld@PleaseRemove.att.net> wrote:
After 30 years of research, there are more than just a few which have
been proposed, experimented with, and spilled much ink on papers about
the results, etc.
Yes, but, as demonstrated by some of the posts in this thread, there is
some
new research in different directions.
New research, perhaps. Different directions, no. I haven't seen
anything that isn't work on approaches that haven't been flogged
to death several times before[*].
We know where we can get starting from third-generation von Neumann
languages, and it isn't anywhere very useful. C and C++ may be
hopeless, but Fortran isn't much better. We know where we can get
with improved paradigms (including strictly constrained variants
of Fortran and even C or C++), and it is a lot further, but only
about as far as we can get with the more abstract programming
languages.
We know that it isn't generally feasible to get to those paradigms
starting from a large, complex program that is written to a serial
von Neumann design. And we know the reluctance of the programming
community (often the beancounters, rather than the programmers) to
move away from serial von Neumann.
There is certainly a possibility that someone will make a breakthrough
where hundreds of people have failed before, but I am not holding my
breath waiting for it. I know of NO approach that has a significant
chance of success until and unless we are prepared to abandon the
serial von Neumann paradigm.
What new directions have been proposed? If I missed one, please
remind me.
|
How about the one discussed by Chris Colohan in his post? I don't claim
that it is radically new, but I haven't seen anything like it before, and,
if is successfull, seems to offer a way to get there (or at least closer to
there) from here (where here is existing sequential programs).
--
- Stephen Fuld
e-mail address disguised to prevent spam |
|
| Back to top |
|
 |
Stephen Fuld
Guest
|
Posted:
Sat Aug 27, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
"Nick Maclaren" <nmm1@cus.cam.ac.uk> wrote in message
news:denksk$s1d$1@gemini.csx.cam.ac.uk...
snip
| Quote: | Bitter experience is that, even when that is possible, it leads to
very poor use of parallelism and low scalability. Unless you design
in parallelism (often by NOT designing in seriality), you usually
end up with no more than a factor of 2-4 improvement. Few practical
programmers will bother to modify their programs for that level of
improvement, as it is easier to wait a couple of years.
|
Two comments.
First, for a start, a factor of 2-4 may be good enough. We are talking here
about "typical", (i.e. not highly tuned HPC) application taking abvantage of
a 2-4 core implementation. If it gets typical programmers started and shows
them, through that experience what kinds of things work well and what kinds
of things cause problems (through the feedback mechanism that Chris
proposed) that will lead to more improvement.
Second, while in the past one could wait a couple of years and expect a 2-4
times impropvement from process, etc. that time seems to be past. I expect
that will cause more emphasis on exploiting the parallelsim for general
applications.
Again, I don't claim an advantage for the kinds of programs you deal with as
they have already had the "easy" paralelism (and often much of the "hard"
parallelsim) already built in.
--
- Stephen Fuld
e-mail address disguised to prevent spam |
|
| Back to top |
|
 |
Stephen Fuld
Guest
|
Posted:
Sat Aug 27, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
"Dan Koren" <dankoren@yahoo.com> wrote in message
news:430e0dc3$1@news.meer.net...
| Quote: | "Stephen Fuld" <s.fuld@PleaseRemove.att.net> wrote in message
news:hHkPe.133637$5N3.100234@bgtnsc05-news.ops.worldnet.att.net...
|
snip
| Quote: | - Is there an existing language that expresses parallelism at the right
level for multi-core/multithreaded core to take advantage of?
Yes: APL.
|
I don't claim to be any kind of expert on APL, so please be gentle with your
answer. Doesn't APL parallelsim come from its ability to express
essentially matrix operations with a single operator, that it a clean
expression of data parallel mechansims? Does it provide any mechanism for
multiple parallel control operations? I'm not deprecating data parallelsim,
just pointing out that it is somewhat limited.
--
- Stephen Fuld
e-mail address disguised to prevent spam |
|
| Back to top |
|
 |
Joe Seigh
Guest
|
Posted:
Sat Aug 27, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
Rupert Pigott wrote:
| Quote: | People talk about the expressive power of C/C++. Sure, it's great
for sequential stuff, but it provides precisely zero expression
for real world constructs such as a simple timeout on I/O. This
is *not* an isolated one-in-a-million situation, it crops up all
the time and yet it simply is not part of the language model.
Instead we have a bunch of libraries (eg: Pthreads) that require
the toolchain to work outside of the language spec to implement
correctly... That is asking for trouble, right ?
|
It's worse than that. Posix pthreads has no formal semantics.
So there's no way to formally prove correctness of implementation.
They're working on thread support in the C++ language (not the
C language). They're working on a memory model for C++.
I'm not sure how it will work out.
I've worked with i/o timeouts in a multi-threaded environment.
It depends on what you're doing, I suppose. How should timed
i/o be expressed?
I've done stuff where I could say it would be nice to have
certain language features, or certain functions in the OS
or hardware. I don't think the problem is with coming up
with these things or saying how they should be expressed.
It's getting concensus on these things. A bit difficult when
some people won't even explicitly state or define what
they're talking about. I know what restarting a thread
means and possible ways of implementing it in software at least.
I have no idea what some people mean when they use that term.
It's easier for me to work on the problems I think are interesting
because I at least know what they are, as opposed to problems
people refuse to define.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software. |
|
| Back to top |
|
 |
anonymous
Guest
|
Posted:
Sat Aug 27, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
Torben Ęgidius Mogensen wrote:
| Quote: | Joe Seigh <jseigh_01@xemaps.com> writes:
At least in the opinion of this person here
http://www.eetimes.com/news/latest/showArticle.jhtml;jsessionid=JZEMR0B52MV2CQSNDBESKHA?articleID=169300057
I'm not sure what he's getting at though. Does he think multi-cores are
not the way to go but SIMD like the GPUs? Or is MIMD ok, we're just not
getting enough threads to run?
I think what he says is that SIMD is good for GPUs because their
workload suits this model, but that general-purpose CPUs that offer
parallelism in the form of MIMD or other won't find many programs
capable of exploiting said parallelism. This is mainly due to
mainstream programmming methodologies and languages being inherently
sequential.
|
/This is mainly due to/ poor factoring.
| Quote: | From a object oriented communicating sequential processes ( CSP)
software development point-of-view, an inhertit parallism is natural. ( |
google for communicating sequential processes CSP, and more so, Timed
CSP )
<SNIP>
---
President Clinton is a jerk |
|
| Back to top |
|
 |
anonymous
Guest
|
Posted:
Sat Aug 27, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
JJ wrote:
| Quote: | What Rupert refers to will be presented at cpa2005 in a few weeks.
"New Transputer Design for FPGAs" although FPGAs are just a away to
prototype the architecture and not the main idea presented.
In effect a collection of 4 way threaded light PEs "ALT"ing into an 8
way threaded (or interleaved), hashed RLDRAM. These PEs themselves look
quite a bit like sparcs with register sliding reg cache per thread but
thats not important either, in fact any ISA would work here even x86
:-/
The real idea is that threaded cpus with conventional cache DRAM
hierarchy are even worse than single threaded cpus putting even more
pressure on the cache and hence more threads, less locality. They can
in theory churn through more ops though but only if they can be fed and
it yes it would be a nightmare to syncronize these threads through
120ns DRAM.
The right way to help multithreaded cpus is to multithread the memory
too, which can be done with RLDRAM (8 way issue every 2.5ns rate) and
effectively gives each PE a mem access of a few instructions slots
across the entire DRAM array hence no data cache. It's not raw
bandwidth or latency that matters, its the rate of uncorrelated memory
issues that can be dispensed across multiple banks on behalf of a no of
cooperating (or not) threads. This leads into a hashing MMU and
protected objects done in the clock cycles. Concurency support for
occam etc provided with protected objects.
For the application layer I am inclined to see if occam and Objective C
makes more sense rather than C++, but thats another paper
transputer2 at yahoo ....
johnjakson at usa ...
|
RLDRAM reads as a better replacement for Direct Ram Bus DRAM for my
VLIW SMP MPP FORTH theory. Thank you. |
|
| Back to top |
|
 |
Andi Kleen
Guest
|
Posted:
Sat Aug 27, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
MitchAlsup@aol.com writes:
| Quote: | All synchronization in current machines is memory based. Memory latency
remains rather constant as processor frequency scales to every higher
numbers. So 10 years ago when CPUs were at 200 MHz and main memory was
150 ns away, one could conceivably perform 30 instructions between
synchronization events (and 300 instructions between synchronizations
is vastly more reasonable). Now we have 3GHz machines with 120ns memory
access times.
|
Hmm - but don't some cache line coherency protocols with more states
than MESI (like MOESI) allow cache line transfer between CPUs without
going through memory? In that case the 2x120ns latency would be
avoided and replaced with the presumably shorter CPU<->CPU latency
time. Of course the dirty cache line would still need to be eventually
written back to memory. But that write wouldn't be on the critical path
and it could be done slowly in the background and just turn into an easier
"bandwidth problem" compare to a hard "latency problem".
With suitable instructions other tricks might be possible, e.g. the
new MONITOR/MWAIT on x86 look like they could theoretically optimize
much more in this space. e.g. it could tell the other CPU to toss
you over the cache line as soon as it has changed.
-Andi |
|
| Back to top |
|
 |
Rupert Pigott
Guest
|
Posted:
Sat Aug 27, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
Joe Seigh wrote:
| Quote: | Rupert Pigott wrote:
People talk about the expressive power of C/C++. Sure, it's great
for sequential stuff, but it provides precisely zero expression
for real world constructs such as a simple timeout on I/O. This
is *not* an isolated one-in-a-million situation, it crops up all
the time and yet it simply is not part of the language model.
Instead we have a bunch of libraries (eg: Pthreads) that require
the toolchain to work outside of the language spec to implement
correctly... That is asking for trouble, right ?
It's worse than that. Posix pthreads has no formal semantics.
So there's no way to formally prove correctness of implementation.
|
Nick and others have gnashed their teeth muchly on this point. :)
| Quote: | They're working on thread support in the C++ language (not the
C language). They're working on a memory model for C++.
I'm not sure how it will work out.
|
Good ! Let's hope for the best, eh ? :)
| Quote: | I've worked with i/o timeouts in a multi-threaded environment.
It depends on what you're doing, I suppose. How should timed
i/o be expressed?
|
OCCAM did it thus.
PRI ALT
tim ? AFTER timeout
SEQ
-- timed out
in ? v
SEQ
-- process input
It's not big, not clever, but it covers a fair amount of the
trivial cases that folks encounter in day to day coding. I
figure that C++ syntax might be able to cover it with a try
-catch construct, but that is not as neat & tidy.
Cheers,
Rupert |
|
| Back to top |
|
 |
|
|
|
|