| Author |
Message |
David Gay
Guest
|
Posted:
Fri Aug 26, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
Scott A Crosby <scrosby@cs.rice.edu> writes:
| Quote: | On Thu, 25 Aug 2005 14:35:57 GMT, "Stephen Fuld" <s.fuld@PleaseRemove.att.net> writes:
I'd like to explore this further. ISTM that there are only a few
languages that have any explicit way to declare parallelism and that
they differ in the "level" of parallelism they express. For
example, I believe that Occam expresses parallelism at a very low
level (even evaluation of a single expression can be specified to be
in parallel). But COBOL offers explicit parallelism at a level
similar to a full thread. I'm not sure which, if either of these is
the "right" thing to do.
I'm not sure I like this. What if a programmer gets such a declaration
wrong? Immediate crash, random unreliability, or would the
consequences be pretty mild.
So, the questions are
- Is there an existing language that expresses parallelism at the right
level for multi-core/multithreaded core to take advantage of? Is it easy
for programmers to use the language productively?
- If not what would such a language look like? That is, what level of
parallelism is appropriate? How would the semantics/syntax be designed to
make program development easiest and most bug free?
A purely functional language has the property that once written, data
is never modified. Thus, in such a language, all evaluation except for
IO would be side-effect free and thus the store is no longer a
synchronization point. Even in languages that aren't purely functional
like OCaml, most of the code (that I at least have written) is purely
functional or written in a functional style. That's the language style
that the language encourages.
OCaml doesn't have any parallelizing compilers that I know about, but I
will speculate on how it could be done.
The compiler can determine which functions are not purely functional
or 'pure'. This could involve just computing the transitive closure of
any function that's impure or invokes an impure function. A more
sophisticated analysis would perform an escape analysis on any store
that is mutated. (If a nominally impure function mutates only a store
that cannot be shared, then the function is effectively pure.)
|
The FX project at MIT explored this in the late 80s... Google for
`"fx project" mit parallel' to get some useful references. My understanding
is that the results were not overwhelming. One major issue (alluded to
elsewhere in this thread) is deciding when to spawn a new thread vs execute
things in sequence.
--
David Gay
dgay@acm.org |
 |
JJ
Guest
|
Posted:
Fri Aug 26, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
What Rupert refers to will be presented at cpa2005 in a few weeks:
"New Transputer Design for FPGAs", although FPGAs are just a way to
prototype the architecture and not the main idea presented.
In effect a collection of 4 way threaded light PEs "ALT"ing into an 8
way threaded (or interleaved), hashed RLDRAM. These PEs themselves look
quite a bit like sparcs with register sliding reg cache per thread but
that's not important either, in fact any ISA would work here even x86
:-/
The real idea is that threaded cpus with a conventional cache/DRAM
hierarchy are even worse than single threaded cpus: more threads put
even more pressure on the cache, hence less locality. They can in
theory churn through more ops, but only if they can be fed, and yes,
it would be a nightmare to synchronize these threads through
120ns DRAM.
The right way to help multithreaded cpus is to multithread the memory
too, which can be done with RLDRAM (8 way issue every 2.5ns rate) and
effectively gives each PE a mem access of a few instruction slots
across the entire DRAM array, hence no data cache. It's not raw
bandwidth or latency that matters, it's the rate of uncorrelated memory
issues that can be dispensed across multiple banks on behalf of a number
of cooperating (or not) threads. This leads into a hashing MMU and
protected objects done in the clock cycles. Concurrency support for
occam etc. is provided with protected objects.
For the application layer I am inclined to see if occam and Objective C
make more sense than C++, but that's another paper.
transputer2 at yahoo ....
johnjakson at usa ... |
 |
Rupert Pigott
Guest
|
Posted:
Fri Aug 26, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
Yikes, premature senility strikes... I missed an 'h' in
your name... Currently working with 3 "Jon"s which does
not help either. :P |
 |
Kees van Reeuwijk
Guest
|
Posted:
Fri Aug 26, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
| Quote: | Not in the domains where people do that seriously - admittedly
rather specialised. As I said at SunHPC, getting arbitrary
parallel designs right is a nightmare task, but there are several
restricted models that are feasible to debug. Experienced
parallel programmers use one of them.
For example, using lockstep models (BSP, MPI collectives etc.)
is feasible. So are many classes of dataflow.
|
Another important model is divide-and-conquer, and its degenerate case
farmer-worker.
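A minimal sketch of the two shapes named above, using summation as a stand-in problem. The function names `dc_sum` and `farmer_worker_sum`, the `depth` cutoff, and the worker count are illustrative assumptions, not anything from the thread. Note that CPython threads will not speed up CPU-bound work like this; the point is only the fork/join structure, in which a task waits solely on the children it spawned.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def dc_sum(data, depth=2):
    """Divide-and-conquer: fork the left half, solve the right half, join."""
    if depth == 0 or len(data) < 2:
        return sum(data)                      # small enough: solve sequentially
    mid = len(data) // 2
    result = {}

    def solve_left():
        result["left"] = dc_sum(data[:mid], depth - 1)

    worker = threading.Thread(target=solve_left)
    worker.start()                            # fork the left subproblem
    right = dc_sum(data[mid:], depth - 1)     # solve the right one ourselves
    worker.join()                             # join: parent waits only on its child
    return result["left"] + right


def farmer_worker_sum(data, workers=4):
    """Degenerate case: the farmer splits once, workers never talk to each other."""
    chunk = max(1, -(-len(data) // workers))  # ceiling division
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, pieces))     # farmer combines the partial results


numbers = list(range(1000))
print(dc_sum(numbers), farmer_worker_sum(numbers))
```

In both versions the dependency graph is a tree of forks closed by matching joins, which is the restricted structure that makes these models comparatively easy to debug.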
Some people argue that any programming model that restricts the
dependencies between tasks to planar graphs (SP graphs) simplifies
parallel programming enough to make it tractable. At least BSP and
divide-and-conquer indeed have this structure. |
 |
Scott A Crosby
Guest
|
Posted:
Fri Aug 26, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
On Thu, 25 Aug 2005 14:35:57 GMT, "Stephen Fuld" <s.fuld@PleaseRemove.att.net> writes:
| Quote: | "Torben Ægidius Mogensen" <torbenm@app-6.diku.dk> wrote in message
news:7zll2qtcx5.fsf@app-6.diku.dk...
|
| Quote: | I'd like to explore this further. ISTM that there are only a few
languages that have any explicit way to declare parallelism and that
they differ in the "level" of parallelism they express. For
example, I believe that Occam expresses parallelism at a very low
level (even evaluation of a single expression can be specified to be
in parallel). But COBOL offers explicit parallelism at a level
similar to a full thread. I'm not sure which, if either of these is
the "right" thing to do.
|
I'm not sure I like this. What if a programmer gets such a declaration
wrong? Immediate crash, random unreliability, or would the
consequences be pretty mild.
| Quote: | So, the questions are
- Is there an existing language that expresses parallelism at the right
level for multi-core/multithreaded core to take advantage of? Is it easy
for programmers to use the language productively?
- If not what would such a language look like? That is, what level of
parallelism is appropriate? How would the semantics/syntax be designed to
make program development easiest and most bug free?
|
A purely functional language has the property that once written, data
is never modified. Thus, in such a language, all evaluation except for
IO would be side-effect free and thus the store is no longer a
synchronization point. Even in languages that aren't purely functional
like OCaml, most of the code (that I at least have written) is purely
functional or written in a functional style. That's the language style
that the language encourages.
OCaml doesn't have any parallelizing compilers that I know about, but I
will speculate on how it could be done.
The compiler can determine which functions are not purely functional
or 'pure'. This could involve just computing the transitive closure of
any function that's impure or invokes an impure function. A more
sophisticated analysis would perform an escape analysis on any store
that is mutated. (If a nominally impure function mutates only a store
that cannot be shared, then the function is effectively pure.)
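The simple transitive-closure version of this analysis can be sketched as a worklist pass over a call graph. This is an illustration only: the call graph is given by hand as a dictionary, the function names in it are made up, and no escape analysis is attempted.

```python
def impure_functions(call_graph, directly_impure):
    """Return every function that is impure itself or transitively calls one.

    call_graph maps each function name to the names it calls;
    directly_impure is the set of functions that mutate state or do IO.
    """
    # Invert the graph so we can walk from a callee up to its callers.
    callers = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    impure = set(directly_impure)
    worklist = list(directly_impure)
    while worklist:
        func = worklist.pop()
        for caller in callers.get(func, ()):
            if caller not in impure:          # impurity propagates upward once
                impure.add(caller)
                worklist.append(caller)
    return impure


# Hypothetical program: only print_line touches the outside world.
call_graph = {
    "main": ["render", "log"],
    "render": ["area"],
    "area": [],
    "log": ["print_line"],
    "print_line": [],
}
impure = impure_functions(call_graph, {"print_line"})
pure = set(call_graph) - impure
print(sorted(impure), sorted(pure))
```

Everything left in `pure` is a candidate for parallel evaluation; the escape analysis described above would only shrink the `directly_impure` seed set further.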
Because of the relatively low amount of mutation in the style in which
these languages are used, tracking pointers and alias analysis on the
few mutations that do occur should be much more robust.
The compiler can make sure that accesses to mutable state such as IO
or mutable store get appropriately ordered and synchronized. An algorithm
for doing this is that any thread that's about to alter mutable state
or do any IO is suspended until all impure threads that are earlier in
program order have run to completion. If the language spec states that
argument evaluation order is undefined, that relaxes the constraints
on ordering --- and the consequent synchronization requirements.
Scott |
 |
Guest
|
Posted:
Fri Aug 26, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
I think the problem is different.
All synchronization in current machines is memory based. Memory latency
remains rather constant as processor frequency scales to ever higher
numbers. So 10 years ago when CPUs were at 200 MHz and main memory was
150 ns away, one could conceivably perform 30 instructions between
synchronization events (and 300 instructions between synchronizations
is vastly more reasonable). Now we have 3GHz machines with 120ns memory
access times. So the granularity for synchronization is 20X worse today
than it was just 10 years ago! As the granularity changes, the
structure of the parallel computation should change to follow.
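As a back-of-the-envelope check of these figures, the sketch below counts clock cycles that elapse during one memory access, assuming one instruction per cycle (an assumption of this sketch, not a claim from the post). On that assumption alone the ratio comes out at 12x; the 20X figure above presumably accounts for factors this calculation ignores.

```python
def cycles_per_memory_access(clock_mhz, latency_ns):
    """Clock cycles that pass while one memory access is outstanding."""
    # MHz * ns = 1e6 * 1e-9 = 1e-3 cycles, hence the division by 1000.
    return clock_mhz * latency_ns / 1000


then = cycles_per_memory_access(200, 150)    # 200 MHz CPU, 150 ns memory
now = cycles_per_memory_access(3000, 120)    # 3 GHz CPU, 120 ns memory
print(then, now, now / then)
```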
In addition, contention during synchronization causes cubic (or worse)
amounts of memory traffic in order to pass successfully through a
critical region, even with non-blocking (or similar) programming
styles. This excess traffic slows down synchronization, and clogs up
the DRAM controllers with excess requests that don't lead to work
getting done, and wipe lines from caches, exacerbating the
interference itself!
Unless and until someone addresses the granularity problem and the
exponent of memory traffic interference, not much can be done at the
language level to do much more than ameliorate the problems. I have a
purported solution, that NDA-able people may get to hear about in 6
months or so. |
 |
Chris Colohan
Guest
|
Posted:
Fri Aug 26, 2005 12:16 am Post subject:
Re: Not enough parallelism in programming |
|
|
torbenm@app-6.diku.dk (Torben Ægidius Mogensen) writes:
| Quote: | Research in automatically recognising parallelism in programs written
in mainstream languages has largely failed (with the exception of
numerical code written in Fortran), so IMO we need either to use
languages where parallelism is either explicit (and verifiably
independent) or recognizable with methods sufficiently robust that
small code changes won't break parallelism. The latter will probably
require restrictions that almost amount to explicit declaration of
parallelism, so the first way may be the way to go.
|
I hate to put a plug in for my own work, but there is an interesting
point in the middle. The choice is not just "explicit parallelism"
(with all of the pain of creating parallel programs) or "fully
automatic parallelism" (with no benefits if your compiler is unable to
find suitable threads).
There has been a whole load of work in the past 15 years or so on
speculative parallel architectures. The main work in this area
started with the Multiscalar architecture out of Wisconsin, and other
projects in the area include Hydra (Stanford), IACOMA (Illinois), RAW
(MIT), and STAMPede (my project at CMU). In these architectures you
mark thread boundaries in a sequential program (written in any
language, including C or even assembly), and the machine will execute
the code in parallel, while maintaining the original sequential
semantics. It does this by detecting data dependences which exist
between your threads and either partially or completely restarting
threads which consume mis-speculated values.
Under the thread-level speculation (TLS) programming model the
programmer starts with a sequential program, and simply marks where
new threads should be spawned. They run it on a TLS machine, and they
get a parallel _and correct_ execution. At first, the performance may
not be that good, since many threads will restart due to dependences.
The TLS machine can provide profile feedback which lists the most
frequent dependences, and the programmer can use this to optimize
their program -- they can either modify the code or they can change
their thread spawn points to avoid dependences. Using this method a
programmer can _gradually_ add parallelism to their program: first
they add threads, and then they only have to change the parts of their
code which are frequently executed _and_ have frequent dependences
between the threads.
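The programming model can be illustrated with a toy software simulation. This is not how TLS hardware works (real designs track dependences in the caches, and the threads really do run concurrently); here each "thread" is a plain function, speculation is simulated by running every thread against the same starting snapshot, and `run_speculative`, `read`, `write` and the example threads are all hypothetical names.

```python
def run_speculative(epochs, memory):
    """Run epochs speculatively, then commit in program order.

    Each epoch is a callable taking (read, write). Returns the final memory
    and a profile counting how many restarts each address caused.
    """
    memory = dict(memory)
    profile = {}

    def execute(epoch, base):
        reads, writes = set(), {}

        def read(addr):
            if addr in writes:                # see our own buffered write
                return writes[addr]
            reads.add(addr)                   # remember what we consumed
            return base[addr]

        def write(addr, value):
            writes[addr] = value              # buffer until commit

        epoch(read, write)
        return reads, writes

    # Speculative phase: every epoch starts from the same snapshot.
    snapshot = dict(memory)
    results = [execute(epoch, snapshot) for epoch in epochs]

    # Commit phase: program order, restarting any mis-speculated epoch.
    written = set()                           # addresses written by earlier epochs
    for epoch, (reads, writes) in zip(epochs, results):
        stale = reads & written
        if stale:                             # consumed a value that was later changed
            for addr in stale:
                profile[addr] = profile.get(addr, 0) + 1
            reads, writes = execute(epoch, memory)   # restart on committed state
        memory.update(writes)
        written.update(writes)
    return memory, profile


def bump_a(read, write):
    write("a", read("a") + 1)

def double_b(read, write):
    write("b", read("b") * 2)

def add_up(read, write):                      # depends on both earlier epochs
    write("total", read("a") + read("b"))

final, profile = run_speculative([bump_a, double_b, add_up],
                                 {"a": 1, "b": 2, "total": 0})
print(final, profile)
```

The result matches a plain sequential run, and `profile` plays the role of the feedback described above: it names the addresses whose dependences forced restarts, which is where a programmer would look first when moving spawn points.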
I have been working on applying these techniques to database
transactions for my thesis, and will be presenting a paper on this at
VLDB next week. If you are interested, the paper (and a tech report
on the hardware aspects) are linked off of my home page:
www.colohan.com
Chris
--
Chris Colohan Email: chris@colohan.ca PGP: finger colohan@cs.cmu.edu
Web: www.colohan.com Phone: (412)268-4751 |
 |
Kelly Hall
Guest
|
Posted:
Fri Aug 26, 2005 6:25 am Post subject:
Re: Not enough parallelism in programming |
|
|
Dan Koren wrote:
| Quote: | Yes: this is why APL and its descendants (APL2, A, J, K) are
the darlings of the financial community.
|
Over the years I've heard of a number of technologies being described as
the "darlings of the financial community". So far, my short list of
these darlings includes: NeXT computers / NeXTstep; Smalltalk, prolog,
expert systems, and now APL.
I've only known two actual people to have been programmers for banks: one
wrote in assembly for the System/370 mainframe, and the other wrote
in Visual Basic.
Kelly |
 |
JJ
Guest
|
Posted:
Fri Aug 26, 2005 6:57 am Post subject:
Re: Not enough parallelism in programming |
|
|
I think APL has been used in business forever; lots of job ads
mentioned it. It's funny that Smalltalk-NeXTstep have Obj-C in common
too, without which NeXT probably wouldn't have happened.
APL is a neat language, at one time it was also used as a hardware
description language for a very old comp arch text "digital computer
systems principles" by Hellerman 1967,1973. It uses APL more to
describe microcode snips than to actually model hardware. A section on
IBM 360,370 architecture in there too.
JJ |
 |
Anne & Lynn Wheeler
Guest
|
Posted:
Fri Aug 26, 2005 8:12 am Post subject:
Re: Not enough parallelism in programming |
|
|
"JJ" <johnjakson@yahoo.com> writes:
| Quote: | I think APL has been used in business forever; lots of job ads
mentioned it. It's funny that Smalltalk-NeXTstep have Obj-C in common
too, without which NeXT probably wouldn't have happened.
APL is a neat language, at one time it was also used as a hardware
description language for a very old comp arch text "digital computer
systems principles" by Hellerman 1967,1973. It uses APL more to
describe microcode snips than to actually model hardware. A section on
IBM 360,370 architecture in there too.
|
cambridge science center ported part of apl\360 to cms for cms\apl.
apl\360 had its own multitasking swapping monitor.
typical apl\360 workspaces were 16k to 32k bytes ... and the
multitasking monitor would swap the whole workspace at a time.
http://www.garlic.com/~lynn/subtopic.html#545tech
apl\360 had memory management that on every assignment assigned the
next available space (in the workspace) ... and marked any previous
allocation unused. when allocation reached the end of the workspace,
it would do garbage collection and start all over.
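a toy sketch of that allocation strategy, to show why it touches the whole workspace. the `Workspace` class, its sizes, and the compacting collector are assumptions made for illustration; the description above says only that garbage collection ran when allocation reached the end.

```python
class Workspace:
    """Bump allocator: every assignment takes the next free space."""

    def __init__(self, size):
        self.size = size
        self.next_free = 0
        self.live = {}            # variable name -> (offset, length)
        self.collections = 0

    def assign(self, name, length):
        """Allocate a fresh area for `name`; its old area becomes garbage."""
        self.live.pop(name, None)             # previous allocation marked unused
        if self.next_free + length > self.size:
            self._collect()                   # reached the end: collect, start over
            if self.next_free + length > self.size:
                raise MemoryError("workspace full")
        offset = self.next_free
        self.live[name] = (offset, length)
        self.next_free += length
        return offset

    def _collect(self):
        """Slide live values down to the start of the workspace (assumed)."""
        offset = 0
        for name, (_, length) in sorted(self.live.items(),
                                        key=lambda item: item[1][0]):
            self.live[name] = (offset, length)
            offset += length
        self.next_free = offset
        self.collections += 1


ws = Workspace(size=16)
offsets = [ws.assign("x", 4) for _ in range(5)]
print(offsets, ws.collections)
```

repeatedly assigning one small variable walks across every address in the workspace before any space is reused. that is harmless when the whole 16k-32k workspace is swapped as a unit, but in a multi-megabyte virtual memory it touches every page.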
moving to cms\apl, the garbage collection had to be completely
reworked. cms\apl might allow several mbyte virtual memory page
workspace. the apl\360 strategy would appear like a page thrashing
program running in a "large" virtual memory environment. the other
notable thing done for cms\apl was a mechanism that allowed
interfacing to system calls. this caused some consternation among the
apl aficionados since it violated the purity of apl. this wasn't
rectified until the introduction of shared variables as a mechanism
for interfacing to system calls/functions.
early on, the science center provided some "large-memory" online apl
service to other parts of the corporation. one of the customers was
the business people from corporate hdqtrs, who were using the large
memory apl capability to do what-if scenarios using the most sensitive
of corporate business information. this also offered some security
challenge because there was some amount of access to the science center
system by students from various univ. & colleges in the boston area.
there was quite a bit of apl use for things that were later done using
spreadsheets.
one of the largest commercial time-sharing operations
http://www.garlic.com/~lynn/subtopic.html#timeshare
become the internal HONE system
http://www.garlic.com/~lynn/subtopic.html#hone
which provided world-wide online dataprocessing services to all
marketing, sales, field people. the services the HONE environment offered
were almost all implemented in APL (starting with cms\apl, evolving
into apl\cms, then apl\sv, etc). one specific application developed by
the science center was the performance predictor, which was a detailed
analytical model of system operation. salesmen could obtain workload and
configuration information from a customer and then ask what-if
questions about changes in workload and/or configuration.
starting with 370 115/125, a salesman could no longer even submit an
order w/o it having been processed by a HONE "configurator".
HONE was one of my hobbies for quite some time. when emea hdqtrs moved
from westchester to la defense outside paris ... i got to do some
amount of the work getting HONE up and running at the new data center
(at that time, there were three new bldgs ... not completely finished,
and the grounds around the bldgs were bare dirt .... landscaping work
was still in progress).
--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/ |
 |
Nick Maclaren
Guest
|
Posted:
Fri Aug 26, 2005 8:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
In article <1h1vp39.sn58yjk7bf7qN%reeuwijk@few.vu.nl>,
Kees van Reeuwijk <reeuwijk@few.vu.nl> wrote:
| Quote: | Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
Not in the domains where people do that seriously - admittedly
rather specialised. As I said at SunHPC, getting arbitrary
parallel designs right is a nightmare task, but there are several
restricted models that are feasible to debug. Experienced
parallel programmers use one of them.
For example, using lockstep models (BSP, MPI collectives etc.)
is feasible. So are many classes of dataflow.
Another important model is divide-and-conquer, and its degenerate case
farmer-worker.
|
Yes. I omitted those because you can fairly regard them as low
communication (i.e. trivial) cases. But why communicate if you
have nothing to say? :-)
| Quote: | Some people argue that any programming model that restricts the
dependencies between tasks to planar graphs (SP graphs) simplifies
parallel programming enough to make it tractable. At least BSP and
divide-and-conquer indeed have this structure.
|
It's a plausible viewpoint, but I think that I could produce fair
counter-examples. However, it gives a good indication of the
maximum level of complexity that mere mortals can handle.
Regards,
Nick Maclaren. |
 |
Nick Maclaren
Guest
|
Posted:
Fri Aug 26, 2005 8:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
In article <ea6dnbEhhfuz3JPeRVn-1w@comcast.com>,
Joe Seigh <jseigh_01@xemaps.com> wrote:
| Quote: |
Scheduler artifacts are fun to deal with. On OS X it appears that for
short sleeps using nanosleep() rather than setting a kernel timer
event, the thread just gets put on the ready queue, the assumption
being it would take you at least that much time to get dispatched again
based on cpu speed and dispatcher code path. Except if there are lots
of threads already on the run queue it could be a very long wait.
|
Yes. I haven't investigated serious parallelism on that system,
but would be surprised if it didn't have at least as many gotchas
as the ones I have looked at.
| Quote: | I think the previous poster is referring to synchronizing memory itself
rather than by some other mechanism like the TOD clock or direct
processor to processor communication.
|
Well, yes, but there are lots of issues here. One is that parallel
compilers/libraries/whatever use the least unsuitable mechanisms
and another is that multiple synchronisation facilities interact
in nasty ways.
The ridiculous thing is that most of this is so unnecessary, but
it isn't soluble because the people who are designing the systems
don't have any direct communication with the people who know about
the requirements and solutions. All right, there are getting to be
damn few of the latter :-(
Regards,
Nick Maclaren. |
 |
Guest
|
Posted:
Fri Aug 26, 2005 8:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
Kelly Hall <khall@acm.org> writes:
| Quote: | as the "darlings of the financial community". So far, my short list
of these darlings includes: NeXT computers / NeXTstep; Smalltalk,
prolog, expert systems, and now APL.
I've only known two actual people to have been programmers for banks:
one wrote in assembly for the System/370 mainframe, and the other
wrote in Visual Basic.
|
The reality is much more prosaic.
Have a look at this message from another forum, about the
natural gas trading system at Enron.
-------------------------------------------------------------------------
| Quote: | Reminds me of a project at Enron. The user application was Delphi,
which used a VC DLL that used CORBA to communicate with a Java layer
which then updated an Oracle database and returned the data written
to it by reversing the pathway above. The user app then updated a
second Oracle database with the same data written by the Java layer.
|
That would be Sitara, the physical Natural Gas trading system.
| Quote: | The DB updated by the Delphi app used a somewhat well-normalized set
of tables, but this DB was only used for reporting. The actual
"serious" database, written to by the Java app, contained a single
table. That table contained a single column, which was a
VARCHAR2(2048) or some such nonsense; the same data returned to the
Delphi app was written in a single row in this table.
|
That's it! There were actually two reporting DBs, with slightly
different schemas. (No idea why.) The reporting database started out as a
debugging tool. However, they soon realized that they needed to get
data out of Sitara for use in other systems, so the reporting DBs
became official. (And the main DB used CLOBs, not VARCHAR2s.)
Why did they store their data in long strings? Because that way they
could do whatever they wanted with their data model, without having a
DBA or Data Analyst get in their way. This decision was later
recognized as being a bad mistake.
Enron's trading systems were a maze of twisty passages, all
different. In addition to what was mentioned above, the data would go
from Sitara to the old Gas Trading system called CPR. Another system
called ERMS had views into the CPR tables, and was used to calculate
the trader's positions. The only problem was that CPR only got a
month's worth of data from Sitara, so if a deal was longer than a
month, you also booked it in the financial trading system (TAGG).
Simple, no?
---------------------------------
Amusing in the extreme. |
 |
Dan Koren
Guest
|
Posted:
Fri Aug 26, 2005 8:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
"Kelly Hall" <khall@acm.org> wrote in message
news:5cuPe.701$5k1.567@newssvr27.news.prodigy.net...
| Quote: | Dan Koren wrote:
Yes: this is why APL and its descendants (APL2, A, J, K) are
the darlings of the financial community.
Over the years I've heard of a number of technologies being described as
the "darlings of the financial community". So far, my short list of these
darlings includes: NeXT computers / NeXTstep; Smalltalk, prolog, expert
systems, and now APL.
|
Not "now". It's been going on for a long time.
| Quote: | I've only known two actual people to have been programmers for banks: one
wrote in assembly for the System/370 mainframe, and the other wrote in
Visual Basic.
|
I am not referring to banks.
I am referring to large trading/investment
houses like Morgan-Stanley, Merrill-Lynch,
Goldman-Sachs, etc...
Morgan-Stanley even developed their own
APL version, and made it available as
open source.
dk |
 |
Joe Seigh
Guest
|
Posted:
Fri Aug 26, 2005 4:15 pm Post subject:
Re: Not enough parallelism in programming |
|
|
Nick Maclaren wrote:
| Quote: | In article <ea6dnbEhhfuz3JPeRVn-1w@comcast.com>,
Joe Seigh <jseigh_01@xemaps.com> wrote:
Scheduler artifacts are fun to deal with. On OS X it appears that for
short sleeps using nanosleep() rather than setting a kernel timer
event, the thread just gets put on the ready queue, the assumption
being it would take you at least that much time to get dispatched again
based on cpu speed and dispatcher code path. Except if there are lots
of threads already on the run queue it could be a very long wait.
Yes. I haven't investigated serious parallelism on that system,
but would be surprised if it didn't have at least as many gotchas
as the ones I have looked at.
|
Synchronization standards such as Posix pthreads don't define
forward progress let alone minimum standards for it. This means
implementations can vary widely in performance. Some of the
common thread design patterns are to get around poorly performing
implementations. E.g. thread pools and other forms of lighter
weight threading to mention a few.
| Quote: |
I think the previous poster is referring to synchronizing memory itself
rather than by some other mechanism like the TOD clock or direct
processor to processor communication.
Well, yes, but there are lots of issues here. One is that parallel
compilers/libraries/whatever use the least unsuitable mechanisms
and another is that multiple synchronisation facilities interact
in nasty ways.
The ridiculous thing is that most of this is so unnecessary, but
it isn't soluble because the people who are designing the systems
don't have any direct communication with the people who know about
the requirements and solutions. All right, there are getting to be
damn few of the latter :-(
|
It's worse. I mess around with lock-free stuff (like hazard pointers
which don't need memory barriers in smp environments) and there are no
communication channels to the hw vendors to discuss issues. Intel
*wants* people to start exploiting their multi-core stuff so they
can sell them but they don't support the people who are trying to
make this possible in a scalable manner.
Even the research and academic area is squirrely. I don't work
in that area so I don't know what's going on for sure. There
seem to be turf and clique things going on with the various groups
of researchers. I can't reveal the few details I do know about.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software. |