| Author |
Message |
Jan Vorbrüggen
Guest
|
Posted:
Wed Sep 21, 2005 8:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
| Quote: | You stated that only one process can write to a data structure at a
time.
|
So we end up defining what "data structure" means? Here's my take, the
one I had in mind when I mentioned Fortran 90: The data structure is, for
instance, a two-dimensional array. However, several tasks executing in
parallel are working on disjoint subset, e.g., one-dimensional slices.
You can even extend this to overlapping boundaries, say, of subblocks
of the array that are read by the "neighbouring" processes: You either
implement a form of double buffering, or you copy the boundaries to a
temporary before computation, or you live with the non-detmerminism of
not knowing whether the boundary value had already been updated or not
(there are some algorithms that can work quite well with that).
What is you definition?
Jan |
|
| Back to top |
|
 |
Jan Vorbrüggen
Guest
|
Posted:
Wed Sep 21, 2005 8:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
| Quote: | The problem with determinism is that you don't need it when you have a
correct program.
Your assertion might be correct if you replace all "you" with "I", since
you are free to define "correct" any way you want to, but if I interpret
"you" to collectively include me...
My correct programs often need non-determinism to do what I want them to do.
|
How does that contract Andy's aphorism? You are saying, as I see it, just a
more strict version of it - instead of not needing determinism, you say you
need non-determinism.
I do agree that a correct program does not need determinism, and that non-
determinism _can_ be beneficial. On the other hand, I would say that a
correct program doesn't _need_ non-determinism: I have seen cases where a
program can properly on a non-deterministic scheduler but deadlocked on a
deterministic scheduler, and I don't consider that correct behaviour.
Jan |
|
| Back to top |
|
 |
Nick Maclaren
Guest
|
Posted:
Wed Sep 21, 2005 1:38 pm Post subject:
Re: Not enough parallelism in programming |
|
|
In article <3pcequF9qesqU1@individual.net>,
=?ISO-8859-1?Q?Jan_Vorbr=FCggen?= <jvorbrueggen-not@mediasec.de> wrote:
| Quote: | You stated that only one process can write to a data structure at a
time.
So we end up defining what "data structure" means? Here's my take, the
one I had in mind when I mentioned Fortran 90: The data structure is, for
instance, a two-dimensional array. However, several tasks executing in
parallel are working on disjoint subset, e.g., one-dimensional slices.
You can even extend this to overlapping boundaries, say, of subblocks
of the array that are read by the "neighbouring" processes: You either
implement a form of double buffering, or you copy the boundaries to a
temporary before computation, or you live with the non-detmerminism of
not knowing whether the boundary value had already been updated or not
(there are some algorithms that can work quite well with that).
What is you definition?
|
You have seen my diatribe on that issue for C? :-(
Fortran is pretty clear. Depending on context, in Fortran 77 and before,
it was a single storage unit or a contiguous number of them; there were
a few rough edges, but not too many. Fortran 90 added structures, so
it is a bit more complicated, but the same principle applies. I.e. it
is an answerable question, but may demand a language lawyer to analyse
a program to answer it - one of my many jobs.
However, it is NOT answerable except in the context of the execution of
a particular program on particular data (your point about boundaries is
very relevant). I.e. in general, it cannot be answered statically.
For a language to allow good, automatic parallelisation and diagnostics,
it needs to be answerable semi-statically (where I could define what I
mean, but it is complicated).
Mutter.
Regards,
Nick Maclaren. |
|
| Back to top |
|
 |
Rupert Pigott
Guest
|
Posted:
Wed Sep 21, 2005 2:01 pm Post subject:
Re: Not enough parallelism in programming |
|
|
David C. DiNucci wrote:
| Quote: | Rupert Pigott wrote:
|
[SNIP]
| Quote: | IME it is rare that a "huge data structure" is comprised of
a bunch of unique elements. They tend to be comprised of a
bunch of nigh-on identical, and if that is the case you can
simply have multiple instances of the same process... OCCAM
had Replicated PAR for just such occasions.
I personally don't think cases like this are so rare. Example: bounded
buffer (circular queue).
|
I would not class a circular queue as being a "huge" data
structure. It has very little state to manipulate, the objects
that it manages on the other hand are a different matter - but
surely the circular queue should not really care about them or
do anything to them, right ?
Hmm. Let me see, queues serialise access to a set of objects.
Why are try to parallelise manipulation of a stucture that is
used to serialise access to a set of objects ? That makes bugger
all sense to me.
One approach in CSP land is to have a process that maintains the
state of the queue, and simply have it wait for messages to push/
pop elements on that queue. If you don't want to send the entire
object you can keep the objects in a pool and pass handles to the
queue management process instead.
[SNIP]
| Quote: | This argument reminds me somewhat of one of the arguments
against breaking up a program into subroutines...
It makes perfect sense to me: If you change your program to support
more states than it did before, then it's more likely that some of those
states will be erroneous. Of course, the same argument could be made if
the states were introduced by adding variables, statements, etc.
|
Indeed. So you see what I am getting at ?
Cheers,
Rupert |
|
| Back to top |
|
 |
Andy Freeman
Guest
|
Posted:
Wed Sep 21, 2005 4:15 pm Post subject:
Re: Not enough parallelism in programming |
|
|
Jan Vorbrüggen wrote:
| Quote: | So we end up defining what "data structure" means? Here's my take, the
one I had in mind when I mentioned Fortran 90: The data structure is, for
instance, a two-dimensional array. However, several tasks executing in
parallel are working on disjoint subset, e.g., one-dimensional slices.
You can even extend this to overlapping boundaries, say, of subblocks
of the array that are read by the "neighbouring" processes: You either
implement a form of double buffering, or you copy the boundaries to a
temporary before computation, or you live with the non-detmerminism of
not knowing whether the boundary value had already been updated or not
(there are some algorithms that can work quite well with that).
|
I stated that the "one process" rule tempted folks to restructure
things
so they could have more than one process and that said restructuring
offered
opportunities for bugs. Vorbrüggen seems to disagree with the
statement,
but the supporting evidence for that position is examples of said
opportunities.
It may be that the rule's benefits outweigh its costs, but, as the
above
shows, there are costs. This shouldn't be controversial - very few
things
are free. |
|
| Back to top |
|
 |
Andy Freeman
Guest
|
Posted:
Wed Sep 21, 2005 4:15 pm Post subject:
Re: Not enough parallelism in programming |
|
|
Jan Vorbrüggen wrote:
| Quote: | So we end up defining what "data structure" means? Here's my take, the
one I had in mind when I mentioned Fortran 90: The data structure is, for
instance, a two-dimensional array. However, several tasks executing in
parallel are working on disjoint subset, e.g., one-dimensional slices.
You can even extend this to overlapping boundaries, say, of subblocks
of the array that are read by the "neighbouring" processes: You either
implement a form of double buffering, or you copy the boundaries to a
temporary before computation, or you live with the non-detmerminism of
not knowing whether the boundary value had already been updated or not
(there are some algorithms that can work quite well with that).
|
I stated that the "one process" rule tempted folks to restructure
things
so they could have more than one process and that said restructuring
offered
opportunities for bugs. Vorbrüggen seems to disagree with the
statement,
but the supporting evidence for that position is examples of said
opportunities.
It may be that the rule's benefits outweigh its costs, but, as the
above
shows, there are costs. This shouldn't be controversial - very few
things
are free. |
|
| Back to top |
|
 |
Andy Glew
Guest
|
Posted:
Wed Sep 21, 2005 4:15 pm Post subject:
Re: Not enough parallelism in programming |
|
|
| Quote: | Moreover, you need it, since you have tested that a program is
correct with a given set of inputs - you want it to always stay
correct. And not occasionally give a wrong answer.
I interpreted that to mean he believes that a correct program should
be deterministic. Removing the "Moreover" and "since" clause, I guess
it just says "You need [determinism]". Andy?
|
No, I believe that there are correct non-deterministic programs. I
have written a few myself. My current favorite example is a
parallelized search, somthing like depth first, with early out as soon
as you find a candidate that is good enough (or, sometimes, provavnly
the best you can get). The sequence of nodes you look depends on the
timing of the threads.
I like the Myrias approach to deterministic parallelism, but I
remember that my biggest objection to it when I first encountered it,
almost 20 years ago, was that it did not support such non-determistic
"early out".
By the way, I'd love to see a proof describing what problems can be
solved efficiently with deterministic parallelism, and what no.
| Quote: | Removing the "Moreover" and "since" clause, I guess
it just says "You need [determinism]".
|
Your linguistic transformation was flawed. Removing those clauses, it
says "Once you have tested a program and believe it to be correct, you
want it to stay correct."
Non-determinism means that a program's execution sequence can change,
and sometimes its results.
If the new results are correct, and are always so, great.
But sometimes the new results that are introduced by non-determinism
are incorrect, while the results that were tested were correct.
That's a problem.
Yes, formal proofs of correctness would help, and perhaps even remove
any problems with non-determinism. And we all know the arguments
between testing and formal proofs. And although the theorists object,
we all know that most real world programs are tested for correctness,
not formally proven.
DELIBERATELY AMUSING SUGGESTION: maybe all non-deterministic programs
should be required to be subjected to formal proofs of correctness,
while deterministic programs can be tested?
Anyway, my point remains: if it is possible to be deterministic
(for the problem, and the resources), it's nice to be so. |
|
| Back to top |
|
 |
Jan Vorbrüggen
Guest
|
Posted:
Wed Sep 21, 2005 4:15 pm Post subject:
Re: Not enough parallelism in programming |
|
|
| Quote: | I can see your interpretation of what he said, and that may very well be
what he meant there, but I was confused by his last statement:
Moreover, you need it, since you have tested that a program is
correct with a given set of inputs - you want it to always stay
correct. And not occasionally give a wrong answer.
|
I read that as a sarcastic comment by Andy on the "state of the art".
| Quote: | You don't consider the deterministic scheduler correct, or the program
which would not run under it?
|
I don't consider a program correct for which an implementation detail of
the supporting software makes the difference between working and non-
working.
In this case, it was a Java program, and it worked with IBM's JVM and
not with Sun's, and I was told the difference lay in the way threads
were scheduled by the two implementations. Either the program was broken,
or the JVM specification was broken, but I don't care which - I got to
use something with a stupid constraint that made me suspicious of the
whole enterprise.
| Quote: | As for whether a correct program _needs_ non-determinism: What would
you say about a real-time program that _needs_ to execute some
conditional code only if it misses (or is going to miss) a deadline? Is
that program non-deterministic, and if so, does it need to be to be
correct?
|
Any program with time-dependent external input - and this, of course,
includes anything with a GUI or attached to a network - is non-deterministic
almost by definition. That's what makes debugging an OS so "interesting".
I would agree that arguing about whether it "is" or whether it "needs to
be" non-deterministic is more philosophical than anything else.
| Quote: | And for those cases where the non-determinism is left in through
execution (i.e. never constrained out at higher levels of
specification), it is possible (using static analysis) to minimally
instrument a ScalPL plan to produce a very lightweight trace for use in
replaying the non-deterministic choices at a later time, e.g. for
debugging. (Based on my dissertation work.)
|
That, of course, is a very useful feature, and will save many a debugger's
hair.
Jan |
|
| Back to top |
|
 |
David C. DiNucci
Guest
|
Posted:
Wed Sep 21, 2005 4:15 pm Post subject:
Re: Not enough parallelism in programming |
|
|
Jan Vorbrüggen wrote:
| Quote: | The problem with determinism is that you don't need it when you have a
correct program.
Your assertion might be correct if you replace all "you" with "I",
since you are free to define "correct" any way you want to, but if I
interpret "you" to collectively include me...
My correct programs often need non-determinism to do what I want them
to do.
How does that contract Andy's aphorism? You are saying, as I see it, just a
more strict version of it - instead of not needing determinism, you say you
need non-determinism.
|
I can see your interpretation of what he said, and that may very well be
what he meant there, but I was confused by his last statement:
| Quote: | Moreover, you need it, since you have tested that a program is
correct with a given set of inputs - you want it to always stay
correct. And not occasionally give a wrong answer.
|
I interpreted that to mean he believes that a correct program should be
deterministic. Removing the "Moreover" and "since" clause, I guess it
just says "You need [determinism]". Andy?
| Quote: | I do agree that a correct program does not need determinism, and that non-
determinism _can_ be beneficial. On the other hand, I would say that a
correct program doesn't _need_ non-determinism: I have seen cases where a
program can properly on a non-deterministic scheduler but deadlocked on a
deterministic scheduler, and I don't consider that correct behaviour.
|
You don't consider the deterministic scheduler correct, or the program
which would not run under it?
As for whether a correct program _needs_ non-determinism: What would
you say about a real-time program that _needs_ to execute some
conditional code only if it misses (or is going to miss) a deadline? Is
that program non-deterministic, and if so, does it need to be to be
correct? At that point, it starts getting rather philosophical (and
reminds me of arguments I used to make in grad school) as to what
constitutes an input and what constitutes non-determinism.
One approach to RT in ScalPL is to have the programmer specify all of
the acceptable computations, regardless of deadlines, in terms of
non-deterministic choices, and then (separately) to impose the RT
scheduling constraints at a higher level, to effectively help the
scheduler decide which of the non-deterministic choices to make based on
real-time information. In other words, ScalPL interprets
non-determinism very generally as any causal ordering which is not
strictly determined by the programmer in (that layer of) the program
specification. It may remain undetermined through run time, or it may
be further or completely constrained later on, at higher levels. That
means architecture-dependent optimizations can be handled the same way
as RT, except that the higher-level scheduling constraints may be
hardwired for the particular architecture instead of made dependent on
the clock. This approach allows at least some analysis of the
correctness of the unchanging heart of the program without even
considering the RT and/or target architecture constraints that will be
added at higher levels.
And for those cases where the non-determinism is left in through
execution (i.e. never constrained out at higher levels of
specification), it is possible (using static analysis) to minimally
instrument a ScalPL plan to produce a very lightweight trace for use in
replaying the non-deterministic choices at a later time, e.g. for
debugging. (Based on my dissertation work.)
-Dave |
|
| Back to top |
|
 |
David C. DiNucci
Guest
|
Posted:
Thu Sep 22, 2005 8:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
Jan Vorbrüggen wrote:
| Quote: | DiNucci wrote:
And for those cases where the non-determinism is left in through
execution (i.e. never constrained out at higher levels of
specification), it is possible (using static analysis) to minimally
instrument a ScalPL plan to produce a very lightweight trace for use
in replaying the non-deterministic choices at a later time, e.g. for
debugging. (Based on my dissertation work.)
That, of course, is a very useful feature, and will save many a debugger's
hair.
|
No it won't. Not unless they actually utilize this technology. The
usual response to my dissertation work is "that's really cool, but I
can't be bothered with ScalPL (or Software Cabling or F-Nets), so can
you do the same for my dusty deck?"
Answer: No.
-Dave |
|
| Back to top |
|
 |
David C. DiNucci
Guest
|
Posted:
Thu Sep 22, 2005 8:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
Andy Glew wrote:
| Quote: | I like the Myrias approach to deterministic parallelism, but I
remember that my biggest objection to it when I first encountered it,
almost 20 years ago, was that it did not support such non-determistic
"early out".
|
I admit that I didn't pay much attention to the Myrias stuff until this
thread, mostly because I was skeptical that there were any quick fixes
and also because they didn't end up making much of a splash. From
comments I find now, it looks like the skepticism, by me and apparently
by the market, was probably justified. It's not just that it appears to
be an efficient solution looking for a problem, it appears to not even
be very efficient period, at least not without specialized hardware support.
| Quote: | By the way, I'd love to see a proof describing what problems can be
solved efficiently with deterministic parallelism, and what no.
|
Assuming sufficient parallelism, is this equivalent to deciding the set
difference between P and NP?
SNIP
| Quote: | Yes, formal proofs of correctness would help, and perhaps even remove
any problems with non-determinism. And we all know the arguments
between testing and formal proofs. And although the theorists object,
we all know that most real world programs are tested for correctness,
not formally proven.
|
Instead of objecting, I think theorists can say "you can pay me now, or
you can pay me later". If a program that has been tested for
correctness turns out to have bugs, and the error is costly enough,
that's when the testers turn to the theorists for answers (if it's not
too late).
| Quote: | DELIBERATELY AMUSING SUGGESTION: maybe all non-deterministic programs
should be required to be subjected to formal proofs of correctness,
while deterministic programs can be tested?
|
With appropriate tools, all-paths testing applies just as well to
non-deterministic paths as deterministic ones. But my serious
suggestion is to use any technique you like to verify deterministic
modules, then use the formality provided by ScalPL to formally prove
their (deterministic and/or nondeterministic) composition.
| Quote: | Anyway, my point remains: if it is possible to be deterministic
(for the problem, and the resources), it's nice to be so.
|
Simpler is often better.
-Dave |
|
| Back to top |
|
 |
David C. DiNucci
Guest
|
Posted:
Thu Sep 22, 2005 8:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
Rupert Pigott wrote:
| Quote: | David C. DiNucci wrote:
Rupert Pigott wrote:
[SNIP]
IME it is rare that a "huge data structure" is comprised of
a bunch of unique elements. They tend to be comprised of a
bunch of nigh-on identical, and if that is the case you can
simply have multiple instances of the same process... OCCAM
had Replicated PAR for just such occasions.
I personally don't think cases like this are so rare. Example: bounded
buffer (circular queue).
I would not class a circular queue as being a "huge" data
structure. It has very little state to manipulate, the objects
that it manages on the other hand are a different matter - but
surely the circular queue should not really care about them or
do anything to them, right ?
|
I included a disclaimer suggesting as much in many cases, but the
example was meant to be simple and illustrative, and it is not too
farfetched to imagine that a queue operation would operate on the state
of the objects in the queue, if for no other reason than to copy them or
ensure certain properties guaranteed by the queue of its contents.
"Huge" is in the eye of the beholder. If I'm not allowed to access
field x because someone else keeps accessing field y which seems
independent but is part of the same structure and thereby covered by the
same lock, it is huge enough. An enq and deq ptr within a queue may
qualify.
| Quote: | Hmm. Let me see, queues serialise access to a set of objects.
Why are try to parallelise manipulation of a stucture that is
used to serialise access to a set of objects ? That makes bugger
all sense to me.
|
Some might say that loops are also used to serialize, and those are
parallelized all the time. I would argue that the role of a loop or a
queue is not to serialize, but to make serializable (sequentially
consistent)--i.e. to make it possible to reason about long-lived events
as though they were serial. Depending upon how one defines "access", it
may or may not be a long-lived event. Whether parallelizing makes sense
depends on if you want speed and you've got resources to help provide
it. That applies to serializable events--e.g. using pipelining--as well
as non-.
| Quote: | One approach in CSP land is to have a process that maintains the
state of the queue, and simply have it wait for messages to push/
pop elements on that queue. If you don't want to send the entire
object you can keep the objects in a pool and pass handles to the
queue management process instead.
|
In which case, the single process serializes access to the state, so
constitutes a critical region.
SNIP
| Quote: | It makes perfect sense to me: If you change your program to support
more states than it did before, then it's more likely that some of those
states will be erroneous. Of course, the same argument could be made if
the states were introduced by adding variables, statements, etc.
Indeed. So you see what I am getting at ?
|
Yes, perhaps a case of violent agreement.
-Dave |
|
| Back to top |
|
 |
Jan Vorbrüggen
Guest
|
Posted:
Thu Sep 22, 2005 8:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
| Quote: | Yes, formal proofs of correctness would help, and perhaps even remove
any problems with non-determinism. And we all know the arguments
between testing and formal proofs. And although the theorists object,
we all know that most real world programs are tested for correctness,
not formally proven.
|
Recently, someone mentioned the September issue of IEE Spectrum, in the
context of failed software projects. The web version also makes an article
on Praxis available, plus some background reading. That article makes the
points that, first, formal methods are more than proving - much less for-
mally proving - correctness and that, second, a large fraction of the
benefit of formal methods can be obtained by writing a formal specification,
and that a large part of the benefit of doing _that_ is in forcing the
customer/user/... to actually think about what she wants to have done,
involving all concerned, and to commit to that in written form. Makes
for fascinating reading, in particular in the light of the title article
on the 140 megabucks of your tay dollars wasted, to a large degree, on the
FBI's Virtual Case File project.
Jan |
|
| Back to top |
|
 |
Andy Glew
Guest
|
Posted:
Thu Sep 22, 2005 8:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
| Quote: | By the way, I'd love to see a proof describing what problems can be
solved efficiently with deterministic parallelism, and what no.
Assuming sufficient parallelism, is this equivalent to deciding the
set difference between P and NP?
|
It might be; or it might be similar to one of the other set problems.
I believe it has been proven that in some cases truly randomized or
non-deterministic automation are more efficient than deterministic. (I
don't have my theory books beside me in Israel.)
But I am not sure that the sort of nondeterminism we are talking about
here is as powerful as the nondeterminism in a non-deterministic
finite state automaton.
What we are discussing here is parallel programs made out of
individual processors, each of which is deterministic. The
non-determinism only arises because of timing variations due to
varying execution times between these processors, causing varying
patterns of interaction.
In some degree, this is non-deterministic only because of lack of
complete knowledge. It isn't truly non-deterministic. So I suspect
that "non-deterministic" parallel programs are not as efficient as
truly non-deterministic automata.
Some folks use the term "deterministic chaos" to describe systems that
have very complex behavior, that might still be deterministic if we
had full knowledge. I think this term applies to most of our parallel
programming techniques. |
|
| Back to top |
|
 |
Rupert Pigott
Guest
|
Posted:
Thu Sep 22, 2005 1:34 pm Post subject:
Re: Not enough parallelism in programming |
|
|
David C. DiNucci wrote:
| Quote: | Rupert Pigott wrote:
David C. DiNucci wrote:
Rupert Pigott wrote:
[SNIP]
IME it is rare that a "huge data structure" is comprised of
a bunch of unique elements. They tend to be comprised of a
bunch of nigh-on identical, and if that is the case you can
simply have multiple instances of the same process... OCCAM
had Replicated PAR for just such occasions.
I personally don't think cases like this are so rare. Example: bounded
buffer (circular queue).
I would not class a circular queue as being a "huge" data
structure. It has very little state to manipulate, the objects
that it manages on the other hand are a different matter - but
surely the circular queue should not really care about them or
do anything to them, right ?
I included a disclaimer suggesting as much in many cases, but the
example was meant to be simple and illustrative, and it is not too
farfetched to imagine that a queue operation would operate on the state
of the objects in the queue, if for no other reason than to copy them or
ensure certain properties guaranteed by the queue of its contents.
|
That really is not the Queue's responsibility. If fact it breaks
my idea of what a Queue is. Copying larging objects around in a
circular queue suggests to me that it's not a circular queue. :)
I struggle to think of a huge data structure that I can't manipulate
with lots of processes. Perhaps you are struggling too ! :)
Cheers,
Rupert |
|
| Back to top |
|
 |
|
|
|
|