| Author |
Message |
Joe Seigh
Guest
|
Posted:
Mon Sep 19, 2005 4:15 pm Post subject:
Re: Not enough parallelism in programming |
|
|
Jan Vorbrüggen wrote:
| Quote: | But apart from the places where it doesn't work, it also loses wrt
expressing the many known parallelizable linked data structure
manipulations - such as the B-tree manipulations databases use.
From what I understand of those (which isn't all too much), things
get interesting quickly. Like when you demand a balanced tree, and
you are doing an insert or delete - in many cases, this will only
have a local impact, but such an operation has the potential to
modify every single node. How is any parallel process running on
the same data structure going to handle that?
The only practical system in wide-spread use I know of that allows
parallel and distributed updates to a parallel and distributed data
structure is the VMS lock manager. (I suspect LPARs have something
similar, and certainly databased need that as well.) Once you get
into wholesale update operations, all you do is take an exclusive
lock on the data structure, wait for it to be granted, do your thing,
and release the lock. The best you can hope for is that such an
"exceptional" operation doesn't happen so often that it has a severe
performance impact.
Hey, I've seen too many invalid assumptions in kernel mode in handling
shared data structures - even just traversing them for searching, for
example - to believe there's an easy solution to this.
|
You can use lock-free algorithms like RCU or SMR. Speaking of b-trees,
I prototyped some reader lock-free binary tree operations in Java to
see what the insert, delete, and rotate operations would look like.
I didn't get into trying any tree balancing heuristics to see how well they
would perform though. It's a case of premature optimization at this
point. We're not at a point where conventional solutions perform poorly
enough to warrant investigation into alternative techniques.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software. |
|
| Back to top |
|
 |
Jean-Marc Bourguet
Guest
|
Posted:
Mon Sep 19, 2005 4:15 pm Post subject:
Re: Not enough parallelism in programming |
|
|
Joe Seigh <jseigh_01@xemaps.com> writes:
| Quote: | [...] Speaking of b-trees, I prototyped [...] I didn't get into
trying any tree balancing heuristics
|
I though that in B-Tree, B meant balanced.
Yours,
--
Jean-Marc |
|
| Back to top |
|
 |
Joe Seigh
Guest
|
Posted:
Mon Sep 19, 2005 4:15 pm Post subject:
Re: Not enough parallelism in programming |
|
|
Jean-Marc Bourguet wrote:
| Quote: | Joe Seigh <jseigh_01@xemaps.com> writes:
[...] Speaking of b-trees, I prototyped [...] I didn't get into
trying any tree balancing heuristics
I though that in B-Tree, B meant balanced.
Just a binary tree with rotate operations at this point. If I |
applied balancing heuristics to it then it would be a b-tree
then. But I don't think I'm going to waste any time on implementing
something that isn't needed yet.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software. |
|
| Back to top |
|
 |
David C. DiNucci
Guest
|
Posted:
Mon Sep 19, 2005 11:42 pm Post subject:
Re: Not enough parallelism in programming |
|
|
Jan Vorbrüggen wrote:
| Quote: | But apart from the places where it doesn't work, it also loses wrt
expressing the many known parallelizable linked data structure
manipulations - such as the B-tree manipulations databases use.
From what I understand of those (which isn't all too much), things
get interesting quickly. Like when you demand a balanced tree, and
you are doing an insert or delete - in many cases, this will only
have a local impact, but such an operation has the potential to
modify every single node. How is any parallel process running on
the same data structure going to handle that?
|
I won't pretend to even remember much of b-tree manipulations, but I
suspect that at some point you know whether particular branches are
balanced, or at least ordered, or at least linked correctly to be valid
trees, and some processing can continue on that basis. Even if changes
may propogate across the entire tree, as long as the nodes involved can
be ordered, re-processing of those parts already processed can occur in
a pipelined manner.
I think the more indepth answer is hidden in your condition "when you
demand a balanced tree". In conventional parallel programming, such
"when" conditions are expressed in terms of waiting on locks, but locks
are not very expressive. They are usually binary (or at best
read-write), their association with data structures is informal, and it
is clumsy/inefficient to provide them with much structure, like the
"is balanced tree" > "is a sorted tree" > "is a tree"
that I mentioned, especially on a large scale (e.g. for every branch).
In ScalPL, the user (graphically) declares different stages which each
data structure may enter, declares which activities can access the data
structure in each stage, and declares how these activities can cause the
stages to transition. If a data structure has multiple instances (e.g.
in an array or as allocated), each has its own stages. Transitions
between high-level stages can be broken down into transitions between
finer stages at deeper levels of program hierarchy.
| Quote: | Hey, I've seen too many invalid assumptions in kernel mode in handling
shared data structures - even just traversing them for searching, for
example - to believe there's an easy solution to this.
|
While I wouldn't claim that there's an "easy solution", I would claim
that the assumptions end up being invalid because they're based on
attempts to clumsily assign meaning to locks and other program state
which were not designed to support or enforce such meaning, leading to
misinterpretations and inconsistencies.
-Dave |
|
| Back to top |
|
 |
Eugene Miya
Guest
|
Posted:
Tue Sep 20, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
In article <hrKdnXM8X6zi2bHeRVn-gw@scnresearch.com>,
David C. DiNucci <dave@elepar.com> wrote:
| Quote: | But there's a reason for that. SW languages don't want or need the
majority of what's in timing accurate HDLS--especially the notion of a
clock.
Eugene Miya wrote:
Never ran on the Cyber 205 did we now Dave?
I don't know, maybe. Is that the one where you loaded something from
(stored something to) memory by moving the memory address to some other
register having the same numeric suffix?
|
I think you may have run on a 10 (ETA).
In article <61IWe.35402$Vh2.29184@fe64.usenetserver.com>,
Derek Gladding <derek-spammenot@ebollocks.net> wrote:
| Quote: | Maybe not a full clock, but the ability to explicitly slice time into
"before" and "after" at a relatively low level - which is what a clock edge
effectively is - would be interesting.
Go for the full clock. But, Keep it simple.
The problem is that when a curious anomaly happens, and you have to
contrive things to fight a compiler/optimizer when things start to
get too complex.
There are certainly more problems, too, like performance issues
resulting from artifically constraining which events/functions can occur
concurrently. The VLIW and BSP camps may go for lockstep parallel
logic, but I never saw much sense in it. The justification is usually
based on making things easier to understand, but that can be better
solved with a deeper understanding of what's really going on (i.e.
better analytical tools) and better software tools.
|
Determinism raises it head.....
-- |
|
| Back to top |
|
 |
David C. DiNucci
Guest
|
Posted:
Tue Sep 20, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
Eugene Miya wrote:
| Quote: | In article <hrKdnXM8X6zi2bHeRVn-gw@scnresearch.com>,
David C. DiNucci <dave@elepar.com> wrote:
Eugene Miya wrote:
Never ran on the Cyber 205 did we now Dave?
I don't know, maybe. Is that the one where you loaded something from
(stored something to) memory by moving the memory address to some other
register having the same numeric suffix?
I think you may have run on a 10 (ETA).
|
I doubt it. I think the machine I described was a Cyber machine at
Oregon State circa 1981. Google now leads me to believe it was a Cyber
170. The ETA at NASA (which I think is the only one I might have had a
chance to access) was gone (or effectively so) by the time I got there.
| Quote: | The problem is that when a curious anomaly happens, and you have to
contrive things to fight a compiler/optimizer when things start to
get too complex.
There are certainly more problems, too, like performance issues
resulting from artifically constraining which events/functions can occur
concurrently. The VLIW and BSP camps may go for lockstep parallel
logic, but I never saw much sense in it. The justification is usually
based on making things easier to understand, but that can be better
solved with a deeper understanding of what's really going on (i.e.
better analytical tools) and better software tools.
Determinism raises it head.....
|
You can get determinism, if you want it, without lockstep execution.
Again, appropriate analytical and software tools can help tremendously.
On the other hand, non-determinism is also a very handy and efficient
tool to have at your disposal in certain circumstances, as long as it's
appropriately understood and governed.
-Dave |
|
| Back to top |
|
 |
Andy Freeman
Guest
|
Posted:
Tue Sep 20, 2005 5:32 am Post subject:
Re: Not enough parallelism in programming |
|
|
Jan Vorbrüggen wrote:
| Quote: | And, it means that that creating large structures is unnecessarily
slow, leading to programs that use buggy aggregates of small structures
to simulate large ones.
I can't see how this follows. It's not true in my experience. Support from
your tools might be insufficient, yes - but in a language such as Fortran 90
et seq., you can at least express the necessary things.
|
You stated that only one process can write to a data structure at a
time.
If I've got a huge data structure and an algorithm that would allow
concurrent
writes, that restriction tempts me to break my data structure into
pieces
so I can have multiple concurrent writers. If I succumb, I'm likely to
introduce bugs. |
|
| Back to top |
|
 |
Rupert Pigott
Guest
|
Posted:
Tue Sep 20, 2005 8:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
Andy Freeman wrote:
[SNIP]
| Quote: | You stated that only one process can write to a data structure at a
time.
If I've got a huge data structure and an algorithm that would allow
concurrent
writes, that restriction tempts me to break my data structure into
pieces
so I can have multiple concurrent writers.
|
IME it is rare that a "huge data structure" is comprised of
a bunch of unique elements. They tend to be comprised of a
bunch of nigh-on identical, and if that is the case you can
simply have multiple instances of the same process... OCCAM
had Replicated PAR for just such occasions.
| Quote: | If I succumb, I'm likely to introduce bugs.
|
This argument reminds me somewhat of one of the arguments
against breaking up a program into subroutines...
Cheers,
Rupert |
|
| Back to top |
|
 |
Andy Glew
Guest
|
Posted:
Tue Sep 20, 2005 4:06 pm Post subject:
Re: Not enough parallelism in programming |
|
|
| Quote: | Eugene Miya:
Determinism raises it['s] head.....
|
The problem with determinism is that you don't need it when you have a
correct program.
You only need it with a possibly incorrect program, so that you can
reproduce the bugs. (Note: this is not the same as "You only need it
during debugging.")
Moreover, you need it, since you have tested that a program is correct
with a given set of inputs - you want it to always stay correct. And
not occasionally give a wrong answer. |
|
| Back to top |
|
 |
Alex Colvin
Guest
|
Posted:
Tue Sep 20, 2005 4:23 pm Post subject:
Re: Not enough parallelism in programming |
|
|
| Quote: | I don't know, maybe. Is that the one where you loaded something from
(stored something to) memory by moving the memory address to some other
register having the same numeric suffix?
|
CDC 6600 and seq.?
store contents of X{6,7} by setting register A{6,7} with address
Look at temporal logic (epecially CTL+) for software notion of
before/after/immediately after/eventually clocking
--
mac the naďf |
|
| Back to top |
|
 |
David C. DiNucci
Guest
|
Posted:
Tue Sep 20, 2005 9:50 pm Post subject:
Re: Not enough parallelism in programming |
|
|
Alex Colvin wrote:
| Quote: | I don't know, maybe. Is that the one where you loaded something from
(stored something to) memory by moving the memory address to some other
register having the same numeric suffix?
CDC 6600 and seq.?
store contents of X{6,7} by setting register A{6,7} with address
|
Yeah, that's it. Thanks.
| Quote: | Look at temporal logic (epecially CTL+) for software notion of
before/after/immediately after/eventually clocking
|
I'm familiar with it. Wrote a term paper on formal spec in my first
term in grad school c. 1985, which included TL. (Who could forget
titles like "'Sometime' is sometimes 'not never'".) Since then, I've
been working to rescue the parallel world from reasoning about
artifically-sequenced events by allowing them to program directly with
partial orderings...which may explain some of my aversion to global clocks.
-Dave |
|
| Back to top |
|
 |
David C. DiNucci
Guest
|
Posted:
Wed Sep 21, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
Andy Glew wrote:
| Quote: | Eugene Miya wrote:
Determinism raises it['s] head.....
The problem with determinism is that you don't need it when you have a
correct program.
|
Your assertion might be correct if you replace all "you" with "I", since
you are free to define "correct" any way you want to, but if I interpret
"you" to collectively include me...
My correct programs often need non-determinism to do what I want them to
do, especially regarding speed. For example, if my program is
concurrently computing a number of independent answers, and I don't care
about the order in which it spews them out, then I will likely desire a
non-deterministic first-ready-first-out output. Some people will try to
avoid calling this non-deterministic by interpreting each answer as
being produced to a separate virtual device, but that's just a mind
trick: The output will often look different after each run, and they're
all correct to me. Another example would be if I am concurrently
searching for the first set of parameters that will lead to a function
result within a given tolerance, or, say, any factorization of a
particular number. As soon as I get an acceptable result, I want it to
go into the next stage of the computation.
| Quote: | You only need it with a possibly incorrect program, so that you can
reproduce the bugs. (Note: this is not the same as "You only need it
during debugging.")
|
Even with a correct non-deterministic program--i.e. one with outputs
which may differ from run to run which I still consider correct, like
those above--it's often nice to know precisely how a particular output
was computed. I have worked with programs that gave two very different
answers on different runs, where we didn't realize they were both right
until we saw how the program arrived at them.
| Quote: | Moreover, you need it, since you have tested that a program is correct
with a given set of inputs - you want it to always stay correct. And
not occasionally give a wrong answer.
|
Nondeterminism is so often interpreted as "incorrect" because of how
often it arises accidentally and hides insidiously using existing tools.
The only way for nondeterminism itself to be an error is for the
programmer to intend the outcome to be determin(ed/istic). By the same
reasoning, deterministically specifying a program's behavior can itself
be an error if the programmer intended not to. In either case, a
computation or result can be correct or incorrect, independent of the
determinism or non, based only on whether or not that computation or
result was among those that the programmer intended.
-Dave |
|
| Back to top |
|
 |
Rupert Pigott
Guest
|
Posted:
Wed Sep 21, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
David C. DiNucci wrote:
[SNIP]
| Quote: | titles like "'Sometime' is sometimes 'not never'".) Since then, I've
been working to rescue the parallel world from reasoning about
artifically-sequenced events by allowing them to program directly with
partial orderings...which may explain some of my aversion to global clocks.
|
I can't say I am keen on global clocks either. However I find it
emotionally difficult to say I'll never need them. Rationally I
figure I'll work out a way. :)
Cheers,
Rupert |
|
| Back to top |
|
 |
David C. DiNucci
Guest
|
Posted:
Wed Sep 21, 2005 12:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
Rupert Pigott wrote:
| Quote: | Andy Freeman wrote:
[SNIP]
You stated that only one process can write to a data structure at a
time.
If I've got a huge data structure and an algorithm that would allow
concurrent
writes, that restriction tempts me to break my data structure into
pieces
so I can have multiple concurrent writers.
IME it is rare that a "huge data structure" is comprised of
a bunch of unique elements. They tend to be comprised of a
bunch of nigh-on identical, and if that is the case you can
simply have multiple instances of the same process... OCCAM
had Replicated PAR for just such occasions.
|
I personally don't think cases like this are so rare. Example: bounded
buffer (circular queue).
First approximation: Just cover the whole data structure (buffer, enq
ptr, and deq ptr) with a lock, say qop. Big drawback: Can't ensure
that queue has anything in it (or is full) without acquiring qop.
Second approx: Block enq on full buffer and deq on empty buffer by
introducing two new locks, enqOK, and deqOK. Leave old lock there, so
both enq and deq must acquire qop before updating/checking the enq ptr
and deq ptr.
(In ScalPL, one wouldn't even have two locks, just a queue with stages
"empty", "full", and "neither", and transition rules between them.)
Third approx: Suppose enq and deq operation are slow, so you wish to
allow multiple enq and deq to operate in parallel. (Think grocery
store, unloading cartful onto conveyor belt for enq and bagging cartful
at other end for deq.) Let existing enq ptr and deq ptr continue to
point to next item to be enq'd or deq'd, but add new enqing and deqing
ptrs to signify location of oldest item in the act of being enq'd or
deq'd. Also need to keep track of each queue item's status (being
enq'd, enq'd, being deq'd, or deq'd), when each op finishes, etc.
(Yes, I understand that there are other ways to deal with slow enq/deq,
but that's not the point.)
| Quote: | If I succumb, I'm likely to introduce bugs.
This argument reminds me somewhat of one of the arguments
against breaking up a program into subroutines...
|
It makes perfect sense to me: If you change your program to support
more states than it did before, then it's more likely that some of those
states will be erroneous. Of course, the same argument could be made if
the states were introduced by adding variables, statements, etc.
-Dave |
|
| Back to top |
|
 |
Jan Vorbrüggen
Guest
|
Posted:
Wed Sep 21, 2005 8:15 am Post subject:
Re: Not enough parallelism in programming |
|
|
| Quote: | Hey, I've seen too many invalid assumptions in kernel mode in handling
shared data structures - even just traversing them for searching, for
example - to believe there's an easy solution to this.
While I wouldn't claim that there's an "easy solution", I would claim
that the assumptions end up being invalid because they're based on
attempts to clumsily assign meaning to locks and other program state
which were not designed to support or enforce such meaning, leading to
misinterpretations and inconsistencies.
|
Sure enough. Although I would say that a lock, being a data structure, cannot
by itself support or, in particular, enforce such meaning, in the sense of
making sure that pre- and post-conditions of an operation hold - unless you
think of it in an object-oriented way with only private members.
The OS I'm talking about solved this problem by providing traversal and
manipulation routines that ensured the semantics, while the user (in this
case, other kernel programmers) could specify call-backs (co-routines) that
would be executed appropriately and which had clearly defined restrictions
in what they could and could not do. That apporach improved things a lot at
the expense of (a little) performance.
Jan |
|
| Back to top |
|
 |
|
|
|
|