Not enough parallelism in programming
CASTalk.com Forum Index CASTalk.com
Discussion of DSP, FPGA, storage and embedded system.
 
 FAQFAQ   MemberlistMemberlist     RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 
 
Google
 
Web castalk.com
Not enough parallelism in programming
Goto page Previous  1, 2, 3 ... 14, 15, 16, 17, 18  Next
 
Post new topic   Reply to topic    CASTalk.com Forum Index -> Computer Architecture
Author Message
Andy Freeman
Guest





Posted: Fri Sep 16, 2005 4:15 pm    Post subject: Re: Not enough parallelism in programming Reply with quote

JJ wrote:
Quote:
If you ask a hardware EE in Verilog (or VHDL) the precise answer is

assign
c = a+b, d = a-b;

Do not read this as C , exprn but as 2 continuous processes that will
follow any change at all on any bits of a,b even if 1,0,?,x bit value
changes.

Except that hardware doesn't work that way. Outputs (both gates and
wires) take time, as jj noted.

While HDL folks like the local event model and delays and insist on
how natural it is, levelization is how they make things work in the
synchronous world where things take time. (Yes, minimum clock is
defined by delay totals, but good abstractions let people focus on
the minutia only when necessary.)

Assertion: A levelized HDL would be more productive than "you can
fake levelization in Verilog/VHDL".

-andy
Back to top
JJ
Guest





Posted: Fri Sep 16, 2005 4:15 pm    Post subject: Re: Not enough parallelism in programming Reply with quote

David C. DiNucci wrote:
Quote:
JJ wrote:
David C. DiNucci wrote:

A bigger problem for parallel software engineering lies in hiding
behavior/timing. For example, I can ask two programmers to implement a
module that takes value on two ports, A and B, and produces a result on
port C that is their sum and one on D which is their difference (A-B).
Even if both programmers implement the module to this spec, with no
funny side effects or access to global info, one might work in my app
and one might not solely due to the order in which they read their
inputs from ports A and B and/or produce their results. Adding this
sort of info to the spec (e.g. in temporal logic) can get very complex


If you ask a hardware EE in Verilog (or VHDL) the precise answer is

assign
c = a+b, d = a-b;
snip
Trying to model this in occam and I suspect in most any SW par language
is just doomed to reinvent only partially what timing accurate HDLS
already have.

But there's a reason for that. SW languages don't want or need the
majority of what's in timing accurate HDLS--especially the notion of a
clock.


Funnily enough nor do some hardware projects if some simplifying
conditions are imposed hence C cycle simulators but with out the
benefit of the Verilog syntax unless a Verilog to C translator is used.

Clocks and master slave DFlops are the obvious hardware form of process
syncronization. The assign statement is a continouos process whose
outputs are eventually syncronized at a pipeline register. Now in
asynch hardware designs, clocks are replaced with self generated ready
strobes, not very useful to most ASIC or any FPGA designers.

Quote:
Now when I do this in C, to be cheap & fast, the code looks exactly the
same but presumes that a,b are atatic and also only binary,

Pretty valid presumptions in my book.

so c,d get
reavualated whenever a,b might have or did change with in a group
change.

I don't know how you came to that conclusion. It depends completely on
the context in which those statements are written.


In my context of poor mans simulator based on cycle C simulation with
out event wheel, values are just reevaluated every pipeline cycle no
matter if they change or not. This is still effective even if many
values don't change often, as long as some do change. The cycle
simulation trades speed and simplicity for the event driven simulator
accuracy. The two appraoches can be mixed.

Quote:
snip
One might conclude the software world is busily doing the same thing
with concurrency reinventing something the hardware people have had for
maybe 40-50yrs in simulation. Not that HDLs are of much use yet to seq
coders.

Nor to parallel coders, I would conclude.

Note that I was certainly not saying that it was impossible, or even
difficult, to more carefully specify the problem above, or to code the
solution in existing parallel languages, such that there wouldn't be any
problem. It was meant just to illustrate that with only a specification
based on inputs and outputs and not on relative timings, it can lead to
a problem in software languages, just as it could in an HDL. For more
info, google "Brock-Ackerman Anomaly".


I looked at
http://theory.stanford.edu/people/jcm/cs358-96/4-15-lecture.html
but I get lost in unfamilair style. I am not sure why the 2 sorts when
combined into another system would be a problem if the sorts proceed as
atomic operations. Were they not atomic?

I think your and my parallel camps are rather different. Imagine the
lambda calculas person trying to talk with the regular digital guys.

I am not sure where your par interests in ScalPL are and what the
applications are likely to be.

Most of the par apps I am interested in might be called embarrassingly
parallel, but include anything that might be remotely accelerated as
hardware esp if in FPGA or ASIC. It also includes the reverse case of
things that went from hardware to software such as all the DSP codecs
that run very well on PCs but might also be described more naturally as
HDL processes. It esp includes EDA software that runs as slow as
mollasses on PCs.

Occam is used both as a hardware description language in HandelC form
and as a general purpose par language so I see it as a bridge between
pure HDLs and seq languages. HandelC had to have added a no of things
Verilog already had such as wide bit types and clocks. If concurrent
processes can model hardware, then one is bound to compare to other
hardware modelling languages.



Quote:
I'm pretty certain we tossed this "HDL as parallel programming language"
notion around quite a bit in The Thread That Would Not Die some time ago
(2 years?). I didn't believe it then, and I still don't.

-Dave

Yes we did, but A Glew brought up the side thread of hardware
simulation which is in my area of interest, and how a par language for
our camp might exploit the usefull concepts of Verilog (or VHDL) with
the general purpose C base without turning into a Superlog monster.

Since there have been 1000 plus seq languages over the years, it would
hardly be surprising if more than a few par languages were added to the
picture and could be just as entirely domain specific. I never see any
refs to hardware languages though in the multi language tomes or even
much on par ones.


johnjakson at usa ...
Back to top
JJ
Guest





Posted: Fri Sep 16, 2005 4:15 pm    Post subject: Re: Not enough parallelism in programming Reply with quote

Andy Freeman wrote:
Quote:
JJ wrote:
If you ask a hardware EE in Verilog (or VHDL) the precise answer is

assign
c = a+b, d = a-b;

Do not read this as C , exprn but as 2 continuous processes that will
follow any change at all on any bits of a,b even if 1,0,?,x bit value
changes.

Except that hardware doesn't work that way. Outputs (both gates and
wires) take time, as jj noted.

While HDL folks like the local event model and delays and insist on
how natural it is, levelization is how they make things work in the
synchronous world where things take time. (Yes, minimum clock is
defined by delay totals, but good abstractions let people focus on
the minutia only when necessary.)

Assertion: A levelized HDL would be more productive than "you can
fake levelization in Verilog/VHDL".

-andy

So we are just agreeing that detailed event simulation just costs more
but levelized code is good enough for bulk simulations when used with
synthesis?

johnjakson at usa ...
Back to top
Andy Freeman
Guest





Posted: Fri Sep 16, 2005 4:15 pm    Post subject: Re: Not enough parallelism in programming Reply with quote

JJ wrote:
Quote:
Andy Freeman wrote:
Assertion: A levelized HDL would be more productive than "you can
fake levelization in Verilog/VHDL".

-andy

So we are just agreeing that detailed event simulation just costs more
but levelized code is good enough for bulk simulations when used with
synthesis?

I'm asserting that the event model encourages errors and slows down
the design and implementation process. If "just costs more" includes
"and doesn't provide sufficient compensating benefits, so is a bad
idea", we agree.

Yes, the minimum clock is defined by aggregated delays but that doesn't
imply that the local event model is a productive metaphor. (Note that
event model languages mischaracterize delays in the cases that you care
about most.)

When designing, what kind of pictures do you draw? When debugging,
what
model do you extract? Why not use that model to express the design?
Yes, you still have to make sure that the design satisfies timing
constraints, but the event model didn't help you with that anyway.

Another way to look at it is that clocks are the only events that
matter.
(Outputs are only valid wrt clocks and inputs are similarly
constrained.)

-andy
Back to top
Eugene Miya
Guest





Posted: Sat Sep 17, 2005 12:15 am    Post subject: Re: Not enough parallelism in programming Reply with quote

David C. DiNucci wrote:
Quote:
But there's a reason for that. SW languages don't want or need the
majority of what's in timing accurate HDLS--especially the notion of a
clock.

Never ran on the Cyber 205 did we now Dave?

In article <61IWe.35402$Vh2.29184@fe64.usenetserver.com>,
Derek Gladding <derek-spammenot@ebollocks.net> wrote:
Quote:
Maybe not a full clock, but the ability to explicitly slice time into
"before" and "after" at a relatively low level - which is what a clock edge
effectively is - would be interesting.

Go for the full clock. But, Keep it simple.

The problem is that when a curious anomaly happens, and you have to
contrive things to fight a compiler/optimizer when things start to
get too complex.

--
------------ And now a word from our sponsor ---------------------
For a secure high performance FTP using SSL/TLS encryption
upgrade to SurgeFTP
---- See http://netwinsite.com/sponsor/sponsor_surgeftp.htm ----
Back to top
Derek Gladding
Guest





Posted: Sat Sep 17, 2005 12:15 am    Post subject: Re: Not enough parallelism in programming Reply with quote

David C. DiNucci wrote:

Quote:
But there's a reason for that. SW languages don't want or need the
majority of what's in timing accurate HDLS--especially the notion of a
clock.


Maybe not a full clock, but the ability to explicitly slice time into
"before" and "after" at a relatively low level - which is what a clock edge
effectively is - would be interesting.

- Derek
Back to top
Andy Glew
Guest





Posted: Sat Sep 17, 2005 1:19 pm    Post subject: Re: Not enough parallelism in programming Reply with quote

In a previous post, I explained why the serial construct
obj1.method1(); obj2.method2()
has well defined serial semantics, but the parallel construct
obj1.method1(), obj2.method2()
has well-defined parallel semantics iff obj1 and obj2 do not overlap
in some invisible way. Requiring this violates software engineering
principles such as data hiding and modularity.

Speculative parallelism may permit parallization, but the semantics is
still serial.

I mention that applying transactional semantics. which might be expressed
TRANSACTION{obj1.method1()}, TRANSACTION{obj2.method2()}
does not provide parallel semantics. It really provides non-determinsitic
serial semantics - you don't know which one executes first, but it is as
if one does.

What I understand to be the Myrias model, which essentially clones the
entire state of the parent computation, performs each child in isolation,
and then merges back in some deterministic way, actually *HAS* parallel
semantics. Myrias semantics are clearly useful for some applications;
it is not clear to me if they are useful in general, for stuff such
as linked lists; or whether automatic tools could detect situations
where Myrias' well defined parallel behavior would cause problems.

I wonder whether there are any other well defined parallel semantics,
and how useful they are.
Back to top
Andy Glew
Guest





Posted: Sat Sep 17, 2005 1:22 pm    Post subject: Re: Not enough parallelism in programming Reply with quote

Quote:
- It is really only linked lists and similar data structures that need
pointers. In such cases, you need synchronized access to the shared
data structure in some way or other. The important point is that this
is implemented in an (instance of an) object that is outside of your
parallel construct (running in parallel, possibly, but only one instance
being active at any one time), thus making sure that any synch proper-
ties are being properly implemented.

Sure, this works. Almost.

But apart from the places where it doesn't work, it also loses wrt
expressing the many known parallelizable linked data structure
manipulations - such as the B-tree manipulations databases use.
Back to top
David C. DiNucci
Guest





Posted: Sat Sep 17, 2005 9:24 pm    Post subject: Re: Not enough parallelism in programming Reply with quote

Derek Gladding wrote:
Quote:
David C. DiNucci wrote:
But there's a reason for that. SW languages don't want or need the
majority of what's in timing accurate HDLS--especially the notion of a
clock.

Maybe not a full clock, but the ability to explicitly slice time into
"before" and "after" at a relatively low level - which is what a clock edge
effectively is - would be interesting.

I agree that it would be interesting to explicitly slice up the history
of each data object into a sequence of well-defined stages, each stage
boundary ("clock edge") considered a "before" and "after" boundary of
some event (e.g. function invocation). It would also be interesting for
any single event to be able to atomically transition the stage of any
user-selectable set of these data objects at once.

This is precisely what ScalPL does/is.

-Dave
Back to top
David C. DiNucci
Guest





Posted: Sat Sep 17, 2005 9:29 pm    Post subject: Re: Not enough parallelism in programming Reply with quote

Eugene Miya wrote:
Quote:
David C. DiNucci wrote:

But there's a reason for that. SW languages don't want or need the
majority of what's in timing accurate HDLS--especially the notion of a
clock.


Never ran on the Cyber 205 did we now Dave?

I don't know, maybe. Is that the one where you loaded something from
(stored something to) memory by moving the memory address to some other
register having the same numeric suffix?

Quote:
In article <61IWe.35402$Vh2.29184@fe64.usenetserver.com>,
Derek Gladding <derek-spammenot@ebollocks.net> wrote:

Maybe not a full clock, but the ability to explicitly slice time into
"before" and "after" at a relatively low level - which is what a clock edge
effectively is - would be interesting.


Go for the full clock. But, Keep it simple.

The problem is that when a curious anomaly happens, and you have to
contrive things to fight a compiler/optimizer when things start to
get too complex.

There are certainly more problems, too, like performance issues
resulting from artifically constraining which events/functions can occur
concurrently. The VLIW and BSP camps may go for lockstep parallel
logic, but I never saw much sense in it. The justification is usually
based on making things easier to understand, but that can be better
solved with a deeper understanding of what's really going on (i.e.
better analytical tools) and better software tools.

-Dave
Back to top
David C. DiNucci
Guest





Posted: Sun Sep 18, 2005 12:15 am    Post subject: Re: Not enough parallelism in programming Reply with quote

Andy Glew wrote:
Quote:
In a previous post, I explained why the serial construct
obj1.method1(); obj2.method2()
has well defined serial semantics, but the parallel construct
obj1.method1(), obj2.method2()
has well-defined parallel semantics iff obj1 and obj2 do not overlap
in some invisible way.

No, it has well-defined parallel semantics whenever the model in which
it is expressed defines its parallel semantics, and many do define such
semantics iff any potentially-shared data is appropriately protected
with locks, etc. (Posix threads may be an exception.) Some models (eg.
occam) have well-defined semantics without such constraints, by defining
a level of atomicity that the programmer can't violate (e.g. a read or
write of a single datum).

Here, I'm using the term "well defined" to include non-deterministic
behavior as long as the potential set of outcomes is defined by the
model, and the size and elements of that set can be controlled by the
programmer within the model.

Quote:
Requiring this violates software engineering
principles such as data hiding and modularity.

No, but it will likely require some more expressive language and more
constructs to define the programmer-caller contract. If the only
constructs you have to work with are the input-output behavior of each
object, then you are severely restricted in the useful contracts you can
express regarding their composition, e.g. given initial state I, and
interpreting a() and b() as mappings from one (partially hidden) state
to another, you get to define final state F for a() composed with b() as:
F = b(a(I)): Serial
F = a(b(I)): Reverse serial
F = b(a(I)) or a(b(I)): Serializable
F = c(a(I), b(I)), for some c(): Copy-in-out
and a few other haphazard combinations.

Quote:
Speculative parallelism may permit parallization, but the semantics is
still serial.

Right.

Quote:
I mention that applying transactional semantics. which might be expressed
TRANSACTION{obj1.method1()}, TRANSACTION{obj2.method2()}
does not provide parallel semantics. It really provides non-determinsitic
serial semantics - you don't know which one executes first, but it is as
if one does.

I would call this serializable (from the literature).

Quote:
What I understand to be the Myrias model, which essentially clones the
entire state of the parent computation, performs each child in isolation,
and then merges back in some deterministic way, actually *HAS* parallel
semantics. Myrias semantics are clearly useful for some applications;
it is not clear to me if they are useful in general, for stuff such
as linked lists; or whether automatic tools could detect situations
where Myrias' well defined parallel behavior would cause problems.

Myrias semantics is an example of the "copy-in-out" definition I gave
above--i.e. F=c(a(I),b(I))), where c() is deterministically defined as
some sort of merge. I see no reason to expect that even this
necessarily corresponds to any desired contract between programmer and user.

Quote:
I wonder whether there are any other well defined parallel semantics,
and how useful they are.

One can add as many new functions (e.g. like c()) as desired to the
definition of a() composed with b(), but I don't see why one would
expect them to contribute to definitions that are as or more
intuitive/useful than those already mentioned.

I think the main wrong turn of your argument arose from your apparent
assumption that data hiding and modularity imply that the user can have
no information whatsoever about the module other than its input-output
behavior. While it's true that the module should not expose unnecessary
implementation details, it should be able to expose or declare contracts
other than Input to Output, and formal specification languages have been
based on such contracts for a long time. For example, push() and pop()
are usually not abstractly defined by what the stack looks like before
and after each individually, but by how compositions like
(stack.push(x)).pop() behave.

-Dave
Back to top
David C. DiNucci
Guest





Posted: Sun Sep 18, 2005 6:02 am    Post subject: Re: Not enough parallelism in programming Reply with quote

JJ wrote:
Quote:
David C. DiNucci wrote:

snip

Quote:
Note that I was certainly not saying that it was impossible, or even
difficult, to more carefully specify the problem above, or to code the
solution in existing parallel languages, such that there wouldn't be any
problem. It was meant just to illustrate that with only a specification
based on inputs and outputs and not on relative timings, it can lead to
a problem in software languages, just as it could in an HDL. For more
info, google "Brock-Ackerman Anomaly".



I looked at
http://theory.stanford.edu/people/jcm/cs358-96/4-15-lecture.html
but I get lost in unfamilair style. I am not sure why the 2 sorts when
combined into another system would be a problem if the sorts proceed as
atomic operations. Were they not atomic?

According to some interpretations of the word "atomic", Q_1 would be
considered atomic and Q_2 would not be. That is one manisfestation of
the issue: Atomicity is not a function of the mapping from inputs to
outputs, but of the order in which inputs and outputs are processed.

Quote:
I think your and my parallel camps are rather different. Imagine the
lambda calculas person trying to talk with the regular digital guys.

You might (or might not) be surprised at how much lambda calculus is
used in chip verification. That doesn't necessariy meant the two camps
talk much, though.

Quote:
I am not sure where your par interests in ScalPL are and what the
applications are likely to be.

I personally think that the apps will be all over the map. The people
who will find it most useful initially will likely be programmers trying
to exploit parallel or distributed hardware for uses where existing
tools aren't doing the trick, especially wrt portability. If I had to
point to the most immediate big potential market, I'd say PS3 game
programmers trying to deal with the complexities of their games while
simultaneously trying to exploit the Cell SPEs (with their local store).
ScalPL's roots have always been in software engineering, but inasmuch as
it is an approach to reasoning about all concurrent events at their most
basic, it could apply to almost anything from modeling chemical
reactions to the behavior of large organizations.

Quote:
Most of the par apps I am interested in might be called embarrassingly
parallel, but include anything that might be remotely accelerated as
hardware esp if in FPGA or ASIC. It also includes the reverse case of
things that went from hardware to software such as all the DSP codecs
that run very well on PCs but might also be described more naturally as
HDL processes. It esp includes EDA software that runs as slow as
mollasses on PCs.

I have been keeping my ear to the EDA wall (living here in Portland it's
almost hard not to), but for the most part, they seem to have burned so
many times with the promise of parallel processing that they'll believe
it when they see it, and I can't afford to show it to them. It appears
that they're currently content to wait a day or two for runs, as long as
they can throw several runs/simulations in at once (this seems to apply
especially to analog).

Quote:
Occam is used both as a hardware description language in HandelC form
and as a general purpose par language so I see it as a bridge between
pure HDLs and seq languages. HandelC had to have added a no of things
Verilog already had such as wide bit types and clocks. If concurrent
processes can model hardware, then one is bound to compare to other
hardware modelling languages.

Until a few months ago, ScalPL was known as Software Cabling, and
programs consisted of boards, chips, sockets, pins, cables, signals,
etc., so such comparisons are not foreign to me. On the flip side, I
think that the hardware community could potentially benefit from some
ideas in ScalPL for asynchronous logic to decrease reliance on a global
clock even within one chip.

Quote:
I'm pretty certain we tossed this "HDL as parallel programming language"
notion around quite a bit in The Thread That Would Not Die some time ago
(2 years?). I didn't believe it then, and I still don't.

Yes we did, but A Glew brought up the side thread of hardware
simulation which is in my area of interest, and how a par language for
our camp might exploit the usefull concepts of Verilog (or VHDL) with
the general purpose C base without turning into a Superlog monster.

Since there have been 1000 plus seq languages over the years, it would
hardly be surprising if more than a few par languages were added to the
picture and could be just as entirely domain specific. I never see any
refs to hardware languages though in the multi language tomes or even
much on par ones.

For what it's worth, ScalPL is not only not domain specific, it's not
even language specific. For all ScalPL cares, Andy's original example
with the simultaneous assignments was a sufficient input-output
specification for a task. ScalPL doesn't care how you express
input-output specifications of tasks, as long as you know how to read
(and preferably implement) those specifications. ScalPL only deals with
getting tasks to work productively together to achieve an end.

On a related subject, some of your comments on leveling/relaxation
remind me of the Unity approach to parallel programming taught by M.
Chandy and J. Misra some time ago. I don't recall if they made an
analogy to hardware in their descriptions. A primary difference between
Unity and ScalPL is that ScalPL more clearly specifies the circumstances
under which new (task) evaluations should take place, especially wrt
other task evaluations, and how data might move between such old and new
evaluations. I would consider that an advantage of ScalPL over Unity,
and over HDLs that use similar concepts.

-Dave
Back to top
Sander Vesik
Guest





Posted: Sun Sep 18, 2005 9:17 pm    Post subject: Re: Not enough parallelism in programming Reply with quote

JJ <johnjakson@yahoo.com> wrote:
Quote:

I was playing with an old Metrowerks CodeWarrior on Windows, almost all
the MFC templates give 440K+ and the plain old Win32 templates give
around 40K.

you used a total POS compiler that never did windows even approximately
write and used that to generalised...

Quote:

johnjakson at usa ..


--
Sander

+++ Out of cheese error +++
Back to top
Jan Vorbrüggen
Guest





Posted: Mon Sep 19, 2005 4:15 pm    Post subject: Re: Not enough parallelism in programming Reply with quote

Quote:
It's not the only way, but it is sufficient. (It actually just moves
the bugs to process interaction, which programmers can't do correctly
either.)

That is correct. But incorrect process interactions lead (mostly) to
deadlocks, which you will notice immediately and at the point of the
bug. No silent data structure corruptions that show their effect far
away (spatio-temporally) from their cause.

Quote:
And, it means that that creating large structures is unnecessarily
slow, leading to programs that use buggy aggregates of small structures
to simulate large ones.

I can't see how this follows. It's not true in my experience. Support from
your tools might be insufficient, yes - but in a language such as Fortran 90
et seq., you can at least express the necessary things.

Jan
Back to top
Jan Vorbrüggen
Guest





Posted: Mon Sep 19, 2005 4:15 pm    Post subject: Re: Not enough parallelism in programming Reply with quote

Quote:
But apart from the places where it doesn't work, it also loses wrt
expressing the many known parallelizable linked data structure
manipulations - such as the B-tree manipulations databases use.

From what I understand of those (which isn't all too much), things
get interesting quickly. Like when you demand a balanced tree, and
you are doing an insert or delete - in many cases, this will only
have a local impact, but such an operation has the potential to
modify every single node. How is any parallel process running on
the same data structure going to handle that?

The only practical system in wide-spread use I know of that allows
parallel and distributed updates to a parallel and distributed data
structure is the VMS lock manager. (I suspect LPARs have something
similar, and certainly databased need that as well.) Once you get
into wholesale update operations, all you do is take an exclusive
lock on the data structure, wait for it to be granted, do your thing,
and release the lock. The best you can hope for is that such an
"exceptional" operation doesn't happen so often that it has a severe
performance impact.

Hey, I've seen too many invalid assumptions in kernel mode in handling
shared data structures - even just traversing them for searching, for
example - to believe there's an easy solution to this.

Jan
Back to top
 
Post new topic   Reply to topic    CASTalk.com Forum Index -> Computer Architecture All times are GMT
Goto page Previous  1, 2, 3 ... 14, 15, 16, 17, 18  Next
Page 15 of 18

 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum




VoIP Electronics Powered by phpBB