| Author |
Message |
David Magda
Guest
|
Posted:
Wed Nov 23, 2005 1:15 am Post subject:
Re: The Emperor's new clothes |
|
|
Ken Hagan <K.Hagan@thermoteknix.co.uk> writes:
| Quote: | David Magda wrote:
Under Windows I'm sure Norton will chew up one of the cores. (I'm
only half-kidding.)
Lots of corporate customers have no choice but to run such filth, so
they will see an immediate benefit. (I'm not even half-kidding.)
|
Yes, including where I work. Didn't seem to stop the latest Sober
variants (nor did it prevent Sony's rootkit from being installed). :-/
Oh well.
--
David Magda <dmagda at ee.ryerson.ca>
Because the innovator has for enemies all those who have done well under
the old conditions, and lukewarm defenders in those who may do well
under the new. -- Niccolo Machiavelli, _The Prince_, Chapter VI |
|
Chris Thomasson
Guest
|
Posted:
Wed Nov 23, 2005 9:15 am Post subject:
Re: The Emperor's new clothes |
|
|
"Joe Seigh" <jseigh_01@xemaps.com> wrote in message
news:qdednTWXVIYW3x7eRVn-jA@comcast.com...
| Quote: | Chris Thomasson wrote:
"Joe Seigh" <jseigh_01@xemaps.com> wrote in message
The magic parallelization fairy?
Well, the fact that double-width compare-and-swap did not get "reliably"
ported to 64-bit architectures makes me think that they may be relying on a
magical fairy to come down and show application developers the light...
I should add that I have a workaround for that problem for one
situation at least. It's lock-free and doesn't involve KCSS (k-compare,
single swap) which is only obstruction-free.
|
Humm, I wonder if your solution is anything like the one I have tinkered
around with a couple of years ago. Here is some rough pseudo-code that
illustrates the basic idea:
/* 128-bits */
struct dwcas_node_t
{
    void *ptr1;
    void *ptr2;
};

struct dwcas_anchor_t
{
    int32 idx;
    int32 aba;
};

static dwcas_node_t p_nodes[WHATEVER_DEPTH];

int DWCAS( dwcas_anchor_t *dest, void *cmp, void *xchg )
{
    dwcas_anchor_t lcmp, lxchg;
    dwcas_node_t *n = node_cache_pop();
    memcpy( n, xchg, sizeof( *n ) );
    lxchg.idx = n - p_nodes;
    lcmp = *dest;
    do
    {
        if ( memcmp( &p_nodes[lcmp.idx], cmp, sizeof( *n ) ) )
        {
            node_cache_push( n );
            return 0;
        }
        /* emulate LL/SC-like behavior */
        lxchg.aba = lcmp.aba + 1;
        /* normal 64-bit cas */
    } while ( ! CAS( dest, &lcmp, &lxchg ) );
    /* cache old node */
    node_cache_push( &p_nodes[lcmp.idx] );
    return 1;
}
As you can see, I am using an "offset-as-pointer" trick and an aba count to
emulate a DWCAS. The node_cache_* functions would check a per-thread cache
first, then global, and finally allocate another slab of nodes if the caches
were empty. The crude design could also be extended to compare-and-swap more
than 2 contiguous pointers. I am wondering if your solution is far more
efficient than mine?
:)
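For reference, the idea can be sketched in C11 roughly as follows, with the load of *dest placed inside the retry loop. This is only a sketch under assumptions that are not in the post: the anchor is packed into one 64-bit word (index in the low half, aba count in the high half), NODE_COUNT is an arbitrary pool depth, dwcas_init is a hypothetical helper, and node_cache_pop/node_cache_push are reduced to a toy free list that is not thread-safe. A real version would use the per-thread and global caches described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define NODE_COUNT 16u  /* hypothetical pool depth (WHATEVER_DEPTH in the post) */

typedef struct { void *ptr1; void *ptr2; } dwcas_node_t;  /* the wide value */
typedef _Atomic uint64_t dwcas_anchor_t;  /* assumed packing: idx low 32 bits, aba high 32 */

static dwcas_node_t p_nodes[NODE_COUNT];

/* Toy stand-in for the node cache: a plain stack of free indices.
   NOT thread-safe; a real cache would be per-thread plus a global pool. */
static uint32_t free_idx[NODE_COUNT];
static uint32_t free_top;

static uint32_t node_cache_pop(void)
{
    assert(free_top > 0);  /* the sketch never grows the pool */
    return free_idx[--free_top];
}
static void node_cache_push(uint32_t idx) { free_idx[free_top++] = idx; }

static uint64_t anchor_pack(uint32_t idx, uint32_t aba) { return ((uint64_t)aba << 32) | idx; }
static uint32_t anchor_idx(uint64_t a) { return (uint32_t)(a & 0xFFFFFFFFu); }
static uint32_t anchor_aba(uint64_t a) { return (uint32_t)(a >> 32); }

/* Node 0 holds the initial value; nodes 1..NODE_COUNT-1 start out free. */
static void dwcas_init(dwcas_anchor_t *dest, const dwcas_node_t *init)
{
    free_top = 0;
    for (uint32_t i = NODE_COUNT - 1u; i >= 1u; --i)
        node_cache_push(i);
    p_nodes[0] = *init;
    atomic_init(dest, anchor_pack(0u, 0u));
}

/* Returns 1 if the two-pointer value matched cmp and was replaced by xchg. */
static int dwcas(dwcas_anchor_t *dest, const dwcas_node_t *cmp, const dwcas_node_t *xchg)
{
    uint32_t n = node_cache_pop();
    uint64_t lcmp, lxchg;

    p_nodes[n] = *xchg;
    do {
        lcmp = atomic_load(dest);  /* re-read the anchor on every pass */
        if (memcmp(&p_nodes[anchor_idx(lcmp)], cmp, sizeof *cmp) != 0) {
            node_cache_push(n);
            return 0;
        }
        /* bump the aba count so a recycled index cannot be mistaken for the old one */
        lxchg = anchor_pack(n, anchor_aba(lcmp) + 1u);
    } while (!atomic_compare_exchange_strong(dest, &lcmp, lxchg));  /* normal 64-bit CAS */

    node_cache_push(anchor_idx(lcmp));  /* cache the old node */
    return 1;
}
```

After dwcas_init with a value {a, b}, a call comparing against {a, b} should return 1 and a second call with the same, now stale, comparand should return 0. Note that the memcmp can read a node that another thread is recycling at that moment, so the comparison step should be treated as illustrative rather than as a proof of correctness.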
| Quote: | The only real beneficiary
would be sparc which doesn't have double wide compare and swap. It's not
likely that I'll get a Niagara processor to play around with however.
|
casxa did not get ported to 64-bit systems? How can that be...
DOH!
;) |
|
Chris Thomasson
Guest
|
Posted:
Wed Nov 23, 2005 3:16 pm Post subject:
Re: The Emperor's new clothes |
|
|
Yikes!!!
[...]
| Quote: | lcmp = *dest;
^^^^^^^^^^^^^^^ |
This line needs to be moved to the top of the loop body,
right here:
lcmp = *dest;
^^^^^^^^^^^^
| Quote: | if ( memcmp( &p_nodes[lcmp.idx], cmp, sizeof( *n ) ) )
{
node_cache_push( n );
return 0;
}
/* emulate LL/SC-like behavior */
lxchg.aba = lcmp.aba + 1;
/* normal 64-bit cas */
} while ( ! CAS( dest, &lcmp, &lxchg ) );
|
Sorry!
Humm, I wonder if Mr. Terekhov would be pleased because the CAS did not
return the new value on failure in the example...
;) |
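On the point about a CAS that hands back the current value on failure: C11's atomic_compare_exchange_strong and atomic_compare_exchange_weak behave that way, writing the observed value into the expected argument when the exchange fails. A small sketch (cas_observed and fetch_increment are hypothetical helper names, not from the post):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* CAS that returns the value it observed in *dest.
   The swap took place if and only if the result equals cmp. */
static uint64_t cas_observed(_Atomic uint64_t *dest, uint64_t cmp, uint64_t xchg)
{
    assert(dest);
    atomic_compare_exchange_strong(dest, &cmp, xchg);
    return cmp;  /* untouched on success, overwritten with *dest on failure */
}

/* Retry loop that needs only one explicit load. */
static uint64_t fetch_increment(_Atomic uint64_t *dest)
{
    uint64_t seen = atomic_load(dest);
    while (!atomic_compare_exchange_weak(dest, &seen, seen + 1u)) {
        /* seen now holds the current value; just try again */
    }
    return seen;
}
```

With this style of CAS the explicit reload inside a retry loop becomes optional, since a failed exchange has already refreshed the local copy.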
|
Joe Seigh
Guest
|
Posted:
Wed Nov 23, 2005 5:15 pm Post subject:
Re: The Emperor's new clothes |
|
|
Chris Thomasson wrote:
| Quote: | "Joe Seigh" <jseigh_01@xemaps.com> wrote in message
news:qdednTWXVIYW3x7eRVn-jA@comcast.com...
Chris Thomasson wrote:
Well, the fact that double-width compare-and-swap did not get "reliably"
ported to 64-bit architectures makes me think that they may be relying on a
magical fairy to come down and show application developers the light...
I should add that I have a workaround for that problem for one
situation at least. It's lock-free and doesn't involve KCSS (k-compare,
single swap) which is only obstruction-free.
Humm, I wonder if your solution is anything like the one I have tinkered
around with a couple of years ago. Here is some rough pseudo-code that
illustrates the basic idea:
[...]
As you can see, I am using an "offset-as-pointer" trick and an aba count to
emulate a DWCAS. The node_cache_* functions would check a per-thread cache
first, then global, and finally allocate another slab of nodes if the caches
were empty. The crude design could also be extended to compare-and-swap more
than 2 contiguous pointers. I am wondering if your solution is far more
efficient than mine?
|
No, it's not a double wide compare and swap solution. It's a reader/writer
solution w/ readers being lock-free. Yet another one. It doesn't
require double wide compare and swap and that's all I'm saying for now.
| Quote: |
The only real beneficiary
would be sparc which doesn't have double wide compare and swap. It's not
likely that I'll get a Niagara processor to play around with however.
casxa did not get ported to 64-bit systems? How can that be...
DOH!
;)
|
casx is only 64 bits.
I'm tempted not to publish the solution and leave Sun in the interesting position
of having lock-free solutions that only work on their Opteron based systems and
not on their sparc based systems (the offset trick aside). In theory they could use
RCU+SMR since they're probably cross licensed with IBM but NIH would probably prevent
that.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software. |
|
Chris Thomasson
Guest
|
Posted:
Wed Nov 23, 2005 5:15 pm Post subject:
Re: The Emperor's new clothes |
|
|
| Quote: | I should add that I have a workaround for that problem for one
situation at least. It's lock-free and doesn't involve KCSS (k-compare,
single swap) which is only obstruction-free.
|
You could also simply align a proxy collector data-structure anchor on a 128
or 256+ bit boundary and use the extra space as a reference count...
Differential counting algorithm would take care of the rest... |
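One way to read the alignment suggestion: if every collector object is aligned to 256 bytes, the low 8 bits of its address are always zero and can carry a small count, so pointer and count fit in one word that ordinary single-width atomics can update. The sketch below shows only that packing. The names (collector_t, anchor_pack, acquire, swap_collector) are hypothetical, the 8-bit count can overflow, and the differential counting that would reconcile the counts afterwards is not implemented.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define COUNT_BITS 8u
#define COUNT_MASK ((uintptr_t)((1u << COUNT_BITS) - 1u))  /* low 8 bits */

/* Aligning the object to 256 bytes frees the low 8 address bits. */
typedef struct { alignas(256) int payload; } collector_t;

static uintptr_t anchor_pack(collector_t *c, unsigned count)
{
    assert(((uintptr_t)c & COUNT_MASK) == 0 && count <= COUNT_MASK);
    return (uintptr_t)c | count;
}
static collector_t *ptr_of(uintptr_t w) { return (collector_t *)(w & ~COUNT_MASK); }
static unsigned count_of(uintptr_t w) { return (unsigned)(w & COUNT_MASK); }

/* Reader side: one fetch-and-add both counts the reference and yields the pointer.
   Sketch only: more than 255 outstanding references would corrupt the pointer. */
static collector_t *acquire(_Atomic uintptr_t *anchor)
{
    uintptr_t old = atomic_fetch_add(anchor, (uintptr_t)1);
    return ptr_of(old);
}

/* Writer side: install a new collector and get back the old pointer together
   with the number of references that were taken through the anchor. */
static uintptr_t swap_collector(_Atomic uintptr_t *anchor, collector_t *next)
{
    return atomic_exchange(anchor, anchor_pack(next, 0u));
}
```

The word returned by swap_collector tells the writer which collector was retired and how many readers acquired it, which is the information a differential counting scheme would then balance against the readers' releases.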
|
Ken Hagan
Guest
|
Posted:
Thu Nov 24, 2005 5:10 pm Post subject:
Re: The Emperor's new clothes |
|
|
Brian Hurt wrote:
| Quote: |
What is worrisome is the implicit assumption here that this is it.
[...] In 15 years, mid-level systems could have 512 to 1024 cores.
We're reasonably certain Moore's law will continue until at least then.
|
Agreed, but I'm only saying that we don't have a problem *right now*.
I took Joe's original post to be implying that the current 2-core
systems were already "ahead of the software". (Or if not, then next
year's 4-core systems certainly would be.) I don't think that's true.
| Quote: | The threads & locks model (or the minor variant of threads & monitors
used in Java and C#) doesn't scale.
|
Agreed, again, but they will last a few years.
| Quote: | And remember that what an extremely talented programmer can do is, by
and large, irrelevant to the debate. It's what the below-average
programmer can do that's relevant.
|
I disagree. There are a lot of server systems out there running maybe
half a dozen apps and a lot of home systems out there running another
half dozen. Get a dozen apps using threads intelligently and correctly
and the chip manufacturers will have eager customers for 32-way boxes.
| Quote: | I have some suspicions as to what that solution might look like. I'm
not sure I'm right, but one thing I am sure of: the solution will
*NOT* look like C++ or Java or C# or Ruby or Python or C or PHP or
Perl or etc.
|
I disagree again. There are various frameworks around which let dumb
programmers write single-threaded components in the above languages
which are then run (provably) safely in parallel. Large amounts of
dumb software also uses these frameworks.
In summary, I agree we have a real wall coming up but I think it is
still a few years off. |
|
Stephen Fuld
Guest
|
Posted:
Thu Nov 24, 2005 5:15 pm Post subject:
Re: The Emperor's new clothes |
|
|
"Ken Hagan" <K.Hagan@thermoteknix.co.uk> wrote in message
news:dm472c$7sb$1$8300dec7@news.demon.co.uk...
| Quote: | Brian Hurt wrote:
What is worrisome is the implicit assumption here that this is it.
[...] In 15 years, mid-level systems could have 512 to 1024 cores.
We're reasonably certain Moore's law will continue until at least then.
Agreed, but I'm only saying that we don't have a problem *right now*.
I took Joe's original post to be implying that the current 2-core
systems were already "ahead of the software". (Or if not, then next
year's 4-core systems certainly would be.) I don't think that's true.
The threads & locks model (or the minor variant of threads & monitors
used in Java and C#) doesn't scale.
Agreed, again, but they will last a few years.
And remember that what an extremely talented programmer can do is, by
and large, irrelevant to the debate. It's what the below-average
programmer can do that's relevant.
I disagree. There are a lot of server systems out there running maybe
half a dozen apps and a lot of home systems out there running another
half dozen. Get a dozen apps using threads intelligently and correctly
and the chip manufacturers will have eager customers for 32-way boxes.
I have some suspicions as to what that solution might look like. I'm
not sure I'm right, but one thing I am sure of: the solution will
*NOT* look like C++ or Java or C# or Ruby or Python or C or PHP or
Perl or etc.
I disagree again. There are various frameworks around which let dumb
programmers write single-threaded components in the above languages
which are then run (provably) safely in parallel. Large amounts of
dumb software also uses these frameworks.
In summary, I agree we have a real wall coming up but I think it is
still a few years off.
|
I'm not even sure that we have a wall that will matter. Talking about the
capabilities of future hardware without talking about what will drive its
adoption seems like putting the cart before the horse.
Servers can easily use just about as many threads/cores as anyone
can provide, so they are a target market. Similarly, high performance
scientific programming already uses lots of parallelization, so it can
take advantage of more as soon as it is available.
For desktops (and notebooks), which are the volume driver for the PC market,
I don't see any substantial requirement for lots of threads, just as there
isn't much requirement for faster single thread CPUs today. What benefit is
there to running Word faster? The biggest area where the general user
probably would want better performance in the future is in graphics (better
web experience, video editing, etc.), and that seems a discrete enough area
that it will be handled by more specialized instructions and graphics
processors. The big driver for faster CPUs is games, and they could take
advantage of multiple cores, but the people who program them are certainly
way above average.
So I don't see a "crisis" as the typical user won't benefit much from having
all those cores as things are fast enough for them already. So not being
able to run any faster won't be a problem and the lack of parallel
applications won't matter.
--
- Stephen Fuld
e-mail address disguised to prevent spam |
|
Felger Carbon
Guest
|
Posted:
Thu Nov 24, 2005 11:30 pm Post subject:
Re: The Emperor's new clothes |
|
|
"Stephen Fuld" <s.fuld@PleaseRemove.att.net> wrote in message
news:I5mhf.88118$qk4.32696@bgtnsc05-news.ops.worldnet.att.net...
| Quote: |
So I don't see a "crisis" as the typical user won't benefit much
from having
all those cores as things are fast enough for them already. So not
being
able to run any faster won't be a problem and the lack of parallel
applications won't matter.
|
Stephen, the above summarizes what Greg(?) Forrest was posting for the
past few years on comp.arch ("the Forrest Curve"). I've believed for
a long time that we were headed in that direction, and like you, I
think we've arrived.
But I see a black cloud on the horizon.
Security software, spam blockers, and popup blockers keep getting
"updates" at weekly intervals. Each update comes with tens of
megabytes of new stuff to watch out for. Is the time arriving when
we'll need 99% of our computing power to block popups and 1% to do
what we want? No smiley face. |
|
Niels Jørgen Kruse
Guest
|
Posted:
Fri Nov 25, 2005 9:15 am Post subject:
Re: The Emperor's new clothes |
|
|
Felger Carbon <fmsfnf@jfoops.net> wrote:
| Quote: | "Stephen Fuld" <s.fuld@PleaseRemove.att.net> wrote in message
news:I5mhf.88118$qk4.32696@bgtnsc05-news.ops.worldnet.att.net...
So I don't see a "crisis" as the typical user won't benefit much
from having
all those cores as things are fast enough for them already. So not
being
able to run any faster won't be a problem and the lack of parallel
applications won't matter.
Stephen, the above summarizes what Greg(?) Forrest was posting for the
past few years on comp.arch ("the Forrest Curve"). I've believed for
a long time that we were headed in that direction, and like you, I
think we've arrived.
|
If most everyday tasks have (just) dropped below the Forrest Curve, then
the end of scaling could be a blessing in disguise for CPU
manufacturers, in that it prevents single thread performance from
becoming a complete commodity.
--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark |
|
Ken Hagan
Guest
|
Posted:
Fri Nov 25, 2005 5:15 pm Post subject:
Re: The Emperor's new clothes |
|
|
Stephen Fuld wrote:
| Quote: |
I'm not even sure that we have a wall that will matter. Talking about the
capabilities of future hardware without talking about what will drive its
adoption seems like putting the cart before the horse.
|
Yes, but that doesn't mean we won't hit these problems somewhere.
Even without invoking the next "killer app / machine hog" (desktop
searching?) I would predict that battery performance will continue
to suck for the foreseeable future and so getting today's performance
out of chips that consume 10% or 1% of today's power will be a big
issue for whatever replaces today's desktops and laptops.
And what's going to process the output of those HDTV camcorders
that spew terabytes of crud onto holographic discs that could
swallow the whole of today's internet? Finding the useful data
in all that will take a lot of processing power. (Or, putting a
different spin on Felger's bleak scenario, *we* may be the ones
generating most of the stuff that we then want to filter out.) |
|
Stephen Fuld
Guest
|
Posted:
Fri Nov 25, 2005 5:15 pm Post subject:
Re: The Emperor's new clothes |
|
|
"Ken Hagan" <K.Hagan@thermoteknix.co.uk> wrote in message
news:dm6tej$jh0$1$8302bc10@news.demon.co.uk...
| Quote: | Stephen Fuld wrote:
I'm not even sure that we have a wall that will matter. Talking about
the capabilities of future hardware without talking about what will drive
its adoption seems like putting the cart before the horse.
Yes, but that doesn't mean we won't hit these problems somewhere.
|
Certainly true.
| Quote: | Even without invoking the next "killer app / machine hog" (desktop
searching?)
|
I don't think so. Desktop searching is probably either totally disk bound
or "embarrassingly parallel" such that no big advance in programming
technology or expertise would be required.
| Quote: | I would predict that battery performance will continue
to suck for the foreseeable future and so getting today's performance
out of chips that consume 10% or 1% of today's power will be a big
issue for whatever replaces today's desktops and laptops.
|
True. What are the power implications of this whole issue? I don't know
enough to have an intelligent comment here, but ISTM that adding extra
transistors for the extra cores would require more power. I guess you could
posit that we should have many quite slow (therefore low power) cores rather
than one "adequate speed" core, but I don't know about the tradeoffs here.
| Quote: | And what's going to process the output of those HDTV camcorders
that spew terabytes of crud onto holographic discs that could
swallow the whole of today's internet? Finding the useful data
in all that will take a lot of processing power. (Or, putting a
different spin on Felger's bleak scenario, *we* may be the ones
generating most of the stuff that we then want to filter out.)
|
:-)
But note that I specifically excepted graphics as that seems to be more
amenable to special instructions (e.g. SSE) or more use of the graphics
processor for streaming operations.
--
- Stephen Fuld
e-mail address disguised to prevent spam |
|
Russell Crook - Computer
Guest
|
Posted:
Sat Nov 26, 2005 12:20 am Post subject:
Re: The Emperor's new clothes |
|
|
Stephen Fuld wrote:
| Quote: | "Ken Hagan" <K.Hagan@thermoteknix.co.uk> wrote in message
news:dm6tej$jh0$1$8302bc10@news.demon.co.uk...
Stephen Fuld wrote:
I'm not even sure that we have a wall that will matter. Talking about
the capabilities of future hardware without talking about what will drive
its adoption seems like putting the cart before the horse.
Yes, but that doesn't mean we won't hit these problems somewhere.
Certainly true.
Even without invoking the next "killer app / machine hog" (desktop
searching?)
I don't think so. Desktop searching is probably either totally disk bound
|
Rotating rust. Pah. There MUST be something better (since we're talking
about future products :->)
(Even today, if you were designing for power first, you
might use flash+SRAM, cf. iPods)
| Quote: | or "embarrassingly parallel" such that no big advance in programming
technology or expertise would be required.
I would predict that battery performance will continue
to suck for the foreseeable future
|
Probably so, there's only so much you can do with chemicals
(and nuclear power has form-factor and power density limitations)
(1/2 :->)
| Quote: | and so getting today's performance
out of chips that consume 10% or 1% of today's power will be a big
issue for whatever replaces today's desktops and laptops.
|
The processor is only a part of the power issue. Displays
(esp. backlit) are significant problems. Massive memory
vs. power looks more tractable.
| Quote: |
True. What are the power implications of this whole issue? I don't know
enough to have an intelligent comment here, but ISTM that adding extra
transistors for the extra cores would require more power. I guess you could
posit that we should have many quite slow (therefore low power) cores rather
than one "adequate speed" core, but I don't know about the tradeoffs here.
|
If you start from the ground up designing for throughput, multiple
slower cores appear to be a significant throughput/watt win, with the
UltraSPARC T1 and the Raza XLR as current examples (each 8 core,
32 thread).
| Quote: |
And what's going to process the output of those HDTV camcorders
that spew terabytes of crud onto holographic discs that could
swallow the whole of today's internet? Finding the useful data
in all that will take a lot of processing power.
|
If you ever bother to look :-< I think that a lot of this
data generated will end up as "write once, read never"
as people will start routinely recording masses of data to
which they never return.
Much like some blogs :->
| Quote: | (Or, putting a
different spin on Felger's bleak scenario, *we* may be the ones
generating most of the stuff that we then want to filter out.)
:-)
But note that I specifically excepted graphics as that seems to be more
amenable to special instructions (e.g. SSE) or more use of the graphics
processor for streaming operations.
|
It would make more (CPU cycle) sense to analyze the scene
when recorded, recording a much more structured data stream
than mere rasters.
(I'm not holding my breath on this ...)
Russell
|
|
Hank Oredson
Guest
|
Posted:
Sat Nov 26, 2005 8:36 am Post subject:
Re: The Emperor's new clothes |
|
|
"Russell Crook - Computer Systems - System Engineer" <russell.crook@sun.com>
wrote in message news:438755FB.5000305@sun.com...
| Quote: | Stephen Fuld wrote:
"Ken Hagan" <K.Hagan@thermoteknix.co.uk> wrote in message
news:dm6tej$jh0$1$8302bc10@news.demon.co.uk...
Stephen Fuld wrote:
I'm not even sure that we have a wall that will matter. Talking about
the capabilities of future hardware without talking about what will
drive its adoption seems like putting the cart before the horse.
Yes, but that doesn't mean we won't hit these problems somewhere.
Certainly true.
Even without invoking the next "killer app / machine hog" (desktop
searching?)
I don't think so. Desktop searching is probably either totally disk
bound
Rotating rust. Pah. There MUST be something better (since we're talking
about future products :->)
(Even today, if you were designing for power first, you
might use flash+SRAM, cf. iPods)
or "embarrassingly parallel" such that no big advance in programming
technology or expertise would be required.
I would predict that battery performance will continue
to suck for the foreseeable future
Probably so, there's only so much you can do with chemicals
(and nuclear power has form-factor and power density limitations)
(1/2 :->)
and so getting today's performance
out of chips that consume 10% or 1% of today's power will be a big
issue for whatever replaces today's desktops and laptops.
The processor is only a part of the power issue. Displays
(esp. backlit) are significant problems. Massive memory
vs. power looks more tractable.
True. What are the power implications of this whole issue? I don't
know enough to have an intelligent comment here, but ISTM that adding
extra transistors for the extra cores would require more power. I guess
you could posit that we should have many quite slow (therefore low power)
cores rather than one "adequate speed" core, but I don't know about the
tradeoffs here.
If you start from the ground up designing for throughput, multiple
slower cores appear to be a significant throughput/watt win, with the
UltraSPARC T1 and the Raza XLR as current examples (each 8 core,
32 thread).
And what's going to process the output of those HDTV camcorders
that spew terabytes of crud onto holographic discs that could
swallow the whole of today's internet? Finding the useful data
in all that will take a lot of processing power.
If you ever bother to look :-< I think that a lot of this
data generated will end up as "write once, read never"
as people will start routinely recording masses of data to
which they never return.
|
This seems to be happening already, with still pictures.
Every picture I have ever taken with my digital SLR
is right here on this hard drive. Sometimes it is fun to
go back and look at the raw pix (for example) from that
2001 trip down the California coast. The "good pix" are
copied to another directory, thus I end up with two
copies of them on the hard drive.
Will do the same if I ever get a camcorder ...
Storage is essentially free now.
| Quote: | Much like some blogs :->
(Or, putting a
different spin on Felger's bleak scenario, *we* may be the ones
generating most of the stuff that we then want to filter out.)
:-)
But note that I specifically excepted graphics as that seems to be more
amenable to special instructions (e.g. SSE) or more use of the graphics
processor for streaming operations.
It would make more (CPU cycle) sense to analyze the scene
when recorded, recording a much more structured data stream
than mere rasters.
(I'm not holding my breath on this ...)
Russell
|
--
... Hank
http://home.earthlink.net/~horedson
http://home.earthlink.net/~w0rli |
|
Charles Richmond
Guest
|
Posted:
Sun Nov 27, 2005 9:15 am Post subject:
Re: The Emperor's new clothes |
|
|
Joe Seigh wrote:
| Quote: | So these processor manufacturers all have these
nice new multi-core cpu's but apart from market
hyperbole (these cpu's will save the environment, etc...)
I don't see them actually doing anything to exploit their
potential. By "them", I mean them not us. We of course
know what to do. But what's going on to get all the applications
to start exploiting this? The magic parallelization fairy?
|
Perhaps the magic paralization fairy... ;-) |
|
Andrew Reilly
Guest
|
Posted:
Sun Nov 27, 2005 9:15 am Post subject:
Re: The Emperor's new clothes |
|
|
On Fri, 25 Nov 2005 17:10:06 +0000, Stephen Fuld wrote:
| Quote: | I don't think so. Desktop searching is probably either totally disk bound
or "embarrassingly parallel" such that no big advance in programming
technology or expertise would be required.
|
Why would you expect that the bulk of applications with non-trivial
completion times (i.e., that make you wait, and consequently desire better
throughput) will all turn out to be such: I/O bound or embarrassingly
parallel (or at least fairly trivially parallel, if not actually
embarrassing)?
--
Andrew |
|