For a performance-oriented ISA, how important is the ease of
developing a scalar implementation?
While there are obvious pressures against extremely wide issue
implementations,
there also seems to be some pressure away from
scalar implementations even for moderate complexity
implementations with constraining power budgets.
Even with broad hardware-implemented functionality (e.g., FP and
DSP-like functionality) costing area and some power, a scalar
implementation, I assume, still provides the lowest energy/task.
However, a superscalar design might be the winner if energy-delay
is the figure of merit.
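To make the difference between the two metrics concrete, here is a small sketch in Python. The numbers are invented purely for illustration and are not measurements of any real core: a hypothetical scalar core that finishes a task in 2.0 s at 1.0 W, and a hypothetical superscalar core that finishes the same task in 1.25 s at 1.8 W.

```python
from dataclasses import dataclass


@dataclass
class Design:
    """A hypothetical core, described by average power and time per task."""
    name: str
    power_w: float
    time_s: float

    @property
    def energy_j(self) -> float:
        # Energy per task = average power * time per task.
        return self.power_w * self.time_s

    @property
    def energy_delay(self) -> float:
        # Energy-delay product = energy per task * time per task.
        return self.energy_j * self.time_s


# Illustrative, made-up numbers (assumptions, not data).
scalar = Design("scalar", power_w=1.0, time_s=2.0)
superscalar = Design("superscalar", power_w=1.8, time_s=1.25)


def best(designs, metric):
    """Return the name of the design that minimizes the given metric."""
    return min(designs, key=metric).name


designs = [scalar, superscalar]
print("lowest energy/task:", best(designs, lambda d: d.energy_j))
print("lowest energy-delay:", best(designs, lambda d: d.energy_delay))
```

With these assumed numbers the scalar core wins on energy per task while the superscalar core wins on energy-delay; different numbers could of course flip either result.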
I made the argument, around 9 months ago, that one could build
just under 1/2 the performance of a great big core like PIII, PIV, or
Opteron with a core around the size of the I$ of Opteron; that is, 1/12
to 1/16 the die area gives ~1/2 the performance (and power). Or, in
reverse, we pay 12X to 16X in core area to get the last factor of 2 in
performance; power follows a similar trend.
The problem in the PC market is that absolutely nobody is willing to
buy less than 60% of the performance of the top gun (processor).
The multiprocessor market, meanwhile, is still looking for a
solution to efficient synchronization {like bigO(ln(n)) rather than
bigO(n**3)}.
So could one build an 8-way multicore thingummy where each core had
about half the "normal" performance? Wouldn't the parallel performance
of those cores have to scale rather badly for the resulting chip not
to blow its "normal" rivals out of the water?
I'm assuming that (Windows) software can use 8 CPUs, but that's not an
absurd suggestion. According to Windows's Performance Monitor, when
I'm actually doing something (compiling, searching, browsing, whatever)
my system has half a dozen threads in the "ready to run" state.
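As a back-of-the-envelope check on the "how badly would it have to scale" question, here is a minimal sketch using Amdahl's law. It assumes exactly 8 cores, each at exactly half the speed of the big core, and it ignores synchronization and memory-system costs entirely (which is optimistic). The function names and defaults are made up for this illustration.

```python
def relative_speed(parallel_fraction, cores=8, core_speed=0.5):
    """Speed of the multicore chip relative to one big core (big core = 1.0).

    Plain Amdahl's law: the serial part runs on one small core, the
    parallel part is spread evenly over all small cores.
    """
    serial_fraction = 1.0 - parallel_fraction
    time = (serial_fraction + parallel_fraction / cores) / core_speed
    return 1.0 / time


def break_even_fraction(cores=8, core_speed=0.5):
    """Parallel fraction at which the multicore chip just matches the big core.

    Solves (1 - p) + p / cores = core_speed for p.
    """
    return (1.0 - core_speed) / (1.0 - 1.0 / cores)


for p in (0.0, 0.5, 0.9, 1.0):
    print(f"parallel fraction {p:.1f}: {relative_speed(p):.2f}x the big core")

print(f"break-even parallel fraction: {break_even_fraction():.3f}")
```

Under these assumptions the 8-way chip only pulls ahead once a bit more than 4/7 of the work runs in parallel, and it tops out at 4x for perfectly parallel work. Any real synchronization overhead pushes the break-even point higher.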
That's misleading. Yes, while your CPU is busy on one heavy-duty job,
none of the other threads can run, so they pile up. But very likely
none of them would need more than a few cycles.
So the Windows scheduler is running a "heavy-duty" thread when there are
multiple "light-duty" (i.e. needing only "a few cycles") threads ready to
run? I certainly don't claim to understand how the Windows CPU scheduler
works, but it seems to me that it shouldn't do that unless that heavy-duty
thread is so much higher priority that the others are essentially not
required or are essentially idle cycle "soaker-uppers". What am I missing?
The scheduler does not know how long each thread will run. It can
only look at the past history. Context switching is
expensive. Switching processes is even more expensive, as it usually
flushes the cache or large parts of it. To optimize throughput, a
process should only be forced to yield the CPU when it has either
finished or is waiting for something anyway. That obviously would not
lead to acceptable interactive behaviour, so the scheduler will
preempt running processes. But there is still a tendency to keep
running processes that have already run for a long time. That is,
processes that have been soaking up CPU cycles are indeed rewarded
with a higher priority. The scheduler ensures that the other processes
are served "early enough", but not earlier.
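The throughput versus responsiveness trade-off can be sketched with a toy round-robin model. This is not how the Windows scheduler actually works; it simply assumes a fixed time slice and a fixed per-switch cost (which is meant to include cache refill), and both the function names and the numbers are hypothetical.

```python
def useful_fraction(slice_us, switch_cost_us):
    """Fraction of CPU time spent on real work rather than switching."""
    return slice_us / (slice_us + switch_cost_us)


def worst_case_wait_us(ready_threads, slice_us, switch_cost_us):
    """Longest a ready thread waits for its turn under plain round-robin.

    Every other ready thread gets one full slice, plus one switch each.
    """
    return (ready_threads - 1) * (slice_us + switch_cost_us)


# Illustrative numbers: 6 ready threads, 10 us per context switch.
for slice_us in (100, 1_000, 10_000):
    work = useful_fraction(slice_us, switch_cost_us=10)
    wait = worst_case_wait_us(6, slice_us, switch_cost_us=10)
    print(f"slice {slice_us:>6} us: {work:.1%} useful, worst wait {wait} us")
```

Longer slices waste less time on switching but make the waiting threads wait longer, which is the tension behind serving them "early enough", but not earlier.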