Value of scalar architecture?

Posted: Thu Oct 27, 2005 9:35 pm    Post subject: Value of scalar architecture?

For a performance-oriented ISA, how important is the ease of
developing a scalar implementation?

Obviously, this provides a broader market range for the ISA,
allowing certain fixed costs (ISA design, software tools
development, e.g.) to be covered over a broader range of
products. OTOH, some small fraction of performance might be
sacrificed in the primary target market (superscalar
implementations).

Interestingly, new interest has been generated in scalar RISC for
servers with the Piranha concept and the Niagara product. There
is also the minor benefit of being able to use the same ISA for
service processors as well as the workhorse processors.

While there are obvious pressures against extremely wide issue
implementations, there also seems to be some pressure away from
scalar implementations even for moderate complexity
implementations with constraining power budgets.

Even with broad hardware-implemented functionality (e.g., FP and
DSP-like functionality) costing area and some power, a scalar
implementation, I assume, still provides the lowest energy/task.
However, a superscalar design might be the winner if energy-delay
is the figure of merit.
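
The difference between the two metrics can be illustrated with a small sketch. The numbers below are hypothetical, chosen only to show how a design that loses on energy per task can still win on energy-delay; they are not measurements of any real core, and the `Design` class is just an illustrative helper.

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    energy_per_task: float  # relative units (hypothetical)
    delay_per_task: float   # relative units (hypothetical)

    def energy_delay(self) -> float:
        # Energy-delay product: lower is better.
        return self.energy_per_task * self.delay_per_task


# Assumed values: the superscalar core spends 30% more energy per task
# but finishes the task in 60% of the time.
scalar = Design("scalar", energy_per_task=1.0, delay_per_task=1.0)
superscalar = Design("superscalar", energy_per_task=1.3, delay_per_task=0.6)
designs = [scalar, superscalar]

lowest_energy = min(designs, key=lambda d: d.energy_per_task).name
lowest_edp = min(designs, key=Design.energy_delay).name

print("lowest energy/task:", lowest_energy)
print("lowest energy-delay:", lowest_edp)
```

With these assumed inputs the scalar design has the lowest energy per task while the superscalar design has the lower energy-delay product (0.78 against 1.0). Different inputs can reverse either result.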

Anyway, what seemed a straightforward choice--a superscalar ISA
for a performance-oriented market--now seems a little less
obvious. (It seemed clear that greater scalability than that
offered by a traditional VLIW ISA was desirable, but it is a bit
surprising that a scalar implementation might be desirable.)


Paul A. Clayton
a 'Dysthymicdolt' reachable at aol.com

Posted: Tue Nov 01, 2005 5:15 pm    Post subject: Re: Value of scalar architecture?

Dysthymicdolt@aol.com wrote:
Quote:
For a performance-oriented ISA, how important is the ease of
developing a scalar implementation?

This is actually a much better question than you imagine...

Quote:
While there are obvious pressures against extremely wide issue
implementations,

Such as the loss in frequency outweighing the gains in IPC, not to
mention power going through the roof.

Quote:
there also seems to be some pressure away from
scalar implementations even for moderate complexity
implementations with constraining power budgets.

Because, in many markets, there is a minimum performance level
below which nobody will buy the parts...

Quote:
Even with broad hardware-implemented functionality (e.g., FP and
DSP-like functionality) costing area and some power, a scalar
implementation, I assume, still provides the lowest energy/task.
However, a superscalar design might be the winner if energy-delay
is the figure of merit.

I made the argument, around 9 months ago, that one could build
just under 1/2 the performance of a great big core like PIII, PIV, or
Opteron with a core around the size of the I$ of Opteron; that is 1/12
to 1/16 the die area gives ~1/2 the performance (and power). Or, in
reverse, we pay 12X to 16X in core area to get the last factor of 2 in
performance; Power follows a similar trend.
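
The arithmetic behind that claim can be written out as a rough sketch. It takes the figures in the paragraph above at face value (half the performance in 1/12 to 1/16 of the area) and simply computes performance per unit of core area relative to the big core; the function name is illustrative.

```python
from fractions import Fraction


def perf_per_area(rel_perf: Fraction, rel_area: Fraction) -> Fraction:
    # Performance delivered per unit of core area, relative to the big core.
    return rel_perf / rel_area


big_core = perf_per_area(Fraction(1), Fraction(1))
small_core_12 = perf_per_area(Fraction(1, 2), Fraction(1, 12))
small_core_16 = perf_per_area(Fraction(1, 2), Fraction(1, 16))

print(big_core, small_core_12, small_core_16)
```

On those figures the small core delivers 6 to 8 times the performance per unit area of the big core, which is the same trade read in the other direction.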

The problem in the PC market is that absolutely nobody is willing to
buy less than 60% of the performance of the top gun (processor). And
the problems in the multiprocessor market are still looking for a
solution to efficient synchronization {like bigO(ln(n)) rather than
bigO(n**3)}.

Mitch

Ken Hagan
Posted: Tue Nov 01, 2005 5:15 pm    Post subject: Re: Value of scalar architecture?

MitchAlsup@aol.com wrote:
Quote:

I made the argument, around 9 months ago, that one could build
just under 1/2 the performance of a great big core like PIII, PIV, or
Opteron with a core around the size of the I$ of Opteron; that is 1/12
to 1/16 the die area gives ~1/2 the performance (and power). Or, in
reverse, we pay 12X to 16X in core area to get the last factor of 2 in
performance; Power follows a similar trend.

So could one build an 8-way multicore thingummy where each core had
about half the "normal" performance? Wouldn't the parallel performance
of those cores have to scale rather badly for the resulting chip not
to blow its "normal" rivals out of the water?
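
One way to put a number on "scale rather badly" is Amdahl's law. The sketch below assumes the parallel part of the workload scales perfectly across cores and ignores synchronization cost entirely, which is exactly the cost flagged as unsolved earlier in the thread, so it is an optimistic bound rather than a prediction. The function names are illustrative.

```python
def relative_throughput(n_cores: int, core_speed: float, parallel_fraction: float) -> float:
    """Speed relative to one big core (speed 1.0) under Amdahl's law."""
    serial_fraction = 1.0 - parallel_fraction
    time = (serial_fraction + parallel_fraction / n_cores) / core_speed
    return 1.0 / time


def break_even_fraction(n_cores: int, core_speed: float) -> float:
    """Parallel fraction p at which the multicore chip matches the big core.

    Solves (1 - p) + p / n_cores = core_speed for p.
    """
    return (1.0 - core_speed) / (1.0 - 1.0 / n_cores)


p_star = break_even_fraction(8, 0.5)
print(f"break-even parallel fraction: {p_star:.3f}")
print(f"fully parallel workload:      {relative_throughput(8, 0.5, 1.0):.1f}x")
print(f"fully serial workload:        {relative_throughput(8, 0.5, 0.0):.1f}x")
```

Under these assumptions, eight half-speed cores match the big core once 4/7 (about 57%) of the work is parallel, reach 4x on a fully parallel workload, and drop to 0.5x on a fully serial one.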

I'm assuming that (Windows) software can use 8 CPUs, but that's not an
absurd suggestion. According to Windows's Performance Monitor, when
I'm actually doing something (compiling, searching, browsing, whatever)
my system has half a dozen threads in the "ready to run" state. I'd
guess that the thread pool managers used by several system components
would raise that number further if I actually had more than one CPU
to offer them.

Quote:
The problem in the PC market is that absolutely nobody is willing to
buy less than 60% of the performance of the top gun (processor).

Well that's partly because by the time you've reduced your CPU to that
performance level, it is no longer a contributor to the overall system
cost, so there are few benefits to any further reductions.

Sander Vesik
Posted: Tue Nov 01, 2005 7:10 pm    Post subject: Re: Value of scalar architecture?

Ken Hagan <K.Hagan@thermoteknix.co.uk> wrote:
Quote:
MitchAlsup@aol.com wrote:

I made the argument, around 9 months ago, that one could build
just under 1/2 the performance of a great big core like PIII, PIV, or
Opteron with a core around the size of the I$ of Opteron; that is 1/12
to 1/16 the die area gives ~1/2 the performance (and power). Or, in
reverse, we pay 12X to 16X in core area to get the last factor of 2 in
performance; Power follows a similar trend.

So could one build an 8-way multicore thingummy where each core had
about half the "normal" performance? Wouldn't the parallel performance
of those cores have to scale rather badly for the resulting chip not
to blow its "normal" rivals out of the water?

Sort of yes - see for example Niagara. (No, it's not x86.)


--
Sander

+++ Out of cheese error +++

Andrew Reilly
Posted: Wed Nov 02, 2005 1:16 am    Post subject: Re: Value of scalar architecture?

On Tue, 01 Nov 2005 08:53:46 -0800, MitchAlsup wrote:
Quote:
I made the argument, around 9 months ago, that one could build
just under 1/2 the performance of a great big core like PIII, PIV, or
Opteron with a core around the size of the I$ of Opteron; that is 1/12
to 1/16 the die area gives ~1/2 the performance (and power). Or, in
reverse, we pay 12X to 16X in core area to get the last factor of 2 in
performance; Power follows a similar trend.

The problem in the PC market is that absolutely nobody is willing to
buy less than 60% of the performance of the top gun (processor). And
the problems in the multiprocessor market are still looking for a
solution to efficient synchronization {like bigO(ln(n)) rather than
bigO(n**3)}.

Maybe not nobody. Isn't that a pretty good description of VIA's modus
operandi and market? Maybe they're not taking the world by storm, but I
believe that they're selling a reasonable number of small systems.

--
Andrew

Stephan Schulz
Posted: Tue Dec 20, 2005 12:22 pm    Post subject: Re: Value of scalar architecture?

In article <dk86oi$fav$1$8300dec7@news.demon.co.uk>, Ken Hagan wrote:
Quote:
MitchAlsup@aol.com wrote:
[...]
So could one build an 8-way multicore thingummy where each core had
about half the "normal" performance? Wouldn't the parallel performance
of those cores have to scale rather badly for the resulting chip not
to blow its "normal" rivals out of the water?

I'm assuming that (Windows) software can use 8 CPUs, but that's not an
absurd suggestion. According to Windows's Performance Monitor, when
I'm actually doing something (compiling, searching, browsing, whatever)
my system has half a dozen threads in the "ready to run" state.

That's misleading. Yes, while your CPU is busy on one heavy-duty job,
none of the other threads can run, so they pile up. But very likely
none of them would need more than a few cycles. Check the CPU
utilization when you do not run anything specifically. That is all the
work that could be spread to other processors in your scenario - for
Linux, that number is typically around 0.2% to 2%. Even on my Mac, it's
usually only around 10% (and I always wonder what exactly the computer
is doing...).
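
As a back-of-the-envelope sketch of what those percentages imply: if a fraction u of one CPU goes to background work, moving all of it to other cores frees at most that fraction for the heavy job, so the foreground speedup is bounded by 1 / (1 - u). This assumes the background load while busy is the same as the idle-time load, which is the premise of the paragraph above and not something measured here.

```python
def max_speedup_from_offloading(background_utilization: float) -> float:
    """Upper bound on foreground speedup if all background work moves to other cores."""
    return 1.0 / (1.0 - background_utilization)


for u in (0.002, 0.02, 0.10):
    bound = max_speedup_from_offloading(u)
    print(f"{u:.1%} background load -> at most {bound:.3f}x faster foreground job")
```

Even the 10% case gives a bound of only about 1.11x, so on this model extra cores help mainly when the heavy job itself can be split.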

Bye,

Stephan

--
-------------------------- It can be done! ---------------------------------
Please email me as schulz@eprover.org (Stephan Schulz)
----------------------------------------------------------------------------

Stephen Fuld
Posted: Tue Dec 20, 2005 5:15 pm    Post subject: Re: Value of scalar architecture?

"Stephan Schulz" <schulz@sunbroy2.informatik.tu-muenchen.de> wrote in
message news:slrndqftqe.kcv.schulz@sunbroy2.informatik.tu-muenchen.de...
Quote:
In article <dk86oi$fav$1$8300dec7@news.demon.co.uk>, Ken Hagan wrote:
MitchAlsup@aol.com wrote:
[...]
So could one build an 8-way multicore thingummy where each core had
about half the "normal" performance? Wouldn't the parallel performance
of those cores have to scale rather badly for the resulting chip not
to blow its "normal" rivals out of the water?

I'm assuming that (Windows) software can use 8 CPUs, but that's not an
absurd suggestion. According to Windows's Performance Monitor, when
I'm actually doing something (compiling, searching, browsing, whatever)
my system has half a dozen threads in the "ready to run" state.

That's misleading. Yes, while your CPU is busy on one heavy-duty job,
none of the other threads can run, so they pile up. But very likely
none of them would need more than a few cycles.

So the Windows scheduler is running a "heavy-duty" thread when there are
multiple "light-duty" (i.e. needing only "a few cycles") threads ready to
run? I certainly don't claim to understand how the Windows CPU scheduler
works, but it seems to me that it shouldn't do that unless that heavy duty
thread is so much higher priority that the others are essentially not
required or are essentially idle cycle "soaker-upers". What am I missing?

--
- Stephen Fuld
e-mail address disguised to prevent spam

Stephan Schulz
Posted: Wed Dec 21, 2005 12:27 am    Post subject: Re: Value of scalar architecture?

In article
<jXWpf.174776$qk4.159464@bgtnsc05-news.ops.worldnet.att.net>, Stephen
Fuld wrote:
Quote:

"Stephan Schulz" <schulz@sunbroy2.informatik.tu-muenchen.de> wrote in
message news:slrndqftqe.kcv.schulz@sunbroy2.informatik.tu-muenchen.de...
In article <dk86oi$fav$1$8300dec7@news.demon.co.uk>, Ken Hagan wrote:
MitchAlsup@aol.com wrote:
[...]
So could one build an 8-way multicore thingummy where each core had
about half the "normal" performance? Wouldn't the parallel performance
of those cores have to scale rather badly for the resulting chip not
to blow its "normal" rivals out of the water?

I'm assuming that (Windows) software can use 8 CPUs, but that's not an
absurd suggestion. According to Windows's Performance Monitor, when
I'm actually doing something (compiling, searching, browsing, whatever)
my system has half a dozen threads in the "ready to run" state.

That's misleading. Yes, while your CPU is busy on one heavy-duty job,
none of the other threads can run, so they pile up. But very likely
none of them would need more than a few cycles.

So the Windows scheduler is running a "heavy-duty" thread when there are
multiple "light-duty" (i.e. needing only "a few cycles") threads ready to
run? I certainly don't claim to understand how the Windows CPU scheduler
works, but it seems to me that it shouldn't do that unless that heavy duty
thread is so much higher priority that the others are essentially not
required or are essentially idle cycle "soaker-upers". What am I missing?

The scheduler does not know how long which thread will run. It can
only look at the past history. Context switching is
expensive. Switching processes is even more expensive, as it usually
flushes the cache or large parts of it. To optimize throughput, a
process should only be forced to yield the CPU when it either has
finished or is waiting for something anyways. That obviously would not
lead to acceptable interactive behaviour, so the scheduler will
preempt running processes. But there still is a tendency to run
processes which have run for a long time. That is, processes that have
been soaking up CPU cycles are indeed rewarded with a higher
priority. The scheduler ensures that the other processes are served
"early enough", but not earlier.
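
The throughput-versus-responsiveness tension described above can be sketched with a simple round-robin model. The switch cost and quantum values are hypothetical, and the model ignores cache refill after a switch, which the paragraph above suggests is often the larger cost.

```python
def useful_fraction(quantum_us: float, switch_cost_us: float) -> float:
    """Fraction of CPU time spent on useful work rather than switching."""
    return quantum_us / (quantum_us + switch_cost_us)


def worst_case_wait_us(n_ready: int, quantum_us: float, switch_cost_us: float) -> float:
    """Round robin: a ready thread waits for the other n_ready - 1 threads."""
    return (n_ready - 1) * (quantum_us + switch_cost_us)


SWITCH_COST_US = 10.0  # hypothetical cost of one context switch
N_READY = 6            # "half a dozen" ready threads, as mentioned earlier in the thread

for quantum_us in (100.0, 1_000.0, 100_000.0):
    eff = useful_fraction(quantum_us, SWITCH_COST_US)
    wait = worst_case_wait_us(N_READY, quantum_us, SWITCH_COST_US)
    print(f"quantum {quantum_us:>9.0f} us: {eff:.2%} useful, worst wait {wait:>9.0f} us")
```

Longer quanta push the useful fraction toward 100% but stretch the worst-case wait in proportion, which is why a scheduler tuned purely for throughput feels sluggish interactively.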

Bye,

Stephan

--
-------------------------- It can be done! ---------------------------------
Please email me as schulz@eprover.org (Stephan Schulz)
----------------------------------------------------------------------------

Jan Vorbrüggen
Posted: Wed Dec 21, 2005 9:15 am    Post subject: Re: Value of scalar architecture?

Quote:
The scheduler ensures that the other processes are served
"early enough", but not earlier.

That's how it should be, but the Windows scheduler doesn't get it right.
With a single compute-bound or even disk-bound job, interactive response
(even if no paging is required) goes down the tubes totally. There are
existence proofs that it _can_ be done right - a 68040, having only a
small fraction of today's processor power, running NeXTSTEP is an example.

Jan

Stephen Fuld
Posted: Wed Dec 21, 2005 9:15 am    Post subject: Re: Value of scalar architecture?

"Stephan Schulz" <schulz@sunbroy2.informatik.tu-muenchen.de> wrote in
message news:slrndqh8aa.n12.schulz@sunbroy2.informatik.tu-muenchen.de...
Quote:
In article
<jXWpf.174776$qk4.159464@bgtnsc05-news.ops.worldnet.att.net>, Stephen
Fuld wrote:

"Stephan Schulz" <schulz@sunbroy2.informatik.tu-muenchen.de> wrote in
message news:slrndqftqe.kcv.schulz@sunbroy2.informatik.tu-muenchen.de...
In article <dk86oi$fav$1$8300dec7@news.demon.co.uk>, Ken Hagan wrote:
MitchAlsup@aol.com wrote:
[...]
So could one build an 8-way multicore thingummy where each core had
about half the "normal" performance? Wouldn't the parallel performance
of those cores have to scale rather badly for the resulting chip not
to blow its "normal" rivals out of the water?

I'm assuming that (Windows) software can use 8 CPUs, but that's not an
absurd suggestion. According to Windows's Performance Monitor, when
I'm actually doing something (compiling, searching, browsing, whatever)
my system has half a dozen threads in the "ready to run" state.

That's misleading. Yes, while your CPU is busy on one heavy-duty job,
none of the other threads can run, so they pile up. But very likely
none of them would need more than a few cycles.

So the Windows scheduler is running a "heavy-duty" thread when there are
multiple "light-duty" (i.e. needing only "a few cycles") threads ready to
run? I certainly don't claim to understand how the Windows CPU scheduler
works, but it seems to me that it shouldn't do that unless that heavy duty
thread is so much higher priority that the others are essentially not
required or are essentially idle cycle "soaker-upers". What am I missing?

The scheduler does not know how long which thread will run. It can
only look at the past history. Context switching is
expensive. Switching processes is even more expensive, as it usually
flushes the cache or large parts of it. To optimize throughput, a
process should only be forced to yield the CPU when it either has
finished or is waiting for something anyways. That obviously would not
lead to acceptable interactive behaviour, so the scheduler will
preempt running processes. But there still is a tendency to run
processes which have run for a long time. That is, processes that have
been soaking up CPU cycles are indeed rewarded with a higher
priority. The scheduler ensures that the other processes are served
"early enough", but not earlier.

But it appears from my experience, and the complaints of others, that it
doesn't do this. Of course, it can't do it for exactly the reason you
stated - it can't know the future. So it can't know when "early enough" is,
nor how long before that time it has to schedule a process in order for it to be
done by that time.

In the time since your previous post, I did a little research and saw
something that indicated that NT (I am guessing that more recent NT based
systems do the same) does something similar to what I would have expected
and am used to. Specifically, frequent short quanta of CPU time to
non-compute bound tasks and less frequent, though longer quanta for CPU
bound tasks. It does this by reducing the priority and increasing the
quantum time of tasks that exceed their quantum without giving up the CPU.
Does it not do this anymore?
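
The policy described in the paragraph above resembles what textbooks call a multilevel feedback queue. The following is a minimal sketch of that general technique, not of the actual NT scheduler: the three levels, the quantum values, and the simplification that a task which yields becomes ready again immediately are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical quantum per priority level; level 0 is the highest priority.
QUANTA = [1, 2, 4]


def simulate(tasks):
    """Run tasks given as {name: [cpu_burst, ...]} with positive bursts.

    Returns ({name: finish_time}, {name: final_level}).
    """
    queues = [deque() for _ in QUANTA]
    state = {}
    for name, bursts in tasks.items():
        state[name] = {"bursts": list(bursts), "level": 0}
        queues[0].append(name)

    clock, finish = 0, {}
    while any(queues):
        # Always serve the highest-priority non-empty queue.
        level = next(i for i, q in enumerate(queues) if q)
        name = queues[level].popleft()
        task = state[name]
        run = min(QUANTA[level], task["bursts"][0])
        clock += run
        task["bursts"][0] -= run
        if task["bursts"][0] == 0:
            # Gave up the CPU within its quantum: keep the current level.
            task["bursts"].pop(0)
        else:
            # Used the whole quantum: lower priority, longer quantum next time.
            task["level"] = min(level + 1, len(QUANTA) - 1)
        if task["bursts"]:
            queues[task["level"]].append(name)
        else:
            finish[name] = clock
    return finish, {name: s["level"] for name, s in state.items()}


finish, levels = simulate({"interactive": [1, 1, 1], "batch": [8]})
print(finish)
print(levels)
```

In this toy run the interactive task stays at the top level and finishes at time 4, while the CPU-bound task is demoted twice and finishes at time 11 using progressively longer quanta. A real scheduler would also need priority boosts or aging so that demoted tasks are not starved.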

--
- Stephen Fuld
e-mail address disguised to prevent spam

All times are GMT