David Kanter
Guest
|
Posted:
Mon Oct 17, 2005 1:40 pm Post subject:
Re: hyperthreading in database-benchmarks |
|
|
Jens Meyer wrote:
Well, it was canned. So unless API decides to try and clone it...
| Quote: | I'd rather like to see cores like that of Sun's Niagara.
My dream-CPU would look like the following:
- four or even six in-order pipes per core
- no speculative execution
- all pipes competing for four execution-units;
one ALU, one load/store-unit, one fp-adder and one fp-multiplier
|
Can you explain the point of having 4 FUs, with 6 pipelines? That
doesn't make any sense at all. Nobody would design a core that
crazy...
BTW, wouldn't you be better off with a FP-MAC unit instead of a FPA and
FPM unit?
| Quote: | - of course: fully pipelined execution-units, i.e. every execution unit
can take one request per clock-cycle from any pipe, but a pipe always
stalls until its request has finished
- i- & d-caches with sizes on the order of Niagara's l1-caches
- large shared l2-cache (e.g. 2MB for a desktop-CPU and eight MB for a
server-CPU)
- support for execute-ahead (aka scouting) on unused threads on a core
I think that such a simple architecture would make it much easier to
get high clock-rates as the pipes are two orders of magnitude simpler than those
of full-blown brainiac cores.
|
Judging by Niagara's clock rates, I wouldn't hold your breath...
David |
Jens Meyer
Guest
|
Posted:
Mon Oct 17, 2005 3:45 pm Post subject:
Re: hyperthreading in database-benchmarks |
|
|
| Quote: | Can you explain the point of having 4 FUs, with 6 pipelines? That
doesn't make any sense at all. Nobody would design a core that
crazy...
|
The pipelines will be longer than Niagara's if you want to have high
clock-rates, so stalls for taken branches will take longer. Furthermore,
many loads will stall for more than one clock-cycle because at high clock-rates
you won't get a one-cycle l1-cache - not to mention the l2-cache. With both
factors in mind, it is very reasonable to have more pipes than execution-units.
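A rough way to see this argument is a back-of-the-envelope bound. The sketch below assumes the design described earlier in the thread (a pipe stalls until its request has finished, and each fully pipelined unit accepts one request per cycle). The function name and the 3-cycle average latency are hypothetical, chosen only for illustration, and the model ignores instruction mix, so real utilization would be lower.

```python
def peak_issue_rate(n_pipes, avg_latency_cycles, n_units):
    """Upper bound on instructions issued per cycle for the whole core.

    Assumption: every pipe issues one request, then stalls until it
    completes, so one pipe contributes at most 1/avg_latency_cycles
    requests per cycle.
    """
    if avg_latency_cycles < 1:
        raise ValueError("latency must be at least one cycle")
    # Total demand the pipes can generate per cycle.
    demand = n_pipes / avg_latency_cycles
    # Each fully pipelined unit accepts at most one request per cycle.
    return min(demand, float(n_units))


# Hypothetical 3-cycle average latency (branches, loads) and 4 units.
for pipes in (4, 6):
    print(pipes, "pipes ->", peak_issue_rate(pipes, 3.0, 4), "issues/cycle")
```

Under these assumed numbers even six pipes cannot keep four units busy, which is the sense in which having more pipes than units can be reasonable.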
| Quote: | BTW, wouldn't you be better off with a FP-MAC unit instead of a
FPA and FPM unit?
|
No, that's less flexible.
| Quote: | I think that such a simple architecture would make it much easier to
get high clock-rates as the pipes are two orders of magnitude simpler than those
of full-blown brainiac cores.
Judging by Niagara's clock rates, I wouldn't hold your breath...
|
Niagara has a pipe with a depth of only five. I think that's the main reason for
the limited clock-rate. But as server-apps usually wait for l1- and l2-caches and
these delays don't shrink in absolute time as the clock-rate grows,
that's not really a constraint. I think an x86-core with the mentioned features of
the simple-pipes will have about the same depth as an Athlon-pipe but will reach
a higher clock-rate.
The only problem I see with such a CPU is that a lot of software isn't
parallelized yet - although writing multithreaded apps is quite easy. |
Joe Seigh
Guest
|
Posted:
Mon Oct 17, 2005 4:15 pm Post subject:
Re: hyperthreading in database-benchmarks |
|
|
Jens Meyer wrote:
| Quote: |
Judging by Niagara's clock rates, I wouldn't hold your breath...
Niagara has a pipe with a depth of only five. I think that's the main reason for
the limited clock-rate. But as server-apps usually wait for l1- and l2-caches and
these delays don't shrink in absolute time as the clock-rate grows,
that's not really a constraint. I think an x86-core with the mentioned features of
the simple-pipes will have about the same depth as an Athlon-pipe but will reach
a higher clock-rate.
The only problem I see with such a CPU is that a lot of software isn't
parallelized yet - although writing multithreaded apps is quite easy.
|
This isn't a problem for software. It's a problem for hardware. It's a
classic chicken and egg problem. The multi-core hardware has to be out
there and have sufficient market saturation in the various niches before
the software in those niches begins to exploit it. The hardware vendors
waited too long before pushing multi-core so they're in a pinch.
I play with lock-free algorithms and have considered writing a lock-free
database to exploit multi-core hw except those things aren't in my price
range yet. I'm in a really low niche unfortunately.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software. |
Andy Glew
Guest
|
Posted:
Mon Oct 17, 2005 9:36 pm Post subject:
Re: hyperthreading in database-benchmarks |
|
|
| Quote: | The real question is whether you
are better off with CMP than a wide SMT...hard to say
|
Pinned to my cubicle wall is a slide labelled "AMD 2005 Analysts
Day", (i.e. public material), supposedly presented by Chuck Moore,
labelled "Multi-threading done right":
It is the classic 4-quadrant diagram:
X-axis: "Compute Density and Efficiency"
Y-axis: "Consistent Performance"
with the quadrants labelled:

                          Compute Density and Efficiency
Consistent Performance    Low     High
High                      CMP     Cluster-based Multithreading
Low                       SMT     SoEMT

with "Cluster based Multithreading" called out as taking ~50% area investment
for ~80% throughput gain |
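To put the quoted "~50% area for ~80% throughput" figure in perspective, here is a small worked calculation of throughput per unit of area. Only the 50% and 80% figures come from the slide as described above. The comparison point (a second full core costing 100% more area for at most 100% more throughput) is an assumption added for illustration, and the function name is hypothetical.

```python
def throughput_per_area(throughput_gain, area_cost):
    """Relative throughput divided by relative area, baseline core = 1.0."""
    return (1.0 + throughput_gain) / (1.0 + area_cost)


# Figures quoted from the slide: ~80% more throughput for ~50% more area.
cluster = throughput_per_area(0.80, 0.50)

# Assumed comparison, not from the slide: an idealized second full core.
cmp_ideal = throughput_per_area(1.00, 1.00)

print(f"cluster-based MT: {cluster:.2f}")
print(f"idealized CMP:    {cmp_ideal:.2f}")
```

On these numbers the cluster-based approach would deliver roughly 1.2 times the throughput per area of the baseline, while perfectly scaling core duplication stays at 1.0.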
David Hopwood
Guest
|
Posted:
Mon Oct 17, 2005 9:46 pm Post subject:
Re: hyperthreading in database-benchmarks |
|
|
Andy Glew wrote:
| Quote: | The real question is whether you
are better off with CMP than a wide SMT...hard to say
Pinned to my cubicle wall is a slide labelled "AMD 2005 Analysts
Day", (i.e. public material), supposedly presented by Chuck Moore,
labelled "Multi-threading done right":
|
<http://www.amd.com/us-en/assets/content_type/DownloadableAssets/Chuck_Moore_6-10-05.pdf>
page 4.
--
David Hopwood <david.nospam.hopwood@blueyonder.co.uk> |
Jens Meyer
Guest
|
Posted:
Mon Oct 17, 2005 10:23 pm Post subject:
Re: hyperthreading in database-benchmarks |
|
|
| Quote: | The only problem I see with such a CPU is that a lot of software isn't
parallelized yet - although writing multithreaded apps is quite easy.
This isn't a problem for software. It's a problem for hardware. It's a
classic chicken and egg problem. ...
|
Right! And further, it's a problem of benchmarks that are parallelized enough.
A lot of people would be disappointed to see low numbers of single-threaded
benchmarks running on processors like that. |
Joe Seigh
Guest
|
Posted:
Mon Oct 17, 2005 11:39 pm Post subject:
Re: hyperthreading in database-benchmarks |
|
|
David Hopwood wrote:
| Quote: | Andy Glew wrote:
The real question is whether you
are better off with CMP than a wide SMT...hard to say
Pinned to my cubicle wall is a slide labelled "AMD 2005 Analysts
Day", (i.e. public material), supposedly presented by Chuck Moore,
labelled "Multi-threading done right":
http://www.amd.com/us-en/assets/content_type/DownloadableAssets/Chuck_Moore_6-10-05.pdf
page 4.
|
What's the "Throughput Architecture" under the Future header on the Architectural
Generations foil? Is this the same as Sun's throughput computing? Is throughput
a standard term?
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software. |
David Kanter
Guest
|
Posted:
Tue Oct 18, 2005 5:25 am Post subject:
Re: hyperthreading in database-benchmarks |
|
|
Jens Meyer wrote:
| Quote: | The only problem I see with such a CPU is that a lot of software isn't
parallelized yet - although writing multithreaded apps is quite easy.
This isn't a problem for software. It's a problem for hardware. It's a
classic chicken and egg problem. ...
Right! And further, it's a problem of benchmarks that are parallelized enough.
A lot of people would be disappointed to see low numbers of single-threaded
benchmarks running on processors like that.
|
Spot on. If you look at the number of benchmarks that are:
1. Easy to run (don't require > 4 disks or > 1 client)
2. Scalable to 8 CPUs
3. Relatively easily available
You're left with a short list indeed. That's one thing I'm trying to
work on right now, so any help would be appreciated!
David |