hyperthreading in database-benchmarks
CASTalk.com Forum Index CASTalk.com
Discussion of DSP, FPGA, storage and embedded system.
 
David Kanter
Guest

Posted: Mon Oct 17, 2005 1:40 pm    Post subject: Re: hyperthreading in database-benchmarks

Jens Meyer wrote:
Quote:
The EV8 had quite a few more functional units than anything shipping
today, see Paul DeMone's article:
http://www.realworldtech.com/page.cfm?ArticleID=RWT021802145442&p=2

I hope these extreme brainiac architectures will die!

Well, it was canned. So unless API decides to try and clone it...

Quote:
I'd rather see cores like those of Sun's Niagara.
My dream-CPU would look like the following:

- four or even six in-order pipes per core
- no speculative execution
- all pipes competing for four execution-units;
one ALU, one load/store-unit, one fp-adder and one fp-multiplier

Can you explain the point of having 4 FUs, with 6 pipelines? That
doesn't make any sense at all. Nobody would design a core that
crazy...

BTW, wouldn't you be better off with an FP-MAC unit instead of separate
FPA and FPM units?

Quote:
- of course: fully pipelined execution-units, i.e. every execution unit
can take one request per clock-cycle from any pipe, but a pipe always
stalls until its request has finished
- i- & d-caches with sizes on the order of Niagara's L1 caches
- large shared L2 cache (e.g. 2 MB for a desktop CPU and 8 MB for a
server CPU)
- support for execute-ahead (aka scouting) on unused threads on a core

I think that such a simple architecture would make it much easier to
get high clock rates, as the pipes are two orders of magnitude simpler than
those of full-blown brainiac cores.

Judging by Niagara's clock rates, I wouldn't hold your breath...

David

Jens Meyer
Guest

Posted: Mon Oct 17, 2005 3:45 pm    Post subject: Re: hyperthreading in database-benchmarks

Quote:
Can you explain the point of having 4 FUs, with 6 pipelines? That
doesn't make any sense at all. Nobody would design a core that
crazy...

The pipelines will be longer than Niagara's if you want high clock rates, so
stalls for taken branches will take longer. Furthermore, many loads will stall
for more than one clock cycle, because at high clock rates you won't get a
one-cycle L1 cache, not to mention the L2 cache. With both factors in mind, it
is quite reasonable to have more pipes than execution units.
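
The argument above can be illustrated with a toy model. This is only a sketch under simplifying assumptions that are not from the thread: every request has the same fixed latency, the execution units are treated as interchangeable (the proposal has four different unit types), and the function name `unit_utilization` is made up for illustration. Each pipe issues one request and then stalls until it finishes, while each fully pipelined unit accepts one new request per cycle.

```python
def unit_utilization(pipes: int, units: int, latency: int, cycles: int = 10_000) -> float:
    """Fraction of execution-unit issue slots used in a toy in-order model.

    Assumptions (not from the thread): every request takes `latency` cycles,
    all units are interchangeable, and lower-numbered pipes get priority.
    """
    ready_at = [0] * pipes          # first cycle at which each pipe may issue again
    issued = 0
    for cycle in range(cycles):
        free_slots = units          # each pipelined unit accepts one request per cycle
        for p in range(pipes):
            if free_slots == 0:
                break
            if ready_at[p] <= cycle:
                ready_at[p] = cycle + latency   # pipe stalls until its request finishes
                issued += 1
                free_slots -= 1
    return issued / (cycles * units)


if __name__ == "__main__":
    # Hypothetical latency of 3 cycles, chosen only for illustration.
    for pipes in (4, 6, 12):
        print(pipes, "pipes:", round(unit_utilization(pipes, units=4, latency=3), 2))
```

In this model the units are busy roughly pipes / (latency * units) of the time, capped at 1, so once the average latency exceeds one cycle, four pipes cannot keep four units fed and extra pipes raise utilization.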

Quote:
BTW, wouldn't you be better off with an FP-MAC unit instead of separate
FPA and FPM units?

No, that's less flexible.

Quote:
I think that such a simple architecture would make it much easier to
get high clock rates, as the pipes are two orders of magnitude simpler than
those of full-blown brainiac cores.

Judging by Niagara's clock rates, I wouldn't hold your breath...

Niagara's pipe has a depth of only five. I think that's the main reason for
the limited clock rate. But since server apps usually wait on the L1 and L2 caches,
and those delays don't get any shorter in absolute time as the clock rate rises,
that's not really a constraint. I think an x86 core with the simple pipes described
above would have about the same depth as an Athlon pipe but would reach a higher
clock rate.
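
The point about absolute cache delays can be put into numbers. The sketch below is only an illustration: the 10 ns latency, the 20 cycles of work per access, and both function names are hypothetical and not taken from the thread. It assumes the cache latency is fixed in nanoseconds, so a faster clock only shrinks the compute part of the time.

```python
def stall_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Cycles a pipe waits on a cache access whose latency is fixed in nanoseconds."""
    return latency_ns * clock_ghz


def time_per_access_ns(work_cycles: float, latency_ns: float, clock_ghz: float) -> float:
    """Time for `work_cycles` of computation followed by one fixed-latency access."""
    return work_cycles / clock_ghz + latency_ns


# Hypothetical numbers chosen only for illustration.
slow = time_per_access_ns(work_cycles=20, latency_ns=10.0, clock_ghz=1.0)
fast = time_per_access_ns(work_cycles=20, latency_ns=10.0, clock_ghz=2.0)
print("stall cycles at 1 GHz:", stall_cycles(10.0, 1.0))
print("stall cycles at 2 GHz:", stall_cycles(10.0, 2.0))
print("speedup from doubling the clock:", slow / fast)
```

With these made-up numbers, doubling the clock doubles the number of stall cycles per access and yields well under a 2x speedup, which is the sense in which a cache-bound workload is less sensitive to clock rate.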

The only problem I see with such a CPU is that a lot of software isn't
parallelized yet - although writing multithreaded apps is quite easy.

Joe Seigh
Guest

Posted: Mon Oct 17, 2005 4:15 pm    Post subject: Re: hyperthreading in database-benchmarks

Jens Meyer wrote:
Quote:


Judging by Niagara's clock rates, I wouldn't hold your breath...


Niagara's pipe has a depth of only five. I think that's the main reason for
the limited clock rate. But since server apps usually wait on the L1 and L2 caches,
and those delays don't get any shorter in absolute time as the clock rate rises,
that's not really a constraint. I think an x86 core with the simple pipes described
above would have about the same depth as an Athlon pipe but would reach a higher
clock rate.

The only problem I see with such a CPU is that a lot of software isn't
parallelized yet - although writing multithreaded apps is quite easy.

This isn't a problem for software. It's a problem for hardware. It's a
classic chicken and egg problem. The multi-core hardware has to be out
there and have sufficient market saturation in the various niches before
the software in those niches begins to exploit it. The hardware vendors
waited too long before pushing multi-core so they're in a pinch.

I play with lock-free algorithms and have considered writing a lock-free
database to exploit multi-core hw except those things aren't in my price
range yet. I'm in a really low niche unfortunately.


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Andy Glew
Guest

Posted: Mon Oct 17, 2005 9:36 pm    Post subject: Re: hyperthreading in database-benchmarks

Quote:
The real question is whether you
are better off with CMP than a wide SMT...hard to say

Pinned to my cubicle wall is a slide labelled "AMD 2005 Analysts
Day", (i.e. public material), supposedly presented by Chuck Moore,
labelled "Multi-threading done right":

It is the classic 4 quadrant diagram:
X-axis "Compute Density and Efficiency"
Y-axis "Consistent Performance"

with the quadrants labelled

                                  Consistent Performance
                                  Low       High
Compute Density and    Low        SMT       CMP
Efficiency             High       SoEMT     Cluster-based Multithreading

with "Cluster based Multithreading"
called out as taking ~50% area investment
for ~80% throughput gain
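
As a quick sanity check on those two figures, the ratio of throughput to area can be worked out directly. This is a sketch only: the ~50% and ~80% numbers are the ones quoted from the slide, while the comparison case (duplicating a whole core for +100% area and at best +100% throughput) is an assumption added for illustration, as is the function name.

```python
def throughput_per_area(throughput_gain: float, area_increase: float) -> float:
    """Relative throughput per unit area versus a single baseline core.

    1.0 means the extra area pays for itself exactly; above 1.0 is a net win.
    """
    return (1.0 + throughput_gain) / (1.0 + area_increase)


# Figures quoted from the slide: ~50% more area for ~80% more throughput.
cluster = throughput_per_area(throughput_gain=0.80, area_increase=0.50)

# Assumed comparison case (not from the slide): a second full core,
# +100% area for at best +100% throughput.
second_core = throughput_per_area(throughput_gain=1.00, area_increase=1.00)

print(f"cluster-based multithreading: {cluster:.2f}")
print(f"second full core (assumed):   {second_core:.2f}")
```

On those numbers the cluster-based design delivers about 1.2 times the throughput per unit area of the baseline, against 1.0 for the assumed best case of simply adding a second core.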

David Hopwood
Guest

Posted: Mon Oct 17, 2005 9:46 pm    Post subject: Re: hyperthreading in database-benchmarks

Andy Glew wrote:
Quote:
The real question is whether you
are better off with CMP than a wide SMT...hard to say

Pinned to my cubicle wall is a slide labelled "AMD 2005 Analysts
Day", (i.e. public material), supposedly presented by Chuck Moore,
labelled "Multi-threading done right":

<http://www.amd.com/us-en/assets/content_type/DownloadableAssets/Chuck_Moore_6-10-05.pdf>
page 4.

--
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>

Jens Meyer
Guest

Posted: Mon Oct 17, 2005 10:23 pm    Post subject: Re: hyperthreading in database-benchmarks

Quote:
The only problem I see with such a CPU is that a lot of software isn't
parallelized yet - although writing multithreaded apps is quite easy.

This isn't a problem for software. It's a problem for hardware. It's a
classic chicken and egg problem. ...

Right! And furthermore, it's a problem of too few benchmarks being parallelized enough.
A lot of people would be disappointed to see low scores from single-threaded
benchmarks running on processors like that.

Joe Seigh
Guest

Posted: Mon Oct 17, 2005 11:39 pm    Post subject: Re: hyperthreading in database-benchmarks

David Hopwood wrote:
Quote:
Andy Glew wrote:

The real question is whether you
are better off with CMP than a wide SMT...hard to say

Pinned to my cubicle wall is a slide labelled "AMD 2005 Analysts
Day", (i.e. public material), supposedly presented by Chuck Moore,
labelled "Multi-threading done right":


http://www.amd.com/us-en/assets/content_type/DownloadableAssets/Chuck_Moore_6-10-05.pdf
page 4.

What's the "Throughput Architecture" under the Future header on the Architectural
Generations foil? Is this the same as Sun's throughput computing? Is "throughput"
a standard term?

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

David Kanter
Guest

Posted: Tue Oct 18, 2005 5:25 am    Post subject: Re: hyperthreading in database-benchmarks

Jens Meyer wrote:
Quote:
The only problem I see with such a CPU is that a lot of software isn't
parallelized yet - although writing multithreaded apps is quite easy.

This isn't a problem for software. It's a problem for hardware. It's a
classic chicken and egg problem. ...

Right! And furthermore, it's a problem of too few benchmarks being parallelized enough.
A lot of people would be disappointed to see low scores from single-threaded
benchmarks running on processors like that.

Spot on. If you look at the number of benchmarks that are:

1. Easy to run (don't require > 4 disks or > 1 client)
2. Scalable to 8 CPUs
3. Relatively easily available

You're left with a short list indeed. That's one thing I'm trying to
work on right now, so any help would be appreciated!

David

All times are GMT