| Author |
Message |
Del Cecchi
Guest
|
Posted:
Tue Dec 07, 2004 7:28 pm Post subject:
Re: Pretty good explanation of x86-64 by HP |
|
|
"Grumble" <devnull@kma.eu.org> wrote in message
news:cp4djh$vdh$1@news-rocq.inria.fr...
| Quote: | Del Cecchi wrote:
What braindamaged newsreader are you using that won't let you right
click the link in the newsreader? Even OE does that. So quit whining
and switch to a decent newsreader.
Speaking of brain-damaged newsreaders, take a look at the mess yours
did when you quoted John's message. I rest my case.
|
A few lines got wrapped. That what you are talking about?
del |
|
|
 |
Grumble
Guest
|
Posted:
Tue Dec 07, 2004 8:19 pm Post subject:
Re: Pretty good explanation of x86-64 by HP |
|
|
Del Cecchi wrote:
| Quote: | Grumble wrote:
Del Cecchi wrote:
What braindamaged newsreader are you using that won't let you
right click the link in the newsreader? Even OE does that.
So quit whining and switch to a decent newsreader.
Speaking of brain-damaged newsreaders, take a look at the mess
yours did when you quoted John's message. I rest my case.
A few lines got wrapped. That what you are talking about?
|
Yessir!
Perhaps OE-QuoteFix might help if you must use OE? |
|
|
 |
Guest
|
Posted:
Tue Dec 07, 2004 9:11 pm Post subject:
Re: Pretty good explanation of x86-64 by HP |
|
|
Yes, it is clear that the memory controller (and the rest of the
NorthBridge) operates at CPU frequency. However, the DRAM controller
operates at DRAM frequency (address rate).
CPU<->NB<->MC<->DC<->DRAM |
|
|
 |
Greg Lindahl
Guest
|
Posted:
Tue Dec 07, 2004 10:56 pm Post subject:
Re: Pretty good explanation of x86-64 by HP |
|
|
In article <pan.2004.12.07.01.37.06.417847@att.bizzzz>,
keith <krw@att.bizzzz> wrote:
| Quote: | Note that the STREAM bandwidth and lmbench latency changes with every
cpu speed bump. So clearly part of the memory controller is at the cpu
core frequency, or a related frequency, and not at the HT frequency,
or the SDRAM external bus frequency.
That does *not* mean that the memory controller runs at the core speed.
It would be nuts to assume such. Would you assume the caches of the
PII run at the I/O bus speed?
|
"or a related frequency", i.e. based on the cpu frequency with a
constant divider.
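To make "a related frequency" concrete: a block clocked at the core frequency divided by a fixed integer never runs at the core clock, yet its clock still rises with every CPU speed bump. The sketch below is only an illustration; the divider and the core frequencies are made-up values, not Opteron specifications.

```python
def derived_clock_mhz(core_mhz, divider):
    """Clock of a block that runs at core frequency / a fixed divider."""
    if divider <= 0:
        raise ValueError("divider must be positive")
    return core_mhz / divider


# Hypothetical speed grades and divider, chosen only for illustration.
HYPOTHETICAL_DIVIDER = 2

if __name__ == "__main__":
    for core_mhz in (1800, 2000, 2200, 2400):
        block_mhz = derived_clock_mhz(core_mhz, HYPOTHETICAL_DIVIDER)
        print(f"{core_mhz} MHz core -> {block_mhz:.0f} MHz derived clock")
```

A benchmark that speeds up with each CPU bump is therefore consistent with either reading: a block at core speed, or a block at a fixed fraction of it.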
| Quote: | Please reduce the cross-post. Followups set to a group I read.
Isn't this a rather egotistical statement?
|
No, it follows Usenet tradition: post only to groups that you read.
But thanks for giving me the benefit of the doubt.
-- greg |
|
|
 |
Eric C. Fromm
Guest
|
Posted:
Wed Dec 08, 2004 12:32 am Post subject:
Re: Pretty good explanation of x86-64 by HP |
|
|
Janne Blomqvist wrote:
| Quote: |
By the time dual core Opterons arrive, I suspect that DDR2-800 will
also be available, thus providing twice the memory BW compared to the
current single core offerings using DDR-400.
|
And unless the HyperTransport channels get faster or more are added for the
dual core chip, non-NUMA aware kernels and applications might not always
see the full benefits of that bandwidth doubling. I also wonder how many
DIMMs can be reliably configured on a DDR2-800 bus. There might well
be a capacity trade off required at those speeds.
--
Eric C. Fromm efromm@sgi.com
Principal Engineer Scalable Systems Division
SGI - Silicon Graphics, Inc. Chippewa Falls, Wi. |
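The "twice the memory BW" figure quoted above follows from peak-rate arithmetic: transfers per second times bytes per transfer. A rough sketch, assuming a standard 64-bit DIMM data bus and a single channel (the `channels` parameter is there because the channel count per socket is not stated in this thread); sustained bandwidth in something like STREAM will be lower than these peaks.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits=64, channels=1):
    """Theoretical peak bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


if __name__ == "__main__":
    ddr_400 = peak_bandwidth_gb_s(400)    # DDR-400: 400 MT/s
    ddr2_800 = peak_bandwidth_gb_s(800)   # DDR2-800: 800 MT/s
    print(f"DDR-400:  {ddr_400:.1f} GB/s per channel")
    print(f"DDR2-800: {ddr2_800:.1f} GB/s per channel")
    print(f"ratio:    {ddr2_800 / ddr_400:.1f}x")
```

The doubling applies per memory controller. Whether a remote socket sees it depends on the HyperTransport link between the two, which is the concern raised in the post.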
|
|
 |
Greg Lindahl
Guest
|
Posted:
Wed Dec 08, 2004 2:22 am Post subject:
Re: Pretty good explanation of x86-64 by HP |
|
|
In article <cp50gt$2gvbd1$1@fido.engr.sgi.com>,
Eric C. Fromm <efromm@sgi.com> wrote:
| Quote: | And unless the HyperTransport channels get faster or more are added for the
dual core chip, non-NUMA aware kernels and applications might not always
see the full benefits of that bandwidth doubling.
|
Right. AMD has a roadmap for HT to address this issue. However, there
will be a large number of single-socket systems and systems running
processes that control their locality pretty well (MPI usually falls
into this category) who will all see the full benefit.
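As a rough illustration of a process "controlling its locality", the sketch below pins the current process to one CPU using Python's Linux-only `os.sched_setaffinity`. The helper name `pin_to_cpu` is hypothetical, and real MPI launchers have their own binding mechanisms that are not shown here. Pinning only fixes where the process runs; where its memory lands still depends on the operating system's allocation policy.

```python
import os


def pin_to_cpu(cpu=None):
    """Pin the calling process to a single CPU.

    Returns the CPU number used, or None where the affinity API is
    unavailable (it is Linux-specific).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = sorted(os.sched_getaffinity(0))
    # Default to the first CPU this process is already allowed to use.
    target = allowed[0] if cpu is None else cpu
    os.sched_setaffinity(0, {target})
    return target


if __name__ == "__main__":
    chosen = pin_to_cpu()
    if chosen is None:
        print("CPU affinity is not supported on this platform")
    else:
        print("pinned to CPU", chosen, "->", os.sched_getaffinity(0))
```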
IBM had a similar set of issues with Power4 and Power5. They sold
systems with only 1 cpu enabled to address customers who want the most
memory bandwidth. And the inter-cpu links were fast enough that
scaling was pretty good either way.
-- greg |
|
|
 |
John Savard
Guest
|
Posted:
Wed Dec 08, 2004 5:01 am Post subject:
Re: Pretty good explanation of x86-64 by HP |
|
|
On Mon, 6 Dec 2004 20:16:21 -0600, "del cecchi" <dcecchi.nojunk@att.net>
wrote, in part:
| Quote: | What braindamaged newsreader are you using that won't let you right
click the link in the newsreader?
|
Clicking on the link in the newsreader, supposing I could do that, would
simply cause the link to open in a browser window. Which is exactly what
I achieved by cutting and pasting.
Maybe some newsreaders do allow right-clicking links. Such newsreaders
would probably also do dangerous and reckless things like rendering HTML
posts instead of displaying them in all their <angle bracket> glory.
This could result in having a brain-damaged computer, were I to view the
wrong post by accident.
As the posting in question was a text posting, this means that the
newsreader would have to guess at what constituted an URL, as well, with
no doubt occasional hilarious results.
John Savard
http://home.ecn.ab.ca/~jsavard/index.html |
|
|
 |
David Schwartz
Guest
|
Posted:
Wed Dec 08, 2004 7:14 am Post subject:
Re: Pretty good explanation of x86-64 by HP |
|
|
"Per Ekman" <pek@pdc.kth.se> wrote in message
news:mjewtvuh6u0.fsf@curlew.pdc.kth.se...
| Quote: | Yousuf Khan <bbbl67@ezrs.com> writes:
Actually, there was a story here not so long ago where one of the Linux
distros had been optimized up with NUMA assumptions, and it actually ran
/slower/ than a non-NUMA kernel. In other words the Linux kernel might
have spent more time making complex decisions about memory placement
than it was actually going to save from the latencies.
And the conclusion was that a multi-CPU Opteron system must then be
UMA, rather than that the NUMA "optimizations" were crap?
|
There is a cost to treating memory as NUMA. The benefit you get in
exchange for that cost is dependent upon how NU the MA is. The point is that
MA on an Opteron system with 2 to 8 processors is so close to U that,
in most cases, it's effectively U.
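The trade-off can be put into numbers with a simple weighted-average model. This is a sketch only: the latencies below are hypothetical values picked to contrast a nearly uniform machine with a strongly non-uniform one, not measurements of any Opteron or other system.

```python
def average_latency_ns(local_ns, remote_ns, local_fraction):
    """Mean memory latency when local_fraction of accesses are local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


def max_numa_gain(local_ns, remote_ns, naive_local_fraction):
    """Upper bound on the latency improvement from perfect placement.

    Compares a placement-unaware kernel (naive_local_fraction of accesses
    happen to be local) against the ideal case where every access is local.
    """
    naive = average_latency_ns(local_ns, remote_ns, naive_local_fraction)
    return naive / local_ns


if __name__ == "__main__":
    # Hypothetical latencies in nanoseconds, for illustration only.
    print("nearly uniform:", max_numa_gain(100, 130, 0.5))
    print("strongly NUMA: ", max_numa_gain(100, 400, 0.5))
```

When the remote penalty is small, the best possible gain is also small, so any bookkeeping cost in the kernel can easily outweigh it.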
The scaling advantage comes largely from the architecture of a single
processor. The memory controller is on the chip. The main reason this
matters is that it means that local memory accesses don't have to contend
with any other inter-CPU or I/O traffic. The other advantage comes from the
number of HT interfaces. Corresponding Intel CPUs have only a single FSB
over which all traffic must flow.
Above 8 processors, things get much more complicated. But it doesn't
seem like there's much of a (mainstream commercial) market at that scaling
level yet.
DS |
|
|
 |
Tony Hill
Guest
|
Posted:
Wed Dec 08, 2004 10:19 am Post subject:
Re: Pretty good explanation of x86-64 by HP |
|
|
On 06 Dec 2004 14:12:20 +0100, Per Ekman <pek@pdc.kth.se> wrote:
| Quote: | Tony Hill <hilla_nospam_20@yahoo.ca> writes:
It does, but the difference is small, usually less than 10% and often
much closer to 0%.
And sometimes 50%...
|
Sure, there will be extreme cases in everything.
| Quote: | Most users don't use their computer to run STREAM though. Even in the
HPC community where memory bandwidth is king, STREAM is still a rather
extreme case.
I admit I'm from the HPC-sector and memory bandwidth is very important
to many applications here.
|
One thing that you need to keep in mind is that you represent a VERY
small minority here in terms of PC server sales. Just because it
matters to your application probably doesn't have much relevance to
the bulk of the buying public, and it almost certainly isn't going to
have implications for what the marketing people write in the trade
rags.
| Quote: | Besides, they do recognize that it is NUMA, just that they are saying
you don't NEED to worry about that if you don't want to because for
the vast majority of times the performance difference is lost in the
noise.
It's a pretty strange argument in my eyes, "If you ignore the
applications that run poorly because of property X, then it makes
sense to downplay property X." True, but not helpful if you have such
an application.
|
Ahh, but it's VERY helpful if you're in the marketing department! :>
In the end, the people that are going to take a performance hit due to
lack of NUMA optimizations probably already know as much and have
factored it into their buying decisions. The people who are talking
to Dell or HPaq's server sales and are thinking about an Opteron
system but are worried that this here NoooMah thingy might cause their
application to run slow most likely don't have to worry about much.
Hence SUMO.
It's all a matter of perspective.
-------------
Tony Hill
hilla <underscore> 20 <at> yahoo <dot> ca |
|
|
 |
George Macdonald
Guest
|
Posted:
Wed Dec 08, 2004 6:09 pm Post subject:
Re: Pretty good explanation of x86-64 by HP |
|
|
On Wed, 08 Dec 2004 00:19:59 -0500, Tony Hill <hilla_nospam_20@yahoo.ca>
wrote:
| Quote: | On 06 Dec 2004 14:12:20 +0100, Per Ekman <pek@pdc.kth.se> wrote:
Tony Hill <hilla_nospam_20@yahoo.ca> writes:
It does, but the difference is small, usually less than 10% and often
much closer to 0%.
And sometimes 50%...
Sure, there will be extreme cases in everything.
Most users don't use their computer to run STREAM though. Even in the
HPC community where memory bandwidth is king, STREAM is still a rather
extreme case.
I admit I'm from the HPC-sector and memory bandwidth is very important
to many applications here.
One thing that you need to keep in mind is that you represent a VERY
small minority here in terms of PC server sales. Just because it
matters to your application probably doesn't have much relevance to
the bulk of the buying public, and it almost certainly isn't going to
have implications for what the marketing people write in the trade
rags.
|
I think you're underestimating the size of the "workstation" market, which
will include people finding they can migrate down to PC-grade CPUs to
replace old "higher power" systems as well as people on the lower-end
fringe who may have grown their problem complexity beyond a uni-PC, or who
*could* get by with a fastish PC but like the comfort of the move up to
dual for future growth. Add them to the current established base of CAD,
engineering and modeling etc. applications and there is a decent sized
market.
There are a lot of mathematical/engineering problems out there which are
just part of everyday business computing - many *used* to be considered HPC
and are now quite routine on desktop sized boxes. In many cases,
proprietary (purchased) software is used and the algorithmic methods are
only understood fairly superficially by the user; what that user wants is
response, whether it's measured in minutes, hours or a day or more. The
software vendor thus feels responsible for supplying the best combination
of software and recommended hardware selection.
Rgds, George Macdonald
"Just because they're paranoid doesn't mean you're not psychotic" - Who, me?? |
|
|
 |
keith
Guest
|
Posted:
Thu Dec 09, 2004 9:43 am Post subject:
Re: Pretty good explanation of x86-64 by HP |
|
|
On Tue, 07 Dec 2004 09:56:44 -0800, Greg Lindahl wrote:
| Quote: | In article <pan.2004.12.07.01.37.06.417847@att.bizzzz>,
keith <krw@att.bizzzz> wrote:
Note that the STREAM bandwidth and lmbench latency changes with every
cpu speed bump. So clearly part of the memory controller is at the cpu
core frequency, or a related frequency, and not at the HT frequency,
or the SDRAM external bus frequency.
That does *not* mean that the memory controller runs at the core speed.
It would be nuts to assume such. Would you assume the caches of the
PII run at the I/O bus speed?
"or a related frequency", i.e. based on the cpu frequency with a
constant divider.
|
Ok, how many "unrelated frequencies" are there in a CPU? Let's get real
here.
| Quote: | Please reduce the cross-post. Followups set to a group I read.
Isn't this a rather egotistical statement?
No, it follows Usenet tradition: post only to groups that you read.
|
No, that is *not* Usenet tradition. The tradition is to limit
cross-postings to on-topic newsgroups. Cross-posting is not expensive
(unless you have a dran-bamaged newsreader).
| Quote: | But thanks for giving me the benefit of the doubt.
|
Cutting off your audience, particularly those who *you* have responded to
is rude. Sorry if I've ruffled your feathers!
--
Keith |
|
|
 |
Bernd Paysan
Guest
|
Posted:
Thu Dec 09, 2004 3:15 pm Post subject:
Re: Pretty good explanation of x86-64 by HP |
|
|
David Schwartz wrote:
| Quote: | The scaling advantage comes largely from the architecture of a single
processor. The memory controller is on the chip. The main reason this
matters is that it means that local memory accesses don't have to contend
with any other inter-CPU or I/O traffic.
|
That's only partly true. The Opterons still talk to each other even on local
accesses (coherency tokens only, no real data transfer). This takes both
time and adds to the traffic, since such a token needs to get everywhere.
What's missing here is an "exclusive" bit in the page table, for non-coherent
pages. The OS pretty well knows (or can know) which core is accessing a
page, and for a page that's not shared, the coherency token is not
necessary.
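A toy model of the traffic argument: assume, as described above, that every local miss sends a coherency probe to every other socket, and that misses to pages marked with the proposed "exclusive" bit send none. The bit is a proposal in this post, not an existing feature, and the function name and miss counts below are invented for illustration.

```python
def probes_sent(misses, private_misses, sockets):
    """Coherency probes generated under a broadcast-probe model.

    misses:          total cache misses served from local memory
    private_misses:  misses to pages flagged as non-shared (no probe needed)
    sockets:         number of processors in the system
    """
    if sockets < 1:
        raise ValueError("need at least one socket")
    if not 0 <= private_misses <= misses:
        raise ValueError("private_misses must be between 0 and misses")
    return (misses - private_misses) * (sockets - 1)


if __name__ == "__main__":
    # Hypothetical workload: 1000 misses, 900 of them to private pages.
    for sockets in (2, 4, 8):
        without_bit = probes_sent(1000, 0, sockets)
        with_bit = probes_sent(1000, 900, sockets)
        print(f"{sockets} sockets: {without_bit} probes -> {with_bit} probes")
```

In this model the probe count grows linearly with socket count, so the saving from skipping private pages matters more as the system gets larger.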
--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/ |
|
|
 |
achish777@cox.net
Guest
|
Posted:
Fri Dec 10, 2004 10:47 am Post subject:
Re: Pretty good explanation of x86-64 by HP |
|
|
Greg Lindahl wrote:
| Quote: | Right. AMD has a roadmap for HT to address this issue.
|
I recently listened to Fred Weber (CTO of AMD) present at the Lehman
Brothers 2004 T4 conference. When he was speaking about AMD's future
direction he mentioned HyperTransport 3 and said it would be 5
Gigatransfers/second and higher. It sounds like a lot, but I'm still
doing my research to find out what exactly a "Gigatransfer" is :).
P.S. I noticed that the new HTX standard is specced at 1.8
Gigatransfers/sec. With that number and your new Infinipath adapter
Pathscale showed some impressive MPI latency numbers; it seems it's only
going to get much better with HT 3.
Regards,
Garius |
|
|
 |
Del Cecchi
Guest
|
Posted:
Fri Dec 10, 2004 7:25 pm Post subject:
Re: Pretty good explanation of x86-64 by HP |
|
|
achish777@cox.net wrote:
| Quote: | Greg Lindahl wrote:
Right. AMD has a roadmap for HT to address this issue.
I recently listened to Fred Weber (CTO of AMD) present at the Lehman
Brothers 2004 T4 conference. When he was speaking about AMD's future
direction he mentioned HyperTransport 3 and said it would be 5
Gigatransfers/second and higher. It sounds like a lot, but I'm still
doing my research to find out what exactly a "Gigatransfer" is :).
P.S. I noticed that the new HTX standard is specced at 1.8
Gigatransfers/sec. With that number and your new Infinipath adapter
Pathscale showed some impressive MPI latency numbers; it seems it's only
going to get much better with HT 3.
Regards,
Garius
|
A Gigatransfer/s is 10**9 bits per second per pin or pin pair. It removes the
ambiguity when discussing links whose width is variable, like HT and
many others.
HT has released specifications for transfer rates to 2.4 GT/s. |
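Given that definition, a link's peak bandwidth is the transfer rate times its width. A small sketch, using the 2.4 GT/s figure from the post; the widths looped over are illustrative choices, and the result is a raw one-direction peak that ignores protocol overhead.

```python
def link_bandwidth_gb_s(giga_transfers_per_s, width_bits):
    """Peak one-direction bandwidth in GB/s.

    Each transfer moves one bit per pin (or pin pair), so a link that is
    width_bits wide moves width_bits bits per transfer.
    """
    if width_bits <= 0:
        raise ValueError("width_bits must be positive")
    return giga_transfers_per_s * width_bits / 8


if __name__ == "__main__":
    # Link widths chosen for illustration only.
    for width in (8, 16, 32):
        gb_s = link_bandwidth_gb_s(2.4, width)
        print(f"{width:2d}-bit link at 2.4 GT/s: {gb_s:.1f} GB/s each way")
```

Quoting GT/s keeps the per-pin rate separate from the width, so two links of different widths can be compared without confusion.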
|
|
 |
Keith R. Williams
Guest
|
Posted:
Fri Dec 10, 2004 7:40 pm Post subject:
Re: Pretty good explanation of x86-64 by HP |
|
|
In article <31tpvbF390ri8U1@individual.net>, cecchinospam@us.ibm.com
says...
| Quote: | achish777@cox.net wrote:
Greg Lindahl wrote:
Right. AMD has a roadmap for HT to address this issue.
I recently listened to Fred Weber (CTO of AMD) present at the Lehman
Brothers 2004 T4 conference. When he was speaking about AMD's future
direction he mentioned HyperTransport 3 and said it would be 5
Gigatransfers/second and higher. It sounds like a lot, but I'm still
doing my research to find out what exactly a "Gigatransfer" is :).
P.S. I noticed that the new HTX standard is specced at 1.8
Gigatransfers/sec. With that number and your new Infinipath adapter
Pathscale showed some impressive MPI latency numbers; it seems it's only
going to get much better with HT 3.
Regards,
Garius
A Gigatransfer/s is 10**9 bits per second per pin or pin pair. It removes the
ambiguity when discussing links whose width is variable, like HT and
many others.
|
It also eliminates the ambiguity of MHz for DDR (QDR, etc.) transfers.
| Quote: | HT has released specifications for transfer rates to 2.4 GT/s.
|
--
Keith |
|
|
 |
|
|
|
|