In article <d0evq2$r45$1@gemini.csx.cam.ac.uk>,
Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
>In article <1110109738.580076.95890@o13g2000cwo.googlegroups.com>,
> <already5chosen@yahoo.com> wrote:
>>Can you share the numbers, please? For example, would a 72-socket
>>(144-core) US-IV based SunFire15K outperform a 16-socket (32 cores) p570?
>>A 32-socket p595?
>>How does it fare against a mid-range (64P) Altix3K?
>>All for your sort of HPC, of course.
>I am afraid not. I have not tried those, and the information I have
>on some machines is NDA. What I can tell you that ISN'T under NDA,
>is that a Sun F15K beats the balls off an IBM P670 (Regatta). This
>is because IBM made a complete pig's ear of the memory management on
>the POWER4, which I believe they have fixed in the POWER5.
Perhaps I should clarify this. Precisely the converse was true of
single-CPU performance, including memory access, where the POWER4
beat the balls off everything else. The press was solid with
statements that everything else (except perhaps the Itanium) was
doomed, because the idiots looked solely at the single-CPU Spec
figures.
However, when you ran N copies of almost identical processes in
parallel (as is normal for HPC), performance degraded badly.
Systems like the Altix and F15K degrade when there is a conflict,
but scale well where there isn't. IBM never claimed that the 670
(actually, I mean 690) would scale linearly but, in the event,
things were a lot worse than expected (especially with respect to
latency, for some reason).
STREAM figures published by John McCalpin showed this very clearly,
and they also show that the POWER5 has fixed at least the obvious
issues. I can't tell you what the problem was, but I do know that
it wasn't expected. I really can't explain the latency effect.
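For anyone who has not looked at it, the triad kernel at the heart of
STREAM is tiny, and the scaling question above reduces to one ratio.
What follows is a minimal sketch in Python, not McCalpin's code: the
function names (triad, triad_bytes, scaling_efficiency) are my own
illustrative choices, and pure Python measures interpreter overhead
rather than memory bandwidth, so it shows only the shape of the
calculation.

```python
from array import array


def triad(a, b, c, scalar):
    """STREAM-style triad kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bytes(n, elem_size=8):
    """Bytes counted for one triad pass: two reads and one write per element."""
    return 3 * n * elem_size


def scaling_efficiency(rate_one_copy, aggregate_rate, copies):
    """Fraction of ideal linear scaling achieved by `copies` concurrent copies.

    rate_one_copy  -- measured rate with a single copy running alone
    aggregate_rate -- summed rate of all copies running together
    (Both are hypothetical inputs here; they must come from real runs.)
    """
    return aggregate_rate / (copies * rate_one_copy)


if __name__ == "__main__":
    n = 8
    a = array("d", [0.0]) * n
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    triad(a, b, c, 3.0)
    print(list(a), triad_bytes(n))
```

An efficiency near 1.0 means the copies are not getting in each
other's way; a value well below it is the sort of degradation I am
describing. The rates fed in have to be measured on the machine in
question.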
The effect was that a F15K couldn't even approach a 670/690, CPU
for CPU, but it scales up to c. 100 CPUs (with memory corresponding
to 72), and a 670/690 tops out at 32. There is actually more to this,
because tuning for AIX 5L is considerably harder than for Solaris 9,
especially in this area, and I was therefore referring to the sort
of performance the "ordinary" HPC program would see.
I cannot say what would happen on a single-application HPC system,
where AIX could be tuned for that application. This is a large-
page issue, incidentally - see the AIX documentation for the details.
Until and unless I compare a F25K and 595 myself, I can't say what
they will do (and then it is unlikely I will be able to post).
Regards,
Nick Maclaren.