A.G.McDowell (Guest)
Posted: Fri Feb 04, 2005 2:51 am
Post subject: Running out of speed on PC-based systems
I hope that the following real-life situation will serve as an example
of multithreading in a demanding application, and will also elicit some
advice for me.
I have been using VTune to poke around a large 'real-time' simulation
system (that is, simulated time = elapsed time to within about 1/10th
of a second) that is closer to its contractual performance margins than we
would like. The original performance estimates assumed a growth in the
performance of commodity PCs that has not come to pass. The simulation
framework makes it hard (but not impossible) to parallelise the
application within a single PC, but multithreading and multiple shared-
memory cpus may not be the answer. Adding new threads that model other
parts of the system slows down the bottleneck thread even though we
appear to have plenty of available cpu parallelism (a 2-cpu Xeon gives
us 4 virtual cpus, and we appear to have enough total work for about 1.6
of them). VTune shows that cycles per instruction in the bottleneck
thread goes up by 30% in these circumstances, so our theory is that we
are running out of memory bandwidth. Off-loading some work to a separate
machine reclaims some of that 30% performance loss, even though there is
then substantial TCP traffic from one to the other, but further
increases in model complexity start hitting the bottleneck thread again.
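As a rough check on what the VTune figure implies: at a fixed clock rate, time per instruction scales with CPI, so a 30% rise in CPI cuts instruction throughput to 1/1.3 of its former value, which is a loss of about 23% rather than 30%. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and it assumes the clock rate and instruction count do not change.

```python
def relative_throughput(cpi_increase):
    """Instruction throughput relative to baseline after CPI rises by the
    given fraction, assuming clock rate and instruction count are unchanged."""
    return 1.0 / (1.0 + cpi_increase)


if __name__ == "__main__":
    ratio = relative_throughput(0.30)  # the 30% CPI rise reported by VTune
    print(f"throughput: {ratio:.1%} of baseline")
    print(f"loss:       {1 - ratio:.1%}")
```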
The bottleneck thread is simulating a Sparc relative, using software not
under our control. It is responsible for anywhere from 60% to 90% of the cpu
consumption, depending on the scope of the model, and what it is being
asked to do. FWIW, the idle loop of the simulated software runs a memory
cleaning task, working its way systematically through the simulated
memory to guard against single-bit errors, and the simulator is more
than just a software Sparc: it is capable of producing a variety of
error and failure conditions on demand.
We are currently running this on a top-line Dell server. I think this
means 2 physical Xeon chips at 3.6GHz with an 800MHz FSB and 400MHz
memory.
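For scale, the peak figures implied by those numbers can be estimated as transfers per second times bytes per transfer. The sketch below treats "800MHz" and "400MHz" as transfer rates and assumes a 64-bit front-side bus shared by both processors and two 64-bit memory channels. Those widths and the channel count are assumptions to check against the actual server's specification, and sustained bandwidth will be lower than the theoretical peak.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bytes_per_transfer, channels=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


FSB_BUS_BYTES = 8      # assumption: 64-bit front-side bus
MEM_CHANNEL_BYTES = 8  # assumption: 64-bit memory channels
MEM_CHANNELS = 2       # assumption: dual-channel memory
PHYSICAL_CPUS = 2      # from the post: 2 physical Xeon chips

fsb = peak_bandwidth_gb_s(800, FSB_BUS_BYTES)
mem = peak_bandwidth_gb_s(400, MEM_CHANNEL_BYTES, MEM_CHANNELS)

print(f"FSB peak:    {fsb:.1f} GB/s, shared by {PHYSICAL_CPUS} CPUs")
print(f"memory peak: {mem:.1f} GB/s")
print(f"FSB share per CPU if both are busy: {fsb / PHYSICAL_CPUS:.1f} GB/s")
```

Under these assumptions every thread on either chip competes for the same bus, so adding threads can slow an unrelated thread even when cpu time is plentiful.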
Is it plausible that such a system would be bottlenecking on memory
bandwidth, rather than cpu?
Is memory bandwidth running ahead of or behind cpu performance?
Only a small number of such systems will run, so there is a good
argument for throwing money at hardware, rather than software. Are there
niche PC makers out there that could give us a 50-100% increase - no
doubt for a price? We are not happy about overclocking, but do there
exist fast memory subsystems that nevertheless stay within the
manufacturer's recommended operating conditions?
--
A.G.McDowell
Greg Lindahl (Guest)
Posted: Fri Feb 04, 2005 3:28 am
Post subject: Re: Running out of speed on PC-based systems
In article <JAzicKApzpACFwCy@mcdowella.demon.co.uk>,
A.G.McDowell <mcdowella@mcdowella.demon.co.uk> wrote:
> We are currently running this on a top-line Dell server. I think this
> means 2 physical Xeon chips at 3.6GHz with an 800MHz FSB and 400MHz
> memory.
It would be worth your while to go look at it running on an
Opteron-based machine, which has considerably faster memory, as well
as much higher bandwidth to memory on SMP systems, if the OS can place
your pages correctly.
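On Linux, for example, the default policy allocates a page on the NUMA node of the CPU that first touches it, so one way to help placement is to pin the process before it allocates and initialises its memory. A minimal sketch, assuming Linux and Python's `os.sched_setaffinity`; the helper name `pin_to_cpu` is hypothetical, and nothing in the thread says which OS the simulation runs on.

```python
import os


def pin_to_cpu(cpu):
    """Pin the calling process to one CPU (Linux only) and return the
    resulting affinity set."""
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)  # CPUs this process may currently use
    cpu = min(allowed)
    print("pinned to:", pin_to_cpu(cpu))
    # ... allocate and initialise the large working set here, so that
    # first-touch placement puts its pages on this CPU's local node ...
    os.sched_setaffinity(0, allowed)   # restore the original mask
```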
> Is it plausible that such a system would be bottlenecking on memory
> bandwidth, rather than cpu?
Yes, or memory latency.
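The distinction matters because a single thread can be limited by latency long before the bus is saturated. By Little's law, the bandwidth a thread sustains is roughly (cache misses in flight x line size) / latency. The sketch below uses illustrative values, 64-byte lines and 100 ns latency, which are assumptions and not measurements of this system.

```python
def latency_bound_bandwidth_gb_s(misses_in_flight, line_bytes, latency_ns):
    """Bandwidth one thread can sustain when limited by memory latency.
    Bytes per nanosecond is numerically equal to GB/s (1 GB = 1e9 bytes)."""
    return misses_in_flight * line_bytes / latency_ns


LINE_BYTES = 64     # assumption: cache line size
LATENCY_NS = 100.0  # assumption: main-memory latency

for in_flight in (1, 4, 8):
    bw = latency_bound_bandwidth_gb_s(in_flight, LINE_BYTES, LATENCY_NS)
    print(f"{in_flight} misses in flight: {bw:.2f} GB/s")
```

Code whose next address depends on the previous load keeps only about one miss in flight, so it may stall on latency while measured bus utilisation still looks low.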
> Is memory bandwidth running ahead of or behind cpu performance?
Behind, generally. Ditto for memory latency. This is why it's called
"the memory wall".
-- greg