| Author |
Message |
Nick Maclaren
Guest
|
Posted:
Mon Sep 12, 2005 11:11 pm Post subject:
Re: Using hierarchical memory as an acquire memory barrier f |
|
|
In article <axhVe.1600$Kk3.1058@fe1.news.blueyonder.co.uk>,
David Hopwood <david.nospam.hopwood@blueyonder.co.uk> wrote:
| Quote: | Nick Maclaren wrote:
|
|> Cache is supposed to be a transparent abstraction. So is the TLB (software
|> TLBs notwithstanding). Breaking that will break anyone's ability to understand
|> what the system is doing, just in order to try (without necessarily succeeding)
|> to optimize something that isn't a performance bottleneck.
Don't be so certain of the last. I see both cache and TLB handling
being a performance bottleneck on a daily basis, and one of the
solutions to this would involve making them more visible.
The "something that isn't a performance bottleneck" was referring to
acquire memory barriers. Sorry if that wasn't clear.
|
Ah. No, I didn't get that.
However, my statement stands, even for those, though I don't see
it that often. It is a major issue for OpenMP and POSIX threads
codes that attempt to deliver small-grain parallelism.
Regards,
Nick Maclaren. |
|
|
 |
David Hopwood
Guest
|
Posted:
Tue Sep 13, 2005 6:18 am Post subject:
Re: Using hierarchical memory as an acquire memory barrier f |
|
|
Nick Maclaren wrote:
| Quote: | David Hopwood <david.nospam.hopwood@blueyonder.co.uk> wrote:
Nick Maclaren wrote:
|> Cache is supposed to be a transparent abstraction. So is the TLB (software
|> TLBs notwithstanding). Breaking that will break anyone's ability to understand
|> what the system is doing, just in order to try (without necessarily succeeding)
|> to optimize something that isn't a performance bottleneck.
Don't be so certain of the last. I see both cache and TLB handling
being a performance bottleneck on a daily basis, and one of the
solutions to this would involve making them more visible.
The "something that isn't a performance bottleneck" was referring to
acquire memory barriers. Sorry if that wasn't clear.
Ah. No, I didn't get that.
However, my statement stands, even for those, though I don't see
it that often. It is a major issue for OpenMP and POSIX threads
codes that attempt to deliver small-grain parallelism.
|
I don't think that the cost of acquire barriers, specifically, is the
cause of any performance problems with OpenMP and pthreads in providing
small-grain parallelism. Note that these barriers are almost certainly
not needed anyway on x86[-64], PPC or SPARC.
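For readers unfamiliar with the term, here is a minimal sketch of an acquire load in C11. The names `payload`, `ready`, `publish` and `try_consume` are illustrative only and do not come from this thread. On x86, compilers typically emit an ordinary load for the acquire, which is consistent with no separate barrier instruction being needed there; other architectures may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;                /* ordinary data being handed over */
static atomic_bool ready = false;  /* flag that guards the payload */

/* Producer: write the data, then set the flag with release ordering. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: the acquire load keeps the read of payload from being
   reordered ahead of the read of the flag. Returns false if nothing
   has been published yet. */
bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

The ordering matters only when `publish` and `try_consume` run on different threads; the sketch leaves thread creation out.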
--
David Hopwood <david.nospam.hopwood@blueyonder.co.uk> |
|
|
 |
Nick Maclaren
Guest
|
Posted:
Tue Sep 13, 2005 8:15 am Post subject:
Re: Using hierarchical memory as an acquire memory barrier f |
|
|
In article <TNpVe.24147$k22.15331@fe2.news.blueyonder.co.uk>,
David Hopwood <david.nospam.hopwood@blueyonder.co.uk> wrote:
| Quote: |
The "something that isn't a performance bottleneck" was referring to
acquire memory barriers. Sorry if that wasn't clear.
Ah. No, I didn't get that.
However, my statement stands, even for those, though I don't see
it that often. It is a major issue for OpenMP and POSIX threads
codes that attempt to deliver small-grain parallelism.
I don't think that the cost of acquire barriers, specifically, is the
cause of any performance problems with OpenMP and pthreads in providing
small-grain parallelism. Note that these barriers are almost certainly
not needed anyway on x86[-64], PPC or SPARC.
|
Not acquire barriers in the very limited sense, no. But the logical
extension of them from the ISA to the HLL interface. I agree that
fiddling with the semantics to cure a hardware non-problem is not
worth the bother, but doing so to address the very real problem I
am referring to is.
In particular, if you make cache and TLB control explicit, an OpenMP
program (not POSIX threads, which is beyond redemption) could insert
the relevant calls at the relevant places. With competent hardware
design, you could even get - heresy - checking!
But I agree that what I was referring to was rather more ambitious
than what most other people may have been thinking of, despite the
fact that it IS the same issue, viewed in the large.
Regards,
Nick Maclaren. |
|
|
 |
Seongbae Park
Guest
|
Posted:
Tue Sep 13, 2005 8:21 am Post subject:
Re: Using hierarchical memory as an acquire memory barrier f |
|
|
Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
| Quote: | In particular, if you make cache and TLB control explicit, an OpenMP
|
Do you really believe that programmers can handle
a full explicit control of cache
which could very well cause a hang or extreme starvation ?
I'm not even sure runtime and compiler writers would be able to handle it
very well
- that's maybe because I haven't given a lot of thought to it though.
Shouldn't they (both general programmers and compiler/runtime writers)
be much better off with an "advice" mechanism ?
Prefetch is one such advice, and you can easily imagine
the reverse of prefetch, and prefetch (and reverse) for TLB as well.
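As a rough illustration of prefetch used as advice, the sketch below uses the GCC/Clang builtin `__builtin_prefetch`. The function name `sum_with_prefetch` and the distance `PREFETCH_AHEAD` are assumptions made for the example, not tuned values. The hardware is free to ignore the hint, so the result is the same with or without it.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative prefetch distance in elements; a real value would have
   to be measured on the target machine. */
#define PREFETCH_AHEAD 16

/* Sum an array while hinting that an element a fixed distance ahead
   will be read soon. The hint never changes the result. */
long sum_with_prefetch(const long *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD],
                               0,   /* 0 = prefetch for reading */
                               3);  /* 3 = high temporal locality */
        total += a[i];
    }
    return total;
}
```

The sketch shows only the prefetch direction; the "reverse of prefetch" has no equally portable builtin and is left out.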
| Quote: | program (not POSIX threads, which is beyond redemption) could insert
the relevant calls at the relevant places. With competent hardware
design, you could even get - heresy - checking!
But I agree that what I was referring to was rather more ambitious
than what most other people may have been thinking of, despite the
fact that it IS the same issue, viewed in the large.
|
BTW, explicit control of TLB is not impossible on the current generation
of programs - most modern hardware supports enough features to make
it possible (even if in a limited fashion)
for system software to provide such a feature,
although they are generally not implemented
nor provided to the user level software in a generally useful form.
--
#pragma ident "Seongbae Park, compiler, http://blogs.sun.com/seongbae/" |
|
|
 |
Nick Maclaren
Guest
|
Posted:
Tue Sep 13, 2005 2:52 pm Post subject:
Re: Using hierarchical memory as an acquire memory barrier f |
|
|
In article <dg625n$5sf$1@news1nwk.SFbay.Sun.COM>, Seongbae Park <Seongbae.Park@Sun.COM> writes:
|> Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
|> > In particular, if you make cache and TLB control explicit, an OpenMP
|>
|> Do you really believe that programmers can handle
|> a full explicit control of cache
|> which could very well cause a hang or extreme starvation ?
A few of us could, but I agree very few.
|> I'm not even sure runtime and compiler writers would be able to handle it
|> very well
|> - that's maybe because I haven't given a lot of thought to it though.
I am sure that they can't, and I have, plus I have a lot of experience!
What you may have missed is that it is ALREADY assumed for things
like OpenMP and POSIX threads. Because they have been attempted
on systems where the hardware does not make any provision for such
program control, they are at best a major headache and often simply
don't work.
There are solutions to this, of course, but only radical ones.
|> Shouldn't they (both general programmers and compiler/runtime writers)
|> be much better off with an "advice" mechanism ?
No. Universal experience is that such systems are at best a waste
of time and quite often are a disaster.
|> BTW, explicit control of TLB is not impossible on the current generation
|> of programs - most modern hardware supports enough features to make
|> it possible (even if in a limited fashion)
|> for system software to provide such a feature,
|> although they are generally not implemented
|> nor provided to the user level software in a generally useful form.
It's impractical, because it is available only at an extreme level
of privilege, and the checking needed to allow program control
makes it prohibitively expensive. Note that it isn't JUST the
actual control, but preventing things like asynchronous system
calls, system calls in other threads and interrupt handlers from
having kernel code broken by user action.
Regards,
Nick Maclaren. |
|
|
 |
Thomas Lindgren
Guest
|
Posted:
Tue Sep 13, 2005 4:15 pm Post subject:
Re: Using hierarchical memory as an acquire memory barrier f |
|
|
Seongbae Park <Seongbae.Park@Sun.COM> writes:
| Quote: | Do you really believe that programmers can handle
a full explicit control of cache
which could very well cause a hang or extreme starvation ?
I'm not even sure runtime and compiler writers would be able to handle it
very well
- that's maybe because I haven't given a lot of thought to it though.
|
OK, maybe this has been mentioned earlier, but ...
I'd say "it depends". Some architectures, such as the Playstation 2
one, have a fast special-purpose memory ("scratchpad"). Using a memory
abstraction seems simpler and easier than explicit cache and/or TLB
management.
In Linux on Cell, it appears programmers allocate the vector engines,
SPEs, as devices, which means there is no need to virtualize
them. Maybe the same could be done for a fast memory? (Plus some sort
of mmap() to access it.) An explicitly managed TLB could perhaps be
allocated and used as if a device too. For instance, the mmap() call
could also optionally ask for a locked TLB entry if the "TLB device"
can be used. Plenty of options possible here.
On the face of it, such an API seems _reasonably_ straightforward to
implement and use. Programmers will have to handle the case when the
fast memory is not available, of course, but the fast memory is really
just a performance optimization. (What to do if there are many banks
of fast memory with different characteristics? Left as an exercise
for the budding ASPLOS author.)
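A sketch of what such an API might look like from the caller's side, with the fallback included. The device node `/dev/fastmem` and the names `struct buffer`, `buffer_alloc` and `buffer_free` are hypothetical; no such standard device exists, and only the POSIX calls `open`, `mmap`, `munmap` and `close` are real.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical device node standing in for the fast memory. */
#define FASTMEM_DEVICE "/dev/fastmem"

struct buffer {
    void  *ptr;
    size_t len;
    bool   is_fast;  /* true if backed by the hypothetical device */
};

/* Try to map fast memory; fall back to ordinary heap memory if the
   device is missing or the mapping fails. */
struct buffer buffer_alloc(size_t len)
{
    assert(len > 0);
    struct buffer b = { NULL, len, false };
    int fd = open(FASTMEM_DEVICE, O_RDWR);
    if (fd >= 0) {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        close(fd);  /* an established mapping survives the close */
        if (p != MAP_FAILED) {
            b.ptr = p;
            b.is_fast = true;
            return b;
        }
    }
    b.ptr = malloc(len);  /* fallback: may still be NULL on failure */
    return b;
}

/* Release the buffer with the call that matches how it was obtained. */
void buffer_free(struct buffer *b)
{
    if (b->is_fast)
        munmap(b->ptr, b->len);
    else
        free(b->ptr);
    b->ptr = NULL;
}
```

Because the caller gets a usable pointer either way, the fast memory stays a pure performance optimization; `is_fast` is there only for code that wants to adapt its blocking strategy.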
Best,
Thomas
--
Thomas Lindgren
"It's becoming popular? It must be in decline." -- Isaiah Berlin |
|
|
 |
Joe Seigh
Guest
|
Posted:
Tue Sep 13, 2005 4:15 pm Post subject:
Re: Using hierarchical memory as an acquire memory barrier f |
|
|
Seongbae Park wrote:
| Quote: | Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
In particular, if you make cache and TLB control explicit, an OpenMP
Do you really believe that programmers can handle
a full explicit control of cache
which could very well cause a hang or extreme starvation ?
I'm not even sure runtime and compiler writers would be able to handle it
very well
- that's maybe because I haven't given a lot of thought to it though.
|
You have a different idea of what explicit control means, so I don't
know what mechanism you imagine could cause those kinds of problems.
I think a lot of the problem with explicit cache control in the old
days was that it wasn't coherent, and you either lost updates to memory or
updates showed up in the wrong order according to the memory model,
if any (there probably was no memory model).
You're saying you don't trust programmers to be able to implement an API
correctly or efficiently and that the API should be implemented by
hardware types. OK, no problem. I don't think application programmers
care about where or how the API is implemented. But you can't cheat by
simplifying or not implementing the full API. Java is probably the
closest to this, and it still looks like an ISA with a memory model, and the
APIs still look like software with a lot of Java Virtual Machine-specific
JNI magic.
NUMA and ccNUMA look pretty close to what explicit cache control might
look like. A lot of copying of global memory to/from local memory
for processing locally.
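The copy-in, process, copy-out pattern can be sketched in plain C as below. The names `scale_in_tiles` and `TILE` are made up for the example, and the stack array `local` merely stands in for node-local or explicitly managed memory; nothing here actually controls a cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative size of the "local" buffer, in elements. */
#define TILE 256

/* Double every element of global[], one tile at a time:
   copy the tile in, process it locally, copy it back. */
void scale_in_tiles(int *global, size_t n)
{
    assert(global != NULL || n == 0);
    int local[TILE];  /* stands in for node-local or fast memory */
    for (size_t base = 0; base < n; base += TILE) {
        size_t len = (n - base < TILE) ? n - base : TILE;
        memcpy(local, global + base, len * sizeof local[0]);
        for (size_t i = 0; i < len; i++)
            local[i] *= 2;
        memcpy(global + base, local, len * sizeof local[0]);
    }
}
```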
I look forward to hardware APIs for multi-threading and parallel programming
that scale well for the 100+ multicore processors they claim we'll have in
10 years. And we need the APIs as soon as possible. You can't rewrite
millions of lines of code in a few weeks.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software. |
|
|
 |
Alexander Terekhov
Guest
|
Posted:
Tue Sep 13, 2005 10:10 pm Post subject:
Re: Using hierarchical memory as an acquire memory barrier f |
|
|
David Hopwood wrote:
[...]
| Quote: | I don't think that the cost of acquire barriers, ... Note that these
barriers are almost certainly not needed anyway on x86[-64], PPC
|
You probably mean ddacq in conjunction with loads (RMW stuff aside
for a moment). Because extra sync is certainly needed for both ccacq_*
and plain acq loads on PPC.
regards,
alexander. |
|