Using hierarchical memory as an acquire memory barrier for d
Nick Maclaren
Posted: Mon Sep 12, 2005 11:11 pm    Post subject: Re: Using hierarchical memory as an acquire memory barrier f

In article <axhVe.1600$Kk3.1058@fe1.news.blueyonder.co.uk>,
David Hopwood <david.nospam.hopwood@blueyonder.co.uk> wrote:
> Nick Maclaren wrote:
> > |> Cache is supposed to be a transparent abstraction. So is the TLB (software
> > |> TLBs notwithstanding). Breaking that will break anyone's ability to understand
> > |> what the system is doing, just in order to try (without necessarily succeeding)
> > |> to optimize something that isn't a performance bottleneck.
> >
> > Don't be so certain of the last. I see both cache and TLB handling
> > being a performance bottleneck on a daily basis, and one of the
> > solutions to this would involve making them more visible.
>
> The "something that isn't a performance bottleneck" was referring to
> acquire memory barriers. Sorry if that wasn't clear.

Ah. No, I didn't get that.

However, my statement stands, even for those, though I don't see
it that often. It is a major issue for OpenMP and POSIX threads
codes that attempt to deliver small-grain parallelism.


Regards,
Nick Maclaren.

David Hopwood
Posted: Tue Sep 13, 2005 6:18 am    Post subject: Re: Using hierarchical memory as an acquire memory barrier f

Nick Maclaren wrote:
> David Hopwood <david.nospam.hopwood@blueyonder.co.uk> wrote:
> > Nick Maclaren wrote:
> > > |> Cache is supposed to be a transparent abstraction. So is the TLB (software
> > > |> TLBs notwithstanding). Breaking that will break anyone's ability to understand
> > > |> what the system is doing, just in order to try (without necessarily succeeding)
> > > |> to optimize something that isn't a performance bottleneck.
> > >
> > > Don't be so certain of the last. I see both cache and TLB handling
> > > being a performance bottleneck on a daily basis, and one of the
> > > solutions to this would involve making them more visible.
> >
> > The "something that isn't a performance bottleneck" was referring to
> > acquire memory barriers. Sorry if that wasn't clear.
>
> Ah. No, I didn't get that.
>
> However, my statement stands, even for those, though I don't see
> it that often. It is a major issue for OpenMP and POSIX threads
> codes that attempt to deliver small-grain parallelism.

I don't think that the cost of acquire barriers, specifically, is the
cause of any performance problems with OpenMP and pthreads in providing
small-grain parallelism. Note that these barriers are almost certainly
not needed anyway on x86[-64], PPC or SPARC.
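
To make the term concrete, here is a minimal sketch of the pattern an acquire barrier protects, assuming C11 atomics and POSIX threads. The names `payload`, `ready` and `acquire_demo` are illustrative only and do not come from the thread. A producer writes plain data and then sets a flag with release ordering; the consumer's acquire load of the flag is what guarantees that its later read of the data sees the producer's write.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* plain data published by the producer */
static atomic_int ready;   /* flag guarding the payload, initially 0 */

static void *producer(void *arg) {
    (void)arg;
    payload = 42;                                             /* write the data */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* then publish it */
    return NULL;
}

/* Spins until the flag is set, then returns the payload that was observed. */
int acquire_demo(void) {
    pthread_t t;
    payload = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);

    int rc = pthread_create(&t, NULL, producer, NULL);
    assert(rc == 0);

    /* Acquire load: reads after this point cannot be ordered before it. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;  /* spin */

    int seen = payload;   /* ordered after the acquire load above */
    pthread_join(t, NULL);
    return seen;
}
```

Whether the acquire load costs an extra instruction depends on the target; the source code stays the same and the compiler emits whatever the architecture's memory model requires.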

--
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>

Nick Maclaren
Posted: Tue Sep 13, 2005 8:15 am    Post subject: Re: Using hierarchical memory as an acquire memory barrier f

In article <TNpVe.24147$k22.15331@fe2.news.blueyonder.co.uk>,
David Hopwood <david.nospam.hopwood@blueyonder.co.uk> wrote:
> > > The "something that isn't a performance bottleneck" was referring to
> > > acquire memory barriers. Sorry if that wasn't clear.
> >
> > Ah. No, I didn't get that.
> >
> > However, my statement stands, even for those, though I don't see
> > it that often. It is a major issue for OpenMP and POSIX threads
> > codes that attempt to deliver small-grain parallelism.
>
> I don't think that the cost of acquire barriers, specifically, is the
> cause of any performance problems with OpenMP and pthreads in providing
> small-grain parallelism. Note that these barriers are almost certainly
> not needed anyway on x86[-64], PPC or SPARC.

Not acquire barriers in the very limited sense, no. But the logical
extension of them from the ISA to the HLL interface. I agree that
fiddling with the semantics to cure a hardware non-problem is not
worth the bother, but doing so to address the very real problem I
am referring to is.

In particular, if you make cache and TLB control explicit, an OpenMP
program (not POSIX threads, which is beyond redemption) could insert
the relevant calls at the relevant places. With competent hardware
design, you could even get - heresy - checking!

But I agree that what I was referring to was rather more ambitious
than what most other people may have been thinking of, despite the
fact that it IS the same issue, viewed in the large.


Regards,
Nick Maclaren.

Seongbae Park
Posted: Tue Sep 13, 2005 8:21 am    Post subject: Re: Using hierarchical memory as an acquire memory barrier f

Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
> In particular, if you make cache and TLB control explicit, an OpenMP

Do you really believe that programmers can handle
a full explicit control of cache
which could very well cause a hang or extreme starvation ?
I'm not even sure runtime and compiler writers would be able to handle it
very well
- that's maybe because I haven't given a lot of thought to it though.

Shouldn't they (both general programmers and compiler/runtime writers)
be much better off with an "advice" mechanism ?
Prefetch is one such advice, and you can easily imagine
the reverse of prefetch, and prefetch (and reverse) for TLB as well.
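
As an illustration of what prefetch-as-advice looks like in practice, here is a minimal sketch assuming a GCC- or Clang-compatible compiler, since `__builtin_prefetch` is a compiler builtin rather than ISO C. The function name `sum_with_prefetch` and the distance `PREFETCH_AHEAD` are hypothetical; the distance would need tuning for any real machine. The hint is purely advisory: the hardware may ignore it, and the result is the same either way.

```c
#include <assert.h>
#include <stddef.h>

enum { PREFETCH_AHEAD = 16 };  /* hypothetical distance in elements, untuned */

/* Sums an array while hinting that elements further ahead will be read soon. */
long sum_with_prefetch(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* Second argument 0 = read access, third argument 1 = low temporal locality. */
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 1);
        }
        total += a[i];
    }
    return total;
}
```

No portable counterpart to the "reverse of prefetch" is shown here; an eviction hint would be architecture-specific.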

> program (not POSIX threads, which is beyond redemption) could insert
> the relevant calls at the relevant places. With competent hardware
> design, you could even get - heresy - checking!
>
> But I agree that what I was referring to was rather more ambitious
> than what most other people may have been thinking of, despite the
> fact that it IS the same issue, viewed in the large.

BTW, explicit control of TLB is not impossible on the current generation
of programs - most modern hardware supports enough features to make
it possible (even if in a limited fashion)
for system software to provide such a feature,
although they are generally not implemented
nor provided to the user level software in a generally useful form.
--
#pragma ident "Seongbae Park, compiler, http://blogs.sun.com/seongbae/"

Nick Maclaren
Posted: Tue Sep 13, 2005 2:52 pm    Post subject: Re: Using hierarchical memory as an acquire memory barrier f

In article <dg625n$5sf$1@news1nwk.SFbay.Sun.COM>, Seongbae Park <Seongbae.Park@Sun.COM> writes:
|> Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
|> > In particular, if you make cache and TLB control explicit, an OpenMP
|>
|> Do you really believe that programmers can handle
|> a full explicit control of cache
|> which could very well cause a hang or extreme starvation ?

A few of us could, but I agree very few.

|> I'm not even sure runtime and compiler writers would be able to handle it
|> very well
|> - that's maybe because I haven't given a lot of thought to it though.

I am sure that they can't, and I have, plus I have a lot of experience!

What you may have missed is that it is ALREADY assumed for things
like OpenMP and POSIX threads. Because they have been attempted
on systems where the hardware does not make any provision for such
program control, they are at best a major headache and often simply
don't work.

There are solutions to this, of course, but only radical ones.

|> Shouldn't they (both general programmers and compiler/runtime writers)
|> be much better off with an "advice" mechanism ?

No. Universal experience is that such systems are at best a waste
of time and quite often are a disaster.

|> BTW, explicit control of TLB is not impossible on the current generation
|> of programs - most modern hardware supports enough features to make
|> it possible (even if in a limited fashion)
|> for system software to provide such a feature,
|> although they are generally not implemented
|> nor provided to the user level software in a generally useful form.

It's impractical, because it is available only at an extreme level
of privilege, and the checking needed to allow program control
makes it prohibitively expensive. Note that it isn't JUST the
actual control, but preventing things like asynchronous system
calls, system calls in other threads and interrupt handlers from
having kernel code broken by user action.


Regards,
Nick Maclaren.

Thomas Lindgren
Posted: Tue Sep 13, 2005 4:15 pm    Post subject: Re: Using hierarchical memory as an acquire memory barrier f

Seongbae Park <Seongbae.Park@Sun.COM> writes:

> Do you really believe that programmers can handle
> a full explicit control of cache
> which could very well cause a hang or extreme starvation ?
> I'm not even sure runtime and compiler writers would be able to handle it
> very well
> - that's maybe because I haven't given a lot of thought to it though.

OK, maybe this has been mentioned earlier, but ...

I'd say "it depends". Some architectures, such as the Playstation 2
one, have a fast special-purpose memory ("scratchpad"). Using a memory
abstraction seems simpler and easier than explicit cache and/or TLB
management.

In Linux on Cell, it appears programmers allocate the vector engines,
SPEs, as devices, which means there is no need to virtualize
them. Maybe the same could be done for a fast memory? (Plus some sort
of mmap() to access it.) An explicitly managed TLB could perhaps be
allocated and used as if a device too. For instance, the mmap() call
could also optionally ask for a locked TLB entry if the "TLB device"
can be used. Plenty of options possible here.

On the face of it, such an API seems _reasonably_ straightforward to
implement and use. Programmers will have to handle the case when the
fast memory is not available, of course, but the fast memory is really
just a performance optimization. (What to do if there are many banks
of fast memory with different characteristics? Left as an exercise
for the budding ASPLOS author.)
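
A rough sketch of how such an API might look from user code, assuming a POSIX system. The device node `/dev/fastmem` and the names `struct buffer`, `buffer_alloc` and `buffer_free` are hypothetical; only `open`, `mmap`, `munmap`, `close`, `malloc` and `free` are real calls. The point is the fallback path: if the fast memory cannot be opened or mapped, the program silently uses ordinary memory instead.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical device node for the fast memory; no such standard path exists. */
#define FASTMEM_DEVICE "/dev/fastmem"

struct buffer {
    void  *addr;
    size_t len;
    int    is_fast;   /* 1 if mapped from the fast-memory device, 0 if from malloc */
};

/* Try the fast memory first, then fall back to the heap. Returns 0 on success. */
int buffer_alloc(struct buffer *b, size_t len) {
    assert(b != NULL && len > 0);
    b->len = len;
    b->is_fast = 0;

    int fd = open(FASTMEM_DEVICE, O_RDWR);
    if (fd >= 0) {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);   /* an established mapping stays valid after close */
        if (p != MAP_FAILED) {
            b->addr = p;
            b->is_fast = 1;
            return 0;
        }
    }

    /* Fast memory unavailable: it was only a performance optimization. */
    b->addr = malloc(len);
    return b->addr != NULL ? 0 : -1;
}

/* Release the buffer with the call that matches how it was obtained. */
void buffer_free(struct buffer *b) {
    if (b->is_fast)
        munmap(b->addr, b->len);
    else
        free(b->addr);
}
```

Callers can inspect `is_fast` if they want to pick a different algorithm for the slow path, but nothing forces them to.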

Best,
Thomas
--
Thomas Lindgren
"It's becoming popular? It must be in decline." -- Isaiah Berlin

Joe Seigh
Posted: Tue Sep 13, 2005 4:15 pm    Post subject: Re: Using hierarchical memory as an acquire memory barrier f

Seongbae Park wrote:
> Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
> > In particular, if you make cache and TLB control explicit, an OpenMP
>
> Do you really believe that programmers can handle
> a full explicit control of cache
> which could very well cause a hang or extreme starvation ?
> I'm not even sure runtime and compiler writers would be able to handle it
> very well
> - that's maybe because I haven't given a lot of thought to it though.

You have a different idea of what explicit control means, so I don't
know what mechanism you imagine could cause those kinds of problems.

I think a lot of the problem with explicit cache control in the old
days was that it wasn't coherent, and you either lost updates to memory or
updates showed up in the wrong order according to the memory model,
if any (there probably was no memory model).

You're saying you don't trust programmers to be able to implement an api
correctly or efficiently and that the api should be implemented by
hardware types. Ok, no problem. I don't think application programmers
care about where or how the api is implemented. But you can't cheat by
simplifying or not implementing the full api. Java is probably the
closest to this and it still looks like an ISA w/ memory model and the
api's still look like software with a lot of Java Virtual Machine specific
JNI magic.

NUMA and ccNUMA look pretty close to what explicit cache control might
look like. A lot of copying of global memory to/from local memory
for processing locally.
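
A minimal sketch of that copy-in, process, copy-out style, in plain C. The names `scale_via_local` and `TILE` are hypothetical, and an ordinary stack array stands in for the local memory; on a real NUMA or local-store machine the local buffer would be allocated from node-local or on-chip memory by some platform-specific means not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TILE = 256 };   /* hypothetical capacity of the local memory, in elements */

/* Multiplies every element of a global array by k, one local tile at a time. */
void scale_via_local(float *global, size_t n, float k) {
    assert(global != NULL || n == 0);
    float local[TILE];   /* stand-in for explicitly managed local memory */

    for (size_t base = 0; base < n; base += TILE) {
        size_t len = (n - base < TILE) ? n - base : TILE;

        memcpy(local, global + base, len * sizeof *local);   /* copy in */
        for (size_t i = 0; i < len; i++)
            local[i] *= k;                                   /* process locally */
        memcpy(global + base, local, len * sizeof *local);   /* copy out */
    }
}
```

The explicit copies are where any coherence or ordering obligations would have to be met, which is why the shape resembles explicit cache control.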

I look forward to hardware api's for multi-threading and parallel programming
that scale well for the 100+ multicore processors they claim we'll have in
10 years. And we need the api's as soon as possible. You can't rewrite
millions of lines of code in a few weeks.


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Alexander Terekhov
Posted: Tue Sep 13, 2005 10:10 pm    Post subject: Re: Using hierarchical memory as an acquire memory barrier f

David Hopwood wrote:
[...]
> I don't think that the cost of acquire barriers, ... Note that these
> barriers are almost certainly not needed anyway on x86[-64], PPC

You probably mean ddacq in conjunction with loads (RMW stuff aside
for a moment). Because extra sync is certainly needed for both ccacq_*
and plain acq loads on PPC.
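
For illustration, a sketch of a data-dependent acquire, on the assumption that "ddacq" corresponds roughly to what C11 calls `memory_order_consume`; that mapping is an assumption, not something stated in this thread. The names `struct config`, `publish_config` and `read_threshold` are hypothetical. The reader's dereference depends on the value of the pointer it loaded, so ordering follows from the data dependency rather than from a separate barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct config { int threshold; };

static struct config current_cfg;
static _Atomic(struct config *) published;   /* NULL until publish_config() runs */

/* Writer side: fill in the object, then publish its address with release. */
void publish_config(int threshold) {
    current_cfg.threshold = threshold;
    atomic_store_explicit(&published, &current_cfg, memory_order_release);
}

/* Reader side: the dereference of p carries a data dependency on the load,
 * so a consume load is sufficient to order it after the pointer load. */
int read_threshold(int fallback) {
    struct config *p = atomic_load_explicit(&published, memory_order_consume);
    return p != NULL ? p->threshold : fallback;
}
```

In real use the two functions would run on different threads. Current compilers commonly treat `memory_order_consume` as `memory_order_acquire`, so this shows the intent of the distinction rather than a guaranteed saving.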

regards,
alexander.

All times are GMT