Using hierarchical memory as an acquire memory barrier for d
CASTalk.com Forum Index CASTalk.com
Discussion of DSP, FPGA, storage and embedded system.
 
 FAQFAQ   MemberlistMemberlist     RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 
 
Google
 
Web castalk.com
Using hierarchical memory as an acquire memory barrier for d
Goto page 1, 2  Next
 
Post new topic   Reply to topic    CASTalk.com Forum Index -> Computer Architecture
Author Message
Joe Seigh
Guest





Posted: Sun Sep 11, 2005 9:29 pm    Post subject: Using hierarchical memory as an acquire memory barrier for d Reply with quote

While perusing patent applications for lock-free techniques, I noticed the one
by Paul McKenney titled "Software implementation of synchronous memory Barriers"
(20020194436) which basically describes an RCU like technique for eliminating the
need for acquire memory barriers by using signals to force them on a as needed
basis. That explains that part of the discussion on RCU+SMR when it occurred on
LKML.

I seems to me you could use the memory hierarchy to accomplish the same
effect. I'm assuming that out of order dependent memory accesses won't cross
memory hierarchy boundaries.

There's two levels of the memory hierarchy we could exploit and possibly without
any special kernel support. One is cache and the other is virtual memory.
Basically, when memory is freed with proper release semantics, either explicit
or supplied by RCU, you use another form of RCU to determine when all
processors have flushed or purged references to that memory from their
cache or from their translation lookaside buffer. The purge/flush instructions
are the quiescent states for the second form of RCU. Once all processors
has quiesced (dropped references) it is safe for writer threads to use that
memory for updates using RCU.

The problem is that as soon as the writer thread starts accessing that memory,
you can start polluting cache and/or the translation lookaside buffer with no
assurance that they will be flushed if a reader thread is dispatched on that
processor before the writer thread completes its updates to memory. The
way around that problem is to use two different memory mappings for the
same memory, one for use by writer threads and one for use by reader threads.
This trick depends on cache working on virtual memory not real memory if
you're using cache as the memory hierarchy level. Also addresses stored
in memory should valid in the reader memory mapping if you don't want
readers doing offset/address to address conversion.

Granularity of memory management would be the cache line size or page size
depending on which mechanism you are using.

On one level, this is similar to what virtual memory managers do when managing
real memory. In fact one of the earliest uses of an RCU like technique was by
MVS to determine when it was safe to reallocate real memory to another
mapping with SIGP signaling to speed things up if it needed to steal pages.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
Back to top
David Hopwood
Guest





Posted: Mon Sep 12, 2005 12:15 am    Post subject: Re: Using hierarchical memory as an acquire memory barrier f Reply with quote

Joe Seigh wrote:
Quote:
While perusing patent applications for lock-free techniques, I noticed
the one by Paul McKenney titled "Software implementation of synchronous
memory Barriers" (20020194436) which basically describes an RCU like
technique for eliminating the need for acquire memory barriers by using
signals to force them on a as needed basis. That explains that part of
the discussion on RCU+SMR when it occurred on LKML.

I seems to me you could use the memory hierarchy to accomplish the same
effect. I'm assuming that out of order dependent memory accesses won't
cross memory hierarchy boundaries.

There's two levels of the memory hierarchy we could exploit and possibly
without any special kernel support. One is cache and the other is virtual
memory. Basically, when memory is freed with proper release semantics, either
explicit or supplied by RCU, you use another form of RCU to determine when
all processors have flushed or purged references to that memory from their
cache or from their translation lookaside buffer. The purge/flush
instructions are the quiescent states for the second form of RCU. Once all
processors has quiesced (dropped references) it is safe for writer threads
to use that memory for updates using RCU.

The problem is that as soon as the writer thread starts accessing that
memory, you can start polluting cache and/or the translation lookaside buffer
with no assurance that they will be flushed if a reader thread is dispatched on
that processor before the writer thread completes its updates to memory. The
way around that problem is to use two different memory mappings for the
same memory, one for use by writer threads and one for use by reader
threads.
This trick depends on cache working on virtual memory not real memory if
you're using cache as the memory hierarchy level.

Madness. Reading all that patent obfuscated English has addled your brain.

Cache is supposed to be a transparent abstraction. So is the TLB (software
TLBs notwithstanding). Breaking that will break anyone's ability to understand
what the system is doing, just in order to try (without necessarily succeeding)
to optimize something that isn't a performance bottleneck.

--
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>
Back to top
Joe Seigh
Guest





Posted: Mon Sep 12, 2005 12:15 am    Post subject: Re: Using hierarchical memory as an acquire memory barrier f Reply with quote

David Hopwood wrote:
Quote:

Madness. Reading all that patent obfuscated English has addled your brain.

Some of the technical publications aren't any better. :) Most of the patents
are on techniques published by the inventors.

Quote:

Cache is supposed to be a transparent abstraction. So is the TLB (software
TLBs notwithstanding). Breaking that will break anyone's ability to
understand
what the system is doing, just in order to try (without necessarily
succeeding)
to optimize something that isn't a performance bottleneck.


You don't need it on most systems since they have dependent load ordering.
They're prefectly capable of running slow without any extra assistance.
But this would allow a more relaxed cache protocol / memory model to be
used that would allow improved hardware performance. Anyway, after
the debacle with getting the x86 memory model semantics confused with its
implementation and attempts at hacks to improve global load ordering,
I would think everyone would be more attuned to the issue.


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
Back to top
Nick Maclaren
Guest





Posted: Mon Sep 12, 2005 1:46 pm    Post subject: Re: Using hierarchical memory as an acquire memory barrier f Reply with quote

In article <wQ2Ve.21920$k22.1912@fe2.news.blueyonder.co.uk>,
David Hopwood <david.nospam.hopwood@blueyonder.co.uk> writes:
|>
|> Cache is supposed to be a transparent abstraction. So is the TLB (software
|> TLBs notwithstanding). Breaking that will break anyone's ability to understand
|> what the system is doing, just in order to try (without necessarily succeeding)
|> to optimize something that isn't a performance bottleneck.

Don't be so certain of the last. I see both cache and TLB handling
being a performance bottleneck on a daily basis, and one of the
solutions to this would involve making them more visible.


Regards,
Nick Maclaren.
Back to top
David Hopwood
Guest





Posted: Mon Sep 12, 2005 4:15 pm    Post subject: Re: Using hierarchical memory as an acquire memory barrier f Reply with quote

Nick Maclaren wrote:
Quote:
David Hopwood <david.nospam.hopwood@blueyonder.co.uk> writes:
|
|> Cache is supposed to be a transparent abstraction. So is the TLB (software
|> TLBs notwithstanding). Breaking that will break anyone's ability to understand
|> what the system is doing, just in order to try (without necessarily succeeding)
|> to optimize something that isn't a performance bottleneck.

Don't be so certain of the last. I see both cache and TLB handling
being a performance bottleneck on a daily basis, and one of the
solutions to this would involve making them more visible.

The "something that isn't a performance bottleneck" was referring to
acquire memory barriers. Sorry if that wasn't clear.

--
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>
Back to top
Alexander Terekhov
Guest





Posted: Mon Sep 12, 2005 4:15 pm    Post subject: Re: Using hierarchical memory as an acquire memory barrier f Reply with quote

Joe Seigh wrote:
[...]
Quote:
Which is why I used the word "most". Dependent load ordering isn't part
of any of the official memory models as far as I know so the behavior
was allowable.

Your knowledge is inaccurate.

Quote:
It's also allowable on x86

In Alpha speak, x86 loads from WB memory (I mean processor consistency)
all have trailing "MB".

regards,
alexander.
Back to top
Alexander Terekhov
Guest





Posted: Mon Sep 12, 2005 4:15 pm    Post subject: Re: Using hierarchical memory as an acquire memory barrier f Reply with quote

Joe Seigh wrote:
[...]
Quote:
You don't need it on most systems since they have dependent load ordering.

Compilers can turn dd into cc and reorder, "incorrect" software value
prediction aside for a moment. AFAICS, 20020194436 is about hacking

http://www.cs.umd.edu/~pugh/java/memoryModel/AlphaReordering.html

It won't help against "incorrect" hardware value prediction, I'm afraid.

http://www.cs.wisc.edu/~cain/pubs/micro01_correct_vp.pdf

regards,
alexander.
Back to top
Joe Seigh
Guest





Posted: Mon Sep 12, 2005 4:15 pm    Post subject: Re: Using hierarchical memory as an acquire memory barrier f Reply with quote

Alexander Terekhov wrote:
Quote:
Joe Seigh wrote:
[...]

You don't need it on most systems since they have dependent load ordering.


Compilers can turn dd into cc and reorder, "incorrect" software value
prediction aside for a moment. AFAICS, 20020194436 is about hacking

The OP is addressing hardware not compiler issues.
Quote:

http://www.cs.umd.edu/~pugh/java/memoryModel/AlphaReordering.html

Which is why I used the word "most". Dependent load ordering isn't part
of any of the official memory models as far as I know so the behavior
was allowable. It's also allowable on x86 so it's conceivable that
future Intel or AMD processors could require memory barriers to guarantee
dependent load ordering. Since that hasn't happened yet, it's impossible
to tell how expensive the memory barrier would be. Since multi-threaded
synchronization mechanisms have certain assumptions of efficiency, it's
always good to have alternative implementations as part of your back up
plans. It's a robustness of design issue.

Quote:
It won't help against "incorrect" hardware value prediction, I'm afraid.

Not at the cache level but you could do it that the virtual memory level
by mapping the new data into the reader memory address range as part of
the release operation. It's probably more expensive than you want it to
be but it has to work because that's what the virtual memory manager does
when it pages in from backing store.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
Back to top
Joe Seigh
Guest





Posted: Mon Sep 12, 2005 4:15 pm    Post subject: Re: Using hierarchical memory as an acquire memory barrier f Reply with quote

Joe Seigh wrote:

[...]
Quote:

I seems to me you could use the memory hierarchy to accomplish the same
effect. I'm assuming that out of order dependent memory accesses won't
cross
memory hierarchy boundaries.

There's two levels of the memory hierarchy we could exploit and possibly
without
any special kernel support. One is cache and the other is virtual memory.
Basically, when memory is freed with proper release semantics, either
explicit
or supplied by RCU, you use another form of RCU to determine when all
processors have flushed or purged references to that memory from their
cache or from their translation lookaside buffer. The purge/flush
instructions
are the quiescent states for the second form of RCU. Once all processors
has quiesced (dropped references) it is safe for writer threads to use that
memory for updates using RCU.

[...]


Some forms of speculative execution might break this though by fetching
unrelated data into cache. The most likely would be array traversal
since it can generate addresses not actually valid for the program in
question.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
Back to top
Sander Vesik
Guest





Posted: Mon Sep 12, 2005 4:17 pm    Post subject: Re: Using hierarchical memory as an acquire memory barrier f Reply with quote

In comp.arch Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
Quote:

In article <wQ2Ve.21920$k22.1912@fe2.news.blueyonder.co.uk>,
David Hopwood <david.nospam.hopwood@blueyonder.co.uk> writes:
|
|> Cache is supposed to be a transparent abstraction. So is the TLB (software
|> TLBs notwithstanding). Breaking that will break anyone's ability to understand
|> what the system is doing, just in order to try (without necessarily succeeding)
|> to optimize something that isn't a performance bottleneck.

Don't be so certain of the last. I see both cache and TLB handling
being a performance bottleneck on a daily basis, and one of the
solutions to this would involve making them more visible.

However, thats not what the original poster wanted to optimise - he
wanted to get some benefits for RCU by effectively breaking cache
transparency.

Quote:


Regards,
Nick Maclaren.



--
Sander

+++ Out of cheese error +++
Back to top
Joe Seigh
Guest





Posted: Mon Sep 12, 2005 9:39 pm    Post subject: Re: Using hierarchical memory as an acquire memory barrier f Reply with quote

Alexander Terekhov wrote:
Quote:
Joe Seigh wrote:
[...]

Which is why I used the word "most". Dependent load ordering isn't part
of any of the official memory models as far as I know so the behavior
was allowable.


Your knowledge is inaccurate.


I meant the specification. I try not to conflate specification and implementation.


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
Back to top
Joe Seigh
Guest





Posted: Mon Sep 12, 2005 9:55 pm    Post subject: Re: Using hierarchical memory as an acquire memory barrier f Reply with quote

Sander Vesik wrote:
Quote:
In comp.arch Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:

In article <wQ2Ve.21920$k22.1912@fe2.news.blueyonder.co.uk>,
David Hopwood <david.nospam.hopwood@blueyonder.co.uk> writes:
|
|> Cache is supposed to be a transparent abstraction. So is the TLB (software
|> TLBs notwithstanding). Breaking that will break anyone's ability to understand
|> what the system is doing, just in order to try (without necessarily succeeding)
|> to optimize something that isn't a performance bottleneck.

Don't be so certain of the last. I see both cache and TLB handling
being a performance bottleneck on a daily basis, and one of the
solutions to this would involve making them more visible.


However, thats not what the original poster wanted to optimise - he
wanted to get some benefits for RCU by effectively breaking cache
transparency.


Nick is more correct. I was proposing it as a mechanism for preserving
some of RCU's (or that of lock-free in general) performance if
hardware was changed to improve its performance by relaxing some of the
strict conditions it has to meet now. E.g. relaxing the cache protocols
could improve performance by reducing cache invalidates if the de facto
memory model did not require it.

RCU was developed by Paul McKenney to allow in part, I think, better
performance in a NUMA environment. If you exposed and possibly changed
cache behavior in a so called UMA enviroment you could get some of the
same benefits.


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
Back to top
Alexander Terekhov
Guest





Posted: Mon Sep 12, 2005 10:06 pm    Post subject: Re: Using hierarchical memory as an acquire memory barrier f Reply with quote

Joe Seigh wrote:
[...]
Quote:
I meant the specification.

x86 loads from WB memory are loads ".acq" per specification of processor
consistency given in

http://research.compaq.com/wrl/people/kourosh/papers/1993-tr-68.pdf

I mean

http://groups.google.com/group/comp.programming.threads/msg/88755b3cd32f9f8c

regards,
alexander.
Back to top
Alexander Terekhov
Guest





Posted: Mon Sep 12, 2005 10:24 pm    Post subject: Re: Using hierarchical memory as an acquire memory barrier f Reply with quote

Joe Seigh wrote:
[...]
Quote:
I try not to conflate specification and implementation.

Conflation or non-conflation aside for a moment, I'm just curious how
your lock-free sem_getvalue() for x86 would look like. Care to share?

regards,
alexander.
Back to top
Joe Seigh
Guest





Posted: Mon Sep 12, 2005 11:00 pm    Post subject: sem_getvalue (was Re: Using hierarchical memory ...) Reply with quote

(followup set to comp.programming.threads)

Alexander Terekhov wrote:
Quote:
Joe Seigh wrote:
[...]

I try not to conflate specification and implementation.


Conflation or non-conflation aside for a moment, I'm just curious how
your lock-free sem_getvalue() for x86 would look like. Care to share?


Depends. The documentation for that is a little strange and it's not
clear if sem_getvalue has any practical use.

To be on the safe side you might have to use the same synchronization
that the rest of the semaphore implementation uses to set the semaphore
value. Though I might be inclined, if it looks like all you could use
sem_getvalue for is as a simple event synchronization object, to just
to do a simple load followed by a mfence, acquire semantics. It's basically
a question of what the observable behavior is no matter what the
documentation says.

I see you've been looking for places to use your compare and
swap "synchronized" load trick. :)



--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
Back to top
 
Post new topic   Reply to topic    CASTalk.com Forum Index -> Computer Architecture All times are GMT
Goto page 1, 2  Next
Page 1 of 2

 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum




VoIP Electronics Powered by phpBB