CAS and LL/SC (was Re: High Level Assembler for MVS & VM & V
CASTalk.com Forum Index CASTalk.com
Discussion of DSP, FPGA, storage and embedded system.
 
 FAQFAQ   MemberlistMemberlist     RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 
 
Google
 
Web castalk.com
CAS and LL/SC (was Re: High Level Assembler for MVS & VM & V
Goto page Previous  1, 2, 3, 4 ... 24, 25, 26  Next
 
Post new topic   Reply to topic    CASTalk.com Forum Index -> Computer Architecture
Author Message
Nick Maclaren
Guest





Posted: Thu Dec 16, 2004 7:29 pm    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM Reply with quote

In article <41C193B6.8F090A20@xemaps.com>,
Joe Seigh <jseigh_01@xemaps.com> wrote:
Quote:

Do you mean atomic fetch-operate-and-store? While those are very
useful (e.g. they are immune to contention-induced livelock), they
need some support from the memory system.

I didn't coin the term. I believe it's just a summary of the names that were
used, e.g. fetch-and-add, fetch-and-increment, fetch-and-set, fetch-and-or, etc...

That's not what I asked. I was asking if you were referring to the
atomic operations, or merely composite memory and action ones. There
is a lot of difference.

Quote:
The Itanium fetchadd looks like it could operate either way, either by locking
cache lines or via the memory subsystem.

Locking cache lines needs support from the memory subsystem.

Quote:
I generally avoid using techniques that have those problems. Lock-free techniques
don't require forward progress by other threads so whether they're on the same
cpu or some other cpu doesn't make any difference.

A huge number of problems are caused by ambiguous specifications.
You may avoid some of the issues, but you can't avoid all of them.
And other people may develop on a system which supports more
invariants, and so have trouble when porting.

I am afraid that you are wrong about lock-free techniques, because
their higher levels often DO require such progress. Consider the
simple case of server and client threads using a shared chain or
counter. Without such progress, the algorithm will travel from one
limit to the other, and not run smoothly; that is often unacceptable.

Quote:
Just what I think the trend is. Obviously if software implemented shared memory
has been a wild sucess instead of a disaster, it couldn't be a trend. We'd be
there already. Bottom line, probability of sucessfully implementing software
shared memory 0.00 ... (as many zeros as you want) ... 01. Probability of
hardware designers listening to suggestions from software types, 0.0.

Well, you are welcome to your opinion, but I will predict that it will
not happen, for the reasons I said.


Regards,
Nick Maclaren.
Back to top
Nick Maclaren
Guest





Posted: Thu Dec 16, 2004 7:38 pm    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM Reply with quote

In article <Pine.LNX.4.58L.0412152127110.23256@blysk.ds.pg.gda.pl>,
Maciej W. Rozycki <macro@linux-mips.org> wrote:
Quote:

For many OSes (including Linux), you can't guarantee any particular
timing of execution anyway as it may depend on external events like
processing of hardware interrupts which "steal" time from user processes.
The latency effect of emulating LL and SC is not much different from
taking a hardware interrupt between LL and SC (for the SC failing case) or
either immediately before LL or immediately after SC (for the SC
succeeding case). And you can have interrupts that happen at
deterministic intervals.

That's far too much in black and white. The objective isn't
to guarantee a timing, but to produce an acceptable distribution of
timings. Changing relative timings by a factor of 100 can change the
distribution even more drastically.

Consider one-sided 'locking' (e.g. variants on RCU, software CAS etc.,
where not all threads take out the 'lock'). These very often work
well, provided that the unlocked updates can't block the locked ones
out indefinitely. Implementing the latter with system calls can void
that assumption.


Regards,
Nick Maclaren.
Back to top
John Mashey
Guest





Posted: Thu Dec 16, 2004 8:53 pm    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM Reply with quote

Joe Seigh wrote:
Probability of
Quote:
hardware designers listening to suggestions from software types, 0.0.

That is total nonsense. There exist large numbers of hardware
designers who listen very hard to suggestions from software people. In
fact, in my experience [a software guy who's spent a lot of years
working with hardware people], there are many hardware designers who
are desperate for timely, knowledgable input from software people,
especially ones who know enough about hardware design to understand the
difference between "inexpensive&fast" and "really expensive", or at
least be willing to learn.

Some aren't, and there have been notable malarchitectures as a result,
but most are quite willing to listen to well-informed input.
Back to top
Joe Seigh
Guest





Posted: Thu Dec 16, 2004 11:07 pm    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM Reply with quote

Nick Maclaren wrote:
Quote:

In article <41C193B6.8F090A20@xemaps.com>,
Joe Seigh <jseigh_01@xemaps.com> wrote:

Do you mean atomic fetch-operate-and-store? While those are very
useful (e.g. they are immune to contention-induced livelock), they
need some support from the memory system.

I didn't coin the term. I believe it's just a summary of the names that were
used, e.g. fetch-and-add, fetch-and-increment, fetch-and-set, fetch-and-or, etc...

That's not what I asked. I was asking if you were referring to the
atomic operations, or merely composite memory and action ones. There
is a lot of difference.

Well, there's ones like Pentium "lock xadd" in which the processor locks
the cache line, fetchs the contents of the memory word, adds to it, stores
the result back to memory, and unlocks the cache line.

Then there are ones where not only can you do fetchs, stores, cache line
locking and sychronization, etc... but you can send a request to memory
to perform a certain action on memory and return the contents of memory
(usually the value prior to the operation). The Itanium seems to have
the ability to do it either way depending on what kind of memory it has
attached.

From a programming point of view they're both atomic.

Quote:

....

I generally avoid using techniques that have those problems. Lock-free techniques
don't require forward progress by other threads so whether they're on the same
cpu or some other cpu doesn't make any difference.

A huge number of problems are caused by ambiguous specifications.
You may avoid some of the issues, but you can't avoid all of them.
And other people may develop on a system which supports more
invariants, and so have trouble when porting.

I am afraid that you are wrong about lock-free techniques, because
their higher levels often DO require such progress. Consider the
simple case of server and client threads using a shared chain or
counter. Without such progress, the algorithm will travel from one
limit to the other, and not run smoothly; that is often unacceptable.

I'm well aware of that problem. From a hardware point of view, there's nothing
I can do about it since I have no influence on hardware design. From an OS kernel
point of view, nothing there either unless I want to modify the kernel but at least
that's possbile with Linux. Usually, working around the problem is the easiest
course of action.

Plus, I usually sanity check things. If I develope a lock-free technique, I'll performance
test it against one or more conventional solutions and against other lock-free solutions.
The results aren't always what you'd expect. You can learn a lot from what the os is doing
wrong sometimes.

Joe Seigh
Back to top
Ian Shef
Guest





Posted: Fri Dec 17, 2004 12:58 am    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM Reply with quote

Terje Mathisen <terje.mathisen@hda.hydro.com> wrote in
news:cpq9h4$dfa$1@osl016lin.hda.hydro.com:

Quote:
Ian Shef wrote:
No, it is still possible to provide locking mechanisms entirely in
software (assuming that the instruction set and other hardware provides
certain minimal capabilities - and MIPS R2000 can meet these
conditions). There are papers (I used to have one by Leslie Lamport,
if I remember the name correctly) that describe how to do this. This
is how you were supposed to perform locking on the MIPS R2000. (At
least there was some piece of MIPS R2000 documentation that provided
pointers to papers on software locking algorithms - that is how I found
the papers by Lamport).

AFAIR, Lamport's algorithm depends on having unlimited range counters!

This does not work for unlimited uptimes, but might be good enough,
particularly on 64-bit machines.

As far as I can remember, the algorithm that I saw had no counting [1]. It
had no limit as far as uptime. However, I no longer have the reference, so
I may be wrong.

[1] There was a version with counting, but it was used so that a resource
could be shared simulataneously by M processes out of N (and not just the 1
out of N case). However, the count was not larger than about M, so there
was no practical limit - and it still wasn't an uptinme issue.

Darn, I really need to find that paper!

.... and in response to a nother part of this thread: I am just a dumb
hardware designer, but I the percentage of time that I listen to software
folks is significantly greater than 0%.


--
Ian Shef 805/F6 * These are my personal opinions
Raytheon Company * and not those of my employer.
PO Box 11337 *
Tucson, AZ 85734-1337 *
Back to top
Ian Shef
Guest





Posted: Fri Dec 17, 2004 2:41 am    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM Reply with quote

Ian Shef <invalid@avoiding.spam> wrote in
news:Xns95C18411BA517vaj4088ianshef@138.126.254.210:

Quote:
Terje Mathisen <terje.mathisen@hda.hydro.com> wrote in
news:cpq9h4$dfa$1@osl016lin.hda.hydro.com:

Ian Shef wrote:
No, it is still possible to provide locking mechanisms entirely in
software (assuming that the instruction set and other hardware
provides certain minimal capabilities - and MIPS R2000 can meet these
conditions). There are papers (I used to have one by Leslie Lamport,
if I remember the name correctly) that describe how to do this. This
is how you were supposed to perform locking on the MIPS R2000. (At
least there was some piece of MIPS R2000 documentation that provided
pointers to papers on software locking algorithms - that is how I
found the papers by Lamport).

AFAIR, Lamport's algorithm depends on having unlimited range counters!

This does not work for unlimited uptimes, but might be good enough,
particularly on 64-bit machines.

As far as I can remember, the algorithm that I saw had no counting [1].
It had no limit as far as uptime. However, I no longer have the
reference, so I may be wrong.

[1] There was a version with counting, but it was used so that a
resource could be shared simulataneously by M processes out of N (and
not just the 1 out of N case). However, the count was not larger than
about M, so there was no practical limit - and it still wasn't an
uptinme issue.

Darn, I really need to find that paper!

... and in response to a nother part of this thread: I am just a dumb
hardware designer, but I the percentage of time that I listen to
software folks is significantly greater than 0%.



I found the basic paper:

A Fast Mutual Exclusion Algorithm
Leslie Lamport
November 14, 1985
revised October 31, 1986

at: http://research.microsoft.com/users/lamport/pubs/fast-mutex.pdf

and it does not require range counters (at least, not that I can find). It
does require that each process have a unique identifier (e.g. an integer)
that can be read and written atomically. There is also an implicit
assumption that all operations occur in the specified order.
This is the 1 out of N case; I have not found the more general M out of N
algorithm yet.

--
Ian Shef 805/F6 * These are my personal opinions
Raytheon Company * and not those of my employer.
PO Box 11337 *
Tucson, AZ 85734-1337 *
Back to top
John Mashey
Guest





Posted: Fri Dec 17, 2004 4:12 am    Post subject: Re: ulocks.h (was: Re: CAS and LL/SC) Reply with quote

re: ulocks.h

At MIPS, we should have defined a set no later than 2Q86, recommended
across MIPS customers.

A joint team of MIPS & SGI folks did a SYS V port, then each group took
the OS in its own different directions, for various reasons.

The SGI ulocks.h set was done sometime later, I don't recall when.

=====
To reiterate the point: whenver possible, it is a really, really good
idea to define a standard API for something, so that software gets
written using it, that hides the details of lower level hardware, so
that you can improve things over time without disrupting existing
software, and so new features are actually used in practice quicker.

One of my favorite examples of an exceptionally well-thought-out API
stack would be OpenGL, which manged to sruvive huge improvements in
hardware, and made some sense across a huge range from minimal graphics
hardware to extremely powerful hardware.
Another good example would be the various implementations of AS/400.
Back to top
John Mashey
Guest





Posted: Fri Dec 17, 2004 4:13 am    Post subject: Re: ulocks.h (was: Re: CAS and LL/SC) Reply with quote

re: ulocks.h

At MIPS, we should have defined a set no later than 2Q86, recommended
across MIPS customers.

A joint team of MIPS & SGI folks did a SYS V port, then each group took
the OS in its own different directions, for various reasons.

The SGI ulocks.h set was done sometime later, I don't recall when.

=====
To reiterate the point: whenver possible, it is a really, really good
idea to define a standard API for something, so that software gets
written using it, that hides the details of lower level hardware, so
that you can improve things over time without disrupting existing
software, and so new features are actually used in practice quicker.

One of my favorite examples of an exceptionally well-thought-out API
stack would be OpenGL, which manged to sruvive huge improvements in
hardware, and made some sense across a huge range from minimal graphics
hardware to extremely powerful hardware.
Another good example would be the various implementations of AS/400.
Back to top
Joe Seigh
Guest





Posted: Fri Dec 17, 2004 4:19 am    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM Reply with quote

Ian Shef wrote:

Quote:
I found the basic paper:

A Fast Mutual Exclusion Algorithm
Leslie Lamport
November 14, 1985
revised October 31, 1986

at: http://research.microsoft.com/users/lamport/pubs/fast-mutex.pdf

and it does not require range counters (at least, not that I can find). It
does require that each process have a unique identifier (e.g. an integer)
that can be read and written atomically. There is also an implicit
assumption that all operations occur in the specified order.

"Algorithms for Mutual Exclusion" by Raynal has most of the distributed algorithms
including Lamport's and Peterson's.

Quote:
This is the 1 out of N case; I have not found the more general M out of N
algorithm yet.

M out of N? You mean like a read/write lock? I had a technique for making a

bakery algorithm into a reader/writer solution. I applied it to Hehner and
Shyamasundar's algorithm (also in Raynal's book), did a quick write up and
had it published in the April 1990 Operating Systems Review. It's obvious I
couldn't get anyone to review it and suggest editorial changes. I got the
impression that the editors of OSR weren't too happy about that. :)

Joe Seigh
Back to top
Dan Koren
Guest





Posted: Sun Dec 19, 2004 7:48 am    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM Reply with quote

"(John Mashey)" <old_systems_guy@yahoo.com> wrote in message
news:1103068097.926843.101230@c13g2000cwb.googlegroups.com...
Quote:
LL/SC: Livermore S-1.
Maybe there was something earlier thart had it, but
this is where I thought it came from.


LL/SC instructions were introduced with the CDC-6600.


Quote:
http://www.cs.clemson.edu/~mark/s1_alumni.html lists
some of the alunmi, of which a bunch worked at MIPS
at some point. I especially recall Earl Killian being
involved with this.

MIPS R2000 (MIPS-I) didn't have any synchronization ops
on purpose, because every mechanism we knew caused at
least some customer to tell us why it wasn't good enough :-)


The story I remember hearing from one of the original MIPS
architects (who shall remain unnamed ;-)) was that atomic
synchronization instructions were used too infrequently
to justify putting them in hardware ;-)

The same person also argued that espresso was a good
estimator of database workload behavior ;-))


Quote:
LL/SC was added in MIPS-II to get minimal operations able
to synthesize a lot of people's favorite ones, and by then
we felt we knew much better what people needed.


The truth of the matter is that we were trying to design
servers as if they were CAD workstations. No wonder we
went out of business (not the only reason of course).



Dan Koren
(another MIPS alum)
Back to top
Dan Koren
Guest





Posted: Sun Dec 19, 2004 7:55 am    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM Reply with quote

"Joe Seigh" <jseigh_01@xemaps.com> wrote in message
news:41C193B6.8F090A20@xemaps.com...
Quote:

Just what I think the trend is. Obviously if software
implemented shared memory has been a wild sucess instead
of a disaster, it couldn't be a trend. We'd be there
already. Bottom line, probability of sucessfully
implementing software shared memory 0.00 ... (as
many zeros as you want) ... 01.


There have been a number (not particularly high) of
attempts to build distributed systems with software
providing cache coherence. Their rate of success
approximately equals the balance remaining in
the respective companies' bank accounts... ;-)


Quote:
Probability of hardware designers listening to
suggestions from software types, 0.0.


I do disagree.

Hardware designers do listen to software types.
They just don't understand what they hear, or
think they know better.



dk
Back to top
Dan Koren
Guest





Posted: Sun Dec 19, 2004 7:55 am    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM Reply with quote

"Terje Mathisen" <terje.mathisen@hda.hydro.com> wrote in message
news:cprmsl$4sr$1@osl016lin.hda.hydro.com...
Quote:
Joe Seigh wrote:
Terje Mathisen wrote:
AFAIR, Lamport's algorithm depends on having unlimited range counters!

This does not work for unlimited uptimes, but might be good enough,
particularly on 64-bit machines.


Peterson's algorithm fixes that. It's also a distributed algorithm like
Lamport's.

What about the required memory barriers?

It seems like all these algorithms (Lamport, Dekker, Peterson,
Bakery...) starts with the assumption that all memory updates are
globally consistent, i.e. that all memory operations will be globally
visible in program order, right?



Yes, but I believe Lamport has recently published an
improved algorithm that can deal with some cases of
relaxed memory ordering.



dk
Back to top
Dan Koren
Guest





Posted: Sun Dec 19, 2004 7:55 am    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM Reply with quote

"John Mashey" <old_systems_guy@yahoo.com> wrote in message
news:1103212386.248755.173980@z14g2000cwz.googlegroups.com...
Quote:

Joe Seigh wrote:
Probability of
hardware designers listening to suggestions from software types, 0.0.

That is total nonsense. There exist large numbers of hardware
designers who listen very hard to suggestions from software people. In
fact, in my experience [a software guy who's spent a lot of years
working with hardware people], there are many hardware designers who
are desperate for timely, knowledgable input from software people,
especially ones who know enough about hardware design to understand the
difference between "inexpensive&fast" and "really expensive", or at
least be willing to learn.

Some aren't, and there have been notable malarchitectures as a result,
but most are quite willing to listen to well-informed input.


Shouldn't we get some competing views, from, say Oracle? ;-)



dk
Back to top
Dan Koren
Guest





Posted: Sun Dec 19, 2004 7:55 am    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM Reply with quote

"Terje Mathisen" <terje.mathisen@hda.hydro.com> wrote in message
news:cpq9h4$dfa$1@osl016lin.hda.hydro.com...
Quote:

AFAIR, Lamport's algorithm depends on
having unlimited range counters!


The algorithm does not use counters.


Quote:
This does not work for unlimited uptimes,
but might be good enough, particularly on
64-bit machines.


See above.


All that Lamport's algorithm requires is:

a) sequential memory: i.e. the relative
order of reads and writes appears the
same as "viewed" by each processor.

b) N+2 memory locations per lock, where
N is the number of competing threads
or processors. The MIPS mutex library
used a couple of ingenious tricks to
cut this down to only 1 (extra) word
per lock, and N+1 words per system
(or mutex group).


dk
Back to top
Dan Koren
Guest





Posted: Sun Dec 19, 2004 7:55 am    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM Reply with quote

"Joe Seigh" <jseigh_01@xemaps.com> wrote in message
news:41C0A21E.E7977761@xemaps.com...
Quote:

If you are doing this kind of stuff, you have to
be prepared to improvise since you never know what
the hardware designers will provide.


One horror story with the MIPS mutex library was
with a port of a database product that used the
library and that was failing on a high end SMP.

The customer was a very large company with lots
of lawyers. The port had been commissioned and
paid for by the hardware vendor, who did not
tell the database vendor the port would be used
on SMP hardware, and did not provide SMP systems
for development or testing. Everyone was pointing
fingers at everybody else, and threatening to sue.

Well it turned out the box had sligthly relaxed
memory ordering ;-) with 4-deep write buffers ;-))

We replaced every write in the algorithm with 5
writes in sequence, recompiled with optimization
turned off, and voila! everything worked!


Quote:
I used to think it was pretty obivous that double
wide compare and swap was as important as single
wide compare and swap. But not to AMD apparently.
So even though lock-free LIFO stacks have been
around longer than I've been programming, which
is a long time, you can't do it on AMD 64 bit
processors.


Nope.

DCAS (and in fact MCAS) instructions can be synthesized
from single word CAS instructions. Check Mark Moir's
papers on the subject (or send me e-mail if you have
difficulty finding them).



dk
Back to top
 
Post new topic   Reply to topic    CASTalk.com Forum Index -> Computer Architecture All times are GMT
Goto page Previous  1, 2, 3, 4 ... 24, 25, 26  Next
Page 3 of 26

 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum




VoIP Electronics Powered by phpBB