Author: Dan Koren (Guest)
Posted: Sun Dec 19, 2004 7:55 am
Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM
"Ian Shef" <invalid@avoiding.spam> wrote in message
news:Xns95C0841AA347Bvaj4088ianshef@138.126.254.210...
| Quote: | nmm1@cus.cam.ac.uk (Nick Maclaren) wrote in news:cpp433$s4r$1
No, it is still possible to provide locking mechanisms
entirely in software (assuming that the instruction set
and other hardware provides certain minimal capabilities
- and MIPS R2000 can meet these conditions). There are
papers (I used to have one by Leslie Lamport, if I remember
the name correctly) that describe how to do this. This is
how you were supposed to perform locking on the MIPS R2000.
(At least there was some piece of MIPS R2000 documentation
that provided pointers to papers on software locking
algorithms - that is how I found the papers by Lamport).
|
The database engineering group at MIPS actually developed a
mutex library that used Lamport's algorithm and handed it
over to the database companies to use in their ports for
MIPS based platforms. It improved performance dramatically
over the earlier ports that used OS semaphores.
| Quote: | The hardware mechanisms save clock cycles, but the software
algorithm that I saw was pretty efficient for the case that
the lock is available without contention.
|
The straight path through Lamport's algorithm requires only
7 memory accesses. On a 25 MHz MIPS 3260 we measured 250k
lock/unlock operation pairs per second, compared to only
5k ops/sec using OS semaphores.
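For readers who have not seen it, here is a minimal sketch of where a count of 7 could come from. It assumes the algorithm meant is Lamport's 1987 fast mutual exclusion algorithm (the post does not name the paper), and it is not the MIPS mutex library. `CountingMemory`, `lock`, and `unlock` are hypothetical names; the model is single-threaded and only counts accesses on the uncontended path.

```python
class CountingMemory:
    """Shared variables x, y, b[i] with every read and write counted."""

    def __init__(self):
        self.cells = {}      # unset cells read as 0 (y == 0 means "free")
        self.accesses = 0

    def read(self, key):
        self.accesses += 1
        return self.cells.get(key, 0)

    def write(self, key, value):
        self.accesses += 1
        self.cells[key] = value


def lock(mem, i, n):
    """Acquire the lock for thread id i, where ids run from 1 to n."""
    while True:
        mem.write(("b", i), 1)               # announce interest
        mem.write("x", i)
        if mem.read("y") != 0:               # someone else is ahead: back off
            mem.write(("b", i), 0)
            while mem.read("y") != 0:
                pass
            continue
        mem.write("y", i)
        if mem.read("x") != i:               # contention detected: slow path
            mem.write(("b", i), 0)
            for j in range(1, n + 1):
                while mem.read(("b", j)) != 0:
                    pass
            if mem.read("y") != i:
                while mem.read("y") != 0:
                    pass
                continue
        return                               # critical section may begin


def unlock(mem, i):
    mem.write("y", 0)
    mem.write(("b", i), 0)


mem = CountingMemory()
lock(mem, 1, 4)
unlock(mem, 1)
uncontended_accesses = mem.accesses          # 5 writes + 2 reads
```

On real hardware the algorithm additionally assumes that reads and writes are atomic and seen in program order, which this model does not exercise. As for the figures quoted above: 25,000,000 cycles/sec divided by 250,000 pairs/sec is 100 clock cycles per lock/unlock pair, and 250k against 5k is a factor of 50.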
dk

Author: Dan Koren (Guest)
Posted: Sun Dec 19, 2004 7:55 am
Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM
"(John Mashey)" <old_systems_guy@yahoo.com> wrote in message
news:1103133987.669749.262680@f14g2000cwb.googlegroups.com...
| Quote: |
Weird: I see Nick's post that refers to my post, but that post doesn't
show up for me. This happened to another post in the 128-bit thread.
Has anyone else had this problem?
|
Unfortunately, there is no law (yet)
requiring news servers to stay in sync!
dk

Author: Dan Koren (Guest)
Posted: Sun Dec 19, 2004 7:55 am
Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM
"Seongbae Park" <Seongbae.Park@Sun.COM> wrote in message
news:cppvrn$ke2$1@news1nwk.SFbay.Sun.COM...
| Quote: |
This doesn't make sense to me
- unless you stop all other threads between LL/SC and/or mask interrupt,
the semantics can not be preserved.
How do you guarantee there will be no intervening store
to the same location from other processor ?
|
You don't.
Neither does the LL/SC sequence.
All that it guarantees is that if
memory is modified between LL and
SC, the latter will fail.
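A toy model of that guarantee, as a sketch only: `LLSCMemory` and its methods are hypothetical names, links are tracked per address rather than per cache line, and real hardware is also allowed to fail an SC spuriously, which the model does not do.

```python
class LLSCMemory:
    """Toy model of load-linked / store-conditional, one link per CPU."""

    def __init__(self):
        self.cells = {}
        self.links = {}                      # cpu id -> address it has linked

    def load_linked(self, cpu, addr):
        self.links[cpu] = addr
        return self.cells.get(addr, 0)

    def store(self, addr, value):
        """Ordinary store: breaks every outstanding link on this address."""
        self.cells[addr] = value
        self.links = {c: a for c, a in self.links.items() if a != addr}

    def store_conditional(self, cpu, addr, value):
        """Store only if this CPU's link on addr is still intact."""
        if self.links.get(cpu) != addr:
            return False                     # memory was modified since LL
        self.store(addr, value)
        return True


def increment_with_interference(interfere):
    mem = LLSCMemory()
    mem.store(0x100, 5)
    old = mem.load_linked(cpu=0, addr=0x100)
    if interfere:
        mem.store(0x100, 9)                  # store from "another processor"
    ok = mem.store_conditional(cpu=0, addr=0x100, value=old + 1)
    return ok, mem.cells[0x100]
```

Nothing stops the intervening store. The SC simply reports failure, and the caller is expected to loop back to the LL and try again.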
dk

Author: Dan Koren (Guest)
Posted: Sun Dec 19, 2004 7:55 am
Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM
"Nick Maclaren" <nmm1@cus.cam.ac.uk> wrote in message
news:cpp433$s4r$1@gemini.csx.cam.ac.uk...
| Quote: | In article <1103068097.926843.101230@c13g2000cwb.googlegroups.com>,
(John Mashey) <old_systems_guy@yahoo.com> wrote:
MIPS R2000 (MIPS-I) didn't have any synchronization ops on purpose,
because every mechanism we knew caused at least some customer to tell
us why it wasn't good enough :-) LL/SC was added in MIPS-II to get
minimal operations able to synthesize a lot of people's favorite ones,
and by then we felt we knew much better what people needed.
Interesting. The lack of such instructions effectively means that
shared-memory, parallel-thread applications are unsupported, but that
was not a serious issue then.
|
Pardon me, but it was a *VERY* serious issue!
We had to port Oracle, Informix, Sybase, Ingres
and other multi-threaded database servers, that
relied on test-and-set or similar instructions
for inter-thread and/or process synchronization.
We lost a significant amount of business to Sun
for a while because our database ports had poor
performance, caused by the use of OS semaphores
for synchronization and the lack of asynchronous
i/o in the earlier versions of RISC/os.
It wasn't fun, at least in the beginning. Things
got quite a bit more interesting after we wrote
a mutex library that used Lamport's algorithm
for MIPS-I (R2000/3000) and LL/SC for MIPS-II
(R4000/6000) in a fashion that was completely
transparent to the higher level software.
dk

Author: Nick Maclaren (Guest)
Posted: Sun Dec 19, 2004 4:54 pm
Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM
In article <41c4f7d3$1@news.meer.net>, Dan Koren <dankoren@yahoo.com> wrote:
| Quote: |
There have been a number (not particularly high) of
attempts to build distributed systems with software
providing cache coherence. Their rate of success
approximately equals the balance remaining in
the respective companies' bank accounts... ;-)
|
The number is higher than you might think - I have personally had
the tedium of being told why, THIS time, it will be different over
half a dozen times. Several of those companies are still trading,
and a few are very profitable; none of those projects are still
active, though some seem to rise from the dead at intervals until
someone shines the bright light of reality on them, when they vanish
into a puff of smoke.
| Quote: | Probability of hardware designers listening to
suggestions from software types, 0.0.
I do disagree.
Hardware designers do listen to software types.
They just don't understand what they hear, or
think they know better.
|
That is, unfortunately, true.
The biggest problems here are the dominance of both by marketing,
and the fact that there are so few people remaining who know that
(a) things could be done so much better and (b) how much money it
could save in the long term. Note that I say "know", because this
isn't based on theory, but upon experience from the time when there
was more variation in system designs.
Regards,
Nick Maclaren.

Author: Joe Seigh (Guest)
Posted: Sun Dec 19, 2004 7:14 pm
Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM
Dan Koren wrote:
| Quote: |
"Joe Seigh" <jseigh_01@xemaps.com> wrote in message
news:41C0A21E.E7977761@xemaps.com...
I used to think it was pretty obvious that double
wide compare and swap was as important as single
wide compare and swap. But not to AMD apparently.
So even though lock-free LIFO stacks have been
around longer than I've been programming, which
is a long time, you can't do it on AMD 64 bit
processors.
Nope.
DCAS (and in fact MCAS) instructions can be synthesized
from single word CAS instructions. Check Mark Moir's
papers on the subject (or send me e-mail if you have
difficulty finding them).
|
DCAS is two discontiguous words compare and swap (from MC68020
and MC68030) though you could use two adjacent words and it would
be equivalent to a double wide CAS like cmpxchg8b on ia32 and
cmpxchg16b on ia32-ext64. DCAS is what Moir's group uses to
implement atomic thread-safe reference counting. DWCAS (double
wide CAS) is what I've used to implement atomic thread-safe counting.
I can also implement it using LL/SC with a somewhat different
algorithm.
DWCAS is also used to implement lock-free LIFO queues without the ABA
problem.
It was even added to Java in JSR-166 in the form of one of the words
being version count or mark bit to support some lock-free algorithms
that depend on compare and swap working on more than just one word.
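A sketch of why the version count matters, as a single-threaded replay of the classic ABA interleaving on a LIFO stack. All names here (`Node`, `VersionedHead`, `aba_scenario`) are hypothetical, and the "CAS" operations are plain Python methods standing in for the atomic instructions. The JSR-166 classes referred to are presumably `AtomicStampedReference` and `AtomicMarkableReference`.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None


class VersionedHead:
    """Stack head pointer plus a version count, updated as one unit."""

    def __init__(self, node):
        self.node = node
        self.version = 0

    def update(self, new_node):
        """Models a successful push or pop by some other thread."""
        self.node = new_node
        self.version += 1

    def cas_single(self, expected_node, new_node):
        """Single-word CAS: compares the pointer only."""
        if self.node is not expected_node:
            return False
        self.update(new_node)
        return True

    def cas_double(self, expected_node, expected_version, new_node):
        """Double-wide CAS: compares pointer and version together."""
        if self.node is not expected_node or self.version != expected_version:
            return False
        self.update(new_node)
        return True


def aba_scenario(use_version):
    a, b, c = Node("A"), Node("B"), Node("C")
    a.next, b.next = b, c                    # stack is A -> B -> C
    head = VersionedHead(a)

    # Thread 1 begins a pop: it reads the head and plans to install A.next.
    seen_node, seen_version = head.node, head.version
    planned_head = seen_node.next            # B

    # Thread 2 runs in between: pop A, pop B, push A back (A -> C).
    head.update(a.next)
    head.update(b.next)
    a.next = head.node
    head.update(a)

    # Thread 1 resumes with its stale plan.
    if use_version:
        ok = head.cas_double(seen_node, seen_version, planned_head)
    else:
        ok = head.cas_single(seen_node, planned_head)
    return ok, head.node.name
```

With the pointer-only compare the stale pop succeeds and leaves the already-popped node B as the head. With the version included the compare fails and thread 1 has to retry.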
I believe they are calling MCAS KCSS (k-compare, single swap). It only
claims to be obstruction-free according to Herlihy's definition meaning
it probably wouldn't scale as well as CAS or DWCAS. In scalability, KCSS
is probably in the same neighborhood as adaptive mutexes which are really
efficient when there's no contention. KCSS would come in useful if you
had low contention and needed async safety and wanted to avoid deadlock.
Hmm... Maybe after DWCAS becomes obsolete, I could write a paper on
atomic lock-free reference counting using it. It would be interesting
to have a collection of papers of synchronization techniques based on
obsolete hardware synchronization primitives.
Joe Seigh

Author: del cecchi (Guest)
Posted: Mon Dec 20, 2004 7:55 am
Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM
"Nick Maclaren" <nmm1@cus.cam.ac.uk> wrote in message
news:cq3q4t$521$1@gemini.csx.cam.ac.uk...
| Quote: | In article <41c4f7d3$1@news.meer.net>, Dan Koren <dankoren@yahoo.com> wrote:
There have been a number (not particularly high) of
attempts to build distributed systems with software
providing cache coherence. Their rate of success
approximately equals the balance remaining in
the respective companies' bank accounts... ;-)
The number is higher than you might think - I have personally had
the tedium of being told why, THIS time, it will be different over
half a dozen times. Several of those companies are still trading,
and a few are very profitable; none of those projects are still
active, though some seem to rise from the dead at intervals until
someone shines the bright light of reality on them, when they vanish
into a puff of smoke.
Probability of hardware designers listening to
suggestions from software types, 0.0.
I do disagree.
Hardware designers do listen to software types.
They just don't understand what they hear, or
think they know better.
That is, unfortunately, true.
The biggest problems here are the dominance of both by marketing,
and the fact that there are so few people remaining who know that
(a) things could be done so much better and (b) how much money it
could save in the long term. Note that I say "know", because this
isn't based on theory, but upon experience from the time when there
was more variation in system designs.
Regards,
Nick Maclaren.
|
Actually hardware types do (or at least did) listen to software types up
here on the tundra. But it is dark and cold in the winter and there
isn't much better to do. But you have managed to confuse me, Nick. I
could have sworn that you were an advocate of software managed DSM as
opposed to NUMA hardware managed DSM. Was I mistaken? Or were you not
implying something about cache coherence in the software managed DSM?
del cecchi

Author: Dan Koren (Guest)
Posted: Mon Dec 20, 2004 7:55 am
Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM
"Joe Seigh" <jseigh_01@xemaps.com> wrote in message
news:41C08E8B.FA58D4A7@xemaps.com...
| Quote: |
Seongbae Park wrote:
Maciej W. Rozycki <macro@linux-mips.org> wrote:
...
Note that as long as you go UP this can be handled at the OS level.
For example Linux handles user mode RI traps on LL and SC when run on
MIPS-I processors and emulates the instructions. There's a considerable
performance loss, of course, but the semantics of these operations is
preserved.
Maciej
This doesn't make sense to me
- unless you stop all other threads between LL/SC
and/or mask interrupt, the semantics can not be preserved.
How do you guarantee there will be no intervening store
to the same location from other processor ?
Write protect the load locked target.
|
Cute -- but how do you protect against another thread or
process running on another CPU that is trying to do the
same? In order to write protect the load locked target
one needs to update the attributes of the target page
in the page table -- how does one protect that?
Worse yet, while all this is going on, suppose another
thread or process running on another CPU is trying to
write into a different address within the same page,
and triggers a spurious access violation....
So this is quite a bit more complex than it looks at
first sight -- if it is at all feasible.
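For contrast, a sketch of why the uniprocessor case quoted above is tractable: on one CPU the only way another thread can store between LL and SC is through a context switch, and the kernel sees every one of those. The class and method names are hypothetical and not taken from the Linux source; the model assumes a single link flag that is cleared on every context switch.

```python
class UniprocessorLLSCEmulator:
    """Toy kernel-side emulation of LL/SC for a single CPU."""

    def __init__(self):
        self.memory = {}
        self.ll_bit = False                  # is a link currently held?
        self.ll_owner = None                 # task that executed the LL

    def emulate_ll(self, task, addr):
        self.ll_bit = True
        self.ll_owner = task
        return self.memory.get(addr, 0)

    def emulate_sc(self, task, addr, value):
        if not self.ll_bit or self.ll_owner != task:
            return False                     # a switch happened since the LL
        self.memory[addr] = value
        self.ll_bit = False
        return True

    def context_switch(self):
        """Called by the scheduler; conservatively breaks any link."""
        self.ll_bit = False


def run(preempted):
    kernel = UniprocessorLLSCEmulator()
    old = kernel.emulate_ll("task1", 0x40)
    if preempted:
        kernel.context_switch()              # another task may have run
    return kernel.emulate_sc("task1", 0x40, old + 1)
```

With a second CPU there is no equivalent hook: a store from the other processor passes through no kernel code at all, which is where the page-protection idea and its problems come in.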
dk

Author: Per Schröder (Guest)
Posted: Mon Dec 20, 2004 7:50 pm
Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM
Eric Smith wrote:
| Quote: | Who invented the Load Locked and Store Conditional instructions used
in the MIPS, Power, and Alpha architectures, and where else have they
been used?
|
Slightly OT, but I remember programming on the Norsk Data ND-500/5000
systems. These were the new 32-bit machines designed to complement their
successful ND-100 16-bit line.
The ND-500 had the (user-mode) instructions SOLO and TUTTI. When you
executed a SOLO instruction, your program would not be preempted within the
next n (n=50??) cycles. After you were done with your critical section, you
should execute a TUTTI instruction which would enable preemption again.
I don't remember if the OS implemented some punishing mechanisms for
processes that forgot to execute TUTTI soon enough...
For this to work, you should arrange any data to be touched to be locked in
memory before executing SOLO to avoid page faults. But you didn't have to
lock your code in memory by simply ;-) arranging to have the entire code of
your critical section on the same page.
Of course, this mechanism was invented before Norsk Data introduced SMP
systems... ;-)
/Per Schröder

Author: John Mashey (Guest)
Posted: Mon Dec 20, 2004 11:58 pm
Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM
Dan Koren wrote:
| Quote: | "(John Mashey)" <old_systems_guy@yahoo.com> wrote in message
news:1103068097.926843.101230@c13g2000cwb.googlegroups.com...
LL/SC: Livermore S-1.
Maybe there was something earlier that had it, but
this is where I thought it came from.
LL/SC instructions were introduced with the CDC-6600.
|
What are they called there? I hadn't remembered there being anything
like that, and I reviewed:
http://ed-thelen.org/comp-hist/CDC-6600-R-M.html
The PP's had add/subtract to memory, but that isn't LL/SC.
| Quote: |
http://www.cs.clemson.edu/~mark/s1_alumni.html lists
some of the alumni, of which a bunch worked at MIPS
at some point. I especially recall Earl Killian being
involved with this.
MIPS R2000 (MIPS-I) didn't have any synchronization ops
on purpose, because every mechanism we knew caused at
least some customer to tell us why it wasn't good enough :-)
The story I remember hearing from one of the original MIPS
architects (who shall remain unnamed ;-)) was that atomic
synchronization instructions were used too infrequently
to justify putting them in hardware ;-)
|
That is incorrect, and I *know* because I was there, and heavily
involved.
The MMU, cache-control, exception-handling, coprocessor0 & related
features were mostly designed by negotiation between the VLSI folks and
the OS group (i.e., myself and a few people who worked for me).
We (the OS group) had various thoughts about synchronization
operations, and I certainly did some asking around, and the kind of
input I got from various people went like, with X..Z describing typical
CPU instructions:
a) X is simply not good enough, we like Y better.
b) Y is simply not good enough, we like Z better.
c) You need all of X, Y, and Z. [and we weren't going to get that].
d) X, Y, and Z are all useless, for a serious SMP, you need a special
interconnect mechanism that does scalable locking, and the last thing
in the world you want is something that does a bus-lock.
Recall the minimalist design style: none of us were keen about putting
half-baked features in that consumed resources, but worse, that we were
committed to forever [i.e., especially user-level stuff.]
When we asked for things, we had to negotiate/iterate with VLSI group,
but usually, the VLSI attitude was "If you can tell us exactly what you
want, we'll try to do it, or at least discuss alternative
implementations that might be easier." Many of the VLSI team were
ecstatic to actually get useful input. We didn't get everything we
wanted, but we got a lot, and we certainly cared about OS and
multi-tasking performance, and we got tradeoffs to help those whenever
we could.
I believe, that had we definitively said "We MUST have test-and-set (or
one of the other simpler mechanisms)", we would likely have gotten it,
albeit not without some grumbling (since it would have been the only
combined load/store operation).
However:
a) SMP was an explicit non-goal of the R2000. [Some of us were big SMP
fans, but there was already too much to do, so the rule was not to do
anything that would make SMP unnecessarily difficult, but don't add to
schedule length or die space to enable SMP. Expect to do that later
(the minimal possible in R3000, and then substantially in R4000). IBM
followed a similar path with RS/6000.
b) For the first rounds of machines, with uniprocessors only, we (OS
group) believed we could do without synch ops inside the kernel, since
we could do mutual exclusion other ways. Like I said before, the
mistake was in not providing a decent user-level library, with a
fast-path through kernel, and that was entirely because it got lost in
the frenzy, and that wasn't the VLSI group's fault.
c) Anyway, the result was that we (OS group) were not in a position to
tell the VLSI group exactly what we wanted, and we were reluctant to
spec something that would be there forever, and still not do the job,
and we weren't willing to stall the schedule a couple months to work
through all the issues properly. I was in discussions of the form "Do
you want test-and-set, or something different?" That was 2Q85.
Other people may remember this differently, but not all designers were
involved with all parts of the design; I was certainly in the middle of
this one. As for believing that it was left out because it was
infrequently used:
a) In general, of course we tried to leave things out that were
infrequently used.
b) But, if something was infrequently used, but structurally necessary
[like halfword load/stores for dealing with device drivers], we got it
in some form or other.
So, one more time: the synchronization omission was on purpose, simply
because we could not specify a committed-forever feature early enough,
and we thought we could live without it for a while, until we knew
something we thought would really work well.

Author: Nick Maclaren (Guest)
Posted: Tue Dec 21, 2004 12:34 am
Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM
In article <1103569098.536503.28680@z14g2000cwz.googlegroups.com>,
John Mashey <old_systems_guy@yahoo.com> wrote:
| Quote: |
I believe, that had we definitively said "We MUST have test-and-set (or
one of the other simpler mechanisms)", we would likely have gotten it,
albeit not without some grumbling (since it would have been the only
combined load/store operation).
|
It is interesting that you didn't, for reasons below.
| Quote: | However:
a) SMP was an explicit non-goal of the R2000. ...
|
See below.
| Quote: | b) For the first rounds of machines, with uniprocessors only, we (OS
group) believed we could do without synch ops inside the kernel, since
we could do mutual exclusion other ways. Like I said before, the
mistake was in not providing a decent user-level library, with a
fast-path through kernel, and that was entirely because it got lost in
the frenzy, and that wasn't the VLSI group's fault.
|
Actually, I think that you are wrong there. Designing such a library
is no easier than designing hardware. Doing one in haste could well
have led to problems down the line. Unless you know of any existing
good designs (i.e. both theoretically and practically adequate,
robust and efficiently implementable). Do you?
| Quote: | c) Anyway, the result was that we (OS group) ...
a) In general, of course we tried to leave things out that were
infrequently used.
b) But, if something was infrequently used, but structurally necessary
[like halfword load/stores for dealing with device drivers], we got it
in some form or other.
So, one more time: the synchronization omission was on purpose, simply
because we could not specify a committed-forever feature early enough,
and we thought we could live without it for a while, until we knew
something we thought would really work well.
|
That strikes me as very sensible, with one niggle.
The thing that you CAN'T do that way is to allow run-time systems
people to write decent signal handling, and to allow applications
to make themselves resistant against interruption. Without the
ability to do at least a compare-and-swap on a pointer, you are
limited to praying that you don't interrupt at a critical moment.
No, blocking signals is NOT an adequate solution, nor is supporting
only the model of setting a flag and returning. Those are designed
for kernel-level code and non-urgent signalling, respectively, and are
no substitute for being able to cancel the current operation and
back off to the appropriate point.
Regards,
Nick Maclaren.

Author: Dan Koren (Guest)
Posted: Tue Dec 21, 2004 7:57 am
Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM
Hi Mash,
"John Mashey" <old_systems_guy@yahoo.com> wrote in message
news:1103569098.536503.28680@z14g2000cwz.googlegroups.com...
| Quote: |
Dan Koren wrote:
"(John Mashey)" <old_systems_guy@yahoo.com> wrote in message
news:1103068097.926843.101230@c13g2000cwb.googlegroups.com...
LL/SC: Livermore S-1.
Maybe there was something earlier that had it, but
this is where I thought it came from.
LL/SC instructions were introduced with the CDC-6600.
What are they called there? I hadn't remembered there being anything
like that, and I reviewed:
http://ed-thelen.org/comp-hist/CDC-6600-R-M.html
The PP's had add/subtract to memory, but that isn't LL/SC.
|
The document you point to is the general reference
that covers the entire 6000 series. There was a
separate supplement for the 6500/6600 describing
multiprocessor features and extended core storage.
LL/SC were not separate instructions. If memory
serves they were just ordinary reads and writes
to extended core storage that were performed
after setting a flag in the ECS controller.
The store would fail if an intervening write
had occurred from the other processor. IIRC
this was called a "guarded sequence" (or
some funny name like that), and required
ECS to be installed in the system. I will
need to dig my CDC manuals out of storage
if you'd like more details.
| Quote: | http://www.cs.clemson.edu/~mark/s1_alumni.html lists
some of the alumni, of which a bunch worked at MIPS
at some point. I especially recall Earl Killian being
involved with this.
MIPS R2000 (MIPS-I) didn't have any synchronization ops
on purpose, because every mechanism we knew caused at
least some customer to tell us why it wasn't good enough :-)
The story I remember hearing from one of the original MIPS
architects (who shall remain unnamed ;-)) was that atomic
synchronization instructions were used too infrequently
to justify putting them in hardware ;-)
That is incorrect, and I *know* because I was there, and
heavily involved.
The MMU, cache-control, exception-handling, coprocessor0 &
related features were mostly designed by negotiation between
the VLSI folks and the OS group (i.e., myself and a few people
who worked for me).
|
What I reported does not contradict your version. I quoted
directly one of the people who were quite influential in
the design. It was his opinion. It may not have been the
*only* or even the *primary* reason for leaving out test
and set or equivalent instructions, but there can be no
doubt that he had tremendous input, and influenced the
architecture and the design quite significantly. I would
prefer not to identify the person publicly, however I will
be happy to provide the info directly to you [hint: it was
the same person who proposed taking espresso as a good
surrogate for database workloads]. BTW I hope we're not
going to start washing MIPS' dirty linen in public.
| Quote: | We (the OS group) had various thoughts about synchronization
operations, and I certainly did some asking around, and the kind of
input I got from various people went like, with X..Z describing typical
CPU instructions:
a) X is simply not good enough, we like Y better.
b) Y is simply not good enough, we like Z better.
c) You need all of X, Y, and Z. [and we weren't going to get that].
d) X, Y, and Z are all useless, for a serious SMP, you need a special
interconnect mechanism that does scalable locking, and the last thing
in the world you want is something that does a bus-lock.
|
If bus lock is the only way to avoid data corruption, why not?
The purpose of computer systems is to solve people's problems,
not to keep busses happy! ;-)
| Quote: | Recall the minimalist design style: none of us were keen about putting
half-baked features in that consumed resources, but worse, that we were
committed to forever [i.e., especially user-level stuff.]
|
Synchronization primitives were well enough understood by the
early and mid '80s that adding a test-and-set or compare-and-
swap would have been a no-brainer. I find it difficult to
believe that anyone experienced in serious systems software
development would call such primitives "half-baked".
| Quote: | When we asked for things, we had to negotiate/iterate with VLSI group,
but usually, the VLSI attitude was "If you can tell us exactly what you
want, we'll try to do it, or at least discuss alternative
implementations that might be easier." Many of the VLSI team were
ecstatic to actually get useful input. We didn't get everything we
wanted, but we got a lot, and we certainly cared about OS and
multi-tasking performance, and we got tradeoffs to help those whenever
we could.
I believe, that had we definitively said "We MUST have test-and-set (or
one of the other simpler mechanisms)", we would likely have gotten it,
albeit not without some grumbling (since it would have been the only
combined load/store operation).
|
Then why didn't you just do it? It would have saved the rest
of the industry tens (if not hundreds) of person years of
engineering effort wasted to band aid the problem.
And the objections to combined load/store operations must be
regarded as superstition. Clearly one cannot provide mutual
exclusion in hardware without some form of combined load/store,
and doing it in software has much uglier problems that were
quite well understood since Allan Burns' Ph.D. dissertation
in 1981.
| Quote: | However:
a) SMP was an explicit non-goal of the R2000. [Some of us were big SMP
fans, but there was already too much to do, so the rule was not to do
anything that would make SMP unnecessarily difficult, but don't add to
schedule length or die space to enable SMP. Expect to do that later
(the minimal possible in R3000, and then substantially in R4000). IBM
followed a similar path with RS/6000.
|
a) As it turned out, the R3000 did not have test-and-set either.
b) IBM could afford to make countless blunders in the RS/6000.
It wasn't their bread-and-butter, as the R3000 was for MIPS,
and it didn't matter if it floundered in the market (which
it almost did for a while).
c) The amount of logic or die space required by test-and-set or
compare-and-swap is quite small.
| Quote: | b) For the first rounds of machines, with uniprocessors only, we (OS
group) believed we could do without synch ops inside the kernel, since
we could do mutual exclusion other ways. Like I said before, the
mistake was in not providing a decent user-level library, with a
fast-path through kernel, and that was entirely because it got lost in
the frenzy, and that wasn't the VLSI group's fault.
|
No, but the point of this discussion is not to find fault with
any one group in particular. By the way, I hope we're not going
to start washing all of MIPS' dirty linen in public. The point
is to understand the mindset and thought processes that led to
such architectural blunders so that they won't be repeated and
that future generations of designers can learn from them.
In fairness, all the early RISC designs suffered from multiple
architectural blunders that could not be justified by the high
cost of silicon alone; this was by no means a MIPS monopoly.
PA-RISC could not queue interrupts, SPARC went to market without
multiply instructions and with no way to turn off virtual memory, etc.
| Quote: | c) Anyway, the result was that we (OS group) were not in a position to
tell the VLSI group exactly what we wanted, and we were reluctant to
spec something that would be there forever, and still not do the job,
and we weren't willing to stall the schedule a couple months to work
through all the issues properly. I was in discussions of the form "Do
you want test-and-set, or something different?" That was 2Q85.
|
Perhaps. When in doubt, however, one can do much worse than to
copy a small part from an existing architecture. Test-and-set
and compare-and-swap were not exactly unheard of by 1985.
| Quote: | Other people may remember this differently, but not all designers were
involved with all parts of the design; I was certainly in the middle of
this one. As for believing that it was left out because it was
infrequently used:
a) In general, of course we tried to leave things out that were
infrequently used.
b) But, if something was infrequently used, but structurally necessary
[like halfword load/stores for dealing with device drivers], we got it
in some form or other.
So, one more time: the synchronization omission was on purpose, simply
because we could not specify a committed-forever feature early enough,
and we thought we could live without it for a while, until we knew
something we thought would really work well.
|
Well, history did prove this decision was misguided, didn't it?
The amount of aggravation, time and effort that MIPS, its OEM's
and ISV's suffered as a result was completely out of proportion
with whatever little schedule (or die area) gains were achieved.
dk

Author: Dan Koren (Guest)
Posted: Tue Dec 21, 2004 7:57 am
Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM
"Nick Maclaren" <nmm1@cus.cam.ac.uk> wrote in message
news:cq79gu$i32$1@gemini.csx.cam.ac.uk...
| Quote: | In article <1103569098.536503.28680@z14g2000cwz.googlegroups.com>,
John Mashey <old_systems_guy@yahoo.com> wrote:
I believe, that had we definitively said "We MUST have test-and-set (or
one of the other simpler mechanisms)", we would likely have gotten it,
albeit not without some grumbling (since it would have been the only
combined load/store operation).
It is interesting that you didn't, for reasons below.
However:
a) SMP was an explicit non-goal of the R2000. ...
See below.
b) For the first rounds of machines, with uniprocessors only, we (OS
group) believed we could do without synch ops inside the kernel, since
we could do mutual exclusion other ways. Like I said before, the
mistake was in not providing a decent user-level library, with a
fast-path through kernel, and that was entirely because it got lost in
the frenzy, and that wasn't the VLSI group's fault.
Actually, I think that you are wrong there. Designing such a library
is no easier than designing hardware. Doing one in haste could well
have led to problems down the line. Unless you know of any existing
good designs (i.e. both theoretically and practically adequate,
robust and efficiently implementable). Do you?
|
The mutex library for MIPS was released in 1991.
It was developed by database engineering.
dk

Author: John Mashey (Guest)
Posted: Tue Dec 21, 2004 7:57 am
Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM
One more time, then I'm off to ski.
Dan Koren wrote:
| Quote: | Hi Mash,
"John Mashey" <old_systems_guy@yahoo.com> wrote in message
news:1103569098.536503.28680@z14g2000cwz.googlegroups.com...
Dan Koren wrote:
"(John Mashey)" <old_systems_guy@yahoo.com> wrote in message
news:1103068097.926843.101230@c13g2000cwb.googlegroups.com...
LL/SC: Livermore S-1.
Maybe there was something earlier that had it, but
this is where I thought it came from.
LL/SC instructions were introduced with the CDC-6600.
What are they called there? I hadn't remembered there being anything
like that, and I reviewed:
http://ed-thelen.org/comp-hist/CDC-6600-R-M.html
The PP's had add/subtract to memory, but that isn't LL/SC.
The document you point to is the general reference
that covers the entire 6000 series. There was a
separate supplement for the 6500/6600 describing
multiprocessor features and extended core storage.
LL/SC were not separate instructions. If memory
serves they were just ordinary reads and writes
to extended core storage that were performed
after setting a flag in the ECS controller.
The store would fail if an intervening write
had occurred from the other processor. IIRC
this was called a "guarded sequence" (or
some funny name like that), and required
ECS to be installed in the system. I will
need to dig my CDC manuals out of storage
if you'd like more details.
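The "store fails if an intervening write occurred" behaviour described above can be approximated in portable C with a compare-and-swap retry loop. This is only a rough sketch, assuming a C11 compiler with <stdatomic.h>; the name guarded_increment is hypothetical and has nothing to do with the CDC or MIPS instruction sets. Note that compare-and-swap checks whether the value changed, not whether any write happened, so it is weaker than a true LL/SC pair.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: atomically add 1 to *cell and return the new value. */
int guarded_increment(atomic_int *cell)
{
    /* Plays the role of the "load-linked" step: remember what we saw. */
    int seen = atomic_load_explicit(cell, memory_order_relaxed);

    /* Plays the role of the "store-conditional" step: the store only
     * happens if *cell still holds the value we saw. On failure, seen is
     * refreshed with the current value and we try again. The weak form
     * may also fail spuriously, which the loop tolerates. */
    while (!atomic_compare_exchange_weak_explicit(cell, &seen, seen + 1,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
        /* retry with the refreshed value in seen */
    }
    return seen + 1;
}
```

Calling guarded_increment on a cell holding 5 leaves 6 in the cell and returns 6.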
http://www.cs.clemson.edu/~mark/s1_alumni.html lists
some of the alumni, of which a bunch worked at MIPS
at some point. I especially recall Earl Killian being
involved with this.
MIPS R2000 (MIPS-I) didn't have any synchronization ops
on purpose, because every mechanism we knew caused at
least some customer to tell us why it wasn't good enough :-)
The story I remember hearing from one of the original MIPS
architects (who shall remain unnamed ;-)) was that atomic
synchronization instructions were used too infrequently
to justify putting them in hardware ;-)
That is incorrect, and I *know* because I was there, and
heavily involved.
The MMU, cache-control, exception-handling, coprocessor0 &
related features were mostly designed by negotiation between
the VLSI folks and the OS group (i.e., myself and a few people
who worked for me).
What I reported does not contradict your version. I quoted
directly one of the people who were quite influential in
the design. It was his opinion. It may not have been the
*only* or even the *primary* reason for leaving out test
and set or equivalent instructions, but there can be no
doubt that he had tremendous input, and influenced the
architecture and the design quite significantly. I would
prefer not to identify the person publicly, however I will
be happy to provide the info directly to you [hint: it was
the same person who proposed taking espresso as a good
surrogate for database workloads]. BTW I hope we're not
going to start washing MIPS' dirty linen in public.
We (the OS group) had various thoughts about synchronization
operations, and I certainly did some asking around, and the kind of
input I got from various people went like, with X..Z describing typical
CPU instructions:
a) X is simply not good enough, we like Y better.
b) Y is simply not good enough, we like Z better.
c) You need all of X, Y, and Z. [and we weren't going to get that].
d) X, Y, and Z are all useless, for a serious SMP, you need a special
interconnect mechanism that does scalable locking, and the last thing
in the world you want is something that does a bus-lock.
If bus lock is the only way to avoid data corruption, why not?
The purpose of computer systems is to solve people's problems,
not to keep busses happy! ;-)
Recall the minimalist design style: none of us were keen about putting
half-baked features in that consumed resources, but worse, that we were
committed to forever [i.e., especially user-level stuff.]
Synchronization primitives were well enough understood by the
early and mid '80s that adding a test-and-set or compare-and-
swap would have been a no-brainer. I find it difficult to
believe that anyone experienced in serious systems software
development would call such primitives "half-baked".
|
It would have been a perfectly adequate solution to have provided the
user-level library and the fast-path kernel mechanism, and I planned to
do that, but we just got consumed by all the other surprises, like
getting volatile right, and debugging kernels compiled with global
optimizers, and I just forgot until it was too late, and it indeed
caused trouble later...
The typical test-and-set of the day specified a bus-lock. Some of us
had worked on/managed OSs for SMPs of various flavors, and we talked to
others, who had done so, were doing so, or planned to do so, and most
basically said: "We WON'T use it, because it locks the main system bus
and it is simply not scalable. We've got our own thing that's better
(And it's different from everyone else's, and it involves system
architecture, not CPU architecture.)"
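For readers who have not seen one, a test-and-set spinlock looks roughly like the sketch below. It assumes a C11 compiler with <stdatomic.h>; the tas_* names are hypothetical and are not taken from any MIPS, Sequent or SGI code. Every failed attempt in the spin loop is another atomic read-modify-write on the shared flag, which is the operation that implied a bus lock on the hardware being discussed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: flag set means held, flag clear means free. */
typedef struct {
    atomic_flag held;
} tas_lock;

#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

/* One test-and-set attempt: atomically set the flag and report whether
 * it was clear beforehand (i.e. whether we just took the lock). */
bool tas_try_lock(tas_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spin until an attempt succeeds; each attempt is a read-modify-write. */
void tas_acquire(tas_lock *l)
{
    while (!tas_try_lock(l)) {
        /* busy-wait */
    }
}

void tas_unlock(tas_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Single-threaded walk-through of the lock states; returns 0 on success. */
int tas_demo(void)
{
    tas_lock l = TAS_LOCK_INIT;
    bool first = tas_try_lock(&l);   /* lock was free */
    bool second = tas_try_lock(&l);  /* lock is now held */
    assert(first && !second);
    tas_unlock(&l);
    tas_acquire(&l);                 /* free again, so this returns at once */
    tas_unlock(&l);
    return 0;
}
```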
For example: Sequent SLIC bus, but others likewise, including SGI
(later, but of course we talked to them, and of course, the original
SYS V port onto MIPS was done by a team that was half-MIPS, half-SGI,
working for me.)
In that era, SMP people were either designing their own proprietary
CPU+systems
[in which one can take care of the problem end-to-end], or they were
using commodity micros, and doing their own system architecture, and
quite typically, their own cache controllers and synchronization
features.
I've described in this newsgroup, in gory detail, things that we did in
hardware, that in retrospect, we shouldn't have, and other things that
we didn't do, that in retrospect, we should have done, and have known
it.
In retrospect:
a) I would STILL not have put test-and-set or equivalent in the R2000.
I think we could have gotten it if we'd thought it was worth having,
but we didn't, and I still don't, at least not in any form that we
could have done then. I suppose, that with the experience of 512P SGI
Origin3000s in hand, and with some of the work that we did on
synchronization & cache mechanisms for some (canceled) MIPS CPUs in the
later 1990s, that maybe I could go back and spec some operations that
would actually have lasted and scaled ...but that's water long over the
dam, and there's no way in the world to have understood the big-ccNUMA
issues in 2Q85.
b) We definitely *should* have provided the standard user library and
the fast-path [in one of several ways, of which the simplest was
described before, and used no more hardware than was already there.]
Not doing that early was a mistake ... but it was all mine.
| Quote: |
When we asked for things, we had to negotiate/iterate with VLSI group,
but usually, the VLSI attitude was "If you can tell us exactly what you
want, we'll try to do it, or at least discuss alternative
implementations that might be easier." Many of the VLSI team were
ecstatic to actually get useful input. We didn't get everything we
wanted, but we got a lot, and we certainly cared about OS and
multi-tasking performance, and we got tradeoffs to help those whenever
we could.
I believe, that had we definitively said "We MUST have test-and-set (or
one of the other simpler mechanisms)", we would likely have gotten it,
albeit not without some grumbling (since it would have been the only
combined load/store operation).
Then why didn't you just do it? It would have saved the rest
of the industry tens (if not hundreds) of person years of
engineering effort wasted to band aid the problem.
Like I said, I still wouldn't have done it, I'd have done the kernel |
fast path, and that would have been plenty good enough.
| Quote: | And the objections to combined load/store operations must be
regarded as superstition. Clearly one cannot provide mutual
exclusion in hardware without some form of combined load/store,
and doing it in software has much uglier problems that were
quite well understood since Allan Burns' Ph.D. dissertation
in 1981.
However:
a) SMP was an explicit non-goal of the R2000. [Some of us were big SMP
fans, but there was already too much to do, so the rule was not to do
anything that would make SMP unnecessarily difficult, but don't add to
schedule length or die space to enable SMP. Expect to do that later
(the minimal possible in R3000, and then substantially in R4000). IBM
followed a similar path with RS/6000.
a) As it turned out, the R3000 did not have test-and-set either.
Right, because in fact, the people most interested in doing SMPs |
with R3000s HAD to have the R3000 addition that allowed an external
cache invalidate, and didn't care about having test-and-set, because
they didn't think it scaled far enough, and they had their own synch
busses.
| Quote: |
b) IBM could afford to make countless blunders in the RS/6000.
It wasn't their bread-and-butter, as the R3000 was for MIPS,
and it didn't matter if it floundered in the market (which
it almost did for a while).
c) The amount of logic or die space required by test-and-set or
compare-and-swap is quite small.
|
It doesn't have to be big, but it is certainly irregular, i.e., it's
unlike any other load, and die space on the R2000 was at a premium. In
retrospect, if I'd have fought for anything irregular, it would have
been to make LDC1/SDC1 work. If I'd realized the pain of that one, I'd
have even given up a few TLB entries.
| Quote: |
b) For the first rounds of machines, with uniprocessors only, we (OS
group) believed we could do without synch ops inside the kernel, since
we could do mutual exclusion other ways. Like I said before, the
mistake was in not providing a decent user-level library, with a
fast-path through kernel, and that was entirely because it got lost in
the frenzy, and that wasn't the VLSI group's fault.
No, but the point of this discussion is not to find fault with
any one group in particular. By the way, I hope we're not going
to start washing all the dirty MIPS' linen in public. The point
is to understand the mindset and thought processes that led to
such architectural blunders so that they won't be repeated and
that future generations of designers can learn from them.
Again, we fundamentally disagree: I think omitting TS was perfectly |
fine, it was not doing the API that was the cause of pain. We omitted
lots of things, on purpose, and were usually right, but usually because
we'd worked out the details of the software interfaces, and then
executed on them.
| Quote: | In fairness, all the early RISC designs suffered from multiple
architectural blunders that could not be justified by the high
cost of silicon alone, this was by no means a Mips monopoly.
PA-RISC could not queue interrupts, SPARC went to market without
multiply instructions and no way to turn off virtual memory, etc..
c) Anyway, the result was that we (OS group) were not in a position to
tell the VLSI group exactly what we wanted, and we were reluctant to
spec something that would be there forever, and still not do the job,
and we weren't willing to stall the schedule a couple months to work
through all the issues properly.
I was in discussions of the form "Do you want test-and-set, or
something different?" That was 2Q85.
Perhaps. When in doubt, however, one can do much worse than to
copy a small part from an existing architecture. Test-and-set
and compare-and-swap were not exactly unheard of by 1985.
Other people may remember this differently, but not all designers were
involved with all parts of the design; I was certainly in the middle of
this one. As for believing that it was left out because it was
infrequently used:
a) In general, of course we tried to leave things out that were
infrequently used.
b) But, if something was infrequently used, but structurally necessary
[like halfword load/stores for dealing with device drivers], we got it
in some form or other.
So, one more time: the synchronization omission was on purpose, simply
because we could not specify a committed-forever feature early enough,
and we thought we could live without it for a while, until we knew
something we thought would really work well.
Well, history did prove this decision was misguided, didn't it?
No. It proved we goofed in not doing the API with a fast-path.
The amount of aggravation, time and effort that MIPS, its OEM's
and ISV's suffered as a result was completely out of proportion
with whatever little schedule (or die area) gains were achieved.
dk |
Joe Seigh
Guest
|
Posted:
Tue Dec 21, 2004 7:24 pm Post subject:
Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM |
|
|
John Mashey wrote:
| Quote: |
One more time, then I'm off to ski.
....
It would have been a perfectly adequate solution to have provided the
user-level library and the fast-path kernel mechanism, and I planned to
do that, but we just got consumed by all the other surprises, like
getting volatile right, and debugging kernels compiled with global
optimizers, and I just forgot until it was too late, and it indeed
caused trouble later...
|
What was "getting volatile right"? The C standard doesn't recognize multithreading
or multiprocessing, so volatile is pretty much useless in that regard. The standard
does allow volatile to be implementation dependent, so perhaps "getting volatile right"
meant something to somebody somewhere. Was it documented, or just assumed that everyone
would know what that actually meant?
Currently I'm trying to determine what guarantees that the Linux kernel's atomic_read is actually
atomic. Volatile may or may not be involved, since it is used in the atomic_t definition. But
whether gcc is guaranteed to atomically access 32-bit ints or just 32-bit volatile ints
is not documented. So you don't know whether someone actually verified this with the gcc maintainers
or just assumed it was true.
Joe Seigh |
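For comparison, the C11 revision of the standard (published in 2011, so after this thread) added <stdatomic.h>, which makes the atomicity guarantee explicit instead of leaving it to volatile. The sketch below assumes a C11 compiler; the my_atomic_* names are hypothetical and this is not the Linux kernel's atomic_t or atomic_read implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter type, loosely analogous in spirit to an atomic_t. */
typedef struct {
    atomic_int counter;
} my_atomic_t;

/* The language guarantees this load is atomic (never torn).
 * Relaxed ordering means it promises nothing about other memory. */
int my_atomic_read(my_atomic_t *v)
{
    return atomic_load_explicit(&v->counter, memory_order_relaxed);
}

/* Atomic store with the same relaxed ordering. */
void my_atomic_set(my_atomic_t *v, int i)
{
    atomic_store_explicit(&v->counter, i, memory_order_relaxed);
}

/* ATOMIC_INT_LOCK_FREE is 2 when atomic_int is always lock-free on the
 * target, i.e. implemented with plain hardware instructions. */
bool my_atomic_int_always_lock_free(void)
{
    return ATOMIC_INT_LOCK_FREE == 2;
}
```

With a plain volatile int, by contrast, the standard only constrains how the compiler treats the accesses; whether a given access is indivisible is left to the implementation and the hardware.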