| Author |
Message |
Terje Mathisen
Guest
|
Posted:
Tue Jan 25, 2005 5:05 pm Post subject:
Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM |
|
|
Nick Maclaren wrote:
| Quote: | In article <1106608559.393100.303700@c13g2000cwb.googlegroups.com>,
David Kanter <dkanter@gmail.com> wrote:
In fairness, all the early RISC designs suffered from multiple
architectural blunders that could not be justified by the high
cost of silicon alone, this was by no means a Mips monopoly.
PA-RISC could not queue interrupts, SPARC went to market without
multiply instructions and no way to turn off virtual memory, etc..
What about POWER and Alpha? Did those avoid most of the major blunders
(obviously, they avoided the lack of a multiply)?
I haven't looked at POWER in detail, but Alpha got several things
badly wrong.
|
POWER has always seemed quite nice. The very first implementation
needed to use multiple chips, which caused a 3-cycle (afair the Byte
article) latency between doing an integer compare, moving the result to
the branch unit and finally being able to branch on the result.
| Quote: |
It originally had only full register loads and stores, which is
unsuitable for implementing C. That was fixed.
|
IMHO C wasn't the worst problem:
Memory-mapped IO to 8/16/32-bit device registers with destructive read or
write was a harder problem, and in this case DEC couldn't simply define
away the problem either. The workaround entailed using alternate address
ranges afair. It seemed like a horrible hack at the time, and it
probably generated quite a few bugs in kernel mode drivers. :-(
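A minimal sketch of why word-only access hurts with such device registers, using a purely simulated device. The `fake_device` layout, the write-1-to-clear status byte, and all function names are assumptions made for illustration; they do not describe any real Alpha chipset or the actual DEC workaround.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device: two byte-wide registers packed into one 32-bit word.
 * Byte 0 is a plain control register, byte 1 is a write-1-to-clear status. */
typedef struct {
    uint8_t control;
    uint8_t status;   /* writing a 1 bit clears that pending bit */
} fake_device;

/* What the simulated device does when it sees a full 32-bit write. */
void device_write_word(fake_device *dev, uint32_t value)
{
    dev->control = (uint8_t)(value & 0xFFu);
    dev->status &= (uint8_t)~((value >> 8) & 0xFFu);   /* W1C side effect */
}

uint32_t device_read_word(const fake_device *dev)
{
    return (uint32_t)dev->control | ((uint32_t)dev->status << 8);
}

/* What the device does for a true byte-wide write to the control register. */
void device_write_control_byte(fake_device *dev, uint8_t value)
{
    dev->control = value;
}

/* Byte store emulated with word-only access: read, merge, write back.
 * The write-back also hands the status bits to the device again. */
void emulated_control_store(fake_device *dev, uint8_t value)
{
    uint32_t word = device_read_word(dev);
    word = (word & ~0xFFu) | value;
    device_write_word(dev, word);
}

/* Returns 1 if the emulated store destroyed pending status bits. */
int emulated_store_loses_status(void)
{
    fake_device dev = { .control = 0x00, .status = 0x05 };
    emulated_control_store(&dev, 0x80);
    assert(dev.control == 0x80);   /* the intended update did happen */
    return dev.status != 0x05;
}
```

With a real byte store the status byte is never touched; with the emulated store the pending bits are read as ones and written straight back, which clears them in this simulated device.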
Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching" |
|
Joseph Seigh
Guest
|
Posted:
Tue Jan 25, 2005 5:34 pm Post subject:
Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM |
|
|
On Tue, 25 Jan 2005 10:14:28 +0100, Per Schröder <per@mimer.se> wrote:
| Quote: | glen herrmannsfeldt wrote:
Nick Maclaren wrote:
I haven't looked at POWER in detail, but Alpha got several things
badly wrong.
It originally had only full register loads and stores, which is
unsuitable for implementing C. That was fixed.
It had load/store and instructions for manipulating bytes
in a register, and is a RISC machine.
What happens when you add byte and halfword store?
It loads the fullword into a register, modifies the
byte or halfword, and writes it back again. With ECC
that is the only way it can be done. It will be just
as slow in either case, though it takes a few more
instructions in the original version.
This leads to wrong results if you are using threads and you have two
adjacent bytes (in the same 32-bit word) that are protected by different
mutexes.
Of course, you could do *all* byte and 16-bit accesses by LL/SC sequences.
This would reinstate correct semantics but the cost would be prohibitive.
You could also declare that having adjacent bytes protected by different
mutexes is UNSUPPORTED (it's a bad idea anyway), but unfortunately a lot of
programmers wouldn't notice, and the net result would be that threaded
programs that worked fine on other architectures would randomly fail when
ported to Alpha.
|
This is called "word tearing". It's mainly a problem from Posix pthreads and
ANSI C not talking to each other, most of the fault being AFAICT the C
committee who appear to be extremely hostile and antipathic towards threading
issues. It's one of the reasons there has to be a separate Posix compliance
certification for C compilers. There's a simple solution without any performance
hit but there's no point in mentioning it since it requires C support.
On the other hand, since threaded programming is non-portable even with Posix
pthreads, you don't have to work as hard on portability as you would with pure
non-threaded programming.
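To make the failure concrete, here is a minimal sketch that replays one bad interleaving by hand instead of using real threads. The helper names (`load_word`, `insert_byte`, `store_word`, `torn_interleaving`) and the 4-byte word are assumptions for illustration only; byte index 0 simply means the low-order byte of the simulated word.

```c
#include <assert.h>
#include <stdint.h>

/* Two adjacent bytes, x (index 0) and y (index 1), share one 32-bit word.
 * Imagine each is protected by a different mutex, so two threads may
 * legitimately update them at the same time. */

/* Step 1 of an emulated byte store: load the containing word. */
uint32_t load_word(const uint32_t *mem) { return *mem; }

/* Step 2: merge the new byte into a private copy of the word. */
uint32_t insert_byte(uint32_t word, unsigned index, uint8_t value)
{
    assert(index < 4);
    unsigned shift = index * 8u;
    return (word & ~(0xFFu << shift)) | ((uint32_t)value << shift);
}

/* Step 3: store the whole word back. */
void store_word(uint32_t *mem, uint32_t word) { *mem = word; }

uint8_t byte_at(uint32_t word, unsigned index)
{
    assert(index < 4);
    return (uint8_t)(word >> (index * 8u));
}

/* One unlucky schedule: both "threads" load before either stores. */
uint32_t torn_interleaving(void)
{
    uint32_t mem = 0;
    uint32_t a = load_word(&mem);    /* thread A wants x = 0x11 */
    uint32_t b = load_word(&mem);    /* thread B wants y = 0x22 */
    a = insert_byte(a, 0, 0x11);
    b = insert_byte(b, 1, 0x22);
    store_word(&mem, a);
    store_word(&mem, b);             /* overwrites A's update of x */
    return mem;
}
```

After this schedule y holds 0x22 but x is back to 0, even though neither thread ever wrote to the other's byte and both held their own mutex.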
--
Joe Seigh |
|
Per Schröder
Guest
|
Posted:
Tue Jan 25, 2005 5:55 pm Post subject:
Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM |
|
|
Joseph Seigh wrote:
| Quote: | This is called "word tearing". It's mainly a problem from Posix pthreads
and ANSI C not talking to each other, most of the fault being AFAICT the C
committee who appear to be extremely hostile and antipathic towards
threading issues. It's one of the reasons there has to be a separate Posix
compliance certification for C compilers. There's a simple solution
without any performance hit but there's no point in mentioning it since it
requires C support.
|
Please do mention it. I'm curious!
/Per Schröder |
|
Joseph Seigh
Guest
|
Posted:
Tue Jan 25, 2005 6:42 pm Post subject:
Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM |
|
|
On Tue, 25 Jan 2005 13:55:39 +0100, Per Schröder <per@mimer.se> wrote:
| Quote: | Joseph Seigh wrote:
This is called "word tearing". It's mainly a problem from Posix pthreads
and ANSI C not talking to each other, most of the fault being AFAICT the C
committee who appear to be extremely hostile and antipathic towards
threading issues. It's one of the reasons there has to be a separate Posix
compliance certification for C compilers. There's a simple solution
without any performance hit but there's no point in mentioning it since it
requires C support.
Please do mention it. I'm curious!
|
You create a new attribute which forces alignment to a safe boundary. You
could call it something like "shared", but not "volatile", which has too
much screwed-up semantic baggage to be worth reusing. So, for example:
// shared at struct level
shared struct {
    char x;
    char y;
} z;

// shared struct with imbedded mutex
shared struct {
    pthread_mutex_t mutex;   // mutex typedef has shared attribute
    char x;
    char y;
} z;

// shared at field level
shared struct {              // shared probably implied here anyway
    shared char x;
    shared char y;
} z;
Most compilers have alignment directives currently but you have to
explicitly use them now, meaning you have to know the platform specific
parameters. Though if the alignment is provided as an attribute, I
suppose you could write a "shared" macro. You'd have to use the max
of the word tearing boundary and the object's natural boundary, e.g.
if word tearing is 4 byte boundary and the object requires 8 byte
boundary, you need to leave it at 8 bytes. Though, I'm not sure
how you'd get the macro to set alignment on the following field.
But how many architectures besides alpha have this problem?
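One possible sketch of such a macro, assuming a C11 compiler. `WORD_TEAR_BOUNDARY`, `SHARED_FIELD`, and `struct shared_pair` are hypothetical names, and the value 4 is only illustrative. It relies on the C11 rule that when several alignment specifiers appear in one declaration the strictest one wins, which yields the max of the tearing boundary and the natural alignment without computing it by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed platform parameter: the smallest unit the hardware can store
 * without rewriting its neighbours. 4 is only an illustrative value. */
#define WORD_TEAR_BOUNDARY 4

/* Two alignment specifiers on one declaration: C11 applies the stricter,
 * i.e. max(tearing boundary, natural alignment of the type). */
#define SHARED_FIELD(type, name) \
    _Alignas(WORD_TEAR_BOUNDARY) _Alignas(type) type name

struct shared_pair {
    SHARED_FIELD(char, x);     /* may be guarded by one mutex */
    SHARED_FIELD(char, y);     /* may be guarded by another   */
    SHARED_FIELD(double, d);   /* keeps its own, stricter alignment if any */
};

/* Compile-time check that x and y no longer share a tearing unit. */
static_assert(offsetof(struct shared_pair, y) -
              offsetof(struct shared_pair, x) >= WORD_TEAR_BOUNDARY,
              "x and y must not share a tearing unit");
```

Because the macro wraps the whole field declaration, the alignment applies to each field individually. The cost is padding: each `char` now occupies a full tearing unit.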
--
Joe Seigh |
|
Eric P.
Guest
|
Posted:
Tue Jan 25, 2005 7:55 pm Post subject:
Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM |
|
|
Joseph Seigh wrote:
| Quote: |
This is called "word tearing". It's mainly a problem from Posix pthreads and
ANSI C not talking to each other, most of the fault being AFAICT the C
committee who appear to be extremely hostile and antipathic towards threading
issues. It's one of the reasons there has to be a separate Posix compliance
certification for C compilers. There's a simple solution without any performance
hit but there's no point in mentioning it since it requires C support.
On the other hand, since threaded programming is non-portable even with Posix
pthreads, you don't have to work as hard on portability as you would with pure
non-threaded programming.
|
This has nothing to do with Posix or ANSI C, other than they are
also affected. The problem is just as present if you are running
Fortran apps on VMS or Windows NT.
The problem is that they built a machine that behaves unlike
any other machine on the market. There was no general usage of
multithreading at that time so it was not a clearly wrong decision
(all interrupt and AST code was affected, but who cares about that
hacker stuff anyway).
But the market changed in 1992 with WNT making kernel scheduled
multithreading generally available, and other OSs followed.
Apps written for other platforms now broke on Alpha.
This left ISVs and IT to pick up the tab for customizing apps
so they run properly on that machine, which became yet another
market barrier against the Alpha.
Eric |
|
Eric P.
Guest
|
Posted:
Tue Jan 25, 2005 8:33 pm Post subject:
Re: CAS and LL/SC |
|
|
Nick Maclaren wrote:
| Quote: |
In article <YxbJd.20816$8u5.17772@bgtnsc04-news.ops.worldnet.att.net>,
Stephen Fuld <s.fuld@PleaseRemove.att.net> wrote:
"Andi Kleen" <freitag@alancoxonachip.com> wrote in message
news:m3brbfou1q.fsf@averell.firstfloor.org...
To bring it back on topic to comp.arch: moral is to never add
silly address space limits to registers that cause such problems
later.
Yes. Or, in a slightly different formulation, don't use memory mapped I/O
at all (at least not for general purpose processors where the stringent
requirements of some embedded systems don't apply).
Amen.
|
Rubbish. This has nothing to do with memory mapped IO.
How about: Don't try to address more than 4GB on a 32 bit machine.
Or even better: Use 36 bits addresses and an IOMMU to map PCI/32
into the larger address space. OS support has been available for
over a decade.
Memory mapped IO works just fine and has proven over many decades
to be the most flexible and functional while being the simplest
approach, when the system design is not botched.
| Quote: | Merging this with another thread (the request for specialised
coprocessors using an I/O interface), you end up with it being
regarded as a good thing to have a CPU core pretending to be an
I/O device pretending to be some memory attached to another CPU.
Stop the world - I want to get off.
|
I'm not sure if you are being sarcastic for or sarcastic against.
At any rate, if the lowest cost and most flexible solution is
to have the core cpu sitting waiting for a device register,
what is the problem?
| Quote: | As I have said before, traditional engineering is the wrong model;
Darwinian evolution of the sort that leads to the panda's 'thumb'
is a much better one .....
Regards,
Nick Maclaren.
|
Eric |
|
Jan Vorbrüggen
Guest
|
Posted:
Tue Jan 25, 2005 8:51 pm Post subject:
Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM |
|
|
| Quote: | The problem is that they built a machine that behaves unlike
any other machine on the market. There was no general usage of
multithreading at that time so it was not a clearly wrong decision
(all interrupt and AST code was affected, but who cares about that
hacker stuff anyway).
|
ASTs are hacker stuff? Do say!
| Quote: | But the market changed in 1992 with WNT making kernel scheduled
multithreading generally available, and other OSs followed.
Apps written for other platforms now broke on Alpha.
This left ISVs and IT to pick up the tab for customizing apps
so they run properly on that machine, which became yet another
market barrier against the Alpha.
|
But those apps were broken to begin with, their brokenness was just
exposed more easily on Alpha.
Jan |
|
Eric P.
Guest
|
Posted:
Tue Jan 25, 2005 8:56 pm Post subject:
Re: CAS and LL/SC |
|
|
Bernd Paysan wrote:
| Quote: |
Eric P. wrote:
Bernd Paysan wrote:
On the other hand, with all this virtualization talk (Vanderpool): Please
add an IO processor that takes messages from the host OS and talks to the
IO all by itself - including MMU tasks for DMA, and such stuff. And make
it a multithreaded processor.
Why do you think that asymmetric multiprocessing is better compared
to adding another main processor with an SMP OS? In SMP once the
cpu is finished housekeeping it can be used for running apps.
An IO coprocessor cannot.
An IO processor is a simplistic device that mostly moves data around. Apart
from that, it will only provide a small "abstraction layer" for accessing
IO. IO has a lot of properties that makes it awful to access with the main
processor:
* It is slow
* It typically needs to be uncached
|
An IO Processor (IOP) still needs coherent access to main memory/cache.
The IOP itself does not need instruction and local data cache.
But if it is already present, as it would be in an SMP system,
then there is no harm in using it.
| Quote: | * It needs strict ordered accesses
and it often comes with dreadful design mistakes:
* Destructive reads (no speculation! No prefetch!)
* Polling
When I'm talking about an IO processor, I'm talking about logic that takes
in the order of 10k gates. Then, the advantage to adding another main
processor is obvious:
* The IO processor costs cents or less
|
Your conclusion is based on the assumption that introducing a second
cpu architecture, with all its associated design, development and
support costs, is lower cost than just adding a second main cpu.
In an era where cpus cost $10 to $150, and programmers cost
$1000 to $2000 per day, I suggest that this is false economics.
In any case there will be a bunch of transistors spinning their
wheels waiting for a device register. Offloading the task does not
necessarily make this cheaper. To my eye all it does is shuffle
the work around and make it more complex.
IMO what might be worth _looking_ at would be where the core and IOP
cpus had the same ISA. The IOP would be a much simplified design.
This would be like an 80386 attached to a Pentium 4. A better
example would be an OOO quad issue Alpha with a cheapy in order,
no floating point, single issue Alpha IOP. Even so I am not
yet convinced this is worth the problems it would cause.
| Quote: | Furthermore, there are advantages from an OS design point of view:
* The IO processor can be used as more generic interface to the IO - it can
translate commands to device-specific ones. That's what VMs need: A
"channel processor" where the real machine can use the real IO processor,
and the virtual machine can filter the channel commands and pass them along
to the real IO processor without much overhead. IO processor commands are
big chunks of semantically connected work, not single register transfers.
|
Such an interface can certainly be done with or without IOPs.
How would this API and mechanism differ from current command packet
based IO in VMS or WNT? (I exclude Linux only because I haven't
found a decent internals description of its "AIO Replay Traps"
mechanism yet.)
| Quote: | Compare that to the normal IO situation on the PCI/ISA bus of a PC, where
you get single register accesses and you ping-pong between the VM and the
OS.
|
Yes, but this is exactly the point. In both SMP and IOP the cpu
spins waiting for the register. Afterwards an IOP just sits there.
In SMP it goes back to work.
Eric |
|
Eric P.
Guest
|
Posted:
Tue Jan 25, 2005 9:24 pm Post subject:
Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM |
|
|
Jan Vorbrüggen wrote:
| Quote: |
The problem is that they built a machine that behaves unlike
any other machine on the market. There was no general usage of
multithreading at that time so it was not a clearly wrong decision
(all interrupt and AST code was affected, but who cares about that
hacker stuff anyway).
ASTs are hacker stuff? Do say!
|
"hacker stuff anyway :-)"
| Quote: | But the market changed in 1992 with WNT making kernel scheduled
multithreading generally available, and other OSs followed.
Apps written for other platforms now broke on Alpha.
This left ISVs and IT to pick up the tab for customizing apps
so they run properly on that machine, which became yet another
market barrier against the Alpha.
But those apps were broken to begin with, their brokenness was just
exposed more easily on Alpha.
|
Not so. Replacing atomic byte and word stores with read-modify-write
sequences *changes* such accesses to be non-interruptable on that
platform. Granted such usage is an unstated assumption, but so
what - who guaranteed that 32-bit int accesses are atomic?
These are assumptions that are valid on most (all?) platforms
but that one, and are used by lots of code including that of their
existing customers' VAX code.
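For comparison, a byte store can be kept atomic on word-only hardware by retrying until the surrounding word has not changed, which is what the LL/SC suggestion earlier in the thread amounts to. The sketch below assumes C11 `<stdatomic.h>`; `atomic_store_byte` and `atomic_load_byte` are hypothetical helper names, and relaxed ordering is used on the assumption that a surrounding mutex supplies any ordering the program needs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Atomically replace one byte inside a 32-bit word with a compare-and-swap
 * retry loop, so concurrent updates to the other three bytes are never
 * overwritten with stale data. */
void atomic_store_byte(_Atomic uint32_t *word, unsigned index, uint8_t value)
{
    assert(index < 4);
    unsigned shift = index * 8u;
    uint32_t mask = (uint32_t)0xFFu << shift;
    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
    uint32_t desired;
    do {
        desired = (old & ~mask) | ((uint32_t)value << shift);
        /* On failure, old is refreshed with the word's current contents. */
    } while (!atomic_compare_exchange_weak_explicit(
                 word, &old, desired,
                 memory_order_relaxed, memory_order_relaxed));
}

uint8_t atomic_load_byte(_Atomic uint32_t *word, unsigned index)
{
    assert(index < 4);
    uint32_t w = atomic_load_explicit(word, memory_order_relaxed);
    return (uint8_t)(w >> (index * 8u));
}
```

Every byte store becomes a loop around an atomic operation, which is the cost the thread calls prohibitive if applied to all byte and 16-bit accesses.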
My point was mostly about erecting market barriers against your
own product. Causing others to incur expense to use your product
is probably not the best way to take over the market.
Eric |
|
Nick Maclaren
Guest
|
Posted:
Tue Jan 25, 2005 9:34 pm Post subject:
Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM |
|
|
In article <35mjfuF4kdrkoU1@individual.net>,
Jan Vorbrüggen <jvorbrueggen-not@mediasec.de> wrote:
| Quote: | It originally had only full register loads and stores, which is
unsuitable for implementing C. That was fixed.
I believe the driving force in providing less-than-32-bit-read and
-writes was memory-mapped I/O, not implementing C (which was needed
from day one in any case). The workarounds in hard- and software for
the first systems were Not Pretty and prone to misuse and errors, so
the necessary instructions were added.
|
That's essentially the same problem. In all cases (threads, memory
mapping and interrupts), the problems and solutions have a lot in
common.
Regards,
Nick Maclaren. |
|
David Kanter
Guest
|
Posted:
Tue Jan 25, 2005 9:39 pm Post subject:
Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM |
|
|
What about Tandem? Their boxes still use MIPS, although, like SGI they
are transitioning to IA64 in the near future.
Those Tandem boxes are quite expensive...beautiful margins for those HP
folks... |
|
David Kanter
Guest
|
Posted:
Tue Jan 25, 2005 9:46 pm Post subject:
Re: SGI on Opteron (Was: Re: CAS and LL/SC (was Re: High Lev |
|
|
Not for long. As I understand, there is no way to do lockstepping with
Montecito, because of its dynamic clocking capability.
That being said, I think Intel has always made RAS a higher priority
than AMD...
David Kanter |
|
Bernd Paysan
Guest
|
Posted:
Tue Jan 25, 2005 10:04 pm Post subject:
Re: CAS and LL/SC |
|
|
Eric P. wrote:
| Quote: | An IO Processor (IOP) still needs coherent access to main memory/cache.
The IOP itself does not need instruction and local data cache.
But if it is already present, as it would be in an SMP system,
then there is no harm in using it.
|
The IOP needs the same sort of access to main memory/cache as a DMA unit.
Now tell me there's no DMA unit in the chipset ;-).
| Quote: | Your conclusion is based on the assumption that introducing a second
cpu architecture, with all its associated design, development and
support costs, is lower cost than just adding a second main cpu.
In an era where cpus cost $10 to $150, and programmers cost
$1000 to $2000 per day, I suggest that this is false economics.
|
A main CPU costs $10 to $150. A 10k gates one as part of the chipset cost
cents. And yes, that includes development, since a chipset is sold in 10 to
100 million units.
| Quote: | In any case there will be a bunch of transistors spinning their
wheels waiting for a device register. Offloading the task does not
necessarily make this cheaper. To my eye all it does is shuffle
the work around and make it more complex.
|
Offloading the task to a cheaper device does make it cheaper. And it doesn't
make it more complex. The IOPs of 40 years ago weren't built
because it made the system more complex or more difficult to handle, rather
the other way round.
| Quote: | Yes, but this is exactly the point. In both SMP and IOP the cpu
spins waiting for the register. Afterwards an IOP just sits there.
In SMP it goes back to work.
|
I don't need a few-cents processor to go back to work. I am talking about a
few-cents processor taking stupid workload off a processor with a triple-digit
$ price tag.
--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/ |
|
Joseph Seigh
Guest
|
Posted:
Tue Jan 25, 2005 10:32 pm Post subject:
Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM |
|
|
On Tue, 25 Jan 2005 11:24:56 -0500, Eric P. <eric_pattison@sympaticoREMOVE.ca> wrote:
| Quote: | Jan Vorbrüggen wrote:
But those apps were broken to begin with, their brokenness was just
exposed more easily on Alpha.
Not so. Replacing atomic byte and word stores with read-modify-write
sequences *changes* such accesses to be non-interruptable on that
platform. Granted such usage is an unstated assumption, but so
what - who guaranteed that 32-bit int accesses are atomic?
These are assumptions that are valid on most (all?) platforms
but that one, and are used by lots of code including that of their
existing customers' VAX code.
|
IBM ran into the same problem when they first went to multiprocessing
and the bit ops, OI and NI, were no longer atomic.
| Quote: |
My point was mostly about erecting market barriers against your
own product. Causing others to incur expense to use your product
is probably not the best way to take over the market.
|
I'm waiting for someone (on something other than alpha) to break the
implicit memory ordering on dependent loads since the explicit
memory models will allow them to do so. They get to find out they
seriously broke Linux SMP scalability for their processor.
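The pattern at stake looks roughly like the sketch below: a writer publishes a pointer and a reader loads it and then reads through it, relying on the second load being ordered after the first. This assumes C11 `<stdatomic.h>`; `struct message`, `publish`, and `try_read` are hypothetical names. `memory_order_consume` is the C11 way to request dependency ordering, and compilers commonly implement it as the stronger acquire ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct message { int payload; };

static _Atomic(struct message *) published = NULL;

/* Writer: fill in the object, then publish its address with release order
 * so the payload write cannot be reordered after the pointer store. */
void publish(struct message *m, int value)
{
    assert(m != NULL);
    m->payload = value;
    atomic_store_explicit(&published, m, memory_order_release);
}

/* Reader: load the pointer, then read through it. The second load depends
 * on the first; most processors order such loads without a barrier, and
 * the consume ordering asks for that guarantee explicitly. */
int try_read(int *out)
{
    struct message *m =
        atomic_load_explicit(&published, memory_order_consume);
    if (m == NULL)
        return 0;            /* nothing published yet */
    *out = m->payload;       /* dependent load */
    return 1;
}
```

On hardware that orders dependent loads for free, the reader side needs no barrier instruction, which is why breaking that implicit ordering would make lock-free readers of this kind more expensive.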
--
Joe Seigh |
|
David Kanter
Guest
|
Posted:
Wed Jan 26, 2005 12:51 am Post subject:
Re: CAS and LL/SC |
|
|
Isn't that what zSeries machines do? I was under the impression that
they have 'channel' controllers that are relatively similar to what you
are describing.
Not being a zSeries person, I can neither confirm nor deny such
allegations : )
Perhaps someone who is a zSeries person could clear this up.
David Kanter |
|