David Hopwood
Guest
Posted: Tue Sep 06, 2005 5:26 am Post subject: Re: Intel x86 memory model question

Joe Seigh wrote:
| Quote: | David Hopwood wrote:
Joe Seigh wrote:
"Despite the fact that Pentium 4, Intel Xeon, and P6 family
processors support processor ordering, Intel does not guarantee that
future processors will support this model. To make software portable
to future processors, it is recommended that operating systems provide
critical region and resource control constructs and API’s (application
program interfaces) based on I/O, locking, and/or serializing
instructions be used to synchronize access to shared areas of
memory in multiple-processor systems."
This is all perfectly sensible. "Future processors" from Intel are not
necessarily ISA-compatible with x86 anyway. For example, you need to
recompile to use long mode in EM64T. Also note that it doesn't say
"future x86 processors". Maybe they were talking about Itanic.
Even if they weren't talking about IA-64 or a different mode, it's
still a good idea to avoid dependencies on the memory model in
*applications*, since it is more difficult to change all apps that
have such dependencies than it is to change threading libraries in OS
and language implementations. In fact OS/lang-impl maintainers half
expect stuff to rot on new hardware, and hopefully remember what they
depended on. Application maintainers generally don't (if they ever
understood it in the first place). This is what I've been saying
consistently.
Yes, your aversion to anarchist application programmers doing their
own thing is well known. :)
|
Right, I am absolutely convinced that the roles of application
programmer and infrastructure programmer should be clearly separated
(even if there are a few people with the ability and expertise needed
to successfully do both).
| Quote: | Anyway, this issue doesn't have anything to do with what we were talking
about, which is whether the current architected x86 model allows a
particular behaviour.
That one? And what do people think the memory model that only
"I/O, locking, and/or serializing instructions" can synchronize is?
You're overanalysing a fairly loosely worded recommendation.
I'm not sure what you're saying here. That all future processors
from Intel that don't have processor ordering won't be x86?
|
Well, they won't be x86-as-we-know-it. OSes, compilers, etc. will
have to be changed to run on or generate code for this new x86-like
thing, and changes in the memory model will probably be only one issue
they need to deal with.
| Quote: | And that the synchronization instructions in these future processors
won't be similar to the ones in x86? That Intel is telling people
in an x86 manual to start writing portable code not now but when
they get to the future processor?
|
Of course not. Read what they actually wrote.
--
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>

Joe Seigh
Guest
Posted: Tue Sep 06, 2005 6:13 am Post subject: Re: Intel x86 memory model question

David Hopwood wrote:
| Quote: | Joe Seigh wrote:
David Hopwood wrote:
That one? And what do people think the memory model that only
"I/O, locking, and/or serializing instructions" can synchronize is?
You're overanalysing a fairly loosely worded recommendation.
I'm not sure what you're saying here. That all future processors
from Intel that don't have processor ordering won't be x86?
Well, they won't be x86-as-we-know-it. OSes, compilers, etc. will
have to be changed to run on or generate code for this new x86-like
thing, and changes in the memory model will probably be only one issue
they need to deal with.
And that the synchronization instructions in these future processors
won't be similar to the ones in x86? That Intel is telling people
in an x86 manual to start writing portable code not now but when
they get to the future processor?
Of course not. Read what they actually wrote.
|
I did. It sounded to me like they said if you want to write
portable code, don't assume processor ordering but use the
locking and serializing instructions instead on the current
processors.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.

Alexander Terekhov
Guest
Posted: Tue Sep 06, 2005 2:01 pm Post subject: Re: Intel x86 memory model question

Andy Glew wrote:
[...]
| Quote: | I think that the overall intention is that placing MFENCE before and
after every memory reference is supposed to get you SC semantics.
|
But without remote write atomicity, I suppose. And, BTW, that's what
revised Java volatiles do. I mean the JSR-133 memory model.
| Quote: | However, MFENCE, LFENCE, and SFENCE were defined after my time, and I
suspect that their definitions are not quite complete enough for what
you want. In particular, *FENCE really only work wrt WB cacheable
memory, and do not drain external buffers such as may occur in bus
bridges.
|
My reading of the specs is that MFENCE is guaranteed to provide
store-load barrier.
P1: X = 1; R1 = Y;
P2: Y = 1; R2 = X;
(R1, R2) = (0, 0) is allowed under pure PC, but
P1: X = 1; MFENCE; R1 = Y;
P2: Y = 1; MFENCE; R2 = X;
(R1, R2) = (0, 0) is NOT allowed.
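For readers who want to try the store-buffering test above, here is a minimal C11 sketch (not from the thread). It assumes a compiler with <stdatomic.h> and POSIX threads; the names count_both_zero, p1 and p2 are hypothetical. All accesses to X and Y are relaxed, so the only ordering between each store and the following load comes from atomic_thread_fence(memory_order_seq_cst), which the C11 model says forbids (R1, R2) = (0, 0). On x86, compilers typically emit MFENCE (or an equivalent locked instruction) for that fence. With fenced = false there is no fence at all, which corresponds to the plain-PC case where (0, 0) is allowed, although a harness this simple may rarely catch it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Locations from the litmus test; all accesses below are relaxed, so any
   ordering between a thread's store and its load comes from the fence alone. */
static atomic_int X, Y;
static int R1, R2;
static bool use_fence;   /* set before the threads are created */

static void *p1(void *arg) {
    (void)arg;
    atomic_store_explicit(&X, 1, memory_order_relaxed);   /* X = 1  */
    if (use_fence)
        atomic_thread_fence(memory_order_seq_cst);        /* MFENCE */
    R1 = atomic_load_explicit(&Y, memory_order_relaxed);  /* R1 = Y */
    return NULL;
}

static void *p2(void *arg) {
    (void)arg;
    atomic_store_explicit(&Y, 1, memory_order_relaxed);   /* Y = 1  */
    if (use_fence)
        atomic_thread_fence(memory_order_seq_cst);        /* MFENCE */
    R2 = atomic_load_explicit(&X, memory_order_relaxed);  /* R2 = X */
    return NULL;
}

/* Runs the test `iters` times and returns how often (R1, R2) == (0, 0). */
int count_both_zero(bool fenced, int iters) {
    int hits = 0;
    assert(iters > 0);
    use_fence = fenced;
    for (int i = 0; i < iters; i++) {
        pthread_t t1, t2;
        atomic_store(&X, 0);
        atomic_store(&Y, 0);
        pthread_create(&t1, NULL, p1, NULL);
        pthread_create(&t2, NULL, p2, NULL);
        pthread_join(t1, NULL);   /* join makes R1/R2 safe to read here */
        pthread_join(t2, NULL);
        if (R1 == 0 && R2 == 0)
            hits++;
    }
    return hits;
}
```

A serious litmus harness would start both threads from a shared flag so they race more tightly; that only matters for observing the unfenced outcome, not for the fenced guarantee.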
| Quote: | In general, the P6 and Wmt families' mechanism for ensuring
ordering, waiting for global observability, only works for perfectly
vanilla WB cacheable memory, and is frequently violated wrt other
memory types. So I do not want to guarantee that it will work for
things like WC cached memory that is private to a graphics
accelerator.
|
I want to know whether MFENCE provides store-load barrier for WB
memory.
| Quote: |
You may be right that using the cmpxchg as you describe achieves SC on
x86. However, I need to think about it a bit more, since the
reasoning you provide is implementation specific, not architectural.
|
I'm just reading the specs.
CMPXCHG on x86 always performs a (hopefully StoreLoad+LoadLoad fenced)
load followed by a (LoadStore+StoreStore fenced) store (plus trailing
MFENCE, so to speak). Locked CMPXCHG is supposed to be "fully fenced".
Regarding a safety net for remote write atomicity, I rely on the
following CMPXCHG wording:
"The destination operand is written back if the comparison fails;
otherwise, the source operand is written into the destination.
(The processor never produces a locked read without also
producing a locked write.)"
I suspect that (locked) XADD(addr, 0) will also work... but I'm
somewhat missing strong language about mandatory write as in CMPXCHG.
[... cmpxchg could well be implemented without any fencing ...]
"Locked operations are atomic with respect to all other memory
operations and all externally visible events. Only instruction
fetch and page table accesses can pass locked instructions. Locked
instructions can be used to synchronize data written by one
processor and read by another processor.
For the P6 family processors, locked operations serialize all
outstanding load and store operations (that is, wait for them to
complete). This rule is also true for the Pentium 4 and Intel Xeon
processors, with one exception: load operations that reference
weakly ordered memory types (such as the WC memory type) may not
be serialized."
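The XADD(addr, 0) idea can be sketched in C11 as a read-modify-write that adds zero to a shared dummy word. This is an illustration under stated assumptions, not Intel's documented method: fence_word, rmw_fence and count_both_zero_rmw are hypothetical names, and on x86 compilers usually emit a LOCK-prefixed ADD or XADD for atomic_fetch_add. In the C11 model the argument depends on both threads hitting the same fence_word: read-modify-writes on one location are totally ordered, so whichever runs second reads from, and therefore synchronizes with, the first, which rules out (R1, R2) = (0, 0).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int X, Y;
static atomic_int fence_word;   /* hypothetical shared dummy location */
static int R1, R2;

/* "Fence" built from a locked read-modify-write that changes nothing,
   in the spirit of XADD(addr, 0) or LOCK OR [addr], 0. It only orders
   threads that use the same fence_word. */
static void rmw_fence(void) {
    atomic_fetch_add_explicit(&fence_word, 0, memory_order_acq_rel);
}

static void *p1(void *arg) {
    (void)arg;
    atomic_store_explicit(&X, 1, memory_order_relaxed);
    rmw_fence();
    R1 = atomic_load_explicit(&Y, memory_order_relaxed);
    return NULL;
}

static void *p2(void *arg) {
    (void)arg;
    atomic_store_explicit(&Y, 1, memory_order_relaxed);
    rmw_fence();
    R2 = atomic_load_explicit(&X, memory_order_relaxed);
    return NULL;
}

/* Returns how many of `iters` runs ended with (R1, R2) == (0, 0). */
int count_both_zero_rmw(int iters) {
    int hits = 0;
    assert(iters > 0);
    for (int i = 0; i < iters; i++) {
        pthread_t t1, t2;
        atomic_store(&X, 0);
        atomic_store(&Y, 0);
        pthread_create(&t1, NULL, p1, NULL);
        pthread_create(&t2, NULL, p2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        if (R1 == 0 && R2 == 0)
            hits++;
    }
    return hits;
}
```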
| Quote: | You are confusing implementation with semantics.
|
Fix the specs, then.
And explain how one can achieve classic SC semantics for WB memory.
regards,
alexander.

David Hopwood
Guest
Posted: Tue Sep 06, 2005 4:15 pm Post subject: Re: Intel x86 memory model question

Joe Seigh wrote:
| Quote: | David Hopwood wrote:
Joe Seigh wrote:
David Hopwood wrote:
Of course not. Read what they actually wrote.
I did. It sounded to me like they said if you want to write
portable code, don't assume processor ordering but use the
locking and serializing instructions instead on the current
processors.
But OSes, thread libraries and language implementations *aren't* portable
code.
I do not think that word means what you think it means.
Note that I am an ex-kernel developer and have created enough
synchronization APIs that run on totally different platforms.
|
You are totally missing the point. OSes, thread libraries and language
implementations have some code that needs to be adapted to each hardware
architecture. If the memory model were to change in future processors
that are otherwise x86-like, this code would have to change. It's not a
big deal, because this platform-specific code is maintained by people who
know how to change it, and because there are few enough OSes, thread
libraries, and language implementations for the total effort involved
not to be very great. It would, however, be a big deal if existing x86
*applications* stopped working on an otherwise x86-compatible processor.
--
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>

David Hopwood
Guest
Posted: Tue Sep 06, 2005 4:15 pm Post subject: Re: Intel x86 memory model question

Joe Seigh wrote:
| Quote: | David Hopwood wrote:
Joe Seigh wrote:
I'm not sure what you're saying here. That all future processors
from Intel that don't have processor ordering won't be x86?
Well, they won't be x86-as-we-know-it. OSes, compilers, etc. will
have to be changed to run on or generate code for this new x86-like
thing, and changes in the memory model will probably be only one issue
they need to deal with.
And that the synchronization instructions in these future processors
won't be similar to the ones in x86? That Intel is telling people
in an x86 manual to start writing portable code not now but when
they get to the future processor?
Of course not. Read what they actually wrote.
I did. It sounded to me like they said if you want to write
portable code, don't assume processor ordering but use the
locking and serializing instructions instead on the current
processors.
|
But OSes, thread libraries and language implementations *aren't* portable
code.
--
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>

Joe Seigh
Guest
Posted: Tue Sep 06, 2005 4:15 pm Post subject: Re: Intel x86 memory model question

David Hopwood wrote:
| Quote: | Joe Seigh wrote:
David Hopwood wrote:
Of course not. Read what they actually wrote.
I did. It sounded to me like they said if you want to write
portable code, don't assume processor ordering but use the
locking and serializing instructions instead on the current
processors.
But OSes, thread libraries and language implementations *aren't* portable
code.
|
I do not think that word means what you think it means.
Note that I am an ex-kernel developer and have created enough synchronization
APIs that run on totally different platforms. I've created an atomically
thread-safe reference counted smart pointer that has two totally different
implementations on two different architectures. Given that Sun Microsystems'
research division couldn't manage to do this and could only do it on an
obsolete architecture, I'd say I have a pretty good idea what portability is
and what its issues are.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.

Joe Seigh
Guest
Posted: Tue Sep 06, 2005 4:15 pm Post subject: Re: Intel x86 memory model question

Alexander Terekhov wrote:
| Quote: | Andy Glew wrote:
You are confusing implementation with semantics.
Fix the specs, then.
|
I think you can assume that the serializing stuff does the right thing.
If not and you have strong reason to believe otherwise, then you should
short Intel stock as you'd stand a pretty good chance of making a fortune.
Basically, no OS would work correctly on an Intel based multi-processor
server and Intel would be out of that business. Also Intel would be
screwed in the multi-core workstation and desktop market as it would be
too late to fix the current processors going into production.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.

Joe Seigh
Guest
Posted: Tue Sep 06, 2005 4:15 pm Post subject: Re: Intel x86 memory model question

David Hopwood wrote:
| Quote: | Joe Seigh wrote:
David Hopwood wrote:
But OSes, thread libraries and language implementations *aren't*
portable code.
I do not think that word means what you think it means.
Note that I am an ex-kernel developer and have created enough
synchronization APIs that run on totally different platforms.
You are totally missing the point. OSes, thread libraries and language
implementations have some code that needs to be adapted to each hardware
architecture. If the memory model were to change in future processors
that are otherwise x86-like, this code would have to change. It's not a
big deal, because this platform-specific code is maintained by people who
know how to change it, and because there are few enough OSes, thread
libraries, and language implementations for the total effort involved
not to be very great. It would, however, be a big deal if existing x86
*applications* stopped working on an otherwise x86-compatible processor.
|
I am talking about that. You insist on maintaining that I advocate that
applications hardcode platform-specific assembly code into their
source. I never have advocated that.
But when you design these APIs you have to have a pretty good idea
what kinds of things can be ported and what assumptions you are making
about the memory model. Since I've actually done this kind of stuff
I probably have a much better idea than you have what the actual issues
are.
And yes, there isn't any assumption about the memory model that can't
be broken by a hardware designer. The only thing that keeps hardware
companies from breaking widely used APIs like POSIX pthreads is that they
might go out of business if they did. Hence, shorting Intel stock
might be a good idea if you believe they did do that. But saying
that we should only use widespread APIs and not ever create any
new ones is ridiculous.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.

Eric P.
Guest
Posted: Tue Sep 06, 2005 4:15 pm Post subject: Re: Intel x86 memory model question

Alexander Terekhov wrote:
| Quote: |
My reading of the specs is that MFENCE is guaranteed to provide
store-load barrier.
P1: X = 1; R1 = Y;
P2: Y = 1; R2 = X;
(R1, R2) = (0, 0) is allowed under pure PC, but
P1: X = 1; MFENCE; R1 = Y;
P2: Y = 1; MFENCE; R2 = X;
(R1, R2) = (0, 0) is NOT allowed.
|
Are you sure you are not being inconsistent in example 2 here?
(wrt what you answered yesterday about S/LFENCE).
If MFENCE is just an SFENCE+LFENCE, and neither of those guarantee
delivery or receipt of invalidates, then P1 can have a stale Y
and P2 a stale X. The MFENCE does nothing but prevent bypassing.
Eric

Eric P.
Guest
Posted: Tue Sep 06, 2005 4:15 pm Post subject: Re: Intel x86 memory model question

"Eric P." wrote:
| Quote: |
Alexander Terekhov wrote:
My reading of the specs is that MFENCE is guaranteed to provide
store-load barrier.
P1: X = 1; R1 = Y;
P2: Y = 1; R2 = X;
(R1, R2) = (0, 0) is allowed under pure PC, but
P1: X = 1; MFENCE; R1 = Y;
P2: Y = 1; MFENCE; R2 = X;
(R1, R2) = (0, 0) is NOT allowed.
Are you sure you are not being inconsistent in example 2 here?
(wrt what you answered yesterday about S/LFENCE).
If MFENCE is just an SFENCE+LFENCE, and neither of those guarantee
delivery or receipt of invalidates, then P1 can have a stale Y
and P2 a stale X. The MFENCE does nothing but prevent bypassing.
Eric
|
Forget it, I see. With two processors Y can be stale on P1,
or X stale on P2, but not both.
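Eric's point can be checked mechanically for the sequentially consistent case. The sketch below (plain C; sc_outcomes and the op names are hypothetical) enumerates all six interleavings of the two two-instruction programs that keep each processor's program order, and records which (R1, R2) pairs occur. It models SC only, not PC or real hardware, but it shows why at most one read can be stale: whichever store executes first is necessarily ahead of the other processor's load.

```c
#include <assert.h>
#include <stdbool.h>

/* Operations of the store-buffering test:
   P1: X = 1; R1 = Y;      P2: Y = 1; R2 = X;  */
enum op { ST_X, LD_Y, ST_Y, LD_X };

/* Sets seen[r1][r2] for every (R1, R2) produced by some SC interleaving,
   i.e. some merge of the two programs that keeps each in program order. */
void sc_outcomes(bool seen[2][2]) {
    const enum op p1[2] = { ST_X, LD_Y };
    const enum op p2[2] = { ST_Y, LD_X };
    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 2; b++)
            seen[a][b] = false;
    /* Bit s of `mask` set means slot s runs P1's next op; exactly two bits
       must be set, giving the C(4,2) = 6 interleavings. */
    for (int mask = 0; mask < 16; mask++) {
        int ones = 0;
        for (int s = 0; s < 4; s++)
            ones += (mask >> s) & 1;
        if (ones != 2)
            continue;
        int x = 0, y = 0, r1 = 0, r2 = 0, i1 = 0, i2 = 0;
        for (int s = 0; s < 4; s++) {
            enum op o = ((mask >> s) & 1) ? p1[i1++] : p2[i2++];
            switch (o) {
            case ST_X: x = 1;  break;
            case ST_Y: y = 1;  break;
            case LD_Y: r1 = y; break;
            case LD_X: r2 = x; break;
            }
        }
        assert(i1 == 2 && i2 == 2);   /* each program ran to completion */
        seen[r1][r2] = true;
    }
}
```

Running it marks (0, 1), (1, 0) and (1, 1) as reachable and leaves (0, 0) unreachable.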
Eric

Alexander Terekhov
Guest
Posted: Tue Sep 06, 2005 4:15 pm Post subject: Re: Intel x86 memory model question

"Eric P." wrote:
| Quote: |
Alexander Terekhov wrote:
My reading of the specs is that MFENCE is guaranteed to provide
store-load barrier.
P1: X = 1; R1 = Y;
P2: Y = 1; R2 = X;
(R1, R2) = (0, 0) is allowed under pure PC, but
P1: X = 1; MFENCE; R1 = Y;
P2: Y = 1; MFENCE; R2 = X;
(R1, R2) = (0, 0) is NOT allowed.
Are you sure you are not being inconsistent in example 2 here?
(wrt what you answered yesterday about S/LFENCE).
|
PC implies both LFENCE and SFENCE ordering constraints. I don't
think that you've got invalidations stuff entirely accurate, but
the basic logic is correct.
| Quote: |
If MFENCE is just an SFENCE+LFENCE,
|
No.
SFENCE is store-store barrier and LFENCE is load-load barrier.
store-store + load-load != store-load.
MFENCE ensures that preceding writes are made globally visible
before subsequent reads are performed (store-load barrier)...
plus it imposes all other PC ordering constraints (load-load +
load-store + store-store).
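Alexander's arithmetic can be written down as a toy model: represent each barrier by the set of program-order pairs it keeps in order, then compare the sets. The encoding below is a hypothetical illustration of the claims in this post (SFENCE as store-store, LFENCE as load-load, PC as everything except store-load, MFENCE as all four), not a transcription of Intel's specification; the names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy encoding: each bit is one program-order pair (earlier access,
   later access) that a barrier or memory model keeps in order. */
enum {
    LOAD_LOAD   = 1 << 0,
    LOAD_STORE  = 1 << 1,
    STORE_LOAD  = 1 << 2,
    STORE_STORE = 1 << 3
};

/* Hypothetical summary of the claims in this post, not Intel's wording. */
enum {
    SFENCE_ORDERS = STORE_STORE,
    LFENCE_ORDERS = LOAD_LOAD,
    PC_ORDERS     = LOAD_LOAD | LOAD_STORE | STORE_STORE,  /* no store-load */
    MFENCE_ORDERS = LOAD_LOAD | LOAD_STORE | STORE_LOAD | STORE_STORE
};

/* MFENCE = the PC constraints plus the store-load barrier. */
static_assert((PC_ORDERS | STORE_LOAD) == MFENCE_ORDERS,
              "MFENCE adds store-load to PC");

/* True if `orders` keeps every pair in `pair` in program order. */
bool enforces(int orders, int pair) {
    return (orders & pair) == pair;
}
```

In this model, enforces(SFENCE_ORDERS | LFENCE_ORDERS, STORE_LOAD) is false while enforces(MFENCE_ORDERS, STORE_LOAD) is true, which is the "store-store + load-load != store-load" point.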
regards,
alexander.

Alexander Terekhov
Guest
Posted: Wed Sep 14, 2005 8:15 am Post subject: Re: Intel x86 memory model question

Hey Mr. andy.glew@intel.com,
you better fix the specs, really. It's not funny anymore.
http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/default.aspx
"When multiprocessor systems based on the x86 architecture were being
designed, the designers needed a memory model that would make most
programs just work, while still allowing the hardware to be reasonably
efficient. The resulting specification requires writes from a
single processor to remain in order with respect to other writes, but
does not constrain reads at all.
Unfortunately, a guarantee about write order means nothing if reads
are unconstrained. After all, it does not matter that A is written
before B if every reader reading B followed by A has reads reordered
so that the pre-update value of B and the post-update value of A is
seen. The end result is the same: write order seems reversed. Thus,
as specified, the x86 model does not provide any stronger guarantees
than the ECMA model.
It is my belief, however, that the x86 processor actually implements
a slightly different memory model than is documented. While this model
has never failed to correctly predict behavior in my experiments, and
it is consistent with what is publicly known about how the hardware
works, it is not in the official specification. New processors might
break it."
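The write-then-read pattern in the MSDN passage above is the classic message-passing test. As a hedged C11 sketch (count_reversed, writer and reader are hypothetical names; it assumes <stdatomic.h> and POSIX threads), the writer stores A and then publishes B with a release store, and the reader loads B with acquire before loading A. Under the C11 model, a reader that sees the new B must then see the new A, so the reversed outcome the article describes cannot occur. On x86, GCC and Clang generally implement these acquire loads and release stores as plain MOVs, which fits the article's belief that the hardware is stronger than the documented model; the code itself relies only on the language-level guarantee.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Writer updates A, then B, as in the MSDN example. */
static atomic_int A, B;
static int saw_b, saw_a;

static void *writer(void *arg) {
    (void)arg;
    atomic_store_explicit(&A, 1, memory_order_relaxed);
    atomic_store_explicit(&B, 1, memory_order_release);  /* publishes A */
    return NULL;
}

static void *reader(void *arg) {
    (void)arg;
    saw_b = atomic_load_explicit(&B, memory_order_acquire);  /* B first */
    saw_a = atomic_load_explicit(&A, memory_order_relaxed);  /* then A  */
    return NULL;
}

/* Returns how many runs saw the "reversed" outcome: new B but old A. */
int count_reversed(int iters) {
    int reversed = 0;
    assert(iters > 0);
    for (int i = 0; i < iters; i++) {
        pthread_t w, r;
        atomic_store(&A, 0);
        atomic_store(&B, 0);
        pthread_create(&r, NULL, reader, NULL);
        pthread_create(&w, NULL, writer, NULL);
        pthread_join(r, NULL);
        pthread_join(w, NULL);
        if (saw_b == 1 && saw_a == 0)
            reversed++;
    }
    return reversed;
}
```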
regards,
alexander.

Joe Seigh
Guest
Posted: Wed Sep 14, 2005 4:15 pm Post subject: Re: Intel x86 memory model question

Alexander Terekhov wrote:
that Intel's technical writers aren't entirely sure who their audience
actually is and mix up the specification, which is of interest to programmers,
and the implementation, which is of interest to engineers. Andy's last
comment, which appeared to me to be about implementation, certainly didn't
help.
It also doesn't help that Intel has a tradition of not architecting multi-processing
support and instead doing it on the fly as it adds multi-processing support, in clear
contrast to how other companies have documented multi-processing support in their
architectures. You had companies building Intel-based multi-processors before Intel
even supported multi-processing, which meant the memory model they implemented may
or may not have matched what Intel later documented as the official memory model.
This is apparently now a tradition, and there's a comment to this effect in the Intel
documentation.
"Also, software should not depend on processor ordering in situations where
the system hardware does not support this memory-ordering model."
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.

jmbw
Joined: 17 Sep 2005
Posts: 1
Location: Monson, MA, USA
Posted: Sat Sep 17, 2005 5:14 pm Post subject: Re: Intel x86 memory model question

| David Hopwood wrote: |
It would, however, be a big deal if existing x86 *applications* stopped working on an otherwise x86-compatible processor.
|
Uh, bad news! I just spent a very unpleasant couple of weeks debugging a multi-threaded all-assembly-language application (no libraries) which has worked for *years* on SMP P2/P3/P4 boxes under Linux. Then everything blew up on a customer's new dual P4 Xeon box, even though it worked fine on my own older dual P4 Xeon box (and we've tried matching kernel versions).

Obviously the application has always been meticulous about using LOCKed instructions on shared data, and on using locks (spinlocks, sort of) to cover multi-word shared data, but the new behavior is that non-LOCKed reads and writes are apparently totally out of order, so stuff that I work on while holding a lock doesn't necessarily get changed until after I've released the lock (and someone else thinks they own it). So, total chaos with that until I added CPUIDs to my acquire-lock and release-lock routines.

And, any shared variable written with a MOV (there's no LOCKed MOV since there's no need for one with aligned accesses, so this has always been safe) takes effect whenever it feels like it. So now I'm having to change all those shared MOVs to XCHGes, and do CPUID or MFENCE or LOCK OR foo,0 before doing read-only accesses to shared variables, just to get it working again. This is all ring-3 application code, so things have definitely broken in real life.
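Acquire-lock and release-lock routines like the ones described above are exactly where a language-level memory model lets the ordering be requested explicitly instead of assumed. Below is a minimal test-and-set spinlock sketch in C11 (the names spinlock, spin_acquire, spin_release and run_counter are hypothetical; it assumes <stdatomic.h> and POSIX threads). The acquire uses atomic_exchange with acquire ordering, which x86 compilers typically emit as XCHG (implicitly locked), and the release is a store with release ordering, so writes made inside the critical section are ordered before the lock appears free. It is a sketch of the ordering requirements, not a claim about what went wrong on the machine described.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal test-and-set spinlock (hypothetical names). 0 = free. */
typedef struct { atomic_int held; } spinlock;

void spin_acquire(spinlock *l) {
    /* Exchange returns the old value; 0 means we took the lock. */
    while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire) != 0) {
        /* Wait with plain loads so the cache line is not hammered by writes. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0) { }
    }
}

void spin_release(spinlock *l) {
    /* Release store: critical-section writes become visible no later
       than the lock is observed as free. */
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

static spinlock demo_lock;   /* zero-initialized: unlocked */
static long counter;         /* ordinary, non-atomic data guarded by demo_lock */

static void *worker(void *arg) {
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        spin_acquire(&demo_lock);
        counter++;           /* plain read-modify-write under the lock */
        spin_release(&demo_lock);
    }
    return NULL;
}

/* Two threads each do `n` locked increments; returns the final count. */
long run_counter(int n) {
    pthread_t t1, t2;
    assert(n >= 0);
    counter = 0;
    pthread_create(&t1, NULL, worker, &n);
    pthread_create(&t2, NULL, worker, &n);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

With correct acquire/release ordering no increment is lost, so run_counter(n) returns 2 * n.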
| Quote: |
But OSes, thread libraries and language implementations *aren't* portable code.
|
Huh? Of course they are, that's the only reason Intel is in business! That's why I love being a PC assembly programmer, the bare metal has had 24 years of excellent backwards compatibility, while there's been total chaos with the OS and HLL fads that have come and gone (FOUR different OS generations from Microsoft alone?). Sure each processor generation adds new features, and better ways to do the old stuff (invalidating PTEs individually on the 486 was nice, and VME was awesome) and there's more state to save/restore on context switches if you're using the new regs, but if you program the new chip to be just a faster version of the old chip, it always works. Until now...

Alexander Terekhov
Guest
Posted: Sat Oct 22, 2005 4:15 pm Post subject: Re: Intel x86 memory model question

Joe Seigh wrote: ...
http://www.decadentplace.org.uk/pipermail/cpp-threads/2005-October/000728.html
<quote>
Enough people from Intel who can speak authoritatively about this
for me to confidently believe it have said (a) "locked" instructions
and mfence DO have global ordering properties on current and
near-future x86s (b) Intel now realizes that this should have been
documented and will try to ensure that it is (c) Intel does not want
to promise that this will hold forever, and might be interested in
engaging with different language-level standards groups to see if
there is a way to weaken total SC-ness of lock/volatile/atomic specs
to avoid multiple observer ordering agreement requirements that do not
impact practical programs.
</quote>
regards,
alexander.