Intel x86 memory model question
CASTalk.com Forum Index CASTalk.com
Discussion of DSP, FPGA, storage and embedded system.
 
Joe Seigh
Posted: Tue Aug 30, 2005 4:15 pm    Post subject: Intel x86 memory model question

The question isn't what is the x86 memory model. If you
want to discuss that, you are welcome to join the fray on
c.p.t. The question is why can't or why doesn't Intel
want to document the x86 memory model since apparently
what is in the System Programming Guide is *not* the
memory model. I.e. not as far as program observable
behavior is concerned though it may be if you have
tracing scopes attached to the memory bus.

Is this some kind of Intel State Secret? Is writing
correct multi-threaded programs not in Intel's interest?

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

MitchAlsup@aol.com
Posted: Wed Aug 31, 2005 12:15 am    Post subject: Re: Intel x86 memory model question

I didn't find it in the Intel book I have (Pentium Pro)

But chapter 7 in Volume 2 of AMD x86-64 Architecture Programmer's
Manual (System Programming) describes AMD's side of the situation,
starting on page 191 of the Purple Volume.

The problem is when you consider the number of memory modes {UC, CD,
WC, WP, WT and WB} that no simplistic statement can fully address what
the programmer can assume about memory and its ordering properties.
WriteBack (cacheable) memory is, however, Processor Consistent.

already5chosen@yahoo.com
Posted: Wed Aug 31, 2005 12:15 am    Post subject: Re: Intel x86 memory model question

Joe Seigh wrote:
Quote:
The question isn't what is the x86 memory model. If you
want to discuss that, you are welcome to join the fray on
c.p.t. The question is why can't or why doesn't Intel
want to document the x86 memory model since apparently
what is in the System Programming Guide is *not* the
memory model. I.e. not as far as program observable
behavior is concerned though it may be if you have
tracing scopes attached to the memory bus.


I don't understand what's particularly wrong with paragraph 7.2.2
ftp://download.intel.com/design/Pentium4/manuals/25366816.pdf
Could you be a bit more specific.

Quote:
Is this some kind of Intel State Secret? Is writing
correct multi-threaded programs not in Intel's interest?


Obviously, writing correct multi-threaded SMP programs is in Intel's
interest. However, according to my understanding, Intel couldn't care
less about _lockless_ multi-threaded SMP programs. The reasons are
clear:
1. That's such a tiny niche!
2. Average programmer can't do it correctly regardless of the quality
of documentation.


Joe Seigh
Posted: Wed Aug 31, 2005 12:15 am    Post subject: Re: Intel x86 memory model question

Joe Seigh wrote:
Quote:

processor 1 stores into X
processor 2 see the store by 1 into X and stores into Y

So the store into Y occurred after causal reasoning.

processor 3 loads from Y
processor 3 loads from X

If loads were in order you could infer that if processor 3
sees the new value of Y then it will see the new value of X.
But the rules for processor consistency *clearly* state that
you will necessarily see stores by different processors in
order.

that should be


But the rules for processor consistency *clearly* state that
you will not necessarily see stores by different processors in
order.


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Joe Seigh
Posted: Wed Aug 31, 2005 12:15 am    Post subject: Re: Intel x86 memory model question

MitchAlsup@aol.com wrote:
Quote:
I didn't find it in the Intel book I have (Pentium Pro)

But chapter 7 in Volume 2 of AMD x86-64 Architecture Programmer's
Manual (System Programming) describes AMD's side of the situation,
starting on page 191 of the Purple Volume.

The problem is when you consider the number of memory modes {UC, CD,
WC, WP, WT and WB} that no simplistic statement can fully address what
the programmer can assume about memory and its ordering properties.
WriteBack (cacheable) memory is, however, Processor Consistent.


The argument being presented in c.p.t. is that processor consistency
implies loads are in order, perhaps instigated by something Andy Glew
said about this here
http://groups.google.com/group/comp.arch/msg/96ec4a9fb75389a2

AFAICT, this is not true for 3 or more processors. E.g.

processor 1 stores into X
processor 2 sees the store by 1 into X and stores into Y

So, by causal reasoning, the store into Y occurred after the store into X.

processor 3 loads from Y
processor 3 loads from X

If loads were in order you could infer that if processor 3
sees the new value of Y then it will see the new value of X.
But the rules for processor consistency *clearly* state that
you will necessarily see stores by different processors in
order.

While there are still ordering constraints on the loads, they
don't have to be strictly in order, as Andy incorrectly infers.
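
The three-processor argument above can be checked mechanically with a toy enumeration. This is only a sketch under stated assumptions: the event names ("X->P2", "ldY", and so on) and the two delivery models are made up for illustration and are not taken from the Intel or AMD manuals. In the non-atomic model a store reaches each observer independently; in the write-atomic model it becomes visible to all other processors in a single step. Processor 3 always performs its loads in program order, Y first and then X.

```python
from itertools import permutations

def outcomes(write_atomic):
    """Enumerate (y_seen, x_seen) results for processor 3 in a toy model."""
    if write_atomic:
        # One event makes P1's store to X visible to P2 and P3 at once.
        events = ["X->all", "Y->P3", "ldY", "ldX"]
        x_at_p2 = x_at_p3 = "X->all"
    else:
        # P1's store to X reaches each observer independently.
        events = ["X->P2", "X->P3", "Y->P3", "ldY", "ldX"]
        x_at_p2, x_at_p3 = "X->P2", "X->P3"

    results = set()
    for order in permutations(events):
        pos = {event: i for i, event in enumerate(order)}
        # P2 stores Y only after it has seen X, so Y cannot reach P3 earlier.
        if pos["Y->P3"] < pos[x_at_p2]:
            continue
        # P3 performs its loads in program order: Y first, then X.
        if pos["ldX"] < pos["ldY"]:
            continue
        y_seen = int(pos["Y->P3"] < pos["ldY"])
        x_seen = int(pos[x_at_p3] < pos["ldX"])
        results.add((y_seen, x_seen))
    return results

if __name__ == "__main__":
    print("independent delivery:", sorted(outcomes(write_atomic=False)))
    print("write-atomic delivery:", sorted(outcomes(write_atomic=True)))
```

In this toy model the result (new Y, old X) shows up only when stores are delivered to each observer independently, even though the loads themselves are in order. That is the distinction the example is drawing: in-order loads alone do not let processor 3 infer anything about stores made by two different processors.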


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Joe Seigh
Posted: Wed Aug 31, 2005 12:15 am    Post subject: Re: Intel x86 memory model question

already5chosen@yahoo.com wrote:
Quote:
Joe Seigh wrote:

The question isn't what is the x86 memory model. If you
want to discuss that, you are welcome to join the fray on
c.p.t. The question is why can't or why doesn't Intel
want to document the x86 memory model since apparently
what is in the System Programming Guide is *not* the
memory model. I.e. not as far as program observable
behavior is concerned though it may be if you have
tracing scopes attached to the memory bus.



I don't understand what's particularly wrong with paragraph 7.2.2
ftp://download.intel.com/design/Pentium4/manuals/25366816.pdf
Could you be a bit more specific.

Some people are interpreting processor consistency as implying
reads are in order, and the statement
1. Reads can be carried out speculatively and in any order.
as applying only to speculative reads (the commit criterion being
that reads are in order at time of commit).
Quote:


Is this some kind of Intel State Secret? Is writing
correct multi-threaded programs not in Intel's interest?



Obviously, writing correct multi-threaded SMP programs is in Intel's
interest. However, according to my understanding, Intel couldn't care
less about _lockless_ multi-threaded SMP programs. The reasons are
clear:
1. That's such a tiny niche!
2. Average programmer can't do it correctly regardless of the quality
of documentation.


You package it as part of a (hopefully) easy-to-use API such as a
synchronized queue (which can use locks or be lock-free in the
implementation).
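
A minimal sketch of that packaging idea, assuming a hypothetical `SyncQueue` class (Python already ships `queue.Queue`; this one exists only to show the shape of the API). Callers see just `put` and `get`. The implementation here is lock-based, and a lock-free one could sit behind the same two methods without callers noticing.

```python
import threading
from collections import deque

class SyncQueue:
    """Hypothetical queue: callers never deal with memory ordering directly."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def put(self, item):
        # The lock makes the append visible to any later get().
        with self._cond:
            self._items.append(item)
            self._cond.notify()

    def get(self):
        # Block until an item is available, then remove the oldest one.
        with self._cond:
            while not self._items:
                self._cond.wait()
            return self._items.popleft()

if __name__ == "__main__":
    q = SyncQueue()
    producer = threading.Thread(target=lambda: [q.put(i) for i in range(3)])
    producer.start()
    print([q.get() for _ in range(3)])
    producer.join()
```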


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Eric P.
Posted: Wed Aug 31, 2005 4:15 pm    Post subject: Re: Intel x86 memory model question

Joe Seigh wrote:
Quote:

Joe Seigh wrote:

processor 1 stores into X
processor 2 see the store by 1 into X and stores into Y

So the store into Y occurred after causal reasoning.

processor 3 loads from Y
processor 3 loads from X

If loads were in order you could infer that if processor 3
sees the new value of Y then it will see the new value of X.
But the rules for processor consistency *clearly* state that
you will necessarily see stores by different processors in
order.
that should be

But the rules for processor consistency *clearly* state that
you will not necessarily see stores by different processors in
order.

I see what you are getting at, but for this to occur the new value
of Y would have to arrive at P3 before the new value of X from P1,
implying the msg from P2 to P3 somehow passed the msg from P1 to P3.
This would mean that no update order at all could be concluded
and the whole system would break.

Since they clearly do function, this is obviously not how they work :-)

Eric

Seongbae Park
Posted: Wed Aug 31, 2005 4:46 pm    Post subject: Re: Intel x86 memory model question

Joe Seigh <jseigh_01@xemaps.com> wrote:
....
Quote:
It turns out the x86 memory model is defined, it's just not defined in the
IA-32 manuals which is where you would expect it to be defined. It's defined
in the Itanium manuals and is equivalent to Sparc TSO memory model.

2.1.2 Loads and Stores
In the Itanium architecture, a load instruction has either unordered or acquire semantics while a
store instruction has either unordered or release semantics. By using acquire loads (ld.acq) and
release stores (st.rel), the memory reference stream of an Itanium-based program can be made to
operate according to the IA-32 ordering model. The Itanium architecture uses this behavior to
provide IA-32 compatibility. That is, an Itanium acquire load is equivalent to an IA-32 load and an
Itanium release store is equivalent to an IA-32 store, from a memory ordering perspective.

I suspect the above paragraph is stronger than what it really wanted to say.
It seems that the intention was to say
that Itanium can correctly emulate x86 by running effectively in a TSO mode,
since x86's memory model is not stronger than TSO.

On http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx:
Quote:
the memory model for X86 can be described as:
1. All stores are actually store.release.
2. All loads are normal loads.
3. Any use of the LOCK prefix (e.g. "LOCK CMPXCHG" or "LOCK INC") creates a full fence.
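
One way to see what rule 3 buys you is the classic store-buffering litmus test, sketched here on a toy machine. The assumptions are mine, not the blog's or Intel's: each CPU has a FIFO store buffer that drains to memory at arbitrary times, a load checks its own buffer before memory, and a "fence" step (standing in for a LOCK-prefixed instruction) cannot complete until the buffer is empty. The function name `sb_outcomes` is hypothetical.

```python
def sb_outcomes(fenced):
    """Return every (r0, r1) reachable in the store-buffering litmus test.

    CPU0 runs: x = 1; [fence]; r0 = y
    CPU1 runs: y = 1; [fence]; r1 = x
    """
    prog0 = [("store", "x"), ("load", "y")]
    prog1 = [("store", "y"), ("load", "x")]
    if fenced:
        prog0.insert(1, ("fence", None))
        prog1.insert(1, ("fence", None))
    programs = (prog0, prog1)
    results = set()

    def explore(pcs, buffers, memory, regs):
        if pcs[0] == len(prog0) and pcs[1] == len(prog1):
            results.add(regs)
            return
        for cpu in (0, 1):
            buf = buffers[cpu]
            if buf:
                # Drain the oldest buffered store to shared memory (FIFO).
                new_bufs = list(buffers)
                new_bufs[cpu] = buf[1:]
                explore(pcs, tuple(new_bufs), memory | {buf[0]}, regs)
            if pcs[cpu] < len(programs[cpu]):
                op, var = programs[cpu][pcs[cpu]]
                new_pcs = list(pcs)
                new_pcs[cpu] += 1
                new_bufs = list(buffers)
                new_regs = list(regs)
                if op == "store":
                    # Stores go into the local buffer, not straight to memory.
                    new_bufs[cpu] = buf + (var,)
                elif op == "load":
                    # A load checks the CPU's own buffer first, then memory.
                    new_regs[cpu] = int(var in buf or var in memory)
                else:
                    # Fence: cannot complete until the buffer has drained.
                    if buf:
                        continue
                explore(tuple(new_pcs), tuple(new_bufs), memory,
                        tuple(new_regs))

    explore((0, 0), ((), ()), frozenset(), (None, None))
    return results

if __name__ == "__main__":
    print("no fence:", sorted(sb_outcomes(fenced=False)))
    print("fenced:  ", sorted(sb_outcomes(fenced=True)))
```

On this toy machine both CPUs can read 0 when there is no fence, because each load may run while the other CPU's store is still sitting in its buffer. With the fence step in place that result disappears.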
--

#pragma ident "Seongbae Park, compiler, http://blogs.sun.com/seongbae/"

Joe Seigh
Posted: Wed Aug 31, 2005 9:29 pm    Post subject: Re: Intel x86 memory model question

Eric P. wrote:
Quote:
Joe Seigh wrote:

Joe Seigh wrote:

processor 1 stores into X
processor 2 see the store by 1 into X and stores into Y

So the store into Y occurred after causal reasoning.

processor 3 loads from Y
processor 3 loads from X

If loads were in order you could infer that if processor 3
sees the new value of Y then it will see the new value of X.
But the rules for processor consistency *clearly* state that
you will necessarily see stores by different processors in
order.

that should be

But the rules for processor consistency *clearly* state that
you will not necessarily see stores by different processors in
order.


I see what you are getting at, but for this to occur the new value
of Y would have to arrive at P3 before the new value of X from P1,
implying the msg from P2 to P3 somehow passed the msg from P1 to P3.
This would mean that no update order at all could be concluded
and the whole system would break.

Since they clearly do function, this is obviously not how they work :-)


It turns out the x86 memory model is defined; it's just not defined in the
IA-32 manuals, which is where you would expect it to be defined. It's defined
in the Itanium manuals and is equivalent to the SPARC TSO memory model.

2.1.2 Loads and Stores
In the Itanium architecture, a load instruction has either unordered or acquire semantics while a
store instruction has either unordered or release semantics. By using acquire loads (ld.acq) and
release stores (st.rel), the memory reference stream of an Itanium-based program can be made to
operate according to the IA-32 ordering model. The Itanium architecture uses this behavior to
provide IA-32 compatibility. That is, an Itanium acquire load is equivalent to an IA-32 load and an
Itanium release store is equivalent to an IA-32 store, from a memory ordering perspective.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Seongbae Park
Posted: Wed Aug 31, 2005 9:57 pm    Post subject: Re: Intel x86 memory model question

Seongbae Park <Seongbae.Park@Sun.COM> wrote:
Quote:
Joe Seigh <jseigh_01@xemaps.com> wrote:
...
It turns out the x86 memory model is defined, it's just not defined in the
IA-32 manuals which is where you would expect it to be defined. It's defined
in the Itanium manuals and is equivalent to Sparc TSO memory model.

2.1.2 Loads and Stores
In the Itanium architecture, a load instruction has either unordered or acquire semantics while a
store instruction has either unordered or release semantics. By using acquire loads (ld.acq) and
release stores (st.rel), the memory reference stream of an Itanium-based program can be made to
operate according to the IA-32 ordering model. The Itanium architecture uses this behavior to
provide IA-32 compatibility. That is, an Itanium acquire load is equivalent to an IA-32 load and an
Itanium release store is equivalent to an IA-32 store, from a memory ordering perspective.

I suspect the above paragraph is stronger than what it really wanted to say.
It seems that the intention was to say
that Itanium can correctly emulate x86 by running effectively in a TSO mode,
since x86's memory model is not stronger than TSO.

I take this back.
Actually, the above statement depends on whether IA64 is RCsc or RCpc.
If it is RCpc, then by definition all special accesses are PC,
and turning every access into a special access just turns it into PC.
If it is RCsc, then it is not really TSO but SC, which is stronger than PC
and hence can run the program correctly.

I didn't bother to look at the IA64 manual (anybody care to comment on this?),
but I suspect that IA64 is RCpc and the manual is exactly correct after all.
--
#pragma ident "Seongbae Park, compiler, http://blogs.sun.com/seongbae/"

Eric P.
Posted: Thu Sep 01, 2005 12:15 am    Post subject: Re: Intel x86 memory model question

Joe Seigh wrote:
Quote:

Eric P. wrote:
Joe Seigh wrote:

Joe Seigh wrote:

processor 1 stores into X
processor 2 see the store by 1 into X and stores into Y

So the store into Y occurred after causal reasoning.

processor 3 loads from Y
processor 3 loads from X

If loads were in order you could infer that if processor 3
sees the new value of Y then it will see the new value of X.
But the rules for processor consistency *clearly* state that
you will necessarily see stores by different processors in
order.

that should be

But the rules for processor consistency *clearly* state that
you will not necessarily see stores by different processors in
order.


I see what you are getting at, but for this to occur the new value
of Y would have to arrive at P3 before the new value of X from P1,
implying the msg from P2 to P3 somehow passed the msg from P1 to P3.
This would mean that no update order at all could be concluded
and the whole system would break.

Since they clearly do function, this is obviously not how they work :-)


It turns out the x86 memory model is defined, it's just not defined in the
IA-32 manuals which is where you would expect it to be defined. It's defined
in the Itanium manuals and is equivalent to Sparc TSO memory model.

2.1.2 Loads and Stores
In the Itanium architecture, a load instruction has either unordered or acquire semantics while a
store instruction has either unordered or release semantics. By using acquire loads (ld.acq) and
release stores (st.rel), the memory reference stream of an Itanium-based program can be made to
operate according to the IA-32 ordering model. The Itanium architecture uses this behavior to
provide IA-32 compatibility. That is, an Itanium acquire load is equivalent to an IA-32 load and an
Itanium release store is equivalent to an IA-32 store, from a memory ordering perspective.

I think the underlying question you asked about the x86 is:

Does the Intel Processor Consistency model require processors
to wait for all other processors to acknowledge receipt of their
invalidates before any are allowed to use the new value?

The section 7.2.2 memory ordering info does not define an answer.

This would likely depend on the bus protocol details.
It might be implemented by having P1 send an invalidate X to P2
and not reply to a request from P2 for a read of the new value of
X until it had received the invalidate acknowledgment from P3.
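
A rough sketch of that hypothetical scheme, with the writer (P1) withholding the new value until every sharer has acknowledged its invalidate. The class and method names are invented for illustration, and nothing here describes an actual bus protocol.

```python
class Owner:
    """Hypothetical cache that has just written a line (P1 in the text above)."""

    def __init__(self):
        self.value = None
        self.pending_acks = set()

    def write(self, value, sharers):
        # Assume an invalidate goes out to every sharer at this point.
        self.value = value
        self.pending_acks = set(sharers)

    def ack(self, cpu):
        # A sharer confirms that it has dropped its stale copy.
        self.pending_acks.discard(cpu)

    def read_request(self, cpu):
        # Withhold the new value until all invalidates are acknowledged.
        if self.pending_acks:
            return None  # requester has to retry later
        return self.value

if __name__ == "__main__":
    p1 = Owner()
    p1.write(1, sharers={"P2", "P3"})
    p1.ack("P2")
    print(p1.read_request("P2"))  # P3 has not acknowledged yet
    p1.ack("P3")
    print(p1.read_request("P2"))
```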

I haven't paid any attention to the IA-64 acquire/release mechanism,
as I figure I'll never run into it, so I'm not sure whether that is
the same thing as a release.

Eric

Joe Seigh
Posted: Thu Sep 01, 2005 12:15 am    Post subject: Re: Intel x86 memory model question

Seongbae Park wrote:
Quote:
Joe Seigh <jseigh_01@xemaps.com> wrote:
...

It turns out the x86 memory model is defined, it's just not defined in the
IA-32 manuals which is where you would expect it to be defined. It's defined
in the Itanium manuals and is equivalent to Sparc TSO memory model.

2.1.2 Loads and Stores
In the Itanium architecture, a load instruction has either unordered or acquire semantics while a
store instruction has either unordered or release semantics. By using acquire loads (ld.acq) and
release stores (st.rel), the memory reference stream of an Itanium-based program can be made to
operate according to the IA-32 ordering model. The Itanium architecture uses this behavior to
provide IA-32 compatibility. That is, an Itanium acquire load is equivalent to an IA-32 load and an
Itanium release store is equivalent to an IA-32 store, from a memory ordering perspective.


I suspect the above paragraph is stronger than what it really wanted to say.
It seems that the intention was to say
that Itanium can correctly emulate x86 by running effectively in a TSO mode,
since x86's memory model is not stronger than TSO.


Hmm, that's possible. If you take IA-32's loads as being unordered, they're not
entirely unordered, due to the processor consistency model. It's likely that
nobody uses processor consistency as a programming memory model, but since Intel
specified it as part of the memory model they have to adhere to it for compatibility
reasons. Is this the reason Itanium runs so slowly in IA-32 mode? Because it has
to use ld.acq instead of ld for IA-32 loads? All because they used a memory
model that was more convenient for hardware architects than for programmers?

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Ricardo Bugalho
Posted: Thu Sep 01, 2005 3:22 pm    Post subject: Re: Intel x86 memory model question

On Wed, 31 Aug 2005 21:57:58 +0000, Seongbae Park wrote:

Quote:
I didn't bother to look at IA64 manual - anybody care to comment on this ?
but I suspect that IA64 is RCpc and the manual is exactly correct after
all.

It's RCpc indeed.

Ricardo Bugalho
Posted: Thu Sep 01, 2005 3:36 pm    Post subject: Re: Intel x86 memory model question

On Wed, 31 Aug 2005 18:02:34 -0400, Eric P. wrote:


Quote:

I think the underlying question you asked about the x86 is:

Does the Intel Processor Consistency model require processors to wait
for all other processors to acknowledge receipt of their invalidates
before any are allowed to use the new value?


It does not.
The most straightforward example is buffered store forwarding: when a CPU
writes a value into memory, it can read it again directly from the store
buffer, even before it tries to make it visible to other processors.
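
Store forwarding can be sketched in a few lines. This is a toy model, not a description of any real pipeline: the `Cpu` class, its `drain` method and the dictionary standing in for shared memory are all assumptions made for illustration.

```python
class Cpu:
    """Toy CPU with a private FIFO store buffer in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory      # dict shared by all CPUs
        self.store_buffer = []    # pending (address, value) pairs, oldest first

    def store(self, addr, value):
        # The store is buffered locally; other CPUs cannot see it yet.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Forwarding: the newest matching buffered store wins over memory.
        for buffered_addr, value in reversed(self.store_buffer):
            if buffered_addr == addr:
                return value
        return self.memory.get(addr, 0)

    def drain(self):
        # Make all buffered stores globally visible, in program order.
        for addr, value in self.store_buffer:
            self.memory[addr] = value
        self.store_buffer.clear()

if __name__ == "__main__":
    memory = {}
    p0, p1 = Cpu(memory), Cpu(memory)
    p0.store("X", 1)
    print(p0.load("X"), p1.load("X"))  # writer sees its own store early
    p0.drain()
    print(p0.load("X"), p1.load("X"))
```

The writer reads back its own value before the other CPU can see it, so it clearly has not waited for anyone's acknowledgment.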

Alexander Terekhov
Posted: Thu Sep 01, 2005 4:15 pm    Post subject: Re: Intel x86 memory model question

Ricardo Bugalho wrote:
Quote:

On Wed, 31 Aug 2005 21:57:58 +0000, Seongbae Park wrote:

I didn't bother to look at IA64 manual - anybody care to comment on this ?
but I suspect that IA64 is RCpc and the manual is exactly correct after
all.

It's RCpc indeed.

Not quite. Release stores to *WB* memory are constrained to ensure
"remote write atomicity". Classic RCpc is weaker in this respect
(and that's what makes RC != TSO). You'd better not rely on this
property because emulating it on CELLs (for example) will make your
ports run really slow. ;-)

regards,
alexander.
All times are GMT