| Author |
Message |
Eric P.
Guest
|
Posted:
Thu Jan 27, 2005 11:21 pm Post subject:
Re: CAS and LL/SC |
|
|
Stephen Fuld wrote:
| Quote: |
"Eric P." <eric_pattison@sympaticoREMOVE.ca> wrote in message
news:41F666D4.8CDCE446@sympaticoREMOVE.ca...
Nick Maclaren wrote:
In article <YxbJd.20816$8u5.17772@bgtnsc04-news.ops.worldnet.att.net>,
Stephen Fuld <s.fuld@PleaseRemove.att.net> wrote:
"Andi Kleen" <freitag@alancoxonachip.com> wrote in message
news:m3brbfou1q.fsf@averell.firstfloor.org...
To bring it back on topic to comp.arch: the moral is to never add
silly address space limits to registers that cause such problems
later.
Yes. Or, in a slightly different formulation, don't use memory mapped
I/O
at all (at least not for general purpose processors where the stringent
requirements of some embedded systems don't apply).
Amen.
Rubbish. This has nothing to do with memory mapped IO.
How about: Don't try to address more than 4GB on a 32 bit machine.
Or even better: Use 36 bits addresses and an IOMMU to map PCI/32
into the larger address space. OS support has been available for
over a decade.
Wouldn't this run into the same kinds of problems when the CPU went to a 64
bit address space, as X-86 is now doing?
|
It shouldn't. But then again, it doesn't need to have problems now
either, though I think I can understand why x86 does.
The x64 has a 40 bit physical space and a 64 bit virtual space (48 bits implemented). From
the x64's point of view the 32 bit PCI bus would occupy at most a
small 32 bit sub region within that physical space. The x64 virtual
address is large enough to directly access any location in both
memory and PCI space. There could even be multiple PCI spaces.
The IOMMU allows a 32 bit PCI device to DMA to all of the 40 bit
physical space. A 64 bit PCI device can access all of memory directly,
though may still benefit from using scatter gather in the IOMMU.
The x86 has a 36 bit physical space, so the PCI bus could reside
in that physical space. But it has only 32 bit virtual addresses, so
it cannot directly access both 4GB of memory and PCI space.
The x86 system could have located the PCI space up high in
its 36 bit physical space and used the page tables to relocate
it down into the 32 bit virtual space. But that would mean the PCI
space could only be accessed when virtual addressing was enabled,
which at a minimum complicates BIOS boot device enumeration.
That would also probably have confused some OSes (e.g. Win98).
That could be compensated for but would require OS changes.
So the path of least resistance was to plop the PCI space down
onto the top 500 MB of memory.
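To put numbers on the address widths discussed above, here is a small C sketch. The function names are invented for illustration, and the 512 MB window used in the usage note is an assumption (the post only says "top 500 MB"); nothing here describes a particular chipset.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of an address space that is `bits` wide (bits < 64). */
static uint64_t space_bytes(unsigned bits)
{
    assert(bits < 64);
    return (uint64_t)1 << bits;
}

/* Bytes of the 32 bit space left for RAM when a PCI window of
 * `window_bytes` is placed at the top of it.
 * Hypothetical helper, for illustration only. */
static uint64_t ram_visible_below_4g(uint64_t window_bytes)
{
    assert(window_bytes <= space_bytes(32));
    return space_bytes(32) - window_bytes;
}

/* How many complete 32 bit PCI spaces fit in a physical space that is
 * `phys_bits` wide. */
static uint64_t pci32_windows_in(unsigned phys_bits)
{
    assert(phys_bits >= 32 && phys_bits < 64);
    return space_bytes(phys_bits) >> 32;
}
```

Under these assumptions, `ram_visible_below_4g(512ULL << 20)` leaves 3.5 GB of the low 4 GB for memory, while `pci32_windows_in(36)` and `pci32_windows_in(40)` show room for 16 and 256 separate 32 bit PCI spaces respectively.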
Eric |
|
Terje Mathisen
Guest
|
Posted:
Thu Jan 27, 2005 11:39 pm Post subject:
Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM |
|
|
Andrew Reilly wrote:
| Quote: | On Thu, 27 Jan 2005 01:45:06 +0100, Terje Mathisen wrote:
Since this setup could be done once (as long as the IO window addr
stayed put (?)), I'd assume you'd normally generate the needed pointers
during driver init, and then simply do regular (membar protected and
preshifted) accesses to them?
Probably not if you were using portable driver code, as you would be in
Linux/*BSD. Might be able to get away with it on Ultrix, but I imagine
that even VMS drivers wanted to remain portable to equivalent Vax boxes.
|
I'd use runtime caching of the required pointers, not compile them in!
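A minimal sketch of what such runtime caching might look like: the driver asks a bus-layer translation routine for each register address once at init and keeps the results in a table. `bus_translate`, `struct dev_regs`, and the register offsets are all hypothetical, and the shift by 5 merely stands in for whatever pre-shifting a given platform requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { REG_STATUS, REG_CONTROL, REG_DATA, REG_COUNT };

/* Hypothetical device register offsets within the IO window. */
static const uint32_t reg_offset[REG_COUNT] = { 0x00, 0x04, 0x08 };

static unsigned translate_calls;  /* counts the "expensive" translations */

/* Stand-in for a portable bus-layer routine; assumed, not a real API. */
static uintptr_t bus_translate(uintptr_t window_base, uint32_t offset)
{
    translate_calls++;
    return window_base + ((uintptr_t)offset << 5);
}

struct dev_regs {
    uintptr_t addr[REG_COUNT];  /* preshifted addresses, filled once */
};

/* Called once at driver init, while the IO window address is known. */
static void dev_regs_init(struct dev_regs *r, uintptr_t window_base)
{
    assert(r != NULL);
    for (int i = 0; i < REG_COUNT; i++)
        r->addr[i] = bus_translate(window_base, reg_offset[i]);
}

/* Hot path: a table lookup, with no translation work. */
static uintptr_t dev_reg_addr(const struct dev_regs *r, int reg)
{
    assert(r != NULL && reg >= 0 && reg < REG_COUNT);
    return r->addr[reg];
}
```

After `dev_regs_init()` has run, every later `dev_reg_addr()` call is a plain array read, so `translate_calls` stays at three no matter how many register accesses follow. The table is built at run time, so nothing platform-specific is compiled into the driver itself.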
| Quote: |
I don't imagine that this sort of thing would be a performance bottleneck,
anyway. Most of the heavy IO lifting is done with DMA these days, rather
than banging on device registers.
|
Well, yeah, but that's a slippery slope you know: "These machines are so
fast that it doesn't matter if I use an interpreted bubble sort on a
potentially huge array, I can just tell my customers to buy faster & more
boxes, leading to even more licence income for me!"
:-)
Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching" |
|
Andrew Reilly
Guest
|
Posted:
Fri Jan 28, 2005 4:42 am Post subject:
Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM |
|
|
On Thu, 27 Jan 2005 20:39:50 +0100, Terje Mathisen wrote:
| Quote: | Andrew Reilly wrote:
On Thu, 27 Jan 2005 01:45:06 +0100, Terje Mathisen wrote:
Since this setup could be done once (as long as the IO window addr
stayed put (?)), I'd assume you'd normally generate the needed pointers
during driver init, and then simply do regular (membar protected and
preshifted) accesses to them?
Probably not if you were using portable driver code, as you would be in
Linux/*BSD. Might be able to get away with it on Ultrix, but I imagine
that even VMS drivers wanted to remain portable to equivalent Vax boxes.
I'd use runtime caching of the required pointers, not compile them in!
|
I don't know about Linux, but the BSDs pretty universally use a framework
of macros and subroutines to abstract the bus architectures (and DMA
mechanisms and restrictions) away from device drivers, so that
device drivers can be portable across both processor and bus
architectures, which is particularly important now that "busses" are
tunnelled through trees of controller chips and cables. I just don't
think that there's room in such a framework for caching shortcuts. Maybe
there is. I haven't been that deep into it, myself.
| Quote: | I don't imagine that this sort of thing would be a performance
bottleneck, anyway. Most of the heavy IO lifting is done with DMA
these days, rather than banging on device registers.
Well, yeah, but that's a slippery slope you know: "These machines are so
fast that it doesn't matter if I use an interpreted bubble sort on a
potentially huge array, I can just tell my customers to buy faster & more
boxes, leading to even more licence income for me!"
:-)
|
I know you're smily-ing there, but just by way of comparison, what IS the
wall-time latency for a read across a 33MHz PCI-2 connector (for example)
compared to the execution time of something like the suggested macro, on
a modern OOO processor, like a 21264?
I was always amused by how many potential instruction slots would
evaporate every time I did a byte read across an ISA bus (1us minimum...)
I'm pretty sure that unavoidable read latencies are why peripherals do
bulk data transfers with DMA when they can...
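As a rough way to quantify this, the number of issue slots that go by during one blocking read is latency times clock rate times issue width. The sketch below is only that arithmetic; the function name is invented, and the 500 MHz, 4-wide figures in the usage note are illustrative assumptions. Only the 1us ISA figure comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Issue slots that pass while the CPU waits `latency_ns` for one read,
 * running at `clock_mhz` with `issue_width` instructions per cycle. */
static uint64_t slots_lost(uint64_t latency_ns, uint64_t clock_mhz,
                           unsigned issue_width)
{
    assert(issue_width > 0);
    /* cycles = nanoseconds * MHz / 1000 */
    uint64_t cycles = latency_ns * clock_mhz / 1000;
    return cycles * issue_width;
}
```

With those assumed figures, `slots_lost(1000, 500, 4)` returns 2000, i.e. a single 1us byte read costs on the order of two thousand potential instruction slots.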
Cheers,
--
Andrew |
|
Terje Mathisen
Guest
|
Posted:
Sat Jan 29, 2005 1:55 am Post subject:
Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM |
|
|
Andrew Reilly wrote:
| Quote: | I know you're smily-ing there, but just by way of comparison, what IS the
wall-time latency for a read across a 33MHz PCI-2 connector (for example)
compared to the execution time of something like the suggested macro, on
a modern OOO processor, like a 21264?
|
As long as you can start the next read before the hw has started waiting
for it, the time lost is of course zero.
| Quote: |
I was always amused by how many potential instruction slots would
evaporate every time I did a byte read across an ISA bus (1us minimum...)
|
Even worse, I can still remember the time when I had to code hw accesses
with spin loops or other delays (like a few NOPs) between operations,
simply because the relevant (broken-by-design!) IO card had a minimum
recovery time between back-to-back operations, and no way to signal WAIT
to the cpu when it tried to run faster than this.
I.e. either you waited long enough, or your driver broke.
Of course, when/if you verified said driver by making a slow mockup with
debug function calls for all port IO operations, it all 'just worked'
due to the call/return time delays.
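One way to avoid depending on incidental delays is to track when the card was last touched and spin until the recovery time has passed. The sketch below uses a simulated tick counter so that it is deterministic; `now_ticks`, `raw_port_write`, `safe_port_write`, and `RECOVERY_TICKS` are all hypothetical names, and a real driver would read a hardware timer and write a real port instead.

```c
#include <assert.h>
#include <stdint.h>

#define RECOVERY_TICKS 8u   /* assumed minimum gap between accesses */

static uint64_t sim_clock;    /* simulated free-running tick counter */
static uint64_t last_access;  /* tick of the previous port access */
static int      have_last;    /* 0 until the first access happens */
static unsigned violations;   /* accesses that arrived too early */

/* Stand-in for reading a hardware timer; each call costs one tick. */
static uint64_t now_ticks(void)
{
    return ++sim_clock;
}

/* Simulated card: counts a violation if hit too soon after the last hit. */
static void raw_port_write(uint8_t value)
{
    uint64_t t = now_ticks();
    (void)value;
    if (have_last && t - last_access < RECOVERY_TICKS)
        violations++;
    last_access = t;
    have_last = 1;
}

/* Driver-side wrapper: spin until the write that follows the loop
 * (one tick later) would land outside the recovery window. */
static void safe_port_write(uint8_t value)
{
    while (have_last && now_ticks() + 1 - last_access < RECOVERY_TICKS)
        ;   /* busy-wait */
    raw_port_write(value);
}
```

Two back-to-back `raw_port_write()` calls trip the violation counter, while any number of `safe_port_write()` calls do not. The debug-mockup effect described above corresponds to extra ticks being consumed between raw writes by accident rather than by design.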
Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching" |
|
Guest
|
Posted:
Tue Feb 01, 2005 12:01 am Post subject:
Alpha IO accesses, Was: CAS and LL/SC |
|
|
"FredK" <fred.nospam@nospam.dec.com> writes:
| Quote: | No. The problems overlap, but are not identical. To provide a way
to access IO space, the platform groups invented a "sparse" address
space in which some of the low-order address bits of the VA would be
used to create the length and offset of the partial word read or
write - *and* the data itself would need to be shifted into the
correct alignment for the operation - this was termed "swizzling".
This requirement was such that device drivers needed to be
rototilled - especially those that did not use OS supplied access
routines (such as graphics drivers) and explicitly read/wrote to a
device directly. This was a business problem in the NT space.
|
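The sparse-space scheme quoted above can be made concrete with a small sketch. The exact bit layout varied between chipsets; the layout used here (bus offset shifted left by 5, a size code in bits 4:3, data carried in its natural byte lane of a 32 bit word) is an assumption chosen for illustration, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size codes placed in address bits 4:3. */
enum sparse_size { SPARSE_BYTE = 0, SPARSE_WORD = 1, SPARSE_LONG = 3 };

/* Build the sparse-space address for a bus byte offset and access size. */
static uint64_t sparse_addr(uint64_t sparse_base, uint32_t bus_offset,
                            enum sparse_size size)
{
    return sparse_base + ((uint64_t)bus_offset << 5)
                       + ((uint64_t)size << 3);
}

/* A byte read hands back a full 32 bit word with the byte in its
 * natural lane; shift it down to bit 0 (the "swizzle"). */
static uint8_t swizzle_read_byte(uint32_t lane_data, uint32_t bus_offset)
{
    return (uint8_t)(lane_data >> ((bus_offset & 3u) * 8u));
}

/* For a byte write, shift the value up into its lane first. */
static uint32_t swizzle_write_byte(uint8_t value, uint32_t bus_offset)
{
    return (uint32_t)value << ((bus_offset & 3u) * 8u);
}
```

Every partial-word access therefore needs both an address computation and a data shift, which is why drivers that poked device memory directly through ordinary pointers had to be reworked.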
For Fred,
Why not use the 7000 IOP design to drive whatever bus you care to
use? Package it all in a bus interface chip, define a common
CPU side interface and go? This should have been possible with the
164s and on. Make the SW a lot less twisted as well...
--
Paul Repacholi 1 Crescent Rd.,
+61 (08) 9257-1001 Kalamunda.
West Australia 6076
comp.os.vms,- The Older, Grumpier Slashdot
Raw, Cooked or Well-done, it's all half baked.
EPIC, The Architecture of the future, always has been, always will be. |
|
FredK
Guest
|
Posted:
Tue Feb 01, 2005 8:54 pm Post subject:
Re: Alpha IO accesses, Was: CAS and LL/SC |
|
|
<prep@prep.synonet.com> wrote in message
news:877jltbeyi.fsf_-_@prep.synonet.com...
| Quote: | "FredK" <fred.nospam@nospam.dec.com> writes:
No. The problems overlap, but are not identical. To provide a way
to access IO space, the platform groups invented a "sparse" address
space in which some of the low-order address bits of the VA would be
used to create the length and offset of the partial word read or
write - *and* the data itself would need to be shifted into the
correct alignment for the operation - this was termed "swizzling".
This requirement was such that device drivers needed to be
rototilled - especially those that did not use OS supplied access
routines (such as graphics drivers) and explicitly read/wrote to a
device directly. This was a business problem in the NT space.
For Fred,
Why not use the 7000 IOP design to drive whatever bus you care to
use? Package it all in a bus interface chip, define a common
CPU side interface and go? This should have been possible with the
164s and on. Make the SW a lot less twisted as well...
|
Paul,
I think you dropped into the thread late. Ultimately the issue was that the
systems being built were using a "common" PC-like bus structure - starting
with the original AlphaPC - to make them price competitive. The platform
designers had a CPU architecture that precluded simple linear access for
byte/word, and they hacked a solution.
While this platform may have been an attractive NT system (if
there had not been other fatal flaws in how it was priced/marketed) - it
caused a problem in that x86 derived drivers would need to be heavily
modified regardless of the IO interface. The push to linear byte/word
addressing in the CPU architecture would solve at least one of the problems
facing NT drivers (but not wholly solve it, since write ordering problems
still remained). But I think the belief was that adding proper write
barriers was a much more manageable task than getting NT driver
writers to make the major changes needed for sparse access, or a unique
IOP interface.
Then you add atomicity issues, code bloat for simple string handling,
poorly aligned structures and badly written code... and the realization
by the CPU architects that byte/word access wasn't really as bad or
as hard as they thought. |
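The code bloat and atomicity points can be illustrated with a sketch of a byte store on a machine that can only load and store whole 64 bit words. This is plain C standing in for a load/mask/insert/store instruction sequence; the function names and the little-endian lane numbering are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one byte into an array of 64 bit words using only whole-word
 * loads and stores. Not atomic: a neighbouring byte updated by another
 * writer between the load and the store is silently overwritten. */
static void store_byte_rmw(uint64_t *words, size_t byte_index, uint8_t value)
{
    assert(words != NULL);
    uint64_t *q = &words[byte_index / 8];
    unsigned shift = (unsigned)(byte_index % 8) * 8;

    uint64_t w = *q;                      /* 1. load the containing word */
    w &= ~((uint64_t)0xFF << shift);      /* 2. mask out the old byte    */
    w |= (uint64_t)value << shift;        /* 3. insert the new byte      */
    *q = w;                               /* 4. store the word back      */
}

/* Matching byte load: whole-word load, then shift and truncate. */
static uint8_t load_byte(const uint64_t *words, size_t byte_index)
{
    assert(words != NULL);
    unsigned shift = (unsigned)(byte_index % 8) * 8;
    return (uint8_t)(words[byte_index / 8] >> shift);
}
```

Four steps replace what is a single store on a machine with byte addressing, and a simple string copy pays that cost for every character.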
|