CAS and LL/SC (was Re: High Level Assembler for MVS & VM & V
Stephen Fuld
Posted: Wed Jan 26, 2005 1:54 am    Post subject: Re: CAS and LL/SC

"Eric P." <eric_pattison@sympaticoREMOVE.ca> wrote in message
news:41F666D4.8CDCE446@sympaticoREMOVE.ca...
Quote:
Nick Maclaren wrote:

In article <YxbJd.20816$8u5.17772@bgtnsc04-news.ops.worldnet.att.net>,
Stephen Fuld <s.fuld@PleaseRemove.att.net> wrote:
"Andi Kleen" <freitag@alancoxonachip.com> wrote in message
news:m3brbfou1q.fsf@averell.firstfloor.org...

To bring it back on topic to comp.arch: the moral is to never add
silly address space limits to registers that cause such problems
later.

Yes. Or, in a slightly different formulation, don't use memory mapped
I/O at all (at least not for general purpose processors where the
stringent requirements of some embedded systems don't apply).

Amen.

Rubbish. This has nothing to do with memory mapped IO.
How about: Don't try to address more than 4GB on a 32 bit machine.
Or even better: Use 36 bits addresses and an IOMMU to map PCI/32
into the larger address space. OS support has been available for
over a decade.

Wouldn't this run into the same kinds of problems when the CPU went to a 64
bit address space, as X-86 is now doing?

Quote:
Memory mapped IO works just fine and has proven over many decades
to be the most flexible and functional while being simplest
approach, when the system design is not botched.

For some definitions of "works just fine", etc. Besides, with the relatively
increased latency to main memory, the number of instructions needed to
"cover" the loads outstanding for I/O gets longer, which puts a strain on
other CPU resources. Also, the many upgrades to PCI, and the different
semantics when moving from ISA to PCI, etc. are symptoms of misdesigns.

--
- Stephen Fuld
e-mail address disguised to prevent spam

Eric P.
Posted: Wed Jan 26, 2005 1:57 am    Post subject: Re: CAS and LL/SC

Bernd Paysan wrote:
Quote:

Eric P. wrote:
An IO Processor (IOP) still needs coherent access to main memory/cache.
The IOP itself does not need instruction and local data cache.
But if it is already present, as it would be in an SMP system,
then there is no harm in using it.

The IOP needs the same sort of access to main memory/cache as a DMA unit.

Which is coherent I hope, so we agree.

Quote:
Now tell me there's no DMA unit in the chipset ;-).

I'm not sure what you mean. Each device has its own DMA unit if it
needs one.

If there is an IOP, then I assume you wanted to use it to program
the device DMA unit. The device driver in the main cpu constructs a
command packet and hands it to the IOP, probably stuffs it in a doubly
linked list and sets an attention flag. A singly linked list from
which the pkt ptr would then be copied in and the list order
reversed would also work and avoids the spinlocks.

The IOP, running a miniature quasi real time OS, polls the attention
flag, sees the new pkt, parses the pkt and runs the same driver code that
it would have run on the main cpu to program the 'slow' device
dma registers. It then starts the dma.

When IO completes, device interrupts IOP, which marks cmd pkt
as complete, stuffs pkt in a reply queue, interrupts main cpu,
and starts next IO.
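
A minimal sketch of the singly linked list handoff described above, assuming C11 atomics are available on both sides. The names `cmd_pkt`, `pkt_list`, `pkt_push` and `pkt_take_all` are hypothetical and not taken from any real driver. The main cpu pushes with a compare-and-swap loop; the IOP copies the list pointer out in one atomic exchange and then reverses the list privately, so no spinlock is held at any point.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical command packet; only the link field matters here. */
struct cmd_pkt {
    struct cmd_pkt *next;
    int opcode;                      /* illustrative payload */
};

/* Shared list head: the main cpu pushes, the IOP drains. */
struct pkt_list {
    _Atomic(struct cmd_pkt *) head;
};

/* Producer (main cpu): push one packet with a compare-and-swap loop. */
void pkt_push(struct pkt_list *l, struct cmd_pkt *p)
{
    assert(l != NULL && p != NULL);
    struct cmd_pkt *old = atomic_load_explicit(&l->head, memory_order_relaxed);
    do {
        p->next = old;               /* link to the head we last observed */
    } while (!atomic_compare_exchange_weak_explicit(
                 &l->head, &old, p,
                 memory_order_release, memory_order_relaxed));
}

/* Consumer (IOP): take the whole list in one atomic exchange, then
 * reverse it privately so packets come out in submission order. */
struct cmd_pkt *pkt_take_all(struct pkt_list *l)
{
    assert(l != NULL);
    struct cmd_pkt *p = atomic_exchange_explicit(
        &l->head, (struct cmd_pkt *)NULL, memory_order_acquire);
    struct cmd_pkt *fifo = NULL;
    while (p != NULL) {
        struct cmd_pkt *next = p->next;
        p->next = fifo;
        fifo = p;
        p = next;
    }
    return fifo;
}
```

Because the consumer always removes the entire list at once, this particular push/take-all pattern does not run into the ABA problem that a lock-free pop of a single element would.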

If you assume a common data bus, rather than separate IO bus,
then the main cpu could talk directly to the device if it chooses to.
If the IO bus is separate, it should offload the common bus but you
have no choice and must use the IOP to talk to the device registers.

Quote:
In any case there will be a bunch of transistors spinning their
wheels waiting for a device register. Offloading the task does not
necessarily make this cheaper. To my eye all it does is shuffle
the work around and make it more complex.

Offloading the task to a cheaper device does make it cheaper. And it doesn't
make it more complex. The IOPs of 40 years ago weren't built
because they made the system more complex or more difficult to handle, rather
the other way round.

40 years ago the main cpu cost $1,000,000 and the IOP probably $50,000.
Systems programmers got maybe $25,000 /yr, probably because of lower
costs and less overall demand. The design decisions that were
reasonable in that situation are not relevant to today's costs.

Quote:
Yes, but this is exactly the point. In both SMP and IOP the cpu
spins waiting for the register. Afterwards an IOP just sits there.
In SMP it goes back to work.

I don't need a few cents processor to go back to work. I am talking about a
few cents processor taking the stupid workload off a processor with a
triple-digit $ price tag.

Oh I understand what you mean. And I agree it is largely 'stupid work'.
I'm just not bothered by a relatively cheap cpu wasting a uSec or so
waiting for a PCI device if I can just plug in another cheap cpu.

Eric

FredK
Posted: Wed Jan 26, 2005 2:23 am    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM

No. The problems overlap, but are not identical. To provide a way to
access IO space, the platform groups invented a "sparse" address
space in which some of the low-order address bits of the VA would
be used to create the length and offset of the partial word read or
write - *and* the data itself would need to be shifted into the correct
alignment for the operation - this was termed "swizzling".
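
A rough sketch of what such sparse addressing and swizzling can look like. The bit layout here is an assumption made up for illustration (byte address shifted left by 5, a size code in bits 4:3) and is not claimed to match any particular chipset; `sparse_addr`, `swizzle_out` and `swizzle_in` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sparse-space layout (an assumption for illustration):
 *   sparse address = base + (dense byte address << 5) + (size code << 3)
 * where the size code is 0 for 1 byte, 1 for 2 bytes, 3 for 4 bytes. */
enum { SPARSE_SHIFT = 5, SIZE_SHIFT = 3 };

uint64_t sparse_addr(uint64_t base, uint32_t dense, unsigned nbytes)
{
    assert(nbytes == 1 || nbytes == 2 || nbytes == 4);
    assert(dense % nbytes == 0);               /* naturally aligned only */
    uint64_t size_code = nbytes - 1;           /* 0, 1 or 3 */
    return base + ((uint64_t)dense << SPARSE_SHIFT) + (size_code << SIZE_SHIFT);
}

/* "Swizzle" for a write: move the value into the byte lane it occupies
 * within the 32-bit longword that actually goes out on the bus. */
uint32_t swizzle_out(uint32_t value, uint32_t dense, unsigned nbytes)
{
    assert(nbytes == 1 || nbytes == 2 || nbytes == 4);
    assert(dense % nbytes == 0);
    uint32_t mask = nbytes == 4 ? 0xFFFFFFFFu : (1u << (8 * nbytes)) - 1u;
    return (value & mask) << (8 * (dense & 3u));
}

/* Inverse for a read: pull the value back out of its byte lane. */
uint32_t swizzle_in(uint32_t lane_data, uint32_t dense, unsigned nbytes)
{
    assert(nbytes == 1 || nbytes == 2 || nbytes == 4);
    assert(dense % nbytes == 0);
    uint32_t mask = nbytes == 4 ? 0xFFFFFFFFu : (1u << (8 * nbytes)) - 1u;
    return (lane_data >> (8 * (dense & 3u))) & mask;
}
```

With this layout, a driver that used to do a plain byte store to a device register has to compute the sparse address and shift the data itself for every access, which is why drivers that bypassed the OS access routines needed rework.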

This requirement was such that device drivers needed to be
rototilled - especially those that did not use OS supplied access
routines (such as graphics drivers) and explicitly read/wrote to
a device directly. This was a business problem in the NT space.

Partial word read/write to memory was a second level consideration
(IIRC) and frankly I don't think would have caused the change by
itself. Program sizes could go down, some sequences could get
more efficient/faster, but nothing extraordinary. I'm not even sure
that multi-threading issues even were considered - since even with
the partial word instructions, I think only a cache line is "atomic"
without LL/SC.



"Nick Maclaren" <nmm1@cus.cam.ac.uk> wrote in message
news:ct5sfc$kv5$1@gemini.csx.cam.ac.uk...
Quote:
In article <35mjfuF4kdrkoU1@individual.net>,
Jan Vorbrüggen <jvorbrueggen-not@mediasec.de> wrote:
It originally had only full register loads and stores, which is
unsuitable for implementing C. That was fixed.

I believe the driving force in providing less-than-32-bit-read and
-writes was memory-mapped I/O, not implementing C (which was needed
from day one in any case). The workarounds in hard- and software for
the first systems were Not Pretty and prone to misuse and errors, so
the necessary instructions were added.

That's essentially the same problem. In all cases (threads, memory
mapping and interrupts), the problems and solutions have a lot in
common.


Regards,
Nick Maclaren.

FredK
Posted: Wed Jan 26, 2005 2:27 am    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM

"Terje Mathisen" <terje.mathisen@hda.hydro.com> wrote in message
news:ct5cm3$e3j$1@osl016lin.hda.hydro.com...
Quote:

IMHO C wasn't the worst problem:

Memory-mapped IO to 8/16/32-bit device registers with destructive read or
write was a harder problem, and in this case DEC couldn't simply define
away the problem either. The workaround entailed using alternate address
ranges afair. It seemed like a horrible hack at the time, and it
probably generated quite a few bugs in kernel mode drivers. :-(


Well, not as many as you would think. The problem was getting drivers
to do the work to begin with. My experience has always been (having
done a *lot* of graphics code on Alpha) that write ordering was the
harder problem (to get all the right ones there, but only those you need,
and still get performance). The code tricks for *that* were fun *and*
had lots of bugs ;-)

Bernd Paysan
Posted: Wed Jan 26, 2005 2:45 am    Post subject: Re: CAS and LL/SC

David Kanter wrote:

Quote:
Isn't that what zSeries machines do? I was under the impression that
they have 'channel' controllers that are relatively similar to what you
are describing.

Yes, the 'channel' controllers from the zSeries date back to the /360
or /370 (Lynn Wheeler might know better). The mainframe designs from back
then have ideas that the microprocessor folks have forgotten and are now
forced to reimplement step by step. Virtualization is one of them (the /360
wasn't, but the /370 was). Those who forget history have to repeat it.

Quote:
Not being a zSeries person, I can neither confirm nor deny such
allegations : )

The CDC6600 had a similar concept.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Bernd Paysan
Posted: Wed Jan 26, 2005 3:13 am    Post subject: Re: CAS and LL/SC

Eric P. wrote:
Quote:
Now tell me there's no DMA unit in the chipset ;-).

I'm not sure what you mean. Each device has its own DMA unit if it
needs one.

The DMA unit has two parts. The device wants to read from or write to main
memory, and the chipset has to satisfy that request. An IOP might sit
between the DMA request and the real memory, handling things like virtual
memory or scatter-gather (scatter-gather is mostly used to implement
the fragmented DMA transfers you get when translating virtual to physical
addresses).

Actually, DMA is a sort of "poor man's IOP". The DMA takes over the most
tedious workload, the bulk data transfer. Platforms with IO processors
don't need DMA, the device just requests and acknowledges transfers like an
IDE device in PIO mode.

Quote:
If there is an IOP, then I assume you wanted to use it to program
the device DMA unit. The device driver in the main cpu constructs a
command packet and hands it to the IOP, probably stuffs it in a doubly
linked list and sets an attention flag. A singly linked list from
which the pkt ptr would then be copied in and the list order
reversed would also work and avoids the spinlocks.

The IOP, running a miniature quasi real time OS, polls the attention
flag, sees the new pkt, parses the pkt and runs the same driver code that
it would have run on the main cpu to program the 'slow' device
dma registers. It then starts the dma.

When IO completes, device interrupts IOP, which marks cmd pkt
as complete, stuffs pkt in a reply queue, interrupts main cpu,
and starts next IO.

Yes, that's the general idea. If you add an IOP to the PC architecture, you
would still have the possibility to bypass it, for compatibility reasons.
The IOP does not necessarily have to run a full "device driver", it's also
possible if the command packet contains enough details (i.e. a sort of
"mini-program") to send details through on-the-fly (like "you have to set
register x to y, and then poll register z for bit q"). And for
packet-oriented IOs like USB or SCSI (SATA), it is sufficient if the lowest
level (the IOP) knows how to send and receive packets.
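
The "mini-program" idea can be sketched as a tiny interpreter on the IOP side. Everything here is hypothetical (the `step` encoding, the `run_packet` name, the register file passed as an array); it only illustrates "set register x to y, then poll register z for bit q" without the IOP knowing anything about the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet step: either write a register or poll one. */
enum step_op { STEP_WRITE, STEP_POLL };

struct step {
    enum step_op op;
    unsigned reg;        /* register index (x or z in the text) */
    uint32_t value;      /* value y to write, or bit mask q to wait for */
};

/* Run the steps against a register file. max_polls bounds each poll so a
 * dead device cannot hang the IOP. Returns the number of steps completed;
 * a result smaller than nsteps means the poll at that index timed out. */
size_t run_packet(volatile uint32_t *regs, size_t nregs,
                  const struct step *steps, size_t nsteps,
                  unsigned max_polls)
{
    for (size_t i = 0; i < nsteps; i++) {
        const struct step *s = &steps[i];
        assert(s->reg < nregs);
        if (s->op == STEP_WRITE) {
            regs[s->reg] = s->value;
        } else {
            unsigned tries = 0;
            while ((regs[s->reg] & s->value) == 0) {
                if (++tries >= max_polls)
                    return i;            /* timed out on step i */
            }
        }
    }
    return nsteps;
}
```

The main cpu's driver would build the `step` array and hand it over in the command packet; the slow register accesses then stall only the IOP.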

Quote:
40 years ago the main cpu cost $1,000,000 and the IOP probably $50,000.
Systems programmers got maybe $25,000 /yr, probably because of lower
costs and less overall demand. The design decisions that were
reasonable in that situation are not relevant to today's costs.

The design decisions from back then seem more reasonable every day (though
clearly not all of them, and if you repeat the design-by-committee mistakes
from IBM back then, you'll get even more bloated designs ;-).

Mass-market products (and the RTOS for the IO processor would be as mass
market as the chipset with this processor) still have the same cost
relation as back then. Instead of selling some hundred CPUs for $1M, you
now sell some hundred million CPUs - not for $1, but for >$100.

Quote:
Oh I understand what you mean. And I agree it is largely 'stupid work'.
I'm just not bothered by a relatively cheap cpu wasting a uSec or so
waiting for a PCI device if I can just plug in another cheap cpu.

That's why IOPs have to be multithreaded, so you don't even have to plug in
something physical.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Bernd Paysan
Posted: Wed Jan 26, 2005 3:32 am    Post subject: Re: CAS and LL/SC

Nick Maclaren wrote:
Quote:
Merging this with another thread (the request for specialised
coprocessors using an I/O interface), you end up with it being
regarded as a good thing to have a CPU core pretending to be an
I/O device pretending to be some memory attached to another CPU.
Stop the world - I want to get off.

Why stop? Use the speed to lift off better ;-)

An IOP doesn't look like I/O to the main CPU, it looks like another task.
The IO side of the IOP might look as weird as it looks due to legacy
reasons. At least you can make the IOP registers wide enough to track the
addressing width of the real CPU.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Anne & Lynn Wheeler
Posted: Wed Jan 26, 2005 3:38 am    Post subject: Re: CAS and LL/SC

Bernd Paysan <bernd.paysan@gmx.de> writes:
Quote:
Yes, the 'channel' controllers from the zSeries date back to the
/360 or /370 (Lynn Wheeler might know better). The mainframe designs
from back then have ideas that the microprocessor folks have
forgotten and are now forced to reimplement step by
step. Virtualization is one of them (the /360 wasn't, but the /370
was). Those who forget history have to repeat it.

The CDC6600 had a similar concept.

channel programs were in 360 ... and there were channel processors
that executed channel programs. channel programs were composed of
sequence of channel command words (8 byte instructions) that were
executed one at a time by the channel processor. most 360s were
microcoded and many of the channel processors were integrated with the
same engine running the 360 instructions ... just a different block of
microcode.

i/o supervisor would store the address of the start of a channel
program in the CAW (channel address word) and issue a start I/O (SIO)
to the specific channel number.

the faster 360s (360/65) had separate boxes that executed channel
programs (as opposed to channel processing being integrated on the
same engine with the instruction processor)

cp67 virtual machine support running on 360/67 (basically 360/65 with
virtual memory hardware) has had the problems. channel programs are
defined as specifying real addresses. Virtual machine support required
intercepting the SIO, analysing the virtual machine's I/O sequence,
making a copy of it ... and (at least) translating all of the virtual
machine specified "addresses" to the "real" machine addresses. If the I/O
operation involved a transfer that crossed a page boundary and the pages
weren't contiguous ... then the copied channel program had to also
specify multiple non-contiguous addresses. Furthermore the virtual
pages had to be pinned/fixed in real memory (at their translated
address) for the duration of the i/o operation.
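
A sketch of the address part of that translation: splitting one virtual transfer into per-page real segments. The flat `page_table`, the 4096-byte page size, and the names `real_seg` and `translate_transfer` are assumptions for illustration; this is not the actual cp67 code or the CCW format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size for the sketch */

/* One piece of a translated transfer: a real address and a byte count. */
struct real_seg {
    uint64_t real_addr;
    uint32_t count;
};

/* page_table[vpn] holds the real page frame number for virtual page vpn
 * (hypothetical flat table; the pages are assumed to be pinned already).
 * Splits [vaddr, vaddr+len) at page boundaries and writes one segment per
 * page touched. Returns the number of segments written, or 0 if out[] is
 * too small (a zero-length transfer also yields 0 segments). */
size_t translate_transfer(const uint32_t *page_table, size_t npages,
                          uint64_t vaddr, uint32_t len,
                          struct real_seg *out, size_t max_segs)
{
    size_t n = 0;
    while (len > 0) {
        uint64_t vpn = vaddr / PAGE_SIZE;
        uint32_t off = (uint32_t)(vaddr % PAGE_SIZE);
        uint32_t chunk = PAGE_SIZE - off;      /* bytes left in this page */
        if (chunk > len)
            chunk = len;
        assert(vpn < npages);
        if (n == max_segs)
            return 0;
        out[n].real_addr = (uint64_t)page_table[vpn] * PAGE_SIZE + off;
        out[n].count = chunk;
        n++;
        vaddr += chunk;
        len -= chunk;
    }
    return n;
}
```

Each returned segment would become one address/count entry in the copied channel program, which is why a transfer that crosses into a non-contiguous page needs more than one entry.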

this scenario has continued ... for all the virtual memory operating
systems. in the initial prototype of the batch, real-memory os/360
operating system to 370 virtual storage ... a copy of CP67's ccw
translation was grafted onto the side of the operating system ...
since it had to do similar virtual address to real address
translation, pin/fixing of virtual pages, etc.

part of the paradigm is that almost all i/o operations tended to
always be direct .... the normal paradigm has been direct asynchronous
i/o operation (no buffer moves needed), even at the application level
(as opposed to operating system creating construct of transfers going
on behind the scenes).

the 370/158 was integrated channels. the next generation was 303x.
they took a 370/158 and eliminated the 370 microcode ... just leaving
the integrated channel migrated and renamed it the 303x channel
director. 370/158s and 370/168s were then repackaged as 3031 and 3032
using the 303x channel director (in some sense 3031 was a two
processor, smp 158 ... except instead of two processors with both
integrated channel microcode and 370 microcode ... one processor just
had the channel microcode and the other just had the 370 microcode).
The 3033 started out being 370/168 circuit diagram remapped to faster
chip technology ... supposedly resulting in 20% faster processor than
168. During the cycle tho, there were additional stuff done to the
3033 so that it eventually was about 50% faster than 168.

One of the issues in 370 with faster processors was the synchronous
hand-shake required by the SIO instruction between the processor
engine and outboard channel engine. The other problem was the
significant impact on cache hit ratios from asynchronous interrupts.
"XA", besides introducing 31bit virtual addressing ... also introduced
a new I/O processing paradigm ... where new processing engine could
handle much of the asynchronous crud involved with i/o ... and present
a much more pleasant queued and low-overhead paradigm to the main
processor. The other thing it addressed was real-time I/O redrive. In
the 360/370 paradigm ... when there was a queue of requests for a
drive, the processor got interrupted, processed the interrupt and
eventually got around to redriving the device from the queued
requests. The more pleasant queued interface allowed an external
real-time engine the capability of redriving queued requests. All
targeted at lowering the tight serialization between i/o operations
and the instruction engine.

There is a subset of virtual machine support built into all the
mainframe processors called logical partitions. It has most of the
characteristics of the original cp67 ... but built into the hardware
microcode and supporting a limited number of concurrent virtual
machines (partitions). It has simplified the ccw translation process
because each partition's memory is contiguous and fixed in storage
i.e. simple base & bound rather than fragmented non-contiguous pages.
The i/o subsystem can directly use the base&bound values avoiding
needing to make an on-the-fly copy of every channel program and modify
the storage addresses.
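
By contrast, base & bound relocation needs no per-page work at all. A minimal sketch, with a hypothetical `partition` descriptor and `relocate` helper (the real hardware interface is not described in this thread):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one partition's fixed, contiguous storage. */
struct partition {
    uint64_t base;    /* real address where the partition starts */
    uint64_t size;    /* partition size in bytes (the "bound") */
};

/* Relocate a partition-relative address for a transfer of len bytes.
 * Returns false if any part of the transfer falls outside the partition. */
bool relocate(const struct partition *p, uint64_t addr, uint64_t len,
              uint64_t *real)
{
    assert(p != NULL && real != NULL);
    if (addr > p->size || len > p->size - addr)
        return false;                 /* overflow-safe bounds check */
    *real = p->base + addr;
    return true;
}
```

One add and one compare per address is cheap enough to apply on the fly, so the channel program can be used as written instead of being copied and rewritten.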

this is sort of high level overview ... a lot of the actual stuff are
the nits in the low level details.

--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/

Terje Mathisen
Posted: Wed Jan 26, 2005 1:07 pm    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM

FredK wrote:

Quote:
No. The problems overlap, but are not identical. To provide a way to
access IO space, the platform groups invented a "sparse" address
space in which some of the low-order address bits of the VA would
be used to create the length and offset of the partial word read or
write - *and* the data itself would need to be shifted into the correct
alignment for the operation - this was termed "swizzling".

Since they only had three bits left over at the low end, they had to
embed some info in at least one high-order address bit as well, i.e.
you'd need each memory-mapped device to be mapped several times.

(BTW, the Weitek fp coprocessor boards for early x86 machines did
something very similar, by grabbing a full 64K of addressing range, and
then using the 16 least significant address bits as a way to determine
the opcode to be performed.)

I.e. a byte (t_uint8) access needs all three bits to determine offset,
while a t_uint16 needs two offset bits and a t_uint32 only needs a
single offset bit. This means that 16 and 32-bit accesses could share a
range by using the low-order bit to select the access size.
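
One way to read the bit accounting above, as a sketch: the three low address bits of an aligned 8-byte access are free, so one range decodes them as a byte offset and a second range shares them between 16- and 32-bit accesses. The exact encoding (bit 0 = 0 for 16-bit, 1 for 32-bit) and the function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

struct access {
    unsigned nbytes;   /* 1, 2 or 4 */
    unsigned offset;   /* byte offset within the 8-byte quadword */
};

/* Range A: byte accesses, all three low bits give the offset. */
struct access decode_byte_range(uint64_t addr)
{
    struct access a = { 1, (unsigned)(addr & 7u) };
    return a;
}

/* Range B: shared by 16- and 32-bit accesses. Bit 0 picks the size
 * (assumed encoding: 0 = 16-bit, 1 = 32-bit). */
struct access decode_shared_range(uint64_t addr)
{
    struct access a;
    if ((addr & 1u) == 0) {
        a.nbytes = 2;
        a.offset = 2u * (unsigned)((addr >> 1) & 3u);   /* bits 2:1 */
    } else {
        a.nbytes = 4;
        a.offset = 4u * (unsigned)((addr >> 2) & 1u);   /* bit 2 */
    }
    assert(a.offset + a.nbytes <= 8);
    return a;
}
```

In the 32-bit case bit 1 is unused, which is the slack that lets the two sizes share a single range.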

A more orthogonal setup would have four ranges, and the remaining
address bits would be exactly as if you were doing regular 8/16/32/64
bit operations. I'm guessing this is what DEC did?

Quote:
This requirement was such that device drivers needed to be
rototilled - especially those that did not use OS supplied access
routines (such as graphics drivers) and explicitly read/wrote to
a device directly. This was a business problem in the NT space.

Partial word read/write to memory was a second level consideration
(IIRC) and frankly I don't think would have caused the change by
itself. Program sizes could go down, some sequences could get
more efficient/faster, but nothing extraordinary. I'm not even sure
that multi-threading issues even were considered - since even with
the partial word instructions, I think only a cache line is "atomic"
without LL/SC.

Right.

Terje

--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Jan Vorbrüggen
Posted: Wed Jan 26, 2005 2:09 pm    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM

Quote:
But those apps were broken to begin with, their brokenness was just
exposed more easily on Alpha.
Not so. Replacing atomic byte and word stores with read-modify-write
sequences *changes* such accesses to be non-interruptable on that
platform. Granted such usage is an unstated assumption, but so
what - who guaranteed that 32 bits int accesses are atomic?
These are assumptions that are valid on most (all?) platforms
but that one, and are used by lots of code including that of their
existing customers' VAX code.
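
To make the quoted point concrete, here is a sketch of a byte store done as a read-modify-write on the containing 32-bit word, next to a version made atomic with a compare-and-swap retry loop. The function names are hypothetical, and `lane` counts bytes by significance (lane 0 = least significant), not by address.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Replace byte `lane` (0..3, lane 0 = least significant) of a word. */
uint32_t insert_byte(uint32_t word, unsigned lane, uint8_t b)
{
    assert(lane < 4);
    unsigned shift = 8 * lane;
    return (word & ~(UINT32_C(0xFF) << shift)) | ((uint32_t)b << shift);
}

/* Plain read-modify-write: another writer can update a neighbouring byte
 * between the load and the store, and that update is then lost. */
void store_byte_rmw(uint32_t *word, unsigned lane, uint8_t b)
{
    uint32_t old = *word;                    /* load the containing word */
    *word = insert_byte(old, lane, b);       /* store it back, modified  */
}

/* Same update made atomic: retry until nobody changed the word in between.
 * On a failed exchange, `old` is refreshed with the current contents. */
void store_byte_atomic(_Atomic uint32_t *word, unsigned lane, uint8_t b)
{
    uint32_t old = atomic_load(word);
    uint32_t desired;
    do {
        desired = insert_byte(old, lane, b);
    } while (!atomic_compare_exchange_weak(word, &old, desired));
}
```

On machines with LL/SC, a compare-and-swap loop like the second version is typically what the compiler turns into a load-locked/store-conditional sequence.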

The VAX architecture guaranteed that memory accesses on a uniprocessor
were atomic, as were certain instructions that did read-modify-writes
- for instance, the queue instructions. However, in all multiprocessor
systems only the use of the interlocked instructions (queue manipulation,
setting/clearing bits, ADAWI - add aligned word interlocked) was
guaranteed this property across processors. Thus, the programmer had to know
when accessing shared data structures in what synchronization domain he
was working. Most of that knowledge was hidden behind appropriate APIs.
Where that isn't possible - e.g., interactions between an AST and mainline
code - there are clear instructions on the dos and don'ts, and OS support
for critical sections in the form of temporarily disabling ASTs from user
mode.

Summary: Any thread-based/parallel code on VAX that accessed a shared
data structure by just using code that was oblivious of the fact was
broken to begin with - but you would get away with it in many common
situations (e.g., on a uniprocessor system or even a small SMP that
hadn't reduced quantum). Nothing new here - the same holds true of many
other systems. And that was my point - even all those guys who thought
"all the world's a VAX" were already breaking the rules on the VAX, and
they got their just rewards when they tried to run their junk elsewhere.

Jan

Jan Vorbrüggen
Posted: Wed Jan 26, 2005 2:17 pm    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM

Quote:
That's essentially the same problem. In all cases (threads, memory
mapping and interrupts), the problems and solutions have a lot in
common.

I disagree. A lot of accesses to memory don't care about being thread-
safe or not, or being interrupted and repeated later, etc - that's the
basis for your often-repeated argument, Nick, that a memory system in
a parallel processor should not support cache coherency on every access
but only on those where the programmer says it is required, and subject
to certain restrictions. All those nice properties don't hold for your
run-of-the-mill (E)ISA and even PCI device - you often cannot repeat
the device memory access without things going wrong, whether you access
a register as an entity one, two or four bytes long might make a difference
to its operation, and so on. Of course, you might say that these designs
are broken in the first place, and I would be the last to disagree -
but then I think a lot of other things in this world are broken by design
as well, and we don't get to change them at the wave of a wand (reading
too much Harry Potter lately, it appears). So DEC just had to live with
it, and changing the Alpha architecture apparently was less pain - i.e.,
cost less money - than adapting each and every new system and all the
software driving it and its devices to suit the architecture.

Jan

Jan Vorbrüggen
Posted: Wed Jan 26, 2005 2:19 pm    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM

Quote:
A more orthogonal setup would have four ranges, and the remaining
address bits would be exactly as if you were doing regular 8/16/32/64
bit operations. I'm guessing this is what DEC did?

I believe they did - although the architecture already had 32-bit memory
ops to support a lot of the VAX code, so you only needed three mappings.

Jan

Jan Vorbrüggen
Posted: Wed Jan 26, 2005 2:26 pm    Post subject: Re: CAS and LL/SC

Quote:
The IOP needs the same sort of access to main memory/cache as a DMA unit.
Which is coherent I hope, so we agree.

But all it needs to do is get the most recent data (for writes) and
invalidate non-memory copies on reads. No need for write-backs to memory
from a cache, for instance, on reads.

Jan

Nick Maclaren
Posted: Wed Jan 26, 2005 4:30 pm    Post subject: Re: CAS and LL/SC (was Re: High Level Assembler for MVS & VM

In article <35p5giF338ourU1@individual.net>,
Jan Vorbrüggen <jvorbrueggen-not@mediasec.de> wrote:
Quote:
That's essentially the same problem. In all cases (threads, memory
mapping and interrupts), the problems and solutions have a lot in
common.

I disagree. A lot of accesses to memory don't care about being thread-
safe or not, or being interrupted and repeated later, etc - that's the
basis for your often-repeated argument, Nick, that a memory system in
a parallel processor should not support cache coherency on every access
but only on those where the programmer says it is required, and subject
to certain restrictions. ...

That isn't what I meant. What I meant is that the C language defined what
activities are safe for such uses in the same way, the compiler needs to
generate the same code to allow safe access, and so on. Whether you
actually NEED any of that is another matter entirely.


Regards,
Nick Maclaren.

Sander Vesik
Posted: Wed Jan 26, 2005 6:31 pm    Post subject: Re: CAS and LL/SC

Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
Quote:
In article <1106441291.618619@haldjas.folklore.ee>,
Sander Vesik <sander@haldjas.folklore.ee> wrote:

Oh, and I'd expect "tagging" to get more common in the future. We've
passed through a period when "all 32 bits" were used for addressing
but we are now entering a period when there is "no chance" that all
64 bits will be significant.

I'm not convinced we are about to enter such period and fully expect
people to crumble about people having done stupid tricks in 10-15
years time.

No, it's the programs that will crumble :-)

While that's possible, I don't think that it is likely. It would
certainly be likely if there were a resurgence of the models that
map the whole filing system into memory, but I don't see much sign
of that.

The aggregate RAM of a cluster may well exceed 48 bits by then, but
I don't see virtual shared memory making headway, either. And I don't
see the directly addressable RAM of a single thread of control getting
over 48 bits for several decades. Not that it couldn't be done, but
I don't think that is the way things will go.

Using 8 bits for flags would be safe for a long time - using 16 would
be safe for a while. I agree that architecting the latter would be
foolish, given the lifespan of architectures.

In 15 years, machines with 1000 simultaneous hw threads from a combination
of smp + mt are imho likely to be rather commonplace. Using say cumulative
36 bits per thread (after all, the number of software threads is likely to
be far higher), it gets you to 48 bits. That's just 8 bits away from hitting
flags. So large systems running on high end machines in 15 years would
imho not feel comfortable with having 56 bits of address + 8 bits of flags,
especially if the kernel takes away one of the bits anyway.
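
For reference, a sketch of the 56 + 8 split under discussion, plus a helper for the bit-count arithmetic. The names are hypothetical and the layout (flags in the top 8 bits of a 64-bit word) is an assumption. By a strict count, 1000 threads at 2^36 bytes each come to about 46 bits, in the same neighbourhood as the 48 mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed split from the discussion: low 56 bits address, top 8 bits flags. */
#define ADDR_BITS 56
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Pack an address and 8 flag bits into one 64-bit word. */
uint64_t tag_pack(uint64_t addr, uint8_t flags)
{
    assert((addr & ~ADDR_MASK) == 0);    /* address must fit in 56 bits */
    return ((uint64_t)flags << ADDR_BITS) | addr;
}

/* Recover the two fields again. */
uint64_t tag_addr(uint64_t word)  { return word & ADDR_MASK; }
uint8_t  tag_flags(uint64_t word) { return (uint8_t)(word >> ADDR_BITS); }

/* Number of address bits needed to cover `bytes` bytes: the smallest n
 * with 2^n >= bytes. */
unsigned bits_needed(uint64_t bytes)
{
    unsigned n = 0;
    while (n < 64 && (UINT64_C(1) << n) < bytes)
        n++;
    return n;
}
```

The assertion in `tag_pack` is exactly the failure mode being argued about: once real addresses grow past `ADDR_BITS`, every program that stored flags there breaks.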

Quote:


Regards,
Nick Maclaren.

--
Sander

+++ Out of cheese error +++

All times are GMT