| Author |
Message |
A Man Crying Alone In The
Guest
|
Posted:
Sun Oct 23, 2005 4:15 pm Post subject:
A stupid post about Intel's latest computer chip ( s) |
|
|
Come on now, you are less of an idiot to understand this,
IBM/INTEL architecture,
REGISTER_1 ( A storage location)
REGISTER_2 ( A storage location)
REGISTER_3 ( A storage location)
( etc. . . . )
REGISTER_16 ( A storage location)
V.S
single stack enhanced architecture, ( dynamic frequency profiled)
STACK_1 [ 1..8]
STACK_2 [ 1..4]
STACK_3 [ 1..4]
Which one do you believe requires less chip internal hardware wires?
( and, thus, a higher efficiency of "Turing" machine language
expression ( and code profile))
( HINT : Have you ever read about minimal ANSI FORTH machines? )
MIMD Multiple Instruction Multiple Data
VLIW Very Long Instruction Word
MPP Multiple Parallel Processors ( many SMPs linked together like an
interconnecting LEGO(tm)-like block game to add more processing power )
SMP Symmetric Multiple Processor ( like, multiple cores on a single CPU
chip)
( between sixteen and two with IBM/Intel, set at a constant factor of
sixteen and derivative of super-scalable application dynamic frequency
profile )
I have been shouting news of the VLIW SMP MPP FORTH formula to
Washington and has been published, since 1996, all around the St. Paul
and Minneapolis Minnesota area.
However, IBM/Intel continues to shout anti-news. |
|
|
 |
Jerry Avins
Guest
|
Posted:
Sun Oct 23, 2005 4:15 pm Post subject:
Re: A stupid post about Intel's latest computer chip ( s) |
|
|
A Man Crying Alone In The Wilderness wrote:
| Quote: | Come on now, you are less of an idiot to understand this,
|
Have you been skipping your meds?
Jerry
--
Engineering is the art of making what you want from things you can get.
ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ |
|
|
 |
Guest
|
Posted:
Sun Oct 23, 2005 10:27 pm Post subject:
Re: A stupid post about Intel's latest computer chip ( s) |
|
|
Come on now, you are less of an idiot to understand this,
IBM/INTEL architecture,
REGISTER_1 ( A storage location)
REGISTER_2 ( A storage location)
REGISTER_3 ( A storage location)
( etc. . . . )
REGISTER_16 ( A storage location)
V.S
single stack enhanced architecture, ( dynamic frequency profiled)
STACK_1 [ 1..8]
STACK_2 [ 1..4]
STACK_3 [ 1..4]
Which one do you believe requires less chip internal hardware wires?
( and, thus, a higher efficiency of "Turing" machine language
expression ( and code profile))
( HINT : Have you ever read about minimal ANSI FORTH machines? )
MIMD Multiple Instruction Multiple Data
VLIW Very Long Instruction Word
MPP Multiple Parallel Processors ( many SMPs linked together like an
interconnecting LEGO(tm)-like block game to add more processing power )
SMP Symmetric Multiple Processor ( like, multiple cores on a single CPU
chip)
( between sixteen and two with IBM/Intel, set at a constant factor of
sixteen and derivative of super-scalable application dynamic frequency
profile )
I have been shouting news of the VLIW SMP MPP FORTH formula to
Washington and has been published, since 1996, all around the St. Paul
and Minneapolis Minnesota area.
However, IBM/Intel continues to shout anti-news.
---
A simple enumeration of basic primitives with a stack enhanced
architecture yields a powerful microprocessor core. ( For example
ANSI FORTH machine implicit and explicit primitives, JUMP_IF_ZERO JUMP
CALL RETURN LITERAL 0< AND XOR DROP OVER DUP @ ! 2* 2/ >R R> INVERT + )
---
Jerry Avins wrote:
| Quote: | A Man Crying Alone In The Wilderness wrote:
Come on now, you are less of an idiot to understand this,
Have you been skipping your meds?
Jerry
--
Engineering is the art of making what you want from things you can get.
ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ |
|
|
|
 |
The Ghost In The Machine
Guest
|
Posted:
Sun Oct 23, 2005 11:00 pm Post subject:
Re: A stupid post about Intel's latest computer chip ( s) |
|
|
In sci.math, A Man Crying Alone In The Wilderness
<cpu16x1832@wmconnect.com>
wrote
on 23 Oct 2005 08:11:24 -0700
<1130080284.332527.14090@g14g2000cwa.googlegroups.com>:
| Quote: | Come on now, you are less of an idiot to understand this,
IBM/INTEL architecture,
REGISTER_1 ( A storage location)
REGISTER_2 ( A storage location)
REGISTER_3 ( A storage location)
( etc. . . . )
REGISTER_16 ( A storage location)
|
Actually, I think the registers are stored in internal flipflops
in the microprocessor itself.
| Quote: |
V.S
single stack enhanced architecture, ( dynamic frequency profiled)
STACK_1 [ 1..8]
STACK_2 [ 1..4]
STACK_3 [ 1..4]
Which one do you believe requires less chip internal hardware wires?
( and, thus, a higher efficiency of "Turing" machine language
expression ( and code profile))
( HINT : Have you ever read about minimal ANSI FORTH machines? )
MIMD Multiple Instruction Multiple Data
VLIW Very Long Instruction Word
MPP Multiple Parallel Processors ( many SMPs linked together like an
interconnecting LEGO(tm)-like block game to add more processing power )
SMP Symmetric Multiple Processor ( like, multiple cores on a single CPU
chip)
( between sixteen and two with IBM/Intel, set at a constant factor of
sixteen and derivative of super-scalable application dynamic frequency
profile )
I have been shouting news of the VLIW SMP MPP FORTH formula to
Washington and has been published, since 1996, all around the St. Paul
and Minneapolis Minnesota area.
However, IBM/Intel continues to shout anti-news.
|
An interesting question. Given what little I know
about modern chip design (I worked in a fab as a
software engineer 15-20 years ago so absorbed a little
microelectronics by osmosis :-) ), it would appear to me
that it's little if any difference, though it depends on
the specifics of how the stack is implemented.
At the software level, most stacks are basically regions
of memory, accessed via a stack pointer. A logical
implementation (to me, anyway) of a stack machine at
the hardware level would be a region of memory, a stack
pointer, and a bus of width 4 * wordwidth. This bus would
contain at most 4 words and would then be fed to the ALU.
The ALU would decide to write at most 2 words back to the
bus, which would stick them back on the stack. The stack
pointer would have various wiggles on it to "pop 2",
"pop 1", "push 2", and "push 1", perhaps.
Note that I'm not really specifying a word size, although
most contemporary architectures would be 32 or 64 bits.
Ideally one could do this in a "bit slice" fashion; just
add more chips for bigger words. However, I'm not sure
how well that will work for the more complex instructions.
(It's worth noting here that
http://computer.howstuffworks.com/question299.htm
suggests that the G4 has a 128-bit internal bus.)
The arithmetic instructions would be fairly simple:
ADD: take 2 operands, SP++, shove result, set conditionflags
SUB: take 2 operands, SP++, shove result, set conditionflags
NEG: take 1 operand, shove result, set conditionflags
MUL: take 2 operands, shove product and overflow, set conditionflags
MULA: take 3 operands, SP++, shove product and overflow, set conditionflags
DIV: take 3 operands, SP++, shove quotient and remainder,
set conditionflags
MULDIV: take 3 operands, SP+=2, shove results, set conditionflags
MULDIVMOD: take 3 operands, SP++, shove results, set conditionflags
MULADIV: take 4 operands, SP+=3, shove results, set conditionflags
MULADIVMOD: take 4 operands, SP+=2, shove results, set conditionflags
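A rough Python sketch of a few of the arithmetic instructions above may make the stack-pointer bookkeeping easier to follow. It is only an illustration under stated assumptions: a 32-bit word, a stack that grows downward (push decrements SP), a single zero flag standing in for "conditionflags", and MUL leaving the overflow word on top. The class and method names are made up for this sketch.

```python
class StackMachine:
    """Toy model of the ad hoc stack machine; names and sizes are illustrative."""

    MASK = 0xFFFFFFFF  # assumed 32-bit words; the post leaves the width open

    def __init__(self, size=64):
        self.mem = [0] * size
        self.sp = size        # empty stack; a push decrements SP
        self.zero = False     # one condition flag, for brevity

    def push(self, value):
        self.sp -= 1
        self.mem[self.sp] = value & self.MASK

    def pop(self):
        value = self.mem[self.sp]
        self.sp += 1
        return value

    def _shove(self, value):
        """Push a masked result and set the zero flag from it."""
        result = value & self.MASK
        self.push(result)
        self.zero = result == 0

    def add(self):            # 2 operands in, 1 result out: net SP++
        b, a = self.pop(), self.pop()
        self._shove(a + b)

    def sub(self):            # 2 operands in, 1 result out: net SP++
        b, a = self.pop(), self.pop()
        self._shove(a - b)

    def neg(self):            # 1 operand in, 1 result out: SP unchanged
        self._shove(-self.pop())

    def mul(self):            # 2 in, 2 out (product, overflow): SP unchanged
        b, a = self.pop(), self.pop()
        product = a * b
        self.push(product)            # low word
        self.push(product >> 32)      # overflow (high) word ends up on top
        self.zero = product == 0
```

For example, after `m = StackMachine(); m.push(7); m.push(5); m.add()` the stack pointer has moved up by one word from where it stood after the two pushes, and `m.pop()` returns 12.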
The logical instructions:
AND, OR, XOR: take 2 operands, SP++, shove result, set conditionflags
NAND, NOR: take 2 operands, SP++, shove result, set conditionflags
NOT: take 1 operand, shove result, set conditionflags
TEST: take 1 operand, SP++, set conditionflags
BITTEST: take 2 operands, SP+=2, set conditionflags
The "bitfiddle" instructions:
LSHIFT: take 2 operands, SP++, shove result, set conditionflags
RSHIFT: take 2 operands, SP++, shove result, set conditionflags
ARSHIFT: take 2 operands, SP++, shove result, set conditionflags
(the main difference is the handling of the sign bit)
LSHIFT2: take 3 operands, SP++, shove result, set conditionflags
RSHIFT2: take 3 operands, SP++, shove result, set conditionflags
ARSHIFT2: take 3 operands, SP++, shove result, set conditionflags
The "stackfiddle" instructions:
DUP: take 1 operand, SP--, shove results, set conditionflags
DUP2: take 2 operands, SP-=2, shove results, set conditionflags
DROP: SP++
DROP2: SP+=2
SWAP: take 2 operands, swap 'em, set conditionflags
SWAP2: take 4 operands, swap 'em, set conditionflags
OVER: take 2 operands, SP--, shove result, set conditionflags
OVER2: take 4 operands, SP-=2, shove result, set conditionflags
PICK:
This one's tricky. One could implement 1PICK=DUP, 2PICK=OVER,
3PICK, and 4PICK easily enough; beyond that one would have
to engineer instructions to move around the stack and a temporary
holding register -- or one can add to/subtract from the stack
(a sort of DROPn instruction) as opposed to merely incrementing
and decrementing it.
PICK2:
Similar to PICK except it uses word pairs instead of words.
ROLL:
This one's even trickier. Two words are fetched from the stack,
then the stack rotated. SP += 2 after the operation but a lot
is happening in between instruction start and instruction end.
Note that 2 1 ROLL (or was it 1 2 ROLL?) is equivalent to
a SWAP.
ROLL2:
Similar to ROLL except it uses word pairs instead of words.
LOAD1:
Whatever word's following in the instruction stream, push it
onto the stack.
LOAD2, LOAD3, LOAD4:
Similar to LOAD1 except more data is pushed.
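The "stackfiddle" instructions can be sketched in a few lines of Python, with a plain list standing in for the numeric stack (top of stack at the end of the list). This is only a sketch: the numbering of `pick` follows the 1PICK=DUP, 2PICK=OVER convention above, and `roll` is modeled on the PostScript-style two-operand roll, which is an assumption, since the operand order is left open above.

```python
def dup(stack):
    """( a -- a a ): copy the top word; depth grows by one."""
    stack.append(stack[-1])


def drop(stack):
    """( a -- ): discard the top word; depth shrinks by one."""
    stack.pop()


def swap(stack):
    """( a b -- b a ): exchange the top two words; depth unchanged."""
    stack[-1], stack[-2] = stack[-2], stack[-1]


def over(stack):
    """( a b -- a b a ): copy the second word to the top; depth grows by one."""
    stack.append(stack[-2])


def pick(stack, n):
    """Copy the n-th word from the top: n=1 acts like DUP, n=2 like OVER."""
    stack.append(stack[-n])


def roll(stack, n, j):
    """Rotate the top n words by j places toward the top (assumed semantics)."""
    if n == 0:
        return
    j %= n
    top = stack[-n:]
    stack[-n:] = top[-j:] + top[:-j]
```

With these definitions, `roll(s, 2, 1)` on a two-word stack has the same effect as `swap(s)`.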
Arbitrary memory fetch:
I'm not sure how to properly structure this, but here's one fairly
obvious method:
FETCH: take 2 operands, SP++, push result, set conditionflags
The first operand is a base address, the second an offset.
Usage might be along the lines of '5 MADDR FETCH', which
picks the word from the location MADDR+5 and pushes it
onto the stack.
RFETCH: Same as FETCH except the operands are reversed. This one
might be useful in certain structure contexts; e.g.
in "MADDR 5 RFETCH", one could define "5 RFETCH" as
"GETCFLAGS" and use it everywhere. Of course
one could define RFETCH as "SWAP FETCH" anyway.
STORE: take 3 operands, SP+=3.
RSTORE: take 3 operands, SP+=3.
MSWAP: take 3 operands, SP+=2, push result, set conditionflags
This is basically an atomic swap, which is of some
importance to proper implementation of locking, semaphores,
and monitors.
RMSWAP: take 3 operands, SP+=2, push result, set conditionflags
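To show why an atomic swap such as MSWAP matters for locking, here is a small Python sketch. It is a software stand-in, not a description of any hardware: the `Memory` class, the `try_acquire` helper, and the convention that 0 means "free" and 1 means "held" are all assumptions made for the example, and a `threading.Lock` plays the part of whatever bus locking the chip would provide.

```python
import threading


class Memory:
    """Word-addressed memory with FETCH, STORE, and an atomic MSWAP."""

    def __init__(self, size):
        self.cells = [0] * size
        self._bus = threading.Lock()   # stands in for a hardware bus lock

    def fetch(self, base, offset):
        return self.cells[base + offset]

    def store(self, base, offset, value):
        self.cells[base + offset] = value

    def mswap(self, base, offset, value):
        """Atomically write `value` and return the word it replaced."""
        with self._bus:
            old = self.cells[base + offset]
            self.cells[base + offset] = value
            return old


def try_acquire(mem, lock_addr):
    """Spinlock attempt: we own the lock only if the old value was 0 (free)."""
    return mem.mswap(lock_addr, 0, 1) == 0
```

Because the read and the write happen as one step, two contexts that race on `try_acquire` cannot both see the old value 0, which is the property locks, semaphores, and monitors are built on.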
Program control:
Depending on desire, one might have a separate stack for the control
instructions (as I recall, in some Forths, one has R> and >R
"words"). One could then do things in a fairly obvious fashion:
JMP: whatever word's following, replace the top of the R-stack with it.
CALL: push the word following onto the R-stack.
RET: DROP for the R-stack.
JMPI: pop the top of the numeric stack and replace the top of the
R-stack with it.
SAV: take the top of the R-stack and push it onto the numeric stack.
BRANCH: add the word following to the top of the R-stack. Depending
on desired sophistication one can have signed byte offset,
int16 offset, and int32 offset variants as well.
And of course one has the conditional modified forms,
which would simply test for various combinations of the
condition flags in this particular design, but other
microprocessors actually bother to look at the top word
of the stack or the contents of a register.
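The R-stack scheme above, where the top of the R-stack doubles as the program counter, can be tried out with a short Python sketch. Several details are assumptions of the sketch rather than part of the design as posted: the program is a flat list of words, CALL first advances the saved top past its operand so that RET resumes after the call, HALT stops the run, and any other word is simply recorded as "executed".

```python
def run(program, max_steps=100):
    """Interpret JMP/CALL/RET using the top of the R-stack as program counter."""
    rstack = [0]
    trace = []
    for _ in range(max_steps):
        if not rstack or rstack[-1] >= len(program):
            break
        pc = rstack[-1]
        op = program[pc]
        if op == "JMP":
            rstack[-1] = program[pc + 1]       # replace the top of the R-stack
        elif op == "CALL":
            rstack[-1] = pc + 2                # assumed: resume after the operand
            rstack.append(program[pc + 1])     # push the target
        elif op == "RET":
            rstack.pop()                       # DROP for the R-stack
        elif op == "HALT":
            break
        else:
            trace.append(op)                   # any other word just "executes"
            rstack[-1] = pc + 1
    return trace


# Word 0 calls the routine at address 5, which runs A and returns; B then runs.
demo = ["CALL", 5, "B", "HALT", "UNUSED", "A", "RET"]
```

Running `run(demo)` yields the trace `["A", "B"]`: the called routine executes first, then control returns to the word after the CALL.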
TRAP and TRAPRET instructions might be useful; these
would allow for simple context switching and scheduling.
Some PROBE instructions might allow for access to the
user's space from the kernel, for checking purposes.
I/O would only be permitted from the kernel context;
these might include the usual READ and WRITE instructions
for port manipulation, and some form of DMA setup which
would relinquish the outside data bus for a short time
to allow transference between device and physical memory.
I'm not sure how I'd handle various issues such as
virtual -> physical page translation, and initial program
load.
Granted, this is an ad hoc machine design; I'd have to
burrow deep into the JVM whitepaper to see how they do it,
and there are a few issues regarding name lookup in there.
This is also designed without regard to losing machine
control; ideally, it would be virtually impossible to
"hack" the machine by e.g. LOAD #x JMPI to jump to an
undesired (well, undesired by the system or algorithm
designer, anyway) location. Or one could obliterate
the operand stack (oops) or the program stack (extremely
dangerous).
There are some interesting issues regarding clocking. Does
one really need a system clock? Some intriguing designs
were suggested last decade that basically ran "as fast
as possible". Admittedly, these probably dropped into
the bit bucket, as DRAM requires a clock anyway.
And then there's pipelining -- basically, doing two things
at once. For example, HP PA-RISC had some interesting
things going on that could execute one extra instruction
immediately following a conditional branch, regardless of
whether that branch was actually taken or not. The G4
can fetch and execute three instructions per clock cycle.
Small wonder that processors such as the 1802, 6502, and 8088
are relatively puny in transistor number (and capability)
whereas modern processors are pushing the 50 million mark,
and modern machines the 400 W mark (the original IBM PC's power supply
delivered all of 63.5 watts, if that).
But it's not just because the words are bigger (the 1802 had
16-bit address and 8-bit data capability; AMD64 has 64-bit
address and data capability).
Perhaps it's time to step back a bit and contemplate the
bigger question: who (or what) should control the machine?
--
#191, ewill3@earthlink.net
It's still legal to go .sigless. |
|
|
 |
A Man Crying Alone In The
Guest
|
Posted:
Sun Oct 23, 2005 11:16 pm Post subject:
Re: A stupid post about Intel's latest computer chip ( s) |
|
|
Come on now, you are less of an idiot to understand this,
IBM/INTEL architecture,
REGISTER_1 ( A storage location)
REGISTER_2 ( A storage location)
REGISTER_3 ( A storage location)
( etc. . . . )
REGISTER_16 ( A storage location)
V.S
single stack enhanced architecture, ( dynamic frequency profiled)
STACK_1 [ 1..8]
STACK_2 [ 1..4]
STACK_3 [ 1..4]
Which one do you believe requires less chip internal hardware wires?
( and, thus, a higher efficiency of "Turing" machine language
expression ( and code profile))
( HINT : Have you ever read about minimal ANSI FORTH machines? )
MIMD Multiple Instruction Multiple Data
VLIW Very Long Instruction Word
MPP Multiple Parallel Processors ( many SMPs linked together like an
interconnecting LEGO(tm)-like block game to add more processing power )
SMP Symmetric Multiple Processor ( like, multiple cores on a single CPU
chip)
( between sixteen and two with IBM/Intel, set at a constant factor of
sixteen and derivative of super-scalable application dynamic frequency
profile )
I have been shouting news of the VLIW SMP MPP FORTH formula to
Washington and has been published, since 1996, all around the St. Paul
and Minneapolis Minnesota area.
However, IBM/Intel continues to shout anti-news.
---
A simple enumeration of basic primitives with a stack enhanced
architecture yields a powerful microprocessor core. ( For example
ANSI FORTH machine implicit and explicit primitives, JUMP_IF_ZERO JUMP
CALL RETURN LITERAL 0< AND XOR DROP OVER DUP @ ! 2* 2/ >R R> INVERT + )
---
The Ghost In The Machine wrote:
<SNIP>
| Quote: | Note that I'm not really specifying a word size, although
most contemporary architectures would be 32 or 64 bits.
|
Maybe investigate a 16-bit 16-way SMP core dual-bus architecture with
16-bit instructions aligned every 64-bits for an optimum primitives
profile.
regards,
maw |
|
|
 |
maghas@Ryugyong.Hotel
Guest
|
Posted:
Sun Oct 23, 2005 11:56 pm Post subject:
Re: A stupid post about Intel's latest computer chip ( s) |
|
|
<cpu16x1832@wmconnect.com> wrote in message
news:1130088424.198227.277960@g44g2000cwa.googlegroups.com...
Come on now, you are less of an idiot to understand this,
FORTH never went anywhere for a good reason.
Totally un-maintainable. |
|
|
 |
Ken Smith
Guest
|
Posted:
Mon Oct 24, 2005 12:02 am Post subject:
Re: A stupid post about Intel's latest computer chip ( s) |
|
|
In article <fudt23-d6s.ln1@sirius.tg00suus7038.net>,
The Ghost In The Machine <ewill@sirius.tg00suus7038.net> wrote:
[...]
| Quote: | I for one would think it depends on what one wants to optimize.
[1] Raw chip speed -- how fast can that sucker go?
[2] Chip power dissipation.
[3] Chip size.
[4] Number of transistors. (This is not quite the same as chip size,
since other variables include fanin or fanout per transistor.)
[5] Number of transistor flips during execution of a specific problem
(e.g., the Sieve of Eratosthenes). Presumably, this is related
to [2].
|
[6] How fast it will go running something compiled with a C compiler a
mere mortal can design.
In a very pipelined machine, you can get more speed per transistor by
making it the compiler's job to make sure that two numbers aren't trying to
go down the same bus. If different instructions have all manner of
different timings, coming up with the optimum code can be very tricky.
--
--
kensmith@rahul.net forging knowledge |
|
|
 |
Mark Nudelman
Guest
|
Posted:
Mon Oct 24, 2005 12:15 am Post subject:
Re: A stupid post about Intel's latest computer chip ( s) |
|
|
On 10/23/2005 8:11 AM, A Man Crying Alone In The Wilderness wrote:
| Quote: | Come on now, you are less of an idiot to understand this,
|
Perhaps if you could write in grammatical English, people could
understand what you're trying to say.
| Quote: | IBM/INTEL architecture,
REGISTER_1 ( A storage location)
REGISTER_2 ( A storage location)
REGISTER_3 ( A storage location)
( etc. . . . )
REGISTER_16 ( A storage location)
V.S
single stack enhanced architecture, ( dynamic frequency profiled)
STACK_1 [ 1..8]
STACK_2 [ 1..4]
STACK_3 [ 1..4]
Which one do you believe requires less chip internal hardware wires?
( and, thus, a higher efficiency of "Turing" machine language
expression ( and code profile))
|
This is a meaningless question. A stack architecture could be
implemented with fewer "wires" (by which I think you mean "gates") if it
stores the stack entirely in off-chip memory, but then it would be much
slower than a register-based machine. Most reasonable stack machines
keep the top N stack entries in on-chip registers, which makes it look
pretty similar to a register-based architecture from the point of view
of chip resources. On the other hand, a register-based machine could
keep its registers in off-chip memory in order to save gates, but this
would be a pretty stupid design.
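The "top N entries in on-chip registers" arrangement can be sketched in Python as a small register file that spills to memory only when it overflows. This is a toy model under assumptions of its own: the class name, the default of four registers, and the refill-on-empty policy are invented for illustration, and `mem_accesses` merely counts how often the slow off-chip path is taken.

```python
class CachedStack:
    """Stack whose top n_regs entries live 'on chip'; the rest spill to memory."""

    def __init__(self, n_regs=4):
        self.n_regs = n_regs
        self.regs = []          # on-chip registers; top of stack is regs[-1]
        self.memory = []        # off-chip spill area
        self.mem_accesses = 0

    def push(self, value):
        if len(self.regs) == self.n_regs:
            self.memory.append(self.regs.pop(0))   # spill the deepest register
            self.mem_accesses += 1
        self.regs.append(value)

    def pop(self):
        value = self.regs.pop()
        if not self.regs and self.memory:
            self.regs.append(self.memory.pop())    # refill from memory
            self.mem_accesses += 1
        return value
```

As long as the working depth stays within the register count, `mem_accesses` stays at zero and the structure behaves like a small register file; only deeper excursions pay for off-chip traffic.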
However, counting gates (or "wires") is not the way to determine the
efficiency of a chip. In general, chips with more gates are MORE
efficient, since they implement a lot of optimizations which are not
possible in smaller chips.
But possibly I entirely misunderstood your point, because your posting
is very unclear.
Also, when people reply to you and you just repost your original post as
a reply to them, it makes it look like you can't understand their
replies (or that you're a bot). You should at least respond to the
substance of posts that reply to you.
--Mark |
|
|
 |
Jerry Avins
Guest
|
Posted:
Mon Oct 24, 2005 12:15 am Post subject:
Re: A stupid post about Intel's latest computer chip ( s) |
|
|
Mark Nudelman wrote:
...
| Quote: | Also, when people reply to you and you just repost your original post as
a reply to them, it makes it look like you can't understand their
replies (or that you're a bot). You should at least respond to the
substance of posts that reply to you.
|
A troll is a troll is a troll.
Jerry
--
Engineering is the art of making what you want from things you can get.
ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ |
|
|
 |
A Man Crying Alone In The
Guest
|
Posted:
Mon Oct 24, 2005 12:15 am Post subject:
Re: A stupid post about Intel's latest computer chip ( s) |
|
|
maghas@Ryugyong.Hotel wrote:
| Quote: | <cpu16x1832@wmconnect.com> wrote in message
news:1130088424.198227.277960@g44g2000cwa.googlegroups.com...
Come on now, you are less of an idiot to understand this,
FORTH never went anywhere for a good reason.
Totally un-maintainable.
|
Presumably for the same reason you understand all machine code is
un-maintainable. Maybe read some more to develop your knowledge of
computer programming languages and their relationship to machine code.
Regards,
maw |
|
|
 |
A Man Crying Alone In The
Guest
|
Posted:
Mon Oct 24, 2005 12:15 am Post subject:
Re: A stupid post about Intel's latest computer chip ( s) |
|
|
Mark Nudelman wrote:
| Quote: | On 10/23/2005 8:11 AM, A Man Crying Alone In The Wilderness wrote:
Come on now, you are less of an idiot to understand this,
Perhaps if you could write in grammatical English, people could
understand what you're trying to say.
IBM/INTEL architecture,
REGISTER_1 ( A storage location)
REGISTER_2 ( A storage location)
REGISTER_3 ( A storage location)
( etc. . . . )
REGISTER_16 ( A storage location)
V.S
single stack enhanced architecture, ( dynamic frequency profiled)
STACK_1 [ 1..8]
STACK_2 [ 1..4]
STACK_3 [ 1..4]
Which one do you believe requires less chip internal hardware wires?
( and, thus, a higher efficiency of "Turing" machine language
expression ( and code profile))
This is a meaningless question. A stack architecture could be
implemented with fewer "wires" (by which I think you mean "gates") if it
stores the stack entirely in off-chip memory, but then it would be much
slower than a register-based machine. Most reasonable stack machines
keep the top N stack entries in on-chip registers, which makes it look
pretty similar to a register-based architecture from the point of view
of chip resources. On the other hand, a register-based machine could
keep its registers in off-chip memory in order to save gates, but this
would be a pretty stupid design.
However, counting gates (or "wires") is not the way to determine the
efficiency of a chip. In general, chips with more gates are MORE
efficient, since they implement a lot of optimizations which are not
possible in smaller chips.
But possibly I entirely misunderstood your point, because your posting
is very unclear.
Also, when people reply to you and you just repost your original post as
a reply to them, it makes it look like you can't understand their
replies (or that you're a bot). You should at least respond to the
substance of posts that reply to you.
--Mark
|
Because I took the time to explain the terms clearly, maybe read more
about stack machine architecture. In general, most modern stack machine
architectures of the last ten years focus upon on-chip "stack"
registers.
Here is something you may consider,
http://groups.google.com/group/comp.lang.java.machine/msg/b400d03ddc0f5a4f?dmode=source&hl=en
http://groups.google.com/group/comp.lang.forth/msg/2c7a2008f7d2fbd2?dmode=source&hl=en
Please be kind enough to quote /entirely/ the information on which you
are more knowledgeable, so as to disagree ( or agree) with the
finding(s).
Regards,
maw |
|
|
 |
The Ghost In The Machine
Guest
|
Posted:
Mon Oct 24, 2005 12:15 am Post subject:
Re: A stupid post about Intel's latest computer chip ( s) |
|
|
In sci.math, Mark Nudelman
<markn@greenwoodsoftware.com>
wrote
on Sun, 23 Oct 2005 13:35:30 -0700
<jcKdnVFD0K2PacbeRVn-ow@comcast.com>:
| Quote: | On 10/23/2005 8:11 AM, A Man Crying Alone In The Wilderness wrote:
Come on now, you are less of an idiot to understand this,
Perhaps if you could write in grammatical English, people could
understand what you're trying to say.
IBM/INTEL architecture,
REGISTER_1 ( A storage location)
REGISTER_2 ( A storage location)
REGISTER_3 ( A storage location)
( etc. . . . )
REGISTER_16 ( A storage location)
V.S
single stack enhanced architecture, ( dynamic frequency profiled)
STACK_1 [ 1..8]
STACK_2 [ 1..4]
STACK_3 [ 1..4]
Which one do you believe requires less chip internal hardware wires?
( and, thus, a higher efficiency of "Turing" machine language
expression ( and code profile))
This is a meaningless question. A stack architecture could be
implemented with fewer "wires" (by which I think you mean "gates")
|
The two are not unrelated. Cross polysilicon with diffusion, and
one has a transistor gate (FET) -- at least, for NMOS, PMOS, or
CMOS architectures. Of course in CMOS one has to have another
transistor somewhere else, of the opposing type in the transistor
wiring "graph": a 2-input NAND gate, in particular, has 2 N-types
in series and 2 P-types in parallel.
(With my luck all this is in a FAQ somewhere. :-) )
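The 2-input CMOS NAND described above can be checked with a tiny switch-level model in Python. It is a logical sketch only (no timing or electrical behavior), and the function name is made up for the example: the N-type pull-down network conducts when both inputs are high, the P-type pull-up network conducts when either input is low.

```python
def cmos_nand(a: int, b: int) -> int:
    """Switch-level model of a 2-input CMOS NAND gate."""
    pull_down = bool(a) and bool(b)        # two N-types in series to ground
    pull_up = (not a) or (not b)           # two P-types in parallel to Vdd
    # Exactly one network conducts, so the output is never floating or shorted.
    assert pull_down != pull_up
    return 1 if pull_up else 0


truth_table = {(a, b): cmos_nand(a, b) for a in (0, 1) for b in (0, 1)}
```

The resulting `truth_table` is 1 for every input pair except (1, 1), which is the NAND function.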
| Quote: | if it
stores the stack entirely in off-chip memory, but then it would be much
slower than a register-based machine.
|
I for one think he's thinking internal stack. An 8-way
numeric stack, though, is rather small, considering that
modern micros have 2 MB or more internal memory cache.
I'd probably want to use two 1kword or 4Kword barrel shift
registers. The ALU would connect to the top four slices
of the barrel. Ideally, the physical layout would in fact
look a bit like a barrel, to optimize propagation delay.
However, I'm far from expert in this stuff.
One possibility might be to replace the flipflops in
a traditional barrel shift register with a DRAM unit
(transistor + capacitor); the barrel shift register would
then *have* to shift (either forward or backward) every
clockpulse transition, perhaps.
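For what it's worth, the barrel-shift-register stack can be mimicked in Python with a fixed-length `collections.deque`: a push shifts every word one slot deeper, a pop shifts them back, and the ALU only ever looks at the top four slots. The class name and the depth are assumptions of this sketch, and it says nothing about whether such a barrel would be fast enough in silicon.

```python
from collections import deque


class BarrelStack:
    """Fixed-depth stack modeled as a barrel shifter; slot 0 is the top."""

    def __init__(self, depth=1024):
        self.barrel = deque([0] * depth, maxlen=depth)

    def push(self, value):
        self.barrel.rotate(1)      # shift every word one slot deeper
        self.barrel[0] = value     # the word wrapped from the bottom is overwritten

    def pop(self):
        value = self.barrel[0]
        self.barrel.rotate(-1)     # shift every word one slot toward the top
        return value

    def top_four(self):
        """The four slices the ALU would be wired to."""
        return [self.barrel[i] for i in range(4)]
```

Note that nothing is ever addressed by a stack pointer here; the data moves past a fixed window instead, which is the point of the barrel layout.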
| Quote: | Most reasonable stack machines
keep the top N stack entries in on-chip registers, which makes it look
pretty similar to a register-based architecture from the point of view
of chip resources. On the other hand, a register-based machine could
keep its registers in off-chip memory in order to save gates, but this
would be a pretty stupid design.
|
Actually, the 486 does exactly this, if one switches contexts.
Basically, the registers are shoved into a TSS structure
in memory.
I suspect more modern chips have similar capabilities.
| Quote: |
However, counting gates (or "wires") is not the way to determine the
efficiency of a chip. In general, chips with more gates are MORE
efficient, since they implement a lot of optimizations which are not
possible in smaller chips.
|
I for one would think it depends on what one wants to optimize.
[1] Raw chip speed -- how fast can that sucker go?
[2] Chip power dissipation.
[3] Chip size.
[4] Number of transistors. (This is not quite the same as chip size,
since other variables include fanin or fanout per transistor.)
[5] Number of transistor flips during execution of a specific problem
(e.g., the Sieve of Eratosthenes). Presumably, this is related
to [2].
| Quote: |
But possibly I entirely misunderstood your point, because your posting
is very unclear.
Also, when people reply to you and you just repost your original post as
a reply to them, it makes it look like you can't understand their
replies (or that you're a bot). You should at least respond to the
substance of posts that reply to you.
|
I'm not sure my reply was all that basic. :-) But it's clear he
didn't pursue the details thereof.
--
#191, ewill3@earthlink.net
It's still legal to go .sigless. |
|
|
 |
The Ghost In The Machine
Guest
|
Posted:
Mon Oct 24, 2005 12:15 am Post subject:
Re: A stupid post about Intel's latest computer chip ( s) |
|
|
In sci.math, A Man Crying Alone In The Wilderness
<cpu16x1832@wmconnect.com>
wrote
on 23 Oct 2005 11:16:57 -0700
<1130091417.708926.74340@g49g2000cwa.googlegroups.com>:
| Quote: | Come on now, you are less of an idiot to understand this,
IBM/INTEL architecture,
REGISTER_1 ( A storage location)
REGISTER_2 ( A storage location)
REGISTER_3 ( A storage location)
( etc. . . . )
REGISTER_16 ( A storage location)
V.S
single stack enhanced architecture, ( dynamic frequency profiled)
STACK_1 [ 1..8]
STACK_2 [ 1..4]
STACK_3 [ 1..4]
Which one do you believe requires less chip internal hardware wires?
|
Depends on how one pushes and pops the stacks, perhaps. I'll admit
I don't see STACK_1 being deep enough. Also, is there a reason for
3 separate stacks? My hypothetical required only two: numeric
values and codepointers.
Did you anticipate using something along the lines of dual barrel
shift registers? That makes some sense, if it's fast enough;
however, there are a lot of issues regarding pipelining with a
stack register architecture; basically, the second instruction
can't execute until the first one's done playing with the stack.
At least in a register-based architecture where one has the code
sequence
SUB AX, BX
ADD CX, DX
one could conceivably be executing the SUB instruction and the ADD
instruction more or less simultaneously. Ideally, though, the
simpler architecture would run at a higher clockrate.
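The pipelining point can be made concrete with a small hazard check in Python. This is a simplified sketch: each instruction is reduced to the set of storage locations it reads and the set it writes, the x86 flags register (which both SUB and ADD also write) is deliberately left out, and the names TOS, NOS, and SP for the stack machine's implicit operands are assumptions of the example.

```python
def independent(first, second):
    """True if two (reads, writes) instructions share no data hazard."""
    reads1, writes1 = first
    reads2, writes2 = second
    raw = writes1 & reads2      # second reads what first writes
    war = reads1 & writes2      # second overwrites what first still reads
    waw = writes1 & writes2     # both write the same location
    return not (raw or war or waw)


# Register machine: SUB AX, BX followed by ADD CX, DX (flags ignored here).
sub_ax_bx = ({"AX", "BX"}, {"AX"})
add_cx_dx = ({"CX", "DX"}, {"CX"})

# Stack machine: every ALU op implicitly reads and writes the top of stack
# (TOS), reads the next word (NOS), and adjusts the stack pointer (SP).
stack_sub = ({"TOS", "NOS", "SP"}, {"TOS", "SP"})
stack_add = ({"TOS", "NOS", "SP"}, {"TOS", "SP"})
```

Here `independent(sub_ax_bx, add_cx_dx)` is true while `independent(stack_sub, stack_add)` is false, which is the dependency that keeps the second stack instruction waiting on the first.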
Perhaps if you were to clarify what you mean by "chip internal hardware
wiring"? For instance, does that mean:
[1] die size, given a certain transistor size?
[2] total wiring area?
[3] number of vias?
[4] a combination of the above?
Note also that buffer transistors -- those things that have to drive
the outside world pins -- are huge compared to the internal wiring.
And there's a lot of them. Try to optimize the internal wiring
too much and one might just waste space.
| Quote: |
( and, thus, a higher efficiency of "Turing" machine language
expression ( and code profile))
|
Turing machines don't do arithmetic all that well. If one postulates,
for example, a decimal number, followed by a blank, followed by
another decimal number, followed by an indefinite number of blanks,
one could do the following.
state 0, any char but blank; write that char, right, state 0
state 0, blank: write blank, back up, state 1
state 1, char '0': write '0', go to state 2-0
state 1, char '1': write '1', go to state 2-1
....
state 1, char '9': write '9', go to state 2-9
state 2-x, blank: write blank, right, go to state 3-x
state 3-x, any char but blank, write that char, right, stay in this state
state 3-x, blank: write blank, left, go to state 4-x
state 4-x, char 'y': write 'y', right, go to state 5-{x+y} or 6-{x+y-10}
I could go on but it gets pretty tedious. :-) And that's for
*addition*; I shudder to think what I would have to do for multiplication
or division.
If one postulates two binary numbers as opposed to two decimal
ones, the machine gets slightly simpler but it's still pretty
tedious.
Of course one could postulate a 2^32+1 character alphabet, and
an impossibly huge state matrix, if one wishes. That gets
slightly silly, though.
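To give a feel for how such a state table runs, here is a Python sketch of a much simpler machine than the decimal adder outlined above: it only adds 1 to a binary number. It follows the same pattern (scan right to the first blank, back up, then work leftward), but the state names, the tape encoding, and the simulator itself are assumptions made for this example.

```python
BLANK = " "

# (state, symbol read) -> (symbol to write, head movement, next state)
RULES = {
    ("scan", "0"): ("0", +1, "scan"),
    ("scan", "1"): ("1", +1, "scan"),
    ("scan", BLANK): (BLANK, -1, "carry"),
    ("carry", "1"): ("0", -1, "carry"),    # 1 + carry = 0, carry continues
    ("carry", "0"): ("1", 0, "halt"),      # 0 + carry = 1, done
    ("carry", BLANK): ("1", 0, "halt"),    # ran off the left end: new digit
}


def run(tape_text, rules=RULES, state="scan", max_steps=10_000):
    """Run the machine on tape_text and return the final tape contents."""
    tape = dict(enumerate(tape_text))      # sparse tape; missing cells are blank
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        write, move, state = rules[(state, tape.get(head, BLANK))]
        tape[head] = write
        head += move
    cells = [tape.get(i, BLANK) for i in range(min(tape), max(tape) + 1)]
    return "".join(cells).strip()
```

Even this one-operand increment needs six table entries; `run("1011")` returns `"1100"`, and a full two-operand adder grows the table considerably, which is the tedium referred to above.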
| Quote: |
( HINT : Have you ever read about minimal ANSI FORTH machines? )
|
Can't say I have. I know a little Forth; it's a strange language,
which can modify itself. Very interesting and efficient, but it
doesn't do files all that well; the traditional method involves
numbered screen loading, as I recall. Of course that was way
back then.
| Quote: |
MIMD Multiple Instruction Multiple Data
VLIW Very Long Instruction Word
MPP Multiple Parallel Processors ( many SMPs linked together like an
interconnecting LEGO(tm)-like block game to add more processing power )
|
Bit-slice architectures have been known for years, if not decades.
| Quote: | SMP Symmetric Multiple Processor ( like, multiple cores on a single CPU
chip)
( between sixteen and two with IBM/Intel, set at a constant factor of
sixteen and derivative of super-scalable application dynamic frequency
profile )
I have been shouting news of the VLIW SMP MPP FORTH formula to
Washington and has been published, since 1996, all around the St. Paul
and Minneapolis Minnesota area.
However, IBM/Intel continues to shout anti-news.
---
A simple enumeration of basic primitives with a stack enhanced
architecture yields a powerful microprocessor core. ( For example
ANSI FORTH machine implicit and explicit primitives, JUMP_IF_ZERO JUMP
CALL RETURN LITERAL 0< AND XOR DROP OVER DUP @ ! 2* 2/ >R R> INVERT + )
---
The Ghost In The Machine wrote:
SNIP
Note that I'm not really specifying a word size, although
most contemporary architectures would be 32 or 64 bits.
Maybe investigate a 16-bit 16-way SMP core dual-bus architecture with
16-bit instructions aligned every 64-bits for an optimum primitives
profile.
|
Why so low? 64-bit is the way to go, if one can afford the die space.
The practical considerations are these:
[1] How many die per year can one fabricate? Note that this is a
function of wafer size, yield, transistor size, and process complexity;
the smaller the transistors the more sensitive they are to
process variations.
[2] How much does each die cost to make?
[3] How much can one sell each die for?
[4] How well does a die actually perform compared with contemporary
microprocessors?
I don't see FORTH being limited to 16-bit.
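As a rough illustration of that point, the primitives quoted above can
be sketched as a tiny interpreter in which the word width is just a
parameter. This is only a sketch under my own assumptions: the program
format (a list of opcode names, or (opcode, operand) pairs), the
function name run_stack, and the dictionary used as memory are
hypothetical and come from neither the ANSI standard nor the quoted
post.

```python
def run_stack(program, bits=16, max_steps=100_000):
    """Run a list of primitives; return the final parameter stack."""
    mask = (1 << bits) - 1          # keep every result inside the word width
    sign = 1 << (bits - 1)          # top bit of a word
    data, ret, mem = [], [], {}     # parameter stack, return stack, memory
    pc = steps = 0
    while 0 <= pc < len(program):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step limit reached")
        ins = program[pc]
        op, arg = ins if isinstance(ins, tuple) else (ins, None)
        pc += 1
        if op == "LITERAL":
            data.append(arg & mask)
        elif op == "JUMP":
            pc = arg
        elif op == "JUMP_IF_ZERO":
            if data.pop() == 0:
                pc = arg
        elif op == "CALL":
            ret.append(pc)
            pc = arg
        elif op == "RETURN":
            if not ret:             # assumption: RETURN at top level ends the run
                break
            pc = ret.pop()
        elif op == "0<":
            data.append(mask if data.pop() & sign else 0)
        elif op in ("AND", "XOR", "+"):
            b, a = data.pop(), data.pop()
            result = a & b if op == "AND" else a ^ b if op == "XOR" else a + b
            data.append(result & mask)
        elif op == "DROP":
            data.pop()
        elif op == "DUP":
            data.append(data[-1])
        elif op == "OVER":
            data.append(data[-2])
        elif op == "@":             # ( addr -- x )
            data.append(mem.get(data.pop(), 0))
        elif op == "!":             # ( x addr -- )
            addr = data.pop()
            mem[addr] = data.pop()
        elif op == "2*":
            data.append((data.pop() << 1) & mask)
        elif op == "2/":            # arithmetic shift: the sign bit is kept
            x = data.pop()
            data.append((x >> 1) | (x & sign))
        elif op == ">R":
            ret.append(data.pop())
        elif op == "R>":
            data.append(ret.pop())
        elif op == "INVERT":
            data.append(~data.pop() & mask)
        else:
            raise ValueError(f"unknown primitive: {op}")
    return data


overflow = [("LITERAL", 0xFFFF), ("LITERAL", 1), "+"]
print(run_stack(overflow, bits=16), run_stack(overflow, bits=64))
```

The same program wraps to zero at 16 bits and does not at 64; nothing
else in the interpreter cares about the width.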
--
#191, ewill3@earthlink.net
It's still legal to go .sigless. |
A Man Crying Alone In The
Guest
|
Posted:
Mon Oct 24, 2005 7:12 am Post subject:
Re: A stupid post about Intel's latest computer chip ( s) |
|
|
Ken Smith wrote:
| Quote: | In article <fudt23-d6s.ln1@sirius.tg00suus7038.net>,
The Ghost In The Machine <ewill@sirius.tg00suus7038.net> wrote:
[...]
I for one would think it depends on what one wants to optimize.
[1] Raw chip speed -- how fast can that sucker go?
[2] Chip power dissipation.
[3] Chip size.
[4] Number of transistors. (This is not quite the same as chip size,
since other variables include fanin or fanout per transistor.)
[5] Number of transistor flips during execution of a specific problem
(e.g., Erastothene's Sieve). Presumably, this is related
to [2].
[6] How fast it will go running something compiled with a C compiler a
mere mortal can design.
In a very pipelined machine, you can get more speed per transistor by
making it the compilers job to make sure that two numbers aren't trying to
go down the same bus. If different instructions have all manner of
different timings, coming up with the optimum code can be very tricky.
--
--
kensmith@rahul.net forging knowledge
|
Balanced for high microprocessor efficiency, this is an MPP SMP stack
machine architecture for FORTH, C, Scheme, Java,
you-name-it-computer-programming-language.
It uses simple stack-to-stack messaging, both internally ( SMP, multi
core) and externally ( MPP, CPU16-to-CPU16), to solve MIMD and a host
of other SMP multi core chip design problems, ...
As you may read, a C compiler is almost an IBM/Intel no-brainer.
In general, microprocessor efficiency minimizes transistor count and
maximizes utilization of those transistors. Externally, on traditional
"bandwidth" benchmarking program suites, that 'raw' efficiency will be
displayed, even more so where a benchmark relies upon parallel
architectures: I guess ten to ONE HUNDRED times faster for some
real-world practical parallel programming benchmark suites (
hydrodynamic or thermodynamic modeling, etc. )
This model is the most efficient SMP MPP microprocessor model I have
for reference, a hybrid of Mr. Moore's work and mine. As a final note,
I am having difficulty developing my chip model any further than this,
URL,
http://groups.google.com/group/comp.lang.java.machine/msg/b400d03ddc0f5a4f?dmode=source&hl=en
Here is a 16-bit VLIW protocol reference ( from dynamic profiling),
URL,
http://groups.google.com/group/comp.lang.java.programmer/msg/028ab82ac81f0014?dmode=source&hl=en
Regards,
maw |
Guest
|
Posted:
Mon Oct 24, 2005 8:15 am Post subject:
Re: A stupid post about Intel's latest computer chip ( s) |
|
|
The Ghost In The Machine wrote:
| Quote: | In sci.math, A Man Crying Alone In The Wilderness
<cpu16x1832@wmconnect.com> wrote on 23 Oct 2005 11:16:57 -0700
<1130091417.708926.74340@g49g2000cwa.googlegroups.com>:
Come on now, you are less of an idiot to understand this,
IBM/INTEL architecture,
REGISTER_1 ( A storage location)
REGISTER_2 ( A storage location)
REGISTER_3 ( A storage location)
( etc. . . . )
REGISTER_16 ( A storage location)
V.S
single stack enhanced architecture, ( dynamic frequency profiled)
STACK_1 [ 1..8]
STACK_2 [ 1..4]
STACK_3 [ 1..4]
Which one do you believe requires less chip internal hardware wires?
Depends on how one pushes and pops the stacks, perhaps. I'll admit
I don't see STACK_1 being deep enough. Also, is there a reason for
3 separate stacks? My hypothetical required only two: numeric
values and codepointers.
|
I currently use five stacks, for my "Holy Grail" almost all purpose
super scalable multi core architecture model,
COPIED FROM ANOTHER POST, URL,
http://groups.google.com/group/comp.lang.forth/msg/2c7a2008f7d2fbd2?dmode=source&hl=en
"
Example extended on-chip stack register map,
sixteen ( 16) return stack elements,
eight ( 8) parameter stack elements,
four ( 4) Supplementary stack elements ( X, Y),
thirty two ( 32) status /machine state logic/ stack elements
"
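For anyone who wants to play with the register map above, it could be
modeled roughly as a set of fixed-depth stacks. This is a minimal
sketch and not a chip description: the class and function names are
hypothetical, giving X and Y four elements each is an assumption ( the
map lists them on one line), and raising an error on overflow is only
a stand-in for whatever the hardware would do.

```python
# Depths copied from the register map; X and Y at four elements each is an assumption.
STACK_DEPTHS = {"return": 16, "parameter": 8, "x": 4, "y": 4, "status": 32}


class BoundedStack:
    """A stack with a fixed number of on-chip elements."""

    def __init__(self, depth):
        self.depth = depth
        self.cells = []

    def push(self, value):
        if len(self.cells) >= self.depth:
            raise OverflowError("on-chip stack full")  # stand-in for hardware behavior
        self.cells.append(value)

    def pop(self):
        if not self.cells:
            raise IndexError("on-chip stack empty")
        return self.cells.pop()


def make_register_file():
    """Build one bounded stack per entry in STACK_DEPTHS."""
    return {name: BoundedStack(depth) for name, depth in STACK_DEPTHS.items()}


def total_cells():
    """Total number of on-chip stack elements under the assumed depths."""
    return sum(STACK_DEPTHS.values())


regs = make_register_file()
regs["parameter"].push(42)
print(len(regs), total_cells())
```

Under those assumed depths the five stacks add up to 64 on-chip
elements.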
Regards,
maw |