| Author |
Message |
Peter \"Firefly\" Lund
Guest
|
Posted:
Mon Aug 08, 2005 4:15 pm Post subject:
internal call/ret stack |
|
|
When was the first CPU implementation that emulated the return stack
internally in order to make the return instructions go faster?
How old is the idea?
-Peter |
|
| Back to top |
|
 |
Nick Maclaren
Guest
|
Posted:
Mon Aug 08, 2005 4:15 pm Post subject:
Re: internal call/ret stack |
|
|
In article <1123517066.543564.161710@g14g2000cwa.googlegroups.com>,
MitchAlsup@aol.com writes:
|> In a pendanic way, some of the earliest machines simply stored the
|> return address in a (particular) GPR so that a return could use the
|> address in that register. Does this qualifies as a 1 deep stack? This
|> trick goes back to at lest the early 1960's, and quite probably into
|> the 1950s.
Definitely the 1950s. Perhaps the 1940s :-)
Regards,
Nick Maclaren. |
|
| Back to top |
|
 |
Guest
|
Posted:
Mon Aug 08, 2005 4:15 pm Post subject:
Re: internal call/ret stack |
|
|
In a pendanic way, some of the earliest machines simply stored the
return address in a (particular) GPR so that a return could use the
address in that register. Does this qualifies as a 1 deep stack? This
trick goes back to at lest the early 1960's, and quite probably into
the 1950s. |
|
| Back to top |
|
 |
Colonel Forbin
Guest
|
Posted:
Mon Aug 08, 2005 10:22 pm Post subject:
Re: internal call/ret stack |
|
|
In article <dd805a$7c7$1@gemini.csx.cam.ac.uk>,
Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
| Quote: |
In article <1123517066.543564.161710@g14g2000cwa.googlegroups.com>,
MitchAlsup@aol.com writes:
|> In a pendanic way, some of the earliest machines simply stored the
|> return address in a (particular) GPR so that a return could use the
|> address in that register. Does this qualifies as a 1 deep stack? This
|> trick goes back to at lest the early 1960's, and quite probably into
|> the 1950s.
Definitely the 1950s. Perhaps the 1940s :-)
|
Rumor has it that people of a certain European region invented a similar
notion long before.
A couple fishermen went out in a rented boat and had a very productive
day in a certain spot.
One of them wisely decided to mark this spot so they could return to it.
He painted a big "X" on the bottom of their boat.
Upon returing to port, he revealed his inventive idea to his business
partner.
The partner chastised him vehemently: "How do you know we'll get the
same boat?"
The same applies to computer optimization. "How do you know we'll get
the same workload?" :) |
|
| Back to top |
|
 |
Anton Ertl
Guest
|
Posted:
Mon Aug 08, 2005 11:30 pm Post subject:
Re: internal call/ret stack |
|
|
MitchAlsup@aol.com writes:
| Quote: | In a pendanic way, some of the earliest machines simply stored the
return address in a (particular) GPR so that a return could use the
address in that register. Does this qualifies as a 1 deep stack?
|
And some architectures (IIRC 8051) have an architectural on-chip
return stack with depth >1 that is not accessible in any other way.
However, I guess that the OP was asking about non-architectural return
stacks that are just used for predicting the real return addresses
(which are coming from another source).
The 21064 (1992) had one; I don't remember earlier CPUs that had one.
- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html |
|
| Back to top |
|
 |
Peter \"Firefly\" Lund
Guest
|
Posted:
Tue Aug 09, 2005 12:16 am Post subject:
Re: internal call/ret stack |
|
|
On Mon, 8 Aug 2005 MitchAlsup@aol.com wrote:
| Quote: | In a pendanic way, some of the earliest machines simply stored the
return address in a (particular) GPR so that a return could use the
address in that register. Does this qualifies as a 1 deep stack? This
|
No ;)
I meant an internal stack that shadows the external stack for return
addresses.
Did the 88K chips use that, for example?
-Peter |
|
| Back to top |
|
 |
glen herrmannsfeldt
Guest
|
Posted:
Tue Aug 09, 2005 12:16 am Post subject:
Re: internal call/ret stack |
|
|
Nick Maclaren wrote:
| Quote: | In article <1123517066.543564.161710@g14g2000cwa.googlegroups.com>,
MitchAlsup@aol.com writes:
|> In a pendanic way, some of the earliest machines simply stored the
|> return address in a (particular) GPR so that a return could use the
|> address in that register. Does this qualifies as a 1 deep stack? This
|> trick goes back to at lest the early 1960's, and quite probably into
|> the 1950s.
Definitely the 1950s. Perhaps the 1940s :-)
|
And still used by the usual calling convention in z/Architecture.
For the OS/360 days it was convenient in allowing a static save
area when code didn't need to be reentrant, but only serially
reusable (IBM term). Fortran compiles generated static data areas
and static register save areas.
When you need reentrancy, dynamic storage allocated as a doubly linked
list also works for the register save area.
The PDP-8 stores the return address as the first word of the called
routine and then starts executing at the following word. This worked
well until it was desirable to store programs in ROM.
-- glen |
|
| Back to top |
|
 |
Peter \"Firefly\" Lund
Guest
|
Posted:
Tue Aug 09, 2005 12:16 am Post subject:
Re: internal call/ret stack |
|
|
On Mon, 8 Aug 2005, Anton Ertl wrote:
| Quote: | And some architectures (IIRC 8051) have an architectural on-chip
return stack with depth >1 that is not accessible in any other way.
|
You could access it just fine on the 8051. I don't think you could on
the (very simple) ST62, though.
| Quote: | However, I guess that the OP was asking about non-architectural return
stacks that are just used for predicting the real return addresses
(which are coming from another source).
|
Precisely.
| Quote: | The 21064 (1992) had one; I don't remember earlier CPUs that had one.
|
That early? I thought only 21164 and up had one. But it sounds unlikely
that it should be the first, doesn't it?
-Peter |
|
| Back to top |
|
 |
Peter \"Firefly\" Lund
Guest
|
Posted:
Tue Aug 09, 2005 12:16 am Post subject:
Re: internal call/ret stack |
|
|
On Mon, 8 Aug 2005 MitchAlsup@aol.com wrote:
| Quote: | address in that register. Does this qualifies as a 1 deep stack? This
trick goes back to at lest the early 1960's, and quite probably into
the 1950s.
|
I think this qualifies:
C.F. Webb, "Subroutine call/return stack", IBM Technical Disclosure
Bulletin 30(11):221-225, Apr. 1988.
I couldn't find a web-accessible copy, though.
-Peter |
|
| Back to top |
|
 |
Dan Koren
Guest
|
Posted:
Tue Aug 09, 2005 12:16 am Post subject:
Re: internal call/ret stack |
|
|
"Peter "Firefly" Lund" <firefly@diku.dk> wrote in message
news:Pine.LNX.4.61.0508081538090.12917@tyr.diku.dk...
| Quote: |
When was the first CPU implementation that emulated the return stack
internally in order to make the return instructions go faster?
|
The performance gains from such would be negligible,
unless the procedures or subroutines from which the
code returns touch no memory at all.
| Quote: | How old is the idea?
|
Probably older than you and me. Quite a few ISA's
leave the return address in a general register
after subroutine calls, so returns from leaf
procedures can use it directly without the
need to load the return address from the stack.
Think about it this way: if you're going to
restore an entire stack frame from memory,
one more register doesn't make that much of
a difference.
One sees hw return stacks mostly in microcode,
since microcode is usually not allowed to store
internal state to main memory. Many microcode
architectures have some number of scratch
registers plus a stack that holds up to n
microcode return addresses, where n is
typically a small number (3 is pretty
common).
dk |
|
| Back to top |
|
 |
Alex McDonald
Guest
|
Posted:
Tue Aug 09, 2005 12:16 am Post subject:
Re: internal call/ret stack |
|
|
glen herrmannsfeldt wrote:
| Quote: | Nick Maclaren wrote:
In article <1123517066.543564.161710@g14g2000cwa.googlegroups.com>,
MitchAlsup@aol.com writes:
|> In a pendanic way, some of the earliest machines simply stored the
|> return address in a (particular) GPR so that a return could use the
|> address in that register. Does this qualifies as a 1 deep stack? This
|> trick goes back to at lest the early 1960's, and quite probably into
|> the 1950s.
Definitely the 1950s. Perhaps the 1940s :-)
And still used by the usual calling convention in z/Architecture.
|
For S360/370, R14 was used as a return address; the standard convention
was BALR (or BASR) R14,R15, and as R15 was used as both entry point and
return code, R12 was normally used as a base register. It was very
rare indeed to see anything other than R14 return, R15 entry point/rc
linkage; although any registers could have been used for the purpose, it
was considered very bad form to do so, as it had the downside of making
dumps difficult to analyse.
| Quote: |
For the OS/360 days it was convenient in allowing a static save
area when code didn't need to be reentrant, but only serially
reusable (IBM term). Fortran compiles generated static data areas
and static register save areas.
When you need reentrancy, dynamic storage allocated as a doubly linked
list also works for the register save area.
|
Dynamically allocated save areas could lead to performance problems with
lots of GETMAIN/FREEMAIN calls to small-ish blocks for the save area and
any associated, non-constant and discardable work data. Efficent
re-entrant code was always a problem, especially with the limitations of
4k addressable blocks of code or data with a single register; it tended
to encourage small routines operating on small blocks of data, and hence
lots of save areas.
A better solution was to use R13 in the main entry point to GETMAIN once
a block, and treat it like a display on a stack -- STM and A R13,const
on entry to a subroutine, LM and S R13,const on exit. The doubly linked
list was then only required if you needed the save areas formatted in a
dump. I built up quite an extensive library of source & object code
based on this technique.
Calculating the total required size of the area with multiple CSECTs in
the final executable was always problematic. I seem to remember
employing Q type constants to do this (a link edit feature that PL/I
used iirc), but I couldn't tell you how exactly; it's been nearly three
decades since I last did this in anger.
I don't claim to have invented this technique, but I don't remember ever
being taught it. I wonder how many times it's been "rediscovered"?
--
Regards
Alex McDonald |
|
| Back to top |
|
 |
Peter \"Firefly\" Lund
Guest
|
Posted:
Tue Aug 09, 2005 12:16 am Post subject:
Re: internal call/ret stack |
|
|
On Mon, 8 Aug 2005, Dan Koren wrote:
| Quote: | When was the first CPU implementation that emulated the return stack
internally in order to make the return instructions go faster?
|
(The key word was "emulated". "the return instructions" was also
important.)
| Quote: | The performance gains from such would be negligible,
unless the procedures or subroutines from which the
code returns touch no memory at all.
|
Or touch only cached memory. In any case, they are implemented (even on
my lowly AMD K6-2) and are a key feature of the control transfer
prediction machinery of modern CPUs.
| Quote: | How old is the idea?
Probably older than you and me. Quite a few ISA's
|
Maybe, maybe not.
| Quote: | leave the return address in a general register
|
I know, but that's a completely different issue.
| Quote: | after subroutine calls, so returns from leaf
procedures can use it directly without the
need to load the return address from the stack.
Think about it this way: if you're going to
restore an entire stack frame from memory,
one more register doesn't make that much of
a difference.
|
Finding the instruction to return to quickly can be of immense value.
| Quote: | One sees hw return stacks mostly in microcode,
since microcode is usually not allowed to store
internal state to main memory. Many microcode
architectures have some number of scratch
registers plus a stack that holds up to n
microcode return addresses, where n is
typically a small number (3 is pretty
common).
|
Data General MV/8000 apparently had a stack of 16 levels -- and 4096
microcode words of 75 bits each. And two decode ROMs (containing the
address of the first microinstruction implementing a given
macroinstruction for the Nova and for the Eagle) + a predecoder that
selected between the two decode ROMs.
Jonathan S. Blau, Charles J. Holland, David L. Keating, "The
Micro-Architecture of the Eclipse MV/8000: Conception and Implementation",
some obscure IEEE publication, 1980.
4096 x 75 bits seems like a lot to me -- but then again, I have no useful
intuition about microcode.
-Peter |
|
| Back to top |
|
 |
Peter \"Firefly\" Lund
Guest
|
Posted:
Tue Aug 09, 2005 12:16 am Post subject:
Re: internal call/ret stack |
|
|
On Mon, 8 Aug 2005, glen herrmannsfeldt wrote:
| Quote: | The PDP-8 stores the return address as the first word of the called
routine and then starts executing at the following word. This worked
well until it was desirable to store programs in ROM.
|
How did they handle multitasking?
-Peter |
|
| Back to top |
|
 |
glen herrmannsfeldt
Guest
|
Posted:
Tue Aug 09, 2005 12:16 am Post subject:
Re: internal call/ret stack |
|
|
Peter "Firefly" Lund wrote:
| Quote: | On Mon, 8 Aug 2005, Dan Koren wrote:
When was the first CPU implementation that emulated the return stack
internally in order to make the return instructions go faster?
(The key word was "emulated". "the return instructions" was also
important.)
|
Emulated is a funny word here. Does it still notice if the
actual return address on the stack is changed? Unless the
architecture specified that it wasn't checked, it would seem to
give the wrong results in some cases. Using it for speculative
instruction prefetch, though, could be done until the real
return address came off the stack.
(snip)
| Quote: | Data General MV/8000 apparently had a stack of 16 levels -- and 4096
microcode words of 75 bits each. And two decode ROMs (containing the
address of the first microinstruction implementing a given
macroinstruction for the Nova and for the Eagle) + a predecoder that
selected between the two decode ROMs.
|
(snip)
| Quote: | 4096 x 75 bits seems like a lot to me -- but then again, I have no
useful intuition about microcode.
|
A little wide for the usual vertical microcode, not quite wide
enough for usual horizontal. I believe there are machines that
execute one microinstruction for most macroinstructions, possibly
over 100 bits wide.
-- glen |
|
| Back to top |
|
 |
glen herrmannsfeldt
Guest
|
Posted:
Tue Aug 09, 2005 12:16 am Post subject:
Re: internal call/ret stack |
|
|
Alex McDonald wrote:
(snip regarding return addresses in registers)
| Quote: | For S360/370, R14 was used as a return address; the standard convention
was BALR (or BASR) R14,R15, and as R15 was used as both entry point and
return code, R12 was normally used as a base register. It was very rare
indeed to see anything other than R14 return, R15 entry point/rc
linkage; although any registers could have been used for the purpose, it
was considered very bad form to do so, as it had the downside of making
dumps difficult to analyse.
|
I have seen it done for internal routines. Also, for coroutines
you can do something like:
BALR R7,R7
so that it transfers between routines using only one linkage register.
(snip)
| Quote: | When you need reentrancy, dynamic storage allocated as a doubly linked
list also works for the register save area.
Dynamically allocated save areas could lead to performance problems with
lots of GETMAIN/FREEMAIN calls to small-ish blocks for the save area and
any associated, non-constant and discardable work data. Efficent
re-entrant code was always a problem, especially with the limitations of
4k addressable blocks of code or data with a single register; it tended
to encourage small routines operating on small blocks of data, and hence
lots of save areas.
|
If you are allocating automatic variables then it can be added onto
that. But yes, the GETMAIN overhead is high enough that one should do
larger GETMAINs and suballocate them, for save areas and automatic data.
| Quote: | A better solution was to use R13 in the main entry point to GETMAIN once
a block, and treat it like a display on a stack -- STM and A R13,const
on entry to a subroutine, LM and S R13,const on exit. The doubly linked
list was then only required if you needed the save areas formatted in a
dump. I built up quite an extensive library of source & object code
based on this technique.
|
Save area traceback is nice sometimes.
-- glen |
|
| Back to top |
|
 |
|
|
|
|