internal call/ret stack
CASTalk.com Forum Index CASTalk.com
Discussion of DSP, FPGA, storage and embedded system.
 
 FAQFAQ   MemberlistMemberlist     RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 
 
Google
 
Web castalk.com
internal call/ret stack
Goto page 1, 2, 3  Next
 
Post new topic   Reply to topic    CASTalk.com Forum Index -> Computer Architecture
Author Message
Peter \"Firefly\" Lund
Guest





Posted: Mon Aug 08, 2005 4:15 pm    Post subject: internal call/ret stack Reply with quote

When was the first CPU implementation that emulated the return stack
internally in order to make the return instructions go faster?

How old is the idea?

-Peter
Back to top
Nick Maclaren
Guest





Posted: Mon Aug 08, 2005 4:15 pm    Post subject: Re: internal call/ret stack Reply with quote

In article <1123517066.543564.161710@g14g2000cwa.googlegroups.com>,
MitchAlsup@aol.com writes:
|> In a pendanic way, some of the earliest machines simply stored the
|> return address in a (particular) GPR so that a return could use the
|> address in that register. Does this qualifies as a 1 deep stack? This
|> trick goes back to at lest the early 1960's, and quite probably into
|> the 1950s.

Definitely the 1950s. Perhaps the 1940s :-)


Regards,
Nick Maclaren.
Back to top
Guest






Posted: Mon Aug 08, 2005 4:15 pm    Post subject: Re: internal call/ret stack Reply with quote

In a pendanic way, some of the earliest machines simply stored the
return address in a (particular) GPR so that a return could use the
address in that register. Does this qualifies as a 1 deep stack? This
trick goes back to at lest the early 1960's, and quite probably into
the 1950s.
Back to top
Colonel Forbin
Guest





Posted: Mon Aug 08, 2005 10:22 pm    Post subject: Re: internal call/ret stack Reply with quote

In article <dd805a$7c7$1@gemini.csx.cam.ac.uk>,
Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
Quote:

In article <1123517066.543564.161710@g14g2000cwa.googlegroups.com>,
MitchAlsup@aol.com writes:
|> In a pendanic way, some of the earliest machines simply stored the
|> return address in a (particular) GPR so that a return could use the
|> address in that register. Does this qualifies as a 1 deep stack? This
|> trick goes back to at lest the early 1960's, and quite probably into
|> the 1950s.

Definitely the 1950s. Perhaps the 1940s :-)

Rumor has it that people of a certain European region invented a similar
notion long before.

A couple fishermen went out in a rented boat and had a very productive
day in a certain spot.

One of them wisely decided to mark this spot so they could return to it.
He painted a big "X" on the bottom of their boat.

Upon returing to port, he revealed his inventive idea to his business
partner.

The partner chastised him vehemently: "How do you know we'll get the
same boat?"

The same applies to computer optimization. "How do you know we'll get
the same workload?" :)
Back to top
Anton Ertl
Guest





Posted: Mon Aug 08, 2005 11:30 pm    Post subject: Re: internal call/ret stack Reply with quote

MitchAlsup@aol.com writes:
Quote:
In a pendanic way, some of the earliest machines simply stored the
return address in a (particular) GPR so that a return could use the
address in that register. Does this qualifies as a 1 deep stack?

And some architectures (IIRC 8051) have an architectural on-chip
return stack with depth >1 that is not accessible in any other way.

However, I guess that the OP was asking about non-architectural return
stacks that are just used for predicting the real return addresses
(which are coming from another source).

The 21064 (1992) had one; I don't remember earlier CPUs that had one.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
Back to top
Peter \"Firefly\" Lund
Guest





Posted: Tue Aug 09, 2005 12:16 am    Post subject: Re: internal call/ret stack Reply with quote

On Mon, 8 Aug 2005 MitchAlsup@aol.com wrote:

Quote:
In a pendanic way, some of the earliest machines simply stored the
return address in a (particular) GPR so that a return could use the
address in that register. Does this qualifies as a 1 deep stack? This

No ;)

I meant an internal stack that shadows the external stack for return
addresses.

Did the 88K chips use that, for example?

-Peter
Back to top
glen herrmannsfeldt
Guest





Posted: Tue Aug 09, 2005 12:16 am    Post subject: Re: internal call/ret stack Reply with quote

Nick Maclaren wrote:

Quote:
In article <1123517066.543564.161710@g14g2000cwa.googlegroups.com>,
MitchAlsup@aol.com writes:
|> In a pendanic way, some of the earliest machines simply stored the
|> return address in a (particular) GPR so that a return could use the
|> address in that register. Does this qualifies as a 1 deep stack? This
|> trick goes back to at lest the early 1960's, and quite probably into
|> the 1950s.

Definitely the 1950s. Perhaps the 1940s :-)

And still used by the usual calling convention in z/Architecture.

For the OS/360 days it was convenient in allowing a static save
area when code didn't need to be reentrant, but only serially
reusable (IBM term). Fortran compiles generated static data areas
and static register save areas.

When you need reentrancy, dynamic storage allocated as a doubly linked
list also works for the register save area.


The PDP-8 stores the return address as the first word of the called
routine and then starts executing at the following word. This worked
well until it was desirable to store programs in ROM.

-- glen
Back to top
Peter \"Firefly\" Lund
Guest





Posted: Tue Aug 09, 2005 12:16 am    Post subject: Re: internal call/ret stack Reply with quote

On Mon, 8 Aug 2005, Anton Ertl wrote:

Quote:
And some architectures (IIRC 8051) have an architectural on-chip
return stack with depth >1 that is not accessible in any other way.

You could access it just fine on the 8051. I don't think you could on
the (very simple) ST62, though.

Quote:
However, I guess that the OP was asking about non-architectural return
stacks that are just used for predicting the real return addresses
(which are coming from another source).

Precisely.

Quote:
The 21064 (1992) had one; I don't remember earlier CPUs that had one.

That early? I thought only 21164 and up had one. But it sounds unlikely
that it should be the first, doesn't it?

-Peter
Back to top
Peter \"Firefly\" Lund
Guest





Posted: Tue Aug 09, 2005 12:16 am    Post subject: Re: internal call/ret stack Reply with quote

On Mon, 8 Aug 2005 MitchAlsup@aol.com wrote:

Quote:
address in that register. Does this qualifies as a 1 deep stack? This
trick goes back to at lest the early 1960's, and quite probably into
the 1950s.

I think this qualifies:

C.F. Webb, "Subroutine call/return stack", IBM Technical Disclosure
Bulletin 30(11):221-225, Apr. 1988.

I couldn't find a web-accessible copy, though.

-Peter
Back to top
Dan Koren
Guest





Posted: Tue Aug 09, 2005 12:16 am    Post subject: Re: internal call/ret stack Reply with quote

"Peter "Firefly" Lund" <firefly@diku.dk> wrote in message
news:Pine.LNX.4.61.0508081538090.12917@tyr.diku.dk...
Quote:

When was the first CPU implementation that emulated the return stack
internally in order to make the return instructions go faster?


The performance gains from such would be negligible,
unless the procedures or subroutines from which the
code returns touch no memory at all.


Quote:
How old is the idea?


Probably older than you and me. Quite a few ISA's
leave the return address in a general register
after subroutine calls, so returns from leaf
procedures can use it directly without the
need to load the return address from the stack.

Think about it this way: if you're going to
restore an entire stack frame from memory,
one more register doesn't make that much of
a difference.

One sees hw return stacks mostly in microcode,
since microcode is usually not allowed to store
internal state to main memory. Many microcode
architectures have some number of scratch
registers plus a stack that holds up to n
microcode return addresses, where n is
typically a small number (3 is pretty
common).



dk
Back to top
Alex McDonald
Guest





Posted: Tue Aug 09, 2005 12:16 am    Post subject: Re: internal call/ret stack Reply with quote

glen herrmannsfeldt wrote:
Quote:
Nick Maclaren wrote:

In article <1123517066.543564.161710@g14g2000cwa.googlegroups.com>,
MitchAlsup@aol.com writes:
|> In a pendanic way, some of the earliest machines simply stored the
|> return address in a (particular) GPR so that a return could use the
|> address in that register. Does this qualifies as a 1 deep stack? This
|> trick goes back to at lest the early 1960's, and quite probably into
|> the 1950s.


Definitely the 1950s. Perhaps the 1940s :-)


And still used by the usual calling convention in z/Architecture.

For S360/370, R14 was used as a return address; the standard convention
was BALR (or BASR) R14,R15, and as R15 was used as both entry point and
return code, R12 was normally used as a base register. It was very
rare indeed to see anything other than R14 return, R15 entry point/rc
linkage; although any registers could have been used for the purpose, it
was considered very bad form to do so, as it had the downside of making
dumps difficult to analyse.

Quote:

For the OS/360 days it was convenient in allowing a static save
area when code didn't need to be reentrant, but only serially
reusable (IBM term). Fortran compiles generated static data areas
and static register save areas.

When you need reentrancy, dynamic storage allocated as a doubly linked
list also works for the register save area.


Dynamically allocated save areas could lead to performance problems with
lots of GETMAIN/FREEMAIN calls to small-ish blocks for the save area and
any associated, non-constant and discardable work data. Efficent
re-entrant code was always a problem, especially with the limitations of
4k addressable blocks of code or data with a single register; it tended
to encourage small routines operating on small blocks of data, and hence
lots of save areas.

A better solution was to use R13 in the main entry point to GETMAIN once
a block, and treat it like a display on a stack -- STM and A R13,const
on entry to a subroutine, LM and S R13,const on exit. The doubly linked
list was then only required if you needed the save areas formatted in a
dump. I built up quite an extensive library of source & object code
based on this technique.

Calculating the total required size of the area with multiple CSECTs in
the final executable was always problematic. I seem to remember
employing Q type constants to do this (a link edit feature that PL/I
used iirc), but I couldn't tell you how exactly; it's been nearly three
decades since I last did this in anger.

I don't claim to have invented this technique, but I don't remember ever
being taught it. I wonder how many times it's been "rediscovered"?

--
Regards
Alex McDonald
Back to top
Peter \"Firefly\" Lund
Guest





Posted: Tue Aug 09, 2005 12:16 am    Post subject: Re: internal call/ret stack Reply with quote

On Mon, 8 Aug 2005, Dan Koren wrote:

Quote:
When was the first CPU implementation that emulated the return stack
internally in order to make the return instructions go faster?

(The key word was "emulated". "the return instructions" was also
important.)

Quote:
The performance gains from such would be negligible,
unless the procedures or subroutines from which the
code returns touch no memory at all.

Or touch only cached memory. In any case, they are implemented (even on
my lowly AMD K6-2) and are a key feature of the control transfer
prediction machinery of modern CPUs.

Quote:
How old is the idea?


Probably older than you and me. Quite a few ISA's

Maybe, maybe not.

Quote:
leave the return address in a general register

I know, but that's a completely different issue.

Quote:
after subroutine calls, so returns from leaf
procedures can use it directly without the
need to load the return address from the stack.

Think about it this way: if you're going to
restore an entire stack frame from memory,
one more register doesn't make that much of
a difference.

Finding the instruction to return to quickly can be of immense value.

Quote:
One sees hw return stacks mostly in microcode,
since microcode is usually not allowed to store
internal state to main memory. Many microcode
architectures have some number of scratch
registers plus a stack that holds up to n
microcode return addresses, where n is
typically a small number (3 is pretty
common).

Data General MV/8000 apparently had a stack of 16 levels -- and 4096
microcode words of 75 bits each. And two decode ROMs (containing the
address of the first microinstruction implementing a given
macroinstruction for the Nova and for the Eagle) + a predecoder that
selected between the two decode ROMs.

Jonathan S. Blau, Charles J. Holland, David L. Keating, "The
Micro-Architecture of the Eclipse MV/8000: Conception and Implementation",
some obscure IEEE publication, 1980.

4096 x 75 bits seems like a lot to me -- but then again, I have no useful
intuition about microcode.

-Peter
Back to top
Peter \"Firefly\" Lund
Guest





Posted: Tue Aug 09, 2005 12:16 am    Post subject: Re: internal call/ret stack Reply with quote

On Mon, 8 Aug 2005, glen herrmannsfeldt wrote:

Quote:
The PDP-8 stores the return address as the first word of the called
routine and then starts executing at the following word. This worked
well until it was desirable to store programs in ROM.

How did they handle multitasking?

-Peter
Back to top
glen herrmannsfeldt
Guest





Posted: Tue Aug 09, 2005 12:16 am    Post subject: Re: internal call/ret stack Reply with quote

Peter "Firefly" Lund wrote:

Quote:
On Mon, 8 Aug 2005, Dan Koren wrote:

When was the first CPU implementation that emulated the return stack
internally in order to make the return instructions go faster?

(The key word was "emulated". "the return instructions" was also
important.)

Emulated is a funny word here. Does it still notice if the
actual return address on the stack is changed? Unless the
architecture specified that it wasn't checked, it would seem to
give the wrong results in some cases. Using it for speculative
instruction prefetch, though, could be done until the real
return address came off the stack.

(snip)

Quote:
Data General MV/8000 apparently had a stack of 16 levels -- and 4096
microcode words of 75 bits each. And two decode ROMs (containing the
address of the first microinstruction implementing a given
macroinstruction for the Nova and for the Eagle) + a predecoder that
selected between the two decode ROMs.

(snip)
Quote:
4096 x 75 bits seems like a lot to me -- but then again, I have no
useful intuition about microcode.

A little wide for the usual vertical microcode, not quite wide
enough for usual horizontal. I believe there are machines that
execute one microinstruction for most macroinstructions, possibly
over 100 bits wide.

-- glen
Back to top
glen herrmannsfeldt
Guest





Posted: Tue Aug 09, 2005 12:16 am    Post subject: Re: internal call/ret stack Reply with quote

Alex McDonald wrote:

(snip regarding return addresses in registers)

Quote:
For S360/370, R14 was used as a return address; the standard convention
was BALR (or BASR) R14,R15, and as R15 was used as both entry point and
return code, R12 was normally used as a base register. It was very rare
indeed to see anything other than R14 return, R15 entry point/rc
linkage; although any registers could have been used for the purpose, it
was considered very bad form to do so, as it had the downside of making
dumps difficult to analyse.

I have seen it done for internal routines. Also, for coroutines
you can do something like:

BALR R7,R7

so that it transfers between routines using only one linkage register.

(snip)

Quote:
When you need reentrancy, dynamic storage allocated as a doubly linked
list also works for the register save area.

Dynamically allocated save areas could lead to performance problems with
lots of GETMAIN/FREEMAIN calls to small-ish blocks for the save area and
any associated, non-constant and discardable work data. Efficent
re-entrant code was always a problem, especially with the limitations of
4k addressable blocks of code or data with a single register; it tended
to encourage small routines operating on small blocks of data, and hence
lots of save areas.

If you are allocating automatic variables then it can be added onto
that. But yes, the GETMAIN overhead is high enough that one should do
larger GETMAINs and suballocate them, for save areas and automatic data.

Quote:
A better solution was to use R13 in the main entry point to GETMAIN once
a block, and treat it like a display on a stack -- STM and A R13,const
on entry to a subroutine, LM and S R13,const on exit. The doubly linked
list was then only required if you needed the save areas formatted in a
dump. I built up quite an extensive library of source & object code
based on this technique.

Save area traceback is nice sometimes.

-- glen
Back to top
 
Post new topic   Reply to topic    CASTalk.com Forum Index -> Computer Architecture All times are GMT
Goto page 1, 2, 3  Next
Page 1 of 3

 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum




VoIP Electronics Powered by phpBB