internal call/ret stack
CASTalk.com Forum Index CASTalk.com
Discussion of DSP, FPGA, storage and embedded system.
 
Peter "Firefly" Lund
Posted: Tue Aug 09, 2005 12:16 am    Post subject: Re: internal call/ret stack

On Mon, 8 Aug 2005, glen herrmannsfeldt wrote:

Quote:
Emulated is a funny word here. Does it still notice if the
actual return address on the stack is changed? Unless the
architecture specified that it wasn't checked, it would seem to
give the wrong results in some cases. Using it for speculative
instruction prefetch, though, could be done until the real
return address came off the stack.

It's only used for speculation, sort of like branch target prediction,
only for returns. It is not part of the visible architecture - it only
shows up as bad or good performance. If all goes right, procedure calls
are almost as fast as inline code. If it doesn't... ;)
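To make the idea concrete, here is a minimal Python sketch of such a return-address predictor. It assumes a small circular buffer of fixed depth (8 is an arbitrary choice, not taken from any particular CPU), and the class and function names are made up. Calls push the fall-through address, returns pop a guess, and the guess is later checked against the real return address from the architectural stack. A wrong guess only costs a refetch, which is why the structure is invisible to software.

```python
class ReturnAddressStack:
    """Toy return-address predictor: a fixed-depth circular buffer.

    It never changes program results; it only supplies a guess that the
    pipeline later compares with the real return address.
    """

    def __init__(self, depth=8):
        self.depth = depth
        self.slots = [None] * depth
        self.top = 0    # index of the next free slot (mod depth)
        self.count = 0  # number of valid entries, at most depth

    def on_call(self, return_addr):
        # When full, this overwrites the oldest entry, so very deep call
        # chains lose their outermost return addresses.
        self.slots[self.top] = return_addr
        self.top = (self.top + 1) % self.depth
        self.count = min(self.count + 1, self.depth)

    def predict_return(self):
        # Empty: no prediction, the front end has to wait for the real address.
        if self.count == 0:
            return None
        self.top = (self.top - 1) % self.depth
        self.count -= 1
        return self.slots[self.top]


def replay(trace, depth=8):
    """Replay ("call", return_addr) / ("ret", actual_addr) events and count
    correct and incorrect return predictions."""
    ras = ReturnAddressStack(depth)
    hits = misses = 0
    for kind, addr in trace:
        if kind == "call":
            ras.on_call(addr)
        elif ras.predict_return() == addr:
            hits += 1
        else:
            misses += 1  # a real CPU would flush and refetch from addr here
    return hits, misses


nested = [("call", a) for a in range(1, 11)] + [("ret", a) for a in range(10, 0, -1)]
print(replay(nested, depth=8))   # (8, 2): the two outermost returns fell off
print(replay(nested, depth=16))  # (10, 0)
print(replay([("call", 0x104), ("ret", 0x200)]))  # (0, 1): saved address was changed
```

The last case is glen's scenario: if the program rewrites its saved return address, the predictor is wrong once and the real address wins.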

Quote:
A little wide for the usual vertical microcode, not quite wide
enough for usual horizontal. I believe there are machines that
execute one microinstruction for most macroinstructions, possibly
over 100 bits wide.

Thanks. What about the number of instructions? I think there were
slightly more than 400 instructions supported on that machine (by whatever
count) -- but all macroinstruction implementations ended with a pop (and
if the microstack was empty, you would get the address from the decode ROM
and begin the execution of the next macroinstruction), which made it easy
to reuse the implementations of other macroinstructions. At least, that's
what they boasted in the paper.
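A toy model of that sequencing, in Python. It assumes each microinstruction has a single "pop after this" bit (the arrangement glen suggests and Peter confirms further down the thread); the microaddresses, micro-op names and the INC/INC2 opcodes are hypothetical. A macroinstruction reuses another one's microcode by pushing a continuation address and jumping into it, and a pop that finds the microstack empty ends the macroinstruction, so the sequencer takes the next entry point from the decode ROM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Micro:
    op: str                     # hypothetical micro-operation name (only logged here)
    jump: Optional[int] = None  # explicit next microaddress, if any
    push: Optional[int] = None  # microaddress to push before jumping (a micro-call)
    pop: bool = False           # the "pop after this instruction" bit


def run(program, control_store, decode_rom):
    """Execute a list of macro opcodes and return the trace of micro-ops."""
    trace, microstack = [], []
    pc, upc = 0, None
    while True:
        if upc is None:
            # The last pop found the microstack empty: start the next macroinstruction.
            if pc == len(program):
                return trace
            upc = decode_rom[program[pc]]
            pc += 1
        mi = control_store[upc]
        trace.append(mi.op)
        if mi.push is not None:
            microstack.append(mi.push)
        if mi.pop:
            upc = microstack.pop() if microstack else None
        elif mi.jump is not None:
            upc = mi.jump
        else:
            upc += 1


# Hypothetical microcode: INC2 reuses INC's microroutine twice.
control_store = {
    10: Micro("add-one", pop=True),
    20: Micro("call-inc", push=21, jump=10),  # comes back to 21 via INC's pop
    21: Micro("tail-inc", jump=10),           # INC's pop then ends INC2
}
decode_rom = {"INC": 10, "INC2": 20}
print(run(["INC", "INC2"], control_store, decode_rom))
```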

As far as I understand the Nova instruction set, each instruction word
consists of 16 bits, partitioned into a number of almost independent
fields. Because of that structure I don't see why so many
microinstructions would be necessary -- you should only get lots of
instructions if you multiply the variation in the fields together. I
don't know the Eagle (32-bit) instruction set but it is supposed to be
similar to the Nova's.
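The "multiply the fields together" point can be checked with a small decoder for the Nova's arithmetic/logic instruction format. The field layout and mnemonic tables below are from memory of the Data General documentation (top bit set, then source and destination accumulator, function, shift, carry, no-load and skip) and should be verified against a Nova manual before relying on them. Each field is decoded on its own, yet together the fields cover every pattern with the top bit set.

```python
from math import prod

# Field tables for the Nova ALU-class instruction, as recalled from DG manuals.
FUNCS = ["COM", "NEG", "MOV", "INC", "ADC", "SUB", "ADD", "AND"]
SHIFTS = ["", "L", "R", "S"]   # none, rotate left, rotate right (through carry), swap bytes
CARRIES = ["", "Z", "O", "C"]  # base carry: unchanged, zero, one, complemented
SKIPS = ["", "SKP", "SZC", "SNC", "SZR", "SNR", "SEZ", "SBN"]


def decode_alu(word):
    """Split a 16-bit Nova ALU instruction into independent fields.

    DG numbers bits 0..15 from the most significant end; the shifts below
    count from the least significant end instead.
    """
    if word >> 15 != 1:
        raise ValueError("not an ALU-class instruction")
    return {
        "acs": (word >> 13) & 0b11,
        "acd": (word >> 11) & 0b11,
        "func": FUNCS[(word >> 8) & 0b111],
        "shift": SHIFTS[(word >> 6) & 0b11],
        "carry": CARRIES[(word >> 4) & 0b11],
        "noload": bool((word >> 3) & 1),
        "skip": SKIPS[word & 0b111],
    }


print(decode_alu(0xB600))  # ADD 1,2
# Distinct encodings = product of the field sizes
# (ACS, ACD, function, shift, carry, no-load, skip):
print(prod([4, 4, len(FUNCS), len(SHIFTS), len(CARRIES), 2, len(SKIPS)]))  # 32768
```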

-Peter

glen herrmannsfeldt
Posted: Tue Aug 09, 2005 5:35 am    Post subject: Re: internal call/ret stack

Peter "Firefly" Lund wrote:

(snip, I wrote)
Quote:
A little wide for the usual vertical microcode, not quite wide
enough for usual horizontal. I believe there are machines that
execute one microinstruction for most macroinstructions, possibly
over 100 bits wide.

Thanks. What about the number of instructions? I think there were
slightly more than 400 instructions supported on that machine (by
whatever count) -- but all macroinstruction implementations ended with a
pop (and if the microstack was empty, you would get the address from the
decode ROM and begin the execution of the next macroinstruction), which
made it easy to reuse the implementations of other macroinstructions.
At least, that's what they boasted in the paper.

Well, if it is two sets of 400 instructions that is about five
each. I think more usual would be one bit that says pop after
this instruction. The decrease in the number of microinstructions
executed more than makes up for the extra width.

Quote:
As far as I understand the Nova instruction set, each instruction word
consists of 16 bits, partitioned into a number of almost independent
fields. Because of that structure I don't see why so many
microinstructions would be necessary -- you should only get lots of
instructions if you multiply the variation in the fields together. I
don't know the Eagle (32-bit) instruction set but it is supposed to be
similar to the Nova's.

There is always a tradeoff between microinstruction width and the number
of microinstructions per macro instruction. Unrolling loops would
speed things up but take more control store. Most likely it is worth
doing.

-- glen

Peter "Firefly" Lund
Posted: Tue Aug 09, 2005 5:38 am    Post subject: Re: internal call/ret stack

On Mon, 8 Aug 2005, glen herrmannsfeldt wrote:

Quote:
Well, if it is two sets of 400 instructions that is about five

I think it's 400 in total.

Quote:
each. I think more usual would be one bit that says pop after
this instruction.

And I think that's how they did it.

Quote:
There is always a tradeoff between microinstruction width and the number
of microinstructions per macro instruction. Unrolling loops would
speed things up but take more control store. Most likely it is worth
doing.

But the Nova instruction set looks like a really simple and stupid LIW
machine. Why try to fold the microcode for those different fields into
/one/ stream of microinstructions?

-Peter

Dan Koren
Posted: Tue Aug 09, 2005 6:16 am    Post subject: Re: internal call/ret stack

"Peter "Firefly" Lund" <firefly@diku.dk> wrote in message
news:Pine.LNX.4.61.0508090135400.9343@tyr.diku.dk...
Quote:
On Mon, 8 Aug 2005, glen herrmannsfeldt wrote:

The PDP-8 stores the return address as the first word of the called
routine and then starts executing at the following word. This worked
well until it was desirable to store programs in ROM.

How did they handle multitasking?



With great care! ;-)

HP-1000 also used a similar call/return convention.

The trick was that jump to subroutine instructions
turned off interrupts for the duration of the
immediately following instruction, which allowed
the return address to be stored some place else
in memory in uninterruptible fashion.
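A small Python model of the PDP-8 convention quoted above: JMS Y stores the return address in word Y and continues at Y+1, and the routine returns with an indirect jump through Y (JMP I Y). The addresses and the class are made up, and the HP-1000 interrupt holdoff is not modeled. The sketch shows the two problems the thread mentions: the call writes into the routine itself, so the routine cannot sit in ROM, and a second activation overwrites the first return address.

```python
class PDP8ish:
    """Just enough of the PDP-8 calling convention to show its limits."""

    def __init__(self, size=4096, rom=range(0)):
        self.mem = [0] * size
        self.rom = set(rom)              # addresses that cannot be written

    def write(self, addr, value):
        if addr in self.rom:
            raise PermissionError(f"JMS tried to write ROM at {addr:o}")
        self.mem[addr] = value & 0o7777  # 12-bit words

    def jms(self, pc, target):
        """JMS target executed at pc: save pc+1 in the routine's first word,
        then continue at target+1. Returns the new PC."""
        self.write(target, pc + 1)
        return target + 1

    def jmp_indirect(self, target):
        """JMP I target: return through the address saved by JMS."""
        return self.mem[target]


m = PDP8ish()
print(oct(m.jms(0o200, 0o400)), oct(m.jmp_indirect(0o400)))  # 0o401 0o201

# A second activation (recursion, or an interrupt handler calling the same
# routine) overwrites the first return address.
m.jms(0o200, 0o400)
m.jms(0o450, 0o400)
print(oct(m.jmp_indirect(0o400)))  # 0o451: the return to 0o201 is lost

# In ROM the very first call fails.
try:
    PDP8ish(rom=range(0o400, 0o500)).jms(0o200, 0o400)
except PermissionError as e:
    print(e)
```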


dk

Anton Ertl
Posted: Tue Aug 09, 2005 8:15 am    Post subject: Re: internal call/ret stack

"Peter \"Firefly\" Lund" <firefly@diku.dk> writes:
Quote:
I meant an internal stack that shadows the external stack for return
addresses.

Did the 88K chips use that, for example?

The 88100 didn't. With the two-cycle branch latency, and the option
of delayed branches, it would not make much sense.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

glen herrmannsfeldt
Posted: Tue Aug 09, 2005 8:15 am    Post subject: Re: internal call/ret stack

Peter "Firefly" Lund wrote:

(snip)

Quote:
But the Nova instruction set looks like a really simple and stupid LIW
machine. Why try to fold the microcode for those different fields into
/one/ stream of microinstructions?

Is it like VAX compatibility mode where it could execute PDP-11
instructions? It was an option in early VAX which made the
transition easier.

It is simple, but it allows an easy upgrade path for people
with existing machines.

(Also, the early S/360 machines had optional emulators
for previous IBM machines, again allowing for an easy upgrade
while allowing time to convert to the new system.)


-- glen

Terje Mathisen
Posted: Tue Aug 09, 2005 8:15 am    Post subject: Re: internal call/ret stack

Alex McDonald wrote:
Quote:
A better solution was to use R13 in the main entry point to GETMAIN once
a block, and treat it like a display on a stack -- STM and A R13,const
on entry to a subroutine, LM and S R13,const on exit. The doubly linked
list was then only required if you needed the save areas formatted in a
dump. I built up quite an extensive library of source & object code
based on this technique.

Calculating the total required size of the area with multiple CSECTs in
the final executable was always problematic. I seem to remember
employing Q type constants to do this (a link edit feature that PL/I
used iirc), but I couldn't tell you how exactly; it's been nearly three
decades since I last did this in anger.

I don't claim to have invented this technique, but I don't remember ever
being taught it. I wonder how many times it's been "rediscovered"?

Borland used something quite similar in some of their compiler/library
products, where you would skip checking memory allocation calls on a
bunch of small allocations, after first doing a single check for
availability of enough memory (+ a guard area).

Since this was a single-tasking OS (PCDOS), this was enough to guarantee
that all the tiny individual allocs would succeed.
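The R13 save-area technique Alex describes can be sketched in Python as follows. It assumes the conventional 72-byte (18-fullword) save area and the usual STM R14,R12,12(R13) register range; the class and method names are hypothetical, and the bounds checks are additions the assembler version would not necessarily have. One GETMAIN up front provides the block; after that, entry and exit are a store or load of the registers plus one add or subtract on R13, with no allocation per call.

```python
import struct

SAVE_AREA = 72                          # bytes: the conventional 18-fullword save area
STM_ORDER = [14, 15] + list(range(13))  # STM R14,R12 wraps: R14, R15, R0 ... R12


class SaveAreaStack:
    """Save areas carved out of one block, with R13 used as a stack pointer."""

    def __init__(self, depth):
        self.block = bytearray(SAVE_AREA * depth)  # the single GETMAIN
        self.r13 = 0                               # R13: offset of the current save area

    def enter(self, regs):
        """STM R14,R12,12(R13) followed by A R13,=F'72'."""
        if self.r13 + SAVE_AREA > len(self.block):
            raise MemoryError("save-area block exhausted")
        struct.pack_into(">15I", self.block, self.r13 + 12,
                         *(regs[r] & 0xFFFFFFFF for r in STM_ORDER))
        self.r13 += SAVE_AREA

    def leave(self, regs):
        """S R13,=F'72' followed by LM R14,R12,12(R13)."""
        if self.r13 < SAVE_AREA:
            raise RuntimeError("no save area to restore from")
        self.r13 -= SAVE_AREA
        values = struct.unpack_from(">15I", self.block, self.r13 + 12)
        for r, v in zip(STM_ORDER, values):
            regs[r] = v


regs = list(range(100, 116))  # caller's R0..R15 (R13 itself lives in stack.r13 here)
stack = SaveAreaStack(depth=4)
stack.enter(regs)             # subroutine entry
regs[:] = [0] * 16            # the subroutine clobbers everything
stack.leave(regs)             # subroutine exit
print(regs[14], regs[15], regs[0], regs[12], stack.r13)  # 114 115 100 112 0
```

It has the same shape as the Borland trick: one up-front allocation or check, then a cheap pointer adjustment per use.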

Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Peter "Firefly" Lund
Posted: Tue Aug 09, 2005 8:16 am    Post subject: Re: internal call/ret stack

On Tue, 9 Aug 2005, Anton Ertl wrote:

Quote:
The 88100 didn't. With the two-cycle branch latency, and the option
of delayed branches, it would not make much sense.

Thanks. I know almost nothing about that family. I've heard it's very
clean, though.

-Peter

Peter "Firefly" Lund
Posted: Tue Aug 09, 2005 8:16 am    Post subject: Re: internal call/ret stack

On Tue, 9 Aug 2005, Terje Mathisen wrote:

Quote:
Borland used something quite similar in some of their compiler/library
products, where you would skip checking memory allocation calls on a
bunch of small allocations, after first doing a single check for
availability of enough memory (+ a guard area).

Are you thinking about Turbo Vision? Wasn't it more a question of
ensuring there was enough memory left to handle the allocations inherent
in displaying the error message to the user and cleaning up afterwards?

Similar tricks are common with compiled Standard ML code: heap allocation
is just a question of advancing a pointer except in the rare case where a
garbage collection is called for. Several functions that call each other
in just the right way can share the allocation and the check for whether
to garbage collect or not.
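A Python sketch of that pattern, assuming a simple bump-pointer nursery. The names, the word sizes and the reset_collect stand-in for a minor collection are hypothetical, not SML/NJ's actual runtime interface. Each allocation is one pointer increment, and a single limit check (reserve) covers a group of allocations that the compiler knows will happen together.

```python
class Nursery:
    """Bump-pointer allocation area with an explicit, shareable limit check."""

    def __init__(self, size, collect):
        self.size = size        # capacity in words
        self.ptr = 0            # next free word
        self.collect = collect  # stand-in for a minor GC; must lower self.ptr
        self.checks = 0

    def reserve(self, words):
        """The single limit check that covers several following allocations."""
        self.checks += 1
        if self.ptr + words > self.size:
            self.collect(self)
            if self.ptr + words > self.size:
                raise MemoryError("request larger than the nursery")

    def alloc_unchecked(self, words):
        """The fast path: bump the pointer, no test."""
        addr = self.ptr
        self.ptr += words
        return addr


def make_three_pairs(nursery):
    # Three 2-word cells, one check. A compiler could also merge this check
    # with those of functions it inlines or calls directly.
    nursery.reserve(6)
    return [nursery.alloc_unchecked(2) for _ in range(3)]


gc_log = []

def reset_collect(nursery):
    # Toy "collector": pretend nothing survived. A real copying GC would
    # leave ptr just past the surviving data instead of at 0.
    gc_log.append(nursery.ptr)
    nursery.ptr = 0


nursery = Nursery(16, reset_collect)
print([make_three_pairs(nursery) for _ in range(3)])  # [[0, 2, 4], [6, 8, 10], [0, 2, 4]]
print(nursery.checks, gc_log)                         # 3 [12]
```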

-Peter

Peter "Firefly" Lund
Posted: Tue Aug 09, 2005 8:16 am    Post subject: Re: internal call/ret stack

On Mon, 8 Aug 2005, glen herrmannsfeldt wrote:

Quote:
Is it like VAX compatibility mode where it could execute PDP-11
instructions? It was an option in early VAX which made the
transition easier.

Not really. The early VAX machines had two different instruction sets:
PDP-11 and native VAX. The MV/8000 had the old 16-bit Nova instructions
AND the newer 32-bit instructions as an extension, fitted into bit
combinations that weren't used for the Nova instructions. Thus, there was
no need for mode switching; you could mix both instruction sets in the
same program. That's what they used the predecoder for: it decided
whether the instruction looked like an old or a new instruction and
enabled the right decode ROM.
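A sketch of the predecoder idea in Python. The classification rule is an assumption for illustration only, not the documented MV/8000 encoding: it treats Nova ALU-format words that have the no-load bit set and skip field 000 (a combination that on its own changes nothing) as members of the new set. The ROM indexing is invented too. The point is that the choice is made per instruction word, so no mode bit is needed and old and new instructions can be mixed freely.

```python
def is_extended(word):
    """Hypothetical rule: ALU format (top bit set) with the no-load bit set and
    skip field 000, a combination that would otherwise change nothing."""
    return (word >> 15) == 1 and ((word >> 3) & 1) == 1 and (word & 0b111) == 0


def predecode(word):
    """Pick a decode ROM and an index into it (both indexings are invented)."""
    if is_extended(word):
        return "new", (word >> 4) & 0x7FF  # bits not fixed by the rule above
    return "old", word >> 8


for w in (0xB600, 0xB608, 0xB609, 0x2000):
    print(hex(w), predecode(w)[0])  # old, new, old, old
```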

Quote:
(Also, the early S/360 machines had optional emulators
for previous IBM machines, again allowing for an easy upgrade
while allowing time to convert to the new system.)


It is a very good idea.

-Peter

Alex Colvin
Posted: Tue Aug 09, 2005 12:40 pm    Post subject: Re: internal call/ret stack

forbin@dev.nul (Colonel Forbin) writes:

Quote:
This left the issue of how to synchronize the clocks of the two
systems. The H-200 dated from the days of big front panels with
lots of blinkenlights and switches, including a processor
halt/run/single step switch.

The GE/Honeywell 600 wasn't without its share of tiny toasty light bulbs.
Later these withered into a few rows of cool LEDs.

http://www.geekculture.com/geeklove/sphere9.html

--
mac the naïf

Colonel Forbin
Posted: Tue Aug 09, 2005 2:37 pm    Post subject: Re: internal call/ret stack

In article <Pine.LNX.4.61.0508090959220.27053@tyr.diku.dk>,
Peter "Firefly" Lund <firefly@diku.dk> wrote:
Quote:

On Mon, 8 Aug 2005, glen herrmannsfeldt wrote:

Is it like VAX compatibility mode where it could execute PDP-11
instructions? It was an option in early VAX which made the
transition easier.

Not really. The early VAX machines had two different instruction sets:
PDP-11 and native VAX. The MV/8000 had the old 16-bit Nova instructions
AND the newer 32-bit instructions as an extension, fitted into bit
combinations that weren't used for the Nova instructions. Thus, there was
no need for mode switching; you could mix both instruction sets in the
same program. That's what they used the predecoder for: it decided
whether the instruction looked like an old or a new instruction and
enabled the right decode ROM.

(Also, the early S/360 machines had optional emulators
for previous IBM machines, again allowing for an easy upgrade
while allowing time to convert to the new system.)

When Honeywell acquired the GE 600 line of mainframes, they produced
an "emulator" to supposedly facilitate migration from their H-200
architecture. The "S2P" unit was effectively a H-200 series mainframe
with the memory removed and the memory bus extended to the SCU of the
H-6000 or Level 66 system sort of like a DMA card in a modern PC.

The GE 600 was a modular architecture with different components which
could be largely independently upgraded. CPUs interfaced with a SCU
(System Control Unit) to which memory was attached. IOPs handled
I/O tasks, and I/O subsystems had programmed controllers not unlike
IBM 360 channel controllers.

This left the issue of how to synchronize the clocks of the two
systems. The H-200 dated from the days of big front panels with
lots of blinkenlights and switches, including a processor
halt/run/single step switch.

So, the Honeywell engineers just wired up the clock signal from
the H-6000 to the single step diagnostic mode circuit in the
H-200 CPU so that the CPU was "single stepped" at its full clock
rate by the interface/emulation software on the H-6000.

I had one of these S2P units at a PPOE. We had transitioned
most of the applications to the DPS-8, so the main use of the
S2P was to label 9-track tapes (the operators liked the OS/2000
utilities for this).

By this point the S2P unit was so obsolete that Honeywell really
didn't want it back, so they pretty much wrote it off our lease
agreement.

Another useful application of the S2P was to entertain guests.
The DPS-8 was a set of rather mundane looking large black
boxes.

Due to the impressive blinkenlights display, we would open
the diagnostic panel on the S2P and have the operators run some
OS/2000 utilities to make them flash impressively.

An interesting aspect of the S2P was the "scientific unit" or
floating point coprocessor. For some reason this had a blue
cabinet instead of black. It was a large double bay unit
built with a mix of discrete and first gen IC logic. The
whole thing was less powerful than a 4MHz 8087. The backplanes
were wire wrap. This unit had the annoying habit of throwing
stray hardware interrupts. Since the S2P was directly connected
to the memory controller of the DPS-8, this would often crash
the entire mainframe.

The DPS-8 was acquired because a ruptured steam pipe had cooked
the H-2040A that preceded it. We used to speculate that
the S2P "scientific unit" we had was the same one that was
destroyed in that mishap and that Honeywell had just dried it
out and given it back.

There's something amusing about working in a shop that has
really old iron.

Terje Mathisen
Posted: Tue Aug 09, 2005 3:18 pm    Post subject: Re: internal call/ret stack

Peter "Firefly" Lund wrote:

Quote:
On Tue, 9 Aug 2005, Terje Mathisen wrote:

Borland used something quite similar in some of their compiler/library
products, where you would skip checking memory allocation calls on a
bunch of small allocations, after first doing a single check for
availability of enough memory (+ a guard area).


Are you thinking about Turbo Vision? Wasn't it more a question of
ensuring there was enough memory left to handle the allocations inherent
in displaying the error message to the user and cleaning up afterwards?

Turbo Vision is correct. While at Novell I developed a program for
Borland using their (at the time in-house) Turbo Vision for C++.

A horrible job, mostly due to the fact that they hadn't used this
translated library at all yet, so all the bugs were mine to discover. :-(

Anyway, bunching up allocations like this avoided the need for a _lot_
of nested error checking and explicit error handling.
Quote:

Similar tricks are common with compiled Standard ML code: heap
allocation is just a question of advancing a pointer except in the rare
case where a garbage collection is called for. Several functions that
call each other in just the right way can share the allocation and the
check for whether to garbage collect or not.

:-)

Terje

--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Peter Dickerson
Posted: Tue Aug 09, 2005 3:21 pm    Post subject: Re: internal call/ret stack

"Peter "Firefly" Lund" <firefly@diku.dk> wrote in message
news:Pine.LNX.4.61.0508082358310.9343@tyr.diku.dk...
Quote:
On Mon, 8 Aug 2005, Anton Ertl wrote:

And some architectures (IIRC 8051) have an architectural on-chip
return stack with depth >1 that is not accessible in any other way.

You could access it just fine on the 8051. I don't think you could on
the (very simple) ST62, though.

However, I guess that the OP was asking about non-architectural return
stacks that are just used for predicting the real return addresses
(which are coming from another source).

Precisely.

The 21064 (1992) had one; I don't remember earlier CPUs that had one.

That early? I thought only 21164 and up had one. But it sounds unlikely
that it should be the first, doesn't it?

-Peter

I'm moving house on Friday. I have literally just thrown my 21064 books away,
thinking I won't need those again...

Peter (D)

Torben Ægidius Mogensen
Posted: Tue Aug 09, 2005 4:16 pm    Post subject: Re: internal call/ret stack

"Peter \"Firefly\" Lund" <firefly@diku.dk> writes:

Quote:
On the other hand, ML programs tend to allocate far too much memory
for their own good "because garbage collection is cheap". Well, it
is, but...
A lot of it is due to compiler generated stuff rather than programmer
visible stuff.

SML/NJ allocates invocation frames on the heap, for example. It's all
nice and general and stuff (the old-timers here will love it) but it
is also a stupid thing to do because they /do/ behave in a very
special way that does not NEED full generality and which makes it easy
to special-case
by using the CPU stack. Suddenly your locality becomes a lot better...

It has always struck me as a bad decision to heap-allocate the frames,
despite Andrew Appel's arguments that copying GC is cheap because it
ignores dead data (and hence has no cost for dead stack frames).
There are lots of flaws in that argument, which I'm sure most of you
can see. Just about the only advantage of heap-allocated stack frames
is that they allow easy implementation of call/cc, but since that is a
non-standard extension of SML, I never use it.
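A Python sketch of why heap-allocated frames make call/cc cheap, assuming frames are immutable records linked to their caller (a simplification: SML/NJ actually works with closures in continuation-passing style). The names are hypothetical. Capturing the continuation is just keeping a pointer to the current frame, and later returns cannot disturb the captured chain. With a contiguous stack, the same capture has to copy the live part of the stack.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    name: str
    caller: Optional["Frame"]  # link to the calling frame, allocated on the heap


def push(frame, name):
    return Frame(name, frame)


def chain(frame):
    """List frame names from the innermost outwards."""
    names = []
    while frame is not None:
        names.append(frame.name)
        frame = frame.caller
    return names


# Heap frames: capture is O(1), just keep the pointer.
cur = push(push(push(None, "main"), "f"), "g")
captured = cur               # what call/cc would hold on to
cur = cur.caller.caller      # return from g, then from f
cur = push(cur, "h")         # a new call does not touch the old frames

# Contiguous stack: capture must copy, because the slots get reused.
stack = ["main", "f", "g"]
captured_copy = list(stack)  # O(depth) copy
del stack[1:]
stack.append("h")

print(chain(captured), chain(cur))  # ['g', 'f', 'main'] ['h', 'main']
```

Torben's point is that this cheap capture is paid for on every ordinary call, through heap allocation and worse locality.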

But SML doesn't have to have that bad space behaviour. One
advantage of languages with automatic deallocation is that the
implementor has a lot of freedom in when data is deallocated, so you
can do things like escape analysis and region inference to get better
locality and earlier deallocation.

Torben

All times are GMT
Page 2 of 3