| Author |
Message |
Eliot Miranda
Guest
|
Posted:
Thu Sep 15, 2005 12:15 am Post subject:
Re: interrupting for overflow and loop termination |
|
|
John Mashey wrote:
| Quote: | David Hopwood wrote:
andrewspencers@yahoo.com wrote:
Terje Mathisen wrote:
A slightly different situation is where you have code that in practice
always handles integers that fit in a single word, but that can't be
statically guaranteed to do so, and the language specification says that
bignum arithmetic must be supported -- the obvious example being Smalltalk.
There were some attempts to support this in hardware (e.g. "Smalltalk on
a RISC"; also something on SPARC that I can't remember the details of),
but it turned out to be easier and faster for implementations of Smalltalk
and similar languages to use other tricks that don't require hardware support.
Yes.
1) There was Berkeley SOAR as noted, and SPARC included ADD/SUB Tagged,
which used the high-30 bits as integers, and the low 2 bits as tags; if
either low 2-bit field were non-zero, it trapped.
2) ~1988, while working on MIPS-II, I/we spent a lot of time talking
with Smalltalk & LISP friends, potential customers, etc, asking:
"Are there any modest extensions that would help you a lot, and would
be reasonable to implement?
Short answer: NO.
Longer answer:
a) They said either give them a complete, tailored solution [which they
didn't expect], or just make the CPU run fast, but don't bother with
minor enhancements. Some said they knew about the SPARC feature, but
didn't use it.
|
This would include Peter Deutsch and the design of HPS his 2nd dynamic
translation (JIT) VM. The tag pattern for immediate integers was
already chosen to be 11, and changing it just for SPARC when the
performance boost would be below 10% in all but micro-bencmarks just
isn't worth it. However, were the SPARC designers to have allowed the
trap mask to be a variable part of per-thread state, or even better, to
be specified in the instruction itself (eliminating problems combining
different language implementations in one program) then we would have
made use of it (certainly code exists to use it).
The most convenient design would be not a trap but a branch or skip.
Something like "add and skip on overflow or if either operand's tag
pattern doesn't match X". Now with 64-bit implementations one would
also want to specify the width of the tag field (one bit would suit HPS;
its 32-bit and 64-bit implementations use a single bit to tag immediate
integers.
| Quote: | b) Some said: they were all doing fairly portable versions, had learned
a lot of good tricks, and minor improvements that required major
structural changes just weren't worth it.
|
hence the need for any instructions to provide flexibility and not
dictate particular bit patterns...
[snip]
| Quote: | Anyway, it's pretty clear that relevant mechanisms were being discussed
~20 years ago, but nobody seems to have figured out features that
actually make implementation sense. I'd be delighted to see a
well-informed proposal that had sensible hardware/software
implementations and really helped LISP/Smalltalk/ADA and hopefully
other languages...
|
We could use a tagged add/sub and skip on overflow or tag mismatch, and
a tagged compare and skip on tag mismatch, where the tag field can be
flexibly specified to suit both 32-bit implementations (typical tags
least significant two bits) and 64-bit implementations (typical tags
least significant three or four bits).
If 6 bits were dedicated to the tag specification, two would be the size
of the tag field
00 -> least significant bit
01 -> least significant two bits
10 -> least significant three bits
11 -> least significant four bits
The remaining four bits would specify the required tag pattern, bits
excess to the tag size being ignored.
The two operands would be interpreted as 2's complement signed integers
in the remaining non-tag bits. The add/sub instructions would skip or
annul the following instruction if either operand's tag pattern didn't
match the tag specification or if the result overflowed. The compare
instructions would skip or annul the following instruction if either
operand's tag pattern didn't match the tag specification.
The value of the result register of the tagged add/sub would have the
same tag pattern as the operands. Result value is undefined if overflow
or tag mismatch (i.e. I don't think one would typically be interested in
the result).
Code sequences for polymorphic add/sub or compare would then look like
fetch operand one
fetch operand two
tagged add/sub
branch Ldone
code for non-tagged case (method lookup)
...
Ldone:
Code for compare sequences would depend on whether one needed to take a
conditional branch or produce a result. So one could use an instruction
that would skip the next two instructions on tag mismatch.
If tagged compare skips the next instruction then tagged compare for a
conditional branch might look like
fetch operand one
fetch operand two
tagged compare
branch Lcond
code for non-tagged case (method lookup)
...
compare result of non-tagged compare against TRUE value
branch if equal Ltrue
compare result of non-tagged compare against FALSE value
branch if equal Lfalse
call notBooleanError
Lcond: branch on equal Ltrue
Lfalse:
If it skips the following two instructions then
fetch operand one
fetch operand two
tagged compare
branch if equal Ltrue
branch Lfalse
code for non-tagged case (method lookup)
...
compare result of non-tagged compare against TRUE value
branch if equal Ltrue
compare result of non-tagged compare against FALSE value
branch if equal Lfalse
call notBooleanError
which isn't much of a saving...
One could also make use of a tagged add/sub immediate as there's a high
dynamic frequency of var + 1 in most (Smalltalk) programs. The
immediate value would omit the tag pattern and be shifted by the tag
size to increase useful range.
--
_______________,,,^..^,,,____________________________
Eliot Miranda Smalltalk - Scene not herd |
|
| Back to top |
|
 |
John Mashey
Guest
|
Posted:
Thu Sep 15, 2005 8:15 am Post subject:
Re: interrupting for overflow and loop termination |
|
|
John Mashey wrote:
| Quote: | c) I spent some time with David Kay (of XEROX PARC/Smalltalk fame) on
this, i.e., were there features that would be substantially helpful?
|
Eliot Miranda correctly points out that I had to mean Alan Kay, not
David; sorry Alan. (David Kay's activities are in a rather different
domain). |
|
| Back to top |
|
 |
John Mashey
Guest
|
Posted:
Thu Sep 15, 2005 8:15 am Post subject:
Re: interrupting for overflow and loop termination |
|
|
Eliot Miranda wrote:
| Quote: | John Mashey wrote:
Longer answer:
a) They said either give them a complete, tailored solution [which they
didn't expect], or just make the CPU run fast, but don't bother with
minor enhancements. Some said they knew about the SPARC feature, but
didn't use it.
This would include Peter Deutsch and the design of HPS his 2nd dynamic
translation (JIT) VM.
|
Lots of good details deleted...
1) The suggestions would probably fit HP PA better than MIPs, as it has
extensive "annul-next-instruction" features.
2) My comment above was indeed a paraphrase of Peter's comments,
although somewhat similar thoughts came from others as well. |
|
| Back to top |
|
 |
Eliot Miranda
Guest
|
Posted:
Fri Sep 16, 2005 9:18 pm Post subject:
Re: interrupting for overflow and loop termination |
|
|
John Mashey wrote:
| Quote: | Eliot Miranda wrote:
John Mashey wrote:
Longer answer:
a) They said either give them a complete, tailored solution [which they
didn't expect], or just make the CPU run fast, but don't bother with
minor enhancements. Some said they knew about the SPARC feature, but
didn't use it.
This would include Peter Deutsch and the design of HPS his 2nd dynamic
translation (JIT) VM.
Lots of good details deleted...
1) The suggestions would probably fit HP PA better than MIPs, as it has
extensive "annul-next-instruction" features.
|
And I was being dense. Skipping is unnecessary. More natural would be
to set condition flags for overflow if the tags were wrong. In most
dynamic languages there will be a test for overflow following the
addition in cases where its not known if the tags are correct, so one
can fold the tag and overflow test together.
So the instructions would be tagged add/sub with overflow and compare
tagged with overflow. Overflow would be set if the tags didn't match
the tag spec or if the arithmetic overflowed.
Code sequences for polymorphic add/sub or compare would then look like
fetch operand one
fetch operand two
tagged add/sub
branch no overflow Ldone
code for non-tagged & overflow cases (method lookup)
...
Ldone:
--
_______________,,,^..^,,,____________________________
Eliot Miranda Smalltalk - Scene not herd |
|
| Back to top |
|
 |
John Mashey
Guest
|
Posted:
Fri Sep 16, 2005 9:48 pm Post subject:
Re: interrupting for overflow and loop termination |
|
|
Eliot Miranda wrote:
| Quote: | John Mashey wrote:
Code sequences for polymorphic add/sub or compare would then look like
fetch operand one
fetch operand two
tagged add/sub
branch no overflow Ldone
code for non-tagged & overflow cases (method lookup)
...
Ldone:
|
Just out of curiosity, what's known about the dynamic frequencies of
the tweo cases? |
|
| Back to top |
|
 |
Terje Mathisen
Guest
|
Posted:
Sat Sep 17, 2005 9:44 pm Post subject:
Re: interrupting for overflow and loop termination |
|
|
John Mashey wrote:
| Quote: | Eliot Miranda wrote:
John Mashey wrote:
Code sequences for polymorphic add/sub or compare would then look like
fetch operand one
fetch operand two
tagged add/sub
branch no overflow Ldone
code for non-tagged & overflow cases (method lookup)
...
Ldone:
Just out of curiosity, what's known about the dynamic frequencies of
the tweo cases?
|
It doesn't really matter:
The amount of code needed in the overflow case is so vastly larger than
for regular ("small") ints that it will always be a win to do it this way.
I.e. even if you only had 10% small ints, the savings possible by being
able to handle these with 2-3 inline instructions is enough to make it a
good idea.
In reality, due to the need for things like array and loop indices,
small adds will probably be a _lot_ more common than the bigint kind.
The x86 INTO style single-byte trap would be the best possible way to
handle it as long as you also had the fast user-level trap handler you
wished you had been able to fit into MIPS architecture:
You could compile code to be nearly as compact as for C ints, keeping
good locality etc, while still handling the bigint operations just a
little bit slower than if you called bigint library functions directly.
Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching" |
|
| Back to top |
|
 |
John Mashey
Guest
|
Posted:
Sun Sep 18, 2005 12:15 am Post subject:
Re: interrupting for overflow and loop termination |
|
|
Terje Mathisen wrote:
| Quote: | John Mashey wrote:
Eliot Miranda wrote:
John Mashey wrote:
Code sequences for polymorphic add/sub or compare would then look like
fetch operand one
fetch operand two
tagged add/sub
branch no overflow Ldone
code for non-tagged & overflow cases (method lookup)
...
Ldone:
Just out of curiosity, what's known about the dynamic frequencies of
the tweo cases?
It doesn't really matter:
|
Well, actually it does (in general, perhaps not in this case), which is
why I asked.
Statistical frequencies *always* matter, at least in the sense that
while it may not matter much if something happens 45 or 50% of the
time, it may matter whether it happens 10% or 1% or .1% or .0000001%.
For example, this whole thread started on overflow. Many vendors chose
to have condition codes and branch-on-overflow, which in fact meant
that the ~100% of executions that did not overflow either paid for
having a bunch of useless branch tests, or else left them out. A few
of us decided that trap-on-overflow was perfectly adequate, even if it
meant that the handling was more complex when it did happen.
Anyway, philosophically, I suspect that in this case you are right,
BUT:
a) Human intuition is notoriously bad about such things.
b) Given a choice of guessing about numbers or {measuring | asking
somebody who knows} I'll take the latter. Guessing is only for when
you just can't get the data in a reasonable cost or time.
c) This class of feature, and the chocie of the specific design turn
out to depend a great deal on:
- The existing ISA, if any, that one is adding onto. Specifically,
branch architectures differ quite a bit, and the approach for this
would likely differ.
- The feature design would also likely depend somewhat on the sorts of
pipeline implementations you expect, say roughly: single-issue,
multiple-issue in-order, and multiple-issue out-of order, and also, the
general approach to branch prediction.
So anyway, what do we actually *know* abotu the relative frequencies?
| Quote: | The amount of code needed in the overflow case is so vastly larger than
for regular ("small") ints that it will always be a win to do it this way.
I.e. even if you only had 10% small ints, the savings possible by being
able to handle these with 2-3 inline instructions is enough to make it a
good idea.
In reality, due to the need for things like array and loop indices,
small adds will probably be a _lot_ more common than the bigint kind.
The x86 INTO style single-byte trap would be the best possible way to
handle it as long as you also had the fast user-level trap handler you
wished you had been able to fit into MIPS architecture:
You could compile code to be nearly as compact as for C ints, keeping
good locality etc, while still handling the bigint operations just a
little bit slower than if you called bigint library functions directly.
Terje
--
- <Terje.Mathisen@hda.hydro.com
"almost all programming can be viewed as an exercise in caching" |
|
|
| Back to top |
|
 |
Eliot Miranda
Guest
|
Posted:
Sun Sep 18, 2005 8:15 am Post subject:
Re: interrupting for overflow and loop termination |
|
|
John Mashey wrote:
| Quote: | Eliot Miranda wrote:
John Mashey wrote:
Code sequences for polymorphic add/sub or compare would then look like
fetch operand one
fetch operand two
tagged add/sub
branch no overflow Ldone
code for non-tagged & overflow cases (method lookup)
...
Ldone:
Just out of curiosity, what's known about the dynamic frequencies of
the tweo cases?
|
I can collect accurate figures in a week (off for a quick vacation :) :)
). But my gut is that in ordinary codes the ratio approaches 100% to
0%. tagged/Small/immediate integers cover the vast array of cases
(iteration, most arithmetic). Only special codes (e.g. security) will
overflow into large integer arithmetic. This is in part confirmation
bias because as Terje points out the relative cost of tagged vs boxed
arithmetic is at least an order of magnitude worse and so the system is
written to avoid the cost as much as possible. But its mostly the fact
that in a symbolic language like Smalltalk the highest dynamic frequency
for integer math is likely to be for iteration.
--
_______________,,,^..^,,,____________________________
Eliot Miranda Smalltalk - Scene not herd |
|
| Back to top |
|
 |
Terje Mathisen
Guest
|
Posted:
Sun Sep 18, 2005 8:15 am Post subject:
Re: interrupting for overflow and loop termination |
|
|
Eliot Miranda wrote:
| Quote: |
John Mashey wrote:
Just out of curiosity, what's known about the dynamic frequencies of
the tweo cases?
I can collect accurate figures in a week (off for a quick vacation :) :)
|
Please do, and good for you! (Collecting figures and vacation).
| Quote: | ). But my gut is that in ordinary codes the ratio approaches 100% to
0%. tagged/Small/immediate integers cover the vast array of cases
(iteration, most arithmetic). Only special codes (e.g. security) will
overflow into large integer arithmetic. This is in part confirmation
bias because as Terje points out the relative cost of tagged vs boxed
arithmetic is at least an order of magnitude worse and so the system is
|
Actually, it might be even worse: Even if most bigint math happens to
stay within (say) four words of storage, and even if we disregard the
cost of the trap (because the alternative is to always call outline
functions directly), then you still have the call/return overhead, the
cost of comparing the lengths of the two operands, possibly the cost of
allocating the return value, and the actual cost of doing the multi-word
addition.
The small/immediate case will take just one or two cycles, three at the
very maximum, while it would be very easy to have two or even three
branch misses, each taking ~10 cycles, inside the bigint function.
| Quote: | written to avoid the cost as much as possible. But its mostly the fact
that in a symbolic language like Smalltalk the highest dynamic frequency
for integer math is likely to be for iteration.
|
Right, it is hard to imagine how it could be otherwise.
Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching" |
|
| Back to top |
|
 |
Terje Mathisen
Guest
|
Posted:
Sun Sep 18, 2005 4:15 pm Post subject:
Re: interrupting for overflow and loop termination |
|
|
John Mashey wrote:
| Quote: | Terje Mathisen wrote:
It doesn't really matter:
Well, actually it does (in general, perhaps not in this case), which is
|
Oh, absolutely.
| Quote: | why I asked.
Statistical frequencies *always* matter, at least in the sense that
while it may not matter much if something happens 45 or 50% of the
time, it may matter whether it happens 10% or 1% or .1% or .0000001%.
|
Right.
| Quote: | Anyway, philosophically, I suspect that in this case you are right,
BUT:
a) Human intuition is notoriously bad about such things.
|
Which is why one of the very first pieces of asm code I ever wrote was a
routine to interrogate the (keyboard) timer chip on a PC, allowing ~
microsecond resolution timings.
After 25+ years of performance-optimized programming, I still get
regularly surprised when the actual performance of some pice of code
doesn't conform to what I initially guessed it would be. :-)
The most serious one was a couple of years ago when I wrote asm code for
the inner loop of one of the EAS contenders, and didn't get even close
to (i.e within a factor of 1.5) the speedup I expected.
OTOH, after implementing all of it (together with a couple of other
guys), including the outer loop & other housekeeping stuff, the final
speedup was 3X the best C code, which was better than my initial estimate.
Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching" |
|
| Back to top |
|
 |
Terje Mathisen
Guest
|
Posted:
Sun Sep 18, 2005 4:15 pm Post subject:
Re: interrupting for overflow and loop termination |
|
|
glen herrmannsfeldt wrote:
| Quote: | Terje Mathisen wrote:
(snip)
The x86 INTO style single-byte trap would be the best possible way to
handle it as long as you also had the fast user-level trap handler you
wished you had been able to fit into MIPS architecture:
How much difference does it make at run time, though.
In the case of a conditional branch based on the overflow bit,
INTO trap instruction, or overflow trap without any added instructions,
nothing after that can be committed until the status bit is available.
I would assume that a branch on overflow would be predicted as not
taken.
Other than the cost of the actual instruction bits, which are pretty
variable over different architectures, would you expect any difference
at runtime for the three cases?
|
The only important difference is that INTO is effectively a
predicted-not-taken call, making it very easy to return to the correct
point after fixup.
To replace this with an inline branch instruction would require the
capability to save the next (i.e. return) address.
If your architecture can do this, then by all means use an inline
branch-and-link instead of the INTO style check!
The only real difference in this case would be that the branch opcode
probably takes 4 bytes, vs the single byte of INTO. However, as long as
you have fixed 16 or 32-bit instruction sizes anyway, it really doesn't
matter.
Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching" |
|
| Back to top |
|
 |
glen herrmannsfeldt
Guest
|
Posted:
Sun Sep 18, 2005 4:15 pm Post subject:
Re: interrupting for overflow and loop termination |
|
|
Terje Mathisen wrote:
(snip)
| Quote: | The x86 INTO style single-byte trap would be the best possible way to
handle it as long as you also had the fast user-level trap handler you
wished you had been able to fit into MIPS architecture:
|
How much difference does it make at run time, though.
In the case of a conditional branch based on the overflow bit,
INTO trap instruction, or overflow trap without any added instructions,
nothing after that can be committed until the status bit is available.
I would assume that a branch on overflow would be predicted as not
taken.
Other than the cost of the actual instruction bits, which are pretty
variable over different architectures, would you expect any difference
at runtime for the three cases?
-- glen |
|
| Back to top |
|
 |
glen herrmannsfeldt
Guest
|
Posted:
Sun Sep 18, 2005 10:02 pm Post subject:
Re: interrupting for overflow and loop termination |
|
|
Terje Mathisen wrote:
| Quote: | glen herrmannsfeldt wrote:
|
(snip)
| Quote: | In the case of a conditional branch based on the overflow bit,
INTO trap instruction, or overflow trap without any added instructions,
nothing after that can be committed until the status bit is available.
I would assume that a branch on overflow would be predicted
as not taken.
Other than the cost of the actual instruction bits, which are pretty
variable over different architectures, would you expect any difference
at runtime for the three cases?
The only important difference is that INTO is effectively a
predicted-not-taken call, making it very easy to return to the correct
point after fixup.
|
I was only considering the cost of the instruction, especially
in the case that it isn't taken.
| Quote: | To replace this with an inline branch instruction would require the
capability to save the next (i.e. return) address.
|
If you have only one in a loop you know where it is...
| Quote: | If your architecture can do this, then by all means use an inline
branch-and-link instead of the INTO style check!
|
But how much cost is there in not being able to retire the instruction
until the overflow status is known? Assuming the usual out of order,
but in order retire model.
| Quote: | The only real difference in this case would be that the branch opcode
probably takes 4 bytes, vs the single byte of INTO. However, as long as
you have fixed 16 or 32-bit instruction sizes anyway, it really doesn't
matter.
|
Imprecise interrupts were well hated in the days when they were
around, but consider the case where you really don't need to know
exactly where the interrupt is. Maybe special ADDIIO, ADD with
Imprecise Interrupt on Overflow instruction. You could use them in
a loop, and would know which iteration of the loop (put a barrier
instruction near the end of the loop), but not necessarily exactly
where.
Then again, maybe all you need is a sticky overflow bit. You could
do some set of calculations and test once at the end for any overflow,
and clear the sticky overflow bit at that time.
I don't see conditional interrupts any better than conditional
branches as far as pipelined out-of-order processors are concerned.
-- glen |
|
| Back to top |
|
 |
Terje Mathisen
Guest
|
Posted:
Mon Sep 19, 2005 12:08 am Post subject:
Re: interrupting for overflow and loop termination |
|
|
glen herrmannsfeldt wrote:
| Quote: | Terje Mathisen wrote:
To replace this with an inline branch instruction would require the
capability to save the next (i.e. return) address.
If you have only one in a loop you know where it is...
|
This would be a _very_ special case, since it would require a loop with
only a single int variable, or at least you must be able to prove that
only this int can ever overflow. In that case you can of course skip the
overflow testing.
| Quote: | If your architecture can do this, then by all means use an inline
branch-and-link instead of the INTO style check!
But how much cost is there in not being able to retire the instruction
until the overflow status is known? Assuming the usual out of order,
but in order retire model.
|
Not much: Processing can easily continue for at least one or two cycles,
forwaring the preliminary result to the next user, and as long as the
vast majority of these ints won't actually overflow, the delayed
retirement does not correspond to any actual branch misses.
| Quote: | Then again, maybe all you need is a sticky overflow bit. You could
do some set of calculations and test once at the end for any overflow,
and clear the sticky overflow bit at that time.
|
Nice idea. If overflows are really rare, then it would be a win to redo
the entire loop to save testing on every operation.
| Quote: |
I don't see conditional interrupts any better than conditional
branches as far as pipelined out-of-order processors are concerned.
|
Rather the opposite, since branch handling is optimized with serious hw
resources anyway.
Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching" |
|
| Back to top |
|
 |
Andy Freeman
Guest
|
Posted:
Mon Sep 19, 2005 4:15 pm Post subject:
Re: interrupting for overflow and loop termination |
|
|
Terje Mathisen wrote:
| Quote: | Eliot Miranda wrote:
highest dynamic frequency for integer math is likely to be for iteration.
Right, it is hard to imagine how it could be otherwise.
|
Most checks on iteration vars can be pulled out of the loop. In most
of the
others, it can be done every <large constant> iterations. |
|
| Back to top |
|
 |
|
|
|
|