| Author |
Message |
Nick Maclaren
Guest
|
Posted:
Tue Aug 09, 2005 4:16 pm Post subject:
Re: Branch prediction using counters. |
|
|
In article <ddadvn$g56$3@blue.rahul.net>,
kensmith@green.rahul.net (Ken Smith) writes:
|>
|> Yes, there are lots of other examples, but I think the Booth's case is the
|> simplest real case to see the problem with.
It does, however, give the impression that the problem applies only
to very esoteric codes. It doesn't.
|> BTW: I disagree with the suggestion that jumps in OO programs are by
|> there nature harder to predict. Virtual method dispatching is a hard type
|> of branch to predict but in non-OO code, the virtual method call would be
|> replaced with a long switch statement. In good compilers, long switches
|> are implemented with jump tables. "goto table[i]" is very hard to
|> predict.
Neither is hard to predict, if you do it right. Few, if any, modern
systems do.
What is needed is a compiler-generated instruction that sets up a
datum to be used for branch prediction, which is then quoted in
the branch, and used by the hardware. The instruction history
alone is useless for such codes, but the object address is
(obviously) excellent. This has the advantage that information
from a previous branch in the same function could be used.
For example, consider the following instructions:
Set predictor. It takes up to 2 registers, and sets one of a
few (4-16) predictors based on their contents (in an unspecified
fashion).
Predict branch. It takes a predictor and also specifies a
preferred direction, a resulting direction or none. It must
immediately precede a branch.
This would allow a compiler to take the following:
IF a > b THEN ... FI;
. . .
IF b < a THEN ... FI;
and issue:
Set_predictor 1 a b
. . .
Predict_branch 1 +1 [ indicating a resulting direction ]
. . .
Predict_branch 1 -2 [ indicating a preferred direction ]
I have little idea how effective this would be, but my gut feeling
is that it would be better than correct approaches for such codes.
As far as I know, I have rendered this unpatentable by posting :-)
Regards,
Nick Maclaren. |
|
| Back to top |
|
 |
John Savard
Guest
|
Posted:
Tue Aug 09, 2005 4:16 pm Post subject:
Re: Branch prediction using counters. |
|
|
On Mon, 8 Aug 2005 10:32:18 +0200, "Skybuck Flying" <nospam@hotmail.com>
wrote, in part:
| Quote: | 1. Is it common to use counters as described above to do branch prediction
or is this something novel ? ;)
2. Suppose it's not novel... then why only use 1 bit to do branch prediction
?
|
Actually, it's standard practice now to use a counter. But the counter
is inside the chip, not in RAM or in higher-level language. It is
usually a saturating two-bit wide counter.
And usually the counters are used in conjunction with a branch history
table (not to be confused with butylated hydroxytoluene).
John Savard
http://www.quadibloc.com/index.html
_________________________________________
Usenet Zone Free Binaries Usenet Server
More than 120,000 groups
Unlimited download
http://www.usenetzone.com to open account |
|
| Back to top |
|
 |
Peter \"Firefly\" Lund
Guest
|
Posted:
Tue Aug 09, 2005 4:16 pm Post subject:
Re: Branch prediction using counters. |
|
|
On Tue, 9 Aug 2005, Ken Smith wrote:
| Quote: | there nature harder to predict. Virtual method dispatching is a hard type
of branch to predict but in non-OO code, the virtual method call would be
|
No, virtual methods are not that expensive (*). It turns out that each
call site usually have very few targets at a time, often only one.
You can use the call site address to predict which of a number of saved
target addresses (or even saved target instructions) to go to.
You can also use software methods to achieve the same thing: your VM (or
whatever) arranges a compare instruction that checks that the target is
what you think it is + a conditional jump + a call with a hardcoded target
address. If the target is different from what you thought, then the
conditional jump will jump to code that handles the new target. Usually,
the jump will be predicted to be a fall-through so this is a cheap way of
handling typical OO calls.
*) It depends on what resources the CPU designers had available and what
kinds of code it was deemed necessary to make fast. The software method
is nice in that it only requires simple branch prediction (which is
relatively cheap) but it can be hard to make your
compiler/linker/loader/VM work that way. The hardware method requires
more resources for caching addresses/instructions but it will work with
more naive toolchains/languages.
-Peter |
|
| Back to top |
|
 |
Nick Maclaren
Guest
|
Posted:
Tue Aug 09, 2005 4:16 pm Post subject:
Re: Branch prediction using counters. |
|
|
In article <Pine.LNX.4.61.0508091729380.9761@tyr.diku.dk>,
"Peter \"Firefly\" Lund" <firefly@diku.dk> writes:
|> On Tue, 9 Aug 2005, Ken Smith wrote:
|>
|> > there nature harder to predict. Virtual method dispatching is a hard type
|> > of branch to predict but in non-OO code, the virtual method call would be
|>
|> No, virtual methods are not that expensive (*). It turns out that each
|> call site usually have very few targets at a time, often only one.
|> You can use the call site address to predict which of a number of saved
|> target addresses (or even saved target instructions) to go to.
Oh, really? That doesn't apply to non-trivial GUI codes.
A very large number of component widgets are very complex, with
a lot of object-dependent branching internally, and are used in
very different ways from higher-level widgets. Now, this is NOT
how to do it, but it is the programming paradigm that is most
common in that area.
|> You can also use software methods to achieve the same thing: your VM (or
|> whatever) arranges a compare instruction that checks that the target is
|> what you think it is + a conditional jump + a call with a hardcoded target
|> address. If the target is different from what you thought, then the
|> conditional jump will jump to code that handles the new target. Usually,
|> the jump will be predicted to be a fall-through so this is a cheap way of
|> handling typical OO calls.
Regrettably not, in the case I mention above. Each component
widget may have a dozen major attributes, which interact in
truly horrible ways. That would increase the code size by
a large factor.
I agree that nobody in his right mind would ever design code
like that, but that isn't the point. That is how it has been
designed.
Regards,
Nick Maclaren. |
|
| Back to top |
|
 |
Tom Linden
Guest
|
Posted:
Tue Aug 09, 2005 4:16 pm Post subject:
Re: Branch prediction using counters. |
|
|
On 9 Aug 2005 14:41:45 GMT, Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
| Quote: | This would allow a compiler to take the following:
IF a > b THEN ... FI;
. . .
IF b < a THEN ... FI;
and issue:
Set_predictor 1 a b
. . .
Predict_branch 1 +1 [ indicating a resulting direction ]
. . .
Predict_branch 1 -2 [ indicating a preferred direction ]
I have little idea how effective this would be, but my gut feeling
is that it would be better than correct approaches for such codes.
As far as I know, I have rendered this unpatentable by posting
|
as an aside, the Regulus architecture which was the follow on to the Power
6
from Computer Consoles had a cute feature. If the value of a < b could be
determined in advance a bit was set and when execution reached that point,
only the appropriate branch made it to the pipeline. Multiflow tried
executing
in parallel, selecting the right branch when the truth value was known,
but I think
it became a bit unwieldy at that time (ca. 1984) |
|
| Back to top |
|
 |
Nick Maclaren
Guest
|
Posted:
Tue Aug 09, 2005 9:23 pm Post subject:
Re: Branch prediction using counters. |
|
|
In article <1123603712.829819.88980@g43g2000cwa.googlegroups.com>,
"Andy Freeman" <anamax@earthlink.net> writes:
|> Nick Maclaren wrote:
|> > What is needed is a compiler-generated instruction that sets up a
|> > datum to be used for branch prediction, which is then quoted in
|> > the branch, and used by the hardware. The instruction history
|> > alone is useless for such codes, but the object address is
|> > (obviously) excellent. This has the advantage that information
|> > from a previous branch in the same function could be used.
|> ...
|> > As far as I know, I have rendered this unpatentable by posting :-)
|>
|> US Patent 5,949,995.
Thanks. I think that my posting includes an aspect that isn't
in that, but I can't be bothered to check. It certainly covers
the basic principle.
I did a search on Google groups to see if I had posted the
method before the patent, but it seems not. It is pretty obvious,
in my view.
Regards,
Nick Maclaren. |
|
| Back to top |
|
 |
Ken Smith
Guest
|
Posted:
Wed Aug 10, 2005 12:24 am Post subject:
Re: Branch prediction using counters. |
|
|
In article <ddafb9$ir7$1@gemini.csx.cam.ac.uk>,
Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
| Quote: |
In article <ddadvn$g56$3@blue.rahul.net>,
kensmith@green.rahul.net (Ken Smith) writes:
|
|> Yes, there are lots of other examples, but I think the Booth's case is the
|> simplest real case to see the problem with.
It does, however, give the impression that the problem applies only
to very esoteric codes. It doesn't.
|
Are you suggesting that Booth's divide is "esoteric"? It seems to be a
very common bit of code.
| Quote: | |> BTW: I disagree with the suggestion that jumps in OO programs are by
|> there nature harder to predict. Virtual method dispatching is a hard type
|> of branch to predict but in non-OO code, the virtual method call would be
|> replaced with a long switch statement. In good compilers, long switches
|> are implemented with jump tables. "goto table[i]" is very hard to
|> predict.
[...] |
| Quote: | What is needed is a compiler-generated instruction that sets up a
datum to be used for branch prediction, which is then quoted in
the branch, and used by the hardware.
|
This could sort of work but it is adding an instruction in the process of
all branches. The added instruction time will eat away at any advantage
gained.
As discribed, it also would only work if the same routine is called for
the same object. You would be better off feeding the address of the
Virtual Method Table into this added instruction. In most C++ programs,
all objects of a given type share the same VMT and hence any use of a
object of a given type would give you the information needed to predict
jumps.
Remember that some of the RISC and DSP processors duck the problem by
letting some small number of instructions after the branch happen before
the branch takes effect. This only adds instructions to the code in cases
were nothing can be done before the branch happens. I think this is a
much better way to go if you want a lot of speed per transistor.
| Quote: | The instruction history
alone is useless for such codes, but the object address is
(obviously) excellent.
|
No, the object's address is not obviously "excellent", at least to me.
Please explain what you mean.
[..long description removed..]
| Quote: | I have little idea how effective this would be, but my gut feeling
is that it would be better than correct approaches for such codes.
As far as I know, I have rendered this unpatentable by posting :-)
|
Maybe they'll name it after you. You'll be famous!
--
--
kensmith@rahul.net forging knowledge |
|
| Back to top |
|
 |
Ken Smith
Guest
|
Posted:
Wed Aug 10, 2005 12:35 am Post subject:
Re: Branch prediction using counters. |
|
|
In article <Pine.LNX.4.61.0508091729380.9761@tyr.diku.dk>,
Peter \"Firefly\" Lund <firefly@diku.dk> wrote:
| Quote: | On Tue, 9 Aug 2005, Ken Smith wrote:
there nature harder to predict. Virtual method dispatching is a hard type
of branch to predict but in non-OO code, the virtual method call would be
No, virtual methods are not that expensive (*). It turns out that each
call site usually have very few targets at a time, often only one.
|
Thinking of some OO code I've done:
The typical case is that a given call could dispatch the, lets say, "edit"
method for some 25 different types. Do you consider 25 a "small" number.
BTW: There are quite a few cases over 25 in the code I was thinking of.
| Quote: |
You can use the call site address to predict which of a number of saved
target addresses (or even saved target instructions) to go to.
|
Are you intending to save this information on the CPU chip? If so,
couldn't the same number of transistors be put to much better use?
| Quote: | You can also use software methods to achieve the same thing: your VM (or
whatever) arranges a compare instruction that checks that the target is
what you think it is + a conditional jump + a call with a hardcoded target
address. If the target is different from what you thought, then the
conditional jump will jump to code that handles the new target. Usually,
the jump will be predicted to be a fall-through so this is a cheap way of
handling typical OO calls.
|
No, this is not a cheap way of doing VMTs. The Virtual Method Table, is
the fast way of doing it. There is no point in optimizing a slower
method if you want speed. The only case where your method would be faster
is if the majority of times you come to the call the same object type is
in use. I can't think of a program I've written were that was true.
--
--
kensmith@rahul.net forging knowledge |
|
| Back to top |
|
 |
Andrew Reilly
Guest
|
Posted:
Wed Aug 10, 2005 6:52 am Post subject:
Re: Branch prediction using counters. |
|
|
On Wed, 10 Aug 2005 00:24:09 +0000, Ken Smith wrote:
| Quote: | Remember that some of the RISC and DSP processors duck the problem by
letting some small number of instructions after the branch happen before
the branch takes effect. This only adds instructions to the code in cases
were nothing can be done before the branch happens. I think this is a
much better way to go if you want a lot of speed per transistor.
|
Up to 40 instructions, in the case of the TI C6000 series: 5 instruction
issue slots in the branch shadow, up to eight instructions per slot.
Makes loop pre-amble and termination of otherwise single-cycle loops
amusing, but do-able. There are various multi-cycle NOP instructions to
fiddle the other cases without taking up too much code space.
As well as good speed/transistor, this does help with predictability too.
Cheers,
--
Andrew |
|
| Back to top |
|
 |
Paul Hovnanian P.E.
Guest
|
Posted:
Wed Aug 10, 2005 8:15 am Post subject:
Re: Branch prediction using counters. |
|
|
This sort of thing is done at higher levels in various search
algorithms. Far more advanced techniques than just counters are used.
Building this sort of thing into processor firmware or general purpose
library routines might impose an inordinate burden on execution time if
it was employed for every branch. For cases where the branch execution
probabilities don't change much, just profiling the code and adjusting
the design will get you just as much gain. In the few cases where a
search can be optimized, the designer should be able to apply the
correct heuristics based on knowledge of the problem domain.
--
Paul Hovnanian mailto:Paul@Hovnanian.com
------------------------------------------------------------------
If you can't beat them, arrange to have them beaten.
-- George Carlin |
|
| Back to top |
|
 |
Nick Maclaren
Guest
|
Posted:
Wed Aug 10, 2005 8:15 am Post subject:
Re: Branch prediction using counters. |
|
|
In article <ddbhf9$g1l$4@blue.rahul.net>,
Ken Smith <kensmith@green.rahul.net> wrote:
| Quote: | |
|> Yes, there are lots of other examples, but I think the Booth's case is the
|> simplest real case to see the problem with.
It does, however, give the impression that the problem applies only
to very esoteric codes. It doesn't.
Are you suggesting that Booth's divide is "esoteric"? It seems to be a
very common bit of code.
|
The mind boggles! I have never seen it used, but I can believe that
it is. The most usual software divide is Newton-Raphson. Whatever
the case, only a few systems lack hardware division for the standard
types, very few applications use it for non-standard types, and the
proportion of time spent in such code is miniscule!
Regards,
Nick Maclaren. |
|
| Back to top |
|
 |
Andy Freeman
Guest
|
Posted:
Wed Aug 10, 2005 4:15 pm Post subject:
Re: Branch prediction using counters. |
|
|
Ken Smith wrote:
| Quote: | In article <ddafb9$ir7$1@gemini.csx.cam.ac.uk>,
Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
What is needed is a compiler-generated instruction that sets up a
datum to be used for branch prediction, which is then quoted in
the branch, and used by the hardware.
This could sort of work but it is adding an instruction in the process of
all branches. The added instruction time will eat away at any advantage
gained.
|
You don't have to do it for all branches, just the ones where doing so
is a good idea. (That's one argument against "quoting" an explicit
prediction datum in the branch instruction.)
The cost of a branch mispredict varies with implementation, but 10
cycles is not unknown and more than 5 cycles is reasonably common in
the high-performance world. Since each cycle can be 3-5 instructions,
using a couple of instructions to avoid a mispredict is usually a good
trade (as long as they don't stall).
| Quote: | Remember that some of the RISC and DSP processors duck the problem by
letting some small number of instructions after the branch happen before
the branch takes effect. This only adds instructions to the code in cases
were nothing can be done before the branch happens. I think this is a
much better way to go if you want a lot of speed per transistor.
|
A small number of instructions isn't enough and it's usually very hard
to find more than 1-2 (on average). You need to find 20-50 if you're
actually trying to go fast. (Yes, you can pick an implementation where
1-2 is enough, but those implementations are slower in general.)
| Quote: | The instruction history
alone is useless for such codes, but the object address is
(obviously) excellent.
No, the object's address is not obviously "excellent", at least to me.
Please explain what you mean.
|
The object at a given address tends to have the same type for "a
while".
-andy |
|
| Back to top |
|
 |
Skybuck Flying
Guest
|
Posted:
Wed Aug 10, 2005 4:15 pm Post subject:
Re: Branch prediction using counters. |
|
|
"Nick Maclaren" <nmm1@cus.cam.ac.uk> wrote in message
news:ddafb9$ir7$1@gemini.csx.cam.ac.uk...
| Quote: |
In article <ddadvn$g56$3@blue.rahul.net>,
kensmith@green.rahul.net (Ken Smith) writes:
|
|> Yes, there are lots of other examples, but I think the Booth's case is
the
|> simplest real case to see the problem with.
It does, however, give the impression that the problem applies only
to very esoteric codes. It doesn't.
|> BTW: I disagree with the suggestion that jumps in OO programs are by
|> there nature harder to predict. Virtual method dispatching is a hard
type
|> of branch to predict but in non-OO code, the virtual method call would
be
|> replaced with a long switch statement. In good compilers, long
switches
|> are implemented with jump tables. "goto table[i]" is very hard to
|> predict.
Neither is hard to predict, if you do it right. Few, if any, modern
systems do.
What is needed is a compiler-generated instruction that sets up a
datum to be used for branch prediction, which is then quoted in
the branch, and used by the hardware. The instruction history
alone is useless for such codes, but the object address is
(obviously) excellent. This has the advantage that information
from a previous branch in the same function could be used.
For example, consider the following instructions:
Set predictor. It takes up to 2 registers, and sets one of a
few (4-16) predictors based on their contents (in an unspecified
fashion).
Predict branch. It takes a predictor and also specifies a
preferred direction, a resulting direction or none. It must
immediately precede a branch.
This would allow a compiler to take the following:
IF a > b THEN ... FI;
. . .
IF b < a THEN ... FI;
and issue:
Set_predictor 1 a b
. . .
Predict_branch 1 +1 [ indicating a resulting direction ]
. . .
Predict_branch 1 -2 [ indicating a preferred direction ]
I have little idea how effective this would be, but my gut feeling
is that it would be better than correct approaches for such codes.
As far as I know, I have rendered this unpatentable by posting :-)
|
Dude if this stuff is encoded into the jump instruction then it falls under
my description of inserting counters into jump instructions and therefore I
should still be able to patent it and not you :P
Bye,
Skybuck :) |
|
| Back to top |
|
 |
Skybuck Flying
Guest
|
Posted:
Wed Aug 10, 2005 4:15 pm Post subject:
Re: Branch prediction using counters. |
|
|
"Paul Hovnanian P.E." <Paul@Hovnanian.com> wrote in message
news:42F97744.959EEC35@Hovnanian.com...
| Quote: | This sort of thing is done at higher levels in various search
algorithms. Far more advanced techniques than just counters are used.
Building this sort of thing into processor firmware or general purpose
library routines might impose an inordinate burden on execution time if
|
Bullshit... the cpu has extra gates and logic and memory (internal) to
handle it only thing required is more bandwidth for the extra counter data
:)
| Quote: | it was employed for every branch. For cases where the branch execution
probabilities don't change much, just profiling the code and adjusting
the design will get you just as much gain. In the few cases where a
search can be optimized, the designer should be able to apply the
correct heuristics based on knowledge of the problem domain.
|
Again the programmer makes the decision. If he is wrong or usage pattern
changes these decision are no longer valid :D
Have a nice day =D
Bye,
Skybuck. |
|
| Back to top |
|
 |
Nick Maclaren
Guest
|
Posted:
Wed Aug 10, 2005 4:15 pm Post subject:
Re: Branch prediction using counters. |
|
|
In article <ddd6ou$6q0$1@news3.zwoll1.ov.home.nl>,
"Skybuck Flying" <nospam@hotmail.com> writes:
|>
|> Dude if this stuff is encoded into the jump instruction then it falls under
|> my description of inserting counters into jump instructions and therefore I
|> should still be able to patent it and not you :P
Look, Bubba, no counters! Try rereading what I posted.
Regards,
Nick Maclaren. |
|
| Back to top |
|
 |
|
|
|
|