Branch prediction using counters.
CASTalk.com Forum Index CASTalk.com
Discussion of DSP, FPGA, storage and embedded system.
 
 FAQFAQ   MemberlistMemberlist     RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 
 
Google
 
Web castalk.com
Branch prediction using counters.
Goto page Previous  1, 2, 3, 4  Next
 
Post new topic   Reply to topic    CASTalk.com Forum Index -> Computer Architecture
Author Message
Dan Koren
Guest





Posted: Tue Aug 09, 2005 12:16 am    Post subject: Re: Branch prediction using counters. Reply with quote

"Ben Bradley" <ben_nospam_bradley@frontiernet.net> wrote in message
news:jicff1hrkkkucio1ljkdfugtrm4sfkuvld@4ax.com...
Quote:
In comp.arch,sci.electronics.design, On Mon, 08 Aug 2005 16:38:23 GMT,
forbin@dev.nul (Colonel Forbin) wrote:

In article <dd756m$38b$1@news3.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:

Some time ago I already had this idea of inserting counters into if
statements to predict which branch is most likely to be executed. At the
lower levels this would mean inserting counters into jump instructions
and
recording which branch is executed the most by incrementing or maybe even
decrementing counters everytime the branch gets executed.

Actually this is sort of analogous to runtime profiling feedback to
compilers, which is used on a number of systems.

The problem is that it is often impossible to predict an event until
the outcome is observed (think quantum mechanics).

If we knew the outcome of each branch, we could simply eliminate the
branch in most cases and still have a reasonable approximation.

Inserting counters into production code typically comes at a great
price in terms of overhead.

We typically do this in development in order to optimize the most
costly sections of code for the most common outcomes, but trying
to put some sort of low level adaptive optimization metrics into
production code would consume more in overhead than the cost of
doing the task to begin with.

I recall reading of a processor that does pipelining through
branches (I don't remember which, but it might be a TI DSP) by using
several execution units. One executes the instruction(s) where the
branch is taken, the other executes the code where the branch is not
taken. When the branch is decided, the results from the corresponding
execution unit are taken and execution continues from there, and the
other results are 'thrown away' unused. This method takes a lot more
silicon space for the processor, but results in a speed increase that
apparently can't be done otherwise.



This is usually referred to as "speculative execution".



dk
Back to top
Dan Koren
Guest





Posted: Tue Aug 09, 2005 12:16 am    Post subject: Re: Branch prediction using counters. Reply with quote

"Skybuck Flying" <nospam@hotmail.com> wrote in message
news:dd89al$p17$1@news5.zwoll1.ov.home.nl...
Quote:

"Colonel Forbin" <forbin@dev.nul> wrote in message
news:3HNJe.52896$zY4.35348@tornado.ohiordc.rr.com...
In article <dd881i$dh1$1@news5.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:

Exactly motherfucker... so go spent all your time on discussing my
filthy
wortless ideas and then... write nice reports about them how much they
rule
or suck... meanwhile I have influenced the worrrrrrlldddddddd heheheh.

The fact of the matter is that I might be tooooooooooo lazzzzyyy to do
it
myself... so go ahead and use it, abuse it, test it, analyze it
etc......

But in the end I tooooooooooooooo will learn if it was usefull or not...
:PPPPPPPPPP******* hihihihihihihihihihihihihihihi

Bye,
Skybuck. ohhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
hehe.

I think I was wrong about the crystal meth. This guy needs to take his
Haldol. His probation officer is worried...

You speak in difficult medication terms... I need a manual...



You should try an automatic first.

Easier to control when one doesn't
know what one is doing.



dk
Back to top
Ken Smith
Guest





Posted: Tue Aug 09, 2005 1:18 am    Post subject: Re: Branch prediction using counters. Reply with quote

In article <dd7sjv$1vs$1@news1.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:
[... me ..]
Quote:
No, its is not common. For that matter I've never heard of the idea being
used. There are problems that the counter. It should not be part of the
instruction because that would mean that code space has to be written into
as the program runs. This will slow things down.

If it slows down the system then that would mean the bottleneck is in the
bandwidth/speed of the bus/whatever used to read/write instructions from/to
main memory.

I may not be the bus its self that is the limitting factor. The memory
device that holds the code may have a longer write time than read time.
This may be a physical limitation of the storage technology. To give an
extreme example, if the code is stored on a drum memory, this will reduce
the speed to one instruction per rotation.

Also, a counter is by its nature a slow operation because the highest bit
depends on all the bits below it. There are ways around this.


Quote:
How much memory bandwidth/speed is available to transfer instruction from/to
main memory where program code is located ?
Never enough.
;) and how much of that
bandwidth is generally used for application or maybe even games ? ;)
All of it.


Quote:
In other words... is bandwidth from main memory to cpu and back, truely the
limiting factor ? ;)

No, it is but one limiting factor among many.

BTW: If the code in question is a Booth's divide routine, the branch
prediction you suggest may actually slow the operation down instead of
speeding it up.

--
--
kensmith@rahul.net forging knowledge
Back to top
Ken Smith
Guest





Posted: Tue Aug 09, 2005 1:28 am    Post subject: Re: Branch prediction using counters. Reply with quote

In article <q1LJe.78310$5N3.61411@bgtnsc05-news.ops.worldnet.att.net>,
Stephen Fuld <s.fuld@PleaseRemove.att.net> wrote:
[...]
Quote:
You are mising an important point. There typically is no need to write to
the code space at all. Instruction caches have no write to memory
circuitry.

You are jumping to the wild assumption that the machine has caches at all.
The CPU could be tightly coupled to the system RAM. We could make life
even more interesting for the OP by making it one of those old bit-slicy
things where the code comes from a few dozen fast ROM chips and the data
is in static RAM.

Quote:
To write the counters to memory you would have to add an
additional path, and if you were going to write the updated value on every
branch, you are going to add a huge amount of memory traffic

We could always make a special code store with a counter for each location
built into it. That way the extra transfer would not be needed at the
mere cost of making it about 10 times as expensive.

[...]
Quote:
probably a small gain (hint, current branch predictors do pretty well
already).
... and CPUs without them are fairly fast anyway.


--
--
kensmith@rahul.net forging knowledge
Back to top
Ken Smith
Guest





Posted: Tue Aug 09, 2005 1:42 am    Post subject: Re: Branch prediction using counters. Reply with quote

In article <dfydncj3QpTKOmrfRVn-oA@comcast.com>,
Bob Monsen <rcsurname@comcast.net> wrote:
[...]
Quote:
GCC has a profiling mode that will allow one to capture this
information, for optimizing MIPS output (where it is quite useful, since
they have alternate opcodes depending on which branch is more likely).
As I recall (I've never used it) you run your code for a while, and then
get a dump of the counters, which you then feed back into GCC.

Profiling can be useful on nearly any modern processor if you can't see by
inspection which way the conditionals land. If you are hand tuning some
ASM code for the maximum speed, knowing which jumps are likely helps to
minimize the out of line jumps.

BTW: You can pulse a port bit high if you jump and another if you don't
and use a frequency counter to find the odds. This works well on
microcontrollers.

--
--
kensmith@rahul.net forging knowledge
Back to top
Bob Monsen
Guest





Posted: Tue Aug 09, 2005 7:12 am    Post subject: Re: Branch prediction using counters. Reply with quote

Ken Smith wrote:
Quote:
In article <dfydncj3QpTKOmrfRVn-oA@comcast.com>,
Bob Monsen <rcsurname@comcast.net> wrote:
[...]

GCC has a profiling mode that will allow one to capture this
information, for optimizing MIPS output (where it is quite useful, since
they have alternate opcodes depending on which branch is more likely).
As I recall (I've never used it) you run your code for a while, and then
get a dump of the counters, which you then feed back into GCC.


Profiling can be useful on nearly any modern processor if you can't see by
inspection which way the conditionals land. If you are hand tuning some
ASM code for the maximum speed, knowing which jumps are likely helps to
minimize the out of line jumps.

BTW: You can pulse a port bit high if you jump and another if you don't
and use a frequency counter to find the odds. This works well on
microcontrollers.


When I am coding assembler, I find it's almost always obvious which way
is more likely. Unfortunately, compilers don't usually understand what
you are trying to do all that well, and thus often get it wrong. The
profile to feedback scheme might be useful because the compiler can then
do the proper optimization by itself. However, any time I care that
much, I profile the code and rewrite any expensive chunks in assembler.

--
Regards,
Bob Monsen

If a little knowledge is dangerous, where is the man who has
so much as to be out of danger?
Thomas Henry Huxley, 1877
Back to top
Skybuck Flying
Guest





Posted: Tue Aug 09, 2005 8:15 am    Post subject: Re: Branch prediction using counters. Reply with quote

"Ken Smith" <kensmith@green.rahul.net> wrote in message
news:dd909m$ru3$2@blue.rahul.net...
Quote:
In article <dd7sjv$1vs$1@news1.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:
[... me ..]
No, its is not common. For that matter I've never heard of the idea
being
used. There are problems that the counter. It should not be part of
the
instruction because that would mean that code space has to be written
into
as the program runs. This will slow things down.

If it slows down the system then that would mean the bottleneck is in the
bandwidth/speed of the bus/whatever used to read/write instructions
from/to
main memory.

I may not be the bus its self that is the limitting factor. The memory
device that holds the code may have a longer write time than read time.
This may be a physical limitation of the storage technology. To give an
extreme example, if the code is stored on a drum memory, this will reduce
the speed to one instruction per rotation.

Also, a counter is by its nature a slow operation because the highest bit
depends on all the bits below it. There are ways around this.


How much memory bandwidth/speed is available to transfer instruction
from/to
main memory where program code is located ?
Never enough.
;) and how much of that
bandwidth is generally used for application or maybe even games ? ;)
All of it.

In other words... is bandwidth from main memory to cpu and back, truely
the
limiting factor ? ;)

No, it is but one limiting factor among many.

BTW: If the code in question is a Booth's divide routine, the branch
prediction you suggest may actually slow the operation down instead of
speeding it up.

For god's sakes what does booth's algorithm have to do with jump
instructions.

http://jingwei.eng.hmc.edu/~rwang/e85/lectures/arithmetic_html/node10.html

See ya.
Back to top
Bob Monsen
Guest





Posted: Tue Aug 09, 2005 8:15 am    Post subject: Re: Branch prediction using counters. Reply with quote

glen herrmannsfeldt wrote:
Quote:
Bob Monsen wrote:
(snip)

When I am coding assembler, I find it's almost always obvious which
way is more likely. Unfortunately, compilers don't usually understand
what you are trying to do all that well, and thus often get it wrong.
The profile to feedback scheme might be useful because the compiler
can then do the proper optimization by itself. However, any time I
care that much, I profile the code and rewrite any expensive chunks in
assembler.


It isn't always so obvious in assembler.

The ESA/390 instructions BXH and BXLE, branch on index high, and
branch on index low or equal, have opposite branch prediction logic.

There was a post once from someone who had coded a loop with both,
and found very different execution times. With the suggestion that
it might be branch prediction, the two were coded with the opposite
branch logic, and the timing difference was reversed, as would be
expected from branch prediction.

It is, though, not so obvious to an assembly programmer which
way the branch prediction logic is set up.

-- glen


Actually, what I meant is that it is usually obvious to an assembly
coder which path through a branch is usually taken. Note that the MIPS
architecture has different opcodes for both cases; for example, it has a
BLTZAL (Branch on Less Than Zero And Link) and BLTZALL (Branch On Less
Than Zero And Link Likely), which do exactly the same thing, but the
second one is presumably somewhat faster if the branch is more likely.

I guess they took your example to heart when designing the instruction set.

--
Regards,
Bob Monsen

If a little knowledge is dangerous, where is the man who has
so much as to be out of danger?
Thomas Henry Huxley, 1877
Back to top
glen herrmannsfeldt
Guest





Posted: Tue Aug 09, 2005 8:15 am    Post subject: Re: Branch prediction using counters. Reply with quote

Bob Monsen wrote:
(snip)

Quote:
When I am coding assembler, I find it's almost always obvious which way
is more likely. Unfortunately, compilers don't usually understand what
you are trying to do all that well, and thus often get it wrong. The
profile to feedback scheme might be useful because the compiler can then
do the proper optimization by itself. However, any time I care that
much, I profile the code and rewrite any expensive chunks in assembler.

It isn't always so obvious in assembler.

The ESA/390 instructions BXH and BXLE, branch on index high, and
branch on index low or equal, have opposite branch prediction logic.

There was a post once from someone who had coded a loop with both,
and found very different execution times. With the suggestion that
it might be branch prediction, the two were coded with the opposite
branch logic, and the timing difference was reversed, as would be
expected from branch prediction.

It is, though, not so obvious to an assembly programmer which
way the branch prediction logic is set up.

-- glen
Back to top
Ken Smith
Guest





Posted: Tue Aug 09, 2005 2:01 pm    Post subject: Re: Branch prediction using counters. Reply with quote

In article <dd9fvt$s9b$1@news5.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:
[...]
Quote:
For god's sakes what does booth's algorithm have to do with jump
instructions.

Take a look at what is suggest be done and you should be able to see that,
when coded for speed, other than the loop jumps, the jumps in the first
few passes through the code are worse than useless for predicting the
jumps later.

For the loop instructions, the branch condition is known well in advance.
In the fastest RISC machines, this allows the loop jumps to happen with no
lost cycles.

--
--
kensmith@rahul.net forging knowledge
Back to top
Ken Smith
Guest





Posted: Tue Aug 09, 2005 2:18 pm    Post subject: Re: Branch prediction using counters. Reply with quote

In article <ddad91$ei4$1@gemini.csx.cam.ac.uk>,
Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
[... me ...]
Quote:
|> Take a look at what is suggest be done and you should be able to see that,
|> when coded for speed, other than the loop jumps, the jumps in the first
|> few passes through the code are worse than useless for predicting the
|> jumps later.

The same applies to much of the code in many graph theoretic
algorithms (VERY common and often dominating the time in some
fields) and the "object oriented" programming paradigm favoured
by many C++ and GUI programmers. In the latter case, the correct
key for prediction is the contents of a register and not what
the branch did last time.

Yes, there are lots of other examples, but I think the Booth's case is the
simplest real case to see the problem with.

BTW: I disagree with the suggestion that jumps in OO programs are by
there nature harder to predict. Virtual method dispatching is a hard type
of branch to predict but in non-OO code, the virtual method call would be
replaced with a long switch statement. In good compilers, long switches
are implemented with jump tables. "goto table[i]" is very hard to
predict.

--
--
kensmith@rahul.net forging knowledge
Back to top
Andy Freeman
Guest





Posted: Tue Aug 09, 2005 4:15 pm    Post subject: Re: Branch prediction using counters. Reply with quote

Nick Maclaren wrote:
Quote:
What is needed is a compiler-generated instruction that sets up a
datum to be used for branch prediction, which is then quoted in
the branch, and used by the hardware. The instruction history
alone is useless for such codes, but the object address is
(obviously) excellent. This has the advantage that information
from a previous branch in the same function could be used.
....
As far as I know, I have rendered this unpatentable by posting :-)

US Patent 5,949,995.

You can use an instruction that references the branch or its prediction
instead of a branch that references the prediction datum. The
advantage of the former is that it doesn't require any additional
information in branch instructions and the prediction instructions are
a lot like stores.

-andy
Back to top
Keith Williams
Guest





Posted: Tue Aug 09, 2005 4:16 pm    Post subject: Re: Branch prediction using counters. Reply with quote

In article <42f7e3eb$1@news.meer.net>, dankoren@yahoo.com says...
Quote:
"Ben Bradley" <ben_nospam_bradley@frontiernet.net> wrote in message
news:jicff1hrkkkucio1ljkdfugtrm4sfkuvld@4ax.com...
In comp.arch,sci.electronics.design, On Mon, 08 Aug 2005 16:38:23 GMT,
forbin@dev.nul (Colonel Forbin) wrote:

In article <dd756m$38b$1@news3.zwoll1.ov.home.nl>,
Skybuck Flying <nospam@hotmail.com> wrote:

Some time ago I already had this idea of inserting counters into if
statements to predict which branch is most likely to be executed. At the
lower levels this would mean inserting counters into jump instructions
and
recording which branch is executed the most by incrementing or maybe even
decrementing counters everytime the branch gets executed.

Actually this is sort of analogous to runtime profiling feedback to
compilers, which is used on a number of systems.

The problem is that it is often impossible to predict an event until
the outcome is observed (think quantum mechanics).

If we knew the outcome of each branch, we could simply eliminate the
branch in most cases and still have a reasonable approximation.

Inserting counters into production code typically comes at a great
price in terms of overhead.

We typically do this in development in order to optimize the most
costly sections of code for the most common outcomes, but trying
to put some sort of low level adaptive optimization metrics into
production code would consume more in overhead than the cost of
doing the task to begin with.

I recall reading of a processor that does pipelining through
branches (I don't remember which, but it might be a TI DSP) by using
several execution units. One executes the instruction(s) where the
branch is taken, the other executes the code where the branch is not
taken. When the branch is decided, the results from the corresponding
execution unit are taken and execution continues from there, and the
other results are 'thrown away' unused. This method takes a lot more
silicon space for the processor, but results in a speed increase that
apparently can't be done otherwise.



This is usually referred to as "speculative execution".

Yes, but speculative execution doesn't usually include following both

paths. Going both ways after branches gets ugly on the second and
subsequent branches. ;-). The PPC-970 can have ~200 instructions in
flight (half in the prefetch/decode stages and half between issue and
completion). To keep the pipe full, it will speculatively fetch/execute
up to sixteen branches deep, but will speculatively execute only one
path of the 65K possible. There are optimization rules for branches
(branches backwards are usually taken, forwards usually not, etc.) and
the programmer can drop hints to override these defaults.

--
Keith
Back to top
Nick Maclaren
Guest





Posted: Tue Aug 09, 2005 4:16 pm    Post subject: Re: Branch prediction using counters. Reply with quote

In article <MPG.1d627c16fde3672b989b65@news.individual.net>,
Keith Williams <krw@att.bizzzz> writes:
|>
|> Yes, but speculative execution doesn't usually include following both
|> paths. Going both ways after branches gets ugly on the second and
|> subsequent branches. ;-). The PPC-970 can have ~200 instructions in
|> flight (half in the prefetch/decode stages and half between issue and
|> completion). To keep the pipe full, it will speculatively fetch/execute
|> up to sixteen branches deep, but will speculatively execute only one
|> path of the 65K possible. There are optimization rules for branches
|> (branches backwards are usually taken, forwards usually not, etc.) and
|> the programmer can drop hints to override these defaults.

Which is fine for the simplest codes, but useless for ones which
are hard to predict. An old rule is that 20% of instructions
are branches, but that might drop to 10% in some ISAs. Anyway,
unless you can predict close to 95% correctly, speculatively
executing 16 branches ahead isn't going to help.

The thing that CAN be done both ways is cache preloading, though
it can increase the memory bandwidth unacceptably. Something
that could be done is a priority based cache preloader, fed with
data from the speculative execution. But even that doesn't help
all that much for difficult codes.


Regards,
Nick Maclaren.
Back to top
Nick Maclaren
Guest





Posted: Tue Aug 09, 2005 4:16 pm    Post subject: Re: Branch prediction using counters. Reply with quote

In article <ddad0d$g56$2@blue.rahul.net>, kensmith@green.rahul.net (Ken Smith) writes:
|> In article <dd9fvt$s9b$1@news5.zwoll1.ov.home.nl>,
|> Skybuck Flying <nospam@hotmail.com> wrote:
|> [...]
|> >For god's sakes what does booth's algorithm have to do with jump
|> >instructions.
|>
|> Take a look at what is suggest be done and you should be able to see that,
|> when coded for speed, other than the loop jumps, the jumps in the first
|> few passes through the code are worse than useless for predicting the
|> jumps later.

The same applies to much of the code in many graph theoretic
algorithms (VERY common and often dominating the time in some
fields) and the "object oriented" programming paradigm favoured
by many C++ and GUI programmers. In the latter case, the correct
key for prediction is the contents of a register and not what
the branch did last time.


Regards,
Nick Maclaren.
Back to top
 
Post new topic   Reply to topic    CASTalk.com Forum Index -> Computer Architecture All times are GMT
Goto page Previous  1, 2, 3, 4  Next
Page 2 of 4

 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum




VoIP Electronics Powered by phpBB