| Author |
Message |
Vikas Mishra
Guest
|
Posted:
Tue Aug 16, 2005 8:15 am Post subject:
Non Power of 2 Cache Sizes |
|
|
Hi Folks,
We were having a discussion today about having a "non power of 2
instruction cache". Conceptually I would think that the only "hard"
reason for having a power of 2 is that the line index calculation is
trivial and for a non power of 2 it would be some complicated
calculation like modulus with 40K (for a size of say 40KB) or something
equally strange.
But is there any other real reason besides this why "non power of 2
caches" can't be used ? I seemed to remember having read somewhere that
cache's can't be non power of two but I don't remember where. I have
searched Hennessey and Patterson ( "A Quantitative Approach" ) but
wasn't able to find anything specific which said whether this is
possible or not.
In case there have been real/experimental designs with non power of 2
caches, could you point me to references.
Thanks & Regards,
Vikas |
|
| Back to top |
|
 |
Anton Ertl
Guest
|
Posted:
Tue Aug 16, 2005 8:15 am Post subject:
Re: Non Power of 2 Cache Sizes |
|
|
"Vikas Mishra" <vikas.mishra@gmail.com> writes:
| Quote: | In case there have been real/experimental designs with non power of 2
caches, could you point me to references.
|
SuperSPARC, 21164, Pentium 4, 21364, Itanium 2.
- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html |
|
| Back to top |
|
 |
Dan Koren
Guest
|
Posted:
Tue Aug 16, 2005 1:22 pm Post subject:
Re: Non Power of 2 Cache Sizes |
|
|
"Anton Ertl" <anton@mips.complang.tuwien.ac.at> wrote in message
news:2005Aug16.094402@mips.complang.tuwien.ac.at...
| Quote: | "Vikas Mishra" <vikas.mishra@gmail.com> writes:
In case there have been real/experimental designs with non power of 2
caches, could you point me to references.
SuperSPARC, 21164, Pentium 4, 21364, Itanium 2.
|
Some of the PA-RISC chips have 1.75 MB on chip.
dk |
|
| Back to top |
|
 |
Grumble
Guest
|
Posted:
Tue Aug 16, 2005 1:25 pm Post subject:
Re: Non Power of 2 Cache Sizes |
|
|
Anton Ertl wrote:
| Quote: | Vikas Mishra wrote:
In case there have been real/experimental designs with non power of 2
caches, could you point me to references.
SuperSPARC, 21164, Pentium 4, 21364, Itanium 2.
|
Do you mean the P4's trace cache? |
|
| Back to top |
|
 |
Patrick Schaaf
Guest
|
Posted:
Tue Aug 16, 2005 1:28 pm Post subject:
Re: Non Power of 2 Cache Sizes |
|
|
"Vikas Mishra" <vikas.mishra@gmail.com> writes:
| Quote: | We were having a discussion today about having a "non power of 2
instruction cache". Conceptually I would think that the only "hard"
reason for having a power of 2 is that the line index calculation is
trivial and for a non power of 2 it would be some complicated
calculation like modulus with 40K (for a size of say 40KB) or something
equally strange.
|
Not neccessarily, I would think, except for direct mapped caches.
Way selection is done with an associative tag structure, and I
don't see any reason why that would be more efficient when done
with a power-of-two number of ways. So you could have a 256k*7-way
cache, for a total of 1.75 MB. Some googling finds this page,
which describes exactly that for a PA-RISC processor:
http://www.arcade-eu.info/overview/2005/ev7.html
best regards
Patrick |
|
| Back to top |
|
 |
Terje Mathisen
Guest
|
Posted:
Tue Aug 16, 2005 3:49 pm Post subject:
Re: Non Power of 2 Cache Sizes |
|
|
Vikas Mishra wrote:
| Quote: | Hi Folks,
We were having a discussion today about having a "non power of 2
instruction cache". Conceptually I would think that the only "hard"
reason for having a power of 2 is that the line index calculation is
trivial and for a non power of 2 it would be some complicated
calculation like modulus with 40K (for a size of say 40KB) or something
equally strange.
But is there any other real reason besides this why "non power of 2
caches" can't be used ? I seemed to remember having read somewhere that
cache's can't be non power of two but I don't remember where. I have
searched Hennessey and Patterson ( "A Quantitative Approach" ) but
wasn't able to find anything specific which said whether this is
possible or not.
|
Besides the quite common multi-way caches, where "multi" isn't a power
of two, but the size of each way is, it is definitely possible to have
strange sizes, it is just _very_ hard to make them fast enough!
It should be easier to have non-power-of-two memory banks, since RAM is
a _lot_ slower than cache (presumably :-) anyway, and even before
multi-level caches became common there was a mainframe with something
like 17-way main memory. (Honeywell?)
If you are going to do something like this (i.e. having a prime number
of banks), then it is probably a good idea to select a prime of the form
2^n-1 or 2^n+1, since you can do div/mod operations on these a lot
faster than on a random modulus.
Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching" |
|
| Back to top |
|
 |
Anton Ertl
Guest
|
Posted:
Tue Aug 16, 2005 3:58 pm Post subject:
Re: Non Power of 2 Cache Sizes |
|
|
Grumble <devnull@kma.eu.org> writes:
| Quote: | Anton Ertl wrote:
Vikas Mishra wrote:
In case there have been real/experimental designs with non power of 2
caches, could you point me to references.
SuperSPARC, 21164, Pentium 4, 21364, Itanium 2.
Do you mean the P4's trace cache?
|
Yes.
- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html |
|
| Back to top |
|
 |
Anton Ertl
Guest
|
Posted:
Tue Aug 16, 2005 3:58 pm Post subject:
Re: Non Power of 2 Cache Sizes |
|
|
mailer-daemon@bof.de (Patrick Schaaf) writes:
| Quote: | "Vikas Mishra" <vikas.mishra@gmail.com> writes:
Way selection is done with an associative tag structure, and I
don't see any reason why that would be more efficient when done
with a power-of-two number of ways.
|
I guess there is some reason, because set-associative caches with a
power-of-2 number of ways are quite common.
Hmm, maybe the reason is marketing; If the Athlon 64 had 64*9=576KB L2
cache, nobody would be able to remember that:-).
The EV7 is an Alpha, not a PA-RISC CPU.
- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html |
|
| Back to top |
|
 |
Nick Maclaren
Guest
|
Posted:
Tue Aug 16, 2005 4:05 pm Post subject:
Re: Non Power of 2 Cache Sizes |
|
|
In article <ddsgb9$4em$1@osl016lin.hda.hydro.com>,
Terje Mathisen <terje.mathisen@hda.hydro.com> writes:
|>
|> Besides the quite common multi-way caches, where "multi" isn't a power
|> of two, but the size of each way is, it is definitely possible to have
|> strange sizes, it is just _very_ hard to make them fast enough!
Sorry, Terje, but you are suffering from a lack of imagination!
It is hard to make level 1 data caches fast enough, but that is
about all.
|> If you are going to do something like this (i.e. having a prime number
|> of banks), then it is probably a good idea to select a prime of the form
|> 2^n-1 or 2^n+1, since you can do div/mod operations on these a lot
|> faster than on a random modulus.
Nope. That is true only if you use the current, ghastly, designs.
Such a cache structure is undesirable on performance grounds, even
with a power of two, because it is really nasty when your array
sizes match the cache size.
What you should do is to hash the address - as this need not be a
cryptographic hash, doesn't have to be perfect, and doesn't have
to be done in software, it can be fast. Such a hash can produce
values in any range.
Regards,
Nick Maclaren. |
|
| Back to top |
|
 |
Vikas Mishra
Guest
|
Posted:
Tue Aug 16, 2005 4:11 pm Post subject:
Re: Non Power of 2 Cache Sizes |
|
|
Patrick Schaaf wrote:
| Quote: | "Vikas Mishra" <vikas.mishra@gmail.com> writes:
We were having a discussion today about having a "non power of 2
instruction cache". Conceptually I would think that the only "hard"
reason for having a power of 2 is that the line index calculation is
trivial and for a non power of 2 it would be some complicated
calculation like modulus with 40K (for a size of say 40KB) or something
equally strange.
Not neccessarily, I would think, except for direct mapped caches.
Way selection is done with an associative tag structure, and I
don't see any reason why that would be more efficient when done
with a power-of-two number of ways. So you could have a 256k*7-way
cache, for a total of 1.75 MB. Some googling finds this page,
which describes exactly that for a PA-RISC processor:
http://www.arcade-eu.info/overview/2005/ev7.html
|
Patrick thanks. So is there a specific restriction that you are aware of for
a Direct Mapped Cache ? Actually I should have been more specific in my
question - I was interested more in a direct mapped cache. My apologies for
the same.
Regards,
Vikas |
|
| Back to top |
|
 |
Torben Ęgidius Mogensen
Guest
|
Posted:
Tue Aug 16, 2005 4:13 pm Post subject:
Re: Non Power of 2 Cache Sizes |
|
|
nmm1@cus.cam.ac.uk (Nick Maclaren) writes:
| Quote: | What you should do is to hash the address - as this need not be a
cryptographic hash, doesn't have to be perfect, and doesn't have
to be done in software, it can be fast. Such a hash can produce
values in any range.
|
Yes, but if you start with a power of two address range and hash down
to a range that isn't a power of two, some results will occur more
often than others. If the range of addresses is much larger than the
range of hash values, this won't be significant, though.
What matters more is that you are probably going to use binary to
represent the hash values, and if the range isn't a power of two, you
won't use these bits optimally. Again, if the range is only slightly
smaller than a power of two (say, 2^n-1), it won't matter much.
Torben |
|
| Back to top |
|
 |
Nick Maclaren
Guest
|
Posted:
Tue Aug 16, 2005 4:15 pm Post subject:
Re: Non Power of 2 Cache Sizes |
|
|
In article <7zoe7y5e3z.fsf@tyr.diku.dk>,
torbenm@tyr.diku.dk (=?iso-8859-1?q?Torben_=C6gidius_Mogensen?=) writes:
|>
|> > What you should do is to hash the address - as this need not be a
|> > cryptographic hash, doesn't have to be perfect, and doesn't have
|> > to be done in software, it can be fast. Such a hash can produce
|> > values in any range.
|>
|> Yes, but if you start with a power of two address range and hash down
|> to a range that isn't a power of two, some results will occur more
|> often than others. If the range of addresses is much larger than the
|> range of hash values, this won't be significant, though.
|>
|> What matters more is that you are probably going to use binary to
|> represent the hash values, and if the range isn't a power of two, you
|> won't use these bits optimally. Again, if the range is only slightly
|> smaller than a power of two (say, 2^n-1), it won't matter much.
Well, yes, but in what sense is taking a number modulo a fixed
range not a hashing function? All of those statements apply to
that as well, as they do when using conventional cache lookup
on a system where physical addresses ranges are not from 2^K
to 2^(K+L) with K > L.
The point is that you are free to choose a hash function to be
fast as well as adequately uniform, and it isn't hard once you
get beyond the L1 data path.
Regards,
Nick Maclaren. |
|
| Back to top |
|
 |
Dan Koren
Guest
|
Posted:
Tue Aug 16, 2005 4:15 pm Post subject:
Re: Non Power of 2 Cache Sizes |
|
|
"Nick Maclaren" <nmm1@cus.cam.ac.uk> wrote in message
news:ddsh9b$dk9$1@gemini.csx.cam.ac.uk...
| Quote: |
In article <ddsgb9$4em$1@osl016lin.hda.hydro.com>,
Terje Mathisen <terje.mathisen@hda.hydro.com> writes:
|
|> Besides the quite common multi-way caches, where "multi" isn't a power
|> of two, but the size of each way is, it is definitely possible to have
|> strange sizes, it is just _very_ hard to make them fast enough!
Sorry, Terje, but you are suffering from a lack of imagination!
It is hard to make level 1 data caches fast enough, but that is
about all.
|> If you are going to do something like this (i.e. having a prime number
|> of banks), then it is probably a good idea to select a prime of the
form
|> 2^n-1 or 2^n+1, since you can do div/mod operations on these a lot
|> faster than on a random modulus.
Nope. That is true only if you use the current, ghastly, designs.
Such a cache structure is undesirable on performance grounds, even
with a power of two, because it is really nasty when your array
sizes match the cache size.
What you should do is to hash the address - as this need not be a
cryptographic hash, doesn't have to be perfect, and doesn't have
to be done in software, it can be fast. Such a hash can produce
values in any range.
|
Hashing is becoming a lost art,
just like everything else of
importance in computing ;-)
dk |
|
| Back to top |
|
 |
Chris Colohan
Guest
|
Posted:
Tue Aug 16, 2005 4:15 pm Post subject:
Re: Non Power of 2 Cache Sizes |
|
|
"Dan Koren" <dankoren@yahoo.com> writes:
| Quote: | Hashing is becoming a lost art,
just like everything else of
importance in computing ;-)
|
Ever used Perl? Everything is a scalar or a hash... (You just don't
get to choose the hash function used.)
:-)
Chris
--
Chris Colohan Email: chris@colohan.ca PGP: finger colohan@cs.cmu.edu
Web: www.colohan.com Phone: (412)268-4751 |
|
| Back to top |
|
 |
John Mashey
Guest
|
Posted:
Tue Aug 16, 2005 11:04 pm Post subject:
Re: Non Power of 2 Cache Sizes |
|
|
Vikas Mishra wrote:
| Quote: | Patrick Schaaf wrote:
Not neccessarily, I would think, except for direct mapped caches.
Way selection is done with an associative tag structure, and I
don't see any reason why that would be more efficient when done
with a power-of-two number of ways. So you could have a 256k*7-way
cache, for a total of 1.75 MB. Some googling finds this page,
which describes exactly that for a PA-RISC processor:
Patrick thanks. So is there a specific restriction that you are aware of for
a Direct Mapped Cache ? Actually I should have been more specific in my
question - I was interested more in a direct mapped cache. My apologies for
the same.
|
1) Maybe someone has implemented a non-power-of-2 direct-mapped cache,
but I've never heard of one. I've seen designs where there was some
simple [low-gate-delay] hashing inot a power-of-2 cache, but that was
for performance (avoidance of hot spots, especially in virtual caches),
not to allow a non-power-of-2 cache. Addressing bits cost, whether for
registers, cachelines, or memory addresses, so people try not to waste
them.
Of course, I can't think of any real designer who would waste a second
of thought about putting a DIV/MOD in the the address path. [Google
comp.arch: mashey powers 2 1994].
2) Somebody said, it doesn't really cost much to have non-power-of-2
set-associative caches, and there is a very good reason to do this
sometimes.
In real-world chip design, things like floorplans, wire lengths, gate
delays, design costs, time-to-market, etc, etc, actually matter.
Suppose you are considering a design with the usual collection of
units, and the data-cache is 2-way. You may well have filled the die.
On the other hand, you might be pad/bump limited, i.e., in order to get
the number of power/grounds and signals you need, there is a minimum
size for the chip, and there may be extra space left. What do you do
with the space?
Suppose there is not enough space to double the cache size, but there
is enough space to do 3-way. That can be a very attractive
alternative, as it is much easier to design than say, redesignign the
pipeline, adding functional units, etc. Alternatively, if you had
planned an 8-way cache, and misestimated space on other units, it might
just be easier to go to 7-way, assuming that the layout let you make
use of the freed-up die space [sometimes it does, sometimes it
doesn't.]
Such flexibility is often quite useful.
In the MIPS R2000, we had a 64-element TLB, with a 6-bit index, but
(unlike the size of a direct-mapped cache), there was nothing magic
about 64. It could have been 63 or 60 with minimal bother, and that
flexibility was valuable, because for a while, it didn't look like 64
would fit. Had the design been one that required power-of-2, I would
have been very nervous that we'd have been forced to go to 32. |
|
| Back to top |
|
 |
|
|
|
|