Athlon cache question.
CASTalk.com: Discussion of DSP, FPGA, storage and embedded systems.
 
CASTalk.com Forum Index -> Computer Architecture

Jouni Osmala
Posted: Sat Dec 11, 2004 3:01 pm    Post subject: Athlon cache question.

64 KB of L1 cache, 2-way set associative, and a 4 KB page size.
Virtually addressed, physically tagged: how that works I don't know.
Or did I get something wrong?
First thing to clarify.
Answers I do verify.
Schoolwork this not be.
The knowledge is just for me.
These are logic-level computer basics,
My school asks semiconductor physics.

I know my poetics are lame.
For me it's just the same.

Jouni Osmala
Helsinki University of Technology.
Student

Kornilios Kourtis
Posted: Sat Dec 11, 2004 3:01 pm    Post subject: Re: Athlon cache question.

Jouni Osmala <josmala@cc.hut.fi> wrote:
Quote:
Virtually addressed, physically tagged: how that works I don't know.

I think it goes like this:

[virtual address] = [virtual page nr][offset]

the following translations are performed in parallel:

[virtual page nr] ==(TLB)==> [page physical addr]
[offset] ==(L1)===> [tag][data]

if ([tag] == [page physical addr])
send [data] to cpu /* HIT! */

This way you don't have to access the TLB (for virt2phys)
and then the L1 serially.
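The parallel lookup described above can be sketched as a toy Python model. The geometry is an assumption chosen so that the set index fits inside the page offset (4 KB pages, 64-byte lines, 64 sets, 2 ways, i.e. an 8 KB cache), and the function names are hypothetical. Because `set_index` uses only page-offset bits, the cache read does not have to wait for the TLB; the physical page number is needed only for the final tag compare.

```python
# Assumed toy parameters (not the Athlon's): 4 KB pages, 64-byte lines and
# 64 sets, so set index + line offset fit exactly in the 12-bit page offset.
# With 2 ways per set this is an 8 KB cache.
PAGE_BITS = 12   # 4 KB page -> 12-bit page offset
LINE_BITS = 6    # 64-byte cache line
SET_BITS = 6     # 64 sets

# The trick only works if the index never needs translated address bits.
assert LINE_BITS + SET_BITS <= PAGE_BITS


def split_virtual(vaddr):
    """Split a virtual address into (virtual page number, page offset)."""
    return vaddr >> PAGE_BITS, vaddr & ((1 << PAGE_BITS) - 1)


def set_index(offset):
    """Select a set using page-offset bits only, so no TLB lookup is needed."""
    return (offset >> LINE_BITS) & ((1 << SET_BITS) - 1)


def lookup(vaddr, tlb, cache):
    """Return cached data on a hit, or None on a miss.

    tlb:   dict mapping virtual page number -> physical page number
    cache: list indexed by set; each set is a list of (tag, data) pairs,
           where tag is the physical page number of the cached line
    """
    vpn, offset = split_virtual(vaddr)
    # In hardware these two reads happen at the same time.
    ppn = tlb[vpn]                      # TLB: virtual -> physical page
    ways = cache[set_index(offset)]     # L1: read every way of one set
    for tag, data in ways:
        if tag == ppn:                  # compare physical tags
            return data                 # HIT
    return None                         # miss


# Usage: one line cached for virtual page 0x12345 -> physical page 0xABC.
tlb = {0x12345: 0xABC}
cache = [[] for _ in range(1 << SET_BITS)]
cache[set_index(0x1C0)].append((0xABC, "hello"))
print(lookup((0x12345 << PAGE_BITS) | 0x1C0, tlb, cache))  # hello
```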

--
Kornilios Kourtis

Computers are useless. They can only give you answers.
- Pablo Picasso

Kornilios Kourtis
Posted: Sat Dec 11, 2004 3:01 pm    Post subject: Re: Athlon cache question.

Jouni Osmala <josmala@cc.hut.fi> wrote:
Quote:
Kornilios Kourtis wrote:
Jouni Osmala <josmala@cc.hut.fi> wrote:

Virtually addressed, physically tagged: how that works I don't know.


I think it goes like this:

[virtual address] = [virtual page nr][offset]

the following translations are performed in parallel:

[virtual page nr] ==(TLB)==> [page physical addr]
[offset] ==(L1)===> [tag][data]

if ([tag] == [page physical addr])
send [data] to cpu /* HIT! */

This way you don't have to access the TLB (for virt2phys)
and then the L1 serially.

I did know THIS. But the real problem is that
4 KB page size × 2 ways = 8 KB, not 64 KB. They need more physical bits than
they know without going to the TLB, and I just don't know how they do THAT.
The page offset alone is too small to supply the index.


I think that they need to address (cache size)/(cache line size) and
not the whole size of the cache. The cache line on the Athlon is 64 bytes,
so the physical bits should be enough.

--
Kornilios Kourtis

Computers are useless. They can only give you answers.
- Pablo Picasso

Kornilios Kourtis
Posted: Sat Dec 11, 2004 3:01 pm    Post subject: Re: Athlon cache question.

Kornilios Kourtis <kkourt@no.cslab.more.ece.spam.ntua.gr> wrote:
Quote:
Jouni Osmala <josmala@cc.hut.fi> wrote:
Kornilios Kourtis wrote:
I did know THIS. But the real problem is that
4 KB page size × 2 ways = 8 KB, not 64 KB. They need more physical bits than
they know without going to the TLB, and I just don't know how they do THAT.
The page offset alone is too small to supply the index.


I think that they need to address (cache size)/(cache line size) and
not the whole size of the cache. The cache line on the Athlon is 64 bytes,
so the physical bits should be enough.


No, that's *NOT* the answer.
Just ignore me :)

--
Kornilios Kourtis

Computers are useless. They can only give you answers.
- Pablo Picasso

Jouni Osmala
Posted: Sat Dec 11, 2004 3:42 pm    Post subject: Re: Athlon cache question.

Kornilios Kourtis wrote:
Quote:
Jouni Osmala <josmala@cc.hut.fi> wrote:

Virtually addressed, physically tagged: how that works I don't know.


I think it goes like this:

[virtual address] = [virtual page nr][offset]

the following translations are performed in parallel:

[virtual page nr] ==(TLB)==> [page physical addr]
[offset] ==(L1)===> [tag][data]

if ([tag] == [page physical addr])
send [data] to cpu /* HIT! */

This way you don't have to access the TLB (for virt2phys)
and then the L1 serially.

I did know THIS. But the real problem is that
4 KB page size × 2 ways = 8 KB, not 64 KB. They need more physical bits than
they know without going to the TLB, and I just don't know how they do THAT.
The page offset alone is too small to supply the index.
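The arithmetic behind this objection can be made explicit. The sketch below uses the figures given in the thread (64 KB, 2-way, 4 KB pages, 64-byte lines); `alias_bits` is a hypothetical helper, and all sizes are assumed to be powers of two. It counts how many set-index bits fall above the 12-bit page offset: for this geometry 3 bits, so a given physical line could sit in any of 2^3 = 8 sets depending on the virtual address used to reach it. The line size cancels out; only the size of one way (32 KB) compared with the page size (4 KB) matters.

```python
def log2(n):
    """Exact base-2 logarithm of a power of two."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


def alias_bits(cache_bytes, ways, line_bytes, page_bytes):
    """Number of set-index bits that lie above the page offset.

    Those bits come from the virtual page number, so they are unknown
    until the TLB has translated the address.
    """
    sets = cache_bytes // (ways * line_bytes)
    needed = log2(sets) + log2(line_bytes)   # index bits + byte-in-line bits
    return max(0, needed - log2(page_bytes))


bits = alias_bits(64 * 1024, 2, 64, 4 * 1024)
print(bits, 2 ** bits)                         # 3 8
print(alias_bits(64 * 1024, 2, 32, 4 * 1024))  # 3: line size cancels out
print(alias_bits(8 * 1024, 2, 64, 4 * 1024))   # 0: fits in one page
```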

Jouni Osmala

Niels Jørgen Kruse
Posted: Sun Dec 12, 2004 1:33 am    Post subject: Re: Athlon cache question.

Kornilios Kourtis <kkourt@NO.cslab.MORE.ece.SPAM.ntua.gr> wrote:

Quote:
Jouni Osmala <josmala@cc.hut.fi> wrote:
Virtually addressed, physically tagged: how that works I don't know.

I think it goes like this:

[virtual address] = [virtual page nr][offset]

[virtual line address] = [remainder][virtual index]
(the remainder is the high bits of the virtual page nr, not used as a unit)
Quote:

the following translations are performed in parallel:

[virtual page nr] ==(TLB)==> [page physical addr]
[offset] ==(L1)===> [tag][data]
[virtual index] ==(L1)===> [tag][data] times associativity

Because the line could have been loaded previously with a different
translation (pretty rare, we hope), the other possible locations must be
searched in case of a miss (also rare, we hope). There must be only one
copy of a line in L1.
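The miss-path search described above can be sketched as follows. This models the policy as stated (look in the home set first, probe the other alias sets only on a miss, and keep a single copy by moving the line); it is not a description of how the Athlon actually does it. The geometry (64 KB, 2-way, 64-byte lines, 4 KB pages), the function names and the dictionary layout are assumptions, and ways and replacement are ignored.

```python
# Assumed geometry: 64 KB, 2-way, 64-byte lines, 4 KB pages -> 512 sets.
PAGE_BITS, LINE_BITS, SET_BITS = 12, 6, 9
ALIAS_BITS = LINE_BITS + SET_BITS - PAGE_BITS   # 3 index bits from the VPN
LOW_BITS = SET_BITS - ALIAS_BITS                # 6 index bits from the offset
NUM_SETS = 1 << SET_BITS


def set_index(vaddr):
    """Virtual index: bits 6..14 of the virtual address."""
    return (vaddr >> LINE_BITS) & (NUM_SETS - 1)


def alias_sets(vaddr):
    """Every set that could hold the same physical line via another mapping.

    They share the low index bits (untranslated page offset) and differ
    only in the ALIAS_BITS high index bits (taken from the virtual page nr).
    """
    low = set_index(vaddr) & ((1 << LOW_BITS) - 1)
    return [(high << LOW_BITS) | low for high in range(1 << ALIAS_BITS)]


def access(vaddr, ppn, cache):
    """Look up a line, probing the alias sets only after a primary miss.

    cache: dict set index -> dict {physical page number (tag): data}.
    Ways and replacement are ignored. Returns (outcome, data).
    """
    home = set_index(vaddr)
    if ppn in cache[home]:
        return "hit", cache[home][ppn]
    for s in alias_sets(vaddr):
        if s != home and ppn in cache[s]:
            # Keep a single copy: move the line to the set this mapping uses.
            data = cache[s].pop(ppn)
            cache[home][ppn] = data
            return "alias", data
    return "miss", None   # a real cache would now fetch from L2


# Usage: physical page 0x777 mapped at virtual pages 5 and 6.
cache = {s: {} for s in range(NUM_SETS)}
v1 = (5 << PAGE_BITS) | 0x040
v2 = (6 << PAGE_BITS) | 0x040
cache[set_index(v1)][0x777] = "data"
print(access(v2, 0x777, cache))   # ('alias', 'data')
print(access(v2, 0x777, cache))   # ('hit', 'data')
```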
Quote:
if ([tag] == [page physical addr])
send [data] to cpu /* HIT! */
In case of a match we DO have a hit.



--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark

Jan Vorbrüggen
Posted: Mon Dec 13, 2004 2:52 pm    Post subject: Re: Athlon cache question.

Quote:
But the real problem is that 4 KB page size × 2 ways = 8 KB, not 64 KB.

Put a CAM on the eight possible matching tags, and when more than one
match occurs, generate a machine check?
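Jan's suggestion can be modelled as a single all-at-once comparison. In this sketch `MachineCheck` and `cam_match` are hypothetical names, the tag values are made up, and a list comprehension stands in for the hardware comparators. Only the policy is illustrated: one match is a hit, none is a miss, and more than one means the same physical line is cached twice, which is reported as a machine check.

```python
class MachineCheck(Exception):
    """Hypothetical stand-in for the hardware machine-check exception."""


def cam_match(ppn, entries):
    """Compare a physical tag against every candidate entry "at once".

    entries: (set_index, way, tag) for all ways of all candidate sets.
    A CAM would compare them in parallel; the list comprehension stands
    in for the comparators.
    Returns the (set_index, way) of the single match, or None on a miss.
    """
    hits = [(s, w) for s, w, tag in entries if tag == ppn]
    if len(hits) > 1:
        # Duplicate copies of one physical line: trap, as suggested.
        raise MachineCheck(f"physical page {ppn:#x} cached at {hits}")
    return hits[0] if hits else None


# Usage with made-up tags: one match, a miss, then a forced duplicate.
entries = [(1, 0, 0x10), (65, 1, 0x777), (129, 0, 0x20)]
print(cam_match(0x777, entries))  # (65, 1)
print(cam_match(0x999, entries))  # None
try:
    cam_match(0x777, entries + [(193, 0, 0x777)])
except MachineCheck as exc:
    print("machine check:", exc)
```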

Jan

Bernd Paysan
Posted: Mon Dec 13, 2004 4:49 pm    Post subject: Re: Athlon cache question.

Jan Vorbrüggen wrote:

Quote:
But the real problem is that 4 KB page size × 2 ways = 8 KB, not 64 KB.

Put a CAM on the eight possible matching tags, and when more than one
match occurs, generate a machine check?

I've started to write a program that tests the effect of the virtually indexed
cache on the Athlon, but haven't gotten very far. mmap() is the tool of
choice: map a file to several pages, so that the physical memory behind
these pages is always the same. Several pages of the file should then be
mapped on the virtual alias boundary (i.e. 32 KB apart), to see how the
associativity of the cache works out.
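The planned layout can be checked on paper first. Assuming the 64 KB, 2-way, 64-byte-line geometry, one way spans 32 KB, so virtual addresses 32 KB apart select the same set, while addresses one 4 KB page apart select different sets. The base address below is hypothetical; a real test would use whatever addresses mmap() returns.

```python
# Assumed geometry: 64-byte lines and 512 sets (64 KB, 2-way).
LINE_BITS, SET_BITS = 6, 9
WAY_SPAN = 1 << (LINE_BITS + SET_BITS)   # bytes covered by one way: 32 KB


def set_index(vaddr):
    """Virtual set index used by the L1 (bits 6..14)."""
    return (vaddr >> LINE_BITS) & ((1 << SET_BITS) - 1)


base = 0x1000_0000                     # hypothetical mapping address
on_boundary = [base + k * WAY_SPAN for k in range(4)]
page_apart = [base + k * 4096 for k in range(4)]

print(WAY_SPAN)                                     # 32768
print({set_index(a) for a in on_boundary})          # {0}
print(sorted({set_index(a) for a in page_apart}))   # [0, 64, 128, 192]
```

Four mappings on the boundary all compete for one set, so if the L1 behaves as an ordinary 2-way cache, at most two of them can be resident at once.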

If read timing is not affected, it's quite likely that the same data is
present multiple times in the L1 cache. This should, however, have an impact
on writes. The test data should cover the whole page, so that a simple store
buffer or something like that can't hide the latency problem.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Anton Ertl
Posted: Mon Dec 13, 2004 5:25 pm    Post subject: Re: Athlon cache question.

Bernd Paysan <bernd.paysan@gmx.de> writes:
Quote:
I've started to write a program that tests the effect of the virtually indexed
cache on the Athlon, but haven't gotten very far. mmap() is the tool of
choice: map a file to several pages, so that the physical memory behind
these pages is always the same.

I have written such a program, which tests cache consistency in this
way <http://www.complang.tuwien.ac.at/anton/toggle_dc_16k/mapcheck.c>;
the parameters are set up for the 21064a (16 KB direct-mapped D-cache,
8 KB page size). A timing program would be harder.
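A consistency check (not a timing test) in the same spirit can be written in a few lines of Python. This is not a port of mapcheck.c: Python's `mmap` module does not let the caller choose virtual addresses, so the views are not deliberately placed on an alias boundary, and `check_aliases` is a hypothetical name. It maps one file page several times with shared mappings and verifies that a store through any view is visible through all the others; on a coherent machine it should always return True.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


def check_aliases(n_views=4):
    """Map the same file page n_views times (shared mappings) and check
    that a store through any one view is visible through every other."""
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * PAGE)          # the file must cover the mapped page
        f.flush()
        views = [mmap.mmap(f.fileno(), PAGE) for _ in range(n_views)]
        try:
            for i, view in enumerate(views):
                pattern = bytes([i + 1]) * 64
                offset = (i * 64) % PAGE   # a different line for each view
                view[offset:offset + 64] = pattern
                for other in views:
                    if other[offset:offset + 64] != pattern:
                        return False   # an alias returned stale data
            return True
        finally:
            for view in views:
                view.close()


print(check_aliases())
```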

Quote:
Several pages of the file should then be
mapped on the virtual alias boundary (i.e. 32 KB apart), to see how the
associativity of the cache works out.

Well, if they really did more than 2-way set associativity (as seen
from the CPU), they would certainly advertise it; and if they did it
direct-mapped, that should be pretty obvious and well-known by now.

Some time ago there was a discussion on that (my contribution was
<2004Aug9.100911@mips.complang.tuwien.ac.at>), and there were some
explanations of how physical tagging worked on the Athlons.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

Dan Koren
Posted: Mon Dec 20, 2004 2:46 am    Post subject: Re: Athlon cache question.

It would be interesting to find out how the Athlon will
perform under database and OLTP workloads, where the
translation cases you describe as "rare" are in fact
quite frequent. Previous architectures that used
virtually indexed caches have encountered serious
performance and scalability problems under multi-
threaded OLTP workloads (Oracle, Informix, Tuxedo).
Those have included early SPARC designs, HP PA-RISC,
and the MIPS R6000.

I find it almost unbelievable that 30 years after IBM
wrote the book on cache design (and closed it) people
are still throwing darts in the dark. But I suppose it
feels more rewarding to be "creative", dabble in new
designs and get more patents to one's name, than to do
solid engineering and spend time researching earlier
work.



dk


"Niels Jørgen Kruse" <nospam@ab-katrinedal.dk> wrote in message
news:1gondj3.42ca4il8zwjkN%nospam@ab-katrinedal.dk...
Quote:
Kornilios Kourtis <kkourt@NO.cslab.MORE.ece.SPAM.ntua.gr> wrote:

Jouni Osmala <josmala@cc.hut.fi> wrote:
Virtually addressed, physically tagged: how that works I don't know.

I think it goes like this:

[virtual address] = [virtual page nr][offset]

[virtual line address] = [remainder][virtual index]
(the remainder is the high bits of the virtual page nr, not used as a unit)

the following translations are performed in parallel:

[virtual page nr] ==(TLB)==> [page physical addr]
[offset] ==(L1)===> [tag][data]
[virtual index] ==(L1)===> [tag][data] times associativity
Because the line could have been loaded previously with a different
translation (pretty rare, we hope), the other possible locations must be
searched in case of a miss (also rare, we hope). There must be only one
copy of a line in L1.
if ([tag] == [page physical addr])
send [data] to cpu /* HIT! */
In case of a match we DO have a hit.


--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark

Andi Kleen
Posted: Mon Dec 20, 2004 2:56 am    Post subject: Re: Athlon cache question.

"Dan Koren" <dankoren@yahoo.com> writes:

Quote:
It would be interesting to find out how the Athlon will
perform under database and OLTP workloads, where the

The database benchmarks for Opteron (which afaik uses very similar
caches to Athlon) look very good.

Quote:
translation cases you describe as "rare" are in fact
quite frequent. Previous architectures that used
virtually indexed caches have encountered serious
performance and scalability problems under multi-
threaded OLTP workloads (Oracle, Informix, Tuxedo).
Those have included early SPARC designs, HP PA-RISC,
and the MIPS R6000.

Remember only the L1 is virtually indexed. L2 isn't.
D-L1 is only 64 KB; L2 is between 256 KB and 1 MB (the latter
for server-oriented chips).

The big workloads tend to thrash L1 pretty badly anyway; often even
L2 is thrashed. But L1 use is so localized that it doesn't matter
much. Other server x86 CPUs get away with an even smaller L1.

-Andi

Niels Jørgen Kruse
Posted: Mon Dec 20, 2004 4:12 am    Post subject: Re: Athlon cache question.

Dan Koren <dankoren@yahoo.com> wrote:

Quote:
It would be interesting to find out how the Athlon will
perform under database and OLTP workloads, where the
translation cases you describe as "rare" are in fact
quite frequent. Previous architectures that used
virtually indexed caches have encountered serious
performance and scalability problems under multi-
threaded OLTP workloads (Oracle, Informix, Tuxedo).
Those have included early SPARC designs, HP PA-RISC,
and the MIPS R6000.

Although the subject line singles out the Athlon, I answered the
question in a general sense and had the PPC970FX manual open to check
against. That is how the line about there being only one copy of a line
slipped in. I don't know that the Athlon enforces this (but could
perhaps hunt down the answer in public references).

In the case of the POWER4 core (as in the PPC970FX) it should be noted
that the virtual index bits are invariant under segment mapping and all
processes share a single virtual address space, so sharing is possible
without aliasing in L1.

Quote:
I find it almost unbelievable that 30 years after IBM
wrote the book on cache design (and closed it) people
are still throwing darts in the dark. But I suppose it
feels more rewarding to be "creative", dabble in new
designs and get more patents to one's name, than to do
solid engineering and spend time researching earlier
work.

What was it IBM wrote 30 years ago?

--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark

Dan Koren
Posted: Mon Dec 20, 2004 7:55 am    Post subject: Re: Athlon cache question.

"Dan Koren" <dankoren@yahoo.com> wrote in message
news:41c5f69b$1@news.meer.net...
Quote:
It would be interesting to find out how the *Athlon* will ...


Oops! I meant Opteron.


dk

Dan Koren
Posted: Mon Dec 20, 2004 7:55 am    Post subject: Re: Athlon cache question.

"Niels Jørgen Kruse" <nospam@ab-katrinedal.dk> wrote in message
news:1gp2e53.14wois9oey6gwN%nospam@ab-katrinedal.dk...
Quote:
Dan Koren <dankoren@yahoo.com> wrote:

I find it almost unbelievable that 30 years after IBM
wrote the book on cache design (and closed it) people
are still throwing darts in the dark. But I suppose it
feels more rewarding to be "creative", dabble in new
designs and get more patents to one's name, than to do
solid engineering and spend time researching earlier
work.

What was it IBM wrote 30 years ago?



IBM researched cache architectures
pretty much to death during the '70s.

Of course younger designers tend to
ignore the work of earlier generations.



dk

Anne & Lynn Wheeler
Posted: Wed Dec 29, 2004 11:35 pm    Post subject: Re: Athlon cache question.

"Dan Koren" <dankoren@yahoo.com> writes:
Quote:
IBM researched cache architectures
pretty much to death during the '70s.

Of course younger designers tend to
ignore the work of earlier generations.

slightly related are replacement strategies ... my recollection
is that at the same asilomar/sigops meeting where the i432 people
presented ... including lots of comments about patching complex
operating system features that had been dropped into silicon
.... recent refs:
http://www.garlic.com/~lynn/2004q.html#60 Will multicore CPUs have identical cores?
http://www.garlic.com/~lynn/2004q.html#64 Will multicore CPUs have identical cores?

... jim introduced me to a co-worker at tandem that was having
trouble getting his stanford phd ... so it must have been after
jim had left sjr for tandem .... random sjr/systemr posts
http://www.garlic.com/~lynn/subtopic.html#systemr

the problem was that the thesis was basically on global LRU
replacement strategy and stanford was getting a lot of pushback from
a strong advocate of local LRU replacement.

in the late 60s there was some academic literature on local LRU
replacement and working sets. at that time i was an undergraduate
doing lots of operating system changes ... including coming up with
this global LRU idea, which i implemented and which had shipped in
products.

the problem at hand was to show global LRU replacement significantly
better than local LRU.

much of the 70s, i was at the cambridge science center ... and in the
early 70s the grenoble science center had done a study using the same
operating system and the same hardware and the same type of workload
.... but implementing a "working set dispatcher", and had even gotten a
paper published in the cacm. they had come by cambridge and left me
with a rough draft ... as well as a lot of the detailed backup study
information. The primary difference between cambridge and grenoble was
that grenoble had a 1mbyte real storage 360/67 (154 4k "pageable
pages" after fixed memory requirements) and was running 35 concurrent
users, while cambridge had a 768k real storage 360/67 (104 4k "pageable
pages" after fixed memory requirements) and 75-80 users. Cambridge,
with approx. twice the load and significantly smaller real storage,
using a global LRU replacement strategy was getting about the same
performance as the Grenoble "working set dispatcher" and local LRU
(with half the users and 50 percent more effective paging storage).
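The figures above can be checked with a little arithmetic. The only assumption is taking the midpoint, 77.5, of "75-80 users" for Cambridge.

```python
grenoble_pages, grenoble_users = 154, 35
cambridge_pages = 104
cambridge_users = (75 + 80) / 2        # assumed midpoint of "75-80 users"

grenoble_per_user = grenoble_pages / grenoble_users      # ~4.4 pages
cambridge_per_user = cambridge_pages / cambridge_users   # ~1.3 pages
extra_storage = grenoble_pages / cambridge_pages - 1     # ~0.48
load_ratio = cambridge_users / grenoble_users            # ~2.2

print(f"{grenoble_per_user:.1f} vs {cambridge_per_user:.1f} pageable pages per user")
print(f"Grenoble had {extra_storage:.0%} more pageable pages")
print(f"Cambridge ran {load_ratio:.1f}x the users")
```

So "50 percent more" is about 48% more pageable pages, and "approx. twice the load" is about 2.2 times the users.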

In any case, all the backup material showing global LRU significantly
outperforming local LRU on directly comparable hardware, software,
and workload helped tip the balance in getting the PhD approved.

Not included in the comparison ... but about that time in the early
70s, I was also playing with some coding tricks in the global LRU
implementation. Normally any sort of LRU-like implementation
effectively degrades to FIFO when there isn't sufficient information
to otherwise distinguish reference patterns between pages (except in
some rare cases, FIFO isn't a particularly good replacement strategy).
I had this sleight-of-hand coding trick for global LRU ... which
continued to look, smell and taste like standard global LRU
replacement ... but had the unusual characteristic of degrading to
random replacement instead of FIFO. In lots of detailed simulation
studies it was shown to outperform true LRU across a wide variety of
conditions ... as compared to the LRU-approximation implementations,
which strived just to be nearly as good as true LRU.
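The degradation to FIFO is easy to reproduce with a one-bit clock (second-chance) algorithm, a common LRU approximation. This is a generic textbook sketch, not the implementation described above, and it does not attempt the random-replacement trick, whose details are not given here; the function names are hypothetical. When every resident page has its reference bit set, the clock cannot tell them apart and evicts them in load order, whereas exact LRU evicts by recency.

```python
def clock_evictions(frames, refs):
    """One-bit clock (second chance): a common LRU approximation."""
    slots = []       # [page, reference_bit] in load order around the clock
    hand = 0
    evicted = []
    for page in refs:
        hit = next((s for s in slots if s[0] == page), None)
        if hit is not None:
            hit[1] = True                    # referenced: set the bit
        elif len(slots) < frames:
            slots.append([page, True])       # free frame available
        else:
            while slots[hand][1]:            # second chance: clear and skip
                slots[hand][1] = False
                hand = (hand + 1) % frames
            evicted.append(slots[hand][0])
            slots[hand] = [page, True]
            hand = (hand + 1) % frames
    return evicted


def lru_evictions(frames, refs):
    """Exact LRU, for comparison."""
    resident = []    # least recently used first
    evicted = []
    for page in refs:
        if page in resident:
            resident.remove(page)
        elif len(resident) == frames:
            evicted.append(resident.pop(0))
        resident.append(page)
    return evicted


# Pages 1-3 are re-referenced in reverse order, so page 1 is the most recent.
refs = [1, 2, 3, 3, 2, 1, 4, 5, 6]
print(clock_evictions(3, refs))  # [1, 2, 3]  (FIFO order)
print(lru_evictions(3, refs))    # [3, 2, 1]  (true LRU order)
```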

lots of past replacement algorithm postings
http://www.garlic.com/~lynn/subtopic.html#wsclock

some specific past references to the thesis (as well as grenoble paper):
http://www.garlic.com/~lynn/93.html#4 360/67, was Re: IBM's Project F/S ?
http://www.garlic.com/~lynn/94.html#1 Multitasking question
http://www.garlic.com/~lynn/99.html#18 Old Computers
http://www.garlic.com/~lynn/2001c.html#10 Memory management - Page replacement
http://www.garlic.com/~lynn/2002c.html#49 Swapper was Re: History of Login Names

for some topic drift mention of jim leaving sjr and going to tandem:
http://www.garlic.com/~lynn/2002k.html#39 Vnet : Unbelievable
http://www.garlic.com/~lynn/2002o.html#73 They Got Mail: Not-So-Fond Farewells
http://www.garlic.com/~lynn/2002o.html#75 They Got Mail: Not-So-Fond Farewells
http://www.garlic.com/~lynn/2004c.html#15 If there had been no MS-DOS
http://www.garlic.com/~lynn/2004l.html#31 Shipwrecks

some of the cambridge detailed trace and simulation work was
eventually released as a product called vs/repack in the mid-70s (i.e.
it took a detailed trace of an application and attempted a semi-reorg of
large applications to minimize paging). it was also used by a number of
corporate products that were making the transition from a real storage
environment to virtual memory (compilers, database managers, etc) ..
as well as starting to study cache-sensitivity issues ...

random past vs/repack references:
http://www.garlic.com/~lynn/94.html#7 IBM 7090 (360s, 370s, apl, etc)
http://www.garlic.com/~lynn/99.html#68 The Melissa Virus or War on Microsoft?
http://www.garlic.com/~lynn/2000g.html#30 Could CDR-coding be on the way back?
http://www.garlic.com/~lynn/2001b.html#83 Z/90, S/390, 370/ESA (slightly off topic)
http://www.garlic.com/~lynn/2001c.html#31 database (or b-tree) page sizes
http://www.garlic.com/~lynn/2001c.html#33 database (or b-tree) page sizes
http://www.garlic.com/~lynn/2001i.html#20 Very CISC Instuctions (Was: why the machine word size ...)
http://www.garlic.com/~lynn/2002c.html#28 OS Workloads : Interactive etc
http://www.garlic.com/~lynn/2002c.html#45 cp/67 addenda (cross-post warning)
http://www.garlic.com/~lynn/2002c.html#46 cp/67 addenda (cross-post warning)
http://www.garlic.com/~lynn/2002c.html#49 Swapper was Re: History of Login Names
http://www.garlic.com/~lynn/2002e.html#50 IBM going after Strobe?
http://www.garlic.com/~lynn/2002f.html#50 Blade architectures
http://www.garlic.com/~lynn/2003f.html#15 Alpha performance, why?
http://www.garlic.com/~lynn/2003f.html#21 "Super-Cheap" Supercomputing
http://www.garlic.com/~lynn/2003f.html#53 Alpha performance, why?
http://www.garlic.com/~lynn/2003g.html#15 Disk capacity and backup solutions
http://www.garlic.com/~lynn/2003h.html#8 IBM says AMD dead in 5yrs ... -- Microsoft Monopoly vs. IBM
http://www.garlic.com/~lynn/2003j.html#32 Language semantics wrt exploits
http://www.garlic.com/~lynn/2004c.html#21 PSW Sampling
http://www.garlic.com/~lynn/2004.html#14 Holee shit! 30 years ago!
http://www.garlic.com/~lynn/2004m.html#22 Lock-free algorithms
http://www.garlic.com/~lynn/2004n.html#55 Integer types for 128-bit addressing
http://www.garlic.com/~lynn/2004o.html#7 Integer types for 128-bit addressing




--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/

All times are GMT.
Page 1 of 2