ultrasparc IIe question
CASTalk.com Forum Index CASTalk.com
Discussion of DSP, FPGA, storage and embedded system.
 
 
Joe Seigh
Posted: Tue Aug 23, 2005 4:15 pm    Post subject: ultrasparc IIe question

I'm doing a hazard pointer implementation on SPARC and also
have an implementation with memory barriers, to try to quantify
the benefit of not having them. The code looks like this:

__asm__ __volatile__ (
    "ld [%3], %0 ;\n"                       // load src ptr
    "0:\t"
    "membar #LoadStore | #StoreStore ;\n"   // release membar
    "st %0, [%2] ;\n"                       // store hazard ptr[0]
    "membar #StoreLoad ;\n"                 // store/load membar
    "ld [%3], %1 ;\n"                       // reload src ptr
    "cmp %1, %0 ;\n"                        // check if changed
    "bne,a,pn %%icc,0b ;\n"                 // retry if changed
    "mov %1, %0 ;\n"                        // delay slot (annulled unless branch taken): keep reloaded ptr
    "st %0, [%2 + 4] ;\n"                   // store hazard ptr[1]
    : "=&r" (ret), "=&r" (_tmp)
    : "r" (hptr), "r" (src)
    : "cc", "memory"
);

The other version is the same except without the memory barriers.

A simplistic performance measurement on a 500 MHz UltraSPARC IIe
shows about a 2x performance benefit. On an Intel P3 and on
a G4 PowerPC the benefit of hazard pointers w/o membars is
8x to 20x.

Why the big difference? Is it that SPARC memory barriers are
really efficient? Or that the SPARC pipeline isn't that deep,
so the effect isn't as big? Or something else?

Also it doesn't make any difference AFAICT whether branch
prediction or annulment is used. Are these not implemented
on the IIe? Is there a potential performance hit from using
annulment? In this code it doesn't matter if the delay
slot instruction is executed on fall through.
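
For anyone following along without a SPARC box, the loop above can be approximated in portable C11. This is only a minimal sketch, assuming <stdatomic.h> is available. The names hp_slots_t and hp_acquire are made up for illustration and are not from the original code, and treating hptr as two pointer-sized slots is a guess based on the [%2] and [%2 + 4] stores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical layout: two hazard pointer slots, standing in for hptr[0]/hptr[1]. */
typedef struct {
    _Atomic(void *) slot[2];
} hp_slots_t;

/* Hypothetical helper mirroring the asm loop: publish a candidate pointer,
   fence, then re-read the source and retry if it changed in the meantime. */
static void *hp_acquire(hp_slots_t *hp, _Atomic(void *) *src)
{
    assert(hp != NULL && src != NULL);

    void *p = atomic_load_explicit(src, memory_order_relaxed);      /* load src ptr */
    for (;;) {
        /* Release store: the role of membar #LoadStore | #StoreStore + st. */
        atomic_store_explicit(&hp->slot[0], p, memory_order_release);
        /* Full fence: the role of membar #StoreLoad. */
        atomic_thread_fence(memory_order_seq_cst);
        void *q = atomic_load_explicit(src, memory_order_relaxed);  /* reload src ptr */
        if (q == p)
            break;                                                  /* unchanged: done */
        p = q;                                                      /* changed: retry with new value */
    }
    atomic_store_explicit(&hp->slot[1], p, memory_order_relaxed);   /* store hazard ptr[1] */
    return p;
}
```

Deleting the atomic_thread_fence call and relaxing the first store gives the shape of the membar-free variant being compared.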


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Seongbae Park
Posted: Tue Aug 23, 2005 8:02 pm    Post subject: Re: ultrasparc IIe question

Joe Seigh <jseigh_01@xemaps.com> wrote:
> I'm doing a hazard pointer implementation on SPARC and also
> have an implementation with memory barriers, to try to quantify
> the benefit of not having them. The code looks like this:
>
> __asm__ __volatile__ (
> "ld [%3], %0 ;\n" // load src ptr
> "0:\t"
> "membar #LoadStore | #StoreStore ;\n" // release membar
> "st %0, [%2] ;\n" // store hazard ptr[0]
> "membar #StoreLoad ;\n" // store/load membar
> "ld [%3], %1 ;\n" // reload src ptr
> "cmp %1, %0 ;\n" // check if changed
> "bne,a,pn %%icc,0b ;\n" // retry if changed
> "mov %1, %0 ;\n"
> "st %0, [%2 + 4] ;\n" // store hazard ptr[1]
> : "=&r" (ret), "=&r" (_tmp)
> : "r" (hptr), "r" (src)
> : "cc", "memory"
> );
>
> The other version is the same except without the memory barriers.
>
> A simplistic performance measurement on a 500 MHz UltraSPARC IIe
> shows about a 2x performance benefit. On an Intel P3 and on
> a G4 PowerPC the benefit of hazard pointers w/o membars is
> 8x to 20x.

First of all,
#LoadStore and #StoreStore are effectively no-ops
on any SPARC processor running in TSO mode.
So the first membar is unnecessary
as long as you're running in TSO mode
(which you are, since practically[1] there's no SPARC processor
that runs in RMO mode, and Solaris doesn't support
running user programs in RMO mode yet).

Secondly,
the US-IIe doesn't support multiprocessing
and hence doesn't really need #StoreLoad either,
since a single processor is always self-consistent [2].
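
The two points above can be boiled down to a tiny model. This is only a sketch: the helper name membar_needed is hypothetical, the bit values are assumed to match the membar mmask encoding in the SPARC V9 manual, and it covers memory-to-memory ordering only, not the I/O case in [2].

```c
#include <assert.h>

/* membar ordering bits; values assumed to follow the SPARC V9 mmask encoding. */
enum membar_bits {
    MEMBAR_LOADLOAD   = 0x1,
    MEMBAR_STORELOAD  = 0x2,
    MEMBAR_LOADSTORE  = 0x4,
    MEMBAR_STORESTORE = 0x8
};

/* Hypothetical helper: of the ordering bits requested in `mask`, return the
   ones that still do real work on a TSO machine with `ncpus` processors. */
static unsigned membar_needed(unsigned mask, unsigned ncpus)
{
    assert(ncpus >= 1);
    if (ncpus == 1)
        return 0u;                                /* a lone CPU is self-consistent */
    return mask & (unsigned)MEMBAR_STORELOAD;     /* TSO already orders the rest */
}
```

Under this model the first membar in the posted loop (#LoadStore | #StoreStore) maps to 0 everywhere, and the second (#StoreLoad) only survives on a multiprocessor.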

> Why the big difference? Is it that SPARC memory barriers are
> really efficient? Or that the SPARC pipeline isn't that deep,
> so the effect isn't as big? Or something else?

I don't know, because I don't know what the overhead
of a memory barrier is on those other platforms.
IIRC, PowerPC by default runs in
the equivalent of SPARC's RMO mode.
Not sure about the P3 though.
Regardless, I'm fairly sure you've picked the wrong SPARC processor
for comparing memory barrier performance :)
US-III or IV should be more interesting.

Another factor that favors the US-IIe is that it has a built-in memory controller,
although I don't think this really matters,
since all memory consistency is maintained at the L2 level IIRC.

> Also it doesn't make any difference AFAICT whether branch
> prediction or annulment is used. Are these not implemented
> on the IIe?

They are.

> Is there a potential performance hit from using annulment?

I don't think so.

> In this code it doesn't matter if the delay
> slot instruction is executed on fall through.

Right.

[1] I've heard that there were some implementations that supported PSO,
but that was before my time and I don't think that platform was sold
in any significant numbers. So *practically* there's no SPARC system
running non-TSO.

[2] It still needs some membars for synchronizing
between memory access and IO access.

PS. Follow-up set to comp.arch.
--
#pragma ident "Seongbae Park, compiler, http://blogs.sun.com/seongbae/"

Seongbae Park
Posted: Wed Aug 24, 2005 5:13 pm    Post subject: Re: ultrasparc IIe question

Joe Seigh <jseigh_01@xemaps.com> wrote:
....
> But I don't have access to any of those processors.
> And there's no point in doing development on a processor that
> doesn't give me any idea of program behavior on other processors
> in that processor family.
> So no memory-barrier-free hazard pointers on SPARC for now.

You can test neither the correctness nor the performance of the
memory-model aspects of your program on a single-processor system.
To do any of that, you need a multiprocessor system.
Depending on the multi-core implementation,
you may even need a multi-chip system.

> Whatever it is, I don't think it's sufficient
> to just have the kernel scale well in a multi-core environment.

True.

> Not unless you think there's a killer market in running lots
> of essentially single threaded apps.

I don't completely agree with what this statement implies.
Commercial applications that run on multiprocessor systems
already use some techniques to eliminate or reduce
the synchronization (as in mutex) overhead.
They will continue to run well on a multi-core environment
(or at least their performance won't be degraded
by moving from a single-core multi-chip system
to a multi-core multi-chip system).
Whether they can exploit the full potential of multicore systems
is a different question.

I believe contemporary ISAs are not providing what they could
to help implement mutex-free algorithms and data structures,
which will become more important
as single-core performance improvement slows down
while core count per chip increases,
forcing more programs to become multithreaded.
--
#pragma ident "Seongbae Park, compiler, http://blogs.sun.com/seongbae/"

Joe Seigh
Posted: Wed Aug 24, 2005 9:24 pm    Post subject: Re: ultrasparc IIe question

Seongbae Park wrote:
> Joe Seigh <jseigh_01@xemaps.com> wrote:
>> I'm doing a hazard pointer implementation on SPARC and also
>> have an implementation with memory barriers, to try to quantify
>> the benefit of not having them. The code looks like this:
>> [...]
>>
>> The other version is the same except without the memory barriers.
>>
>> A simplistic performance measurement on a 500 MHz UltraSPARC IIe
>> shows about a 2x performance benefit. On an Intel P3 and on
>> a G4 PowerPC the benefit of hazard pointers w/o membars is
>> 8x to 20x.
>>
>> Why the big difference? Is it that SPARC memory barriers are
>> really efficient? Or that the SPARC pipeline isn't that deep,
>> so the effect isn't as big? Or something else?
>
> I don't know, because I don't know what the overhead
> of a memory barrier is on those other platforms.
> IIRC, PowerPC by default runs in
> the equivalent of SPARC's RMO mode.
> Not sure about the P3 though.
> Regardless, I'm fairly sure you've picked the wrong SPARC processor
> for comparing memory barrier performance :)
> US-III or IV should be more interesting.

Or Niagara. But I don't have access to any of those processors.
And there's no point in doing development on a processor that
doesn't give me any idea of program behavior on other processors
in that processor family. So no memory-barrier-free hazard
pointers on SPARC for now. Which is good, since I need the
space that the SB100 is taking up. :)

I think I'm going to halt further development on Intel/AMD
processors until it becomes more clear what the various hw
vendors think the programming model will be for multi-core
processors. Whatever it is, I don't think it's sufficient
to just have the kernel scale well in a multi-core environment.
Not unless you think there's a killer market in running lots
of essentially single threaded apps.



--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Joe Seigh
Posted: Wed Aug 24, 2005 11:34 pm    Post subject: Re: ultrasparc IIe question

Seongbae Park wrote:
> Joe Seigh <jseigh_01@xemaps.com> wrote:
> ...
>> But I don't have access to any of those processors.
>> And there's no point in doing development on a processor that
>> doesn't give me any idea of program behavior on other processors
>> in that processor family.
>> So no memory-barrier-free hazard pointers on SPARC for now.
>
> You can test neither the correctness nor the performance of the
> memory-model aspects of your program on a single-processor system.
> To do any of that, you need a multiprocessor system.
> Depending on the multi-core implementation,
> you may even need a multi-chip system.

That's not really true for reader lock-free. The only thing that
you need multiprocessor systems for is to show how badly mutex
or rwlock based synchronization will tank in comparison.

>> Whatever it is, I don't think it's sufficient
>> to just have the kernel scale well in a multi-core environment.
>
> True.
>
>> Not unless you think there's a killer market in running lots
>> of essentially single threaded apps.
>
> I don't completely agree with what this statement implies.
> Commercial applications that run on multiprocessor systems
> already use some techniques to eliminate or reduce
> the synchronization (as in mutex) overhead.
> They will continue to run well on a multi-core environment
> (or at least their performance won't be degraded
> by moving from a single-core multi-chip system
> to a multi-core multi-chip system).
> Whether they can exploit the full potential of multicore systems
> is a different question.
>
> I believe contemporary ISAs are not providing what they could
> to help implement mutex-free algorithms and data structures,
> which will become more important
> as single-core performance improvement slows down
> while core count per chip increases,
> forcing more programs to become multithreaded.

I work (or was working) with lock-free algorithms. There's some
interesting stuff you can do now even with the existing ISAs. The
problem isn't whether we need to add more support to the ISA, it's
keeping what we have. It will be very ironic if the hw vendors
eliminate support for lock-free when they go to higher core counts.
That can very easily happen if they're not aware of what
is going on in the lock-free area.

That's not to say there aren't some interesting things you could
add to the ISA but if the hw vendors don't recognise what is being
done now, they're certainly not going to be very receptive to
suggestions in that area.
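
As one concrete instance of what can already be done on existing ISAs, here is a minimal sketch of a lock-free stack push built on a single compare-and-swap loop. It is illustrative only: node_t, lf_stack_t and lf_push are hypothetical names, not code from this thread. Pop is left out on purpose, because a safe pop needs some answer to the ABA and memory reclamation problems, which is where techniques like hazard pointers come in.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
    int value;
} node_t;

typedef struct {
    _Atomic(node_t *) head;
} lf_stack_t;

/* Push `n` without taking a lock: link it in front of the current head and
   retry the compare-and-swap until no other thread has moved the head. */
static void lf_push(lf_stack_t *s, node_t *n)
{
    assert(s != NULL && n != NULL);

    node_t *old = atomic_load_explicit(&s->head, memory_order_relaxed);
    do {
        n->next = old;   /* on CAS failure `old` is refreshed with the current head */
    } while (!atomic_compare_exchange_weak_explicit(
                 &s->head, &old, n,
                 memory_order_release,    /* publish n->next before the new head */
                 memory_order_relaxed));
}
```

The only hardware support this relies on is an atomic compare-and-swap (or an equivalent load-linked/store-conditional pair), which is the kind of existing primitive the post is worried about losing.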

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.