IBM's POWER6
David Kanter
Guest





Posted: Tue Dec 20, 2005 1:15 am    Post subject: IBM's POWER6 Reply with quote

I just finished up a rather interesting article about IBM's upcoming
POWER6 MPU, and its role in the somewhat infamous eCLipz project. I
discuss broad details of microarchitecture of the POWER6, along with
some performance estimates for SPECint/fp.

The article can be found at:
http://www.realworldtech.com/page.cfm?ArticleID=RWT121905001634


Enjoy,


David Kanter
Back to top
Del Cecchi
Guest





Posted: Tue Dec 20, 2005 8:14 am    Post subject: Re: IBM's POWER6 Reply with quote

"David Kanter" <dkanter@gmail.com> wrote in message
news:1135038823.574953.268450@g44g2000cwa.googlegroups.com...
Quote:
I just finished up a rather interesting article about IBM's upcoming
POWER6 MPU, and it's role in the somewhat infamous eCLipz project. I
discuss broad details of microarchitecture of the POWER6, along with
some performance estimates for SPECint/fp.

The article can be found at:
http://www.realworldtech.com/page.cfm?ArticleID=RWT121905001634


Enjoy,


David Kanter

The link doesn't work at the moment.
Back to top
David Kanter
Guest





Posted: Tue Dec 20, 2005 9:15 am    Post subject: Re: IBM's POWER6 Reply with quote

Quote:

link doesn't work, at the moment.

It should work now; if it doesn't, email me. Also, are you at work or
at home?

DK
Back to top
Jan-Frode Myklebust
Guest





Posted: Tue Dec 20, 2005 5:15 pm    Post subject: Re: IBM's POWER6 Reply with quote

Quote:
It should work now, if it doesn't email me...also, are you at work or
at home?

It seems the link doesn't work unless you've first visited the front
page of realworldtech and gotten a "SECTION" variable defined. So,
either click the link for "An eCLipz Looms on the Horizon" on the front page,
or go directly to the "print" link:

http://www.realworldtech.com/includes/templates/articles.cfm?ArticleID=RWT121905001634&mode=print


-jf
Back to top
Del Cecchi
Guest





Posted: Tue Dec 20, 2005 5:15 pm    Post subject: Re: IBM's POWER6 Reply with quote

"David Kanter" <dkanter@gmail.com> wrote in message
news:1135061324.303757.22480@g47g2000cwa.googlegroups.com...
Quote:

link doesn't work, at the moment.

It should work now, if it doesn't email me...also, are you at work or
at home?

DK

I was at home. And if I went to the home page and clicked the link, I got
there. Interesting article. I didn't get to the end, but it isn't all
Poughkeepsie. Boeblingen is also involved.

del
Back to top
David Kanter
Guest





Posted: Wed Dec 21, 2005 1:15 am    Post subject: Re: IBM's POWER6 Reply with quote

Hi Jan,

This problem has been fixed.

DK
Back to top
Stephen Fuld
Guest





Posted: Wed Dec 21, 2005 9:15 am    Post subject: Re: IBM's POWER6 Reply with quote

"David Kanter" <dkanter@gmail.com> wrote in message
news:1135038823.574953.268450@g44g2000cwa.googlegroups.com...
Quote:
I just finished up a rather interesting article about IBM's upcoming
POWER6 MPU, and it's role in the somewhat infamous eCLipz project. I
discuss broad details of microarchitecture of the POWER6, along with
some performance estimates for SPECint/fp.

The article can be found at:
http://www.realworldtech.com/page.cfm?ArticleID=RWT121905001634


Enjoy,

Thanks, David. I did enjoy it.

One comment. You talk about using technology similar to Transmeta's JIT
mechanism to translate Z series code to Power code. But IBM has some
freedom that Transmeta, restricted to a VLIW machine as the target,
apparently didn't have. Specifically, it could add logic similar
to what Intel does, and on the fly "decode"/translate the Z series code to
Power code, perhaps with the addition of several otherwise unused "special"
Power instructions to aid performance. Didn't one of the early AMD
Pentium-compatible chips actually translate into 29K instructions? Or, they
could do something like what ARM is doing with its Jazelle technology to
almost directly execute Java bytecode. It directly executes some instructions
by translating them "on the fly" into ARM instructions (for the simple ones),
and has some kind of "escape" mechanism to go to a routine for interpretive
execution of the complex ones. Of course, since IBM controls the compilers,
it could have a version that "knew" which instructions were executed directly
and preferentially generated code for them for higher performance on the new
systems (again, something Transmeta couldn't do).

Do either of these make sense as a potential for IBM? I would guess that if
they did, it would produce a higher performance product than they would get
with a software JIT system.
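
A minimal sketch of the Jazelle-style split described above, assuming a toy
guest instruction set: simple operations are handled on a "direct" path, and
anything else escapes to a software routine. The opcode names (ADD, SUB,
MOVE_CHARS) and the handler functions are invented for illustration; they are
not real z/Architecture or Java bytecode encodings, and this says nothing
about what IBM actually built.

```python
# Toy model: "direct" ops stand in for instructions translated on the fly,
# "escape" ops stand in for complex instructions handled by a routine.
# All opcode names here are hypothetical.

def op_add(regs, dst, a, b):
    regs[dst] = regs[a] + regs[b]

def op_sub(regs, dst, a, b):
    regs[dst] = regs[a] - regs[b]

def escape_move_chars(regs, mem, dst, src, length):
    # Complex memory-to-memory copy, handled interpretively.
    mem[dst:dst + length] = mem[src:src + length]

DIRECT = {"ADD": op_add, "SUB": op_sub}
ESCAPE = {"MOVE_CHARS": escape_move_chars}

def run(program, regs, mem):
    """Execute (opcode, *operands) tuples and count which path each took."""
    counts = {"direct": 0, "escaped": 0}
    for opcode, *operands in program:
        if opcode in DIRECT:
            DIRECT[opcode](regs, *operands)
            counts["direct"] += 1
        else:
            ESCAPE[opcode](regs, mem, *operands)
            counts["escaped"] += 1
    return counts

regs = [0, 5, 7, 0]
mem = bytearray(b"abcd____")
program = [("ADD", 3, 1, 2), ("MOVE_CHARS", 4, 0, 4), ("SUB", 0, 3, 1)]
counts = run(program, regs, mem)
print(regs, bytes(mem), counts)
```

The compiler point follows from the two tables: a compiler that knows which
opcodes sit in the direct table can prefer them when generating new code,
while old binaries still run through the escape path.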

--
- Stephen Fuld
e-mail address disguised to prevent spam
Back to top
David Kanter
Guest





Posted: Wed Dec 21, 2005 5:15 pm    Post subject: Re: IBM's POWER6 Reply with quote

Stephen Fuld wrote:
Quote:
"David Kanter" <dkanter@gmail.com> wrote in message
news:1135038823.574953.268450@g44g2000cwa.googlegroups.com...
I just finished up a rather interesting article about IBM's upcoming
POWER6 MPU, and it's role in the somewhat infamous eCLipz project. I
discuss broad details of microarchitecture of the POWER6, along with
some performance estimates for SPECint/fp.

The article can be found at:
http://www.realworldtech.com/page.cfm?ArticleID=RWT121905001634


Enjoy,

Thanks, David. I did enjoy it.

Glad to hear that.

Quote:
One comment. You talk about using technology similar to Transmeta's JIT
mechanism to translate Z series code to Power code.

Yes, that seems to be the most probable.

Quote:
But IBM has some
freedom that apparently Transmeta, with its restriction of a vliw machine as
the target, didn't seem to have. Specifically, it could add logic similar
to what Intel does, and on the fly "decode"/ translate the Z series code to
Power code, perhaps with the addition of several otherwise unused "special"
power instructions to aid performance.

That is a possibility, but it seems unlikely. Decoding PowerPC
instructions is already a relatively tough task, to the point that they use
Intel's technique of breaking instructions down into "micro-ops" or whatever
they want to call them. So, you would be further complicating the decoders
by adding support for zArch binaries; zArch is definitely CISC and
would make things worse. Moreover, that's a feature in hardware which
will not be needed by the iSeries and pSeries. There are perhaps 10K
mainframe users worldwide, probably 200-700K iSeries users and A LOT of
pSeries users. Why would they go and complicate the hardware that
powers most IBM servers, just for the sake of the 10K mainframe users?
Note that IBM can control BT (binary translation), whereas with a hardware
alternative they would have to use some sort of eFuse to disable the extra
decode.

Quote:
Didn't one of the early AMD pentium
compatible chips actually translate into 29K instructions?

I don't know, but that sounds like a distinctly not so smart idea. It
probably wasn't for the K6, which was AMD's first moderately
competitive chip.

Quote:
Or, they could
do something like what ARM is doing with their jazelle technology to almost
directly execute java byte code. It directly executes some instructions by
translating them "on-the-fly" into ARM instructions (for the simple ones),
and has some kind of "escape" mechanism" to go to a routine for interpretive
execution of the complex ones. Of course, since IBM controls the compilers,
it could have a version that "knew" what instructions were executed directly
and preferentially generate code for them for higher performance on the new
systems (again, something Transmeta couldn't do.).

Do either of these make sense as a potential for IBM? I would guess that if
they did, it would produce a higher performance product than they would get
with a software JIT system.

IBM has done a lot of work on JIT technology. Certainly, their
compilers will help people produce new code that is more efficient for
translation, but IBM will not require that users recompile their code.

DK
Back to top
Anne & Lynn Wheeler
Guest





Posted: Wed Dec 21, 2005 5:15 pm    Post subject: Re: IBM's POWER6 Reply with quote

"Stephen Fuld" <s.fuld@PleaseRemove.att.net> writes:
Quote:
One comment. You talk about using technology similar to Transmeta's
JIT mechanism to translate Z series code to Power code. But IBM has
some freedom that apparently Transmeta, with its restriction of a
vliw machine as the target, didn't seem to have. Specifically, it
could add logic similar to what Intel does, and on the fly "decode"/
translate the Z series code to Power code, perhaps with the addition
of several otherwise unused "special" power instructions to aid
performance. Didn't one of the early AMD pentium compatible chips
actually translate into 29K instructions? Or, they could do
something like what ARM is doing with their jazelle technology to
almost directly execute java byte code. It directly executes some
instructions by translating them "on-the-fly" into ARM instructions
(for the simple ones), and has some kind of "escape" mechanism" to
go to a routine for interpretive execution of the complex ones. Of
course, since IBM controls the compilers, it could have a version
that "knew" what instructions were executed directly and
preferentially generate code for them for higher performance on the
new systems (again, something Transmeta couldn't do.).

Do either of these make sense as a potential for IBM? I would guess
that if they did, it would produce a higher performance product than
they would get with a software JIT system.

note, there was a group looking at this during fort knox time-frame
.... 1980. there were huge number of microprocessors inside the
company, used for controllers, devices, low & mid-range 370s,
s38/as400, etc. the proposal was to move all of these to 801.

low & mid-range 370s were microprocessors of various kinds with 370
implemented as microcode. these machines avg. about 10 microcode
instructions per 370 instruction. we had taken advantage of this for
ecps ... which migrated 6k of high-use kernel code into microcode.
the migrated code got about a 10:1 speedup (originally for 148 and then on
4341). i helped with an analysis that killed the 801 use for the 4341
followon. the alternative was that chips were advancing to the point
where you could get much of 370 instructions directly in silicon
.... which was faster than using 801 to emulate 370.
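
a rough way to see what a 10:1 speedup on the migrated code means for the
whole system is an amdahl-style estimate. the sketch below assumes the 10:1
figure applies uniformly to the migrated paths; the fractions of time spent
in migrated code are hypothetical inputs, not measurements from the 148 or
4341.

```python
def overall_speedup(fraction_migrated, local_speedup=10.0):
    """Amdahl-style estimate: only the migrated fraction of time runs faster."""
    remaining = 1.0 - fraction_migrated
    return 1.0 / (remaining + fraction_migrated / local_speedup)

# Hypothetical fractions of total time spent in the migrated kernel paths.
for fraction in (0.25, 0.5, 0.8):
    print(f"{fraction:.0%} in migrated code -> "
          f"{overall_speedup(fraction):.2f}x overall")
```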

at the time, there was both a jit project for 370->801 ... and sort of
a more advanced version of ecps ... where portions of 370 code would
be recompiled to 801. i got involved because in the early 70s, i had
written a pli program that analyzed 370 assembler listings in various
ways, including generating a high-level language abstraction of what
the assembler code was doing (also detailed control flow, register
usage, etc).

the project using 801 for 4341 followon was canceled ... and the 4381
was much more of a native silicon implementation.

a 801 project that did survive was ROMP ... a joint research, office
products project to use 801/romp for a displaywriter followon. when
that got killed, it was decided to retarget the machine to the unix
workstation market ... hiring the company that had done the pc/ix port
(for the ibm/pc) to do one that came to be called aix. the romp
followon was rios/power.

misc. 801, romp, rios, fort knox, etc collected posts
http://www.garlic.com/~lynn/subtopic.html#801

recent postings in similar thread in mainframe n.g.
http://www.garlic.com/~lynn/2005u.html#40 POWER6 on zSeries?
http://www.garlic.com/~lynn/2005u.html#43 POWER6 on zSeries?
http://www.garlic.com/~lynn/2005u.html#44 POWER6 on zSeries?

--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/
Back to top
Stephen Fuld
Guest





Posted: Wed Dec 21, 2005 11:30 pm    Post subject: Re: IBM's POWER6 Reply with quote

"David Kanter" <dkanter@gmail.com> wrote in message
news:1135184733.328035.126630@z14g2000cwz.googlegroups.com...
Quote:

Stephen Fuld wrote:
"David Kanter" <dkanter@gmail.com> wrote in message
news:1135038823.574953.268450@g44g2000cwa.googlegroups.com...
I just finished up a rather interesting article about IBM's upcoming
POWER6 MPU, and it's role in the somewhat infamous eCLipz project. I
discuss broad details of microarchitecture of the POWER6, along with
some performance estimates for SPECint/fp.

The article can be found at:
http://www.realworldtech.com/page.cfm?ArticleID=RWT121905001634


Enjoy,

Thanks, David. I did enjoy it.

Glad to hear that.

One comment. You talk about using technology similar to Transmeta's JIT
mechanism to translate Z series code to Power code.

Yes, that seems to be the most probable.

But IBM has some
freedom that apparently Transmeta, with its restriction of a vliw machine as
the target, didn't seem to have. Specifically, it could add logic similar
to what Intel does, and on the fly "decode"/ translate the Z series code to
Power code, perhaps with the addition of several otherwise unused "special"
power instructions to aid performance.

That is a possibility, but it seems unlikely. Decoding the POWERPC
instruction is already a relatively tough task that they use Intel's
technique of breaking instructions down to "micro-ops" or whatever they
want to call them.

I didn't realize that (except for the one case of auto-increment). In that
case, one would presumably decode Z series instructions directly into those
micro-ops, not into Power code to be decoded again.

Quote:
So, you would be further complicating the decoders
by adding support for zArch binaries;

Yes, I agree. However, a decoder is a lot less complex than an entire CPU,
so they would still save a lot of design and support effort, but gain
higher performance. It is essentially a "middle way". But if they don't
need the performance, then it probably isn't worth it.


Quote:
zArch is definitely CISC and
would make things worse. Moreover, that's a feature in hardware which
will not be needed by the iSeries and pSeries. There are perhaps 10K
mainframe users worldwide, probably 200-700K iSeries users and A LOT of
pSeries users. Why would they go and complicate the hardware that
powers most IBM servers, just for the sake of the 10K mainframe users?

The question is whether it would be worth the extra performance gained for
those very profitable 10K users to add a modest amount of silicon. You are
saying that it seems not. I can certainly accept that.



Quote:
Note that IBM can control BT, they would have to use some sort of eFuse
to disable the extra decode for a hardware alternative.

Didn't one of the early AMD pentium
compatible chips actually translate into 29K instructions?

I don't know, but that sounds like a distinctly not so smart idea. It
probably wasn't for the K6, which was AMD's first moderately
competitive chip.

Or, they could
do something like what ARM is doing with their jazelle technology to almost
directly execute java byte code. It directly executes some instructions by
translating them "on-the-fly" into ARM instructions (for the simple ones),
and has some kind of "escape" mechanism" to go to a routine for interpretive
execution of the complex ones. Of course, since IBM controls the compilers,
it could have a version that "knew" what instructions were executed directly
and preferentially generate code for them for higher performance on the new
systems (again, something Transmeta couldn't do.).

Do either of these make sense as a potential for IBM? I would guess that if
they did, it would produce a higher performance product than they would get
with a software JIT system.

IBM has done a lot of work on JIT technology. Certainly, their
compilers will help people produce new code that is more efficient for
translation, but IBM will not require that users recompile their code.

No, of course not. Just that they would get a performance benefit if they
did so. But old load modules would certainly still work unchanged.

--
- Stephen Fuld
e-mail address disguised to prevent spam
Back to top
David Kanter
Guest





Posted: Wed Dec 21, 2005 11:38 pm    Post subject: Re: IBM's POWER6 Reply with quote

Quote:
That is a possibility, but it seems unlikely. Decoding the POWERPC
instruction is already a relatively tough task that they use Intel's
technique of breaking instructions down to "micro-ops" or whatever they
want to call them.

I didn't realize that (except for the one case of auto-increment). In that
case, one would presumably devode Z series instructions directly into those
micro-ops, not into power code to be decoded again.

Yes, if you were doing hardware decode it would go straight to uops.

Quote:
So, you would be further complicating the decoders
by adding support for zArch binaries;

Yes, I agree. However, a decoder is a lot less complex than an entire CPU,
so they still would save a lot of design and support effort. but gain
higher performance. It is essentially a "middle way". But if they don't
need the performance, then it probably isn't worth it.

zArch is definitely CISC and
would make things worse. Moreover, that's a feature in hardware which
will not be needed by the iSeries and pSeries. There are perhaps 10K
mainframe users worldwide, probably 200-700K iSeries users and A LOT of
pSeries users. Why would they go and complicate the hardware that
powers most IBM servers, just for the sake of the 10K mainframe users?

The question is whether it would be worth the extra performance gained for
those very profitable 10K users to add a modest amount of silicon. You are
saying that it seems not. I can certainly accept that.

Yeah, I don't think the benefits justify the cost. To be honest, I
haven't heard of any MPU that can decode instructions from two ISAs in
anything resembling an efficient fashion. That's an awful lot of
baggage to be carrying around...although perhaps adding certain
instructions to assist BT would make sense.

Quote:
IBM has done a lot of work on JIT technology. Certainly, their
compilers will help people produce new code that is more efficient for
translation, but IBM will not require that users recompile their code.

No, of course not. Just that they would get a performance benefit if they
did so. But old load modules would certainly still work unchanged.

I'd be interested to see what the performance increase is versus the
performance cost (probably in clockspeed) for UNIX workloads.

DK
Back to top
KR Williams
Guest





Posted: Wed Dec 21, 2005 11:45 pm    Post subject: Re: IBM's POWER6 Reply with quote

In article <Zigqf.178005$qk4.82127@bgtnsc05-
news.ops.worldnet.att.net>, s.fuld@PleaseRemove.att.net says...
Quote:

"David Kanter" <dkanter@gmail.com> wrote in message
news:1135184733.328035.126630@z14g2000cwz.googlegroups.com...

Stephen Fuld wrote:
"David Kanter" <dkanter@gmail.com> wrote in message
news:1135038823.574953.268450@g44g2000cwa.googlegroups.com...
I just finished up a rather interesting article about IBM's upcoming
POWER6 MPU, and it's role in the somewhat infamous eCLipz project. I
discuss broad details of microarchitecture of the POWER6, along with
some performance estimates for SPECint/fp.

The article can be found at:
http://www.realworldtech.com/page.cfm?ArticleID=RWT121905001634


Enjoy,

Thanks, David. I did enjoy it.

Glad to hear that.

One comment. You talk about using technology similar to Transmeta's JIT
mechanism to translate Z series code to Power code.

Yes, that seems to be the most probable.

But IBM has some
freedom that apparently Transmeta, with its restriction of a vliw machine as
the target, didn't seem to have. Specifically, it could add logic similar
to what Intel does, and on the fly "decode"/ translate the Z series code to
Power code, perhaps with the addition of several otherwise unused "special"
power instructions to aid performance.

That is a possibility, but it seems unlikely. Decoding the POWERPC
instruction is already a relatively tough task that they use Intel's
technique of breaking instructions down to "micro-ops" or whatever they
want to call them.

I didn't realize that (except for the one case of auto-increment). In that
case, one would presumably devode Z series instructions directly into those
micro-ops, not into power code to be decoded again.

Only the more complicated ops are "cracked" or "microcoded" (things
like load/store string). Most ops are translated 1:1.
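
As a rough illustration of that split, the sketch below maps a simple op to
one internal op, cracks a load-with-update into two, and hands a string op to
a microcode entry point. The mnemonics and micro-op names are hypothetical
placeholders, not IBM's internal operation names, and the exact set of
cracked instructions is an assumption.

```python
# Hypothetical instruction and micro-op names, for illustration only.

def crack(instr):
    """Return the list of micro-ops for one (mnemonic, *operands) tuple."""
    op, *args = instr
    if op == "add":
        # Simple op: translated 1:1.
        rd, ra, rb = args
        return [("uop_add", rd, ra, rb)]
    if op == "load_update":
        # Load plus base-register update: cracked into two micro-ops.
        rd, ra, disp = args
        return [("uop_load", rd, ra, disp), ("uop_addi", ra, ra, disp)]
    if op == "load_string":
        # Complex op: a single entry into a microcode routine.
        return [("uop_microcode_entry", "load_string", *args)]
    raise ValueError(f"unknown instruction: {op}")

for instr in [("add", 3, 1, 2), ("load_update", 4, 1, 8), ("load_string", 5, 1, 16)]:
    print(instr, "->", crack(instr))
```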

--
Keith
Back to top
Iain McClatchie
Guest





Posted: Thu Dec 22, 2005 1:15 am    Post subject: Re: IBM's POWER6 Reply with quote

DK> zArch is definitely CISC and would make things worse.

zArch has cruft, which requires microcode escapes or a JIT escape.
But the core zArch instruction set is pretty reasonable, and actually
looks (to me) a lot like the x86-64:

16 64-bit GPRs
RX instructions: Rx <- Rx op MEM(Ry + Rz + constant)
variable-length instructions
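
To make the RX form concrete, here is a small sketch of how a 4-byte RX
instruction splits into fields (8-bit opcode, R1, X2, B2, 12-bit
displacement) and how the operand address is formed. It assumes 64-bit
addressing and ignores the other addressing modes; the function names and
register values are made up for the example.

```python
def decode_rx(word):
    """Split a 32-bit RX-format instruction into its fields."""
    return {
        "opcode": (word >> 24) & 0xFF,
        "r1": (word >> 20) & 0xF,   # register operand (source and destination)
        "x2": (word >> 16) & 0xF,   # index register
        "b2": (word >> 12) & 0xF,   # base register
        "d2": word & 0xFFF,         # 12-bit unsigned displacement
    }

def effective_address(fields, gpr):
    """Base + index + displacement; register number 0 means 'no register'."""
    index = gpr[fields["x2"]] if fields["x2"] else 0
    base = gpr[fields["b2"]] if fields["b2"] else 0
    # Assumption: 64-bit addressing mode, so wrap at 2**64.
    return (base + index + fields["d2"]) & 0xFFFFFFFFFFFFFFFF

gpr = [0] * 16
gpr[2], gpr[3] = 0x100, 0x2000
fields = decode_rx(0x5A123010)      # opcode 0x5A, R1=1, X2=2, B2=3, D2=0x010
print(fields, hex(effective_address(fields, gpr)))
```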

The costly thing about accelerating zArch stuff is handling all those
memory models in hardware. Actually, the costly thing is *verifying*
all those memory models. I'd guess that the decoder, while taking up
more than a few mm^2, doesn't add tremendously to the cost of the
machine. Note that IBM isn't going to be selling these die to Apple.
A little extra cost per die for the pArch and iArch folks is worth
saving tens of millions of dollars for the zArch development.

Fast, variable-sized-instruction, variable-microops-emitted, escapes-
to-microcode decoders are a hard but solved problem these days.
AMD and Intel have had them for a decade. I'd guess that Power6's
decoder probably looks at 128 bits and can emit 4-8 microops per
cycle.
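
A sketch of the first step such a decoder has to do for zArch code, assuming
a 128-bit (16-byte) fetch window as guessed above: the top two bits of the
first opcode byte give the instruction length (00 is 2 bytes, 01 and 10 are
4 bytes, 11 is 6 bytes). The window contents are filler chosen only for
their lengths, and the sequential loop is a software stand-in for what
hardware would do in parallel.

```python
def instruction_length(first_byte):
    """Instruction length in bytes, from the top two bits of the first byte."""
    top = first_byte >> 6
    if top == 0:
        return 2
    if top == 3:
        return 6
    return 4

def split_window(window):
    """Return (offset, length) for each complete instruction in the window."""
    boundaries = []
    offset = 0
    while offset < len(window):
        length = instruction_length(window[offset])
        if offset + length > len(window):
            break  # instruction straddles the end of the fetch window
        boundaries.append((offset, length))
        offset += length
    return boundaries

# 16-byte window holding a 2-, 4-, 6- and 4-byte instruction (operands are filler).
window = bytes([0x18, 0x12,
                0x5A, 0x10, 0x30, 0x10,
                0xE3, 0x10, 0x30, 0x10, 0x00, 0x04,
                0x58, 0x10, 0x30, 0x10])
print(split_window(window))
```

Each start offset depends on the previous instruction's length, which is the
serial dependency that makes wide variable-length decode hard in hardware.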

DK> I haven't heard of any MPU that can decode instructions from
DK> two ISAs in anything resembling an efficient fashion.

ARM/Thumb?
x86/x86-64?
Back to top
Stephen Fuld
Guest





Posted: Thu Dec 22, 2005 1:15 am    Post subject: Re: IBM's POWER6 Reply with quote

"KR Williams" <krw@att.bizzzz> wrote in message
news:MPG.1e135ea77b833bd898976c@News.Individual.NET...
Quote:
In article <Zigqf.178005$qk4.82127@bgtnsc05-
news.ops.worldnet.att.net>, s.fuld@PleaseRemove.att.net says...

"David Kanter" <dkanter@gmail.com> wrote in message
news:1135184733.328035.126630@z14g2000cwz.googlegroups.com...

Stephen Fuld wrote:
"David Kanter" <dkanter@gmail.com> wrote in message
news:1135038823.574953.268450@g44g2000cwa.googlegroups.com...
I just finished up a rather interesting article about IBM's upcoming
POWER6 MPU, and it's role in the somewhat infamous eCLipz project. I
discuss broad details of microarchitecture of the POWER6, along with
some performance estimates for SPECint/fp.

The article can be found at:
http://www.realworldtech.com/page.cfm?ArticleID=RWT121905001634


Enjoy,

Thanks, David. I did enjoy it.

Glad to hear that.

One comment. You talk about using technology similar to Transmeta's JIT
mechanism to translate Z series code to Power code.

Yes, that seems to be the most probable.

But IBM has some
freedom that apparently Transmeta, with its restriction of a vliw machine as
the target, didn't seem to have. Specifically, it could add logic similar
to what Intel does, and on the fly "decode"/ translate the Z series code to
Power code, perhaps with the addition of several otherwise unused "special"
power instructions to aid performance.

That is a possibility, but it seems unlikely. Decoding the POWERPC
instruction is already a relatively tough task that they use Intel's
technique of breaking instructions down to "micro-ops" or whatever they
want to call them.

I didn't realize that (except for the one case of auto-increment). In that
case, one would presumably devode Z series instructions directly into those
micro-ops, not into power code to be decoded again.

Only the more complicated ops are "cracked" or "microcoded" (things
like load/store string). Most ops are translated 1:1.

Yes, that is what I thought (I forgot about the string ops). Thanks, Keith

--
- Stephen Fuld
e-mail address disguised to prevent spam
Back to top
David Kanter
Guest





Posted: Thu Dec 22, 2005 3:19 pm    Post subject: Re: IBM's POWER6 Reply with quote

Quote:
DK> zArch is definitely CISC and would make things worse.

zArch has cruft, which requires microcode escapes or a JIT escape.
But the core zArch instruction set is pretty reasonable, and actually
looks (to me) a lot like the x86-64:

16 64-bit GPRs
RX instructions: Rx <- Rx op MEM(Ry + Rz + constant)
variable-length instructions

Yes, but it has a lot of awkward stuff in there as well, hexFP for one.
Admittedly, it is cleaner than x86.

Quote:
The costly thing about accelerating zArch stuff is handling all those
memory models in hardware. Actually, the costly thing is *verifying*
all those memory models. I'd guess that the decoder, while taking up
more than a few mm^2, doesn't add tremendously to the cost of the
machine.

The question is whether it would be in the critical path, plus the
verification issues you mentioned.

Quote:
Note that IBM isn't going to be selling these die to Apple.
A little extra cost per die for the pArch and iArch folks is worth
saving tens of millions of dollars for the zArch development.

I'm not convinced that doing decode rather than a JIT will save ~$10M.
They already have research projects to do just that, although they are
not productized or tested.

Quote:
Fast, variable-sized-instruction, variable-microops-emitted, escapes-
to-microcode decoders are a hard but solved problem these days.
AMD and Intel have had them for a decade. I'd guess that Power6's
decoder probably looks at 128 bits and can emit 4-8 microops per
cycle.

You really think they did it in hardware?

Quote:
DK> I haven't heard of any MPU that can decode instructions from
DK> two ISAs in anything resembling an efficient fashion.

ARM/Thumb?
x86/x86-64?

Sorry, I meant to say two ISAs of different families (say PA-RISC and
Alpha, ARM & x86, etc.). I'm not familiar enough with ARM/Thumb to discuss
how different they are, but x86 vs. x86-64 doesn't seem like a huge stretch.

DK
Back to top
 
All times are GMT. Page 1 of 2.