David Kanter
Posted: Tue Dec 20, 2005 1:15 am    Post subject: IBM's POWER6

I just finished up a rather interesting article about IBM's upcoming
POWER6 MPU and its role in the somewhat infamous eCLipz project. I
discuss the broad details of the POWER6 microarchitecture, along with
some performance estimates for SPECint/fp.
The article can be found at:
http://www.realworldtech.com/page.cfm?ArticleID=RWT121905001634
Enjoy,
David Kanter

Del Cecchi
Posted: Tue Dec 20, 2005 8:14 am    Post subject: Re: IBM's POWER6

"David Kanter" <dkanter@gmail.com> wrote in message
news:1135038823.574953.268450@g44g2000cwa.googlegroups.com...
> The article can be found at:
> http://www.realworldtech.com/page.cfm?ArticleID=RWT121905001634

Link doesn't work at the moment.

David Kanter
Posted: Tue Dec 20, 2005 9:15 am    Post subject: Re: IBM's POWER6

> Link doesn't work at the moment.

It should work now; if it doesn't, email me. Also, are you at work or
at home?
DK

Jan-Frode Myklebust
Posted: Tue Dec 20, 2005 5:15 pm    Post subject: Re: IBM's POWER6

> It should work now; if it doesn't, email me. Also, are you at work or
> at home?

It seems the link doesn't work unless you've visited the front
page of realworldtech and gotten a "SECTION" variable defined. So
either click the link for "An eCLipz Looms on the Horizon" on the front page,
or go directly to the "print" link:
http://www.realworldtech.com/includes/templates/articles.cfm?ArticleID=RWT121905001634&mode=print
-jf

Del Cecchi
Posted: Tue Dec 20, 2005 5:15 pm    Post subject: Re: IBM's POWER6

"David Kanter" <dkanter@gmail.com> wrote in message
news:1135061324.303757.22480@g47g2000cwa.googlegroups.com...
> > Link doesn't work at the moment.
> It should work now; if it doesn't, email me. Also, are you at work or
> at home?
> DK

I was at home. And if I went to the home page and clicked the link, I got
there. Interesting article. I didn't get to the end, but it isn't all
Poughkeepsie; Boeblingen is also involved.
del

David Kanter
Posted: Wed Dec 21, 2005 1:15 am    Post subject: Re: IBM's POWER6

Hi Jan,
This problem has been fixed.
DK

Stephen Fuld
Posted: Wed Dec 21, 2005 9:15 am    Post subject: Re: IBM's POWER6

"David Kanter" <dkanter@gmail.com> wrote in message
news:1135038823.574953.268450@g44g2000cwa.googlegroups.com...
> I just finished up a rather interesting article about IBM's upcoming
> POWER6 MPU and its role in the somewhat infamous eCLipz project. I
> discuss the broad details of the POWER6 microarchitecture, along with
> some performance estimates for SPECint/fp.
> The article can be found at:
> http://www.realworldtech.com/page.cfm?ArticleID=RWT121905001634
> Enjoy,

Thanks, David. I did enjoy it.
One comment. You talk about using technology similar to Transmeta's JIT
mechanism to translate Z series code to Power code. But IBM has some
freedom that Transmeta, restricted to a VLIW machine as the target,
apparently didn't have. Specifically, it could add logic similar
to what Intel does, and "decode"/translate the Z series code to
Power code on the fly, perhaps with the addition of several otherwise
unused "special" Power instructions to aid performance. Didn't one of the
early AMD Pentium-compatible chips actually translate into 29K instructions?
Or, they could do something like what ARM is doing with their Jazelle
technology to almost directly execute Java byte code. It directly executes
some instructions by translating them "on the fly" into ARM instructions
(for the simple ones), and has some kind of "escape" mechanism to go to a
routine for interpretive execution of the complex ones. Of course, since
IBM controls the compilers, it could have a version that "knew" which
instructions were executed directly and preferentially generated code for
them for higher performance on the new systems (again, something Transmeta
couldn't do).
Does either of these make sense as a possibility for IBM? I would guess that
if they did, it would produce a higher-performance product than they would
get with a software JIT system.
--
- Stephen Fuld
e-mail address disguised to prevent spam

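Stephen's Jazelle-style split (simple guest instructions expanded directly into host operations, complex ones escaping to a software routine) can be illustrated with a toy Python model. Everything here is hypothetical: the two "simple" mnemonics only echo zArch LR/AR, `SUMR` is an invented complex op, and the micro-op tuples are not any real Power or IBM internal format. It shows the control flow of a hardware fast path plus a software escape, not how Jazelle or any IBM design actually works.

```python
from dataclasses import dataclass, field

MASK64 = (1 << 64) - 1

# Hypothetical "hardware" path: simple guest ops expand directly into host micro-ops.
# The mnemonics echo zArch LR/AR, but this is a toy model, not real encodings.
SIMPLE_OPS = {
    "LR": lambda r1, r2: [("mov", r1, r2)],      # r1 <- r2
    "AR": lambda r1, r2: [("add", r1, r1, r2)],  # r1 <- r1 + r2
}


@dataclass
class Machine:
    regs: list = field(default_factory=lambda: [0] * 16)
    escapes: int = 0  # number of times execution fell back to software


def run_uop(m: Machine, uop: tuple) -> None:
    """Execute one host micro-op."""
    op, *args = uop
    if op == "mov":
        dst, src = args
        m.regs[dst] = m.regs[src]
    elif op == "add":
        dst, a, b = args
        m.regs[dst] = (m.regs[a] + m.regs[b]) & MASK64
    else:
        raise ValueError(f"unknown micro-op {op}")


def software_handler(m: Machine, op: str, operands: tuple) -> None:
    """The 'escape' path: complex guest ops are emulated by a software routine."""
    m.escapes += 1
    if op == "SUMR":  # hypothetical complex op: r1 <- sum of registers lo..hi
        r1, lo, hi = operands
        m.regs[r1] = sum(m.regs[lo:hi + 1]) & MASK64
    else:
        raise NotImplementedError(op)


def execute(m: Machine, program: list) -> None:
    for op, *operands in program:
        expand = SIMPLE_OPS.get(op)
        if expand is not None:
            for uop in expand(*operands):  # fast path: translated on the fly
                run_uop(m, uop)
        else:
            software_handler(m, op, tuple(operands))  # slow path: escape
```

For example, running `[("LR", 1, 2), ("AR", 1, 3), ("SUMR", 4, 1, 3)]` with r2 = 5 and r3 = 7 leaves r1 = 12 and r4 = 24, with exactly one escape to software. A compiler that "knew" which ops take the fast path would simply try to avoid emitting the others.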
David Kanter
Posted: Wed Dec 21, 2005 5:15 pm    Post subject: Re: IBM's POWER6

Stephen Fuld wrote:
> Thanks, David. I did enjoy it.

Glad to hear that.
> One comment. You talk about using technology similar to Transmeta's JIT
> mechanism to translate Z series code to Power code.

Yes, that seems to be the most probable.
> But IBM has some
> freedom that Transmeta, restricted to a VLIW machine as the target,
> apparently didn't have. Specifically, it could add logic similar
> to what Intel does, and "decode"/translate the Z series code to
> Power code on the fly, perhaps with the addition of several otherwise
> unused "special" Power instructions to aid performance.

That is a possibility, but it seems unlikely. Decoding POWERPC
instructions is already a tough enough task that they use Intel's
technique of breaking instructions down into "micro-ops" or whatever they
want to call them. So, you would be further complicating the decoders
by adding support for zArch binaries; zArch is definitely CISC and
would make things worse. Moreover, that's a feature in hardware which
will not be needed by the iSeries and pSeries. There are perhaps 10K
mainframe users worldwide, probably 200-700K iSeries users and A LOT of
pSeries users. Why would they go and complicate the hardware that
powers most IBM servers, just for the sake of the 10K mainframe users?
Note that IBM can control BT in software, whereas a hardware alternative
would need some sort of eFuse to disable the extra decode.
> Didn't one of the early AMD Pentium-compatible
> chips actually translate into 29K instructions?

I don't know, but that sounds like a distinctly not-so-smart idea. It
probably wasn't the case for the K6, which was AMD's first moderately
competitive chip.
> Or, they could
> do something like what ARM is doing with their Jazelle technology to almost
> directly execute Java byte code. It directly executes some instructions by
> translating them "on the fly" into ARM instructions (for the simple ones),
> and has some kind of "escape" mechanism to go to a routine for interpretive
> execution of the complex ones. Of course, since IBM controls the compilers,
> it could have a version that "knew" which instructions were executed directly
> and preferentially generated code for them for higher performance on the new
> systems (again, something Transmeta couldn't do).
> Does either of these make sense as a possibility for IBM? I would guess that
> if they did, it would produce a higher-performance product than they would
> get with a software JIT system.

IBM has done a lot of work on JIT technology. Certainly, their
compilers will help people produce new code that is more efficient for
translation, but IBM will not require that users recompile their code.
DK

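For comparison, the software JIT approach David considers most probable usually means interpreting guest code until a block proves hot, then translating it once and reusing the cached translation. The sketch below is a generic version of that idea under stated assumptions: the guest "ISA" is two invented instructions, `HOT_THRESHOLD = 3` is arbitrary, and Python closures stand in for emitted host code. It is not IBM's or Transmeta's design.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Instr = Tuple[str, int, int]  # (opcode, dest register, source register or immediate)
HOT_THRESHOLD = 3             # hypothetical: translate a block after 3 interpreted runs


def interpret(block: List[Instr], regs: List[int]) -> None:
    """Slow path: decode and dispatch every instruction, every time."""
    for op, d, s in block:
        if op == "addi":
            regs[d] += s
        elif op == "add":
            regs[d] += regs[s]
        else:
            raise ValueError(op)


def translate(block: List[Instr]) -> Callable[[List[int]], None]:
    """Decode once and return a specialized closure (a stand-in for emitted host code)."""
    steps = []
    for op, d, s in block:
        if op == "addi":
            def step(r, d=d, s=s):  # default args freeze this instruction's operands
                r[d] += s
        elif op == "add":
            def step(r, d=d, s=s):
                r[d] += r[s]
        else:
            raise ValueError(op)
        steps.append(step)

    def run(regs: List[int]) -> None:
        for step in steps:
            step(regs)
    return run


class Translator:
    def __init__(self, blocks: Dict[int, List[Instr]]):
        self.blocks = blocks                  # guest basic blocks keyed by guest address
        self.counts = defaultdict(int)        # execution counts used for profiling
        self.cache: Dict[int, Callable] = {}  # translation cache
        self.interpreted = 0
        self.translated_runs = 0

    def run_block(self, pc: int, regs: List[int]) -> None:
        code = self.cache.get(pc)
        if code is not None:                  # hot block: reuse the translation
            code(regs)
            self.translated_runs += 1
            return
        interpret(self.blocks[pc], regs)      # cold block: interpret and count
        self.interpreted += 1
        self.counts[pc] += 1
        if self.counts[pc] >= HOT_THRESHOLD:
            self.cache[pc] = translate(self.blocks[pc])
```

Running a single block ten times interprets it three times and uses the cached translation for the remaining seven. A real translator would also have to invalidate translations when guest code modifies itself.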
Anne & Lynn Wheeler
Posted: Wed Dec 21, 2005 5:15 pm    Post subject: Re: IBM's POWER6

"Stephen Fuld" <s.fuld@PleaseRemove.att.net> writes:
> One comment. You talk about using technology similar to Transmeta's
> JIT mechanism to translate Z series code to Power code. But IBM has
> some freedom that Transmeta, restricted to a VLIW machine as the
> target, apparently didn't have. Specifically, it could add logic
> similar to what Intel does, and "decode"/translate the Z series code
> to Power code on the fly, perhaps with the addition of several
> otherwise unused "special" Power instructions to aid performance.
> Didn't one of the early AMD Pentium-compatible chips actually
> translate into 29K instructions? Or, they could do something like
> what ARM is doing with their Jazelle technology to almost directly
> execute Java byte code. It directly executes some instructions by
> translating them "on the fly" into ARM instructions (for the simple
> ones), and has some kind of "escape" mechanism to go to a routine
> for interpretive execution of the complex ones. Of course, since IBM
> controls the compilers, it could have a version that "knew" which
> instructions were executed directly and preferentially generated
> code for them for higher performance on the new systems (again,
> something Transmeta couldn't do).
> Does either of these make sense as a possibility for IBM? I would
> guess that if they did, it would produce a higher-performance
> product than they would get with a software JIT system.

note, there was a group looking at this during the fort knox time-frame
.... 1980. there were a huge number of microprocessors inside the
company, used for controllers, devices, low & mid-range 370s,
s38/as400, etc. the proposal was to move all of these to 801.
low & mid-range 370s were microprocessors of various kinds with 370
implemented as microcode. these machines averaged about 10 microcode
instructions per 370 instruction. we had taken advantage of this for
ecps ... which migrated 6k of high-use kernel code into microcode.
the migrated code got about a 10:1 speedup (originally for the 148 and
then on the 4341). i helped with an analysis that killed the use of 801
for the 4341 follow-on. the alternative was that chips were advancing to
the point where you could implement much of the 370 instruction set
directly in silicon .... which was faster than using 801 to emulate 370.
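The 10:1 figure applies only to the code that was moved into microcode; how much the whole system gains depends on how much time was spent in that code. Amdahl's law makes that arithmetic explicit. The fractions below are hypothetical, and treating ~10 microinstructions per 370 instruction as a ~10x local speedup is a simplifying assumption, not a measured figure.

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: whole-system speedup when only `fraction` of the
    original run time is accelerated by `local_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Assumption: ~10 microinstructions per emulated 370 instruction collapse to ~1
# once the same kernel path is written directly in microcode.
LOCAL_SPEEDUP = 10.0

for f in (0.1, 0.3, 0.5):  # hypothetical fractions of time spent in the migrated code
    print(f"{f:.0%} of time in migrated code -> {overall_speedup(f, LOCAL_SPEEDUP):.2f}x overall")
```

With half of the run time in the migrated code, for example, a 10x local speedup works out to 1 / (0.5 + 0.5/10), or roughly 1.8x overall.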
at the time, there was both a jit project for 370->801 ... and sort of
a more advanced version of ecps ... where portions of 370 code would
be recompiled to 801. i got involved because in the early 70s, i had
written a pli program that analyzed 370 assembler listings in various
ways, including generating a high-level language abstraction of what
the assembler code was doing (also detailed control flow, register
usage, etc).
the project using 801 for the 4341 follow-on was canceled ... and the 4381
was much more of a native silicon implementation.
an 801 project that did survive was ROMP ... a joint research/office
products project to use 801/romp for a displaywriter follow-on. when
that got killed, it was decided to retarget the machine to the unix
workstation market ... hiring the company that had done the pc/ix port
(for the ibm/pc) to do one that came to be called aix. the romp
follow-on was rios/power.
misc. 801, romp, rios, fort knox, etc. collected posts
http://www.garlic.com/~lynn/subtopic.html#801
recent postings in similar thread in mainframe n.g.
http://www.garlic.com/~lynn/2005u.html#40 POWER6 on zSeries?
http://www.garlic.com/~lynn/2005u.html#43 POWER6 on zSeries?
http://www.garlic.com/~lynn/2005u.html#44 POWER6 on zSeries?
--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/

Stephen Fuld
Posted: Wed Dec 21, 2005 11:30 pm    Post subject: Re: IBM's POWER6

"David Kanter" <dkanter@gmail.com> wrote in message
news:1135184733.328035.126630@z14g2000cwz.googlegroups.com...
> That is a possibility, but it seems unlikely. Decoding POWERPC
> instructions is already a tough enough task that they use Intel's
> technique of breaking instructions down into "micro-ops" or whatever they
> want to call them.

I didn't realize that (except for the one case of auto-increment). In that
case, one would presumably decode Z series instructions directly into those
micro-ops, not into Power code to be decoded again.
> So, you would be further complicating the decoders
> by adding support for zArch binaries;

Yes, I agree. However, a decoder is a lot less complex than an entire CPU,
so they would still save a lot of design and support effort, but gain
higher performance. It is essentially a "middle way". But if they don't
need the performance, then it probably isn't worth it.
> zArch is definitely CISC and
> would make things worse. Moreover, that's a feature in hardware which
> will not be needed by the iSeries and pSeries. There are perhaps 10K
> mainframe users worldwide, probably 200-700K iSeries users and A LOT of
> pSeries users. Why would they go and complicate the hardware that
> powers most IBM servers, just for the sake of the 10K mainframe users?

The question is whether the extra performance gained for those very
profitable 10K users would be worth adding a modest amount of silicon. You
are saying that it seems not. I can certainly accept that.
> IBM has done a lot of work on JIT technology. Certainly, their
> compilers will help people produce new code that is more efficient for
> translation, but IBM will not require that users recompile their code.

No, of course not. Just that they would get a performance benefit if they
did so. But old load modules would certainly still work unchanged.
--
- Stephen Fuld
e-mail address disguised to prevent spam

David Kanter
Posted: Wed Dec 21, 2005 11:38 pm    Post subject: Re: IBM's POWER6

> > That is a possibility, but it seems unlikely. Decoding POWERPC
> > instructions is already a tough enough task that they use Intel's
> > technique of breaking instructions down into "micro-ops" or whatever they
> > want to call them.
> I didn't realize that (except for the one case of auto-increment). In that
> case, one would presumably decode Z series instructions directly into those
> micro-ops, not into Power code to be decoded again.

Yes, if you were doing hardware decode it would go straight to uops.
> > So, you would be further complicating the decoders
> > by adding support for zArch binaries;
> Yes, I agree. However, a decoder is a lot less complex than an entire CPU,
> so they would still save a lot of design and support effort, but gain
> higher performance. It is essentially a "middle way". But if they don't
> need the performance, then it probably isn't worth it.
> > zArch is definitely CISC and
> > would make things worse. Moreover, that's a feature in hardware which
> > will not be needed by the iSeries and pSeries. There are perhaps 10K
> > mainframe users worldwide, probably 200-700K iSeries users and A LOT of
> > pSeries users. Why would they go and complicate the hardware that
> > powers most IBM servers, just for the sake of the 10K mainframe users?
> The question is whether the extra performance gained for those very
> profitable 10K users would be worth adding a modest amount of silicon. You
> are saying that it seems not. I can certainly accept that.

Yeah, I don't think the benefits justify the cost. To be honest, I
haven't heard of any MPU that can decode instructions from two ISAs in
anything resembling an efficient fashion. That's an awful lot of
baggage to be carrying around...although perhaps adding certain
instructions to assist BT would make sense.
> > IBM has done a lot of work on JIT technology. Certainly, their
> > compilers will help people produce new code that is more efficient for
> > translation, but IBM will not require that users recompile their code.
> No, of course not. Just that they would get a performance benefit if they
> did so. But old load modules would certainly still work unchanged.

I'd be interested to see what the performance increase is versus the
performance cost (probably in clockspeed) for UNIX workloads.
DK

KR Williams
Posted: Wed Dec 21, 2005 11:45 pm    Post subject: Re: IBM's POWER6

In article <Zigqf.178005$qk4.82127@bgtnsc05-news.ops.worldnet.att.net>,
s.fuld@PleaseRemove.att.net says...
> "David Kanter" <dkanter@gmail.com> wrote in message
> news:1135184733.328035.126630@z14g2000cwz.googlegroups.com...
> > That is a possibility, but it seems unlikely. Decoding POWERPC
> > instructions is already a tough enough task that they use Intel's
> > technique of breaking instructions down into "micro-ops" or whatever they
> > want to call them.
> I didn't realize that (except for the one case of auto-increment). In that
> case, one would presumably decode Z series instructions directly into those
> micro-ops, not into Power code to be decoded again.

Only the more complicated ops are "cracked" or "microcoded" (things
like load/store string). Most ops are translated 1:1.
--
Keith

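Keith's description (most instructions map 1:1, a few are cracked, and string ops are handled like microcode) can be sketched as a toy decoder. The internal-op tuples are invented for illustration; only the PowerPC mnemonics and their architectural effects come from the architecture (`lwzu` loads and then updates its base register with the effective address; `lswi` loads NB bytes into consecutive registers, wrapping from r31 to r0, with NB = 0 meaning 32). The exact two-op split shown for `lwzu` is an assumption, and the `lswi` model ignores the RA = 0 case and treats a final partial word as a full load.

```python
from typing import List, Tuple

Uop = Tuple  # invented internal op: (name, operands...)


def crack(instr: Tuple) -> List[Uop]:
    """Toy decoder: most instructions map 1:1 to internal ops, a few are
    cracked into several ops, and string ops expand into a sequence.
    Operand order follows PowerPC assembler syntax."""
    op = instr[0]
    if op == "add":                     # add rt, ra, rb  -> 1 internal op
        _, rt, ra, rb = instr
        return [("add", rt, ra, rb)]
    if op == "lwz":                     # lwz rt, d(ra)   -> 1 internal op
        _, rt, d, ra = instr
        return [("load", rt, ra, d)]
    if op == "lwzu":                    # load with update -> cracked into 2 ops
        _, rt, d, ra = instr
        return [("load", rt, ra, d),    # rt <- MEM(ra + d), using the old ra
                ("addi", ra, ra, d)]    # ra <- ra + d (the auto-increment)
    if op == "lswi":                    # load string immediate -> op sequence
        _, rt, ra, nb = instr
        nb = nb or 32                   # NB = 0 means 32 bytes
        uops = []
        for i in range((nb + 3) // 4):  # one internal load per target register
            uops.append(("load", (rt + i) % 32, ra, 4 * i))  # r31 wraps to r0
        return uops
    raise ValueError(f"not modeled: {op}")
```

Under this model `lwzu 3,8(4)` becomes two internal ops, while `lswi 5,4,10` becomes a three-op sequence whose length depends on the byte count, which is why such instructions are the natural candidates for microcode.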
Iain McClatchie
Posted: Thu Dec 22, 2005 1:15 am    Post subject: Re: IBM's POWER6

DK> zArch is definitely CISC and would make things worse.
zArch has cruft, which requires microcode escapes or a JIT escape.
But the core zArch instruction set is pretty reasonable, and actually
looks (to me) a lot like the x86-64:
  - 16 64-bit GPRs
  - RX instructions: Rx <- Rx op MEM(Ry + Rz + constant)
  - variable-length instructions
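Iain's "Rx <- Rx op MEM(Ry + Rz + constant)" is the classic RX format: an 8-bit opcode, a 4-bit R1, a 4-bit index register X2, a 4-bit base register B2 and a 12-bit displacement D2, with the second operand at X2 + B2 + D2. A minimal Python sketch of the field split and address calculation follows. It assumes 64-bit addressing mode and covers only the 4-byte RX form, not the 6-byte RXY forms (with 20-bit displacements) that many 64-bit z/Architecture instructions use.

```python
from typing import List, Tuple

MASK64 = (1 << 64) - 1


def decode_rx(insn: bytes) -> Tuple[int, int, int, int, int]:
    """Split a 4-byte RX-format instruction into (opcode, R1, X2, B2, D2).
    Layout: 8-bit opcode | 4-bit R1 | 4-bit X2 | 4-bit B2 | 12-bit D2."""
    if len(insn) != 4:
        raise ValueError("RX instructions are 4 bytes long")
    word = int.from_bytes(insn, "big")
    return (word >> 24, (word >> 20) & 0xF, (word >> 16) & 0xF,
            (word >> 12) & 0xF, word & 0xFFF)


def effective_address(regs: List[int], x2: int, b2: int, d2: int) -> int:
    """EA = X2 + B2 + D2, where register number 0 in the X2 or B2 field means
    'no register' (contributes 0), not the contents of general register 0."""
    index = regs[x2] if x2 else 0
    base = regs[b2] if b2 else 0
    return (index + base + d2) & MASK64  # 64-bit addressing mode assumed
```

For example, the bytes `5A 10 20 08` decode as opcode X'5A' (A, add) with R1 = 1, X2 = 0, B2 = 2 and D2 = 8, i.e. `A 1,8(0,2)`: the storage operand sits at the contents of register 2 plus 8, and register 0 is ignored because an X2 field of 0 means "no index".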
The costly thing about accelerating zArch stuff is handling all those
memory models in hardware. Actually, the costly thing is *verifying*
all those memory models. I'd guess that the decoder, while taking up
more than a few mm^2, doesn't add tremendously to the cost of the
machine. Note that IBM isn't going to be selling these die to Apple.
A little extra cost per die for the pArch and iArch folks is worth
saving tens of millions of dollars for the zArch development.
Fast, variable-sized-instruction, variable-microops-emitted, escapes-
to-microcode decoders are a hard but solved problem these days.
AMD and Intel have had them for a decade. I'd guess that Power6's
decoder probably looks at 128 bits and can emit 4-8 microops per
cycle.
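On Iain's guess that the decoder examines 128 bits (16 bytes) per cycle: for zArch, the first step is locating instruction boundaries in that window, which is comparatively cheap because the length is encoded in the two high-order bits of the first opcode byte. The serial Python sketch below shows the rule; a hardware pre-decoder would more likely evaluate every halfword position in parallel and then select the real starts. The 16-byte window is Iain's guess, not a documented POWER6 parameter.

```python
from typing import List


def insn_length(first_byte: int) -> int:
    """The two high-order bits of the first opcode byte give the length:
    00 -> 2 bytes, 01 or 10 -> 4 bytes, 11 -> 6 bytes."""
    return (2, 4, 4, 6)[first_byte >> 6]


def find_starts(window: bytes, first: int = 0) -> List[int]:
    """Offsets of the instructions that begin inside one fetch window.

    `first` is where the first instruction starts (nonzero if the previous
    window ended mid-instruction). An instruction may run past the end of
    the window; its tail would arrive with the next fetch."""
    starts = []
    pos = first
    while pos < len(window):
        starts.append(pos)
        pos += insn_length(window[pos])
    return starts


# 16-byte sample: LR 1,2 | A 1,8(0,2) | AG 1,0(0,2) | BCR 15,14 | LR 3,4
window = bytes.fromhex("1812" "5A102008" "E31020000008" "07FE" "1834")
print(find_starts(window))
```

With that sample window the instructions start at offsets 0, 2, 6, 12 and 14. Cracking each one into micro-ops, and escaping to microcode for the ugly cases, would come after this boundary step.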
DK> I haven't heard of any MPU that can decode instructions from
DK> two ISAs in anything resembling an efficient fashion.
ARM/Thumb?
x86/x86-64?

Stephen Fuld
Posted: Thu Dec 22, 2005 1:15 am    Post subject: Re: IBM's POWER6

"KR Williams" <krw@att.bizzzz> wrote in message
news:MPG.1e135ea77b833bd898976c@News.Individual.NET...
> In article <Zigqf.178005$qk4.82127@bgtnsc05-news.ops.worldnet.att.net>,
> s.fuld@PleaseRemove.att.net says...
> > I didn't realize that (except for the one case of auto-increment). In
> > that case, one would presumably decode Z series instructions directly
> > into those micro-ops, not into Power code to be decoded again.
> Only the more complicated ops are "cracked" or "microcoded" (things
> like load/store string). Most ops are translated 1:1.

Yes, that is what I thought (I forgot about the string ops). Thanks, Keith.
--
- Stephen Fuld
e-mail address disguised to prevent spam

David Kanter
Posted: Thu Dec 22, 2005 3:19 pm    Post subject: Re: IBM's POWER6

> DK> zArch is definitely CISC and would make things worse.
> zArch has cruft, which requires microcode escapes or a JIT escape.
> But the core zArch instruction set is pretty reasonable, and actually
> looks (to me) a lot like the x86-64:
>   - 16 64-bit GPRs
>   - RX instructions: Rx <- Rx op MEM(Ry + Rz + constant)
>   - variable-length instructions

Yes, but it has a lot of awkward stuff in there as well, hexFP for one.
Admittedly, it is cleaner than x86.
> The costly thing about accelerating zArch stuff is handling all those
> memory models in hardware. Actually, the costly thing is *verifying*
> all those memory models. I'd guess that the decoder, while taking up
> more than a few mm^2, doesn't add tremendously to the cost of the
> machine.

The question is whether it would end up in the critical path, plus the
verification issues you mentioned.
> Note that IBM isn't going to be selling these die to Apple.
> A little extra cost per die for the pArch and iArch folks is worth
> saving tens of millions of dollars for the zArch development.

I'm not convinced that doing decode rather than a JIT will save ~$10M.
They already have research projects doing JIT translation, although
they are not productized or tested.
> Fast, variable-sized-instruction, variable-microops-emitted,
> escapes-to-microcode decoders are a hard but solved problem these days.
> AMD and Intel have had them for a decade. I'd guess that Power6's
> decoder probably looks at 128 bits and can emit 4-8 microops per
> cycle.

You really think they did it in hardware?
> DK> I haven't heard of any MPU that can decode instructions from
> DK> two ISAs in anything resembling an efficient fashion.
> ARM/Thumb?
> x86/x86-64?

Sorry, I meant to say two ISAs from different families (say PA-RISC and
Alpha, ARM and x86, etc.). I'm not familiar enough with ARM/Thumb to say
how different they are, but x86 vs. x86-64 doesn't seem like a huge
stretch.
DK