I just finished up a rather interesting article about IBM's upcoming
POWER6 MPU, and its role in the somewhat infamous eCLipz project. I
discuss broad details of the POWER6 microarchitecture, along with
some performance estimates for SPECint/fp.
The article can be found at:
http://www.realworldtech.com/page.cfm?A ... 1905001634
Enjoy,
David Kanter
link doesn't work, at the moment.
It should work now; if it doesn't, email me. Also, are you at work or
at home?
DK
I was at home. And if I went to the home page and clicked the link I got
"David Kanter" <dkanter@gmail.com> wrote in message
news:1135038823.574953.268450@g44g2000cwa.googlegroups.com...
Thanks, David. I did enjoy it.
One comment. You talk about using technology similar to Transmeta's JIT
mechanism to translate Z series code to Power code. But IBM has some
freedom that Transmeta, with its restriction of a VLIW machine as the
target, didn't seem to have. Specifically, it could add logic similar to
what Intel does, and on the fly "decode"/translate the Z series code to
Power code, perhaps with the addition of several otherwise unused "special"
Power instructions to aid performance. Didn't one of the early AMD
Pentium-compatible chips actually translate into 29K instructions?
Or, they could do something like what ARM is doing with their Jazelle
technology to almost directly execute Java bytecode. It directly executes
some instructions by translating them "on the fly" into ARM instructions
(for the simple ones), and has some kind of "escape" mechanism to go to a
routine for interpretive execution of the complex ones. Of course, since
IBM controls the compilers, it could have a version that "knew" which
instructions were executed directly and preferentially generated code for
them for higher performance on the new systems (again, something Transmeta
couldn't do).
Does either of these make sense as an option for IBM? I would guess that if
they did, it would produce a higher performance product than they would get
with a software JIT system.
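
To make the Jazelle-style idea above concrete, here is a minimal sketch of a translator with a direct path for simple guest instructions and an "escape" to an interpretive routine for everything else. The opcode names, the host-op names, and the `call_interpreter` marker are all hypothetical and chosen only for illustration; nothing here reflects how IBM or ARM actually implement this.

```python
# Hypothetical mapping from simple guest ops to single host ops.
DIRECT = {"ADD": "host_add", "SUB": "host_sub", "LOAD": "host_load"}


def translate_block(guest_code):
    """Translate (op, dst, src) tuples; return (host_ops, escape_count)."""
    host_ops = []
    escapes = 0
    for op, dst, src in guest_code:
        if op in DIRECT:
            # Fast path: the simple instruction maps 1:1 onto a host op.
            host_ops.append((DIRECT[op], dst, src))
        else:
            # Escape: hand the complex instruction to a software routine.
            host_ops.append(("call_interpreter", op, dst, src))
            escapes += 1
    return host_ops, escapes


if __name__ == "__main__":
    block = [("ADD", 1, 2), ("MVCL", 2, 4), ("LOAD", 3, 5)]
    print(translate_block(block))
```

A compiler that "knew" the contents of the direct table could prefer those operations, which would lower the escape count for newly compiled code while old binaries still run through the escape path.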
Stephen Fuld wrote:
SF> Thanks, David. I did enjoy it.
Glad to hear that.
SF> One comment. You talk about using technology similar to Transmeta's
SF> JIT mechanism to translate Z series code to Power code.
Yes, that seems to be the most probable.
SF> But IBM has some freedom that Transmeta, with its restriction of a
SF> VLIW machine as the target, didn't seem to have. Specifically, it
SF> could add logic similar to what Intel does, and on the fly
SF> "decode"/translate the Z series code to Power code, perhaps with the
SF> addition of several otherwise unused "special" Power instructions to
SF> aid performance.
That is a possibility, but it seems unlikely. Decoding PowerPC
instructions is already a relatively tough task, enough that they use
Intel's technique of breaking instructions down into "micro-ops" or
whatever they want to call them. So, you would be further complicating
the decoders by adding support for zArch binaries; zArch is definitely
CISC and would make things worse. Moreover, that's a feature in hardware
which will not be needed by the iSeries and pSeries. There are perhaps 10K
mainframe users worldwide, probably 200-700K iSeries users and A LOT of
pSeries users. Why would they go and complicate the hardware that
powers most IBM servers, just for the sake of the 10K mainframe users?
Note that IBM can control BT (binary translation); for a hardware
alternative, they would have to use some sort of eFuse to disable the
extra decode.
SF> Didn't one of the early AMD Pentium-compatible chips actually
SF> translate into 29K instructions?
I don't know, but that sounds like a distinctly not-so-smart idea. It
probably wasn't the case for the K6, which was AMD's first moderately
competitive chip.
SF> Or, they could do something like what ARM is doing with their Jazelle
SF> technology to almost directly execute Java bytecode. It directly
SF> executes some instructions by translating them "on the fly" into ARM
SF> instructions (for the simple ones), and has some kind of "escape"
SF> mechanism to go to a routine for interpretive execution of the
SF> complex ones. Of course, since IBM controls the compilers, it could
SF> have a version that "knew" which instructions were executed directly
SF> and preferentially generated code for them for higher performance on
SF> the new systems (again, something Transmeta couldn't do).
SF> Does either of these make sense as an option for IBM? I would guess
SF> that if they did, it would produce a higher performance product than
SF> they would get with a software JIT system.
IBM has done a lot of work on JIT technology. Certainly, their
compilers will help people produce new code that is more efficient for
translation, but IBM will not require that users recompile their code.
DK> That is a possibility, but it seems unlikely. Decoding PowerPC
DK> instructions is already a relatively tough task, enough that they use
DK> Intel's technique of breaking instructions down into "micro-ops" or
DK> whatever they want to call them.
I didn't realize that (except for the one case of auto-increment). In that
case, one would presumably decode Z series instructions directly into those
micro-ops, not into Power code to be decoded again.
DK> So, you would be further complicating the decoders by adding support
DK> for zArch binaries;
Yes, I agree. However, a decoder is a lot less complex than an entire CPU,
so they still would save a lot of design and support effort, but gain
higher performance. It is essentially a "middle way". But if they don't
need the performance, then it probably isn't worth it.
DK> zArch is definitely CISC and would make things worse. Moreover,
DK> that's a feature in hardware which will not be needed by the iSeries
DK> and pSeries. There are perhaps 10K mainframe users worldwide,
DK> probably 200-700K iSeries users and A LOT of pSeries users. Why would
DK> they go and complicate the hardware that powers most IBM servers,
DK> just for the sake of the 10K mainframe users?
The question is whether it would be worth the extra performance gained for
those very profitable 10K users to add a modest amount of silicon. You are
saying that it seems not. I can certainly accept that.
DK> IBM has done a lot of work on JIT technology. Certainly, their
DK> compilers will help people produce new code that is more efficient for
DK> translation, but IBM will not require that users recompile their code.
No, of course not. Just that they would get a performance benefit if they
did so. But old load modules would certainly still work unchanged.
In article <Zigqf.178005$qk4.82127@bgtnsc05-news.ops.worldnet.att.net>,
s.fuld@PleaseRemove.att.net says...
"David Kanter" <dkanter@gmail.com> wrote in message
news:1135184733.328035.126630@z14g2000cwz.googlegroups.com...
DK> That is a possibility, but it seems unlikely. Decoding PowerPC
DK> instructions is already a relatively tough task, enough that they use
DK> Intel's technique of breaking instructions down into "micro-ops" or
DK> whatever they want to call them.
SF> I didn't realize that (except for the one case of auto-increment). In
SF> that case, one would presumably decode Z series instructions directly
SF> into those micro-ops, not into Power code to be decoded again.
Only the more complicated ops are "cracked" or "microcoded" (things
like load/store string). Most ops are translated 1:1.
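
As a rough illustration of the 1:1-versus-cracked distinction, the sketch below models a decoder table in which most instructions pass through unchanged and a few expand into several internal ops. The table contents and the internal-op names are assumptions made up for this example (the load-with-update entry simply mirrors the auto-increment case mentioned above); they are not taken from any IBM documentation.

```python
# Hypothetical crack table: mnemonic -> list of internal ops.
# Anything not listed is assumed to translate 1:1.
CRACK_TABLE = {
    "lwzu": ["load_word", "add_offset_to_base"],  # load with update
}


def decode(mnemonic):
    """Return the internal ops one instruction decodes into."""
    return CRACK_TABLE.get(mnemonic, [mnemonic])


def count_internal_ops(program):
    """Total internal ops emitted for a list of mnemonics."""
    return sum(len(decode(m)) for m in program)


if __name__ == "__main__":
    print(decode("add"))
    print(decode("lwzu"))
    print(count_internal_ops(["add", "lwzu", "or"]))
```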
DK> zArch is definitely CISC and would make things worse.
zArch has cruft, which requires microcode escapes or a JIT escape.
But the core zArch instruction set is pretty reasonable, and actually
looks (to me) a lot like the x86-64:
16 64-bit GPRs
RX instructions: Rx <- Rx op MEM(Ry + Rz + constant)
variable-length instructions
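
The RX form listed above can be sketched in a few lines. This assumes the usual RX field layout (8-bit opcode, 4-bit R1, 4-bit X2, 4-bit B2, 12-bit displacement) and the convention that register number 0 in the index or base field means "no register". The `execute_rx` helper, the dictionary used as memory, and the 64-bit wrap are simplifications for illustration; operand widths and condition codes are ignored.

```python
import operator

MASK64 = (1 << 64) - 1


def decode_rx(word):
    """Split a 32-bit RX-format instruction into its fields."""
    opcode = (word >> 24) & 0xFF
    r1 = (word >> 20) & 0xF
    x2 = (word >> 16) & 0xF
    b2 = (word >> 12) & 0xF
    d2 = word & 0xFFF
    return opcode, r1, x2, b2, d2


def effective_address(gpr, x2, b2, d2):
    """Base + index + displacement; field value 0 contributes zero."""
    index = gpr[x2] if x2 else 0
    base = gpr[b2] if b2 else 0
    return (base + index + d2) & MASK64


def execute_rx(gpr, mem, word, op):
    """Rx <- Rx op MEM(Ry + Rz + constant), with mem as a plain dict."""
    _, r1, x2, b2, d2 = decode_rx(word)
    addr = effective_address(gpr, x2, b2, d2)
    gpr[r1] = op(gpr[r1], mem[addr]) & MASK64
    return addr


if __name__ == "__main__":
    regs = [0] * 16
    regs[1], regs[2], regs[3] = 7, 0x100, 0x1000
    memory = {0x1104: 5}
    execute_rx(regs, memory, 0x5A123004, operator.add)
    print(regs[1])
```

The single memory operand folded into an arithmetic instruction is what makes the comparison with x86-64 apt: one guest instruction naturally decodes into an address computation, a load, and an ALU op.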
The costly thing about accelerating zArch stuff is handling all those
memory models in hardware. Actually, the costly thing is *verifying*
all those memory models. I'd guess that the decoder, while taking up
more than a few mm^2, doesn't add tremendously to the cost of the
machine.
Note that IBM isn't going to be selling these die to Apple.
A little extra cost per die for the pArch and iArch folks is worth
saving tens of millions of dollars for the zArch development.
Fast, variable-sized-instruction, variable-microops-emitted, escapes-
to-microcode decoders are a hard but solved problem these days.
AMD and Intel have had them for a decade. I'd guess that Power6's
decoder probably looks at 128 bits and can emit 4-8 microops per
cycle.
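
For a sense of what the front end of such a decoder has to do, here is a small sketch that finds instruction boundaries in a 128-bit (16-byte) window. It relies on the z/Architecture rule that the top two bits of the first opcode byte give the instruction length (00 means 2 bytes, 01 or 10 means 4 bytes, 11 means 6 bytes). The window size follows the guess above, and the function names and sample bytes are illustrative only; a real decoder would do this in parallel hardware rather than a sequential loop.

```python
def insn_length(first_byte):
    """Instruction length in bytes from the top two bits of the opcode."""
    top = first_byte >> 6
    if top == 0:
        return 2
    if top == 3:
        return 6
    return 4


def split_window(window):
    """Return (offset, length) for each whole instruction in the window."""
    found = []
    pos = 0
    while pos < len(window):
        n = insn_length(window[pos])
        if pos + n > len(window):
            break  # instruction straddles the window; leave it for next cycle
        found.append((pos, n))
        pos += n
    return found


if __name__ == "__main__":
    window = bytes([0x18, 0x12,
                    0x5A, 0x12, 0x30, 0x04,
                    0xE3, 0x10, 0x20, 0x00, 0x00, 0x04,
                    0x07, 0xFE,
                    0x58, 0x00])
    print(split_window(window))
```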
DK> I haven't heard of any MPU that can decode instructions from
DK> two ISAs in anything resembling an efficient fashion.
ARM/Thumb?
x86/x86-64?