colin
Guest
Posted: Fri Dec 03, 2004 5:23 pm
Post subject: making an fpga hot

Guys
We have just laid out a board and want to put the thermal analysis to
bed (it's conduction cooled so not much room for error). If the Xilinx
estimator says we are going to use 25 watts, does anyone know the best
way to code an FPGA so that it will get nice and hot?
The estimator is just that, an estimate, but is there a more accurate way of
writing some code so that a particular clock input will generate a
particular amount of heat? A serial chain of 2000 D-type flip-flops, where
every flip-flop toggles every clock and which blinks an LED, is obviously
one way but doesn't seem very elegant.
We have wired up the internal temp sense diode to take a look at the
result (and yes, we know how noisy and inaccurate they are).
Any experiences?
Colin

Marc Randolph
Guest
Posted: Fri Dec 03, 2004 9:07 pm
Post subject: Re: making an fpga hot

colin wrote:
| Quote: | Guys
We have just laid out a board and want to put the thermal analysis to
bed (it's conduction cooled so not much room for error). If the Xilinx
estimator says we are going to use 25 watts does anyone know the best
way to code an FPGA so that it will get nice and hot.
The estimator is just that, but is there a more accurate way of
writing some code so that a particular clock input will generate a
particular amount of heat. A 2000 D type serial chain where every flip
flop is toggling every clock which blinks an LED is obviously one way
but doesn't seem very elegant.
|
If your goal is just to generate heat, use all the LUTs as SRLs, make
use of all the BRAMs, and drive all the I/Os with a nice high current
drive strength.
Marc

Austin Lesea
Guest
Posted: Fri Dec 03, 2004 9:12 pm
Post subject: Re: making an fpga hot

Colin,
Just make a huge shift register, or all DFFs toggling, and then just
vary the clock input (or the shifted data input pattern, from ...000001
to 101010..., etc.).
That is what we do.
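
To see why the data pattern matters, here is a minimal Python sketch (not Austin's actual test design) that shifts a repeating pattern through a register chain and reports what fraction of the flip-flops change state per clock. The function name, the 64-stage chain length, and the example patterns are assumptions chosen for illustration.

```python
def average_toggle_fraction(pattern, stages=64, cycles=1024):
    """Fraction of flip-flops that change state per clock, in steady state."""
    regs = [0] * stages
    toggles = 0
    for t in range(stages + cycles):
        # Shift: the new bit enters stage 0, everything else moves one stage along.
        new_regs = [pattern[t % len(pattern)]] + regs[:-1]
        if t >= stages:  # skip the initial fill so only steady state is counted
            toggles += sum(a != b for a, b in zip(regs, new_regs))
        regs = new_regs
    return toggles / (stages * cycles)


if __name__ == "__main__":
    print(average_toggle_fraction([1, 0]))         # alternating 1010... pattern
    print(average_toggle_fraction([0] * 7 + [1]))  # a single 1 in every 8 bits
    print(average_toggle_fraction([0]))            # constant data
```

In this toy model the alternating pattern toggles every flip-flop on every clock, while sparser patterns toggle proportionally fewer, which is the knob being described. It says nothing about how many watts each toggle costs on a real part.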
Austin

Symon
Guest
Posted: Fri Dec 03, 2004 11:36 pm
Post subject: Re: making an fpga hot

"colin" <colin_toogood@yahoo.com> wrote in message
news:885a4a4a.0412030423.4f6b7e7c@posting.google.com...
| Quote: | We have wired up the internal temp sense diode to take a look at the
result (and yes we know how noisy and inaccurate they are).
Any experiences?
|
Well, I've found the diode isn't particularly noisy nor especially
inaccurate! It gives repeatable and consistent (between parts) results,
certainly good enough for your application. You have routed its connections
together and away from big switching currents, I presume?!
I use copper sheet to move heat to where I can get rid of it. Cu is roughly 400
W/(m·K), about twice as good as aluminium. Don't use copper alloys. Very
useful if you've got boards stacked closely together, you can get the heat
out from between the boards. I've never tried heat pipes, but they're meant
to be very good indeed.
Finally, you'll find that the FPGAs work at elevated temperature for a long
time. I recall a thread on CAF all about FPGAs down boreholes where they
were running for weeks at 175 °C. You might be enlightened by a quick trawl of
CAF in Google Groups. So, what's the lifetime of your product? How long will
you be working for that company? All part of the engineering compromise!!
Good luck, Syms.

Mikeandmax
Guest
Posted: Sat Dec 04, 2004 2:44 am
Post subject: Re: making an fpga hot

| Quote: | So, what's the lifetime of your product? How long will
you be working for that company? All part of the engineering compromise!!
|
ROFL !!
thanx for the chuckle -
Mike T

Mark Smith
Guest
Posted: Tue Dec 07, 2004 1:32 am
Post subject: Re: making an fpga hot

Ahhh, that explains the issues with the ANT then... ;-)
Mark

Paul Leventis (at home)
Guest
Posted: Wed Dec 08, 2004 9:44 am
Post subject: Re: making an fpga hot

Hi Colin,
Below I try to give some insight into how to make a hot design, though I do
question the motivation for doing so. A simple FF chain comes nowhere close
to achieving a high (or even average) core power.
All of the phenomena I describe below are modeled in the recently released
Quartus II 4.2 software via its PowerPlay Power Analyzer. Target Stratix II
or MAX II and you'll get very accurate estimates of how all these factors
affect your power consumption. You can try out the Power Analyzer in the
Quartus II 4.2 Web Edition software available from www.altera.com.
If you're trying to figure out if a given design will work on your board
after it's been made, the best bet is to try the chip out in the lab using
stimulus (vectors) that reflect the worst-case operating conditions for the
chip. I can make you a design that will burn many, many watts of power, but
that doesn't mean your design will. A dynamic power measurement from the
lab is the most accurate estimate possible -- just remember to use the
manufacturer's spec for worst-case static power (at worst-case temperature)
since the unit you have on your board is likely NOT worst-case.
| Quote: | The estimator is just that, but is there a more accurate way of
writing some code so that a particular clock input will generate a
particular amount of heat. A 2000 D type serial chain where every flip
flop is toggling every clock which blinks an LED is obviously one way
but doesn't seem very elegant.
|
There are many factors that affect overall dynamic power consumption of an
FPGA design. I will highlight a few critical ones below, and make
suggestions along the way to build a design to turn your FPGA into the
hot-plate you desire. It is *not* as simple as making one big
shift-register...
(0) Transition Density. You want as many nodes as possible to toggle every cycle.
Toggle FFs and shift registers achieve this, as do XOR functions (if you want to
utilize the LUT too).
(1) Routing Utilization. The routing buffers, multiplexers, and wiring in
an FPGA can add up to a large amount of switching capacitance and
short-circuit (crowbar) current. To maximize dynamic power, you must use a
lot of routing. A simple FF chain will actually use very little routing,
unless you purposely make the placement very bad by using region constraints
such as LogicLock regions. You could, for example, constrain the even bits
of your chain to one-half the chip and the odd bits to the other half, and
this will greatly increase routing utilization. Or use something other than
FFs to increase the number and fanout of the routed wires. Of course,
you'll need to experiment a little to find the right balance between high
utilization and still being able to route!
(2) LUT Configuration. A LUT configured as an AND gate does not burn nearly
as much power as one configured as an XOR. This difference is due to the
number of internal nodes in the circuit that toggle states upon the toggle
of an input signal. On top of this, the output of an XOR will toggle upon
the toggle of any input -- so chaining together XORs will result in a
cascade of glitching (if there are no pipeline registers), which can further
increase your power. To get the most accurate estimate of LUT power, you
must consider the functionality of the LUT -- Quartus II can do this for
you.
(3) Clock Network. The vast majority of power on a high-fanout clock will
be burned *inside* the LABs (on the LAB-wide clock), not on the global clock
network. If you distribute a clock such that it fans out to one FF (out of
16) in every LAB of the device, this will maximize this internal LAB clock
network power. You can achieve this through location constraints applied to
these FFs. And the more clocks you use, the more you will burn. You can
use the PLLs to step up the clock frequency to help increase the toggle
rate.
(4) RAMs. A RAM can burn significant power if you perform reads & writes
every cycle (keep the clock enable asserted). Just hook up all the RAMs in
the device to be in dual-port mode writing & reading random data every
cycle, and you've got some more power.
(5) I/Os. You can burn an arbitrary amount of power with your I/Os,
depending on external termination resistance, contention, I/O standard,
drive strength, load capacitance, etc. Let's just pretend you don't have
I/Os to make life easier.
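
As a rough way to see how factors (0) to (4) combine, the usual first-order CMOS model sums 1/2 x (transitions per clock) x capacitance x Vdd^2 x clock frequency over all switching nodes. The Python sketch below applies that model. The function name and the example numbers are illustrative assumptions rather than figures from any datasheet or tool, and the model ignores the short-circuit (crowbar) current and static power mentioned above.

```python
def dynamic_power_watts(nodes, vdd, f_clk):
    """First-order switching power.

    nodes: iterable of (transitions_per_clock, capacitance_in_farads) pairs.
    vdd:   supply voltage in volts.
    f_clk: clock frequency in hertz.
    """
    return sum(0.5 * rate * cap * vdd ** 2 * f_clk for rate, cap in nodes)


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 nodes of 1 pF each, all toggling every
    # clock, on a 1.2 V core clocked at 200 MHz.
    busy = [(1.0, 1e-12)] * 10_000
    print(dynamic_power_watts(busy, vdd=1.2, f_clk=200e6))
```

Raising the toggle rate, the switched capacitance (more routing, more clocked LABs, more active RAMs), or the clock frequency all scale the result linearly in this model, which is why each item in the list is worth pushing when the aim is heat.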
Hopefully that gives you some ideas of where to go to burn some power. If
you're using a Xilinx chip, I'm sure similar techniques will apply, though
their tools may not be able to fully predict the results you will see.
Regards,
Paul Leventis
Altera Corp.

Symon
Guest
Posted: Wed Dec 08, 2004 1:33 pm
Post subject: Re: making an fpga hot

Hi Paul,
Comments/Questions below!
"Paul Leventis (at home)" <paulleventis-news@yahoo.ca> wrote in message
news:686dnTKPrvwyGyvcRVn-pQ@rogers.com...
| Quote: | (2) LUT Configuration. A LUT configured as an AND gate does not burn
nearly as much power as one configured as an XOR. This difference is due to the
number of internal nodes in the circuit that toggle states upon the toggle
of an input signal. On top of this, (blah, blah, XORs transition more)
|
Could you explain that a little more? I thought that the LUT was just a 16x1
RAM. Is the extra power consumed only when two inputs change? e.g. 00 => 11
into the XOR would still have 0 as its output but it might transition
through the 1 output state? I understand that XOR gates are more likely to
transition, but you seem to be saying there's some additional internal
reason why they consume power.
Cheers, Syms.

Ray Andraka
Guest
Posted: Thu Dec 09, 2004 4:02 am
Post subject: Re: making an fpga hot

The logic transitions in the routing and subsequent differential delays through
the LUT can make for many more transitions than a simple buffer implemented in a
LUT. Unless all the LUT inputs are precisely timed so that the edges change
together, you wind up with a walk through several of the LUT addresses in the
process of settling to the next clock. A paper presented at FPGA a few years
ago went as far as to say that as much as 30-40% of the power in a typical FPGA
design is due to propagating glitches in the logic between flip-flops, and they
showed that by heavily pipelining the design, the power consumption improved
dramatically.
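
The "walk through several of the LUT addresses" can be illustrated with a small Python sketch. It is a toy logic-level model, not a timing simulation: the function names and the arrival times are assumptions, and it simply applies each input change in arrival order and counts how often the LUT output changes.

```python
from itertools import groupby


def lut_address(bits):
    """Pack input bits into a LUT address; bit k of the address is input k."""
    return sum(bit << k for k, bit in enumerate(bits))


def output_transitions(mask, start, end, arrival_times):
    """Count output transitions while the inputs move from start to end.

    arrival_times[k] is when input k reaches its new value; inputs with the
    same arrival time are applied together.
    """
    inputs = list(start)
    out = mask[lut_address(inputs)]
    transitions = 0
    order = sorted(range(len(inputs)), key=lambda k: arrival_times[k])
    for _, group in groupby(order, key=lambda k: arrival_times[k]):
        for k in group:
            inputs[k] = end[k]
        new_out = mask[lut_address(inputs)]
        transitions += int(new_out != out)
        out = new_out
    return transitions


XOR4 = [bin(addr).count("1") % 2 for addr in range(16)]  # 4-input parity
BUF4 = [addr & 1 for addr in range(16)]                   # output follows input 0

if __name__ == "__main__":
    start, end = (0, 0, 0, 0), (1, 1, 1, 1)
    print(output_transitions(XOR4, start, end, (0, 0, 0, 0)))  # edges aligned
    print(output_transitions(XOR4, start, end, (1, 2, 3, 4)))  # edges skewed
    print(output_transitions(BUF4, start, end, (1, 2, 3, 4)))  # simple buffer
```

With aligned edges the parity output never moves, with skewed edges it toggles once per arriving input, and the buffer toggles once regardless. Real glitches also depend on delays and pulse widths that this sketch does not model.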
--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com
"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759

Paul Leventis (at home)
Guest
Posted: Thu Dec 09, 2004 5:58 am
Post subject: Re: making an fpga hot

Hi Symon,
| Quote: | (2) LUT Configuration. A LUT configured as an AND gate does not burn
nearly as much power as one configured as an XOR. This difference is due to
the number of internal nodes in the circuit that toggle states upon the
toggle of an input signal. On top of this, (blah, blah, XORs transition more)
Could you explain that a little more? I thought that the LUT was just a
16x1 RAM. Is the extra power consumed only when two inputs change? e.g. 00 =>
11 into the XOR would still have 0 as its output but it might transition
through the 1 output state? I understand that XOR gates are more likely to
transition, but you seem to be saying there's some additional internal
reason why they consume power.
|
While logically a LUT is just a 16x1 ROM, physically it is not built the same
way as a RAM.
A traditional RAM is built with a 2D-array of bits, where a row is selected
by decoding the address, and a pair of differential bit lines per cell is
precharged and then the cell pulls one side down which is amplified by a
sense-amplifier to speed things up (gross simplification). In that
structure, regardless of what you are reading, you burn the same power since
the reads are differential, and you burn power on each read, regardless of
the previously read value, since all that precharge, pull-down and sensing
happens every read.
A LUT however is traditionally built as a multiplexor tree. You have 16
SRAM cells feeding a tree of 2:1 muxes. The 4 inputs of the LUT each
control one level of the tree. There is a diagram below for a 2-LUT.
Let's take a 2-LUT implementing an XOR as an example (see diagram). We have
x = A?1:0 and y = A?0:1, and f = B?y:x. Let's say A switches from 0-->1
(and B = 0). Node x toggles from a 0 to 1. Node y toggles from a 1 to a 0.
And node f toggles from a 0 to a 1 (with x). So you have not only the
output of the LUT toggling, but also the internal stages. If you extend the
example to an N-LUT, you'll see that a toggle on input A results in 2^(N-1)
first stage nodes toggling, 2^(N-2) second stage, etc. or 2^N - 1 nodes
toggling *internal* to the LUT. If you look at an AND instead, you'll see
that only one first stage node toggles state with a change in A.
        A          B
 +-+    |          |
 |0|--|\           |
 +-+  | |--- x --|\
 +-+  | |        | |
 |1|--|/         | |--- f
 +-+    |        | |
 +-+    |        | |
 |1|--|\         | |
 +-+  | |--- y --|/
 +-+  | |
 |0|--|/
 +-+
So in conclusion, an XOR not only results in a higher output switching
probability (which should be modeled by your simulation vectors or assumed
toggle rate), but also results in higher *internal* switching activity.
Hence the power of a LUT is not constant across LUT masks. In fact, it also changes
as a function of the "static probabilities" of each input, i.e. the % of
the time those inputs are 1 or 0, since asymmetric LUT masks result in
asymmetric internal states as a function of input values.
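
The node counts above can be checked with a short Python sketch of the mux tree. It is a logic-level model only: the function names are assumptions, and it counts which mux outputs change value without saying anything about the capacitance behind each node.

```python
def mux_tree_nodes(mask, inputs):
    """Return every mux output in the tree, first stage first, output f last.

    mask holds the SRAM cell values; inputs[0] (A) selects in the first
    stage, inputs[1] (B) in the second, and so on.
    """
    level = list(mask)
    nodes = []
    for select in inputs:
        # Each 2:1 mux picks the upper cell of its pair when select is 1.
        level = [level[2 * i + select] for i in range(len(level) // 2)]
        nodes.extend(level)
    return nodes


def toggled_nodes(mask, before, after):
    """Number of internal nodes (including f) whose value changes."""
    return sum(a != b for a, b in zip(mux_tree_nodes(mask, before),
                                      mux_tree_nodes(mask, after)))


XOR2 = [0, 1, 1, 0]  # cells as drawn: x = A?1:0, y = A?0:1
AND2 = [0, 0, 0, 1]
XOR4 = [bin(addr).count("1") % 2 for addr in range(16)]
AND4 = [0] * 15 + [1]

if __name__ == "__main__":
    print(toggled_nodes(XOR2, (0, 0), (1, 0)))              # x, y and f
    print(toggled_nodes(AND2, (0, 0), (1, 0)))              # only y
    print(toggled_nodes(XOR4, (0, 0, 0, 0), (1, 0, 0, 0)))  # 2^4 - 1 nodes
    print(toggled_nodes(AND4, (0, 0, 0, 0), (1, 0, 0, 0)))  # one first-stage node
```

For a toggle on A with the other inputs held at 0, the XOR masks toggle every node in the tree while the AND masks toggle a single first-stage node, matching the 2^N - 1 versus 1 comparison.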
Regards,
Paul Leventis
Altera Corp.

Paul Leventis (at home)
Guest
Posted: Thu Dec 09, 2004 6:09 am
Post subject: Re: making an fpga hot

Hi Ray et al:
Good point on glitching. On a related note, this glitching also makes power
analysis difficult. Even with good-quality simulation vectors for a design,
the resulting gate-level simulation results will contain glitches. Are the
glitches real? If so, then they should count towards power. But
sufficiently short glitches will never propagate through the routing, or
even through the gate.
This is why we recommend that our users employ glitch filtering on
simulation results. This can be done with the Quartus II 4.2 simulator or
with 3rd party simulators (via the control file emitted by Quartus II). We
find that very glitchy designs do not correlate well unless this glitch
filtering is used. In addition, the resulting VCD files produced by 3rd
party sims need to be further filtered by Quartus in order to improve
accuracy further.
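
To make the idea of glitch filtering concrete, here is a generic pulse-width filter in Python. It is not the algorithm Quartus II uses, which is not described here; the function name, the event format, and the threshold are assumptions for illustration. It drops any pulse narrower than a minimum width from a list of simulated transitions before they would be counted towards power.

```python
def filter_glitches(initial, events, min_width):
    """Drop pulses narrower than min_width.

    initial: signal value before the first event.
    events:  time-ordered list of (time, new_value), each changing the value.
    """
    kept = []
    for time, value in events:
        if kept and time - kept[-1][0] < min_width:
            kept.pop()  # the previous edge started a pulse that is too short
        current = kept[-1][1] if kept else initial
        if value != current:
            kept.append((time, value))
    return kept


if __name__ == "__main__":
    # A 1-time-unit glitch at t=10 followed by a genuine 20-unit pulse.
    events = [(10, 1), (11, 0), (20, 1), (40, 0)]
    print(filter_glitches(0, events, min_width=2))
```

In this example the four simulated transitions reduce to two after filtering, so a power estimate driven by transition counts would halve for this net.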
For further information on power analysis, the Quartus II PowerPlay Power
Analyzer and glitch filtering specifically, please see
http://www.altera.com/literature/hb/qts/qts_qii53013.pdf.
And yes, pipelining is an excellent way to reduce glitching and thus dynamic
power. At some point, the pipeline registers and additional clock routing
will add more power than the glitches removed, but for glitch-heavy designs
(anything with XORs, such as adders, multipliers, and parity trees, and
"randomizing" circuits such as encryption) pipelining will help a lot.
Regards,
Paul Leventis
Altera Corp.

Symon
Guest
Posted: Thu Dec 09, 2004 1:30 pm
Post subject: Re: making an fpga hot

Hmm, that's very interesting. I wonder if the FPGA vendors have got their
SLICEs back to front? I.e. the FFs should feed directly into the LUTs within
the SLICEs, instead of the other way round that exists now. If it saved even
20% of the power, it'd be worth it. Instead of using all the FFs for
pipelining, you use them to replicate signals within the SLICEs to prevent
the glitchy power thing. Hmm, interesting indeed! Thanks Ray.
Cheers, Syms.

Symon
Guest
Posted: Thu Dec 09, 2004 9:29 pm
Post subject: Re: making an fpga hot

Hi Paul,
That's interesting too! I think what you're saying is that some inputs to
the LUT are more power thirsty than others. So, in your example, the A input
controls more muxes than the B input. This means that you
could reduce power by taking this into account. If you had a LUT structure
with four inputs A, B, C, D then A would feed 8 muxes, B feeds 4, C feeds 2,
and D feeds just one. For any two input function, only two inputs are used
and the P & R tools could prefer to use the C and D inputs for the least
amount of internal switching of nodes. Also, the net that changes most
frequently should be on the D input. Correct?
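
This pin-ordering idea can be tried out on a logic-level mux-tree model. The Python sketch below assumes a 4-LUT built as a plain tree of 2:1 muxes, with A selecting the first stage and D the last. The function names are hypothetical, and counting toggled nodes is only a proxy for power since it ignores node capacitance and timing.

```python
def mux_tree_nodes(mask, inputs):
    """All mux outputs in the tree; inputs[0] (A) selects the first stage."""
    level = list(mask)
    nodes = []
    for select in inputs:
        level = [level[2 * i + select] for i in range(len(level) // 2)]
        nodes.extend(level)
    return nodes


def toggled_nodes(mask, before, after):
    """Number of mux outputs whose value changes between two input states."""
    return sum(a != b for a, b in zip(mux_tree_nodes(mask, before),
                                      mux_tree_nodes(mask, after)))


def xor2_mask(i, j, n=4):
    """Mask of an n-input LUT computing input i XOR input j; others ignored."""
    return [((addr >> i) ^ (addr >> j)) & 1 for addr in range(2 ** n)]


if __name__ == "__main__":
    zeros = (0, 0, 0, 0)
    on_ab = xor2_mask(0, 1)  # XOR placed on inputs A and B
    on_cd = xor2_mask(2, 3)  # same XOR placed on inputs C and D
    print(toggled_nodes(on_ab, zeros, (1, 0, 0, 0)))  # toggle A
    print(toggled_nodes(on_cd, zeros, (0, 0, 1, 0)))  # toggle C
    print(toggled_nodes(on_cd, zeros, (0, 0, 0, 1)))  # toggle D
```

In this model a toggle on A disturbs 15 nodes, a toggle on C disturbs 3, and a toggle on D disturbs only the output, which is consistent with putting the two used signals on C and D and the busiest net on D. Whether real P&R tools or real silicon reward that choice is not something the sketch can show.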
Thanks, Syms.

glen herrmannsfeldt
Guest
Posted: Thu Dec 09, 2004 10:56 pm
Post subject: Re: making an fpga hot

Paul Leventis (at home) wrote:
(snip regarding power, XOR trees, and FPGAs)
| Quote: | While logically a LUT is just a 16x1 ROM, physically it is not built the same
way as a RAM.
A traditional RAM is built with a 2D-array of bits, where a row is selected
by decoding the address, and a pair of differential bit lines per cell is
precharged and then the cell pulls one side down which is amplified by a
sense-amplifier to speed things up (gross simplification). In that
structure, regardless of what you are reading, you burn the same power since
the reads are differential, and you burn power on each read, regardless of
the previously read value, since all that precharge, pull-down and sensing
happens every read.
|
That sounds more like a DRAM or SDRAM. Traditional SRAMs were
completely combinatorial, such that the output changed the appropriate
propagation delay after the address changed. Wouldn't the precharging
require a clock? I would have thought of a 2D array where a row is
decoded and the outputs from the selected row, differential or not,
are supplied to a multiplexer to select the appropriate bits to output.
At 16 cells the advantage of 2D decoding might not be worthwhile.
| Quote: | A LUT however is traditionally built as a multiplexor tree. You have 16
SRAM cells feeding a tree of 2:1 muxes. The 4 inputs of the LUT each
control one level of the tree. There is a diagram below for a 2-LUT.
|
I wonder how 16-bit SRAMs were built? As far as I understand it, the
first semiconductor memory for a commercial computer was the storage
protection keys for the IBM 360/91, built out of 16-bit SRAM chips.
-- glen

Tim
Guest
Posted: Tue Dec 21, 2004 7:14 am
Post subject: Re: making an fpga hot

As I understand it (!) Stephen Trimberger (Xilinx and much
distinguished previous work) presented a paper on
this fairly recently.
"Symon" <symon_brewer@hotmail.com> wrote in message
news:31qgosF3cc0s9U1@individual.net...
| Quote: | Hmm, that's very interesting. I wonder if the FPGA vendors have got their
SLICEs back to front? I.e. the FFs should feed directly into the LUTs within
the SLICEs, instead of the other way round that exists now. If it saved even
20% of the power, it'd be worth it. Instead of using all the FFs for
pipelining, you use them to replicate signals within the SLICEs to prevent the
glitchy power thing. Hmm, interesting indeed! Thanks Ray.
Cheers, Syms. |