Author: Scott Bekker (Guest)
Posted: Thu Dec 08, 2005 1:16 am
Post subject: Virtex 4 not meeting timing constraints

Hi,
I have a design for a Virtex 4 SX35-10 that is not meeting my timing
constraints. The only constraint is set in the ucf file as a clock
period of 4.75 ns. Synthesis gives the following:
Timing Summary:
---------------
Speed Grade: -10
Minimum period: 7.680ns (Maximum Frequency: 130.213MHz)
Minimum input arrival time before clock: 1.890ns
Maximum output required time after clock: 5.810ns
Maximum combinational path delay: 0.000ns
A post-map static timing analysis reports the following as the first
error (place and route fails):
Source: uut1/overlapadd1/fifo1/BU2/U0/ss/memblk/fifo_generator_v2_2_fifo_generator_v2_2_xst_1_coreinst/fifo_generator_v2_2_fifo_generator_v2_2_xst_1_blkmemdp_v6_2_xst/bm/mem/arch_v2/prim/4/b1/chk0/col/0/b2/mextd/arch_v2/c1/ram1/v2/d4096/by4/newSim8/RAMB16 (RAM)
Destination: uut1/overlapadd1/f2_data_in_sig_0_BRB2 (FF)
Requirement: 4.750ns
Data Path Delay: 5.522ns (Levels of Logic = 1)
Clock Path Skew: 0.000ns
Source Clock: fast_clk rising at 0.000ns
Destination Clock: fast_clk rising at 4.750ns
Clock Uncertainty: 0.060ns
Does the post map report include estimates of routing delays? Can I
constrain XST to provide better results, if so how? Is 210 MHz too fast
for this speed grade FPGA? Running XST with higher effort does not
seem to help.
thanks

Author: Jeff Cunningham (Guest)
Posted: Thu Dec 08, 2005 1:16 am
Post subject: Re: Virtex 4 not meeting timing constraints

Scott Bekker wrote:
> Does the post map report include estimates of routing delays?
Not sure. You said P&R failed. Why? Were there unrouted nets, or did timing fail?
> Can I constrain XST to provide better results, if so how?
There is a switch (in the map setup, I think) for XST to optimize for speed
or area, which should be set to speed. But to get significant gains, you
need to understand what the failing path is and how it fits into
your design. Can you relate the source and destination names from that
timing report back to the corresponding RAM and FF in your source code?
A typical design has many paths that are never really exercised at full
clock speed, or maybe not at all. If you are lucky, the failing path is
in this category. You might read up on the "multicycle" path and
"ignore" constraints.
If the failing path really needs to run that fast, you can use
tricks like pipelining to break up a large slow operation into several
smaller, faster ones. Hmm, I just noticed the failing path has just 1
logic level, so pipelining probably won't help.
> Is 210 MHz too fast for this speed grade FPGA?
Depends entirely on the particulars of your design. A small state
machine: probably no problem. A 64-bit non-pipelined single-cycle
accumulator: probably too slow.
It looks like the source of your failing path is the output of a FIFO's
SRAM. IIRC, the clock-to-data-out delay of block RAM is significantly larger
than that of FFs. If that's the case, maybe you can pull some trick
like making the FIFO output twice as wide and feeding that as a 2-cycle path
into some sort of FF-based mux that can run at full clock speed. In
other words, if you can transfer twice as much data, you can take 2
clocks to do it, so it effectively only has to run at 105 MHz.
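The arithmetic behind this suggestion can be sketched roughly as follows. The delay numbers are taken from the timing report in the original post. Treating a 2-cycle path as simply having twice the period as its budget is a simplifying assumption (placement and routing would change once the FIFO is widened), and the helper names are made up for illustration.

```python
# Rough timing-budget check for the "twice as wide, 2-cycle path" idea.
# Numbers come from the timing report in the original post; the helper
# names are hypothetical and the model ignores any change in routing.

CLOCK_PERIOD_NS = 4.750       # UCF period constraint (about 210 MHz)
DATA_PATH_DELAY_NS = 5.522    # BRAM -> FF data path delay from the report
CLOCK_UNCERTAINTY_NS = 0.060  # clock uncertainty from the report


def path_budget_ns(period_ns: float, cycles: int) -> float:
    """Time available to a path that is allowed `cycles` clock periods."""
    return period_ns * cycles


def slack_ns(period_ns: float, cycles: int) -> float:
    """Budget minus (path delay + uncertainty); negative means failing."""
    return path_budget_ns(period_ns, cycles) - (
        DATA_PATH_DELAY_NS + CLOCK_UNCERTAINTY_NS
    )


def effective_rate_mhz(period_ns: float, cycles: int) -> float:
    """Rate at which a path allowed `cycles` periods effectively has to run."""
    return 1000.0 / path_budget_ns(period_ns, cycles)


if __name__ == "__main__":
    for cycles in (1, 2):
        print(
            f"{cycles}-cycle path: "
            f"budget {path_budget_ns(CLOCK_PERIOD_NS, cycles):.3f} ns, "
            f"slack {slack_ns(CLOCK_PERIOD_NS, cycles):+.3f} ns, "
            f"effective rate {effective_rate_mhz(CLOCK_PERIOD_NS, cycles):.1f} MHz"
        )
```

Under these assumptions the single-cycle path misses by a bit under 1 ns, while the same delay fits comfortably inside a 2-cycle budget.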
-Jeff

Author: Ray Andraka (Guest)
Posted: Fri Dec 09, 2005 1:16 am
Post subject: Re: Virtex 4 not meeting timing constraints

Scott Bekker wrote:
> Does the post map report include estimates of routing delays? Can I
> constrain XST to provide better results, if so how? Is 210 MHz too fast
> for this speed grade FPGA? Running XST with higher effort does not
> seem to help.
210 MHz is apparently too fast for YOUR DESIGN in this speed grade. Any
speed grade Virtex4 is capable of quite a bit faster clocking, but you
need to be somewhat careful in the design. I am currently working on a
floating point FFT design for an XC4VSX55-10 that is clocked at 400 MHz.
If you look at the .twr timing report instead of the one that comes up
in the GUI, it gives more detail on the failing path, including an
element-by-element breakdown of the path and the location of
each element. Since there is only 1 level of logic, I am guessing that
this failing path is sourced by a block RAM that does not have the
output register enabled, and the destination has a LUT in front of the
flip-flop, plus it is probably not located immediately adjacent to the
BRAM. You'll want to increase or at least modify the pipelining to
improve the performance, and turn on the output register on the BRAM
(the clock to out of the BRAM is rather long without the output register).
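As a rough illustration of reading the headline numbers out of such a report, the sketch below parses the excerpt from the original post and computes the setup slack. It assumes the usual formula, slack = requirement - (data path delay - clock path skew + clock uncertainty), and a "Name: <number>ns" line layout. A real .twr file contains many more fields, so the regular expression and function names here are illustrative only.

```python
import re

# Excerpt copied from the timing report in the original post.
REPORT_EXCERPT = """\
Requirement: 4.750ns
Data Path Delay: 5.522ns (Levels of Logic = 1)
Clock Path Skew: 0.000ns
Clock Uncertainty: 0.060ns
"""

# Matches lines such as "Data Path Delay: 5.522ns ..." (assumed layout).
FIELD_RE = re.compile(r"^\s*([A-Za-z ]+?):\s*(-?\d+(?:\.\d+)?)ns", re.MULTILINE)


def parse_fields(text: str) -> dict:
    """Return {field name: value in ns} for every 'Name: <number>ns' line."""
    return {name.strip(): float(value) for name, value in FIELD_RE.findall(text)}


def slack_ns(fields: dict) -> float:
    """Setup slack; a negative result means the path fails timing."""
    return fields["Requirement"] - (
        fields["Data Path Delay"]
        - fields["Clock Path Skew"]
        + fields["Clock Uncertainty"]
    )


if __name__ == "__main__":
    fields = parse_fields(REPORT_EXCERPT)
    slack = slack_ns(fields)
    print(f"slack = {slack:+.3f} ns")
    if slack < 0:
        # This much delay has to come out of the path, for example by
        # enabling the BRAM output register as suggested above.
        print(f"the path must get at least {-slack:.3f} ns faster")
```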

Author: Scott Bekker (Guest)
Posted: Tue Dec 20, 2005 1:01 am
Post subject: Re: Virtex 4 not meeting timing constraints

Thanks for the help, Ray. I added the register after the block RAM and
that fixed that timing error. I was then having more timing errors in
a CoreGen FFT core. The design was running significantly slower than
the data sheet specified. After a lot of playing around with tool
settings, I finally found the problem. CoreGen showed the correct
device at the bottom of the main GUI page, but the device setting
in the options was set to Spartan 3. I corrected the setting, and now
my design is meeting timing with default settings for all implementation
tools. I think there is probably room for improvement as well.
Thanks again.
Scott

Author: Ray Andraka (Guest)
Posted: Tue Dec 20, 2005 9:15 am
Post subject: Re: Virtex 4 not meeting timing constraints

Scott Bekker wrote:
> Thanks for the help, Ray. I added the register after the block ram and
> that fixed that timing error. I was then having more timing errors in
> a CoreGen FFT core. The design was running significantly slower than
> the data sheet specified. .... I think there is probably room for improvement as well.
> Thanks again.
> Scott
Glad to have been of help. As I indicated, with some diligence, you can
get the slow speed grade V4SX (-10) to run at 400 MHz, which is the max
clock rate of the BRAMs and DSP48s when fully pipelined. The fabric,
with the exception of the carry chains, can run considerably faster.
The carry chains are limited to about 10 bits at 400 MHz, which is a shame.