Moti Cohen
Guest
|
Posted:
Sun Dec 05, 2004 2:55 pm Post subject:
how to speed up my accumulator ?? |
|
|
Hello all,
I have a design that contains an NCO (numerically controlled oscillator).
The NCO consists of a 32-bit accumulator. When I write the accumulator
in the straightforward way, like this -
process (clk, resetn)
begin
  if resetn = '0' then
    accumulator <= (others => '0');
  elsif clk'event and clk = '1' then
    accumulator <= accumulator + inc_value;
  end if;
end process;
Fout <= accumulator(accumulator'high);
the maximum frequency I can achieve for 'clk' is ~150 MHz (Spartan-3).
I need it to work at ~200 MHz, so I figured that some pipelining is
needed, but I don't know how to do it because of the accumulator
feedback. Maybe someone here can explain it to me or even give me a
code example (which would be great).
Thanks in advance, Moti. |
|
Hal Murray
Guest
|
Posted:
Sun Dec 05, 2004 3:16 pm Post subject:
Re: how to speed up my accumulator ?? |
|
|
| Quote: | the maximum frequency I can achive for 'clk' is ~ 150 MHz (spartan 3).
I need it to work in ~200 MHz so I figured out that some pipelining is
needed but I dont know how to do it because of the accumulator
feedback. Maybe someone here can explain it to me or even give me a
code example (which will be great).
|
Google for carry-save adder. Or counter.
The idea is to break the adder into chunks. The carry-out
of each chunk goes into a FF and then into the carry-in of
the next chunk. Chop it up into chunks that are small enough
that they meet your speed requirements.
With modern dedicated carry logic, this doesn't work as well
as it did in the old days.
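A minimal sketch of the chunked adder described above, assuming a split into two 16-bit halves. The split point, the entity name nco_pipelined and all signal names are illustrative and do not come from the thread. The carry out of the low half is registered and added into the high half one clock later. The high half of inc_value is delayed by one clock as well, so the high half should equal the upper word of the plain accumulator, one clock late.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: 32-bit NCO accumulator split into two 16-bit chunks.
entity nco_pipelined is
  port (
    clk       : in  std_logic;
    resetn    : in  std_logic;                      -- active-low async reset
    inc_value : in  std_logic_vector(31 downto 0);
    fout      : out std_logic
  );
end entity nco_pipelined;

architecture rtl of nco_pipelined is
  signal acc_lo   : unsigned(15 downto 0);
  signal acc_hi   : unsigned(15 downto 0);
  signal carry_q  : unsigned(0 downto 0);           -- registered carry out of the low chunk
  signal inc_hi_q : unsigned(15 downto 0);          -- high half of inc_value, delayed one clock
begin
  process (clk, resetn)
    variable sum_lo : unsigned(16 downto 0);        -- 17 bits: low sum plus carry out
  begin
    if resetn = '0' then
      acc_lo   <= (others => '0');
      acc_hi   <= (others => '0');
      carry_q  <= (others => '0');
      inc_hi_q <= (others => '0');
    elsif rising_edge(clk) then
      -- Low chunk: 16-bit add, carry out captured in a flip-flop.
      sum_lo   := resize(acc_lo, 17) + resize(unsigned(inc_value(15 downto 0)), 17);
      acc_lo   <= sum_lo(15 downto 0);
      carry_q  <= sum_lo(16 downto 16);
      -- High chunk: uses last cycle's carry and last cycle's increment.
      inc_hi_q <= unsigned(inc_value(31 downto 16));
      acc_hi   <= acc_hi + inc_hi_q + carry_q;
    end if;
  end process;

  fout <= acc_hi(acc_hi'high);
end architecture rtl;
```

Each half still feeds back onto itself; only the carry crosses the extra register, so the feedback loop of each chunk is a 16-bit add. For an NCO that only uses the MSB, the one-clock latency of the high half is usually harmless.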
--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam. |
|
Antti Lukats
Guest
|
Posted:
Sun Dec 05, 2004 4:41 pm Post subject:
Re: how to speed up my accumulator ?? |
|
|
http://ipcores.openchip.org/ddsx.html
NCO with a max (virtual) frequency of 11 (eleven) GHz!
For your speed you can possibly optimize the adder to get the performance.
However, it is also possible to have much higher clock frequencies for the NCO
than the FPGA fabric supports. It is resource-consuming, but it is a working
solution.
To get 11 GHz performance (using V4 RocketIO), 40 NCO words are calculated
each clock cycle and the result is then serialized with the RocketIO SERDES.
Similarly, in FPGAs with no special SERDES there would still be some
speed gain from running the NCO at a lower frequency, calculating maybe 4 or 8
bits per clock, and then using a very fast shift register to shift the bits out.
That approach would be usable for 400 MHz+ frequencies (within the FPGA fabric).
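One way to read the "several bits per clock" idea, sketched under assumptions: the accumulator runs on a slow clock and advances by 4 x inc_value per cycle, while four parallel adders produce the MSBs that a full-rate accumulator would have shown at each of the four intermediate steps. The entity name nco_4x, the choice of 4 bits and the bit ordering (bits(0) is the earliest sample) are assumptions for illustration. The fast serializer (shift register or SERDES) is device-specific and not shown, and this is not the DDSX core, whose internals are not described here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: computes 4 NCO output bits per slow clock cycle.
entity nco_4x is
  port (
    clk       : in  std_logic;                      -- slow clock (output bit rate / 4)
    resetn    : in  std_logic;                      -- active-low async reset
    inc_value : in  std_logic_vector(31 downto 0);
    bits      : out std_logic_vector(3 downto 0)    -- bits(0) must be sent first
  );
end entity nco_4x;

architecture rtl of nco_4x is
  signal acc                    : unsigned(31 downto 0);
  signal inc1, inc2, inc3, inc4 : unsigned(31 downto 0);  -- registered 1x..4x increments
begin
  process (clk, resetn)
    variable s1, s2, s3, s4 : unsigned(31 downto 0);
  begin
    if resetn = '0' then
      acc  <= (others => '0');
      inc1 <= (others => '0');
      inc2 <= (others => '0');
      inc3 <= (others => '0');
      inc4 <= (others => '0');
      bits <= (others => '0');
    elsif rising_edge(clk) then
      -- Precompute the multiples so the accumulator path is a single add.
      inc1 <= unsigned(inc_value);
      inc2 <= shift_left(unsigned(inc_value), 1);
      inc3 <= shift_left(unsigned(inc_value), 1) + unsigned(inc_value);
      inc4 <= shift_left(unsigned(inc_value), 2);

      -- Four consecutive phase values of the equivalent full-rate accumulator.
      s1 := acc + inc1;
      s2 := acc + inc2;
      s3 := acc + inc3;
      s4 := acc + inc4;

      bits(0) <= s1(31);
      bits(1) <= s2(31);
      bits(2) <= s3(31);
      bits(3) <= s4(31);

      acc <= s4;                                    -- advance by four steps at once
    end if;
  end process;
end architecture rtl;
```

The cost is four 32-bit adders plus one for the 3x multiple, which matches the "resource consuming" remark; only the serializer has to run at the full output rate.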
Antti |
|
Moti
Guest
|
Posted:
Sun Dec 05, 2004 6:56 pm Post subject:
Re: how to speed up my accumulator ?? |
|
|
Hi Hal,
you said -> The idea is to break the adder into chunks..
I know that I need to break up the logic, but my problem is what to do with
the feedback path. Should I break it too?
Regards, Moti. |
|
Moti
Guest
|
Posted:
Sun Dec 05, 2004 7:07 pm Post subject:
Re: how to speed up my accumulator ?? |
|
|
Hi Antti,
you wrote -> http://ipcores.openchip.org/ddsx.html
NCO with max (virtual) frequency of 11 (eleven) GHz!
I couldn't find any detailed description there (only a features and
deliverables description for buying it).
you wrote -> For your speed you possibly can optimize the adder to get
the performance
How would you suggest doing this?
you wrote -> similarly in FPGAs with no special serdes there would
still be some speed gain using the NCO at lower frequency and calculating
maybe 4 or 8 bits per clock and then using very fast shift register to
shift the bits out. that approach would be usable for 400M+ frequencies
(within FPGA fabric)
This seems to be a very interesting solution for me (higher frequency
= less jitter!), but I didn't really understand how it works, so I
would appreciate it if you could provide more details or a
link to a detailed description.
Thanks, Moti. |
|
rickman
Guest
|
Posted:
Sun Dec 05, 2004 8:20 pm Post subject:
Re: how to speed up my accumulator ?? |
|
|
This is not elegant and it uses three times the resources, but it should
run at twice your current speed.
process (clk,resetn)
begin
  if resetn = '0' then
    phase <= (others =>'0');
    accsingle <= (others =>'0');
    accdouble <= (others =>'0');
    accfast <= (others =>'0');
  elsif clk'event and clk ='1' then
    phase <= not phase;
    if (phase = '0') then
      accfast <= accsingle;
    else
      accfast <= accdouble;
      accsingle <= accdouble + inc_value;
      accdouble <= accdouble + inc_value sll 1;
    end if;
  end if;
end process;
Fout <= accfast (accfast'high);
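As posted, the block above is unlikely to compile. phase is reset with an aggregate but used as a single bit, and because VHDL shift operators bind more loosely than "+", the expression accdouble + inc_value sll 1 parses as (accdouble + inc_value) sll 1. What follows is a hedged guess at the intended design, wrapped in an entity so it is self-contained. The entity name nco_dual, the port list and the use of numeric_std are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: two-results-per-two-clocks accumulator.
entity nco_dual is
  port (
    clk       : in  std_logic;
    resetn    : in  std_logic;                      -- active-low async reset
    inc_value : in  std_logic_vector(31 downto 0);
    fout      : out std_logic
  );
end entity nco_dual;

architecture rtl of nco_dual is
  signal phase     : std_logic;                     -- single bit, toggles every clock
  signal accsingle : unsigned(31 downto 0);         -- accumulator + 1 x inc
  signal accdouble : unsigned(31 downto 0);         -- accumulator + 2 x inc
  signal accfast   : unsigned(31 downto 0);         -- full-rate output register
begin
  process (clk, resetn)
  begin
    if resetn = '0' then
      phase     <= '0';
      accsingle <= (others => '0');
      accdouble <= (others => '0');
      accfast   <= (others => '0');
    elsif rising_edge(clk) then
      phase <= not phase;
      if phase = '0' then
        accfast <= accsingle;                       -- register-to-register, no adder
      else
        accfast   <= accdouble;
        accsingle <= accdouble + unsigned(inc_value);
        -- shift_left with explicit operands avoids the precedence problem
        accdouble <= accdouble + shift_left(unsigned(inc_value), 1);
      end if;
    end if;
  end process;

  fout <= accfast(accfast'high);
end architecture rtl;
```

Both adders load only when phase = '1', so their source and destination registers change every other clock. The paths into accfast contain a mux but no carry chain.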
I don't have a feel for how close your speed is to the theoretical
maximum, but have you tried optimizing your current design by using the
floorplanner? First, find out what your critical path is. I expect it
will be from "inc_value" to "accumulator". If so, you can place
"inc_value" adjacent to "accumulator" to reduce the routing delay.
One other note: I don't know if the tools are smart enough to deal with
a low-true async reset. I always make mine high-true, and I believe that
is the way it is spec'd for the startup block in Xilinx FPGAs. If a low-true
reset works, then never mind...
--
Rick "rickman" Collins
rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.
Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design URL http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX |
|
Antti Lukats
Guest
|
Posted:
Sun Dec 05, 2004 9:59 pm Post subject:
Re: how to speed up my accumulator ?? |
|
|
"rickman" <spamgoeshere4@yahoo.com> wrote in message
news:41B32744.70D3A95F@yahoo.com...
| Quote: | Moti Cohen wrote:
Hello all,
I've a design that contains a NCO (Numerically controlled oscillator).
The NCO consists of a 32'bit accumulator. when i write the accumulator
straight forward like this -
process (clk,resetn)
begin
if resetn = '0' then
accumulator <= (others =>'0');
elsif clk'event and clk ='1' then
accumulator <= accumulator + inc_value;
end if;
end process;
Fout <= accumulator (accumulator'high);
|
Selected Device : 3s1500fg676-5
Number of Slices: 17 out of 13312 0%
Speed Grade: -5
Minimum period: 4.407ns (Maximum Frequency: 226.912MHz)
--------------------------------------------------------------------------------
Constraint                               | Requested | Actual  | Logic Levels
--------------------------------------------------------------------------------
TS_clk = PERIOD TIMEGRP "clk" 5 nS HIGH  | 5.000ns   | 4.847ns | 2
50.000000 %                              |           |         |
--------------------------------------------------------------------------------
| Quote: | the maximum frequency I can achive for 'clk' is ~ 150 MHz (spartan 3).
I need it to work in ~200 MHz so I figured out that some pipelining is
needed but I dont know how to do it because of the accumulator
feedback. Maybe someone here can explain it to me or even give me a
code example (which will be great).
Thanks in advance, Moti.
This is not elegant and it uses three times the resources, but it should
run at twice your current speed.
process (clk,resetn)
begin
if resetn = '0' then
phase <= (others =>'0');
accsingle <= (others =>'0');
accdouble <= (others =>'0');
accfast <= (others =>'0');
elsif clk'event and clk ='1' then
phase <= not phase;
if (phase = '0') then
accfast <= accsingle;
else
accfast <= accdouble;
accsingle <= accdouble + inc_value;
accdouble <= accdouble + inc_value sll 1;
end if;
end if;
end process;
Fout <= accfast (accfast'high);
|
Selected Device : 3s1500fg676-5
Number of Slices: 34 out of 13312 0%
Speed Grade: -5
Minimum period: 4.632ns (Maximum Frequency: 215.889MHz)
--------------------------------------------------------------------------------
Constraint                               | Requested | Actual  | Logic Levels
--------------------------------------------------------------------------------
TS_clk = PERIOD TIMEGRP "clk" 5 nS HIGH  | 5.000ns   | 4.886ns | 2
50.000000 %                              |           |         |
--------------------------------------------------------------------------------
Rick, hmmm... care to comment?
see synthesis and timing reports above :)
Antti |
|
Mike Treseler
Guest
|
Posted:
Sun Dec 05, 2004 10:41 pm Post subject:
Re: how to speed up my accumulator ?? |
|
|
Moti Cohen wrote:
| Quote: | elsif clk'event and clk ='1' then
accumulator <= accumulator + inc_value;
end if;
end process;
Fout <= accumulator (accumulator'high);
the maximum frequency I can achive for 'clk' is ~ 150 MHz (spartan 3).
I need it to work in ~200 MHz so I figured out that some pipelining is
needed but I dont know how to do it because of the accumulator
feedback.
|
Hmmm...
If inc_value'length < accumulator'length, maybe you
could do a slice addition of the lower bits with the
result msbit piped to enable an increment of the upper bits.
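A sketch of this suggestion, assuming for illustration a 16-bit inc_value and a 32-bit accumulator. The widths and names such as nco_narrow_inc are hypothetical. The low half is a full adder; its carry out is registered and then enables a plain increment of the upper half, which therefore runs one clock behind the low half.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: increment is narrower than the accumulator.
entity nco_narrow_inc is
  port (
    clk       : in  std_logic;
    resetn    : in  std_logic;                      -- active-low async reset
    inc_value : in  std_logic_vector(15 downto 0);  -- assumed 16-bit increment
    fout      : out std_logic
  );
end entity nco_narrow_inc;

architecture rtl of nco_narrow_inc is
  signal acc_lo  : unsigned(15 downto 0);
  signal acc_hi  : unsigned(15 downto 0);
  signal carry_q : std_logic;                       -- piped carry out of the low slice
begin
  process (clk, resetn)
    variable sum_lo : unsigned(16 downto 0);
  begin
    if resetn = '0' then
      acc_lo  <= (others => '0');
      acc_hi  <= (others => '0');
      carry_q <= '0';
    elsif rising_edge(clk) then
      -- Slice addition of the lower bits.
      sum_lo  := resize(acc_lo, 17) + resize(unsigned(inc_value), 17);
      acc_lo  <= sum_lo(15 downto 0);
      carry_q <= sum_lo(16);
      -- Upper bits only count; the piped carry is the count enable.
      if carry_q = '1' then
        acc_hi <= acc_hi + 1;
      end if;
    end if;
  end process;

  fout <= acc_hi(acc_hi'high);
end architecture rtl;
```

The gain shrinks as inc_value gets closer to the accumulator width, because the low slice then carries almost the whole adder.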
-- Mike Treseler |
|
Moti
Guest
|
Posted:
Sun Dec 05, 2004 11:20 pm Post subject:
Re: how to speed up my accumulator ?? |
|
|
Hi Rickman,
First of all, thanks for the code example. It's always nicer and clearer
to get one of these.
There is only one thing bothering me in your code - the "accsingle"
register is sampled on each rising edge of the clock and therefore
does not improve the setup time (and therefore the frequency & clk
rate). I suppose that it should be sampled on every 2nd clock. So maybe
your code contains a typo, but the idea is "almost" clear and it's a
very clever one.
I presented this subject (my problem) to our algorithms guy and he
figured out a very nice way of breaking the logic into two or more
levels (4, 8..), but he is still working on it. I will post the code
here when he finishes it.
Thanks, Moti. |
|
rickman
Guest
|
Posted:
Sun Dec 05, 2004 11:23 pm Post subject:
Re: how to speed up my accumulator ?? |
|
|
Antti Lukats wrote:
| Quote: |
Rick, hmmm... care to comment?
see synthesis and timing reports above :)
|
This shows that my approach will run twice as fast. It produces two
results rather than one and so can be constrained to require two clock
periods. You need to set your timing constraints to reflect that. The
only paths that don't run at the half clock rate are the output mux
running into accfast and the phase control signal. Set the path delay
on the accsingle and accdouble paths to be *two* clock periods (except
for the enable from phase).
But your timing numbers show both designs running at over 200 MHz, which
is the OP's requirement, IIRC. Did you have to do any floorplanning?
Also, are these numbers post ROUTE or the output from synthesis? Timing
results from synthesis are worthless. I would like to see the details
on the critical path in each case.
The logic for my code should be a minimum of 97 LUTs. Your result is
only 34 slices which is a maximum of 68 LUTs. I suspect there is some
problem so that the code does not synthesize correctly (possibly in the
code).
I have not looked at the CLB details of the newer Xilinx FPGAs. An
adder still requires 1 LUT per bit, right? inc_value is a signal and
not a constant, right?
|
Moti
Guest
|
Posted:
Sun Dec 05, 2004 11:24 pm Post subject:
Re: how to speed up my accumulator ?? |
|
|
Hi Mike,
Yes, I know that, but in my design inc_value'length is almost the same as
accumulator'length (maybe I will be able to drop two bits),
so it won't give me much more slack.
Thanks. Moti. |
|
rickman
Guest
|
Posted:
Sun Dec 05, 2004 11:41 pm Post subject:
Re: how to speed up my accumulator ?? |
|
|
Moti wrote:
| Quote: |
Hi Rickman,
First of all, thanks for the code example It's always nice and clearer
to get one of this.
there is only one thing bothering me in your code - the "accsingle"
register is sampled on each rising edge of clock and therefore
does not improves the setup time (and therefore the frequency & clk
rate) i suppose that it should be sampled on every 2'nd clock. So maybe
your code contains a typo but the idea is "almost" clear and i'ts a
very clever one.
|
Yes, both accsingle and accdouble are sampled on the rising edge of the
clock, but only when phase is high and so only *every other* clock. I
guess I figured that would be obvious. The accfast signal captures the
output of a mux on *every* clock, so it still has to run at full
speed. But this path has no carry, so it should be faster than your
previous result.
In any case, you can likely improve your results by floorplanning so
that the registers involved are in adjacent (or even the same) CLBs to
optimize routing. I see no reason that your original design would not
run at 200 MHz.
| Quote: | I presented this subject (my problem) to our algorithm's guy and he
figured out a very nice way of breaking the logic into to or more
levels (4, 8..) , but he is still working on it I will write the code
here when he will finish it..
|
You will find that approach reduces the length of the carry path. But
the basic minimum path is from one register output through the LUT and
into a second register. This will be the ultimate limit for any adder
design if you reduce the carry delay to a single LUT. To reach the full
speed capability you likely will need to floorplan to get the optimally
fast routing which will be between registers in the same CLB. At that
point your carry delay may not matter with your requirement of 5 nS.
Typically the carry delay is < 0.1 ns/bit or < 3.2 ns for the 32 bit
adder.
I guess all those words are trying to say that you can only do so much
by pipelining an adder. Pipelining will break up the carry delay; the
finer you break it up, the closer you get to the reg -> LUT -> reg delay,
not zero delay. My dual parallel approach gets you directly to the
minimum delay if that is what's needed. But try floorplanning before
you do any more work with the algorithm. That should be sufficient at
32 bits.
Also, you did place and route it, right? The timing results from
synthesis are not very accurate since they "estimate" routing times.
|
Antti Lukats
Guest
|
Posted:
Sun Dec 05, 2004 11:43 pm Post subject:
Re: how to speed up my accumulator ?? |
|
|
[lots of snipped]
| Quote: | Rick, hmmm... care to comment?
see synthesis and timing reports above :)
This shows that my approach will run twice as fast. It produces two
results rather than one and so can be constrained to require two clock
periods. You need to set your timing constraints to reflect that. The
only paths that don't run at the half clock rate are the output mux
running into accfast and the phase control signal. Set the path delay
on the accsingle and accdouble paths to be *two* clock periods (except
for the enable from phase).
|
:) OK, well, your code "AS IS" did not synthesize, so I guessed at a
fix to get it to synthesize, possibly making an error in the guesswork.
YES, calculating 2 bits per clock is a solution; this is also what I
suggested in one of my earlier posts.
I presented the synthesis (and timing) of the code "as you wrote it" (after
the fix). I don't see the output mux in your code, and I did not add one either.
Generally I agree that a similar approach (if the code is correct) runs at about
twice the speed.
| Quote: | But your timing numbers show both designs running at over 200 MHz which
is the OPs requirement, IIRC. Did you have to do any floorplanning?
Also, are these numbers post ROUTE or the output from synthesis? Timing
results from synthesis are worthless. I would like to see the details
on the critical path in each case.
|
I posted both the synthesis estimate and the post-place-and-route timings; in any
case both approaches are 210 MHz+.
No floorplanning, just the clock constraint set to 5 ns, nothing more.
| Quote: | The logic for my code should be a minimum of 97 LUTs. Your result is
only 34 slices which is a maximum of 68 LUTs. I suspect there is some
problem so that the code does not synthesize correctly (possibly in the
code).
|
Yes, possibly I corrected your code incorrectly :(
| Quote: | I have not looked at the CLB details of the newer Xilinx FPGAs. An
adder still requires 1 LUT per bit, right? inc_value is a signal and
not a constant, right?
|
I used 32-bit-wide signals throughout, with inc_value as an input port.
|
Antti Lukats
Guest
|
Posted:
Mon Dec 06, 2004 12:51 am Post subject:
Re: how to speed up my accumulator ?? |
|
|
"rickman" <spamgoeshere4@yahoo.com> wrote in message
news:41B35233.F5D96004@yahoo.com...
| Quote: | Antti Lukats wrote:
"rickman" <spamgoeshere4@yahoo.com> wrote in message
news:41B32744.70D3A95F@yahoo.com...
Moti Cohen wrote:
Hello all,
I've a design that contains a NCO (Numerically controlled
oscillator).
The NCO consists of a 32'bit accumulator. when i write the
accumulator
straight forward like this -
[snip]
The logic for my code should be a minimum of 97 LUTs. Your result is
only 34 slices which is a maximum of 68 LUTs. I suspect there is some
problem so that the code does not synthesize correctly (possibly in the
code).
I have not looked at the CLB details of the newer Xilinx FPGAs. An
adder still requires 1 LUT per bit, right? inc_value is a signal and
not a constant, right?
Rick "rickman" Collins
|
Hm... out of curiosity I checked the DDSX ipcore in 2X mode (that is,
calculating 2 bits per clock). The following stats are for
- 32-bit-wide accumulator
- 32-bit variable phase increment value
Synthesis:
Selected Device : 3s1500fg320-5
Number of Slices: 33 out of 13312 0%
Minimum period: 4.577ns (Maximum Frequency: 218.508MHz)
Post P&R Timing:
Timing constraint: TS_clk = PERIOD TIMEGRP "clk" 5 nS HIGH 50.000000 % ;
497 items analyzed, 0 timing errors detected. (0 setup errors, 0 hold errors)
Minimum period is 4.657ns.
--------------------------------------------------------------------------------
All constraints were met.
Design statistics:
Minimum period: 4.657ns (Maximum frequency: 214.731MHz)
So the DDSX ipcore can calculate 2 bits per clock (to be muxed or serialized) at
a max frequency of 214 MHz using 33 slices!
OK, let's add one more slice for the mux or shifter; that comes to 34 slices :)
The DDSX ipcore (in 2x mode) runs completely at 0.5 x the DDS frequency!
So if the FPGA fabric can run a 2-bit shifter at 400 MHz, then the DDS would
run at a virtual 400 MHz.
A real 400 MHz clock is only used in the one slice doing the shift, or not at all
when a DDR IO cell uses the 2 phases of the clock.
Antti
PS: I just ran a timing check on the 10 GHz version of DDSX; no problems there
either :)
Sure, 10 GHz is only possible with V4FX or V2ProX (using GT10 as serializer). |
|
Moti
Guest
|
Posted:
Mon Dec 06, 2004 2:05 am Post subject:
Re: how to speed up my accumulator ?? |
|
|
Hi Rickman,
I wrote ->
| Quote: | there is only one thing bothering me in your code - the "accsingle"
register is sampled on each rising edge of clock and therefore
does not improves the setup time (and therefore the frequency & clk
rate) i suppose that it should be sampled on every 2'nd clock
|
You wrote -> Yes, both accsingle and accdouble are sampled on the
rising edge of the clock, but only when phase is high and so only
*every other* clock
That's what I meant:
To my understanding, accdouble is indeed sampled every other
clock, but
accsingle is sampled on every clock, as follows:
when phase = '1', accsingle is updated:
accsingle <= accdouble + inc_value
when phase = '0', accsingle is sampled:
accfast <= accsingle
So it seems to me that it is sampled one clock edge after it is
changed (via the large logic block). Am I wrong or missing
something? |
|