| Author |
Message |
fwj_733
Guest
|
Posted:
Wed Dec 15, 2004 4:40 am Post subject:
algorithm: square operation |
|
|
| There is a lot of square operations in my FPGA projects (using Xilinx VitexE). Now, I complete them by multiple, but this method is slice consuming and slow. As we know, square operation has many characters, which multiple operation of any random numberic don' have, then could we utilize these features to calculate square operations more economical and more fast? Thanks for all advice. Best Regards |
|
| Back to top |
|
 |
Ben Jackson
Guest
|
Posted:
Wed Dec 15, 2004 4:40 am Post subject:
Re: algorithm: square operation |
|
|
In article <ee8aa81.-1@webx.sUN8CHnE>, fwj_733 <fwj@nmrs.ac.cn> wrote:
| Quote: | There is a lot of square operations in my FPGA projects (using Xilinx
VitexE).
|
There may be shortcuts if you need a series of squares, for example.
You can compute that as
sq = 0;
for (i = 1; i < n; ++i) {
sq = sq + (i << 1) - 1;
// i * i == sq
}
--
Ben Jackson
<ben@ben.com>
http://www.ben.com/ |
|
| Back to top |
|
 |
Ray Andraka
Guest
|
Posted:
Wed Dec 15, 2004 7:26 pm Post subject:
Re: algorithm: square operation |
|
|
fwj_733 wrote:
| Quote: | There is a lot of square operations in my FPGA projects (using Xilinx VitexE). Now, I complete them by multiple, but this method is slice consuming and slow. As we know, square operation has many characters, which multiple operation of any random numberic don' have, then could we utilize these features to calculate square operations more economical and more fast? Thanks for all advice. Best Regards
You really need to give us more information. Are the squares sequential |
(eg squares of an arithmetic progression of values)? How many clock
cycles per square are available? How many bits? Do you have BRAM
available? Do you have area restrictions? etc. As with many other
problems, there are many ways to approach this.
--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com
"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759 |
|
| Back to top |
|
 |
fwj_733
Guest
|
Posted:
Wed Dec 15, 2004 8:05 pm Post subject:
Re: algorithm: square operation |
|
|
| Sorry for not giving sufficient info. The squares works parallel, that is, one clock cycle per square. Bits width of Operand are about 10~20. VirtexE do have BRAM, I use Virtex600E, and the remaining maximum capacity for square operation is 128 Kb. I hope square of a 20bit number can be completed by less than 30 VirtexE slices, and work above 70MHz. I still wonder, for a multiplier written as: use IEEE.std_logic_arith; ... C<=A*B; ... Does ISE synthesis this multiplier most efficient in area sense? If not, how can I complete it more efficient? |
|
| Back to top |
|
 |
Arash Salarian
Guest
|
Posted:
Wed Dec 15, 2004 8:59 pm Post subject:
Re: algorithm: square operation |
|
|
"fwj_733" <fwj@nmrs.ac.cn> wrote in message news:ee8aa81.2@webx.sUN8CHnE...
| Quote: | Sorry for not giving sufficient info. The squares works parallel, that is,
one clock cycle per square. Bits width of Operand are about 10~20. VirtexE
do have BRAM, I use Virtex600E, and the remaining maximum capacity for
square operation is 128 Kb. I hope square of a 20bit number can be
completed by less than 30 VirtexE slices, and work above 70MHz. I still
wonder, for a multiplier written as: use IEEE.std_logic_arith; ... C<=A*B;
... Does ISE synthesis this multiplier most efficient in area sense? If
not, how can I complete it more efficient?
|
Generally modern synthesize tools generate a very high quality result for
basic operatinos like add, multiply. Both Xilinx and Altera have already
highly optimized macros for these operations (remember LPMs?) and usually
they are hand optimized and very efficient. The synthesize tool selects from
a library of these pre-made macros for these operations. Also the output
generated for C <= A*B is different from C <= A*A and the tool selects a
different macro which can be potentially more optimized for this case. So
unless you have a very special need you can just type the operations in high
level and let the synthesize tool take care of it.
Regards
Arash Salarian |
|
| Back to top |
|
 |
fwj_733
Guest
|
Posted:
Thu Dec 16, 2004 4:43 am Post subject:
Re: algorithm: square operation |
|
|
| Thank you for your reply. I made a test, two instance, one calculate A*B, the other calculate A*A, oprand is 20 bit. I found they are almost the same result. After map, both occupy 215 slices (VirtexE). And only small difference in the time delay. ISE translate the square operation to multiplier. |
|
| Back to top |
|
 |
Jeff Cunningham
Guest
|
Posted:
Thu Dec 16, 2004 7:04 pm Post subject:
Re: algorithm: square operation |
|
|
I think there was a post a year or two ago about doing an efficient
square operation by splitting the number in half such that a N bit
square could be done with some N bit lookups and some additions instead
of a 2N table lookup. Such that a 10 bit square only required a 10 bit
lookup table instead of 20 bit. I think it went something like this:
Take a 10 bit number to be squared and represent it as a+32b where a and
b are 5 bit numbers. (a+32b)(a+32b) = a^2 + 64ab+1024b^2. Now you have
to do 3 of the smaller table lookup mpys and some shifting and adding to
get the result. |
|
| Back to top |
|
 |
|
|
|
|