| Author |
Message |
Terje Mathisen
Guest
|
Posted:
Thu Sep 29, 2005 8:15 am Post subject:
Re: Implementation of pop count instruction |
|
|
Oliver S. wrote:
| Quote: | Can anyone tell me for what this popcnt-instruction should be good
for? I'm able to imagine some useful cases for such an instruction
but I can't imagine where such an instruction would give a perfor-
mance-boost of a complete algorithm.
|
Detecting non-random (i.e. skewed) output from a trial decrypt operation?
'No Such Agency' is rumoured to be the main reason for popcnt on most
architectures where it has been included.
Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching" |
|
| Back to top |
|
 |
Eric Gouriou
Guest
|
Posted:
Thu Sep 29, 2005 8:15 am Post subject:
Re: Implementation of pop count instruction |
|
|
Oliver S. wrote:
| Quote: | Can anyone tell me for what this popcnt-instruction should be good
for? I'm able to imagine some useful cases for such an instruction
but I can't imagine where such an instruction would give a perfor-
mance-boost of a complete algorithm.
|
popcnt(x) == 1 tests whether an unsigned x is a power of two
=> having a fast popcnt allows fast paths for common computations
(e.g. divisions)
popcnt((x-1) & ~x) counts the number of trailing zeros in
an unsigned x
=> very fast way to find the first zero/one bit in a bitmask.
This can be quite useful with some data structures
(e.g. priority queues where occupancy is reflected in a bit
vector)
Those are just two simple tricks. A google search will provide
you more examples. A popular claim is that this instruction is
used by a three-letter government agency.
For example:
<URL:http://archives.seul.org/f-cpu/f-cpu/Dec-2003/msg00004.html>
<URL:http://groups.google.com/group/sci.crypt/browse_frm/thread/517122ec6e961ee5?q=popcount+NSA+instruction&hl=en&>
I believe I read an explanation of that claim a while back, but
Google doesn't find it quickly.
Eric |
|
| Back to top |
|
 |
Drazen Kacar
Guest
|
Posted:
Thu Sep 29, 2005 11:32 am Post subject:
Re: Implementation of pop count instruction |
|
|
Eric Gouriou wrote:
| Quote: | popcnt(x) == 1 tests whether an unsigned x is a power of two
=> having a fast popcnt allows fast paths for common computations
(e.g. divisions)
|
Well, if x is non-negative and you're using 2nd complement arithmetic,
then the test can be (x & (x - 1)) == 0
--
.-. .-. Yes, I am an agent of Satan, but my duties are largely
(_ \ / _) ceremonial.
|
| dave@fly.srk.fer.hr |
|
| Back to top |
|
 |
Rob Warnock
Guest
|
Posted:
Thu Sep 29, 2005 3:05 pm Post subject:
Re: Implementation of pop count instruction |
|
|
Christian Bau <christian.bau@cbau.freeserve.co.uk> wrote:
+---------------
| "Stephen Fuld" <s.fuld@PleaseRemove.att.net> wrote:
| > So, for real CPUs that have this instruction, how is it implemented?
|
| Since it isn't used very much, I would try to adapt existing hardware.
| Assume you have a 64 x 64 bit integer multiplier, with two instructions
| giving the most and the least significant 64 bits of the result.
|
| Such a multiplier adds in bit position k the values (xi AND y(k-i)) for
| 0 <= i <= k using more or less brute force, for 0 <= k <= 126. Change
| the inputs so that in bit position 64 the inputs are xi for 0 <= i <= 63
| and in all other bit positions the inputs are 0. Let the existing
| multiplier hardware do the rest.
+---------------
There are some interesting variations on this theme in the ancient
HAKMEM, e.g., in ITEM 167 (Gosper, Freiberg) there is this:
;if A is a 9 bit quantity, B gets number of 1's (Schroeppel)
IMUL A,[1001001001] ;4 copies
AND A,[42104210421] ;every 4th bit
IDIVI A,17 ;casting out 15.'s in hexadecimal
and ITEM 169 (in order of one-ups-manship: Gosper, Mann, Lenard,
[Root and Mann]):
To count the ones in a PDP-6/10 word:
LDB B,[014300,,A] ;or MOVE B,A then LSH B,-1
AND B,[333333,,333333]
SUB A,B
LSH B,-1
AND B,[333333,,333333]
SUBB A,B ;each octal digit is replaced by number of 1's in it
LSH B,-3
ADD A,B
AND A,[070707,,070707]
IDIVI A,77 ;casting out 63.'s
These ten instructions, with constants extended, would work on
word lengths up to 62.; eleven suffice up to 254..
Look in the news archives for "sci.crypt" circa Jan. 2002 for the
topic "Bit counting" for another long thread about variations on
the "comb addition" method.
-Rob
-----
Rob Warnock <rpw3@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607 |
|
| Back to top |
|
 |
Rob Warnock
Guest
|
Posted:
Thu Sep 29, 2005 3:12 pm Post subject:
Re: Implementation of pop count instruction |
|
|
Oliver S. <Follow.Me@gmx.net> wrote:
+---------------
| Can anyone tell me for what this popcnt-instruction should be good
| for? I'm able to imagine some useful cases for such an instruction
| but I can't imagine where such an instruction would give a perfor-
| mance-boost of a complete algorithm.
+---------------
popcount(x XOR y) gives the Hamming distance between X & Y, sometimes
useful for decoding error-correcting codes and/or for certain kinds
of pattern matching and/or scoring.
-Rob
-----
Rob Warnock <rpw3@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607 |
|
| Back to top |
|
 |
Wilco Dijkstra
Guest
|
Posted:
Thu Sep 29, 2005 3:27 pm Post subject:
Re: Implementation of pop count instruction |
|
|
"Dan Koren" <dankoren@yahoo.com> wrote in message
news:433b13c5$1@news.meer.net...
| Quote: | "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> wrote in message
news:hQB_e.2706$O%.441@newsfe1-gui.ntli.net...
"Dan Koren" <dankoren@yahoo.com> wrote in message
news:433adb13$1@news.meer.net...
""Niels Jørgen Kruse"" <nospam@ab-katrinedal.dk> wrote in message
news:1h3lq2w.17779m0a6xzp1N%nospam@ab-katrinedal.dk...
Show us the timing loop.
Huh ?!?
int n = 1000000000;
hrtime_t t0, t1;
double dt;
t0 = gethrtime ();
while (n--) popc (whatever);
t1 = gethrtime ();
dt = t1 - t0;
So basically you are testing the performance of an empty loop..
No, because this is just a code skeleton, not a complete test.
|
Sure - but then why post an incorrect timing loop? The point is that
most of us suspect you're making a mistake somewhere. You were
hinting at how good your compiler was in optimizing the code away,
which is another clue that something is wrong. People new to
benchmarking often get the timing loop wrong, and indeed the above
code is incorrect.
So what does your real timing loop look like?
| Quote: | Thanks for the free tutorial.
|
You're welcome.
| Quote: | The popc routine is used in a super heavy duty, super fast and
super scalable memory allocator, and it has been extensively
profiled. You don't expect me to publish the whole thing here.
Or do you? One thing is sure -- I'd get fired if I did.
|
Of course not. What I'd like to see is the assembly code or C code of
the test that shows popc() runs in 2 cycles. You're making extraordinary
claims here, so it requires extraordinary evidence. If you said it takes 8
cycles then most people would agree that is pretty quick. But doing a
generic popcount in 2 cycles seems simply impossible (see Terje's post
for example).
Wilco |
|
| Back to top |
|
 |
Peter Dickerson
Guest
|
Posted:
Thu Sep 29, 2005 4:15 pm Post subject:
Re: Implementation of pop count instruction |
|
|
"Drazen Kacar" <dave@fly.srk.fer.hr> wrote in message
news:slrndjnk6r.ph1.dave@fly.srk.fer.hr...
| Quote: | Eric Gouriou wrote:
popcnt(x) == 1 tests whether an unsigned x is a power of two
=> having a fast popcnt allows fast paths for common computations
(e.g. divisions)
Well, if x is non-negative and you're using 2nd complement arithmetic,
then the test can be (x & (x - 1)) == 0
|
Nope. Fails for zero.
Peter |
|
| Back to top |
|
 |
David Hopwood
Guest
|
Posted:
Thu Sep 29, 2005 4:15 pm Post subject:
Re: Implementation of pop count instruction |
|
|
Eric Gouriou wrote:
| Quote: | Oliver S. wrote:
Can anyone tell me for what this popcnt-instruction should be good
for? I'm able to imagine some useful cases for such an instruction
but I can't imagine where such an instruction would give a perfor-
mance-boost of a complete algorithm.
popcnt(x) == 1 tests whether an unsigned x is a power of two
=> having a fast popcnt allows fast paths for common computations
(e.g. divisions)
|
When x > 0, ispowerof2(x) = (x & (x-1)) == 0.
--
David Hopwood <david.nospam.hopwood@blueyonder.co.uk> |
|
| Back to top |
|
 |
Sander Vesik
Guest
|
Posted:
Sat Oct 01, 2005 11:11 am Post subject:
Re: Implementation of pop count instruction |
|
|
Dan Koren <dankoren@yahoo.com> wrote:
| Quote: |
If memory serves, the Ultra SPARC III+
can issue up to four instructions per
clock cycle -- can someone confirm or
refute this notion?
|
UltraSparc II could issue four - two integer, one load and
one branch provided they appear in the correct order (some
of the rules were a bit too restricting) and assuming integer
only code. Oh, and only one shift per cycle.
UltraSparcIII can also issue four but has much more relaxed
rules, though still only two integer pipelines (but equal
this time). Conditional moves suck slightly less. Reasonable
ordering rules.
OTOH, USparcIII went backwards in the handling of loads.
--
Sander
+++ Out of cheese error +++ |
|
| Back to top |
|
 |
|
|
|
|