Implementation of pop count instruction
CASTalk.com Forum Index CASTalk.com
Discussion of DSP, FPGA, storage and embedded system.
 
 FAQFAQ   MemberlistMemberlist     RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 
 
Google
 
Web castalk.com
Implementation of pop count instruction
Goto page Previous  1, 2, 3, 4, 5
 
Post new topic   Reply to topic    CASTalk.com Forum Index -> Computer Architecture
Author Message
Terje Mathisen
Guest





Posted: Thu Sep 29, 2005 8:15 am    Post subject: Re: Implementation of pop count instruction Reply with quote

Oliver S. wrote:

Quote:
Can anyone tell me for what this popcnt-instruction should be good
for? I'm able to imagine some useful cases for such an instruction
but I can't imagine where such an instruction would give a perfor-
mance-boost of a complete algorithm.

Detecting non-random (i.e. skewed) output from a trial decrypt operation?

'No Such Agency' is rumoured to be the main reason for popcnt on most
architectures where it has been included.

Terje

--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
Back to top
Eric Gouriou
Guest





Posted: Thu Sep 29, 2005 8:15 am    Post subject: Re: Implementation of pop count instruction Reply with quote

Oliver S. wrote:
Quote:
Can anyone tell me for what this popcnt-instruction should be good
for? I'm able to imagine some useful cases for such an instruction
but I can't imagine where such an instruction would give a perfor-
mance-boost of a complete algorithm.

popcnt(x) == 1 tests whether an unsigned x is a power of two
=> having a fast popcnt allows fast paths for common computations
(e.g. divisions)

popcnt((x-1) & ~x) counts the number of trailing zeros in
an unsigned x
=> very fast way to find the first zero/one bit in a bitmask.
This can be quite useful with some data structures
(e.g. priority queues where occupancy is reflected in a bit
vector)

Those are just two simple tricks. A google search will provide
you more examples. A popular claim is that this instruction is
used by a three-letter government agency.
For example:
<URL:http://archives.seul.org/f-cpu/f-cpu/Dec-2003/msg00004.html>
<URL:http://groups.google.com/group/sci.crypt/browse_frm/thread/517122ec6e961ee5?q=popcount+NSA+instruction&hl=en&>

I believe I read an explanation of that claim a while back, but
Google doesn't find it quickly.

Eric
Back to top
Drazen Kacar
Guest





Posted: Thu Sep 29, 2005 11:32 am    Post subject: Re: Implementation of pop count instruction Reply with quote

Eric Gouriou wrote:

Quote:
popcnt(x) == 1 tests whether an unsigned x is a power of two
=> having a fast popcnt allows fast paths for common computations
(e.g. divisions)

Well, if x is non-negative and you're using 2nd complement arithmetic,
then the test can be (x & (x - 1)) == 0

--
.-. .-. Yes, I am an agent of Satan, but my duties are largely
(_ \ / _) ceremonial.
|
| dave@fly.srk.fer.hr
Back to top
Rob Warnock
Guest





Posted: Thu Sep 29, 2005 3:05 pm    Post subject: Re: Implementation of pop count instruction Reply with quote

Christian Bau <christian.bau@cbau.freeserve.co.uk> wrote:
+---------------
| "Stephen Fuld" <s.fuld@PleaseRemove.att.net> wrote:
| > So, for real CPUs that have this instruction, how is it implemented?
|
| Since it isn't used very much, I would try to adapt existing hardware.
| Assume you have a 64 x 64 bit integer multiplier, with two instructions
| giving the most and the least significant 64 bits of the result.
|
| Such a multiplier adds in bit position k the values (xi AND y(k-i)) for
| 0 <= i <= k using more or less brute force, for 0 <= k <= 126. Change
| the inputs so that in bit position 64 the inputs are xi for 0 <= i <= 63
| and in all other bit positions the inputs are 0. Let the existing
| multiplier hardware do the rest.
+---------------

There are some interesting variations on this theme in the ancient
HAKMEM, e.g., in ITEM 167 (Gosper, Freiberg) there is this:

;if A is a 9 bit quantity, B gets number of 1's (Schroeppel)
IMUL A,[1001001001] ;4 copies
AND A,[42104210421] ;every 4th bit
IDIVI A,17 ;casting out 15.'s in hexadecimal

and ITEM 169 (in order of one-ups-manship: Gosper, Mann, Lenard,
[Root and Mann]):

To count the ones in a PDP-6/10 word:

LDB B,[014300,,A] ;or MOVE B,A then LSH B,-1
AND B,[333333,,333333]
SUB A,B
LSH B,-1
AND B,[333333,,333333]
SUBB A,B ;each octal digit is replaced by number of 1's in it
LSH B,-3
ADD A,B
AND A,[070707,,070707]
IDIVI A,77 ;casting out 63.'s

These ten instructions, with constants extended, would work on
word lengths up to 62.; eleven suffice up to 254..

Look in the news archives for "sci.crypt" circa Jan. 2002 for the
topic "Bit counting" for another long thread about variations on
the "comb addition" method.


-Rob

-----
Rob Warnock <rpw3@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607
Back to top
Rob Warnock
Guest





Posted: Thu Sep 29, 2005 3:12 pm    Post subject: Re: Implementation of pop count instruction Reply with quote

Oliver S. <Follow.Me@gmx.net> wrote:
+---------------
| Can anyone tell me for what this popcnt-instruction should be good
| for? I'm able to imagine some useful cases for such an instruction
| but I can't imagine where such an instruction would give a perfor-
| mance-boost of a complete algorithm.
+---------------

popcount(x XOR y) gives the Hamming distance between X & Y, sometimes
useful for decoding error-correcting codes and/or for certain kinds
of pattern matching and/or scoring.


-Rob

-----
Rob Warnock <rpw3@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607
Back to top
Wilco Dijkstra
Guest





Posted: Thu Sep 29, 2005 3:27 pm    Post subject: Re: Implementation of pop count instruction Reply with quote

"Dan Koren" <dankoren@yahoo.com> wrote in message
news:433b13c5$1@news.meer.net...
Quote:
"Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> wrote in message
news:hQB_e.2706$O%.441@newsfe1-gui.ntli.net...

"Dan Koren" <dankoren@yahoo.com> wrote in message
news:433adb13$1@news.meer.net...
""Niels Jørgen Kruse"" <nospam@ab-katrinedal.dk> wrote in message
news:1h3lq2w.17779m0a6xzp1N%nospam@ab-katrinedal.dk...

Show us the timing loop.

Huh ?!?

int n = 1000000000;
hrtime_t t0, t1;
double dt;

t0 = gethrtime ();
while (n--) popc (whatever);
t1 = gethrtime ();

dt = t1 - t0;

So basically you are testing the performance of an empty loop..


No, because this is just a code skeleton, not a complete test.

Sure - but then why post an incorrect timing loop? The point is that
most of us suspect you're making a mistake somewhere. You were
hinting at how good your compiler was in optimizing the code away,
which is another clue that something is wrong. People new to
benchmarking often get the timing loop wrong, and indeed the above
code is incorrect.

So what does your real timing loop look like?

Quote:
Thanks for the free tutorial.

You're welcome.

Quote:
The popc routine is used in a super heavy duty, super fast and
super scalable memory allocator, and it has been extensively
profiled. You don't expect me to publish the whole thing here.
Or do you? One thing is sure -- I'd get fired if I did.

Of course not. What I'd like to see is the assembly code or C code of
the test that shows popc() runs in 2 cycles. You're making extraordinary
claims here, so it requires extraordinary evidence. If you said it takes 8
cycles then most people would agree that is pretty quick. But doing a
generic popcount in 2 cycles seems simply impossible (see Terje's post
for example).

Wilco
Back to top
Peter Dickerson
Guest





Posted: Thu Sep 29, 2005 4:15 pm    Post subject: Re: Implementation of pop count instruction Reply with quote

"Drazen Kacar" <dave@fly.srk.fer.hr> wrote in message
news:slrndjnk6r.ph1.dave@fly.srk.fer.hr...
Quote:
Eric Gouriou wrote:

popcnt(x) == 1 tests whether an unsigned x is a power of two
=> having a fast popcnt allows fast paths for common computations
(e.g. divisions)

Well, if x is non-negative and you're using 2nd complement arithmetic,
then the test can be (x & (x - 1)) == 0


Nope. Fails for zero.

Peter
Back to top
David Hopwood
Guest





Posted: Thu Sep 29, 2005 4:15 pm    Post subject: Re: Implementation of pop count instruction Reply with quote

Eric Gouriou wrote:
Quote:
Oliver S. wrote:

Can anyone tell me for what this popcnt-instruction should be good
for? I'm able to imagine some useful cases for such an instruction
but I can't imagine where such an instruction would give a perfor-
mance-boost of a complete algorithm.

popcnt(x) == 1 tests whether an unsigned x is a power of two
=> having a fast popcnt allows fast paths for common computations
(e.g. divisions)

When x > 0, ispowerof2(x) = (x & (x-1)) == 0.

--
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>
Back to top
Sander Vesik
Guest





Posted: Sat Oct 01, 2005 11:11 am    Post subject: Re: Implementation of pop count instruction Reply with quote

Dan Koren <dankoren@yahoo.com> wrote:
Quote:

If memory serves, the Ultra SPARC III+
can issue up to four instructions per
clock cycle -- can someone confirm or
refute this notion?

UltraSparc II could issue four - two integer, one load and
one branch provided they appear in the correct order (some
of the rules were a bit too restricting) and assuming integer
only code. Oh, and only one shift per cycle.

UltraSparcIII can also issue four but has much more relaxed
rules, though still only two integer pipelines (but equal
this time). Conditional moves suck slightly less. Reasonable
ordering rules.

OTOH, USparcIII went backwards in the handling of loads.

Quote:

Thx,


dk



--
Sander

+++ Out of cheese error +++
Back to top
 
Post new topic   Reply to topic    CASTalk.com Forum Index -> Computer Architecture All times are GMT
Goto page Previous  1, 2, 3, 4, 5
Page 5 of 5

 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum




VoIP Electronics Powered by phpBB