| Author |
Message |
Nick Maclaren
Guest
|
Posted:
Mon Jan 03, 2005 5:07 pm Post subject:
Re: Will multicore CPUs have identical cores? |
|
|
In article <dCYBd.1211468$Gx4.519065@bgtnsc04-news.ops.worldnet.att.net>,
Stephen Fuld <s.fuld@PleaseRemove.att.net> wrote:
| Quote: |
I am not trying to be difficult - just trying to understand the need and
perhaps explore solutions for that need. So what am I missing?
|
The fact that it is far trickier than might appear to deduce what the
problems, constraints, issues and solutions are without considerable
and detailed experience. I can remember missing a lot myself - I had
THOUGHT that I had thought things through, but I had missed a lot.
This is why I am being evasive about exactly what primitives are
needed. Firstly, there are MANY different combinations that will be
equally good. Secondly, I would have to code a suitable set of
functions (in draft) in order to ensure that I had a reasonably
complete combination.
Regards,
Nick Maclaren. |
|
| Back to top |
|
 |
Bernd Paysan
Guest
|
Posted:
Mon Jan 03, 2005 9:38 pm Post subject:
Re: Will multicore CPUs have identical cores? |
|
|
Terje Mathisen wrote:
| Quote: | Bernd Paysan wrote:
[snipped discussion which seemed to agree that you could do the
reduction in a simplified manner?]
|
Yes, I clearly agree that the reduction can be done efficient and in a
simplified manner.
| Quote: | Also, there are situations where you know that some huge value is exact,
not just a fp approximation, in which case exact range reduction does
make sense.
|
A huge, but exact number? That would be a bignum integer that happens to fit
(by accident) into a FP representation. I doubt that there are many
interesting bignum integers that do fit into a FP representation. Big
primes are already excluded. The hugest likely fitting numbers are probably
of the sort n!, because you have a lot of factor twos in a big n!.
| Quote: | OTOH, always returning 0 (or 1.0 or NaN?) for sin(1e300) would at least
give the user/programmer a hint that something was suspicious. :-(
|
Even when sin(big number) returns a constant, sinē+cosē should be 1. With
sin(big number)=0 and cos(big number)=1, you can satisfy a number of other
equations, too (you really define that all those big numbers are integer
multiplies of 2pi, which makes live much easier ;-).
--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/ |
|
| Back to top |
|
 |
Christian Bau
Guest
|
Posted:
Mon Jan 03, 2005 11:33 pm Post subject:
Re: Will multicore CPUs have identical cores? |
|
|
In article <crbap0$c0b$1@osl016lin.hda.hydro.com>,
Terje Mathisen <terje.mathisen@hda.hydro.com> wrote:
| Quote: | Bernd Paysan wrote:
[snipped discussion which seemed to agree that you could do the
reduction in a simplified manner?]
I don't care about exact reduction, since for me, a floating point number is
an interval (+- 1/2ULP), so sin(1e300) is correct when it delivers anything
between 1 and -1 - an interval arithmetic sin(1e300) should deliver [-1;1].
I used to be firmly in that camp, but the programmer in me would still
like to do the 'right thing' even when faced with obviously out-of-range
inputs. :-)
Also, there are situations where you know that some huge value is exact,
not just a fp approximation, in which case exact range reduction does
make sense.
|
To me, a floating point number is an (unknown) real number which is
hopefully close to the floating point number, plus an (unknown) error;
the sum of real number and error is exactly equal to the floating point
number. A good implementation of the sine function will not add more to
the error than necessary (that is of course if I am concerned with
precision, not with speed). If x is moderately large, say x around 10^5,
then returning sin (x + ulp/2) instead of sin (x) could easily double
the error. But if x is large, say x around 10^300, then returning any
result in [-1, +1] will not increase the error that was already present
in x, so that is fine by me. |
|
| Back to top |
|
 |
Terje Mathisen
Guest
|
Posted:
Mon Jan 03, 2005 11:48 pm Post subject:
Re: Will multicore CPUs have identical cores? |
|
|
Nick Maclaren wrote:
| Quote: | In article <cr9guk$a4e$1@osl016lin.hda.hydro.com>,
Terje Mathisen <terje.mathisen@hda.hydro.com> wrote:
BTDT, several times. <BG
I know :-)
Today when hiking around in Nordmarka (the forests covering the hills
north of Oslo, I suddenly realized how I could do exact but still
efficient range reduction on really big values input to trig functions,
i.e. something like sin(1e300). :-)
Please check that this is correct:
Been there - done that. Yes, it works. No, it isn't very efficient,
but isn't at all bad. As far as I know, it is the only viable way to
do fully accurate range reduction - i.e. all of the others are far
slower.
|
OK, if this is the best known way to solve this particular problem, then
that's good enough for me. :-)
Regarding the cost of doing it:
Split the input into two parts, while extracting the exponent:
~10 cycles
Multiplying each part by 3-4 table entries (selected via table lookup on
the top exponent bits): Also around 10 cycles.
Take the two top products (which should be the only ones with an integer
part), and get rid of that, keeping the fraction. This can be done in
parallel with the previous multiplications.
Extract the top 24 bits of the sum by copying to a float, then subtract
this from the original value. Add all the lower-precision terms to this
After merging, I'd like to end up with two terms, a high (single prec)
part, and a double prec value containing the remainder, for a total
effective precision of about 23+53 = 86 bits. This is more than enough
to generate a final result well below 1 ulp, and the range reduction
step would take less than 40 cycles.
Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching" |
|
| Back to top |
|
 |
Nick Maclaren
Guest
|
Posted:
Tue Jan 04, 2005 2:17 pm Post subject:
Re: Will multicore CPUs have identical cores? |
|
|
In article <crc42g$rr8$1@osl016lin.hda.hydro.com>,
Terje Mathisen <terje.mathisen@hda.hydro.com> wrote:
| Quote: |
OK, if this is the best known way to solve this particular problem, then
that's good enough for me. :-)
|
It's the best way that I know of - I doubt there is a better, but I
can't be sure.
| Quote: | Regarding the cost of doing it:
Split the input into two parts, while extracting the exponent:
~10 cycles
Multiplying each part by 3-4 table entries (selected via table lookup on
the top exponent bits): Also around 10 cycles.
Take the two top products (which should be the only ones with an integer
part), and get rid of that, keeping the fraction. This can be done in
parallel with the previous multiplications.
Extract the top 24 bits of the sum by copying to a float, then subtract
this from the original value. Add all the lower-precision terms to this
After merging, I'd like to end up with two terms, a high (single prec)
part, and a double prec value containing the remainder, for a total
effective precision of about 23+53 = 86 bits. This is more than enough
to generate a final result well below 1 ulp, and the range reduction
step would take less than 40 cycles.
|
Yes. What I don't know is whether that is enough to get the last bit
right (assuming perfgectly accurate input). There could well be a
significant cost detecting and handling the very borderline cases.
What I am certain of is that it is a much better way than reducing
modulo 2 pi (or whatever). It ends up with a function where the
argument is expressed in grads, but that doesn't matter.
Regards,
Nick Maclaren. |
|
| Back to top |
|
 |
Terje Mathisen
Guest
|
Posted:
Tue Jan 04, 2005 6:42 pm Post subject:
Re: Will multicore CPUs have identical cores? |
|
|
Nick Maclaren wrote:
| Quote: | Terje Mathisen <terje.mathisen@hda.hydro.com> wrote:
After merging, I'd like to end up with two terms, a high (single prec)
part, and a double prec value containing the remainder, for a total
effective precision of about 23+53 = 86 bits. This is more than enough
to generate a final result well below 1 ulp, and the range reduction
step would take less than 40 cycles.
Yes. What I don't know is whether that is enough to get the last bit
right (assuming perfgectly accurate input). There could well be a
significant cost detecting and handling the very borderline cases.
|
It doesn't seem to be too bad:
For those input values that doesn't require any further range reduction,
it will suffice to calculate just the first term with extended
precision, and add that as the final step to the aggregate of the rest
of the taylor/cheby series.
For larger input values the reverse range adjustment at the end will
dominate, so doing this with extended precision should result in the
same ~0.5 ulp precision.
| Quote: | What I am certain of is that it is a much better way than reducing
modulo 2 pi (or whatever). It ends up with a function where the
argument is expressed in grads, but that doesn't matter.
|
Right. :-)
The only complication from this is that a normal taylor series for sin()
has integer factors, while our fractional circle inputs would require
these to be scaled by the corresponding power of pi, which in the case
of the primary term means that you need a pair of factors to achieve
extended precision.
Using Cheby series instead takes away this advantage, since this
requires non-integer factors for all the power terms.
Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching" |
|
| Back to top |
|
 |
Nick Maclaren
Guest
|
Posted:
Tue Jan 04, 2005 6:59 pm Post subject:
Re: Will multicore CPUs have identical cores? |
|
|
In article <cre6fq$232$1@osl016lin.hda.hydro.com>,
Terje Mathisen <terje.mathisen@hda.hydro.com> writes:
|> >
|> > Yes. What I don't know is whether that is enough to get the last bit
|> > right (assuming perfgectly accurate input). There could well be a
|> > significant cost detecting and handling the very borderline cases.
|>
|> It doesn't seem to be too bad:
|>
|> For those input values that doesn't require any further range reduction,
|> it will suffice to calculate just the first term with extended
|> precision, and add that as the final step to the aggregate of the rest
|> of the taylor/cheby series.
|>
|> For larger input values the reverse range adjustment at the end will
|> dominate, so doing this with extended precision should result in the
|> same ~0.5 ulp precision.
Er, no :-)
The problem is when a perfect range reduction produces a value that
has a result that is almost exactly 0.5 ULPs away from the adjacent
representable values. With division and square root, the existence
of such values is fairly easy to analyse using number theory, but
it is a lot trickier for the trigonometric functions. I don't know
if there are some really nasty examples for sin, but there may be.
This is one reason why achieving 0.505 ULPs is vastly easier than
achieving 0.5 ULPs.
Regards,
Nick Maclaren. |
|
| Back to top |
|
 |
Terje Mathisen
Guest
|
Posted:
Tue Jan 04, 2005 7:54 pm Post subject:
Re: Will multicore CPUs have identical cores? |
|
|
Nick Maclaren wrote:
| Quote: | In article <cre6fq$232$1@osl016lin.hda.hydro.com>,
Terje Mathisen <terje.mathisen@hda.hydro.com> writes:
|
|> For larger input values the reverse range adjustment at the end will
|> dominate, so doing this with extended precision should result in the
|> same ~0.5 ulp precision.
|
(Notice the ~ above, meaning approximately!)
| Quote: |
Er, no :-)
The problem is when a perfect range reduction produces a value that
has a result that is almost exactly 0.5 ULPs away from the adjacent
representable values. With division and square root, the existence
of such values is fairly easy to analyse using number theory, but
it is a lot trickier for the trigonometric functions. I don't know
if there are some really nasty examples for sin, but there may be.
This is one reason why achieving 0.505 ULPs is vastly easier than
achieving 0.5 ULPs.
|
Sure, I'm not even trying to get 0.5 ulp, just ~0.5, i.e. with a few
bits precision. As you said, getting 0.505 (or just 0.6) is a reasonable
goal, while 0.5 exactly could require arbitrary precision operations all
the way.
Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching" |
|
| Back to top |
|
 |
|
|
|
|