Re: Accuracy of Statistical Functions
- From: Leonard Mada <discoleo gmx net>
- To: gnumeric-list gnome org
- Subject: Re: Accuracy of Statistical Functions
- Date: Sat, 11 Nov 2006 12:08:38 +0200
Hi Morten,
indeed, calculating the sum is not an easy task on some data sets. :-)
Morten Welinder wrote:
Sorting works great if and if only all the numbers have the same sign.
If not, I do not currently know what is best. In fact, all the simple
algorithms we have thought up have been immediately met with a counter
example.
Notice, though, that in your example, the best order is the more or less
*reverse* sorted, i.e., start with the absolute biggest. Then the
largest
numbers cancel out nicely.
Well, the sorting algorithm works great for many situations (however as
I pointed in the OOo thread, you must sort the absolute values, i.e.
|Xi| to work for mixed data sets, too).
This method is more accurate in *most instances* than the usual
addition, though, as you pointed out, in gnumeric my particular Test
Case was more accurate using a different method. That was indeed a
particular case (and by the way, it challenged OOo, and for that
application, sorting was the best) and any algorithm implemented must
work in *the general case*, not just for a partucular case.
Because more robust methods tend to be slow, would it be feasible to
have a complementary set of functions, e.g.:
- SUM() and SUMA(), a very accurate version of SUM(), and the like?
- in general such algorithms may be more suitable/ useful for other
(statistical) functions (not just the sum)
IF that is the case, I could imagine some more accurate (but
painstakingly slow) algorithms, like:
- sort order ascending |Xi|, i = 1..N
- drop values of 0 => |Xi|, i = j..N, where j >= 1, depending on how
many 0's were present
- calculate round(log10(|Xi|)): values will be between -k..+m
- iterate through -k..+m
-- Yk = add first all Xi with the most significant digit in position (-k)
-- IF (|Yk| < |first (-(k-1)) Number| ) {
-- truncate the (-(k-1)) digit from Xi with (-(k-1)) most significant
digit
[though I do NOT know what the accuracy penalty would be for such
an operation]
[however, adding directly Yk to (-(k-1)) numbers may loose one
digit from Yk]
-- Yk-1 = add Yk + SUM of all less significant (k) decimal digits of
the (-(k-1)) numbers
-- Yk-1 += all (k-1) decimal digits
-- }
-- else { Yk-1 = Yk + SUM(-(k-1) Numbers ) } // COULD SORT THESE
NUMBERS AGAIN,
where Yk is added to the numbers list
-- repeat
Of course, there are some bottleneck cases even for this algorithm, BUT
in general it will work better.
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]