| Bytes | Lang | Time | Link |
|---|---|---|---|
| 754 | The error in a floating point number is at most half the value of the last bit of the result | 240621T165215Z | gnasher7 |
| nan | Python | 240620T221346Z | CursorCo |
| nan | Python 3 | 240621T093740Z | Neil |
| nan | Rust | 240621T022747Z | att |
The error in a floating point number is at most half the value of the last bit of the result, and if it is exactly half the last bit then the result is rounded so that the last bit becomes 0, according to the IEEE 754 floating point rules.
The first sum is exact because we add 0+x with the result x and no error. The exact sum of the first k elements is at most k*n.
So the error is maximal if we take the k-th number as n, and subtract half the value of the last bit of kn if k >= 1. If kn is a power of two, then the rounding error is half of that, because rounding to a power of two must round up, and the rounding error is half as large.
Every sum we calculate is rounded exactly to k*n. So this is mathematically quite a trivial problem.
Python, Score: 1.9099388737231493e-11 2.1827872842550278e-11
Credit to @att and @Neil who got the correct optimal solution before me, and who's code looks much nicer than mine
import struct
def float_to_bitstring(f):
s = struct.pack('>d', f)
i = struct.unpack('>q', s)[0]
return bin(i)[2:].zfill(64)
def greedy(max_n, arr):
n_bitstr = float_to_bitstring(max_n)
sum_bitstr = float_to_bitstring(sum(arr)+max_n)
n_exp = int(n_bitstr[1:12],2)
sum_exp = int(sum_bitstr[1:12],2)
dif = sum_exp - n_exp
exp = n_exp
mant = int(n_bitstr[12:], 2)
desired_end = int('1'+(dif-1)*'0', 2)
if mant >= desired_end:
exp = bin(exp)[2:].zfill(11)
mant -= desired_end
mant -= mant % (2 * desired_end)
mant += desired_end
mant = bin(mant)[2:].zfill(52)
else:
exp = bin(exp-1)[2:].zfill(11)
mant = bin(desired_end)[2:].rjust(52, '1')
return struct.unpack('>d', struct.pack('>q', int(exp + mant, 2)))[0]
def func(n):
arr = [n]
for i in range(9):
arr.append(greedy(n, arr))
return arr
Defines a function func which takes input n and outputs the desired list of floats. Note that floats are natively 64 bit in python.
Output for n=5000 is [5000, 4999.999999999999, 4999.999999999999, 4999.999999999998, 4999.999999999998, 4999.999999999998, 4999.999999999996, 4999.999999999996, 4999.999999999996, 4999.999999999996]
Explanation
Floating point addition accumulates error through rounding. The most this rounding can ever be off by is one power of 2 less than the least significant bit of the resulting sum. It's not too complicated to construct a number which will accumulate that maximum error, in fact there are many such numbers, so specifically we want to construct the largest such number. This is because as our sum gets larger we can round off larger numbers. In this way this is a sort of greedy algorithm.
Additional notes:
- I don't think adding a negative number is ever optimal in this challenge since we always want our error to accumulate in the same direction, and a bigger overall number leads to bigger errors.
- Please pay no mind to this very dirty python code.
- Edit: In my haste I neglected to account for the fact that the sum gets bigger when you add to it. A simple fix which brings my answer into a tie with the others.
Python 3, score 2.1827872842550278e-11
from fractions import Fraction
def error(i):
return float(Fraction(sum(i)) - sum(map(Fraction, i)))
def inaccurate(n):
e = .5 ** 55
while n - e == n: e *= 2
i = [n]
for _ in range(9):
j = i + [n - e * 2]
i = i + [n - e]
if error(j) > error(i): e *= 2; i = j
return i
Rust, score \$\frac{24}{2^{40}}\$≈2.1827872842550278e-11
fn f(n: u32) -> [f64; 10] {
let a = n as f64;
let b = f64::to_bits(a);
let mut s: f64 = 0.;
std::array::from_fn(|_| {
let v = (0..).map(|i|f64::from_bits(b-i)).take_while(|j|s+j==s+a).last().unwrap();
s+=a;
v
})
}