(One)-minute geek news

Group talks, Laboratoire de Chimie, ENS de Lyon

Floating point representation


We manipulate floats every day, yet do you know how they are represented on your machine, and why it matters?

Integers

As you know, computers only deal with binary sequences. Numbers must therefore be converted into binary sequences before being processed.

For integers, the mapping is (relatively) straightforward: it is simply the base-2 representation of the number. Well, actually, this only works for non-negative numbers. Negative numbers are usually represented using two's complement notation: swap all bits and add 1. This amounts to distributing numbers along a ring (with period 2**n, where n is the number of bits used), and it allows conventional arithmetic operations to be generalised to negative numbers.
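For instance, a minimal sketch in Python (assuming an 8-bit word for readability):

    # Two's complement on 8 bits: swap all bits, add 1, and keep only 8 bits.
    def twos_complement_8bit(x):
        return (~x + 1) & 0xFF  # equivalently (-x) & 0xFF

    print(format(5, '08b'))                        # 00000101
    print(format(twos_complement_8bit(5), '08b'))  # 11111011 -> stands for -5
    # Adding 5 back wraps around the ring of period 2**8 and gives 0:
    print((twos_complement_8bit(5) + 5) & 0xFF)    # 0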

Note: In Python 3, the number of bits used to represent an integer is assigned dynamically: a number is stored in (at least) as many bits as are needed to represent it. As a consequence, there is no integer overflow in Python 3 (the kind of overflow behind the spectacular €340m Ariane 5 failure, where a 64-bit float was converted into a 16-bit integer).
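A quick check in Python 3:

    import sys

    n = 2**1000
    print(n.bit_length())    # 1001: bits needed to represent n
    print(sys.getsizeof(n))  # the object itself grows with the value (size in bytes)
    print(2**63 + 1)         # 9223372036854775809: no wrap-around at 64 bits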

With such a binary representation, integers can be stored and manipulated exactly: integer operations are exact.

IEEE 754 standard

For floats, things are far more complicated... No finite binary representation can exactly represent all real numbers (this is true even for the [0,1] interval alone).

Which one do you use? It is very likely an IEEE 754 representation, as this standard (from 1985) is by far the most widespread nowadays. Let us review its basics: a float is stored as a sign bit s, an exponent field e (stored with a bias), and a mantissa field m, encoding the value (-1)**s × 1.m × 2**(e - bias) for normal numbers (a zero-filled exponent is reserved for zero and the subnormal numbers).

To these numbers we add:

  • Positive and negative infinity: ones-filled exponent, and zero-filled mantissa.
  • NaN "numbers": ones-filled exponent, and non-zero mantissa.
So there are a whole lot of NaN "numbers", but also distinct positive and negative zeros!
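These special values are directly accessible from Python; a quick check:

    import math

    inf = float('inf')
    nan = float('nan')

    print(inf > 1e308)               # True: inf is larger than any finite float
    print(nan == nan)                # False: a NaN compares unequal to everything, even itself
    print(math.isnan(nan))           # True: the proper way to test for NaN
    print(0.0 == -0.0)               # True: +0 and -0 compare equal...
    print(math.copysign(1.0, -0.0))  # -1.0: ...but the sign bit is really there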

So how long are the exponent and the mantissa? The IEEE 754 standard comes in multiple precisions, the most famous being:

  • Single precision: sign = 1 bit ; exponent = 8 bits ; mantissa = 23 bits => total: 32 bits
  • Double precision: sign = 1 bit ; exponent = 11 bits ; mantissa = 52 bits => total: 64 bits

In Python 3, floats use the double precision representation by default. Let us then play with double precision floats!
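The parameters of the underlying representation can be checked at runtime:

    import sys

    # mant_dig = 53 (52 stored bits + 1 implicit), max_exp = 1024, min_exp = -1021,
    # epsilon = 2.220446049250313e-16, max = 1.7976931348623157e+308, ...
    print(sys.float_info)
    print(sys.float_info.mant_dig)  # 53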

Double precision IEEE 754

Let us review some important information about double precision floats:

How many different valid floats?

There are 2**64 - (2**53 - 2) (NaNs) - 1 (duplicate zero) ~ 1.84e19 distinct valid floats!

What is the range?

  • Minimum float (absolute value, subnormal): 2**-(1022+52) = 2**-1074 ~ 4.94e-324
  • Maximum float (before inf): 2**1024 - 2**(1023-52) = 2**1024 - 2**971 ~ 1.80e+308
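These bounds are easy to verify; a quick sketch:

    import sys

    print(sys.float_info.max)      # 1.7976931348623157e+308 = 2**1024 - 2**971
    print(sys.float_info.min)      # 2.2250738585072014e-308 = 2**-1022, smallest *normal* float
    print(2**-1074)                # 5e-324: smallest subnormal float
    print(5e-324 / 2)              # 0.0: underflow to zero
    print(sys.float_info.max * 2)  # inf: overflow to infinity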

What is the precision?

Some floats can be stored exactly in the IEEE 754 representation, like 1.0, 0.5, 0.625, ... But most cannot: 0.1, 1/3, pi, ... We can define the precision as the largest gap between a number and its adjacent neighbours in the IEEE 754 representation. In that case, the absolute precision on a float is 2**-52 times its exponent factor.
Therefore, the relative precision on a float is never better than 2**-53 ~ 1e-16 (and it can get much worse, e.g. for subnormal numbers!).
For example, above 2**53 ~ 9e+15 the gap between consecutive floats becomes larger than 1, so integers are no longer all exactly representable!
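A few consequences in Python:

    print(0.1 + 0.2)            # 0.30000000000000004: neither 0.1 nor 0.2 is exact in binary
    print(0.1 + 0.2 == 0.3)     # False
    print(1.0 + 2**-53 == 1.0)  # True: the increment is smaller than the gap around 1.0
    print(2.0**53 + 1.0)        # 9007199254740992.0: the gap between floats is now 2
    print(2.0**53 + 2.0)        # 9007199254740994.0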

Python fractions module

Exact algebra on rational numbers!
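A minimal sketch of what the standard fractions module offers:

    from fractions import Fraction

    # Exact rational arithmetic: no rounding ever happens.
    print(Fraction(1, 10) + Fraction(2, 10))  # 3/10, exactly
    print(Fraction(1, 3) * 3 == 1)            # True

    # It also exposes the exact rational value actually stored for a float:
    print(Fraction(0.1))  # 3602879701896397/36028797018963968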

Bonus: convert to binary

small script
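A minimal sketch of such a script, using the standard struct module to expose the raw bits of a double:

    import struct

    def double_to_bits(x):
        """Split the 64-bit IEEE 754 representation of a float into its three fields."""
        # Pack the float as 8 big-endian bytes, reinterpret them as a 64-bit integer,
        # and format that integer as a string of 64 bits.
        bits = format(struct.unpack('>Q', struct.pack('>d', x))[0], '064b')
        return bits[0], bits[1:12], bits[12:]  # sign, exponent, mantissa

    print(double_to_bits(1.0))   # ('0', '01111111111', '000...0'): exponent 1023 = bias
    print(double_to_bits(-0.0))  # only the sign bit is set
    print(double_to_bits(0.1))   # the repeating binary pattern of 0.1, cut at 52 bits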

References

  • IEEE 754 standard
  • Ariane 5 accident report: http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html