Interactive visualization of the floating-point format

- 10 January 2024

The structure

The single-precision floating-point format is described in the IEEE 754 standard, which was first published in 1985.

The standard describes a 32-bit variant and a 64-bit variant, called single-precision and double-precision, respectively.

In programming languages like C, these two formats are known as "float" and "double".

This post will only focus on the 32-bit variant.

Sign

The most significant bit is reserved to indicate whether the number is positive or negative.

0 means positive, 1 means negative.
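To poke at the bits yourself, here is a minimal Python sketch. Python's own float is 64-bit, so it assumes the standard `struct` module is used to pack a value as a 32-bit float and read the same 4 bytes back as an unsigned integer. The helper names `float_to_bits` and `sign_bit` are made up for illustration.

```python
import struct


def float_to_bits(x: float) -> int:
    """Round x to single precision and return its 32-bit pattern as an int."""
    # ">f" packs an IEEE 754 single (big-endian); ">I" reads those 4 bytes as an unsigned int.
    return struct.unpack(">I", struct.pack(">f", x))[0]


def sign_bit(x: float) -> int:
    # The sign is the most significant bit, bit 31.
    return float_to_bits(x) >> 31


print(sign_bit(2.5), sign_bit(-2.5))  # 0 1
```

Note that `sign_bit(-0.0)` returns 1, since the sign bit is independent of the other 31 bits.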

Exponent

The next 8 bits are reserved for the exponent. If, for instance, the exponent's value is 6, it will be interpreted as 2⁶, which equals 64.

Compared to a regular 8-bit integer, the binary representation looks a bit weird.
In a regular integer, "01111111" would be "127", but here it means an exponent of "0".

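This is because the stored value is offset by 127: the actual exponent is the stored value minus 127. The exponent is therefore often referred to as a biased exponent. The bias allows for negative exponents without a separate sign bit for the exponent. For normal numbers the stored value ranges from 1 to 254, giving exponents from -126 to 127; the stored values 0 and 255 are reserved for the special cases described later.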
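A small Python sketch of reading the biased exponent, assuming the standard `struct` module to get at the raw 32 bits (the helper `exponent_fields` is hypothetical). It returns both the stored 8-bit value and the actual exponent after subtracting the bias of 127.

```python
import struct

BIAS = 127  # exponent bias of the 32-bit format


def exponent_fields(x: float):
    """Return (stored 8-bit exponent, actual exponent) for x rounded to single precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    stored = (bits >> 23) & 0xFF  # the 8 bits after the sign bit, read as a plain integer
    return stored, stored - BIAS


print(exponent_fields(64.0))  # (133, 6)
print(exponent_fields(0.25))  # (125, -2)
print(format(exponent_fields(1.0)[0], "08b"))  # 01111111
```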

Mantissa

The last 23 bits are reserved for the mantissa (also called significand or fraction).

These bits act like a bitfield of fractions in descending order. Starting at the most significant bit, the fractions are ¹/₂, ¹/₄, ¹/₈, ¹/₁₆, ¹/₃₂, ¹/₆₄ ...
all the way to the last bit, which is worth ¹/₈₃₈₈₆₀₈ (2⁻²³).

When multiple bits in the mantissa are 1, their respective fractions are summed together.
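The "bitfield of fractions" can be evaluated literally. This Python sketch (hypothetical helper `mantissa_value`) uses the standard `struct` module to read the 32-bit pattern and exact `fractions.Fraction` values so nothing is lost to rounding.

```python
import struct
from fractions import Fraction


def mantissa_value(x: float) -> Fraction:
    """Sum the fractions selected by the 23 mantissa bits of x (as a 32-bit float)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    mantissa = bits & 0x7FFFFF  # the lowest 23 bits
    total = Fraction(0)
    for i in range(1, 24):
        # Bit 22 is worth 1/2, bit 21 is worth 1/4, ..., bit 0 is worth 1/2**23.
        if (mantissa >> (23 - i)) & 1:
            total += Fraction(1, 2**i)
    return total


print(mantissa_value(2.5))   # 1/4
print(mantissa_value(13.0))  # 5/8
```

The loop is equivalent to dividing the 23-bit integer by 2²³, i.e. `Fraction(mantissa, 2**23)`.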

Combining the two

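Now that we can interpret both the exponent and the mantissa, we can combine them.

This is done by taking the mantissa, adding 1, and multiplying the result by the exponent's value, that is, 2 raised to the (unbiased) exponent. The leading 1 is not stored in the 32 bits; it is implied, so all 23 mantissa bits can be spent on the fraction.

2^exponent * (1 + mantissa)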
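Putting the pieces together, here is a Python sketch that evaluates a raw 32-bit pattern with the formula above, where the exponent's value is 2 raised to the stored exponent minus 127. It also applies the sign bit, and it deliberately refuses exponent patterns of all 0s or all 1s, which do not follow this rule. `decode_normal` and `bits_of` are hypothetical names.

```python
import struct


def bits_of(x: float) -> int:
    """32-bit pattern of x rounded to single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def decode_normal(bits: int) -> float:
    """Evaluate sign * 2**(stored - 127) * (1 + mantissa) for a normal 32-bit pattern."""
    sign = bits >> 31
    stored_exponent = (bits >> 23) & 0xFF
    if stored_exponent in (0, 0xFF):
        raise ValueError("exponent bits all 0 or all 1: not a normal number")
    mantissa = (bits & 0x7FFFFF) / 2**23  # sum of the selected fractions
    value = 2.0 ** (stored_exponent - 127) * (1 + mantissa)
    return -value if sign else value


print(decode_normal(0x41500000))  # 13.0
for x in (32.0, 0.25, 2.5, -13.0):
    assert decode_normal(bits_of(x)) == x  # agrees with struct's own decoding
```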

Examples

Let's try some different examples and see how we can reach the results.

Goal: 32
Here we can simply set the exponent's value to "5".

2⁵ = 32
Goal: 0.25
Setting the exponent's value to "-2" should do the trick.

2⁻² = 0.25
Goal: 2.5
Setting the exponent's value to "1" gives us "2". Now select the fraction ¹/₄ in the mantissa.

2¹ * (1 + ¹/₄) = 2 * (1 + 0.25) = 2 * 1.25 = 2.5
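The examples so far can also be run in reverse: choose a sign, an exponent and the selected fractions, then assemble the 32 bits. A hedged Python sketch with a hypothetical `make_float` helper; it only handles the normal range, where the stored exponent is between 1 and 254, and uses the standard `struct` module to turn the bits back into a float.

```python
import struct
from fractions import Fraction


def make_float(sign: int, exponent: int, mantissa: Fraction) -> float:
    """Assemble sign, unbiased exponent and mantissa fraction into a 32-bit float."""
    stored_exponent = exponent + 127  # apply the bias
    if not 1 <= stored_exponent <= 254:
        raise ValueError("exponent outside the normal range -126..127")
    mantissa_bits = mantissa * 2**23  # e.g. 1/4 -> 2**21, i.e. only bit 21 set
    if not 0 <= mantissa < 1 or mantissa_bits.denominator != 1:
        raise ValueError("mantissa must be a sum of fractions 1/2 .. 1/2**23")
    bits = (sign << 31) | (stored_exponent << 23) | int(mantissa_bits)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


print(make_float(0, 5, Fraction(0)))     # 32.0
print(make_float(0, -2, Fraction(0)))    # 0.25
print(make_float(0, 1, Fraction(1, 4)))  # 2.5
```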
Goal: 13
The highest value we can get from the exponent without overshooting is "8", by setting the exponent's value to "3".

Now we have to cover the rest of the way with the mantissa. Selecting ¹/₂ gives us "12". Adding ¹/₄ to that would overshoot our target by giving us 14, so instead we select ¹/₈, which lands exactly on 13.

2³ * (1 + ¹/₂ + ¹/₈) = 8 * (1 + 0.5 + 0.125) = 8 * (1 + 0.625) = 8 * 1.625 = 13
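The walkthrough above is a greedy procedure: pick the largest power of two that does not overshoot, then try each fraction from ¹/₂ downwards and keep it only if the total stays at or below the target. A minimal Python sketch of that procedure using exact fractions; `greedy_decompose` is a hypothetical helper. It simply truncates whatever does not fit in 23 bits (IEEE 754 hardware rounds to the nearest representable value by default) and it ignores the exponent range limits.

```python
from fractions import Fraction


def greedy_decompose(target, mantissa_bits: int = 23):
    """Return (exponent, chosen fractions, remainder) for a positive target.

    Mirrors the walkthrough: largest power of two first, then fractions
    1/2, 1/4, ... are kept only if they don't overshoot. Anything that does
    not fit in mantissa_bits is left over as the remainder (truncation).
    """
    target = Fraction(target)
    if target <= 0:
        raise ValueError("target must be positive")
    exponent = 0
    while Fraction(2) ** exponent > target:  # too large: step down
        exponent -= 1
    while Fraction(2) ** (exponent + 1) <= target:  # next power still fits: step up
        exponent += 1
    scale = Fraction(2) ** exponent
    current, chosen = scale, []  # 2**exponent * (1 + nothing selected yet)
    for i in range(1, mantissa_bits + 1):
        step = scale / 2**i  # what selecting the fraction 1/2**i adds
        if current + step <= target:
            current += step
            chosen.append(Fraction(1, 2**i))
    return exponent, chosen, target - current


exponent, picked, remainder = greedy_decompose(13)
print(exponent, [str(f) for f in picked], remainder)  # 3 ['1/2', '1/8'] 0
```

Trying `greedy_decompose(Fraction(1, 10))` leaves a non-zero remainder, which is why 0.1 cannot be stored exactly in this format.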

What about zero?

With the logic described above, we can produce many common numbers, except for one very common value: zero.

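Setting the exponent's value to "0" results in 2⁰ = 1, and negative exponents such as 2⁻¹ = 0.5 only get us closer to zero. Since 1 is always added to the mantissa, the result can never reach exactly 0.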

Luckily, the authors of the standard thought of this, so we are now going to look at the exceptions to the rules we've covered so far.

Special values

The standard describes 3 special values.

All 32 bits being 0 is interpreted as 0 (with only the sign bit set to 1, it is -0)
All exponent bits being 1 and all mantissa bits being 0 is interpreted as Infinity (positive or negative, depending on the sign bit)
All exponent bits being 1 and at least one mantissa bit being 1 is interpreted as Not a Number (NaN)
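These rules can be checked mechanically by looking only at the exponent and mantissa bits. A Python sketch with hypothetical helpers `classify` and `bits_to_float` (the latter uses the standard `struct` module); patterns whose exponent bits are all 0 but whose mantissa is not are labeled "denormal", the case described in the next section.

```python
import math
import struct


def classify(bits: int) -> str:
    """Name the kind of value a 32-bit pattern encodes."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0xFF:  # all exponent bits are 1
        return "NaN" if mantissa else "Infinity"
    if exponent == 0:  # all exponent bits are 0
        return "denormal" if mantissa else "zero"
    return "normal"


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


for bits in (0x00000000, 0x80000000, 0x7F800000, 0xFF800000, 0x7FC00000):
    print(f"{bits:#010x}  {classify(bits):<8}  {bits_to_float(bits)}")

# Python agrees about the decoded values.
assert math.isinf(bits_to_float(0x7F800000))
assert math.isnan(bits_to_float(0x7FC00000))
```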

Normal and Denormal numbers

Besides the special values, the floating-point standard also describes a whole mode called "denormalized numbers" or "subnormal numbers".

The rules described previously therefore produce what are referred to as "normalized numbers".

Denormal numbers improve the resolution around 0. They are "activated" by having all exponent bits set to 0 and at least one mantissa bit set to 1.

This mode changes the calculation a bit.

The exponent's value is fixed at 2⁻¹²⁶ and 1 is no longer added to the mantissa. This allows the mantissa to produce values smaller than 2⁻¹²⁶, the smallest normalized number.

An example: 2⁻¹²⁶ * ¹/₂ = 2⁻¹²⁷
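A Python sketch of the denormal rule, using exact fractions: the exponent's value stays at 2⁻¹²⁶ and the mantissa is used without adding 1. `decode_denormal` is a hypothetical helper; the standard `struct` module is only used to confirm that the platform decodes the same bits to the same value.

```python
import struct
from fractions import Fraction


def decode_denormal(bits: int) -> Fraction:
    """Exact value of a 32-bit pattern whose exponent bits are all 0."""
    if (bits >> 23) & 0xFF:
        raise ValueError("exponent bits must all be 0")
    mantissa = Fraction(bits & 0x7FFFFF, 2**23)  # no implicit leading 1
    value = Fraction(1, 2**126) * mantissa  # exponent's value fixed at 2**-126
    return -value if bits >> 31 else value


half = 1 << 22  # only the mantissa's 1/2 bit is set
print(decode_denormal(half) == Fraction(1, 2**127))  # True
# Python floats are doubles, so 2**-127 is exact and can be compared directly.
print(struct.unpack(">f", struct.pack(">I", half))[0] == 2.0**-127)  # True
print(decode_denormal(1) == Fraction(1, 2**149))  # smallest positive denormal: True
```

The smallest positive denormal, with only the last mantissa bit set, works out to 2⁻¹²⁶ * 2⁻²³ = 2⁻¹⁴⁹.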

Thanks!

Thank you for staying with me until the end. I hope you found it informative and engaging.

Should you have any questions, thoughts, corrections or comments, feel free to reach me at: hello@eibx.com