Floating Point Numbers
It's called floating point because the point in a number can float to different positions. This is the same idea as scientific notation, where the exponent lets you move the point anywhere.
Decimal is how we write numbers in base 10. It covers both whole and fractional numbers.
Floating point is just the computer's version of scientific notation.
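As a worked example, here is 585.22 (the same number used in the conversion steps later on) written in everyday base-10 scientific notation and in the base-2 form a computer aims for:

```latex
\begin{aligned}
585.22 &= 5.8522 \times 10^{2} \\
585.22 &= 1.1430078125 \times 2^{9}
\end{aligned}
```

The base-2 factor 1.1430078125 equals 29261/25600. Its denominator contains a factor of 25, so it has no finite binary expansion, and the stored bits have to be rounded.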
I always assumed floating point numbers were just two integers split by a dot. That's wrong. A floating point value is stored as a single binary value (for example, in a floating point register), and it follows its own encoding standard rather than plain integer binary.
Converting floating point numbers to binary
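The integer and fractional parts are converted differently. The integer part is converted by repeatedly dividing by 2 and collecting the remainders. The fractional part is converted by repeatedly multiplying by 2 and collecting the integer digit that appears each time. So they aren't just two integers split by a decimal point.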
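Here is a minimal sketch of the two procedures in Python. The helpers `int_to_binary` and `frac_to_binary` are hypothetical names chosen for illustration. The fractional part is passed as an exact fraction (22/100 rather than 0.22), so the demo itself introduces no rounding.

```python
def int_to_binary(n: int) -> str:
    """Integer part: divide by 2 repeatedly; the remainders, read backwards, are the bits."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))
        n //= 2
    return "".join(reversed(bits))


def frac_to_binary(num: int, den: int, max_bits: int = 20) -> str:
    """Fractional part num/den (0 <= num < den): multiply by 2 repeatedly;
    the integer digit that pops out each time is the next bit."""
    bits = []
    while num != 0 and len(bits) < max_bits:
        num *= 2
        digit, num = divmod(num, den)
        bits.append(str(digit))
    return "".join(bits)


# 585.22: integer part 585, fractional part 22/100.
print(int_to_binary(585) + "." + frac_to_binary(22, 100))  # 1001001001.00111000010100011110
```

The fractional loop for 22/100 never reaches zero by itself. That is why `max_bits` has to cut it off, and it is the root of the inaccuracy discussed next.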

Floating point numbers aren't exact

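The diagram above is only a mental model for why floating point isn't exact. Most decimal fractions, such as 0.1, have no finite binary expansion, so the bits have to stop somewhere.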
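A quick way to see the effect is in Python, whose `float` is an IEEE 754 64-bit double. `Decimal(0.1)` prints the exact value that was actually stored. The sum of two already-rounded values then misses 0.3. This is only a demonstration; `math.isclose` is one common workaround, not the only one.

```python
import math
from decimal import Decimal

stored = Decimal(0.1)   # the exact value of the double closest to 0.1
total = 0.1 + 0.2

print(stored)                    # 0.1000000000000000055511151231257827021181583404541015625
print(total, total == 0.3)       # 0.30000000000000004 False
print(math.isclose(total, 0.3))  # True: compare with a tolerance instead
```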
More precisely, converting a decimal to floating point (here a 64-bit double) amounts to this:
- Write the number as a fraction. For example, 585.22 becomes 58522 / 100.
- Convert both the numerator and the denominator to binary.
- Divide these binary numbers until the quotient has 53 significant bits, then round using the remainder.
- Store the result in the format described below.
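The steps above can be sketched as binary long division on Python integers, which are already stored in binary. `divide_to_53_bits` is a hypothetical helper. It assumes positive inputs, ignores zero, subnormals and overflow, and rounds to nearest with ties to even (the IEEE 754 default). Real parsers use faster algorithms, but a correctly rounded one must produce the same result.

```python
import math


def divide_to_53_bits(num: int, den: int) -> tuple[int, int]:
    """Binary long division of num/den (both positive), keeping a 53-bit quotient.

    Returns (q, e) with 2**52 <= q < 2**53 and num/den ~= q * 2**e,
    rounded to nearest with ties to even. Sketch only: no zero,
    negative, subnormal, or overflow handling.
    """
    # Start with a scale that puts the quotient between 2**51 and 2**53.
    e = num.bit_length() - den.bit_length() - 52
    while True:
        # Scale numerator or denominator so the integer quotient is (num/den) / 2**e.
        n, d = (num, den << e) if e >= 0 else (num << -e, den)
        q, r = divmod(n, d)
        if q >= 1 << 52:
            break
        e -= 1  # quotient only has 52 bits: take one more
    # Round using the remainder r/d: up if above half, or exactly half and q is odd.
    if 2 * r > d or (2 * r == d and q & 1):
        q += 1
        if q == 1 << 53:  # rounding carried into a 54th bit
            q >>= 1
            e += 1
    return q, e


# 585.22 as a fraction; bin() just shows the bits Python already stores.
num, den = 58522, 100
print(bin(num), bin(den))
q, e = divide_to_53_bits(num, den)
print(q.bit_length(), e)            # 53 -43
print(math.ldexp(q, e) == 585.22)   # True: matches Python's own conversion
```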
Standards for binary representation of floating point numbers
Virtually all modern CPU architectures follow one standard for floating point numbers: IEEE 754. It uses scientific notation and normalization. For normal numbers, the integer part of the binary significand is always just 1. The exponent is base 2, since the value is binary.
When we convert a nonzero decimal to binary, there is a 1 somewhere in it (zero is encoded as a special case). Normalization moves the binary point left or right until exactly one 1, the leading one, sits before it.
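Normalization can be sketched on exact values with Python's `Fraction`. Dividing by 2 moves the binary point one place left, and multiplying by 2 moves it one place right. `normalize` is a hypothetical helper that assumes a positive input.

```python
from fractions import Fraction


def normalize(x: Fraction) -> tuple[Fraction, int]:
    """Return (m, e) with x == m * 2**e and 1 <= m < 2, for positive x."""
    if x <= 0:
        raise ValueError("this sketch only handles positive numbers")
    m, e = x, 0
    while m >= 2:   # point moves left: exponent goes up
        m /= 2
        e += 1
    while m < 1:    # point moves right: exponent goes down
        m *= 2
        e -= 1
    return m, e


print(normalize(Fraction(58522, 100)))  # (Fraction(29261, 25600), 9)
print(normalize(Fraction(5, 32)))       # (Fraction(5, 4), -3): 0.15625 = 1.25 * 2**-3
```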
Finally, only three fields are stored: the sign bit, the exponent, and the mantissa (the binary digits after the point). This relies on a few assumptions:
- The integer part is always 1, so it isn't stored (it's implied).
- The sizes of the sign, exponent, and mantissa fields are fixed for each format.
- The bias added to the exponent is fixed by the format size.
- The exponent is a power of 2, since the value is binary.
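To see the three fields, here is a sketch that splits a 64-bit double using Python's `struct` module. The layout is 1 sign bit, 11 exponent bits with a bias of 1023, and 52 mantissa bits. Those numbers come from the IEEE 754 binary64 format, not from the text above. `double_fields` is a hypothetical helper. Rebuilding the value shows the implied leading 1 being added back.

```python
import struct


def double_fields(x: float) -> tuple[int, int, int]:
    """Split a 64-bit double into its (sign, biased exponent, mantissa) bit fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # raw 64 bits as an unsigned int
    sign = bits >> 63                       # 1 bit
    exponent = (bits >> 52) & 0x7FF         # 11 bits, stored with a bias of 1023
    mantissa = bits & ((1 << 52) - 1)       # 52 bits after the implied leading 1
    return sign, exponent, mantissa


sign, exponent, mantissa = double_fields(-585.22)
print(sign, exponent - 1023)  # 1 9

# Rebuild the value from the three fields, adding back the implied 1.
value = (-1) ** sign * (1 + mantissa / 2**52) * 2.0 ** (exponent - 1023)
print(value == -585.22)  # True
```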
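The exponent itself can be positive or negative, depending on which way the point moved to leave a single 1 before it. It moves left for numbers of 2 or more (positive exponent) and right for numbers below 1 (negative exponent).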
Adding Bias to Exponent
This standard uses scientific notation for the mantissa. The real exponent can be positive or negative, but the stored exponent is always kept non-negative. That makes comparison easier: for two positive numbers, the one with the larger stored exponent is the larger number, with no sign handling needed.
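One consequence can be checked directly. For positive floats the sign bit is 0 and the biased exponent sits above the mantissa, so reading the raw bit patterns as unsigned integers orders them the same way as the values. This sketch rounds to 32-bit floats with Python's `struct`. `float32_bits` is a hypothetical helper, and the trick does not hold as-is for negative numbers.

```python
import struct


def float32_bits(x: float) -> int:
    """Round x to a 32-bit float and return its raw bit pattern as an unsigned integer."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits


values = [0.001, 0.5, 1.0, 3.75, 585.22, 1e30]   # positive and increasing
patterns = [float32_bits(v) for v in values]
print([hex(p) for p in patterns])
# Comparing the patterns as plain unsigned integers gives the same order as the values.
print(patterns == sorted(patterns))  # True
```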
In English, bias means leaning away from the true value. That's exactly what IEEE 754 does: the actual exponent is shifted by a fixed value, called the bias, before it's stored.
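For example, a 32-bit float has a bias of 127 and reserves 8 bits for the exponent, which can hold 0 to 255. The stored values 0 and 255 are reserved for special cases (zero and subnormals, infinity and NaN), so the exponent of a normal number runs from -126 to 127. Adding 127 to each stores them as 1 to 254, so only positive numbers are stored.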
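Here is a sketch that reads the 8-bit exponent field of a 32-bit float with Python's `struct` and removes the bias of 127. `float32_exponent` and `BIAS_32` are hypothetical names. The "actual exponent" it returns is only meaningful for normal numbers, since stored values 0 and 255 are special.

```python
import struct

BIAS_32 = 127  # bias for the 8-bit exponent field of a 32-bit float


def float32_exponent(x: float) -> tuple[int, int]:
    """Return (stored exponent field, actual exponent) after rounding x to a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    stored = (bits >> 23) & 0xFF  # the 8 exponent bits
    return stored, stored - BIAS_32


for x in [1.0, 0.5, 585.22, 2.0**-126, 2.0**127]:
    print(x, float32_exponent(x))
# stored fields 127, 126, 136, 1, 254 -> actual exponents 0, -1, 9, -126, 127
```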
Floating point in programming languages
When you create floating point numbers in Java, they are stored in IEEE 754 format (float is 32-bit, double is 64-bit). Most programming languages use IEEE 754 for their floating point types, because that's the format the hardware works with.
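Python is not mentioned in the original, but it makes a convenient check of the same idea. Its `float` is a C double, which on mainstream platforms is the same IEEE 754 64-bit format as Java's `double`. Packing a value into 4 bytes with `struct` rounds it to the 32-bit format used by Java's `float`. This is a sketch, not Java code.

```python
import struct
import sys

# Python's float: 53 significand bits (counting the implied 1), base 2, as in Java's double.
print(sys.float_info.mant_dig, sys.float_info.radix)  # 53 2

# Packing into 4 bytes rounds to IEEE 754 binary32, the format of Java's float.
as_float32 = struct.unpack(">f", struct.pack(">f", 0.1))[0]
print(as_float32)  # 0.10000000149011612
```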
In JavaScript, every value of the Number type is an IEEE 754 64-bit double. That means even whole numbers have only 53 bits of precision, so integers are exact only up to 2^53 (Number.MAX_SAFE_INTEGER is 2^53 - 1).
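The 53-bit limit can be reproduced in Python, since its floats use the same 64-bit format as JavaScript numbers. `max_safe` below is simply 2^53 - 1, the same value JavaScript exposes as `Number.MAX_SAFE_INTEGER`.

```python
max_safe = 2**53 - 1          # same value as JavaScript's Number.MAX_SAFE_INTEGER
limit = float(2**53)          # 9007199254740992.0

print(float(max_safe) + 1 == limit)  # True: every integer up to 2**53 is exact
print(limit + 1 == limit)            # True: 2**53 + 1 needs 54 bits and rounds back down
print(int(limit + 2))                # 9007199254740994: past 2**53 only even integers fit
```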
FPU in CPUโ
The FPU (floating point unit) is the CPU component that implements IEEE 754 arithmetic. The ALU handles integer math, while the FPU does the floating point math.