Floating point representation
Representing real numbers in binary using sign bit, exponent, and mantissa fields enabling representation of very large and very small numbers. Floating point approximates real numbers within limited precision. IEEE 754 standard defines 32-bit (float) and 64-bit (double) formats. Precision limitations cause rounding errors.
Formula
Value = (−1)^sign × 1.mantissa × 2^(exponent − bias)
Real World
In 1996, the Ariane 5 rocket exploded 37 seconds after launch because a 64-bit floating-point velocity value overflowed when converted to a 16-bit integer, demonstrating how precision limits can have catastrophic real-world consequences.
Exam Focus
When converting, show the sign bit, normalised mantissa, and biased exponent separately — examiners award marks for each component.
How well did you know this?