Fixed-point representations of numbers (unsigned, two's-complement or BCD binary) limit the range of values that can be stored, and in their whole-number forms they can't represent fractional values at all
As a result the denary standard-form floating point system (where large numbers can be written as something like 2.5 × 10^24) can be adapted for use in binary notation as M × 2^E (where M = the mantissa and E = the exponent)
Firstly add up all the 1-values in the mantissa (1/2 + 1/8 + 1/16 + 1/64 = 45/64 ≈ 0.703)
Then add up all the 1-values in the exponent and use the M × 2^E formula (E = 4 so M × 2^E = 45/64 × 2^4 = 45/64 × 16 = 11.25)
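The two steps above can be sketched in Python as a quick check. This is only an illustration: the function name `decode` is made up for these notes, and it assumes the mantissa and exponent are both two's complement bit strings with the binary point sitting after the first mantissa bit (so the leading mantissa bit is worth -1, which is 0 for a positive number)

```python
from fractions import Fraction


def decode(mantissa_bits: str, exponent_bits: str) -> Fraction:
    """Decode a floating point binary value as M x 2^E.

    Assumption: both fields are two's complement, and the binary point
    comes straight after the first mantissa bit.
    """
    # The leading mantissa bit has weight -1; the rest are 1/2, 1/4, 1/8, ...
    mantissa = Fraction(-int(mantissa_bits[0]))
    for position, bit in enumerate(mantissa_bits[1:], start=1):
        mantissa += Fraction(int(bit), 2 ** position)

    # Two's complement integer exponent: a leading 1 makes it negative.
    exponent = int(exponent_bits, 2)
    if exponent_bits[0] == "1":
        exponent -= 2 ** len(exponent_bits)

    return mantissa * Fraction(2) ** exponent


value = decode("01011010", "00000100")  # mantissa 0.1011010, exponent 4
print(value, float(value))  # 45/4 11.25
```

Using `Fraction` rather than `float` keeps every place value exact, so the result matches the hand calculation digit for digit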
Firstly add up all the 1-values in the mantissa (-1 + 1/2 + 1/16 + 1/32 = -0.40625)
Then add up all the 1-values in the exponent and use the M × 2^E formula (E = 8 + 4 = 12 so M × 2^E = -0.40625 × 2^12 = -0.40625 × 4096 = -1664)
Firstly check if the fraction's in the correct form (numerator < denominator if fraction < 1) - in this case it is, since 0.171875 is exactly equal to 11/64
Then split the fraction (11/64) up into the individual values it's summed up from (1/8 + 1/32 + 1/64) which gives 0.0010110 as the mantissa and 0 as the exponent
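Splitting a fraction into its place values can be sketched as a loop that tries 1/2, 1/4, 1/8 and so on in turn. The helper name `fraction_to_mantissa` and the default 8-bit mantissa (one sign bit plus seven fractional bits) are assumptions made for this example

```python
from fractions import Fraction


def fraction_to_mantissa(value: Fraction, bits: int = 8) -> str:
    """Write a positive fraction below 1 as a mantissa such as '0.0010110'."""
    if not 0 <= value < 1:
        raise ValueError("expected 0 <= value < 1")
    digits = []
    remainder = value
    for position in range(1, bits):  # the first bit is the sign bit
        place = Fraction(1, 2 ** position)
        if remainder >= place:
            digits.append("1")
            remainder -= place
        else:
            digits.append("0")
    # Any remainder left over here is lost, so the stored value is approximate.
    return "0." + "".join(digits)


print(fraction_to_mantissa(Fraction(11, 64)))  # 0.0010110
```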
Errors like this often come up with fractional numbers whose values can only be approximated (ie. whose denominators aren't powers of two - eg. 1/3) - an error like this can be reduced by increasing the mantissa's bit size (eg. a 16-bit mantissa holds more precision than an 8-bit one)
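The effect of mantissa size on the error can be seen by cutting 1/3 off after a given number of binary places. This is a rough sketch: `truncate_to_bits` is a made-up helper, the exponent is assumed to be 0, and the stored value is assumed to be truncated rather than rounded

```python
from fractions import Fraction


def truncate_to_bits(value: Fraction, fraction_bits: int) -> Fraction:
    """Keep only the first `fraction_bits` binary places of a value in [0, 1)."""
    scale = 2 ** fraction_bits
    return Fraction(int(value * scale), scale)


third = Fraction(1, 3)
for fraction_bits in (7, 15):  # 8-bit and 16-bit mantissas, minus the sign bit
    stored = truncate_to_bits(third, fraction_bits)
    error = third - stored
    print(fraction_bits, stored, error)
```

With seven fractional bits the stored value is 21/64 and the error is 1/192, while fifteen fractional bits bring the error down to 1/49152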
To make the most of the mantissa's size the binary point has to be moved as far to the left as possible (eg. if 5.88 ≈ 0101.11100001 then moving the binary point three places to the left causes that number to be stored as 5.75)
Firstly the bits in the mantissa get shifted by two places to the left to get 1.0110000
Then the exponent gets reduced by two to 00000011 and recombined with the mantissa to provide the normalized result (1.0110000 00000011)
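The shift-and-reduce steps above can be sketched as a loop that keeps shifting while the first two mantissa bits are the same. The function name `normalise` is made up, and the starting value used below (mantissa 1.1101100 with exponent 5) is an assumption chosen so that two shifts give the mantissa and exponent quoted above - the notes don't state the original number

```python
def normalise(mantissa_bits: str, exponent: int) -> tuple[str, int]:
    """Shift a two's complement mantissa left until it starts 01 or 10."""
    if set(mantissa_bits) <= {"0"}:
        raise ValueError("zero has no normalised form")
    # While the first two bits match, the first one is just a repeated sign
    # bit, so it can be dropped and a 0 appended without changing the value
    # as long as the exponent is reduced by one for each shift.
    while mantissa_bits[0] == mantissa_bits[1]:
        mantissa_bits = mantissa_bits[1:] + "0"
        exponent -= 1
    return mantissa_bits, exponent


mantissa, exponent = normalise("11101100", 5)  # assumed starting value
print(mantissa, format(exponent, "08b"))  # 10110000 00000011
```

Both forms stand for the same number here (-0.15625 × 2^5 and -0.625 × 2^3 are both -5), which is a handy check that normalizing hasn't changed the value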
Certain numbers can only be approximated using floating point binary due to mantissa size limitations (an issue that can be reduced by using the double or quadruple precision types that some programming languages provide)
Overflow errors also risk being produced if a floating point binary calculation leads to a value that exceeds the maximum possible storable value
Underflow errors also risk being produced if division by a very large number leads to a value closer to zero than the smallest storable value
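The overflow and underflow limits can be worked out for a small format as a rough illustration. The field widths here (8-bit two's complement mantissa, 8-bit two's complement exponent, normalized) are assumptions for the example rather than anything fixed by these notes, and `classify` is a made-up helper that only looks at positive values

```python
from fractions import Fraction

MANTISSA_BITS = 8  # assumed field widths, not taken from the notes
EXPONENT_BITS = 8

max_exponent = 2 ** (EXPONENT_BITS - 1) - 1   # 127
min_exponent = -(2 ** (EXPONENT_BITS - 1))    # -128

# Largest positive mantissa is 0.1111111; smallest normalized one is 0.1000000.
largest = (1 - Fraction(1, 2 ** (MANTISSA_BITS - 1))) * Fraction(2) ** max_exponent
smallest = Fraction(1, 2) * Fraction(2) ** min_exponent


def classify(value: Fraction) -> str:
    """Say whether a positive result fits inside the assumed format's range."""
    if value > largest:
        return "overflow"
    if 0 < value < smallest:
        return "underflow"
    return "in range"


print(classify(largest * 2))       # overflow
print(classify(smallest / 1000))   # underflow
print(classify(Fraction(45, 4)))   # in range
```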
A normalized mantissa also doesn't allow for a zero value as it must begin with either 0.1 (positive) or 1.0 (negative)