Before continuing with these examples, please read the previous chapter: Concept of IEEE-Floating Point Representation
Example 1:
Consider a 16-bit floating-point number where the mantissa is a sign-magnitude fraction and the exponent is in biased form. 9 bits are allocated for the mantissa.
Answer:
Solution (a):
The exponent has 16 - (9 + 1) = 16 - 10 = 6 bits (9 mantissa bits plus 1 sign bit).
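The field widths above can be checked with a short sketch. The field order (sign, then exponent, then mantissa, from the most significant bit down) is an assumption made only for illustration, and `split_fields` is a hypothetical helper name, not something defined by the example.

```python
def split_fields(word, mant_bits=9, total_bits=16):
    """Split a word into (sign, exponent, mantissa).

    Assumed layout (not stated in the example): sign | exponent | mantissa.
    """
    exp_bits = total_bits - (mant_bits + 1)          # 16 - (9 + 1) = 6
    mantissa = word & ((1 << mant_bits) - 1)         # low 9 bits
    exponent = (word >> mant_bits) & ((1 << exp_bits) - 1)  # next 6 bits
    sign = (word >> (total_bits - 1)) & 1            # top bit
    return sign, exponent, mantissa

print(split_fields(0b0_111111_111111111))  # (0, 63, 511)
```

With 6 exponent bits the stored exponent ranges over 0 to 63, and with 9 mantissa bits the stored mantissa ranges over 0 to 511.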
Solution (b):
Range of positive mantissa:
The mantissa has 9 bits, so the maximum value of the mantissa = 0.111111111 (binary) = 1 - 2^(-9)
and the minimum value of mantissa =
Number representation in the format specified:
In floating-point representation, not every number can be represented accurately, since the representable numbers are distributed non-uniformly and non-continuously. So there is a chance of error whenever a value has to be approximated.
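The non-uniform spacing is easier to see on a much smaller format. The sketch below is only illustrative: it assumes a toy format with a 3-bit fraction mantissa 0.M and a 2-bit exponent with bias 2 (not the 16-bit format of this example), no normalization, and no reserved exponent codes. The name `toy_values` is hypothetical.

```python
from fractions import Fraction

def toy_values(mant_bits=3, exp_bits=2):
    """Sorted positive values of a toy format: 0.M * 2**(E - bias)."""
    bias = 2 ** (exp_bits - 1)
    values = set()
    for E in range(2 ** exp_bits):
        for M in range(1, 2 ** mant_bits):  # skip M = 0 (the value zero)
            values.add(Fraction(M, 2 ** mant_bits) * Fraction(2) ** (E - bias))
    return sorted(values)

vals = toy_values()
gaps = [b - a for a, b in zip(vals, vals[1:])]
print("smallest gap:", min(gaps))  # smallest gap: 1/32
print("largest gap:", max(gaps))   # largest gap: 1/4
```

Even in this tiny format the spacing between neighbouring values varies by a factor of 8, so the approximation error depends on where a number falls.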
First Maximum +ve number:
Second maximum +ve number:
Difference between the first maximum +ve number and second maximum +ve number
NOTE:
Since the positive and negative number representations are symmetric, the difference between the first maximum -ve number and the second maximum -ve number is the same as for the +ve numbers. So it is enough to analyze the number representation with +ve numbers only.
First minimum +ve number:
Remember,
as a generic representation of value expression is
Second minimum +ve number:
Difference between the first minimum and second minimum is
Conclusion:
(i) The difference between the first and second maximum in this format is =
(ii) The difference between the first and second minimum
(i) The difference between the first and second maxima >>> the difference between the first and second minima
→ the gap between the first and second maxima is the largest gap,
→ so the error is largest between the first and second maxima.
(ii) The maximum error occurs when a number lies exactly halfway between the 1st max and the 2nd max, since the deviation from both neighbours is then equal and approximating to either one gives the largest possible error.
(iii) The minimum error occurs between the 1st min and the 2nd min.
(iv) The number system is clustered towards 0,
i.e. the number representation is dense towards '0' and sparse away from '0'.
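The conclusions above can be checked numerically with a rough sketch. It assumes the value expression 0.M * 2^(E - bias) with a 9-bit mantissa, a 6-bit exponent and bias 32, no normalization, and no reserved exponent codes; those last two assumptions are not stated in the example, so the printed figures hold only under them. The function name `value` is hypothetical.

```python
from fractions import Fraction

MANT_BITS = 9   # mantissa bits, from Solution (a)
EXP_BITS = 6    # exponent bits, from Solution (a)
BIAS = 32       # bias quoted in the example

def value(M, E):
    """0.M * 2**(E - BIAS); assumes no normalization and no reserved codes."""
    return Fraction(M, 2 ** MANT_BITS) * Fraction(2) ** (E - BIAS)

E_MAX = 2 ** EXP_BITS - 1    # 63
M_MAX = 2 ** MANT_BITS - 1   # 511

# Gap between the two largest positive values.
gap_top = value(M_MAX, E_MAX) - value(M_MAX - 1, E_MAX)
# Gap between the two smallest positive values.
gap_bottom = value(2, 0) - value(1, 0)

print(gap_top == 2 ** 22)                  # True
print(gap_bottom == Fraction(1, 2 ** 41))  # True
print(gap_top / 2)  # error for a number halfway between the two largest values
```

Under these assumptions the gap at the top of the range is 2^63 times the gap at the bottom, which is what "dense towards 0 and sparse away from 0" means in practice.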
Solution (c):
Bias = 2^(6-1) = 32
So, Error = 0.125
Example 2:
Consider an IBM system that allocates 32 bits for a floating-point number. The base of the system is 16. The mantissa is in normalized sign-magnitude form. The exponent is in excess-128 format. Express the largest representable value as a power in a number system with base 11.
Solution:
Excess-128 format is required, which means the bias is 128.
So,
The mantissa is 23 bits, since we are using explicit normalization. So the value expression will be
Now, E has 8 bits, so the maximum number that can be represented with 8 bits is 2^8 - 1 = 255.
Say,
The largest representable value in a number system with base 11 will be.
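The conversion to base 11 can be sketched with logarithms. The snippet assumes the value expression 0.M * 16^(E - 128) with a 23-bit mantissa and all exponent codes usable (so the maximum E is 255); these are assumptions for illustration, and the result should be read against the value expression derived in the solution.

```python
import math

MANT_BITS = 23        # mantissa bits, from the solution
E_MAX = 2 ** 8 - 1    # largest 8-bit exponent code, 255
BIAS = 128            # excess-128
BASE = 16             # base of the system

# Largest value under the assumed expression 0.M * 16**(E - 128).
largest = (1 - 2 ** -MANT_BITS) * float(BASE) ** (E_MAX - BIAS)

# Solve 11**x == largest for x.
x = math.log(largest) / math.log(11)
print(f"largest is about 11^{x:.3f}")
```

The same change-of-base step, x = log(value) / log(11), applies to any other target base.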