Computer Organisation and Architecture Notes
Positive and Negative Numbers
- The discussion focuses on the binary system.
- Computer systems use the Most Significant Bit (MSB) to represent the sign of an integer.
- MSB = 0 indicates a positive number.
- MSB = 1 indicates a negative number.
- The remaining bits represent the value which can be interpreted differently based on the number format.
- Signed binary integers can be expressed using different number formats; the course will cover the most commonly used format: Two’s Complement Representation.
Two’s Complement Representation
- Positive numbers are represented as normal binary numbers.
- Negative numbers are created through negation:
- Invert all bits of the number (flip the bits, also known as 1’s complement).
- Add one to the inverted result, ignoring any overflow.
- The MSB indicates the sign:
- MSB = 0 for positive numbers.
- MSB = 1 for negative numbers.
- The negation process can compute the negative equivalent of a positive number and vice versa. The same negation process is applied when converting a negative number back to positive.
- Example:
- +106 is represented as
0 1 1 0 1 0 1 0. - Invert all bits:
1 0 0 1 0 1 0 1 - Add 1:
1 0 0 1 0 1 1 0 - -106 is
1 0 0 1 0 1 1 0.
Two’s Complement to Decimal Conversion
- Each bit position has a weight.
- Table of weights for an 8-bit number:
| 2^0 | 2^1 | 2^2 | 2^3 | 2^4 | 2^5 | 2^6 | -2^7 |
|---|
| 1 | 2 | 4 | 8 | 16 | 32 | 64 | -128 |
- The most significant bit has a negative weight.
- To convert from two’s complement to decimal:
- Calculate the sum of the product of individual bits and their corresponding weightage.
- Example:
- Convert
10001110 to decimal. - −128+0+0+0+8+4+2+0=−114
- Therefore, the decimal representation of
10001110 is -114.
- The MSB is not just a sign bit; it also has a negative weight.
- If the MSB is ‘1’, the final value will always be negative.
Carry vs Overflow
- Carry Flag is set when there is a ‘1’ that gets carried out of the MSB of the result.
*Example: 48 – 19 = 48 + (-19) = 29
- First, negate 19 (00010011 -> 11101101)
- Then add the two numbers and discard any carries emitting from the high order bit.
- Carry does not always mean that we have an error/overflow.
- In signed number systems, a carry bit set does not necessarily indicate an error or overflow.
- Overflow detection in result needs to be determined to see if there is an error.
Detecting Overflow in Two’s Complement Numbers
- Overflow can be detected by checking the MSB of the operands and the result.
- Conditions for overflow in addition (Result = A + B):
- If MSB(A) = MSB(B) and MSB(Result) ≠ MSB(A), then overflow occurs.
- Conditions for overflow in subtraction (Result = A – B):
- If MSB(A) ≠ MSB(B) and MSB(Result) ≠ MSB(A), then overflow occurs.
- Conditions:
| Operation | Conditions | Result |
|---|
| A + B | A > 0, B > 0 | < 0 |
| A + B | A < 0, B < 0 | > 0 |
| A – B | A > 0, B < 0 | < 0 |
| A – B | A < 0, B > 0 | > 0 |
- If the results are not feasible (e.g., adding two positive numbers but getting a negative result), the overflow flag is set, which implies an error if the system used is a signed number system.
Carry vs. Overflow (Recap)
- Unsigned Numbers:
- Carry = 1 always indicates an overflow (the new value is too large to be stored in the given number of bits).
- The overflow flag means nothing in the context of unsigned numbers.
- Signed Numbers:
- Overflow = 1 indicates an overflow.
- The carry flag can be set for signed numbers, but this does not necessarily mean an overflow has occurred.
- Examples:
| Expression | Result | Carry? | Overflow? | Correct Result? |
|---|
| 0100 (+4) + 0010 (+2) | 0110 (+6) | No | No | Yes |
| 0100 (+4) + 0110 (+6) | 1010 (-6) | No | Yes | No |
| 1100 (-4) + 1110 (-2) | 1010 (-6) | Yes | No | Yes |
| 1100 (-4) + 1010 (-6) | 0110 (+6) | Yes | Yes | No |
Sign Extension
- In two’s complement, sign extension is needed to convert a smaller size operand to a larger size operand.
- Sign extension copies the sign bit (MSB) into the higher-order bits.
- Its basically the method used when converting a smaller size to a larger size data.
- For example, to convert a number stored in an 8-bit data type to a 16-bit data type variable, sign extension is used to ensure that the values in the two variables are the same.
- Examples:
- 8-bit:
10011010 (-102) - 16-bit Sign Extended:
1111111110011010 (-102)
Multi-Precision Arithmetic
- If operands are larger than 32-bits (e.g., 64-bit operands) and we have only a single 32-bit ALU:
- Reuse the 32-bit adder for multi-precision addition.
- Multi-precision arithmetic involves the computation of numbers whose precision is larger than what is supported by the maximum size of the processor register (Single-Precision).
- Example (64-bit addition using a 32-bit ALU):
ADD R0, R0, R1 ; add lower word with carry out
ADC R2, R2, R3 ; add upper word with carry in
- The ARM registers are 32bit wide so at any instance, it can only store 32bit data.
- To add two 64bit operands, we need to add the lower order word first, followed by the higher order word.
- When adding the higher order word, the carry bit (C bit) has to be added as well.
- Carry bit is only set if there are any ‘1’ overflowing from the MSB of the lower order word.
- The ARM assembly instruction that adds the operands and the carry bit is ADC.
Fixed and Floating Point Number System
- The two main types of number systems: fixed-point and floating-point.
Range and Precision
- Range: Interval between the smallest (max-) and largest (max+) representable number.
- Example: Range of two’s complement is −(2N−1) to (2N−1−1)
*Precision, on the other hand, is the amount of information used to represent each number. E.g. 1.666 represent information in a higher precision compared to 1.67.
- Precision: Amount of information used to represent each number.
- Example: 1.666 has higher precision than 1.67.
- Precision corresponds to the interval between adjacent tick marks on the number line. Each tick mark corresponds to a representable number in the number system used.
- In a binary number system, precision corresponds to the value represented by the LSB.
Fixed-Point Representation
- Fixed-point format can represent integer and/or fractional values.
- Similar to the decimal system, the binary system has a radix point too known as the binary point.
- Fixed point system has a fixed resolution as its radix point position is fixed.
- A 4-bit integer has a range of 0 to 15.
- Binary points also has a fractional weightage.
- The smallest resolution that this system can support determines its precision, and this correspond to its LSB.
- Limitations:
- Given a fixed total number of bits allocated for a number.
- Precision is limited by the range of the integer.
- If you allocate more bits for the integer, the range will increase but the precision will reduce.
Floating Point Representation
- Three main fields:
- Sign: positive/negative number
- Mantissa: base value
- Exponent: specifies position of radix point
- Example: Simple decimal floating-point format:
- Sign: ±
- Mantissa: 9.99
- Exponent: ±99
- Representing number in this manner, we are able to ‘move’ the position of the radix point by changing the exponent value.
- Comparing with a fixed point number, we can see that it doesn’t belong to a positional numbering system, actual value is calculated by evaluating the sign, mantissa and exponent value.
Floating Point Representation
- The size of the exponent determines the range.
- The size of the mantissa and the value of the exponent determine the precision.
- Small numbers can be represented with good precision.
- Large numbers may sacrifice precision to achieve a greater range.
- Density of floating-point numbers is not uniform.
- Floating point representation can represent values across a wide range (-9.991099 to +9.9910+99).
Normalisation
- Multiple representations exist for the same value, so Normalisation is necessary to avoid synonymous representation by maintaining one non-zero digit before the radix-point.
- In decimal number, this digit can be from 1 to 9
- In binary number, this digit should be 1
- Maximizes the number of bits of precision since the number of leading zeroes are minimised.
Underflow
- Normalisation creates an underflow region.
- Underflow region is close to zero and cannot be represented by the floating point number.
- Underflow occurs when a value is too small to be represented.
- Floating-point overflow and underflow can cause programs to crash if not handled properly.
- Smallest positive normalized number +1 x 10−99
- Smallest negative normalized number -1 x 10−99
IEEE 754
- IEEE754 floating point number standard is used in the industry.
- It is also the standard behind the floating point data type you declared in programming languages such as C.
IEEE 754 Floating Point Standard
- Found in virtually every computer since 1980.
- Simplified porting of floating-point numbers.
- Unified the development of floating-point algorithms.
- Single Precision (32-bits):
- 1-bit sign + 8-bit exponent + 23-bit fraction.
- Double Precision (64-bits):
- 1-bit sign + 11-bit exponent + 52-bit fraction.
- Note however that the ‘Exponent’ and ‘Fractional’ field specify in the IEEE754 format is not the Exponent and Mantissa value of a floating point number.
IEEE 754 Normalised Numbers
- Sign bit:
- S = 0 (positive); S = 1 (negative).
- Fraction:
- Assumes hidden 1 (not stored) for normalised numbers.
- Value of normalized floating point number is: (−1)S x (1+f<em>1x2−1+f</em>2x2−2+f<em>3x2−3+f</em>4x2−4+…)2 x 2E−Bias
- Exponent:
- Biased representation (00000001 to 11111110).
- Value of exponent = E - Bias
- Bias = 127 (Single Precision) and 1023 (Double Precision).
- The Mantissa is derived by attaching a leading ‘1’ to the left of the fractional bits. Giving the expression 1.F.
- The actual exponent value is given by the expression E-Bias where E is the value from the Exponent Field in a IEEE754 number representation and Bias is = 127 and 1023 for Single and Double precision IEEE754 number respectively.
- The E field in IEEE754 is a positive number, Bias allows the actual exponent to span across positive and negative number ranges.
Converting Single Precision to Decimal
- Example 1:
0 10110010 1110000000000000000000- Sign = 0 (positive)
- Exponent = 101100102 = 178; E – Bias = 178 – 127 = 51
- 1 + Fraction = (1.111)2 = 1 + 2−1 + 2−2 + 2−3 = 1.875
- Value in decimal = +1.875 x 251
- Example 2:
1 00001100 01010000000000000000000- Sign = 1 (negative)
- Exponent = 000011002 = 12; E – Bias = 12 – 127 = -115
- 1 + Fraction = (1.0101)2 = 1 + 2−2 + 2−4 = 1.3125
- Value in decimal = -1.3125 x 2−115
Representable Range for Normalised Single Precision
- Smallest magnitude normalized number
*Normalized (+ve) X 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Exponent = (00000001)2 = 1; E - Bias = 1 – 127 = -126
1 + Fraction = (1.000…000)2 = 1
Value in decimal = 1 x 2-126
+2-126 - Largest magnitude normalized number
- X 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Exponent = (11111110)2 = 254; E - Bias = 254 – 127 = 127
1 + Fraction = (1.111…111)2 ≈ 2
Value in decimal ≈ 2 x 2127 = 2128
+2128 - In normalised mode, exponent is from 00000001 to 11111110
IEEE 754 Encoding Mode
| Encoding Mode | Sign | Exponent | Fraction |
|---|
| Normalized | 1 / 0 | 00000001 to 11111110 | Anything |
| Denormalized | 1 / 0 | 00000000 | Non zero |
| Zero | 1 / 0 | 00000000 | 0000 … 0000 |
| Infinity | 1 / 0 | 11111111 | 0000 … 0000 |
| Not a Number (NaN) | 1 / 0 | 11111111 | Non zero |
- In normalised mode, the E value ranges from 1 to 1111 1110 while the fraction can take on any value.
- IEEE754 has a special Denormalised mode that allows it to represent numbers in its underflow region.
- In denorm mode, the MSB of the mantissa is a 0 and not a 1, i.e. 0.F instead of 1.F. Exponent is given a value zero and Fraction can be any non-zero value.
- Take note however that the exponent value used in the denorm mode is 2−126 and not 2−127.
- Zero, Infinity and NaN are special use cases and are represented by special numbers in E and Fraction.
Fixed Point vs Floating Point Number System
- Given the same number of bits to represent a data, e.g. 32 bits.
- Floating Point (IEEE754)
- Max Range ≈ 2*2128 (-2128 to ~2128 ).
- Max Precision (near to zero) less than 2−126
- Fixed Point
- Max Range (Radix right of LSB, unsigned) ≈ 232
- Max Precision (Radix left of MSB) 2−32
- Floating point yield a larger range and better precision at small numbers with the same number of bits representation.
- One usually needs the best precision when the numbers are small.
- However, Fixed point number has the advantage of having uniform precision across entire range.
- Floating point number’s precision changes across the range and the very coarse precision at the two end of the range may not be desirable to an intended algorithm.