Computer Organisation and Architecture Notes

Positive and Negative Numbers

  • The discussion focuses on the binary system.
  • Computer systems use the Most Significant Bit (MSB) to represent the sign of an integer.
    • MSB = 0 indicates a positive number.
    • MSB = 1 indicates a negative number.
  • The remaining bits represent the value which can be interpreted differently based on the number format.
  • Signed binary integers can be expressed using different number formats; the course will cover the most commonly used format: Two’s Complement Representation.

Two’s Complement Representation

  • Positive numbers are represented as normal binary numbers.
  • Negative numbers are created through negation:
    • Invert all bits of the number (flip the bits, also known as 1’s complement).
    • Add one to the inverted result, ignoring any overflow.
  • The MSB indicates the sign:
    • MSB = 0 for positive numbers.
    • MSB = 1 for negative numbers.
  • The negation process can compute the negative equivalent of a positive number and vice versa. The same negation process is applied when converting a negative number back to positive.
  • Example:
    • +106 is represented as 0 1 1 0 1 0 1 0.
    • Invert all bits: 1 0 0 1 0 1 0 1
    • Add 1: 1 0 0 1 0 1 1 0
    • -106 is 1 0 0 1 0 1 1 0.

Two’s Complement to Decimal Conversion

  • Each bit position has a weight.
  • Table of weights for an 8-bit number:
2^02^12^22^32^42^52^6-2^7
1248163264-128
  • The most significant bit has a negative weight.
  • To convert from two’s complement to decimal:
    • Calculate the sum of the product of individual bits and their corresponding weightage.
  • Example:
    • Convert 10001110 to decimal.
    • 128+0+0+0+8+4+2+0=114-128 + 0 + 0 + 0 + 8 + 4 + 2 + 0 = -114
    • Therefore, the decimal representation of 10001110 is -114.
  • The MSB is not just a sign bit; it also has a negative weight.
  • If the MSB is ‘1’, the final value will always be negative.

Carry vs Overflow

  • Carry Flag is set when there is a ‘1’ that gets carried out of the MSB of the result. *Example: 48 – 19 = 48 + (-19) = 29
    • First, negate 19 (00010011 -> 11101101)
    • Then add the two numbers and discard any carries emitting from the high order bit.
    • Carry does not always mean that we have an error/overflow.
  • In signed number systems, a carry bit set does not necessarily indicate an error or overflow.
  • Overflow detection in result needs to be determined to see if there is an error.

Detecting Overflow in Two’s Complement Numbers

  • Overflow can be detected by checking the MSB of the operands and the result.
  • Conditions for overflow in addition (Result = A + B):
    • If MSB(A) = MSB(B) and MSB(Result) ≠ MSB(A), then overflow occurs.
  • Conditions for overflow in subtraction (Result = A – B):
    • If MSB(A) ≠ MSB(B) and MSB(Result) ≠ MSB(A), then overflow occurs.
  • Conditions:
OperationConditionsResult
A + BA > 0, B > 0< 0
A + BA < 0, B < 0> 0
A – BA > 0, B < 0< 0
A – BA < 0, B > 0> 0
  • If the results are not feasible (e.g., adding two positive numbers but getting a negative result), the overflow flag is set, which implies an error if the system used is a signed number system.

Carry vs. Overflow (Recap)

  • Unsigned Numbers:
    • Carry = 1 always indicates an overflow (the new value is too large to be stored in the given number of bits).
    • The overflow flag means nothing in the context of unsigned numbers.
  • Signed Numbers:
    • Overflow = 1 indicates an overflow.
    • The carry flag can be set for signed numbers, but this does not necessarily mean an overflow has occurred.
  • Examples:
ExpressionResultCarry?Overflow?Correct Result?
0100 (+4) + 0010 (+2)0110 (+6)NoNoYes
0100 (+4) + 0110 (+6)1010 (-6)NoYesNo
1100 (-4) + 1110 (-2)1010 (-6)YesNoYes
1100 (-4) + 1010 (-6)0110 (+6)YesYesNo

Sign Extension

  • In two’s complement, sign extension is needed to convert a smaller size operand to a larger size operand.
  • Sign extension copies the sign bit (MSB) into the higher-order bits.
  • Its basically the method used when converting a smaller size to a larger size data.
  • For example, to convert a number stored in an 8-bit data type to a 16-bit data type variable, sign extension is used to ensure that the values in the two variables are the same.
  • Examples:
    • 8-bit: 10011010 (-102)
    • 16-bit Sign Extended: 1111111110011010 (-102)

Multi-Precision Arithmetic

  • If operands are larger than 32-bits (e.g., 64-bit operands) and we have only a single 32-bit ALU:
    • Reuse the 32-bit adder for multi-precision addition.
  • Multi-precision arithmetic involves the computation of numbers whose precision is larger than what is supported by the maximum size of the processor register (Single-Precision).
  • Example (64-bit addition using a 32-bit ALU):
ADD R0, R0, R1  ; add lower word with carry out
ADC R2, R2, R3  ; add upper word with carry in
  • The ARM registers are 32bit wide so at any instance, it can only store 32bit data.
  • To add two 64bit operands, we need to add the lower order word first, followed by the higher order word.
  • When adding the higher order word, the carry bit (C bit) has to be added as well.
  • Carry bit is only set if there are any ‘1’ overflowing from the MSB of the lower order word.
  • The ARM assembly instruction that adds the operands and the carry bit is ADC.

Fixed and Floating Point Number System

  • The two main types of number systems: fixed-point and floating-point.

Range and Precision

  • Range: Interval between the smallest (max-) and largest (max+) representable number.
    • Example: Range of two’s complement is (2N1)-(2^{N-1}) to (2N11)(2^{N-1} - 1)
      *Precision, on the other hand, is the amount of information used to represent each number. E.g. 1.666 represent information in a higher precision compared to 1.67.
  • Precision: Amount of information used to represent each number.
    • Example: 1.666 has higher precision than 1.67.
  • Precision corresponds to the interval between adjacent tick marks on the number line. Each tick mark corresponds to a representable number in the number system used.
  • In a binary number system, precision corresponds to the value represented by the LSB.

Fixed-Point Representation

  • Fixed-point format can represent integer and/or fractional values.
  • Similar to the decimal system, the binary system has a radix point too known as the binary point.
  • Fixed point system has a fixed resolution as its radix point position is fixed.
  • A 4-bit integer has a range of 0 to 15.
  • Binary points also has a fractional weightage.
  • The smallest resolution that this system can support determines its precision, and this correspond to its LSB.
  • Limitations:
    • Given a fixed total number of bits allocated for a number.
    • Precision is limited by the range of the integer.
    • If you allocate more bits for the integer, the range will increase but the precision will reduce.

Floating Point Representation

  • Three main fields:
    • Sign: positive/negative number
    • Mantissa: base value
    • Exponent: specifies position of radix point
  • Example: Simple decimal floating-point format:
    • Sign: ±
    • Mantissa: 9.99
    • Exponent: ±99
  • Representing number in this manner, we are able to ‘move’ the position of the radix point by changing the exponent value.
  • Comparing with a fixed point number, we can see that it doesn’t belong to a positional numbering system, actual value is calculated by evaluating the sign, mantissa and exponent value.

Floating Point Representation

  • The size of the exponent determines the range.
  • The size of the mantissa and the value of the exponent determine the precision.
  • Small numbers can be represented with good precision.
  • Large numbers may sacrifice precision to achieve a greater range.
  • Density of floating-point numbers is not uniform.
  • Floating point representation can represent values across a wide range (-9.99109910^{99} to +9.9910+9910^{+99}).

Normalisation

  • Multiple representations exist for the same value, so Normalisation is necessary to avoid synonymous representation by maintaining one non-zero digit before the radix-point.
  • In decimal number, this digit can be from 1 to 9
  • In binary number, this digit should be 1
  • Maximizes the number of bits of precision since the number of leading zeroes are minimised.

Underflow

  • Normalisation creates an underflow region.
  • Underflow region is close to zero and cannot be represented by the floating point number.
  • Underflow occurs when a value is too small to be represented.
  • Floating-point overflow and underflow can cause programs to crash if not handled properly.
  • Smallest positive normalized number +1 x 109910^{-99}
  • Smallest negative normalized number -1 x 109910^{-99}

IEEE 754

  • IEEE754 floating point number standard is used in the industry.
  • It is also the standard behind the floating point data type you declared in programming languages such as C.

IEEE 754 Floating Point Standard

  • Found in virtually every computer since 1980.
  • Simplified porting of floating-point numbers.
  • Unified the development of floating-point algorithms.
  • Single Precision (32-bits):
    • 1-bit sign + 8-bit exponent + 23-bit fraction.
  • Double Precision (64-bits):
    • 1-bit sign + 11-bit exponent + 52-bit fraction.
  • Note however that the ‘Exponent’ and ‘Fractional’ field specify in the IEEE754 format is not the Exponent and Mantissa value of a floating point number.

IEEE 754 Normalised Numbers

  • Sign bit:
    • S = 0 (positive); S = 1 (negative).
  • Fraction:
    • Assumes hidden 1 (not stored) for normalised numbers.
    • Value of normalized floating point number is: (1)S(-1)^S x (1+f<em>1x21+f</em>2x22+f<em>3x23+f</em>4x24+)2(1 + f<em>1x2^{-1} + f</em>2x2^{-2} + f<em>3x2^{-3} + f</em>4x2^{-4} + …)_2 x 2EBias2^{E-Bias}
  • Exponent:
    • Biased representation (00000001 to 11111110).
    • Value of exponent = E - Bias
    • Bias = 127 (Single Precision) and 1023 (Double Precision).
  • The Mantissa is derived by attaching a leading ‘1’ to the left of the fractional bits. Giving the expression 1.F.
  • The actual exponent value is given by the expression E-Bias where E is the value from the Exponent Field in a IEEE754 number representation and Bias is = 127 and 1023 for Single and Double precision IEEE754 number respectively.
  • The E field in IEEE754 is a positive number, Bias allows the actual exponent to span across positive and negative number ranges.

Converting Single Precision to Decimal

  • Example 1:
    • 0 10110010 1110000000000000000000
    • Sign = 0 (positive)
    • Exponent = 10110010210110010_2 = 178; E – Bias = 178 – 127 = 51
    • 1 + Fraction = (1.111)2(1.111)_2 = 1 + 212^{-1} + 222^{-2} + 232^{-3} = 1.875
    • Value in decimal = +1.875 x 2512^{51}
  • Example 2:
    • 1 00001100 01010000000000000000000
    • Sign = 1 (negative)
    • Exponent = 00001100200001100_2 = 12; E – Bias = 12 – 127 = -115
    • 1 + Fraction = (1.0101)2(1.0101)_2 = 1 + 222^{-2} + 242^{-4} = 1.3125
    • Value in decimal = -1.3125 x 21152^{-115}

Representable Range for Normalised Single Precision

  • Smallest magnitude normalized number
    *Normalized (+ve) X 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    Exponent = (00000001)2 = 1; E - Bias = 1 – 127 = -126
    1 + Fraction = (1.000…000)2 = 1
    Value in decimal = 1 x 2-126
    +2-126
  • Largest magnitude normalized number
  • X 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    Exponent = (11111110)2 = 254; E - Bias = 254 – 127 = 127
    1 + Fraction = (1.111…111)2 ≈ 2
    Value in decimal ≈ 2 x 2127 = 2128
    +2128
  • In normalised mode, exponent is from 00000001 to 11111110

IEEE 754 Encoding Mode

Encoding ModeSignExponentFraction
Normalized1 / 000000001 to 11111110Anything
Denormalized1 / 000000000Non zero
Zero1 / 0000000000000 … 0000
Infinity1 / 0111111110000 … 0000
Not a Number (NaN)1 / 011111111Non zero
  • In normalised mode, the E value ranges from 1 to 1111 1110 while the fraction can take on any value.
  • IEEE754 has a special Denormalised mode that allows it to represent numbers in its underflow region.
  • In denorm mode, the MSB of the mantissa is a 0 and not a 1, i.e. 0.F instead of 1.F. Exponent is given a value zero and Fraction can be any non-zero value.
  • Take note however that the exponent value used in the denorm mode is 21262^{-126} and not 21272^{-127}.
  • Zero, Infinity and NaN are special use cases and are represented by special numbers in E and Fraction.

Fixed Point vs Floating Point Number System

  • Given the same number of bits to represent a data, e.g. 32 bits.
  • Floating Point (IEEE754)
    • Max Range ≈ 2*21282^{128} (-21282^{128} to ~21282^{128} ).
    • Max Precision (near to zero) less than 21262^{-126}
  • Fixed Point
    • Max Range (Radix right of LSB, unsigned) ≈ 2322^{32}
    • Max Precision (Radix left of MSB) 2322^{-32}
  • Floating point yield a larger range and better precision at small numbers with the same number of bits representation.
  • One usually needs the best precision when the numbers are small.
  • However, Fixed point number has the advantage of having uniform precision across entire range.
  • Floating point number’s precision changes across the range and the very coarse precision at the two end of the range may not be desirable to an intended algorithm.