Computer Architecture Notes
Computer Architecture
General Information
- Course: Computer Architecture (CSE 2213)
- Instructor: Nawshin Tabassum Tanny
- Lecturer, Department of CSE, AUST
- Email: tanny.cse@aust.edu
- Contact: 01644681387
- Room: 7A01/J
- Acknowledgement: Some materials are developed and copyrighted by Swapna S. Gokhale.
What Will We Learn?
- Computer Architecture:
- The science and art of designing the hardware/software interface.
- Designing, selecting, and interconnecting hardware components to create a computing system.
- Meeting functionality requirements, performance, energy consumption, cost, and other specific goals.
Tasks of a Computer Architect
- Determine which attributes are important for a new computer.
- Design a computer to maximize performance and energy efficiency while staying within cost, power, and availability constraints.
- Instruction set design
- Functional organization
- Logic design
- Implementation; which encompass
- integrated circuit design
- packaging
- power and cooling
- Optimizing the design.
What is “Computer Architecture”?
- Computer Architecture = Instruction Set Architecture + Computer Organization
- Instruction Set Architecture (ISA)
- WHAT the computer does (logical view)
- Computer Organization
- HOW the ISA is implemented (physical view)
- Both will be studied in this course.
Instruction Set Architecture
- Instruction set architecture is the attributes of a computing system as seen by the assembly language programmer or compiler.
- Instruction Set (what operations can be performed?)
- Instruction Format (how are instructions specified?)
- Data storage (where is data located?)
- Addressing Modes (how is data accessed?)
- Exceptional Conditions (what happens if something goes wrong?)
Computer Organization
- Computer organization is the view of the computer that is seen by the logic designer. This includes
- Capabilities & performance characteristics of functional units (e.g., registers, ALU, shifters, etc.).
- Ways in which these components are interconnected
- How information flows between components
- Logic and means by which such information flow is controlled
- Coordination of functional units
What is a Computer?
- A computer is an electronic calculating machine that:
- Accepts digitized input information,
- Processes the information according to a list of internally stored instructions and
- Produces the resulting output information.
- The list of instructions is called a computer program, and the internal storage is called computer memory.
- Functions performed by a computer are:
- Accepting information to be processed as input.
- Storing a list of instructions to process the information.
- Processing the information according to the list of instructions.
- Providing the results of the processing as output.
Basic Functional Units of a Computer
- Functional units:
- Input: Accepts information (human operators, electromechanical devices, other computers).
- Output: Sends results of processing (monitor display, printer).
- Memory: Stores information (instructions, data).
- Arithmetic and Logic Unit (ALU): Performs operations on the input information as determined by instructions in the memory.
- Control Unit: Coordinates various actions (input, output, processing).
Information in a Computer - Instructions
- Instructions are explicit commands that:
- Transfer information within a computer (e.g., from memory to ALU).
- Transfer of information between the computer and I/O devices (e.g., from keyboard to computer, or computer to printer).
- Perform arithmetic and logic operations (e.g., Add two numbers, Perform a logical AND).
- A sequence of instructions to perform a task is called a program, which is stored in the memory.
- Processor fetches instructions that make up a program from the memory and performs the operations stated in those instructions.
- Instructions operate upon data.
Information in a Computer - Data
- Data are the “operands” upon which instructions operate.
- Data could be:
- Numbers,
- Encoded characters.
- Data, in a broad sense means any digital information.
- Computers use data that is encoded as a string of binary digits called bits.
Input Unit
- Input Unit:
- Interfaces with input devices.
- Accepts binary information from the input devices.
- Presents this binary information in a format expected by the computer.
- Transfers this information to the memory or processor.
Memory Unit
- Memory unit stores instructions and data.
- Data is represented as a series of bits.
- The memory contains a large number of semiconductor storage cells each capable of storing one bit of information.
Memory Unit (Contd..)
- Processor reads instructions and reads/writes data from/to the memory during the execution of a program.
- In theory, instructions and data could be fetched one bit at a time.
- In practice, a group of bits is fetched at a time.
- Group of bits stored or retrieved at a time is termed as “word”
- Number of bits in a word is termed as the “word length” of a computer. Typical word lengths range from 16 to 64 bits.
- "Address” is associated with each word location, addresses are numbers that identify successive locations. (Memory address)
Memory Unit (Contd..)
- Processor reads/writes to/from memory based on the memory address:
- Access any word location in a short and fixed amount of time based on the address.
- Random Access Memory (RAM) provides fixed access time independent of the location of the word.
- Access time is known as “Memory Access Time”.
- Memory and processor have to “communicate” with each other in order to read/write information.
- In order to reduce “communication time”, a small amount of RAM (known as Cache) is tightly coupled with the processor.
- Modern computers have three to four levels of RAM units with different speeds and sizes:
- Fastest, smallest known as Cache
- Slowest, largest known as Main memory.
Memory Unit (Contd..)
- There are 2 classes of storage called primary and secondary.
- Primary storage of the computer consists of RAM units.
- Fastest, smallest unit is Cache.
- Slowest, largest unit is Main Memory.
- Primary storage is insufficient to store large amounts of data and programs.
- Primary storage can be added, but it is expensive.
- Store large amounts of data on secondary storage devices:
- Magnetic disks and tapes,
- Optical disks (CD-ROMS).
- Access to the data stored in secondary storage in slower, but take advantage of the fact that some information may be accessed infrequently.
- Cost of a memory unit depends on its access time, lesser access time implies higher cost.
Arithmetic and Logic Unit (ALU)
- Most computer operations are executed in the Arithmetic and Logic Unit (ALU).
- Arithmetic operations such as addition, subtraction.
- Logic operations such as comparison of numbers.
- In order to execute an instruction, operands need to be brought into the ALU from the memory.
- Operands are stored in general purpose registers available in the ALU.
- Access times of general purpose registers are faster than the cache.
- Results of the operations are stored back in the memory or retained in the processor for immediate use.
Output Unit
- Output Unit:
- Interfaces with output devices.
- Accepts processed results provided by the computer in specific binary form.
- Convert the information in binary form to a form understood by an output device and send processed results to the outside world.
Control Unit
- Operation of a computer can be summarized as:
- Accepts information from the input units (Input unit).
- Stores the information (Memory).
- Processes the information (ALU).
- Provides processed results through the output units (Output unit).
- Operations of Input unit, Memory, ALU and Output unit are coordinated by Control unit.
- Instructions control “what” operations take place (e.g. data transfer, processing).
- Control unit generates timing signals which determines “when” a particular operation takes place.
How are the Functional Units Connected?
- Functional units need to communicate with each other.
- In order to communicate, they need to be connected.
- Functional units may be connected by a group of parallel wires.
- The group of parallel wires is called a bus.
- Each wire in a bus can transfer one bit of information.
- The number of parallel wires in a bus is equal to the word length of a computer
Bus Structures
- A group of lines that serves a connecting path for several devices is called a bus
- In addition to the lines that carry the data, the bus must have lines for address and control purposes
- The simplest way to interconnect functional units is to use a single bus (Single bus structure)
Drawbacks & Advantages of the Single Bus Structure
- The devices connected to a bus vary widely in their speed of operation
- Some devices are relatively slow, such as printer and keyboard
- Some devices are considerably fast, such as optical disks
- Memory and processor units operate are the fastest parts of a computer
- Efficient transfer mechanism thus is needed to cope with this problem
- A common approach is to include buffer registers with the devices to hold the information during transfers
- Advantages of the Single Bus Structure:
- Low cost
- Flexibility for attaching peripheral devices
Organization of Cache and Main Memory
- Cache memory is faster than main memory.
Computer Components: Top-Level View
- Key components include the processor, memory, input/output, and the system bus.
- Processor contains
- Control unit
- ALU
- Registers (PC, IR, MAR, MDR, General-purpose registers)
Basic Operational Concepts
- Activity in a computer is governed by instructions.
- To perform a task, an appropriate program consisting of a list of instructions is stored in the memory.
- A Program = A sequence of instructions: Assembly language or Machine language instructions
- Individual instructions are brought from the memory into the processor, which executes the specified operations.
- Data to be used as operands are also stored in the memory.
A Typical Instruction
- MOV LOCA, R0
- General format: Instruction = Operation sourceoperand destinationoperand
- Moves the operand at memory location LOCA to the operand in a register R0 in the processor.
- Simply: Moves the contents of Memory Location LOCA to the processor register R0
- The original contents of LOCA are preserved.
- The original contents of R0 is overwritten.
- Instruction that Moves data from Memory to Register is called LOAD instruction (e.g., MOV LOCA, R0)
- Instruction that moves data from Register to Memory is called STORE instruction (e.g., MOV R0, LOCA)
Another Typical Instruction
- ADD LOCA, R0
- General format: Instruction = Operation Sourceoperand Destinationoperand
- Add the operand at memory location LOCA to the operand in a register R0 in the processor.
- Place the sum into register R0.
- The original contents of LOCA are preserved.
- The original contents of R0 is overwritten.
- Instruction is fetched from the memory into the processor – the operand at LOCA is fetched and added to the contents of R0 – the resulting sum is stored in register R0.
Load and Store Instructions to Transfer From/To Memory to/from Registers
- Summary:
- MOV LOCA, R1 = means => Bring the content of memory location A into Register R1
- MOV R2, LOCB = means => save the value of register R2 in memory location B
- ADD R1, R0 == means => R0 ß [R0] + [R1] (Add the contents of both the registers R0 and R1 and store into register R0
- For ADD, whose contents will be overwritten? (R0)
- Load and Store Instructions
- LOAD LOCA, R1 equivalent to MOV LOCA, R1
- STORE R2, LOCB equivalent to MOV R2, LOCB
Examples of a Few Registers:
- Instruction register (IR): Holds the instruction that is currently executing by the CPU
- Program counter register (PC): Points to (i.e., holds the address of) the next instruction that will be fetched from the memory to be executed by the CPU
- General-purpose registers (R0 – Rn-1): generally holds the operands for executing the instructions of current program
- Memory address register (MAR): Holds the memory address to be read. A read signal from the CPU to the memory module reads the word address held by the MAR register
- Memory data register (MDR): Contains the data to be written into or read out of the addressed location i.e Facilitates the transfer of operands/data to/from Memory from/to the CPU.
Executing a Program … Basic Operating Steps
- Programs reside in the main memory (RAM) through input devices
- PC register’s value is set to the first instruction
- Repeat the following Steps Until the “END” instruction is executed
- Instruction fetch:
- The contents of PC are transferred to MAR
- A Read signal is sent by CU to the memory
- The Memory module reads out the location addressed by MAR register. The contents of that location is loaded into (returned by) MDR
- The contents of MDR are transferred to IR register
- Decode and execute
- At this point, the instruction is ready to be decoded and executed. Instruction in the IR is examined (decoded) to determine which operation is to be performed.
- Get operands for ALU: Fetch the operands from the memory or registers.
- Instruction fetch:
Executing a Program … Basic Operating Steps…
- The operand may already in a General-purpose register
- Or, may be fetched from Memory (send address to MAR – send Read signal to Memory module – Wait for MFC signal (WMFC) from Memory – Get the operand/data from MDR)
- Perform operation in ALU
- Store the result back
- Store in a general-purpose register
- Or, store into memory (send the write address to MAR, and send result to MDR – Write signal to Memory – WMFC)
- WMFC = Wait for Memory Function Complete Signal
- Meanwhile, PC is incremented to the next instruction
- Some Examples: Add R0, R1; Add (R0), R1; Add 50(R0), R1;
Interrupt
- Normal execution of programs may be interrupted if some device requires urgent servicing
- To deal with the situation immediately, the normal execution of the current program must be interrupted
- Procedure of interrupt operation
- The device raises an interrupt signal
- The processor provides the requested service by executing an appropriate interrupt-service routine
- The state of the processor is first saved before servicing the interrupt
- Normally, the contents of the PC, the general registers, and some control information are stored in memory
- When the interrupt-service routine is completed, the state of the processor is restored so that the interrupted program may continue
Classes of Interrupts
- Program
- Generated by some condition that occurs as a result of an instruction execution such as arithmetic overflow, division by zero, attempt to execute an illegal machine instruction, or reference outside a user’s allowed memory space
- Timer
- Generated by a timer within the processor. This allows the operating system to perform certain functions on a regular basis
- I/O
- Generated by an I/O controller, to signal normal completion of an operation or to signal a variety of error conditions
- Hardware failure
- Generated by a failure such as power failure
Software
- In order for a user to enter and run an application program, the computer must already contain some system software in its memory
- System software is a collection of programs that are executed as needed to perform functions such as
- Receiving and interpreting user commands
- Running standard application programs such as word processors, etc, or games
- Managing the storage and retrieval of files in secondary storage devices
- Controlling I/O units to receive input information and produce output results
Software
- Translating programs from source form prepared by the user into object form consisting of machine instructions
- Linking and running user-written application programs with existing standard library routines, such as numerical computation packages
- System software is thus responsible for the coordination of all activities in a computing system
Operating System
- Operating system (OS)
- This is a large program, or actually a collection of routines, that is used to control the sharing of and interaction among various computer units as they perform application programs
- The OS routines perform the tasks required to assign computer resource to individual application programs
- These tasks include assigning memory and magnetic disk space to program and data files, moving data between memory and disk units, and handling I/O operations
Performance
- The most important measure of a computer is how quickly it can execute programs i.e., Runtime of programs. The speed with which a computer executes programs is affected by the design of its hardware and its machine language instructions. Because programs are usually written in a high-level language, performance is also affected by the compiler that translates programs into machine languages.
- For best performance, the following factors must be considered
- Compiler
- Instruction set
- Hardware design
Performance
- Three factors affect performance:
- Hardware design (e.g., CPU clock rate)
- 1GHz CPU => 1 Billion Hz => 10^9 clock cycles/sec (Hz=cycles/sec)
- 1 basic operation (e.g., integer addition) possible in 1 cycle => 1 billion basic operations (10^9 integer additions!) possible in 1 sec!!! WOW!!!
- 1Mhz => 1 Million Hz => 10^6 clock cycles/sec
- Instruction set architecture (ISA) (e.g., CISC or RISC ISA?)
- CISC => instructions complex, more capable, but runs slower
- RISC => instructions Simple, runs faster, but less capable
- Compiler (how efficient your compiler to optimize your code for pipelining…etc?)
- Hardware design (e.g., CPU clock rate)
Performance
- Processor circuits are controlled by a timing signal called a clock
- The clock defines regular time intervals, called clock cycles
- To execute a machine instruction, the processor divides the action to be performed into a sequence of basic steps, such that each step can be completed in one clock cycle
- Let the length P of one clock cycle, its inverse is the clock rate, R=1/P
Processor Clock
- Clock, clock cycle, and clock rate
- Clock Rate = 1 GHz = 10^9 Hz = 10^9 cycles/second or 10^9 clock pulses per second !!! WOW!!! It also means it has a Clock Cycle of 1/10^9 = 10^{-9} sec = 1 ns (nano-second).
- 4GHz CPU => 4x10^9 cy/sec => 1 clock cycle = 0.25 ns
- 500 MHz => 500x10^6 cycles/sec => 2 ns clock pulses
- 1 MHz = 10^6 cycles/sec; 1KHz=10^3 cycles/sec
- 1GHz=1000MHz, 1MHz=1000KHz, 1KHz=1000Hz
- Hz (Hertz) – cycles per second (clock cycles / second)
Basic Performance Equation
T – processor time required to execute a program that may have been prepared in high- level language
N – Dynamic Instruction Count. It is the number of actual machine language instructions needed to complete the execution (note: A single 1-line loop may execute more than a billion times !!! )
S – average number of basic steps (or, clock cycles) needed to execute one machine instruction. Each basic step completes in one clock cycle. Unit: cycles/instruction
R – clock rate: cycles/sec
Note: these are not independent to each other
How to improve T?
- reduce N x S, Increase R
Formula: T = \frac{N \times S}{R}
Basic Performance Equation
- T–program execution time. Unit: second
- N – Unit: instructions
- S – Unit: cycles/instructions
- R–clock rate: cycles/second
- Example: A program with dynamic instruction count (N) of 1000 instructions, each instruction taking 5 cycles on average (S=5 cycles/instruction) and running at a speed of 1KHZ (R = 10^3 0r 1000 cycles/second), what will be the program execution time T?
- Ans: T= \frac{1000 \times 5}{1000} = 5
Overview
- The execution time T of a program that has a dynamic instruction count N is given by: T = \frac{N \times S}{R}
- Here S is the average number of clock cycles it takes to fetch and execute one instruction, and R is the clock rate. (The dynamic instruction count N is computed considering loops, repeated function calls, recursion, etc!)
- Instruction throughput is defined as the number of instructions executed per second.
Performance Improvement
- Pipelining and superscalar operation
- Pipelining: by overlapping the execution of successive instructions
- Superscalar: different instructions are concurrently executed with multiple instruction pipelines. This means that multiple functional units are needed
- Clock rate improvement
- Improving the integrated-circuit technology makes logic circuits faster, which reduces the time needed to complete a basic step
Performance Improvement
- Reducing amount of processing done in one basic step also makes it possible to reduce the clock period, P.
- However, if the actions that have to be performed by an instruction remain the same, the number of basic steps needed may increase
- Reduce the number of basic steps to execute
- Reduced instruction set computers (RISC) and complex instruction set computers (CISC)
Improving Performance: Effect of Instruction Set Architectures (ISA), E.G., CISC and RISC ISA
- Reduced Instruction Set Computers (RISC): simpler instructions => N ↑, S ↓, Better than CISC, because Pipelining is more effective for RISC!!
- Complex Instruction Set Computers (CISC): Complex instructions => N ↓ , S ↑, Not Good, As not suitable for Pipelining!! Instructions complex, more capable => the program gets smaller in size (reduced N), but complex instructions increase S and hampers/stalls pipeline. Example of CISC: Intel processors
- So, A key consideration is the use of Pipelining
- S is close to 1, means the number of cycles per instruction is nearly ideal / small (close to 1) (e.g. RISC processors)
- RISC is Better, because easier to implement efficient pipelining with simpler instruction sets. (example of RISC architecture: ARM processors
Performance Measurement
- T is difficult to compute. Also, T has inappropriate unit (second) for commercial use.
- Measure computer performance using benchmark programs (a set of sample programs, e.g., word processing programs, games, media (audio/video) playback , I/O intensive programs, etc …).
- System Performance Evaluation Corporation (SPEC) selects and publishes representative application programs for different application domains, together with test results for many commercially available computers.
- Reference computer: A previous, renowned computer system, picked by SPEC
Reference Book
- Computer Organization: Carl Hamacher