CPU and GPU

ALU, CU, Registers and Buses

Control Unit

  • Co-ordinates all the activities of the CPU

  • Directs flow of data between CPU and other devices

  • Accepts next instruction, decodes it, handles its execution and stores the resulting data back in memory or registers

  • Sends memory read/write requests to main memory over the control bus

  • Co-ordinates and communicates with all parts of the CPU

Program Counter

  • Holds the address of the next instruction to be executed

  • At the start of every FDE cycle, address held in the PC is copied to the MAR

Memory Address Register

  • Holds the address of the memory location from which data and/or instructions are to be fetched, or to which data is to be written

  • Sends these addresses to RAM via address bus

Memory Data Register

  • Used to temporarily store data which is read from or written to memory

  • All data to and from memory is sent via data bus and through the MDR

Current Instruction Register

  • Holds the current instruction being executed

  • Contents of the MDR are copied to the CIR if it is an instruction

Accumulator

  • Data/control information is often stored in it

  • Results of calculations carried out by the ALU are temporarily stored here

  • General purpose register

Buses

Address Bus

CPU → RAM

Carries memory addresses that identify where the data is being read from or written to

Data Bus

CPU←→RAM

Carries the 1s and 0s that make up the actual information being transmitted around the CPU/computer

Control Bus

CPU←→RAM

Carries command and control signals to and from every other component of the CPU/computer

Decode Unit

Prepares the execution of an instruction by looking up the binary operation code in its table so the CPU knows what to do

Status Register

  • Contains information about the status of the processor

  • Flags can be checked at any point

Interrupt Register

  • Checked by the CPU to see if an interrupt is waiting to be processed

  • An interrupt can be raised by, for example, pressing or releasing a key

Cache

  • Small area of memory near CPU or on it

  • Provides fast access to frequently used instructions

L1

Usually part of the CPU chip itself

The smallest and fastest to access

L2/L3

Bigger than L1

Built between the CPU and RAM

Slightly longer to access than L1

The FDE Cycle

Computer: An electronic device that takes an input, processes data, and delivers an output

Fetch

  • Program counter is checked; holds address of the next instruction to be executed

  • Address is then copied into the MAR

  • Address is then sent along the address bus to main memory where it waits to receive a signal from the control bus

  • The CU sends a read signal along the control bus to main memory

  • The contents stored in memory at address #### can now be sent along the data bus to MDR

  • Data received by MDR from RAM is copied to CIR

  • Increment contents of the PC so the address it has can point to the next instruction to be executed

Decode

  • Instruction held by CIR is decoded by decode unit

  • Instruction is made up of

    • OPCODE: What to do

    • OPERAND: What to do it to
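As a rough illustration of the opcode/operand split, the sketch below assumes a made-up 8-bit instruction format (top 4 bits are the opcode, bottom 4 bits are the operand) and an invented opcode table. Real instruction sets use different widths and encodings.

```python
# Hypothetical 8-bit format: top 4 bits = opcode, bottom 4 bits = operand.
# The opcode table is invented for illustration only.
OPCODES = {0b0001: "LOAD", 0b0010: "ADD", 0b0011: "STORE", 0b1111: "HALT"}

def decode(instruction: int) -> tuple[str, int]:
    """Split an 8-bit instruction into (mnemonic, operand)."""
    opcode = (instruction >> 4) & 0b1111   # what to do
    operand = instruction & 0b1111         # what to do it to
    return OPCODES[opcode], operand

mnemonic, operand = decode(0b00010110)
print(mnemonic, operand)  # LOAD 6
```

The dictionary lookup stands in for the decode unit looking up the binary operation code in its table.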

Execute

  • Send address #### to MAR

  • As in this scenario we want to read the data stored in ####, the CU sends a signal along the control bus to RAM

  • Contents of the MDR are now copied to the accumulator

  • Instruction is now complete
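The whole cycle can be sketched as a toy simulator. Everything specific here is an assumption made for illustration: the instruction names (LDA, ADD, STA, HLT), the use of Python tuples for instructions, and the memory layout. Buses and control signals are not modelled, only the register transfers.

```python
# Toy machine: memory holds (opcode, operand) tuples for instructions and ints for data.
# Instruction names are hypothetical, chosen only for this sketch.
def run(memory: list) -> int:
    pc = 0    # Program Counter
    acc = 0   # Accumulator
    while True:
        # Fetch
        mar = pc              # address in the PC is copied to the MAR
        mdr = memory[mar]     # contents at that address arrive in the MDR
        cir = mdr             # the instruction is copied to the CIR
        pc += 1               # PC now points at the next instruction
        # Decode
        opcode, operand = cir
        # Execute
        if opcode == "LDA":       # load memory[operand] into the accumulator
            mar = operand
            mdr = memory[mar]
            acc = mdr
        elif opcode == "ADD":     # add memory[operand] to the accumulator
            mar = operand
            mdr = memory[mar]
            acc += mdr
        elif opcode == "STA":     # store the accumulator at memory[operand]
            mar = operand
            mdr = acc
            memory[mar] = mdr
        elif opcode == "HLT":     # stop and report the accumulator
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")

program = [
    ("LDA", 4),   # address 0
    ("ADD", 5),   # address 1
    ("STA", 6),   # address 2
    ("HLT", 0),   # address 3
    7, 5, 0,      # addresses 4-6: data
]
result = run(program)
print(result, program[6])  # 12 12
```

Instructions and data share one memory list here, which is also how the Von Neumann model stores them.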

Performance of the CPU

Clock Speed

  • Measured in Hz

  • Number of cycles per second

  • e.g. 3.2 GHz (gigahertz) = 3.2 billion cycles per second; this is not the same as 3.2 billion instructions, since an instruction can take more than one cycle
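A quick piece of arithmetic shows what a 3.2 GHz clock means for the length of one cycle. The function name is just a label chosen for this sketch.

```python
def cycle_time_ns(clock_hz: float) -> float:
    """Duration of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz

clock_hz = 3.2e9                  # 3.2 GHz = 3.2 billion cycles per second
print(cycle_time_ns(clock_hz))    # 0.3125
```

So each cycle lasts roughly a third of a nanosecond.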

Cache

  • Temporary storage for data/instructions being read from and written to memory

  • Located on/near the CPU

  • Stores copies of recent data/instructions

  • Much quicker to access than RAM
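To see why keeping copies of recent data helps, the sketch below counts hits and misses for a tiny cache. It assumes a least-recently-used replacement policy, which the notes do not specify, and it ignores cache lines and levels entirely.

```python
from collections import OrderedDict

def count_hits(addresses, cache_size):
    """Return (hits, misses) for a tiny least-recently-used cache."""
    cache = OrderedDict()
    hits = misses = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)         # mark as most recently used
        else:
            misses += 1                     # would need a slower trip to RAM
            cache[addr] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)   # evict the least recently used entry
    return hits, misses

# A loop that keeps revisiting the same few addresses
accesses = [0, 1, 2, 0, 1, 2, 0, 1, 2]
print(count_hits(accesses, cache_size=4))  # (6, 3)
```

Only the first visit to each address misses; every later visit is served from the cache. With a cache too small to hold the whole loop (try `cache_size=2`), every access in this pattern misses.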

Number of Cores

  • Each core is a complete copy of the CPU

  • CPUs with multiple cores have more power to run multiple programs at the same time

  • CPU cores have to communicate with each other, which takes time

  • Not all programs are designed with multiple cores in mind

Pipelining

The next instruction can be fetched while the processor is simultaneously performing arithmetic or logic operations in the ALU for a previous instruction

It is a much more efficient use of the various registers and onboard CPU cache. It allows different parts of instructions at different stages to be held in different registers at the same time

For example, one instruction is fetched whilst the prior one is being decoded and the one before that is being executed
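The overlap can be sketched with an idealised three-stage pipeline. The model assumes every stage takes exactly one clock cycle and that nothing ever stalls, which real processors do not guarantee.

```python
STAGES = ["Fetch", "Decode", "Execute"]

def cycles_without_pipeline(n_instructions: int) -> int:
    """Each instruction finishes all stages before the next one starts."""
    return n_instructions * len(STAGES)

def cycles_with_pipeline(n_instructions: int) -> int:
    """Once the pipeline is full, one instruction completes per cycle."""
    if n_instructions == 0:
        return 0
    return len(STAGES) + (n_instructions - 1)

def pipeline_timeline(n_instructions: int) -> list[dict[str, int]]:
    """For each cycle, map stage name -> index of the instruction in that stage."""
    rows = []
    for cycle in range(cycles_with_pipeline(n_instructions)):
        row = {}
        for offset, stage in enumerate(STAGES):
            instr = cycle - offset
            if 0 <= instr < n_instructions:
                row[stage] = instr
        rows.append(row)
    return rows

for cycle, row in enumerate(pipeline_timeline(4), start=1):
    print(cycle, row)
print(cycles_without_pipeline(4), cycles_with_pipeline(4))  # 12 6
```

In the third cycle the timeline shows instruction 2 being fetched, instruction 1 being decoded and instruction 0 being executed.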

Arithmetic Pipeline

Consists of the parts of an arithmetic operation that can be broken down and overlapped as they are carried out.

Allows multiple instructions to be executed simultaneously

Downsides of Pipelining

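A program with a lot of branching instructions (ELSE IF) may not benefit from pipelining: when a branch is taken, the instructions already fetched into the pipeline are the wrong ones and have to be discarded. Discarding them is called flushing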
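A back-of-the-envelope model of the cost of flushing is sketched below. It assumes a three-stage pipeline with one cycle per stage, no branch prediction, and that every taken branch throws away the instructions fetched behind it. These are simplifications chosen for illustration.

```python
def pipelined_cycles(n_instructions: int, taken_branches: int, stages: int = 3) -> int:
    """Cycles for an idealised pipeline where each taken branch flushes
    the (stages - 1) instructions that were fetched behind it."""
    if n_instructions == 0:
        return 0
    ideal = stages + (n_instructions - 1)           # no flushes at all
    flush_penalty = taken_branches * (stages - 1)   # wasted cycles per flush
    return ideal + flush_penalty

print(pipelined_cycles(100, taken_branches=0))   # 102
print(pipelined_cycles(100, taken_branches=30))  # 162
```

Under these assumptions, the more branches are taken, the closer the pipelined cycle count drifts back towards the unpipelined one.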

CISC and RISC

Instruction Set

The set of all machine code instructions that a processor can execute

CISC = Complex Instruction Set Computer

RISC = Reduced Instruction Set Computer

CISC

Completes the task in as few lines of assembly as possible

Processor hardware and circuitry have to be more complicated so that the processor can understand and execute a series of operations.

  • More memory efficient: Because CISC instructions are more complex, they require fewer instructions to perform complex tasks, which can result in more memory-efficient code.

  • Widely used: CISC processors have been in use for a longer time than RISC processors, so they have a larger user base and more available software.

  • Slower execution: CISC processors take longer to execute instructions because they have more complex instructions and need more time to decode them.

  • Higher power consumption: CISC processors consume more power than RISC processors because of their more complex instruction sets.

RISC

Aims to use simple instructions that are each executed within a single machine / clock cycle

RISC instructions require fewer transistors and less complex hardware, leaving more room for general purpose registers and cache

As all instructions are uniform in terms of their execution time, pipelining is possible

Very popular in low power and portable devices e.g. smart TV, smartwatches, tablets, etc.

  • More instructions required: RISC processors require more instructions to perform complex tasks than CISC processors.

  • Increased memory usage: RISC processors require more memory to store the additional instructions needed to perform complex tasks.
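The trade-off can be sketched by multiplying two values held in memory in both styles. The mnemonics in the comments (MULT, LOAD, MUL, STORE), the register names and the addresses are hypothetical and not taken from any real instruction set.

```python
def run_cisc(memory: dict) -> int:
    """One complex instruction: MULT 0x10, 0x11 (read, multiply, write back)."""
    memory[0x10] = memory[0x10] * memory[0x11]
    return 1   # instructions executed

def run_risc(memory: dict) -> int:
    """The same task as four simple instructions using registers."""
    registers = {}
    registers["R1"] = memory[0x10]                        # LOAD  R1, 0x10
    registers["R2"] = memory[0x11]                        # LOAD  R2, 0x11
    registers["R1"] = registers["R1"] * registers["R2"]   # MUL   R1, R2
    memory[0x10] = registers["R1"]                        # STORE R1, 0x10
    return 4   # instructions executed

cisc_memory = {0x10: 6, 0x11: 7}
risc_memory = {0x10: 6, 0x11: 7}
print(run_cisc(cisc_memory), cisc_memory[0x10])  # 1 42
print(run_risc(risc_memory), risc_memory[0x10])  # 4 42
```

Both versions leave the same result in memory; the RISC version needs more instructions, but each one is simple and uniform.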

Von Neumann and Harvard

Von Neumann

Shared memory space for instructions and data

Instructions and data are stored in the same format

A single CU/processor follows a linear FDE cycle

Registers are used as fast access to data/instructions

Harvard

Instructions/data are stored in separate memory units

Each has its own bus

Reading/writing data can be done at the same time as fetching instructions

Used by RISC processors

SIMD

Single Instruction; Multiple Data

A form of parallel processing in which a processor carries out a single instruction on multiple data items at the same time; used by graphics processors
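The idea can be modelled conceptually as one operation applied across every element ("lane") of a vector. Python runs the loop below one element at a time, so this only pictures what SIMD hardware does in a single instruction; the function name and the pixel example are assumptions made for illustration.

```python
def simd_add(lanes_a: list[int], lanes_b: list[int]) -> list[int]:
    """Model one vector ADD: the same operation applied to every lane."""
    if len(lanes_a) != len(lanes_b):
        raise ValueError("both vectors need the same number of lanes")
    return [a + b for a, b in zip(lanes_a, lanes_b)]

# Brightening four pixel values with a single (modelled) instruction
pixels = [10, 20, 30, 40]
brightness = [5, 5, 5, 5]
print(simd_add(pixels, brightness))  # [15, 25, 35, 45]
```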

MIMD

Multiple Instructions; Multiple Data

Multiple instructions are carried out on multiple data items across several cores

Multi-Core and Parallel Systems

Multi-core Processors

Single chip containing 2+ independent processors (cores), integrated into a chip multiprocessor (CMP); a CMP contains more than one CPU core

Enhancements

Doubling the number of cores does not double performance because

  • Overheads involved with inter-core communication

  • Some programs can’t make maximum use of all cores
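One standard way to put numbers on the second point is Amdahl's law, which these notes do not name; it is brought in here as an illustration. It assumes a fixed fraction of the program can be parallelised perfectly and ignores communication overhead, so real speedups tend to be lower still.

```python
def speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: overall speedup when only part of a program can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# A program where 80% of the work can be spread across cores
print(round(speedup(0.8, 2), 2))  # 1.67
print(round(speedup(0.8, 4), 2))  # 2.5
```

Going from 2 to 4 cores doubles the hardware but only raises the speedup from about 1.67x to 2.5x, because the serial 20% still runs on one core.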

Parallel Processing

The processing of program instructions by dividing them between multiple processors or processor cores; running program takes less time
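Dividing work between several workers can be sketched with Python's standard `concurrent.futures` module. The helper names `chunked` and `parallel_sum` are made up for this example. In CPython, threads do not normally execute Python code on several cores at once, so treat this as a picture of how the work is split and recombined, not as a benchmark; `ProcessPoolExecutor` from the same module is the standard-library route to separate processes.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(items: list, n_chunks: int) -> list[list]:
    """Split items into at most n_chunks roughly equal slices."""
    if not items:
        return []
    size = -(-len(items) // n_chunks)   # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def parallel_sum(numbers: list[int], workers: int = 4) -> int:
    """Sum a list by giving each worker one chunk, then combining the results."""
    chunks = chunked(numbers, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))   # each worker sums one chunk
    return sum(partial_sums)                         # combine the partial results

print(parallel_sum(list(range(1, 101))))  # 5050
```

The final combining step is part of the coordination overhead that comes with parallel processing.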

Parallel VS Concurrent

Parallel

Concurrent

1+ Processors are executing separate instructions at the same time

When more than 1 process is running from a program at once

Several tasks being performed at the same time

Increased program throughput

Huge performance increase for graphics processing

Saves processing time

Potential slowdown if many users request a similar action

GPUs and their Uses

Co-Processor

Any additional processor used for a specialized task

Improves overall speed of the computer

Executes concurrently with the main CPU

GPU

Graphics Processing Unit (also used for A.I.)

Each individual GPU core is slower than a CPU core

Highly specialized

Superior in speed and efficiency but only for certain tasks

Used for machine learning

Better for simple operations for large data sets

Performs the same calculation on vectors and multiple data items at the same time

Rendering graphics is easier

Pa

CPU

Excels at performing complex operations on small data sets

Serial Processor