ALU
Arithmetic logic unit. Performs arithmetic operations and logic operations.
Amdahl's Law
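The law of diminishing returns for speedup in processors. The overall speedup from improving one part of a system is limited by the fraction of time that part is used.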
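A minimal sketch of the usual form of Amdahl's Law, assuming a fraction `f` of the run time benefits from an improvement by a factor `s`. The function name and parameters are illustrative and do not come from the deck.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of run time is sped up by factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    # The unimproved part (1 - f) keeps its original time; the improved part shrinks by s.
    return 1.0 / ((1.0 - f) + f / s)


# Speeding up 80% of the work by 10x gives far less than 10x overall.
print(round(amdahl_speedup(0.8, 10), 2))  # 3.57
```

Even as `s` grows without bound, the speedup in this example can never exceed 1 / (1 - 0.8) = 5, which is the diminishing-returns effect.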
CISC
Complex Instruction Set Computer. Design where a single instruction can execute several low-level operations (for example a memory load, an arithmetic operation, and a memory store).
Example: Intel
CPI
Cycles per Instruction. Average number of clock cycles each instruction takes. Used to measure performance: for the same instruction count and clock rate, a lower CPI means a faster program.
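A short sketch of how CPI feeds into execution time, using the standard relation CPU time = instruction count × CPI / clock rate. The function names and the instruction mix below are made-up illustrations, not values from the deck.

```python
def average_cpi(mix):
    """mix: list of (fraction_of_instructions, cycles_per_instruction) pairs."""
    return sum(fraction * cycles for fraction, cycles in mix)


def cpu_time_seconds(instruction_count, cpi, clock_hz):
    """CPU time = instruction count * CPI / clock rate."""
    return instruction_count * cpi / clock_hz


# Hypothetical mix: 50% ALU ops (1 cycle), 30% loads/stores (2), 20% branches (3).
cpi = average_cpi([(0.5, 1), (0.3, 2), (0.2, 3)])
print(round(cpi, 2))                               # 1.7
print(cpu_time_seconds(1_000_000, cpi, 1e9))       # seconds for 1M instructions at 1 GHz
```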
CPU
Central Processing Unit. Brain of the computer. Adds and tests numbers. Includes the ALU, control unit, and registers.
DMA
Direct Memory Access. Allows hardware to access RAM independent of the CPU
FP
Floating Point. Arithmetic approximation of real numbers
GFLOPS
Billion Floating Point Operations Per Second. Measurement of computer performance in units of 10^9 floating-point operations per second
GPU
Graphics Processing Unit. Renders images to display on a device. Ex: GeForce graphics card
INT
Integer. A whole number that can be positive, negative, or 0.
LRU
Least Recently Used. Replacement algorithm that evicts the page (or cache block) that has gone unused for the longest time.
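A minimal sketch of LRU replacement, counting page faults for a reference string. It assumes a fixed number of frames; the function name and the reference string are illustrative only.

```python
from collections import OrderedDict


def lru_page_faults(references, num_frames):
    """Count page faults under LRU replacement with num_frames frames."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    frames = OrderedDict()  # ordered from least to most recently used
    faults = 0
    for page in references:
        if page in frames:
            frames.move_to_end(page)        # hit: mark as most recently used
        else:
            faults += 1
            if len(frames) >= num_frames:
                frames.popitem(last=False)  # evict the least recently used page
            frames[page] = True
    return faults


print(lru_page_faults([1, 2, 3, 1, 4, 2], 3))  # 5
```

In the example, page 1 is reused before page 4 arrives, so page 2 is the least recently used and is the one evicted.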
LSB
Least Significant Bit. Bit in a binary number with the lowest place value (the rightmost bit)
Microcode
A low-level instruction set below machine code that implements each machine instruction as a sequence of hardware-level steps
MFLOPS
Million Floating Point Operations Per Second. Measurement of computer performance in units of 10^6 floating-point operations per second
MIPS
Million Instructions Per Second. Measurement of instruction execution rate. MIPS is also the name of an instruction set architecture (ISA) that is a RISC
MMU
Memory Management Unit. Translates virtual addresses to physical addresses
Moore's Law
The number of transistors on an integrated circuit doubles about every two years, often summarized as processing power for computers doubling every two years
MSB
Most Significant Bit. Bit in a binary number with the highest place value (the leftmost bit)
Pipeline
Overlapping multiple instructions/stages during execution, like an assembly line does
PC
Program Counter. Keeps track of the next instruction to be executed.
PLA
Programmable Logic Array. Device used to implement logic circuits
DRAM
Dynamic RAM. RAM that must be constantly refreshed.
RAM
Random Access Memory. Stores currently used data for quick access.
SRAM
Static RAM. RAM that does not need to be constantly refreshed.
RISC
Reduced Instruction Set Computer. An instruction set architecture (ISA) that can have a lower CPI than a CISC.
Example: MIPS
TLB
Translation Look-aside Buffer. Cache of recently used address translations; each entry's tag is a virtual page number that maps to a physical page number
TFLOPS
Trillion Floating Point Operations per Second. Measurement of computer performance in units of 10^12 floating-point operations per second
VHDL
VHSIC (Very High Speed Integrated Circuit) Hardware Description Language. Language that describes digital systems.
VHSIC
Very High Speed Integrated Circuit. U.S. program that researched integrated circuits
VLIW
Very Long Instruction Word. Architecture that exploits instruction level parallelism
VM
Virtual Memory. Simulated memory backed by a hard drive, so programs can use more memory than is physically installed in RAM.
IF
Instruction Fetch. Retrieves the next instruction from instruction memory at the address held in the PC
ID
Instruction Decode. Figures out what the instruction does and what registers to use
EX
Execute. Performs an instruction with the ALU
MEM
Memory. Accesses an operand in data memory
WB
Write Back. Writes the result back into the register.
What are the 3 hazards in pipelining, and how is each one overcome?
Structural: A required resource is busy. Solved by pipeline stalling or hardware redesign.
Data: Data from previous operation is not ready. Solved by pipeline stalling or forwarding.
Control: Wrong instruction is in the pipeline. Solved by pipeline stalling or branch prediction
Mac Pro
Desktop - General Purpose, variety of software
HP Itanium
Server - High performance, capacity, reliability
Toaster
Embedded - Stringent power/performance
DSP
Custom - Unique specialty circuitry
Watson
Supercomputer - Massive parallel processing
ARM Structure
Number of Registers: 32 Register Bit Length: 64 Instruction Bit Length: 32 Data Path Bit Length: 64
LEGv8 Structure
Number of Registers: 32 Register Bit Length: 64 Instruction Bit Length: 32 Data Path Bit Length: 64
To increment to the next instruction, what do we do to the existing PC and why?
Add 4 to the PC, because each instruction is 4 bytes.
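A tiny sketch of the PC update, assuming byte-addressed memory and fixed 4-byte (32-bit) instructions as the card states. The names and the starting address are hypothetical.

```python
INSTRUCTION_BYTES = 4  # each instruction is 32 bits = 4 bytes


def next_pc(pc: int) -> int:
    """Address of the next sequential instruction in byte-addressed memory."""
    return pc + INSTRUCTION_BYTES


pc = 0x1000  # hypothetical starting address
for _ in range(3):
    print(hex(pc))
    pc = next_pc(pc)
# prints 0x1000, 0x1004, 0x1008
```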
The Mill
- Operations execute in program order
- The compiler controls when ops issue and retire
- Short pipeline
- Is not yet silicon
- No rename registers to eliminate hazards
- Has no general registers since transient data lives on the Belt which is a FIFO
Neural Nets
- Inductive Reasoning. Given input and output data (training examples), we construct the rules
- Supervised and Unsupervised Training; Supervised: both inputs and outputs are provided. Unsupervised: Only input is provided.
- Good types of processors for NNs
-Software
Quantum Computers
Use quantum bits, or qubits. Instead of being only 1 or 0, a qubit can be 1, 0, or both at once; this is called superposition.
Who has the Most Powerful Computer?
China
Who Has the Largest Server Farm?
What is the difference between a stall and an interrupt?
A stall pauses the pipeline. An interrupt flushes the pipeline and jumps to a memory location to execute code.
What are three types of cache misses and their definitions?
Compulsory: First access to a block, which has never been in the cache.
Capacity: The cache is too small to hold every block the program needs, so a block is replaced and then accessed again later.
Conflict: Competition for entries in a set. Eliminated in a fully associative cache
Design a 256KB direct‐mapped data cache that uses a 32‐bit address and 8 words per block. Calculate the following: (a) How many bits are used for the byte offset and why?
(b) How many bits are used for the set (index) field?
(c) How many bits are used for the tag?
(a) How many bits are used for the byte offset and why? 5 bits because 8 words/block x 4 bytes/word = 32 bytes per block and 2^5 = 32
(b) How many bits are used for the set (index) field? 13 bits because 256 KB / 32 bytes per block = 8192 blocks and 2^13 = 8192
(c) How many bits are used for the tag? 14 bits because 32 - 13 - 5 = 14; 14 bits remain for the tag
What are the following bits/flags used for? Valid, Dirty, Reference
Valid: Cache loaded with valid data
Dirty: Cache changed since it was read from main memory.
Reference: Set when the entry is accessed; used to approximate LRU replacement
Cache Placement
Direct Mapped - each memory block has a specific cache block
Fully Associative - Any memory block can go in any cache block
Set associative - Each memory block maps to one specific set and can go in any cache block within that set
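The three placement schemes can be sketched with one function, since direct mapped is 1-way and fully associative is a single set holding every block. This assumes the common mapping set = block address mod number of sets, and (purely for illustration) that the ways of a set sit at consecutive cache block indices. The function name is hypothetical.

```python
def candidate_blocks(block_address, num_cache_blocks, ways):
    """Cache block indices where a memory block may be placed.

    ways == 1                 -> direct mapped
    ways == num_cache_blocks  -> fully associative
    anything in between       -> set associative
    """
    if ways < 1 or num_cache_blocks % ways != 0:
        raise ValueError("ways must evenly divide num_cache_blocks")
    num_sets = num_cache_blocks // ways
    set_index = block_address % num_sets
    first = set_index * ways  # illustrative layout: a set's ways are adjacent
    return list(range(first, first + ways))


# Memory block 12 in an 8-block cache:
print(candidate_blocks(12, 8, 1))  # [4]         direct mapped
print(candidate_blocks(12, 8, 2))  # [0, 1]      2-way set associative (set 0)
print(candidate_blocks(12, 8, 8))  # [0, 1, 2, 3, 4, 5, 6, 7]  fully associative
```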
Calculating Cache Sizes
[20 points] Design a 256KB direct-mapped data cache that uses a 32-bit address and 8 words per block. Calculate the following:
(a) How many bits are used for the byte offset?
(b) How many bits are used for the set (index) field?
(c) How many bits are used for the tag?
(a) How many bits are used for the byte offset?
8 words/block x 4 bytes/word = 32 bytes per block. 2^5 = 32. Therefore, 5 bits
(b) How many bits are used for the set (index) field?
256 KB / 32 bytes per block = 8192 blocks. 2^13 = 8192. Therefore, 13 bits are used.
(c) How many bits are used for the tag?
32-13-5 = 14 bits.
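The arithmetic above can be checked with a short sketch. It assumes 4-byte words and power-of-two sizes; the function name and the `bytes_per_word` parameter are illustrative.

```python
def cache_field_bits(cache_bytes, address_bits, words_per_block, bytes_per_word=4):
    """Return (byte offset bits, index bits, tag bits) for a direct-mapped cache.

    Assumes the block size and the number of blocks are powers of two.
    """
    block_bytes = words_per_block * bytes_per_word
    num_blocks = cache_bytes // block_bytes
    offset_bits = block_bytes.bit_length() - 1  # log2 of a power of two
    index_bits = num_blocks.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return offset_bits, index_bits, tag_bits


# 256 KB cache, 32-bit address, 8 words per block
print(cache_field_bits(256 * 1024, 32, 8))  # (5, 13, 14)
```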
North Bridge vs South Bridge
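Both are core logic chips on a motherboard. North bridge connects directly to the CPU and handles the high-performance tasks (memory and graphics). South bridge connects through the north bridge and handles slower I/O devices.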
**Multithreading
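Coarse Grained: Simplest. One thread runs until an event with big latency (such as a cache miss), then the processor switches to another thread.
Fine Grained: Switches threads every cycle, so instructions from one thread are followed by another and so forth.
Simultaneous: Most efficient. Issues instructions from multiple threads in the same cycle; combination of coarse grained and fine grained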
Strong Scaling vs. Weak Scaling
Strong: How solution time varies with # of processors for a fixed total problem size
Weak: How solution time varies with # of processors for a fixed problem size per processor
SISD/SIMD/MISD/MIMD
SISD: Single Instruction, Single Data (Has one control unit and one processing unit). Example: Intel
SIMD: Single Instruction, Multiple Data. Processes multiple data with the same instruction*. Example: Xplor
MISD: Multiple Instruction, Single Data Example: iWarp
MIMD: Multiple Instruction, Multiple Data. Example: Intel/
Shared Memory Multiprocessor
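UMA: Uniform Memory Access. All processors share the same physical memory, and access time is the same for every processor.
NUMA: Non-Uniform Memory Access. Each processor has its own local physical memory, so access time depends on which memory holds the data.
Synchronization and Locks: Synchronization coordinates processors that share data. A lock allows only one processor at a time to access the shared data.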
**Difference b/w shared memory multiprocessor and message-passing multiprocessor
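Shared memory multiprocessors communicate through loads and stores to a single shared address space, which is faster because a processor does not need to wait for a response to send. Message-passing multiprocessors have private memories and communicate by explicitly sending and receiving messages.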
Reliability: MTTF, MTBF, MTTR, AFR, Availability
MTTF: Mean Time To Failure - Improved by Fault Avoidance, Tolerance, & Forecasting
MTBF: Mean Time Between Failures. MTBF = MTTF + MTTR
MTTR: Mean Time to Repair
AFR: Annual Failure Rate. Fraction of devices expected to fail in a year for a given MTTF
Availability = MTTF / (MTTF + MTTR)
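A small sketch of the reliability formulas. Availability follows the card; MTBF = MTTF + MTTR and AFR = hours per year / MTTF are the standard textbook relations, and the function names and sample numbers are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(mttf, mttr):
    """Fraction of time the system is working."""
    return mttf / (mttf + mttr)


def mtbf(mttf, mttr):
    """Mean time between failures: time to fail plus time to repair."""
    return mttf + mttr


def annual_failure_rate(mttf_hours):
    """Expected fraction of devices that fail in one year."""
    return HOURS_PER_YEAR / mttf_hours


print(availability(99, 1))                 # 0.99
print(mtbf(99, 1))                         # 100
print(annual_failure_rate(1_000_000))      # 0.00876, i.e. 0.876% per year
```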
Problem Sets to Know:
ARM STRUCTURE FILL IN SHEET
Binary Logic Operation Gates
(INV, OR, AND, NOR, NAND, XOR, XNOR, MUX)
Cache problems
Cache Placement (Direct Mapped, Fully Associative, Set Associative)
Integer
- Hexadecimal, 2's Complement, Invert, Negate, AND, XNOR
-Single and Double Precision, Carry Look Ahead concepts, Fast Multiplication
Floating Point***: Single & Double Precision
Abstraction??
* Layers of Code
- MatLab, Office Products, Web Browsers
- C, C++, C#, Java, Fortran
- Assembly
- Binary
- Microcode
- Nanocode
* Assembler
*Linker
*Loader
RAID Level??
Redundant Array of Inexpensive Disks
0, 1, 2, 3, 4, 5, 6
What the various blocks are and what they do and why they do it??