benchmarking
assessing performance of a system in comparison to other systems
MIPS
millions of instructions per second
doesn’t account for instruction complexity so can’t be used to compare different architectures
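A minimal sketch of the MIPS calculation from the definition above; the instruction count and execution time are invented for illustration.

```python
def mips(instruction_count, execution_time_s):
    # MIPS = instructions executed / (execution time in seconds * 10^6)
    return instruction_count / (execution_time_s * 1e6)

# e.g. 80 million instructions executed in 2 seconds
print(mips(80_000_000, 2.0))  # 40.0
```

Note that two machines with the same MIPS rating may do very different amounts of useful work per instruction, which is the criticism above.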
FLOPS
floating point operations per second
focuses on a subset of instructions with similar complexity
no consensus on what a floating point operation is
synthetic benchmark
program used to measure performance of a system
Whetstone, Linpack, Dhrystone
easy to optimise for, as it’s 1 program, so not a good benchmark
benchmark suites
test suites rather than 1 program
harder to optimise
new benchmarks can be swapped in or out
SPEC
standard performance evaluation corporation
benchmark suite for programs with integer/floating point operations
CINT2017 (integer operations)
CFP2017 (floating point operations)
geometric mean
multiply all N numbers together
take the Nth root (N = number of elements)
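The geometric mean described above can be sketched as follows; the ratios are invented example benchmark scores.

```python
import math

def geometric_mean(ratios):
    # multiply all N numbers together, then take the Nth root
    product = math.prod(ratios)
    return product ** (1 / len(ratios))

# e.g. speedup ratios from three hypothetical benchmarks
print(geometric_mean([2.0, 8.0, 4.0]))  # approximately 4.0
```

SPEC reports suite results this way because the geometric mean of ratios is independent of which machine is chosen as the reference.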
execution cycles equation
execution cycles = clock cycles per instruction x number of instructions
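The equation above can be worked through with a hypothetical instruction mix (all counts and CPI values invented for illustration).

```python
# (instruction count, clock cycles per instruction) for each instruction class
mix = [(5000, 1), (2000, 3), (1000, 4)]

instructions = sum(count for count, _ in mix)          # 8000
cycles = sum(count * cpi for count, cpi in mix)        # total execution cycles
avg_cpi = cycles / instructions                        # average CPI

print(cycles)    # 15000
print(avg_cpi)   # 1.875
```

This also shows why RISC and CISC pull on different terms of the equation: RISC lowers CPI, CISC lowers the instruction count.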
RISC pros
decreases execution times by reducing clock cycles per instruction
fewer instructions, all with a fixed length
CU hardwired for maximum speed
instructions take 1 clock cycle to execute so easy pipelining
RISC cons
long programs
only load/store instructions for memory access
few addressing modes
requires more registers so expensive
CISC pros
improves performance by reducing the number of instructions per program
lots of complex instructions with variable length
easier to program as fewer instructions needed for complex programs
many instructions can access memory with many addressing modes
fewer registers needed as fewer operands per instruction
CISC cons
CU needs special circuits for interpreting instructions
instructions take multiple clock cycles to execute so harder to pipeline
parameter passing through memory puts strain on the memory bottleneck
delayed branching
fetch/execute the instructions placed after the conditional branch while the branch resolves (the branch delay slot)
3 types of branch predictions
random: just hope right one is picked
static: analyse code at compile time, correct 80% of time
dynamic: keep track of how often each branch is taken during run time
needs training time to warm up
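One common dynamic scheme (not necessarily the exact one these notes refer to) is a 2-bit saturating counter per branch; a minimal sketch, with an invented outcome history:

```python
# 2-bit saturating counter: states 0-1 predict "not taken", 2-3 predict "taken"
class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start weakly taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # move towards the observed outcome, saturating at 0 and 3
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
history = [True, True, False, True, True]  # actual branch outcomes
correct = sum(p.predict() == outcome or p.update(outcome) or False
              for outcome in history if (p.update(outcome) or True))
```

The saturating counter is why a single anomalous outcome in a mostly-taken branch causes only one misprediction, and also why the predictor needs a few executions of "training time" before its counters reflect real behaviour.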
4 methods of code optimisation
use appropriate data types
eliminate unnecessary branches by combining conditions
use multiplication instead of division as it is a simpler computation
profile the program to identify the parts that use the most processor time
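Two of the methods above (combining conditions, preferring multiplication over division) can be sketched with a hypothetical before/after pair; both functions and their inputs are invented for illustration.

```python
# before: nested branches and a division on every loop iteration
def scale_slow(values, divisor):
    out = []
    for v in values:
        if v > 0:
            if v < 100:                 # two separate branches
                out.append(v / divisor)  # N divisions
    return out

# after: conditions combined into one test, one division hoisted out
def scale_fast(values, divisor):
    inv = 1 / divisor                    # a single division
    return [v * inv for v in values if 0 < v < 100]
```

Both return the same result; a profiler would show the division and branch overhead concentrated in the loop, which is where the optimisation effort pays off.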
temporal locality
if a location has been accessed recently it is likely to be accessed again soon
spatial locality
if location has been accessed recently then nearby locations are likely to be accessed next
sequential locality
if location has been accessed recently then next/previous location is likely to be accessed next
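The spatial/sequential locality idea can be illustrated by traversal order over a 2-D structure. This is illustrative only: Python lists don’t guarantee contiguous storage, but the access order below mirrors cache-friendly vs cache-unfriendly loops in a language with row-major arrays.

```python
rows, cols = 3, 4
matrix = [[r * cols + c for c in range(cols)] for r in range(rows)]

# good spatial locality: the inner loop visits adjacent elements of one row
row_major = [matrix[r][c] for r in range(rows) for c in range(cols)]

# poor spatial locality: each step jumps to a different row (strided access)
col_major = [matrix[r][c] for c in range(cols) for r in range(rows)]

print(row_major[:4])  # [0, 1, 2, 3]: consecutive addresses
print(col_major[:4])  # [0, 4, 8, 1]: jumps of a whole row
```

The row-major loop benefits from the locality principles above: each cache line fetched is fully used before the next one is needed.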
write through cache
when data in cache is changed, write to main memory at same time
improves reliability but reduces performance
write back cache
wait for an efficient point in time to write changed cache data to main memory
write operations piggybacked onto read operations
improved performance of write operations
reduces performance of read operations, and reliability, since cache and main memory can be temporarily inconsistent
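A minimal sketch of the write-back idea using a dirty bit; the class and its API are hypothetical, not a real cache interface.

```python
class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory   # backing store: dict of address -> value
        self.cache = {}        # address -> (value, dirty bit)

    def write(self, addr, value):
        # no write to main memory yet: just cache the value and mark it dirty
        self.cache[addr] = (value, True)

    def read(self, addr):
        if addr not in self.cache:
            self.cache[addr] = (self.memory.get(addr, 0), False)  # fill on miss
        return self.cache[addr][0]

    def flush(self):
        # the "efficient point in time": write all dirty lines back at once
        for addr, (value, dirty) in self.cache.items():
            if dirty:
                self.memory[addr] = value
                self.cache[addr] = (value, False)

mem = {}
c = WriteBackCache(mem)
c.write(0x10, 42)
print(mem.get(0x10))   # None: main memory not yet updated (the reliability risk)
c.flush()
print(mem[0x10])       # 42
```

The window between `write` and `flush` is exactly the inconsistency that makes write-back less reliable than write-through.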