Computer Architecture Final Exam

39 Terms

1
New cards

What are the five main architectural components of a computer?

Input, Output, Memory, Datapath, Control

2
New cards

Why did processor clock speeds, which had increased for decades, finally stop increasing in the last decade?

The power wall: as frequency increases, so does power consumption, and eventually power consumption became too high to allow feasible cooling or battery life.

3
New cards

Where are increasing transistor counts being used now?

What about in the past?

More processor cores.

Deeper pipelines.

4
New cards

What is a meaningless metric?

A metric that doesn't convey any true performance and may vary entirely independently from performance.

5
New cards

The x86 ISA has shown incredible resilience due to backwards compatibility. Why is backwards compatibility so important?

Given this, why is ARM popular?

Software (binaries) generally lives much longer than hardware and there's more money invested in software than hardware.

ARM is a new platform and has no old binaries that need continued support.

6
New cards

Many x86 instructions require the destination register to also be one of the source registers... Why?

This eliminates one of the register specifiers from the instruction encoding, i.e., it requires fewer bits.

7
New cards

What is the RISC ISA and what are the advantages?

Which ISA is this?

Make the most simple instructions possible at high speed

Fewer instruction formats and addressing modes leads to simpler hardware

MIPS

8
New cards

What is the CISC ISA and what are the advantages?

Which ISA is this?

Complete the task in as few lines of code as possible

Higher code density

Operations work directly on memory

Larger immediate fields

x86

9
New cards

What is the ideal speedup of an N-stage pipeline?

What issues prevent ideal speedup?

N-times speedup.

Work cannot be divided equally into N parts, resulting in wasted execution time.

10
New cards

What problem is caused by long pipelines?

Branch/flush penalties become very high.

11
New cards

Five basic pipeline stages

IF - Fetch

ID - Decode

EXE - Execute

MEM - Memory

WB - Writeback

12
New cards

What does branch prediction eliminate?

Stalls due to control hazards

13
New cards

What does pipeline forwarding eliminate?

Stalls due to data hazards

14
New cards

What are the disadvantages of compiler managed static multiple issue?

Different implementations require different instruction placement, because branch outcomes and memory stalls are not statically predictable.

15
New cards

How does multiple issue impact data hazards and forwarding paths in an in-order-pipeline?

More instructions executing in parallel means there are more hazards to check for and more forwarding paths are required.

16
New cards

What is the best CPI a superscalar CPU with N pipelines can achieve? What performance metric do we use instead of CPI to discuss superscalar pipelines?

The best CPI possible is 1/N, though never achieved.

IPC - Instructions per cycle.

17
New cards

Which hazard: a required resource is busy

structure hazard

18
New cards

Which hazard: Need to wait for previous instruction to complete its data read/write

Data hazard

19
New cards

Which hazard: Deciding on control action depends on previous instruction

Control hazard

20
New cards

Which type of locality: Items accessed recently are likely to be accessed again soon (instructions in a loop)

Temporal Locality

21
New cards

Which type of locality: Items near those accessed recently are likely to be accessed soon (array data)

Spatial locality

22
New cards

In the following C++ code segment for a matrix-matrix multiply, specify for each of the matrices A, B, and C whether the matrix's accesses primarily have spatial, temporal, or no locality. (Accesses fewer than N words apart are considered to have spatial locality.)

for (k = 0; k < N; k++)
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            C[i][j] += A[i][k] * B[k][j];

A: temporal (loop invariant to innermost loop)

B: spatial

C: spatial

23
New cards

Suppose you are able to parallelize 80% of a particular program such that it can make use of

1000 parallel processors. What speedup will you achieve for the entire program?

Speedup = 1 / [(1 - affected) + (affected / amount_improved)] = 1 / [(1 - 0.8) + (0.8 / 1000)]

= 1 / 0.2008 ≈ 4.98x

24
New cards

In a cache-coherent multi-core system, a cache sometimes has to invalidate one of its entries... What happens to trigger this invalidate?

A tag match on an invalidation message on the bus, indicating another core is writing the same block.

25
New cards

How do GPUs hide long-latency operations?

Hardware multi-threading: they switch to another available thread.

26
New cards

What are the advantages of shared-memory multiprocessing systems?

Fast communication through shared memory

Lower administration costs

27
New cards

What are the advantages of distributed systems?

Easier to design, program, and scale

No need for special OS

28
New cards

Moore's Law

The number of transistors doubles ~ every two years

29
New cards

Amdahl's Law

Speedup = 1 / [(1 - f) + f / s], where f is the fraction of execution time improved and s is the speedup of that fraction.

30
New cards

What are 32 consecutive threads called in a GPU?

Warp

31
New cards

How does lookup work in a 4-way set-associative cache?

The index bits of the address select one set; the tags of all four ways in that set are then compared in parallel, and it is a hit if any valid way's tag matches.

32
New cards

How is a branch predictor structured?

A table of two-bit saturating counters, indexed by the branch PC, predicts taken/not-taken; a branch target buffer (BTB) supplies the predicted target address.

33
New cards

Average CPI calculation

For the multi-cycle MIPS

Load 5 cycles

Store 4 cycles

R-type 4 cycles

Branch 3 cycles

Jump 3 cycles

If a program has

50% R-type instructions

10% load instructions

20% store instructions

8% branch instructions

2% jump instructions

CPI = (4×50 + 5×10 + 4×20 + 3×8 + 3×2) / 100 = 360 / 100 = 3.6

34
New cards

Comparison using classic CPU Performance Equation

CPU Time = Instruction Count × CPI × Clock Cycle Time. The three factors are, in order, the instruction count (IC), cycles per instruction (CPI), and clock cycle time (CT). CPI is computed as an effective (weighted-average) value.

35
New cards

What's the only meaningful metric?

Wall clock time

36
New cards

Weak vs. strong scaling for parallelism

Weak: Run a larger problem, greater problem size

Strong: run a problem faster, same problem size

37
New cards

Average Memory Access Time (AMAT)

Hit Time + Miss Rate * Miss Penalty

38
New cards

Five Forwarding Paths

EXE/MEM -> ID

EXE/MEM -> EXE

MEM/WB -> ID

MEM/WB -> EXE

MEM/WB -> MEM

39
New cards

Memory hierarchy

SRAM -> DRAM -> SSD -> HDD