Architect Chapter 5

CSC 320: Computer Organization and Architecture - Chapter 5.1: Introduction

Temporal locality: The locality principle stating that if a data location is referenced, then it will tend to be referenced again soon.

Spatial locality: The locality principle stating that if a data location is referenced, data locations with nearby addresses will tend to be referenced soon.

Bob is building a fence behind his house. He uses a hammer to attach a board to the rail. Bob then measures and cuts the next board.
The likelihood that Bob will need the hammer again is an example of _____ locality.

- Temporal

Bob is building a fence behind his house. He grabs a hammer from the garage. Bob will likely need additional tools stored in the garage, so Bob also grabs nails, a shovel, and a level.
The likelihood that Bob will need resources stored together in the garage is an example of _____ locality.

- Spatial

Given the following loop, the high likelihood of accessing multiple elements within array A is an example of _____ locality.
while (i < 10) {
    A[i] = A[i] + 2;
    i = i + 1;
}

- Spatial

Given the following loop, the high likelihood of accessing i = i + 1 repeatedly is an example of _____ locality.
while (i < 10) {
    A[i] = A[i] + 2;
    i = i + 1;
}

- Temporal

True or False. Instructions may exhibit temporal locality, but never spatial locality.

- False

True or False. Data may exhibit spatial locality, but never temporal locality.

- False

Memory hierarchy: A structure that uses multiple levels of memories; as the distance from the processor increases, the size of the memories and the access time both increase while the cost per bit decreases.

Block (or line): The minimum unit of information that can be either present or not present in a cache.

The size and access time of each memory increase as the distance from the processor increases.

The smallest memory is closest to the processor and contains a subset of the data in any memory farther from the processor.

The minimum unit of information is a block or a line, not a bit. In the library analogy, a block of information relates to a book, rather than a single page or chapter of the book.

**Hit rate**: The fraction of memory accesses found in a level of the memory hierarchy.

Miss rate: The fraction of memory accesses not found in a level of the memory hierarchy.

**Hit time**: The time required to access a level of the memory hierarchy, including the time needed to determine whether the access is a hit or a miss.

Miss penalty: The time required to fetch a block into a level of the memory hierarchy from the lower level, including the time to access the block, transmit it from one level to the other, insert it in the level that experienced the miss, and then pass the block to the requestor.

A memory hierarchy is composed of an upper level and a lower level. Data is requested by the processor. 9 out of 10 requests find the data in the upper level and return the data in 0.4 ns. The remaining requests require 0.7 ns to return the data.

Determine the corresponding values for the upper level memory.

Hit rate = 0.9

Miss rate = 0.1

Miss penalty = 0.7 ns

Hit time = 0.4 ns

True or False. Memory hierarchies take advantage of temporal locality.

- True

True or False. On a read, the value returned depends on which blocks are in the cache.

- False

True or False. Most of the cost of the memory hierarchy is at the highest level.

- False

True or False. Most of the capacity of the memory hierarchy is at the lowest level.

- True

SRAM (static random access memory)

DRAM (dynamic random access memory)

SRAM OR DRAM? Used to implement the memory levels closest to the processor.

- SRAM

SRAM OR DRAM? Has fewer transistors per bit of memory.

- DRAM

SRAM OR DRAM? Requires a periodic refresh.

- DRAM

SRAMs are simply integrated circuits that are memory arrays with (usually) a single access port that can provide either a read or a write. SRAMs have a fixed access time to any datum, though the read and write access times may differ.

In a dynamic RAM (DRAM), the value kept in a cell is stored as a charge in a capacitor. A single transistor is then used to access this stored charge, either to read the value or to overwrite the charge stored there. Because DRAMs use only a single transistor per bit of storage, they are much denser and cheaper per bit than SRAM. As DRAMs store the charge on a capacitor, it cannot be kept indefinitely and must periodically be refreshed.

Modern DRAMs are organized in banks. Each bank consists of a series of _____.

- Rows

DRAMs enable fast access to data by transferring bits in bursts. Successive bits are transferred on each _____.

- Clock Edge

Between 1980 and 2012, the average column access time to an existing row _____.

- Decreased

Which of the following is NOT a technique that improves a DRAM's performance?

- DIMM

Flash memory is a type of electrically erasable programmable read-only memory (EEPROM).

Track: One of thousands of concentric circles that make up the surface of a magnetic disk.

Sector: One of the segments that make up a track on a magnetic disk; a sector is the smallest amount of information that is read or written on a disk.

Seek: The process of positioning a read/write head over the proper track on a disk.

Rotational latency: Also called rotational delay. The time required for the desired sector of a disk to rotate under the read/write head; usually assumed to be half the rotation time.
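The half-rotation assumption in the definition above can be turned into a quick calculation. The following is a minimal sketch in C; the function name `avg_rotational_latency_ms` and the RPM values are illustrative assumptions, not part of the course material.

```c
#include <assert.h>

/* Average rotational latency in milliseconds for a disk spinning at `rpm`
 * revolutions per minute, assuming the desired sector is, on average,
 * half a rotation away from the read/write head. */
double avg_rotational_latency_ms(double rpm)
{
    assert(rpm > 0.0);
    double ms_per_rotation = 60000.0 / rpm; /* 60,000 ms in one minute */
    return 0.5 * ms_per_rotation;
}
```

For a hypothetical 5400 RPM disk, `avg_rotational_latency_ms(5400.0)` gives roughly 5.6 ms; for 15000 RPM it gives 2.0 ms.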

A magnetic disk is a type of _____.

- Mechanical device

Writes to the same location in a _____ can wear out memory bits.

- Flash Memory

Memories in personal mobile devices are typically _____.

- Flash Memory

True or False. In a magnetic disk, sequential block numbers are placed next to one another on a track. Ex: Block 207 is placed after block 206.

- False

True or False. Magnetic disks are volatile.

- False

Direct-mapped cache: A cache structure in which each memory location is mapped to exactly one location in the cache.

Determine the cache index given the direct-mapped cache size and block address.
Type the cache index as a binary value. Ex: 110

1) Direct-mapped cache size: 8 one-word blocks
Block address: 00011

= 011

2) Direct-mapped cache size: 16 one-word blocks
Block address: 00101100

= 1100
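The two answers above follow from taking the block address modulo the number of cache blocks, which for a power-of-two cache size is the same as keeping the low-order bits. A minimal sketch in C, where the function name `cache_index` is a hypothetical helper chosen for illustration:

```c
#include <assert.h>

/* Cache index for a direct-mapped cache: block address modulo the number
 * of blocks. When num_blocks is a power of two, this equals the low-order
 * log2(num_blocks) bits of the block address, so a bit mask suffices. */
unsigned cache_index(unsigned block_address, unsigned num_blocks)
{
    /* The mask shortcut is only valid for a nonzero power of two. */
    assert(num_blocks != 0 && (num_blocks & (num_blocks - 1)) == 0);
    return block_address & (num_blocks - 1);
}
```

With block address 00011 (0x03) and 8 blocks, the function returns 3 (binary 011); with block address 00101100 (0x2C) and 16 blocks, it returns 12 (binary 1100).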

Tag: A field in a table used for a memory hierarchy that contains the address information required to identify whether the associated block in the hierarchy corresponds to a requested word.

**Valid bit**: A field in the tables of a memory hierarchy that indicates that the associated block in the hierarchy contains valid data.

Assume a direct-mapped cache with 32 blocks and a block size of 8 bytes.

1) Byte address 400 maps to block address _____.

= 400/8 = 50

2) Byte address 400 maps to block number _____.

= 50 modulo 32 = 18

3) Byte address 360 maps to block number _____.

= (360/8) modulo 32 = 13
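The three calculations above share the same two steps: divide the byte address by the block size, then take the result modulo the number of cache blocks. A short sketch in C, with the helper names `block_address` and `block_number` assumed for illustration:

```c
#include <assert.h>

/* Block address: which memory block a byte address falls in
 * (integer division discards the byte offset within the block). */
unsigned block_address(unsigned byte_address, unsigned block_size_bytes)
{
    assert(block_size_bytes != 0);
    return byte_address / block_size_bytes;
}

/* Block number: where that memory block lands in a direct-mapped cache. */
unsigned block_number(unsigned byte_address, unsigned block_size_bytes,
                      unsigned num_blocks)
{
    assert(num_blocks != 0);
    return block_address(byte_address, block_size_bytes) % num_blocks;
}
```

With 32 blocks of 8 bytes, byte address 400 gives block address 50 and block number 18, and byte address 360 gives block number 13, matching the answers above.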

**Cache miss**: A request for data from the cache that cannot be filled because the data is not present in the cache.

True or False. The miss rate may increase if the block size becomes a significant fraction of the cache size.

- True

Which of the following items does NOT contribute to the cost of a miss penalty?

- Latency to determine the cache block

The processing of a cache miss creates a _____.

- Pipeline stall

If an instruction access results in a miss, then the address of the instruction at _____ is fetched from the memory.

- PC - 4

Write-through: A scheme in which writes always update both the cache and the next lower level of the memory hierarchy, ensuring that data is always consistent between the two.

Write buffer: A queue that holds data while the data is waiting to be written to memory.

Write-back: A scheme that handles writes by updating values only to the block in the cache, then writing the modified block to the lower level of the hierarchy when the block is replaced.

Split cache: A scheme in which a level of the memory hierarchy is composed of two independent caches that operate in parallel with each other, with one handling instructions and one handling data.

The speed of the memory system affects the designer's decision on the size of the cache block. Complete the following cache designer guidelines.

The shorter the memory latency, the _____ the cache block.

- Smaller

The higher the memory bandwidth, the _____ the cache block.

- Larger

If the clock rate is increased without changing the memory system, the fraction of execution time due to cache misses _____ relative to total execution time.

- Increases
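One way to see why the fraction increases: the time to service a miss is set by the memory system, so it stays fixed in nanoseconds, while the time spent on everything else shrinks as the clock speeds up. The sketch below is a simplified model under that assumption; the function name, parameters, and numbers are all hypothetical.

```c
#include <assert.h>

/* Fraction of execution time spent stalled on cache misses.
 * base_cycles:  cycles needed with a perfect cache (no misses)
 * misses:       number of cache misses
 * miss_time_ns: time to service one miss, fixed by the memory system
 * clock_ghz:    processor clock rate in GHz (cycles per nanosecond) */
double miss_stall_fraction(double base_cycles, double misses,
                           double miss_time_ns, double clock_ghz)
{
    assert(clock_ghz > 0.0);
    double compute_ns = base_cycles / clock_ghz; /* shrinks as clock rises */
    double stall_ns = misses * miss_time_ns;     /* unchanged by the clock */
    return stall_ns / (compute_ns + stall_ns);
}
```

With 1,000,000 base cycles, 10,000 misses, and 100 ns per miss, the stall fraction is 0.5 at 1 GHz and about 0.67 at 2 GHz.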

AMAT considers the average time to access data for _____.

- Both hits and misses
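AMAT (average memory access time) is not spelled out in these notes; the sketch below assumes the standard textbook formula, AMAT = hit time + miss rate × miss penalty. The function name `amat` and the sample values are illustrative.

```c
#include <assert.h>

/* Average memory access time, assuming
 *   AMAT = hit time + miss rate * miss penalty.
 * hit_time and miss_penalty must use the same unit (cycles or ns);
 * the result is in that unit. */
double amat(double hit_time, double miss_rate, double miss_penalty)
{
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return hit_time + miss_rate * miss_penalty;
}
```

For a hypothetical cache with a 1-cycle hit time, a 5% miss rate, and a 20-cycle miss penalty, `amat(1.0, 0.05, 20.0)` is 2 cycles. Every access pays the hit time, and only the missing fraction pays the penalty.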

**Fully associative cache**: A cache structure in which a block can be placed in any location in the cache.

Set-associative cache: A cache that has a fixed number of locations (at least two) where each block can be placed.

Assume a cache with 8 one-word blocks. Determine the cache position given the cache configuration and memory block.

1) Cache configuration: Direct-mapped
Memory block: 15

- Set #7 (15 mod 8 = 7)

2) Cache configuration: Fully associative
Memory block: 15

- Any of the eight cache blocks

3) Cache configuration: Two-way set-associative
Memory block: 15

- Set #3 (15 mod 4 = 3)
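All three configurations above can be treated as one rule: the number of sets is the number of blocks divided by the associativity, and the memory block maps to set (block mod number of sets). A minimal sketch in C; the helper names `num_sets` and `set_index` are assumptions made for this example.

```c
#include <assert.h>

/* Number of sets = total cache blocks / associativity (ways).
 * Direct-mapped is 1-way; fully associative has ways == num_blocks. */
unsigned num_sets(unsigned num_blocks, unsigned ways)
{
    assert(ways != 0 && num_blocks % ways == 0);
    return num_blocks / ways;
}

/* Set that a memory block maps to; the block may occupy any way in it. */
unsigned set_index(unsigned memory_block, unsigned num_blocks, unsigned ways)
{
    return memory_block % num_sets(num_blocks, ways);
}
```

For memory block 15 in an 8-block cache, `set_index` returns 7 with 1 way, 3 with 2 ways, and 0 with 8 ways. In the fully associative case there is a single set, so the block can go in any of the eight locations.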

Least recently used (LRU): A replacement scheme in which the block replaced is the one that has been unused for the longest time.
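As a rough illustration of LRU, the sketch below models a single two-way set and tracks a logical timestamp per way. It is one simple way to realize the policy, not how hardware necessarily implements it; the type `lru_set`, the function `lru_access`, and the constant `WAYS` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WAYS 2 /* hypothetical associativity for this sketch */

typedef struct {
    bool     valid[WAYS];
    unsigned tag[WAYS];
    unsigned last_used[WAYS]; /* logical time of the most recent access */
    unsigned clock;           /* incremented on every access */
} lru_set;

/* Access `tag` in the set. Returns true on a hit, false on a miss.
 * On a miss, an invalid way is filled if one exists; otherwise the way
 * with the oldest last access (the LRU block) is replaced. */
bool lru_access(lru_set *s, unsigned tag)
{
    assert(s != NULL);
    s->clock++;

    /* Hit check: compare the tag of every valid way in the set. */
    for (int w = 0; w < WAYS; w++) {
        if (s->valid[w] && s->tag[w] == tag) {
            s->last_used[w] = s->clock;
            return true;
        }
    }

    /* Miss: prefer an empty way, else pick the least recently used one. */
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!s->valid[w]) { victim = w; break; }
        if (s->last_used[w] < s->last_used[victim]) victim = w;
    }
    s->valid[victim] = true;
    s->tag[victim] = tag;
    s->last_used[victim] = s->clock;
    return false;
}
```

Starting from a zero-initialized set (`lru_set s = {0};`), the access sequence 0, 8, 0, 6, 8 produces miss, miss, hit, miss, miss. The access to 6 evicts 8, because 0 was touched more recently, and the final access to 8 then evicts 0.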

The _____ of every cache block within the appropriate set of a set-associative cache is checked for a match against the memory block address.

- Tag

A four-way set-associative cache with 32 one-word blocks requires _____ comparators to compare the tags of each element within the set.

- 4

A direct-mapped cache with 32 one-word blocks requires _____ comparator(s) to compare the tag of an element with the memory block address.

- 1

Multilevel cache: A memory hierarchy with multiple levels of caches, rather than just a cache and main memory.

The second-level cache in a multi-level cache is typically used to reduce the multi-level cache's _____.

- Miss penalty

Refer to the above figure (COD Figure 5.19 (Comparing Quicksort and Radix Sort …)). As the number of items to sort increases, Radix Sort requires _____ clock cycles compared to Quicksort.

- More

True or False. The cache blocked version of DGEMM improves performance by operating on submatrices, rather than operating on entire rows or columns of an array.

- True

True or False. Calls to do_block() degrade performance due to parameter passing and return address bookkeeping instructions associated with function calls.

- False

True or False. The performance of the cache blocked version of DGEMM is halved for the largest matrix.

- False

Global miss rate: The fraction of references that miss in all levels of a multilevel cache.

Local miss rate: The fraction of references to one level of a cache that miss; used in multilevel hierarchies.
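For a two-level cache, the two definitions above are linked: a reference misses globally only if it misses in the first level and then also misses in the second. The sketch below assumes that relationship; the function names and sample rates are illustrative.

```c
#include <assert.h>

/* Global miss rate of a two-level cache: the fraction of all references
 * that miss in L1 and then also miss in L2. */
double global_miss_rate(double l1_miss_rate, double l2_local)
{
    assert(l1_miss_rate >= 0.0 && l1_miss_rate <= 1.0);
    assert(l2_local >= 0.0 && l2_local <= 1.0);
    return l1_miss_rate * l2_local;
}

/* L2 local miss rate recovered from the L1 and global miss rates:
 * of the references that reach L2, the fraction that miss there. */
double l2_local_miss_rate(double l1_miss_rate, double global_rate)
{
    assert(l1_miss_rate > 0.0);
    return global_rate / l1_miss_rate;
}
```

With a hypothetical 2% L1 miss rate and a 25% L2 local miss rate, the global miss rate is 0.5%. The L2 local rate looks high only because L2 sees just the references that L1 already missed.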

First-level caches are more concerned about _____.

- Hit time

Second-level caches are more concerned about _____.

- Miss rate