When the address in the PC is sent to the memory to “fetch” the instruction to be
executed, what does fetching mean?
The memory reads the instruction stored at that address and sends it back to the CPU.
When can the address in the PC be incremented by the CPU?
After it is sent to the memory
What is the format of each one-byte instruction in a simple accumulator machine?
Op code: The leftmost 3 bits represent the op code, which specifies the operation to be performed.
Memory address: The rightmost 5 bits represent the memory address for the memory operand. The word size is 1 byte.
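The 3-bit/5-bit split described above can be illustrated with a minimal Python sketch. The function name `decode` and the sample byte are hypothetical, chosen only for illustration.

```python
def decode(instruction):
    """Split a one-byte instruction into (opcode, address)."""
    if not 0 <= instruction <= 0xFF:
        raise ValueError("instruction must fit in one byte")
    opcode = instruction >> 5          # leftmost 3 bits
    address = instruction & 0b11111    # rightmost 5 bits
    return opcode, address


# 0b101_00110: op code 101 (5), memory address 00110 (6)
print(decode(0b10100110))  # (5, 6)
```

With 5 address bits, an instruction can name only 2^5 = 32 memory words.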
What is the role of the PC (Program Counter) in the simple accumulator machine?
The PC holds the address of the next instruction to be fetched from memory.
What does INC (Incrementer) do in the simple accumulator machine?
The INC increases the value of the program counter (PC) to point to the next instruction.
What is the function of the ACC (Accumulator) in the simple accumulator machine?
The ACC is a register that stores intermediate arithmetic and logic results.
What is the role of the ALU (Arithmetic Logic Unit) in the simple accumulator machine?
The ALU performs arithmetic and logical operations on data stored in the accumulator.
What does the IR (Instruction Register) do in the simple accumulator machine?
The IR holds the currently fetched instruction before it is decoded and executed.
What happens during the DECODE phase in the simple accumulator machine?
The DECODE phase interprets the instruction stored in the IR and determines the next steps for execution.
What is the purpose of the MAR (Memory Address Register) in the simple accumulator machine?
The MAR holds the address of a memory location that is being accessed for reading or writing.
What does the MDR (Memory Data Register) do in the simple accumulator machine?
The MDR temporarily holds data being transferred to or from memory.
What is the role of the Bus in the simple accumulator machine?
The internal CPU bus transfers data between different components of the CPU, separate from the memory bus.
How are the MAR, MDR, and read enable line used to read data or an instruction from memory in the simple accumulator machine?
4 steps (ARWE)
The CPU puts the address of the memory word in the MAR (memory address register).
The CPU sets the "read enable" bit.
The CPU waits for a certain number of clock cycles.
At the end of that wait, the value at that memory address appears in the MDR (memory data register)
How are the MAR, MDR, and write enable line used to write data to memory in the simple accumulator machine?
4 steps
The CPU puts the address of the memory word in the MAR (memory address register).
The CPU puts the value to be written in the MDR (memory data register).
The CPU sets the "write enable" bit.
The CPU waits for a certain number of clock cycles, and at that point, the value is in memory at the specified address
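The read and write sequences above can be sketched as a toy Python model. The class name `Memory`, the 32-word size (matching a 5-bit address), and the way the wait is collapsed into a comment are all assumptions made for illustration; real hardware does this with signals, not method calls.

```python
class Memory:
    """Toy memory accessed only through the MAR and MDR."""

    def __init__(self, size=32):
        self.cells = [0] * size
        self.mar = 0  # memory address register
        self.mdr = 0  # memory data register

    def read(self, address):
        self.mar = address                # 1. address goes into the MAR
        # 2. the CPU sets the "read enable" bit
        # 3. the CPU waits for some number of clock cycles
        self.mdr = self.cells[self.mar]   # 4. the value appears in the MDR
        return self.mdr

    def write(self, address, value):
        self.mar = address                # 1. address goes into the MAR
        self.mdr = value                  # 2. value goes into the MDR
        # 3. the CPU sets the "write enable" bit
        self.cells[self.mar] = self.mdr   # 4. after the wait, memory holds the value


memory = Memory()
memory.write(6, 42)
print(memory.read(6))  # 42
```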
How many clock cycles does each instruction require in the simple accumulator machine?
Each instruction takes 6 clock cycles (4 for fetch/decode, and 2 for execution)
Why does the simple accumulator machine only allow sequential execution of
instructions?
The simple accumulator machine only allows for sequential execution of instructions because the PC (program counter) is always incremented by 1. There are no branch/jump instructions, and no call or return statements.
How does use of a cache improve performance?
The CPU will not have to wait for data/instructions to come from memory if they are in the cache, and very often, they are.
About how large are CPU caches?
Typically less than 0.1% of the size of main memory/RAM
When data is moved to/from the cache, are single words moved?
No, whole blocks are moved; a block is usually 16 words or more.
How is the 16-bit address divided in a direct-mapped cache?
A 16-bit memory address is divided into three parts: tag bits, block bits, and word bits (byte bits).
What is the purpose of tag bits?
Used to identify which memory location the cache block corresponds to.
What is the purpose of block bits?
Used to identify a block of memory in the cache.
What is the purpose of word bits?
Used to identify the specific word or data location within a block.
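As a sketch of how the three fields are extracted, the Python below splits a 16-bit address. The field widths (5 tag bits, 7 block bits, 4 word bits) are an assumption for illustration only; the cards do not give the actual widths, and the name `split_address` is hypothetical.

```python
# Assumed field widths; 5 + 7 + 4 = 16 bits in total.
TAG_BITS, BLOCK_BITS, WORD_BITS = 5, 7, 4


def split_address(address):
    """Return (tag, block, word) for a 16-bit address."""
    if not 0 <= address < (1 << 16):
        raise ValueError("address must fit in 16 bits")
    word = address & ((1 << WORD_BITS) - 1)                    # lowest bits
    block = (address >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)   # middle bits
    tag = address >> (WORD_BITS + BLOCK_BITS)                  # highest bits
    return tag, block, word


print(split_address(0xABCD))  # (21, 60, 13)
```

With 4 word bits each block holds 16 words, and with 7 block bits the cache holds 128 blocks.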
What is cache hit rate?
Cache hit rate is the percentage of memory accesses by the CPU that hit in the cache.
What is cache miss rate (definition)?
A cache miss occurs when the CPU tries to access instructions or data that are not in the cache. The miss rate is the percentage of memory accesses that miss, that is, 100% minus the hit rate.
What are typical cache hit rates we mentioned in class for many types of software?
Hit rates in the 90% range are common in programs (though it depends on the type of software)
What is the difference between the write-through and write-back protocols for updating data in the cache and in memory?
Write-through protocol: This protocol updates both the value in the cache and in the main memory simultaneously.
Write-back protocol: This protocol updates only the cache location, and sets the cache block's dirty bit to 1. The dirty bit tracks whether the cache block has been written/modified since it was brought into the cache. The memory copy will be updated/written when the cache block is replaced (evicted) in the cache
Which protocol (write-through or write-back) uses a dirty bit, and how is the dirty bit used?
The write-back protocol uses a dirty bit to manage updates between the cache and main memory.
When data is modified in the cache, the write-back protocol updates only the cache location.
At the same time, the cache block's dirty bit is set to 1. This indicates that the cache block has been written to or modified since it was brought into the cache.
When the cache block needs to be replaced (evicted) to make room for new data, the dirty bit is checked.
If the dirty bit is 1, the modified data in the cache block is written back to main memory before the block is replaced. This ensures that the main memory is updated with the latest version of the data
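The dirty-bit behavior described above can be sketched in Python. This is a simplified model under stated assumptions: the class name `WriteBackLine` is hypothetical, main memory is a dictionary, and a whole block is treated as a single value.

```python
class WriteBackLine:
    """One cache block managed with the write-back protocol."""

    def __init__(self, memory, block_address):
        self.memory = memory
        self.block_address = block_address
        self.data = memory[block_address]  # block is brought into the cache
        self.dirty = False                 # not modified yet

    def write(self, value):
        self.data = value    # update only the cache copy
        self.dirty = True    # main memory is now out of date

    def evict(self):
        if self.dirty:       # write back only if the block was modified
            self.memory[self.block_address] = self.data
            self.dirty = False


main_memory = {0x10: 1}
line = WriteBackLine(main_memory, 0x10)
line.write(99)
print(main_memory[0x10])  # 1 (memory is stale until eviction)
line.evict()
print(main_memory[0x10])  # 99
```

A write-through cache would instead update `memory` inside `write` and would need no dirty bit.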
What is the fundamental idea behind pipelining a CPU?
Overlap the stages of execution of various instructions, so that more than one instruction can be executed at any given time.
How many clock cycles are required to execute each instruction if a pipeline is not used in a 4-stage execution model?
Without a pipeline, each instruction requires 4 clock cycles to complete, one for each stage of execution.
How many instructions complete per clock cycle once the pipeline is fully loaded?
Once the pipeline is fully loaded, 1 instruction completes per clock cycle, as each stage processes a different instruction simultaneously.
What is the theoretical improvement in performance when using a pipeline in a 4-stage execution model?
The theoretical improvement in performance is a 4x speedup, as instructions can be completed in every clock cycle after the pipeline is fully loaded, compared to 4 clock cycles per instruction without the pipeline.
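The 4x figure can be checked with a small Python calculation. It assumes an ideal pipeline: every stage takes one clock cycle and there are no hazards. The function names are hypothetical.

```python
def cycles_without_pipeline(n_instructions, stages=4):
    """Each instruction runs all stages before the next one starts."""
    return n_instructions * stages


def cycles_with_pipeline(n_instructions, stages=4):
    """The first instruction takes `stages` cycles; each later one adds 1."""
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1)


def speedup(n_instructions, stages=4):
    return (cycles_without_pipeline(n_instructions, stages)
            / cycles_with_pipeline(n_instructions, stages))


print(cycles_without_pipeline(1000))  # 4000
print(cycles_with_pipeline(1000))     # 1003
print(round(speedup(1000), 2))        # 3.99
```

The speedup approaches, but never quite reaches, the number of stages, because of the cycles spent filling the pipeline.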
What can we say about the general improvement in performance when a pipeline is used?
In general, a pipeline allows multiple instructions to be processed simultaneously at different stages, improving throughput. Once the pipeline is fully loaded, one instruction can be completed per clock cycle, leading to a significant performance boost.
Will a pipeline with more stages offer more improvement in performance, or less?
A pipeline with more stages can offer more improvement in performance by allowing more instructions to be processed simultaneously. However, the performance gain is not always linear.
What did we say is a typical number of stages in modern pipelined CPUs?
More than 20
What are the potential problems in a pipeline?
Instruction hazards and data hazards.
How much do pipeline hazards typically reduce the theoretical maximum performance improvement?
Less than 10%
What two values are used in Boolean logic?
0 and 1.
What is the truth table for Boolean AND?
What is the truth table for Boolean OR?
What is the truth table for Boolean NOT?
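The truth tables for AND, OR, and NOT can be generated with a short Python sketch. The function names `AND`, `OR`, and `NOT` are chosen only for illustration.

```python
from itertools import product


def AND(x, y):
    return x & y


def OR(x, y):
    return x | y


def NOT(x):
    return 1 - x


print("x y | AND OR")
for x, y in product((0, 1), repeat=2):
    print(f"{x} {y} |  {AND(x, y)}   {OR(x, y)}")

print("x | NOT")
for x in (0, 1):
    print(f"{x} |  {NOT(x)}")
```

AND is 1 only when both inputs are 1, OR is 0 only when both inputs are 0, and NOT flips its single input.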
What is the precedence rule for AND/OR/NOT?
The precedence follows the "NAO Rule": NOT first, then AND, then OR.
What is an important practical use of Boolean identities/laws?
They can be used to simplify circuits
Why is circuit simplification important?
It can reduce cost of implementing a circuit, power use, and heat.
What are the symbols for AND, OR, and NOT gates?
What is the truth table for XOR?
What is the truth table for NAND?
What is the truth table for NOR?
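The truth tables for XOR, NAND, and NOR can be generated the same way. This is a minimal Python sketch; the function names are chosen only for illustration.

```python
from itertools import product


def XOR(x, y):
    return x ^ y


def NAND(x, y):
    return 1 - (x & y)   # NOT of AND


def NOR(x, y):
    return 1 - (x | y)   # NOT of OR


print("x y | XOR NAND NOR")
for x, y in product((0, 1), repeat=2):
    print(f"{x} {y} |  {XOR(x, y)}    {NAND(x, y)}    {NOR(x, y)}")
```

XOR is 1 when the inputs differ, NAND is 0 only when both inputs are 1, and NOR is 1 only when both inputs are 0.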
What are the symbols for XOR, NAND, and NOR gates?
Why are NAND and NOR gates useful?
They are cheap to implement and are universal (any Boolean function can be built using only NAND gates, or only NOR gates).
What are the two inputs and two outputs of a half adder?
Input: X and Y
Output: Sum and Carry
What are the three inputs and two outputs of a full adder?
Inputs: X, Y, and Carry In
Outputs: Sum, Carry Out
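The inputs and outputs listed above can be modeled in Python. This sketch uses the common construction of a full adder from two half adders and an OR gate, which is one standard design and not necessarily the one drawn in class.

```python
def half_adder(x, y):
    """Return (sum, carry) for two input bits."""
    return x ^ y, x & y


def full_adder(x, y, carry_in):
    """Return (sum, carry_out) for two input bits plus a carry in."""
    partial_sum, carry1 = half_adder(x, y)
    total, carry2 = half_adder(partial_sum, carry_in)
    return total, carry1 | carry2


print(half_adder(1, 1))     # (0, 1)
print(full_adder(1, 1, 1))  # (1, 1)
```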
What is a decoder used for?
A common use is as a memory address decoder, which selects a specific memory location based on an input address. This type of decoder is found in memory systems, where it activates a particular memory cell (or byte) for reading or writing.
If a decoder has n inputs, how many outputs does it have?
2^n outputs. For example, an address decoder with n inputs can select any one of 2^n bytes of memory.
For a given input, how many of the output lines of a decoder can have the value 1?
Only one output line can have the value 1
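A decoder's behavior can be sketched in Python. The function name `decoder` and the convention that the most significant input bit comes first are assumptions for illustration.

```python
def decoder(input_bits):
    """Map n input bits (most significant first) to 2**n one-hot outputs."""
    index = 0
    for bit in input_bits:
        index = (index << 1) | bit
    outputs = [0] * (1 << len(input_bits))
    outputs[index] = 1   # exactly one output line is 1
    return outputs


print(decoder([1, 0]))     # [0, 0, 1, 0]
print(decoder([0, 1, 1]))  # [0, 0, 0, 1, 0, 0, 0, 0]
```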
What is a multiplexor used to do?
Select one of several inputs and pass it to a single output (the bit value on the selected input is passed to the output).
If a multiplexor has n inputs, how many control lines are needed to be able to select one of the n input lines to pass to the output?
Number of control lines = log2(n)
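A multiplexor can be sketched in Python as follows. The function names and the most-significant-bit-first ordering of the control lines are assumptions for illustration; `control_lines` rounds up when n is not a power of two.

```python
def control_lines(n_inputs):
    """Number of control lines needed to select one of n inputs."""
    return (n_inputs - 1).bit_length()   # equals log2(n) for powers of two


def multiplexor(inputs, control_bits):
    """Pass the selected input bit through to the single output."""
    index = 0
    for bit in control_bits:
        index = (index << 1) | bit
    return inputs[index]


print(control_lines(4))                   # 2
print(control_lines(8))                   # 3
print(multiplexor([0, 1, 1, 0], [1, 0]))  # 1 (input number 2 is selected)
```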
What is a combinational circuit?
A combinational circuit has no memory (the values on its output ports depend only on the current values on its input ports). There is no dependence on history (prior inputs).
What is a combinational circuit used for?
Combinational circuits are used to implement operations in the ALU of a CPU (addition/subtraction, multiplication, division, AND, OR, XOR, etc.), and for other purposes as well, such as decoders and multiplexors.
What is a sequential circuit?
The value on the output ports of a sequential circuit will, in general, depend on the current values on its input ports as well as on past values (so these circuits have memory)
What is a sequential circuit used for?
Sequential circuits are used for all forms of memory: registers, cache, main memory (RAM), and even digital disk drives (SSDs), but NOT for traditional hard drives (magnetic disks).
Which type of circuit, combinational or sequential, is used to implement memory?
Sequential circuits.
Which type of circuit, combinational or sequential, are full-adders, multiplexors, and decoders?
Combinational circuits.
Which type of circuit, combinational or sequential, uses a clock input?
Sequential circuits.
Which combination of input bit values did we say must be prevented from occurring for an SR-gate (flip-flop, latch), because it makes the circuit unstable (the output is unknown)?
S = 1 and R = 1.
When a clocked SR-gate/latch/flip-flop is used to implement a D gate/latch/flip-flop, which input (S or R) is used for D, and which (S or R) is used for ~D (inverted D)?
D (Data input) is connected to the S (Set) input.
~D (Inverted D) is connected to the R (Reset) input.
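The wiring described above can be modeled behaviorally in Python. This sketch simulates what the latch does rather than its individual gates, and the class names are hypothetical.

```python
class ClockedSRLatch:
    """Behavioral model: the stored bit changes only while the clock is 1."""

    def __init__(self):
        self.q = 0

    def step(self, s, r, clock):
        if clock:
            if s and r:
                raise ValueError("S = 1 and R = 1 must be prevented")
            if s:
                self.q = 1   # set
            elif r:
                self.q = 0   # reset
        return self.q        # otherwise the old value is held


class DLatch:
    """D drives S and inverted D drives R, so S = R = 1 cannot occur."""

    def __init__(self):
        self._sr = ClockedSRLatch()

    def step(self, d, clock):
        return self._sr.step(s=d, r=1 - d, clock=clock)


latch = DLatch()
print(latch.step(d=1, clock=1))  # 1 (stores D)
print(latch.step(d=0, clock=0))  # 1 (clock is 0, old value held)
print(latch.step(d=0, clock=1))  # 0 (stores the new D)
```

Because the output depends on earlier inputs, this is a sequential circuit.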
How many data movement instructions are there?
4
Name the 4 data movement instructions.
They are rrmovq, irmovq, rmmovq and mrmovq.
What does rrmovq do in Y86?
Copies a value between registers
What does irmovq do in Y86?
Places an 8-byte immediate (constant) value, such as the address of a label or a numeric literal constant, into a register
What does rmmovq do in Y86?
Stores a word in memory
What does mrmovq do in Y86?
Loads a word from memory
Is a source operand read, written, or both in Y86?
Only read, never written
Is a destination operand read, written, or both in Y86?
For ALU instructions, both read and written; for other types of instructions, only written.
What is an immediate operand in Y86?
A constant value; always stored as part of the encoded bit string for the instruction.
What does the Y86 address expression DISP(BASE) mean?
It means accessing memory at the address stored in BASE, plus an optional DISP (displacement).
BASE is a register holding an address
DISP is a constant added to that address
Example: mrmovq 8(%rbx), %rax reads from rbx + 8 and stores the value in rax
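The address calculation in that example can be sketched in Python. This is a toy model: registers and memory are dictionaries, each memory entry stands for one 8-byte word, and the function name and sample values are hypothetical.

```python
def mrmovq(registers, memory, disp, base, dest):
    """Model of mrmovq DISP(BASE), DEST."""
    address = registers[base] + disp    # effective address = BASE + DISP
    registers[dest] = memory[address]   # load the word into DEST


registers = {"%rbx": 0x100, "%rax": 0}
memory = {0x108: 42}
mrmovq(registers, memory, 8, "%rbx", "%rax")   # mrmovq 8(%rbx), %rax
print(registers["%rax"])  # 42
```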
What are the 4 ALU instructions in Y86?
addq rA, rB
subq rA, rB
andq rA, rB
xorq rA, rB
How does addq work? How many register operands?
addq rA, rB: This instruction adds the value in register rA to the value in register rB and stores the result in register rB i.e. R[rB] ← R[rB] + R[rA].
2 Register operands
How does subq work? How many register operands?
subq rA, rB: This instruction subtracts the value in register rA from the value in register rB and stores the result in register rB i.e. R[rB] ← R[rB] - R[rA].
2 Register operands
How does andq work? How many register operands?
andq rA, rB: This instruction performs a bitwise AND operation between the values in register rA and register rB, storing the result in register rB i.e. R[rB] ← R[rB] & R[rA].
2 Register operands
How does xorq work? How many register operands?
xorq rA, rB: This instruction performs a bitwise XOR operation between the values in register rA and register rB, storing the result in register rB i.e. R[rB] ← R[rB] ^ R[rA]
2 Register operands
How do Y86 ALU instructions affect flags, and do other instructions change them?
Y86 ALU instructions set three flags:
ZF (Zero flag): Set if the result is 0
SF (Sign flag): Set if the result is negative
OF (Overflow flag): Set if there is signed overflow
Other instructions (like data movement and control flow) do not change the flags
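The flag rules above can be sketched in Python for 64-bit values. The function name `alu` and its return shape are hypothetical; the sketch assumes two's-complement arithmetic and that andq and xorq always clear OF.

```python
MASK = (1 << 64) - 1   # keep results to 64 bits
SIGN = 1 << 63         # sign bit of a 64-bit value


def alu(op, a, b):
    """Compute R[rB] OP R[rA] with a = R[rA], b = R[rB].

    Returns (result, ZF, SF, OF).
    """
    a &= MASK
    b &= MASK
    overflow = False
    if op == "addq":
        result = (b + a) & MASK
        # Overflow: both operands have a sign that the result does not share.
        overflow = ((a ^ result) & (b ^ result) & SIGN) != 0
    elif op == "subq":
        result = (b - a) & MASK
        # Overflow: operands differ in sign and the result's sign differs from b's.
        overflow = ((a ^ b) & (b ^ result) & SIGN) != 0
    elif op == "andq":
        result = b & a
    elif op == "xorq":
        result = b ^ a
    else:
        raise ValueError(f"unknown ALU operation: {op}")
    zero = result == 0
    sign = (result & SIGN) != 0
    return result, zero, sign, overflow


print(alu("subq", 5, 5))             # (0, True, False, False)
print(alu("addq", 1, 2**63 - 1)[1:])  # (False, True, True)
```

In the second call, adding 1 to the largest positive 64-bit value wraps to a negative number, so both SF and OF are set.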
What does the unconditional jump instruction (jmp) in Y86 do?
The jmp instruction changes the program counter (PC) to a new address, causing execution to continue from there. It always jumps, regardless of flag values.
Example: jmp Label makes the program jump to Label unconditionally.
How do conditional jump instructions work in Y86, and what flag conditions must be met for each jump?
Conditional jumps in Y86 check flag values before jumping:
je (jump if equal): Jumps if ZF = 1 (last result was zero).
jne (jump if not equal): Jumps if ZF = 0 (last result was not zero).
jl (jump if less than): Jumps if SF ≠ OF (negative result in signed comparison).
jle (jump if less than or equal): Jumps if ZF = 1 or SF ≠ OF (zero or negative result).
jg (jump if greater): Jumps if ZF = 0 and SF = OF (positive result, not zero).
jge (jump if greater than or equal): Jumps if SF = OF (positive or zero result).
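The conditions above can be written as a small Python lookup. The function name `jump_taken` is hypothetical; the flag tests are the ones listed on the card.

```python
def jump_taken(jump, zf, sf, of):
    """Return True if the given jump is taken for these flag values."""
    conditions = {
        "jmp": True,                    # unconditional
        "je": zf,
        "jne": not zf,
        "jl": sf != of,
        "jle": zf or sf != of,
        "jg": (not zf) and sf == of,
        "jge": sf == of,
    }
    return conditions[jump]


# Flags after a subtraction with a negative result and no overflow:
flags = dict(zf=False, sf=True, of=False)
print(jump_taken("jl", **flags))   # True
print(jump_taken("jge", **flags))  # False
```

Note that jl and jge are exact opposites, as are jle and jg and je and jne.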
What does it mean to say a jump (or branch) is taken in Y86?
This means that the program counter is updated to the destination address specified in the jump instruction, and the program continues execution from that new address.
What does it mean to say that a jump (or branch) is not taken?
This means that the program counter is not updated to the destination address specified in the jump instruction. Instead, the program continues execution with the next sequential instruction in memory.
What are labels in Y86?
Labels are strings that mark addresses in memory. The assembler converts each label to the corresponding address, so the CPU does not see the label; it only sees addresses.
What two things does the call instruction do in Y86, and what happens to the PC and stack?
Pushes the return address onto the stack (the address of the next instruction after the call).
Updates the PC to the function’s address (so execution jumps to the function).
After call Label, the PC holds Label’s address.
The return address (next instruction after call) is stored at the top of the stack.
What two things does the ret instruction in Y86 do?
The ret instruction in Y86 performs two main actions:
Pops the return address from the stack.
Writes the return address to the program counter (PC).
Because the call instruction pushes a return address onto the stack, a stack is required to call functions.
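The effect of call and ret on the PC and the stack can be sketched in Python. This is a toy model: the class name `Machine`, the starting stack address, and the sample addresses are hypothetical, and the sketch assumes a call instruction occupies 9 bytes (1 byte of op code plus an 8-byte destination).

```python
CALL_SIZE = 9  # assumed size of a call instruction in bytes


class Machine:
    def __init__(self, stack_top=0x200):
        self.pc = 0
        self.rsp = stack_top
        self.memory = {}

    def call(self, target):
        return_address = self.pc + CALL_SIZE    # instruction after the call
        self.rsp -= 8                           # make room on the stack
        self.memory[self.rsp] = return_address  # push the return address
        self.pc = target                        # jump to the function

    def ret(self):
        self.pc = self.memory[self.rsp]         # pop the return address into the PC
        self.rsp += 8


cpu = Machine()
cpu.pc = 0x10
cpu.call(0x80)
print(hex(cpu.pc), hex(cpu.memory[cpu.rsp]))  # 0x80 0x19
cpu.ret()
print(hex(cpu.pc), hex(cpu.rsp))              # 0x19 0x200
```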
What is the portion of the stack that each subroutine (function, procedure, method) is given to use called?
A frame
Why does each subroutine (other than main) save the calling subroutine’s rbp (base/frame pointer) before it sets its own base/frame pointer?
To ensure that the calling subroutine’s base/frame pointer can be restored before returning to the calling subroutine.
When new data is pushed onto the stack, how does the stack grow in memory?
It grows downward; that is, it grows toward lower addresses
What does the pushq instruction do in Y86, and how does it change rsp?
pushq rA stores rA's value on the stack.
rsp decreases by 8 (stack grows downward).
The value of rA is saved at the new rsp address.
What does the popq instruction do in Y86, and how does it change rsp?
popq rA loads a value from the stack into rA.
The value at rsp is copied into rA.
rsp increases by 8 (stack shrinks upward).
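The rsp changes for pushq and popq can be sketched in Python. This is a toy model: the class name `Stack` and the starting address 0x200 are hypothetical, and memory is a dictionary keyed by address.

```python
class Stack:
    def __init__(self, top=0x200):
        self.rsp = top
        self.memory = {}

    def pushq(self, value):
        self.rsp -= 8                   # stack grows toward lower addresses
        self.memory[self.rsp] = value   # value is stored at the new rsp

    def popq(self):
        value = self.memory[self.rsp]   # read the value at rsp
        self.rsp += 8                   # stack shrinks toward higher addresses
        return value


stack = Stack()
stack.pushq(7)
stack.pushq(9)
print(hex(stack.rsp))  # 0x1f0
print(stack.popq())    # 9
print(stack.popq())    # 7
print(hex(stack.rsp))  # 0x200
```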
How are parameters passed on the stack in Y86 (and also in Intel)?
They are pushed onto the stack in reverse order; the last parameter is pushed first, the second last second, ... and the first parameter is pushed last.
What is always pushed onto the stack after all of the parameters (if any) when a subroutine is called?
The return address, which is pushed when the call instruction is executed to call the subroutine.
How many instructions in assembly language correspond to a single machine language instruction?
In assembly language, there is one instruction per machine language instruction
Does every type of CPU (Intel, ARM, Mac M1, SPARC, PowerPC, MIPS etc.) use the same assembly language?
No, each CPU has its own assembly language.
How many assembly language instructions does a high-level language statement (Java, C++, C, etc.) correspond to?
High level language statements typically require multiple assembly instructions.
Which real CPU architecture is the Y86-64 simulated CPU based on?
Y86-64 is based on the Intel x86-64 architecture; its assembly language is a simplified version of the x86-64 assembly language used for Intel CPUs.