Assembly Language x86 - Chapter 4 Notes

Data Transfer Instructions

  • Operand Types
    • Immediate: A constant integer (8, 16, or 32 bits). Value is encoded within the instruction.
    • Register: The name of a register. Register name is converted to a number and encoded within the instruction.
    • Memory: Reference to a location in memory. Memory address is encoded within the instruction, or a register holds the address of a memory location.
  • Instruction Operand Notation
    • reg8: 8-bit general-purpose register (AH, AL, BH, BL, CH, CL, DH, DL).
    • reg16: 16-bit general-purpose register (AX, BX, CX, DX, SI, DI, SP, BP).
    • reg32: 32-bit general-purpose register (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP).
    • reg: Any general-purpose register.
    • sreg: 16-bit segment register (CS, DS, SS, ES, FS, GS).
    • imm: 8-, 16-, or 32-bit immediate value.
    • imm8: 8-bit immediate byte value.
    • imm16: 16-bit immediate word value.
    • imm32: 32-bit immediate doubleword value.
    • reg/mem8: 8-bit operand, which can be an 8-bit general register or memory byte.
    • reg/mem16: 16-bit operand, which can be a 16-bit general register or memory word.
    • reg/mem32: 32-bit operand, which can be a 32-bit general register or memory doubleword.
    • mem: An 8-, 16-, or 32-bit memory operand.
  • Direct Memory Operands
    • A direct memory operand is a named reference to storage in memory.
    • The named reference (label) is automatically dereferenced by the assembler.
    • Example:
      .data
      var1 BYTE 10h
      .code
      mov al,var1      ; AL = 10h
      mov al,[var1]    ; AL = 10h (alternate format)
  • MOV Instruction
    • Move from source to destination.
    • Syntax: MOV destination,source
    • Rules:
      • No more than one memory operand permitted.
      • CS, EIP, and IP cannot be the destination.
      • No immediate to segment moves.
    • Example:
      .data
      count BYTE 100
      wVal  WORD 2
      .code
      mov bl,count
      mov ax,wVal
      mov count,al
      ; mov al,wVal      ; error
      ; mov ax,count     ; error
      ; mov eax,count    ; error
  • Zero Extension
    • The MOVZX instruction copies a smaller source into a larger destination and fills (extends) the remaining upper bits of the destination with zeros.
    • The destination must be a register.
    • Example:
      mov bl,10001111b
      movzx ax,bl      ; zero-extension: AX = 0000000010001111b
  • Sign Extension
    • The MOVSX instruction copies a smaller source into a larger destination and fills the remaining upper bits of the destination with copies of the source operand's sign bit.
    • The destination must be a register.
    • Example:
      mov bl,10001111b
      movsx ax,bl      ; sign extension: AX = 1111111110001111b
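The two extension rules can be checked with a small Python model. This is only a sketch of the bit manipulation, not MASM code; the helper names `movzx` and `movsx` are hypothetical and simply mirror the instruction names.

```python
def movzx(value: int, src_bits: int) -> int:
    """Zero-extend: keep the low src_bits; every higher destination bit is 0."""
    return value & ((1 << src_bits) - 1)


def movsx(value: int, src_bits: int, dst_bits: int) -> int:
    """Sign-extend: copy the source sign bit into the upper destination bits."""
    value &= (1 << src_bits) - 1
    if value & (1 << (src_bits - 1)):  # is the source sign bit set?
        upper_mask = ((1 << dst_bits) - 1) & ~((1 << src_bits) - 1)
        value |= upper_mask
    return value


bl = 0b10001111
print(f"movzx ax,bl -> {movzx(bl, 8):04X}h")      # 008Fh
print(f"movsx ax,bl -> {movsx(bl, 8, 16):04X}h")  # FF8Fh
```

With a source whose sign bit is clear (for example 7Fh), both helpers return the same value, which is why the two instructions only differ for "negative" bit patterns.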
  • XCHG Instruction
    • XCHG exchanges the values of two operands.
    • At least one operand must be a register.
    • No immediate operands are permitted.
    • Example:
      .data
      var1 WORD 1000h
      var2 WORD 2000h
      .code
      xchg ax,bx         ; exchange 16-bit regs
      xchg ah,al         ; exchange 8-bit regs
      xchg var1,bx       ; exchange mem, reg
      xchg eax,ebx       ; exchange 32-bit regs
      ; xchg var1,var2   ; error: two memory operands
  • Direct-Offset Operands
    • A constant offset is added to a data label to produce an effective address (EA).
    • The address is dereferenced to get the value inside its memory location.
    • Example:
      .data
      arrayB BYTE 10h,20h,30h,40h
      .code
      mov al,arrayB+1      ; AL = 20h
      mov al,[arrayB+1]    ; alternative notation

Addition and Subtraction

  • INC and DEC Instructions
    • Add 1 or subtract 1 from the destination operand.
    • Operand may be a register or memory.
    • INC destination: Destination = Destination + 1
    • DEC destination: Destination = Destination – 1
    • Examples:
      .data
      myWord  WORD 1000h
      myDword DWORD 10000000h
      .code
      inc myWord       ; 1001h
      dec myWord       ; 1000h
      inc myDword      ; 10000001h
      mov ax,00FFh
      inc ax           ; AX = 0100h
      mov ax,00FFh
      inc al           ; AX = 0000h
  • ADD and SUB Instructions
    • ADD destination, source: Destination = Destination + Source
    • SUB destination, source: Destination = Destination – Source
    • Same operand rules as for the MOV instruction.
    • Examples:
      .data
      var1 DWORD 10000h
      var2 DWORD 20000h
      .code
      mov eax,var1     ; 00010000h
      add eax,var2     ; 00030000h
      add ax,0FFFFh    ; 0003FFFFh
      add eax,1        ; 00040000h
      sub ax,1         ; 0004FFFFh
  • NEG (Negate) Instruction
    • Reverses the sign of an operand.
    • Operand can be a register or memory operand.
    • Example:
      .data
      valB BYTE -1
      valW WORD +32767
      .code
      mov al,valB      ; AL = -1
      neg al           ; AL = +1
      neg valW         ; valW = -32767
    • The processor implements NEG using the internal operation: SUB 0, operand.
    • Any nonzero operand causes the Carry flag to be set.
  • Implementing Arithmetic Expressions
    • HLL compilers translate mathematical expressions into assembly language.
    • Example: Rval = -Xval + (Yval – Zval)
      .data
      Rval DWORD ?
      Xval DWORD 26
      Yval DWORD 30
      Zval DWORD 40
      .code
      mov eax,Xval
      neg eax          ; EAX = -26
      mov ebx,Yval
      sub ebx,Zval     ; EBX = -10
      add eax,ebx
      mov Rval,eax     ; -36
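To see what the registers actually hold at each step of Rval = -Xval + (Yval – Zval), the same sequence can be traced in Python with explicit 32-bit wraparound. This is an illustrative model only; the helper names (`neg32`, `sub32`, `add32`, `to_signed32`) are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def neg32(x: int) -> int:
    """Two's-complement negation in a 32-bit register."""
    return (-x) & MASK32


def sub32(a: int, b: int) -> int:
    return (a - b) & MASK32


def add32(a: int, b: int) -> int:
    return (a + b) & MASK32


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x


xval, yval, zval = 26, 30, 40

eax = neg32(xval)        # mov eax,Xval / neg eax
ebx = sub32(yval, zval)  # mov ebx,Yval / sub ebx,Zval
eax = add32(eax, ebx)    # add eax,ebx
rval = eax               # mov Rval,eax

print(f"EBX  = {ebx:08X}h ({to_signed32(ebx)})")
print(f"Rval = {rval:08X}h ({to_signed32(rval)})")
```

The stored bit pattern is FFFFFFDCh; it is only "-36" when the programmer chooses to read it as a signed value.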
  • Flags Affected by Arithmetic
    • The ALU has a number of status flags that reflect the outcome of arithmetic (and bitwise) operations, based on the contents of the destination operand.
    • Zero Flag (ZF): Set when the destination equals zero.
    • Sign Flag (SF): Set when the destination is negative.
    • Carry Flag (CF): Set when an unsigned value is out of range.
    • Overflow Flag (OF): Set when a signed value is out of range.
    • The MOV instruction never affects the flags.
  • Zero Flag (ZF)
    • The Zero flag is set when the result of an operation produces zero in the destination operand.
    • A flag is set when it equals 1; a flag is clear when it equals 0.
    • Examples:
      mov cx,1
      sub cx,1         ; CX = 0, ZF = 1
      mov ax,0FFFFh
      inc ax           ; AX = 0, ZF = 1
      inc ax           ; AX = 1, ZF = 0
  • Sign Flag (SF)
    • The Sign flag is set when the destination operand is negative.
    • The flag is clear when the destination is positive or zero.
    • The sign flag is a copy of the destination's highest bit.
    • Examples:
      mov cx,0
      sub cx,1         ; CX = -1, SF = 1
      add cx,2         ; CX = 1, SF = 0
      mov al,0
      sub al,1         ; AL = 11111111b, SF = 1
      add al,2         ; AL = 00000001b, SF = 0
  • Signed and Unsigned Integers
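    • The arithmetic instructions covered in this chapter (ADD, SUB, INC, DEC, NEG) operate exactly the same on signed and unsigned integers.
    • The CPU cannot distinguish between signed and unsigned integers; it updates both the Carry and Overflow flags, and the program decides which one is meaningful.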
    • YOU, the programmer, are solely responsible for using the correct data type with each instruction.
  • Overflow and Carry Flags
    • How the ADD instruction affects OF and CF:
      • CF = (carry out of the MSB)
      • OF = (carry out of the MSB) XOR (carry into the MSB)
    • How the SUB instruction affects OF and CF:
      • Negate the source and add it to the destination.
      • CF = INVERT(carry out of the MSB)
      • OF = (carry out of the MSB) XOR (carry into the MSB)
    • MSB = Most Significant Bit (high-order bit)
    • XOR = eXclusive-OR operation
    • NEG = Negate (same as SUB 0,operand)
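The carry-based rules can be tried out with a small 8-bit Python model. It is a sketch under two stated assumptions: OF is taken as the XOR of the carry out of and the carry into the MSB (the usual definition), and SUB is modeled as dest + NOT(src) + 1, which is one common way to express "negate the source and add". The function names `add8` and `sub8` are hypothetical.

```python
def add8(dest: int, src: int):
    """Model an 8-bit ADD. Returns (result, CF, OF)."""
    dest &= 0xFF
    src &= 0xFF
    total = dest + src
    carry_out = (total >> 8) & 1                            # carry out of the MSB
    carry_in = (((dest & 0x7F) + (src & 0x7F)) >> 7) & 1    # carry into the MSB
    return total & 0xFF, carry_out, carry_out ^ carry_in


def sub8(dest: int, src: int):
    """Model an 8-bit SUB as dest + NOT(src) + 1. Returns (result, CF, OF)."""
    dest &= 0xFF
    inverted = ~src & 0xFF
    total = dest + inverted + 1
    carry_out = (total >> 8) & 1
    carry_in = (((dest & 0x7F) + (inverted & 0x7F) + 1) >> 7) & 1
    # CF is the inverted carry out: it signals an unsigned borrow.
    return total & 0xFF, carry_out ^ 1, carry_out ^ carry_in


for name, op, a, b in [("add", add8, 0xFF, 1), ("add", add8, 0x7F, 1),
                       ("sub", sub8, 0x00, 1), ("sub", sub8, 0x80, 1)]:
    result, cf, of = op(a, b)
    print(f"{name} {a:02X}h,{b:02X}h -> {result:02X}h  CF={cf} OF={of}")
```

The four sample operations cover each combination: unsigned overflow only (FFh + 1), signed overflow only (7Fh + 1), unsigned borrow only (0 - 1), and signed overflow on subtraction (80h - 1).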
  • Carry Flag (CF)
    • The Carry flag is set when the result of an operation generates an unsigned value that is out of range (too big or too small for the destination operand).
    • Examples:
      mov al,0FFh
      add al,1         ; CF = 1, AL = 00
      ; Try to go below zero:
      mov al,0
      sub al,1         ; CF = 1, AL = FF
  • Overflow Flag (OF)
    • The Overflow flag is set when the signed result of an operation is invalid or out of range.
    • Examples:
      ; Example 1
      mov al,+127
      add al,1         ; OF = 1, AL = ??
      ; Example 2
      mov al,7Fh
      add al,1         ; OF = 1, AL = 80h
    • The two examples are identical at the binary level because 7Fh equals +127.
    • To determine the value of the destination operand, it is often easier to calculate in hexadecimal.
  • Rule of Thumb for Overflow Flag
    • When adding two integers:
      • The Overflow flag is only set when two positive operands are added and their sum is negative, or two negative operands are added and their sum is positive.
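The rule of thumb can be checked exhaustively for 8-bit operands. The Python sketch below compares it against a direct range test on the signed sum; both helper names are hypothetical, and the 8-bit width is chosen only to keep the search small.

```python
def to_signed8(x: int) -> int:
    """Interpret an 8-bit pattern as a signed integer (-128..127)."""
    return x - 256 if x & 0x80 else x


def overflow_by_range(a: int, b: int) -> bool:
    """Signed overflow: the true sum does not fit in 8 signed bits."""
    return not -128 <= to_signed8(a) + to_signed8(b) <= 127


def overflow_by_signs(a: int, b: int) -> bool:
    """Rule of thumb: operands share a sign, and the result's sign differs."""
    result = (a + b) & 0xFF
    sign_a, sign_b, sign_r = a >> 7, b >> 7, result >> 7
    return sign_a == sign_b and sign_r != sign_a


mismatches = [(a, b) for a in range(256) for b in range(256)
              if overflow_by_range(a, b) != overflow_by_signs(a, b)]
print("disagreements:", len(mismatches))
```

A consequence worth noting: adding operands of opposite sign can never set the Overflow flag.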

Data-Related Operators and Directives

  • OFFSET Operator
    • OFFSET returns the distance in bytes of a label from the beginning of its enclosing segment.
      • Protected mode: 32 bits
      • Real mode: 16 bits
    • Protected-mode programs use only a single segment (flat memory model).
    • The value returned by OFFSET is a pointer.
    • Relating to C/C++:
      // C++ version:
      char array[1000];
      char * p = array;

      ; Assembly language:
      .data
      array BYTE 1000 DUP(?)
      .code
      mov esi,OFFSET array
  • PTR Operator
    • Overrides the default type of a label (variable).
    • Provides the flexibility to access part of a variable.
    • Little endian order is used when storing data in memory.
    • Example:
      .data
      myDouble DWORD 12345678h
      .code
      ; mov ax,myDouble              ; error: why?
      mov ax,WORD PTR myDouble       ; loads 5678h
      mov WORD PTR myDouble,4321h    ; saves 4321h
  • Little Endian Order
    • Little endian order refers to the way Intel stores integers in memory.
    • Multi-byte integers are stored in reverse order, with the least significant byte stored at the lowest address.
    • For example, the doubleword 12345678h would be stored as 78 56 34 12.
    • When integers are loaded from memory into registers, the bytes are automatically re-reversed into their correct positions.
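Little endian storage, and the effect of reading only part of a doubleword (as WORD PTR does), can be reproduced with Python's standard `struct` module. This is an illustration of the byte layout only, not of MASM itself; the variable names are hypothetical.

```python
import struct

# Store the doubleword 12345678h the way an x86 processor lays it out ("<" = little endian).
memory = struct.pack("<I", 0x12345678)
print(memory.hex(" "))  # 78 56 34 12

# Reading a 16-bit word at offset 0 corresponds to: mov ax,WORD PTR myDouble
low_word = struct.unpack_from("<H", memory, 0)[0]
# Reading at offset 2 corresponds to: mov ax,WORD PTR [myDouble+2]
high_word = struct.unpack_from("<H", memory, 2)[0]

print(f"word at +0 = {low_word:04X}h, word at +2 = {high_word:04X}h")
```

Because the least significant byte sits at the lowest address, the word at the variable's own address is the low half of the doubleword.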
  • TYPE Operator
    • The TYPE operator returns the size, in bytes, of a single element of a data declaration.
      .data
      var1 BYTE ?
      var2 WORD ?
      var3 DWORD ?
      var4 QWORD ?
      .code
      mov eax,TYPE var1    ; 1
      mov eax,TYPE var2    ; 2
      mov eax,TYPE var3    ; 4
      mov eax,TYPE var4    ; 8
  • LENGTHOF Operator
    • The LENGTHOF operator counts the number of elements in a single data declaration.
      .data
      byte1    BYTE 10,20,30           ; 3
      array1   WORD 30 DUP(?),0,0      ; 32
      array2   WORD 5 DUP(3 DUP(?))    ; 15
      array3   DWORD 1,2,3,4           ; 4
      digitStr BYTE "12345678",0       ; 9
      .code
      mov ecx,LENGTHOF array1          ; 32
  • SIZEOF Operator
    • The SIZEOF operator returns a value that is equivalent to multiplying LENGTHOF by TYPE.
      .data
      byte1    BYTE 10,20,30           ; 3
      array1   WORD 30 DUP(?),0,0      ; 64
      array2   WORD 5 DUP(3 DUP(?))    ; 30
      array3   DWORD 1,2,3,4           ; 16
      digitStr BYTE "12345678",0       ; 9
      .code
      mov ecx,SIZEOF array1            ; 64
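The relationship SIZEOF = LENGTHOF × TYPE, including the nested DUP counts, can be reproduced with a short Python model of the declarations above. This is a sketch, not assembler behavior: `dup`, `lengthof`, and `sizeof` are hypothetical helpers, and `None` stands in for an uninitialized `?` element.

```python
TYPE = {"BYTE": 1, "WORD": 2, "DWORD": 4}


def dup(count: int, items: list) -> list:
    """Model `count DUP(items)` as list repetition."""
    return count * items


declarations = {
    "byte1":    ("BYTE",  [10, 20, 30]),
    "array1":   ("WORD",  dup(30, [None]) + [0, 0]),
    "array2":   ("WORD",  dup(5, dup(3, [None]))),
    "array3":   ("DWORD", [1, 2, 3, 4]),
    "digitStr": ("BYTE",  list(b"12345678") + [0]),
}


def lengthof(name: str) -> int:
    """Number of elements in the declaration."""
    return len(declarations[name][1])


def sizeof(name: str) -> int:
    """Total bytes: element count times element size."""
    return lengthof(name) * TYPE[declarations[name][0]]


for name in declarations:
    print(f"{name:8} LENGTHOF = {lengthof(name):2}  SIZEOF = {sizeof(name):2}")
```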
  • LABEL Directive
    • Assigns an alternate label name and type to an existing storage location.
    • LABEL does not allocate any storage of its own.
    • Removes the need for the PTR operator.
      .data
      dwList   LABEL DWORD
      wordList LABEL WORD
      intList  BYTE 00h,10h,00h,20h
      .code
      mov eax,dwList       ; 20001000h
      mov cx,wordList      ; 1000h
      mov dl,intList       ; 00h

Indirect Addressing

  • Indirect Operands
    • An indirect operand holds the address of a variable, usually an array or string.
    • It can be dereferenced (just like a pointer).
      .data
      val1 BYTE 10h,20h,30h
      .code
      mov esi,OFFSET val1
      mov al,[esi]     ; dereference ESI (AL = 10h)
      inc esi
      mov al,[esi]     ; AL = 20h
      inc esi
      mov al,[esi]     ; AL = 30h
    • Use PTR to clarify the size attribute of a memory operand.
      .data
      myCount WORD 0
      .code
      mov esi,OFFSET myCount
      ; inc [esi]          ; error: ambiguous
      inc WORD PTR [esi]   ; ok
  • Array Sum Example
      .data
      arrayW WORD 1000h,2000h,3000h
      .code
      mov esi,OFFSET arrayW
      mov ax,[esi]
      add esi,2        ; or: add esi,TYPE arrayW
      add ax,[esi]
      add esi,2
      add ax,[esi]     ; AX = sum of the array
    • Indirect operands are ideal for traversing an array.
    • Register in brackets must be incremented by a value that matches the array type.
  • Indexed Operands
    • An indexed operand adds a constant to a register to generate an effective address.
    • Two notational forms:
      • [label + reg]
      • label[reg]
      .data
      arrayW WORD 1000h,2000h,3000h
      .code
      mov esi,0
      mov ax,[arrayW + esi]    ; AX = 1000h
      mov ax,arrayW[esi]       ; alternate format
      add esi,2
      add ax,[arrayW + esi]    ; etc.
  • Index Scaling
    • Scale an indirect or indexed operand to the offset of an array element by multiplying the index by the array's TYPE.
      .data
      arrayB BYTE 0,1,2,3,4,5
      arrayW WORD 0,1,2,3,4,5
      arrayD DWORD 0,1,2,3,4,5
      .code
      mov esi,4
      mov al,arrayB[esi*TYPE arrayB]     ; 04
      mov bx,arrayW[esi*TYPE arrayW]     ; 0004
      mov edx,arrayD[esi*TYPE arrayD]    ; 00000004
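Why the index has to be multiplied by TYPE becomes clearer when the array is viewed as raw bytes. The Python sketch below models arrayD as a flat little-endian byte buffer and compares a scaled and an unscaled access; the variable names are hypothetical and the buffer stands in for memory.

```python
import struct

TYPE_DWORD = 4  # bytes per DWORD element

# arrayD DWORD 0,1,2,3,4,5 laid out as 24 little-endian bytes
array_d = struct.pack("<6I", 0, 1, 2, 3, 4, 5)

esi = 4  # element index, as in: mov esi,4

# mov edx,arrayD[esi*TYPE arrayD] -> byte offset 16, element 4
scaled = struct.unpack_from("<I", array_d, esi * TYPE_DWORD)[0]

# mov edx,arrayD[esi] -> byte offset 4, which is element 1, not element 4
unscaled = struct.unpack_from("<I", array_d, esi)[0]

print(f"scaled read   = {scaled:08X}h")
print(f"unscaled read = {unscaled:08X}h")
```

For a BYTE array the two reads coincide because TYPE is 1, which is why forgetting the scale factor often goes unnoticed until the element size changes.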
  • Pointers
    • Declare a pointer variable that contains the offset of another variable.
    • Alternate format: ptrW DWORD OFFSET arrayW
      .data
      arrayW WORD 1000h,2000h,3000h
      ptrW   DWORD arrayW
      .code
      mov esi,ptrW
      mov ax,[esi]     ; AX = 1000h

JMP and LOOP Instructions

  • JMP Instruction
    • JMP is an unconditional jump to a label that is usually within the same procedure.
    • Syntax: JMP target
    • Logic: EIP = target
    • Example:
      top:
          .
          .
          jmp top
    • A jump outside the current procedure must be to a special type of label called a global label.
  • LOOP Instruction
    • The LOOP instruction creates a counting loop.
    • Syntax: LOOP target
    • Logic:
      • ECX = ECX – 1
      • if ECX != 0, jump to target
    • Implementation:
      • The assembler calculates the distance, in bytes, between the offset of the following instruction and the offset of the target label (relative offset).
      • The relative offset is added to EIP.
  • LOOP Example
    • The following loop calculates the sum of the integers 5 + 4 + 3 + 2 + 1:
      00000000  66 B8 0000          mov ax,0
      00000004  B9 00000005         mov ecx,5
      00000009  66 03 C1        L1: add ax,cx
      0000000C  E2 FB               loop L1
      0000000E
    • When LOOP is assembled, the current location = 0000000E (offset of the next instruction).
    • The encoded offset FBh is the signed byte -5. Adding it to the current location produces the jump target: 0000000E + (-5) = 00000009.
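The offset arithmetic in the listing can be reproduced in a few lines of Python. This sketch assumes the one-byte signed displacement shown in the listing (the FB byte after opcode E2); the function names `rel8` and `branch_target` are hypothetical.

```python
def rel8(next_ip: int, target: int) -> int:
    """Encode the distance from the next instruction to the target as one signed byte."""
    displacement = target - next_ip
    if not -128 <= displacement <= 127:
        raise ValueError("target is out of range for a one-byte relative offset")
    return displacement & 0xFF


def branch_target(next_ip: int, encoded: int) -> int:
    """Decode the signed byte and add it to the address of the next instruction."""
    displacement = encoded - 256 if encoded & 0x80 else encoded
    return (next_ip + displacement) & 0xFFFFFFFF


encoded = rel8(next_ip=0x0000000E, target=0x00000009)
print(f"encoded offset = {encoded:02X}h")
print(f"jump target    = {branch_target(0x0000000E, encoded):08X}h")
```

The range check reflects the one-byte encoding: a LOOP target must lie within -128 to +127 bytes of the instruction that follows the LOOP.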
  • Nested Loop
    • If you need to code a loop within a loop, you must save the outer loop counter (ECX) before the inner loop overwrites it, and restore it afterward.
      .data
      count DWORD ?
      .code
          mov ecx,100      ; set outer loop count
      L1:
          mov count,ecx    ; save outer loop count
          mov ecx,20       ; set inner loop count
      L2:
          .
          .
          loop L2          ; repeat the inner loop
          mov ecx,count    ; restore outer loop count
          loop L1          ; repeat the outer loop
  • Summing an Integer Array
      .data
      intarray WORD 100h,200h,300h,400h
      .code
          mov edi,OFFSET intarray      ; address of intarray
          mov ecx,LENGTHOF intarray    ; loop counter
          mov ax,0                     ; zero the accumulator
      L1:
          add ax,[edi]                 ; add an integer
          add edi,TYPE intarray        ; point to next integer
          loop L1                      ; repeat until ECX = 0
    • The preceding code calculates the sum of an array of 16-bit integers.
  • Copying a String
      .data
      source BYTE "This is the source string",0
      target BYTE SIZEOF source DUP(0)
      .code
          mov esi,0                ; index register
          mov ecx,SIZEOF source    ; loop counter
      L1:
          mov al,source[esi]       ; get char from source
          mov target[esi],al       ; store it in the target
          inc esi                  ; move to next character
          loop L1                  ; repeat for entire string

64-Bit Programming

  • The MOV instruction in 64-bit mode accepts operands of 8, 16, 32, or 64 bits.
  • When you move an 8-, 16-, or 32-bit constant into a 64-bit register, the upper bits of the destination are cleared.
  • When you move a memory operand into part of a 64-bit register, the results vary:
    • A 32-bit move (for example, into EAX) clears the upper 32 bits of the 64-bit destination (RAX).
    • An 8-bit or 16-bit move (into AL or AX) does not affect the upper bits of the destination.
  • MOVSXD
    • Sign-extends a 32-bit value into a 64-bit destination register.
  • The OFFSET operator generates a 64-bit address.
  • LOOP uses the 64-bit RCX register as a counter.
  • RSI and RDI are the most common 64-bit index registers for accessing arrays.
  • ADD and SUB affect the flags in the same way as in 32-bit mode.
  • You can use scale factors with indexed operands.
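The different effects of 8-, 16-, and 32-bit moves on a 64-bit register, and the sign extension done by MOVSXD, can be modeled in Python as follows. This is a sketch of the register update rules listed above, not assembler code; `write_reg` and `movsxd` are hypothetical helper names.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def write_reg(reg64: int, value: int, bits: int) -> int:
    """Return the new 64-bit register contents after writing its low `bits` bits."""
    if bits == 64:
        return value & MASK64
    if bits == 32:
        # A 32-bit write clears the upper 32 bits of the full register.
        return value & 0xFFFFFFFF
    # 8- and 16-bit writes leave every higher bit unchanged.
    low_mask = (1 << bits) - 1
    return (reg64 & ~low_mask & MASK64) | (value & low_mask)


def movsxd(value32: int) -> int:
    """Sign-extend a 32-bit value into a 64-bit destination."""
    value32 &= 0xFFFFFFFF
    return (value32 | 0xFFFFFFFF00000000) if value32 & 0x80000000 else value32


rax = MASK64  # start with every bit set so preserved bits are visible
print(f"32-bit write: {write_reg(rax, 0x12345678, 32):016X}h")
print(f"16-bit write: {write_reg(rax, 0x1234, 16):016X}h")
print(f" 8-bit write: {write_reg(rax, 0x12, 8):016X}h")
print(f"movsxd FFFFFFF0h: {movsxd(0xFFFFFFF0):016X}h")
```

Starting from an all-ones register makes the difference easy to see: only the 32-bit write wipes the upper half.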