Basics of Assembly Language Programs

Chapter One: Basics of Assembly Language Programs

Assembly Language Programming

Introduction to Assembly Language
  • What is Assembly Language?

      • Assembly language is a low-level programming language specifically tailored for a particular type of processor architecture.

      • It provides a direct and explicit mapping to the machine instructions that the processor can execute, enabling low-level hardware operations.

      • Unlike higher-level languages such as C++ or Java, assembly language has no built-in input/output facilities (nothing like C++'s cin and cout), so the programmer must work more closely with the hardware and operating system to perform such tasks.

    Communicating with the User

    • How do we communicate with the user?

      • Communication in assembly language is typically achieved through system calls or interrupt calls, particularly in operating environments such as DOS, where direct interaction with hardware is common.

      • For example, under DOS, loading function number 02h into AH and a character into DL, then raising interrupt 21h, prints that character:

        • mov ah, 2h

        • mov dl, 'a'

        • int 21h

      • With instructions like these, the programmer loads CPU registers directly and controls each step of the interaction with the user.
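The character-output call described above can be placed in a complete program. The following is a minimal sketch, assuming a 16-bit DOS environment (for example, an emulator such as EMU8086) and MASM/TASM syntax. DOS function 4Ch, used here to exit, is not covered in the text and is included only so that the program terminates cleanly.

```asm
; Sketch only: 16-bit DOS program, MASM/TASM syntax assumed.
.model small
.stack 100h
.code
main PROC
    mov ah, 02h      ; DOS function 02h: write the character in DL
    mov dl, 'a'      ; character to print
    int 21h          ; call DOS

    mov ax, 4C00h    ; DOS function 4Ch: terminate, return code 0
    int 21h
main ENDP
END main
```

The function number must be in AH, not AL, because DOS reads AH to decide which service was requested.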

    Structure/Components of Assembly Language
    1. Label:

      • A symbolic name that serves as an identifier for a memory address, which improves code readability and helps in organizing code structure.

    2. Operation Code (Opcode):

      • Represents the operation or instruction to be executed by the processor, such as data movement or arithmetic operations.

    3. Operand:

      • Refers to additional information or data required by the opcode, which can be a constant, a variable, or a memory address.

    4. Comment:

      • Provides documentation space intended for developers’ notes and explanations, facilitating code debugging and future maintenance efforts, which is critical in complex programming.
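The four fields can be seen together on a single source line. This is a small sketch in MASM syntax. The 32-bit flat-model header and the ExitProcess call are an assumed Windows setup, and the names total and again are made up for illustration.

```asm
; Sketch only: MASM syntax, 32-bit flat model assumed.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
total DWORD 0                 ; data label "total" (hypothetical name)

.code
main PROC
        mov  ecx, 3           ; opcode: mov    operands: ecx, 3
again:  add  total, ecx       ; label | opcode | operands | comment
        loop again            ; the operand here is the label "again"
        INVOKE ExitProcess, 0
main ENDP
END main
```

On the line starting with again:, the label names the instruction's address, add is the opcode mnemonic, total and ecx are the operands, and everything after the semicolon is a comment.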

    Mnemonics

    • Historically, many assembly mnemonics were three-letter abbreviations (e.g., JMP for jump, INC for increment).

    • Modern processors have larger instruction sets, with longer, more descriptive mnemonics such as FPATAN (floating-point partial arctangent).

    Family of Related Opcodes

    • Some assemblers use a single mnemonic (e.g., MOV) for a whole family of related opcodes; the assembler chooses the opcode from the operand types.

    • Other assemblers, such as those for the IBM System/360, use a separate mnemonic for each kind of operation:

      • L: Move memory to register

      • ST: Move register to memory

      • LR: Move register to register

      • MVI: Move immediate operand to memory
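For contrast with the separate mnemonics listed above, here is a sketch of the single-mnemonic approach on x86, where MOV covers all of these cases. It uses MASM syntax with an assumed 32-bit flat-model template, and val is a hypothetical variable. The opcode bytes in the comments are the standard IA-32 encodings as far as I know them; an assembler may choose between equivalent encodings.

```asm
; Sketch only: one mnemonic, several different opcodes.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
val DWORD 0

.code
main PROC
    mov eax, val      ; memory to register      (opcode A1)
    mov val, eax      ; register to memory      (opcode A3)
    mov ebx, eax      ; register to register    (opcode 8B or 89)
    mov ecx, 5        ; immediate to register   (opcode B9)
    mov val, 5        ; immediate to memory     (opcode C7)
    INVOKE ExitProcess, 0
main ENDP
END main
```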

    Comparison of Assembly and High-Level Languages
    • Assembly language is a low-level language tied to one processor architecture, while high-level languages (HLLs) such as Java or C++ abstract away hardware specifics.

    • Assembly code names registers and memory locations explicitly, while HLL code works with variables and leaves the choice of registers and memory layout to the compiler.

    • Machine-Oriented vs Human-Oriented:

      • Assembly demonstrates a machine-oriented approach, characterized by a near one-to-one correlation between symbolic instructions and the resulting executable machine codes.

      • Assembly also includes directives for the assembler, linker, data space organization, and macros, which are fundamental for efficient programming and code optimization.

    Applications of Assembly Language
    • Usage:

      • Directly embedded in system boot ROMs (such as BIOS on PCs) which perform essential hardware initializations at startup.

      • Frequently used in creating stand-alone binaries that are compact and operational without the overhead of high-level language runtime components.

    • Value in Reverse Engineering:

      • Machine code translates almost directly back into assembly language (disassembly), which makes assembly the usual form for examining a program's structure during reverse engineering.

    • Time-Critical Applications:

      • Notable sectors utilizing assembly language include:

        • Aircraft navigation systems

        • Process control systems

        • Robot control software

        • Communication software

        • Target acquisition (missile tracking) software

    • System Software Requirements:

      • Assembly language is crucial for applications that demand direct hardware control; consequently, it plays a vital role in developing system software, including:

        • Operating systems

        • Assemblers and compilers

        • Linkers and loaders

        • Device drivers and network interfaces

    Challenges with Assembly Language
    • Common Criticisms:

      • Learning Difficulty: Considered hard to learn and read compared to high-level languages, due to its complexity and detailed nature.

      • Time Consuming: Developing in assembly often requires significantly more time, and the code is not portable across different architectures without substantial modification.

      • Improved Compiler Technology: Advances in compiler technology for high-level languages significantly reduce the necessity of assembly programming for many applications.

      • Hardware Advancements: With modern machines boasting faster speeds and more memory, the need for assembly programming has diminished in many use cases.

      • Algorithm Efficiency: For applications demanding speed improvements, better algorithms are often preferred over rewriting code in assembly language.

    Negative Attributes of Assembly Language

    • Challenges Include:

      • Hard to learn

      • Difficult to read and understand for those unfamiliar with low-level programming.

      • Complicated debugging, maintenance, and writing processes when compared to high-level languages.

      • Extremely time-consuming and generally considered not portable across different architecture systems.

    Advantages of Assembly Language
    • Significant Benefits:

      • Speed: Carefully written assembly programs can be among the fastest, which suits performance-critical applications.

      • Space Efficiency: Assembly programs typically require less memory because they carry no high-level language runtime overhead.

      • Capability: Assembly allows direct manipulation of hardware resources, enabling operations that may be difficult or even impossible to perform using high-level programming languages.

      • Knowledge Enhancement: Familiarity with assembly language can lead to a deeper understanding of how high-level languages operate, ultimately resulting in the creation of more efficient and optimized programs.

    Hierarchy of Languages
    • Language Types:

      • High-Level Language (machine independent)

      • Assembly Language (low-level, machine specific)

      • Machine Language (machine specific)

    • Visual Representation (layers, from most to least abstract):

      • Application Program

      • High-Level Language

      • Assembly Language

      • Machine Language

    Compilers and Assemblers
    • Functions:

      • Compilers are responsible for translating high-level language into machine code, either directly or via an assembler.

      • Assemblers perform the task of converting assembly code into machine code, crucial for program execution.

    Tools for Assembly Language Programming
    • Available Tools:

      • Both free software (some of it GNU licensed) and commercial products are available for assembly programming.

      • Essential software and tools include:

        1. Assemblers:

          • MASM (Microsoft)

          • TASM (Borland, commercial)

          • NASM (Free, open-source)

        2. Emulators:

          • EMU8086

        3. Editors:

          • Notepad/Notepad++, TextPad, VS Code, JEdit

    Basic Elements of Assembly Language – Integer Constants
    • Definition: An integer constant comprises:

      • An optional leading sign (+ or -)

      • One or more digits

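      • An optional suffix character indicating the radix of the number (for example, h for hexadecimal, d for decimal, q or o for octal, b for binary).

    • Radix Examples:

      • When no radix is specified, the constant is assumed to be decimal.

      • A hexadecimal constant that begins with a letter must be written with a leading zero (e.g., 0ABCh) so that the assembler does not interpret it as an identifier.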
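The sketch below writes the same value in several radixes, assuming MASM's radix suffixes (d, b, q or o, h) and an assumed 32-bit flat-model template. All variable names are hypothetical.

```asm
; Sketch only: integer constants in different radixes (MASM syntax).
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
dec1 DWORD  26          ; no suffix: decimal
dec2 DWORD  26d         ; d suffix: decimal
bin1 DWORD  11010b      ; b suffix: binary, equals 26
oct1 DWORD  32q         ; q (or o) suffix: octal, equals 26
hex1 DWORD  1Ah         ; h suffix: hexadecimal, equals 26
hex2 DWORD  0ABCh       ; leading zero keeps it from being read as a name
neg1 SDWORD -26         ; optional leading sign

.code
main PROC
    mov eax, hex1       ; EAX = 26 (1Ah)
    INVOKE ExitProcess, 0
main ENDP
END main
```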

    Integer Expressions

    • Definition: A mathematical expression involving integer values and operators.

    • Capacity: The result of an integer expression must fit in 32 bits, that is, in the range 00000000h to FFFFFFFFh.

    • Operators: Ranked by precedence, from highest (1) to lowest (4):

      • 1: Parentheses ()

      • 2: Unary plus, minus + -

      • 3: Multiply, divide, modulus * / MOD

      • 4: Add, subtract + -

    Example of Operator Precedence

    • -5 + 2:

      • Result: -3

    • 12 - 1 MOD 5:

      • Result: 11 (1 MOD 5 is evaluated first and gives 1)

    • (4 + 2) * 6:

      • Result: 36
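Constant expressions like these are evaluated by the assembler, not at run time, so they can appear directly as initializers. This is a sketch in MASM syntax with an assumed 32-bit flat-model template. The variable names are hypothetical, and the comments assume MASM's rule that MOD, * and / bind more tightly than binary + and -.

```asm
; Sketch only: the assembler folds each expression into one constant.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
res1 SDWORD -5 + 2          ; unary minus first: (-5) + 2 = -3
res2 DWORD  12 - 1 MOD 5    ; MOD first: 12 - (1 MOD 5) = 12 - 1 = 11
res3 DWORD  (4 + 2) * 6     ; parentheses first: 6 * 6 = 36
res4 DWORD  4 + 5 * 2       ; multiply first: 4 + 10 = 14

.code
main PROC
    mov eax, res2           ; EAX = 11
    INVOKE ExitProcess, 0
main ENDP
END main
```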

    Real Number Constants
    • Real numbers can be written as decimal reals or as encoded (hexadecimal) reals. A decimal real follows the format:

      • [sign]integer.[integer][exponent]

    • Examples of Real Constants:

      • +3.0, -44.2E+05, 26.E5
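As a sketch of where such constants appear, the definitions below store the three example values using MASM's REAL4 (32-bit) and REAL8 (64-bit) directives. These directives and the variable names are not mentioned in the text and are assumptions for illustration, as is the 32-bit flat-model template.

```asm
; Sketch only: real constants as data initializers (MASM syntax).
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
realA REAL4 +3.0            ; sign, integer part, fraction
realB REAL4 -44.2E+05       ; with exponent: -4420000.0
realC REAL8 26.E5           ; digits after the point are optional

.code
main PROC
    INVOKE ExitProcess, 0
main ENDP
END main
```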



Character and String Constants
  • Character Constants: A single character enclosed in single or double quotes; the assembler stores it as the character's ASCII code.

    • Example: 'A', "d"

  • String Constants: A sequence of characters enclosed in quotes.

    • Example: 'ABC', 'X', "Good night, Gracie"
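A short sketch of character and string constants used as data, in MASM syntax with an assumed 32-bit flat-model template. The names letter and greeting are hypothetical, and the trailing 0 is a common convention for marking the end of a string, not something the assembler requires.

```asm
; Sketch only: character and string constants stored as bytes.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
letter   BYTE 'A'                         ; one byte holding 41h
greeting BYTE "Good night, Gracie", 0     ; 18 characters plus a zero byte

.code
main PROC
    mov al, letter          ; AL = 41h ('A')
    mov bl, greeting        ; BL = 47h ('G'), the first byte of the string
    INVOKE ExitProcess, 0
main ENDP
END main
```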

Reserved Words in Assembly Language

  • Reserved words have special meanings and can be used only in their correct context.

  • Types Include:

    • Instruction mnemonics (e.g., MOV, ADD, MUL, INC, JMP)

    • Register names (e.g., AX, BX, CX, SI, DI)

    • Directives (e.g., .code, .data, .stack)

    • Attributes (size and usage information e.g., BYTE, WORD)

    • Operators in constant expressions (e.g., +, -, *)

Identifiers

  • Definition: Programmer-defined name for variables, constants, procedures, or labels.

  • Rules for Identifiers:

    • Length: 1 to 247 characters.

    • Not case sensitive (by default).

    • First character: Must be a letter, _, @, ?, or $; subsequent characters may also be digits.

    • Must not be the same as a reserved word.

    • Valid Examples: counter, sumValue, @myVal.

Directives in Assembly Language

  • Definition: Commands recognized by the assembler within source code.

  • Functions:

    • Not executed at runtime.

    • Used for defining variables, macros, procedures.

    • Assign names to memory segments.

  • Examples:

    • myVar DWORD 26 (reserving space for a variable)

  • Difference Between Directives and Instructions:

    • Directives do not execute but prepare the assembler, while instructions execute at runtime.
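The difference shows up when a directive and an instruction sit side by side. This is a minimal sketch in MASM syntax; the 32-bit flat-model header and the ExitProcess call are an assumed Windows setup.

```asm
; Sketch only: a directive next to an instruction.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
myVar DWORD 26          ; directive: the assembler reserves 4 bytes
                        ; holding 26; no CPU instruction is generated

.code
main PROC
    mov eax, myVar      ; instruction: executed at run time, EAX = 26
    INVOKE ExitProcess, 0
main ENDP
END main
```

Note that .386, .model, .data, .code, PROC and END are themselves directives; only the mov line and the call generated by INVOKE become machine instructions.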

Defining Segments with Directives
  • Segment Directives:

    • .DATA: Data segment for variables.

    • .CODE: Code segment for executable instructions.

    • .STACK: Runtime stack section definition.
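The three segment directives can be seen together in a small 16-bit DOS program. This sketch assumes MASM/TASM syntax and the small memory model. DOS functions 09h (print a $-terminated string) and 4Ch (exit) are not covered in the text and are used only to make the example complete; message is a hypothetical name.

```asm
; Sketch only: stack, data and code segments in a 16-bit DOS program.
.model small
.stack 100h                     ; stack segment: 256 bytes

.data                           ; data segment: variables
message BYTE "Hi", '$'          ; DOS function 09h expects a '$' terminator

.code                           ; code segment: instructions
main PROC
    mov ax, @data               ; segment registers cannot take an
    mov ds, ax                  ; immediate, so load DS through AX
    mov ah, 09h                 ; DOS function 09h: print string at DS:DX
    mov dx, OFFSET message
    int 21h
    mov ax, 4C00h               ; DOS function 4Ch: terminate
    int 21h
main ENDP
END main
```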

Structure of an Instruction
  • Consists of parts: [label:] mnemonic [operands] [; comment]

  • Label: A marker for an instruction or data item that stands for its address. Data labels are written without a colon (e.g., Count DWORD 100); code labels end with a colon (e.g., target:).

Example Instruction – JMP
  • Instruction to transfer control:

    target:
        mov ax, bx
        jmp target    ; jump back to target, creating a loop

Operands in Assembly Instructions
  • Can have 0-3 operands:

    • Examples:

    • stc (no operands)

    • inc eax (one operand)

    • mov count, ebx (two operands)

    • imul eax, ebx, 5 (three operands)

  • Comments:

    • Single-line comments (starts with ;).

    • Block comments using COMMENT directive.
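The operand counts and both comment styles can be combined in one sketch. It uses MASM syntax with an assumed 32-bit flat-model template, and the starting values loaded into EAX and EBX are arbitrary. The delimiter character after COMMENT (here an exclamation mark) is the programmer's choice.

```asm
; Sketch only: instructions with zero to three operands.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

COMMENT !
  Block comment: the assembler ignores everything
  up to the line containing the closing delimiter.
!

.data
count DWORD 0

.code
main PROC
    mov  eax, 7
    mov  ebx, 3
    stc                     ; no operands: set the Carry flag
    inc  eax                ; one operand: EAX = 8
    mov  count, ebx         ; two operands: count = 3
    imul eax, ebx, 5        ; three operands: EAX = EBX * 5 = 15
    INVOKE ExitProcess, 0
main ENDP
END main
```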

Assembling, Linking, and Running Programs

  • An assembly source program must be translated by an assembler before it can run.

  • Output: The assembler produces an object file, which a linker combines with any required libraries to create an executable file.

Defining Data Types

  • Characteristic: A data type describes the set of values that can be assigned to variables of that type, chiefly through its size. Sizes in bits include 8, 16, 32, 48, and 64.

  • Example Declaration: A DWORD variable holds an unsigned 32-bit integer.

  • Instruction Mnemonic: Short identifier for an assembly instruction.

Common Data Types

  • Types & Usages:

    • BYTE: 8-bit unsigned integer.

    • WORD: 16-bit unsigned integer.

    • DWORD: 32-bit unsigned integer.

  • Directives for Data Definition:

    • DB, DW, DD, DQ: legacy directives that define 8-, 16-, 32-, and 64-bit data, respectively.

Data Definition Statement

  • Syntax: name directive initializer [, initializer]...

  • Example: count DWORD 12345

  • At least one initializer is required, with additional initializers separated by commas.

Defining BYTE and SBYTE Data

  • Definition: Allocates storage for unsigned (BYTE) or signed (SBYTE) byte values.

  • Examples:

    • value1 BYTE 'A'

    • value2 BYTE 0

  • Initializers can leave variables uninitialized using the ? symbol (e.g., value6 BYTE ?).
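A sketch of BYTE and SBYTE definitions covering the limits of each type, in MASM syntax with an assumed 32-bit flat-model template. The names value3 to value5 extend the numbering used above and are illustrative.

```asm
; Sketch only: unsigned and signed byte definitions.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
value1 BYTE  'A'        ; character constant, stored as 41h
value2 BYTE  0          ; smallest unsigned byte
value3 BYTE  255        ; largest unsigned byte
value4 SBYTE -128       ; smallest signed byte
value5 SBYTE +127       ; largest signed byte
value6 BYTE  ?          ; uninitialized

.code
main PROC
    mov al, value3      ; AL = FFh
    mov bl, value4      ; BL = 80h (two's complement of -128)
    INVOKE ExitProcess, 0
main ENDP
END main
```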

Multiple Initializers

  • Comment: When using multiple initializers in one definition, the label refers to the first initializer's offset.

  • Example:

    • list BYTE 10, 20, 30, 40

    • Different formats/radixes can be combined (e.g., character constants).
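The following sketch shows how the label relates to each initializer's offset and how radixes can be mixed in one definition. It assumes MASM syntax and a 32-bit flat-model template; mix is a hypothetical name.

```asm
; Sketch only: multiple initializers and their offsets.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
list BYTE 10, 20, 30, 40              ; at offsets list+0 .. list+3
mix  BYTE 41h, 'B', 01000011b, 68     ; 'A','B','C','D' written four ways

.code
main PROC
    mov al, list            ; AL = 10 (label refers to the first byte)
    mov bl, [list + 2]      ; BL = 30 (third byte)
    mov cl, mix[1]          ; CL = 42h ('B')
    INVOKE ExitProcess, 0
main ENDP
END main
```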

DUP Operator

  • Function: Allocates space for multiple data items using a constant expression as a counter.

  • Example Usage:

    • BYTE 20 DUP(0) allocates 20 bytes initialized to zero.
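A few DUP variations, sketched in MASM syntax with an assumed 32-bit flat-model template. The variable names are hypothetical, and LENGTHOF is a MASM operator that the text does not introduce.

```asm
; Sketch only: allocating repeated data with DUP.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
zeros  BYTE 20 DUP(0)           ; 20 bytes, all zero
blank  BYTE 20 DUP(?)           ; 20 bytes, uninitialized
stacks BYTE 4 DUP("STACK")      ; 20 bytes: "STACKSTACKSTACKSTACK"

.code
main PROC
    mov ecx, LENGTHOF zeros     ; ECX = 20 elements
    mov al, stacks[5]           ; AL = 53h ('S'), start of the second copy
    INVOKE ExitProcess, 0
main ENDP
END main
```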

Data Transfer Instructions

  • General Syntax: Instruction can have up to three operands.

  • Types of Operands:

    • Immediate values

    • Registers

    • Memory references

MOV Instruction

  • Definition: Copies from a source operand to a destination operand.

  • Syntax: MOV destination, source

  • Example: MOV EAX, EBX

  • Rules:

    • Both operands must be the same size.

    • Cannot have two memory operands.

    • Immediate values cannot be assigned to segment registers.
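The rules above can be checked against a few concrete moves. In this sketch (MASM syntax, assumed 32-bit flat-model template, hypothetical variable names), the lines that break a rule are commented out, since the assembler would reject them.

```asm
; Sketch only: valid and invalid MOV operand combinations.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
bVal  BYTE  100
wVal  WORD  2
dVal  DWORD 5
dVal2 DWORD ?

.code
main PROC
    mov bl, bVal            ; OK: both operands are 8 bits
    mov ax, wVal            ; OK: both operands are 16 bits
    mov eax, dVal           ; memory-to-memory copies go
    mov dVal2, eax          ; through a register: dVal2 = 5
    ; mov eax, wVal         ; error: 32-bit and 16-bit sizes differ
    ; mov dVal2, dVal       ; error: two memory operands
    ; mov ds, 45            ; error: immediate to segment register
    INVOKE ExitProcess, 0
main ENDP
END main
```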

Arithmetic Instructions
  • Types include:

    • INC: Increments value by 1.

    • DEC: Decreases value by 1.

    • ADD: Adds two values.

    • SUB: Subtracts one value from another.

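  • ADD and SUB update the Carry, Zero, Sign, and Overflow flags according to the result; INC and DEC update the same flags except Carry, which they leave unchanged.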

Example of Arithmetic Operations

  • ADD:

    .data
    var1 DWORD 10000h
    var2 DWORD 20000h
    .code
    mov eax, var1    ; EAX = 10000h
    add eax, var2    ; EAX = 30000h

  • SUB: Subtracts the source operand from the destination operand and updates the flags in the same way as ADD.
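A sketch of SUB on two DWORDs, followed by a few byte-sized operations chosen to show how the flags react. It assumes MASM syntax and a 32-bit flat-model template, and the values are arbitrary. The flag comments follow the standard x86 rule that INC and DEC leave the Carry flag untouched.

```asm
; Sketch only: SUB, ADD, INC and DEC with their flag effects.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
var1 DWORD 30000h
var2 DWORD 10000h

.code
main PROC
    mov eax, var1
    sub eax, var2       ; EAX = 20000h, ZF = 0
    sub eax, 20000h     ; EAX = 0, ZF = 1

    mov al, 0FFh
    add al, 1           ; AL = 00h, CF = 1 (carry out), ZF = 1
    mov bl, 0FFh
    inc bl              ; BL = 00h, ZF = 1, CF not changed by INC
    mov cl, 0
    dec cl              ; CL = FFh, SF = 1, CF not changed by DEC
    INVOKE ExitProcess, 0
main ENDP
END main
```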

Division Instruction (DIV)

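  • Syntax: DIV source, where source is a register or memory operand holding the unsigned divisor.

  • Works with 8-, 16-, and 32-bit divisors. The dividend, quotient, and remainder use fixed registers:

    • 8-bit divisor: dividend in AX, quotient in AL, remainder in AH.

    • 16-bit divisor: dividend in DX:AX, quotient in AX, remainder in DX.

    • 32-bit divisor: dividend in EDX:EAX, quotient in EAX, remainder in EDX.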
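Two worked divisions, sketched in MASM syntax with an assumed 32-bit flat-model template. For an 8-bit divisor, DIV divides AX and leaves the quotient in AL and the remainder in AH; for a 32-bit divisor it divides EDX:EAX and leaves the quotient in EAX and the remainder in EDX. The operand values are arbitrary.

```asm
; Sketch only: unsigned division with 8-bit and 32-bit divisors.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
main PROC
    mov ax, 0083h       ; dividend 131
    mov bl, 2           ; 8-bit divisor
    div bl              ; AL = 41h (quotient 65), AH = 01h (remainder)

    mov edx, 0          ; clear the high half of the dividend
    mov eax, 8003h      ; dividend, low half
    mov ecx, 100h       ; 32-bit divisor
    div ecx             ; EAX = 00000080h, EDX = 00000003h
    INVOKE ExitProcess, 0
main ENDP
END main
```

Clearing EDX (or DX, or AH) before the division matters: the upper half is part of the dividend, and a leftover value there can produce a wrong result or a divide-overflow fault.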

Multiplication Instruction (MUL)

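  • Syntax: MUL source (unsigned multiplication). The other operand and the product use fixed registers chosen by the size of source:

    • 8-bit source: AL is multiplied by source, product in AX.

    • 16-bit source: AX is multiplied by source, product in DX:AX.

    • 32-bit source: EAX is multiplied by source, product in EDX:EAX.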
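Two worked multiplications, sketched in MASM syntax with an assumed 32-bit flat-model template. With an 8-bit operand, MUL multiplies AL and stores the product in AX; with a 32-bit operand it multiplies EAX and stores the product in EDX:EAX. The operand values are arbitrary.

```asm
; Sketch only: unsigned multiplication, 8-bit and 32-bit.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
main PROC
    mov al, 5h
    mov bl, 10h
    mul bl              ; AX = 0050h, CF = 0 (upper half AH is zero)

    mov eax, 12345h
    mov ebx, 1000h
    mul ebx             ; EDX:EAX = 0000000012345000h, CF = 0
    INVOKE ExitProcess, 0
main ENDP
END main
```

The product is always twice the width of the operands, so it cannot overflow; the Carry flag only reports whether the upper half is nonzero.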

Control Flow Instructions: JMP & LOOP

  • JMP: Unconditional jump to a new instruction address.

  • LOOP: Decrements ECX and jumps to a specified label if ECX is not zero.

Usage Example of JMP and LOOP
  • Example Code:

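    top:
        jmp top               ; endless loop

  • Nested Loop Example: Saving and restoring ECX:

        mov ecx, 100          ; outer loop count
    L1: mov count, ecx        ; save outer loop count
        mov ecx, 20           ; inner loop count
    L2: loop L2               ; repeat inner loop
        mov ecx, count        ; restore outer loop count
        loop L1               ; repeat outer loop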
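A bounded loop that does some work makes the role of ECX clearer. This sketch sums the integers 5 down to 1, assuming MASM syntax and a 32-bit flat-model template.

```asm
; Sketch only: LOOP as a counted loop.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
main PROC
    mov eax, 0          ; running sum
    mov ecx, 5          ; loop counter
L1: add eax, ecx        ; adds 5, 4, 3, 2, 1 in turn
    loop L1             ; ECX = ECX - 1; jump to L1 while ECX is not 0
                        ; after the loop: EAX = 15, ECX = 0
    INVOKE ExitProcess, 0
main ENDP
END main
```

Because LOOP decrements before testing, starting with ECX = 0 would wrap around and repeat roughly four billion times, so the counter should be checked if it can be zero.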

Arrays in Assembly Language

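  • Concept: An array is a sequence of variables of the same type stored one after another in memory. A text string, for example, is an array of bytes holding ASCII codes.

  • Access via square brackets with a constant offset (e.g., MOV AL, a[3]) or an index register (e.g., MOV SI, 3; MOV AL, a[SI]). The value in brackets is a byte offset from the start of the array, counted from 0.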

Example of Array Declaration
  • The DUP operator allocates repeated initial values; for example, BYTE 20 DUP(9) declares 20 bytes, each initialized to 9.

Accessing Array Elements

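  • The LEA (Load Effective Address) instruction and the OFFSET operator both obtain the address of a variable or array element, which can then be used for indirect access through a register.

  • Example Usage:

    mov bx, OFFSET VAR1        ; BX = address of VAR1
    mov BYTE PTR [bx], 44h     ; store 44h at that address, modifying VAR1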
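Address-based access combines naturally with a loop to walk through an array. The sketch below sums a DWORD array in 32-bit MASM syntax, with an assumed flat-model template. The names arr and sum are hypothetical, and LENGTHOF and TYPE are MASM operators not introduced in the text.

```asm
; Sketch only: summing an array through a pointer register.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
arr DWORD 10, 20, 30, 40
sum DWORD ?

.code
main PROC
    mov esi, OFFSET arr         ; ESI = address of the first element
    mov ecx, LENGTHOF arr       ; ECX = 4 elements
    mov eax, 0                  ; running total
L1: add eax, [esi]              ; add the element ESI points to
    add esi, TYPE arr           ; advance by 4 bytes (one DWORD)
    loop L1
    mov sum, eax                ; sum = 100
    INVOKE ExitProcess, 0
main ENDP
END main
```

Stepping by TYPE arr rather than by 1 is what distinguishes a DWORD array from a byte array: offsets are always measured in bytes.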
