Basics of Assembly Language Programs
Chapter One: Basics of Assembly Language Programs
Assembly Language Programming
Introduction to Assembly Language
What is Assembly Language?
Assembly language is a low-level programming language tied to a specific processor architecture.
Its instructions map directly onto the machine instructions that the processor executes, giving the programmer low-level control of the hardware.
Unlike high-level languages, which provide built-in input/output facilities (such as cin and cout in C++), assembly language has none; the programmer must work more closely with the operating system and hardware to perform I/O.
Communicating with the User
How do we communicate with the user?
Programs communicate with the user through system calls or software interrupts. Under DOS, for example, a program selects a service by loading registers and then executing an INT instruction.
The following fragment displays the character 'a' using DOS function 02h of interrupt 21h:
mov ah, 02h   ; DOS function 02h: display the character in DL
mov dl, 'a'   ; character to display
int 21h       ; call DOS
Instructions like these work directly with CPU registers, giving the programmer fine-grained control over what the processor does.
Structure/Components of Assembly Language
Label:
A symbolic name that serves as an identifier for a memory address, which improves code readability and helps in organizing code structure.
Operation Code (Opcode):
Represents the operation or instruction to be executed by the processor, such as data movement or arithmetic operations.
Operand:
Refers to additional information or data required by the opcode, which can be a constant, a variable, or a memory address.
Comment:
Notes that explain the code to human readers (in MASM, text after a semicolon). Comments are ignored by the assembler but make debugging and maintenance much easier.
Mnemonics
Historically, many assembly mnemonics were three-letter abbreviations (e.g., JMP for jump, INC for increment).
Modern processors have larger instruction sets with longer, more descriptive mnemonics, such as FPATAN (floating-point partial arctangent).
Family of Related Opcodes
Some assemblers use a single mnemonic (e.g., MOV on x86) for a whole family of related opcodes.
Other assemblers, such as the IBM System/360 assembler, use a separate mnemonic for each operation type:
L: Move memory to register
ST: Move register to memory
LR: Move register to register
MVI: Move immediate operand to memory
Comparison of Assembly and High-Level Languages
Assembly language is classified as a low-level language, while high-level languages (HLL) like Java or C++ are considered high-level due to their abstraction from hardware specifics.
Assembly code works directly with CPU registers and main memory, whereas high-level languages operate mainly on variables in main memory and leave register management to the compiler.
Machine-Oriented vs Human-Oriented:
Assembly demonstrates a machine-oriented approach, characterized by a near one-to-one correlation between symbolic instructions and the resulting executable machine codes.
Assembly source also contains directives that control the assembler and linker, organize data storage, and define macros.
Applications of Assembly Language
Usage:
Hard-coded in system boot ROMs (such as the BIOS on PCs), which initialize the hardware at startup.
Often used for compact stand-alone binaries that run without high-level language runtime components.
Value in Reverse Engineering:
Machine code can be translated (disassembled) back into assembly language, which makes it possible to examine programs whose source code is unavailable.
Time-Critical Applications:
Notable sectors utilizing assembly language include:
Aircraft navigation systems
Process control systems
Robot control software
Communication software
Target acquisition (missile tracking) software
System Software Requirements:
Assembly language is crucial for applications that demand direct hardware control; consequently, it plays a vital role in developing system software, including:
Operating systems
Assemblers and compilers
Linkers and loaders
Device drivers and network interfaces
Challenges with Assembly Language
Common Criticisms:
Learning Difficulty: Considered hard to learn and read compared to high-level languages, due to its complexity and detailed nature.
Time Consuming: Developing in assembly often requires significantly more time, and the code is not portable across different architectures without substantial modification.
Improved Compiler Technology: Advances in compiler technology for high-level languages significantly reduce the necessity of assembly programming for many applications.
Hardware Advancements: With modern machines boasting faster speeds and more memory, the need for assembly programming has diminished in many use cases.
Algorithm Efficiency: For applications demanding speed improvements, better algorithms are often preferred over rewriting code in assembly language.
Negative Attributes of Assembly Language
Challenges Include:
Hard to learn
Difficult to read and understand for those unfamiliar with low-level programming.
Complicated debugging, maintenance, and writing processes when compared to high-level languages.
Extremely time-consuming and generally considered not portable across different architecture systems.
Advantages of Assembly Language
Significant Benefits:
Speed: Well-written assembly programs are among the fastest available, which makes assembly suitable for performance-critical code.
Space Efficiency: Assembly programs are typically smaller because they contain only the instructions the programmer writes, with no high-level language runtime overhead.
Capability: Assembly allows direct manipulation of hardware resources, enabling operations that may be difficult or even impossible to perform using high-level programming languages.
Knowledge Enhancement: Familiarity with assembly language can lead to a deeper understanding of how high-level languages operate, ultimately resulting in the creation of more efficient and optimized programs.
Hierarchy of Languages
Language Types:
High-Level Language (machine independent)
Assembly Language (machine specific, low-level)
Machine Language (machine specific)
Visual Representation:
Application Program
High-Level Language
Assembly Language
Machine Language
Compilers and Assemblers
Functions:
Compilers are responsible for translating high-level language into machine code, either directly or via an assembler.
Assemblers perform the task of converting assembly code into machine code, crucial for program execution.
Tools for Assembly Language Programming
Available Tools:
Both free, open-source tools and commercial products are available for assembly programming.
Essential software and tools include:
Assemblers:
MASM (Microsoft)
TASM (Borland, commercial)
NASM (Free, open-source)
Emulators:
EMU8086
Editors:
Notepad/Notepad++, TextPad, VS Code, JEdit
Basic Elements of Assembly Language – Integer Constants
Definition: An integer constant comprises:
An optional leading sign (+ or -)
One or more digits
An optional suffix character indicating the radix of the number.
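Radix Examples:
Suffixes: h = hexadecimal, q or o = octal, d = decimal, b = binary (e.g., 26, 26d, 1Ah, 42q, 11010011b).
When no radix is specified, it is assumed to be decimal by default.
A hexadecimal constant that begins with a letter must have a leading zero (e.g., 0A3h); otherwise the assembler treats it as an identifier.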
Integer Expressions
Definition: A mathematical expression involving integer values and operators.
Capacity: The result must fit in 32 bits, that is, between 00000000h and FFFFFFFFh.
Operators, ranked by precedence (highest first):
1: Parentheses ( )
2: Unary plus and minus (+, -)
3: Multiply, divide, and modulus (*, /, MOD)
4: Add and subtract (+, -)
Example of Operator Precedence
-5 + 2  Result: -3
12 - 1 MOD 5  Result: 11 (1 MOD 5 is evaluated before the subtraction)
(4 + 2) * 6  Result: 36
Real Number Constants
Real numbers can be represented as either decimal or hexadecimal reals, following the format:
[sign] integer.[integer][exponent]
where the exponent has the form E[{+|-}]integer.
Examples of Real Constants:
+3.0, -44.2E+05, 26.E5
Character and String Constants
Character Constants: A single character enclosed in single or double quotes.
Example: 'A', "d"
String Constants: A sequence of characters enclosed in quotes.
Example: 'ABC', 'X', "Good night, Gracie"
Reserved Words in Assembly Language
Reserved words have special meanings and can be used only in their proper context.
Types Include:
Instruction mnemonics (e.g., MOV, ADD, MUL, INC, JMP)
Register names (e.g., AX, BX, CX, SI, DI)
Directives (e.g., .code, .data, .stack)
Attributes that give size and usage information (e.g., BYTE, WORD)
Operators used in constant expressions (e.g., +, -, *)
Identifiers
Definition: Programmer-defined name for variables, constants, procedures, or labels.
Rules for Identifiers:
Length: 1 to 247 characters.
Case insensitive (by default).
First character: must be a letter, _, @, ?, or $; subsequent characters may also be digits.
Valid examples: counter, sumValue, @myVal.
Directives in Assembly Language
Definition: Commands recognized by the assembler within source code.
Functions:
Not executed at runtime.
Used for defining variables, macros, procedures.
Assign names to memory segments.
Examples:
myVar DWORD 26 ; reserves space for a variable
Difference Between Directives and Instructions:
Directives are not executed; they tell the assembler how to build the program. Instructions are assembled into machine code and executed at runtime.
Defining Segments with Directives
Segment Directives:
.DATA: Data segment, holding variables.
.CODE: Code segment, holding executable instructions.
.STACK: Defines the runtime stack segment.
Structure of an Instruction
An instruction consists of these parts:
[label:] mnemonic [operands] [; comment]
Label: A marker for an instruction or data item that represents its address. A data label names a variable (e.g., Count DWORD 100); a code label ends with a colon (e.g., target:).
Example Instruction – JMP
JMP transfers control unconditionally to a label; here it creates an endless loop:
target:
mov ax, bx
jmp target ; create a loop
Operands in Assembly Instructions
Can have 0-3 operands:
Examples:
stc               ; no operands
inc eax           ; one operand
mov count, ebx    ; two operands
imul eax, ebx, 5  ; three operands
Comments:
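Single-line comments begin with a semicolon (;).
Block comments use the COMMENT directive followed by a delimiter character of your choice; the comment ends at the next occurrence of that character (e.g., COMMENT ! ... !).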
Assembling, Linking, and Running Programs
An assembler translates an assembly-language source program into an object file.
Output: The object file must then be linked to create an executable file.
Defining Data Types
Characteristic: A data type defines the set of values that can be assigned to variables of that type; sizes include 8, 16, 32, 48, and 64 bits.
Example Declaration:
A DWORD variable holds an unsigned 32-bit integer.
Instruction Mnemonic: A short identifier for an assembly instruction.
Common Data Types
Types & Usages:
BYTE: 8-bit unsigned integer.
WORD: 16-bit unsigned integer.
DWORD: 32-bit unsigned integer.
Directives for Data Definition:
DB, DW, DD, DQ: legacy directives that define 8-, 16-, 32-, and 64-bit data, respectively.
Data Definition Statement
Syntax:
name directive initializer [, initializer]...
Example:
count DWORD 12345
At least one initializer is required; additional initializers are separated by commas.
Defining BYTE and SBYTE Data
Definition: Allocates storage for unsigned (BYTE) or signed (SBYTE) byte values.
Examples:
value1 BYTE 'A'  ; character constant
value2 BYTE 0    ; integer constant
A variable can be left uninitialized by using ? as its initializer (e.g., value6 BYTE ?).
Multiple Initializers
Note: When a definition has multiple initializers, its label refers only to the offset of the first initializer.
Example:
list BYTE 10, 20, 30, 40
Here list is the offset of 10, list+1 the offset of 20, and so on. Initializers in one definition may mix radixes and character constants.
DUP Operator
Function: Allocates space for multiple data items using a constant expression as a counter.
Example Usage:
BYTE 20 DUP(0) ; allocates 20 bytes, all initialized to zero
Data Transfer Instructions
General Syntax: Instruction can have up to three operands.
Types of Operands:
Immediate values
Registers
Memory references
MOV Instruction
Definition: Copies from a source operand to a destination operand.
Syntax:
MOV destination, source
Example:
MOV EAX, EBX
Rules:
Both operands must be the same size.
Cannot have two memory operands.
Immediate values cannot be assigned to segment registers.
Arithmetic Instructions
Types include:
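INC: Increments a value by 1.
DEC: Decrements a value by 1.
ADD: Adds a source operand to a destination operand.
SUB: Subtracts a source operand from a destination operand.
Arithmetic instructions update status flags such as the Carry, Zero, and Sign flags (INC and DEC leave the Carry flag unchanged).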
Example of Arithmetic Operations
ADD:
.data
var1 DWORD 10000h
var2 DWORD 20000h
.code
mov eax, var1 ; EAX = 10000h
add eax, var2 ; EAX = 30000h
SUB: Subtract two DWORDs, affecting flags as well.
Division Instruction (DIV)
Syntax:
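DIV source
DIV performs unsigned division. The dividend is implicit, and the divisor (source) is an 8-, 16-, or 32-bit register or memory operand:
8-bit divisor: AX is divided by source; quotient in AL, remainder in AH.
16-bit divisor: DX:AX is divided by source; quotient in AX, remainder in DX.
32-bit divisor: EDX:EAX is divided by source; quotient in EAX, remainder in EDX.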
Multiplication Instruction (MUL)
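MUL source performs unsigned multiplication of AL, AX, or EAX by a register or memory operand of the same size. The product is twice as wide as the operands:
8-bit: AL × source, product in AX.
16-bit: AX × source, product in DX:AX.
32-bit: EAX × source, product in EDX:EAX.
MUL sets the Carry and Overflow flags when the upper half of the product is nonzero.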
Control Flow Instructions: JMP & LOOP
JMP: Unconditional jump to a new instruction address.
LOOP: Decrements ECX and jumps to a specified label if ECX is not zero.
Usage Example of JMP and LOOP
Example Code:
top:
jmp top ; endless loop
Nested Loop Example: Saving and restoring ECX:
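.data
count DWORD ?       ; holds the saved outer loop count
.code
mov ecx, 100        ; set outer loop count
L1: mov count, ecx  ; save outer loop count
mov ecx, 20         ; set inner loop count
L2: loop L2         ; repeat the inner loop (body omitted)
mov ecx, count      ; restore outer loop count
loop L1             ; repeat the outer loop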
Arrays in Assembly Language
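Concept: An array is a sequence of variables of the same type stored at consecutive addresses. A string, for example, is an array of bytes holding ASCII codes.
Elements can be accessed with a constant index in square brackets (e.g., MOV AL, a[3]) or with an index register (e.g., MOV SI, 3 followed by MOV AL, a[SI]). The index is a byte offset, so it equals the element number only for byte arrays.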
Example of Array Declaration
Use the DUP operator to allocate many elements with the same initial value (e.g., BYTE 20 DUP(9) allocates 20 bytes, each set to 9).
Accessing Array Elements
The LEA (Load Effective Address) instruction and the OFFSET operator both give the address of a variable or array element.
Example Usage:
mov bx, OFFSET VAR1     ; BX = address of VAR1
mov BYTE PTR [BX], 44h  ; store 44h in the first byte of VAR1