Binaries - binary file format


symbols:

role of symbols
types (2×3)
omitted vs not omitted
static vs dyn linking

Role of symbols

Symbols identify functions and global variables across object files.
They encode name, type (function/data), visibility (local/global), and optionally location.
Used by linker/loader for symbol resolution and relocation.

Types of symbols

Defined symbols: element is defined in the object file, location known.
Undefined symbols (UND): element is referenced but defined elsewhere.
Local symbols: not visible outside the object file (e.g. static in C).
Global symbols: visible to other object files.
Static symbol table (.symtab): full symbol info, mainly for linkers/debugging.
Dynamic symbol table (.dynsym): minimal set needed for dynamic linking.
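
To make the categories above concrete, here is a minimal sketch that decodes one 24-byte ELF64 symbol table entry and reports binding (local/global), type (function/data) and whether it is defined or UND. The string table and the two entries are hand-built, hypothetical data, not taken from a real binary; `decode_symbol` is an illustrative helper, not a library function.

```python
import struct

# ELF64 symbol entry (Elf64_Sym), little-endian, 24 bytes:
# st_name (u32), st_info (u8), st_other (u8), st_shndx (u16), st_value (u64), st_size (u64)
ELF64_SYM = struct.Struct("<IBBHQQ")

BINDINGS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}
SHN_UNDEF = 0  # section index 0 marks an undefined symbol


def decode_symbol(entry: bytes, strtab: bytes) -> dict:
    """Decode one symbol entry; names are NUL-terminated strings in strtab."""
    st_name, st_info, _st_other, st_shndx, st_value, st_size = ELF64_SYM.unpack(entry)
    end = strtab.index(b"\x00", st_name)
    return {
        "name": strtab[st_name:end].decode(),
        "binding": BINDINGS.get(st_info >> 4, "OTHER"),   # high 4 bits of st_info
        "type": TYPES.get(st_info & 0xF, "OTHER"),        # low 4 bits of st_info
        "defined": st_shndx != SHN_UNDEF,
        "value": st_value,
        "size": st_size,
    }


# Hypothetical example data: a string table and two symbol entries.
strtab = b"\x00main\x00printf\x00"
main_sym = ELF64_SYM.pack(1, (1 << 4) | 2, 0, 1, 0x1000, 42)      # GLOBAL FUNC in section 1
printf_sym = ELF64_SYM.pack(6, (1 << 4) | 0, 0, SHN_UNDEF, 0, 0)  # GLOBAL NOTYPE, undefined

for raw in (main_sym, printf_sym):
    print(decode_symbol(raw, strtab))
```

The same entry layout is used by both .symtab and .dynsym; only the set of entries differs.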

Which symbols can be omitted

The static symbol table (.symtab) and debug symbols can be stripped once linking is done.
Full .symtab, .strtab, debug sections are optional in final binaries.

Which symbols cannot be omitted

Dynamic symbols (.dynsym) and related relocation info for dynamically linked binaries/libraries.
Needed by the OS loader to resolve symbols at load/run time.

Depending on linking form

Static linking:
- All symbol resolution done at link time.
- Symbols and relocations can be fully stripped afterwards.
Dynamic linking (ELF shared objects / PE DLLs):
- Resolution deferred to load/run time.
- Dynamic symbols, GOT/PLT, and relocation metadata must remain.
Why:
- Loader must know which symbols to resolve and where to write resolved addresses.


position (in) dependent

Position-dependent code (PDC)

Assumes a fixed load address.
Absolute addresses or link-time–fixed immediates are embedded in instructions.
Fewer relocations at load time, slightly faster.
Typical in non-PIE executables, static linking.

Position-independent code (PIC)

Can be loaded at any address.
Uses relative addressing (e.g. PC/RIP-relative) and indirection via GOT/PLT.
Still requires relocations at load time, but mostly in data tables (GOT) rather than in the code itself.
Mandatory for shared libraries and PIE executables.

How to recognize in assembly

PIC:
- PC/RIP-relative addressing (mov x(%rip), %reg or lea x(%rip), %reg)
- Indirect calls via PLT (call foo@plt)
- Accesses through GOT
PDC:
- Absolute addresses as immediates (mov $0xADDR, reg)
- Direct calls with fixed targets
- No GOT/PLT indirection

Rule of thumb

Any absolute address ⇒ position-dependent.
Only relative addressing + indirection ⇒ position-independent.
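
A small worked example of why relative addressing survives a change of load address. This is a sketch under assumed numbers: the instruction offset, instruction length and data offset are hypothetical, and `rip_relative_target` is an illustrative helper.

```python
def rip_relative_target(insn_addr: int, insn_len: int, disp32: int) -> int:
    """Address referenced by an x86-64 RIP-relative operand.

    RIP points at the *next* instruction, hence insn_addr + insn_len.
    """
    return insn_addr + insn_len + disp32


# Hypothetical layout: a 7-byte instruction at image offset 0x1000
# refers to data at image offset 0x3000.
INSN_OFFSET, INSN_LEN, DATA_OFFSET = 0x1000, 7, 0x3000
disp32 = DATA_OFFSET - (INSN_OFFSET + INSN_LEN)  # fixed once, at link time

for base in (0x400000, 0x7F0000000000):
    target = rip_relative_target(base + INSN_OFFSET, INSN_LEN, disp32)
    assert target == base + DATA_OFFSET      # same displacement works at every base
    absolute_immediate = base + DATA_OFFSET  # what PDC would embed: differs per base
    print(hex(base), hex(disp32), hex(absolute_immediate))
```

The displacement stays the same for both bases, while the absolute immediate would have to be patched for each one.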


to explain why relocation information is required, which tools rely on it (linker and loader) and how and why in different circumstances (static vs. dynamic linking).

Why relocation information is required

The compiler does not know final addresses of code/data.
→ Relocations are placeholders to be fixed up once the addresses are known.
They record where a value must be patched and how to find that value via symbols.

Which tools rely on relocation info

Linker: resolves symbols and applies relocations during static linking.

Loader (OS dynamic loader): applies relocations at load/run time for dynamically linked binaries and libraries.

Static linking

All object files and static libraries are combined at link time.
Final relative/absolute addresses become known.
Linker applies all relocations and patches the code/data.
After this, relocation info is no longer needed and can be stripped.
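
The patching step can be sketched as a toy linker pass. The two relocation kinds below mirror the x86-64 formulas S + A (absolute, like R_X86_64_64) and S + A − P (PC-relative, like R_X86_64_PC32); the section contents, symbol addresses and the `apply_relocations` helper are hypothetical.

```python
import struct


def apply_relocations(section: bytearray, section_addr: int, relocs, symbols) -> None:
    """Patch `section` in place. Each reloc is (offset, kind, symbol, addend)."""
    for offset, kind, symbol, addend in relocs:
        s = symbols[symbol]        # S: final address of the symbol
        p = section_addr + offset  # P: address of the place being patched
        if kind == "ABS64":        # absolute 64-bit address: S + A
            struct.pack_into("<Q", section, offset, s + addend)
        elif kind == "PC32":       # signed 32-bit PC-relative: S + A - P
            struct.pack_into("<i", section, offset, s + addend - p)
        else:
            raise ValueError(f"unknown relocation kind: {kind}")


# Hypothetical 16-byte section placed at 0x401000 by the linker.
text = bytearray(16)
symbols = {"counter": 0x404000, "helper": 0x401100}
relocs = [
    (0, "ABS64", "counter", 0),  # 8-byte absolute pointer at offset 0
    (9, "PC32", "helper", -4),   # 4-byte call displacement at offset 9
]
apply_relocations(text, 0x401000, relocs, symbols)

print(hex(struct.unpack_from("<Q", text, 0)[0]))
print(hex(struct.unpack_from("<i", text, 9)[0]))
```

The addend of −4 accounts for the CPU measuring the displacement from the end of the 4-byte field, not from its start.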

Dynamic linking

Addresses only become known at run time, when the program is loaded.
The linker has therefore left unresolved relocations in the binary.

Loader resolves symbols, computes addresses, and applies relocations at load time (or lazily).
Minimal relocation and dynamic symbol information must remain in the binary (cannot be stripped).

Why behavior differs

Static linking: single fixed layout → relocation once.
Dynamic linking: variable load addresses → relocation required at every program start.
PIC + GOT/PLT reduce relocation work to tables instead of patching code.


to summarise the roles and types of content of the different types of code and data sections;
- Ordinary code
- Initialization / finalization code
- Statically allocated data
- String data
- Zero-initialized data
- Debug information
- Relocation sections
- Symbol sections
- Global Offset Table (GOT)
- Procedure Linkage Table (PLT)
- Exception handling sections
- C++ RTTI sections

Summary: roles and content of code and data sections

Ordinary code (.text)
Ordinary functions and methods. Executed only when called.
Initialization / finalization code (.init, .fini, init_array, fini_array, ctors, dtors)
Code that runs automatically when a binary is loaded or unloaded, or on OS events (e.g. C++ static constructors, thread setup).
Statically allocated data (.data, .rodata)
Global/static variables with an initial value.
- .data: mutable data
- .rodata: read-only data (consts, string literals, vtables, RTTI)
String data
Often grouped separately to avoid alignment padding. An optimization, not a requirement.
Zero-initialized data (.bss)
Global/static variables with initial value 0.
No bytes in the file, only a size; the memory is zeroed at load time.
Debug information (.debug*)
Source mapping, types, symbolic names, stack layouts.
For debugging only, fully strippable.
Relocation sections (.rel*, .rela*)
Describe where and how addresses must be patched by the linker or loader.
Symbol sections (.symtab, .dynsym)
Names, types and visibility of functions/variables.
- .symtab: complete, optional
- .dynsym: minimal, required for dynamic linking
Global Offset Table (GOT)
Table of absolute addresses of global data/functions.
Enables position-independent code; the loader fills it in.
Procedure Linkage Table (PLT)
Stubs for calls to dynamically linked functions.
PLT → GOT → actual function.
Purpose: PIC, lazy binding, fewer relocations in .text.
Exception handling sections (.eh_frame, …)
Stack unwinding and exception handling.
Needed for correct execution, not strippable.
C++ RTTI sections
Run-time type info for dynamic_cast and exceptions.
Required for correct behavior, not strippable.


to explain how the PLT is used and what goal it serves.

PLT: role and goal

What it is: Procedure Linkage Table, a set of stubs for calling dynamically linked functions.
How it works:
- Call instruction jumps to a PLT stub.
- Stub uses GOT entry to jump to the real function.
- On first call, loader resolves the symbol and updates the GOT (lazy binding).
Goal:
- Enable dynamic linking with position-independent code.
- Avoid patching call sites in .text.
- Reduce relocation work at load time, which shortens program startup.
  SHP Executable Binaries and Lib…
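
The lazy-binding steps above can be modeled in a few lines. This is a toy simulation, not a real loader: the GOT is a dictionary, the "shared library" is a dictionary of Python functions, and all names (`LIBRARY`, `make_plt_stub`, `resolver_calls`) are hypothetical.

```python
# Stands in for a loaded shared library that defines `puts`.
LIBRARY = {"puts": lambda s: f"puts({s!r})"}
resolver_calls = []  # records how often the resolver runs


def make_plt_stub(name, got):
    def resolve(*args):
        # First call only: look up the symbol, patch the GOT slot, forward the call.
        resolver_calls.append(name)
        got[name] = LIBRARY[name]
        return got[name](*args)

    got[name] = resolve  # GOT slot initially leads back to the resolver

    def stub(*args):
        # PLT stub: always an indirect jump through the GOT slot.
        return got[name](*args)

    return stub


got = {}
puts_plt = make_plt_stub("puts", got)

first = puts_plt("hi")      # goes through the resolver, which patches the GOT
second = puts_plt("again")  # goes straight to the real function
print(first, second, resolver_calls)
```

The call site never changes: only the GOT slot is rewritten, which is why .text needs no patching.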


Base address (BA), virtual address (VA), relative virtual address (RVA)

Base address (BA): preferred address where the PE file is assumed to be loaded. Chosen at link time.
Virtual address (VA): actual address in the process virtual memory after loading.
Relative virtual address (RVA): offset relative to the base address.
- Formula: RVA = VA − BA
- Used inside the binary to stay independent of the actual load address.

Relation

If loaded at preferred BA: VA = BA + RVA, no relocation needed.
If loaded elsewhere: VA changes, RVAs stay constant, fixups are applied.
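
A short worked example of the BA/VA/RVA arithmetic. The base addresses and the RVA are made-up values chosen for illustration.

```python
def rva_to_va(base: int, rva: int) -> int:
    """VA = BA + RVA, for whichever base the image was actually loaded at."""
    return base + rva


def va_to_rva(base: int, va: int) -> int:
    """RVA = VA - BA."""
    return va - base


PREFERRED_BASE = 0x10000000  # hypothetical base chosen at link time
ACTUAL_BASE = 0x6A000000     # hypothetical base chosen by the loader
FUNC_RVA = 0x1234

va_preferred = rva_to_va(PREFERRED_BASE, FUNC_RVA)
va_actual = rva_to_va(ACTUAL_BASE, FUNC_RVA)

# The VA changes with the load address, the RVA does not.
assert va_to_rva(PREFERRED_BASE, va_preferred) == FUNC_RVA
assert va_to_rva(ACTUAL_BASE, va_actual) == FUNC_RVA
print(hex(va_preferred), hex(va_actual))
```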


Exports

Symbols (functions/data) a PE file makes available to other modules.
Described by the export directory: the Export Address Table (EAT) holds the RVAs, alongside parallel name and ordinal tables.
Each export has:
- name
- ordinal (index)
- RVA of the symbol.

Imports

Symbols a PE file needs from other DLLs.
Listed per DLL in the import directory; each DLL gets its own slots in the Import Address Table (IAT).
Each entry is a placeholder that will receive the resolved VA of the imported symbol.

Fixups (relocations)

List of locations that must be patched if the file is not loaded at its base address.
Applied by the loader when VA ≠ BA.
Necessary because PE code/data is position-dependent by default.
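
Applying fixups amounts to adding the same delta to every listed location. The sketch below assumes 32-bit absolute addresses (comparable to HIGHLOW-type base relocations); the image contents, bases and the `apply_fixups` helper are hypothetical.

```python
import struct


def apply_fixups(image: bytearray, fixup_rvas, preferred_base: int, actual_base: int) -> None:
    """Add (actual - preferred) to each 32-bit absolute address listed in fixup_rvas."""
    delta = actual_base - preferred_base
    if delta == 0:
        return  # loaded at the preferred base: nothing to patch
    for rva in fixup_rvas:
        (old,) = struct.unpack_from("<I", image, rva)
        struct.pack_into("<I", image, rva, (old + delta) & 0xFFFFFFFF)


PREFERRED_BASE = 0x10000000
ACTUAL_BASE = 0x6A000000

# Hypothetical image: at RVA 0x10 sits an absolute pointer to RVA 0x18,
# computed by the linker under the assumption of the preferred base.
image = bytearray(0x20)
struct.pack_into("<I", image, 0x10, PREFERRED_BASE + 0x18)

apply_fixups(image, [0x10], PREFERRED_BASE, ACTUAL_BASE)
print(hex(struct.unpack_from("<I", image, 0x10)[0]))
```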

How IAT and EAT operate during dynamic loading

Loader maps DLLs into memory.
For each imported DLL:
- Loader scans the importing binary’s IAT.
- Looks up symbols in the DLL’s EAT (by name, with a hint to speed up the search, or by ordinal).
Loader writes resolved VAs into the IAT.
Code calls imported functions indirectly via the IAT.
If DLL not loaded at BA, fixups are applied to absolute addresses.
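
The import-binding loop can be sketched as follows. It is a simplified model: the EAT is a name-to-RVA dictionary, the IAT is a dictionary of slots, lookup is by name only, and the DLL name, base and exports are invented for the example.

```python
def bind_imports(imports, loaded_modules):
    """Fill an IAT.

    imports:        {dll_name: [symbol names the importing binary needs]}
    loaded_modules: {dll_name: (load_base, {exported name: RVA})}
    Returns {(dll_name, symbol): resolved VA}.
    """
    iat = {}
    for dll, names in imports.items():
        base, exports = loaded_modules[dll]
        for name in names:
            if name not in exports:
                raise LookupError(f"{dll} does not export {name}")
            iat[(dll, name)] = base + exports[name]  # VA = DLL base + RVA from its EAT
    return iat


# Hypothetical DLL mapped at 0x70000000 with two exports.
loaded_modules = {"mathlib.dll": (0x70000000, {"add": 0x1000, "mul": 0x1040})}
imports = {"mathlib.dll": ["add", "mul"]}

iat = bind_imports(imports, loaded_modules)
for slot, va in iat.items():
    print(slot, hex(va))
```

Because the EAT stores RVAs, the same export table works wherever the DLL ends up being mapped.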

Key idea

EAT: tells where symbols are in a DLL.
IAT: tells where to write resolved addresses in the importing binary.
Fixups ensure correctness when load addresses differ.