Binaries - binary file format

8 Terms

1

symbols:

  • role of symbols

  • types (2×3)

  • omitted vs not omitted

  • static vs dyn linking

Role of symbols

  • Symbols identify functions and global variables across object files.

  • They encode name, type (function/data), binding, i.e. visibility to other object files (local/global), and optionally location.

  • Used by linker/loader for symbol resolution and relocation.

Types of symbols

  • Defined symbols: element is defined in the object file, location known.

  • Undefined symbols (UND): element is referenced but defined elsewhere.

  • Local symbols: not visible outside the object file (e.g. static in C).

  • Global symbols: visible to other object files.

  • Static symbol table (.symtab): full symbol info, mainly for linkers/debugging.

  • Dynamic symbol table (.dynsym): minimal set needed for dynamic linking.

Which symbols can be omitted

  • The static symbol table (.symtab) and debug symbols can be stripped once static linking is done.

  • Full .symtab, .strtab, debug sections are optional in final binaries.

Which symbols cannot be omitted

  • Dynamic symbols (.dynsym) and related relocation info for dynamically linked binaries/libraries.

  • Needed by the OS loader to resolve symbols at load/run time.

Depending on linking form

  • Static linking:

    • All symbol resolution done at link time.

    • Symbols and relocations can be fully stripped afterwards.

  • Dynamic linking (ELF shared objects / shared libraries):

    • Resolution deferred to load/run time.

    • Dynamic symbols, GOT/PLT, and relocation metadata must remain.

  • Why:

    • Loader must know which symbols to resolve and where to write resolved addresses.
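
The resolution step described in this card can be sketched with a toy model. This is only an illustration, not how a real linker is implemented: the object files, symbol names, and addresses are made up, and each symbol is reduced to a binding plus an address, with `None` standing for an undefined (UND) reference.

```python
# Toy model of symbol resolution across object files (not a real linker).
# Each "object file" maps a symbol name to (binding, address or None).
# An address of None marks an undefined (UND) reference.

def resolve(objects):
    """Return {name: (defining_object, address)} for all global definitions,
    and raise if a referenced symbol has no global definition."""
    defined = {}
    for obj_name, symbols in objects.items():
        for name, (binding, addr) in symbols.items():
            if addr is not None and binding == "GLOBAL":
                if name in defined:
                    raise ValueError(f"duplicate definition of {name}")
                defined[name] = (obj_name, addr)

    for obj_name, symbols in objects.items():
        for name, (binding, addr) in symbols.items():
            if addr is None and name not in defined:
                raise ValueError(f"undefined reference to {name} in {obj_name}")
    return defined


objects = {
    "main.o": {
        "main":   ("GLOBAL", 0x0),
        "helper": ("GLOBAL", None),   # UND: defined elsewhere
        "buf":    ("LOCAL",  0x40),   # like a C `static`: invisible to other files
    },
    "util.o": {
        "helper": ("GLOBAL", 0x10),
        "buf":    ("LOCAL",  0x80),   # a different `buf`; no clash, both are local
    },
}

table = resolve(objects)
print(table)
```

Local symbols never enter the shared table, which is why the two `buf` definitions do not collide, while the undefined `helper` in `main.o` is satisfied by the global definition in `util.o`.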

2

position (in) dependent

Position-dependent code (PDC)

  • Assumes a fixed load address.

  • Absolute addresses or link-time–fixed immediates are embedded in instructions.

  • Fewer relocations at load time, slightly faster.

  • Typical in non-PIE executables, static linking.

Position-independent code (PIC)

  • Can be loaded at any address.

  • Uses relative addressing (e.g. PC/RIP-relative) and indirection via GOT/PLT.

  • Still requires some relocations at load time, but they target data (e.g. GOT entries), not the code itself.

  • Mandatory for shared libraries and PIE executables.

How to recognize in assembly

  • PIC:

    • PC/RIP-relative addressing (mov x(%rip), reg; lea x(%rip), reg)

    • Indirect calls via PLT (call foo@plt)

    • Accesses through GOT

  • PDC:

    • Absolute addresses as immediates (mov $0xADDR, reg)

    • Direct calls with fixed targets

    • No GOT/PLT indirection

Rule of thumb

  • Any absolute address ⇒ position-dependent.

  • Only relative addressing + indirection ⇒ position-independent.
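
The rule of thumb can be checked with a small numeric sketch. All addresses and offsets below are invented, and the model is simplified: a real RIP-relative displacement is measured from the end of the instruction, which is ignored here.

```python
# Toy model: one "instruction" that needs the address of a variable `x`.
# LINK_BASE is the address the linker assumed; all numbers are made up.

LINK_BASE = 0x400000
INSN_OFFSET = 0x1000      # offset of the instruction within the image
X_OFFSET = 0x3000         # offset of the variable x within the image

# Position-dependent: the absolute address is baked into the instruction.
ABS_IMMEDIATE = LINK_BASE + X_OFFSET

# Position-independent: only the distance from the instruction to x is stored.
REL_DISPLACEMENT = X_OFFSET - INSN_OFFSET


def address_seen_by_pdc(load_base):
    return ABS_IMMEDIATE                      # ignores where we were loaded


def address_seen_by_pic(load_base):
    pc = load_base + INSN_OFFSET              # program counter at run time
    return pc + REL_DISPLACEMENT


for base in (0x400000, 0x7F0000000000):
    real_x = base + X_OFFSET
    print(hex(base),
          "PDC ok:", address_seen_by_pdc(base) == real_x,
          "PIC ok:", address_seen_by_pic(base) == real_x)
```

The absolute immediate is only right when the image sits at the address the linker assumed, whereas the displacement stays valid at any load address because the instruction and the variable move together.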

3

to explain why relocation information is required, which tools rely on it (linker and loader) and how and why in different circumstances (static vs. dynamic linking).

Why relocation information is required

  • The compiler does not know final addresses of code/data.

  • → It emits placeholders, to be fixed up once the addresses are known.

  • Relocation entries record where a value must be patched and how to find that value via symbols.

Which tools rely on relocation info

  • Linker: resolves symbols and applies relocations during static linking.

  • Loader (OS dynamic loader): applies relocations at load/run time for dynamically linked binaries and libraries.

Static linking

  • All object files and static libraries are combined at link time.

  • Final relative/absolute addresses become known.

  • Linker applies all relocations and patches the code/data.

  • After this, relocation info is no longer needed and can be stripped.

Dynamic linking

  • Happens at run time: addresses only become known when the program is loaded.

  • The linker leaves the unresolved relocations in the binary.

  • Loader resolves symbols, computes addresses, and applies relocations at load time (or lazily).

  • Minimal relocation and dynamic symbol information must remain in the binary (cannot be stripped).

Why behavior differs

  • Static linking: single fixed layout → relocation once.

  • Dynamic linking: variable load addresses → relocation required at every program start.

  • PIC + GOT/PLT reduce relocation work to tables instead of patching code.
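
What "applying a relocation" means can be shown with a minimal sketch. It assumes a single made-up relocation kind ("store the 64-bit value symbol address + addend at an offset"); the dictionary field names and the addresses are hypothetical and do not follow any real file format.

```python
import struct

# Toy relocation: "write the 64-bit value S + A at `offset`", where S is the
# symbol's final address and A is an addend. The field names are made up.

def apply_relocations(image, relocations, symbol_addresses):
    for rel in relocations:
        value = symbol_addresses[rel["symbol"]] + rel["addend"]
        struct.pack_into("<Q", image, rel["offset"], value)   # little-endian u64
    return image


image = bytearray(16)                       # two 8-byte placeholders, all zero
relocations = [
    {"offset": 0, "symbol": "counter", "addend": 0},
    {"offset": 8, "symbol": "table",   "addend": 4},   # &table + 4
]
symbols = {"counter": 0x601000, "table": 0x602000}

apply_relocations(image, relocations, symbols)
print([hex(v) for v in struct.unpack_from("<QQ", image)])
```

A static linker runs this kind of loop once and can then discard the relocation list. A dynamic loader has to run it at every program start, so the list must stay in the binary.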

4
  • to summarise the roles and types of content of the different types of code and data sections;

    • Ordinary code

    • Initialization / finalization code

    • Statically allocated data

    • String data

    • Zero-initialized data

    • Debug information

    • Relocation sections

    • Symbol sections

    • Global Offset Table (GOT)

    • Procedure Linkage Table (PLT)

    • Exception handling sections

    • C++ RTTI sections

Summary: roles and contents of code and data sections

  • Ordinary code (.text)
    Regular functions and methods. Executed only when called.

  • Initialization / finalization code (.init, .fini, .init_array, .fini_array, .ctors, .dtors)
    Code that runs automatically when a binary is loaded or unloaded, or on OS events (e.g. C++ static constructors, thread setup).

  • Statically allocated data (.data, .rodata)
    Global/static variables with an initial value.

    • .data: mutable data

    • .rodata: read-only data (constants, string literals, vtables, RTTI)

  • String data
    Often grouped separately to avoid alignment padding. An optimization, not a requirement.

  • Zero-initialized data (.bss)
    Global/static variables whose initial value is 0.
    Takes no bytes in the file, only a size; the memory is zeroed at load time.

  • Debug information (.debug*)
    Source mapping, types, symbolic names, stack layouts.
    Used only for debugging, fully strippable.

  • Relocation sections (.rel*, .rela*)
    Describe where and how addresses must be patched by the linker or loader.

  • Symbol sections (.symtab, .dynsym)
    Names, types and visibility of functions/variables.

    • .symtab: complete, optional

    • .dynsym: minimal, required for dynamic linking

  • Global Offset Table (GOT)
    Table of absolute addresses of global data/functions.
    Enables position-independent code; the loader fills it in.

  • Procedure Linkage Table (PLT)
    Stubs for calls to dynamically linked functions.
    PLT → GOT → real function.
    Goal: PIC, lazy binding, fewer relocations in .text.

  • Exception handling sections (.eh_frame, …)
    Stack unwinding and exception handling.
    Needed for correct execution, not strippable.

  • C++ RTTI sections
    Run-time type info for dynamic_cast and exceptions.
    Required for correct behavior, not strippable.
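
Why zero-initialized data costs no file space can be illustrated with a toy loader step. This is a sketch under the assumption that a segment is described by its file contents plus a larger in-memory size, which is roughly how ELF program headers express it; the function name and the sizes are made up.

```python
# Toy loader step: a segment stores `file_bytes` in the file but occupies
# `mem_size` bytes in memory. The tail beyond the file bytes is zero-filled,
# which is how zero-initialized data costs no space in the file.

def load_segment(file_bytes, mem_size):
    if mem_size < len(file_bytes):
        raise ValueError("memory size smaller than file contents")
    return bytearray(file_bytes) + bytearray(mem_size - len(file_bytes))


data_in_file = bytes([1, 2, 3, 4])         # initialized data (.data-like)
segment = load_segment(data_in_file, 12)   # plus 8 bytes of .bss-like space

print(len(data_in_file), len(segment), list(segment))
```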

5
  • to explain how the PLT is used and what goal it serves.

PLT: role and goal

  • What it is: Procedure Linkage Table, a set of stubs for calling dynamically linked functions.

  • How it works:

    • Call instruction jumps to a PLT stub.

    • Stub uses GOT entry to jump to the real function.

    • On first call, loader resolves the symbol and updates the GOT (lazy binding).

  • Goal:

    • Enable dynamic linking with position-independent code.

    • Avoid patching call sites in .text.

    • Reduce relocation work at load time: with lazy binding, only functions that are actually called get resolved, which speeds up startup.

      SHP Executable Binaries and Lib…
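
The lazy-binding flow can be mimicked with a short simulation. It is only a sketch: `got`, `plt_call`, `library`, and `make_resolver` are hypothetical names standing in for the GOT, a PLT stub, a loaded shared library, and the loader's resolver, and Python callables stand in for addresses.

```python
# Toy model of lazy binding. `got` plays the GOT, `plt_call` plays a PLT stub,
# and `library` stands in for a loaded shared library. All names are made up.

library = {"greet": lambda name: f"hello, {name}"}
resolver_calls = []          # records how often the slow path runs
got = {}


def make_resolver(symbol):
    def resolve_then_call(*args):
        resolver_calls.append(symbol)
        target = library[symbol]     # the symbol lookup done by the loader
        got[symbol] = target         # patch the GOT entry, not the call site
        return target(*args)
    return resolve_then_call


def plt_call(symbol, *args):
    return got[symbol](*args)        # stub: jump through the GOT entry


got["greet"] = make_resolver("greet")   # initial state: entry points at resolver

first = plt_call("greet", "a")       # slow path: resolve, patch, call
second = plt_call("greet", "b")      # fast path: GOT already holds the target
print(first, second, resolver_calls)
```

The call site (`plt_call`) never changes. Only the table entry is rewritten, which is the point of routing calls through the PLT and GOT instead of patching `.text`.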

6

7

Base address (BA), virtual address (VA), relative virtual address (RVA)

  • Base address (BA): preferred address where the PE file is assumed to be loaded. Chosen at link time.

  • Virtual address (VA): actual address in the process virtual memory after loading.

  • Relative virtual address (RVA): offset relative to the base address.

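    • Formula: RVA = VA − BA, where BA is the address the image is actually loaded at (the preferred BA if the file was not rebased).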

    • Used inside the binary to stay independent of the actual load address.

Relation

  • If loaded at preferred BA: VA = BA + RVA, no relocation needed.

  • If loaded elsewhere: VA changes, RVAs stay constant, fixups are applied.
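
A quick worked example of the BA/VA/RVA arithmetic, with made-up addresses. The helper names are illustrative only, and `load_base` means the address the image actually occupies in memory.

```python
PREFERRED_BASE = 0x10000000      # made-up preferred base address of a DLL


def rva_to_va(rva, load_base):
    return load_base + rva


def va_to_rva(va, load_base):
    return va - load_base


rva = 0x1234
va_preferred = rva_to_va(rva, PREFERRED_BASE)     # loaded where it wanted
va_rebased = rva_to_va(rva, 0x20000000)           # loaded somewhere else

print(hex(va_preferred), hex(va_rebased))
print(hex(va_to_rva(va_rebased, 0x20000000)))
```

The same RVA yields a different VA for each load address, and converting back with the matching base recovers the original RVA.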

8

Exports

  • Symbols (functions/data) a PE file makes available to other modules.

  • Stored in the Export Address Table (EAT).

  • Each export has:

    • name

    • ordinal (index)

    • RVA of the symbol.

Imports

  • Symbols a PE file needs from other DLLs.

  • Listed per DLL in the Import Address Table (IAT).

  • Each entry is a placeholder that will receive the resolved VA of the imported symbol.

Fixups (relocations)

  • List of locations that must be patched if the file is not loaded at its base address.

  • Applied by the loader when VA ≠ BA.

  • Necessary because PE code/data is position-dependent by default.
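
Applying fixups amounts to adding the load delta to every listed location. The sketch below assumes 32-bit absolute addresses and a plain list of offsets. The real PE base relocation table is organised in blocks with typed entries, which is not modelled here, and the function name is hypothetical.

```python
import struct


def apply_fixups(image, fixup_offsets, preferred_base, actual_base):
    """Add the load delta to every 32-bit absolute address listed in the fixups."""
    delta = actual_base - preferred_base
    if delta == 0:
        return image                       # loaded at preferred base: nothing to do
    for off in fixup_offsets:
        (addr,) = struct.unpack_from("<I", image, off)
        struct.pack_into("<I", image, off, (addr + delta) & 0xFFFFFFFF)
    return image


preferred, actual = 0x10000000, 0x20000000
image = bytearray(8)
struct.pack_into("<I", image, 0, preferred + 0x3000)   # absolute pointer: needs fixup
struct.pack_into("<I", image, 4, 42)                   # plain integer: not listed

apply_fixups(image, [0], preferred, actual)
print([hex(v) for v in struct.unpack_from("<II", image)])
```

Only locations named in the fixup list are touched, which is why the list has to say exactly where the absolute addresses live.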

How IAT and EAT operate during dynamic loading

  1. Loader maps DLLs into memory.

  2. For each imported DLL:

    • Loader scans the importing binary’s IAT.

    • Looks up each symbol in the DLL’s EAT, either by name (using the hint as a starting index) or by ordinal.

  3. Loader writes resolved VAs into the IAT.

  4. Code calls imported functions indirectly via the IAT.

  5. If DLL not loaded at BA, fixups are applied to absolute addresses.
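
Steps 2 and 3 can be sketched as a lookup-and-fill loop. This is a toy model: the DLL name, base address, and RVAs are invented, the EAT is reduced to a name-to-RVA dictionary, and the IAT to a dictionary of slots, so ordinals and hints are left out.

```python
# Toy model: an EAT maps exported names to RVAs; an IAT is a set of slots
# that the loader fills with resolved VAs. Module names and bases are made up.

loaded_modules = {
    "mathlib.dll": {
        "base": 0x30000000,
        "eat": {"add": 0x1000, "mul": 0x1040},       # name -> RVA
    },
}

imports = {"mathlib.dll": ["add", "mul"]}           # what the importer needs


def bind_imports(imports, loaded_modules):
    iat = {}
    for dll, names in imports.items():
        module = loaded_modules[dll]
        for name in names:
            rva = module["eat"][name]                # look up in the DLL's EAT
            iat[(dll, name)] = module["base"] + rva  # write resolved VA into the IAT
    return iat


iat = bind_imports(imports, loaded_modules)
print({k: hex(v) for k, v in iat.items()})
```

Because the EAT stores RVAs, the loader adds the DLL's actual load base to get the VA it writes into each IAT slot.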

Key idea

  • EAT: tells where symbols are in a DLL.

  • IAT: tells where to write resolved addresses in the importing binary.

  • Fixups ensure correctness when load addresses differ.
