Digital Forensics & Reverse Engineering – Lecture 0x0A Comprehensive Notes

Digital Forensics & Incident Response (DFIR)

  • Definition: Field within cybersecurity dedicated to the identification, investigation, containment, eradication and remediation of cyber-attacks.

    • Integrates both Digital Forensics (evidence acquisition & interpretation) and Incident Response (operational reaction to an active threat).

  • Drivers for growth:

    • Escalating volume, sophistication and automation of cyber-attacks.

    • Proliferation of heterogeneous endpoints (servers, workstations, IoT, cloud VMs, mobile, etc.).

  • Organisational structure:

    • CIRT / CSIRT (Cyber Incident Response Team / Computer Security Incident Response Team) → multidisciplinary team that triages, responds and coordinates recovery.

    • Relies heavily on forensic artefacts to make evidence-based decisions.

    • Typical stakeholders: SOC analysts, threat hunters, legal/HR, PR, executive management.

NIST SP 800-86 Forensic Process Phases

  1. Collection – acquire the data (live memory, disk, logs, network traffic) while maintaining integrity.

  2. Examination – forensically process raw data (filtering, decrypting, carving, normalising formats).

  3. Analysis – draw conclusions, correlate artefacts, recreate timelines, attribute activity.

  4. Reporting – document methods, findings, chain-of-custody and recommended actions.

(Phases repeat iteratively as new leads emerge.)
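
One common way to support the "maintaining integrity" requirement of the Collection phase is to hash evidence at acquisition and re-hash it before analysis. The following is a minimal sketch of that idea; the in-memory buffer stands in for a real disk or memory image, and the function name sha256_stream is an illustrative assumption.

```python
import hashlib
import io

def sha256_stream(stream, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

# Hypothetical stand-in for an acquired image; a real case would use open(path, "rb").
evidence = io.BytesIO(b"example evidence bytes")

acquired_hash = sha256_stream(evidence)   # recorded at collection time
evidence.seek(0)
verified_hash = sha256_stream(evidence)   # recomputed before examination

print("integrity intact:", acquired_hash == verified_hash)
```

Recording the digest alongside the chain-of-custody entry lets a later examiner show that the working copy matches what was collected.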

Forensic Areas of Practice

  • File-System Forensics

  • Memory Forensics

  • Malware Analysis

  • Network Forensics

  • Mobile & IoT Forensics

  • Cloud Forensics (multi-tenant, API-driven evidence)

  • Log Analysis (system, application, authentication, audit, security devices)

Digital forensics is much more than just hard-drive analysis; any digital substrate that produces artefacts can be a target.

Logs Commonly Leveraged

  • System logs – kernel/OS events, crashes, reboots.

  • Application logs – user interactions, error traces.

  • Security-device logs – firewalls, IDS/IPS, EDR sensors.

  • Authentication logs – successful / failed logons, MFA status.

  • Network-device logs – router/switch flow data, configuration events.

  • Audit logs – privileged actions, policy changes, compliance checkpoints.
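
As an illustration of how authentication logs are triaged, the sketch below counts failed logons per source address. The sample lines imitate the style of OpenSSH syslog messages, but the hosts, users and addresses are invented for the example, and real log formats vary by system.

```python
import re
from collections import Counter

# Illustrative lines only; hostnames, users and addresses are made up.
SAMPLE_LOG = """\
Jan 10 03:14:07 web01 sshd[2211]: Failed password for root from 203.0.113.5 port 51234 ssh2
Jan 10 03:14:09 web01 sshd[2211]: Failed password for invalid user admin from 203.0.113.5 port 51236 ssh2
Jan 10 03:15:30 web01 sshd[2290]: Accepted password for alice from 198.51.100.7 port 40022 ssh2
Jan 10 03:16:02 web01 sshd[2301]: Failed password for bob from 198.51.100.9 port 40100 ssh2
"""

FAILED = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+) port \d+")

def failed_logons_by_source(log_text):
    """Count failed password attempts per source address."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILED.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts

counts = failed_logons_by_source(SAMPLE_LOG)
for source, attempts in counts.most_common():
    print(source, attempts)
```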

File-Level Forensic Toolkit Quick-Reference

  • libmagic / file → signature-based format identification (“magic bytes”).

    • file screenshot.png ⇒ returns PNG image data, 1920×1080, 8-bit/color …

  • Carving with dd – extract embedded objects

    • dd if=container.xxx of=payload.xxx bs=1 skip=<offset> count=<len>

  • strings – pulls printable ASCII sequences (other encodings via -e): strings -o screenshot.png (with GNU strings, -o prints each string's offset in octal).

  • hexdump / xxd – binary & hex inspection; useful for manual header checks.

  • exiftool – rich metadata (EXIF, IPTC, XMP) for images, docs, video, etc.
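
To make the toolkit above less of a black box, here is a rough Python sketch of what three of the tools do conceptually: signature lookup (file), offset-based carving (dd) and printable-string extraction (strings). The signature table is a tiny hand-picked subset, the function names are illustrative, and the container bytes are synthetic.

```python
import re

# A few well-known signatures; real libmagic databases are far larger.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"PK\x03\x04": "ZIP archive",
    b"%PDF-": "PDF document",
}

def identify(data):
    """Return a format name based on leading magic bytes, or 'unknown'."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"

def carve(data, offset, length):
    """Same idea as: dd bs=1 skip=<offset> count=<len>."""
    return data[offset:offset + length]

def ascii_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode("ascii") + rb",}")
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Synthetic container: padding, an embedded PNG-like blob, more padding.
png_blob = b"\x89PNG\r\n\x1a\n" + b"hidden-flag-data"
container = b"\x00" * 10 + png_blob + b"\x00" * 5

offset = container.find(b"\x89PNG\r\n\x1a\n")
payload = carve(container, offset, len(png_blob))

print(identify(container), "/", identify(payload))
print(ascii_strings(payload))
```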

Ethical note: ensure you respect privacy/SOC policies; metadata may reveal PII.

Network Forensics

  • Packet Trace (PCAP)

    • Captured via tcpdump, Wireshark, or hardware taps.

    • Contains full payloads → can reconstruct sessions, files, VoIP calls.

  • Network Logs

    • High-level events (src/dst, port, proto) but no payload.

    • Complementary to PCAP for long-term retention.

Packet Capture Techniques

  • Network Tap – passive inline optical/electrical splitter; zero packet loss, transparent.

  • Port Mirroring / SPAN – switch clones selected traffic to a monitor port.

  • Wireless Sniffing – monitor mode interface captures 802.11 frames.

Analysis Workflow

  1. Import PCAP into Wireshark; apply protocol & display filters.

  2. Triage (e.g., find HTTP POSTs, unusual DNS, malformed TLS handshakes).

  3. Reassemble streams, export objects, carve malicious binaries.

  4. Correlate timestamps with host logs for attribution.
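
Step 4 of the workflow can be sketched without Wireshark by reading packet timestamps straight out of a classic pcap file and comparing them with a host-log event time. This is a simplified illustration: it handles only the classic microsecond-resolution pcap layout (not pcapng), the sample capture is built in memory, and the host event time is hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

PCAP_MAGIC = 0xA1B2C3D4  # classic libpcap format with microsecond timestamps

def parse_pcap(data):
    """Return a list of (utc_datetime, packet_bytes) from a classic pcap buffer."""
    if struct.unpack("<I", data[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", data[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    packets = []
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        offset += 16
        when = (datetime.fromtimestamp(ts_sec, tz=timezone.utc)
                + timedelta(microseconds=ts_usec))
        packets.append((when, data[offset:offset + incl_len]))
        offset += incl_len
    return packets

def near_event(packets, event_time, window_seconds):
    """Keep packets captured within window_seconds of a host-log event."""
    window = timedelta(seconds=window_seconds)
    return [p for p in packets if abs(p[0] - event_time) <= window]

def _record(ts_sec, ts_usec, payload):
    """Build one pcap record (header plus payload) for the synthetic sample."""
    return struct.pack("<IIII", ts_sec, ts_usec, len(payload), len(payload)) + payload

# Synthetic two-packet capture (little-endian, link type 1 = Ethernet).
sample = (struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
          + _record(1700000000, 250000, b"\x01\x02\x03")
          + _record(1700000005, 0, b"\xaa\xbb"))

packets = parse_pcap(sample)
host_event = datetime.fromtimestamp(1700000000, tz=timezone.utc)  # hypothetical log entry
suspects = near_event(packets, host_event, window_seconds=1)
print(len(packets), "packets,", len(suspects), "near the host event")
```

In practice clock skew between the capture point and the host has to be accounted for before a correlation window this tight is meaningful.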

Steganography & Steganalysis

  • Steganography = art/science of hiding data within innocuous carriers so that the existence of the message is concealed.

    • Not equivalent to encryption (crypto scrambles content but admits its presence).

    • Not equivalent to watermarking (fingerprinting), where the embedded mark identifies the file or its owner and robustness matters more than concealment.

Motivations & Threat Landscape

  • Intellectual-property protection (embed author ID, anti-piracy codes).

  • Covert malware transport (payload inside a harmless JPEG in a spear-phish).

  • Data exfiltration from locked-down networks (post images w/ hidden corporate IP).

Techniques

Text Carriers
  • Line-shift coding: each text line moved up/down slightly – 40 lines ⇒ 40 × 6 = 240 code points.

  • Word-spacing: alter inter-word gaps to encode bits.

  • Character micro-changes: tiny glyph perturbations (PDF, PostScript) invisible to reader.
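
The word-spacing idea can be shown with a toy scheme: one bit per inter-word gap, where a single space means 0 and a double space means 1. This encoding convention is an assumption chosen for the example, not a standard, and it would not survive whitespace normalisation.

```python
import re

def embed_bits(cover_text, bits):
    """Encode one bit per inter-word gap: single space = 0, double space = 1."""
    words = cover_text.split()
    if len(bits) > len(words) - 1:
        raise ValueError("cover text has too few gaps for the payload")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        gap = "  " if i < len(bits) and bits[i] else " "
        out.append(gap + word)
    return "".join(out)

def extract_bits(stego_text, count):
    """Read back the first count gaps as bits."""
    gaps = re.findall(r" +", stego_text)
    return [1 if len(gap) == 2 else 0 for gap in gaps[:count]]

cover = "the quick brown fox jumps over the lazy dog"
bits = [1, 0, 1, 1, 0, 0, 1, 0]

stego = embed_bits(cover, bits)
print(repr(stego))
print(extract_bits(stego, len(bits)))
```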

Image Carriers
  • Spatial-domain (LSB) – overwrite the Least-Significant Bit of pixel channels with payload bits; each value changes by at most 1, which reads as imperceptible noise.

  • Colour-plane separation – embed across RGB components.

  • Frequency-domain (DCT/FFT) – inject bits into high-frequency coefficients (robust against cropping, but beware JPEG compression attacking same coefficients).
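
A minimal sketch of spatial-domain LSB embedding follows. It treats the carrier as a flat sequence of 8-bit channel values rather than a real image file, and the bit order (most significant payload bit first) is an arbitrary choice for the example.

```python
def embed_lsb(samples, payload):
    """Return a copy of samples with payload bits written into each LSB."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("carrier too small for payload")
    stego = bytearray(samples)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the payload bit
    return bytes(stego)

def extract_lsb(samples, n_bytes):
    """Rebuild n_bytes of payload from the LSBs of the first 8 * n_bytes samples."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for sample in samples[i * 8:(i + 1) * 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)

cover = bytes(range(100, 148))   # 48 synthetic channel values
secret = b"csf"                  # 24 bits of payload

stego = embed_lsb(cover, secret)
print(extract_lsb(stego, len(secret)))
print("max change per sample:", max(abs(a - b) for a, b in zip(cover, stego)))
```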

Network Steganography
  • Embed payloads in sequence numbers, timing gaps, or uncommon header fields.

Steganalysis

  • Goal: detect & extract covert data.

  • Methods:

    • Compare suspected file vs. known-good baseline (size, entropy, colour histograms).

    • Statistical tests (chi-square on LSB, RS analysis).

    • Visual inspection for artefacts (misaligned blocks, resolution loss).

    • Machine-learning classifiers on large corpora of “clean” vs “stego” samples.
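
The chi-square test on LSBs rests on one observation: LSB replacement with random-looking bits tends to equalise the counts of each value pair (2k, 2k+1). The sketch below computes only the test statistic over those pairs; turning it into a probability needs the chi-square distribution, which is left out here. The carrier data is synthetic, and the alternating bit pattern is a deterministic stand-in for an encrypted payload.

```python
from collections import Counter

def chi_square_pairs(samples):
    """Chi-square statistic comparing each even value count with its pair average."""
    hist = Counter(samples)
    stat = 0.0
    for even in range(0, 256, 2):
        n_even, n_odd = hist[even], hist[even + 1]
        expected = (n_even + n_odd) / 2
        if expected > 0:
            stat += (n_even - expected) ** 2 / expected
    return stat

# Synthetic cover with clearly unequal pair counts.
cover = bytes([10] * 300 + [11] * 100 + [200] * 250 + [201] * 50)

# Overwrite every LSB with an alternating 0/1 pattern (stand-in for payload bits).
stego = bytes((value & 0xFE) | (i & 1) for i, value in enumerate(cover))

cover_stat = chi_square_pairs(cover)
stego_stat = chi_square_pairs(stego)
print(round(cover_stat, 2), stego_stat)
```

A statistic near zero means the pair counts are almost equal, which is consistent with LSB replacement; a large value is what an untouched carrier with skewed pairs would show.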

Ethical/practical implication: false positives can be high → corroborate with additional evidence before attribution.

Reverse Engineering (RE) & Malware Analysis

  • Definition: Deconstruct a physical/software artefact to learn its design, behaviour, vulnerabilities, or to enable interoperability.

  • Core use-cases: vulnerability discovery, malware triage, patch diffing, legacy system maintenance, audit of closed-source products.

Legal Considerations (Australia)

  • Permitted when performed for:

    • Interoperability

    • Error correction

    • Security testing / research (malware, vuln analysis)

  • Prohibited when intent is:

    • Selling a competing clone, cracking copy protection, distributing licence bypasses.

    • Obtaining unauthorised access to computers.

    • (Always consult counsel; laws vary by jurisdiction.)

Compilation Pipeline (C/C++)

Source (.c/.cpp/.h) → [Step 1: Pre-processor] → Expanded (.i/.ii) → [Step 2: Compiler] → Assembly (.s) → [Step 3: Assembler] → Object (.o) → [Step 4: Linker] → Executable (.exe/.out)

  • Static libraries (.a or .lib) may be merged during linking.

  • Understanding each stage helps map binary artefacts back to source constructs.

Learning RE ≈ Learning a New Language

  • Vocabulary – mnemonics (mov, cmp, jmp).

  • Grammar – addressing modes, calling conventions (ABI).

  • Idioms/patterns – compiler optimisations, prologue/epilogue forms.

  • Toolchain dialects – GCC vs MSVC, O-levels, inline-function merging.

x86 (32-bit) Architecture Refresher

  • Special registers:

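    • EIP – instruction pointer.

    • ESP – stack pointer (top of the stack, i.e. the lowest address currently in use).

    • EBP – base pointer (frame pointer; after the usual prologue push ebp / mov ebp, esp it equals the entry ESP minus 4, with the return address at [ebp+4]).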

  • Stack layout (high addresses ↓ low): arguments → return address → old EBP → locals.
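
The stack layout can be checked with a small simulation of a cdecl call followed by the usual frame-pointer prologue. The Stack32 class, the starting addresses and the argument values are all made up for illustration; only the relative offsets matter.

```python
class Stack32:
    """Toy model of a 32-bit stack: a dict of address -> 4-byte value."""

    def __init__(self, esp=0xBFFFF800, ebp=0xBFFFF900):
        self.mem = {}
        self.esp = esp
        self.ebp = ebp

    def push(self, value):
        self.esp -= 4              # the stack grows towards lower addresses
        self.mem[self.esp] = value

cpu = Stack32()
caller_ebp = cpu.ebp

cpu.push(0x22222222)       # second argument (pushed first under cdecl)
cpu.push(0x11111111)       # first argument
cpu.push(0x08048123)       # call: pushes the return address
esp_at_entry = cpu.esp

cpu.push(cpu.ebp)          # push ebp
cpu.ebp = cpu.esp          # mov ebp, esp
cpu.esp -= 0x10            # sub esp, 0x10 (room for locals)

print(hex(cpu.mem[cpu.ebp]))        # saved (old) EBP
print(hex(cpu.mem[cpu.ebp + 4]))    # return address
print(hex(cpu.mem[cpu.ebp + 8]))    # first argument
```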

Intel vs AT&T Syntax
  • Intel: mov eax, 0xCA (dest, src); [ebp+0x8] dereferences.

  • AT&T: movl $0xCA, %eax (src, dest); -0x8(%ebp) dereferences.

  • Course uses Intel.

Common Instruction Categories
  • Arithmetic: add, sub, mul, div (quotient in EAX, remainder in EDX).

  • Data movement: mov, lea (load effective address).

  • Control flow: call, ret, conditional jumps je, jl, etc.
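
Two of these instructions often confuse newcomers, so here is a small Python model of their arithmetic. It assumes the unsigned 32-bit form of div, which divides the 64-bit value EDX:EAX by its operand, and the base + index × scale + displacement form of lea. The helper names div32 and lea are illustrative.

```python
def div32(edx, eax, divisor):
    """Model unsigned 32-bit div: returns (quotient for EAX, remainder for EDX)."""
    if divisor == 0:
        raise ZeroDivisionError("divide error")
    dividend = (edx << 32) | eax
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFFFFFF:
        raise OverflowError("quotient does not fit in EAX")
    return quotient, remainder

def lea(base, index, scale, disp):
    """Model lea reg, [base + index*scale + disp]: address arithmetic, no memory read."""
    return (base + index * scale + disp) & 0xFFFFFFFF

eax, edx = div32(0, 2019, 10)
print(eax, edx)
print(hex(lea(0x1000, 3, 4, 8)))
```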

Example Walk-Through

C snippet:

#include <stdio.h>

int main(void){
   int year = 2019;
   printf("hello csf %d\n", year);
   return 0;
}

→ Compiled assembly includes:

  • Prologue aligning stack to 16-byte boundary (and esp, 0xfffffff0).

  • Local variable allocation (sub esp, 0x14).

  • Value assignments via mov / add.

  • Epilogue (leave, ret).
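
The alignment and allocation steps are plain integer arithmetic, which the following sketch reproduces. The starting ESP value is an arbitrary example, not taken from the slides.

```python
def align_down_16(esp):
    """Effect of: and esp, 0xfffffff0 (clear the low four bits)."""
    return esp & 0xFFFFFFF0

esp = 0xBFFFF7AC                 # arbitrary example value
aligned = align_down_16(esp)     # rounds down to a multiple of 16
after_locals = aligned - 0x14    # sub esp, 0x14 reserves 20 bytes

print(hex(aligned), hex(after_locals))
```

Masking can only lower ESP, so alignment never overwrites data already on the stack.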

Demonstrated in slides with additional conditional if(eax < a) & printf.

RE Without Source

  • Use disassemblers and decompilers to transform binaries back into human-readable forms.

    • Disassembler ⇒ assembly.

    • Decompiler ⇒ high-level C-like pseudo-code.

  • Popular tools: IDA Pro, Binary Ninja, Ghidra, HIEW, HT-Editor.

Static vs Dynamic Analysis
  • Static: inspect code without execution.

    • Pros: safer; covers all paths; works offline.

    • Techniques: string extraction, control-flow graph mapping, signature matching.

  • Dynamic: run program (sandbox, emulator, debugger) and monitor behaviour.

    • Pros: reveals unpacking, runtime decryption, environment checks; quick IOC extraction.

    • Must handle anti-debugging, VM detection.
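
Signature matching, one of the static techniques listed above, can be sketched as a byte-pattern search with wildcards. The pattern below is the GCC-style 32-bit prologue push ebp / mov ebp, esp / sub esp, imm8; the code bytes are a hand-assembled sample, and other compilers may encode the same prologue differently.

```python
def find_signature(data, pattern):
    """Return offsets where pattern matches; None entries act as wildcards."""
    hits = []
    for start in range(len(data) - len(pattern) + 1):
        if all(p is None or data[start + i] == p for i, p in enumerate(pattern)):
            hits.append(start)
    return hits

# 55 = push ebp, 89 E5 = mov ebp, esp, 83 EC ?? = sub esp, imm8
PROLOGUE = [0x55, 0x89, 0xE5, 0x83, 0xEC, None]

# Hand-assembled sample: two nops, a function, then a second function.
code = bytes.fromhex(
    "9090"
    "5589e583ec14" "c745f4e3070000" "c3"
    "5589e583ec08" "c9c3"
)

print(find_signature(code, PROLOGUE))
```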

Ghidra Highlights

  • Open-source NSA tool (~1.2M LOC, Java).

  • Runs on Windows, macOS, Linux.

  • Primarily a static-analysis tool (releases from 10.0 onward also include a debugger).

  • Key UI components (slides):

    • Program Tree, Symbol Tree, Data Type Manager, Listing (disassembly), Decompiler View, CFG (Control Flow Graph).

  • Supports scripting (Jython/Java) for automation.

Ethical note: Always analyse malware in isolated labs; respect software EULAs.

CTF vs Real-World Forensics Perspective

  • CTF tasks emphasise focused puzzles:

    • File-format quirks, stego, memory images, single PCAP, clearly defined flags.

  • Production forensics emphasises contextual evidence:

    • Chain-of-custody, timeline reconstruction, insider-threat behaviour, metadata correlation.

    • Must handle volume, incomplete data, legal admissibility.