Digital Forensics & Reverse Engineering – Lecture 0x0A Comprehensive Notes

Digital Forensics & Incident Response (DFIR)

Definition: Field within cybersecurity dedicated to the identification, investigation, containment, eradication and remediation of cyber-attacks.
- Integrates both Digital Forensics (evidence acquisition & interpretation) and Incident Response (operational reaction to an active threat).
Drivers for growth:
- Escalating volume, sophistication and automation of cyber-attacks.
- Proliferation of heterogeneous endpoints (servers, workstations, IoT, cloud VMs, mobile, etc.).
Organisational structure:
- CIRT / CSIRT (Computer Incident Response Team / Computer Security Incident Response Team) → multidisciplinary team that triages, responds and coordinates recovery.
- Relies heavily on forensic artefacts to make evidence-based decisions.
- Typical stakeholders: SOC analysts, threat hunters, legal/HR, PR, executive management.

NIST SP 800-86 Forensic Process Phases

Collection – acquire the data (live memory, disk, logs, network traffic) while maintaining integrity.
Examination – forensically process raw data (filtering, decrypting, carving, normalising formats).
Analysis – draw conclusions, correlate artefacts, recreate timelines, attribute activity.
Reporting – document methods, findings, chain-of-custody and recommended actions.

(Phases repeat iteratively as new leads emerge.)
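
One common way to support the "maintaining integrity" requirement of the Collection phase is to record a cryptographic hash at acquisition time and re-check it before each later phase. The sketch below is a minimal illustration, not a prescribed NIST procedure; the function names `sha256_of_stream` and `matches_recorded_digest` are hypothetical.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a binary stream, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_digest(stream, recorded_digest):
    """True if the evidence still matches the digest written down at collection time."""
    return sha256_of_stream(stream) == recorded_digest.strip().lower()


# Usage: an in-memory stream stands in for an acquired image (assumption for the demo).
evidence = io.BytesIO(b"abc")
recorded = sha256_of_stream(evidence)
evidence.seek(0)
print(recorded, matches_recorded_digest(evidence, recorded))
```

For a real disk image the same functions would be called on `open(path, "rb")`; the digest would normally be written into the chain-of-custody record.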

Forensic Areas of Practice

File-System Forensics
Memory Forensics
Malware Analysis
Network Forensics
Mobile & IoT Forensics
Cloud Forensics (multi-tenant, API-driven evidence)
Log Analysis (system, application, authentication, audit, security devices)

Digital forensics is much more than just hard-drive analysis; any digital substrate that produces artefacts can be a target.

Logs Commonly Leveraged

System logs – kernel/OS events, crashes, reboots.
Application logs – user interactions, error traces.
Security-device logs – firewalls, IDS/IPS, EDR sensors.
Authentication logs – successful/failed logons, MFA status.
Network-device logs – router/switch flow data, configuration events.
Audit logs – privileged actions, policy changes, compliance checkpoints.
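
As an illustration of how authentication logs are typically triaged, the sketch below counts failed logons per source IP. It assumes OpenSSH-style "Failed password" lines; other log sources need their own pattern. The sample lines are fabricated for the demo and use documentation-only IP ranges.

```python
import re
from collections import Counter

# Assumed format: OpenSSH "Failed password for [invalid user] NAME from IP port N".
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) from (?P<ip>\S+) port \d+"
)


def count_failed_logons(lines):
    """Count failed logon attempts per source IP address."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


# Fabricated sample data, for illustration only.
SAMPLE_LOG = [
    "Jan 10 03:12:01 host sshd[101]: Failed password for root from 203.0.113.5 port 50122 ssh2",
    "Jan 10 03:12:04 host sshd[101]: Failed password for invalid user admin from 203.0.113.5 port 50124 ssh2",
    "Jan 10 03:15:40 host sshd[140]: Accepted password for alice from 198.51.100.7 port 40022 ssh2",
    "Jan 10 03:16:02 host sshd[150]: Failed password for alice from 192.0.2.9 port 41000 ssh2",
]

for ip, attempts in count_failed_logons(SAMPLE_LOG).most_common():
    print(ip, attempts)
```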

File-Level Forensic Toolkit Quick-Reference

libmagic / file → signature-based format identification (“magic bytes”).
- file screenshot.png ⇒ returns PNG image data, 1920×1080, 8-bit/color …
Carving with dd – extract embedded objects
- dd if=container.xxx of=payload.xxx bs=1 skip=<offset> count=<len>
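strings – extracts printable character sequences (ASCII by default; other encodings via -e): strings -o screenshot.png (-o prefixes each string with its file offset, in octal for GNU binutils).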
hexdump / xxd – binary & hex inspection; useful for manual header checks.
exiftool – rich metadata (EXIF, IPTC, XMP) for images, docs, video, etc.
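
The signature check and the dd-style carve above can be sketched in a few lines of Python. This is a simplified illustration: the `SIGNATURES` table is a tiny hand-picked subset (libmagic knows far more and also inspects structure), and the helper names are hypothetical.

```python
# A small subset of well-known file signatures ("magic bytes").
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"PK\x03\x04": "ZIP archive",
    b"%PDF-": "PDF document",
}


def identify(data):
    """Name the format whose signature starts the buffer, or 'unknown'."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


def find_signatures(data):
    """Return sorted (offset, format) pairs for every signature hit in the buffer."""
    hits = []
    for magic, name in SIGNATURES.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, name))
            start = data.find(magic, start + 1)
    return sorted(hits)


def carve(data, offset, length):
    """Same effect as: dd bs=1 skip=<offset> count=<len>."""
    return data[offset:offset + length]


# Demo container: a PNG header followed by filler and an embedded ZIP header.
container = b"\x89PNG\r\n\x1a\n" + b"x" * 10 + b"PK\x03\x04payload"
print(identify(container), find_signatures(container))
```

A signature hit inside a file is only a lead; short signatures such as the three-byte JPEG marker also occur by chance in binary data.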

Ethical note: ensure you respect privacy/SOC policies; metadata may reveal PII.

Network Forensics

Packet Trace (PCAP)
- Captured via tcpdump, Wireshark, or hardware taps.
- Contains full payloads → can reconstruct sessions, files, VoIP calls.
Network Logs
- High-level events (src/dst, port, proto) but no payload.
- Complementary to PCAP for long-term retention.

Packet Capture Techniques

Network Tap – passive inline optical/electrical splitter; zero packet loss, transparent.
Port Mirroring / SPAN – switch clones selected traffic to a monitor port.
Wireless Sniffing – monitor mode interface captures 802.11 frames.

Analysis Workflow

Import PCAP into Wireshark; apply protocol & display filters.
Triage (e.g., find HTTP POSTs, unusual DNS, malformed TLS handshakes).
Reassemble streams, export objects, carve malicious binaries.
Correlate timestamps with host logs for attribution.
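
To show what a PCAP contains before Wireshark interprets it, the sketch below walks the classic libpcap file layout (24-byte global header, then a 16-byte record header per packet) with the standard library only. It assumes a little-endian file with microsecond timestamps and does not handle pcapng; `build_demo_pcap` is a hypothetical helper that fabricates a tiny capture in memory.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4                     # classic pcap, microsecond timestamps
GLOBAL_HEADER = struct.Struct("<IHHiIII")   # magic, major, minor, zone, sigfigs, snaplen, linktype
RECORD_HEADER = struct.Struct("<IIII")      # ts_sec, ts_usec, captured length, original length


def iter_packets(blob):
    """Yield (timestamp_seconds, packet_bytes) from a little-endian classic pcap buffer."""
    magic = GLOBAL_HEADER.unpack_from(blob, 0)[0]
    if magic != PCAP_MAGIC:
        raise ValueError("not a little-endian classic pcap file")
    offset = GLOBAL_HEADER.size
    while offset + RECORD_HEADER.size <= len(blob):
        ts_sec, ts_usec, incl_len, _orig_len = RECORD_HEADER.unpack_from(blob, offset)
        offset += RECORD_HEADER.size
        yield ts_sec + ts_usec / 1_000_000, blob[offset:offset + incl_len]
        offset += incl_len


def build_demo_pcap(packets):
    """Assemble an in-memory pcap (link type 1 = Ethernet) from (ts_sec, payload) pairs."""
    out = GLOBAL_HEADER.pack(PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
    for ts_sec, payload in packets:
        out += RECORD_HEADER.pack(ts_sec, 0, len(payload), len(payload)) + payload
    return out


# Triage example: keep only records whose bytes contain an HTTP POST request line.
demo = build_demo_pcap([(100, b"abc"), (101, b"POST /x")])
posts = [(ts, data) for ts, data in iter_packets(demo) if b"POST " in data]
print(posts)
```

The timestamps returned here are what would be correlated against host logs; real payloads would of course sit inside Ethernet/IP/TCP framing rather than being bare strings.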

Steganography & Steganalysis

Steganography = art/science of hiding data within innocuous carriers so that the existence of the message is concealed.
- Not equivalent to encryption (crypto scrambles content but admits its presence).
- Not equivalent to watermarking (fingerprinting), where the embedded mark describes or identifies the carrier file itself and robustness matters more than secrecy.

Motivations & Threat Landscape

Intellectual-property protection (embed author ID, anti-piracy codes).
Covert malware transport (payload inside a harmless JPEG in a spear-phish).
Data exfiltration from locked-down networks (post images w/ hidden corporate IP).

Techniques

Text Carriers

Line-shift coding: each text line moved up/down slightly – $40$ lines ⇒ $40\times6=240$ code points.
Word-spacing: alter inter-word gaps to encode bits.
Character micro-changes: tiny glyph perturbations (PDF, PostScript) invisible to reader.
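
Word-spacing is the easiest of the text techniques to demonstrate. The sketch below uses an assumed convention (one space = 0, two spaces = 1); real schemes vary gap widths far more subtly, and the function names are hypothetical.

```python
import re


def embed_in_spacing(cover_text, bits):
    """Encode a bit string in the gaps between words: ' ' = 0, '  ' = 1."""
    words = cover_text.split()
    if len(bits) > len(words) - 1:
        raise ValueError("cover text has too few word gaps for the payload")
    pieces = []
    for index, word in enumerate(words[:-1]):
        is_one = index < len(bits) and bits[index] == "1"
        pieces.append(word + ("  " if is_one else " "))
    pieces.append(words[-1])
    return "".join(pieces)


def extract_from_spacing(stego_text, bit_count):
    """Read back the first bit_count gaps as bits."""
    gaps = re.findall(r" +", stego_text)
    return "".join("1" if len(gap) == 2 else "0" for gap in gaps[:bit_count])


cover = "the quick brown fox jumps over the lazy dog"
stego = embed_in_spacing(cover, "1011")
print(repr(stego), extract_from_spacing(stego, 4))
```

The capacity is one bit per word gap, which is why text carriers hold very little data compared with images.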

Image Carriers

Spatial-domain (LSB) – overwrite the Least-Significant Bit of pixel channels with message bits; each channel value changes by at most 1, which is imperceptible noise.
Colour-plane separation – embed across RGB components.
Frequency-domain (DCT/FFT) – inject bits into high-frequency coefficients (robust against cropping, but JPEG recompression quantises those same coefficients most heavily and can destroy the payload).
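
The spatial-domain LSB technique can be sketched on a flat sequence of 8-bit channel values. This is a minimal illustration that ignores image file formats entirely: `samples` stands in for decoded pixel data, the message length is passed out of band, and both function names are hypothetical.

```python
def embed_lsb(samples, message):
    """Overwrite the LSB of successive samples with the message bits (MSB first)."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("carrier too small for message")
    stego = bytearray(samples)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & 0xFE) | bit   # clear LSB, then set it to the bit
    return bytes(stego)


def extract_lsb(samples, message_length):
    """Rebuild message_length bytes from the LSBs of the first samples."""
    out = bytearray()
    for byte_index in range(message_length):
        value = 0
        for sample in samples[byte_index * 8:(byte_index + 1) * 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)


carrier = bytes(range(200, 232))          # 32 stand-in channel values
stego = embed_lsb(carrier, b"hi!")        # 3 bytes need 24 samples
print(extract_lsb(stego, 3))
```

Capacity is one bit per channel value, so an uncompressed 1920×1080 RGB image could carry roughly 760 KiB this way.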

Network Steganography

Embed payloads in sequence numbers, timing gaps, or uncommon header fields.

Steganalysis

Goal: detect & extract covert data.
Methods:
- Compare suspected file vs. known-good baseline (size, entropy, colour histograms).
- Statistical tests (chi-square on LSB, RS analysis).
- Visual inspection for artefacts (misaligned blocks, resolution loss).
- Machine-learning classifiers on large corpora of “clean” vs “stego” samples.
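
The chi-square test on LSBs mentioned above rests on one observation: overwriting LSBs with random-looking bits tends to equalise the counts of each value pair (2k, 2k+1). The sketch below computes only the raw statistic over such pairs, without a p-value, and compares a deliberately skewed synthetic carrier against a copy with randomised LSBs. The data and the helper `randomise_lsb` are fabricated for the demo.

```python
import random
from collections import Counter


def chi_square_pairs(samples):
    """Chi-square statistic over value pairs (2k, 2k+1); low values suggest equalised pairs."""
    histogram = Counter(samples)
    statistic = 0.0
    for even in range(0, 256, 2):
        observed_even = histogram[even]
        observed_odd = histogram[even + 1]
        expected = (observed_even + observed_odd) / 2   # expected count if LSBs were random
        if expected > 0:
            statistic += (observed_even - expected) ** 2 / expected
    return statistic


def randomise_lsb(samples, seed=0):
    """Simulate full-capacity LSB embedding of an encrypted (random-looking) payload."""
    rng = random.Random(seed)
    return bytes((sample & 0xFE) | rng.getrandbits(1) for sample in samples)


clean = bytes([10] * 3000 + [11] * 1000)   # synthetic carrier with a skewed pair
stego = randomise_lsb(clean)
print(chi_square_pairs(clean), chi_square_pairs(stego))
```

A statistic far below that of a comparable clean baseline is a hint, not proof; natural images can also have near-equal pairs, which is one source of the false positives noted below.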

Ethical/practical implication: false positives can be high → corroborate with additional evidence before attribution.

Reverse Engineering (RE) & Malware Analysis

Definition: Deconstruct a physical/software artefact to learn its design, behaviour, vulnerabilities, or to enable interoperability.
Core use-cases: vulnerability discovery, malware triage, patch diffing, legacy system maintenance, audit of closed-source products.

Legal Considerations (Australia)

Permitted when performed for:
- Interoperability
- Error correction
- Security testing / research (malware, vuln analysis)
Prohibited when intent is:
- Selling a competing clone, cracking copy protection, distributing licence bypasses.
- Obtaining unauthorised access to computers.
- (Always consult counsel; laws vary by jurisdiction.)

Compilation Pipeline (C/C++)

$\text{Source (.c/.cpp/.h)} \xrightarrow[\text{Step 1}]{\text{Pre-processor}} \text{Expanded (.i/.ii)} \xrightarrow[\text{Step 2}]{\text{Compiler}} \text{Assembly (.s)} \xrightarrow[\text{Step 3}]{\text{Assembler}} \text{Object (.o)} \xrightarrow[\text{Step 4}]{\text{Linker}} \text{Executable (.exe/.out)}$

Static libraries (.a or .lib) may be merged during linking.
Understanding each stage helps map binary artefacts back to source constructs.

Learning RE ≈ Learning a New Language

Vocabulary – mnemonics (mov, cmp, jmp).
Grammar – addressing modes, calling conventions (ABI).
Idioms/patterns – compiler optimisations, prologue/epilogue forms.
Toolchain dialects – GCC vs MSVC, O-levels, inline-function merging.

x86 (32-bit) Architecture Refresher

Special registers:
- $EIP$ – instruction pointer.
- $ESP$ – stack pointer (top of current stack frame).
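- $EBP$ – base pointer (frame pointer; after the prologue push ebp / mov ebp, esp it points at the saved old $EBP$, with the return address at $EBP+4$ and the first argument at $EBP+8$).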
Stack layout (high addresses ↓ low): arguments → return address → old $EBP$ → locals.
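
The stack layout can be checked with a toy model. The sketch below assumes the cdecl convention (arguments pushed right to left) and the standard push ebp / mov ebp, esp prologue; the addresses and the names `Stack32` and `simulate_call` are illustrative only.

```python
class Stack32:
    """Toy model of a downward-growing 32-bit stack."""

    def __init__(self, top=0xBFFF0000):
        self.esp = top
        self.memory = {}

    def push(self, value):
        self.esp -= 4                 # the stack grows towards lower addresses
        self.memory[self.esp] = value


def simulate_call(args, return_address, caller_ebp, locals_size):
    """Replay a cdecl call plus prologue; return the stack and the callee's EBP."""
    stack = Stack32()
    for arg in reversed(args):        # caller pushes arguments right to left
        stack.push(arg)
    stack.push(return_address)        # call pushes the return address
    stack.push(caller_ebp)            # push ebp
    ebp = stack.esp                   # mov ebp, esp
    stack.esp -= locals_size          # sub esp, <locals_size>
    return stack, ebp


stack, ebp = simulate_call([0x11, 0x22], 0x08048123, 0xBFFF0100, 0x10)
for offset in (12, 8, 4, 0):
    print(f"[ebp+{offset}] = {stack.memory[ebp + offset]:#x}")
```

Reading the printout from top to bottom reproduces the order arguments → return address → old EBP, with locals living below EBP at negative offsets.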

Intel vs AT&T Syntax

Intel: mov eax, 0xCA (dest, src); [ebp+0x8] dereferences.
AT&T: movl $0xCA, %eax (src, dest); 0x8(%ebp) dereferences (the same operand as Intel [ebp+0x8]).
Course uses Intel.

Common Instruction Categories

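Arithmetic: add, sub, mul, div (div divides $EDX{:}EAX$ by its operand, leaving the quotient in $EAX$ and the remainder in $EDX$; mul leaves the 64-bit product in $EDX{:}EAX$).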
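
The implicit register use of the unsigned 32-bit mul and div is easy to miss when reading disassembly. The sketch below models both with Python integers; it covers only the unsigned 32-bit forms, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def x86_div32(edx, eax, operand):
    """Model unsigned div r/m32: divide EDX:EAX by operand, return (EAX, EDX)."""
    if operand == 0:
        raise ZeroDivisionError("#DE: divide by zero")
    dividend = (edx << 32) | eax
    quotient, remainder = divmod(dividend, operand)
    if quotient > MASK32:
        raise OverflowError("#DE: quotient does not fit in EAX")
    return quotient, remainder


def x86_mul32(eax, operand):
    """Model unsigned mul r/m32: return (EAX, EDX) holding the low and high product halves."""
    product = eax * operand
    return product & MASK32, product >> 32


print(x86_div32(0, 17, 5))          # quotient in EAX, remainder in EDX
print(x86_mul32(0xFFFFFFFF, 2))     # product spills into EDX
```

This is why compilers usually zero EDX (xor edx, edx) before an unsigned div: stale data in EDX would otherwise become the high half of the dividend.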
Data movement: mov, lea (load effective address).
Control flow: call, ret, conditional jumps je, jl, etc.

Example Walk-Through

C snippet:

int main(){
   int year = 2019;
   printf("hello csf %d\n", year);
   return 0;
}

→ Compiled assembly includes:

Prologue aligning stack to 16-byte boundary (and esp, 0xfffffff0).
Local variable allocation (sub esp, 0x14).
Value assignments via mov / add.
Epilogue (leave, ret).

Demonstrated in slides with additional conditional if(eax < a) & printf.

RE Without Source

Use disassemblers and decompilers to transform binaries back into human-readable forms.
- Disassembler ⇒ assembly.
- Decompiler ⇒ high-level C-like pseudo-code.
Popular tools: IDA Pro, Binary Ninja, Ghidra, HIEW, HT-Editor.

Static vs Dynamic Analysis

Static: inspect code without execution.
- Pros: safer (nothing executes); can in principle cover all code paths, not only those taken at run time; works offline.
- Techniques: string extraction, control-flow graph mapping, signature matching.
Dynamic: run program (sandbox, emulator, debugger) and monitor behaviour.
- Pros: reveals unpacking, runtime decryption, environment checks; quick IOC extraction.
- Must handle anti-debugging, VM detection.
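
String extraction, the first static technique listed above, needs nothing more than a scan for runs of printable bytes. The sketch below is a simplified stand-in for the strings utility that handles ASCII only; the sample buffer and the indicator values in it are fabricated.

```python
import re


def extract_strings(data, min_length=4):
    """Return (offset, text) for each run of printable ASCII at least min_length long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [(match.start(), match.group().decode("ascii")) for match in pattern.finditer(data)]


# Fabricated binary fragment: two candidate indicators separated by non-printable bytes.
blob = b"\x00\x01http://c2.example/a\x00ab\x00cmd.exe\xff"
for offset, text in extract_strings(blob):
    print(offset, text)
```

Packed or encrypted samples often yield few meaningful strings statically, which is one reason to follow up with dynamic analysis.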

Ghidra Highlights

Open-source NSA tool (~1.2M LOC, Java).
Runs on Windows, macOS, Linux.
Primarily a static-analysis tool; an integrated debugger was added in version 10.0 (2021).
Key UI components (slides):
- Program Tree, Symbol Tree, Data Type Manager, Listing (disassembly), Decompiler View, CFG (Control Flow Graph).
Supports scripting (Jython/Java) for automation.

Ethical note: Always analyse malware in isolated labs; respect software EULAs.

CTF vs Real-World Forensics Perspective

CTF tasks emphasise focused puzzles:
- File-format quirks, stego, memory images, single PCAP, sharp flags.
Production forensics emphasises contextual evidence:
- Chain-of-custody, timeline reconstruction, insider-threat behaviour, metadata correlation.
- Must handle volume, incomplete data, legal admissibility.