Tor Network and Encryption Tools Lecture

Introduction to the Tor Network and Onion Routing

  • The Tor Network Definition: A service designed to provide anonymous web browsing by masking online identities and activities from attackers.

  • Onion Routing Premise: The core mechanism of the Tor network, which layers encryption around data to ensure privacy.

  • The Research Challenge: Determining whether the service is truly 100% secure or whether its fundamental design contains inherent, unpatchable weaknesses.

  • Network Architecture:
      * Relays: Machines operated by volunteers who contribute bandwidth.
      * Circuit Construction: A path through the network for encrypted data to travel from a user to a server.
      * Encryption Layers: Data is encrypted in layers; the number of layers typically corresponds to the number of relays the data passes through, which is usually 3.
      * Onion Client: Local software required to access the network.

  • Types of Relays in a Standard Circuit:
      1. Entry Relay: Only knows the user’s IP address.
      2. Middle Relay: Knows neither the data nor the final destination; only knows the preceding and succeeding relays in the circuit.
      3. Exit Relay: Only knows the destination address of the server.

  • The Directory Server: Publishes the list of available relays, which the onion client uses to select the relays for its path.
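
The layering described above can be illustrated with a minimal sketch. It assumes a toy XOR keystream built from SHA-256, which is not the cipher Tor uses and is not secure. The function names and the three per-hop keys are hypothetical. The only point is that the client adds one layer per relay and each relay can remove only its own layer.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + counter. Illustration only, NOT secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer (XOR with the keystream is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def wrap(message: bytes, hop_keys: list[bytes]) -> bytes:
    """Client side: add the exit layer first and the entry layer last.
    With this XOR toy the result does not depend on the order, but the
    loop mirrors how the layers are conceptually nested."""
    cell = message
    for key in reversed(hop_keys):
        cell = xor_layer(key, cell)
    return cell

def unwrap_along_path(cell: bytes, hop_keys: list[bytes]) -> list[bytes]:
    """Each relay strips its own layer; returns what each relay forwards."""
    views = []
    for key in hop_keys:
        cell = xor_layer(key, cell)
        views.append(cell)
    return views

# Hypothetical per-hop keys for the entry, middle, and exit relays.
hop_keys = [b"entry-key", b"middle-key", b"exit-key"]
message = b"GET /index.html"

cell = wrap(message, hop_keys)
views = unwrap_along_path(cell, hop_keys)
```

In this sketch `views[0]` and `views[1]` (what the entry and middle relays forward) are still unreadable; only `views[2]`, after the exit relay removes the last layer, equals the original message.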

Vulnerabilities and Attacks on Tor Infrastructure

  • Traffic Correlation Attacks:
      * Mechanism: An attacker observes both ends of a circuit simultaneously, matching traffic flows based on timing and volume (e.g., matching a specific burst of data entering and exiting).
      * Evolution: While historically deemed impractical due to millions of daily users, modern research employs statistical tools and deep learning to optimize matches.

  • Infrastructure Manipulation (2015 Study):
      * Attackers can manipulate the underlying routing infrastructure of the internet to insert themselves into the data path.
      * This allows observation of traffic without touching a single Tor relay.
      * Detection: Tor cannot detect or prevent this because it assumes the routing infrastructure is secure.

  • DeepCorr and Deep Learning (2018 Study):
      * DeepCorr: A deep learning model trained on real Tor traffic to identify matched flows.
      * Capabilities: Recognizes specific behaviors, congestion patterns, and packet timing signatures.
      * Performance: Achieved 96% accuracy compared to only 4% accuracy using earlier statistical methods.

  • Active Relay Blocking (2022 Study):
      * Method: An attacker actively forces victims into controlled relays by selectively blocking legitimate ones.
      * Results: 90% of all circuits were compromised in tests; users were identified with 100% accuracy in under 40 seconds.
      * Bypassing Guarantees: These attacks can bypass "guard rotation," Tor's protection mechanism that cycles entry relays every few months.
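
The basic traffic correlation idea above (matching flows by timing and volume) can be sketched with a plain Pearson correlation over packet counts per time window. This is a simple statistical baseline, not the deep learning model from the 2018 study. The flow data and the names `pearson` and `best_match` are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def best_match(entry_flow, exit_flows):
    """Score every observed exit flow against one entry flow; return the top one."""
    scores = {name: pearson(entry_flow, flow) for name, flow in exit_flows.items()}
    return max(scores, key=scores.get), scores

# Hypothetical packet counts per one-second window.
entry_flow = [2, 0, 15, 14, 1, 0, 9, 0]
exit_flows = {
    "exit_a": [1, 1, 2, 1, 12, 13, 0, 1],   # bursts at different times
    "exit_b": [3, 0, 14, 15, 0, 1, 8, 1],   # bursts line up with the entry flow
    "exit_c": [5, 5, 5, 6, 5, 5, 4, 5],     # nearly constant traffic
}

match, scores = best_match(entry_flow, exit_flows)
print(match, round(scores[match], 3))
```

With many concurrent users, simple scores like this become noisy, which is why later work moved to learned models.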

Malicious Actors and the Sybil Attack Threat

  • Definition of Sybil Attack: A scenario where one entity operates multiple compromised or malicious relays to gain a larger share of network traffic.

  • Network Analysis (2016 Study): Analyzed 9 years of archive data (over 100 gigabytes) to document malicious relay behavior.

  • The Sybilhunter Tool: A detection tool built to find coordinated relay groups.
      * Case Study: Found a group of exit relays silently replacing Bitcoin addresses in web traffic to redirect transactions to attacker-controlled wallets.

  • The KAX17 Case Study (2021):
      * A single entity operated undetected for 4 years.
      * At its peak, it controlled over 900 relays across 50 networks.
      * Impact: A user had a 10% chance of passing through its guard relays and a 24% chance of passing through its middle relays.

  • Low Economic Barriers (2025 Study):
      * The top 10 bandwidth contributors control over half of all exit traffic.
      * Operating just 300 relays across two perceived identities could compromise nearly 1% of all circuits (tens of thousands of circuits).
      * Cost: Approximately $400 per month in cloud hosting allows an attacker to observe over 13% of exit traffic.

  • Current Countermeasures:
      * Limiting relays per IP address.
      * Minimum uptime requirements.
      * "Trusted flags" for relays.
      * Avoiding relays from the same network block in a single path.
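
The last countermeasure above (no two relays from the same network block in one path) can be sketched as a simple check. The block size is a parameter here, with a /16 IPv4 prefix assumed as the default. The helper names and the example addresses are hypothetical, and this is not Tor's actual path selection code.

```python
import ipaddress
from itertools import combinations

def same_block(ip_a: str, ip_b: str, prefix: int = 16) -> bool:
    """True if both IPv4 addresses fall inside the same /prefix network."""
    block = ipaddress.ip_network(f"{ip_a}/{prefix}", strict=False)
    return ipaddress.ip_address(ip_b) in block

def path_is_allowed(path: list[str], prefix: int = 16) -> bool:
    """Reject a candidate path if any two of its relays share a network block."""
    return not any(same_block(a, b, prefix) for a, b in combinations(path, 2))

# Hypothetical relay addresses (documentation ranges).
spread_out = ["192.0.2.10", "198.51.100.7", "203.0.113.9"]
clustered = ["192.0.2.10", "192.0.77.5", "203.0.113.9"]

print(path_is_allowed(spread_out))  # relays in three different /16 blocks
print(path_is_allowed(clustered))   # first two relays share 192.0.0.0/16
```

A check like this raises the cost of a Sybil attack only slightly, since an attacker who rents servers across many providers gets addresses in many different blocks.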

Onion Services and Server-Side Anonymity

  • Onion Services: Websites and services existing entirely within Tor with no public IP address.

  • Predictability Vulnerabilities (2013 Study):
      * Onion services register with specific relays based on publicly known information (such as the current time period).
      * Attackers can mathematically calculate those positions in advance and place their own relays there.

  • De-anonymization Categories:
      1. Traffic Analysis.
      2. Exploiting Implementation Vulnerabilities.
      3. Bad Operational Security (OpSec).

  • Carnegie Mellon University (CMU) Attack (2014):
      * Researchers controlled both directory and entry positions.
      * Used non-standard signals to link users to the specific services they visited.
      * Ran undetected for months.

  • The Human Factor (Operational Security):
      * Most marketplaces that were compromised, legal or illegal, fell to human error (e.g., using personal emails in forum posts or exposing info in routine site communications).
      * Response Leakage: Details in server responses can be matched against public internet scanning tools to find the physical server IP.
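
The predictability issue noted above for onion services can be illustrated with a simplified hash ring. This sketch assumes the responsible relays are the first few whose fingerprints follow a position derived from the service identifier and the time period. The SHA-256 ring, the count of 3, and all names are assumptions for illustration and do not reproduce Tor's actual formula.

```python
import hashlib
from bisect import bisect_right

RING_SIZE = 2 ** 256  # size of the SHA-256 output space used by this sketch

def to_ring(value: str) -> int:
    """Map a string to a point on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest(), "big")

def service_position(service_id: str, time_period: int) -> int:
    """Position computed only from public inputs, so anyone can predict it."""
    return to_ring(f"{service_id}|{time_period}")

def responsible_relays(position: int, fingerprints: list[int], count: int = 3) -> list[int]:
    """The `count` relays whose fingerprints come next after `position`."""
    ring = sorted(fingerprints)
    start = bisect_right(ring, position)
    return [ring[(start + i) % len(ring)] for i in range(min(count, len(ring)))]

# Fifty hypothetical honest relays.
honest = [to_ring(f"relay-{i}") for i in range(50)]

# The attacker computes where the service will land in a future period and
# picks a fingerprint just past that point. In practice this would mean
# generating keys until one hashes close enough; here it is chosen directly.
next_period = 19001
target = service_position("exampleservice", next_period)
attacker = (target + 1) % RING_SIZE

chosen = responsible_relays(target, honest + [attacker])
print(chosen[0] == attacker)
```

Because every input to `service_position` is public, mixing in a value that cannot be known in advance is the natural way to remove this kind of predictability.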

Fundamentals of Data Encryption Tools

  • Data Breach Statistics: Cited figures include 21,000,000 stolen passwords, 17,000,000 leaked emails, and 1 TB of data lost in single attacks.

  • Encryption Types:
      1. File Encryption: Locking individual documents (e.g., financial or legal files).
      2. Container Encryption: Creating an encrypted vault or folder within a standard system (e.g., VeraCrypt).
      3. Full Disk Encryption (FDE): Locking the entire hard drive; requires a password at boot (e.g., LUKS, DiskCryptor).

Deep Dive into Encryption Software: LUKS, DiskCryptor, VeraCrypt, and AxCrypt

  • LUKS (Linux Unified Key Setup):
      * Platform: Standard for Linux distributions (Debian, Ubuntu, Fedora).
      * Scope: Full disk encryption.
      * Interface: No Graphical User Interface (GUI); managed entirely via command line.
      * Target User: System administrators and technical users.

  • DiskCryptor:
      * Platform: Windows.
      * Scope: Full disk or specific partitions.
      * Algorithms: Supports AES, Serpent, and Twofish.
      * Interface: Includes a GUI for easier use.
      * Performance Note: Slightly slower than VeraCrypt in benchmark testing.

  • VeraCrypt:
      * Platform: Windows, Linux, and macOS.
      * Scope: Both full disk and container-based encryption.
      * Hidden Volumes: Allows for a secret volume inside an encrypted volume. A user can provide the "outer" password if coerced, while the highly sensitive data remains safe in the secret "inner" volume.
      * Algorithms: AES, Serpent, and Twofish; can be used in layered formats.

  • AxCrypt:
      * Platform: Windows focus.
      * Scope: Individual file encryption.
      * Features: Auto-encryption (files re-encrypt automatically after editing) and easy right-click integration in the shell.
      * Standards: Uses AES-128 or AES-256.

Critical Security Risks and Human Factors

  • Weak Passwords: Encryption strength is irrelevant if the password is easy to guess. Recommendations include at least 15 characters with symbols and numbers.

  • Physical Access/Power Status: Encryption keys are held in memory while the machine is on. An unlocked, powered-on machine is vulnerable to physical bypass.

  • Human Error: Writing passwords on sticky notes, storing them in unencrypted files, or sharing them over email.

  • Key Size Selection: Choosing AES-256 over AES-128 for sensitive business/legal data.
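
The password recommendation above can be made concrete with a rough search-space estimate. This sketch assumes every character is chosen uniformly at random, which real human-chosen passwords are not, so the figures are upper bounds. The function name and the alphabet sizes (26 lowercase letters, 94 printable ASCII characters) are illustrative choices.

```python
import math

def search_space_bits(length: int, alphabet_size: int) -> float:
    """Bits of brute-force work for a uniformly random password."""
    return length * math.log2(alphabet_size)

# 8 lowercase letters versus 15 characters drawn from printable ASCII.
bits_short = search_space_bits(8, 26)
bits_long = search_space_bits(15, 94)

print(f"8 lowercase letters:    about {bits_short:.1f} bits")
print(f"15 printable characters: about {bits_long:.1f} bits")
```

Under these assumptions the short password gives under 40 bits of work for an attacker, far below the 128-bit or 256-bit key it protects, so the password is the weaker link.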

Questions & Discussion

  • Balance of Problems and Solutions: The moderator advised that research presentations should balance the identification of vulnerabilities with proposed solutions, noting that the Tor presentation was heavy on flaws but light on remediation.

  • How DeepCorr Works:
      * Prompt: Can you explain how it works?
      * Response: It is a model trained on real Tor traffic from the live network to identify patterns and match both ends of a circuit, effectively de-anonymizing the user and the service they access.

  • De-anonymization Mechanism in DeepCorr:
      * Prompt: How does de-anonymization work in DeepCorr?
      * Response: It matches characteristics of incoming and outgoing data by counting packet volume and tracking the timing of data transmission and receipt.

  • Relay Node Configuration:
      * Prompt: Does each node have a specific start or end point?
      * Response: While users can configure their relays for specific roles, generally any relay can function as an entry, middle, or exit node.

  • Research Depth on Encryption Tools:
      * Feedback: The moderator suggested that instead of an overview of many tools, a research presentation should focus on one tool in depth, examining its implementation and identifying specific vulnerabilities an attacker could exploit.

  • Layered Encryption Complexity:
      * Prompt: How can you combine different encryptions like Serpent and AES?
      * Response: The algorithms are stacked in layers, so data encrypted by one cipher is encrypted again by the next, which is intended to provide higher security.