Study Notes on Secure Outsourcing and Cryptography
Overview of Secure Outsourcing and Related Concepts
Context and Motivation:
The shift toward cloud computing necessitates outsourcing data to third-party servers. The primary challenge is maintaining data confidentiality and access pattern privacy concurrently with high performance.
The model typically assumes a semi-honest (honest-but-curious) server, which follows the protocol but attempts to learn information from the metadata and access sequences.
Key Themes
Security and Efficiency Trade-offs:
Data Privacy: Traditional encryption (AES, RSA) protects data content but not access patterns (which indices are being accessed).
Access Pattern Privacy: Critical because repetitive access to specific indices can leak the underlying nature of the data (e.g., medical records or stock trends).
Efficiency Metrics: Measured by Communication Complexity (total bits exchanged), Server Computation (CPU cycles per query), and Storage Overhead (extra space required on the server or client).
Major Concepts
PIR (Private Information Retrieval) vs. ORAM (Oblivious RAM):
PIR: Focused on "read-only" queries from a public database where the user wants to hide which item they are retrieving.
ORAM: Supports both "read" and "write" operations on a private, encrypted database. It is more complex as it must hide whether a request is a read or a write and shuffle data to prevent linkage between accesses.
Communication Complexity Comparisons
Information Theoretic PIR (IT-PIR):
Impossible with a single server unless the entire database is downloaded ().
With non-colluding servers, the complexity reaches or even lower () depending on the generation of the protocol.
Computational PIR (CPIR):
Possible with a single server using cryptographic primitives like Homomorphic Encryption.
Communication is typically or poly-logarithmic, but server computation is often linear () because the server must touch every bit to hide the query index.
ORAM:
Communication overhead is usually logarithmic, such as or in advanced schemes like Path ORAM.
Private Information Retrieval (PIR)
The Model:
A client seeks to retrieve the -th bit of an -bit database without any server learning the value of .
Replicated Database: In IT-PIR, each of the servers holds an identical copy of . The security bound relies on the assumption that at most servers collude.
Specific Regimes:
Short Answer Regime: Servers respond with very few bits (e.g., a single bit), while queries might be larger.
Balanced Regime: Both the query from the client and the response from the server are optimized to be roughly equal in size, minimizing total bandwidth.
Preprocessing PIR: Servers perform a one-time computation to create hints, reducing online query complexity.
Construction Generations and LDCs
First Generation: Based on the work of Chor et al., using basic linear algebra and XORing bits. For 2 servers, queries achieve complexity.
Second Generation: Uses polynomial evaluation and Reed-Muller codes, breaking the cubic-root barrier.
Third Generation: Utilizes Locally Decodable Codes (LDCs). These codes allow a bit to be recovered by reading only a few random positions, which directly translates to high-efficiency PIR schemes with sub-polynomial complexity ().
Oblivious RAM (ORAM)
Mechanism:
ORAM works by constantly shuffling and re-encrypting data blocks. Every time a block is accessed, it is moved to a new random physical location.
Logical vs. Physical Addresses: The client maintains a mapping (often in a "Stash" or a small local index) so they know where the "logical" block is currently stored in the "physical" server memory.
The Square-Root ORAM:
Data is divided into the main memory and a "shelter" (buffer). After a certain number of accesses, the shelter is emptied and the main memory is permuted again.
Path ORAM:
The state-of-the-art structure where data is organized in a Binary Tree of buckets. Each block is mapped to a random leaf. To read a block, the client downloads the entire path from the root to that leaf, masking the specific block's position.
Security and Practical Considerations
Stash Management: In tree-based ORAM, if a block cannot be pushed back down the tree immediately, it is stored in a local stash. Managing the stash size is vital to avoid overflow and potential privacy leaks.
Amortized Cost: The efficiency is often calculated based on the average cost over many operations, as heavy "