Study Notes on Secure Outsourcing and Cryptography

Overview of Secure Outsourcing and Related Concepts
  • Context and Motivation:

    • The shift toward cloud computing necessitates outsourcing data to third-party servers. The primary challenge is maintaining data confidentiality and access pattern privacy concurrently with high performance.

    • The model typically assumes a semi-honest (honest-but-curious) server, which follows the protocol but attempts to learn information from the metadata and access sequences.

Key Themes
  • Security and Efficiency Trade-offs:

    • Data Privacy: Traditional encryption (AES, RSA) protects data content but not access patterns (which indices are being accessed).

    • Access Pattern Privacy: Critical because repetitive access to specific indices can leak the underlying nature of the data (e.g., medical records or stock trends).

    • Efficiency Metrics: Measured by Communication Complexity (total bits exchanged), Server Computation (CPU cycles per query), and Storage Overhead (extra space required on the server or client).

Major Concepts
  • PIR (Private Information Retrieval) vs. ORAM (Oblivious RAM):

    • PIR: Focused on "read-only" queries from a public database where the user wants to hide which item they are retrieving.

    • ORAM: Supports both "read" and "write" operations on a private, encrypted database. It is more complex as it must hide whether a request is a read or a write and shuffle data to prevent linkage between accesses.

Communication Complexity Comparisons

  • Information Theoretic PIR (IT-PIR):

    • Impossible with a single server unless the entire database is downloaded (O(n)O(n)).

    • With k2k \ge 2 non-colluding servers, the complexity reaches n1o(1)n^{1 - o(1)} or even lower (O(n1/k)O(n^{1/k})) depending on the generation of the protocol.

  • Computational PIR (CPIR):

    • Possible with a single server using cryptographic primitives like Homomorphic Encryption.

    • Communication is typically O(log2n)O(\log^2 n) or poly-logarithmic, but server computation is often linear (O(n)O(n)) because the server must touch every bit to hide the query index.

  • ORAM:

    • Communication overhead is usually logarithmic, such as O(log2n)O(\log^2 n) or O(logn)O(\log n) in advanced schemes like Path ORAM.

Private Information Retrieval (PIR)
  • The Model:

    • A client seeks to retrieve the ii-th bit of an nn-bit database DD without any server learning the value of ii.

    • Replicated Database: In IT-PIR, each of the kk servers holds an identical copy of DD. The security bound relies on the assumption that at most k1k-1 servers collude.

  • Specific Regimes:

    1. Short Answer Regime: Servers respond with very few bits (e.g., a single bit), while queries might be larger.

    2. Balanced Regime: Both the query from the client and the response from the server are optimized to be roughly equal in size, minimizing total bandwidth.

    3. Preprocessing PIR: Servers perform a one-time computation to create hints, reducing online query complexity.

Construction Generations and LDCs
  • First Generation: Based on the work of Chor et al., using basic linear algebra and XORing bits. For 2 servers, queries achieve O(n1/3)O(n^{1/3}) complexity.

  • Second Generation: Uses polynomial evaluation and Reed-Muller codes, breaking the cubic-root barrier.

  • Third Generation: Utilizes Locally Decodable Codes (LDCs). These codes allow a bit to be recovered by reading only a few random positions, which directly translates to high-efficiency PIR schemes with sub-polynomial complexity (no(1)n^{o(1)}).

Oblivious RAM (ORAM)
  • Mechanism:

    • ORAM works by constantly shuffling and re-encrypting data blocks. Every time a block is accessed, it is moved to a new random physical location.

    • Logical vs. Physical Addresses: The client maintains a mapping (often in a "Stash" or a small local index) so they know where the "logical" block is currently stored in the "physical" server memory.

  • The Square-Root ORAM:

    • Data is divided into the main memory and a "shelter" (buffer). After a certain number of accesses, the shelter is emptied and the main memory is permuted again.

  • Path ORAM:

    • The state-of-the-art structure where data is organized in a Binary Tree of buckets. Each block is mapped to a random leaf. To read a block, the client downloads the entire path from the root to that leaf, masking the specific block's position.

Security and Practical Considerations
  • Stash Management: In tree-based ORAM, if a block cannot be pushed back down the tree immediately, it is stored in a local stash. Managing the stash size is vital to avoid overflow and potential privacy leaks.

  • Amortized Cost: The efficiency is often calculated based on the average cost over many operations, as heavy "