T5e

P2P and DHT

Network Layers
  • The layers: Application, Presentation, Session, Transport, Network (IPv4, IPv6), Link (Ethernet, WiFi), Physical (cables, space, bits).
  • Their data units: Messages, Segments, Packets, Frames, Bits.
Client/Server Model
  • Fixed Endpoints: Requires knowing fixed endpoints, apart from any DNS-based randomness (e.g. round-robin).
  • Dedicated Connection: Establishes and maintains a dedicated connection for the duration of the conversation.
  • Efficient Data Transfer: Aims to transfer data as efficiently as possible.
  • Centralized Services: Depends on well-known, centralized, "one-off" services.
  • Fragility and Inefficiency: Suffers from fragility and inefficiency.
Issues with the Client/Server Model
  • Centralized: Dependency on a particular service/provider.
  • Fragile: Acts as a honeypot for attacks and a point of failure.
  • Inefficient: One server serving many clients leads to bottlenecks.
  • Scaling Costs: Better service requires more server nodes, increasing costs.
  • Organizational Overheads: Managing more server nodes increases workload.
  • Lack of Privacy: Participants are easy to trace.
Desirable Application Characteristics
  • Decentralized: Composed of independent, 'equal' nodes working for mutual benefit.
  • Robust: No single point of failure or attack.
  • Efficient: No bottleneck, optimized for data transfers.
  • Scalable: Able to handle a large number of nodes.
  • Self-Organizing: Dynamic network formation with plug-and-play capabilities.
  • Anonymous: Difficult to trace participants.
Peer-to-Peer (P2P)
  • Decentralized file/content sharing for:
    • Popular media.
    • Sensitive content.
    • Operating systems.
    • Games and updates.
    • etc.
  • Offloads servers (dynamic CDN).
  • Models:
    • Pure vs. Hybrid.
    • Structured vs. Unstructured.
  • Employs distributed directories, ledgers, and blockchain.
Performance in P2P
  • Centralized (One Server):
    • One server, N recipients.
    • The server's upload bandwidth is shared among the N recipients.
    • Distribution time grows with N.
  • P2P (Everybody Has Pieces):
    • Effectively up to N servers for N recipients.
    • N peers each contribute their upload bandwidth.
    • Distribution time shrinks as peers contribute (see the worked comparison below).
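
  A minimal sketch of this comparison, assuming the standard textbook file-distribution-time model; the bandwidth figures and function names are illustrative, not from the notes:

    # Lower bounds on the time to distribute a file of F bits to N peers.
    def t_client_server(F, N, us, d_min):
        # The server pushes N full copies through its uplink us;
        # the slowest peer still needs F / d_min to receive one copy.
        return max(N * F / us, F / d_min)

    def t_p2p(F, N, us, d_min, u_peers):
        # Aggregate upload capacity grows with the swarm: us + sum(u_peers).
        return max(F / us, F / d_min, N * F / (us + sum(u_peers)))

    F, us, d_min, N = 4e9, 50e6, 20e6, 100
    u_peers = [5e6] * N                        # modest last-mile uplinks
    print(t_client_server(F, N, us, d_min))    # 8000.0 s: grows linearly in N
    print(t_p2p(F, N, us, d_min, u_peers))     # ~727 s: roughly flat in N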
Performance Limitations/Opportunities
  • Asymmetric Bandwidth:
    • Common in last-mile connections, where download capacity far exceeds upload capacity.
    • A protocol must make good use of both directions.
  • Information Challenge:
    • In client/server, the server already has all the information it needs.
    • In P2P, that information is itself distributed, which raises a bootstrap problem: how does a new peer find the others?
Napster
  • The first P2P file-sharing system to go viral (1999).
  • Primarily MP3-oriented.
  • Central Index: Maps files to the machines that store them.
  • File Discovery:
    • Query the central index.
    • Receive the address of a machine that stores the required file.
    • Get the file directly from that machine.
  • Simple, easy-to-implement search engines.
  • Disrupted record shops, leading to iTunes and Spotify.
  • Disadvantage:
    • A single point of failure.
    • Shut down due to copyright issues.
Gnutella
  • 1st generation unstructured/decentralized P2P protocol (2000).
  • Query Flooding (sketched below):
    • Send the request to all neighbors.
    • Neighbors recursively forward (flood) the request to their own neighbors.
  • Advantage:
    • Decentralized, highly robust architecture.
    • Flexible queries.
  • Disadvantage:
    • Not scalable; inefficient; deep searches are slow.
    • Queries carry a TTL, yet the entire network can still be swamped with requests.
    • Poor protocol design; insecure and fragile.
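
  A minimal sketch of TTL-limited query flooding; the toy network and file names are illustrative assumptions:

    # Gnutella-style flooding: forward the query to all neighbours,
    # decrementing a TTL each hop and never re-visiting a peer.
    def flood_query(network, holdings, start, filename, ttl):
        hits, seen, frontier = [], {start}, [start]
        while frontier and ttl > 0:
            next_frontier = []
            for peer in frontier:
                for neighbour in network[peer]:
                    if neighbour in seen:
                        continue                    # avoid re-flooding
                    seen.add(neighbour)
                    if filename in holdings.get(neighbour, ()):
                        hits.append(neighbour)
                    next_frontier.append(neighbour)
            frontier, ttl = next_frontier, ttl - 1
        return hits

    network = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
    holdings = {"d": {"song.mp3"}}
    print(flood_query(network, holdings, "a", "song.mp3", ttl=2))   # ['d']

  The message count grows with the neighbour fan-out at every hop, which is exactly why the whole network can be swamped.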
BitTorrent: Intentions
  • Developed by Bram Cohen in 2001.
  • Built on distributed, integrity-protected (hashed) file pieces.
  • Many chunks make transfers safer and faster.
  • Deals with asymmetric (last-mile) bandwidth.
    • Can only download as fast as uploader uploads.
  • Deals with inefficient downloading (TCP limits).
  • Aims for fairness and encourages unthrottled sharing.
  • Incorporates game theory concepts like tit-for-tat, Prisoners’ Dilemma, and Pareto/80-20 rule.
BitTorrent (Original)
  • Central Directory: Tracker
    • Tracks (only) peer identities.
    • Does not reveal all peers at once (returns a subset per request).
    • Has none of the file.
  • Static Metadata (Torrent) File
    • Tracker, File info.
    • Index of Pieces, hashes.
  • Pieces are 256 kB or 512 kB, each with a SHA-1 hash (verification sketched below).
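
  A minimal sketch of piece verification, assuming SHA-1 hashes taken from the metadata file; names and sizes are illustrative:

    import hashlib

    def piece_is_valid(piece_bytes, expected_sha1_hex):
        # Recompute the piece hash and compare with the metadata entry.
        return hashlib.sha1(piece_bytes).hexdigest() == expected_sha1_hex

    piece = b"\x00" * (256 * 1024)               # one 256 kB piece
    expected = hashlib.sha1(piece).hexdigest()   # as listed in the .torrent
    print(piece_is_valid(piece, expected))       # True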
BitTorrent Components
  • Swarm: All peers participating in the file sharing.
  • Seeder: Full-file-holder who only uploads.
  • Leecher: Downloader who downloads what they don’t have and uploads what they do have.
Network Efficiency in BitTorrent
  • Down:up bandwidth ratio is typically 4:1.
  • Downloads occur from a chosen, evolving subset, typically ~4 in parallel.
  • Uploads occur to those downloading and others.
  • Pieces are divided into 16 kB sub-pieces.
  • Requests for sub-pieces are pipelined and fanned out across peers.
  • Sub-pieces download in parallel to build a complete piece (see the sketch below).
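
  A minimal sketch of the sub-piece fan-out; the naive round-robin assignment is an illustrative simplification of real pipelining:

    SUB_PIECE = 16 * 1024

    def sub_piece_requests(piece_index, piece_length, peers):
        # Carve one piece into 16 kB requests spread across peers.
        requests = []
        for i, offset in enumerate(range(0, piece_length, SUB_PIECE)):
            length = min(SUB_PIECE, piece_length - offset)
            peer = peers[i % len(peers)]         # naive fan-out
            requests.append((peer, piece_index, offset, length))
        return requests

    # A 256 kB piece yields 16 parallel sub-piece requests over 4 peers.
    for req in sub_piece_requests(7, 256 * 1024, ["p1", "p2", "p3", "p4"]):
        print(req)                               # (peer, piece, offset, length)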
Piece Selection Algorithm
  1. Strict priority: complete the sub-pieces of an in-progress piece first.
  2. Rarest piece first (sketched below):
    1. Spreads the seeding (the seeder is the bottleneck).
    2. Improves robustness (against loss of the seeder).
    3. Increases the spread/speed.
    4. Increases your value to other peers.
  3. Most-common pieces last.
  4. Random first piece:
    1. Need something to share, quickly.
  5. Endgame mode:
    1. Only specific piece(s) remain; request them from all peers.
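
  A minimal sketch of the rarest-first rule; the bitfields are illustrative:

    from collections import Counter

    def rarest_first(my_pieces, peer_bitfields):
        # Count how many peers hold each piece I still need...
        counts = Counter()
        for bitfield in peer_bitfields.values():
            for piece in bitfield:
                if piece not in my_pieces:
                    counts[piece] += 1
        # ...and pick the least-replicated one.
        return min(counts, key=counts.get) if counts else None

    peer_bitfields = {"p1": {0, 1, 2}, "p2": {1, 2}, "p3": {2}}
    print(rarest_first(my_pieces={2}, peer_bitfields=peer_bitfields))   # 0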
Choosing Peers
  • Resource allocation considerations:
    • How much bandwidth does a peer have and offer?
    • What’s best for everyone?
  • Implementation (sketched below):
    • Connect to "all" known peers.
    • Keep a fixed-size pool (typically 4) of peers to upload to.
    • Identify the best uploaders:
      • Non-trivial: from whom do you get the best download rates?
    • Allow them to download from you ("unchoke").
    • Re-measure average rates every 10 seconds.
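
  A minimal sketch of rate-based unchoking; the rates are illustrative:

    def choose_unchoked(download_rates, pool_size=4):
        # Every 10 s, unchoke the peers giving you the best download rates.
        ranked = sorted(download_rates, key=download_rates.get, reverse=True)
        return set(ranked[:pool_size])

    rates = {"p1": 80, "p2": 120, "p3": 5, "p4": 60, "p5": 95}
    print(choose_unchoked(rates))        # the best 4: p2, p5, p1, p4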
Evolving the Peer Pool
  • "Optimistic unchoking": allow others a chance (sketched below).
  • Every 30 seconds, pick one additional peer at random.
  • Unchoke it and measure its performance.
  • Compare it to the existing pool.
  • Keep the best 4; choke the others.
  • Gives everyone a chance and reduces the impact of free-loaders.
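
  A minimal sketch of the optimistic-unchoke step on top of the rate-based pool; peer names are illustrative:

    import random

    def optimistic_unchoke(all_peers, unchoked):
        # Every 30 s, give one random choked peer a chance to prove itself.
        choked = [p for p in all_peers if p not in unchoked]
        return random.choice(choked) if choked else None

    peers = {"p1", "p2", "p3", "p4", "p5", "p6"}
    pool = {"p1", "p2", "p4", "p5"}
    print(optimistic_unchoke(peers, pool))   # 'p3' or 'p6'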
Upload-Only Phase
  • Occurs after you finish downloading but are still willing to seed.
  • Download rates no longer exist to rank peers by.
  • Instead, use the upload rate from you to each (chosen) peer.
  • The best rates optimize the distribution of pieces.
  • Makes the best use of your upstream capacity.
  • May also favour a less-busy downloader.
  • Once a peer is well served, move on to other peers.
Trackerless BT
  • Introduced by Cohen in 2005.
  • Uses a Distributed Hash Table (DHT).
  • Every node is a tracker.
  • Every node joins an overlay network (application-level connectivity).
  • Like a swarm, the overlay ignores physical connectivity.
  • The DHT is used for sharing tracker information, not the files themselves.
DHT Requirements
  • Autonomy, Decentralization, Fault tolerance, Scalability
  • Tracker Hash Table is Distributed across all peers
  • Any peer can query for a “key” (piece, info, …)
  • DHT returns value for the key
  • Query resolution uses minimal # messages
  • DHT Peers only need to know about some other peers
  • Must be dynamic/robust to peers coming and going (aka churn)
Distributing the Tracker DB
  • Each row holds: Responsible Node, NodeID, Piece # (= Key = KeyID), Information (Value), Hash.
  • Example:

      Responsible Node   NodeID   KeyID   Value (Peers)    Hash
      150.203.56.99      10       10      a, b, c, d, …    JJJJJ
      150.203.56.99      10       20      e, f, g, …       KKKK
      150.203.56.99      10       30      h, i, …          LLLL
      130.48.1.7         90       67      h                MMMM
      130.48.1.7         90       78      k, m             PPPP
      130.48.1.7         90       100     (none)           QQQQ
      45.48.17.6         140      107     a, e, g          SSSS
      45.48.17.6         140      115     b, d             VVVV
      45.48.17.6         140      149     f, h, k          YYYY
Allocating K/V Pairs to Nodes
  • Every piece/key has an ID.
  • Every node (peer) has an ID on the same scale.
    • (BT: 160 bits – hopefully a globally unique ID (GUID))
  • Associate keyID (row) with nearest successor NodeID.
  • Each node takes care of all keys “between” it and its predecessor node.
  • Example (sketched below):
    • Piece (Key) ID-scale = [1..150]
    • NodeIDs = 10, 90, 140
    • KeyID 45 goes to NodeID 90
    • KeyID 111 goes to NodeID 140
    • KeyID 149 goes to NodeID 10 (wraps around)
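
  A minimal sketch of the successor rule on this example; only the three NodeIDs above are assumed:

    def responsible_node(key_id, node_ids):
        # A key belongs to the first node whose ID >= the key...
        for node in sorted(node_ids):
            if node >= key_id:
                return node
        # ...or wraps around to the smallest NodeID on the circle.
        return min(node_ids)

    nodes = [10, 90, 140]
    for key in (45, 111, 149):
        print(key, "->", responsible_node(key, nodes))
    # 45 -> 90, 111 -> 140, 149 -> 10 (wrap)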
Partitioning the ID Number Space
  • Basically, divide the rows amongst DHT nodes.
  • Distribute K/V pairs evenly(-ish).
  • Ensure every node participates and contributes and is known.
  • 1-D: a line that wraps into a circle (ring).
  • 2-D: a sheet that wraps into a closed surface (a torus).
  • Map DHT nodes to the space.
  • As nodes get added, re-partition (re-distribute some rows amongst nodes).
  • As nodes leave, re-partition (re-distribute their rows).
  • PUT K/V pairs onto the responsible nodes; GET queries retrieve them.
Example DHT
  • ID space = [1..150]
  • # of Nodes = 16
  • Each node knows:
    • Its successor
    • Its predecessor
Query Resolution
  • Node 18 asks:
    • GET Key 109
    • What’s its value?
  • A scaling issue (see the sketch below):
    • O(N) steps per lookup
    • Only O(2) neighbors per node
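
  A minimal sketch of the O(N) walk; the 16 NodeIDs (multiples of 9) are an illustrative assumption:

    def responsible(node, pred, key):
        # Node owns keys in (pred, node], with wrap-around at the top ID.
        return pred < key <= node if pred < node else key > pred or key <= node

    def linear_lookup(start, key, succ, pred):
        node, hops = start, 0
        while not responsible(node, pred[node], key):
            node = succ[node]            # only link available: next on ring
            hops += 1
        return node, hops

    ring = list(range(9, 145, 9))        # NodeIDs 9, 18, ..., 144
    succ = {ring[i]: ring[(i + 1) % 16] for i in range(16)}
    pred = {v: k for k, v in succ.items()}
    print(linear_lookup(18, 109, succ, pred))   # (117, 11): 11 hops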
DHT Improvements
  • Each node knows:
    • Its successor
    • Its successor+1
    • Its predecessor
    • A few others
  • Shortcut Nodes (sketched below):
    • Often distributed exponentially
    • O(log N) steps
    • O(log N) neighbors
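
  A minimal sketch of exponentially spaced shortcuts (Chord-style "fingers"); the ID-space size of 150 follows the running example:

    def finger_targets(node, id_space=150):
        # Ideal shortcut positions: node + 2^i (mod id_space).
        # Each target is then rounded up to its successor node, so a lookup
        # can halve the remaining distance each hop: O(log N) steps.
        return [(node + 2**i) % id_space
                for i in range(id_space.bit_length())]

    print(finger_targets(18))   # [19, 20, 22, 26, 34, 50, 82, 146]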
Storage and Robustness
  • Store each K/V pair on the responsible node,
  • AND on its successor,
  • AND on its predecessor,
  • so the pair survives the loss of any one of them.
Churning
  • Nodes leave/join.
  • Node leaves:
    • Cleanly:
      • Sends a message to its immediate neighbors (hand-over sketched below).
    • Crashes out:
      • detected by lack of ping responses
  • New Node joins:
    • Finds a "boot peer" (an existing node that introduces it into the overlay).
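
  A minimal sketch of a clean leave with hand-over; the ring and stored rows are illustrative:

    def leave(node, succ, pred, store):
        s, p = succ[node], pred[node]
        store[s].update(store.pop(node))   # successor takes over the rows
        succ[p], pred[s] = s, p            # unlink the departing node
        del succ[node], pred[node]

    succ = {10: 90, 90: 140, 140: 10}
    pred = {90: 10, 140: 90, 10: 140}
    store = {10: {149: "f,h,k"}, 90: {45: "a,b"}, 140: {111: "d"}}
    leave(90, succ, pred, store)
    print(succ, store[140])   # {10: 140, 140: 10} {111: 'd', 45: 'a,b'}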
DHT Use Cases
  • Leverage aggregate bandwidth.
  • Great for exact-match "searches" (lookups).
  • No good for vague or wildcard queries.
  • Hashing a regexp doesn't work: similar keys hash to unrelated IDs.
  • Great for large swarms, significant churn.
  • Many implementations with mostly minor parameter and algorithm variations.
  • Now used in other applications:
    • File sharing/file systems (p2p)
    • Databases
    • Naming systems (DNS-like)
    • CDNs
    • Communications/messaging, Event notification
    • Application-level multicast
    • Pub/Sub frameworks
    • Electronic Health Records