T5e
P2P and DHT
Layers
- The stack: Application, Presentation, Session, Transport, Network (IPv4/IPv6), Link (Ethernet, WiFi), Physical (cables, space, bits).
- Data units per layer: Messages (Application), Segments (Transport), Packets (Network), Frames (Link), Bits (Physical).
Client/Server Model
- Fixed Endpoints: Requires knowing well-known, fixed endpoints (with DNS allowing some redirection/load balancing).
- Dedicated Connection: Establishes and maintains a dedicated connection for the duration of the conversation.
- Efficient Data Transfer: Aims to transfer data as efficiently as possible.
- Centralized Services: Depends on well-known, centralized, "one-off" services.
- Fragility and Inefficiency: Suffers from fragility and inefficiency.
Issues with the Client/Server Model
- Centralized: Dependency on a particular service/provider.
- Fragile: The central server is an attractive target for attacks and a single point of failure.
- Inefficient: One server serving many clients leads to bottlenecks.
- Scaling Costs: Better service requires more server nodes, increasing costs.
- Organizational Overheads: Managing more server nodes increases workload.
- Lack of Privacy: Participants are easy to trace.
Desirable Application Characteristics
- Decentralized: Composed of independent, 'equal' nodes working for mutual benefit.
- Robust: No single point of failure or attack.
- Efficient: No bottleneck, optimized for data transfers.
- Scalable: Able to handle a large number of nodes.
- Self-Organizing: Dynamic network formation with plug-and-play capabilities.
- Anonymous: Difficult to trace participants.
Peer-to-Peer (P2P)
- Decentralized file/content sharing for:
- Popular media.
- Sensitive content.
- Operating systems.
- Games and updates.
- etc.
- Offloads servers (dynamic CDN).
- Models:
- Pure vs. Hybrid.
- Structured vs. Unstructured.
- Employs distributed directories, ledgers, and blockchain.
- Centralized (One Server):
- One server, N recipients.
- Server bandwidth divided by N recipients.
- Time is increased.
- P2P (Everybody Has Pieces):
- Effectively up to N servers, N recipients.
- N peers each contributing their bandwidth.
- Time is decreased.
- Asymmetric Bandwidth:
- Common in last mile connections where downloading is easier than uploading.
- Need to leverage benefits of both.
- Information Challenge:
- Client/server has sufficient information.
- P2P requires information to be distributed, which raises a bootstrap problem.
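To make the efficiency point concrete, here is a minimal sketch of the standard lower bounds on distribution time for one server pushing a file to N clients versus N cooperating peers. All bandwidth values are illustrative assumptions, not figures from the notes:

```python
# Sketch: minimum distribution time for a file of F bits to N peers.
# u_s = server upload rate, d = slowest peer download rate,
# u = per-peer upload rate (all in bits/s; values are made up).

def t_client_server(F, N, u_s, d):
    # The server must push N full copies; the slowest peer must receive one.
    return max(N * F / u_s, F / d)

def t_p2p(F, N, u_s, d, u):
    # Aggregate upload capacity grows with N, so time grows far more slowly.
    return max(F / u_s, F / d, N * F / (u_s + N * u))

F = 8e9                       # 1 GB file, in bits
u_s, d, u = 1e8, 2e7, 5e6     # 100 Mb/s server; 20 Mb/s down, 5 Mb/s up per peer

for N in (10, 100, 1000):
    print(N, round(t_client_server(F, N, u_s, d)), round(t_p2p(F, N, u_s, d, u)))
```

As N grows, the client/server time grows linearly while the P2P time stays bounded by the peers' own rates.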
Napster
- The first to go viral (1999).
- Primarily MP3-oriented.
- Central Index: Maps files to machines that store them.
- File Discovery:
- Query the index system.
- Receive a machine that stores the required file.
- Get the file from the machine.
- Simple, easy-to-implement search engines.
- Disrupted record shops, leading to iTunes and Spotify.
- Disadvantage:
- A single point of failure.
- Shut down due to copyright issues.
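Napster's central index can be pictured as nothing more than a map from file names to the machines that registered them. This is a hypothetical sketch, not Napster's actual protocol:

```python
# Sketch: a Napster-style central index (hypothetical, in-memory).
index = {}

def register(host, files):
    # A machine announces the files it is willing to serve.
    for f in files:
        index.setdefault(f, set()).add(host)

def lookup(name):
    # The index only answers "who has it"; the file transfer itself
    # then happens directly, peer to peer, from one of those machines.
    return index.get(name, set())

register("10.0.0.1", ["song.mp3"])
register("10.0.0.2", ["song.mp3", "other.mp3"])
print(sorted(lookup("song.mp3")))  # ['10.0.0.1', '10.0.0.2']
```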
Gnutella
- 1st generation unstructured/decentralized P2P protocol (2000).
- Query Flooding:
- Send request to all neighbors.
- Neighbors recursively multicast the request.
- Advantage:
- Decentralized, highly robust architecture.
- Flexible queries.
- Disadvantage:
- Not scalable; inefficient; slow deep searches.
- Has a TTL on queries.
- The entire network can be swamped with requests.
- Poor protocol design, insecure/fragile.
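Query flooding with a TTL can be sketched on a toy topology (the graph and TTL values here are made up for illustration):

```python
# Sketch: Gnutella-style query flooding with a TTL on a toy graph.
def flood(graph, start, ttl):
    """Return the set of nodes a query reaches within `ttl` hops."""
    reached, frontier = {start}, {start}
    for _ in range(ttl):
        # Each node forwards the query to all its neighbors.
        frontier = {n for f in frontier for n in graph[f]} - reached
        reached |= frontier
    return reached

graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
         "D": ["B", "C", "E"], "E": ["D"]}
print(sorted(flood(graph, "A", 1)))  # ['A', 'B', 'C']
print(sorted(flood(graph, "A", 2)))  # ['A', 'B', 'C', 'D']
```

The reachable set (and message count) grows with every extra hop of TTL, which is exactly why the entire network can be swamped.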
BitTorrent: Intentions
- Developed by Bram Cohen in 2001.
- Splits files into many integrity-protected (hash-verified) chunks, making distribution safer and faster.
- Deals with asymmetric (last-mile) bandwidth.
- Can only download as fast as uploader uploads.
- Deals with inefficient downloading (TCP limits).
- Aims for fairness and encourages unthrottled sharing.
- Incorporates game theory concepts like tit-for-tat, Prisoners’ Dilemma, and Pareto/80-20 rule.
BitTorrent (Original)
- Central Directory: Tracker
- Tracks only peer identities (who is in the swarm).
- Returns a subset of peers per request, not the full list.
- Holds none of the file itself.
- Static Metadata (Torrent) File
- Tracker, File info.
- Index of Pieces, hashes.
- Pieces are 256 or 512 KB each, with a SHA-1 hash per piece.
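The piece index can be sketched as follows, assuming a 256 KB piece size; the function name is illustrative and this is not the actual .torrent encoding:

```python
import hashlib

# Sketch: the per-piece SHA-1 hashes stored in a torrent's piece index
# (256 KB pieces assumed; a downloader verifies each piece against these).
PIECE = 256 * 1024

def piece_hashes(data: bytes):
    return [hashlib.sha1(data[i:i + PIECE]).hexdigest()
            for i in range(0, len(data), PIECE)]

hashes = piece_hashes(b"\x00" * (PIECE + 1))  # one full piece + 1 byte
print(len(hashes))  # 2
```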
BitTorrent Components
- Swarm: All peers participating in the file sharing.
- Seeder: Full-file-holder who only uploads.
- Leecher: Downloader who downloads what they don’t have and uploads what they do have.
Network Efficiency in BitTorrent
- Down:up bandwidth ratio is typically 4:1.
- Downloads occur from a chosen, evolving subset, typically ~4 in parallel.
- Uploads occur to those downloading and others.
- Pieces are divided into sub-pieces (16kB).
- Pipeline and fan-out requests for sub-pieces across peers.
- Download sub-pieces in parallel to build a complete piece.
Piece Selection Algorithm
- Strict priority: request the remaining sub-pieces of a started piece before starting a new piece.
- Rarest piece has priority:
- Spreads the seeding (the seeder is the bottleneck).
- Improves robustness against loss of the seeder.
- Increases the spread/speed of distribution.
- Increases your value to other peers.
- Equivalently: most-common pieces last.
- Random startup piece:
- Need to have something to share, quickly
- Endgame:
- Need to get only specific piece(s), ask all peers
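Rarest-first selection reduces to counting how many peers hold each needed piece; a toy sketch (the peer sets are made up):

```python
from collections import Counter

# Sketch: rarest-first piece selection. `peer_pieces` maps each peer to
# the set of piece indices it holds; pick the needed piece held by the
# fewest peers (ties broken arbitrarily here).
def rarest_first(peer_pieces, have, total_pieces):
    counts = Counter(p for pieces in peer_pieces.values() for p in pieces)
    needed = [p for p in range(total_pieces)
              if p not in have and counts[p] > 0]
    return min(needed, key=lambda p: counts[p], default=None)

peers = {"a": {0, 1}, "b": {0, 2}, "c": {0}}
print(rarest_first(peers, have=set(), total_pieces=3))  # 1 (piece 0 is most common)
```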
Choosing Peers
- Resource allocation considerations:
- How much bandwidth does a peer have and offer?
- What’s best for everyone?
- Implementation:
- Connect to "all" known peers.
- Fix the pool of peers to upload to (typically 4).
- Identify the best uploaders:
- Non-trivial: which peers give you the best download rates?
- Allow them to download from you ("unchoke").
- Re-evaluate the average rates every 10 seconds.
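Picking the upload pool then amounts to ranking peers by measured download rate and keeping the top four; a sketch with illustrative rates:

```python
# Sketch: the regular-unchoke choice: keep the 4 peers currently giving
# you the best measured download rates (rates are illustrative, kB/s).
def choose_unchoked(download_rates, slots=4):
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    return set(ranked[:slots])

rates = {"p1": 90, "p2": 10, "p3": 70, "p4": 55, "p5": 80, "p6": 5}
print(sorted(choose_unchoked(rates)))  # ['p1', 'p3', 'p4', 'p5']
```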
Evolving the Peer Pool
- Allow others a chance.
- “Optimistic unchoking”
- Pick one other peer at random every 30 seconds.
- Unchoke it.
- Measure performance.
- Compare to existing pool.
- Keep the best 4, choke others.
- Gives everyone a chance.
- Reduces the impact of free-loaders.
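Optimistic unchoking just adds a random pick from the currently choked peers; a sketch (peer names illustrative):

```python
import random

# Sketch: optimistic unchoking. Every 30 s, one randomly chosen choked
# peer is unchoked alongside the rate-based set, so newcomers get a
# chance to prove themselves and free-loaders lose their grip.
def optimistic_unchoke(all_peers, unchoked, rng=random):
    choked = [p for p in all_peers if p not in unchoked]
    return rng.choice(choked) if choked else None

peers = ["p1", "p2", "p3", "p4", "p5", "p6"]
unchoked = {"p1", "p3", "p4", "p5"}
lucky = optimistic_unchoke(peers, unchoked)
print(lucky)  # 'p2' or 'p6'
```

If the newcomer outperforms one of the current four, it replaces the worst of them at the next re-evaluation.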
Upload-Only Phase
- Occurs after finishing downloading and willing to seed.
- Cannot use download rates.
- Can use upload rates from you to each (chosen) peer.
- Best rates optimize distribution of pieces.
- Best use of your upstream capacity.
- Possibly less-busy downloader.
- Once satisfied, move on to other peers.
Trackerless BT
- Introduced by Cohen in 2005.
- Uses a Distributed Hash Table (DHT).
- Every node is a tracker.
- Every node joins an overlay network (application-level connectivity).
- Like a swarm, ignoring physical connectivity.
- Used for information sharing, not file sharing.
DHT Requirements
- Autonomy, Decentralization, Fault tolerance, Scalability
- Tracker Hash Table is Distributed across all peers
- Any peer can query for a “key” (piece, info, …)
- DHT returns value for the key
- Query resolution uses minimal # messages
- DHT Peers only need to know about some other peers
- Must be dynamic/robust to peers coming and going (aka churn)
Distributing the Tracker DB
- Each row of the distributed tracker DB holds: Responsible Node, its NodeID, the Piece # (= Key = KeyID), and the Information (Value).
- Example:

| Responsible Node | NodeID | KeyID (Piece #) | Value (Peers) | Hash |
|------------------|--------|-----------------|---------------|-------|
| 150.203.56.99 | 10 | 10 | a, b, c, d, … | JJJJJ |
| 150.203.56.99 | 10 | 20 | e, f, g, … | KKKK |
| 150.203.56.99 | 10 | 30 | h, i, … | LLLL |
| 130.48.1.7 | 90 | 67 | h | MMMM |
| 130.48.1.7 | 90 | 78 | k, m | PPPP |
| 130.48.1.7 | 90 | 100 | (none) | QQQQ |
| 45.48.17.6 | 140 | 107 | a, e, g | SSSS |
| 45.48.17.6 | 140 | 115 | b, d | VVVV |
| 45.48.17.6 | 140 | 149 | f, h, k | YYYY |
Allocating K/V Pairs to Nodes
- Every piece/key has an ID.
- Every node (peer) has an ID on the same scale.
- (In BitTorrent: 160-bit IDs, intended to be globally unique (GUIDs).)
- Associate keyID (row) with nearest successor NodeID.
- Each node takes care of all keys “between” it and its predecessor node.
- Example:
- Piece (Key) ID-scale = [1..150]
- NodeIDs = 10, 90, 140
- KeyID 45 goes to NodeID 90
- KeyID 111 goes to NodeID 140
- KeyID 149 goes to NodeID 10
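The successor rule in the example can be sketched directly:

```python
# Sketch: assign a key to its nearest successor node on a circular ID
# space, matching the worked example (ID scale [1..150]; nodes 10, 90, 140).
def successor(key_id, node_ids):
    bigger = [n for n in sorted(node_ids) if n >= key_id]
    return bigger[0] if bigger else min(node_ids)  # wrap around the circle

nodes = [10, 90, 140]
print(successor(45, nodes))   # 90
print(successor(111, nodes))  # 140
print(successor(149, nodes))  # 10 (wraps past the end of the ID scale)
```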
Partitioning the ID Number Space
- Basically, divide the rows amongst DHT nodes.
- Distribute K/V pairs evenly(-ish).
- Ensure every node participates and contributes and is known.
- 1-D: a line that wraps around into a circle.
- 2-D: a sheet that wraps around in both dimensions (topologically a torus).
- Map DHT nodes to the space.
- As nodes get added, re-partition (re-distribute some rows amongst nodes).
- As nodes leave, re-partition (re-distribute their rows).
- PUT K/V pairs on nodes, and GET queries.
Example DHT
- ID space = [1, 150]
- Number of nodes = 16
- Each node knows:
- Its successor
- Its predecessor
Query Resolution
- Node 18 asks: GET key 109; what is its value?
- The query is forwarded successor-by-successor until it reaches the responsible node.
- A scaling issue:
- O(N) steps per lookup.
- Only 2 neighbors per node.
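Successor-only routing can be simulated to see the O(N) cost; the 16-node ring below is an assumed even spacing over [1, 150], not taken from the notes:

```python
# Sketch: lookup when each node knows only its successor: the request
# walks the ring one hop at a time, so a lookup costs O(N) steps.
def linear_lookup(start, key_id, node_ids):
    ring = sorted(node_ids)
    owner = next((n for n in ring if n >= key_id), ring[0])
    node, hops = start, 0
    while node != owner:
        node = ring[(ring.index(node) + 1) % len(ring)]  # forward to successor
        hops += 1
    return owner, hops

nodes = list(range(9, 150, 9))  # 16 evenly spaced nodes: 9, 18, ..., 144
owner, hops = linear_lookup(18, 109, nodes)
print(owner, hops)  # 117 11
```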
DHT Improvements
- Each node knows:
- Its successor
- Its successor+1
- Its predecessor
- A few others
- Shortcut Nodes:
- Often distributed exponentially
- O(logN) steps
- O(logN) neighbors
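With exponentially spaced shortcuts (Chord-style fingers), each hop can cover the largest power-of-two distance that still fits, so a lookup halves the remaining distance and finishes in O(log N) hops; same assumed 16-node ring as before:

```python
# Sketch: finger-table routing. Each node also knows nodes roughly
# 1, 2, 4, 8, ... positions ahead on the ring.
def finger_lookup(start, key_id, node_ids):
    ring = sorted(node_ids)
    n = len(ring)
    owner = next((x for x in ring if x >= key_id), ring[0])
    target = ring.index(owner)
    pos, hops = ring.index(start), 0
    while pos != target:
        dist = (target - pos) % n
        jump = 1
        while jump * 2 <= dist:   # largest power of two not exceeding dist
            jump *= 2
        pos = (pos + jump) % n
        hops += 1
    return owner, hops

nodes = list(range(9, 150, 9))    # 16 nodes: 9, 18, ..., 144
owner, hops = finger_lookup(18, 109, nodes)
print(owner, hops)  # 117 3  (vs. 11 hops with successor-only routing)
```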
Storage and Robustness
- Storage of K/V pair.
- On appropriate node.
- AND its successor.
- AND its predecessor.
Churning
- Nodes leave/join.
- Node leaves:
- Cleanly:
- sends a message to its immediate neighbors
- Crashes out:
- detected by lack of ping responses
- New Node joins:
- Looks up its own ID via any existing node to find its successor.
- Takes over the keys between its predecessor and itself.
- Neighbors update their successor/predecessor pointers.
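Re-partitioning under churn follows directly from the successor rule; a sketch reusing the earlier example IDs, showing how a join moves keys to the new node:

```python
# Sketch: keys live on their successor node; a join/leave just means
# recomputing ownership under the new node set.
def owner(key_id, node_ids):
    ring = sorted(node_ids)
    return next((n for n in ring if n >= key_id), ring[0])  # wrap around

def assign(keys, node_ids):
    table = {n: [] for n in node_ids}
    for k in keys:
        table[owner(k, node_ids)].append(k)
    return table

keys = [12, 45, 111, 149]
print(assign(keys, [10, 90, 140]))      # {10: [149], 90: [12, 45], 140: [111]}
print(assign(keys, [10, 30, 90, 140]))  # node 30 joins and takes over key 12
```

A clean leave is the reverse: the departing node's keys fall to its successor under the same rule.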
DHT Use Cases
- Leverage aggregate bandwidth.
- Great for exact “searches” (lookups).
- No good for vague or wildcard queries.
- Hashing a pattern (regexp) doesn't match the hashes of the stored keys.
- Great for large swarms, significant churn.
- Many implementations with mostly minor parameter and algorithm variations.
- Now used in other applications:
- File sharing/file systems (p2p)
- Databases
- Naming systems (DNS-like)
- CDNs
- Communications/messaging, Event notification
- Application-level multicast
- Pub/Sub frameworks
- Electronic Health Records