Internet Measurement Notes
Intro to Internet/Network Measurement
- Network structure (graph).
- Internet topology measurement.
- Internet-wide scanning.
Internet Measurements
- Network Debugging.
- Performance.
- Resilience.
- Security.
- Regulation and Policies.
- Broader impact on society:
- State censorship.
- Price and traffic discrimination.
- Impact of social media.
History of Internet Measurements
- First major academic measurement studies (e.g., Paxson, SIGCOMM 1997).
- RFC323: IETF formed measurement group(s) as early as 1972.
- 2001: First ACM SIGCOMM Internet Measurement Workshop.
- 2003: First ACM IMC (Internet Measurement Conference).
Internet Measurements Layers
- Layer 8: User/Political.
- e.g., (fake) news propagation in social networks.
- Physical Layer
- e.g., infrastructure geography.
- Transport Layer
- e.g., performance of transport protocols, congestion control.
- Network Layer
- e.g., routing failures, Internet topology, performance.
- Example of end-to-end video quality measurement.
- Example of censorship measurements and impact.
Internet Measurement - Fundamental Challenges
- Internet: Not designed with measurability in mind.
- "Current measurement practice often involves the exploitation of side-effects and unintended features of the network, or, in other words, the artful piling of hacks atop one another. This state of affairs is a direct result of the relative paucity of diagnostic and measurement capabilities built into today's network stack."
- Lack of ground truth.
- Lack of available data.
- Heterogeneity of the network – Generalizability of results.
- Privacy concerns, Ethics.
Examples of Internet Measurement Studies
- A Census of the Internet Address Space:
https://ant.isi.edu/address/browse/index.html - Code-Red Worm:
- On July 19, 2001, more than 359,000 computers connected to the Internet were infected with the Code-Red (CRv2) worm in less than 14 hours.
- Spread -
http://www.caida.org/research/security/code-red/newframes-small-log.gif
- Sapphire Worm:
- The fastest computer worm in history.
- Doubled in size every 8.5 seconds.
- Infected more than 90 percent of vulnerable hosts within 10 minutes.
- Witty Worm:
- Reached its peak activity after approximately 45 minutes, at which point the majority of vulnerable hosts had been infected.
http://www.caida.org/research/security/witty/animations/usa_big-witty_2h.gif
- Nyxem Email Virus:
- Estimate of the total number of infected computers is between 470K and 945K.
- At least 45K of the infected computers were also compromised by other forms of spyware or botware.
http://www.caida.org/research/security/blackworm/animations/nyxem-hosts-both-O2.gif
- Scam Hosting: Study dynamics of scam hosting infrastructure.
- Sipscan Scan:
- A botnet-orchestrated stealth scan of the entire IPv4 address space.
- During 31 Jan - 12 Feb 2011.
- Originated from ~3 million IP addresses.
- Heavily coordinated.
- Unusually covert scanning strategy.
- To discover and compromise VoIP-related (SIP server) infrastructure.
http://www.youtube.com/watch?v=n6MRlEJeD8M
Measurement 101: Network Structure
- To measure the Internet and network, we need to understand the features of networks.
- Network in a nutshell, is just a graph.
- Apply different graph measures to estimate network.
Key Network Properties
- Degree distribution:
- Path length:
- Clustering coefficient:
- Connected components:
- Note: There are many other graph measurements such as centrality measures. But only the above items are covered.
- Degree distribution : Probability that a randomly chosen node has degree .
- = # nodes with degree k
- Normalised histogram:
- Path
- A path is a sequence of nodes in which each node is linked to the next one
- Path can intersect itself and pass through the same edge multiple times
- E.g.: ACBDCDEG
- In a directed graph a path can only follow the direction of the “arrow”
- Distance
- Distance (shortest path, geodesic) between a pair of nodes is defined as the number of edges along the shortest path connecting the nodes
- If the two nodes are not connected, the distance is usually defined as infinite.
- In directed graphs paths need to follow the direction of the arrows.
- Consequence: Distance is not symmetric:
- Diameter
- Diameter: The maximum (shortest path) distance between any pair of nodes in a graph
- Average path length for a connected graph (component) or a strongly connected (component of a) directed graph
- Many times we compute the average only over the connected pairs of nodes (that is, we ignore “infinite” length paths)
- , where is the distance from node i to node j
- is the max number of edges (total number of node pairs) =
- Clustering coefficient
- What portion of i’s neighbors are connected?
- Node i with degree ki
- Average clustering coefficient:
- How to find connected components:
- Start from random node and perform Breadth First Search (BFS)
- Label the nodes BFS visited
- If all nodes are visited, the network is connected
- Otherwise find an unvisited node and repeat BFS
- E.g. Tarjan’s strongly connected components algorithm
- Size of the largest connected component Largest set where any two vertices can be joined by a path
- Largest component = Giant component
Example: MSN Messenger
1-month activity (when it was popular).
- 180 million users logged in.
- 30 million users engaged in conversations.
- More than 1.3 billion conversations.
- More than 6.5 billion exchanged messages.
Contact, Conversation, and Messaging as an undirected graph.
- Edge (u,v) if users u and v exchanged at least 1 msg.
- N=180 million people.
- E=1.3 billion edges.
Degree distribution: Heavily skewed, avg. degree= 14.4
Path length: 6.6
Clustering coefficient: 0.1140
Connectivity: giant component
Measurement Types
- Two types of measurements:
- Passive.
- Tools:
- Active.
- Tools:
- Passive.
Internet Topology Measurement
- What is topology measurement?
Simple Network Management Protocol (SNMP)
- Protocol for network management services.
- How do I ask my router for data?
- How do I alter that data? (i.e. config).
- How does the router report an error?
- Also defines the structure of information.
- MIB: Management Information Base
- 3 versions, many RFCs (starts at 1065-7).
- Simplistic view: asking router for info
- Router has counters on link traffic:
- Byte counts
- Packet counts
- "Manager" can poll router periodically, e.g. every 5 minutes
SNMP Counters
- Billing.
- Packet/byte counts for the link.
- May need further division if many customers share the link.
- Multi Router Traffic Grapher (MRTG).
- Time-series plot of input/output Bandwidth.
- Good for eyeballing anomalies.
- Examples of MRTG Usage Scenarios
- Loss of traffic.
- Bandwidth spike.
- Daily backups are being performed.
- Anomalous behaviour can sometimes be a sign of security trouble.
- Top graph was from a server suffering a Flashchat script exploit.
- Bottom graph was a server under DoS attack.
SNMP's Limitations
- SNMP can't tell the whole story.
What is a Flow?
- A unidirectional stream of packets between a source and destination.
- 7 main fields (can have more if relevant):
- Source IP address
- Destination IP address
- Layer 3 protocol type
- Type of Service
- Source port number
- Destination port number
- Input logical interface
- Other definitions possible
Flow Record
- A data structure describing the flow.
- Kept by the router.
- Includes byte and packet counts per flow.
- Also TCP flags, timestamps of first/last packets, etc.
Questions Addressed by Flow Data
- Application breakdown
- How much of my traffic (bytes and packets) is web traffic? (TCP Port 80)
- Flow counting
- How many active web connections?
- Are any of my hosts spam zombies?
Traffic Matrix
- Flow data can be used to construct Traffic Matrix.
- How much of the traffic is going from subnet A to subnet B.
- A and B can be in different ASes from where the flow data is collected.
- Not a trivial problem.
Traffic Matrix Types
- What is The Matrix?
- Represents the amount of data transmitted between every pair of network nodes
- Internal Traffic Matrix
- PoP to PoP (Point of Presence)
- External Traffic Matrix
- PoP to external AS
Cisco Netflow
- Tool to collect flow records at a router
- Cisco proprietary tool
- Most other vendors offer similar
- IPFIX (IP Flow Information Export)
- IETF standardized in RFC 5101-5103
- Used primarily by large transit providers
- Started as a cache for improving lookup performance
- If incoming packets are identified by flow, then don’t need to search through the forwarding table
- On each router interface, a flow cache stores per-flow information
- Flow ID
- Byte counts, packet counts
- IP addresses and ports
- Timestamps of flow start and end times
- etc
- Oh, and outgoing interface (i.e. the actual forwarding task)
- For each packet, the router will:
- Look up entry in the flow cache
- If found, update counters and timestamps
- Insert new flow record if not found
- Look up entry in the flow cache
Netflow Operation
- Periodically, flow records are exported to a collection machine
- Via UDP
- Flow records are purged from the cache and exported to a collection machine.
- When?
- When a flow has been idle too long (default 15 sec).
- Long-lived flows can be expired (default 30 min).
- Cache is full (caching policy takes over).
- TCP connection done (FIN) or reset (RST).
Problems with Netflow
- Router summarises flow data. Still use flow cache.
- Expired flows added to aggregation cache.
- Aggregation based on AS, source/dest prefix, port, etc.
- Reduces export data volume.
- No packet loss during anomalous traffic patterns when a large # records contributes to packet loss.
Traceroute
- Traceroute, introduced in 1988 by Van Jacobson
- Traceroute is a system administrators utility to trace the route IP packets from the current system take in getting to the destination system.
- How Traceroute Works:
- Launch a probe packet towards DST, with a TTL of 1
- Every router hop decrements the IP TTL of the packet by 1
- When the TTL hits 0, the packet is dropped, and the router sends an ICMP TTL Exceeded packet to SRC
- SRC receives this ICMP message, displays it as a traceroute “hop”
- Repeat from step 1, with TTL incremented by 1 each time, until…
- DST hop receives probe and returns ICMP Dest Unreachable
- SRC stops the traceroute upon receipt of ICMP Dest Unreachable
- Traceroute Anomalies
- Missing Hops
- Missing Destination
- Load Balancing
- No visibility into the return path (asymmetric routing)
- Shows IP addresses = router aliases != routers
Router Alias Resolution
- IP Address != Interface != Router
- Routers typically (not always!) reply with the IP address of the inbound interface (this violates RFC1812 but is common behaviour).
- Send UDP probe to a random high port.
Alias Resolution Example: Increasing IPID Field
- IP header has the IPID field.
- Original purpose: re-assemble fragmented IP packets.
- Often implemented as a counter:
Internet-Wide Scanning
- Scanning the entire IPv4 address space
- entire IPv4 Space: addresses = 4.3B addresses
- routable IPv4 space (excluding reserved ranges, multicast etc): ~3.7B addresses
- publicly routed IPv4 space: ~2.9B addresses (as at 2017)
- Can we just scan (probe) every single routed IPv4 address?
- The first full scans of the IPv4 space took weeks to months.
ZMap - Stateless Implementation
- Default case:
- We open a TCP socket, send a SYN packet wait for the destination to reply (or not to reply)
- ZMap:
- Bypass the TCP/IP stack of the OS; craft Ethernet frames directly, “fill up the pipe”
- Encode destination IP address into probe packets, match responses on arrival.
- TCP SRC port, TCP sequence number, TCP DST port , TCP ACK = SEQ + 1
- ZMap example: Track Heartbleed Vulnerability
- Availability of ZMap Data.
- ZMap-driven search engine.
Traceroute for ISP Topology Inference
- Traceroutes show single paths. How to effectively select target IP addresses?
Path Reductions
- Want to choose unique paths - with new information.
- Skip repeated traces of the same path.
- Expect the common case:
- Traceroute server has one ingress point
- Customer prefix has one egress point
- BGP peers have one early-exit per ingress.
- If we're wrong, we might miss some paths.
- New servers add paths or share load!
Reduction Effectiveness
- Brute force:
- All servers to all BGP prefixes, disaggregate ISP prefixes.
- 90-150 million traceroutes required
- BGP directed probes:
- All traceroutes identifiable from Route Views.
- 0.2-15 million traceroutes required
- Executed after path reduction:
- Traceroutes chosen by Rocketfuel.
- 8-300 thousand traceroutes required
- Directed probing and path reductions are effective at reducing the number of probes required to map an ISP
Traceroute for Large-Scale Topology Inference
- Need sufficient number of vantage points
- Need a smart way to select target IPs - Brute-Force probing the whole space ineffective
- Need to deal with traceroute issues
- Rocketfuel combines all these aspects together, leveraging BGP data to select target ranges, into a single system.