Internet Measurement Notes

Intro to Internet/Network Measurement

  • Network structure (graph).
  • Internet topology measurement.
  • Internet-wide scanning.

Internet Measurements

  • Network Debugging.
  • Performance.
  • Resilience.
  • Security.
  • Regulation and Policies.
  • Broader impact on society:
    • State censorship.
    • Price and traffic discrimination.
    • Impact of social media.

History of Internet Measurements

  • First major academic measurement studies (e.g., Paxson, SIGCOMM 1997).
  • RFC323: IETF formed measurement group(s) as early as 1972.
  • 2001: First ACM SIGCOMM Internet Measurement Workshop.
  • 2003: First ACM IMC (Internet Measurement Conference).

Internet Measurements Layers

  • Layer 8: User/Political.
    • e.g., (fake) news propagation in social networks.
  • Physical Layer
    • e.g., infrastructure geography.
  • Transport Layer
    • e.g., performance of transport protocols, congestion control.
  • Network Layer
    • e.g., routing failures, Internet topology, performance.
  • Example of end-to-end video quality measurement.
  • Example of censorship measurements and impact.

Internet Measurement - Fundamental Challenges

  • Internet: Not designed with measurability in mind.
  • "Current measurement practice often involves the exploitation of side-effects and unintended features of the network, or, in other words, the artful piling of hacks atop one another. This state of affairs is a direct result of the relative paucity of diagnostic and measurement capabilities built into today's network stack."
  • Lack of ground truth.
  • Lack of available data.
  • Heterogeneity of the network – Generalizability of results.
  • Privacy concerns, Ethics.

Examples of Internet Measurement Studies

  • A Census of the Internet Address Space: https://ant.isi.edu/address/browse/index.html
  • Code-Red Worm:
    • On July 19, 2001, more than 359,000 computers connected to the Internet were infected with the Code-Red (CRv2) worm in less than 14 hours.
    • Spread - http://www.caida.org/research/security/code-red/newframes-small-log.gif
  • Sapphire Worm:
    • The fastest computer worm in history.
    • Doubled in size every 8.5 seconds.
    • Infected more than 90 percent of vulnerable hosts within 10 minutes.
  • Witty Worm:
    • Reached its peak activity after approximately 45 minutes, at which point the majority of vulnerable hosts had been infected.
    • http://www.caida.org/research/security/witty/animations/usa_big-witty_2h.gif
  • Nyxem Email Virus:
    • Estimate of the total number of infected computers is between 470K and 945K.
    • At least 45K of the infected computers were also compromised by other forms of spyware or botware.
    • http://www.caida.org/research/security/blackworm/animations/nyxem-hosts-both-O2.gif
  • Scam Hosting: Study dynamics of scam hosting infrastructure.
  • Sipscan Scan:
    • A botnet-orchestrated stealth scan of the entire IPv4 address space.
    • During 31 Jan - 12 Feb 2011.
    • Originated from ~3 million IP addresses.
    • Heavily coordinated.
    • Unusually covert scanning strategy.
    • To discover and compromise VoIP-related (SIP server) infrastructure.
    • http://www.youtube.com/watch?v=n6MRlEJeD8M

Measurement 101: Network Structure

  • To measure the Internet and network, we need to understand the features of networks.
  • Network in a nutshell, is just a graph.
  • Apply different graph measures to estimate network.
Key Network Properties
  • Degree distribution: P(k)P(k)
  • Path length: hh
  • Clustering coefficient: CC
  • Connected components: ss
  • Note: There are many other graph measurements such as centrality measures. But only the above items are covered.
  • Degree distribution P(k)P(k): Probability that a randomly chosen node has degree kk.
    • NkN_k = # nodes with degree k
    • Normalised histogram: P(k)=NkNP(k) = \frac{N_k}{N}
  • Path
    • A path is a sequence of nodes in which each node is linked to the next one
    • P<em>n=i</em>0,i<em>1,i</em>2,,inP<em>n = {i</em>0, i<em>1, i</em>2, …, i_n}
    • P<em>n=(i</em>0,i<em>1),(i</em>1,i<em>2),(i</em>2,i<em>3),,(i</em>n1,in)P<em>n = {(i</em>0, i<em>1), (i</em>1, i<em>2), (i</em>2, i<em>3), …, (i</em>{n-1}, i_n)}
    • Path can intersect itself and pass through the same edge multiple times
    • E.g.: ACBDCDEG
    • In a directed graph a path can only follow the direction of the “arrow”
  • Distance
    • Distance (shortest path, geodesic) between a pair of nodes is defined as the number of edges along the shortest path connecting the nodes
    • If the two nodes are not connected, the distance is usually defined as infinite.
    • In directed graphs paths need to follow the direction of the arrows.
    • Consequence: Distance is not symmetric: h<em>B,Ch</em>C,Bh<em>{B,C} \ne h</em>{C,B}
  • Diameter
    • Diameter: The maximum (shortest path) distance between any pair of nodes in a graph
    • Average path length for a connected graph (component) or a strongly connected (component of a) directed graph
    • Many times we compute the average only over the connected pairs of nodes (that is, we ignore “infinite” length paths)
    • h=<em>i,jh</em>ijE<em>maxh = \frac{\sum<em>{i,j} h</em>{ij}}{E<em>{max}}, where h</em>ijh</em>{ij} is the distance from node i to node j
    • EmaxE_{max} is the max number of edges (total number of node pairs) = n(n1)2\frac{n(n-1)}{2}
  • Clustering coefficient
    • What portion of i’s neighbors are connected?
    • Node i with degree ki
    • Ci[0,1]C_i \in [0,1]
    • Average clustering coefficient: C=<em>iC</em>iNC = \frac{\sum<em>{i} C</em>i}{N}
  • How to find connected components:
    • Start from random node and perform Breadth First Search (BFS)
    • Label the nodes BFS visited
    • If all nodes are visited, the network is connected
    • Otherwise find an unvisited node and repeat BFS
    • E.g. Tarjan’s strongly connected components algorithm
    • Size of the largest connected component Largest set where any two vertices can be joined by a path
    • Largest component = Giant component

Example: MSN Messenger

  • 1-month activity (when it was popular).

    • 180 million users logged in.
    • 30 million users engaged in conversations.
    • More than 1.3 billion conversations.
    • More than 6.5 billion exchanged messages.
  • Contact, Conversation, and Messaging as an undirected graph.

    • Edge (u,v) if users u and v exchanged at least 1 msg.
    • N=180 million people.
    • E=1.3 billion edges.
  • Degree distribution: Heavily skewed, avg. degree= 14.4

  • Path length: 6.6

  • Clustering coefficient: 0.1140

  • Connectivity: giant component

  • C<em>k=1N</em>k<em>i:k</em>i=kCiC<em>k = \frac{1}{N</em>k} \sum<em>{i:k</em>i=k} C_i

Measurement Types

  • Two types of measurements:
    • Passive.
      • Tools:
    • Active.
      • Tools:

Internet Topology Measurement

  • What is topology measurement?

Simple Network Management Protocol (SNMP)

  • Protocol for network management services.
    • How do I ask my router for data?
    • How do I alter that data? (i.e. config).
    • How does the router report an error?
  • Also defines the structure of information.
    • MIB: Management Information Base
  • 3 versions, many RFCs (starts at 1065-7).
  • Simplistic view: asking router for info
  • Router has counters on link traffic:
    • Byte counts
    • Packet counts
    • "Manager" can poll router periodically, e.g. every 5 minutes
SNMP Counters
  • Billing.
    • Packet/byte counts for the link.
    • May need further division if many customers share the link.
  • Multi Router Traffic Grapher (MRTG).
    • Time-series plot of input/output Bandwidth.
    • Good for eyeballing anomalies.
  • Examples of MRTG Usage Scenarios
    • Loss of traffic.
    • Bandwidth spike.
    • Daily backups are being performed.
  • Anomalous behaviour can sometimes be a sign of security trouble.
    • Top graph was from a server suffering a Flashchat script exploit.
    • Bottom graph was a server under DoS attack.
SNMP's Limitations
  • SNMP can't tell the whole story.

What is a Flow?

  • A unidirectional stream of packets between a source and destination.
  • 7 main fields (can have more if relevant):
    • Source IP address
    • Destination IP address
    • Layer 3 protocol type
    • Type of Service
    • Source port number
    • Destination port number
    • Input logical interface
  • Other definitions possible
Flow Record
  • A data structure describing the flow.
  • Kept by the router.
  • Includes byte and packet counts per flow.
  • Also TCP flags, timestamps of first/last packets, etc.
Questions Addressed by Flow Data
  • Application breakdown
    • How much of my traffic (bytes and packets) is web traffic? (TCP Port 80)
  • Flow counting
    • How many active web connections?
    • Are any of my hosts spam zombies?
Traffic Matrix
  • Flow data can be used to construct Traffic Matrix.
    • How much of the traffic is going from subnet A to subnet B.
    • A and B can be in different ASes from where the flow data is collected.
    • Not a trivial problem.
Traffic Matrix Types
  • What is The Matrix?
    • Represents the amount of data transmitted between every pair of network nodes
    • Internal Traffic Matrix
      • PoP to PoP (Point of Presence)
    • External Traffic Matrix
      • PoP to external AS
Cisco Netflow
  • Tool to collect flow records at a router
  • Cisco proprietary tool
    • Most other vendors offer similar
    • IPFIX (IP Flow Information Export)
      • IETF standardized in RFC 5101-5103
  • Used primarily by large transit providers
  • Started as a cache for improving lookup performance
    • If incoming packets are identified by flow, then don’t need to search through the forwarding table
  • On each router interface, a flow cache stores per-flow information
    • Flow ID
    • Byte counts, packet counts
    • IP addresses and ports
    • Timestamps of flow start and end times
    • etc
    • Oh, and outgoing interface (i.e. the actual forwarding task)
  • For each packet, the router will:
    • Look up entry in the flow cache
      • If found, update counters and timestamps
    • Insert new flow record if not found
Netflow Operation
  • Periodically, flow records are exported to a collection machine
    • Via UDP
  • Flow records are purged from the cache and exported to a collection machine.
  • When?
    • When a flow has been idle too long (default 15 sec).
    • Long-lived flows can be expired (default 30 min).
    • Cache is full (caching policy takes over).
    • TCP connection done (FIN) or reset (RST).
Problems with Netflow
  • Router summarises flow data. Still use flow cache.
  • Expired flows added to aggregation cache.
  • Aggregation based on AS, source/dest prefix, port, etc.
  • Reduces export data volume.
  • No packet loss during anomalous traffic patterns when a large # records contributes to packet loss.

Traceroute

  • Traceroute, introduced in 1988 by Van Jacobson
  • Traceroute is a system administrators utility to trace the route IP packets from the current system take in getting to the destination system.
  • How Traceroute Works:
    1. Launch a probe packet towards DST, with a TTL of 1
    2. Every router hop decrements the IP TTL of the packet by 1
    3. When the TTL hits 0, the packet is dropped, and the router sends an ICMP TTL Exceeded packet to SRC
    4. SRC receives this ICMP message, displays it as a traceroute “hop”
    5. Repeat from step 1, with TTL incremented by 1 each time, until…
    6. DST hop receives probe and returns ICMP Dest Unreachable
    7. SRC stops the traceroute upon receipt of ICMP Dest Unreachable
  • Traceroute Anomalies
    • Missing Hops
    • Missing Destination
    • Load Balancing
    • No visibility into the return path (asymmetric routing)
    • Shows IP addresses = router aliases != routers
Router Alias Resolution
  • IP Address != Interface != Router
  • Routers typically (not always!) reply with the IP address of the inbound interface (this violates RFC1812 but is common behaviour).
  • Send UDP probe to a random high port.
Alias Resolution Example: Increasing IPID Field
  • IP header has the IPID field.
  • Original purpose: re-assemble fragmented IP packets.
  • Often implemented as a counter:

Internet-Wide Scanning

  • Scanning the entire IPv4 address space
    • entire IPv4 Space: 2322^{32} addresses = 4.3B addresses
    • routable IPv4 space (excluding reserved ranges, multicast etc): ~3.7B addresses
    • publicly routed IPv4 space: ~2.9B addresses (as at 2017)
    • Can we just scan (probe) every single routed IPv4 address?
  • The first full scans of the IPv4 space took weeks to months.
ZMap - Stateless Implementation
  • Default case:
    • We open a TCP socket, send a SYN packet wait for the destination to reply (or not to reply)
  • ZMap:
    • Bypass the TCP/IP stack of the OS; craft Ethernet frames directly, “fill up the pipe”
    • Encode destination IP address into probe packets, match responses on arrival.
    • TCP SRC port, TCP sequence number, TCP DST port , TCP ACK = SEQ + 1
  • ZMap example: Track Heartbleed Vulnerability
  • Availability of ZMap Data.
  • ZMap-driven search engine.

Traceroute for ISP Topology Inference

  • Traceroutes show single paths. How to effectively select target IP addresses?
Path Reductions
  • Want to choose unique paths - with new information.
  • Skip repeated traces of the same path.
  • Expect the common case:
    • Traceroute server has one ingress point
    • Customer prefix has one egress point
    • BGP peers have one early-exit per ingress.
  • If we're wrong, we might miss some paths.
  • New servers add paths or share load!
Reduction Effectiveness
  • Brute force:
    • All servers to all BGP prefixes, disaggregate ISP prefixes.
    • 90-150 million traceroutes required
  • BGP directed probes:
    • All traceroutes identifiable from Route Views.
    • 0.2-15 million traceroutes required
  • Executed after path reduction:
    • Traceroutes chosen by Rocketfuel.
    • 8-300 thousand traceroutes required
  • Directed probing and path reductions are effective at reducing the number of probes required to map an ISP
Traceroute for Large-Scale Topology Inference
  • Need sufficient number of vantage points
  • Need a smart way to select target IPs - Brute-Force probing the whole space ineffective
  • Need to deal with traceroute issues
  • Rocketfuel combines all these aspects together, leveraging BGP data to select target ranges, into a single system.