Internet Measurement Notes

Intro to Internet/Network Measurement

Network structure (graph).
Internet topology measurement.
Internet-wide scanning.

Internet Measurements

Network Debugging.
Performance.
Resilience.
Security.
Regulation and Policies.
Broader impact on society:
- State censorship.
- Price and traffic discrimination.
- Impact of social media.

History of Internet Measurements

First major academic measurement studies (e.g., Paxson, SIGCOMM 1997).
RFC323: IETF formed measurement group(s) as early as 1972.
2001: First ACM SIGCOMM Internet Measurement Workshop.
2003: First ACM IMC (Internet Measurement Conference).

Internet Measurements Layers

Layer 8: User/Political.
- e.g., (fake) news propagation in social networks.
Physical Layer
- e.g., infrastructure geography.
Transport Layer
- e.g., performance of transport protocols, congestion control.
Network Layer
- e.g., routing failures, Internet topology, performance.
Example of end-to-end video quality measurement.
Example of censorship measurements and impact.

Internet Measurement - Fundamental Challenges

Internet: Not designed with measurability in mind.
"Current measurement practice often involves the exploitation of side-effects and unintended features of the network, or, in other words, the artful piling of hacks atop one another. This state of affairs is a direct result of the relative paucity of diagnostic and measurement capabilities built into today's network stack."
Lack of ground truth.
Lack of available data.
Heterogeneity of the network – Generalizability of results.
Privacy concerns, Ethics.

Examples of Internet Measurement Studies

A Census of the Internet Address Space: https://ant.isi.edu/address/browse/index.html
Code-Red Worm:
- On July 19, 2001, more than 359,000 computers connected to the Internet were infected with the Code-Red (CRv2) worm in less than 14 hours.
- Spread - http://www.caida.org/research/security/code-red/newframes-small-log.gif
Sapphire Worm:
- The fastest computer worm in history.
- Doubled in size every 8.5 seconds.
- Infected more than 90 percent of vulnerable hosts within 10 minutes.
Witty Worm:
- Reached its peak activity after approximately 45 minutes, at which point the majority of vulnerable hosts had been infected.
- http://www.caida.org/research/security/witty/animations/usa_big-witty_2h.gif
Nyxem Email Virus:
- Estimate of the total number of infected computers is between 470K and 945K.
- At least 45K of the infected computers were also compromised by other forms of spyware or botware.
- http://www.caida.org/research/security/blackworm/animations/nyxem-hosts-both-O2.gif
Scam Hosting: Study dynamics of scam hosting infrastructure.
Sipscan Scan:
- A botnet-orchestrated stealth scan of the entire IPv4 address space.
- During 31 Jan - 12 Feb 2011.
- Originated from ~3 million IP addresses.
- Heavily coordinated.
- Unusually covert scanning strategy.
- To discover and compromise VoIP-related (SIP server) infrastructure.
- http://www.youtube.com/watch?v=n6MRlEJeD8M

Measurement 101: Network Structure

To measure the Internet and network, we need to understand the features of networks.
Network in a nutshell, is just a graph.
Apply different graph measures to estimate network.

Key Network Properties

Degree distribution: $P(k)$
Path length: $h$
Clustering coefficient: $C$
Connected components: $s$
Note: There are many other graph measurements such as centrality measures. But only the above items are covered.
Degree distribution $P(k)$ : Probability that a randomly chosen node has degree $k$ .
- $N_k$ = # nodes with degree k
- Normalised histogram: $P(k) = \frac{N_k}{N}$
Path
- A path is a sequence of nodes in which each node is linked to the next one
- $Pn = {i0, i1, i2, …, i_n}$
- $Pn = {(i0, i1), (i1, i2), (i2, i3), …, (i{n-1}, i_n)}$
- Path can intersect itself and pass through the same edge multiple times
- E.g.: ACBDCDEG
- In a directed graph a path can only follow the direction of the “arrow”
Distance
- Distance (shortest path, geodesic) between a pair of nodes is defined as the number of edges along the shortest path connecting the nodes
- If the two nodes are not connected, the distance is usually defined as infinite.
- In directed graphs paths need to follow the direction of the arrows.
- Consequence: Distance is not symmetric: $h{B,C} \ne h{C,B}$
Diameter
- Diameter: The maximum (shortest path) distance between any pair of nodes in a graph
- Average path length for a connected graph (component) or a strongly connected (component of a) directed graph
- Many times we compute the average only over the connected pairs of nodes (that is, we ignore “infinite” length paths)
- $h = \frac{\sum{i,j} h{ij}}{E{max}}$ , where $h{ij}$ is the distance from node i to node j
- $E_{max}$ is the max number of edges (total number of node pairs) = $\frac{n(n-1)}{2}$
Clustering coefficient
- What portion of i’s neighbors are connected?
- Node i with degree ki
- $C_i \in [0,1]$
- Average clustering coefficient: $C = \frac{\sum{i} Ci}{N}$
How to find connected components:
- Start from random node and perform Breadth First Search (BFS)
- Label the nodes BFS visited
- If all nodes are visited, the network is connected
- Otherwise find an unvisited node and repeat BFS
- E.g. Tarjan’s strongly connected components algorithm
- Size of the largest connected component Largest set where any two vertices can be joined by a path
- Largest component = Giant component

Example: MSN Messenger

1-month activity (when it was popular).
- 180 million users logged in.
- 30 million users engaged in conversations.
- More than 1.3 billion conversations.
- More than 6.5 billion exchanged messages.
Contact, Conversation, and Messaging as an undirected graph.
- Edge (u,v) if users u and v exchanged at least 1 msg.
- N=180 million people.
- E=1.3 billion edges.
Degree distribution: Heavily skewed, avg. degree= 14.4
Path length: 6.6
Clustering coefficient: 0.1140
Connectivity: giant component
$Ck = \frac{1}{Nk} \sum{i:ki=k} C_i$

Measurement Types

Two types of measurements:
- Passive.
  - Tools:
- Active.
  - Tools:

Internet Topology Measurement

What is topology measurement?

Simple Network Management Protocol (SNMP)

Protocol for network management services.
- How do I ask my router for data?
- How do I alter that data? (i.e. config).
- How does the router report an error?
Also defines the structure of information.
- MIB: Management Information Base
3 versions, many RFCs (starts at 1065-7).
Simplistic view: asking router for info
Router has counters on link traffic:
- Byte counts
- Packet counts
- "Manager" can poll router periodically, e.g. every 5 minutes

SNMP Counters

Billing.
- Packet/byte counts for the link.
- May need further division if many customers share the link.
Multi Router Traffic Grapher (MRTG).
- Time-series plot of input/output Bandwidth.
- Good for eyeballing anomalies.
Examples of MRTG Usage Scenarios
- Loss of traffic.
- Bandwidth spike.
- Daily backups are being performed.
Anomalous behaviour can sometimes be a sign of security trouble.
- Top graph was from a server suffering a Flashchat script exploit.
- Bottom graph was a server under DoS attack.

SNMP's Limitations

SNMP can't tell the whole story.

What is a Flow?

A unidirectional stream of packets between a source and destination.
7 main fields (can have more if relevant):
- Source IP address
- Destination IP address
- Layer 3 protocol type
- Type of Service
- Source port number
- Destination port number
- Input logical interface
Other definitions possible

Flow Record

A data structure describing the flow.
Kept by the router.
Includes byte and packet counts per flow.
Also TCP flags, timestamps of first/last packets, etc.

Questions Addressed by Flow Data

Application breakdown
- How much of my traffic (bytes and packets) is web traffic? (TCP Port 80)
Flow counting
- How many active web connections?
- Are any of my hosts spam zombies?

Traffic Matrix

Flow data can be used to construct Traffic Matrix.
- How much of the traffic is going from subnet A to subnet B.
- A and B can be in different ASes from where the flow data is collected.
- Not a trivial problem.

Traffic Matrix Types

What is The Matrix?
- Represents the amount of data transmitted between every pair of network nodes
- Internal Traffic Matrix
  - PoP to PoP (Point of Presence)
- External Traffic Matrix
  - PoP to external AS

Cisco Netflow

Tool to collect flow records at a router
Cisco proprietary tool
- Most other vendors offer similar
- IPFIX (IP Flow Information Export)
  - IETF standardized in RFC 5101-5103
Used primarily by large transit providers
Started as a cache for improving lookup performance
- If incoming packets are identified by flow, then don’t need to search through the forwarding table
On each router interface, a flow cache stores per-flow information
- Flow ID
- Byte counts, packet counts
- IP addresses and ports
- Timestamps of flow start and end times
- etc
- Oh, and outgoing interface (i.e. the actual forwarding task)
For each packet, the router will:
- Look up entry in the flow cache
  - If found, update counters and timestamps
- Insert new flow record if not found

Netflow Operation

Periodically, flow records are exported to a collection machine
- Via UDP
Flow records are purged from the cache and exported to a collection machine.
When?
- When a flow has been idle too long (default 15 sec).
- Long-lived flows can be expired (default 30 min).
- Cache is full (caching policy takes over).
- TCP connection done (FIN) or reset (RST).

Problems with Netflow

Router summarises flow data. Still use flow cache.
Expired flows added to aggregation cache.
Aggregation based on AS, source/dest prefix, port, etc.
Reduces export data volume.
No packet loss during anomalous traffic patterns when a large # records contributes to packet loss.

Traceroute

Traceroute, introduced in 1988 by Van Jacobson
Traceroute is a system administrators utility to trace the route IP packets from the current system take in getting to the destination system.
How Traceroute Works:
1. Launch a probe packet towards DST, with a TTL of 1
2. Every router hop decrements the IP TTL of the packet by 1
3. When the TTL hits 0, the packet is dropped, and the router sends an ICMP TTL Exceeded packet to SRC
4. SRC receives this ICMP message, displays it as a traceroute “hop”
5. Repeat from step 1, with TTL incremented by 1 each time, until…
6. DST hop receives probe and returns ICMP Dest Unreachable
7. SRC stops the traceroute upon receipt of ICMP Dest Unreachable
Traceroute Anomalies
- Missing Hops
- Missing Destination
- Load Balancing
- No visibility into the return path (asymmetric routing)
- Shows IP addresses = router aliases != routers

Router Alias Resolution

IP Address != Interface != Router
Routers typically (not always!) reply with the IP address of the inbound interface (this violates RFC1812 but is common behaviour).
Send UDP probe to a random high port.

Alias Resolution Example: Increasing IPID Field

IP header has the IPID field.
Original purpose: re-assemble fragmented IP packets.
Often implemented as a counter:

Internet-Wide Scanning

Scanning the entire IPv4 address space
- entire IPv4 Space: $2^{32}$ addresses = 4.3B addresses
- routable IPv4 space (excluding reserved ranges, multicast etc): ~3.7B addresses
- publicly routed IPv4 space: ~2.9B addresses (as at 2017)
- Can we just scan (probe) every single routed IPv4 address?
The first full scans of the IPv4 space took weeks to months.

ZMap - Stateless Implementation

Default case:
- We open a TCP socket, send a SYN packet wait for the destination to reply (or not to reply)
ZMap:
- Bypass the TCP/IP stack of the OS; craft Ethernet frames directly, “fill up the pipe”
- Encode destination IP address into probe packets, match responses on arrival.
- TCP SRC port, TCP sequence number, TCP DST port , TCP ACK = SEQ + 1
ZMap example: Track Heartbleed Vulnerability
Availability of ZMap Data.
ZMap-driven search engine.

Traceroute for ISP Topology Inference

Traceroutes show single paths. How to effectively select target IP addresses?

Path Reductions

Want to choose unique paths - with new information.
Skip repeated traces of the same path.
Expect the common case:
- Traceroute server has one ingress point
- Customer prefix has one egress point
- BGP peers have one early-exit per ingress.
If we're wrong, we might miss some paths.
New servers add paths or share load!

Reduction Effectiveness

Brute force:
- All servers to all BGP prefixes, disaggregate ISP prefixes.
- 90-150 million traceroutes required
BGP directed probes:
- All traceroutes identifiable from Route Views.
- 0.2-15 million traceroutes required
Executed after path reduction:
- Traceroutes chosen by Rocketfuel.
- 8-300 thousand traceroutes required
Directed probing and path reductions are effective at reducing the number of probes required to map an ISP

Traceroute for Large-Scale Topology Inference

Need sufficient number of vantage points
Need a smart way to select target IPs - Brute-Force probing the whole space ineffective
Need to deal with traceroute issues
Rocketfuel combines all these aspects together, leveraging BGP data to select target ranges, into a single system.