Networking and Application Layer Architecture: Concepts of the Application Layer

Course Logistics and Introduction

Lecturer and Contact Information: The networking portion of the paper is taught by Junaid, who can be contacted at the email address provided on Moodle. The lecturer intends to post contact hours for in-office help with the second half of the paper.
Acknowledgments: The lecture slides were originally created by Marino but have been updated with new content for the current semester.
Assignment 2: The assignment is currently available on Moodle with a deadline of May 17. Students are encouraged to start early and seek assistance from Teaching Assistants (TAs) during lab sessions.
Quizzes: The class format includes two quizzes per lecture. Each quiz contains five questions. One quiz is administered at the end of the first half of the lecture, and the second is administered at the end of the second half.
The Internet Protocol Model: The OSI model defines seven layers, but in practice the Internet is described with a five-layer architecture:
- Application Layer: Where actual applications operate (e.g., Email, Web, DNS).
- Transport Layer: Manages end-to-end communication between client and server (TCP and UDP).
- Network Layer: Handles host addressing and routing (IPv4 and IPv6).
- Link Layer: Moves data between directly connected devices over a single link, such as Ethernet or wireless.
- Physical Layer: The actual hardware such as fiber optics and wireless signals.

Network Application Communication Requirements

Networked Applications: These generally follow a client-server architecture. For example, a browser (client) in Hamilton sends a request to a web server in Sydney; the server responds with the requested data.
Data Integrity: Refers to the prevention of data loss or manipulation during transmission. Application types have varying needs:
- Reliable Data Transfer: Necessary for file transfers and web transactions which require $100\%$ data integrity.
- Loss-Tolerant Applications: Audio and video applications can tolerate some data loss without critical failure.
Timeliness: Relates to the delay or latency in data delivery. This is a critical requirement for real-time applications like video conferencing.
Throughput: The amount of data transmitted over the network in a given time. High throughput is essential for streaming high-quality video content.
Security: Ensures that the data is protected during transmission, a high priority for applications like email and financial transactions.

Transport Layer Protocols: TCP and UDP

Transmission Control Protocol (TCP):
- Reliability: Provides 100% reliable data transfer.
- Byte Streams: Data is communicated in the form of byte streams.
- Connection-Oriented: Requires a "handshake" or initial connection establishment between two applications before communication can begin.
User Datagram Protocol (UDP):
- Reliability: Considered unreliable as it does not guarantee delivery.
- Datagrams: Transfers data as datagrams, which are discrete, self-contained messages rather than a continuous byte stream.
- Connectionless: No handshake is required, which avoids the connection-setup delay of TCP at the cost of reliability.
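The contrast between the two protocols can be seen in a minimal loopback sketch using Python's standard `socket` module. The helper names `tcp_echo_once` and `udp_echo_once` are hypothetical, and the example assumes a small payload that fits in one `recv` call. The TCP client must call `connect` (the handshake) before sending, while the UDP client simply sends a datagram to an address.

```python
import socket
import threading


def tcp_echo_once(payload: bytes) -> bytes:
    """Send payload to a one-shot TCP echo server on the loopback interface."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(2)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve() -> None:
        conn, _ = server.accept()  # returns once the handshake has completed
        with conn:
            conn.sendall(conn.recv(4096))

    worker = threading.Thread(target=serve)
    worker.start()
    # create_connection() performs the TCP handshake before any data is sent.
    with socket.create_connection(("127.0.0.1", port), timeout=2) as client:
        client.sendall(payload)
        reply = client.recv(4096)
    worker.join()
    server.close()
    return reply


def udp_echo_once(payload: bytes) -> bytes:
    """Send payload to a one-shot UDP echo server on the loopback interface."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.settimeout(2)
    server.bind(("127.0.0.1", 0))
    port = server.getsockname()[1]

    def serve() -> None:
        data, addr = server.recvfrom(4096)  # one whole datagram at a time
        server.sendto(data, addr)

    worker = threading.Thread(target=serve)
    worker.start()
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2)
    # No connect() and no handshake: the datagram is sent straight away.
    client.sendto(payload, ("127.0.0.1", port))
    reply, _ = client.recvfrom(4096)
    client.close()
    worker.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(tcp_echo_once(b"hello over TCP"))
    print(udp_echo_once(b"hello over UDP"))
```

On the loopback interface the UDP datagram arrives in practice, but nothing in the protocol guarantees it; the timeout is there because a lost datagram would otherwise block the client forever.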

Electronic Mail (Email) Protocols

Protocols Used:
- Simple Mail Transfer Protocol (SMTP): Used for transferring email from the client to the mail server and between mail servers. It operates on both the client and server sides.
- Post Office Protocol (POP): A mail access protocol used to download or retrieve emails from the server.
- Internet Message Access Protocol (IMAP): An advanced mail access protocol that offers more features than POP, including the ability to manipulate stored messages directly on the server.
User Agents: These fall into two types:
- Desktop Applications: For example, Apple Mail or Outlook. The SMTP client is installed directly on the machine.
- Web-Based Agents: For example, Gmail or Yahoo. The browser first sends the message to the provider's website over HTTP; the provider then processes the email via SMTP.
Workflow: A sender's user agent sends the message to the sender's mail server via SMTP. It is then transferred to the receiver's mail server via SMTP. Finally, the receiver's user agent retrieves the message using POP or IMAP.
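The first hop of this workflow (user agent to the sender's mail server) can be sketched with Python's standard `email` and `smtplib` modules. The function names, addresses, and the use of port 587 with STARTTLS are assumptions for illustration; a real server would usually also require a login step, which is omitted here.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Compose the message that the user agent will hand to its mail server."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def submit(msg: EmailMessage, host: str, port: int = 587) -> None:
    """Hand the message to the sender's mail server over SMTP.

    Needs a reachable SMTP server; host and port are assumptions.
    """
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.starttls()          # upgrade the connection to TLS
        smtp.send_message(msg)   # SMTP transfer: user agent -> mail server


if __name__ == "__main__":
    message = build_message(
        "alice@example.com", "bob@example.org", "Lab question", "See you at the lab."
    )
    print(message)  # headers followed by the body; nothing is sent here
```

Retrieval on the receiving side would use a separate access protocol (POP or IMAP), for which Python provides the `poplib` and `imaplib` modules.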

Content Distribution and Streaming Services

Content Providers: Services like Spotify, Netflix, Apple TV, and Apple Music provide audio and video content in varying qualities and sizes.
Primary Requirements: Timeliness and throughput are the most important needs for streaming applications.
Buffering: Applications often use a playout buffer to pre-load segments of a video, so that playback is not interrupted during minor network fluctuations.
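The effect of pre-loading can be illustrated with a small simulation. This is a simplified model under stated assumptions: every segment has the same duration, playback starts once a chosen number of segments has arrived, and the function name `playback_stalls` and the arrival times are made up for the example.

```python
def playback_stalls(arrivals: list[float], segment_len: float, prebuffer: int) -> int:
    """Count playback interruptions for a given amount of pre-loading.

    arrivals[i] is the time (in seconds) at which segment i finishes downloading.
    Playback starts as soon as `prebuffer` segments are available.
    """
    clock = arrivals[prebuffer - 1]  # moment playback begins
    stalls = 0
    for arrival in arrivals:
        if arrival > clock:          # segment is not here when the player needs it
            stalls += 1
            clock = arrival          # player waits for the segment
        clock += segment_len         # play the segment
    return stalls


if __name__ == "__main__":
    # Segment 3 is delayed by a brief network slowdown.
    arrivals = [0.0, 1.0, 2.0, 4.5, 5.0, 6.0]
    print(playback_stalls(arrivals, segment_len=1.0, prebuffer=1))  # 1
    print(playback_stalls(arrivals, segment_len=1.0, prebuffer=3))  # 0
```

With only one segment buffered, the late segment interrupts playback; with three segments buffered, the same delay is absorbed at the price of a later start.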
Content Delivery Networks (CDN):
- Function: Content is distributed across servers in various geographic locations to reduce costs and latency.
- Latency Reduction: Accessing a local server in New Zealand for content is faster than retrieving it from a distant origin server.
- Origin Server: The main server where the content is originally hosted.
- Workflow: If a local CDN server does not have the requested content, it retrieves it from the origin server; for popular content, it may be pre-loaded into the CDN in advance.
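The CDN workflow above can be sketched as a cache in front of an origin server. The class names `OriginServer` and `EdgeServer` and the content names are hypothetical; real CDNs add eviction, expiry, and request routing, none of which is modelled here.

```python
class OriginServer:
    """The main server where content is originally hosted."""

    def __init__(self, content: dict[str, bytes]) -> None:
        self.content = content
        self.requests = 0  # how often the origin had to serve something

    def fetch(self, name: str) -> bytes:
        self.requests += 1
        return self.content[name]


class EdgeServer:
    """A local CDN server that keeps copies of content close to users."""

    def __init__(self, origin: OriginServer) -> None:
        self.origin = origin
        self.cache: dict[str, bytes] = {}

    def preload(self, name: str) -> None:
        """Push popular content to the edge before anyone asks for it."""
        self.cache[name] = self.origin.fetch(name)

    def get(self, name: str) -> tuple[bytes, str]:
        if name in self.cache:
            return self.cache[name], "hit"   # served locally
        data = self.origin.fetch(name)       # cache miss: go to the origin
        self.cache[name] = data
        return data, "miss"


if __name__ == "__main__":
    origin = OriginServer({"popular.mp4": b"AAAA", "niche.mp4": b"BBBB"})
    edge = EdgeServer(origin)
    edge.preload("popular.mp4")
    print(edge.get("popular.mp4")[1])  # hit
    print(edge.get("niche.mp4")[1])    # miss
    print(edge.get("niche.mp4")[1])    # hit
```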
Video Conferencing and Live Streaming: Applications like Zoom, Google Meet, and Teams require real-time delivery. Unlike standard streaming, the content cannot be pre-loaded and must be provided "on the fly." Timeliness is the paramount requirement here.

Web Architecture and HTTP

Core Components: The web relies on HTML (Hypertext Markup Language), URLs (Uniform Resource Locators), and HTTP (Hypertext Transfer Protocol).
Web Page Structure: A web page consists of a base HTML file containing metadata and various referenced objects (images, audio, CSS files). Each referenced object can be accessed separately via its own URL.
URL Structure: A URL identifies the location of a resource.
- Example: http://waikato.ac.nz:80/study where 80 is the port number.
- Port Numbers: HTTP usually operates on port $80$, while HTTPS (Secure) operates on port $443$. Browsers fill in these default ports automatically when the URL does not specify one.
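A short sketch of how a client might split a URL and fill in the default port, using `urllib.parse.urlsplit` from the Python standard library. The `describe` helper and the `DEFAULT_PORTS` table are illustrative names, not part of any library.

```python
from urllib.parse import urlsplit

# Default ports as given in the notes above.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe(url: str) -> dict:
    """Break a URL into scheme, host, port, and path."""
    parts = urlsplit(url)
    # Use the explicit port if the URL has one, otherwise the scheme's default.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
    }


if __name__ == "__main__":
    print(describe("http://waikato.ac.nz:80/study"))
    print(describe("https://waikato.ac.nz/study"))
```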
HTTP Protocol Basics: A client (browser) requests objects, and the server responds by sending the requested objects back. HTTP uses TCP as its underlying transport protocol.

HTTP Connection Types and Evolution

Non-Persistent HTTP:
- Each object requested requires a separate TCP connection to be opened and closed.
- Each object costs two round trips: one to open the TCP connection and one for the HTTP request and response. Retrieving $2$ objects therefore requires $2 \times 2 = 4$ RTTs.
Persistent HTTP (HTTP 1.1):
- Multiple objects can be retrieved over a single TCP connection.
- One round trip opens the connection and each object then needs one more, so the total is $n + 1$ RTTs, where $n$ is the number of objects. For $2$ objects, $3$ RTTs are required.
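The two RTT formulas can be compared directly. The sketch below assumes objects are fetched one after another (no parallel connections or pipelining) and ignores the time needed to transmit the objects themselves; the function names are illustrative.

```python
def rtts_non_persistent(n_objects: int) -> int:
    """Each object pays one RTT for TCP setup plus one RTT for request/response."""
    return 2 * n_objects


def rtts_persistent(n_objects: int) -> int:
    """One RTT for TCP setup, then one RTT per object on the same connection."""
    return 1 + n_objects


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(n, rtts_non_persistent(n), rtts_persistent(n))
```

For a single object the two approaches cost the same; the saving of the persistent connection grows with the number of objects.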
HTTP Messages: These are text-based and readable.
- Request Methods:
  - GET: Retrieves body and header information.
  - HEAD: Retrieves only the headers of a resource.
  - POST: Sends user-input data to a server.
  - PUT: Used to upload or replace a file at a specific URL.
- Response Status Codes:
  - 200 OK: Request served successfully.
  - 301 Moved Permanently: Resource has been relocated.
  - 400 Bad Request: Generic error for a faulty request.
  - 404 Not Found: Requested document does not exist.
  - 505 HTTP Version Not Supported: Server does not support the HTTP protocol version used.
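Because HTTP/1.1 messages are plain text, a request can be assembled and a status line taken apart with ordinary string handling. The following is a minimal sketch; `build_request` and `parse_status_line` are hypothetical helpers, and the sample response is hand-written rather than captured from a real server.

```python
def build_request(method: str, host: str, path: str = "/") -> bytes:
    """Assemble a minimal HTTP/1.1 request: request line, headers, blank line."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",   # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> tuple[str, int, str]:
    """Extract (version, status code, reason phrase) from a response."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


if __name__ == "__main__":
    print(build_request("HEAD", "waikato.ac.nz", "/study").decode("ascii"))
    sample = b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"
    print(parse_status_line(sample))
```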
HTTP/2: Introduced to resolve the "Head of Line (HOL) Blocking" problem found in HTTP 1.1. In HTTP 1.1, if the first object in the queue is very large, smaller objects behind it must wait. HTTP/2 divides objects into "chunks" or "frames," interleaving them so that smaller objects can load while a larger object is still being processed.
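The difference between the two delivery orders can be simulated by treating each object as a number of equal-sized frames. This is a sketch only: the simple round-robin schedule and the object names are assumptions, and a real HTTP/2 implementation may order frames differently.

```python
from collections import deque


def fifo_order(objects: dict[str, int]) -> list[str]:
    """HTTP/1.1 style: send every frame of one object before starting the next."""
    return [name for name, frames in objects.items() for _ in range(frames)]


def interleaved_order(objects: dict[str, int]) -> list[str]:
    """HTTP/2 style (assumed round-robin): one frame per object in turn."""
    queue = deque(objects.items())
    order = []
    while queue:
        name, remaining = queue.popleft()
        order.append(name)
        if remaining > 1:
            queue.append((name, remaining - 1))  # more frames left: back of the queue
    return order


if __name__ == "__main__":
    page = {"video": 4, "css": 1, "logo": 2}  # frames needed per object
    print(fifo_order(page))         # css waits behind all four video frames
    print(interleaved_order(page))  # css is complete after the second frame
```

In the first order the small stylesheet finishes fifth; in the interleaved order it finishes second, even though the total number of frames sent is the same.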
Web Cookies: Used for session tracking and personalization. When a client visits a site (e.g., Amazon) for the first time, the server creates a unique ID, stores it in a database, and sends it back to be stored in the browser's cookie storage. Subsequent visits use this ID to provide personalized recommendations.
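The cookie exchange can be sketched from the server's point of view with `http.cookies.SimpleCookie` from the Python standard library. The cookie name `id`, the in-memory `user_db` dictionary, and the `handle_request` function are assumptions standing in for a real site's database and request handler.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional

# Stand-in for the server-side database: user ID -> pages visited.
user_db: dict[str, list[str]] = {}


def handle_request(cookie_header: Optional[str], page: str) -> tuple[str, Optional[str]]:
    """Return (user_id, value for a Set-Cookie header or None)."""
    cookie = SimpleCookie(cookie_header or "")
    if "id" in cookie and cookie["id"].value in user_db:
        user_id = cookie["id"].value      # returning visitor
        set_cookie = None
    else:
        user_id = uuid.uuid4().hex        # first visit: create a unique ID
        user_db[user_id] = []
        out = SimpleCookie()
        out["id"] = user_id
        set_cookie = out["id"].OutputString()  # e.g. "id=3f2a..."
    user_db[user_id].append(page)         # history used for personalization
    return user_id, set_cookie


if __name__ == "__main__":
    uid, header = handle_request(None, "/books")   # first visit, no cookie yet
    print("Set-Cookie:", header)
    uid_again, header_again = handle_request(header, "/cart")  # browser sends it back
    print(uid == uid_again, header_again, user_db[uid])
```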

Domain Name System (DNS)

Purpose: A distributed database that translates human-readable hostnames (e.g., google.com) into IP addresses (e.g., $32$-bit IPv4 addresses).
Transport: DNS is an application-layer protocol. Its queries are normally carried over UDP for speed, since no connection setup is needed.
Hierarchy of DNS Servers:
- Root DNS Servers: There are only $13$ logical root name servers worldwide (labeled $A$ through $M$), though they are replicated globally.
- Top-Level Domain (TLD) Servers: Responsible for domains like .com, .org, .net, and country codes like .nz.
- Authoritative DNS Servers: Servers owned by the organizations themselves (e.g., waikato.ac.nz).
Query Types:
- Iterative: The contacted server replies with the name of the next server to contact (e.g., "I don't know, ask the TLD server").
- Recursive: The contacted server takes the burden of resolving the query on behalf of the client by contacting other servers itself.
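An iterative lookup can be modelled with a toy server hierarchy in which each server either answers or refers the resolver to the next server. Everything in the `SERVERS` table is made up for illustration: the server labels, the hostname `www.waikato.ac.nz`, and the address `192.0.2.20` (taken from a range reserved for documentation).

```python
# Toy hierarchy: each server maps a name suffix to a referral or an answer.
SERVERS = {
    "root": {"nz": ("referral", "tld-nz")},
    "tld-nz": {"waikato.ac.nz": ("referral", "auth-waikato")},
    "auth-waikato": {"www.waikato.ac.nz": ("answer", "192.0.2.20")},
}


def ask(server: str, hostname: str) -> tuple[str, str]:
    """One query to one server: returns ('referral', next_server) or ('answer', ip)."""
    for suffix, reply in SERVERS[server].items():
        if hostname == suffix or hostname.endswith("." + suffix):
            return reply
    raise LookupError(f"{server} knows nothing about {hostname}")


def resolve_iteratively(hostname: str) -> tuple[str, list[str]]:
    """The resolver itself follows each referral until it gets an answer."""
    server = "root"
    contacted = []
    while True:
        contacted.append(server)
        kind, value = ask(server, hostname)
        if kind == "answer":
            return value, contacted
        server = value  # "I don't know, ask this server instead"


if __name__ == "__main__":
    print(resolve_iteratively("www.waikato.ac.nz"))
```

In a recursive lookup the loop would instead run inside the contacted server, and the client would only ever see the final answer.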
DNS Caching: Once a server learns a mapping, it caches it to speed up future requests. These entries have a Time to Live (TTL) and disappear after a specific duration.
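A TTL-based cache can be sketched in a few lines. The `DnsCache` class is hypothetical, and the clock is injectable so that expiry can be demonstrated without waiting; the address is a placeholder from a range reserved for documentation.

```python
import time


class DnsCache:
    """Hostname -> address cache whose entries expire after their TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (address, expiry time)

    def put(self, name: str, address: str, ttl: float) -> None:
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name: str):
        entry = self._entries.get(name)
        if entry is None:
            return None                    # never cached
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]        # TTL elapsed: forget the mapping
            return None
        return address


if __name__ == "__main__":
    now = [0.0]                            # fake clock, in seconds
    cache = DnsCache(clock=lambda: now[0])
    cache.put("waikato.ac.nz", "192.0.2.10", ttl=60)
    print(cache.get("waikato.ac.nz"))      # 192.0.2.10
    now[0] = 61.0
    print(cache.get("waikato.ac.nz"))      # None
```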
DNS Poisoning: An attack in which a malicious actor plants a forged hostname-to-IP mapping in a DNS cache. Clients are then sent to the wrong address until the corrected record has been updated across the internet (which can take around $240$ minutes).
Resource Records (RR): Entries in the DNS database with fields (Name, Value, Type, TTL).
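- Type A: Name is a hostname and Value is its IP address.
- Type NS: Name is a domain and Value is the hostname of an authoritative name server for that domain.
- Type CNAME: Name is an alias and Value is the canonical (real) hostname it stands for.
- Type MX: Name is a domain and Value is the hostname of the mail server that accepts email for it.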
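The four-field record format and the role of CNAME can be illustrated with a small in-memory table. The records below are invented examples using the reserved `example.com` domain and a documentation address; `resolve_a` is a hypothetical helper that follows aliases until it reaches a Type A record.

```python
from typing import NamedTuple, Optional


class ResourceRecord(NamedTuple):
    name: str
    value: str
    type: str
    ttl: int  # seconds


RECORDS = [
    ResourceRecord("www.example.com", "web.example.com", "CNAME", 300),
    ResourceRecord("web.example.com", "192.0.2.10", "A", 300),
    ResourceRecord("example.com", "ns1.example.com", "NS", 86400),
    ResourceRecord("example.com", "mail.example.com", "MX", 3600),
]


def resolve_a(name: str, records=RECORDS, max_hops: int = 5) -> Optional[str]:
    """Return the address for name, following CNAME aliases if necessary."""
    for _ in range(max_hops):  # bound the chain so alias loops cannot hang us
        by_type = {r.type: r for r in records if r.name == name}
        if "A" in by_type:
            return by_type["A"].value
        if "CNAME" in by_type:
            name = by_type["CNAME"].value  # continue with the canonical name
            continue
        return None
    return None


if __name__ == "__main__":
    print(resolve_a("www.example.com"))  # alias -> canonical name -> address
    print(resolve_a("example.com"))      # only NS and MX records, so no address
```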

Practical Identification Tools

wget: A Linux command used to retrieve content from the web; it can be used to compare RTTs between persistent and non-persistent connections.
curl: A tool to view HTTP request and response headers.
dig: Used to query DNS servers and view the hierarchy of resolution for a specific domain.

Questions & Discussion

Question: Why is Head of Line (HOL) blocking bad?
Response: It causes a significant delay. If a large 5 MB image takes 30 seconds to load, all other content on the page (text, links) is blocked for those 30 seconds. HTTP/2 solves this by splitting objects into frames and interleaving them, so smaller objects can load while the image is still in transit.
Question: Will translating google.com always return the same IP address?
Response: No. DNS performs load distribution. Depending on your location (e.g., New Zealand vs. another country), DNS will return different IP addresses to reduce load or latency.
Question: How many RTTs are required for an iterative DNS search with no cache?
Response: If the query goes to the local DNS server, which then contacts the root, TLD, and authoritative servers in turn, it requires $4$ RTTs. If TLD info is cached, it requires $3$; if the authoritative server is known, it requires $2$; if the local DNS server already has the record, it requires only $1$ RTT.
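The counts in this answer follow a simple pattern: one RTT between the client and the local DNS server, plus one RTT for every server in the hierarchy that still has to be contacted. A minimal sketch, with the function name and the labels for the cache state chosen for illustration:

```python
from typing import Optional

# Servers the local DNS server contacts in order when nothing is cached.
HIERARCHY = ["root", "tld", "authoritative"]


def iterative_rtts(cached: Optional[str] = None) -> int:
    """RTTs for one lookup, given what the local DNS server already knows.

    cached is None (nothing), "tld" (TLD server known),
    "authoritative" (authoritative server known), or "record" (answer cached).
    """
    rtts = 1  # client <-> local DNS server
    if cached == "record":
        return rtts
    first_server = {None: 0, "tld": 1, "authoritative": 2}[cached]
    return rtts + len(HIERARCHY) - first_server


if __name__ == "__main__":
    for state in (None, "tld", "authoritative", "record"):
        print(state, iterative_rtts(state))
```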