Comprehensive Notes on the World Wide Web, Network Infrastructure, and Computing Impacts

Objectives and Functional Mechanics of World Wide Web Browsing

  • Learning Objectives:
      - Describe the process occurring when a URL is typed, from the initial request and response over the internet to the final rendering of a webpage.
      - Understand the role of the World Wide Web as a content-oriented ecosystem built atop the internet infrastructure.

  • Concept of the URL and WWW:
      - The prefix www. is a conventional hostname label indicating a server that hosts World Wide Web content. It is not technically required: many sites work with or without it, and browsers often add or hide it automatically.
      - The World Wide Web is distinct from the internet; the internet is the underlying global hardware infrastructure, whereas the Web is an ecosystem of linked pages, programs, and files.
      - It serves as an open, standardized platform for global communication and information sharing.

Origins and Historical Evolution of the Web

  • Foundational Vision:
      - British computer scientist Tim Berners-Lee recognized the need for a standardized platform while working at CERN near Geneva, Switzerland.
      - Before the Web, information was siloed on different computers, requiring users to log into specific machines and learn different programs for each.
      - Berners-Lee applied abstraction to design a generalized means of sharing information independent of specific hardware or software.
      - Collaborative Abstractions: Lower-level abstractions (like networking protocols) are combined to create higher-level abstractions like SMS, email, images, audio, and video.

  • Key Web Applications and Technologies:
      - The First Website: Launched on August 6, 1991, running on a NeXT cube computer at CERN. A warning sticker on the machine read: ‘This machine is a server. DO NOT POWER IT DOWN!!’
      - Web Browser: A client application running on an end user’s computer used to request and view pages.
      - Web Server: A program running on a remote computer that serves pages to clients.
      - HTML (Hypertext Markup Language): A markup language (not a protocol) that describes how content is structured so the browser can arrange and display it.
      - URL (Uniform Resource Locator): A unique address identifying a resource on the web. A URL is one kind of URI (Uniform Resource Identifier), so the two terms are often used interchangeably.
      - HTTP (Hypertext Transfer Protocol): The standard protocol for requesting and receiving linked resources; originally designed for HTML documents, it now also carries images, scripts, and other file types.
      - HTTPS (Hypertext Transfer Protocol Secure): An extension of HTTP including procedures for encrypting data before transmission and decrypting it upon receipt.
      - HTTP Request: Made by a client to a named host on a server to access a resource.
      - HTTP Response: Made by a server to a client providing the requested resource.

  • Alternative Names Considered: Berners-Lee considered names like Information Mesh, Mine of Information, or Information Mine before settling on World Wide Web.
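
The HTTP Request and HTTP Response entries above can be made concrete with a small sketch. The host www.example.com, the path /index.html, and the sample response text are illustrative assumptions, and the helper names build_request and parse_status_line are hypothetical. Nothing is sent over the network; the code only shows roughly what the text of each message looks like.

```python
def build_request(host, path="/"):
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # method, resource, protocol version
        f"Host: {host}",          # the named host the client wants
        "Connection: close",
        "",                       # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(response):
    """Split the first line of a response into version, code, and reason."""
    status_line = response.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


request = build_request("www.example.com", "/index.html")

# A hand-written sample response (an assumption, not captured traffic).
sample_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html>...</html>"
)
version, code, reason = parse_status_line(sample_response)

print(request)
print(version, code, reason)
```

In practice a browser or an HTTP library builds and parses these messages; the sketch only exposes the request line, the Host header, and the status line.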

Hyperlinks and the Non-Linear Web

  • Structure of Information: Unlike the linear sequence of a book (Page 1, 2, 3), the Web is non-linear and consists of massively interconnected pages.
  • Hyperlinks Definition: Clickable text, images, or elements within an HTML document that allow users to request related documents in any sequence they choose.
  • HTML Anchor Tags: Hyperlinks are denoted using the anchor tag <a>...</a>. The href attribute contains the URL of the destination.
      - Example: <a href="http://www.google.com">search</a> creates a link for the word "search".
  • Interconnectedness Concepts:
      - Six Degrees of Separation: The idea of close interconnectedness where any two nodes can be linked by a small number of connections.
      - Oracle of Bacon: A game demonstrating how actors are connected to Kevin Bacon through shared film roles, illustrating network density.
      - Network Robustness: In social networks, routing, and DNS, high interconnectedness provides multiple pathways and ensures the system remains functional even if specific nodes fail.
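
As a rough illustration of anchor tags and interconnectedness together, the sketch below extracts href values from a few tiny, made-up pages and counts the fewest clicks needed to get from one page to another. The page names and their contents in PAGES are hypothetical, and the breadth-first search is one common way to measure "degrees of separation" under those assumptions.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


# Hypothetical mini-web: page name -> HTML content.
PAGES = {
    "a.html": '<a href="b.html">B</a> <a href="c.html">C</a>',
    "b.html": '<a href="d.html">D</a>',
    "c.html": '<a href="d.html">D</a>',
    "d.html": '<a href="e.html">E</a>',
    "e.html": "no links here",
}


def degrees_of_separation(start, goal):
    """Return the fewest link clicks from start to goal, or None."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        page, clicks = queue.popleft()
        if page == goal:
            return clicks
        for link in extract_links(PAGES.get(page, "")):
            if link not in seen:
                seen.add(link)
                queue.append((link, clicks + 1))
    return None


print(extract_links(PAGES["a.html"]))
print(degrees_of_separation("a.html", "e.html"))
```

Because a.html reaches d.html through either b.html or c.html, removing one of those two pages would still leave a path, which is the robustness idea in miniature.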

Domain Name System (DNS) Fundamentals

  • Purpose: The DNS acts as the internet’s "directory assistance," translating human-friendly domain names into machine-readable IP addresses.
  • IP Addresses vs. Domain Names:
      - IPv4: A 32-bit number often represented in four octets (e.g., 157.240.20.35).
      - Domain Name: A descriptive name (e.g., facebook.com) that is more intuitive for humans than numerical sequences.
  • DNS Lookup Process:
      - The browser isolates the domain name from the URL.
      - It sends a request to a pre-configured Domain Name Server (often provided by an ISP, or public ones like Google's at 8.8.8.8 and 8.8.4.4).
      - The DNS server searches its lookup table and returns the associated IP address.
      - The browser then contacts that IP address directly to retrieve the webpage.
  • DNS Hierarchy:
      - Organized as a tree starting with the Root (signified by a dot).
      - Top-Level Domains (TLD): Such as .com, .org, and .edu.
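      - Second-Level Domains and Subdomains: Names registered under a TLD (e.g., example in example.com) and further divisions beneath them (e.g., mail.example.com) used to organize specific sections of a site.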
      - Scalability: This hierarchy allows the system to scale to billions of records.
  • Security Misconception: DNS is typically not highly secured because the mappings between hostnames and IP addresses are intended to be public.
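
The lookup steps and the hierarchy described above can be sketched in a few lines. This is a simplified model, not a real resolver: LOOKUP_TABLE is a hypothetical in-memory stand-in for a DNS server's records (seeded with the example address from these notes), and the function names resolve and labels_from_root are illustrative. A real program would typically ask the operating system's resolver instead, for example through socket.getaddrinfo.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical lookup table standing in for a DNS server's records.
LOOKUP_TABLE = {"facebook.com": "157.240.20.35"}


def resolve(url):
    """Isolate the domain name from a URL and look up its IP address."""
    host = urlparse(url).hostname
    if host is None:
        return None
    return LOOKUP_TABLE.get(host)


def labels_from_root(domain):
    """List a domain's labels in hierarchy order, TLD first."""
    return list(reversed(domain.split(".")))


ip = resolve("https://facebook.com/profile")

# An IPv4 address is a single 32-bit number; the dotted form is just notation.
as_int = int(ipaddress.IPv4Address(ip))

print(ip, as_int)
print(labels_from_root("www.example.com"))
```

Reading the labels from the TLD downward mirrors how the tree is walked from the root toward a specific host.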

Network Routing, Reliability, and Performance

  • Core Components:
      - Router: A device that determines the path for data to travel across a network.
      - Packet: A small chunk of data used for network transmission. Packets contain the payload and metadata (destination/source IP, size, order number).
  • Network Engineering Principles:
      - Redundancy: The inclusion of extra components or multiple paths between points. If one router fails, the data can be rerouted.
      - Fault Tolerance: The ability of a system to continue functioning despite the failure of some components.
      - Scalability: The capacity of the routing system to handle increasing amounts of traffic, as seen during the 2020 Coronavirus Pandemic.
  • Client-Server Model:
      - The Client (e.g., web browser, email app) initiates a request.
      - The Server processes the request and sends a response.
      - Analogy: A waiter (server) taking an order from a customer (client) and bringing food from the kitchen (processor/database).
  • Speed Metrics:
      - Bitrate: The amount of data in bits sent in a fixed amount of time (e.g., 750 Mbps).
      - Bandwidth: The maximum capacity of a system to transfer data (the "size of the pipe").
      - Latency: The time elapsed between transmission and receipt of a request (the travel time for a single bit).
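
To see how bandwidth and latency combine, here is a back-of-the-envelope estimate. It assumes a deliberately simplified model (one-way latency plus size divided by bandwidth) that ignores protocol overhead, congestion, and retransmission; the 5 MB file size and 20 ms latency are made-up inputs, and transfer_time_seconds is a hypothetical helper name.

```python
def transfer_time_seconds(size_bytes, bandwidth_bps, latency_s):
    """Estimate delivery time: latency plus time to push all the bits."""
    size_bits = size_bytes * 8          # bandwidth is measured in bits
    return latency_s + size_bits / bandwidth_bps


# Assumed inputs: 5 MB file, 750 Mbps link, 20 ms latency.
estimate = transfer_time_seconds(5_000_000, 750_000_000, 0.020)
print(f"{estimate:.4f} s")
```

For small files the latency term dominates, and for large files the bandwidth term dominates, which is why the two metrics are reported separately.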

Data Transmission Protocols: TCP/IP and HTTP

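  • Packet Switching: Data is broken into packets and sent across various routes to be reassembled at the destination. The idea dates to the 1960s (Paul Baran and Donald Davies); in 1974 Vint Cerf and Bob Kahn published the design of TCP, the protocol for interconnecting packet-switched networks.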
  • Transmission Control Protocol (TCP):
      - Manages the sending of multiple packets.
      - Ensures all packets arrive, are in the correct order, and are accounted for.
  • HyperText Transfer Protocol (HTTP): Standardizes the language used for communicating with web servers to request and receive HTML pages.
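
The splitting and reassembly described above can be sketched as follows. This is a toy model under stated assumptions: packets are plain dictionaries with hypothetical seq and total fields, whereas real TCP numbers bytes rather than packets and uses acknowledgements and retransmission to recover lost data.

```python
import random


def to_packets(message, payload_size):
    """Split a bytes message into numbered packets."""
    chunks = [
        message[i:i + payload_size]
        for i in range(0, len(message), payload_size)
    ]
    return [
        {"seq": n, "total": len(chunks), "payload": chunk}
        for n, chunk in enumerate(chunks)
    ]


def reassemble(packets):
    """Rebuild the message, or return None if any packet is missing."""
    if not packets:
        return None
    total = packets[0]["total"]
    by_seq = {p["seq"]: p["payload"] for p in packets}
    if any(n not in by_seq for n in range(total)):
        return None  # a real TCP receiver would wait for a retransmission
    return b"".join(by_seq[n] for n in range(total))


message = b"Packets may arrive out of order."
packets = to_packets(message, 8)
random.shuffle(packets)          # simulate packets taking different routes

print(reassemble(packets) == message)
```

The order number in each packet's metadata is what lets the receiver restore the original sequence regardless of arrival order.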

Models of Computing: Sequential, Parallel, and Distributed

  • Sequential Computing:
      - Operations are performed one at a time in order.
      - Follows the Fetch-Decode-Execute Cycle.
      - Performance is limited by the speed of a single processor and heat generation (thermal limits).
  • Distributed Computing:
      - A model using multiple devices to run a program.
      - Scalability: Allows solving problems too large for a single computer by "farming out" segments of data to a network of computers.
      - Examples: SETI@Home (searching for extraterrestrial intelligence using idle home computers via BOINC software), GIMPS (Great Internet Mersenne Prime Search), and Bitcoin Mining.
      - Malicious Use: Botnets utilize distributed computing to launch DDoS (Distributed Denial of Service) attacks or send spam using "zombie" computers.
  • Parallel Computing:
      - Breaks a program into smaller sequential operations performed simultaneously across multiple processors.
      - Consists of a parallel portion and a sequential portion.
      - Speedup Calculation: $\text{Speedup} = \frac{\text{Sequential Time}}{\text{Parallel Time}}$
      - Efficiency Analysis: In a system with two processors, the total time is determined by the longest-running task on either processor, accounting for dependencies (tasks that must wait for another to finish).
  • Limits to Efficiency: Parallelism is limited by the sequential portions of a task and the overhead of managing instruction flow.
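
A small worked example may help with the speedup calculation. The task durations and the 10-unit sequential portion are made-up numbers, the tasks are assumed to be independent (no dependencies), and the longest-task-first assignment in parallel_time is a simple heuristic chosen for illustration rather than a guaranteed-optimal scheduler.

```python
def parallel_time(sequential_portion, parallel_tasks, processors):
    """Total time: sequential portion plus the busiest processor's load."""
    loads = [0] * processors
    # Give each task (longest first) to the currently least-loaded processor.
    for duration in sorted(parallel_tasks, reverse=True):
        loads[loads.index(min(loads))] += duration
    return sequential_portion + max(loads)


def speedup(sequential_time, par_time):
    return sequential_time / par_time


tasks = [50, 30, 40]        # assumed durations of the parallelizable tasks
seq_portion = 10            # assumed part that cannot be parallelized

sequential_time = seq_portion + sum(tasks)             # 10 + 120 = 130
two_cpu_time = parallel_time(seq_portion, tasks, 2)    # 10 + max(50, 70) = 80

print(speedup(sequential_time, two_cpu_time))          # 130 / 80 = 1.625
```

Even with unlimited processors the total cannot drop below 10 + 50 = 60 here, since the sequential portion and the longest single task set a floor.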

Cybersecurity and Information Protection

  • Encryption Types:
      - Symmetric Key Encryption: Uses a single key for both encryption and decryption. This creates a security risk because the key must be shared.
      - Asymmetric (Public-Key) Encryption: Uses a pair of keys: a public key for encryption (accessible to anyone) and a private key for decryption (known only to the recipient). This is used in RSA encryption.
  • Certificate Authorities (CAs): Issue digital certificates that vouch for the authenticity of websites. This is the basis of SSL (Secure Sockets Layer) and its successor TLS (Transport Layer Security), indicated by a padlock icon in browsers.
  • The CIA Triad:
      - Confidentiality: Limiting access to information to authorized users.
      - Integrity: Ensuring information is accurate and unaltered.
      - Availability: Reliable access to information and systems.
  • Authentication:
      - Methods include strong passwords and Multi-Factor Authentication (MFA).
      - MFA categories: Knowledge (something you know), Possession (something you have), and Inherence (something you are, like biometrics).
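
The public-key idea described under Encryption Types can be illustrated with a toy RSA example. The primes 61 and 53, the exponent 17, and the message 65 are textbook-sized assumptions chosen so the arithmetic is easy to follow; keys this small are trivially breakable, and real RSA also adds padding. The three-argument pow with a negative exponent requires Python 3.8 or later.

```python
def make_keys(p, q, e):
    """Derive a toy RSA key pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)          # modular inverse of e (Python 3.8+)
    return (e, n), (d, n)        # (public key, private key)


def encrypt(message, public_key):
    e, n = public_key
    return pow(message, e, n)


def decrypt(cipher, private_key):
    d, n = private_key
    return pow(cipher, d, n)


public_key, private_key = make_keys(61, 53, 17)

message = 65                                  # must be smaller than n
cipher = encrypt(message, public_key)         # anyone can do this step
recovered = decrypt(cipher, private_key)      # only the key owner can

print(cipher, recovered)
```

Only the public key is needed to encrypt, so nothing secret has to be shared in advance, which is the contrast with symmetric encryption drawn above.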

Internet of Things (IoT) and Autonomous Technology

  • IoT Definition: The interconnection of everyday objects (embedded with computing devices and sensors) via the internet. Examples include smartwatches, GPS, and smart doorbells.
  • Sensor Networks: Autonomous sensors measuring environmental conditions (light, heat, sound) to facilitate smart interaction with physical systems.
  • Ethics and Liability:
      - The Trolley Problem: A thought experiment regarding ethical choices in life-or-death scenarios, relevant to the programming of autonomous vehicles.
      - Liability: Questions arise regarding who is at fault in autonomous vehicle accidents: the owner, the manufacturer, or the programmer.

Intellectual Property and Creative Credit

  • Copyright Law: Protects original works, granting creators exclusive rights to use, distribute, and sell their work.
  • Creative Commons: A family of public copyright licenses allowing free distribution of work, provided users follow the conditions of the chosen license (Attribution, Non-Commercial, No Derivatives, Share Alike).
  • Open Source: Programs made freely available for redistribution and modification.
  • Open Access: Online research output free of restrictions on access and use.
  • Public Domain: Works not subject to copyright law, free for public use.

Computing Innovations and Data Life Cycle

  • Definition of Computing Innovation: An innovation that includes a computer program as an integral part of its function (e.g., smartphones, facial recognition, e-commerce).
  • IPOS Model (Input-Processing-Output-Storage):
      - Input: Captured via collection devices (sensors, keyboard, camera).
      - Processing: Data is transformed by software and the CPU.
      - Output: Information produced for the user (audio, visual, text).
      - Storage: Data is kept in hardware (RAM for temporary working storage, hard drives for long-term storage) or on remote cloud systems.
  • Personal Data and Privacy:
      - PII (Personally Identifiable Information): Information that describes or identifies an individual (SSN, age, biometric data).
      - Aggregation: Combining disparate data (cookies, geolocation) to build deep knowledge about an individual.

Impacts of Cloud Computing and the Digital Divide

  • Cloud Computing:
      - Allows offloading storage and processing to remote servers.
      - History: Evolved from "dumb terminals/thin clients" that connected to centralized building-sized computers.
      - 3-2-1 Backup Rule: Maintain 3 copies of data, on 2 different media, with 1 backup copy off-site.
      - Legal Risks: Users often give up significant rights to their data through Terms of Service (TOS) agreements, which may grant companies the right to use personal content for commercial purposes.
  • The Digital Divide:
      - The gap between those with sufficient access to technology and those without.
      - Key Factors: Physical resources (cost), lack of opportunity, digital literacy, and lack of skills.
      - Consequences: Exclusion from news, healthcare, government services, and the silencing of diverse voices in the digital ecosystem.

Questions & Discussion

  • Digital Interconnectedness: Does it surprise you that Kevin Bacon can be linked to so many actors with so few connections?
      - Response: The phenomenon illustrates the "small world" property of networks and high density of connections in social systems.
  • Privacy and Sharing: How accurate is Mark Zuckerberg's 2008 proposition that people are willing to share anything regardless of privacy?
      - Discussion: The rise of social media and the value found in digital connection has significantly shifted cultural attitudes toward personal privacy.
  • Autonomous Fault: If a self-driving car causes an accident, who is at fault?
      - Discussion: This remains a nebulous legal area involving potential responsibility for the carmaker, the software developers, the owner, or the passenger.
  • Digital Divide in Crisis: Was the divide more apparent during the 2020 pandemic?
      - Discussion: Lack of internet access directly hindered remote schooling and vaccine registration for marginalized populations.