Comprehensive Notes on the World Wide Web, Network Infrastructure, and Computing Impacts
Objectives and Functional Mechanics of World Wide Web Browsing
Learning Objectives:
- Describe the process occurring when a URL is typed, from the initial request and response over the internet to the final rendering of a webpage.
- Understand the role of the World Wide Web as a content-oriented ecosystem built atop the internet infrastructure.
Concept of the URL and WWW:
- The prefix www. is, by convention, the part of a domain address that indicates a server hosting content that meets World Wide Web standards. Browsers often insert it automatically, but its purpose is to identify a web server.
- The World Wide Web is distinct from the internet; the internet is the underlying global hardware infrastructure, whereas the Web is an ecosystem of linked pages, programs, and files.
- It serves as an open, standardized platform for global communication and information sharing.
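As a rough illustration of what a browser first does with a typed URL, the sketch below splits an address into the parts mentioned above. The function name describe_url and the address www.example.com are assumptions made for illustration; only the standard-library urlsplit call is real.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the pieces a browser works with (illustrative helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "scheme": parts.scheme,                     # protocol to speak, e.g. "https"
        "host": host,                               # name that DNS will translate
        "has_www_prefix": host.startswith("www."),  # conventional web-server label
        "path": parts.path or "/",                  # resource requested from the server
    }


# Hypothetical address used only for demonstration.
info = describe_url("https://www.example.com/index.html")
print(info)
```

The host is the part handed to DNS, while the scheme and path determine how and what the browser requests from the server.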
Origins and Historical Evolution of the Web
Foundational Vision:
- British computer scientist Tim Berners-Lee recognized the need for a standardized platform while working at CERN near Geneva, Switzerland.
- Before the Web, information was siloed on different computers, requiring users to log into specific machines and learn different programs for each.
- Berners-Lee applied abstraction to design a generalized means of sharing information independent of specific hardware or software.
- Collaborative Abstractions: Lower-level abstractions (like networking protocols) are combined to create higher-level abstractions like SMS, email, images, audio, and video.
Key Web Applications and Technologies:
- The First Website: Launched on August 6, 1991, running on a NeXT cube computer at CERN. A warning sticker on the machine read: ‘This machine is a server. DO NOT POWER IT DOWN!!’
- Web Browser: A client application running on an end user’s computer used to request and view pages.
- Web Server: A program running on a remote computer that serves pages to clients.
- HTML (Hypertext Markup Language): A markup language (not a protocol) that describes how content is structured so the browser can arrange and display it.
- URL (Uniform Resource Locator): A unique address identifying a resource on the web. A URL is one kind of URI (Uniform Resource Identifier).
- HTTP (Hypertext Transfer Protocol): The standard protocol for requesting and receiving linked resources; originally designed for transferring HTML documents.
- HTTPS (Hypertext Transfer Protocol Secure): An extension of HTTP including procedures for encrypting data before transmission and decrypting it upon receipt.
- HTTP Request: A message sent by a client to a named host, asking the server for a resource.
- HTTP Response: The message a server sends back to the client, providing the requested resource.
- Alternative Names Considered: Berners-Lee considered names like Information Mesh, Mine of Information, or Information Mine before settling on World Wide Web.
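The request and response messages defined above are plain text at heart. The sketch below builds a minimal HTTP/1.1 GET request and reads the status line of a response without touching the network. The helper names, the host www.example.com, and the sample response are all assumptions for illustration; a real browser sends many more headers.

```python
def build_get_request(host: str, path: str = "/") -> str:
    """Return the text of a minimal HTTP/1.1 GET request (illustrative only)."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, resource, protocol version
        f"Host: {host}",         # the named host the client is addressing
        "Connection: close",
        "",                      # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(response: str) -> tuple[int, str]:
    """Extract the status code and reason phrase from a response's first line."""
    status_line = response.split("\r\n", 1)[0]
    _version, code, reason = status_line.split(" ", 2)
    return int(code), reason


# Hypothetical host and a hand-written sample response, not real server output.
request = build_get_request("www.example.com", "/index.html")
sample_response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>"

print(request)
print(parse_status_line(sample_response))
```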
Hyperlinks and the Non-Linear Web
- Structure of Information: Unlike the linear sequence of a book (Page 1, 2, 3), the Web is non-linear and consists of massively interconnected pages.
- Hyperlinks Definition: Clickable text, images, or elements within an HTML document that allow users to request related documents in any sequence they choose.
- HTML Anchor Tags: Hyperlinks are denoted using the anchor tag <a>...</a>. The href attribute contains the URL of the destination.
- Example: <a href="http://www.google.com">search</a> creates a link for the word "search".
- Interconnectedness Concepts:
- Six Degrees of Separation: The idea of close interconnectedness where any two nodes can be linked by a small number of connections.
- Oracle of Bacon: A game demonstrating how actors are connected to Kevin Bacon through shared film roles, illustrating network density.
- Network Robustness: In social networks, routing, and DNS, high interconnectedness provides multiple pathways and ensures the system remains functional even if specific nodes fail.
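The "small number of connections" idea and the robustness point can both be tried on a toy network. The sketch below uses breadth-first search to count the fewest links between two nodes, then removes one node to show that an alternative path still exists. The graph and the function name degrees_of_separation are invented for illustration.

```python
from collections import deque


def degrees_of_separation(graph, start, goal):
    """Return the fewest links from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical network: each key links to the nodes in its list.
links = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

print(degrees_of_separation(links, "A", "E"))

# Simulate a failed node by dropping "B" and every link that points to it.
without_b = {
    node: [n for n in neighbors if n != "B"]
    for node, neighbors in links.items()
    if node != "B"
}
print(degrees_of_separation(without_b, "A", "E"))
```

Because A reaches D through either B or C, losing B does not disconnect A from E; that redundancy is what the robustness bullet describes.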
Domain Name System (DNS) Fundamentals
- Purpose: The DNS acts as the internet’s "directory assistance," translating human-friendly domain names into machine-readable IP addresses.
- IP Addresses vs. Domain Names:
- IPv4: A 32-bit number usually written as four octets in dotted-decimal notation.
- Domain Name: A descriptive name (e.g., facebook.com) that is more intuitive for humans than numerical sequences.
- DNS Lookup Process:
- The browser isolates the domain name from the URL.
- It sends a request to a pre-configured Domain Name Server (often provided by an ISP, or a public one such as Google's at 8.8.8.8 and 8.8.4.4).
- The DNS server searches its lookup table and returns the associated IP address.
- The browser then contacts that IP address directly to retrieve the webpage.
- DNS Hierarchy:
- Organized as a tree starting with the Root (signified by a dot).
- Top-Level Domains (TLD): Such as .com, .org, and .edu.
- Subdomains: Divisions beneath a registered domain, used to organize specific sections of a site.
- Scalability: This hierarchy allows the system to scale to billions of records.
- Security Misconception: DNS is typically not highly secured because the mappings between hostnames and IP addresses are intended to be public.
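The lookup steps and the tree structure described above can be imitated with a dictionary. This is a simplified sketch, not how a real resolver works: the RECORDS table, the helper names, and the addresses (taken from the 192.0.2.0/24 range reserved for documentation) are all assumptions. Only the ipaddress module is real standard-library functionality.

```python
import ipaddress
from typing import Optional

# Hypothetical lookup table standing in for a DNS server's records.
RECORDS = {
    "example.com": "192.0.2.10",
    "www.example.com": "192.0.2.11",
}


def hierarchy(domain: str) -> list[str]:
    """List the levels of the DNS tree from the root down to the full name."""
    labels = domain.rstrip(".").split(".")
    zones = ["."]  # the root is written as a single dot
    for i in range(len(labels) - 1, -1, -1):
        zones.append(".".join(labels[i:]))
    return zones


def lookup(domain: str) -> Optional[str]:
    """Return the IP address recorded for a domain, or None if unknown."""
    return RECORDS.get(domain.rstrip(".").lower())


def ip_to_int(address: str) -> int:
    """Show that a dotted IPv4 address is really one 32-bit number."""
    return int(ipaddress.IPv4Address(address))


print(hierarchy("www.example.com"))
print(lookup("www.example.com"))
print(ip_to_int("192.0.2.11"))
```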
Network Routing, Reliability, and Performance
- Core Components:
- Router: A device that determines the path for data to travel across a network.
- Packet: A small chunk of data used for network transmission. Packets contain the payload and metadata (destination/source IP, size, order number).
- Network Engineering Principles:
- Redundancy: The inclusion of extra components or multiple paths between points. If one router fails, the data can be rerouted.
- Fault Tolerance: The ability of a system to continue functioning despite the failure of some components.
- Scalability: The capacity of the routing system to handle increasing amounts of traffic, as seen during the 2020 Coronavirus Pandemic.
- Client-Server Model:
- The Client (e.g., web browser, email app) initiates a request.
- The Server processes the request and sends a response.
- Analogy: A waiter (server) taking an order from a customer (client) and bringing food from the kitchen (processor/database).
- Speed Metrics:
- Bitrate: The amount of data in bits sent in a fixed amount of time, commonly measured in bits per second.
- Bandwidth: The maximum capacity of a system to transfer data (the "size of the pipe").
- Latency: The time elapsed between transmission and receipt of a request (the travel time for a single bit).
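To see how bandwidth and latency combine, the sketch below estimates delivery time as latency plus the time needed to push every bit through the pipe. This is a deliberately simple model under assumed numbers: it ignores protocol overhead, congestion, and retransmission, and the function name is made up for illustration.

```python
def transfer_time_seconds(size_bytes: int, bandwidth_bps: float, latency_s: float) -> float:
    """Estimate delivery time as latency plus size divided by bandwidth."""
    bits = size_bytes * 8  # bandwidth is counted in bits, file sizes in bytes
    return latency_s + bits / bandwidth_bps


# Assumed link: 8 megabits per second with 50 ms of latency.
large = transfer_time_seconds(1_000_000, 8_000_000, 0.05)  # a 1 MB file
small = transfer_time_seconds(100, 8_000_000, 0.05)        # a 100-byte message

print(f"1 MB file:        {large:.4f} s")
print(f"100-byte message: {small:.4f} s")
```

For the large file, bandwidth dominates the total; for the tiny message, nearly all of the time is latency, which is why a "wide pipe" alone does not make short exchanges feel fast.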
Data Transmission Protocols: TCP/IP and HTTP
- Packet Switching: Data is broken into packets and sent across various routes to be reassembled at the destination. The idea dates to the 1960s work of Paul Baran and Donald Davies; Vint Cerf and Bob Kahn built on it in their 1974 design for TCP.
- Transmission Control Protocol (TCP):
- Manages the sending of multiple packets.
- Ensures all packets arrive, are in the correct order, and are accounted for.
- HyperText Transfer Protocol (HTTP): Standardizes the language used for communicating with web servers to request and receive HTML pages.
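The ordering and accounting job that TCP performs can be imitated in a few lines. The sketch below splits a message into numbered packets, shuffles them to mimic different routes, and reassembles them only if every packet is present. It is a toy model under stated assumptions: the packet fields and function names are invented, and real TCP numbers bytes, acknowledges receipt, and retransmits losses.

```python
import random


def to_packets(message: str, size: int) -> list[dict]:
    """Break a message into packets carrying a sequence number and a total count."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [{"seq": n, "total": len(chunks), "payload": c} for n, c in enumerate(chunks)]


def reassemble(packets: list[dict]):
    """Rebuild the message, or return None if any packet is missing."""
    if not packets:
        return None
    total = packets[0]["total"]
    by_seq = {p["seq"]: p["payload"] for p in packets}  # duplicates collapse here
    if not all(i in by_seq for i in range(total)):
        return None  # a real sender would be asked to retransmit
    return "".join(by_seq[i] for i in range(total))


packets = to_packets("Hello, World Wide Web!", 5)
random.Random(0).shuffle(packets)  # packets may arrive in any order

print(reassemble(packets))      # full message, whatever the arrival order
print(reassemble(packets[1:]))  # one packet dropped, so None
```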
Models of Computing: Sequential, Parallel, and Distributed
- Sequential Computing:
- Operations are performed one at a time in order.
- Follows the Fetch-Decode-Process Cycle (more commonly called the fetch-decode-execute cycle).
- Performance is limited by the speed of a single processor and heat generation (thermal limits).
- Distributed Computing:
- A model using multiple devices to run a program.
- Scalability: Allows solving problems too large for a single computer by "farming out" segments of data to a network of computers.
- Examples: SETI@Home (searching for extraterrestrial intelligence using idle home computers via BOINC software), GIMPS (Great Internet Mersenne Prime Search), and Bitcoin Mining.
- Malicious Use: Botnets utilize distributed computing to launch DDoS (Distributed Denial of Service) attacks or send spam using "zombie" computers.
- Parallel Computing:
- Breaks a program into smaller sequential operations performed simultaneously across multiple processors.
- Consists of a parallel portion and a sequential portion.
- Speedup Calculation: $\text{Speedup} = \frac{\text{Sequential Time}}{\text{Parallel Time}}$
- Efficiency Analysis: In a system with two processors, the total time is determined by the longest-running task on either processor, accounting for dependencies (tasks that must wait for another to finish).
- Limits to Efficiency: Parallelism is limited by the sequential portions of a task and the overhead of managing instruction flow.
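A short worked example may make the speedup ratio concrete. The sketch below assumes one sequential setup step followed by work split across two processors; the task durations and function names are invented for illustration, and scheduling overhead is ignored.

```python
def parallel_time(sequential_tasks, processor_assignments):
    """Sequential portion plus the time of the busiest processor."""
    busiest = max(sum(tasks) for tasks in processor_assignments)
    return sum(sequential_tasks) + busiest


def speedup(sequential_time, parallel_time_value):
    """Speedup = sequential time divided by parallel time."""
    return sequential_time / parallel_time_value


# Hypothetical durations in seconds.
setup = [10]                  # must finish before any parallel work starts
work = [[50, 30], [40, 45]]   # processor 1 takes 80 s, processor 2 takes 85 s

seq_time = sum(setup) + sum(sum(tasks) for tasks in work)  # 10 + 165 = 175
par_time = parallel_time(setup, work)                      # 10 + 85 = 95

print(seq_time, par_time, round(speedup(seq_time, par_time), 2))
```

Even with two processors the speedup stays below 2, because the 10-second sequential step cannot be shared and the processors do not finish at the same moment.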
Cybersecurity and Information Protection
- Encryption Types:
- Symmetric Key Encryption: Uses a single key for both encryption and decryption. This creates a security risk because the key must be shared.
- Asymmetric (Public-Key) Encryption: Uses a pair of keys: a public key for encryption (accessible to anyone) and a private key for decryption (known only to the recipient). RSA is a well-known example.
- Certificate Authorities (CAs): Issue digital certificates that verify the authenticity of websites. This is the basis of SSL (Secure Sockets Layer) and its successor TLS, indicated by a padlock icon in browsers.
- The CIA Triad:
- Confidentiality: Limiting access to information to authorized users.
- Integrity: Ensuring information is accurate and unaltered.
- Availability: Reliable access to information and systems.
- Authentication:
- Methods include strong passwords and Multi-Factor Authentication (MFA).
- MFA categories: Knowledge (something you know), Possession (something you have), and Inherence (something you are, like biometrics).
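The public/private key split described under Encryption Types can be demonstrated with textbook-sized RSA numbers. This is a teaching sketch only: the primes are far too small to be secure, real RSA adds padding, and the function names are assumptions. The three-argument pow with a negative exponent requires Python 3.8 or later.

```python
# Toy RSA with tiny primes; never use numbers this small in practice.
p, q = 61, 53
n = p * q                  # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent, chosen coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e

public_key = (e, n)        # safe to publish
private_key = (d, n)       # known only to the recipient


def encrypt(message: int, key: tuple[int, int]) -> int:
    """Anyone holding the public key can encrypt."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(cipher: int, key: tuple[int, int]) -> int:
    """Only the private key holder can decrypt."""
    exponent, modulus = key
    return pow(cipher, exponent, modulus)


message = 65
cipher = encrypt(message, public_key)
recovered = decrypt(cipher, private_key)

print(cipher, recovered)
```

Unlike the symmetric case, nothing secret has to be shared in advance: the sender needs only the public key, which is why this approach avoids the key-sharing risk noted above.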
Internet of Things (IoT) and Autonomous Technology
- IoT Definition: The interconnection of everyday objects (embedded with computing devices and sensors) via the internet. Examples include smartwatches, GPS, and smart doorbells.
- Sensor Networks: Autonomous sensors that measure environmental conditions (light, heat, sound) to facilitate smart interaction with physical systems.
- Ethics and Liability:
- The Trolley Problem: A thought experiment regarding ethical choices in life-or-death scenarios, relevant to the programming of autonomous vehicles.
- Liability: Questions arise regarding who is at fault in autonomous vehicle accidents: the owner, the manufacturer, or the programmer.
Intellectual Property and Creative Credit
- Copyright Law: Protects original works, granting creators exclusive rights to use, distribute, and sell their work.
- Creative Commons: A family of public copyright licenses allowing free distribution of a work, provided users follow the conditions the creator selects (Attribution, Non-Commercial, No Derivatives, Share Alike).
- Open Source: Programs made freely available for redistribution and modification.
- Open Access: Online research output free of restrictions on access and use.
- Public Domain: Works not subject to copyright law, free for public use.
Computing Innovations and Data Life Cycle
- Definition of Computing Innovation: An innovation that includes a computer program as an integral part of its function (e.g., smartphones, facial recognition, e-commerce).
- IPOS Model (Input-Processing-Output-Storage):
- Input: Captured via collection devices (sensors, keyboard, camera).
- Processing: Data is transformed by software and the CPU.
- Output: Information produced for the user (audio, visual, text).
- Storage: Data is archived via hardware (RAM, Hard Drives) or remote cloud systems.
- Personal Data and Privacy:
- PII (Personally Identifiable Information): Information that describes or identifies an individual (SSN, age, biometric data).
- Aggregation: Combining disparate data (cookies, geolocation) to build deep knowledge about an individual.
Impacts of Cloud Computing and the Digital Divide
- Cloud Computing:
- Allows offloading storage and processing to remote servers.
- History: Evolved from "dumb terminals/thin clients" that connected to centralized building-sized computers.
- 3-2-1 Backup Rule: Maintain 3 copies of data, on 2 different media, with 1 backup copy off-site.
- Legal Risks: Users often give up significant rights to their data through Terms of Service (TOS) agreements, which may grant companies the right to use personal content for commercial purposes.
- The Digital Divide:
- The gap between those with sufficient access to technology and those without.
- Key Factors: Physical resources (cost), lack of opportunity, digital literacy, and lack of skills.
- Consequences: Exclusion from news, healthcare, government services, and the silencing of diverse voices in the digital ecosystem.
Questions & Discussion
- Digital Interconnectedness: Does it surprise you that Kevin Bacon can be linked to so many actors with so few connections?
- Response: The phenomenon illustrates the "small world" property of networks and high density of connections in social systems.
- Privacy and Sharing: How accurate is Mark Zuckerberg's 2008 proposition that people are willing to share anything regardless of privacy?
- Discussion: The rise of social media and the value found in digital connection has significantly shifted cultural attitudes toward personal privacy.
- Autonomous Fault: If a self-driving car causes an accident, who is at fault?
- Discussion: This remains a nebulous legal area involving potential responsibility for the carmaker, the software developers, the owner, or the passenger.
- Digital Divide in Crisis: Was the divide more apparent during the 2020 pandemic?
- Discussion: Lack of internet access directly hindered remote schooling and vaccine registration for marginalized populations.