Chapter 1 Study Notes – Distributed System Models & Enabling Technologies
Scalable Computing over the Internet
- Computing has shifted from single-site, single-machine processing (1950s) to massively parallel, distributed, and cloud-based execution (2020s).
- Modern systems are data-intensive and network-centric; billions of people now access services that rely on large data centers and supercomputers.
- Key concerns: scalability, performance, availability, security, and energy efficiency.
- Five overlapping generations of computer platforms (≈20-year spans):
- 1950-1970 Mainframes (IBM 360, CDC 6400)
- 1960-1980 Minicomputers (DEC PDP-11, VAX)
- 1970-1990 Personal computers (VLSI microprocessors)
- 1980-2000 Portable & pervasive devices (wired/wireless)
- 1990-present HPC & HTC clusters, grids, clouds, IoT
- Trend: leverage shared web resources and massive data over the Internet.
- HPC pursues raw speed (Gflops→Pflops) for science/engineering simulations.
- HTC pursues high task‐throughput for web search, social networks, e-commerce; emphasises cost, energy, reliability, and security.
- Resulting infrastructure upgrades: fast servers, SSD/HDD arrays, high-bandwidth networks.
Emerging Paradigms: SOA, Virtualization, Cloud & IoT
- SOA enables Web 2.0 and mash-up services.
- Virtualization underpins elastic resource provisioning in clouds.
- Cloud computing provides utility-style access to compute/storage/network via data centers.
- RFID, GPS, and sensor miniaturisation lead to the Internet of Things (IoT) and cyber-physical systems (CPS).
Computing Paradigm Distinctions
- Centralised computing: all resources in one tightly-coupled system.
- Parallel computing: multiple processors, either shared or distributed memory, cooperating via shared memory or messages.
- Distributed computing: autonomous nodes with private memories communicating through a network; information exchange by message passing.
- Cloud computing: virtualised, elastic pools of resources; can be centralised or distributed, parallel or not, often priced as a utility.
- Concurrent, ubiquitous, and Internet computing are umbrella terms covering mixtures of the above.
Distributed System Families & Design Objectives
- Clusters, grids, peer-to-peer (P2P) networks, and Internet clouds distinguish themselves by hardware, OS, algorithms, protocols, and service models.
- Future HPC & HTC systems must meet four design goals:
- Efficiency (speed, programming ease, throughput/watt)
- Dependability (reliability, self-management, QoS under failures)
- Adaptation (billions of job requests, virtualised resources, variable workloads)
- Flexibility (run both scientific & business workloads effectively)
Technology Trends & Degrees of Parallelism
- Moore’s law: transistor count/processor speed ≈ doubling every 18 months (historically true 30 yrs).
- Gilder’s law: network bandwidth ≈ doubles yearly.
- Degrees of parallelism (DoP):
- Bit-level (BLP) → word processing
- Instruction-level (ILP) via pipelining, superscalar, VLIW, multithreading
- Data-level (DLP) via SIMD/vector/GPU instructions
- Task-level (TLP) via multicore, CMP
- Job-level (JLP) in distributed systems/grids/clouds
- Fine-grain (BLP/ILP/DLP) underpins coarse-grain (TLP/JLP) scalability.
Innovative Applications & Utility Computing
- Domains driving HPC/HTC: scientific simulations, genomics, weather, e-commerce, banking, telemedicine, traffic control, social networking, cyber-security, e-government, military C2.
- Economic and practical push toward utility computing – IT resources supplied on-demand, metered, with SLA/QoS guarantees.
Hype Cycle of Emerging Technologies (2010 snapshot)
- Five stages: Technology Trigger → Peak of Inflated Expectations → Trough of Disillusionment → Slope of Enlightenment → Plateau of Productivity.
- Cloud computing (2010) just past the peak; predicted 2–5 yrs to productivity.
- Examples: Consumer-generated media in disillusionment (
Internet of Things (IoT) & Cyber-Physical Systems (CPS)
- IoT: networked interconnection of everyday objects via RFID, sensors, GPS, IPv6 ( 2128 addresses ); projections: 100 trillion trackable objects, 1,000–5,000 per person.
- Communication patterns: H2H, H2T, T2T; connections at any time/place.
- CPS: tight integration of computation, communication & control with the physical world; "3C" feedback loops enabling smart cities, energy grids, etc.
Hardware Technologies for Network-Based Systems
- CPU evolution: VAX 11/780 (1 MIPS, 10 MHz) → Intel Core i7-990X (≈159 kMIPS, 3.46 GHz).
- Multicore architectures: private L1 caches, shared L2/L3, many-thread support (e.g., Sun Niagara II: 8 cores × 8 threads = 64 threads).
- Five micro-architectural execution styles: 4-issue superscalar, fine-grain MT, coarse-grain MT, chip-multiprocessor (CMP), simultaneous MT (SMT).
GPU Computing: Architecture, Programming & Power Efficiency
- GPUs offer hundreds–thousands of simpler cores; optimised for throughput vs CPU latency.
- NVIDIA Fermi (2011): 16 SMs × 32 CUDA cores = 512 cores; up to 515 Gflops DP per chip, 82.4 Tflops per board.
- CUDA programming model: CPU orchestrates, GPU executes massive data-parallel kernels.
- Power per instruction: CPU ≈ 2 nJ vs GPU ≈ 200 pJ (×10 more efficient); goals for exascale (~1018 flops) require ≈60 Gflops/W per core.
Memory, Storage & Networking Advances
- DRAM density: 16 Kbit (1976) → 64 Gbit (2011); capacity 4× every ≈3 yrs.
- Disk drives: 260 MB (1981) → 3 TB (2011); ≈10× every 8 yrs; SSDs add high IOPS, moderate write-cycle limits.
- Ethernet bandwidth: 10 Mbps (1979) → 1 Gbps (1999) → 40/100 GbE (2011) → ≥1 Tbps predicted.
Virtual Machines & Virtual Infrastructure
- VM configurations:
- Native/bare-metal (hypervisor in privileged mode)
- Hosted (VMM in user mode atop host OS)
- Dual-mode hybrids
- VM primitive operations: multiplexing, suspend → storage, resume/provision, live migration.
- Virtual infrastructure = dynamic mapping of physical compute/storage/network to VM-based applications; enables server consolidation (utilisation 5–15 % → 60–80 %).
Data Center Virtualization & Cloud Design
- ~43 million servers worldwide (2010); cost composition over 3 yrs: ≈30 % hardware, 60 % utilities & management, 10 % misc.
- Key design principles:
- Commodity x86 servers, cheap multi-TB disks, GbE switching.
- Software handles load-balancing, fault-tolerance, expansion.
- Four converging enablers: (1) hardware virtualisation & multicore, (2) utility/grid foundations, (3) SOA/Web 2.0, (4) autonomic management.
System Models: Clusters, Grids, P2P Networks & Clouds
- Clusters: 100s–10,000s homogeneous nodes on SAN/LAN; dominate TOP500 (417/500 systems in 2009).
- Grids: federated, often heterogeneous clusters/sites; emphasise resource sharing across organisations (TeraGrid, EGEE, ChinaGrid).
- P2P: millions of autonomous clients; no central control; logical overlay networks (structured/unstructured).
- Clouds: virtualised clusters in data centres, offering elastic resources; distinctions with grids blurring (elastic vs static allocation).
Cluster Architecture & Design Issues
- Nodes interconnected by low-latency/high-BW Myrinet, InfiniBand, or multi-level GbE; exposed as a single system via SSI middleware.
- Critical features: high availability, hardware fault-tolerance, fast comms (MPI/PVM), cluster-wide job management, dynamic load-balancing, scalability.
Grid Computing Infrastructure
- Grids = “utility power grid” analogy; couple computers, middleware, instruments, people across LAN/WAN/Internet.
- Two families:
- Computational/data grids (TeraGrid, ChinaGrid) – authenticated, centrally managed.
- P2P grids – open, self-organising, unreliable resources.
- OGSA & Globus Toolkit = de-facto standards; provide resource discovery, security (PKI, GSI), job submission.
Peer-to-Peer Networks: Structure & Applications
- Physical layer: ad-hoc connections over IP; logical layer: overlay network of peer IDs.
- Overlay types: unstructured (random graphs, flooding) vs structured (DHTs, deterministic routing).
- Four application families:
- Distributed file sharing (Gnutella, BitTorrent)
- Collaborative platforms/IM (Skype, MSN)
- Volunteer computing (SETI@home ≈25 Tflops)
- P2P platforms (JXTA, .NET)
- Challenges: heterogeneity, scalability, routing efficiency, reliability, trust, security, copyright.
Cloud Computing Landscape & Service Models
- IBM definition: "A cloud is a pool of virtualised compute resources hosting varied workloads."
- Service models:
- IaaS (servers, storage, networking, VMs) – e.g., AWS EC2.
- PaaS (middleware, runtime, DB, dev tools) – e.g., Google App Engine, Microsoft Azure.
- SaaS (browser-delivered applications; ERP, CRM, email) – e.g., Salesforce, Google Workspace.
- Deployment modes: private, public, managed (community), hybrid – each with distinct security/SLA splits.
- Eight major adoption drivers: location flexibility, peak-load sharing, separation of infra mgmt, cost reduction, simpler programming, discovery + distribution, security/privacy frameworks, new biz/pricing models.
Software Environments & Service-Oriented Architecture
- Layered stack (entity → communication → services) parallels OSI layers; technologies:
- WSDL/SOAP/UDDI for web services
- REST/HTTP/XML for lightweight alternatives
- Message-oriented middleware (JMS, WebSphere MQ) underneath.
- SOA evolution: sensors (SS) produce raw data → filter services (fs) → compute/storage/discovery clouds → portals/HUBs provide user access; goal: transform Data → Information → Knowledge → Wisdom → Decisions.
Distributed Operating Systems & Transparency
- Three design approaches (Tanenbaum): Network OS, middleware extension (e.g., MOSIX), true distributed OS (e.g., Amoeba).
- MOSIX2 overlays Linux, giving process migration, load balancing, partial SSI across clusters & clouds.
- Transparent computing model separates user data, apps, OS, hardware; enables SaaS independence and VM portability.
Programming Models and Middleware
- MPI: message-passing subroutines (C/FORTRAN); point-to-point + collective ops.
- MapReduce: Map→⟨key,value⟩ generation, Reduce merges values per key; highly scalable for TB–PB data.
- Hadoop: open-source MapReduce + HDFS storage; economic, efficient, reliable via replication.
- OGSA: standardises grid services; Globus Toolkit implements security (GSI), resource allocation, job mgmt.
- Throughput (MIPS, Tflops, TPS), response time, latency, QoS, energy Gflops/W.
- Scalability dimensions: size, software, application, technology.
- OS-image vs scalability: SMP (1 image, ≤100s cores) < NUMA < Cluster/Cloud < Grid < P2P (millions of images).
- Amdahl’s Law (fixed workload): S=α+n1−α1; efficiency E=αn+(1−α)1.
- Gustafson’s Law (scaled workload): S′=α+(1−α)n; efficiency E′=nα+(1−α).
- Availability: Availability=MTTF+MTTRMTTF; avoid single points of failure via redundancy, failover, clustering, VM migration.