Cloud Computing Notes

Introduction to Cloud Computing

  • Textbooks:

    • "Distributed and Cloud Computing" by Kai Hwang, Jack Dongarra, Geoffrey Fox.
    • "Designing Data-Intensive Applications" by Martin Kleppmann.
  • Reference Books:

    • "Docker in Action" by Jeff Nickoloff.
    • "Cloud Native DevOps with Kubernetes" by John Arundel and Justin Domingus.
    • "Moving to the Cloud" by Dinkar Sitaram and Geetha Manjunath.

Storage in Cloud Computing

  • Definition of Data:
    • Collection of raw facts.
  • Types of Data:
    • Structured Data: Organized in rows and columns with a fixed schema; typically stored in a DBMS.
    • Unstructured Data: Has no predefined schema (e.g., images, video, free text), which makes it harder to search and retrieve; often stored in object stores.
  • Data Explosion:
    • Predicted to reach 572 zettabytes (1 ZB = 10^21 bytes) by 2030 and 50,000 zettabytes by 2050.
Characteristics of Storage Systems
  • Cost
  • Speed/Performance: How quickly data can be read and written (access time).
  • Reliability: Data remains correct and consistent, without loss or corruption.
  • Availability: Uptime of the system.
  • Scalability: Ability to grow as storage needs increase.
  • Management: Tools and processes for data handling.

Types of Storage Models

  1. File Storage:

    • Organized as a hierarchy of files and folders.
    • Examples include Network Attached Storage (NAS) and Direct Attached Storage (DAS).
  2. Block Storage:

    • Divides data into blocks, with unique identifiers for each.
    • Commonly used in SAN environments.
    • Benefits include quick retrieval and flexibility across different operating systems.
    • Drawbacks: Expensive, with limited metadata handling.
  3. Object Storage:

    • Emerged to handle unstructured data such as images and video.
    • Organizes data as objects, which include data, metadata, and a unique identifier.
    • Benefits:
      • Virtually unlimited scalability.
      • Efficient management of large volumes of data.
      • Cost-effective compared to traditional storage models.
      • Supports versioning and rich, user-defined metadata per object.
    • Common Use Cases: Big data storage, backups, archives, media storage, and content for streaming services.
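A minimal in-memory sketch of the object model described above, under simplifying assumptions: each object bundles its data, metadata, and a unique identifier; keys live in a flat namespace rather than a folder hierarchy; and every write to a key is kept as a new version. The names `ObjectStore`, `StoredObject`, `put`, `get`, and `versions` are hypothetical and do not mirror any particular cloud provider's API.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object: the data, its metadata, and a unique identifier."""
    object_id: str
    data: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Hypothetical store: flat key namespace, each key keeps all versions (oldest first)."""

    def __init__(self):
        self._objects = {}  # key -> list[StoredObject]

    def put(self, key, data, **metadata):
        """Store a new version under key and return its unique ID."""
        obj = StoredObject(object_id=str(uuid.uuid4()), data=data, metadata=dict(metadata))
        self._objects.setdefault(key, []).append(obj)  # never overwrite: append a version
        return obj.object_id

    def get(self, key, version=-1):
        """Return the latest version by default, or an older one by index."""
        return self._objects[key][version]

    def versions(self, key):
        """Number of versions stored under key."""
        return len(self._objects[key])
```

A key such as "photos/cat.jpg" may contain slashes, but in this sketch it is just a string with no directory behind it; any folder-like view would have to be derived from key prefixes.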

Storage Architectures in Cloud Computing

  • Direct Attached Storage (DAS): Storage connected directly to a single computer (e.g., its internal disks) rather than shared over a network.
  • Network Attached Storage (NAS): Dedicated storage that provides file-level storage and access over a network.
  • Storage Area Network (SAN): High-speed network that provides access to consolidated block-level storage.

Evolution of Storage Systems

  • JBOD: Just a Bunch Of Disks; multiple physical disks combined into a single logical unit, with no redundancy.

RAID – Redundant Array of Independent Disks

  • Purpose: Combines multiple disk drives into logical units to improve performance, provide redundancy, or both.
  • RAID levels include RAID 0 (striping, no redundancy), RAID 1 (mirroring), RAID 4 (striping with a dedicated parity disk), etc.
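The two mechanisms behind the levels named above can be sketched in a few lines, assuming equal-sized blocks: RAID 0 places consecutive blocks on the disks round-robin, and RAID 4 stores the byte-wise XOR of the data blocks on a parity disk, so any single lost block can be rebuilt by XOR-ing everything that survives. The function names are hypothetical; a real RAID controller does this in hardware or firmware.

```python
from functools import reduce


def stripe(data, n_disks, block_size):
    """RAID 0: cut data into blocks and put block i on disk i % n_disks."""
    disks = [[] for _ in range(n_disks)]
    for i in range(0, len(data), block_size):
        disks[(i // block_size) % n_disks].append(data[i:i + block_size])
    return disks


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks):
    """RAID 4: the block written to the dedicated parity disk."""
    return xor_blocks(data_blocks)


def rebuild(surviving_blocks, parity_block):
    """Recover one lost data block by XOR-ing the survivors with the parity."""
    return xor_blocks(list(surviving_blocks) + [parity_block])
```

Because x ^ x = 0, XOR-ing the parity with every surviving block cancels them out and leaves exactly the missing block; this is also why a single parity disk tolerates only one failure at a time.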
RAID Architecture
  • Components: Ports, Controller, Cache, Disk Drives.

Hard Disk Drives (HDD) & Solid State Drives (SSD)

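  • HDD: Consists of rotating magnetic platters and read/write heads; each platter surface is divided into concentric tracks, and each track into sectors.
  • SSD: Stores data in non-volatile flash memory with no moving parts; usually faster than HDDs and less prone to mechanical failure.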
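The HDD geometry above determines both capacity and access time. A hedged sketch using the usual textbook approximations: idealized capacity = cylinders × heads × sectors per track × bytes per sector, and average access time ≈ seek time + half a rotation + transfer time. Real drives use zoned recording and logical block addressing, so the capacity formula is only a model; the function names and the drive parameters below are illustrative assumptions, not datasheet values.

```python
def chs_capacity_bytes(cylinders, heads, sectors_per_track, bytes_per_sector=512):
    """Idealized capacity: assumes every track holds the same number of sectors."""
    return cylinders * heads * sectors_per_track * bytes_per_sector


def avg_access_time_ms(seek_ms, rpm, request_bytes, transfer_bytes_per_s):
    """Average access time ~ seek + average rotational latency + transfer."""
    rotational_ms = 0.5 * 60_000 / rpm  # on average, wait half a revolution
    transfer_ms = request_bytes / transfer_bytes_per_s * 1_000
    return seek_ms + rotational_ms + transfer_ms
```

For example, `chs_capacity_bytes(16383, 16, 63)` is 8,455,200,768 bytes (about 8.4 GB), and `avg_access_time_ms(9, 7200, 4096, 100e6)` (an assumed 9 ms seek, 7,200 RPM, a 4 KiB read at 100 MB/s) is about 13.2 ms, almost all of it mechanical seek and rotation, which an SSD avoids.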

Cloud Storage

  • Definition: Data is stored, managed, and backed up remotely (typically by a cloud provider) and accessed over a network.
  • Advantages:
    • Can substantially reduce total cost of ownership (TCO).
    • On-demand capacity that scales up or down as needed.
    • Supports multitenancy and provides high data durability (typically through replication).

Enablers for Storage Virtualization

  • File Systems: Structures that manage data storage and access on disks.
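  • Logical Volume Manager (LVM): Pools physical storage devices and creates flexible, resizable logical volumes from them.
  • Thin Provisioning: Allocates physical storage on demand, as data is written, rather than reserving the full capacity up front.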
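A toy model of thin provisioning, assuming a shared pool of fixed-size physical blocks: each volume advertises a large logical size but takes a physical block from the pool only on the first write to a given logical block. `StoragePool` and `ThinVolume` are hypothetical names; this is not how any specific product (such as LVM thin pools) is implemented.

```python
class StoragePool:
    """Hypothetical shared pool of physical blocks."""

    def __init__(self, physical_blocks):
        self.free = list(range(physical_blocks))  # unallocated physical block numbers
        self.blocks = {}                          # physical block number -> data

    def allocate(self):
        if not self.free:
            raise RuntimeError("pool exhausted: physical capacity overcommitted")
        return self.free.pop(0)


class ThinVolume:
    """Hypothetical volume whose logical size may exceed the pool's physical size."""

    def __init__(self, pool, logical_blocks):
        self.pool = pool
        self.logical_blocks = logical_blocks
        self.mapping = {}  # logical block -> physical block

    def write(self, lba, data):
        if not 0 <= lba < self.logical_blocks:
            raise IndexError("block address outside the volume")
        if lba not in self.mapping:  # allocate on first write only
            self.mapping[lba] = self.pool.allocate()
        self.pool.blocks[self.mapping[lba]] = data

    def read(self, lba):
        phys = self.mapping.get(lba)
        return self.pool.blocks[phys] if phys is not None else None  # None: never written

    def allocated(self):
        return len(self.mapping)
```

For instance, two `ThinVolume(pool, 1000)` volumes can share a 100-block pool: writes succeed as long as the data actually written fits, and only when the pool runs dry does a write fail, which is why thinly provisioned pools are usually monitored closely.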

Distributed Transactions

  • Transaction Definition: A sequence of operations treated as a single unit.
  • ACID Properties:
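    • Atomicity: Either all operations in the transaction take effect or none do.
    • Consistency: A transaction moves the data from one valid state to another, preserving system invariants.
    • Isolation: Concurrent transactions do not interfere; each behaves as if it were running alone.
    • Durability: Once committed, changes persist even after crashes or power loss.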
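Atomicity on a single node can be demonstrated with Python's built-in `sqlite3` module: a connection used as a context manager commits if the block completes and rolls back if an exception escapes it. The `accounts` table and the `make_bank`, `balances`, and `transfer` helpers are illustrative assumptions.

```python
import sqlite3


def make_bank():
    """In-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(conn, src, dst, amount):
    """Debit src and credit dst as one transaction: both happen or neither does."""
    with conn:  # commits if the block finishes, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (left,) = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if left < 0:
            raise ValueError("insufficient funds")  # rollback undoes the debit above
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
```

After `transfer(conn, "alice", "bob", 30)` the balances are 70 and 80; a follow-up transfer of 500 raises `ValueError`, and because the debit is rolled back, the balances stay at 70 and 80.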
Commit Protocols
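  • Two-Phase Commit (2PC): Ensures that all nodes participating in a distributed transaction either all commit or all abort.
  • Phases of 2PC:
    1. Prepare (voting) phase: The coordinator asks every participant whether it can commit; each participant votes Yes or No.
    2. Commit/Abort phase: The coordinator sends Commit only if every participant voted Yes; otherwise it sends Abort.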
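A single-process simulation of the two phases, assuming participants never crash and always reply. Real 2PC also requires durable logging at the coordinator and participants, plus timeouts, none of which are modeled here. `Participant`, `Vote`, and `two_phase_commit` are hypothetical names.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical participant; `can_commit` stands in for its local checks."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self):
        # Phase 1: vote; a real participant would durably log a YES vote first
        if self.can_commit:
            self.state = "PREPARED"
            return Vote.YES
        self.state = "ABORTED"  # a participant voting NO may abort unilaterally
        return Vote.NO

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"


def two_phase_commit(participants):
    """Coordinator: returns "COMMIT" or "ABORT" and informs every participant."""
    # Phase 1 (prepare/voting): ask every participant for its vote
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit/abort): commit only if the vote was unanimously YES
    decision = "COMMIT" if all(v is Vote.YES for v in votes) else "ABORT"
    for p in participants:
        if decision == "COMMIT":
            p.commit()
        else:
            p.abort()
    return decision
```

A single NO vote is enough to abort the whole transaction, which is exactly the all-or-nothing guarantee 2PC extends from one node to many.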

CAP Theorem

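  • Principles: Consistency (every read returns the most recent write), Availability (every request to a non-failed node receives a response), Partition tolerance (the system keeps working when messages between nodes are lost).
  • Trade-offs: A distributed system cannot guarantee all three at once. Because network partitions cannot be ruled out, the practical choice is what to give up while a partition lasts: consistency or availability.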
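To make the trade-off concrete, here is a toy replicated register (an illustrative model, not a real database). In "CP" mode a node refuses requests when it cannot reach a majority, giving up availability so that reads stay up to date; in "AP" mode every node answers, but a node cut off by a partition may return stale data. `ToyCluster` and its methods are hypothetical.

```python
class ToyCluster:
    """n replicas of one register; mode is "CP" or "AP" (illustrative only)."""

    def __init__(self, n, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.n, self.mode = n, mode
        self.replicas = [(0, None)] * n   # (version, value) held by each node
        self.groups = [set(range(n))]     # sets of nodes that can reach each other

    def partition(self, *groups):
        """Split (or heal) the network; every node must be in exactly one group."""
        self.groups = [set(g) for g in groups]

    def _reachable(self, node):
        return next(g for g in self.groups if node in g)

    def _check(self, group):
        # CP: refuse to serve unless a strict majority of nodes is reachable
        if self.mode == "CP" and len(group) <= self.n // 2:
            raise RuntimeError("unavailable: no majority reachable")

    def write(self, node, value):
        group = self._reachable(node)
        self._check(group)
        version = max(self.replicas[p][0] for p in group) + 1
        for p in group:                   # replicate to every reachable node
            self.replicas[p] = (version, value)

    def read(self, node):
        group = self._reachable(node)
        self._check(group)
        if self.mode == "CP":             # any two majorities overlap, so the newest version is visible
            return max((self.replicas[p] for p in group), key=lambda r: r[0])[1]
        return self.replicas[node][1]     # AP: answer locally, possibly stale
```

With three nodes split into {0, 1} and {2}, a CP cluster rejects reads and writes at node 2 but keeps serving nodes 0 and 1, while an AP cluster lets node 2 keep answering with whatever value it last saw.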
Conclusion on CAP Theorem
  • Understanding the CAP theorem helps in choosing, for each application, the appropriate balance between consistency and availability when designing a distributed system.