Cloud Computing Notes
Introduction to Cloud Computing
Textbooks:
- "Distributed and Cloud Computing" by Kai Hwang, Jack Dongarra, Geoffrey Fox.
- "Designing Data-Intensive Applications" by Martin Kleppmann.
Reference Books:
- "Docker in Action" by Jeff Nickoloff.
- "Cloud Native DevOps with Kubernetes" by John Arundel and Justin Domingus.
- "Moving to the Cloud" by Dinkar Sitaram and Geetha Manjunath.
Storage in Cloud Computing
- Definition of Data:
- Collection of raw facts.
- Types of Data:
- Structured Data: Organized in rows and columns with a fixed schema; typically stored in a DBMS.
- Unstructured Data: No predefined schema (e.g., images, video, free text), so harder to index and retrieve; typically stored in object stores.
- Data Explosion:
- Prediction of 572 Zettabytes (1 Zettabyte = 10^21 bytes) by 2030 and 50,000 Zettabytes by 2050.
Characteristics of Storage Systems
- Cost
- Speed/Performance: Access time (latency) and throughput.
- Reliability: Data is stored and returned without loss or corruption.
- Availability: Fraction of time the system is up and reachable.
- Scalability: Ability to grow as storage needs increase.
- Management: Tools and processes for data handling.
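Availability is often quoted as a percentage of uptime. The following is a minimal sketch of what such a figure means in practice; the function name and the 365-day year are assumptions made for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime implied by an availability percentage."""
    return (100.0 - availability_percent) / 100.0 * HOURS_PER_YEAR


# "Three nines" allows roughly 8.76 hours of downtime per year.
three_nines = downtime_hours_per_year(99.9)
print(f"99.9% availability -> about {three_nines:.2f} hours of downtime per year")
```

Each additional "nine" cuts the permitted downtime by a factor of ten, which is why higher availability targets tend to cost more.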
Types of Storage Models
File Storage:
- Organized as a hierarchy of files and folders.
- Examples include Network Attached Storage (NAS) and file systems on Direct Attached Storage (DAS).
Block Storage:
- Divides data into fixed-size blocks, each with a unique identifier (address).
- Commonly used in SAN environments.
- Benefits include quick retrieval and flexibility across different operating systems.
- Drawbacks: Expensive, with limited metadata handling capabilities.
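The block model above can be sketched as follows. This is a toy in-memory illustration, not how a real SAN volume is implemented; the `BlockDevice` class and the 4-byte block size are hypothetical and chosen only to keep the example small.

```python
class BlockDevice:
    """Toy block store: data is cut into fixed-size blocks, each with an id."""

    def __init__(self, block_size: int = 4):
        self.block_size = block_size
        self.blocks = {}   # block id -> bytes
        self.next_id = 0

    def write(self, data: bytes) -> list:
        """Split data into blocks and return the ids needed to read it back."""
        ids = []
        for offset in range(0, len(data), self.block_size):
            block_id = self.next_id
            self.next_id += 1
            self.blocks[block_id] = data[offset:offset + self.block_size]
            ids.append(block_id)
        return ids

    def read(self, ids: list) -> bytes:
        """Reassemble data from its block ids, in order."""
        return b"".join(self.blocks[i] for i in ids)


device = BlockDevice(block_size=4)
block_ids = device.write(b"hello world")   # 11 bytes -> 3 blocks
restored = device.read(block_ids)
```

Note that the device itself keeps no metadata about what the blocks mean; the caller must remember which ids belong together, which reflects the limited metadata handling listed as a drawback.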
Object Storage:
- Emerged to handle unstructured data such as images and video.
- Organizes data as objects, which include data, metadata, and a unique identifier.
- Benefits:
- Near-unlimited scalability, since objects live in a flat namespace rather than a hierarchy.
- Efficient management of large volumes of data.
- Cost-effective compared to traditional storage models.
- Supports versioning and rich, user-defined metadata tags.
- Common Use Cases: Big data storage, backups, archives, media storage, and facilitating streaming services.
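To make the object model concrete, here is a minimal in-memory sketch in which each object bundles data, metadata, and a unique identifier, and every write to an existing identifier keeps the earlier version. The `ObjectStore` and `StoredObject` names are hypothetical and do not correspond to any particular cloud provider's API.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy object store with a flat namespace keyed by object id."""

    def __init__(self):
        self._versions = {}  # object id -> list of StoredObject, oldest first

    def put(self, data: bytes, metadata=None, object_id=None) -> str:
        """Store a new object, or a new version if object_id already exists."""
        object_id = object_id or uuid.uuid4().hex
        versions = self._versions.setdefault(object_id, [])
        versions.append(StoredObject(data, dict(metadata or {})))
        return object_id

    def get(self, object_id: str, version: int = -1) -> StoredObject:
        """Return the requested version (latest by default)."""
        return self._versions[object_id][version]

    def version_count(self, object_id: str) -> int:
        return len(self._versions[object_id])


store = ObjectStore()
oid = store.put(b"draft", {"content-type": "text/plain"})
store.put(b"final", {"content-type": "text/plain", "author": "alice"}, object_id=oid)
latest = store.get(oid)
first = store.get(oid, version=0)
```

Because lookups go straight from identifier to object, there is no directory tree to traverse, which is one reason this model tends to scale well for large collections of unstructured data.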
Storage Architectures in Cloud Computing
- Direct Attached Storage (DAS): Storage attached directly to the computer that uses it, without a network in between.
- Network Attached Storage (NAS): Dedicated storage that provides file-level storage and access over a network.
- Storage Area Network (SAN): High-speed network that provides access to consolidated block-level storage.
Evolution of Storage Systems
- JBOD: Just a Bunch Of Disks; multiple physical disks are exposed individually or concatenated into a single logical volume, with no striping or redundancy.
RAID – Redundant Array of Independent Disks
- Purpose: Combines multiple disk drives into a single logical unit for performance, redundancy, or both.
- RAID Levels include RAID 0 (striping, no redundancy), RAID 1 (mirroring), RAID 4 (striping with a dedicated parity disk), etc.
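Two ideas behind these levels, striping and parity, can be illustrated with a short sketch. The function names and the 2-byte chunk size are assumptions for illustration; real controllers work on much larger stripes and in hardware or kernel drivers.

```python
from functools import reduce


def stripe(data: bytes, num_disks: int, chunk: int = 2) -> list:
    """RAID 0 style: deal fixed-size chunks round-robin across the disks."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk):
        disks[(offset // chunk) % num_disks].append(data[offset:offset + chunk])
    return disks


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(chunks: list) -> bytes:
    """RAID 4 style parity: XOR of all chunks in a stripe."""
    return reduce(xor_bytes, chunks)


striped = stripe(b"ABCDEFGH", num_disks=2)

# One stripe across three data disks plus a parity chunk.
data_chunks = [b"AB", b"CD", b"EF"]
parity_chunk = parity(data_chunks)

# Simulate losing the middle disk and rebuilding it from the survivors.
rebuilt = parity([data_chunks[0], data_chunks[2], parity_chunk])
```

The rebuild works because XOR is its own inverse: XOR-ing the parity with the surviving chunks cancels them out and leaves the missing chunk.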
RAID Architecture
- Components: Ports, Controller, Cache, Disk Drives.
Hard Disk Drives (HDD) & Solid State Drives (SSD)
- HDD: Consists of spinning platters and read/write heads; each platter surface is divided into tracks and sectors.
- SSD: Non-volatile flash memory with no moving parts; usually faster than HDDs and less prone to mechanical failure.
Cloud Storage
- Definition: A service model in which data is maintained, managed, and backed up remotely and accessed over a network.
- Advantages:
- Lower total cost of ownership (TCO), since capacity is typically paid for as used rather than purchased up front.
- Elastic scalability and on-demand services.
- Supports multitenancy and provides data durability, typically through replication.
Enablers for Storage Virtualization
- File Systems: Structures that manage data storage and access on disks.
- Logical Volume Manager (LVM): Creates logical volumes from physical storage.
- Thin Provisioning: Allocates physical storage on demand as data is written, rather than reserving the full advertised capacity up front.
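Thin provisioning can be sketched as a mapping from virtual blocks to physical blocks that is filled in only on first write. The `PhysicalPool` and `ThinVolume` classes below are hypothetical and exist only to illustrate the idea; they are not the interface of LVM or any other real volume manager.

```python
class PhysicalPool:
    """Shared pool of physical blocks backing several thin volumes."""

    def __init__(self, total_blocks: int):
        self.free = list(range(total_blocks))
        self.data = {}  # physical block number -> bytes

    def allocate(self) -> int:
        if not self.free:
            raise RuntimeError("physical pool exhausted")
        return self.free.pop(0)


class ThinVolume:
    """Volume that advertises virtual_blocks but allocates only on write."""

    def __init__(self, pool: PhysicalPool, virtual_blocks: int):
        self.pool = pool
        self.virtual_blocks = virtual_blocks
        self.mapping = {}  # virtual block -> physical block

    def write(self, vblock: int, payload: bytes) -> None:
        if not 0 <= vblock < self.virtual_blocks:
            raise IndexError("virtual block out of range")
        if vblock not in self.mapping:          # first write: allocate now
            self.mapping[vblock] = self.pool.allocate()
        self.pool.data[self.mapping[vblock]] = payload

    def read(self, vblock: int) -> bytes:
        physical = self.mapping.get(vblock)
        return self.pool.data[physical] if physical is not None else b""


pool = PhysicalPool(total_blocks=8)
vol_a = ThinVolume(pool, virtual_blocks=100)  # 200 virtual blocks advertised
vol_b = ThinVolume(pool, virtual_blocks=100)  # over only 8 physical blocks

vol_a.write(42, b"x")
vol_a.write(42, b"y")   # overwrite reuses the same physical block
vol_b.write(7, b"z")
```

The trade-off is visible in `allocate`: because the volumes together advertise more space than physically exists, the pool can run out if too much is actually written, so real deployments monitor pool usage.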
Distributed Transactions
- Transaction Definition: A sequence of operations treated as a single unit.
- ACID Properties:
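- Atomicity: Either all operations in the transaction complete or none do.
- Consistency: Each transaction preserves the system's invariants.
- Isolation: Concurrent transactions do not see each other's intermediate state.
- Durability: Committed changes survive subsequent failures.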
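Atomicity and consistency can be seen on a single node with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `accounts` table, the account names, and the `transfer` helper are invented for illustration, and a single in-memory database is not a distributed transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"   # invariant: no negative balance
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


ok_small = transfer(conn, "alice", "bob", 30)    # succeeds
ok_large = transfer(conn, "alice", "bob", 500)   # violates the CHECK, rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the failed transfer the credit to `bob` is applied first and then undone by the rollback, so no partial result is ever committed.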
Commit Protocols
- Two-Phase Commit (2PC): Ensures all participating nodes either commit or abort a transaction.
- Phases of 2PC:
- Prepare phase: Coordinator asks all participants to prepare for the commit; each votes yes or no.
- Commit/Abort phase: Coordinator sends commit if every participant voted yes; otherwise it sends abort.
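The two phases can be sketched in a few lines. This is a simplified, single-process illustration: the `Participant` class and `two_phase_commit` function are hypothetical, and the sketch deliberately leaves out the timeouts, durable logging, and crash recovery that a real implementation needs.

```python
class Participant:
    """A node that votes in the prepare phase and then obeys the coordinator."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # Vote yes only if this node is able to commit its part.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list) -> str:
    """Coordinator logic: collect votes, then commit only if all voted yes."""
    votes = [p.prepare() for p in participants]      # phase 1: prepare
    if all(votes):                                   # phase 2: commit or abort
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


healthy = [Participant("db1"), Participant("db2")]
outcome_ok = two_phase_commit(healthy)

mixed = [Participant("db1"), Participant("db2", can_commit=False)]
outcome_fail = two_phase_commit(mixed)
```

A single "no" vote is enough to abort everyone, which is what gives the protocol its all-or-nothing outcome across nodes.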
CAP Theorem
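- Properties: Consistency, Availability, Partition tolerance.
- Trade-offs: A distributed system can guarantee at most two of the three. Because network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability.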
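The trade-off can be illustrated with a toy pair of replicas that either refuse writes during a partition (favoring consistency) or accept them locally (favoring availability). The `Replica` class, the `"CP"`/`"AP"` mode labels, and the `make_pair` helper are assumptions for illustration only; real systems use far more elaborate replication and reconciliation.

```python
class Replica:
    """Toy replica that mirrors writes to one peer unless partitioned."""

    def __init__(self, mode: str):
        self.mode = mode          # "CP" or "AP"
        self.value = None
        self.peer = None
        self.partitioned = False

    def write(self, value) -> bool:
        if self.partitioned:
            if self.mode == "CP":
                return False      # refuse: replicas stay equal, write unavailable
            self.value = value    # AP: accept locally, replicas may diverge
            return True
        self.value = value
        self.peer.value = value   # normal case: replicate to the peer
        return True


def make_pair(mode: str):
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b


# Consistency over availability.
cp_a, cp_b = make_pair("CP")
cp_a.write(1)
cp_a.partitioned = cp_b.partitioned = True
cp_accepted = cp_a.write(2)

# Availability over consistency.
ap_a, ap_b = make_pair("AP")
ap_a.write(1)
ap_a.partitioned = ap_b.partitioned = True
ap_accepted = ap_a.write(2)
```

After the partition, the CP pair still agrees on the old value but rejected the write, while the AP pair accepted the write and now holds two different values that would need to be reconciled later.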
Conclusion on CAP Theorem
- Understanding the CAP theorem helps in making informed design decisions about distributed systems and in choosing the balance of consistency and availability that fits the application's requirements.