Cloud Computing and Data Engineering: Comprehensive Notes
Yesterday's recap
Topics discussed yesterday:
Data engineering: data governance, data ingestion, data manipulation, data standardization, data cleaning, and building data engineering pipelines.
Three types of data processing: batch processing, real-time processing, and near-real-time processing.
Big data overview: what big data is, how it is generated, and sources of big data.
Five V's of big data: volume, velocity, variety, veracity, and value.
Data types: structured data, unstructured data, semi-structured data.
File formats: CSV, JSON, Excel (and other formats mentioned).
Emphasis on theory as a foundation before labs: three to four days of theory, then labs from day five.
Today's topic: Cloud computing
Cloud computing defined: a model for delivering computing resources (servers, storage, databases, networking, software, and analytics) over the internet, on demand, so that applications can be hosted without owning the underlying hardware.
Typical lifecycle: develop locally on a laptop, then deploy to the cloud so users worldwide can access via a link.
Resources provided by cloud: virtual machines, storage, databases, networking, software, analytics, AI, and more.
Types of cloud models
Public cloud
Definition: resources (servers, storage, etc.) provided by third-party providers (e.g., AWS, Azure, Google Cloud Platform).
Major players by market share: AWS, Microsoft Azure, and Google Cloud Platform (GCP).
Advantages:
Highly scalable and cost-effective.
Pay-as-you-go pricing: capacity auto-scales with demand, and you are billed only for what you use.
No need to manage physical infrastructure.
Example usage: Netflix uses AWS to stream content; Dropbox stores files in the public cloud.
Auto-scaling concept: cost varies with usage, e.g., if maximum capacity is 5,000 users but actual users vary, you pay only for actual usage.
Example used in lecture: Day 1 = 1,000 users, Day 2 = 200 users, Day 3 = 2,400 users; with a max cap of 5,000 users and an implied rate of \$1 per user per day, the total charge for the counts listed is \$3,600 over three days (1,000 + 200 + 2,400 = 3,600), instead of \$15,000 if all 5,000 were billed each day.
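The pay-per-use arithmetic in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rate of \$1 per user per day is inferred from the \$15,000 figure (5,000 users × 3 days), and the function names `pay_per_use_cost` and `fixed_capacity_cost` are hypothetical, not part of any cloud provider's API. Real cloud billing is metered on resources such as instance-hours and storage, not on user counts.

```python
from typing import Sequence


def pay_per_use_cost(
    daily_users: Sequence[int], rate_per_user_day: float, max_capacity: int
) -> float:
    """Bill only for the users actually served each day, capped at max_capacity."""
    billed_user_days = sum(min(users, max_capacity) for users in daily_users)
    return billed_user_days * rate_per_user_day


def fixed_capacity_cost(
    num_days: int, rate_per_user_day: float, max_capacity: int
) -> float:
    """Bill for the full capacity every day, whether or not it is used."""
    return num_days * max_capacity * rate_per_user_day


# Figures from the lecture example; RATE is an assumption (see lead-in).
daily_users = [1_000, 200, 2_400]
RATE = 1.0            # assumed dollars per user per day
MAX_CAPACITY = 5_000  # maximum users the deployment can scale to

usage_total = pay_per_use_cost(daily_users, RATE, MAX_CAPACITY)
fixed_total = fixed_capacity_cost(len(daily_users), RATE, MAX_CAPACITY)

print(f"Pay-per-use total:    ${usage_total:,.0f}")
print(f"Fixed-capacity total: ${fixed_total:,.0f}")
```

With the listed daily counts, the pay-per-use total covers 3,600 user-days, against 15,000 user-days when the full capacity is billed every day.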
Basic formula: cost for a day = actual users that day × rate per user per day; the total over the period is the sum of the daily costs:
$$ \text{Total} = \