What are the three types of data structures in AWS/ML context?
1. Structured Data: Organized with defined schema (e.g., database tables, CSV)
2. Unstructured Data: No predefined structure (e.g., text, video, audio, images)
3. Semi-structured Data: Tagged/categorized elements (e.g., XML, JSON, log files)
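To make the three categories concrete, here is a minimal sketch using only the Python standard library. The sample strings and field names (`user_id`, `plan`, `tags`) are invented for illustration and do not come from any AWS service.

```python
import csv
import io
import json

# Structured: every row follows the same header-defined schema.
csv_text = "user_id,plan\n1,free\n2,pro\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: fields are tagged, but records may differ in shape.
json_text = '[{"user_id": 1, "plan": "free"}, {"user_id": 2, "tags": ["beta"]}]'
records = json.loads(json_text)

# Unstructured: no fields at all; any structure has to be derived
# (here, a naive whitespace word count).
review = "Great product, fast shipping"
word_count = len(review.split())

csv_fields = set(rows[0].keys())
json_fields = [set(r.keys()) for r in records]
```

Note that the CSV rows all share the same columns, while the two JSON records carry different keys; that difference is what separates structured from semi-structured data.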
What are the "3 V's" of data properties?
1. Volume: Amount/size of data
2. Velocity: Speed of data generation and processing
3. Variety: Different types, structures, and sources of data
What is a Data Warehouse and when should you use it?
Definition: Centralized repository optimized for analysis of structured data from different sources
E.g., Amazon Redshift
Key features: Schema-on-write (ETL), star/snowflake schemas, complex read-heavy queries
Use when: You need structured data with fast complex queries, BI/analytics, integration from multiple sources
What is a Data Lake and when should you use it?
Definition: Storage repository for large amounts of raw data in native format
Key features: Schema-on-read (ELT), no preprocessing, flexible and agile
E.g., Amazon S3
Use when: Mix of structured/unstructured data, large volumes where cost-effectiveness matters, flexibility for future needs
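The schema-on-write versus schema-on-read distinction from the warehouse and lake cards can be sketched in a few lines of Python. This is a toy model under stated assumptions: `SCHEMA`, `write_to_warehouse`, `write_to_lake`, and `read_from_lake` are hypothetical names, and plain lists stand in for Redshift and S3.

```python
import json

# Hypothetical warehouse schema: column name -> required type.
SCHEMA = {"order_id": int, "amount": float}

def write_to_warehouse(table, record):
    """Schema-on-write: validate and coerce before storing; reject bad rows."""
    row = {}
    for name, typ in SCHEMA.items():
        if name not in record:
            raise ValueError(f"missing column: {name}")
        row[name] = typ(record[name])
    table.append(row)

def write_to_lake(lake, raw_line):
    """Schema-on-read: store the raw text untouched."""
    lake.append(raw_line)

def read_from_lake(lake, field, default=None):
    """Apply structure only at query time; missing fields get a default."""
    return [json.loads(line).get(field, default) for line in lake]

warehouse, lake = [], []
write_to_warehouse(warehouse, {"order_id": "7", "amount": "19.5"})
write_to_lake(lake, '{"order_id": 7, "amount": 19.5}')
write_to_lake(lake, '{"order_id": 8, "note": "no amount yet"}')
```

The warehouse path fails fast on a record without `amount`, while the lake accepts it and leaves the interpretation to whoever reads it later.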
What are the three S3 Glacier tiers and their retrieval times?
1. Glacier Instant Retrieval: Millisecond retrieval, 90-day minimum storage duration
2. Glacier Flexible Retrieval: Expedited (1-5 min), Standard (3-5 hr), Bulk (5-12 hr, free), 90-day minimum storage duration
3. Glacier Deep Archive: Standard (within 12 hr), Bulk (within 48 hr), 180-day minimum storage duration
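As a memory aid, the tiers above can be encoded as a small lookup table. This is a sketch in plain Python, not an AWS API call: `GLACIER_TIERS`, `options_within`, and `billed_days` are hypothetical names, retrieval times are the upper bounds from the card, and millisecond access is approximated as 0 hours.

```python
# Upper-bound retrieval times (hours) and minimum storage durations (days),
# taken from the card above. Millisecond access is approximated as 0.0 hours.
GLACIER_TIERS = {
    "Glacier Instant Retrieval": {
        "min_days": 90,
        "retrieval_hours": {"Instant": 0.0},
    },
    "Glacier Flexible Retrieval": {
        "min_days": 90,
        "retrieval_hours": {"Expedited": 5 / 60, "Standard": 5.0, "Bulk": 12.0},
    },
    "Glacier Deep Archive": {
        "min_days": 180,
        "retrieval_hours": {"Standard": 12.0, "Bulk": 48.0},
    },
}

def options_within(max_wait_hours):
    """List (tier, retrieval option) pairs whose worst case fits the wait budget."""
    return [
        (tier, option)
        for tier, info in GLACIER_TIERS.items()
        for option, hours in info["retrieval_hours"].items()
        if hours <= max_wait_hours
    ]

def billed_days(tier, stored_days):
    """Simplified model: deleting early still bills the minimum duration."""
    return max(stored_days, GLACIER_TIERS[tier]["min_days"])
```

For example, `options_within(1)` keeps only the instant and expedited options, and `billed_days("Glacier Deep Archive", 30)` returns 180 because of the 180-day minimum.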