1_Storage in the Data Pipeline

Description and Tags

Flashcards about data storage considerations in an analytics pipeline using AWS services.

9 Terms

1
Modern Data Architecture

An architecture built around a central data lake, surrounded by purpose-built data stores that each serve a specific workload. It supports moving data between the lake and those stores while controlling access and enabling efficient analysis.

2
AWS Lake Formation

The AWS service for setting up, securing, and managing a data lake, including centralized access permissions for the data in it.

3
AWS Glue

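Provides the AWS Glue Data Catalog, a central metadata repository that describes the datasets in the data lake so that other services can find and query them.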

4
Amazon Athena

A serverless SQL query engine for analyzing data directly in the data lake (Amazon S3), without first loading it into another data store.
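
As an illustration of querying the lake in place, the sketch below submits a SQL statement to Athena through the boto3 `start_query_execution` call. The database name `analytics_db`, the table `clickstream`, and the results bucket `s3://example-athena-results/` are hypothetical placeholders, not names from this flashcard set; running `run_query` also assumes boto3 is installed and AWS credentials are configured.

```python
def build_athena_request(sql, database, output_location):
    """Build the keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        # The database is looked up in the AWS Glue Data Catalog.
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql, database, output_location):
    """Submit the query and return its execution ID (needs AWS credentials)."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("athena")
    response = client.start_query_execution(
        **build_athena_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]


# Hypothetical names, for illustration only.
request = build_athena_request(
    "SELECT page, COUNT(*) AS views FROM clickstream GROUP BY page",
    "analytics_db",
    "s3://example-athena-results/",
)
```

The query runs against files already stored in S3; nothing is copied into a warehouse first, which is the point of this card.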

5
Data Warehouse Use Case

Highly structured, curated data that serves complex queries and business analytics, where the value of those results justifies a higher storage cost.

6
Data Lake Use Case

Raw data, often unstructured or semi-structured, kept available for exploration at a lower storage cost.

7
Amazon Redshift Spectrum

A feature of Amazon Redshift that queries data in place in Amazon S3 without loading it into Redshift tables. The data stays in S3 storage, which costs less than warehouse storage.

8
Pipeline Storage Selection

Using a combination of storage types as data moves through the pipeline, so that each stage balances storage cost against the business value of the data it holds.
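
The trade-off on this card can be sketched as a small calculation. This is a minimal illustration under stated assumptions: the per-GB prices are invented placeholders (not AWS pricing), and the rule "curated data goes to the warehouse, everything else stays in the lake" is a simplification of the two use-case cards above.

```python
from dataclasses import dataclass

# Hypothetical monthly prices per GB, for illustration only (not AWS pricing).
PRICE_PER_GB = {"data_lake": 0.02, "data_warehouse": 0.10}


@dataclass
class Dataset:
    name: str
    size_gb: float
    curated: bool  # True once the data is cleaned and structured for analytics


def choose_store(dataset):
    """Simplified rule: curated data earns warehouse storage, raw data stays in the lake."""
    return "data_warehouse" if dataset.curated else "data_lake"


def monthly_cost(datasets, store=None):
    """Total monthly cost; pass `store` to force every dataset into one store."""
    return sum(
        d.size_gb * PRICE_PER_GB[store or choose_store(d)] for d in datasets
    )


datasets = [
    Dataset("raw_clickstream", 1000, curated=False),
    Dataset("sales_summary", 100, curated=True),
]
mixed = monthly_cost(datasets)
warehouse_only = monthly_cost(datasets, store="data_warehouse")
```

With these placeholder numbers, keeping raw data in the lake and only curated data in the warehouse costs less than putting everything in the warehouse, while the curated data still gets warehouse query performance.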

9
Major Components of AWS Data Architecture

Includes an Amazon S3 data lake and an Amazon Redshift data warehouse.