Storage

9 Terms

1

AWS Glue – File Conversion

Glue ETL jobs can convert data files (for example CSV or JSON) into columnar formats such as Apache Parquet or ORC; this typically improves analytics query performance and reduces storage costs

2

AWS Glue Crawler

Scans data sources (S3, JDBC, etc.), infers schema, and populates the Glue Data Catalog automatically
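
To make "infers schema" concrete, here is a toy sketch of type inference over a small CSV sample. It is not the crawler's actual classifier logic; the function names (`infer_type`, `infer_schema`), the sample data, and the three-type vocabulary are assumptions chosen for illustration.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest type name that fits every non-empty value."""
    def all_parse(cast):
        for value in values:
            if value == "":
                continue  # treat empty cells as missing, not as evidence
            try:
                cast(value)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "bigint"
    if all_parse(float):
        return "double"
    return "string"


def infer_schema(csv_text):
    """Map each CSV column name to an inferred type name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        name: infer_type([row[name] for row in rows])
        for name in reader.fieldnames
    }


# Hypothetical sample file contents
sample = "order_id,amount,region\n1,19.99,us-east-1\n2,5,eu-west-1\n"
schema = infer_schema(sample)
print(schema)
```

A real crawler also handles partitions, multiple file formats, and schema changes over time; this sketch only shows the basic idea of deriving column types from the data itself.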

3

Apache Parquet

Columnar file format optimized for analytics; supports compression and lets queries read only the columns they need, which makes large scans efficient
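
The benefit of columnar storage can be sketched in plain Python. The example below does not use Parquet itself; it only contrasts a row layout with a column layout for a single-column aggregate, and all data and variable names are made up for illustration. Compression and encoding, which Parquet also provides, are not modeled here.

```python
# Hypothetical dataset: 1000 orders with three fields each
rows = [
    {"order_id": i, "region": "us-east-1", "amount": i * 1.5}
    for i in range(1000)
]

# Row layout (CSV-like): every record stores all of its fields together
row_layout = "\n".join(
    f"{r['order_id']},{r['region']},{r['amount']}" for r in rows
)

# Column layout: each field's values are stored contiguously
column_layout = {
    name: "\n".join(str(r[name]) for r in rows)
    for name in rows[0]
}

# Query: SUM(amount)
# Row layout must scan every line and pick out the third field
total_from_rows = sum(
    float(line.split(",")[2]) for line in row_layout.split("\n")
)
# Column layout reads only the "amount" column
total_from_columns = sum(
    float(value) for value in column_layout["amount"].split("\n")
)

bytes_scanned_rows = len(row_layout.encode())
bytes_scanned_columns = len(column_layout["amount"].encode())

print(total_from_rows == total_from_columns)
print(bytes_scanned_columns < bytes_scanned_rows)
```

Both layouts give the same answer, but the column layout touches only the bytes of the one column the query needs. Services that bill by data scanned, such as Athena, benefit directly from this.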

4

AWS Batch

Manages batch computing workloads; automatically provisions compute resources, schedules jobs, and handles scaling for large-scale processing
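
As a rough sketch of what "varying compute requirements" looks like in practice, the helper below builds the keyword arguments for the boto3 Batch `submit_job` call, with per-job vCPU and memory overrides. The helper name `build_submit_job_request` and the queue, job definition, and command values are hypothetical; the actual call is left commented out because it needs AWS credentials and existing Batch resources.

```python
def build_submit_job_request(job_name, job_queue, job_definition,
                             vcpus, memory_mib, command):
    """Build keyword arguments for boto3's Batch submit_job call."""
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            "command": command,
            # Resource values are passed as strings in this API
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
        },
    }


# Hypothetical queue and job definition names
request = build_submit_job_request(
    job_name="convert-2024-01-01",
    job_queue="nightly-etl-queue",
    job_definition="csv-to-parquet:1",
    vcpus=2,
    memory_mib=4096,
    command=["python", "convert.py", "--date", "2024-01-01"],
)
print(request["containerOverrides"]["resourceRequirements"])

# With credentials configured, the request could be submitted like this:
# import boto3
# boto3.client("batch").submit_job(**request)
```

Each job can request different resources against the same queue, and Batch handles provisioning and scheduling the underlying compute.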

5

Tip:

For storage-related exam questions, match the service to the data workflow:

  • Glue for ETL and cataloging
  • Batch for batch processing
  • Optimized formats like Parquet for analytics efficiency

6

You want to convert CSV files in S3 into a more efficient, columnar format for analytics. Which service and format should you use?

A) AWS Batch + JSON
B) AWS Glue + Apache Parquet
C) S3 Transfer Acceleration + ORC
D) Athena + CSV

Answer: B – Glue can convert files into Apache Parquet for analytics efficiency.

7

You need to automatically discover the schema of new files arriving in S3 and update your metadata catalog. What AWS service should you use?

A) AWS Batch
B) Glue Crawler
C) Athena
D) DataSync

Answer: B – Glue Crawler scans data sources and updates the Glue Data Catalog automatically.

8

Why would you store analytics data in Parquet format instead of CSV?

A) Parquet is row-based
B) Parquet is compressed and columnar, improving query performance
C) Parquet cannot be read by Athena
D) Parquet increases storage cost

Answer: B – Parquet's columnar, compressed layout improves query performance and reduces storage costs.

9

You have thousands of data processing jobs that need to run daily with varying compute requirements. Which service is best suited?

A) Lambda
B) AWS Batch
C) Step Functions
D) EC2 Auto Scaling

Answer: B – AWS Batch manages compute provisioning, scheduling, and scaling for batch workloads.
