Section 1: Data Preparation and Ingestion - DEEPSEEK

30 Terms

1
Data Profiling
The process of examining data to collect statistics and information about its structure.
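As a rough illustration of profiling, the sketch below counts rows, empty values, and distinct values per column. The sample data and the `profile_rows` helper are hypothetical, and real profiling tools report far more than this.

```python
import csv
import io
from collections import Counter


def profile_rows(rows):
    """Collect simple per-column statistics from a list of dict rows."""
    collected = {}
    for row in rows:
        for column, value in row.items():
            stats = collected.setdefault(
                column, {"count": 0, "nulls": 0, "values": Counter()}
            )
            stats["count"] += 1
            if value is None or value == "":
                stats["nulls"] += 1  # treat empty strings as missing
            else:
                stats["values"][value] += 1
    return {
        column: {
            "count": stats["count"],
            "nulls": stats["nulls"],
            "distinct": len(stats["values"]),
        }
        for column, stats in collected.items()
    }


# Hypothetical sample: the second row has no country.
SAMPLE_CSV = "id,country\n1,US\n2,\n3,US\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
profile = profile_rows(rows)
```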
2
Data Validation
Ensuring data conforms to predefined rules and constraints to maintain accuracy and consistency.
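A minimal sketch of rule-based validation, assuming three made-up rules (a required `id`, an `age` range, and a crude `email` check). The field names and thresholds are illustrative only.

```python
def validate_record(record):
    """Return a list of rule violations for one record (empty if valid)."""
    errors = []
    # Rule 1: id must be present and non-empty.
    if not record.get("id"):
        errors.append("id is required")
    # Rule 2: age must be an integer in a plausible range.
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        errors.append("age must be an integer between 0 and 120")
    # Rule 3: a deliberately crude email check.
    if "@" not in str(record.get("email", "")):
        errors.append("email must contain '@'")
    return errors


good = validate_record({"id": "a1", "age": 30, "email": "x@example.com"})
bad = validate_record({"id": "", "age": 200, "email": "nope"})
```

Returning all violations, rather than stopping at the first, makes it easier to report every problem with a record in a single pass.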
3
Data Transformation
Converting data from one format or structure to another.
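One possible transformation, sketched with the standard library: CSV text is converted to newline-delimited JSON, with types cast and names cleaned along the way. The column names and the `csv_to_json_lines` helper are assumptions for illustration.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text):
    """Convert CSV text with id,name columns to newline-delimited JSON."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        record = {
            "id": int(row["id"]),                  # cast string to integer
            "name": row["name"].strip().title(),   # trim and normalize case
        }
        lines.append(json.dumps(record, sort_keys=True))
    return "\n".join(lines)


# Hypothetical input with inconsistent spacing and casing.
SAMPLE = "id,name\n1, ada lovelace \n2,GRACE HOPPER\n"
output = csv_to_json_lines(SAMPLE)
```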
4
ETL (Extract-Transform-Load)
A methodology where data is transformed before loading into a target system.
5
ELT (Extract-Load-Transform)
A methodology where raw data is loaded into the target system first and transformed there afterward.
6
ETLT
A hybrid approach combining ETL and ELT for complex workflows.
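The difference between ETL and ELT is mostly about where the transformation runs. The sketch below uses an in-memory SQLite database as a stand-in for the target system; the table names and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw extract: ids as strings, names with messy casing.
RAW = [("1", " alice "), ("2", "BOB")]


def run_etl(conn):
    # ETL: transform in application code first, then load the cleaned rows.
    cleaned = [(int(i), name.strip().lower()) for i, name in RAW]
    conn.execute("CREATE TABLE users_etl (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users_etl VALUES (?, ?)", cleaned)


def run_elt(conn):
    # ELT: load raw rows unchanged, then transform with SQL inside the target.
    conn.execute("CREATE TABLE raw_users (id TEXT, name TEXT)")
    conn.executemany("INSERT INTO raw_users VALUES (?, ?)", RAW)
    conn.execute(
        "CREATE TABLE users_elt AS "
        "SELECT CAST(id AS INTEGER) AS id, LOWER(TRIM(name)) AS name "
        "FROM raw_users"
    )


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_rows = conn.execute("SELECT id, name FROM users_etl ORDER BY id").fetchall()
elt_rows = conn.execute("SELECT id, name FROM users_elt ORDER BY id").fetchall()
```

Both paths end with the same cleaned table. With ELT the raw table also stays available in the target for later reprocessing.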
7
Storage Transfer Service
A Google Cloud tool for transferring data from on-premises or cloud storage to Cloud Storage.
8
Transfer Appliance
A physical hardware device for securely transferring large datasets offline to Google Cloud.
9
Data Quality Assessment
Evaluating data for completeness and related quality dimensions such as accuracy and consistency.
10
Cloud Data Fusion
A fully managed data integration service for building ETL/ELT pipelines.
11
BigQuery Data Transfer Service
Automates data imports from SaaS applications into BigQuery.
12
CSV (Comma-Separated Values)
A plain-text format for tabular data, with one record per line and fields separated by commas.
13
JSON (JavaScript Object Notation)
A semi-structured format for nested or hierarchical data.
14
Apache Parquet
A columnar storage format optimized for analytics workloads.
15
Apache Avro
A row-based format supporting schema evolution and efficient serialization.
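To see why a columnar layout suits analytics while a row layout suits record-at-a-time processing, the sketch below pivots a few hypothetical rows into columns using plain Python structures. It only illustrates the layout idea and is not how Parquet or Avro actually encode data.

```python
# Row-oriented layout: each record is stored together (the Avro-style view).
rows = [
    {"id": 1, "country": "US", "amount": 10.0},
    {"id": 2, "country": "DE", "amount": 20.5},
    {"id": 3, "country": "US", "amount": 5.0},
]


def to_columns(records):
    """Pivot row records into one list per column (the Parquet-style view)."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one field only needs to read that one column.
total_amount = sum(columns["amount"])

# Reading one full record is a single lookup in the row layout.
second_record = rows[1]
```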
16
Structured Data
Data organized in predefined schemas.
17
Unstructured Data
Data with no predefined schema.
18
Semi-Structured Data
Data with partial organization, such as JSON documents.
19
Cloud Storage
A scalable object storage service for unstructured/semi-structured data.
20
BigQuery
A serverless data warehouse for structured analytics using SQL.
21
Cloud SQL
A managed relational database service for MySQL, PostgreSQL, and SQL Server.
22
Firestore
A NoSQL document database for mobile and web applications.
23
Bigtable
A high-performance, wide-column NoSQL database for large-scale analytical and operational workloads.
24
Spanner
A globally distributed relational database with horizontal scalability.
25
Regional Storage
Data stored in a single geographic region for low-latency access.
26
Dual-Regional Storage
Data replicated across two regions for higher availability.
27
Multi-Regional Storage
Data stored redundantly across multiple regions within a large geographic area (such as the US, EU, or Asia) for maximum redundancy.
28
Zonal Storage
Data stored in a single zone within a region (lower redundancy).
29
gcloud CLI
A command-line tool for interacting with Google Cloud services.
30
bq CLI
A command-line interface specifically for BigQuery operations.