Data Engineering with AWS Flashcards


Description and Tags

Flashcards for reviewing data engineering concepts with AWS, focusing on data modeling, relational and NoSQL databases, Apache Cassandra, and database structuring techniques.


42 Terms

1

Data Modeling

An abstraction that organizes elements of data and how they relate to each other; the starting point for designing a database.

2

Conceptual Data Modeling

Mapping out the concepts (entities) a database will hold, similar to naming the columns in an Excel spreadsheet.

3

Logical Data Modeling

Mapping conceptual models to tables, schemas, and columns, making them more practical.

4

Physical Data Modeling

Turning logical data models into the database's Data Definition Language (DDL).
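
As a rough illustration of the physical step, the sketch below writes DDL for a hypothetical `customers` table and runs it against an in-memory SQLite database through Python's standard `sqlite3` module. The table and column names are assumptions made up for this example.

```python
import sqlite3

# Hypothetical logical model: a "customers" table with three columns.
DDL = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT
);
"""

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(DDL)

# Ask the database which columns it actually created.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]
print(columns)
```

The same logical model could be expressed in another database's DDL dialect; only the physical statement changes.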

5

Relational Databases

Databases that organize data into tables with rows and columns, each row having a unique key.

6

Relational Database Management System (RDBMS)

Software for managing relational databases.

7

SQL

A language used to interact with relational databases.

8

ACID Transactions

Properties guaranteeing database transaction validity, including Atomicity, Consistency, Isolation, and Durability.

9

Atomicity

All-or-nothing processing of a transaction: either every step commits or none does.
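
A minimal sketch of atomicity, assuming a hypothetical `accounts` table in an in-memory SQLite database. The `transfer` helper is invented for this example; it relies on `sqlite3`'s connection context manager, which commits on success and rolls back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            new_balance = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()[0]
            if new_balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

ok = transfer(conn, "alice", "bob", 500)  # fails midway, so the debit is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, both balances are unchanged because the partial debit was rolled back.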

10

Consistency

Only transactions abiding by certain rules can change the database.

11

Isolation

Transactions are processed independently of each other.

12

Durability

Completed transactions are saved even if the system fails.

13

NoSQL Databases

Non-relational databases built for simpler design, horizontal scaling, and finer control over availability.

14

Apache Cassandra

A NoSQL database that distributes data by partitions across nodes and servers, organized in columns and rows.

15

Keyspace

Collection of tables in Apache Cassandra.

16

Table (Cassandra)

Group of partitions in Apache Cassandra.

17

Partition (Cassandra)

Fundamental unit of access in Cassandra; a collection of rows.

18

Primary Key (Cassandra)

Consists of a partition key and, optionally, clustering columns in Cassandra.

19

MongoDB

A NoSQL document store: in addition to the key lookups a key-value store performs, it offers an API that retrieves documents based on their contents.

20

DynamoDB

A NoSQL database in which data is represented as a collection of key-value pairs.

21

Apache HBase

A NoSQL database that uses tables, rows, and columns but allows column names and formats to vary from row to row.

22

Neo4j

A NoSQL database focused on relationships between entities, representing data as nodes and edges.
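
To make the node-and-edge idea concrete, here is a toy in-memory graph in plain Python. It is not Neo4j's API; the node names, relationship labels, and the `neighbors` and `coworkers` helpers are all hypothetical.

```python
# Nodes carry properties; edges are (source, relationship, target) triples.
nodes = {
    "alice": {"label": "Person"},
    "bob": {"label": "Person"},
    "acme": {"label": "Company"},
}
edges = [
    ("alice", "KNOWS", "bob"),
    ("alice", "WORKS_AT", "acme"),
    ("bob", "WORKS_AT", "acme"),
]

def neighbors(node, relation):
    """Follow outgoing edges of one relationship type."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]

def coworkers(person):
    """People who share at least one employer with `person`."""
    employers = set(neighbors(person, "WORKS_AT"))
    return sorted(
        src for src, rel, dst in edges
        if rel == "WORKS_AT" and dst in employers and src != person
    )
```

A query such as `coworkers("alice")` walks relationships directly instead of joining tables, which is the kind of access pattern graph databases are built around.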

23

CQL (Cassandra Query Language)

Cassandra’s query language, similar to SQL but without JOINs or subqueries; GROUP BY, in versions that support it, is limited to primary key columns.

24

Normalization

The process of structuring tables to reduce data redundancy and increase data integrity.

25

Denormalization

Adding redundant copies of data to improve read performance, at the expense of write performance.
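
The read/write trade-off can be sketched with plain Python structures. The artist and album data and the `rename_artist_denorm` helper are hypothetical, chosen only to show that a denormalized read needs no lookup while an update must touch every copy.

```python
# Normalized: one copy of each artist name, referenced by id.
artists = {1: "The Beatles"}
albums = [
    {"album": "Let It Be", "artist_id": 1},
    {"album": "Rubber Soul", "artist_id": 1},
]

# Denormalized: the artist name is copied into every album row,
# so a read needs no second lookup (no join).
albums_denorm = [
    {"album": a["album"], "artist_name": artists[a["artist_id"]]}
    for a in albums
]

def rename_artist_denorm(rows, old, new):
    """Rename an artist in the denormalized rows; return how many rows changed."""
    touched = 0
    for row in rows:
        if row["artist_name"] == old:
            row["artist_name"] = new
            touched += 1
    return touched

rows_rewritten = rename_artist_denorm(albums_denorm, "The Beatles", "Beatles")
```

In the normalized layout the same rename would change a single entry in `artists`; here it rewrites one row per album.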

26

Normal Form

A set of rules that keeps a database free from unwanted insertion, update, and deletion dependencies.

27

First Normal Form (1NF)

Each cell holds a single, atomic value; no column contains sets, collections, or lists.
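
A small sketch of moving data into 1NF, assuming hypothetical records in which a `songs` column holds a list. The `to_first_normal_form` helper is invented for this example and emits one row per single value.

```python
# Not in 1NF: the "songs" cell holds a list of values.
raw = [
    {"customer": "Amanda", "songs": ["Help!", "Let It Be"]},
    {"customer": "Toby", "songs": ["My Generation"]},
]

def to_first_normal_form(records):
    """Emit one row per (customer, song) so every cell is a single value."""
    return [
        {"customer": record["customer"], "song": song}
        for record in records
        for song in record["songs"]
    ]

rows_1nf = to_first_normal_form(raw)
```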

28

Second Normal Form (2NF)

The table is in 1NF and every non-key column depends on the entire primary key, not on just part of a composite key.

29

Third Normal Form (3NF)

The table is in 2NF and has no transitive dependencies: non-key columns depend only on the key, not on other non-key columns.
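
One way to picture removing a transitive dependency, using a hypothetical customer table in which `city` depends on `zip_code`, which in turn depends on the key `customer_id`. All names here are assumptions for illustration.

```python
# city depends on zip_code, and zip_code depends on customer_id:
# a transitive dependency of city on the key.
customers = [
    {"customer_id": 1, "name": "Amanda", "zip_code": "10001", "city": "New York"},
    {"customer_id": 2, "name": "Toby", "zip_code": "10001", "city": "New York"},
    {"customer_id": 3, "name": "Max", "zip_code": "94105", "city": "San Francisco"},
]

# Move the zip_code -> city fact into its own table ...
zip_codes = {row["zip_code"]: row["city"] for row in customers}

# ... and keep only the columns that depend directly on customer_id.
customers_3nf = [
    {key: value for key, value in row.items() if key != "city"}
    for row in customers
]

def city_of(customer_id):
    """Recover the city by following customer -> zip_code -> city."""
    row = next(r for r in customers_3nf if r["customer_id"] == customer_id)
    return zip_codes[row["zip_code"]]
```

Each city name is now stored once per zip code, so correcting it is a single update.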

30

Fact Tables

Tables holding the measurements, metrics, or facts of a business process, often numeric and aggregated.

31

Dimension Tables

Tables that categorize facts and measures to help answer business questions, typically describing people, products, places, or time.

32

Star Schema

One or more fact tables referencing any number of dimension tables, often denormalized.
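
A minimal star schema sketch in SQLite via Python's `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_store`) and the sample rows are hypothetical; the point is only that the fact table references dimension tables and is aggregated by joining to them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    amount     INTEGER
);
INSERT INTO dim_product VALUES (1, 'guitar'), (2, 'drums');
INSERT INTO dim_store   VALUES (1, 'Austin'), (2, 'Boston');
INSERT INTO fact_sales  VALUES (1, 1, 300), (1, 2, 200), (2, 1, 500);
""")

# Aggregate the numeric fact, categorized by a dimension attribute.
revenue_by_city = conn.execute("""
    SELECT s.city, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
```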

33

Snowflake Schema

Logical arrangement of tables in a multidimensional database with a centralized fact table connected to multiple dimensions, where the dimensions are themselves normalized into further related tables.

34

Distributed Database

A database that is scaled out horizontally and runs across multiple machines.

35

Eventual Consistency

Guarantees that if no new updates are made to a data item, all accesses to that item will eventually return the last updated value; until then, some reads may be stale.
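
A toy simulation of this behavior, assuming three in-memory replicas and a simple last-write-wins rule. The `Replica` class and `sync` function are hypothetical and do not reflect how any particular database replicates data.

```python
class Replica:
    """One copy of a single data item, tagged with a version number."""

    def __init__(self):
        self.value = None
        self.version = 0

    def apply(self, value, version):
        # Last-write-wins: accept only versions newer than the current one.
        if version > self.version:
            self.value, self.version = value, version

replicas = [Replica() for _ in range(3)]
replicas[0].apply("v1", 1)  # the write has reached only one replica so far

stale_reads = [r.value for r in replicas]

def sync(replicas):
    """Propagate the newest version to every replica."""
    newest = max(replicas, key=lambda r: r.version)
    for replica in replicas:
        replica.apply(newest.value, newest.version)

sync(replicas)
converged_reads = [r.value for r in replicas]
```

Before `sync` runs, two replicas still return the old value; once updates stop and propagation completes, every replica returns the same result.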

36

CAP Theorem

It is impossible for a distributed data store to guarantee more than two out of the three qualities: Consistency, Availability, and Partition Tolerance.

37

Consistency (CAP Theorem)

Every read gets the most recent write or returns an error.

38

Availability (CAP Theorem)

Every request gets a response, but there’s no guarantee that the data is the latest update.

39

Partition Tolerance (CAP Theorem)

The system continues to function even when network connectivity between nodes is lost.
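
The trade-off among the three CAP qualities can be sketched with a toy node that is cut off from its peers. The `Node` class and the "CP" and "AP" mode labels are hypothetical: a consistency-first node refuses to answer during a partition, while an availability-first node answers with whatever it has.

```python
class Node:
    """A single node holding one value; may be cut off from its peers."""

    def __init__(self, mode):
        self.mode = mode          # "CP" favors consistency, "AP" favors availability
        self.value = "old"        # may be out of date if a newer write landed elsewhere
        self.partitioned = False  # True when the node cannot reach its peers

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Cannot confirm this is the latest value, so return an error.
            raise RuntimeError("unavailable during network partition")
        return self.value         # AP: always answer, possibly with stale data

cp_node, ap_node = Node("CP"), Node("AP")
cp_node.partitioned = ap_node.partitioned = True

ap_answer = ap_node.read()
try:
    cp_answer = cp_node.read()
except RuntimeError as error:
    cp_answer = f"error: {error}"
```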

40

Primary Key (General)

How each row is uniquely identified and how data is distributed between nodes/servers in the system.

41

Partition Key

First element of the primary key in NoSQL databases such as Cassandra; it determines how data is distributed across nodes.

42

Clustering Columns

The columns of the primary key that follow the partition key; they determine the sort order of rows within a partition.
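
The roles of the two parts of the primary key can be sketched in plain Python. This is not Cassandra's implementation: `hashlib.md5` stands in for the real partitioner, and the node names, the `node_for` helper, and the album rows are all hypothetical.

```python
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster

def node_for(partition_key: str) -> str:
    """Pick a node from a hash of the partition key (md5 is a stand-in partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Hypothetical rows with primary key (artist, year):
# artist is the partition key, year is the clustering column.
rows = [
    ("The Beatles", 1970, "Let It Be"),
    ("The Beatles", 1965, "Rubber Soul"),
    ("The Who", 1965, "My Generation"),
]

partitions = defaultdict(list)
for artist, year, album in rows:
    partitions[artist].append((year, album))  # same partition key, same partition

for artist in partitions:
    partitions[artist].sort()  # the clustering column orders rows inside the partition

placement = {artist: node_for(artist) for artist in partitions}
```

All rows sharing a partition key land on the same node, and within that partition they are kept sorted by the clustering column.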