Data Acquisition (22%) - Domain #3

Last updated 1:29 AM on 6/23/26
39 Terms

1

Data integration

The process of combining data from multiple sources into a unified, consistent view for analysis or reporting

2

SQL JOIN

A query operation that combines rows from two or more tables based on a matching related column (key)
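
A minimal sketch of an inner join using Python's built-in `sqlite3` module and an in-memory database. The `customers` and `orders` tables, their columns, and the sample rows are hypothetical. The point is that rows are paired only where `customer_id` matches in both tables.

```python
import sqlite3

def joined_orders():
    """Return (customer name, order amount) pairs matched on customer_id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
    """)
    # INNER JOIN keeps only rows whose customer_id appears in both tables,
    # so 'Cy' (who has no orders) is dropped from the result.
    rows = conn.execute("""
        SELECT c.name, o.amount
        FROM customers AS c
        INNER JOIN orders AS o ON o.customer_id = c.customer_id
        ORDER BY o.order_id
    """).fetchall()
    conn.close()
    return rows

print(joined_orders())  # [('Ana', 25.0), ('Ana', 40.0), ('Ben', 15.0)]
```

Changing `INNER JOIN` to `LEFT JOIN` would also keep Cy, with `NULL` (Python `None`) as the amount.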

3

Concatenate (SQL)

A function or operator (CONCAT() in many databases, || in standard SQL) that joins two or more strings or column values into a single combined value
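
A small illustration using Python's `sqlite3` module, which supports the standard `||` operator. The `people` table and its columns are hypothetical.

```python
import sqlite3

def full_names():
    """Build 'first last' strings by concatenating two columns and a literal space."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
    conn.executemany(
        "INSERT INTO people VALUES (?, ?)",
        [("Ada", "Lovelace"), ("Alan", "Turing")],
    )
    # || joins first_name, a literal space, and last_name into one value.
    rows = conn.execute(
        "SELECT first_name || ' ' || last_name AS full_name FROM people ORDER BY last_name"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(full_names())  # ['Ada Lovelace', 'Alan Turing']
```

In SQLite, concatenating with a NULL value yields NULL, so columns that may be missing are often wrapped in `COALESCE` first.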

4

Filter (SQL)

A query operation using a WHERE clause that limits results to only the rows meeting specific conditions (HAVING applies conditions to groups after aggregation)

5

UNION (SQL)

A query operation that combines the result sets of two or more SELECT statements and removes duplicate rows

6

UNION ALL (SQL)

A query operation that combines the result sets of two or more SELECT statements and keeps all rows, including duplicates
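
A sketch contrasting UNION and UNION ALL with `sqlite3`. The `store_a` and `store_b` tables are hypothetical. Both SELECT statements return the same number of columns, which both operators require.

```python
import sqlite3

def union_vs_union_all():
    """Return the combined city lists produced by UNION and by UNION ALL."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE store_a (city TEXT);
        CREATE TABLE store_b (city TEXT);
        INSERT INTO store_a VALUES ('Austin'), ('Boston');
        INSERT INTO store_b VALUES ('Boston'), ('Denver');
    """)
    # UNION removes the duplicate 'Boston'; UNION ALL keeps both copies.
    union = conn.execute(
        "SELECT city FROM store_a UNION SELECT city FROM store_b ORDER BY city"
    ).fetchall()
    union_all = conn.execute(
        "SELECT city FROM store_a UNION ALL SELECT city FROM store_b ORDER BY city"
    ).fetchall()
    conn.close()
    return [r[0] for r in union], [r[0] for r in union_all]

print(union_vs_union_all())
# (['Austin', 'Boston', 'Denver'], ['Austin', 'Boston', 'Boston', 'Denver'])
```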

7

Grouping (SQL)

A query operation using GROUP BY that organizes rows with the same values into summary groups for aggregation

8

Aggregate function

A function that performs a calculation on multiple rows and returns one result — examples: SUM, COUNT, AVG, MIN, MAX
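
A sketch combining GROUP BY with the five aggregates listed above, again using `sqlite3` with a hypothetical `sales` table.

```python
import sqlite3

def sales_by_region():
    """Summarize sales per region with COUNT, SUM, AVG, MIN, and MAX."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("East", 100.0), ("East", 50.0), ("West", 80.0), ("West", 20.0), ("West", 60.0)],
    )
    # GROUP BY forms one group per region; each aggregate collapses a group to one value.
    rows = conn.execute("""
        SELECT region, COUNT(*), SUM(amount), AVG(amount), MIN(amount), MAX(amount)
        FROM sales
        GROUP BY region
        ORDER BY region
    """).fetchall()
    conn.close()
    return rows

print(sales_by_region())
```

Each region produces exactly one output row, for example `('East', 2, 150.0, 75.0, 50.0, 100.0)`.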

9

Nested query / Subquery

A SELECT query written inside another query — used to pass results or conditions to the outer query
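
A sketch of a scalar subquery in a WHERE clause, using `sqlite3` and a hypothetical `orders` table. The inner SELECT computes one value (the average amount, 30.0 here), and the outer query uses it as its filter condition.

```python
import sqlite3

def above_average_orders():
    """Return orders whose amount exceeds the overall average amount."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 30.0), (3, 50.0)])
    # The parenthesized SELECT is the subquery; its single result feeds the outer WHERE.
    rows = conn.execute("""
        SELECT order_id, amount
        FROM orders
        WHERE amount > (SELECT AVG(amount) FROM orders)
    """).fetchall()
    conn.close()
    return rows

print(above_average_orders())  # [(3, 50.0)]
```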

10

Indexing (query optimization)

Creating a data structure on a column to speed up data retrieval without scanning the entire table
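
A sketch that uses `EXPLAIN QUERY PLAN` to ask SQLite how it would run the same lookup with and without an index. The table and the index name `idx_orders_customer` are hypothetical. The exact wording of the plan text varies between SQLite versions, so only the key words are worth relying on.

```python
import sqlite3

def query_plan(with_index):
    """Return SQLite's plan description for a lookup on orders.customer_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    if with_index:
        # An index on the filtered column lets SQLite seek directly instead of scanning.
        conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT amount FROM orders WHERE customer_id = ?", (42,)
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable detail text.
    return " ".join(row[-1] for row in plan)

print(query_plan(False))
print(query_plan(True))
```

Without the index, the plan reports a full SCAN of the table. With it, SQLite SEARCHes using the index. The trade-off is extra storage and slower INSERT, UPDATE, and DELETE statements, because every index must be kept up to date.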

11

Parameterization (query optimization)

Using placeholders instead of hard-coded values in queries, which can let the database reuse a cached execution plan and helps prevent SQL injection
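
A sketch of a parameterized lookup with `sqlite3`, which uses `?` as its placeholder (other database drivers may use `%s` or named parameters). The `users` table and the `find_user` helper are hypothetical.

```python
import sqlite3

def find_user(conn, username):
    # The ? placeholder sends username as data, never as SQL text, so input
    # like "x' OR '1'='1" is compared literally instead of altering the query.
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))         # [(1, 'alice')]
print(find_user(conn, "x' OR '1'='1"))  # []
```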

12

Subsets (query optimization)

Restricting a query to return only the data columns or rows needed — reduces processing time and resource use

13

Temporary tables (query optimization)

Short-lived tables created during a session to store intermediate results and simplify complex multi-step queries

14

ETL (Extract, Transform, Load)

A process where data is extracted from a source, transformed and cleaned before it reaches the destination (typically in a staging area), then loaded into the destination system

15

ELT (Extract, Load, Transform)

A process where data is extracted and loaded into the destination first — then transformed within that system

16

Surveying

A primary data collection method that gathers information directly from individuals using structured questions

17

Sampling

A data collection method that selects a representative subset of a larger population for analysis
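
A minimal sketch of simple random sampling with Python's standard `random` module. The `simple_random_sample` helper and the customer ID range are hypothetical. Fixing the seed makes the sample reproducible. Random selection makes a sample representative only on average, not guaranteed, especially when k is small.

```python
import random

def simple_random_sample(population, k, seed=None):
    # A dedicated Random instance gives a reproducible sequence when a seed is supplied.
    rng = random.Random(seed)
    # sample() draws k distinct items; every item has an equal chance of selection.
    return rng.sample(population, k)

customer_ids = list(range(1, 1001))
print(simple_random_sample(customer_ids, 5, seed=7))
```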

18

Missing values

Data values that are absent from a dataset — can skew analysis if not handled through imputation or removal

19

Duplication

The presence of repeated identical or near-identical records in a dataset — can inflate metrics and distort results
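
A sketch of removing duplicates from a list of dictionaries. The `drop_duplicates` helper and the field names are hypothetical. Trimming whitespace and lowercasing the key fields catches near-identical records as well as exact repeats.

```python
def drop_duplicates(records, key_fields):
    """Keep the first record for each combination of key_fields values."""
    seen = set()
    unique = []
    for rec in records:
        # Normalize text (trim + lowercase) so near-identical entries share a key.
        key = tuple(str(rec[f]).strip().lower() for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

rows = [
    {"email": "ana@example.com", "name": "Ana"},
    {"email": "ANA@example.com ", "name": "Ana"},
    {"email": "ben@example.com", "name": "Ben"},
]
print([r["name"] for r in drop_duplicates(rows, ["email"])])  # ['Ana', 'Ben']
```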

20

Redundancy

Unnecessary repetition of data across a dataset or database — wastes storage and can cause inconsistency

21

Outlier

A data point that falls significantly outside the normal range of the dataset — may indicate error or a real anomaly
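
One common way to flag outliers is the interquartile range (IQR) rule. It is shown here as an assumption, not the only method. Values more than 1.5 times the IQR below the first quartile or above the third quartile are flagged. The sketch uses Python's `statistics.quantiles` (default "exclusive" method) and hypothetical sensor readings.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

readings = [12, 13, 12, 14, 15, 13, 12, 14, 13, 95]
print(iqr_outliers(readings))  # [95]
```

A flagged value is only a candidate. It still needs review to decide whether it is an error or a genuine anomaly.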

22

Completeness

A data quality dimension measuring whether all required data fields are present and populated in a dataset

23

Validation

The process of verifying that data conforms to defined rules, formats, and quality standards before use
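
A sketch of rule-based validation for a single record. The field names and rules are hypothetical. The email pattern is a deliberately loose shape check, not a full address validator.

```python
import re
from datetime import date

# Hypothetical rules for a customer record; each check returns True when the value is valid.
RULES = {
    "customer_id": lambda v: isinstance(v, int) and v > 0,
    # Loose email shape check: something@something.something
    "email": lambda v: isinstance(v, str)
    and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "signup_date": lambda v: isinstance(v, date) and v <= date.today(),
}

def validate(record):
    """Return the names of fields that are missing or fail their rule."""
    return [field for field, check in RULES.items()
            if field not in record or not check(record[field])]

print(validate({"customer_id": 7, "email": "ana@example.com", "signup_date": date(2020, 1, 5)}))
# []
print(validate({"customer_id": -1, "email": "not-an-email"}))
# ['customer_id', 'email', 'signup_date']
```

Treating a missing field as a failure also makes the same function a simple completeness check.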

24

String manipulation

The process of modifying, formatting, or extracting parts of text data during transformation

25

RegEx (Regular Expressions)

A pattern-matching syntax used to find, validate, and manipulate text data based on defined character patterns
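
A sketch that uses Python's `re` module to find phone numbers in free text and normalize them. The pattern is a hypothetical example that handles US-style numbers only.

```python
import re

# Hypothetical pattern for numbers such as 555-123-4567 or (555) 123-4567.
PHONE = re.compile(r"\(?(\d{3})\)?[ -]?(\d{3})-(\d{4})")

def normalize_phones(text):
    """Find phone numbers in text and rewrite each one as ddd-ddd-dddd."""
    # findall returns one (area, prefix, line) tuple per match because the pattern has 3 groups.
    return [f"{a}-{b}-{c}" for a, b, c in PHONE.findall(text)]

print(normalize_phones("Call (555) 123-4567 or 555-987-6543."))
# ['555-123-4567', '555-987-6543']
```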

26

Conversion (data transformation)

Changing data from one format or data type to another, such as converting a string to an integer or changing a date format

27

Clustering (data transformation)

Grouping similar data points together based on shared characteristics during the transformation process

28

Binning

A transformation that groups continuous numeric values into discrete labeled ranges or categories (e.g., age: 0-17, 18-34)
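
A sketch of binning with custom edges using the standard `bisect` module. The first two ranges follow the age example above. The 35-64 and 65+ ranges are assumptions added to complete the scale.

```python
import bisect

# Lower bounds of each bin after the first; ages at or above an edge move to the next bin.
EDGES = [18, 35, 65]
LABELS = ["0-17", "18-34", "35-64", "65+"]

def age_bin(age):
    # bisect_right returns how many edges are <= age, which is the bin index.
    return LABELS[bisect.bisect_right(EDGES, age)]

print([age_bin(a) for a in (5, 18, 34, 35, 70)])
# ['0-17', '18-34', '18-34', '35-64', '65+']
```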

29

Augmentation

Enriching an existing dataset by adding new data from external sources to increase its usefulness

30

Exploding

A transformation that expands list-type or nested column values into separate individual rows
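
A sketch of exploding a list-valued field in plain Python. The `explode` helper and the `orders` data are hypothetical. A row whose list is empty produces no output here. In contrast, pandas' `DataFrame.explode` keeps such a row with a missing value.

```python
def explode(rows, column):
    """Return one output row per element of the list stored in `column`."""
    out = []
    for row in rows:
        for item in row[column]:
            # Copy the other fields and replace the list with a single element.
            out.append({**row, column: item})
    return out

orders = [
    {"order_id": 1, "items": ["pen", "ink"]},
    {"order_id": 2, "items": ["paper"]},
]
print(explode(orders, "items"))
# [{'order_id': 1, 'items': 'pen'}, {'order_id': 1, 'items': 'ink'}, {'order_id': 2, 'items': 'paper'}]
```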

31

Scaling

A transformation that adjusts numeric values to fit within a specific range — such as 0 to 1 (min-max scaling)
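
A sketch of min-max scaling in plain Python, x' = new_min + (x - min) / (max - min) * (new_max - new_min). The helper name is hypothetical. Mapping a constant column to new_min is an assumed convention that avoids dividing by zero.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly map values so the smallest becomes new_min and the largest new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no spread to scale, so map everything to new_min.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) / span * (new_max - new_min) for v in values]

print(min_max_scale([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
```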

32

Standardization

A transformation that rescales data so it has a mean of 0 and standard deviation of 1 (z-score normalization)
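
A sketch of z-score standardization, z = (x - mean) / standard deviation, using Python's `statistics` module. It uses the population standard deviation (`pstdev`). Some tools use the sample standard deviation (`stdev`) instead, which gives slightly smaller z-scores. A column with zero spread would raise ZeroDivisionError here.

```python
import statistics

def z_scores(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```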

33

Imputation

A technique for replacing missing data values with substituted values — such as the mean, median, or mode
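
A sketch of simple imputation with Python's `statistics` module, where `None` marks a missing value. The `impute` helper and the income figures are hypothetical.

```python
import statistics

def impute(values, strategy="median"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[strategy](observed)
    return [fill if v is None else v for v in values]

incomes = [42, None, 38, 120, None, 40]
print(impute(incomes, "median"))  # [42, 41.0, 38, 120, 41.0, 40]
print(impute(incomes, "mean"))
```

The mean (60) is pulled up by the 120 value, while the median (41.0) is not. This is why the median is often preferred for skewed data.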

34

Parsing

Breaking down a raw string or complex data structure into its individual component parts for processing

35

Merging (data transformation)

Combining two or more datasets horizontally into one based on a common key or matching column

36

Appending

Adding new rows from one dataset to the bottom of another dataset that shares the same structure
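
A sketch contrasting merging and appending on lists of dictionaries. The `merge` and `append` helpers and all data are hypothetical. Merging adds columns by matching a key (here an inner merge, assuming keys are unique in the right-hand dataset). Appending adds rows and requires matching columns.

```python
def merge(left, right, key):
    """Inner merge: add right's columns to each left row that shares the key value."""
    index = {row[key]: row for row in right}  # assumes key values are unique in `right`
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

def append(top, bottom):
    """Stack the rows of two datasets that have identical columns."""
    if top and bottom and set(top[0]) != set(bottom[0]):
        raise ValueError("datasets must share the same columns to append")
    return top + bottom

customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}]
regions = [{"id": 1, "region": "East"}, {"id": 2, "region": "West"}]
jan = [{"id": 1, "amount": 10}]
feb = [{"id": 2, "amount": 20}]

print(merge(customers, regions, "id"))
# [{'id': 1, 'name': 'Ana', 'region': 'East'}, {'id': 2, 'name': 'Ben', 'region': 'West'}]
print(append(jan, feb))
# [{'id': 1, 'amount': 10}, {'id': 2, 'amount': 20}]
```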

37

Derived variable / Calculated field

A new column created from existing data through formulas or logic — such as Profit = Revenue - Cost

38

Deletion (data transformation)

Removing unwanted, irrelevant, or corrupt rows and columns from a dataset during the cleaning process

39