Data Acquisition (22%) - Domain #3

Last updated 1:29 AM on 6/23/26

39 Terms

Data integration

The process of combining data from multiple sources into a unified, consistent view for analysis or reporting

SQL JOIN

A query operation that combines rows from two or more tables based on a matching related column (key)
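As a minimal sketch of a join, the example below uses Python's built-in sqlite3 module with an in-memory database. The `customers` and `orders` tables and their contents are hypothetical, made up only to show how an INNER JOIN matches rows on a key.

```python
import sqlite3

# Hypothetical tables for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 3, 15.0);
""")

# INNER JOIN keeps only rows whose key value appears in both tables.
join_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

print(join_rows)  # [('Ada', 25.0), ('Ada', 40.0)]
```

Grace has no orders and order 12 has no matching customer, so neither appears in the result. A LEFT JOIN would keep Grace with a NULL amount.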

Concatenate (SQL)

A query operation that joins two or more strings or column values together into a single combined value

Filter (SQL)

A query operation using a WHERE clause that limits results to only rows meeting specific conditions

UNION (SQL)

A query operation that combines the result sets of two or more SELECT statements and removes duplicate rows

UNION ALL (SQL)

A query operation that combines the result sets of two or more SELECT statements and keeps all rows, including duplicates
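To see the difference between UNION and UNION ALL side by side, here is a small sketch using sqlite3. The `store_a` and `store_b` tables are hypothetical, and one email address deliberately appears in both.

```python
import sqlite3

# Hypothetical tables; 'bob@example.com' appears in both on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_a (email TEXT);
    CREATE TABLE store_b (email TEXT);
    INSERT INTO store_a VALUES ('ann@example.com'), ('bob@example.com');
    INSERT INTO store_b VALUES ('bob@example.com'), ('cy@example.com');
""")

# UNION removes the duplicate row.
union_rows = conn.execute(
    "SELECT email FROM store_a UNION SELECT email FROM store_b ORDER BY email"
).fetchall()

# UNION ALL keeps every row from both result sets.
union_all_rows = conn.execute(
    "SELECT email FROM store_a UNION ALL SELECT email FROM store_b ORDER BY email"
).fetchall()

print(len(union_rows), len(union_all_rows))  # 3 4
```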

Grouping (SQL)

A query operation using GROUP BY that organizes rows with the same values into summary groups for aggregation

Aggregate function

A function that performs a calculation on multiple rows and returns one result — examples: SUM, COUNT, AVG, MIN, MAX
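Grouping and aggregate functions usually appear together. The sketch below, which assumes a hypothetical `sales` table, groups rows by region and computes COUNT, SUM, and AVG for each group.

```python
import sqlite3

# Hypothetical sales data for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('East', 100), ('East', 50), ('West', 70);
""")

# GROUP BY forms one group per region; each aggregate returns one value per group.
summary = conn.execute("""
    SELECT region, COUNT(*), SUM(amount), AVG(amount)
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()

print(summary)  # [('East', 2, 150.0, 75.0), ('West', 1, 70.0, 70.0)]
```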

Nested query / Subquery

A SELECT query written inside another query — used to pass results or conditions to the outer query
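One common use of a subquery is to compute a value that the outer query then filters on. This sketch assumes a hypothetical `employees` table and returns everyone paid more than the average salary.

```python
import sqlite3

# Hypothetical employee data for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary REAL);
    INSERT INTO employees VALUES ('Ann', 50000), ('Bob', 70000), ('Cy', 90000);
""")

# The inner SELECT computes the average (70000); the outer query compares against it.
above_average = conn.execute("""
    SELECT name
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()

print(above_average)  # [('Cy',)]
```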

Indexing (query optimization)

Creating a data structure on a column to speed up data retrieval without scanning the entire table
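One way to check whether an index is actually used is to ask the database for its query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN. The table and index names are hypothetical, and the exact wording of the plan text varies between SQLite versions, so the code only looks for the index name.

```python
import sqlite3

# Hypothetical table and index names for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    CREATE INDEX idx_orders_customer ON orders (customer_id);
""")

# Ask SQLite how it would run the query; the last column of each row is a text description.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = ?", (42,)
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)

# If the index is chosen, its name appears in the plan instead of a full table scan.
uses_index = "idx_orders_customer" in plan_text
print(plan_text)
```

Indexes speed up reads on the indexed column, but they take extra storage and add work to every insert and update on that table.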

Parameterization (query optimization)

Using placeholders instead of hard-coded values in queries, which lets the database reuse query plans and helps prevent SQL injection
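A short sketch of a parameterized query with sqlite3, which uses `?` as its placeholder style. The `users` table and the malicious-looking input are hypothetical. The point is that the driver treats the bound value as data, not as SQL text.

```python
import sqlite3

# Hypothetical users table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

# Input that would match every row if pasted directly into the SQL string.
user_input = "alice' OR '1'='1"

# The value is bound to the placeholder, so it is compared as a literal string.
injected = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()
legit = conn.execute(
    "SELECT name FROM users WHERE name = ?", ("alice",)
).fetchall()

print(injected, legit)  # [] [('alice',)]
```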

Subsets (query optimization)

Restricting a query to return only the data columns or rows needed — reduces processing time and resource use

Temporary tables (query optimization)

Short-lived tables created during a session to store intermediate results and simplify complex multi-step queries

ETL (Extract, Transform, Load)

A process where data is extracted from a source, transformed and cleaned outside the destination, then loaded into the destination

ELT (Extract, Load, Transform)

A process where data is extracted and loaded into the destination first — then transformed within that system

Surveying

A primary data collection method that gathers information directly from individuals using structured questions

Sampling

A data collection method that selects a representative subset of a larger population for analysis

Missing values

Data values that are absent from a dataset — can skew analysis if not handled through imputation or removal

Duplication

The presence of repeated identical or near-identical records in a dataset — can inflate metrics and distort results

Redundancy

Unnecessary repetition of data across a dataset or database — wastes storage and can cause inconsistency

Outlier

A data point that falls significantly outside the normal range of the dataset — may indicate error or a real anomaly
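The card does not name a detection method. One common convention, assumed here, is the 1.5 × IQR rule: flag anything more than 1.5 interquartile ranges below the first quartile or above the third. The sample values are made up, and quartile definitions differ slightly between tools, so the cutoffs may not match another package exactly.

```python
import statistics

# Made-up sample with one obviously extreme value.
values = [10, 12, 11, 13, 12, 11, 95]

# Quartiles using the statistics module's default ('exclusive') method.
q1, _median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Assumed rule: outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] counts as an outlier.
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < low or v > high]

print(outliers)  # [95]
```

Flagging a value does not decide whether it is an error or a real anomaly. That still needs a look at how the value was collected.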

Completeness

A data quality dimension measuring whether all required data fields are present and populated in a dataset

Validation

The process of verifying that data conforms to defined rules, formats, and quality standards before use

String manipulation

The process of modifying, formatting, or extracting parts of text data during transformation

RegEx (Regular Expressions)

A pattern-matching syntax used to find, validate, and manipulate text data based on defined character patterns
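A brief sketch of two typical RegEx uses with Python's re module: validating a format and cleaning a string. The date pattern and the sample inputs are assumptions for illustration. The pattern checks shape only, not whether the date exists on a calendar.

```python
import re

# Assumed target format: four digits, two digits, two digits, separated by hyphens.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

raw_dates = ["2024-01-15", "15/01/2024", "2024-1-5"]

# fullmatch requires the whole string to fit the pattern.
valid_dates = [s for s in raw_dates if DATE_PATTERN.fullmatch(s)]

# \D matches any non-digit character; removing them leaves only the digits.
phone_digits = re.sub(r"\D", "", "(555) 123-4567")

print(valid_dates, phone_digits)  # ['2024-01-15'] 5551234567
```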

Conversion (data transformation)

Changing data from one format or data type to another — such as string to integer or date format changes
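The two kinds of conversion named on the card can be sketched with the standard library alone. The input strings and the source date format (month/day/year) are assumptions for this example.

```python
from datetime import datetime

# Data type conversion: text to numbers.
quantity = int("42")
price = float("19.99")

# Format conversion: parse an assumed MM/DD/YYYY string, then write it as ISO 8601.
parsed = datetime.strptime("06/23/2026", "%m/%d/%Y")
iso_date = parsed.strftime("%Y-%m-%d")

print(quantity, price, iso_date)  # 42 19.99 2026-06-23
```

Both `int()` and `strptime()` raise ValueError on input that does not fit, which makes conversion a natural place to catch invalid values.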

Clustering (data transformation)

Grouping similar data points together based on shared characteristics during the transformation process

Binning

A transformation that groups continuous numeric values into discrete labeled ranges or categories (e.g., age: 0-17, 18-34)
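A minimal binning sketch using the age ranges from the card. The 35-64 and 65+ bins are added assumptions to make the example complete, and `age_bin` is a hypothetical helper name.

```python
from bisect import bisect_right

# Lower bounds of every bin after the first; the last two bins are assumed.
EDGES = [18, 35, 65]
LABELS = ["0-17", "18-34", "35-64", "65+"]

def age_bin(age: int) -> str:
    """Return the label of the range that contains age."""
    # bisect_right counts how many lower bounds are <= age.
    return LABELS[bisect_right(EDGES, age)]

ages = [5, 17, 18, 34, 35, 70]
binned = [age_bin(a) for a in ages]

print(binned)  # ['0-17', '0-17', '18-34', '18-34', '35-64', '65+']
```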

Augmentation

Enriching an existing dataset by adding new data from external sources to increase its usefulness

Exploding

A transformation that expands list-type or nested column values into separate individual rows
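Exploding can be sketched in plain Python with a nested comprehension. The order records below are hypothetical. Each list element becomes its own row, and the other columns are repeated.

```python
# Hypothetical records where 'items' holds a list per row.
orders = [
    {"order_id": 1, "items": ["pen", "ink"]},
    {"order_id": 2, "items": ["pad"]},
]

# One output row per list element; order_id is copied onto each.
exploded = [
    {"order_id": row["order_id"], "item": item}
    for row in orders
    for item in row["items"]
]

print(len(exploded))  # 3
```

Note that in this sketch a row with an empty list produces no output rows at all. Some tools keep such rows with a null value instead.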

Scaling

A transformation that adjusts numeric values to fit within a specific range — such as 0 to 1 (min-max scaling)
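Min-max scaling maps the smallest value to 0 and the largest to 1 using (x - min) / (max - min). A small sketch with made-up values, assuming the values are not all identical (otherwise the denominator is zero):

```python
# Made-up values for illustration.
values = [10, 20, 30, 50]

lo, hi = min(values), max(values)

# Assumes hi != lo; a constant column would need special handling.
scaled = [(v - lo) / (hi - lo) for v in values]

print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```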

Standardization

A transformation that rescales data so it has a mean of 0 and standard deviation of 1 (z-score normalization)
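The z-score is z = (x - mean) / standard deviation. The sketch below uses made-up values and the population standard deviation. Using the sample standard deviation instead is an equally common choice and gives slightly different numbers.

```python
import statistics

# Made-up values: mean is 5 and population standard deviation is 2.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.fmean(values)
std = statistics.pstdev(values)  # population standard deviation (an assumed choice)

z_scores = [(v - mean) / std for v in values]

print(z_scores)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```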

Imputation

A technique for replacing missing data values with substituted values — such as the mean, median, or mode
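A minimal imputation sketch that fills missing entries with the median of the observed values. The data is made up, and missing values are represented as None for this example.

```python
import statistics

# Made-up column with two missing entries (None).
ages = [25, None, 31, 40, None]

# Compute the fill value from observed entries only.
observed = [a for a in ages if a is not None]
fill_value = statistics.median(observed)

imputed = [fill_value if a is None else a for a in ages]

print(imputed)  # [25, 31, 31, 40, 31]
```

The median is less sensitive to outliers than the mean, which is one reason to prefer it for skewed data.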

Parsing

Breaking down a raw string or complex data structure into its individual component parts for processing

Merging (data transformation)

Combining two or more datasets horizontally into one based on a common key or matching column

Appending

Adding new rows from one dataset to the bottom of another dataset that shares the same structure

Derived variable / Calculated field

A new column created from existing data through formulas or logic — such as Profit = Revenue - Cost
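The Profit = Revenue - Cost example from the card, sketched over a list of hypothetical records:

```python
# Hypothetical records for illustration.
records = [
    {"revenue": 120.0, "cost": 80.0},
    {"revenue": 60.0, "cost": 75.0},
]

# Add the derived field to each record from its existing fields.
for record in records:
    record["profit"] = record["revenue"] - record["cost"]

profits = [record["profit"] for record in records]
print(profits)  # [40.0, -15.0]
```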

Deletion (data transformation)

Removing unwanted, irrelevant, or corrupt rows and columns from a dataset during the cleaning process
