Data integration
The process of combining data from multiple sources into a unified, consistent view for analysis or reporting
SQL JOIN
A query operation that combines rows from two or more tables based on a matching related column (key)
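Example: a minimal sketch of a join, run through Python's built-in `sqlite3` module. The `customers` and `orders` tables and their columns are hypothetical and exist only for illustration.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Match each order to its customer through the shared key.
join_rows = conn.execute("""
    SELECT c.name, o.total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

print(join_rows)
```

A plain `JOIN` is an inner join, so a customer with no orders would not appear in the result.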
Concatenate (SQL)
A query operation that joins two or more strings or column values together into a single combined value
Filter (SQL)
A query operation using a WHERE clause that limits results to only rows meeting specific conditions
UNION (SQL)
A query operation that combines the result sets of two or more SELECT statements and removes duplicate rows
UNION ALL (SQL)
A query operation that combines the result sets of two or more SELECT statements and keeps all rows, including duplicates
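Example: a small sketch contrasting UNION with UNION ALL, using Python's `sqlite3` module. The `store_a` and `store_b` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_a (city TEXT);
    CREATE TABLE store_b (city TEXT);
    INSERT INTO store_a VALUES ('Austin'), ('Boston');
    INSERT INTO store_b VALUES ('Boston'), ('Chicago');
""")

# UNION removes the duplicate 'Boston' row.
union_rows = conn.execute(
    "SELECT city FROM store_a UNION SELECT city FROM store_b ORDER BY city"
).fetchall()

# UNION ALL keeps every row from both result sets.
union_all_rows = conn.execute(
    "SELECT city FROM store_a UNION ALL SELECT city FROM store_b ORDER BY city"
).fetchall()

print(len(union_rows), len(union_all_rows))
```

Both SELECT statements must return the same number of columns for either operation to work.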
Grouping (SQL)
A query operation using GROUP BY that organizes rows with the same values into summary groups for aggregation
Aggregate function
A function that performs a calculation on multiple rows and returns one result — examples: SUM, COUNT, AVG, MIN, MAX
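Example: a sketch of GROUP BY combined with aggregate functions, using Python's `sqlite3` module. The `sales` table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('East', 100.0), ('East', 50.0), ('West', 30.0);
""")

# One output row per region; each aggregate collapses that region's rows.
summary = conn.execute("""
    SELECT region, COUNT(*), SUM(amount), AVG(amount)
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()

for region, n, total, average in summary:
    print(region, n, total, average)
```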
Nested query / Subquery
A SELECT query written inside another query — used to pass results or conditions to the outer query
Indexing (query optimization)
Creating a data structure on a column to speed up data retrieval without scanning the entire table
Parameterization (query optimization)
Using placeholders instead of hard-coded values in queries, which lets the database reuse the prepared statement and helps prevent SQL injection
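Example: a sketch of a parameterized query with `sqlite3`, which uses `?` as its placeholder. The `users` table and the `find_role` helper are hypothetical names chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("ada", "admin"), ("bob", "viewer")],
)

def find_role(conn, name):
    """Look up a role; the value is bound to the placeholder, never spliced into the SQL."""
    row = conn.execute("SELECT role FROM users WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None

# The injection attempt is treated as a literal name and matches nothing.
malicious = "ada' OR '1'='1"
print(find_role(conn, "ada"), find_role(conn, malicious))
```

Other database drivers use different placeholder styles, such as `%s` or named parameters, so the exact syntax depends on the driver.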
Subsets (query optimization)
Restricting a query to return only the data columns or rows needed — reduces processing time and resource use
Temporary tables (query optimization)
Short-lived tables created during a session to store intermediate results and simplify complex multi-step queries
ETL (Extract, Transform, Load)
A process where data is extracted from a source, transformed/cleaned outside the destination, then loaded in
ELT (Extract, Load, Transform)
A process where data is extracted and loaded into the destination first — then transformed within that system
Surveying
A primary data collection method that gathers information directly from individuals using structured questions
Sampling
A data collection method that selects a representative subset of a larger population for analysis
Missing values
Data values that are absent from a dataset — can skew analysis if not handled through imputation or removal
Duplication
The presence of repeated identical or near-identical records in a dataset — can inflate metrics and distort results
Redundancy
Unnecessary repetition of data across a dataset or database — wastes storage and can cause inconsistency
Outlier
A data point that falls significantly outside the normal range of the dataset — may indicate error or a real anomaly
Completeness
A data quality dimension measuring whether all required data fields are present and populated in a dataset
Validation
The process of verifying that data conforms to defined rules, formats, and quality standards before use
String manipulation
The process of modifying, formatting, or extracting parts of text data during transformation
RegEx (Regular Expressions)
A pattern-matching syntax used to find, validate, and manipulate text data based on defined character patterns
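Example: a sketch that uses Python's `re` module to validate and reformat text. The phone-number pattern and the `normalize_phone` helper are assumptions made for illustration and cover only simple 10-digit numbers.

```python
import re

# Three groups of digits, with optional parentheses and separators.
PHONE = re.compile(r"\(?(\d{3})\)?[-. ]?(\d{3})[-. ]?(\d{4})")

def normalize_phone(text):
    """Return the number as NNN-NNN-NNNN, or None if the text does not match."""
    match = PHONE.fullmatch(text.strip())
    if match is None:
        return None
    return "-".join(match.groups())

print(normalize_phone("(555) 123-4567"))
```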
Conversion (data transformation)
Changing data from one format or data type to another — such as string to integer or date format changes
Clustering (data transformation)
Grouping similar data points together based on shared characteristics during the transformation process
Binning
A transformation that groups continuous numeric values into discrete labeled ranges or categories (e.g., age: 0-17, 18-34)
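Example: a sketch of binning ages with the standard-library `bisect` module. The first two ranges come from the card above; the `35-64` and `65+` ranges are assumed here to complete the illustration.

```python
from bisect import bisect_right

# Lower bound of every bin after the first one.
EDGES = [18, 35, 65]
LABELS = ["0-17", "18-34", "35-64", "65+"]

def age_bin(age):
    """Map a numeric age to its labeled range."""
    return LABELS[bisect_right(EDGES, age)]

print([age_bin(a) for a in (5, 18, 34, 70)])
```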
Augmentation
Enriching an existing dataset by adding new data from external sources to increase its usefulness
Exploding
A transformation that expands list-type or nested column values into separate individual rows
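Example: a pure-Python sketch of exploding a list-valued column into one row per element. The `explode` helper and the sample rows are hypothetical. In this version a row with an empty list produces no output rows, which may differ from how a given data library handles that case.

```python
def explode(rows, column):
    """Return one copy of each row per element in row[column]."""
    out = []
    for row in rows:
        for value in row[column]:
            new_row = dict(row)      # copy so the input rows are not modified
            new_row[column] = value  # replace the list with a single element
            out.append(new_row)
    return out

rows = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": ["c"]}]
exploded = explode(rows, "tags")
print(exploded)
```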
Scaling
A transformation that adjusts numeric values to fit within a specific range — such as 0 to 1 (min-max scaling)
Standardization
A transformation that rescales data so it has a mean of 0 and standard deviation of 1 (z-score normalization)
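Example: a sketch comparing min-max scaling with z-score standardization, using only the standard library. It assumes the values are not all identical (otherwise both functions divide by zero) and uses the population standard deviation.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

values = [2.0, 4.0, 6.0, 8.0]
scaled = min_max_scale(values)
z_scores = standardize(values)
print(scaled)
print(z_scores)
```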
Imputation
A technique for replacing missing data values with substituted values — such as the mean, median, or mode
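Example: a sketch of median imputation in plain Python. It assumes missing values are represented as `None` and that at least one value is present; `impute_median` is a hypothetical helper name.

```python
from statistics import median

def impute_median(values):
    """Replace each None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

imputed = impute_median([10, None, 30, 20, None])
print(imputed)
```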
Parsing
Breaking down a raw string or complex data structure into its individual component parts for processing
Merging (data transformation)
Combining two or more datasets horizontally into one based on a common key or matching column
Appending
Adding new rows from one dataset to the bottom of another dataset that shares the same structure
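Example: a pure-Python sketch contrasting merging (horizontal, by key) with appending (vertical, same structure). Rows are modeled as dictionaries; `merge_on` and `append_rows` are hypothetical helpers, and `merge_on` keeps only rows whose key appears in both datasets.

```python
def merge_on(left, right, key):
    """Combine columns from both datasets for rows that share the same key value."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def append_rows(top, bottom):
    """Stack bottom under top, checking that both share the same columns."""
    if top and bottom and set(top[0]) != set(bottom[0]):
        raise ValueError("datasets do not share the same structure")
    return top + bottom

people = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
totals = [{"id": 1, "total": 25.0}]
merged = merge_on(people, totals, "id")
stacked = append_rows(people, [{"id": 3, "name": "Alan"}])
print(merged)
print(stacked)
```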
Derived variable / Calculated field
A new column created from existing data through formulas or logic — such as Profit = Revenue - Cost
Deletion (data transformation)
Removing unwanted, irrelevant, or corrupt rows and columns from a dataset during the cleaning process