Regression
Explains the relationship between dependent and independent variables
Coefficients
Show the direction and magnitude of relationships
Model significance
p-value of the F statistic (Significance F) less than 0.05 means the model is significant
Independent variable significance
p-value of the variable's coefficient less than 0.05
R-squared
Proportion of variation explained by the model
Adjusted R-squared
R-squared adjusted for sample size and number of independent variables
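The regression cards above can be tied together with a small worked example. This is a minimal sketch in plain Python, assuming one independent variable and made-up data; the function names `fit_line`, `r_squared`, and `adjusted_r_squared` are illustrative, not from any library.

```python
def fit_line(x, y):
    """Ordinary least squares with one independent variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx                      # coefficient: direction and magnitude
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(y, y_hat):
    """Proportion of variation in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalize R-squared for k independent variables and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = fit_line(x, y)
y_hat = [intercept + slope * a for a in x]
r2 = r_squared(y, y_hat)
adj_r2 = adjusted_r_squared(r2, n=len(y), k=1)
print(round(slope, 3), round(intercept, 3), round(r2, 3), round(adj_r2, 3))
```

Adjusted R-squared is lower than R-squared here because the sample is small relative to the number of variables.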
Optimization
Process to maximize or minimize an objective function
Decision variables
Unknown values the model determines
Objective function
Equation to minimize or maximize
Constraints
Limits such as demand, labor, or materials
Solver output
Shows optimal values and binding constraints
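To make the optimization terms concrete, here is a minimal sketch of a two-product mix problem. All numbers and constraint names (`machine_hours`, `labor_hours`, `materials`) are hypothetical, and the search is a brute-force enumeration of whole-number plans, which is not how a real solver works; it only illustrates decision variables, an objective function, constraints, and binding constraints.

```python
from itertools import product


def profit(x, y):
    """Objective function to maximize (hypothetical unit profits)."""
    return 3 * x + 5 * y


# Each constraint: (resource used by a plan, available limit). Hypothetical values.
constraints = {
    "machine_hours": (lambda x, y: x, 4),
    "labor_hours": (lambda x, y: 2 * y, 12),
    "materials": (lambda x, y: 3 * x + 2 * y, 18),
}


def feasible(x, y):
    """A plan is feasible when every constraint is within its limit."""
    return all(use(x, y) <= limit for use, limit in constraints.values())


# Decision variables x and y: enumerate whole-number plans from 0 to 10.
candidates = [(x, y) for x, y in product(range(11), repeat=2) if feasible(x, y)]
best = max(candidates, key=lambda plan: profit(*plan))

# A constraint is binding when the optimal plan uses the full limit.
binding = [name for name, (use, limit) in constraints.items()
           if use(*best) == limit]

print(best, profit(*best), binding)
```

In this sketch the optimal plan uses all available labor hours and materials, so those two constraints are binding, while machine hours has slack.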
Data profiling
Investigates data quality and structure
Correctness
Right values are assigned
Validity
Acceptable values are entered
Consistency
Same characteristic represented the same way
Completeness
No missing instances or values
Data matching
Reconciles related records across tables
Data matching issues
Nicknames, typos, reversed names
ETL tools
Provide advanced support for matching
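The matching issues listed above (nicknames, typos, reversed names) can be illustrated with Python's standard `difflib` module. This is a simplified sketch, not a description of how any ETL tool works; the `NICKNAMES` map and the helper names `normalize` and `similarity` are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical nickname lookup; a real one would be much larger.
NICKNAMES = {"bill": "william", "bob": "robert"}


def normalize(name):
    """Lowercase, drop commas, expand nicknames, and sort the name parts."""
    tokens = name.lower().replace(",", " ").split()
    tokens = [NICKNAMES.get(t, t) for t in tokens]
    return " ".join(sorted(tokens))  # sorting handles reversed names


def similarity(a, b):
    """Similarity ratio between 0 and 1 after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


print(similarity("Bill Jones", "Jones, William"))  # nickname + reversed name
print(similarity("Jon Smith", "John Smith"))       # typo
print(similarity("Jon Smith", "Maria Lopez"))      # unrelated record
```

Records whose similarity exceeds a chosen threshold can be flagged as likely matches for review; the threshold itself is a judgment call.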
Imputation
Replacing missing values with estimates
Imputation timing
During ETL cleaning step
Imputation purpose
Ensures data set remains usable
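A minimal sketch of imputation, assuming missing values are stored as `None` and that replacing them with the mean of the observed values is acceptable. Mean imputation is only one of several possible estimates, and `impute_mean` is an illustrative name.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Made-up column with two missing values.
raw = [10.0, None, 14.0, 12.0, None]
cleaned = impute_mean(raw)
print(cleaned)
```

Every row survives the cleaning step, which is the point of imputing rather than deleting incomplete records.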
Fact tables
Store quantitative transaction data
Dimension tables
Provide descriptive context
Fact data
Can be aggregated and measured
Dimension data
Who, what, when, where details
Star schema
Simple structure with fact and dimension tables
Snowflake schema
Normalized dimensions into multiple tables
Star advantage
Easier and faster analysis
Snowflake advantage
Reduces redundancy
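A star schema can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`), columns, and rows below are made up for illustration: one fact table holds measurable amounts, one dimension table supplies descriptive context, and a single join answers an aggregate question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive context (what was sold)
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT,
        category   TEXT
    );
    -- Fact table: quantitative transaction data
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        quantity   INTEGER,
        amount     REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'Widget', 'Hardware'),
        (2, 'Gadget', 'Hardware'),
        (3, 'Manual', 'Books');
    INSERT INTO fact_sales VALUES
        (1, 1, 2, 20.0),
        (2, 2, 1, 30.0),
        (3, 3, 4, 40.0),
        (4, 1, 1, 10.0);
""")

# Aggregate the facts, grouped by a dimension attribute.
rows = conn.execute("""
    SELECT d.category, SUM(f.quantity), SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
conn.close()

print(rows)
```

In a snowflake variant, `category` would move into its own table referenced from `dim_product`, which removes the repeated category text at the cost of one more join.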
Outliers
Unusual data points compared to others
Dirty data
Missing, invalid, duplicate, inconsistent entries
Dirty data controls
Visualization, tests, comparisons to documents
Invalid data
Unacceptable entry (e.g., text in number field)
Incorrect data
Wrong but valid entry (e.g., wrong PO number)
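The difference between invalid and incorrect data shows up in how each is detected. In this sketch (function names and values are hypothetical), validity is checked from the entry alone, while correctness requires comparing the entry with a source document.

```python
def is_valid_quantity(raw):
    """Validity: the entry must be a non-negative whole number."""
    try:
        return int(raw) >= 0
    except (TypeError, ValueError):
        return False


def is_correct_po(entered_po, source_document_po):
    """Correctness: the entry must agree with the source document."""
    return entered_po == source_document_po


print(is_valid_quantity("12"))              # valid entry
print(is_valid_quantity("twelve"))          # invalid: text in a number field
print(is_correct_po("PO-1042", "PO-1024"))  # valid format, but the wrong PO
```

A format check alone would accept the wrong PO number, which is why comparisons to documents are listed as a separate control.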
Categorical data
Non-numeric labels (e.g., gender)
Ordinal data
Ranked values (e.g., satisfaction levels)
Interval data
Equal spacing, no true zero (e.g., temperature in °C or °F)
Ratio data
Equal spacing, true zero (e.g., revenue)
Relationships
Links between tables using keys
Primary key
Uniquely identifies rows in a table
Foreign key
References primary key in another table
Cardinality
Defines one-to-one, one-to-many, or many-to-many links
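Keys and cardinality can be checked directly on small tables. The sketch below uses made-up `customers` and `orders` records to test primary key uniqueness, find foreign keys with no matching primary key, and count orders per customer.

```python
from collections import Counter

# Hypothetical tables represented as lists of rows.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
orders = [
    {"order_id": 101, "customer_id": 1},
    {"order_id": 102, "customer_id": 1},
    {"order_id": 103, "customer_id": 2},
    {"order_id": 104, "customer_id": 9},  # no such customer
]

# Primary key: every value must be unique within its table.
pk_values = [c["customer_id"] for c in customers]
pk_is_unique = len(pk_values) == len(set(pk_values))

# Foreign key: every value should reference an existing primary key.
known_ids = set(pk_values)
orphans = [o["order_id"] for o in orders if o["customer_id"] not in known_ids]

# Cardinality: one customer linked to many orders is a one-to-many link.
orders_per_customer = Counter(o["customer_id"] for o in orders)

print(pk_is_unique, orphans, dict(orders_per_customer))
```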
Data issues
Missing instances, missing values, duplicates, outliers
Invalid issue
Wrong format of data
Inconsistent issue
Different formats for same value
Central location
Mean, median, mode
Dispersion
Variance and standard deviation
Symmetry
Mean, median, and mode are equal in a symmetric, single-peaked distribution
Skewness
Asymmetry of data
Kurtosis
Heaviness of the tails, often described as peakedness or flatness of data
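The descriptive measures above can be computed with Python's standard `statistics` module. Skewness and kurtosis are not in that module, so this sketch derives them from population moments; other definitions (for example, sample-adjusted versions) give slightly different values. The data are made up.

```python
from statistics import mean, median, mode, pvariance, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

# Central location
center = (mean(data), median(data), mode(data))

# Dispersion (population versions)
variance = pvariance(data)
std_dev = pstdev(data)

# Shape: standardized third and fourth moments
n = len(data)
m = mean(data)
skewness = sum((x - m) ** 3 for x in data) / n / std_dev ** 3
excess_kurtosis = sum((x - m) ** 4 for x in data) / n / std_dev ** 4 - 3

print(center, variance, std_dev)
print(skewness, excess_kurtosis)
```

The mean exceeds the median and the skewness is positive, so this sample is right-skewed rather than symmetric.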
Inner join
Only matching rows from both tables
Left join
All rows from left table with matches from right
Right join
All rows from right table with matches from left
Full join
All rows from both with nulls for unmatched
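The four join types differ only in which keys are kept. This is a simplified sketch, assuming each key appears at most once per table (real joins also handle repeated keys); the `customers` and `orders` data and the `join` helper are hypothetical.

```python
# Left table: customer_id -> name. Right table: customer_id -> order amount.
customers = {1: "Ana", 2: "Ben", 3: "Cy"}
orders = {2: 250, 3: 75, 4: 40}


def join(left, right, how):
    """Return (key, left_value, right_value) rows; None marks a missing side."""
    if how == "inner":
        keys = left.keys() & right.keys()   # only keys present in both
    elif how == "left":
        keys = left.keys()                  # every key from the left table
    elif how == "right":
        keys = right.keys()                 # every key from the right table
    elif how == "full":
        keys = left.keys() | right.keys()   # every key from either table
    else:
        raise ValueError(f"unknown join type: {how}")
    return [(k, left.get(k), right.get(k)) for k in sorted(keys)]


for how in ("inner", "left", "right", "full"):
    print(how, join(customers, orders, how))
```

Customer 1 has no order and order key 4 has no customer, so those rows appear only in the joins that keep unmatched keys, with `None` standing in for null.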
Measured raw data
Directly observed values (e.g., price)
Non-measured raw data
Descriptive categories (e.g., product name)
Calculated data
Derived values (e.g., sales = qty × price)
Planning stage
Identify motivation, objectives, strategy
Analyze stage
Prepare, model, and explore data
Report stage
Interpret results and communicate findings
External motivation
From stakeholders or regulators
Internal motivation
To improve service or efficiency
Other motivations
Opportunities, problems, process improvement
Descriptive analysis
Describes what happened
Diagnostic analysis
Explains why it happened
Predictive analysis
Forecasts what will happen
Prescriptive analysis
Recommends what to do
Hypothesis testing
Compares null and alternative hypotheses
Type I error
Rejecting a true null (false positive)
Type II error
Failing to reject a false null (false negative)
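The two error types depend on both the test decision and the unknown truth. The sketch below assumes the common 0.05 significance level and pretends the truth is known, which is only possible in a teaching example; `classify_decision` is an illustrative name.

```python
def classify_decision(null_is_true, p_value, alpha=0.05):
    """Label the outcome of a hypothesis test given the (normally unknown) truth."""
    reject_null = p_value < alpha
    if reject_null and null_is_true:
        return "Type I error (false positive)"
    if not reject_null and not null_is_true:
        return "Type II error (false negative)"
    return "correct decision"


print(classify_decision(null_is_true=True, p_value=0.01))
print(classify_decision(null_is_true=False, p_value=0.30))
print(classify_decision(null_is_true=False, p_value=0.01))
```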