Compendium of vocabulary terms and definitions regarding data fundamentals, linear regression, probability, and hypothesis testing as presented in the DATA1001/1901 lecture notes.
What is Tidy Data?
Data where each column is a variable and each row is an observation.
What does Tidy ≠ clean mean?
Tidy data structure does not imply absence of errors.
What are Qualitative Variables?
Categorical variables: their values are categories or labels, not quantities that can be meaningfully averaged.
What are Quantitative Variables?
Numerical variables with meaningful values.
What is Sampling Bias?
Bias that arises when the sample is selected in a way that does not represent the target population.
What is Response Bias?
Error from poorly worded survey questions.
What is Non-response Bias?
Bias that arises when the people who do not respond to a survey differ systematically from those who do.
What is Data Linkage?
Combining datasets about the same individuals.
What is Measurement Error Formula?
Measurement = exact value + chance error + bias.
What is Chance Error?
Random measurement fluctuations.
What is Standard Deviation (SD)?
A measure of spread: roughly the typical distance of the data from the mean. It is always non-negative.
What is Linear Transformation (Mean)?
New Mean = a + b × (old mean).
What is Linear Transformation (SD)?
New SD = |b| × (old SD).
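As a quick check of the two linear transformation rules, here is a minimal sketch using Python's standard library. The temperature values are hypothetical, and the population SD (`statistics.pstdev`) is assumed as the SD definition.

```python
import statistics

# Hypothetical data: five temperatures in degrees Celsius.
celsius = [10.0, 15.0, 20.0, 25.0, 30.0]

# Linear transformation y = a + b * x (Celsius to Fahrenheit).
a, b = 32.0, 1.8
fahrenheit = [a + b * x for x in celsius]

old_mean = statistics.fmean(celsius)
old_sd = statistics.pstdev(celsius)        # population SD (an assumption)
new_mean = statistics.fmean(fahrenheit)
new_sd = statistics.pstdev(fahrenheit)

# Each pair agrees up to floating-point rounding.
print(new_mean, a + b * old_mean)
print(new_sd, abs(b) * old_sd)             # the shift a does not affect the SD
```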
What is Correlation Coefficient (r)?
Unitless number between -1 and +1 measuring the strength and direction of linear association.
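One way to see why r is unitless is to compute it as the mean of the products of standard units. The sketch below does this by hand; the heights and weights are hypothetical, the helper name `correlation` is illustrative, and the population SD is assumed.

```python
import statistics

def correlation(xs, ys):
    """Correlation coefficient r: mean of the products of standard units."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)
    products = [((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
                for x, y in zip(xs, ys)]
    return statistics.fmean(products)

# Hypothetical data: heights in cm and weights in kg.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weights_kg = [52.0, 58.0, 69.0, 75.0, 88.0]

r_cm = correlation(heights_cm, weights_kg)

# Changing the units of x (cm to m) leaves r unchanged up to rounding.
heights_m = [h / 100 for h in heights_cm]
r_m = correlation(heights_m, weights_kg)

print(r_cm, r_m)
```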
What is Regression Line?
Line minimizing the sum of squared residuals.
What is Residual?
Actual - Predicted value.
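The regression line and its residuals can be sketched in a few lines of plain Python. The data are hypothetical, and `sum_sq_resid` is an illustrative helper used only to compare the fitted line against nearby lines.

```python
import statistics

# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)

# Least-squares slope and intercept.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residual = actual value - predicted value.
predicted = [intercept + slope * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

def sum_sq_resid(b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

print(slope, intercept)
print(sum(residuals))  # approximately zero for a least-squares line
```

Nudging the slope or intercept away from the fitted values, for example `sum_sq_resid(intercept, slope + 0.1)`, gives a larger sum of squared residuals.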
What is Homoscedasticity?
Constant spread of residuals across all values of x.
What is R²?
Proportion (often quoted as a percentage) of the variation in y explained by x.
What is RMS Error?
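Root mean square of the residuals in regression: the typical size of a prediction error. It equals the SD of the residuals because least-squares residuals average zero.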
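R² and the RMS error can both be computed directly from the residuals. The sketch below uses hypothetical data and the population SD, and checks two standard identities for simple linear regression: R² equals r², and the RMS error equals √(1 − r²) × SD of y.

```python
import math
import statistics

# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.5, 5.5, 8.5, 9.5]

mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)

# Correlation coefficient as the mean of products of standard units.
r = statistics.fmean([((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
                      for x, y in zip(xs, ys)])

# Regression line and residuals.
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# RMS error: root mean square of the residuals.
rms_error = math.sqrt(statistics.fmean([e ** 2 for e in residuals]))

# R squared: 1 - (unexplained variation) / (total variation).
r_squared = 1 - sum(e ** 2 for e in residuals) / sum((y - mean_y) ** 2 for y in ys)

print(r_squared, r ** 2)
print(rms_error, math.sqrt(1 - r ** 2) * sd_y)
```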
What is 68-95-99.7 Rule?
In a Normal distribution, about 68%, 95%, and 99.7% of the data lie within 1, 2, and 3 SDs of the mean, respectively.
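The three percentages can be recovered from the standard Normal curve. This is a small sketch using `statistics.NormalDist` from the Python standard library (available from Python 3.8).

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Area under the curve within k SDs of the mean, for k = 1, 2, 3.
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} SD: {area:.4f}")
```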
What are Binomial Distribution Requirements?
A fixed number of independent trials, each with two possible outcomes and the same probability of success.
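When those requirements hold, the chance of exactly k successes in n trials is C(n, k) × p^k × (1 − p)^(n − k). Here is a minimal sketch; the function name `binomial_pmf` and the coin-toss numbers are illustrative assumptions.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success chance p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical example: 10 tosses of a fair coin.
n, p = 10, 0.5
p_five_heads = binomial_pmf(5, n, p)   # equals 252/1024

# The probabilities over all possible counts sum to 1.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

print(p_five_heads, total)
```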
What is Central Limit Theorem (CLT)?
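For a sufficiently large sample size, the distribution of the sample mean (or sum) is approximately Normal, whatever the shape of the population.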
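A simulation makes the theorem concrete. The sketch below draws repeated samples from a skewed population (an exponential distribution with mean 1 and SD 1, chosen only for illustration) and looks at the sample means. The sample size, number of repetitions, and seed are arbitrary assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 50        # size of each sample
reps = 5000   # number of samples drawn

# Each entry is the mean of one sample from a skewed (exponential) population.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

# Share of sample means within 1 SD of their centre (about 68% if roughly Normal).
share_within_1sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in sample_means
) / reps

print(mean_of_means)      # close to the population mean, 1
print(sd_of_means)        # close to 1 / sqrt(50)
print(share_within_1sd)
```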
What is Prosecutor's Fallacy?
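Confusing the probability of the evidence given innocence, P(evidence | innocent), with the probability of innocence given the evidence, P(innocent | evidence).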
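Bayes' rule shows how far apart the two probabilities can be. All numbers in this sketch are hypothetical: a population of one million with a single guilty person, and a forensic test that matches an innocent person 1 time in 10,000.

```python
# Hypothetical numbers, for illustration only.
population = 1_000_000
p_match_given_innocent = 1 / 10_000   # P(evidence | innocent)
p_match_given_guilty = 1.0            # the guilty person always matches

p_guilty = 1 / population
p_innocent = 1 - p_guilty

# Total probability that a randomly chosen person matches.
p_match = p_match_given_guilty * p_guilty + p_match_given_innocent * p_innocent

# Bayes' rule: P(innocent | evidence).
p_innocent_given_match = p_match_given_innocent * p_innocent / p_match

print(p_match_given_innocent)    # tiny
print(p_innocent_given_match)    # large: most people who match are innocent
```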
What is P-value?
Probability of observing a result at least as extreme as the one actually observed, assuming the null hypothesis is true.
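As a worked illustration, suppose a coin lands heads 60 times in 100 tosses and the null hypothesis is that the coin is fair. The sketch below computes an exact one-sided p-value from the binomial distribution; the scenario and numbers are hypothetical.

```python
from math import comb

# Hypothetical experiment: 60 heads in 100 tosses.
n, observed_heads = 100, 60
p_null = 0.5   # null hypothesis: the coin is fair

# One-sided p-value: chance of 60 or more heads if the null hypothesis is true.
p_value = sum(
    comb(n, k) * p_null ** k * (1 - p_null) ** (n - k)
    for k in range(observed_heads, n + 1)
)

print(p_value)
```

A small p-value means the observed result would be unusual if the null hypothesis were true. It is not the probability that the null hypothesis is true.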
What is Null Hypothesis (H₀)?
Claim of no difference or association.
What is Alternative Hypothesis (H₁)?
Claim of difference or association.
What is Chi-Squared Test of Independence?
Test for association between two categorical variables.
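The test statistic compares observed counts with the counts expected under independence (row total × column total / grand total). Below is a minimal sketch on a hypothetical 2 × 2 table, with no continuity correction. The p-value shortcut via `math.erfc` is valid only for 1 degree of freedom; larger tables would need a chi-squared table or a statistics library.

```python
import math

# Hypothetical 2 x 2 table of counts (rows: group A/B, columns: yes/no).
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts if the two variables were independent.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1 only: P(chi-squared > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(chi2, df, p_value)
```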
What is Confidence Interval (CI)?
Range of values, computed from a sample, produced by a method that captures the true parameter in a specified percentage of repeated samples.
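The "percentage of repeated samples" reading can be checked by simulation. This sketch assumes a Normal population with a known SD, so the 95% interval is sample mean ± z × SD / √n. The population values, sample size, and seed are arbitrary assumptions.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: Normal with known mean and SD.
true_mean, sigma = 10.0, 2.0
n, reps = 30, 2000

z = NormalDist().inv_cdf(0.975)        # about 1.96 for a 95% interval
margin = z * sigma / math.sqrt(n)

covered = 0
for _ in range(reps):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    sample_mean = fmean(sample)
    # Does this sample's interval contain the true mean?
    if sample_mean - margin <= true_mean <= sample_mean + margin:
        covered += 1

coverage = covered / reps
print(coverage)   # close to 0.95
```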
What is Extrapolation?
Predicting outside the data range.
What is Causation vs Association?
x predicting y does not mean x causes y.