What is high variance in a model?
Small changes in training data cause large changes in the fitted model.
What is bias in a model?
Error caused by overly simplistic/incorrect assumptions about the relationship.
Do more flexible models always predict better?
No. They may overfit and increase variance.
In linear regression, what does β₁ mean?
Average change in Y for a one-unit increase in X₁, holding the other predictors constant.
What does β₀ represent?
Intercept; expected Y when all predictors = 0.
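For simple regression the two coefficients above have closed forms; a minimal pure-Python sketch (function name and data are illustrative):

```python
def ols_fit(x, y):
    """Fit simple linear regression y = b0 + b1*x by least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sample covariance of x and y over sample variance of x
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    # Intercept: the fitted line passes through the mean point (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])  # data lie exactly on y = 1 + 2x
```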
What does BLUE mean?
Best Linear Unbiased Estimator.
What estimator is BLUE in OLS regression?
The OLS (least squares) estimator, under the Gauss–Markov assumptions.
If residual spread increases with fitted values, what may help?
Log transform of Y.
Another transformation for increasing variance?
Square root transform.
When is logit transformation commonly used?
When response is a proportion between 0 and 1.
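The logit maps a proportion in (0, 1) onto the whole real line; a one-line sketch:

```python
import math

def logit(p):
    """Map a proportion p in (0, 1) to log-odds on the real line."""
    return math.log(p / (1 - p))

logit(0.5)  # log-odds of an even proportion is 0
```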
What does the t-test test in regression?
Whether an individual coefficient equals zero.
What does the F-test test?
Whether all slope coefficients are zero (overall model significance).
If all individual t-tests reject, will F-test reject?
Usually yes, but it is not guaranteed: the t-tests assess coefficients individually while the F-test assesses them jointly, and the two can disagree (e.g., under multicollinearity).
What does R² measure?
Proportion of variance in Y explained by predictors.
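The definition above, R² = 1 − SS_res / SS_tot, can be sketched directly (function name is illustrative):

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot: fraction of Y's variance explained."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

r_squared([1, 2, 3], [1, 2, 3])  # perfect fit gives R^2 = 1
```

Predicting the mean for every observation gives R² = 0, the baseline a model must beat.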
Higher R² means what?
Better fit to training data.
What assumption is made when predicting new data?
New observations follow same population/model as training data.
Which is more informative: narrow or wide prediction interval?
Narrower interval.
Should point prediction lie inside prediction interval?
Yes.
What does best subset selection do?
Evaluates all predictor subsets and chooses best by criterion.
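The "all subsets" part means 2^p − 1 non-empty candidate models; a sketch of the enumeration step (each subset would then be fitted and scored by a criterion such as AIC, BIC, or CV error):

```python
from itertools import combinations

def all_subsets(predictors):
    """Enumerate every non-empty subset of predictors (2^p - 1 candidate models)."""
    return [set(c) for r in range(1, len(predictors) + 1)
            for c in combinations(predictors, r)]

subsets = all_subsets(["X1", "X2", "X3"])  # 7 candidate models for p = 3
```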
Does stepwise always find best subset?
No.
What does lasso do?
Shrinks coefficients and can set some exactly to zero.
If λ = 0 in lasso, result equals what?
Ordinary Least Squares.
Why is lasso useful?
Variable selection + reduced overfitting.
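The "exactly to zero" behavior comes from soft-thresholding; a sketch of the lasso update for a single standardized coefficient (a simplified special case, not the full algorithm):

```python
def soft_threshold(z, lam):
    """Lasso update for one standardized coefficient: shrink toward 0,
    and set exactly to 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

soft_threshold(2.0, 0.5)  # shrunk to 1.5
soft_threshold(0.3, 0.5)  # set exactly to 0.0
```

Note that lam = 0 leaves the coefficient unchanged, matching the card above: λ = 0 recovers OLS.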
What type of splits do decision trees use?
Recursive binary splits.
What is pruning a tree for?
Reduce overfitting / improve test accuracy.
What is bagging?
Bootstrap samples + average many trees.
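The resampling step can be sketched in a few lines (the data are illustrative; in bagging, a tree is fit to each such resample and the predictions are averaged):

```python
import random

def bootstrap_sample(data, rng):
    """Draw n observations with replacement: one bagging resample."""
    return rng.choices(data, k=len(data))

rng = random.Random(0)  # seeded for reproducibility
sample = bootstrap_sample([1, 2, 3, 4, 5], rng)
```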
What is random forest?
Bagging + random subset of predictors at each split.
If m = p in random forest, what is it?
Bagging.
Why does a random forest beat a single tree?
Averaging many decorrelated trees lowers variance, giving better predictions.
Cost of bagging/random forest?
Less interpretability.
What does low Gini index mean?
Node is mostly one class (pure).
Lowest possible Gini?
0.
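The Gini index G = 1 − Σ p_k² can be computed directly from a node's class labels:

```python
from collections import Counter

def gini(labels):
    """Gini index 1 - sum(p_k^2); 0 means the node is pure (one class)."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

gini(["a", "a", "a"])  # pure node: 0.0
gini(["a", "b"])       # maximally mixed two-class node: 0.5
```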
Purpose of PCA?
Reduce dimensionality while preserving variance.
What is first principal component?
Direction capturing maximum variance.
Do cumulative explained variances increase or decrease?
Increase.
What is a scree plot used for?
Decide number of PCs.
Using all PCs gives what?
Full variance explained but less simplification.
Sum of variance explained by all PCs equals what?
100%.
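The PCA cards above can be illustrated with a covariance eigendecomposition; a sketch assuming NumPy is available (the toy data are illustrative, with two strongly correlated features so PC1 dominates):

```python
import numpy as np

# Toy data: 2 correlated features, so the first PC captures most variance.
X = np.array([[2.0, 1.9], [0.0, 0.1], [-2.0, -2.1], [4.0, 3.9], [-4.0, -3.8]])
Xc = X - X.mean(axis=0)                  # center each column
cov = np.cov(Xc, rowvar=False)           # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
explained = eigvals[::-1] / eigvals.sum()  # variance ratio per PC, descending
# explained sums to 1: using all PCs accounts for 100% of the variance
```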
Is clustering supervised or unsupervised?
Unsupervised.
Purpose of clustering?
Find homogeneous groups in data.
What must be chosen before K-means?
Number of clusters K.
Does K-means always give same result?
No; the result depends on the initial centroids.
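Lloyd's algorithm behind K-means can be sketched in 1-D (function name and data are illustrative; note the result hinges on the starting centroids passed in):

```python
def kmeans_1d(points, centroids, iters=10):
    """Lloyd's algorithm on 1-D data: assign each point to the nearest
    centroid, then recompute each centroid as its cluster mean."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

kmeans_1d([1, 2, 10, 11], [0, 5])  # converges to the two group means
```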
What does hierarchical clustering produce?
Dendrogram.
How do you choose cluster count in hierarchical clustering?
Cut dendrogram at chosen height.
Better training fit always means better test fit?
No.
More complexity always better?
No.
Correlation implies causation?
No.