What is machine learning?
A way to learn patterns from data so a system can make predictions or decisions without being explicitly programmed for every case.
What is a feature?
An input variable used by a model to make a prediction.
What is a label?
The target value a supervised model is trained to predict.
What is a training example?
One row or instance of data containing features and usually a label.
What is supervised learning?
Learning from examples where the correct target values are known.
What is unsupervised learning?
Finding patterns or structure in data without known target labels.
What is semi-supervised learning?
Training with a small amount of labeled data plus a larger amount of unlabeled data.
What is reinforcement learning?
Learning actions through rewards and penalties from interaction with an environment.
What is regression?
A supervised task where the model predicts a numeric value.
What is classification?
A supervised task where the model predicts a category or class.
What is binary classification?
Classification with two possible classes such as fraud or not fraud.
What is multiclass classification?
Classification with more than two possible classes where each example belongs to one class.
What is multilabel classification?
Classification where one example can belong to multiple classes at the same time.
What is a model?
A mathematical function or learned system that maps inputs to outputs.
What are model parameters?
Values learned during training such as weights in linear regression or a neural network.
What are hyperparameters?
Settings chosen before or during training that control how learning happens.
What is a training set?
Data used to fit model parameters.
What is a validation set?
Data used to tune model choices and hyperparameters during development.
What is a test set?
Held-out data used once near the end to estimate final model performance.
Why split data into train, validation, and test sets?
To estimate how well the model generalizes to new data and avoid evaluating on data used for training decisions.
What is generalization?
A model's ability to perform well on new unseen data.
What is overfitting?
When a model learns noise or quirks of the training data and performs poorly on new data.
What is underfitting?
When a model is too simple or poorly trained to capture the real pattern.
What is bias in the bias-variance tradeoff?
Error from assumptions that are too simple and miss the true relationship.
What is variance in the bias-variance tradeoff?
Error from being too sensitive to the particular training data.
What is the bias-variance tradeoff?
Reducing bias often increases variance and reducing variance often increases bias; good models balance both.
What is data leakage?
When training data includes information that would not be available at prediction time.
Why is data leakage dangerous?
It makes validation results look better than real-world performance.
What is target leakage?
A feature accidentally contains the answer or information created after the target is known.
What is train test contamination?
The same or closely related examples appear in both training and evaluation data.
What is cross validation?
A resampling method that trains and evaluates a model on multiple splits of the data.
What is k-fold cross validation?
The data is split into k parts; each part is used once as validation while the rest is used for training.
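A minimal plain-Python sketch of the splitting logic (the function name is illustrative, not from any library):

```python
def k_fold_splits(n_examples, k):
    """Yield (train_indices, val_indices) pairs for k-fold cross validation."""
    indices = list(range(n_examples))
    fold_size = n_examples // k
    for i in range(k):
        # The i-th contiguous chunk is the validation fold; the rest trains.
        start = i * fold_size
        end = start + fold_size if i < k - 1 else n_examples
        val = indices[start:end]
        train = indices[:start] + indices[end:]
        yield train, val

# With 10 examples and k=5, each example lands in exactly one validation fold.
splits = list(k_fold_splits(10, 5))
```

Real implementations usually shuffle the indices first; the key property is that every example is validated exactly once.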
What is stratified sampling?
Sampling that preserves the class proportions across train, validation, and test splits.
What is time series split validation?
A validation method that trains on past data and evaluates on later data to respect time order.
Why should time series data not be randomly shuffled for validation?
Random shuffling can let future information influence training and create leakage.
What is point in time correctness?
Using only data that would have been known at the exact time a prediction was made.
What is a baseline model?
A simple reference model used to judge whether a more complex model is actually useful.
What is a dummy classifier?
A simple classifier that predicts using a basic rule such as the most common class.
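A most-frequent-class baseline can be sketched in a few lines (names here are illustrative):

```python
from collections import Counter

def majority_class_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda features: most_common  # ignores the input features entirely

predict = majority_class_baseline(["spam", "ham", "ham", "ham"])
baseline_prediction = predict({"subject": "hello"})  # always "ham"
```

Any real model should beat this before its complexity is justified.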
What is linear regression?
A model that predicts a numeric target as a weighted sum of input features.
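For one feature, the least-squares fit has a closed form; a minimal sketch:

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w * x + b for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x
    return w, b

# Data generated from y = 2x + 1, so the fit recovers those coefficients.
w, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```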
What is logistic regression?
A classification model that estimates class probability using a linear score passed through a sigmoid or similar link.
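The prediction step (linear score through a sigmoid) can be sketched directly:

```python
import math

def predict_proba(features, weights, bias):
    """Linear score passed through the sigmoid to get a class probability."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))

# A zero score maps to probability 0.5; larger scores push toward 1.
p_neutral = predict_proba([1.0, 2.0], [0.0, 0.0], 0.0)
p_positive = predict_proba([1.0], [2.0], 0.0)
```

Training then amounts to choosing the weights and bias that minimize log loss on the data.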
What is a decision tree?
A model that makes predictions by following a sequence of feature-based splits.
What is a random forest?
An ensemble of decision trees trained on random samples and random feature subsets.
What is gradient boosting?
An ensemble method that builds trees sequentially so each new tree corrects errors from previous trees.
What are XGBoost, LightGBM, and CatBoost?
Popular high-performance gradient boosting libraries for tabular data.
What is k-nearest neighbors?
A method that predicts based on the labels or values of the closest training examples.
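A toy one-dimensional sketch of KNN classification (names are illustrative):

```python
from collections import Counter

def knn_predict(train, query, k):
    """Predict the majority label among the k nearest training points (1-D)."""
    # train is a list of (x, label) pairs; distance is plain |x - query|.
    nearest = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

points = [(0.0, "a"), (0.1, "a"), (0.9, "b"), (1.0, "b")]
label = knn_predict(points, 0.05, k=3)  # two "a" neighbors outvote one "b"
```

Real data has many features, so a proper distance (e.g. Euclidean) and feature scaling matter.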
What is a support vector machine?
A model that tries to find a separating boundary with the largest margin between classes.
What is naive Bayes?
A probabilistic classifier that uses Bayes' rule with a strong feature independence assumption.
What is a neural network?
A model made of layers of connected units that learn nonlinear patterns from data.
What is a perceptron?
A basic neural network unit that computes a weighted sum and applies an activation function.
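A minimal sketch of a single perceptron unit with a step activation; the weights below are hand-picked so it behaves like a logical AND:

```python
def perceptron(inputs, weights, bias):
    """Weighted sum of inputs followed by a step activation (fires if sum > 0)."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

# Only (1, 1) pushes the weighted sum above zero, so the unit computes AND.
and_outputs = [perceptron([a, b], [1, 1], -1.5) for a in (0, 1) for b in (0, 1)]
```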
What is an activation function?
A function that adds nonlinearity to a neural network, such as ReLU, sigmoid, or tanh.
What is backpropagation?
The algorithm that computes gradients through a neural network so weights can be updated.
What is gradient descent?
An optimization method that updates parameters in the direction that reduces loss.
What is stochastic gradient descent?
Gradient descent using one example or a small batch at a time rather than the full dataset.
What is a learning rate?
A hyperparameter that controls how large each parameter update is during optimization.
What happens if the learning rate is too high?
Training may be unstable and fail to converge.
What happens if the learning rate is too low?
Training may be very slow or get stuck before reaching a good solution.
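The learning-rate effects above can be seen on a toy one-parameter loss; a minimal sketch minimizing loss(w) = (w - 3)²:

```python
def gradient_descent(lr, steps, w=0.0):
    """Minimize loss(w) = (w - 3)^2; the gradient is 2 * (w - 3)."""
    for _ in range(steps):
        w -= lr * 2 * (w - 3)
    return w

w_good = gradient_descent(lr=0.1, steps=100)    # converges near the minimum w = 3
w_slow = gradient_descent(lr=0.001, steps=100)  # too low: still far from 3
w_high = gradient_descent(lr=1.1, steps=100)    # too high: overshoots and diverges
```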
What is a loss function?
A function that measures how wrong a model prediction is during training.
What is mean squared error?
A regression loss or metric that averages squared prediction errors.
What is mean absolute error?
A regression metric that averages the absolute size of prediction errors.
What is root mean squared error?
The square root of mean squared error which puts the error back in target units.
What is R squared?
A regression metric that measures the share of target variation explained by the model.
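The four regression metrics above fit in one small sketch (function name is illustrative):

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MSE, MAE, RMSE, and R squared for paired predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)                       # back in target units
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / ss_tot                 # share of variation explained
    return mse, mae, rmse, r2

mse, mae, rmse, r2 = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 8.0])
```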
What is accuracy?
The fraction of predictions that are correct.
When can accuracy be misleading?
When classes are imbalanced and predicting the majority class looks good despite poor usefulness.
What is a confusion matrix?
A table showing counts of true positives, false positives, true negatives, and false negatives.
What is a true positive?
A positive example correctly predicted as positive.
What is a false positive?
A negative example incorrectly predicted as positive.
What is a true negative?
A negative example correctly predicted as negative.
What is a false negative?
A positive example incorrectly predicted as negative.
What is precision?
Among predicted positives the fraction that were truly positive.
What is recall?
Among actual positives the fraction the model correctly found.
What is sensitivity?
Another name for recall or true positive rate.
What is specificity?
Among actual negatives the fraction the model correctly predicted as negative.
What is F1 score?
The harmonic mean of precision and recall.
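Precision, recall, and F1 follow directly from the confusion-matrix counts; a minimal sketch:

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and F1 computed from true and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp)                  # of predicted positives, how many were right
    recall = tp / (tp + fn)                     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# 2 true positives, 1 false positive, 1 false negative.
precision, recall, f1 = classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```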
When is F1 score useful?
When you need to balance precision and recall especially with imbalanced classes.
What is ROC AUC?
A threshold-independent ranking metric based on true positive rate versus false positive rate.
What is PR AUC?
Area under the precision-recall curve, often useful for rare positive classes.
What is log loss?
A classification metric that penalizes confident wrong probability predictions.
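The penalty for confident wrong predictions is easy to see in a small sketch:

```python
import math

def log_loss(y_true, probs, eps=1e-15):
    """Average negative log-likelihood of the true labels under the predictions."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

confident_right = log_loss([1, 0], [0.9, 0.1])  # small loss
confident_wrong = log_loss([1, 0], [0.1, 0.9])  # much larger loss, same confidence
```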
What is calibration?
How well predicted probabilities match actual outcome frequencies.
What is class imbalance?
A situation where one class appears much more often than another.
How can class imbalance be handled?
Use better metrics, resampling, class weights, threshold tuning, or more data for the rare class.
What is a classification threshold?
The probability cutoff used to turn predicted probabilities into class labels.
Why tune a classification threshold?
To match business costs such as preferring fewer false negatives or fewer false positives.
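Thresholding itself is one line; lowering the cutoff trades false positives for fewer false negatives:

```python
def apply_threshold(probs, threshold):
    """Turn predicted probabilities into class labels at a chosen cutoff."""
    return [1 if p >= threshold else 0 for p in probs]

probs = [0.2, 0.4, 0.6, 0.8]
default_labels = apply_threshold(probs, 0.5)   # standard 0.5 cutoff
cautious_labels = apply_threshold(probs, 0.3)  # lower cutoff flags more positives
```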
What is regularization?
A technique that discourages overly complex models to reduce overfitting.
What is L1 regularization?
A penalty based on absolute weight size that can drive some weights to zero.
What is L2 regularization?
A penalty based on squared weight size that shrinks weights toward zero.
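The two penalty terms added to the training loss can be sketched side by side (lambda is the regularization strength):

```python
def l1_penalty(weights, lam):
    """Lasso-style penalty: lambda times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """Ridge-style penalty: lambda times the sum of squared weights."""
    return lam * sum(w ** 2 for w in weights)

weights = [3.0, -4.0]
p1 = l1_penalty(weights, 0.1)  # 0.1 * (3 + 4)
p2 = l2_penalty(weights, 0.1)  # 0.1 * (9 + 16)
```

Because the L1 gradient stays constant near zero while the L2 gradient shrinks, L1 can push weights exactly to zero while L2 only shrinks them.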
What is dropout?
A neural network regularization method that randomly disables units during training.
What is early stopping?
Stopping training when validation performance stops improving.
What is feature scaling?
Transforming numeric features so they are on comparable ranges.
When is feature scaling important?
It is important for models based on distances or gradients, such as KNN, SVM, and neural networks.
What is standardization?
Scaling a feature to have mean zero and standard deviation one.
What is normalization?
Scaling values to a fixed range often zero to one.
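Both scalings are one-liners in plain Python (real pipelines fit the scaling on training data only, to avoid leakage):

```python
import math

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

def min_max_normalize(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

standardized = standardize([10.0, 20.0, 30.0])
normalized = min_max_normalize([10.0, 20.0, 30.0])
```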
What is one hot encoding?
Representing categories as separate binary indicator columns.
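A minimal one-hot sketch with a fixed category list:

```python
def one_hot(value, categories):
    """One binary indicator column per known category."""
    return [1 if value == c else 0 for c in categories]

categories = ["red", "green", "blue"]
encoded = [one_hot(v, categories) for v in ["green", "red"]]
```

Unlike ordinal encoding, this imposes no artificial order on the categories.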
What is ordinal encoding?
Representing categories with ordered integers when the order is meaningful.
What is target encoding?
Encoding a category using the average target value for that category with care to avoid leakage.
What is imputation?
Filling in missing values using a rule or model.
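The simplest rule is mean imputation; a sketch using None for missing entries:

```python
def mean_impute(values):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

imputed = mean_impute([1.0, None, 3.0])
```

As with scaling, the mean should be computed on the training set only and reused at prediction time.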
What is feature engineering?
Creating or transforming input variables to make patterns easier for a model to learn.
What is feature selection?
Choosing a subset of useful features and removing irrelevant or harmful ones.
What is dimensionality reduction?
Reducing the number of features while trying to preserve important information.
What is PCA?
Principal component analysis finds new directions that capture as much variance as possible.
What is clustering?
Grouping similar examples without using target labels.