Exploratory data analysis (EDA)
Early analysis to understand patterns, issues, and distributions before modeling.
Data preprocessing
Cleaning, transforming, and organizing raw data before analysis/modeling.
Data cleaning / cleansing
Finding and correcting errors/inconsistencies in a dataset.
Data cleaning software
Tools that automate part of the cleaning process using predefined rules; the results may still need human review.
Manual review
Human inspection to decide whether questionable data should be removed or kept.
Irrelevant feedback entries
Data not useful for the current analysis; should be reviewed before removal.
Missing data
Expected values not present; may require removal or imputation.
Imputation
Filling missing values with estimates such as mean or median.
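As a rough illustration of the idea, the sketch below fills missing entries in a single numeric column using NumPy. The `ages` array and its values are made up for the example, and missing values are assumed to be encoded as NaN.

```python
import numpy as np

# Hypothetical feature column; missing values are assumed to be encoded as NaN.
ages = np.array([22.0, 35.0, np.nan, 41.0, 29.0, np.nan, 90.0])

# Compute candidate estimates from the observed values only.
mean_fill = np.nanmean(ages)      # pulled upward by the large value 90
median_fill = np.nanmedian(ages)  # less affected by extreme values

# Replace only the missing positions; observed values stay untouched.
ages_imputed = np.where(np.isnan(ages), median_fill, ages)

print("mean estimate:  ", mean_fill)
print("median estimate:", median_fill)
print(ages_imputed)
```

The median is used here only because the sample contains one extreme value; which estimate is appropriate depends on the data.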
Duplicate rows
Repeated data entries; can bias analysis.
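A minimal sketch of detecting and dropping exact duplicate rows with NumPy, assuming the dataset fits in a small 2-D array. The `rows` values are invented for the example.

```python
import numpy as np

# Hypothetical dataset: each row is one record (id, score).
rows = np.array([
    [1, 10],
    [2, 20],
    [1, 10],   # exact repeat of the first row
    [3, 30],
    [2, 20],   # exact repeat of the second row
])

# np.unique with axis=0 finds distinct rows; return_index gives the position
# of each row's first occurrence, which lets us keep the original order.
_, first_idx = np.unique(rows, axis=0, return_index=True)
deduped = rows[np.sort(first_idx)]

print(len(rows) - len(deduped), "duplicate rows removed")
print("mean score with duplicates:   ", rows[:, 1].mean())
print("mean score without duplicates:", deduped[:, 1].mean())
```

The two means differ, which is one simple way repeated entries can bias a summary statistic.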
Static columns
Columns with constant values; often removed because they add no predictive information.
Low variance columns
Columns with little variation; may have little modeling value.
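One way to flag static and low-variance columns, sketched with NumPy. The matrix `X` and the cutoff `threshold` are assumptions made for the example; a sensible cutoff depends on the scale of each feature.

```python
import numpy as np

# Hypothetical feature matrix: rows are observations, columns are features.
X = np.array([
    [5.0, 1.0, 0.50, 12.0],
    [5.0, 1.0, 0.50, 47.0],
    [5.0, 1.0, 0.51, 33.0],
    [5.0, 1.0, 0.50, 25.0],
])

# Static columns: every value is identical, so max equals min.
static_mask = X.max(axis=0) == X.min(axis=0)

# Low-variance columns: not constant, but variance below an assumed cutoff.
threshold = 1e-3  # illustrative value, not a recommended default
variances = X.var(axis=0)
low_var_mask = ~static_mask & (variances < threshold)

# Keep only the columns that passed both checks.
X_reduced = X[:, ~(static_mask | low_var_mask)]

print("static columns:      ", np.flatnonzero(static_mask))
print("low-variance columns:", np.flatnonzero(low_var_mask))
print("remaining shape:     ", X_reduced.shape)
```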
Standardization via Z-score
Rescales each feature to mean 0 and standard deviation 1 by subtracting the feature's mean and dividing by its standard deviation.
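A short sketch of z-score standardization applied column by column with NumPy. The matrix `X` is invented, and the population standard deviation (NumPy's default, `ddof=0`) is assumed.

```python
import numpy as np

# Hypothetical data: two features on different scales (e.g. height, weight).
X = np.array([
    [150.0, 50.0],
    [160.0, 60.0],
    [170.0, 70.0],
    [180.0, 80.0],
])

mean = X.mean(axis=0)
std = X.std(axis=0)      # population standard deviation (ddof=0)
Z = (X - mean) / std     # z = (x - mean) / std, per column

print(Z.mean(axis=0))    # approximately 0 for each feature
print(Z.std(axis=0))     # approximately 1 for each feature
```

A constant column has standard deviation 0 and would cause a division by zero, so static columns are usually handled first.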
Min-max scaling
Scales values into a fixed range, commonly 0 to 1.
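A minimal sketch of min-max scaling to the range 0 to 1, using NumPy on a made-up array `x`.

```python
import numpy as np

# Hypothetical feature values.
x = np.array([10.0, 20.0, 25.0, 50.0])

x_min, x_max = x.min(), x.max()

# Map the smallest value to 0 and the largest to 1.
scaled = (x - x_min) / (x_max - x_min)

print(scaled)  # [0.    0.25  0.375 1.   ]
```

Because the result depends only on the minimum and maximum, a single extreme value can compress all other values into a narrow part of the range.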
Feature engineering
Selecting, modifying, or creating features to improve model performance.
Feature transformation
Changing a feature’s values to make them more useful for modeling.
Skewness
Asymmetry in a distribution; one tail is longer or heavier.
Long tail
A distribution where a small number of observations stretch far from most values.
Outlier
A value very different from most others; can strongly affect models such as linear regression.
Linear regression
A regression model that fits a linear relationship between features and a target; it can be sensitive to outliers because they pull the fitted line/parameters toward themselves.
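To make the sensitivity concrete, the sketch below fits a least-squares line twice with `np.polyfit`: once on points that lie exactly on y = 2x + 1, and once after a single point is replaced by an extreme value. The data is synthetic and chosen only for illustration.

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0                  # points exactly on the line y = 2x + 1

# Least-squares fit of a degree-1 polynomial; returns [slope, intercept].
slope_clean, intercept_clean = np.polyfit(x, y, deg=1)

y_outlier = y.copy()
y_outlier[-1] = 100.0              # the true value would be 19

slope_out, intercept_out = np.polyfit(x, y_outlier, deg=1)

print("clean fit:   slope =", slope_clean, " intercept =", intercept_clean)
print("with outlier: slope =", slope_out, " intercept =", intercept_out)
```

One changed point out of ten shifts both learned parameters substantially, so predictions for every other point become biased.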
Parameter
A learned model value, such as a regression coefficient.
Biased prediction
A prediction systematically pushed in a wrong/skewed direction.
Robust model
A model less affected by outliers/noise.
Power transformation
A mathematical transformation that raises values to a power to improve distribution shape.
Box-Cox power transformation
Power transformation, controlled by a parameter lambda, used to stabilize variance and make data more normal; requires strictly positive values.
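The usual Box-Cox formula is (x^lambda - 1) / lambda for lambda not equal to 0, and log(x) for lambda equal to 0. The sketch below implements it directly with NumPy for a fixed lambda; the function name `box_cox` and the parameter name `lam` are hypothetical. In practice lambda is typically estimated from the data (for example, `scipy.stats.boxcox` can do this), which this sketch does not attempt.

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return np.log(x)               # limiting case as lambda approaches 0
    return (x ** lam - 1.0) / lam

# Hypothetical right-skewed values.
x = np.array([1.0, 4.0, 9.0, 100.0])

print(box_cox(x, 0.5))   # square-root-like: [ 0.  2.  4. 18.]
print(box_cox(x, 0.0))   # same as np.log(x)
```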
Exponent
The power used in a transformation; exponent 1/2 means square root.
Square root transformation
Transformation using exponent 1/2; reduces right-skew and long tails in many positive-valued features.
Reciprocal transformation
Uses 1/x, which is the same as exponent -1 (not exponent 1/2).
Logarithm transformation
Uses logarithms; another common skewness treatment but not the same as square root.
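The three transformations above can be compared on the same data. The sketch below draws a synthetic right-skewed sample (lognormal, chosen as an assumption for the demo) and reports a simple moment-based skewness before and after each transformation. The helper `skewness` is written for this example and is not a library function.

```python
import numpy as np

def skewness(values):
    """Moment-based skewness: mean of cubed z-scores (positive = right-skewed)."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return float(np.mean(z ** 3))

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)  # strictly positive, long right tail

sqrt_x = x ** 0.5        # square root: exponent 1/2
recip_x = x ** -1.0      # reciprocal: exponent -1, same as 1 / x
log_x = np.log(x)        # logarithm: not a power of x

print("raw :", skewness(x))
print("sqrt:", skewness(sqrt_x))
print("log :", skewness(log_x))
```

On this sample the square root reduces the right-skew and the logarithm removes most of it, since the log of a lognormal variable is normal. The reciprocal reverses the ordering of the values, so its effect on skewness is harder to summarize in general.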
Normal distribution
Symmetric bell-shaped distribution; some models and statistical methods work better when features are closer to normal.
Histogram
Chart showing counts/frequency of values within bins.
Bins
Intervals used to group values in a histogram or binning strategy.
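A small sketch of how bins turn raw values into histogram counts, using `np.histogram`. The values and the choice of four equal-width bins over 0 to 10 are assumptions for the example.

```python
import numpy as np

# Hypothetical observations.
values = np.array([1, 2, 2, 3, 3, 3, 4, 7, 8, 9])

# Four equal-width bins: [0, 2.5), [2.5, 5), [5, 7.5), [7.5, 10].
# The last bin also includes its right edge.
counts, edges = np.histogram(values, bins=4, range=(0, 10))

print(edges)   # [ 0.   2.5  5.   7.5 10. ]
print(counts)  # [3 4 1 2]
```

Changing the number of bins changes the counts and therefore the apparent shape of the distribution, so it is worth trying more than one setting.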
Matplotlib
Python library for creating graphs and charts.
Pyplot
Matplotlib's plotting module (`matplotlib.pyplot`, commonly imported as `plt`), used to create charts such as histograms.
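A minimal sketch of plotting histograms with pyplot, comparing a synthetic right-skewed sample before and after a square root transformation. The data, the bin count of 30, and the output file name `histograms.png` are all assumptions for the example; the non-interactive `Agg` backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.exponential(scale=2.0, size=1000)  # synthetic right-skewed data

fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# hist returns the bin counts, the bin edges, and the drawn bar objects.
raw_counts, raw_edges, _ = axes[0].hist(values, bins=30)
axes[0].set_title("Raw values")

sqrt_counts, sqrt_edges, _ = axes[1].hist(np.sqrt(values), bins=30)
axes[1].set_title("Square root")

for ax in axes:
    ax.set_xlabel("value")
    ax.set_ylabel("count")

fig.tight_layout()
fig.savefig("histograms.png")  # hypothetical output path
```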
NumPy
Python library for arrays, linear algebra, and numerical operations.
Append
Adding additional data columns/rows from another dataset; not a plotting module.
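For comparison with the plotting tools above, here is a rough sketch of appending rows and columns using `np.append`. The arrays are made up, and NumPy is used only as an assumption; the same idea applies to other data tools.

```python
import numpy as np

# Hypothetical existing dataset: two rows, two columns.
a = np.array([[1, 2],
              [3, 4]])

# Appending a row from another dataset (axis=0): column counts must match.
new_rows = np.array([[5, 6]])
rows_added = np.append(a, new_rows, axis=0)

# Appending a column from another dataset (axis=1): row counts must match.
new_col = np.array([[7],
                    [8]])
cols_added = np.append(a, new_col, axis=1)

print(rows_added.shape)  # (3, 2)
print(cols_added.shape)  # (2, 3)
```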