Anscombe’s quartet
A set of four datasets with nearly identical statistical summaries (mean, variance, correlation, regression line) but very different distributions when graphed.
Data science process
Iterative workflow of asking questions, then collecting, cleaning, analyzing, and communicating data.
Information
Processed or organized data that has meaning and context.
Knowledge
Information interpreted and applied to make decisions or take action.
Data Science
An interdisciplinary field that uses statistics, mathematics, programming, and domain knowledge to extract insights from data.
Knowledge Engineering
Building systems that capture and apply expert knowledge for automated reasoning.
Median
Middle value of a sorted dataset.
Mode
Most frequently occurring value.
Range
Difference between the largest and smallest values.
Variance
Average squared deviation from the mean.
Standard deviation
Square root of variance; represents spread in original units.
Interquartile range (IQR)
Difference between Q3 and Q1; robust measure of spread.
Skewness
Measure of asymmetry in a distribution.
Kurtosis
Measure of tail heaviness compared to a normal distribution.
Z-score
Standardized value calculated as (x − µ)/σ.
Expected value
Long-term average outcome of a random variable.
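The summary statistics above can be sketched with Python's standard `statistics` module; the sample data is made up purely for illustration:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]         # hypothetical sample

mean = statistics.mean(data)             # arithmetic average
median = statistics.median(data)         # middle value of the sorted data
mode = statistics.mode(data)             # most frequent value
variance = statistics.pvariance(data)    # mean squared deviation from the mean
std_dev = statistics.pstdev(data)        # square root of variance
data_range = max(data) - min(data)       # largest minus smallest value

# Z-score of a single observation: (x − µ) / σ
z = (9 - mean) / std_dev

print(mean, median, mode, variance, std_dev, data_range, z)
```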
Population
Entire group of interest in a study.
Sample
Subset drawn from a population to estimate characteristics.
Parameter
Numerical summary of a population (µ, σ).
Statistic
Numerical summary of a sample (x̄, s).
Law of large numbers
Sample mean approaches population mean as sample size increases.
Central limit theorem
Distribution of sample means approaches normal as sample size increases.
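Both theorems can be illustrated by simulation; the sample sizes and seed below are arbitrary choices:

```python
import random
import statistics

random.seed(0)                      # reproducible illustrative run

# Law of large numbers: the mean of many uniform(0, 1) draws
# approaches the population mean of 0.5.
large_sample = [random.random() for _ in range(100_000)]

# Central limit theorem: means of many size-50 samples cluster
# around 0.5 with spread roughly sigma / sqrt(50) ≈ 0.041.
sample_means = [statistics.mean(random.random() for _ in range(50))
                for _ in range(1_000)]

print(abs(statistics.mean(large_sample) - 0.5))   # small
print(statistics.stdev(sample_means))             # near 0.041
```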
Correlation
Measures how strongly two variables move together.
Correlation vs causation
Correlation indicates association; causation requires proof that one variable causes changes in another.
Pearson correlation
Measures the strength of a linear relationship between two variables, ranging from −1 to 1.
Spearman correlation
Rank-based correlation robust to outliers and nonlinear relationships.
Covariance
Measures whether two variables increase or decrease together.
Outlier
Data point far from the rest of the dataset.
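A plain-Python sketch contrasting the two correlation measures; the rank helper assumes no tied values:

```python
def pearson(xs, ys):
    # Pearson r: covariance divided by the product of standard deviations
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def spearman(xs, ys):
    # Spearman rho: Pearson correlation applied to the ranks of the data
    def ranks(vals):                 # simple ranking; assumes no ties
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    return pearson(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]        # monotonic but nonlinear relationship

print(pearson(x, y))           # below 1: the relationship is not linear
print(spearman(x, y))          # 1: the ranks agree perfectly
```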
Probability
Likelihood of an event occurring.
Sample space
All possible outcomes of an experiment.
Conditional probability
Probability of event A occurring given event B.
Joint probability
Probability of two events occurring together.
Bayes’ theorem
Formula for updating probabilities using evidence.
Prior probability
Belief about an event before observing data.
Posterior probability
Updated belief after observing evidence.
Likelihood
Probability of observing data given parameters.
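The four terms above combine as posterior = likelihood × prior ÷ evidence; a sketch with hypothetical diagnostic-test numbers:

```python
# P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
prior = 0.01            # P(disease): prior probability
sensitivity = 0.95      # P(positive | disease): likelihood
false_positive = 0.05   # P(positive | no disease)

# Total probability of a positive result (the evidence)
evidence = sensitivity * prior + false_positive * (1 - prior)

posterior = sensitivity * prior / evidence
print(posterior)        # about 0.16: still unlikely despite a positive test
```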
Normal distribution
Bell-shaped distribution defined by mean and standard deviation.
Empirical rule
68%, 95%, and 99.7% of data fall within 1, 2, and 3 standard deviations.
Binomial distribution
Distribution for number of successes in fixed trials.
Poisson distribution
Distribution for counts of events occurring in a time interval.
Exponential distribution
Distribution describing time between events.
Bernoulli distribution
Distribution for a single success/failure trial.
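A sketch of the binomial probability mass function, with Bernoulli as the n = 1 special case, using `math.comb`:

```python
import math

def binomial_pmf(k, n, p):
    # Probability of exactly k successes in n independent trials,
    # each succeeding with probability p
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Bernoulli: a single success/failure trial (n = 1)
print(binomial_pmf(1, 1, 0.3))        # 0.3

# Probability of exactly 2 heads in 4 fair coin flips
print(binomial_pmf(2, 4, 0.5))        # 6/16 = 0.375
```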
Structured data
Data organized into rows and columns.
Unstructured data
Data without predefined structure (text, images, audio).
Numeric data
Data represented with numbers.
Categorical data
Data represented as categories or labels.
Discrete variable
Variable with countable values.
Continuous variable
Variable with infinite values within a range.
Random variable
Variable whose value depends on random outcomes.
Bar graph
Used to compare categories.
Histogram
Shows distribution of numerical data.
Line graph
Shows trends over time.
Scatter plot
Shows relationship between two variables.
Box plot
Displays quartiles, spread, and outliers.
Pie chart
Shows proportions of a whole.
Heatmap
Uses color to represent data intensity or correlation.
Data visualization
Graphical representation of data to reveal patterns.
Data cleaning
Fixing errors, removing duplicates, handling missing values.
Data wrangling
Organizing raw data for analysis.
Data transformation
Converting data into a useful format.
Data quality issues
Problems such as missing values, duplicates, or noise.
Missing data mechanisms
MCAR (missing completely at random), MAR (missing at random), MNAR (missing not at random).
Imputation methods
Methods to fill missing values (mean, median, KNN, regression).
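A sketch of mean and median imputation on a hypothetical column, with missing values encoded as `None`:

```python
import statistics

values = [12.0, None, 15.0, 14.0, None, 19.0]   # hypothetical column

observed = [v for v in values if v is not None]
mean_fill = statistics.mean(observed)       # mean imputation
median_fill = statistics.median(observed)   # median imputation (robust to outliers)

# Fill each missing slot with the mean of the observed values
imputed = [v if v is not None else mean_fill for v in values]
print(imputed)
```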
Feature engineering
Creating new features from existing data.
Feature scaling
Adjusting data ranges using normalization or standardization.
Feature encoding
Converting categorical variables to numerical format.
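A sketch of min-max normalization, standardization (feature scaling), and one-hot encoding (feature encoding) on toy values:

```python
import statistics

ages = [20, 30, 40, 50]

# Min-max normalization rescales to the [0, 1] interval
lo, hi = min(ages), max(ages)
normalized = [(a - lo) / (hi - lo) for a in ages]

# Standardization rescales to mean 0 and standard deviation 1
mu, sigma = statistics.mean(ages), statistics.pstdev(ages)
standardized = [(a - mu) / sigma for a in ages]

# One-hot encoding: one binary column per category
colors = ["red", "green", "red"]
categories = sorted(set(colors))         # ["green", "red"]
one_hot = [[int(c == cat) for cat in categories] for c in colors]

print(normalized)    # [0.0, 1/3, 2/3, 1.0]
print(one_hot)       # [[0, 1], [1, 0], [0, 1]]
```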
Feature selection
Choosing relevant variables for modeling.
Dimensionality reduction
Reducing number of variables while preserving information.
Principal component analysis (PCA)
Linear dimensionality reduction technique.
t-SNE / UMAP
Nonlinear dimensionality reduction for visualization.
Autoencoders
Neural networks used for dimensionality reduction.
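A PCA sketch via singular value decomposition on centered toy data (NumPy assumed available):

```python
import numpy as np

# Toy data: variance is larger along the first feature
X = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [-2.0, 0.0],
              [0.0, -1.0]])

Xc = X - X.mean(axis=0)              # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 1                                # keep one principal component
projected = Xc @ Vt[:k].T            # reduced representation

# Fraction of variance explained by each component
explained = S ** 2 / np.sum(S ** 2)
print(explained)                     # first component dominates
```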
Data leakage
When training data accidentally includes information from test data.
Machine learning
Algorithms that learn patterns from data.
Supervised learning
Learning using labeled data.
Unsupervised learning
Finding patterns in unlabeled data.
Reinforcement learning
Learning by rewards and penalties.
Training dataset
Data used to train a model.
Validation dataset
Data used to tune model parameters.
Test dataset
Data used to evaluate final model performance.
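The three datasets above can be sketched as a 60/20/20 split of hypothetical indices:

```python
import random

random.seed(42)                  # arbitrary seed for a reproducible shuffle
indices = list(range(100))
random.shuffle(indices)

train = indices[:60]             # fit model parameters
val = indices[60:80]             # tune hyperparameters, decide early stopping
test = indices[80:]              # final, untouched performance estimate

print(len(train), len(val), len(test))
```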
Bias-variance tradeoff
Balance between underfitting and overfitting.
Overfitting
Model memorizes training data but performs poorly on new data.
Underfitting
Model too simple to capture patterns.
Regularization
Techniques that prevent overfitting.
Early stopping
Stopping training when validation performance stops improving.
Ensemble methods
Combine multiple models for better performance.
Bagging
Training models independently on bootstrapped datasets.
Boosting
Sequentially improving models by focusing on errors.
Stacking
Combining predictions of multiple models.
Linear regression
Predicts continuous values using a best-fit line.
Multiple linear regression
Regression using multiple predictors.
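A sketch of simple linear regression using the closed-form least-squares slope and intercept, on data that is exactly linear by construction:

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]      # exactly y = 2x + 1 for illustration

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# slope = cov(x, y) / var(x); intercept = mean(y) − slope * mean(x)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

def predict(x):
    return slope * x + intercept

print(slope, intercept)        # 2.0 1.0
```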
Logistic regression
Classification model predicting probabilities.
K-nearest neighbors (KNN)
Classifies based on nearby training examples.
Naive Bayes
Probabilistic classifier assuming feature independence.
Decision tree
Tree structure splitting data based on conditions.
Random forest
Ensemble of decision trees using bagging.
Gradient boosting
Sequential ensemble method focusing on residual errors.
Support vector machine (SVM)
Classifier maximizing margin between classes.
K-means clustering
Groups data into k clusters.
Hierarchical clustering
Creates nested clusters represented as dendrograms.
DBSCAN
Density-based clustering algorithm.
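A minimal one-dimensional k-means sketch with k = 2 and hypothetical starting centroids: alternate between assigning points to the nearest centroid and recomputing centroids as cluster means.

```python
points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centroids = [0.0, 10.0]                 # arbitrary starting centroids

for _ in range(10):                     # a few iterations suffice here
    clusters = [[], []]
    for p in points:
        nearest = min(range(2), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    centroids = [sum(c) / len(c) for c in clusters]

print(centroids)    # converges to the two group means, ≈ [1.0, 8.0]
```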
Topic modeling (LDA)
Extracts topics from text documents.
Accuracy
Proportion of correct predictions.
Precision
True positives ÷ predicted positives.
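A sketch computing both metrics from hypothetical labels:

```python
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(a == p == 1 for a, p in zip(actual, predicted))        # true positives
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))  # false positives
correct = sum(a == p for a, p in zip(actual, predicted))

accuracy = correct / len(actual)   # correct predictions ÷ all predictions
precision = tp / (tp + fp)         # true positives ÷ predicted positives
print(accuracy, precision)
```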