good one

Last updated 6:59 AM on 4/7/26

182 Terms

1
New cards

Anscombe’s quartet

A set of four datasets that have nearly identical statistical summaries (mean, variance, …)

Data science process

2
New cards

Information

Processed or organized data that has meaning and context.

3
New cards

Knowledge

Information interpreted and applied to make decisions or take action.

4
New cards

Data Science

An interdisciplinary field that uses statistics, mathematics, programming, and domain knowledge …

Knowledge Engineering

5
New cards

Median

Middle value of a sorted dataset.

6
New cards

Mode

Most frequently occurring value.

7
New cards

Range

Difference between the largest and smallest values.

8
New cards

Variance

Average squared deviation from the mean.

9
New cards

Standard deviation

Square root of variance; represents spread in original units.

10
New cards

Interquartile range (IQR)

Difference between Q3 and Q1; robust measure of spread.
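
The summary measures on cards 5 to 10 can be sketched with Python's standard `statistics` module. The data values are hypothetical, and the population forms (`pvariance`, `pstdev`) are used to match the "average squared deviation" wording; `statistics.variance` would divide by n − 1 instead.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

median = statistics.median(data)        # middle value of the sorted data
mode = statistics.mode(data)            # most frequently occurring value
value_range = max(data) - min(data)     # largest minus smallest
variance = statistics.pvariance(data)   # average squared deviation from the mean
std_dev = statistics.pstdev(data)       # square root of the variance

# Quartile cut points (default "exclusive" method); IQR = Q3 - Q1.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(median, mode, value_range, round(variance, 3), round(std_dev, 3), iqr)
```

Other quartile conventions exist, so the IQR can differ slightly between tools.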

11
New cards

Skewness

Measure of asymmetry in a distribution.

12
New cards

Kurtosis

Measure of tail heaviness compared to a normal distribution.

13
New cards

Z-score

Standardized value calculated as (x − µ)/σ.
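
A minimal sketch of the z-score formula, assuming hypothetical test scores and treating them as the whole population when computing µ and σ:

```python
import statistics

scores = [60, 70, 80, 90, 100]        # hypothetical data
mu = statistics.fmean(scores)         # population mean
sigma = statistics.pstdev(scores)     # population standard deviation

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

z = z_score(100, mu, sigma)
print(round(z, 3))  # about 1.414: 100 is roughly 1.4 sigma above the mean
```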

14
New cards

Expected value

Long-term average outcome of a random variable.

15
New cards

Population

Entire group of interest in a study.

16
New cards

Sample

Subset drawn from a population to estimate characteristics.

17
New cards

Parameter

Numerical summary of a population (µ, σ).

18
New cards

Statistic

Numerical summary of a sample (x̄, s).

19
New cards

Law of large numbers

Sample mean approaches population mean as sample size increases.

20
New cards

Central limit theorem

Distribution of sample means approaches normal as sample size increases.
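
The law of large numbers and the central limit theorem can be illustrated with a small simulation. This is only a sketch: the fair die, the seed, and the sample sizes are assumptions picked for the example.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def roll():
    return rng.randint(1, 6)  # fair six-sided die, expected value 3.5

# Law of large numbers: the mean of many rolls settles near 3.5.
big_sample_mean = statistics.fmean(roll() for _ in range(10_000))

# Central limit theorem: means of many size-30 samples cluster around 3.5,
# with spread close to sigma / sqrt(30) where sigma = sqrt(35/12) for a die.
sample_means = [statistics.fmean(roll() for _ in range(30)) for _ in range(2_000)]
spread_of_means = statistics.pstdev(sample_means)
theoretical_spread = (35 / 12) ** 0.5 / 30 ** 0.5

print(round(big_sample_mean, 3), round(spread_of_means, 3), round(theoretical_spread, 3))
```

Plotting `sample_means` as a histogram would show the bell shape, even though a single die roll is uniformly distributed.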

21
New cards

Correlation

Measures how strongly two variables move together.

22
New cards

Correlation vs causation

Correlation indicates association; causation requires proof that one variable c…

Pearson correlation

23
New cards

Spearman correlation

Rank-based correlation that is robust to outliers and captures monotonic relationships, including nonlinear ones.

24
New cards

Covariance

Measures whether two variables increase or decrease together.
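
Covariance, Pearson correlation, and Spearman correlation can be compared on a tiny example. The functions below are a hand-rolled sketch (population covariance, and ranks that assume no tied values); the data are hypothetical.

```python
import math

def covariance(xs, ys):
    """Population covariance: mean product of deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def pearson(xs, ys):
    """Covariance rescaled to the range [-1, 1]."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

def ranks(values):
    """Rank of each value, 1 = smallest. Assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Pearson correlation computed on ranks instead of raw values."""
    return pearson(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]   # y = x**3: monotonic but not linear

r_pearson = pearson(xs, ys)
r_spearman = spearman(xs, ys)
print(round(r_pearson, 3), round(r_spearman, 3))
```

Here the Spearman value is 1 because the relationship is perfectly monotonic, while the Pearson value is below 1 because it is not a straight line.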

25
New cards

Outlier

Data point far from the rest of the dataset.

26
New cards

Probability

Likelihood of an event occurring.

27
New cards

Sample space

All possible outcomes of an experiment.

28
New cards

Conditional probability

Probability of event A occurring given that event B has occurred.

29
New cards

Joint probability

Probability of two events occurring together.

30
New cards

Bayes’ theorem

Formula for updating probabilities using evidence.

31
New cards

Prior probability

Belief about an event before observing data.

32
New cards

Posterior probability

Updated belief after observing evidence.

33
New cards

Likelihood

Probability of observing data given parameters.
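
The prior, likelihood, and posterior cards fit together in Bayes' theorem: posterior = likelihood × prior ÷ evidence. A minimal sketch, assuming a hypothetical screening test (1% prevalence, 95% true positive rate, 5% false positive rate):

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem.

    prior               = P(condition)
    likelihood          = P(positive | condition)
    false_positive_rate = P(positive | no condition)
    """
    # Evidence: total probability of a positive test.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers, for illustration only.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 3))  # about 0.161
```

Even with an accurate test, the posterior stays low here because the prior is small.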

34
New cards

Normal distribution

Bell-shaped distribution defined by mean and standard deviation.

35
New cards

Empirical rule

For a normal distribution, about 68%, 95%, and 99.7% of data fall within 1, 2, and 3 standard deviations of the mean.
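
The three percentages in the empirical rule can be checked against the normal CDF. A short sketch using `statistics.NormalDist` from the standard library:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(k, round(share, 4))
```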

36
New cards

Binomial distribution

Distribution for the number of successes in a fixed number of independent trials with the same success probability.

37
New cards

Poisson distribution

Distribution for counts of events occurring in a fixed interval of time or space at a constant average rate.

38
New cards

Exponential distribution

Distribution describing the time between events that occur at a constant average rate.

39
New cards

Bernoulli distribution

Distribution for a single success/failure trial.
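
The discrete and waiting-time distributions on cards 36 to 39 have short closed-form formulas. The function names below are illustrative, not from any library, and the parameter values are hypothetical.

```python
import math

def bernoulli_pmf(k, p):
    """P(X = k) for one trial: k is 1 (success) or 0 (failure)."""
    return p if k == 1 else 1 - p

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(exactly k events in an interval with average rate lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def exponential_cdf(t, lam):
    """P(waiting time until the next event is at most t) for rate lam."""
    return 1 - math.exp(-lam * t)

p_three_heads = binomial_pmf(3, 10, 0.5)   # 3 heads in 10 fair coin flips
p_no_events = poisson_pmf(0, 2.0)          # no events when 2 are expected
p_wait_under_1 = exponential_cdf(1.0, 2.0) # next event within 1 time unit

print(round(p_three_heads, 4), round(p_no_events, 4), round(p_wait_under_1, 4))
```

With n = 1 the binomial reduces to the Bernoulli case, and the chance of no Poisson events in an interval equals the chance that the exponential waiting time exceeds that interval.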

40
New cards

Structured data

Data organized into rows and columns.

41
New cards

Unstructured data

Data without predefined structure (text, images, audio).

42
New cards

Numeric data

Data represented with numbers.

43
New cards

Categorical data

Data represented as categories or labels.

44
New cards

Discrete variable

Variable with countable values.

45
New cards

Continuous variable

Variable that can take any value within a range (infinitely many possible values).

46
New cards

Random variable

Variable whose value depends on random outcomes.

47
New cards

Bar graph

Used to compare categories.

48
New cards

Histogram

Shows distribution of numerical data.

49
New cards

Line graph

Shows trends over time.

50
New cards

Scatter plot

Shows relationship between two variables.

51
New cards

Box plot

Displays quartiles, spread, and outliers.

52
New cards

Pie chart

Shows proportions of a whole.

53
New cards

Heatmap

Uses color to represent data intensity or correlation.

54
New cards

Data visualization

Graphical representation of data to reveal patterns.

55
New cards

Data cleaning

Fixing errors, removing duplicates, handling missing values.

56
New cards

Data wrangling

Organizing raw data for analysis.

57
New cards

Data transformation

Converting data into a useful format.

58
New cards

Data quality issues

Problems such as missing values, duplicates, or noise.

59
New cards

Missing data mechanisms

Missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).

60
New cards

Imputation methods

Methods to fill missing values (mean, median, KNN, regression).
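
Mean and median imputation are the simplest of these methods. A minimal sketch, assuming missing values are marked with `None` (the `impute` helper and the data are hypothetical):

```python
import statistics

def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.fmean(observed)
    else:
        fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

ages = [3.0, None, 5.0, 10.0, None]
mean_filled = impute(ages, "mean")
median_filled = impute(ages, "median")
print(mean_filled, median_filled)
```

The median fill is less affected by the large value 10.0, which is why it is often preferred for skewed data.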

61
New cards

Feature engineering

Creating new features from existing data.

62
New cards

Feature scaling

Adjusting data ranges using normalization or standardization.
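
The two scaling approaches named on this card can be sketched as follows. Min-max normalization maps values to [0, 1]; standardization converts them to z-scores. The data are hypothetical.

```python
import statistics

def min_max_scale(values):
    """Normalization: rescale so the minimum is 0 and the maximum is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Standardization: subtract the mean, divide by the standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

incomes = [10, 20, 30, 40, 50]
normalized = min_max_scale(incomes)
standardized = standardize(incomes)
print(normalized)
print([round(z, 3) for z in standardized])
```

In a modeling workflow the minimum, maximum, mean, and standard deviation would be computed on the training data only and reused for validation and test data.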

63
New cards

Feature encoding

Converting categorical variables to numerical format.
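
One-hot encoding is one common way to do this conversion. A minimal sketch with a hypothetical `one_hot` helper:

```python
def one_hot(labels):
    """Return the sorted category list and one 0/1 row per label."""
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1   # mark the column for this label's category
        rows.append(row)
    return categories, rows

categories, encoded = one_hot(["red", "green", "blue", "green"])
print(categories)  # ['blue', 'green', 'red']
print(encoded)
```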

64
New cards

Feature selection

Choosing relevant variables for modeling.

65
New cards

Dimensionality reduction

Reducing number of variables while preserving information.

66
New cards

Principal component analysis (PCA)

Linear dimensionality reduction technique.

67
New cards

t-SNE / UMAP

Nonlinear dimensionality reduction for visualization.

68
New cards

Autoencoders

Neural networks used for dimensionality reduction.

69
New cards

Data leakage

When training data accidentally includes information that would not be available at prediction time, such as information from the test data.

70
New cards

Machine learning

Algorithms that learn patterns from data.

71
New cards

Supervised learning

Learning using labeled data.

72
New cards

Unsupervised learning

Finding patterns in unlabeled data.

73
New cards

Reinforcement learning

Learning by rewards and penalties.

74
New cards

Training dataset

Data used to train a model.

75
New cards

Validation dataset

Data used to tune hyperparameters and compare models during development.

76
New cards

Test dataset

Data used to evaluate final model performance.
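
The training, validation, and test datasets are usually disjoint slices of one shuffled dataset. A sketch, assuming a hypothetical 70/15/15 split and a fixed seed:

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once, then cut into three non-overlapping parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_frac)
    n_val = round(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

Keeping the three parts disjoint is what prevents test information from leaking into training.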

77
New cards

Bias-variance tradeoff

Balance between underfitting and overfitting.

78
New cards

Overfitting

Model memorizes training data but performs poorly on new data.

79
New cards

Underfitting

Model too simple to capture patterns.

80
New cards

Regularization

Techniques that reduce overfitting by penalizing model complexity (for example, L1 and L2 penalties).

81
New cards

Early stopping

Stopping training when validation performance stops improving.

82
New cards

Ensemble methods

Combine multiple models for better performance.

83
New cards

Bagging

Training models independently on bootstrapped datasets.

84
New cards

Boosting

Sequentially improving models by focusing on errors.

85
New cards

Stacking

Combining predictions of multiple models using a meta-model trained on their outputs.

86
New cards

Linear regression

Predicts continuous values using a best-fit line.
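
For one predictor, the best-fit line has a closed-form least-squares solution. A minimal sketch with hypothetical data that lie exactly on y = 2x + 1:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
prediction = slope * 5 + intercept   # predicted y at x = 5
print(slope, intercept, prediction)
```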

87
New cards

Multiple linear regression

Regression using multiple predictors.

88
New cards

Logistic regression

Classification model predicting probabilities.

89
New cards

K-nearest neighbors (KNN)

Classifies based on nearby training examples.
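
A bare-bones KNN classifier: find the k closest training points and take a majority vote. The points, labels, and choice of Euclidean distance are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train is a list of (point, label) pairs; returns the majority label."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training data with two well-separated classes.
train = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b"),
]

print(knn_predict(train, (2, 2)))  # a
print(knn_predict(train, (7, 7)))  # b
```

Because KNN relies on distances, features are usually scaled first so that no single feature dominates.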

90
New cards

Naive Bayes

Probabilistic classifier assuming feature independence.

91
New cards

Decision tree

Tree structure splitting data based on conditions.

92
New cards

Random forest

Ensemble of decision trees using bagging.

93
New cards

Gradient boosting

Sequential ensemble method focusing on residual errors.

94
New cards

Support vector machine (SVM)

Classifier maximizing margin between classes.

95
New cards

K-means clustering

Groups data into k clusters.

96
New cards

Hierarchical clustering

Creates nested clusters represented as dendrograms.

97
New cards

DBSCAN

Density-based clustering algorithm.

98
New cards

Topic modeling (LDA)

Extracts topics from text documents.

99
New cards

Accuracy

Proportion of correct predictions.

100
New cards

Precision

True positives ÷ predicted positives.
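
Accuracy and precision can both be computed from counts of true and false predictions. A sketch with hypothetical binary labels (1 = positive, 0 = negative):

```python
y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical model predictions

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
correct = sum(1 for t, p in pairs if t == p)

accuracy = correct / len(pairs)      # proportion of correct predictions
precision = tp / (tp + fp)           # true positives / predicted positives

print(accuracy, precision)  # 0.75 0.75
```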
