Vocabulary flashcards covering core concepts from descriptive statistics, distributions, and inferential statistics as discussed in the video notes.
Central Limit Theorem (CLT)
States that the sum (or average) of a large number of independent, identically distributed random variables with finite variance tends toward a normal distribution, regardless of the shape of the original distribution.
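A quick simulation can illustrate the definition above. This is a sketch, not a proof: it draws samples from a uniform distribution (an assumption chosen for illustration) and checks that the sample means cluster tightly around the population mean.

```python
import random
import statistics

# Sketch of the CLT: averages of many i.i.d. Uniform(0, 1) draws
# cluster near the population mean 0.5, and their spread shrinks
# roughly like 1/sqrt(n) as the sample size n grows.
random.seed(0)

def sample_mean(n):
    """Mean of n independent Uniform(0, 1) draws."""
    return sum(random.random() for _ in range(n)) / n

means = [sample_mean(100) for _ in range(1000)]
center = statistics.mean(means)   # close to the population mean 0.5
spread = statistics.stdev(means)  # close to sqrt(1/12) / sqrt(100) ~ 0.029
```

Plotting a histogram of `means` would show the familiar bell shape even though each individual draw is uniform, not normal.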
Normal distribution (Gaussian)
A symmetric bell-shaped distribution that many statistical methods assume; characterized by its mean and standard deviation.
Type I Error
Rejecting a true null hypothesis (false positive); the probability of this error is denoted by alpha.
Type II Error
Failing to reject a false null hypothesis (false negative); the probability of this error is denoted by beta.
Significance level (alpha)
The probability threshold for rejecting the null hypothesis when it is true (commonly 0.05).
Power of a test
The probability of correctly rejecting a false null hypothesis; equal to 1 minus beta.
R-squared (coefficient of determination)
Measures the proportion of variance in the dependent variable explained by the model; ranges from 0 to 1, with 1 indicating a perfect fit.
Correlation
A measure of the strength and direction of a linear relationship between two variables; does not imply causation.
Causation
A cause-and-effect relationship where changes in one variable bring about changes in another.
Lurking variable (confounding variable)
An outside factor that affects both variables of interest, potentially creating a spurious association.
Parametric tests
Statistical tests that assume a specific population distribution (often normal), e.g., t-tests, ANOVA.
Non-parametric tests
Tests that do not assume a specific population distribution; e.g., Mann-Whitney U, Kruskal-Wallis.
p-value
The probability of observing data at least as extreme as what was observed, assuming the null hypothesis is true; used to decide whether to reject or fail to reject the null.
Cross-validation
A model evaluation method that partitions data into training and validation sets to assess performance.
k-fold cross-validation
A form of cross-validation where the data are split into k subsets (folds); the model is trained on k−1 folds and tested on the remaining fold, rotating through all k folds.
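The rotation described above can be sketched with plain index arithmetic, without assuming any ML library. The helper name `k_fold_indices` is made up for this example.

```python
# Sketch: index-based k-fold splits. Each of the k rotations holds out
# one fold for testing and trains on the remaining k - 1 folds.
def k_fold_indices(n_samples, k):
    # Assign samples to folds round-robin: fold i gets indices i, i+k, ...
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for f in range(k) if f != held_out for i in folds[f]]
        splits.append((train_idx, test_idx))
    return splits

splits = k_fold_indices(10, k=5)  # 5 (train, test) index pairs
```

Every sample appears in exactly one test fold across the k rotations, which is what makes the resulting performance estimate use all of the data.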
Bootstrapping
A resampling technique (sampling with replacement) used to estimate the distribution of a statistic.
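A minimal sketch of the resampling idea above, estimating the standard error of the sample mean; the data values are made up for illustration.

```python
import random
import statistics

# Bootstrap sketch: repeatedly resample the data WITH replacement,
# recompute the statistic (here, the mean), and use the spread of
# those bootstrap statistics as an estimate of its standard error.
random.seed(1)
data = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4, 5.8, 5.0]

boot_means = []
for _ in range(2000):
    resample = [random.choice(data) for _ in data]  # same size, with replacement
    boot_means.append(statistics.mean(resample))

std_error = statistics.stdev(boot_means)  # bootstrap estimate of SE of the mean
```

The same loop works for statistics with no simple standard-error formula (medians, ratios), which is where bootstrapping is most useful.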
Descriptive statistics
Techniques that summarize data (e.g., central tendency and dispersion) without making inferences about a population.
Mean
Arithmetic average of a set of numbers.
Median
The middle value in an ordered data set.
Mode
The most frequently occurring value in a data set.
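The three measures of central tendency defined above can be computed directly with Python's standard library; the sample values are made up.

```python
import statistics

# Central tendency on a small made-up sample.
data = [2, 3, 3, 5, 7, 10]
m_mean = statistics.mean(data)      # (2+3+3+5+7+10) / 6 = 5.0
m_median = statistics.median(data)  # average of middle pair (3, 5) = 4.0
m_mode = statistics.mode(data)      # 3 occurs twice, most often
```

Note that with an even number of values the median averages the two middle values, and the mean is pulled upward by the larger value 10 while the median is not.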
Population
The entire group of interest in a study.
Sample
A subset drawn from a population used to estimate population characteristics.
Parameter
A numerical characteristic of a population (e.g., population mean).
Statistic
A numerical characteristic computed from a sample (e.g., sample mean).
Handling missing data by deletion
Removing records with missing values (listwise or pairwise deletion).
Imputation
Replacing missing values with estimated values (mean/median/mode or model-based).
Interquartile Range (IQR)
Q3 minus Q1; a robust measure of dispersion not affected by extreme values.
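A sketch of the IQR's robustness, using `statistics.quantiles` on made-up data containing one extreme outlier. (Quartile conventions vary; `statistics.quantiles` defaults to the exclusive method, so exact values can differ slightly from other textbooks or libraries.)

```python
import statistics

# IQR is unchanged by the outlier 200, unlike the range.
data = [1, 3, 5, 7, 9, 11, 13, 15, 200]
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1                                 # robust spread measure
full_range = max(data) - min(data)            # dominated by the outlier
```

Replacing 200 with 17 would drastically change `full_range` but leave `iqr` almost untouched, which is the sense in which the IQR is robust.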
Skewness
A measure of asymmetry in a distribution; negative means left tail longer, positive means right tail longer.
Box plot
A graphical display showing the median, Q1, Q3, and whiskers (typically extending to the most extreme values within 1.5 × IQR of the quartiles, with points beyond plotted as outliers).
Variance
Average of the squared deviations from the mean.
Standard deviation
Square root of the variance; measures spread in the same units as the data.
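The two definitions above can be checked with the standard library, which distinguishes population versions (divide by N) from sample versions (divide by N − 1); the data are made up.

```python
import statistics

# Population vs sample dispersion on made-up data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
pop_var = statistics.pvariance(data)  # sum of squared deviations / N = 32/8 = 4.0
pop_sd = statistics.pstdev(data)      # sqrt(4.0) = 2.0, in the data's units
samp_var = statistics.variance(data)  # divides by N - 1 instead: 32/7
```

The sample variance's N − 1 denominator (Bessel's correction) makes it an unbiased estimator of the population variance when working from a sample.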
Range
Difference between the maximum and minimum values in a dataset.
Z-score
The number of standard deviations a value is from the mean: z = (X − μ) / σ.
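The formula above in code, applied to a made-up exam score with an assumed population mean and standard deviation.

```python
# z = (X - mu) / sigma: how many standard deviations X lies from the mean.
def z_score(x, mu, sigma):
    return (x - mu) / sigma

z = z_score(85, mu=70, sigma=10)  # 85 is 1.5 standard deviations above the mean
```

A positive z-score means the value lies above the mean, a negative one below; a value equal to the mean has z = 0.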
Covariance
A measure of how two variables vary together; not standardized.
Pearson correlation coefficient (r)
A standardized measure of linear relationship between two variables, ranging from -1 to 1.
Kurtosis
A measure of the tailedness or extremity of a distribution's tails.
Simpson's Paradox
A trend that appears within each subgroup reverses or disappears when the data are aggregated across the groups, typically because of a confounding variable.
Outliers
Values far from the rest of the data that can distort statistics like the mean.
Log transformation
Applying a logarithm to data to reduce skew and stabilize variance.
Histogram
A bar chart showing the frequency distribution of data divided into bins.
Probability Density Function (PDF)
A function describing the probability distribution of a continuous random variable.
Probability Mass Function (PMF)
A function describing the probabilities of the discrete outcomes of a random variable.
Poisson distribution
A discrete distribution for counts of events occurring in a fixed interval of time or space; its parameter λ is both the mean and the variance.
Binomial distribution
Distribution of the number of successes in n independent Bernoulli trials with probability p.
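The binomial PMF follows directly from the definition above: P(X = k) = C(n, k) · p^k · (1 − p)^(n − k). Here it is evaluated for a made-up example of counting heads in fair coin flips.

```python
import math

# Binomial PMF: probability of exactly k successes in n Bernoulli(p) trials.
def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

prob = binomial_pmf(3, n=10, p=0.5)  # P(exactly 3 heads in 10 fair flips) = 120/1024
total = sum(binomial_pmf(k, n=10, p=0.5) for k in range(11))  # PMF sums to 1
```

Summing the PMF over all k from 0 to n returning 1 is a useful sanity check that the function really is a probability mass function.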
Hypothesis testing
A framework for testing assumptions about a population using sample data.
A/B testing
An experimental design comparing two versions (A and B) to determine which performs better.
Null Hypothesis (H0)
A statement of no effect or no difference to be tested against.
Alternative Hypothesis (HA)
A statement that there is an effect or difference to be detected.
Independent samples t-test
Tests whether the means of two independent groups are different.
Paired t-test
Tests whether the means of paired observations (e.g., before/after) differ.
ANOVA (Analysis of Variance)
A test comparing means across three or more groups to see if at least one differs.
Chi-square test
Tests independence between categorical variables or goodness-of-fit of observed frequencies.