Actual values of the response variable
y
Predicted value of the response variable
y-hat; ŷ
Residual
e = y - ŷ; represents the difference between actual and predicted values.
Least Squares Regression Line (LSRL)
A regression line that minimizes the squared residuals.
R²
Represents the fraction of the variation in the response variable explained by the regression line, with values between 0 and +1.
Homogeneity of variance
Condition where residuals have similar spread across all predicted values; violated when the spread of the residuals changes (for example, fans out) as the predicted values increase.
Standard Error
Summarizes the typical size of residuals, giving a rough estimate of the model's accuracy.
Law of Large Numbers
The long-run relative frequency of repeated independent trials approaches the true probability as the number of trials increases.
Conditional Probability
The probability of an event given the occurrence of another event, represented as P(B | A).
Binomial Model
Appropriate for a random variable that counts the number of successes in a fixed number of Bernoulli Trials.
Complement Rule
P(Aᶜ) = 1 - P(A); describes the probability of the complement event.
Outcome
The value measured, observed, or reported for a trial in a random phenomenon.
Event
A collection of outcomes from a random phenomenon.
Random Variable
A variable whose value depends on a random event, denoted by X.
Expected Value
Theoretical long-run average of a random variable, denoted by E(X) or μ.
Outlier
A data point with a large residual or high leverage.
Influential Point
A data point that, when omitted, causes a significant change in the slope of the regression model.
Formula for Least Squares Regression Line (LSRL)
The LSRL can be expressed as ŷ = b₀ + b₁x, where b₀ is the y-intercept and b₁ is the slope.
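A quick sketch of how b₁ and b₀ fall out of the least-squares formulas, using purely illustrative data values:

```python
# Least-squares slope and intercept for a small illustrative data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar  # the LSRL always passes through (x̄, ȳ)

def predict(x):
    return b0 + b1 * x  # ŷ = b0 + b1·x
```

For these numbers, b₁ = 0.6 and b₀ = 2.2, so ŷ = 2.2 + 0.6x.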
Formula for R²
R² = 1 - (SS_res / SS_tot), where SS_res is the sum of squared residuals and SS_tot is the total sum of squares.
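The same quantities can be computed directly from a set of actual and predicted values (illustrative numbers only):

```python
ys = [2, 4, 5, 4, 5]            # actual values of the response variable
y_hats = [2.8, 3.4, 4.0, 4.6, 5.2]  # predicted values (illustrative)

y_bar = sum(ys) / len(ys)
ss_res = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))  # sum of squared residuals
ss_tot = sum((y - y_bar) ** 2 for y in ys)                # total sum of squares
r_squared = 1 - ss_res / ss_tot
```

Here r_squared works out to 0.6, i.e., 60% of the variation in y is explained by the line.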
Formula for Expected Value
E(X) = Σ [x * P(x)], where x represents the outcomes and P(x) their probabilities.
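A worked instance of the formula, using a fair six-sided die as the random variable:

```python
# E(X) = Σ x·P(x) for a fair die: each outcome has probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6
expected = sum(x * p for x in outcomes)  # the long-run average roll, 3.5
```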
Standard Error of the Estimate
SE = √(Σe² / (n - 2)), where e are residuals and n is the number of data points.
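A minimal sketch of the formula, using made-up residuals from a fit with n = 5 data points:

```python
import math

residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]  # illustrative residuals e = y − ŷ
n = len(residuals)

# SE = √(Σe² / (n − 2)); n − 2 because the line uses up two estimates (b0, b1).
se = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
```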
Formula for Conditional Probability
P(A | B) = P(A ∩ B) / P(B), where P(A ∩ B) is the probability of both A and B occurring.
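A direct translation of the formula with made-up probabilities:

```python
# P(A | B) = P(A ∩ B) / P(B), with illustrative values.
p_b = 0.5          # probability of the given event B
p_a_and_b = 0.1    # probability that both A and B occur
p_a_given_b = p_a_and_b / p_b  # 0.2
```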
Variance of a Random Variable
Var(X) = E(X²) - [E(X)]², indicating the spread of the random variable relative to its mean.
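The shortcut formula worked out for a fair six-sided die:

```python
# Var(X) = E(X²) − [E(X)]² for a fair die.
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6
e_x = sum(x * p for x in outcomes)        # E(X) = 3.5
e_x2 = sum(x ** 2 * p for x in outcomes)  # E(X²) = 91/6
var_x = e_x2 - e_x ** 2                   # 35/12 ≈ 2.917
```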
Standard Deviation
A measure of the dispersion or spread of a set of values, denoted as σ (for population) or s (for sample).
Formula for Standard Deviation (Population)
σ = √(Σ(x - μ)² / N), where μ is the population mean and N is the number of data points.
Formula for Standard Deviation (Sample)
s = √(Σ(x - x̄)² / (n - 1)), where x̄ is the sample mean and n is the number of data points.
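Both standard-deviation formulas side by side on the same illustrative data, showing the only difference is the divisor (n − 1 for a sample, a correction for estimating the mean from the data):

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
n = len(data)
mean = sum(data) / n             # 5.0

ss = sum((x - mean) ** 2 for x in data)  # Σ(x − mean)²
sigma = math.sqrt(ss / n)        # population SD: divide by N
s = math.sqrt(ss / (n - 1))      # sample SD: divide by n − 1
```

For these numbers σ = 2.0 exactly, while s ≈ 2.14 (always slightly larger).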
Sampling Distribution
The probability distribution of a statistic obtained through a large number of samples drawn from a specific population.
Central Limit Theorem
States that the distribution of the sample means will approach a normal distribution as the sample size increases, regardless of the shape of the population distribution.
Z-Score
The number of standard deviations a data point is from the mean, calculated as Z = (X - μ) / σ.
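A one-line worked example with illustrative numbers:

```python
# Z = (X − μ) / σ: a score of 85 when the mean is 70 and the SD is 10.
x, mu, sigma = 85, 70, 10
z = (x - mu) / sigma  # 1.5 standard deviations above the mean
```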
Confidence Interval
A range of values, derived from the sample statistics, that is likely to contain the value of an unknown population parameter.
Formula for Confidence Interval for Mean
CI = x̄ ± Z*(σ/√n), where Z* is the critical value corresponding to the desired confidence level, σ is the population standard deviation, and n is the sample size.
Margin of Error
Half the width of a confidence interval; for a mean it is calculated as E = Z*(σ/√n).
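Putting the confidence-interval and margin-of-error formulas together with illustrative numbers (1.96 is the usual critical value for 95% confidence):

```python
import math

x_bar, sigma, n = 50.0, 8.0, 64  # sample mean, population SD, sample size (illustrative)
z_star = 1.96                    # Z* for a 95% confidence level

moe = z_star * sigma / math.sqrt(n)      # margin of error E = Z*(σ/√n)
ci = (x_bar - moe, x_bar + moe)          # x̄ ± E
```

Here the margin of error is 1.96 and the interval is (48.04, 51.96).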
(b₁) Slope of the Regression Line
Indicates the expected change in the response variable for a one-unit change in the predictor variable.
Simple Linear Regression
A statistical method that models the relationship between a dependent variable and one independent variable by fitting a linear equation to observed data.
Multiple Linear Regression
A statistical technique that uses multiple independent variables to predict the value of a dependent variable.
Goodness of Fit
A measure of how well the observed outcomes match the expected outcomes in a model.
Overfitting
A modeling error that occurs when a model is too complex, capturing noise instead of the underlying data pattern.
Underfitting
A modeling error that occurs when a model is too simple to capture the underlying trend of the data.
Regression Coefficient
Represents the change in the dependent variable for a one-unit change in the predictor variable.
Interaction Term
A variable in a regression model that accounts for the effect of two or more independent variables acting together on the dependent variable.
Cross-Validation
A technique for assessing how the results of a statistical analysis will generalize to an independent data set.
Model Specification
The process of developing a model that accurately represents the relationship between the variables.
Residual Analysis
The examination of residuals to assess the goodness of fit of a model.
Bernoulli Trial
A random experiment where there are only two possible outcomes: success or failure.
Success Probability (p)
The probability of success on a single Bernoulli trial in a binomial experiment.
Binomial Distribution
The probability distribution of the number of successes in a fixed number of independent Bernoulli trials.
Formula for Binomial Probability
P(X = k) = (n choose k) * p^k * (1-p)^(n-k), where n is the number of trials and k is the number of successes.
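The formula translated directly into Python, applied to a fair-coin example:

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) = C(n, k) · p^k · (1 − p)^(n − k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# P(exactly 3 heads in 5 fair-coin flips) = 10/32 = 0.3125
prob = binom_pmf(3, 5, 0.5)
```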
Mean of Binomial Distribution
The expected number of successes in a binomial distribution, calculated as μ = n * p.
Variance of Binomial Distribution
The spread of a binomial distribution, calculated as Var(X) = n * p * (1 - p).
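Both binomial summary formulas on one illustrative setup:

```python
# For a binomial with n trials and success probability p:
n, p = 20, 0.3
mu = n * p             # mean: μ = n·p = 6.0 expected successes
var = n * p * (1 - p)  # variance: n·p·(1 − p) = 4.2
```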
Cumulative Binomial Probability
The probability that the number of successes is less than or equal to a certain number k.
Sampling without Replacement
A sampling method where selected individuals are not returned to the population for subsequent trials.
Sampling with Replacement
A sampling method where selected individuals are returned to the population for subsequent trials.
Normal Approximation to the Binomial
When n is large and p is not too close to 0 or 1, the binomial distribution can be approximated by a normal distribution.
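A sketch of the approximation in practice. The check that n·p and n·(1 − p) are both at least 10 is a common textbook rule of thumb, and the numbers here are illustrative; the ±0.5 in the probability calculation is a continuity correction for approximating a discrete count with a continuous curve:

```python
import math

n, p = 100, 0.4
mu = n * p                          # μ = n·p
sigma = math.sqrt(n * p * (1 - p))  # σ = √(n·p·(1 − p))

# Common rule of thumb: approximation is reasonable when n·p ≥ 10 and n·(1 − p) ≥ 10.
ok = n * p >= 10 and n * (1 - p) >= 10

def normal_cdf(x):
    # Standard-normal CDF via the error function: Φ((x − μ)/σ)
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Approximate P(35 ≤ X ≤ 45) with continuity correction.
approx = normal_cdf(45.5) - normal_cdf(34.5)
```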