Actual values of the response variable
y
Predicted value of the response variable
y-hat; ŷ
Residual
e = y - ŷ; represents the difference between actual and predicted values.
Least Squares Regression Line (LSRL)
A regression line that minimizes the squared residuals.
R²
Represents the fraction of the variation in the response variable explained by the regression line, with values between 0 and +1.
Homogeneity of variance
Condition where residuals have similar spread across all predicted values; violated when the spread of the residuals changes (for example, fans out) as the predicted values increase.
Standard Error
Summarizes the typical size of residuals, giving a rough estimate of the model's accuracy.
Law of Large Numbers
The long-run relative frequency of repeated independent trials approaches the true probability as the number of trials increases.
Conditional Probability
The probability of an event given the occurrence of another event, represented as P(B | A).
Binomial Model
Appropriate for a random variable that counts the number of successes in a fixed number of Bernoulli Trials.
Complement Rule
P(Aᶜ) = 1 - P(A); describes the probability of the complement event.
Outcome
The value measured, observed, or reported for a trial in a random phenomenon.
Event
A collection of outcomes from a random phenomenon.
Random Variable
A variable whose value depends on a random event, denoted by X.
Expected Value
Theoretical long-run average of a random variable, denoted by E(X) or μ.
Outlier
A data point with a large residual or high leverage.
Influential Point
A data point that, when omitted, causes a significant change in the slope of the regression model.
Formula for Least Squares Regression Line (LSRL)
The LSRL can be expressed as ŷ = b₀ + b₁x, where b₀ is the y-intercept and b₁ is the slope.
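A quick sketch of how b₁ and b₀ fall out of the least-squares formulas, using purely illustrative data values:

```python
# Least-squares slope and intercept for a small illustrative data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar  # the LSRL always passes through (x̄, ȳ)

def predict(x):
    return b0 + b1 * x  # ŷ = b0 + b1·x
```

For these numbers, b₁ = 0.6 and b₀ = 2.2, so ŷ = 2.2 + 0.6x.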
Formula for R²
R² = 1 - (SS_res / SS_tot), where SS_res is the sum of squared residuals and SS_tot is the total sum of squares.
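The same quantities can be computed directly from a set of actual and predicted values (illustrative numbers only):

```python
ys = [2, 4, 5, 4, 5]            # actual values of the response variable
y_hats = [2.8, 3.4, 4.0, 4.6, 5.2]  # predicted values (illustrative)

y_bar = sum(ys) / len(ys)
ss_res = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))  # sum of squared residuals
ss_tot = sum((y - y_bar) ** 2 for y in ys)                # total sum of squares
r_squared = 1 - ss_res / ss_tot
```

Here r_squared works out to 0.6, i.e., 60% of the variation in y is explained by the line.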
Formula for Expected Value
E(X) = Σ [x * P(x)], where x represents the outcomes and P(x) their probabilities.
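A worked instance of the formula, using a fair six-sided die as the random variable:

```python
# E(X) = Σ x·P(x) for a fair die: each outcome has probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6
expected = sum(x * p for x in outcomes)  # the long-run average roll, 3.5
```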
Standard Error of the Estimate
SE = √(Σe² / (n - 2)), where e are residuals and n is the number of data points.
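A minimal sketch of the formula, using made-up residuals from a fit with n = 5 data points:

```python
import math

residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]  # illustrative residuals e = y − ŷ
n = len(residuals)

# SE = √(Σe² / (n − 2)); n − 2 because the line uses up two estimates (b0, b1).
se = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
```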
Formula for Conditional Probability
P(A | B) = P(A ∩ B) / P(B), where P(A ∩ B) is the probability of both A and B occurring.
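A direct translation of the formula with made-up probabilities:

```python
# P(A | B) = P(A ∩ B) / P(B), with illustrative values.
p_b = 0.5          # probability of the given event B
p_a_and_b = 0.1    # probability that both A and B occur
p_a_given_b = p_a_and_b / p_b  # 0.2
```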
Variance of a Random Variable
Var(X) = E(X²) - [E(X)]², indicating the spread of the random variable relative to its mean.
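The shortcut formula worked out for a fair six-sided die:

```python
# Var(X) = E(X²) − [E(X)]² for a fair die.
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6
e_x = sum(x * p for x in outcomes)        # E(X) = 3.5
e_x2 = sum(x ** 2 * p for x in outcomes)  # E(X²) = 91/6
var_x = e_x2 - e_x ** 2                   # 35/12 ≈ 2.917
```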
Standard Deviation
A measure of the dispersion or spread of a set of values, denoted as σ (for population) or s (for sample).
Formula for Standard Deviation (Population)
σ = √(Σ(x - μ)² / N), where μ is the population mean and N is the number of data points.
Formula for Standard Deviation (Sample)
s = √(Σ(x - x̄)² / (n - 1)), where x̄ is the sample mean and n is the number of data points.
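Both standard-deviation formulas side by side on the same illustrative data, showing the only difference is the divisor (n − 1 for a sample, a correction for estimating the mean from the data):

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
n = len(data)
mean = sum(data) / n             # 5.0

ss = sum((x - mean) ** 2 for x in data)  # Σ(x − mean)²
sigma = math.sqrt(ss / n)        # population SD: divide by N
s = math.sqrt(ss / (n - 1))      # sample SD: divide by n − 1
```

For these numbers σ = 2.0 exactly, while s ≈ 2.14 (always slightly larger).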
Sampling Distribution
The probability distribution of a statistic obtained through a large number of samples drawn from a specific population.
Central Limit Theorem
States that the distribution of the sample means will approach a normal distribution as the sample size increases, regardless of the shape of the population distribution.
Z-Score
The number of standard deviations a data point is from the mean, calculated as Z = (X - μ) / σ.
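A one-line worked example with illustrative numbers:

```python
# Z = (X − μ) / σ: a score of 85 when the mean is 70 and the SD is 10.
x, mu, sigma = 85, 70, 10
z = (x - mu) / sigma  # 1.5 standard deviations above the mean
```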
Confidence Interval
A range of values, derived from the sample statistics, that is likely to contain the value of an unknown population parameter.
Formula for Confidence Interval for Mean
CI = x̄ ± Z*(σ/√n), where Z* is the critical value corresponding to the desired confidence level, σ is the population standard deviation, and n is the sample size.
Margin of Error
Half the width of a confidence interval; for a mean it is calculated as E = Z*(σ/√n).
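Putting the confidence-interval and margin-of-error formulas together with illustrative numbers (1.96 is the usual critical value for 95% confidence):

```python
import math

x_bar, sigma, n = 50.0, 8.0, 64  # sample mean, population SD, sample size (illustrative)
z_star = 1.96                    # Z* for a 95% confidence level

moe = z_star * sigma / math.sqrt(n)      # margin of error E = Z*(σ/√n)
ci = (x_bar - moe, x_bar + moe)          # x̄ ± E
```

Here the margin of error is 1.96 and the interval is (48.04, 51.96).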
(b₁) Slope of the Regression Line
Indicates the expected change in the response variable for a one-unit change in the predictor variable.
Simple Linear Regression
A statistical method that models the relationship between a dependent variable and one independent variable by fitting a linear equation to observed data.
Multiple Linear Regression
A statistical technique that uses multiple independent variables to predict the value of a dependent variable.
Goodness of Fit
A measure of how well the observed outcomes match the expected outcomes in a model.
Overfitting
A modeling error that occurs when a model is too complex, capturing noise instead of the underlying data pattern.
Underfitting
A modeling error that occurs when a model is too simple to capture the underlying trend of the data.
Regression Coefficient
Represents the change in the dependent variable for a one-unit change in the predictor variable.
Interaction Term
A variable in a regression model that accounts for the effect of two or more independent variables acting together on the dependent variable.
Cross-Validation
A technique for assessing how the results of a statistical analysis will generalize to an independent data set.
Model Specification
The process of developing a model that accurately represents the relationship between the variables.
Residual Analysis
The examination of residuals to assess the goodness of fit of a model.
Bernoulli Trial
A random experiment where there are only two possible outcomes: success or failure.
Success Probability (p)
The probability of success on a single Bernoulli trial in a binomial experiment.
Binomial Distribution
The probability distribution of the number of successes in a fixed number of independent Bernoulli trials.
Formula for Binomial Probability
P(X = k) = (n choose k) * p^k * (1-p)^(n-k), where n is the number of trials and k is the number of successes.
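The formula translated directly into Python, applied to a fair-coin example:

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) = C(n, k) · p^k · (1 − p)^(n − k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# P(exactly 3 heads in 5 fair-coin flips) = 10/32 = 0.3125
prob = binom_pmf(3, 5, 0.5)
```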
Mean of Binomial Distribution
The expected number of successes in a binomial distribution, calculated as μ = n * p.
Variance of Binomial Distribution
The spread of a binomial distribution, calculated as Var(X) = n * p * (1 - p).
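Both binomial summary formulas on one illustrative setup:

```python
# For a binomial with n trials and success probability p:
n, p = 20, 0.3
mu = n * p             # mean: μ = n·p = 6.0 expected successes
var = n * p * (1 - p)  # variance: n·p·(1 − p) = 4.2
```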
Cumulative Binomial Probability
The probability that the number of successes is less than or equal to a certain number k.
Sampling without Replacement
A sampling method where selected individuals are not returned to the population for subsequent trials.
Sampling with Replacement
A sampling method where selected individuals are returned to the population for subsequent trials.
Normal Approximation to the Binomial
When n is large and p is not too close to 0 or 1, the binomial distribution can be approximated by a normal distribution.
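A sketch of the approximation in practice. The check that n·p and n·(1 − p) are both at least 10 is a common textbook rule of thumb, and the numbers here are illustrative; the ±0.5 in the probability calculation is a continuity correction for approximating a discrete count with a continuous curve:

```python
import math

n, p = 100, 0.4
mu = n * p                          # μ = n·p
sigma = math.sqrt(n * p * (1 - p))  # σ = √(n·p·(1 − p))

# Common rule of thumb: approximation is reasonable when n·p ≥ 10 and n·(1 − p) ≥ 10.
ok = n * p >= 10 and n * (1 - p) >= 10

def normal_cdf(x):
    # Standard-normal CDF via the error function: Φ((x − μ)/σ)
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Approximate P(35 ≤ X ≤ 45) with continuity correction.
approx = normal_cdf(45.5) - normal_cdf(34.5)
```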