t = (X̄ - μ)/(s/√n)
t-score formula
Use the z-score
σ is known OR
n is huge and σ is credibly known from process history
Use the t-score
σ is unknown and data are roughly normal
OR n ≥ ~30
Degrees of Freedom (df)
The number of independent values in a data set that are free to vary when estimating a parameter.
Used in t-distribution calculations
Represents the number of independent pieces of information available to estimate variability after one parameter (the mean) has been estimated from the data.
df = n - 1
Formula for df (one-sample t)
t_(α/2, n - 1)
Symbol that represents the t-critical value that cuts off an area of α/2 in each tail of the t-distribution with n-1 degrees of freedom.
Determines how far the sample mean can fall from the true mean when constructing a confidence interval for small samples.
X̄ ± t_(α/2, n - 1) · (s/√n)
Formula for one-sample t confidence interval
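The interval formula on this card can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the sample values are made up, and the critical value t_(0.025, 9) ≈ 2.262 is copied from a standard t-table because the standard library has no t-distribution.

```python
import math
import statistics

def t_confidence_interval(data, t_crit):
    """Return (low, high) for X̄ ± t_crit * s / sqrt(n)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)          # sample standard deviation (divides by n - 1)
    margin = t_crit * s / math.sqrt(n)  # margin of error E
    return mean - margin, mean + margin

# Hypothetical sample of n = 10 measurements.
sample = [9.8, 10.2, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 10.0, 9.9]
T_CRIT_95_DF9 = 2.262  # assumed table value for 95% confidence, df = n - 1 = 9
low, high = t_confidence_interval(sample, T_CRIT_95_DF9)
```

The critical value must match both the confidence level and df = n - 1, so a different sample size needs a different table lookup.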
The assumptions that a sample must meet for the t confidence interval
The sample should come from a distribution that is not extremely skewed and should contain no extreme outliers (or n should be large). Check plots (histogram/QQ) and context.
Bootstrap Confidence Interval
A method for estimating the uncertainty of a statistic (like the mean or median) by resampling the original data many times with replacement and recalculating the statistic for each resample.
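One common variant of the resampling idea described here is the percentile bootstrap, sketched below. The data, the number of resamples, and the fixed seed are illustrative assumptions, and other bootstrap variants (for example BCa) exist that this sketch does not cover.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000,
                 confidence=0.95, seed=0):
    """Percentile bootstrap: resample with replacement, recompute the
    statistic each time, and read the interval off the sorted results."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    alpha = 1 - confidence
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical right-skewed sample with one large value.
sample = [2.1, 2.4, 2.2, 3.9, 2.3, 2.5, 2.0, 6.8, 2.6, 2.2]
low, high = bootstrap_ci(sample)
```

Passing `stat=statistics.median` gives an interval for the median with the same code.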
trimmed mean
For moderate skew or outliers, consider a ____________, a transformation, or a Bootstrap Confidence Interval (if allowed)
Yes, it is true
Is it true that you must never “trim” outliers just to shrink the interval, and should remove points only when they are documented errors?
Margin of Error (MOE)
Half-width of a Confidence Interval
E = z_(α/2) · (σ/√n)
Margin of Error of a z confidence interval
E = t_(α/2, n - 1) · (s/√n)
Margin of Error of a t confidence interval
n = ((z_(α/2)σ)/E)^2
Formula for the sample size (n) when estimating a population mean using a z-confidence interval, needed to achieve a desired margin of error (E)
Always round up n to the next whole number
Use a pilot study or a historical population value to estimate σ if it is unknown.
The formula ensures the confidence interval has the specified precision (E) at the chosen confidence level (1 - α)
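The sample size rule on this card, including the round-up step, can be checked with a short calculation. The planning values σ = 15 and E = 2 are made up for illustration.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """n = ((z_(α/2) * σ) / E)^2, always rounded up to a whole number."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_(α/2), about 1.96 for 95%
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical planning values: sigma = 15, desired margin of error E = 2.
n_needed = sample_size_for_mean(sigma=15, margin=2)
```

Here the raw formula gives about 216.08, so rounding up yields n = 217; rounding down would leave the margin of error slightly larger than requested.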
Pilot Study Sample Standard Deviation (s)
A small, preliminary study done before the main one
If the population standard deviation is unknown, the sample standard deviation (s) calculated from the preliminary data
is used as an estimate of σ in the following sample size formula:
n = ((z_(α/2)s)/E)^2
This gives a realistic idea of how variable the data are,
which helps plan how many samples are needed in the full study.
Historical Population Standard Deviation (σ)
Also called a published value of the population standard deviation (σ)
Example: a value taken from past experiments, industry data, or technical reports and used as an estimate of variability.
This approach assumes the new data behave similarly to the older or related data.
Finite Population Correction (FPC)
The correction that prevents overestimating the variability when a large portion of the population is sampled.
When sampling without replacement from a finite population of size N, the variability of the sample mean is slightly smaller than in infinite populations.
σ_(X̄, FPC) = (σ/√n) · √((N - n)/(N - 1))
Finite Population Correction formula.
Used if the sampling fraction n/N > 0.05
Adjusts the Standard Error
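A small sketch of the correction, applying the card's n/N > 0.05 rule of thumb. The numbers (σ = 10, n = 100, N = 500) are illustrative assumptions.

```python
import math

def standard_error_with_fpc(sigma, n, N):
    """Standard error of the mean, multiplied by sqrt((N - n)/(N - 1))
    when the sampling fraction n/N exceeds 0.05."""
    se = sigma / math.sqrt(n)
    if n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Hypothetical case: 100 units sampled from a population of 500 (n/N = 0.20).
se_corrected = standard_error_with_fpc(sigma=10, n=100, N=500)
# Small sampling fraction: 10 units from 10,000, so no correction is applied.
se_plain = standard_error_with_fpc(sigma=10, n=10, N=10_000)
```

In the first case the uncorrected standard error would be 1.0, and the correction shrinks it to about 0.895.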
Sample Proportion
Represents the fraction of successes in a sample.
X
Symbol that represents the number of successes in the sample proportion
p̂ = X/n
Formula for the sample proportion, which is used as an estimate of the population proportion p.
Wilson Interval
Used when the sample size is small or
when np̂ or n(1 - p̂) is below 10,
where the normal approximation is unreliable.
Adjusts both the center (the point estimate) and the width of the confidence interval
to provide a more accurate estimate of the population proportion p for small samples.
Tends to produce intervals that are tighter and more balanced around the true proportion than the standard normal-based method.
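The card does not state the formula, so the sketch below uses the standard Wilson score expressions: a center of (p̂ + z²/(2n))/(1 + z²/n) and a half-width of z·√(p̂(1 - p̂)/n + z²/(4n²))/(1 + z²/n). The counts (2 successes in 10 trials) are made up.

```python
import math
from statistics import NormalDist

def wilson_interval(x, n, confidence=0.95):
    """Wilson score interval for a proportion (shifts the center and the width)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = x / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Hypothetical small sample: 2 successes out of 10.
low, high = wilson_interval(2, 10)
```

Note that the center (about 0.283) sits between p̂ = 0.2 and 0.5, which is the "adjusted center" the card refers to.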
Agresti-Coull Interval
Used when the sample size is small or
when np̂ or n(1 - p̂) is below 10.
Improves accuracy by adding a small correction,
usually 2 artificial successes and 2 artificial failures,
before computing p̂.
Increases stability in the estimated proportion and generally provides better coverage probability than the large-sample (normal) confidence interval.
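A minimal sketch of the "add 2 successes and 2 failures" version described on this card, followed by the usual normal-based interval around the adjusted proportion. The counts are illustrative, and clipping the result to [0, 1] is a choice made here, not something the card specifies.

```python
import math
from statistics import NormalDist

def agresti_coull_interval(x, n, confidence=0.95):
    """Add 2 artificial successes and 2 artificial failures, then apply
    the normal-based interval to the adjusted proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_adj = n + 4
    p_adj = (x + 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clipping to [0, 1] is an assumption of this sketch.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# Hypothetical small sample: 2 successes out of 10.
low, high = agresti_coull_interval(2, 10)
```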
Clopper-Pearson Interval
An exact binomial confidence interval
Used when the normal approximation doesn’t hold.
Guarantees that the true confidence level is at least what is stated
(never underestimates coverage)
Often conservative, meaning the interval is wider than necessary
But ensures high reliability for small or discrete samples.
p̂ ± z_(α/2) · √(p̂(1 - p̂)/n)
Large-Sample Confidence Interval for a Proportion (p)
np̂ ≥ 10 and n(1 - p̂) ≥ 10
Conditions for treating the sample as large, which ensure that the sampling distribution of p̂ is approximately normal
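The large-sample interval and its conditions can be combined in one small function. This is a sketch with made-up counts (40 successes in 100 trials); raising an error when the conditions fail is a design choice of the example.

```python
import math
from statistics import NormalDist

def large_sample_proportion_ci(x, n, confidence=0.95):
    """p̂ ± z_(α/2) * sqrt(p̂(1 - p̂)/n), only if np̂ >= 10 and n(1 - p̂) >= 10."""
    p_hat = x / n
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("large-sample conditions not met; consider another interval")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 40 successes out of 100.
low, high = large_sample_proportion_ci(40, 100)
```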
two-sided
Use _________ Confidence Intervals unless only a minimum or maximum matters (for example, a specification limit)
100(1 - α)%: X̄ - z_α · SE
One-sided lower bound
100(1 - α)%: X̄ + z_α · SE
One-sided upper bound
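The two one-sided bounds differ from a two-sided interval in that the whole α goes into one tail, so z_α replaces z_(α/2). A short sketch with illustrative values (X̄ = 50, SE = 2):

```python
from statistics import NormalDist

def one_sided_bounds(mean, se, confidence=0.95):
    """Return (lower bound, upper bound); each is a separate one-sided
    100(1 - α)% bound using z_α, not the two ends of one interval."""
    z_alpha = NormalDist().inv_cdf(confidence)  # about 1.645 for 95%
    return mean - z_alpha * se, mean + z_alpha * se

# Hypothetical values: sample mean 50, standard error 2.
lower_bound, upper_bound = one_sided_bounds(50.0, 2.0)
```

Only one of the two bounds should be reported for a given question, depending on whether the minimum or the maximum matters.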
Paired Sample
Same units are measured twice (before/after, left/right)
Analyze the differences D_i: build a confidence interval for the mean of D using t
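A paired analysis reduces to a one-sample t interval on the differences. The before/after values below are invented, and t_(0.025, 4) ≈ 2.776 is taken from a standard t-table as an assumption, since the standard library has no t-distribution.

```python
import math
import statistics

# Hypothetical before/after measurements on the same 5 units.
before = [12.1, 11.8, 12.5, 12.0, 11.9]
after = [11.6, 11.5, 12.1, 11.8, 11.4]

# Work with the per-unit differences D_i, not the two columns separately.
diffs = [b - a for b, a in zip(before, after)]
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)

T_CRIT_95_DF4 = 2.776  # assumed table value for 95% confidence, df = 5 - 1 = 4
margin = T_CRIT_95_DF4 * s_d / math.sqrt(len(diffs))
low, high = d_bar - margin, d_bar + margin
```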
Two-sample
Used when there are independent groups.
Pooled t-Test
A two-sample t-test used when the population variances are approximately equal.
Combines (or “pools”) the two samples
Into a single, common estimate of variance to compute the standard error
This increases precision when the equal-variance assumption holds.
(s_p)^2 = (((n1 - 1)(s1)^2) + ((n2 - 1)(s2)^2))/(n1 + n2 - 2)
Pooled variance formula
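The pooled variance is a weighted average of the two sample variances, with weights n1 - 1 and n2 - 1. A quick check with made-up summary statistics:

```python
def pooled_variance(n1, s1, n2, s2):
    """(s_p)^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

# Hypothetical groups: n1 = 10 with s1 = 2, n2 = 12 with s2 = 3.
sp2 = pooled_variance(10, 2.0, 12, 3.0)
```

The result, 6.75, lies between s1² = 4 and s2² = 9 and closer to 9 because the second group is larger.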
Welch t-Test
A two-sample t-test used when the population variances are not equal (heterogeneous variances).
Does not assume equal variances
And instead adjusts both the standard error and degrees of freedom accordingly
Is a more robust and reliable version when sample sizes or variances differ.
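The card does not give the adjusted formulas, so this sketch uses the standard Welch-Satterthwaite expressions: SE = √(s1²/n1 + s2²/n2) and df = (a + b)²/(a²/(n1 - 1) + b²/(n2 - 1)), where a = s1²/n1 and b = s2²/n2. The summary statistics are made up.

```python
import math

def welch_se_and_df(n1, s1, n2, s2):
    """Unpooled standard error and Welch-Satterthwaite degrees of freedom."""
    a = s1**2 / n1
    b = s2**2 / n2
    se = math.sqrt(a + b)
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return se, df

# Hypothetical groups: n1 = 10 with s1 = 2, n2 = 12 with s2 = 3.
se, df = welch_se_and_df(10, 2.0, 12, 3.0)
```

The resulting df (about 19.2) is usually not a whole number and is slightly below the pooled value n1 + n2 - 2 = 20.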
Q-Q Plot (Quantile-Quantile Plot)
A graphical tool used to check whether a dataset follows a specified distribution (most commonly the normal distribution)
Plots the quantiles of the sample data against the quantiles of a theoretical normal distribution.
straight diagonal line
In a QQ plot, if the points fall roughly along a ____________________, the data are approximately normal.
skewness, non-normality
Systematic curves or deviations in a QQ plot indicate ___________ or ____________
Multiplicative Data
Dataset where the peaks and troughs of the pattern become larger as the trend increases
Data Transformation
A mathematical modification applied to each data point to make the data more normal, stabilize variance, or improve model fit.
Count Data
Dataset consisting of non-negative, integer values that represent the number of times an event occurs within a specific unit of time or space
log(x)
Transformation used for right-skewed or multiplicative data
√x (square root)
Transformation used for count data
1/x (reciprocal)
Transformation used for strong right skew
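The three transformations from these cards, applied to small made-up datasets. Base-10 logs are used here only for readability (any base works), and the log and reciprocal require strictly positive values.

```python
import math

# Hypothetical datasets chosen so the effect is easy to see.
multiplicative = [1, 10, 100, 1000]   # each value is 10x the previous one
counts = [0, 1, 4, 9, 16]             # non-negative integer counts
waiting_times = [0.5, 1.0, 2.0, 4.0]  # strictly positive, right-skewed

logged = [math.log10(x) for x in multiplicative]  # equal ratios become equal steps
rooted = [math.sqrt(x) for x in counts]           # compresses larger counts
reciprocals = [1 / x for x in waiting_times]      # strongest compression; reverses order
```

Any confidence interval computed on transformed data is on the transformed scale and needs to be back-transformed for reporting.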
n ≥ 30, Central Limit Theorem
If _______, then the t-interval works well due to the _____________________, unless the data have extreme skewness or outliers
n < 30, roughly symmetric
If ______, inspect the histogram or QQ plot.
If the data are _________________, proceed with t.
If highly skewed, then try a data transformation.
SE Mean
The estimated standard deviation of the sample mean, s/√n, also known as the standard error of the mean.
X̄ ± (critical value) × (SE Mean)
Equation for the Endpoints of the Confidence Interval