Confidence Intervals (Part II)

48 Terms

1

t = (X̄ - μ)/(s/√n)

t-score formula

2

Use the z-score

  • σ is known OR

  • n is huge and σ is credibly known from process history

3

Use the t-score

  • σ is unknown and data are roughly normal

  • OR n ≥ ~30

4

Degrees of Freedom (df)

The number of independent values in a data set that are free to vary when estimating a parameter.

Used in t-distribution calculations

Represents the number of independent pieces of information available to estimate variability after one parameter (the mean) has been estimated from the data.

5

df = n - 1

Formula for df (one-sample case)

6

t_(α/2, n - 1)

Symbol that represents the t-critical value that cuts off an area of α/2 in each tail of the t-distribution with n-1 degrees of freedom.

Determines how far the sample mean can fall from the true mean when constructing a confidence interval for small samples.

7

X̄ ± t_(α/2, n - 1) · (s/√n)

Formula for one-sample t confidence interval
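
The interval above can be sketched in Python as follows. This is a minimal illustration, not part of the original card set: the function name `t_interval` is hypothetical, and the critical value is assumed to be looked up in a t-table and passed in, because the standard library has no t-distribution.

```python
import math
import statistics

def t_interval(sample, t_crit):
    """One-sample t interval: x-bar +/- t_crit * s / sqrt(n)."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)
    moe = t_crit * s / math.sqrt(n)   # margin of error
    return xbar - moe, xbar + moe

# Assumed table value: t_(0.025, 4) is about 2.776 (95% confidence, n = 5).
low, high = t_interval([1, 2, 3, 4, 5], t_crit=2.776)
print(low, high)
```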

8

Assumptions a sample must meet for the t confidence interval

The sample should come from a distribution that is not extremely skewed and should contain no extreme outliers (a larger n relaxes this requirement). Check plots (histogram, Q-Q) and context.

9

Bootstrap Confidence Interval

A method for estimating the uncertainty of a statistic (like the mean or median) by resampling the original data many times with replacement and recalculating the statistic for each resample.
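
One common variant, the percentile bootstrap, can be sketched as below. This is an assumption about which bootstrap flavor is meant, since the card does not say. The function name, the default of 5000 resamples, and the fixed seed are illustrative choices.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.mean, n_resamples=5000, conf=0.95, seed=0):
    """Percentile bootstrap interval for the statistic `stat`."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    alpha = 1 - conf
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return estimates[lo_idx], estimates[hi_idx]

data = [2, 4, 4, 5, 7, 9, 10, 12]
print(bootstrap_ci(data))
print(bootstrap_ci(data, stat=statistics.median))
```

Passing `stat=statistics.median` shows why the method is handy: the same resampling loop works for statistics that have no simple standard-error formula.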

10

trimmed mean

For moderate skew or outliers, consider a ____________, a transformation, or a Bootstrap Confidence Interval (if allowed)

11

Yes, it is true

Is it true that you must never “trim” outliers just to shrink the interval, and should remove them only when they are documented errors?

12

Margin of Error (MOE)

Half-width of a Confidence Interval

13

E = z_(α/2) · (σ/√n)

Margin of Error of a z confidence interval

14

E = t_(α/2, n - 1) · (s/√n)

Margin of Error of a t confidence interval

15

n = ((z_(α/2)σ)/E)^2

Formula for the sample size (n) when estimating a population mean using a z-confidence interval, needed to achieve a desired margin of error (E)

  • Always round up n to the next whole number

  • Use a pilot study or a historical value of σ if σ is unknown.

  • The formula ensures the confidence interval has the specified precision (E) at the chosen confidence level (1 - α)
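
The sample size rule above can be sketched as follows. The function name `sample_size_for_mean` and the example inputs (σ = 15, E = 3) are made up for illustration. The z critical value comes from `statistics.NormalDist` in the standard library.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, conf=0.95):
    """n = (z_(alpha/2) * sigma / E)^2, rounded up to the next whole number."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical planning values: sigma = 15, desired margin of error E = 3.
print(sample_size_for_mean(sigma=15, margin=3))  # 97
```

Halving E roughly quadruples n, because E enters the formula squared.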

16

Pilot Study Sample Standard Deviation (s)

  • A small, preliminary study done before the main one.

  • If the population standard deviation σ is unknown, the sample standard deviation (s) calculated from the preliminary data is used as an estimate of σ in the sample size formula:

  • n = ((z_(α/2)s)/E)^2

  • This gives a realistic idea of how variable the data are, which helps plan how many samples are needed in the full study.

17

Historical Population Standard Deviation (σ)

  • Also called the published value of the population standard deviation (σ)

  • Taken from past experiments, industry data, or technical reports as an estimate of variability.

  • This approach assumes the new data behave similarly to the older or related data.

18

Finite Population Correction (FPC)

The correction that prevents overestimating the variability when a large portion of the population is sampled.

When sampling without replacement from a finite population of size N, the variability of the sample mean is slightly smaller than in infinite populations.

19

σ_(X̄, FPC) = (σ/√n) · √((N - n)/(N - 1))

Finite Population Correction formula.

Used if the sampling fraction n/N > 0.05

Adjusts the Standard Error
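
A small sketch of the correction, assuming the 5% sampling-fraction rule stated on this card. The function name `standard_error` and the example numbers are illustrative only.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the mean, with the FPC applied when n/N > 0.05."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))  # finite population correction
    return se

print(standard_error(10, 100))           # no population size given
print(standard_error(10, 100, N=500))    # 20% of the population sampled
print(standard_error(10, 100, N=10000))  # 1% sampled, correction skipped
```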

20

Sample Proportion

Represents the fraction of successes in a sample.

21

X

Symbol that represents the number of successes in the sample

22

p̂ = X/n

Formula for the sample proportion, which is used as an estimate of the population proportion p.

23

Wilson Interval

  • Used when the sample size is small, or

  • when np̂ or n(1 - p̂) is below 10, where the normal approximation is unreliable

  • Adjusts both the center (the point estimate) and the width of the confidence interval

  • to provide a more accurate estimate of the population proportion p for small samples

  • Tends to produce intervals that are better centered and have more accurate coverage than the standard normal-based method.
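
The card does not give the formula, so the sketch below uses the standard Wilson score form: the center is (p̂ + z²/(2n))/(1 + z²/n) and the half-width is (z/(1 + z²/n)) · √(p̂(1 - p̂)/n + z²/(4n²)). The function name `wilson_interval` is illustrative.

```python
import math
from statistics import NormalDist

def wilson_interval(x, n, conf=0.95):
    """Wilson score interval for a proportion with x successes in n trials."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = x / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom   # pulled toward 1/2
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half

print(wilson_interval(5, 10))
print(wilson_interval(0, 10))  # still informative when p-hat is 0
```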

24

Agresti-Coull Interval

  • Used when the sample size is small, or

  • when np̂ or n(1 - p̂) is below 10

  • Improves accuracy by adding a small correction before computing p̂,

  • usually 2 artificial successes and 2 artificial failures.

  • Increases stability in the estimated proportion and generally provides better coverage probability than the large-sample (normal) confidence interval.
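
A sketch of the "add 2 successes and 2 failures" version described above. It is a shortcut suited to roughly 95% confidence, while the general Agresti-Coull adjustment adds z²/2 to each count. The function name is hypothetical, and clipping the endpoints to [0, 1] is a convenience added here.

```python
import math
from statistics import NormalDist

def agresti_coull_plus4(x, n, conf=0.95):
    """Normal-based interval computed after adding 2 successes and 2 failures."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n_adj = n + 4
    p_adj = (x + 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clip to the valid range for a proportion.
    return max(0.0, p_adj - half), min(1.0, p_adj + half)

print(agresti_coull_plus4(0, 10))
```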

25

Clopper-Pearson Interval

  • An exact binomial confidence interval

  • Used when the normal approximation doesn’t hold.

  • Guarantees that the true confidence level is at least what is stated

  • (never underestimates coverage)

  • Often conservative, meaning the interval is wider than necessary

  • But ensures high reliability for small or discrete samples.
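
As a rough sketch, the exact limits can be found by solving the binomial tail equations numerically: the lower limit is the p with P(X ≥ x) = α/2, and the upper limit is the p with P(X ≤ x) = α/2. The helper names and the bisection approach are choices made for this illustration. Statistical libraries typically use beta quantiles instead.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _root(g, iterations=60):
    """Bisection on [0, 1] for a function g that increases through zero."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, conf=0.95):
    tail = (1 - conf) / 2
    # Lower limit: the p at which P(X >= x) rises to alpha/2.
    lower = 0.0 if x == 0 else _root(lambda p: (1 - binom_cdf(x - 1, n, p)) - tail)
    # Upper limit: the p at which P(X <= x) falls to alpha/2.
    upper = 1.0 if x == n else _root(lambda p: tail - binom_cdf(x, n, p))
    return lower, upper

print(clopper_pearson(5, 10))
```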

26

p̂ ± z_(α/2) · √(p̂(1 - p̂)/n)

Large-Sample Confidence Interval for a Proportion (p)

27

np̂ ≥ 10 and n(1 - p̂) ≥ 10

Conditions for the sample to count as large, which ensure that the sampling distribution of p̂ is approximately normal
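
A sketch that checks these conditions before computing the large-sample interval p̂ ± z_(α/2) · √(p̂(1 - p̂)/n). The function name `wald_interval` and the choice to raise an error when the check fails are illustrative. Since np̂ = X, the check is written on the raw counts to avoid floating-point rounding.

```python
import math
from statistics import NormalDist

def wald_interval(x, n, conf=0.95):
    """Large-sample interval for a proportion, guarded by the np-hat >= 10 rule."""
    # n * p_hat equals x and n * (1 - p_hat) equals n - x.
    if x < 10 or n - x < 10:
        raise ValueError("normal approximation unreliable: need at least 10 successes and 10 failures")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

print(wald_interval(50, 100))
```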

28

two-sided

Use _________ Confidence Intervals unless only a minimum or a maximum matters (for example, a specification limit)

29

100(1 - α)%: X̄ - z_α SE

One-sided lower bound

30

100(1 - α)%: X̄ + z_α SE

One-sided upper bound
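
The two one-sided bounds differ from a two-sided interval only in using z_α instead of z_(α/2). A minimal sketch, with a hypothetical function name and made-up inputs (X̄ = 50, SE = 2):

```python
from statistics import NormalDist

def one_sided_bounds(xbar, se, conf=0.95):
    """Return (lower bound, upper bound); each is meant to be used on its own."""
    z_alpha = NormalDist().inv_cdf(conf)  # all of alpha goes in a single tail
    return xbar - z_alpha * se, xbar + z_alpha * se

lower, upper = one_sided_bounds(xbar=50, se=2)
print(lower)  # "with 95% confidence, the mean is at least this"
print(upper)  # "with 95% confidence, the mean is at most this"
```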

31

Paired Sample

  • Same units are measured twice (before/after, left/right)

  • Analyze the differences D_i: build a confidence interval for the mean of D using t with n - 1 degrees of freedom, where n is the number of pairs
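
A sketch of the paired approach: reduce each pair to one difference, then treat the differences as a single sample. The function name, the before/after numbers, and the table value t_(0.025, 3) ≈ 3.182 are assumptions for illustration.

```python
import math
import statistics

def paired_t_interval(before, after, t_crit):
    """t interval for the mean difference (after - before) of paired measurements."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return d_bar - t_crit * se, d_bar + t_crit * se

# Four hypothetical units measured twice; df = 4 - 1 = 3.
print(paired_t_interval([10, 12, 9, 11], [12, 13, 11, 14], t_crit=3.182))
```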

32

Two-sample

Used when there are independent groups.

33

Pooled t-Test

  • A two-sample t-test used when the population variances are approximately equal.

  • Combines (or “pools”) the two sample variances into a single, common estimate of variance to compute the standard error

  • This increases precision when the equal-variance assumption holds.

34

(s_p)^2 = (((n1 - 1)(s1)^2) + ((n2 - 1)(s2)^2))/(n1 + n2 - 2)

Pooled variance formula
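
The pooled variance is a weighted average of the two sample variances, weighted by their degrees of freedom. A short sketch with illustrative function names and made-up inputs. The standard error line uses the usual pooled form √(s_p²(1/n1 + 1/n2)), which is not stated on the card.

```python
import math

def pooled_variance(s1, n1, s2, n2):
    """s_p^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_standard_error(s1, n1, s2, n2):
    """Standard error of the difference in means under equal variances."""
    return math.sqrt(pooled_variance(s1, n1, s2, n2) * (1 / n1 + 1 / n2))

print(pooled_variance(2, 10, 3, 15))
print(pooled_standard_error(2, 10, 3, 15))
```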

35

Welch t-Test

  • A two-sample t-test used when the population variances are not equal (heterogeneous variances).

  • Does not assume equal variances

  • and instead adjusts both the standard error and the degrees of freedom accordingly

  • Is a more robust and reliable version when sample sizes or variances differ.
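
The card does not give the adjusted formulas. The sketch below uses the standard Welch-Satterthwaite approximation, with SE = √(s1²/n1 + s2²/n2) and df = (v1 + v2)² / (v1²/(n1 - 1) + v2²/(n2 - 1)), where v_i = s_i²/n_i. The function name and inputs are illustrative.

```python
import math

def welch_se_and_df(s1, n1, s2, n2):
    """Standard error and approximate degrees of freedom for the Welch t procedure."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df

print(welch_se_and_df(2, 10, 3, 15))
print(welch_se_and_df(2, 10, 2, 10))  # equal s and n: df equals n1 + n2 - 2
```

The resulting df is usually not a whole number. When using a printed t-table, rounding it down is the conservative choice.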

36

Q-Q Plot (Quantile-Quantile Plot)

  • A graphical tool used to check whether a dataset follows a specified distribution (most commonly the normal distribution)

  • Plots the quantiles of the sample data against the quantiles of a theoretical normal distribution.
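
The coordinates behind a normal Q-Q plot can be computed without a plotting library. This sketch assumes the plotting positions (i - 0.5)/n, one of several conventions in use, and the function name `qq_points` is hypothetical.

```python
from statistics import NormalDist

def qq_points(sample):
    """Pair each ordered data value with the matching standard normal quantile."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

for q, value in qq_points([4.1, 5.3, 4.8, 6.0, 5.1]):
    print(round(q, 3), value)
```

Plotting the second coordinate against the first gives the Q-Q plot.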

37

straight diagonal line

In a QQ plot, if the points fall roughly along a ____________________, the data are approximately normal.

38

skewness, non-normality

Systematic curves or deviations in a QQ plot indicate ___________ or ____________

39

Multiplicative Data

Dataset where the peaks and troughs of the pattern become larger as the trend increases

40

Data Transformation

A mathematical modification applied to each data point to make the data more normal, stabilize variance, or improve model fit.

41

Count Data

Dataset consisting of non-negative, integer values that represent the number of times an event occurs within a specific unit of time or space

42

log(x)

Transformation used for right-skewed or multiplicative data

43

x^1/2 (square root)

Transformation used for count data

44

1/x (reciprocal)

Transformation used for strong right skew
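
The three transformations on the cards above (log, square root, reciprocal) can be applied to each data point as sketched here. The function name `transform` and the string labels are illustrative. The sketch assumes strictly positive data for log and reciprocal, and non-negative data for the square root.

```python
import math

def transform(values, kind):
    """Apply one of the flashcard transformations to every data point."""
    if kind == "log":          # right-skewed or multiplicative data
        return [math.log(x) for x in values]
    if kind == "sqrt":         # count data
        return [math.sqrt(x) for x in values]
    if kind == "reciprocal":   # strong right skew
        return [1 / x for x in values]
    raise ValueError(f"unknown transformation: {kind}")

skewed = [1, 2, 3, 5, 8, 40, 200]
print(transform(skewed, "log"))
```

An interval computed on the transformed scale describes the transformed mean, so its endpoints need back-transforming before they are reported in the original units.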

45

n ≥ 30, Central Limit Theorem

If _______, then the t-interval works well due to the _____________________, unless the data have extreme skewness or outliers

46

n < 30, roughly symmetric

  • If ______, inspect the histogram or QQ plot.

  • If the data are _________________, proceed with t.

  • If highly skewed, then try a data transformation.

47

SE Mean

The estimated standard deviation of the sample mean (s/√n), also known as the standard error of the mean.

48

X̄ ± (critical value) × (SE Mean)

Equation for the Endpoints of the Confidence Interval