June 5 unit 6: confidence intervals

New cards

statistical inference

provides methods for drawing conclusions about a population from sample data
we can never be certain that our sample data fairly represent the population
to quantify this uncertainty, statistical inference uses the language of probability
just like probability, inference is founded on long-run predictable behaviour
by taking “good” samples (e.g. an SRS), we can draw conclusions with a high probability of being correct
- e.g. we can use the sample mean X-bar as an estimator for the population mean u… but how good an estimator is X-bar?

New cards

statistical inference pt. 2

(BEEG STAR) the probability that X-bar = u is equal to 0, because of continuity (X-bar is a continuous variable!)
on its own, the sample mean tells us nothing about how accurate we believe our estimate to be
instead, we would like to use the sample mean to construct an interval of values to estimate the population mean u

New cards

confidence intervals

we know that the sample mean will vary from sample to sample. suppose we were to take many samples of the same size, n.
we would like to construct an interval in such a way that u is contained in the interval for most samples
that is, we would like to be confident that the interval we construct contains the value of the parameter (mean, u) we are trying to estimate

New cards

if x-bar is within 1.96 standard deviations of u, then u is also within 1.96 standard deviations of x-bar
in other words, in 95% of all samples, u lies within x-bar +- 1.96(o/sqrt(n))
- estimate +- margin of error

New cards

interpretation of the 95% confidence interval for u:

+ true mean of the population

“If we repeatedly took simple random samples of the same size from the same population and constructed the interval in a similar manner, 95% of all such intervals would contain the true mean u of the population“
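The repeated-sampling interpretation can be checked with a small simulation. This is only a sketch: the population values (mu = 50, sigma = 10), the sample size (n = 25), and the function name `coverage_rate` are made up for illustration, and the population is assumed to be normal with known sigma.

```python
import math
import random
from statistics import fmean

def coverage_rate(mu, sigma, n, trials=2000, seed=0):
    """Fraction of simulated 95% intervals x-bar +- 1.96*(sigma/sqrt(n)) that contain mu."""
    rng = random.Random(seed)                 # fixed seed so the run is repeatable
    margin = 1.96 * sigma / math.sqrt(n)      # same margin of error for every sample
    hits = 0
    for _ in range(trials):
        # draw one simple random sample of size n from a normal population
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

# hypothetical population: mu = 50, sigma = 10, samples of size 25
rate = coverage_rate(mu=50, sigma=10, n=25)
print(rate)  # should land close to 0.95
```

Each individual interval either contains mu or it does not; the 95% describes the long-run fraction of intervals that do.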

New cards

each confidence interval has associated with it a confidence level C (which gives the probability that an interval constructed by this method will contain the true value of the population mean u)

for e.g., a 95% confidence interval has a confidence level of 95%

we choose the confidence level ourselves

since our goal is typically to estimate a parameter with a high probability of being correct, we always use a high confidence level (usually 90% or higher)

the form of a general level C confidence interval for our population mean u is x-bar +- z*(o/sqrt(n)), where:

z* is the value of Z such that:
- P(-z* <= Z <= z*) = C
C% of observations fall within z* std. dev of u
C% of values of X-bar fall within z*(o/sqrtn) of u
C% of constructed intervals contain u
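A minimal sketch of the level C interval, using Python's standard-library `statistics.NormalDist` instead of a printed table. The function names `z_star` and `confidence_interval` and the example numbers are illustrative, not from the slides. Since the area between -z* and z* is C, the area to the left of z* is C + (1-C)/2 = (1+C)/2.

```python
import math
from statistics import NormalDist

def z_star(level):
    """Critical value z* with P(-z* <= Z <= z*) = level for standard normal Z."""
    # area to the left of z* is level + (1 - level)/2 = (1 + level)/2
    return NormalDist().inv_cdf((1 + level) / 2)

def confidence_interval(x_bar, sigma, n, level):
    """Interval x-bar +- z*(sigma/sqrt(n)), assuming sigma is known."""
    margin = z_star(level) * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# hypothetical numbers: x-bar = 100, sigma = 15, n = 36, C = 0.95
low, high = confidence_interval(x_bar=100, sigma=15, n=36, level=0.95)
print(round(z_star(0.95), 2), round(low, 2), round(high, 2))
```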

New cards

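area under the standard normal curve between -z* and z*: probability = C

area in each tail (below -z* or above z*): probability = (1-C)/2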

New cards

we can find z* for any confidence level C from Table 1

the values of z* for the most common confidence levels are given in the last row of Table 2
the values z* that mark off a specific area under the standard normal curve are called the critical values of the distribution

New cards

when confidence level increases, the margin of error also increases

thus if we increase the confidence level, we must sacrifice our precision of estimation!
if we want to be more sure that our interval contains u, then we have to expand the interval!
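To see the trade-off in numbers, here is a sketch that holds o and n fixed and varies only the confidence level. The values sigma = 10 and n = 25 and the helper name `margin_of_error` are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, level):
    """Margin of error z*(sigma/sqrt(n)) for a level-C interval, sigma known."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    return z * sigma / math.sqrt(n)

# same hypothetical sigma and n, three confidence levels
levels = (0.90, 0.95, 0.99)
margins = [margin_of_error(sigma=10, n=25, level=c) for c in levels]
for c, m in zip(levels, margins):
    print(f"C = {c:.2f}: margin of error = {m:.2f}")
```

The margins grow as C grows, so the interval widens.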

New cards

interpretation (18.26, 25.22) C = 95%

“if we were to take repeated samples of 40 employees and compute the interval in a similar manner, then 95% of such intervals would contain the true mean hourly wage”

New cards

central limit theorem

X-bar is approximately N(u, o/sqrt(n)); applies when n >= 30 (unit 5)

if we are not told the shape of the population distribution and the sample size n is less than 30, we cannot calculate the interval.

New cards

how to reduce the length of the interval without lowering our confidence level?

increase the sample size!!!!
lower margin of error!!!!
- increasing the sample size by a factor of k reduces the margin of error by a factor of sqrt(k).
- m(new) = z* o/sqrt(k*n) = (1/sqrt(k)) (z* o/sqrt(n)) = m(old)/sqrt(k)
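The sqrt(k) scaling can be checked numerically. This is a sketch with assumed values (sigma = 10, n = 25, k = 4, C = 0.95); the helper name `margin_of_error` is illustrative.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, level=0.95):
    """Margin of error z*(sigma/sqrt(n)), sigma known."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    return z * sigma / math.sqrt(n)

k = 4                                      # multiply the sample size by k
m_old = margin_of_error(sigma=10, n=25)
m_new = margin_of_error(sigma=10, n=25 * k)
ratio = m_old / m_new                      # equals sqrt(k) up to rounding error
print(ratio, math.sqrt(k))
```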

New cards

population mean = u

is a fixed value
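once an interval has been computed, u is either inside it or not
if u is inside the interval, the probability that the interval contains u is 100%
if u is not, the probability is 0%
like asking “raining or not raining?” when it is already raining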

confidence interval interpretation template:

if we repeatedly took samples of the same size from the same population and constructed the intervals in a similar manner, then C% of such intervals would contain the population mean u. *write in context of the question

New cards

sample mean = x-bar

New cards

know difference between sample size and population size

… the formula uses the sample size n; the population size does not appear in it

New cards

when collecting a sample, always consider the purpose of our data collection

often, we would like to achieve a certain precision of estimation (i.e. a particular margin of error)
to accomplish this, we need to find out how large our sample size needs to be
n = (z* o/m)² … always round up
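A sketch of the sample-size calculation, with rounding up done by `math.ceil`. The function name `required_sample_size` and the example values (sigma = 10, desired margin m = 2, C = 0.95) are assumptions for illustration.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, level=0.95):
    """Smallest n with z*(sigma/sqrt(n)) <= margin, i.e. n = (z* sigma/margin)^2 rounded up."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    return math.ceil((z * sigma / margin) ** 2)

# hypothetical target: margin of error of at most 2 when sigma = 10
n_needed = required_sample_size(sigma=10, margin=2)
print(n_needed)  # 97, since (1.96 * 10 / 2)^2 is about 96.04
```

Rounding down to 96 would leave the margin of error slightly above 2, which is why the result is always rounded up.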

New cards

n = (z* o/m)²

k!!!!!

New cards

sample size
(STAR) if we want to divide the margin of error by k, we need a sample size that is k² times the original sample size
(STAR) to reduce the margin of error to one third of its original value (i.e. reduce it by a factor of 3), we need 9 times as many individuals in our sample

New cards

our formula for the confidence interval holds only if the data were collected using a SRS. good formulas cannot rescue us from poor sampling methods

since the sample mean is strongly influenced by outliers, the confidence interval is also strongly influenced by outliers

New cards

we are using the true population standard deviation o in our calculations

in practice, this is not a realistic assumption
we are making this unreasonable assumption now to establish the framework for building confidence intervals

New cards

the margin of error covers only random sampling errors

it does not reflect any degree of undercoverage, nonresponse, or other forms of bias
i.e. “error“ is a reflection of only the inherent variation in the population (quantified by o). “error“ does not mean that we made a mistake!

New cards

slides over

New cards