statistical inference
provides methods for drawing conclusions about a population from sample data
we can never be certain that our sample data fairly represents the population
to quantify this uncertainty, in statistical inference, we use the language of probability
just like with probability, the foundation of inference lies on long-run predictable behaviour
by taking “good“ samples (e.g. SRS), we can draw conclusions with a high probability of being correct
e.g. we can use the sample mean X-bar as an estimator for the population mean u… but how good of an estimator is X-bar?
statistical inference pt. 2
(BEEG STAR) the probability that X-bar = u is equal to 0, because of continuity (X-bar is a continuous variable!)
sample mean tells us nothing as to how accurate we believe our estimate to be
instead, we would like to use the sample mean to construct an interval of values to estimate the population mean u
confidence intervals
we know that the sample mean will vary from sample to sample. suppose we were to take many samples of the same size, n.
we would like to construct an interval in such a way that u is contained in the interval for most samples… what
that is, we would like to be confident that the interval we construct contains the value of the parameter (mean, u) we are trying to estimate
if x-bar is within 1.96 standard deviations of u (standard deviations of X-bar, i.e. o/sqrtn), then u is also within 1.96 standard deviations of x-bar
in other words, in 95% of all samples, u lies within [x-bar +- z formulaaa…]
estimate +- margin of error
interpretation of the 95% confidence interval for u:
+ true mean of the population
“If we repeatedly took simple random samples of the same size from the same population and constructed the interval in a similar manner, 95% of all such intervals would contain the true mean u of the population“
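the repeated-sampling interpretation can be checked with a small simulation. this is only a sketch: the population values (u = 50, o = 10), the sample size and the number of trials are made-up assumptions, and the population is assumed to be normal. the fraction printed should land close to 0.95 but will not be exactly 0.95.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, conf=0.95, trials=2000, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)                   # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf((1 + conf) / 2)    # critical value z*
    margin = z * sigma / n ** 0.5               # margin of error z* (o/sqrtn)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)                    # sample mean for this sample
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

# hypothetical population: u = 50, o = 10, samples of size 40
print(coverage(mu=50.0, sigma=10.0, n=40))
```

each individual interval either contains u or it does not; the 95% describes the long-run behaviour of the method.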
each confidence interval has associated with it a confidence level C (which gives the probability that the interval will contain the true value of the population mean u)
e.g., a 95% confidence interval has a confidence level of 95%
we choose the confidence level ourselves
since our goal is typically to estimate a parameter with a high probability of being correct, we always use a high confidence level (usually 90% or higher)
the form of a general level C confidence interval for our population mean u is x-bar +- z*(o/sqrtn), where:
z* is the value of Z such that:
P(-z* <= Z <= z*) = C
C% of observations fall within z* std. dev of u
C% of values of X-bar fall within z*(o/sqrtn) of u
C% of constructed intervals contain u
probability = C
probability = (1-C)/2
we can find z* for any confidence level C from Table 1
the values of z* for the most common confidence levels are given in the last row of Table 2
the values z* that mark off a specific area under the standard normal curve are called the critical values of the distribution
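if a table is not at hand, z* can also be computed. a minimal sketch, assuming Python's standard library `statistics.NormalDist`; the function name `z_star` is made up for illustration. it uses the fact that the area to the left of z* is C + (1-C)/2 = (1+C)/2.

```python
from statistics import NormalDist

def z_star(conf):
    """Critical value z* such that P(-z* <= Z <= z*) = conf for standard normal Z."""
    # area to the left of z* is conf plus the lower tail (1 - conf) / 2
    return NormalDist().inv_cdf((1 + conf) / 2)

# the most common confidence levels
for conf in (0.90, 0.95, 0.99):
    print(conf, round(z_star(conf), 3))
```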
when confidence level increases, the margin of error also increases
thus if we increase the confidence level, we must sacrifice our precision of estimation!
if we want to be more sure that our interval contains u, then we have to expand the interval!
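to see the trade-off, here is a sketch that builds the interval x-bar +- z*(o/sqrtn) at three confidence levels. the numbers (x-bar = 20, o = 5, n = 25) are made up for illustration and the function name `z_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf):
    """Level-conf confidence interval for u when o (sigma) is known."""
    z = NormalDist().inv_cdf((1 + conf) / 2)   # critical value z*
    margin = z * sigma / sqrt(n)               # margin of error
    return xbar - margin, xbar + margin

# hypothetical sample: x-bar = 20, o = 5, n = 25
for conf in (0.90, 0.95, 0.99):
    lo, hi = z_interval(xbar=20.0, sigma=5.0, n=25, conf=conf)
    print(f"{conf:.0%}: ({lo:.2f}, {hi:.2f}), length {hi - lo:.2f}")
```

the interval gets longer at each step up in confidence level, with the same data.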
interpretation (18.26, 25.22) C = 95%
“if we were to take repeated samples of 40 employees and compute the interval in a similar manner, then 95% of such intervals would contain the true mean hourly wage”
central limit theorem
X-bar ~(dot) N(u, o/sqrtn), i.e. X-bar is approximately normal; applies when n >= 30 (unit 5)
if we are not told the shape of the population distribution and the sample size n is less than 30, we can't calculate the interval.
how to reduce the length of the interval (i.e. improve our precision of estimation) without lowering the confidence level?
increase the sample size!!!!
lower margin of error!!!!
increasing the sample size by a factor of k reduces the margin of error by a factor of sqrt(k).
m(new) = z* o/sqrt(n*k) = (1/sqrt(k)) * (z* o/sqrt(n)) = m(old)/sqrt(k)
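a quick numeric check of this identity. the values of z*, o, n and k below are made up for illustration.

```python
from math import sqrt

def margin(z, sigma, n):
    """Margin of error z* (o/sqrtn)."""
    return z * sigma / sqrt(n)

# hypothetical values: z* = 1.96, o = 12, n = 40, sample size multiplied by k = 4
z, sigma, n, k = 1.96, 12.0, 40, 4
m_old = margin(z, sigma, n)
m_new = margin(z, sigma, n * k)

# m_new and m_old / sqrt(k) agree up to floating-point rounding
print(m_old, m_new, m_old / sqrt(k))
```

with k = 4 the margin of error is cut in half, since sqrt(4) = 2.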
population mean = u
is a fixed value
once an interval is computed: if u is inside the interval, the probability that the interval contains u is 100%
if u is not inside, the probability is 0%
raining or not raining? (when its already raining)
confidence interval interpretation template:
if we repeatedly took samples of the same size from the same population and constructed the intervals in a similar manner, then C% of such intervals would contain the population mean u. *write in context of the question
sample mean = x-bar
know difference between sample size and population size
… formula doesn’t care about population
when collecting a sample, always consider the purpose of our data collection
often, we would like to achieve a certain precision of estimation (i.e. a particular margin of error)
to accomplish this, we need to find out how large our sample size needs to be
n = (z* o/m)² … always round up
k!!!!!
k
sample size
(STAR) if we want to divide the margin of error by k, we need a sample size that is k² times the original sample size
(STAR) to reduce the margin of error to one third its original value (i.e. reduce it by a factor of 3), we need 9 times as many individuals in our sample
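the sample size formula n = (z* o/m)² and the "always round up" rule can be sketched as below. the inputs (z* = 1.96, o = 10, m = 2) are made up for illustration and the function name `required_n` is hypothetical.

```python
from math import ceil

def required_n(z, sigma, m):
    """Smallest sample size giving a margin of error of at most m."""
    # always round up: rounding down would give a margin slightly larger than m
    return ceil((z * sigma / m) ** 2)

# hypothetical inputs: z* = 1.96, o = 10, desired margin of error m = 2
n_original = required_n(1.96, 10.0, 2.0)
# same setup, but with the margin of error divided by 3
n_third = required_n(1.96, 10.0, 2.0 / 3)

print(n_original, n_third)
```

the second sample size is about 9 times the first (not exactly 9 times, because each result is rounded up separately).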
our formula for the confidence interval holds only if the data were collected using a SRS. good formulas cannot rescue us from poor sampling methods
since the sample mean is strongly influenced by outliers, the confidence interval is also strongly influenced by outliers
we are using the true population standard deviation o in our calculations
in practice, this is not a realistic assumption
we are making this unreasonable assumption now to establish the framework for building confidence intervals
the margin of error covers only random sampling errors
it does not reflect any degree of undercoverage, nonresponse, or other forms of bias
i.e. “error“ is a reflection of only the inherent variation in the population (quantified by o). “error“ does not mean that we made a mistake!
slides over