holy cram
Quantitative variable
takes numerical values for a measured or counted quantity
categorical variable
takes on values that are category names or group labels
bar graphs show frequency (how many) or relative frequency (percent)
misleading graphs
vertical axis must start at 0
beware of using images for bar graphs
segmented bar graph
stack up bars to make 100%
mosaic plot
segmented bar graph where the width of bars is proportional to the group size
association
if knowing the value of one variable helps us predict the other variable
segmented bar graphs are different
2 types of quantitative data
discrete and continuous
discrete quantitative data
countable number of values
continuous quantitative data
infinitely many values with no gaps
SOCV
shape, outliers, center, variability (with context)
interpret standard deviation
“The [context] typically varies by [SD] from the mean of [mean].”
how to find variance?
(SD)^2
Greatly affected by outliers (nonresistant)
mean and standard deviation
resistant to change
median
1.5 x IQR method
low outlier < Q1 - 1.5(IQR)
high outlier > Q3 + 1.5(IQR)
SD Method for outliers
low outlier < mean - 2(SD)
high outlier > mean + 2(SD)
calculate IQR
Q3 - Q1
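A minimal sketch of both outlier rules, with IQR computed as Q3 - Q1. The data set and the helper names (`median_of_halves_quartiles`, `iqr_outliers`, `sd_outliers`) are made up for illustration. The quartiles use the "median of each half" convention, which is an assumption; software that interpolates can give slightly different quartiles.

```python
import statistics

def median_of_halves_quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: when n is odd, the overall median is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return statistics.median(lower), statistics.median(upper)

def iqr_outliers(values):
    """Values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)."""
    q1, q3 = median_of_halves_quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

def sd_outliers(values):
    """Values more than 2 sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if v < mean - 2 * sd or v > mean + 2 * sd]

scores = [2, 4, 5, 7, 8, 9, 10, 30]  # hypothetical data
q1, q3 = median_of_halves_quartiles(scores)
print("Q1:", q1, "Q3:", q3, "IQR:", q3 - q1)
print("1.5 x IQR outliers:", iqr_outliers(scores))
print("2 SD outliers:", sd_outliers(scores))
```

In this data set both rules flag the same value, but they do not always agree, since the mean and SD are themselves pulled by outliers.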
Boxplots
5 number summary:
min, Q1 , med, Q3, max
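As a quick illustration, the five-number summary can be sketched like this. The function name and data are hypothetical, and the quartiles again assume the "median of each half" convention (overall median excluded when n is odd).

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # values below the median position
    upper = data[len(data) - half:]   # values above the median position
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )

summary = five_number_summary([1, 3, 5, 7, 9, 11, 13])  # hypothetical data
print(summary)
```

These five values are exactly what a boxplot draws: the box runs from Q1 to Q3 with a line at the median.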
Interpret percentile
“The pth percentile is the value that has p% of the data less than or equal to it.”
Q1 percentile
25th percentile (0.25)
What percentile is the Median?
50th percentile (0.50).
Q3 percentile
75th percentile (0.75)
interpret Z-score
“[Context] is [z-score] standard deviations above/below the mean of [mean].”
what do z-scores show?
They show position relative to other values in the distribution
Empirical rule (68-95-99.7)
how to find z-score for a given proportion
use Table A, or TI-84: invNorm
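If a calculator is not handy, the same ideas can be sketched with Python's standard library. `NormalDist().inv_cdf` plays the role of invNorm here; the numbers 85, 70, and 10 are made-up values for illustration.

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """Number of standard deviations a value sits above (+) or below (-) the mean."""
    return (value - mean) / sd

z = z_score(85, mean=70, sd=10)  # hypothetical test score

standard_normal = NormalDist()  # mean 0, SD 1

# Empirical rule: area within 1, 2, and 3 SDs of the mean.
within_1 = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2 = standard_normal.cdf(2) - standard_normal.cdf(-2)
within_3 = standard_normal.cdf(3) - standard_normal.cdf(-3)

# z-score with 90% of the distribution at or below it (area to the left).
z_for_90th = standard_normal.inv_cdf(0.90)

print(z, round(within_1, 4), round(within_2, 4), round(within_3, 4))
print(round(z_for_90th, 3))
```

Like Table A, `inv_cdf` takes the area to the left of the z-score.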
Describe a relationship (DUFS + Context)
Direction (positive/negative, none)
Unusual features (outliers, clusters)
Form (linear or nonlinear)
Strength
Interpret Correlation (r)
“The linear relationship between x and y is [strength] and [direction].”
interpret coefficient of determination r^2
“the percent of the variation in y explained by the linear relationship with x”
does correlation equal causation?
NO
how to calculate residual
residual = Actual - Predicted
(r = a - p)
Interpret residual
“The actual [y-context] was [residual] above/below the predicted value for x = [#].”
interpret y-intercept
“When x = 0 [x-context], the predicted [y-context] is [y-intercept].”
interpret slope
“For each additional [x-context], the predicted [y-context] increases/decreases by [slope].”
Least Squares regression line
minimizes the sum of the squared residuals
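A small sketch of the regression cards above: it computes the least squares slope and intercept, r, r^2, and the residuals (actual - predicted) by hand. The data and names (`hours`, `score`, `least_squares_line`) are invented for illustration.

```python
from math import sqrt

def least_squares_line(xs, ys):
    """Return (slope, intercept, r) for the least squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through (x-bar, y-bar)
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

hours = [1, 2, 3, 4, 5]   # hypothetical x values
score = [2, 4, 5, 4, 5]   # hypothetical y values

slope, intercept, r = least_squares_line(hours, score)
predicted = [intercept + slope * x for x in hours]
residuals = [actual - pred for actual, pred in zip(score, predicted)]

print("slope:", round(slope, 3), "y-intercept:", round(intercept, 3))
print("r:", round(r, 3), "r^2:", round(r ** 2, 3))
print("residuals:", [round(e, 2) for e in residuals])
```

The residuals of a least squares line always sum to 0 (up to rounding), which is a handy check on hand calculations.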
horizontal outliers
tilt the least squares regression line
vertical outliers
shift the least squares regression line up or down
high leverage outliers
very large or small x-values
influential outliers
if removed, big changes happen to the slope, y-int and correlation (r).
choosing best regression model
check the scatter plot for a linear pattern
check residual plot for no leftover pattern
check for the r^2 that is closest to 1.
convenience sample
people are easy to reach, can lead to bias
voluntary response
people choose to respond, can lead to bias
what samples can lead to bias?
convenience sample and voluntary response
simple random sample (SRS)
label individuals
randomize
select
label individuals in SRS
assign numbers or write names on slips of paper
randomize in SRS
random number generator (no repeats) or names in a hat (shuffle)
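The label, randomize, select steps can be sketched as follows. The function name, the `seed` parameter, and the class list are hypothetical; `random.sample` draws without repeats, like a random number generator with repeats ignored.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Pick n individuals so that every group of n is equally likely."""
    rng = random.Random(seed)
    labels = list(range(1, len(population) + 1))   # label: 1..N
    chosen_labels = rng.sample(labels, n)          # randomize: no repeats
    return [population[label - 1] for label in chosen_labels]  # select

students = [f"Student {i}" for i in range(1, 31)]  # hypothetical class of 30
sample = simple_random_sample(students, 5, seed=1)
print(sample)
```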
stratified random sample
split the population into groups (strata), then choose an SRS from each stratum.
Homogeneous grouping
each stratum has individuals with shared attributes or characteristics
what leads to the best estimates in a sampling method?
low bias and low variability
cluster sample
heterogeneous groups (clusters); randomly choose some groups and sample everyone in them
systematic random sampling
choose a random starting point, then select individuals at equal intervals
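To compare the three methods side by side, here is a rough sketch. All names and the school data (grades as strata, homerooms as clusters) are made up for illustration.

```python
import random

def stratified_sample(strata, n_per_stratum, rng):
    """Take an SRS of size n_per_stratum from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep everyone in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

def systematic_sample(population, interval, rng):
    """Random starting point, then every interval-th individual."""
    start = rng.randrange(interval)
    return population[start::interval]

rng = random.Random(7)  # seeded only so the example is repeatable

# Hypothetical school: strata are grade levels, clusters are homerooms.
strata = {"9th": [f"9-{i}" for i in range(10)],
          "10th": [f"10-{i}" for i in range(10)]}
clusters = {f"Room {r}": [f"R{r}-{i}" for i in range(4)] for r in range(5)}
roster = [f"Student {i}" for i in range(20)]

by_grade = stratified_sample(strata, 3, rng)
by_room = cluster_sample(clusters, 2, rng)
every_fifth = systematic_sample(roster, 5, rng)
print(by_grade, by_room, every_fifth, sep="\n")
```

Note the contrast: stratified sampling takes some individuals from every group, while cluster sampling takes every individual from some groups.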
undercoverage
some people are less likely to be chosen
ex. calling landlines, surveying homeowners
nonresponse
people can't be reached or refuse to answer
ex. don’t answer or hang up on phone calls
response bias
problems in the data gathering instrument or process
ex. people lie (self-reported responses), wording of question
observational study
no treatment imposed
experiment
treatment imposed, allows us to show causation
well-designed experiment steps
comparison (2 or more treatments)
random assignment
replication (more than 1 in each treatment group)
control (keep other variables constant)
what does random assignment do?
allows us to show causation
placebo effect
when a fake treatment works
blinding
when subjects (single blind) and/or experimenters (double blind) don’t know about treatments
randomized block design
separate subjects into blocks and then randomly assign treatments within each block
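A minimal sketch of random assignment within blocks. The block names, subject labels, and treatments are invented, and the function assumes each block's size is a multiple of the number of treatments so the groups come out equal.

```python
import random

def randomized_block_assignment(blocks, treatments, rng):
    """Randomly assign treatments separately inside each block."""
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = units[:]          # copy so the original order is untouched
        rng.shuffle(shuffled)        # random order within this block only
        for i, unit in enumerate(shuffled):
            # Deal treatments out in turn, like dealing cards.
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

rng = random.Random(3)  # seeded only so the example is repeatable
blocks = {"younger": ["A", "B", "C", "D"],   # hypothetical subjects
          "older": ["E", "F", "G", "H"]}
treatments = ["new drug", "placebo"]

assignment = randomized_block_assignment(blocks, treatments, rng)
for subject in sorted(assignment):
    print(subject, "->", assignment[subject])
```

Because the shuffle happens inside each block, every block ends up with the same mix of treatments, so the blocking variable cannot be confused with the treatment effect.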
block
group of experimental units that are similar
matched pairs design
subjects are paired (block size 2) and the two treatments are randomly assigned within each pair, or each subject receives both treatments in a random order
statistically significant
when results of an experiment are unlikely (less than 5% (0.05)) to happen purely by chance.
If statistically significant, we have convincing evidence the treatment caused the difference.
what does a random sample allow us to do?
Allows us to generalize our conclusions to the population from which we sampled
long run relative frequency
always between 0 and 1 (inclusive)
predictable
short run relative frequency
unpredictable
law of large numbers
simulated probabilities tend to get closer to the true probability as the number of trials increases.
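A quick simulation sketch of the law of large numbers, assuming a fair coin (true probability 0.5). The function name, checkpoints, and seed are arbitrary choices for illustration.

```python
import random

def running_relative_frequency(n_trials, p_true=0.5, seed=0):
    """Simulate n_trials and record the relative frequency at a few checkpoints."""
    rng = random.Random(seed)
    successes = 0
    checkpoints = {}
    for trial in range(1, n_trials + 1):
        if rng.random() < p_true:   # one simulated "heads"
            successes += 1
        if trial in (10, 100, 1_000, 10_000, 100_000):
            checkpoints[trial] = successes / trial
    return checkpoints

freqs = running_relative_frequency(100_000)
for trials, freq in freqs.items():
    print(f"{trials:>7} trials: relative frequency = {freq:.4f}")
```

The early checkpoints can bounce around quite a bit (short run), while the later ones settle near 0.5 (long run).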
sample space, list of all possible outcomes
P(E) = (# outcomes in E) / (total # outcomes in sample space), when outcomes are equally likely
complement rule, probability of an event not happening
P(A^c) = 1 - P(A)
P(A and B)
P(A ∩ B), where both occur
P(A or B)
P(A ∪ B), one or the other or both
addition rule in probability
P(A or B) = P(A) + P(B) - P(A and B)
Mutually exclusive
Events A and B can’t occur together
P(A and B) = 0, so P(A or B) = P(A) + P(B)
conditional probability
the probability of one event given another has occurred
P(A | B)= P(A and B) / P(B)
independent events
knowing whether or not one event occurs does not change the probability of the other event
If P(A) = P(A | B) = P(A | B^c)
then A and B are independent
general multiplication rule
P(A and B) = P(A) x P(B | A)
If A and B are independent, P(A and B) = P(A) x P(B)
probability of getting at least 1
1 - P(none)
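The probability rules above can be checked on a tiny example: one roll of a fair die, with A = "even" and B = "greater than 4". The events and the `prob` helper are made up for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die

def prob(event):
    """P(E) = outcomes in E / outcomes in the sample space (equally likely)."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}          # roll is even
B = {5, 6}             # roll is greater than 4
not_B = sample_space - B

p_a_and_b = prob(A & B)                       # both occur
p_a_or_b = prob(A | B)                        # one or the other or both
addition_rule = prob(A) + prob(B) - p_a_and_b

p_a_given_b = p_a_and_b / prob(B)             # P(A | B)
p_a_given_not_b = prob(A & not_B) / prob(not_B)
independent = prob(A) == p_a_given_b == p_a_given_not_b

# At least one six in 4 independent rolls: 1 - P(no sixes).
p_at_least_one_six = 1 - Fraction(5, 6) ** 4

print(p_a_and_b, p_a_or_b, addition_rule)
print(p_a_given_b, p_a_given_not_b, independent)
print(p_at_least_one_six)
```

Here A and B are independent but not mutually exclusive, since a roll of 6 belongs to both events.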
discrete random variable
takes a countable number of values with gaps between
continuous random variable
has infinitely many values with no gaps
ex. uniform, normal
sample size
as sample size increases, variability decreases
central limit theorem
the sampling distribution of x-bar is approximately normal when the sample size is >= 30, even if the population is not normal
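A rough simulation sketch of the two cards above. It draws many samples from a right-skewed population (an exponential distribution with mean 1, chosen here purely as an example) and compares the spread of the sample means for two sample sizes. It does not test normality directly; it only shows the center and variability of x-bar.

```python
import random
import statistics

def sample_mean_spread(sample_size, n_samples=2000, seed=0):
    """Return (mean, SD) of many simulated sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return statistics.fmean(means), statistics.stdev(means)

center_10, spread_10 = sample_mean_spread(10)
center_40, spread_40 = sample_mean_spread(40)

print(f"n = 10: mean of x-bar = {center_10:.3f}, SD of x-bar = {spread_10:.3f}")
print(f"n = 40: mean of x-bar = {center_40:.3f}, SD of x-bar = {spread_40:.3f}")
```

Both sets of sample means center near the population mean, and the larger sample size gives the smaller spread.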
confidence interval
point estimate ± margin of error
P.E. = (A + B) / 2
M.E. = (B - A) / 2
interpret confidence interval
we are [C]% confident that the interval from A to B captures the true [context]
all values between A and B are plausible
Margin of Error
increased confidence, increased M.E leading to a wider interval
increased sample size, decreased M.E leading to narrower interval
interpret confidence level
if we take many, many samples and calculate a confidence interval for each, about [C]% of them will capture the true [context].
conditions for constructing a confidence interval for proportion
random condition (must have random sample)
10% condition (when sampling w/o replacement, check n <= 10% of N, the population size)
Large Counts condition (n(p-hat) >= 10 and n(1 - p-hat) >= 10)
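A sketch of a one-sample z interval for a proportion that also checks the 10% and Large Counts conditions. The formula p-hat ± z* sqrt(p-hat(1 - p-hat)/n) is the standard one but is not spelled out on these cards, and the numbers (60 successes out of 100, population of 5000) are made up. The Random condition cannot be checked in code; it depends on how the data were collected.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_interval(successes, n, confidence=0.95, population_size=None):
    """Return ((A, B), conditions) for a confidence interval for p."""
    p_hat = successes / n
    conditions = {
        # 10% condition only matters when sampling without replacement.
        "10%": population_size is None or n <= 0.10 * population_size,
        # n(p-hat) and n(1 - p-hat) are just the success and failure counts.
        "large counts": successes >= 10 and (n - successes) >= 10,
    }
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)  # critical value
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - margin, p_hat + margin), conditions

(a, b), checks = one_proportion_z_interval(60, 100, population_size=5000)

point_estimate = (a + b) / 2    # P.E. = midpoint of the interval
margin_of_error = (b - a) / 2   # M.E. = half the interval width

print(f"interval: ({a:.3f}, {b:.3f})")
print(f"point estimate: {point_estimate:.3f}, margin of error: {margin_of_error:.3f}")
print(checks)
```

Raising `confidence` makes z* larger and the interval wider, while a larger n shrinks the margin of error.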
interpret p value
assuming the null hypothesis is true (p = [H0 value]), there is a [p-value] probability of getting a p-hat of [p-hat value] or more extreme purely by chance.
conclusion
Because p-value < significance level (or p-value > sig. lvl.), we reject the null hypothesis (or fail to reject) and we do (or do not) have convincing evidence for Ha context.
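To make the p-value and conclusion concrete, here is a sketch of a one-sided, one-proportion z test. The test statistic formula and the example numbers (p-hat = 0.6, H0: p = 0.5, n = 100, Ha: p > 0.5) are assumptions added for illustration; they are not on the cards.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(p_hat, p0, n, alpha=0.05):
    """One-sided test of H0: p = p0 against Ha: p > p0."""
    standard_error = sqrt(p0 * (1 - p0) / n)   # uses p0 because H0 is assumed true
    z = (p_hat - p0) / standard_error
    p_value = 1 - NormalDist().cdf(z)          # area to the right of z
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision

z, p_value, decision = one_proportion_z_test(p_hat=0.6, p0=0.5, n=100)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, decision: {decision}")
```

With these numbers the p-value is below 0.05, so the conclusion would be to reject the null hypothesis and report convincing evidence for Ha.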
type I error
the null hypothesis (context) is true, but we find convincing evidence for Ha (context)
type II error
Ha (context) is true, but we don’t find convincing evidence for Ha (context)