data
observations gathered for analysis; numerical or non-numerical
cases
subjects we obtain data about
when making comparisons, we would like to
determine association and determine causation
statistical inference
the process of using data from a sample to gain information about the population
sampling bias
occurs when the method of selecting a sample causes the sample to differ from the population in some relevant way
we try to obtain a sample that is (?) of the population
representative
simple random sampling
every group of n units in the population has the same chance of being chosen as the sample
is a voluntary sample a good sampling method?
no
association
values of one variable tend to be related to values of the other variable
causation
changing the value of the explanatory variable influences the value of the response variable
T/F: association automatically means causation
false; association does not imply causation
confounding variable
third variable that is associated with both the explanatory and response variable
observational study
a study in which the researcher does not actively control the value of any variable, but simply observes the values as they naturally exist
experiment
a study in which the researcher actively controls one or more of the explanatory variables; aka randomized experiment
which study can find causation
experiment
do confounding variables exist in a randomized experiment?
no
three explanations for why association may be observed in sample data
there is a causal relationship or association
there is an association, but it is due to confounding variables
there is no association; it is random chance
how do you avoid confounding variables
use random assignment
randomized comparative experiment
randomly assigning cases to different treatment groups and then comparing the groups on the response variable
matched pairs experiment
each case gets both treatments in random order, and individual differences are examined
random sampling vs random assignment
each unit has same chance of being chosen vs placing units into groups by chance
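The distinction on this card can be sketched in Python; the population and group sizes below are hypothetical:

```python
import random

random.seed(0)
population = list(range(1, 101))   # hypothetical population of 100 units

# Random sampling: every unit has the same chance of being chosen
sample = random.sample(population, 10)

# Random assignment: place the chosen units into two groups by chance
assigned = sample[:]
random.shuffle(assigned)
treatment, control = assigned[:5], assigned[5:]
```

Sampling decides who is studied; assignment decides which treatment each studied unit gets.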
random selection allows us to
make generalizations about the population
random assignment allows us to
make conclusions about causality
frequency
number of times a value is observed in a data set
relative frequency
number of times a value is observed divided by the total number of observations
the sum of all relative frequencies is
1
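A minimal check of the frequency definitions above, using a hypothetical categorical data set:

```python
from collections import Counter

data = ["A", "B", "A", "C", "A", "B"]  # hypothetical categorical data
freq = Counter(data)                   # frequency of each value
rel_freq = {k: v / len(data) for k, v in freq.items()}  # relative frequencies
total = sum(rel_freq.values())         # the relative frequencies sum to 1
```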
one categorical variable summary statistics
frequency table, proportion
one categorical variable visualization
bar or pie chart
two categorical variable summary statistics
two-way table, difference in proportions
two categorical variable visualization
segmented or side-by-side bar chart
mode
the category that occurs most frequently
segmented bar chart
the height of each bar represents the frequency of one categorical variable and the segmented colors split each bar by the other categorical variable
side by side bar chart
separate bar charts are given for each group of one of the categorical variables
visualizing one quantitative variable
dot plot or histogram
shapes
symmetric or skewed
measures of center
mean or median
right skewed
tail of distribution extends out to the right
left skewed
tail of distribution extends out to the left
resistance
we say a statistic is resistant if it is unaffected by extreme values
which measure of center is impacted by outliers
mean
which measure of center is not impacted by outliers
median
left skewed mean vs median
mean < median
right skewed mean vs median
mean > median
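The mean/median relationship on these cards can be verified numerically; the data set below is a hypothetical right-skewed example:

```python
from statistics import mean, median

# Hypothetical data with a long right tail (one large value)
right_skewed = [1, 2, 2, 3, 3, 4, 20]

m = mean(right_skewed)     # pulled toward the tail
md = median(right_skewed)  # resistant to the extreme value
# mean > median, as expected for right skew
```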
standard deviation
a number that measures how far away the typical observation is from the mean
a larger standard deviation means
the data values are more spread out and have more variability
T/F: standard deviation is not affected by outliers and skewness
false
IQR
Q3-Q1
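The IQR formula can be sketched with the standard library; the data are hypothetical, and note that different software uses slightly different quartile conventions:

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8]    # hypothetical quantitative data
q1, q2, q3 = quantiles(data, n=4)  # quartiles (default "exclusive" method)
iqr = q3 - q1                      # IQR = Q3 - Q1
```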
only use the standard deviation as the measure of spread when you are using the (?) as the measure of center
mean
use this measure of center and this measure of spread when skewed
median and IQR
boxplot
graphical representation of the five-number summary
left skewed box plot
median line toward the right side of the box; left whisker longer
right skewed box plot
median line toward the left side of the box; right whisker longer
standard error
the standard deviation of the sample statistics
a low standard error means
statistics vary little from sample to sample
as the sample size increases, the variability of sample statistics tends to (?) and sample statistics tend to be (?) to the true value of the population parameter
decrease; closer
does the shape of the population affect the center of each sampling distribution?
no
confidence interval
an interval, computed from a sample, that captures the population parameter for a specified proportion of all samples
confidence interval formula
sample statistic ± critical value*(SE)
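Plugging hypothetical numbers into this formula (statistic 50, SE 2, and the common 95% critical value of about 1.96):

```python
# Hypothetical values: sample statistic 50, SE 2, 95% critical value ~ 1.96
statistic = 50.0
critical_value = 1.96
se = 2.0

margin = critical_value * se                     # margin of error
ci = (statistic - margin, statistic + margin)    # (46.08, 53.92)
```

The 95% rule below approximates the critical value 1.96 with 2.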
confidence interval interpretation
we are 95% confident that an interval captures the true population parameter
95% rule
if a distribution is approximately symmetric and bell-shaped, about 95% of the data should fall within 2 standard deviations of the mean
95% rule formula
statistic ± 2(SE)
bootstrapping
technique for simulating a sampling distribution when you do not have a population from which to sample
bootstrap sample
sample with replacement from the original sample using the same sample size
bootstrap distribution shape
bell shaped and symmetric
bootstrap distribution center
centered at sample statistic value
bootstrap confidence interval
when symmetric and bell-shaped, statistic ± 2(SE)
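The bootstrap procedure on these cards, sketched end to end with a hypothetical original sample:

```python
import random
from statistics import mean, stdev

random.seed(1)
sample = [12, 15, 14, 10, 18, 16, 11, 13, 17, 14]  # hypothetical original sample

# Bootstrap: resample with replacement, same size as the original sample
boot_means = [mean(random.choices(sample, k=len(sample))) for _ in range(5000)]

se = stdev(boot_means)               # SE = sd of the bootstrap statistics
stat = mean(sample)                  # original sample statistic
ci = (stat - 2 * se, stat + 2 * se)  # statistic ± 2·SE (bell-shaped case)
```

The bootstrap distribution (`boot_means`) is centered at the original sample statistic, not at the population parameter.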
as the confidence level increases, the width of the confidence interval…
increases
as the sample size increases, the width of the confidence interval…
decreases
statistical test
used to determine whether results from a sample are convincing enough to allow us to conclude something about the population
goal of a hypothesis test
assess evidence provided by the sample data to test a claim made about a population parameter
null hypothesis
H0 ("H-naught")
alternative hypothesis
HA
null hypothesis meaning
no change or no difference; the parameter equals the null value (often zero)
alternative hypothesis meaning
claim for which we seek evidence; the parameter differs from the null value
hypothesis tests are always written in (?) notation
population parameter
two-sided hypothesis
H0: p1 = p2
HA: p1 does not equal p2
left-sided hypothesis
H0: p1 = p2
HA: p1 < p2
right-sided hypothesis
H0: p1 = p2
HA: p1 > p2
the null hypothesis is assumed to be (?) throughout the hypothesis test
true
how do you determine the p-value by hand?
count how many simulated statistics are as extreme as (or more extreme than) the observed statistic, then divide by the total number of simulated samples
p-value
proportion of samples that would give a statistic as extreme as the observed sample result when the null hypothesis is true
two-tailed p-value
when the alternative hypothesis contains a does not equal sign, the p-value is twice the proportion of the smallest tail
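The p-value cards above can be sketched as a randomization test; the two groups below are hypothetical, and a right-tailed alternative is used for illustration:

```python
import random
from statistics import mean

random.seed(2)
# Hypothetical two-group data; null hypothesis: no difference in means
group_a = [5, 7, 6, 8, 9]
group_b = [4, 5, 6, 5, 4]
observed = mean(group_a) - mean(group_b)

pooled = group_a + group_b
reps = 10_000
count = 0
for _ in range(reps):
    random.shuffle(pooled)                     # re-randomize group labels
    diff = mean(pooled[:5]) - mean(pooled[5:])
    if diff >= observed:                       # right-tailed: as or more extreme
        count += 1

p_value = count / reps  # divide by the number of simulated samples
```

For a two-tailed alternative, double the proportion in the smaller tail.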
p-value < a
reject null hypothesis
p-value > a
do not reject the null hypothesis
smaller p-values mean the sample results are
more statistically significant (stronger evidence against the null hypothesis)
formal hypothesis test has only two possible conclusions
reject or do not reject the null hypothesis
possible significance levels
0.10, 0.05, 0.01
conclusions of hypothesis tests
conclude in terms of H-A in context of the question
type I error
occurs when we reject a true null hypothesis
type II error
occurs when we do not reject a false null hypothesis
if a = 0.05, there is a (?)% chance of a type I error
5
as the sample size increases, statistics in the randomization distribution will be more closely concentrated around the…
null value
a larger sample size (?) the chance of making a type II error
decreases
two methods of statistical inference
confidence intervals and hypothesis tests
sampling distribution
shows distribution of sample statistics obtained from a population, centered at true value of population parameter
bootstrap distribution
simulates a distribution of sample statistics for the population, centered at value of original sample statistic
randomization distribution
simulates a distribution of sample statistics for a population in which the null hypothesis is true, centered at value stated in null hypothesis
(-L, -U): both endpoints negative
does not capture 0; reject the null hypothesis
(L, U): both endpoints positive
does not capture 0; reject the null hypothesis
(-L, U): lower endpoint negative, upper endpoint positive
captures 0; do not reject the null hypothesis