Contains U1, U2, and U3
What makes a study an Observational Study?
treatments were not randomly assigned
ex: This is an observational study because the volunteers were not randomly assigned to watch TV or not watch TV.
What makes a study an Experiment?
treatments ARE randomly assigned
ex: This is an experiment because the teacher randomly assigns which students will use the physical program and which students will use the virtual program.
How do you conduct a Simple Random Sample?
State how you will number the population
State the method used (random number generator)
State the range of numbers used and how many you will select. Also, communicate no repeated numbers allowed.
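The three steps above can be sketched in Python. This is only an illustration: the population size of 500, the sample size of 25, and the function name `simple_random_sample` are assumptions made for the example, not part of the card.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Pick `sample_size` distinct labels from 1..population_size."""
    rng = random.Random(seed)  # seed only makes the demo reproducible
    labels = range(1, population_size + 1)  # number the population 1..N
    # rng.sample draws without replacement, so no label can repeat
    return sorted(rng.sample(labels, sample_size))

# Hypothetical population of 500 individuals, sample of 25
srs = simple_random_sample(population_size=500, sample_size=25, seed=1)
print(srs)
```

The individuals whose labels appear in `srs` make up the sample.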
When do you conduct a Stratified Random Sample?
Use stratification when a variable that is inherent to the subjects/units (e.g., age, gender) might influence the response being measured.
What are the advantages and disadvantages of a Stratified Random Sample?
Advantage:
reduces variability in the response variable (state the response variable)
often more representative of the population as each group is proportionally represented in the sample
Disadvantage:
often time consuming to carry out
How do you conduct a Stratified Random Sample?
Group individuals into their strata
State how you will number the population (numbering may change for each stratum)
State the method used (random number generator)
State the range of numbers used and how many you will select. Also, communicate no repeated numbers allowed.
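A minimal sketch of these steps, assuming two hypothetical strata (freshmen and seniors) and made-up sample sizes per stratum. The function name `stratified_sample` is also an assumption for illustration.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Take a separate simple random sample inside every stratum."""
    rng = random.Random(seed)  # seed only makes the demo reproducible
    sample = {}
    for name, members in strata.items():
        # sampling without replacement, so nobody is picked twice
        sample[name] = rng.sample(members, per_stratum[name])
    return sample

# Hypothetical strata: 100 freshmen and 80 seniors
strata = {
    "freshmen": [f"F{i}" for i in range(1, 101)],
    "seniors": [f"S{i}" for i in range(1, 81)],
}
chosen = stratified_sample(strata, {"freshmen": 10, "seniors": 8}, seed=2)
print(chosen)
```

Choosing 10 of 100 and 8 of 80 keeps each stratum at the same 10% share, which is one way to keep the groups proportionally represented.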
When do you conduct a Cluster Random Sample?
Use clustering when the groups (clusters) are formed based on convenience, such as location, instead of a potentially confounding variable
What are the advantages and disadvantages of a Cluster Random Sample?
Advantages:
quicker to carry out than stratifying
Disadvantages:
the sample may not be representative of the population
How do you conduct a Cluster Random Sample?
Group individuals based on proximity
Ex. If you’re taking a sample of a hotel, group the subjects by their floor
State how you will number the clusters
State the method used (random number generator)
State the range of numbers used and how many clusters you will select. Also, communicate no repeated numbers allowed.
Include every individual in the selected clusters in the sample
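The hotel example can be sketched as follows, assuming the usual cluster-sampling approach of randomly picking whole clusters and then including everyone in them. The 10 floors, 20 guests per floor, and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep everyone inside them."""
    rng = random.Random(seed)  # seed only makes the demo reproducible
    picked = rng.sample(sorted(clusters), n_clusters)  # no repeated clusters
    sample = [person for c in picked for person in clusters[c]]
    return picked, sample

# Hypothetical hotel: floors 1..10, 20 guests per floor
hotel = {floor: [f"floor{floor}-guest{g}" for g in range(1, 21)]
         for floor in range(1, 11)}
picked_floors, guests = cluster_sample(hotel, n_clusters=3, seed=3)
print(picked_floors, len(guests))
```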
When does a study have selection bias?
when no randomness was used to select the sample, so the sample was poorly selected
What are the types of selection bias?
Convenience Sampling
sampled based on what is convenient for the researcher
ex. sampling the first 30 people you see
Voluntary Response Sampling
the people who respond are often similar to each other with respect to their views/opinions
ex. surveys online
When does a study have nonresponse bias?
there was random sampling
HOWEVER, the people selected for the study can’t be contacted or refuse to participate
ex. mail people the survey, but don’t follow up if they don’t mail it back
When does a study have response bias?
participants were randomly selected AND the people responded
but the responses are quite inaccurate
How does response bias occur?
Leading questions: people know what answer you want so they tell you that answer
Confusing questions: people don’t understand the question
Awkwardness between the researcher & individual answering
What is the explanatory variable?
a variable whose levels are manipulated intentionally
the independent variable
EXPLAINS the response variable
What is the response variable?
a variable that is the outcome of a study; what you are measuring
dependent variable
is what happens in RESPONSE to the explanatory variable being manipulated
What is a confounding variable?
a variable that is related to the explanatory variable and possibly influences the response variable
What are the criteria of a well-designed experiment?
comparison: must compare two or more treatment groups
random assignment: experimental units/subjects must be randomly assigned to treatments
control: control potential confounding variables by keeping all other variables constant for all groups
replication: must have more than one experimental unit/subject in each treatment
What is the purpose of random assignment?
random assignment creates roughly equivalent groups (provide context)
allows for fair comparison between (provide context about the treatments)
differences in the response can then be attributed to the treatment (context) instead of untested variables
What is the statistical advantage of blocking by the confounding variable?
blocking separates natural chance variability in responses from the differences due to the confounding variable (state with context)
this makes it easier to determine if one treatment is better/worse than the other treatment (makes more sense with context)
Why is replication important?
there is more than one unit/subject in each treatment (provide context for both the unit and treatment)
this is important to show that the results aren’t due to random chance
What is a completely randomized designed experiment?
the most basic design
take a whole group of subjects/units and use a random method to assign them to treatment groups (typically of equal/similar size)
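One possible way to carry out this random assignment in Python. The 30 students, the two treatment names, and the function name `completely_randomized` are assumptions for illustration only.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle all units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)  # seed only makes the demo reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)  # the random method
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        # dealing in turn keeps the group sizes equal or within one unit
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical: 30 students, two programs
students = [f"student{i}" for i in range(1, 31)]
groups = completely_randomized(students, ["physical", "virtual"], seed=4)
print({t: len(members) for t, members in groups.items()})
```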
What is a randomized block designed experiment?
units/subjects are FIRST separated into blocks, and then random assignment to treatments is done within each block
Reason: separates natural chance variability in responses from differences due to the blocking variable. This makes it easier to determine if one treatment is really more effective than another.
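A sketch of blocking followed by random assignment within each block. The blocks (golfers split by skill level), the treatment names, and the function name `randomized_block` are hypothetical.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Randomly assign treatments separately inside every block."""
    rng = random.Random(seed)  # seed only makes the demo reproducible
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # randomize within this block only
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks formed by skill level
blocks = {
    "beginner": [f"B{i}" for i in range(1, 7)],
    "advanced": [f"A{i}" for i in range(1, 5)],
}
assignment = randomized_block(blocks, ["new club", "old club"], seed=5)
print(assignment)
```

Because the shuffle happens inside each block, every treatment gets the same mix of beginners and advanced players.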
What is a matched pair design?
Pairing where 2 similar subjects are paired together
ex. pair golfers with a similar skill level
Pairing where each subject receives both treatments (in a random order)
What can you generalize for an observational study?
can only generalize the findings of a study to the population from which the sample was randomly selected
What can you generalize from a well-designed experiment?
results from the experiment can be generalized to others who are similar to the volunteers
results can be generalized to the general population when the subjects/units ARE randomly sampled from the population before randomly assigning to treatments
What is categorical data?
data is placed into one of several groups or categories
ex. favorite color, car model, zip code
appropriate graphs: bar graph, segmented bar graph, mosaic plot, pie chart
What is quantitative data?
data is numerical and it makes sense to average the values
ex. height, length, age, # of siblings, # of pairs of shoes you own
appropriate graphs: dotplot, stemplot, histogram, box plot
What are the characteristics of the mean?
\overline{x}=\frac{\sum x_{i}}{n}
add up all the values in the set, then divide by the number of values
heavily influenced by outliers
What are the characteristics of the median?
the middle value of an ordered distribution
not influenced by outliers
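A quick check of how an outlier affects the mean and the median, using made-up salary values (in thousands of dollars).

```python
from statistics import mean, median

salaries = [40, 42, 45, 47, 50]   # hypothetical data
with_outlier = salaries + [400]   # add one extreme value

print(mean(salaries), median(salaries))
print(mean(with_outlier), median(with_outlier))
```

Adding the single value 400 moves the mean from 44.8 to 104, while the median only moves from 45 to 46.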
What are the characteristics of range?
Max - Min
Highly influenced by outliers
What are the characteristics of the Interquartile Range (IQR)?
Q3 - Q1
measures how wide the middle 50% of the data is
not influenced by outliers, b/c only values toward the middle of the distribution are used in the calculation
How do you find outlier(s)?
Do the upper and lower fence test
LF = Q1 - 1.5(IQR)
x < LF = outlier
UF = Q3 + 1.5(IQR)
x > UF = outlier
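The fence test can be sketched as below. The data are made up, and the quartiles are found as the medians of the lower and upper halves of the ordered data (leaving out the middle value when n is odd), which is an assumption about the convention. Software that uses a different quartile rule may give slightly different fences.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (needs n >= 2)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # values below the median position
    upper = data[-half:]   # values above the median position
    return median(lower), median(upper)

def find_outliers(values):
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

scores = [2, 4, 5, 6, 7, 8, 9, 10, 30]   # hypothetical data
print(find_outliers(scores))  # (-3.0, 17.0, [30])
```

Here Q1 = 4.5 and Q3 = 9.5, so IQR = 5, LF = -3, UF = 17, and only 30 falls outside the fences.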
What is the five-number summary?
min
Q1
median
Q3
max
What are the characteristics of standard deviation?
s=\sqrt{\frac{1}{n-1}\sum\left(x_{i}-\overline{x}\right)^2}
the typical (roughly average) distance of the values from the mean
since it’s related to the mean, it is heavily influenced by outliers
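The formula can be checked directly in Python. The data are made up, and `sample_sd` is a hypothetical helper name; the standard library's `statistics.stdev` is used only as a cross-check.

```python
from math import sqrt
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation, dividing by n - 1 as in the formula."""
    n = len(values)
    x_bar = sum(values) / n
    return sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5
print(sample_sd(data), stdev(data))
```

The squared deviations add up to 32, so s = sqrt(32/7), about 2.14.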
What does skewed left mean?
long tail on the left
What does skewed right mean?
long tail on the right
Characteristics of a stem plot
you can tell the shape of the distribution
easily find the middle
can find the unusual features
can easily find the spread
Characteristics of a histogram
can’t easily find the middle
can find the spread
can find unusual features
can find the shape
How do you describe/compare a data set?
CUSS + CONTEXT
center - median/mean (based on presence of outliers)
unusual features - gaps/outliers
spread - range/IQR/standard deviation (based on presence of outliers)
shape - unimodal/skewed/symmetric
context of the problem
NOTE: when comparing, use words like: greater than, less than, equivalent
How do you create a relative cumulative frequency graph?
make the graph from 0%-100%
take the values from the relative frequency and then add them in order
rf1: 36%, rf2: 29%
rcf1: 36%, rcf2: 65% (36% + 29%)
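The running total can be sketched with `itertools.accumulate`. The first two relative frequencies come from the example above; the last two (20% and 15%) are made up so that the total reaches 100%.

```python
from itertools import accumulate

relative_freq = [36, 29, 20, 15]   # percents; last two are hypothetical
cumulative = list(accumulate(relative_freq))  # add the values in order
print(cumulative)  # [36, 65, 85, 100]
```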
What is the important aspect to remember about relative cumulative frequency tables?
Whatever value is chosen, the cumulative percent counts everything up to that value
ex. How many automobiles have a score less than 180?
93%, the leftover 7% is over 180.
What are the characteristics of z-scores?
z=\frac{x_{i}-\overline{x}}{s}
measure of relative position
tells you how many standard deviations above or below the mean a value is
interpretation: (individual in context) is (z-score) standard deviations above/below the mean (variable in context).
when comparing z-scores, whoever has the highest generally did better
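A small worked comparison, assuming made-up test scores, means, and standard deviations for two classes.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical scores: 82 on a math test, 88 on an English test
math_z = z_score(82, mean=75, sd=5)
english_z = z_score(88, mean=84, sd=8)
print(math_z, english_z)  # 1.4 0.5
```

Even though 88 is the higher raw score, the math score is relatively better because it is 1.4 standard deviations above its class mean compared with 0.5 for English.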
What are the characteristics of a scatterplot?
shows the relationship between two quantitative variables
How do you describe the characteristics of a linear relationship?
DUST + CONTEXT
Direction - positive/negative
Unusual features - outliers (points that fall far from the overall pattern of the data)
Strength
strong - data points fall close to the line
weak - data points are very spread out around the line
Type - linear, nonlinear
context of the problem
What are the characteristics of the correlation coefficient?
r only measures the strength of a linear relationship - between two quantitative variables
r is always between -1 and 1, inclusive (-1 ≤ r ≤ 1)
if r is near zero = linear relationship is weak
if r is near -1 or 1 = linear relationship is strong
r has no units
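A sketch that computes r by hand and checks that it has no units. The formula used here (sum of products of deviations divided by the square root of the product of the sums of squared deviations) is the standard one but is not stated on the card, and the data are made up.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r for paired quantitative data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

inches = [1, 2, 3, 4, 5]   # hypothetical x-values
y = [2, 4, 5, 4, 5]        # hypothetical y-values
cm = [2.54 * v for v in inches]   # same x-values in different units

print(correlation(inches, y))
print(correlation(cm, y))
```

Both calls give about 0.77: changing the units of x does not change r.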
What are the basic characteristics of the least squares regression line (LSRL)?
\hat{y}=a+bx
“y hat” = predicted y value
the line of best fit that minimizes the sum of the squares of the residuals
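A minimal sketch of fitting the line, assuming the standard least-squares formulas (slope = sum of products of deviations over sum of squared x-deviations, intercept = mean of y minus slope times mean of x). These formulas are not stated on the card, and the data and the name `lsrl` are made up.

```python
def lsrl(xs, ys):
    """Return intercept a and slope b of the least squares regression line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = lsrl(xs, ys)
print(a, b)   # approximately 2.2 and 0.6
```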
What happens when you predict (“y hat”) with the LSRL?
Interpolating
when the x-value is inside the domain of the data
Extrapolating
when the x-value is outside the domain of the data (predictions here are less reliable)
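A small sketch that labels a prediction as interpolation or extrapolation. The line (a = 2.2, b = 0.6) and the data domain (x from 1 to 5) are hypothetical values chosen for the example.

```python
def predict(x, a, b, x_min, x_max):
    """Return the predicted y and whether x is inside the data's domain."""
    y_hat = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y_hat, kind

# Hypothetical line y-hat = 2.2 + 0.6x, fitted on x-values from 1 to 5
inside = predict(3, a=2.2, b=0.6, x_min=1, x_max=5)
outside = predict(10, a=2.2, b=0.6, x_min=1, x_max=5)
print(inside, outside)
```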
How do you interpret the slope/“b” of the LSRL?
As (x in context) increases by 1 (unit), the predicted (y in context) increases/decreases by (b).
How do you interpret the intercept/“a” of the LSRL?
When (x in context) is zero, the predicted (y in context) is (a).
How do you interpret the correlation/ r?
You must address:
strength - moderate/weak/strong
type - linear/nonlinear
direction - positive/negative
context
How do you characterize the coefficient of determination (r^2)?
calculated by squaring the correlation
Interpretation: (r^2 as a percent) of the variation in (y in context) is attributed to the linear relationship with (x in context).
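A worked example of the calculation, using made-up data and a hypothetical helper `r_squared` that squares the standard correlation formula (which is not stated on the card).

```python
def r_squared(xs, ys):
    """Coefficient of determination: the square of the correlation r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
r2 = r_squared(xs, ys)
print(f"{r2:.0%} of the variation in y is attributed to the linear relationship with x")
```

For these data the printed percentage is 60%.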
What are the characteristics of a residual?
\varepsilon=y-\hat{y}
residual = actual - predicted
the vertical distances between each point on a scatterplot and the LSRL
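A sketch of computing residuals as actual minus predicted. The data are made up, and the line with intercept 2.2 and slope 0.6 is assumed to be the least squares line for them.

```python
def residuals(xs, ys, a, b):
    """Actual y minus predicted y for every point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
res = residuals(xs, ys, a=2.2, b=0.6)
print([round(e, 2) for e in res])
```

A positive residual means the point lies above the line, and a negative residual means it lies below. For a least squares line the residuals add up to zero, apart from rounding.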
How do you interpret residual plots?
no obvious pattern = this model is appropriate
clear/obvious pattern = this model isn’t appropriate
fanning (the residuals spread out more or less as x increases) is also a sign that the model isn’t appropriate