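Describe what a sampling distribution is
A hypothetical distribution of a sample statistic (here, the sample mean) across all possible samples of the same size drawn from the population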
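A sampling distribution can be approximated by simulation. The sketch below assumes a hypothetical normal population (mean 100, SD 15) and repeatedly draws samples of size 25, recording the mean of each one. The population values, the sample size, and the function name are illustrative assumptions, not part of the notes.

```python
import random
import statistics

def simulate_sample_means(pop_mean, pop_sd, n, num_samples, seed=0):
    """Draw many samples of size n and return the mean of each one."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means

# Hypothetical population: mean 100, SD 15; 2000 samples of size 25.
sample_means = simulate_sample_means(100, 15, n=25, num_samples=2000)

# The collection of sample means approximates the sampling distribution.
print(round(statistics.mean(sample_means), 1))   # close to the population mean
print(round(statistics.stdev(sample_means), 1))  # close to 15 / sqrt(25) = 3
```

The simulated means cluster around the population mean, and their spread is much smaller than the spread of individual scores.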
Why is a sampling distribution important for statistical inference?
It helps us decide how likely our data are if the null hypothesis is true: relatively likely = retain null, relatively unlikely = reject null
What is the standard error of the mean?
The standard deviation of the sample means. It quantifies how much sample means vary (their dispersion) from sample to sample
How is the standard error of the mean related to sampling distributions?
The standard error of the mean is the standard deviation of the sampling distribution of the sample mean. It estimates how far a sample mean typically falls from the population mean
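As a small worked example, the standard error of the mean can be computed as the standard deviation divided by the square root of the sample size. The SD of 15 and the sample sizes below are hypothetical values chosen for illustration.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

# Hypothetical SD of 15: quadrupling n halves the standard error.
for n in (25, 100, 400):
    print(n, standard_error(15, n))
```

A smaller standard error means a narrower sampling distribution, so sample means land closer to the population mean.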
What are the steps to hypothesis testing?
Set up the hypotheses (null and alternative)
Create a sampling distribution
Determine the threshold for statistical significance
Determine the likelihood of our data in the context of the sampling distribution
Decide if our findings are statistically significant or not (based on steps 3 and 4)
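The five steps above can be sketched as a two-tailed z test of a sample mean. This is one possible illustration, not the only way to run the steps: it assumes the population SD is known and the sampling distribution is normal, and the function name and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, null_mean, sigma, n, alpha=0.05):
    """Two-tailed z test of a sample mean, following the five steps."""
    # Step 1 is the choice of null_mean (H0: population mean = null_mean).
    # Step 2: the sampling distribution under H0 is assumed normal,
    # centered on the null mean, with SD equal to the standard error.
    se = sigma / math.sqrt(n)
    null_dist = NormalDist(null_mean, se)
    # Step 3: critical values that cut off alpha / 2 in each tail.
    lower = null_dist.inv_cdf(alpha / 2)
    upper = null_dist.inv_cdf(1 - alpha / 2)
    # Step 4: p-value, the chance of a mean at least this far from H0.
    z = (sample_mean - null_mean) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: decide, using the threshold (step 3) and likelihood (step 4).
    decision = "reject H0" if p_value <= alpha else "retain H0"
    return {"critical_values": (lower, upper), "z": z,
            "p_value": p_value, "decision": decision}

# Hypothetical data: sample mean 105, null mean 100, SD 15, n = 36.
result = one_sample_z_test(sample_mean=105, null_mean=100, sigma=15, n=36)
print(result)
```

With these numbers the sample mean falls just beyond the upper critical value, so the p-value is slightly below .05.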
Define and describe alpha
The type one error rate. It is the acceptable threshold for falsely rejecting H0. Conventionally set to .05
Define and describe beta
The type two error rate. Assuming that the alternative hypothesis is true, the likelihood of falsely retaining the null hypothesis.
How are alpha and beta related to type one and type two errors?
Alpha is the probability of a type one error (falsely rejecting a true null).
Beta is the probability of a type two error (falsely retaining a false null).
How would increasing the alpha level affect the critical values and beta?
The critical values will change (lower bound moves up/upper bound moves down), and beta decreases.
How would decreasing the alpha level affect the critical values and beta?
• What has changed: the alpha level is lower, so the critical values move farther from the null mean (lower bound moves down/upper bound moves up) and more sample means are retained; beta increases
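The effect of alpha on the critical values and on beta can be checked numerically. The sketch below assumes a two-tailed z test with a known SD and a normal sampling distribution; the null mean (100), alternative mean (105), SD (15), and n (36) are hypothetical.

```python
import math
from statistics import NormalDist

def critical_values_and_beta(null_mean, alt_mean, sigma, n, alpha):
    """Return (lower, upper, beta) for a two-tailed z test."""
    se = sigma / math.sqrt(n)
    null_dist = NormalDist(null_mean, se)
    alt_dist = NormalDist(alt_mean, se)
    lower = null_dist.inv_cdf(alpha / 2)
    upper = null_dist.inv_cdf(1 - alpha / 2)
    # Beta: chance that a sample mean from the alternative distribution
    # lands between the critical values, so H0 is falsely retained.
    beta = alt_dist.cdf(upper) - alt_dist.cdf(lower)
    return lower, upper, beta

for alpha in (0.01, 0.05, 0.10):
    lower, upper, beta = critical_values_and_beta(100, 105, 15, 36, alpha)
    print(alpha, round(lower, 2), round(upper, 2), round(beta, 3))
```

Reading down the output, a larger alpha pulls both critical values toward the null mean and shrinks beta; a smaller alpha does the reverse.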
If the difference between the null and alternative mean increased, how would that affect alpha, the critical values, and beta?
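-alpha level: fixed value at .05, no change
-critical values: set by the null mean, alpha, and the standard error of the mean, so they change only if the null mean is the one that shifts
-beta: decreases b/c the null and alternative distributions overlap less
-p-value: decreases b/c our result is more extreme relative to the null mean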
How would increasing the sample size affect alpha, critical values, and beta?
-alpha: no change
-critical values: will change b/c the standard error of the mean will change
-p-value: will change b/c the standard error of the mean decreases, so the same mean difference is more extreme in standard-error units
-beta level: will decrease b/c the critical values and the standard error of the mean change
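Both scenarios (a larger difference between the null and alternative means, and a larger sample size) can be explored with one small function. The sketch assumes a two-tailed z test with known SD, alpha fixed at .05, and a null mean that stays at 100 while only the alternative mean moves; all numbers and names are hypothetical.

```python
import math
from statistics import NormalDist

def beta_level(null_mean, alt_mean, sigma, n, alpha=0.05):
    """Beta for a two-tailed z test, plus the pieces it depends on."""
    se = sigma / math.sqrt(n)
    null_dist = NormalDist(null_mean, se)
    lower = null_dist.inv_cdf(alpha / 2)
    upper = null_dist.inv_cdf(1 - alpha / 2)
    alt_dist = NormalDist(alt_mean, se)
    # Beta is the share of the alternative distribution between the
    # critical values (where H0 would be retained).
    beta = alt_dist.cdf(upper) - alt_dist.cdf(lower)
    return {"se": se, "critical_values": (lower, upper), "beta": beta}

# Larger difference between the means, n fixed at 36.
for alt_mean in (102, 105, 108):
    out = beta_level(100, alt_mean, 15, 36)
    print("alt mean", alt_mean, "beta", round(out["beta"], 3))

# Larger sample size, alternative mean fixed at 105.
for n in (16, 36, 100):
    out = beta_level(100, 105, 15, n)
    print("n", n, "se", round(out["se"], 2), "beta", round(out["beta"], 3))
```

In both loops alpha never changes; beta falls because the two distributions overlap less, either from a bigger mean difference or from a smaller standard error.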
Scatterplot: Be able to describe a positive and negative linear relationship
If the slope is positive, then there is a positive linear relationship, i.e., as one increases, the other increases. If the slope is negative, then there is a negative linear relationship, i.e., as one increases the other variable decreases.
How is the p-value related to alpha?
If the p-value is greater than alpha, you retain (fail to reject) the null hypothesis. If the p-value is less than or equal to alpha, you reject it.
How do you decide to retain or reject the null hypothesis?
Use the threshold for statistical significance and the likelihood of our data in the context of the sampling distribution
What are some problems with p-values?
-not a measure of effect size
-can't assess clinical significance
-forces binary decisions
Scatterplot: Be able to describe a scatterplot and a linear relationship between two variables
Displays each individual’s scores in a two-dimensional space. Scores for the “predictor” variable go on the x-axis. Scores for the “outcome” variable go on the y-axis
Scatterplot: Be able to describe the strength of the linear relationship
The more spread out the points are, the weaker the relationship. If the points are clearly clustered, or closely follow a curve or line, the relationship is described as strong.
What is the covariance?
the starting point for quantifying how x is related to y
How would you interpret the covariance?
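The average product of deviations between 2 variables, x and y: for each person, multiply the deviation of x from its mean by the deviation of y from its mean, then average the products. A positive value means x and y tend to be above (or below) their means together; a negative value means they tend to move in opposite directions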
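A short worked example of the computation. This sketch uses the sample version of the covariance (dividing by n - 1), which is an assumption about the formula intended here; the hours and scores are made-up data.

```python
def covariance(x, y):
    """Sample covariance: sum of deviation products divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
    return sum(products) / (n - 1)

hours = [1, 2, 3, 4, 5]        # hypothetical predictor scores
scores = [52, 55, 61, 64, 68]  # hypothetical outcome scores
print(covariance(hours, scores))        # positive: they rise together
print(covariance(hours, scores[::-1]))  # negative: one rises, one falls
```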
What are the conceptual limitations to the covariance?
An abstract concept that doesn’t exist in the real world. Its size depends on the units of x and y, so the magnitude is hard to interpret
What is a correlation?
A standardized covariance, bounded between -1 and +1
Compare and contrast a covariance and correlation
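Covariance is an indicator of the direction in which 2 random variables vary together, but its size depends on the units of the variables, so a higher number does not by itself denote higher dependency. Correlation is the standardized covariance: a unit-free measure that indicates both the direction and the strength of the relationship between two variables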
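The contrast can be seen by changing the units of one variable. In this sketch (made-up data, sample formulas with n - 1), study time is expressed in hours and then in minutes.

```python
import math

def covariance(x, y):
    """Sample covariance: sum of deviation products divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

hours = [1, 2, 3, 4, 5]            # hypothetical predictor scores
minutes = [h * 60 for h in hours]  # same information, different units
scores = [52, 55, 61, 64, 68]      # hypothetical outcome scores

# The covariance is 60 times larger in minutes; the correlation is the same.
print(covariance(hours, scores), covariance(minutes, scores))
print(correlation(hours, scores), correlation(minutes, scores))
```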
What are the steps to significance testing for a correlation?
State the Research Hypothesis.
State the Null Hypothesis.
Select a probability of error level (alpha level)
Select and compute the test for statistical significance.
Interpret the results.
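One common choice for the test in step 4 is a t test of the correlation against zero, with n - 2 degrees of freedom. The sketch below assumes that test; the values of r and n are hypothetical, and the critical value is the two-tailed t for df = 25 at alpha = .05 taken from a standard t table.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical result: r = .50 from n = 27 pairs, alpha = .05, two-tailed.
r, n = 0.50, 27
t = correlation_t_statistic(r, n)
critical_t = 2.060  # from a t table: df = 25, alpha = .05, two-tailed

decision = "reject H0" if abs(t) > critical_t else "retain H0"
print(round(t, 3), decision)
```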
Describe range restriction: If variables x and y are correlated, how does restricting the range of x affect the variance of x and y? How does it affect the covariance?
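Restricting the range of x reduces the variance of x and, because y is related to x, it also reduces the variance of y and the covariance. The correlation between the variables goes down (moves toward zero).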
If x and y are uncorrelated, how does restricting the range of x affect the variance of x, the variance of y, and their covariance?
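The variance of x goes down, but there is no effect on the variance of y or on the covariance, which stays at about zero.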
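Range restriction can be explored with a simulation. The sketch below assumes standard normal scores, a linear relationship, and a restriction that keeps only people with above-average x; the correlations (.70 and 0), sample size, and function names are hypothetical.

```python
import math
import random
import statistics

def simulate_xy(rho, n, seed=0):
    """Generate n (x, y) pairs of standard normal scores with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        pairs.append((x, y))
    return pairs

def summarize(pairs):
    """Variances, covariance, and correlation for a list of (x, y) pairs."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs) / (len(pairs) - 1)
    var_x, var_y = statistics.variance(xs), statistics.variance(ys)
    return {"var_x": var_x, "var_y": var_y, "cov": cov,
            "r": cov / math.sqrt(var_x * var_y)}

results = {}
for rho in (0.7, 0.0):
    full = simulate_xy(rho, n=5000)
    restricted = [p for p in full if p[0] > 0]  # keep only above-average x
    results[rho] = (summarize(full), summarize(restricted))
    print(rho, results[rho])
```

When x and y are correlated, every summary shrinks after the restriction. When they are uncorrelated, only the variance of x shrinks.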
Be able to identify and define a “predictor” and “outcome”
• Scores for the “predictor” variable on the x-axis
• Scores for the “outcome” variable on the y-axis
What do we mean by “predicted” value of y?
• The predicted value is the score on the outcome that the regression line gives for a particular value of x. The outcome is:
• A variable whose differences in scores you want to explain
• Usually notated by y
• Dependent variable: an outcome in an experimental design
• A variable that is “dependent” on the experimental condition a person is randomly assigned to (treatment, control)
• Criterion: just another general label for the y-variable in a regression
What are some advantages to linear regression compared to correlation?
The main advantage of regression is that it gives a more detailed look at your data than correlation alone and includes an equation that can be used to predict the outcome for new values of the predictor
What are some similarities between linear regression and correlation?
Both work to quantify the direction and strength of the relationship between two numeric variables.
What regression coefficients are included in an intercept-only model?
Only the intercept. The model with zero predictor variables is called the “Intercept Only Model”, so there are no slopes and only y is included. It is the baseline that the model you specify is compared against.
For an intercept-only model, how we interpret the regression intercept?
• Estimated value of the outcome when there are no predictors in the model. If we do not have any other information (i.e., predictors), our best guess about a random person’s score is the sample average
For an intercept-only model, what would the line of best fit look like?
A horizontal line at the sample mean of y.
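A minimal sketch of the intercept-only model, using made-up scores. It assumes least-squares estimation, under which the intercept equals the sample mean of y; the function names are illustrative.

```python
import statistics

def fit_intercept_only(y):
    """With no predictors, the least-squares intercept is the mean of y."""
    return statistics.mean(y)

def sum_squared_errors(y, guess):
    """Total squared distance between the scores and one constant guess."""
    return sum((yi - guess) ** 2 for yi in y)

scores = [52, 55, 61, 64, 68]  # hypothetical outcome scores
b0 = fit_intercept_only(scores)

# Everyone gets the same predicted value, which is why the line is flat.
predictions = [b0 for _ in scores]
print(b0, predictions)

# Any other constant guess fits worse than the mean.
print(sum_squared_errors(scores, b0),
      sum_squared_errors(scores, b0 - 1),
      sum_squared_errors(scores, b0 + 1))
```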
What is the interpretation of the regression intercept when there is a quantitative predictor in the model?
The intercept is the conditional mean of y when all predictor(s) in the model are zero
What is the interpretation of the regression slope?
The slope is interpreted as the predicted change in y for a one-unit increase in x. This is the same idea as the slope of any straight line (rise over run).
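The intercept and slope interpretations can be checked with a small least-squares fit. The data and function names below are hypothetical; the formulas are the standard ones for a single quantitative predictor.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one quantitative predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x_value):
    """Predicted value of y for a given value of x."""
    return intercept + slope * x_value

hours = [1, 2, 3, 4, 5]        # hypothetical predictor scores
scores = [52, 55, 61, 64, 68]  # hypothetical outcome scores
b0, b1 = fit_line(hours, scores)

print(round(b0, 2), round(b1, 2))
print(predict(b0, b1, 0))                      # equals the intercept
print(predict(b0, b1, 3) - predict(b0, b1, 2)) # equals the slope (up to rounding)
```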
Statistical hypothesis
This is implied when we test for significance. Includes: Null hypothesis: the null and alternative means are statistically equal; any difference is likely due to chance. Alternative hypothesis: the null and alternative means are statistically different; the difference is unlikely due to chance. It only cares whether there is a mean difference or not
Research hypothesis
Explicit about direction, our intuitive understanding of hypothesis. Does not include null and alternative.
Determine the likelihood of our data in the context of the sampling distribution
p-value: assuming the null is true, the likelihood of getting a result at least as extreme as ours
Be able to describe a correlation between a qualitative and quantitative variable. What information does it provide versus not provide?
Correlation does not tell us the precise mean difference. The sign (−/+) tells us whether the mean of the group coded 1 is higher or lower than the mean of the group coded 0.
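A brief illustration with a 0/1 coded group variable. The data are made up (0 = control, 1 = treatment is an assumed coding), and the helper names are hypothetical. Rescaling the scores changes the mean difference but not the correlation, while flipping the coding flips the sign.

```python
import math

def correlation(x, y):
    """Pearson correlation computed from deviation scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_difference(group, score):
    """Mean of the group coded 1 minus the mean of the group coded 0."""
    ones = [s for g, s in zip(group, score) if g == 1]
    zeros = [s for g, s in zip(group, score) if g == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

group = [0, 0, 0, 1, 1, 1]          # assumed coding: 0 = control, 1 = treatment
score = [1, 2, 3, 4, 5, 6]          # hypothetical outcome scores
rescaled = [s * 10 for s in score]  # same scores in different units
flipped = [1 - g for g in group]    # reverse which group is coded 1

print(correlation(group, score), mean_difference(group, score))
print(correlation(group, rescaled), mean_difference(group, rescaled))
print(correlation(flipped, score))  # same size, opposite sign
```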
Predictor variable
• A variable meant to explain differences in the outcome of interest
• Usually, notated by x
• Independent variable: a variable that is experimentally manipulated (treatment, control)
• A variable that is “independent” from all other variables in the model
• Covariate: A potential confounding variable, included for statistical control