examination of a relationship between 2 variables
scatterplot
statistics provides us
with the ability to quantify the relationship between measurement variables
If there really was nothing going on in the population, what is the chance that this relationship would have been observed?
If the chance is small, we declare the relationship to be statistically significant and not just a fluke.
To believe an observed relationship, it must be shown to be statistically significant
a relationship is deemed to be statistically significant if the chance of observing the relationship, when there is actually nothing going on in the population, is less than
5%
statistical significance may be misinterpreted - 1
A minor or weak relationship can achieve statistical significance when the sample is very large - a relationship declared statistically significant is therefore not necessarily a strong relationship
statistical significance may be misinterpreted - 2
A very strong relationship may fail to achieve statistical significance if the sample is very small - a small sample means not enough observations have been taken to rule out chance as an explanation of the relationship
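A minimal simulation sketch of the sample-size effect (assuming numpy and scipy are available; the data are invented): the same weak underlying relationship typically reaches significance with a very large sample but not with a small one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def correlation_pvalue(n, slope=0.1):
    """Simulate a weak linear relationship and return (r, p-value)."""
    x = rng.normal(size=n)
    y = slope * x + rng.normal(size=n)   # weak signal buried in noise
    r, p = stats.pearsonr(x, y)
    return r, p

for n in (20, 10_000):
    r, p = correlation_pvalue(n)
    # with n = 20 the weak relationship is usually not significant;
    # with n = 10 000 it almost always is, even though r stays small
    print(f"n = {n:>6}: r = {r:+.3f}, p-value = {p:.4f}")
```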
correlation
Pearson product-moment correlation
Correlation coefficient
r
correlation measures
the strength of linear relationships only
correlation between 2 measurement variables
a measure of how closely their values fall to a straight line
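A small sketch of the correlation coefficient computed directly from its definition (made-up data), cross-checked against numpy's built-in corrcoef.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# r = Σ(x_i - x̄)(y_i - ȳ) / sqrt(Σ(x_i - x̄)² · Σ(y_i - ȳ)²)
dx, dy = x - x.mean(), y - y.mean()
r = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

print(r)                        # close to +1: strong positive linear relationship
print(np.corrcoef(x, y)[0, 1])  # same value from numpy's built-in
```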
a positive correlation indicates
the variables increase together
a negative correlation indicates
as one variable increases the other variable decreases
correlation is unaffected by
units of measurement
correlation of r = +1
there is a perfect positive linear relationship between the 2 variables
r = -1
there is a perfect negative linear relationship between the 2 variables
r = 0
there is no linear relationship between the two variables
2 main problems
Outliers can substantially inflate or deflate correlations
Combining groups within the population inappropriately can mask relationships
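A quick illustration of the first problem with made-up data: a single outlier can drag a strong positive correlation down, or even flip its sign.

```python
import numpy as np

x = np.linspace(0, 10, 20)
y = 2 * x + np.random.default_rng(1).normal(scale=0.5, size=20)

r_clean = np.corrcoef(x, y)[0, 1]

# add one extreme point far from the linear pattern
x_out = np.append(x, 30.0)
y_out = np.append(y, -40.0)
r_outlier = np.corrcoef(x_out, y_out)[0, 1]

print(f"without outlier: r = {r_clean:.3f}")   # close to +1
print(f"with one outlier: r = {r_outlier:.3f}")  # much weaker, possibly negative
```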
masking relationships
The variables may be correlated within each group, but when the data are merged across groups these relationships can be masked
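A sketch of the masking problem with invented numbers: each group shows a strong positive correlation, but the pooled data do not.

```python
import numpy as np

rng = np.random.default_rng(2)

# Group A: low x, high y; Group B: high x, low y; positive trend inside each
x_a = rng.uniform(0, 5, 50)
y_a = 10 + 2 * x_a + rng.normal(scale=1, size=50)
x_b = rng.uniform(5, 10, 50)
y_b = -5 + 2 * x_b + rng.normal(scale=1, size=50)

print("group A:", np.corrcoef(x_a, y_a)[0, 1])   # strongly positive
print("group B:", np.corrcoef(x_b, y_b)[0, 1])   # strongly positive
print("combined:", np.corrcoef(np.concatenate([x_a, x_b]),
                               np.concatenate([y_a, y_b]))[0, 1])  # weak / negative
```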
Correlation ⇏ causation
even if two variables are legitimately correlated, do not be fooled into thinking that there is a causal connection between them
regression
we have one explanatory variable and one response variable
simple linear regression
a straight line is fitted to the data to provide a quantitative summary of the relationship between the response and explanatory variables (e.g. sales and advertising budget)
least squares line
Focus on the response variable and try to fit the line so that the observed response values are as close as possible to the line. This means that the vertical deviation between each response (or y) data point and the line is as small as possible
the vertical deviations
squared and then added up for all points in the scatter plot. The line which minimizes this sum of squared distances is the line which fits the data best. This is called the least squares line
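A minimal sketch of the least squares fit using the closed-form slope and intercept (made-up data), cross-checked against np.polyfit.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # explanatory variable
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])   # response variable

# slope b = Sxy / Sxx, intercept a = ȳ - b·x̄  (these minimise Σ(yᵢ - a - b·xᵢ)²)
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

print(f"fitted line: y = {a:.2f} + {b:.2f}·x")
print(np.polyfit(x, y, deg=1))   # same coefficients (slope first)
```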
R2
the proportion of the total variation in the response variable that is explained by the explanatory variable
R2 = 1
all of the observed responses lie exactly on a straight line. In other words, our explanatory variable explains all of the variation in our response variable
R2 = 0
our explanatory variable explains none of the variation in our response variable - the linear regression model is not a good model for the data
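A sketch of R² computed as the proportion of the total variation explained, using the same kind of simple fit as above (made-up data).

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = a + b * x                        # fitted values

ss_res = np.sum((y - y_hat) ** 2)        # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)     # total variation
r_squared = 1 - ss_res / ss_tot

print(f"R² = {r_squared:.3f}")   # close to 1: x explains nearly all variation in y
```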
multiple linear regression
an extension of simple linear regression with more than one explanatory variable
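A short sketch of a multiple linear regression with two explanatory variables, fitted by least squares via numpy's lstsq (the variable names and data are invented for illustration).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
advertising = rng.uniform(0, 10, n)
price = rng.uniform(1, 5, n)
sales = 3 + 2 * advertising - 4 * price + rng.normal(scale=1, size=n)

# design matrix: a column of ones (intercept) plus one column per explanatory variable
X = np.column_stack([np.ones(n), advertising, price])
coefs, *_ = np.linalg.lstsq(X, sales, rcond=None)

print("intercept, advertising, price coefficients:", np.round(coefs, 2))
```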
significant relationships
To test whether a relationship is statistically significant or not we need to perform a hypothesis test
hypothesis testing allows
us to use a sample of data to decide between 2 competing statements about a population characteristic
population characteristic
things like the mean of the population or the proportion of the population who have a particular property
the null hypothesis
the hypothesis which we initially assume to be true
the alternative hypothesis
the reason the data was collected in the first place - we suspect the null hypothesis is false
significance levels
the probability of rejecting the null hypothesis when it is actually true (i.e. of making a type I error)
p-values
the probability of observing a value of the characteristic of interest the same as, or more extreme than, what was actually observed in the data, assuming the null hypothesis is true
type I error
false positive
type II error
false negative
interpreting p-values
If p-value ≤ α, reject H0 at the α significance level
If p-value > α, do not reject H0 at the α significance level
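A sketch of this decision rule using a two-sample t-test from scipy (the choice of test and the data are assumptions for illustration, not the course's own example). H0: the two population means are equal; H1: they differ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
group_1 = rng.normal(loc=50, scale=10, size=40)
group_2 = rng.normal(loc=55, scale=10, size=40)

alpha = 0.05
t_stat, p_value = stats.ttest_ind(group_1, group_2)

if p_value <= alpha:
    print(f"p = {p_value:.4f} ≤ {alpha}: reject H0 at the 5% significance level")
else:
    print(f"p = {p_value:.4f} > {alpha}: do not reject H0 at the 5% significance level")
```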