Univariate data
Data with only one variable to describe; can be graphed (e.g. histogram) or described numerically (e.g. mean and SD)
Bivariate data
Data with two variables; can be graphed on a scatterplot and described numerically, both for each variable on its own and for the association between the two variables
Correlation
Statistical technique that is used to measure and describe the relationship between two variables, specifically how strongly and in what way the two variables are related
T or F: Knowing how two variables are correlated allows us to make predictions
T
What kind of variable is used with correlational data
Measured variable, typically one that cannot be manipulated
What are the 4 ways that the correlation can help us describe the data
Direction
Shape
Strength
Magnitude
Direction of correlation coefficient
Can be positive or negative
What does it mean if the correlation coefficient is negative, positive or zero
Negative - variables change in opposite directions (as one increases, the other decreases)
Positive - variables change in the same direction (as one increases, the other also increases)
Zero - no linear relationship; points are scattered widely with no clear trend
Shape of correlation coefficient
Can be linear or curvilinear
What does it mean if the shape of a scatterplot is linear or curvilinear
Linear - straight line relationship, data points are clustered around a line
Curvilinear - consistent, predictable relationship that follows a curve (e.g. a quadratic, cubic or quartic shape)
What is the problem regarding the correlation coefficient if a scatterplot has a curvilinear relationship
It will underestimate the relationship, because r only measures linear relationships; this is why it's important to always graph the data
Strength of correlation coefficient
Subjective measure of the relationship; can be weak, moderate or strong depending on how closely the data points cluster around a straight line
Magnitude of correlation coefficient
Objective measure of the relationship based on the computed r value, which ranges from -1 to +1 (the closer |r| is to 1, the stronger the relationship)
T or F: an r value of -0.8 is weaker than an r value of +0.3
F; it is stronger (|-0.8| > |+0.3|). The sign only indicates the direction of the relationship
Pearson correlation
Measures the degree and direction of the linear relationship between two variables
How is Pearson’s r calculated
Degree to which X and Y vary together / degree to which X and Y vary separately
What are the 3 steps to calculate Pearson’s r
Plot data
Compute univariate stats (mean and SD for each variable separately)
Compute bivariate stats (compute the relationship between deviation scores to determine if they deviate in the same or opposite direction)
How does one calculate bivariate stats
HINT: 4 steps
Compute deviation scores
Compute sums of products (SP)
Compute covariance (COV)
Compute correlation coefficient (r)
How does one compute deviation scores
Deviation scores for X = x − x̄
Deviation scores for Y = y − ȳ
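A minimal Python sketch of the deviation-score step, using a small made-up data set (the values in `x` and `y` and the helper name `deviation_scores` are illustrative assumptions, not course material):

```python
def deviation_scores(values):
    """Return each score minus the mean of its variable (x - x̄ or y - ȳ)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Hypothetical paired scores for five people
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

dx = deviation_scores(x)  # x̄ = 3.0 -> [-2.0, -1.0, 0.0, 1.0, 2.0]
dy = deviation_scores(y)  # ȳ = 4.0 -> [-2.0, 0.0, 1.0, 0.0, 1.0]
print(dx, dy)
```

Deviation scores for a variable always sum to zero (apart from rounding), which is a quick check on hand calculations.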
Sums of Products (SP)
Tells us whether the scores deviate in the same or opposite direction (essentially the same as SS, but with two variables)
How is SP calculated
SP = Σ(x−x̄)(y−ȳ)
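The SP formula translates directly into code. This is a sketch on hypothetical data; `sum_of_products` is an illustrative name:

```python
def sum_of_products(x, y):
    """SP = Σ(x − x̄)(y − ȳ): add up the products of paired deviation scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Hypothetical paired scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sp = sum_of_products(x, y)
print(sp)  # 6.0
```

Here the paired deviations are (-2)(-2), (-1)(0), (0)(1), (1)(0) and (2)(1), so SP = 4 + 0 + 0 + 0 + 2 = 6. A positive SP means the pairs mostly deviate in the same direction.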
Covariance (COV)
Measure of the average extent to which scores on two variables covary from their respective means across the entire group of scores
How is COV calculated
COV = SP / N = Σ(x−x̄)(y−ȳ) / N
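A short sketch of COV = SP / N on the same kind of hypothetical data (`covariance` is an illustrative name). It divides by N to match the formula on this card:

```python
def covariance(x, y):
    """COV = SP / N, dividing by N (not N - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sp / n

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
cov = covariance(x, y)
print(cov)  # 1.2
```

Many software packages divide by N − 1 for a sample covariance instead; the sign of COV, and therefore its interpretation, is the same either way.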
What does COV tell us if the number is positive, negative or zero
Positive - scores consistently deviate in the same direction; the scatterplot slopes upward
Negative - scores consistently deviate in opposite directions; the scatterplot slopes downward
Zero - some pairs deviate in the same direction while others deviate in opposite directions, so the products cancel out
How is Pearson’s r calculated
r = COVxy / (SDx × SDy)
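Putting the steps together, here is a minimal sketch of r = COVxy / (SDx × SDy) on hypothetical data. It assumes COV and both SDs divide by N, as in the COV formula; `pearson_r` is an illustrative name:

```python
import math

def pearson_r(x, y):
    """r = COVxy / (SDx × SDy), with COV and both SDs dividing by N."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    return cov / (sd_x * sd_y)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
```

Because N appears in both the numerator and the denominator, it cancels: dividing by N − 1 everywhere gives the same r, as long as COV and the SDs use the same divisor. Equivalently, r = SP / √(SSx × SSy).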
What are the 2 major categories for interpreting the Pearson correlation
Human behaviour
Test reliability
What are the general guidelines for human behaviour
No relationship - |r| is between 0 and .10
Weak relationship - |r| ranges from .10 to .30
Moderate relationship - |r| ranges from above .30 to .50
Strong relationship - |r| is above .50
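The human-behaviour guidelines can be expressed as a small lookup on |r|. This is a sketch: the function name `describe_strength` is hypothetical, and treating a value of exactly .10 as weak is an assumption, since the guidelines' ranges meet at that point:

```python
def describe_strength(r):
    """Label a Pearson r using the human-behaviour guidelines; only |r| matters."""
    size = abs(r)  # the sign gives direction, not strength
    if size < 0.10:  # assumption: exactly .10 counts as weak
        return "no relationship"
    if size <= 0.30:
        return "weak"
    if size <= 0.50:
        return "moderate"
    return "strong"

print(describe_strength(-0.8))  # strong
print(describe_strength(0.3))   # weak
```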
What are the general guidelines for test reliability
Very desirable - value is > .85
Moderately desirable - value is .70 to .85 (moderately acceptable)
Not desirable - value is < .70 (poor reliability)
Note: Test reliability must be positive
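The test-reliability guidelines can be sketched the same way. The name `describe_reliability` is hypothetical, and raising an error for a negative coefficient is a design choice reflecting the note that reliability must be positive:

```python
def describe_reliability(r):
    """Label a reliability coefficient using the test-reliability guidelines."""
    if r < 0:
        # Test reliability must be positive, so flag negative values
        raise ValueError("a reliability coefficient should not be negative")
    if r > 0.85:
        return "very desirable"
    if r >= 0.70:
        return "moderately desirable"
    return "not desirable"

print(describe_reliability(0.90))  # very desirable
print(describe_reliability(0.60))  # not desirable
```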
What are the 6 major factors affecting r
Sampling error
Unmeasured 3rd variable
Heterogeneous sample
Sampling from a truncated range
Non-linearity
Heteroscedasticity in data
Sampling error
Naturally occurring discrepancy between a sample statistic and the corresponding population parameter (a sample statistic will almost never exactly equal the parameter); can make r larger or smaller
Unmeasured third variable
The correlation could be caused by an outside variable, which is why we cannot infer causation from correlation; can make r larger or smaller
Mediator
Type of unmeasured third variable that explains how a cause affects an outcome: the cause influences the mediator, which in turn influences the outcome (the cause does not directly produce the outcome; the mediator explains the link)
Moderator
Type of unmeasured third variable that affects the relationship between two variables by influencing its strength
Spurious
Type of unmeasured third variable; there is no causal link, only a correlation produced by one or more outside forces
Heterogeneous sample
Data in which the sample of observations can be subdivided into two distinct sets on the basis of some other variable; can make r larger or smaller
T or F: a heterogeneous sample can cause r to be larger, smaller or even zero
T; it can imply a relationship that does not exist, and it can also produce r = 0 when the two homogeneous subgroups are correlated in opposite directions
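Both effects can be reproduced with tiny made-up subgroups. In this sketch (data and names are hypothetical), r is computed as SP / √(SSx × SSy), which equals COV / (SDx × SDy):

```python
import math

def pearson_r(x, y):
    """Pearson r as SP / sqrt(SSx * SSy), equivalent to COV / (SDx * SDy)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Case 1: two subgroups with opposite correlations cancel out
up_x, up_y = [1, 2, 3], [1, 2, 3]      # r = +1 within this subgroup
down_x, down_y = [1, 2, 3], [3, 2, 1]  # r = -1 within this subgroup
r_cancel = pearson_r(up_x + down_x, up_y + down_y)

# Case 2: two uncorrelated subgroups that sit far apart on both variables
low_x, low_y = [1, 2, 1, 2], [1, 1, 2, 2]            # r = 0 within
high_x, high_y = [11, 12, 11, 12], [11, 11, 12, 12]  # r = 0 within
r_combined = pearson_r(low_x + high_x, low_y + high_y)

print(round(r_cancel, 3), round(r_combined, 3))  # 0.0 0.99
```

In case 2 neither subgroup shows any relationship, yet pooling them yields an r near 1 that reflects only the difference between the groups.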
Sampling from a truncated range
A severely restricted range may produce a different correlation than the full range would; can make r larger or smaller
T or F: We can generalize a correlation beyond the sample range of data
F; doing so can lead to a conclusion about the correlation that is entirely false
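A small hypothetical example of range restriction: over the full range of X the relationship is strong, but keeping only cases with X ≤ 4 gives a noticeably smaller r. The data and the cut-off are made up for illustration; with other data a truncated range can instead inflate r:

```python
import math

def pearson_r(x, y):
    """Pearson r as SP / sqrt(SSx * SSy), equivalent to COV / (SDx * SDy)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: y tracks x closely over the full range 1-10
x = list(range(1, 11))
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
r_full = pearson_r(x, y)

# Keep only the cases with x <= 4 (a truncated range)
kept = [(a, b) for a, b in zip(x, y) if a <= 4]
r_truncated = pearson_r([a for a, _ in kept], [b for _, b in kept])

print(round(r_full, 3), round(r_truncated, 3))  # 0.939 0.6
```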
Non-linearity
We may get a smaller r value because the data are non-linear, so r underestimates the relationship
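An extreme hypothetical case: y is completely determined by x through a quadratic curve, yet Pearson's r comes out as zero because the relationship is not a straight line:

```python
import math

def pearson_r(x, y):
    """Pearson r as SP / sqrt(SSx * SSy), equivalent to COV / (SDx * SDy)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# A perfect, predictable curvilinear (quadratic) relationship: y = x²
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]  # [4, 1, 0, 1, 4]
print(pearson_r(x, y))  # 0.0
```

Graphing these points would show the U-shape immediately, which is why plotting the data comes first.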
Heteroscedasticity in data
Variance in Y is not constant across the range of X values, typically caused by skew in one or both variables; can cause r to be underestimated
Homoscedasticity
Variability in Y scores remains constant across increasing values of X (this is desirable)
What are the 4 types of correlations we calculate in this class
Pearson’s r
Spearman rho (ρ, rs)
Point biserial (rpb)
Phi (ɸ)
What kind of variables need to be used with Pearson’s r
Both variables need to be interval/ratio
What kind of variables need to be used with spearman rho
Used any time at least one of our variables is ordinal and the other is interval/ratio/nominal, or when interval/ratio data show a weak curvilinear relationship
T or F: Spearman rho should be used if the data are heteroscedastic
T
How is spearman rho calculated
Convert all scores into ranks, then use the Pearson formula to find how consistently increases in one variable are associated with increases in the other
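A sketch of "rank, then apply Pearson" in Python. Giving tied scores the average of their ranks is the usual convention, assumed here; the function names and data are illustrative:

```python
import math

def pearson_r(x, y):
    """Pearson r as SP / sqrt(SSx * SSy), equivalent to COV / (SDx * SDy)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

def average_ranks(values):
    """Rank scores from 1 upward; tied scores share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted score ties with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # average of 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rho: convert both variables to ranks, then apply the Pearson formula."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # always increasing, but not in a straight line
print(round(pearson_r(x, y), 3), spearman_rho(x, y))  # 0.981 1.0
```

Because y rises every time x rises, the ranks line up perfectly and rho = 1, while Pearson's r is slightly below 1 because the points curve.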
T or F: if Pearson r is 1 then Spearman rho is also 1
T
Point biserial
Used when one variable is nominal and the other is interval/ratio; calculated using the Pearson formula (only if the nominal variable has no more than 2 categories)
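A minimal sketch of the point biserial correlation: code the two categories as 0 and 1, then apply the Pearson formula. The data and the 0/1 coding below are hypothetical:

```python
import math

def pearson_r(x, y):
    """Pearson r as SP / sqrt(SSx * SSy), equivalent to COV / (SDx * SDy)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: group membership coded 0/1, plus an interval/ratio score
group = [0, 0, 0, 1, 1, 1]
score = [2, 3, 4, 5, 6, 7]
r_pb = pearson_r(group, score)
print(round(r_pb, 3))  # 0.878
```

Which category is coded 1 is arbitrary; swapping the codes only flips the sign of r_pb, not its size.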
Phi
Used when both variables are nominal; both must be dichotomous
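For two dichotomous variables, phi can be computed from the four cell counts of a 2×2 table using the standard shortcut (ad − bc) / √((a + b)(c + d)(a + c)(b + d)). This gives the same value as the Pearson formula applied to 0/1-coded data. The function name and counts below are hypothetical:

```python
import math

def phi_from_table(a, b, c, d):
    """Phi for a 2x2 table of counts:
                 Y = 1   Y = 0
        X = 1      a       b
        X = 0      c       d
    """
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

print(round(phi_from_table(10, 5, 5, 10), 3))  # 0.333
```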
Dichotomous
Categorical variables with only two possible, mutually exclusive outcomes (e.g. yes/no)