3.1: Correlation and Variate Relationships
Explanatory variable: a variable that attempts to explain or influence observed outcomes
What is being used to make the prediction
Displayed on the x-axis
Response variable: a variable that measures some outcome
What is being predicted
Displayed on the y-axis
Form: linear, curve, u-shape, etc.
Unusual Points: outliers, influential points
Outlier: a point with a large residual (usually decreases the correlation)
Influential: a point which draws the line toward it (often increases the correlation when it falls in line with the overall pattern, but can decrease it otherwise)
Direction: positive or negative association (or neither)
Positive association—as one variable increases, so does the other
Negative association—as one variable increases, the other decreases
Strength: how closely the points follow the form
Strong, weak, moderately strong/weak
Individual points with large residuals are outliers in the y direction because they lie far from the line that describes the overall pattern
Individual points that are extreme in the x direction may not have large residuals, but can be very important; such points are influential if removing them would markedly change the results of the calculation
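The difference between a y-direction outlier and an influential point can be checked numerically. The sketch below is only an illustration: the data values and the helper names fit_line and residual are made up for this example, and the line is an ordinary least-squares fit.

```python
def fit_line(points):
    """Least-squares slope and intercept for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual(point, slope, intercept):
    """Observed y minus the y predicted by the line."""
    x, y = point
    return y - (slope * x + intercept)


# Hypothetical data lying close to the line y = 2x.
base = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.0)]
y_outlier = (3, 12.0)   # far from the pattern in the y direction
x_extreme = (15, 6.0)   # far from the other points in the x direction

slope_base, intercept_base = fit_line(base)
slope_out, intercept_out = fit_line(base + [y_outlier])
slope_inf, intercept_inf = fit_line(base + [x_extreme])

# Residual of each added point, measured from the line fitted WITH that point.
resid_out = residual(y_outlier, slope_out, intercept_out)
resid_inf = residual(x_extreme, slope_inf, intercept_inf)

# Largest residual among the original points once the x-extreme point is included.
largest_other = max(abs(residual(p, slope_inf, intercept_inf)) for p in base)

print("slope without added points:", round(slope_base, 3))
print("slope with y-outlier:", round(slope_out, 3), "residual:", round(resid_out, 3))
print("slope with x-extreme point:", round(slope_inf, 3), "residual:", round(resid_inf, 3))
```

With these made-up numbers, the y-outlier has a large residual but leaves the slope at about 1.97, while the x-extreme point drags the slope down to about 0.15 and still has a smaller residual than some of the original points.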
Correlation (r): gives the direction and strength of a linear relationship
Does not imply causation
Makes no distinction between explanatory and response variables
Switching x and y leaves r unchanged
Both variables must be quantitative
r is standardized, so it does not change when the units of measurement of x, y, or both are converted
r itself has no units
Positive r = positive association
Negative r = negative association
Correlation only measures strength and direction of linear relationships
-1 ≤ r ≤ 1 always
The closer r is to 1 or -1, the stronger the linear form
The closer r is to 0, the weaker the linear form and the more scattered the points are
r does not tell the whole story
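Several of the properties of r listed above can be verified with a short calculation. This is a minimal sketch, assuming the usual sample correlation formula; the function name pearson_r and the height and weight values are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Hypothetical data: heights in inches, weights in pounds.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [115, 120, 135, 150, 160, 175]

r = pearson_r(heights_in, weights_lb)

# Swapping the roles of x and y gives the same r.
r_swapped = pearson_r(weights_lb, heights_in)

# Converting units (inches to cm, pounds to kg) gives the same r.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.4536 for w in weights_lb]
r_metric = pearson_r(heights_cm, weights_kg)

# A perfect U-shape is a strong relationship, but it is not linear.
u_xs = [-2, -1, 0, 1, 2]
u_ys = [x ** 2 for x in u_xs]
r_u = pearson_r(u_xs, u_ys)

print("r:", round(r, 4))
print("r with x and y swapped:", round(r_swapped, 4))
print("r after unit conversion:", round(r_metric, 4))
print("r for a perfect U-shape:", round(r_u, 4))
```

The U-shaped data give r = 0 even though y is completely determined by x, which is one reason r does not tell the whole story and the scatterplot should always be checked.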
Two-way table: a table that displays data for two categorical variables about the same group of individuals
Marginal distribution: the distribution of one categorical variable on its own, taken from the row or column totals of the table
The yellow box shows the marginal distribution for gender, and the purple box is the marginal distribution of opinions
Conditional distribution: the distribution within just one value of one variable
Often uses language of the probability of A “given” B
Often displayed with segmented bar charts
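Marginal and conditional distributions can both be computed directly from the counts in a two-way table. The sketch below uses a made-up table of gender by opinion; the counts, the category labels, and the helper name conditional_given_row are assumptions for illustration only.

```python
# Hypothetical two-way table: rows are gender, columns are opinion.
table = {
    "female": {"agree": 30, "neutral": 10, "disagree": 20},
    "male": {"agree": 20, "neutral": 10, "disagree": 10},
}
opinions = ["agree", "neutral", "disagree"]

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of gender: row totals divided by the grand total.
marginal_gender = {
    gender: sum(row.values()) / grand_total for gender, row in table.items()
}

# Marginal distribution of opinion: column totals divided by the grand total.
marginal_opinion = {
    opinion: sum(table[gender][opinion] for gender in table) / grand_total
    for opinion in opinions
}


def conditional_given_row(table, row_label):
    """Distribution of the column variable within a single row of the table."""
    row = table[row_label]
    row_total = sum(row.values())
    return {label: count / row_total for label, count in row.items()}


# Conditional distributions of opinion "given" each gender.
opinion_given_female = conditional_given_row(table, "female")
opinion_given_male = conditional_given_row(table, "male")

print("marginal (gender):", marginal_gender)
print("marginal (opinion):", marginal_opinion)
print("opinion given female:", opinion_given_female)
print("opinion given male:", opinion_given_male)
```

Each conditional distribution divides by its own row total rather than the grand total, so its proportions add to 1 within that group.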
Segmented bar graph: a chart that displays categorical data as a percentage of the whole
Similar to a pie chart
Mosaic plot: a segmented bar graph used to compare groups where the widths of the bars are proportional to the size of the groups
Mosaic plots of the same data from the previous section:
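The geometry behind a mosaic plot can be sketched numerically: each bar's width is the group's share of the whole, and each segment's height is the conditional proportion within that group. The counts and the function name mosaic_layout below are hypothetical, and the sketch only computes rectangle sizes inside a unit square rather than drawing anything.

```python
# Hypothetical counts: groups are gender, categories are opinion.
counts = {
    "female": {"agree": 30, "neutral": 10, "disagree": 20},
    "male": {"agree": 20, "neutral": 10, "disagree": 10},
}


def mosaic_layout(counts):
    """Return (group, category, x_left, width, y_bottom, height) rectangles.

    Widths are proportional to group size; heights are the conditional
    proportions of each category within the group.
    """
    grand_total = sum(sum(row.values()) for row in counts.values())
    rects = []
    x_left = 0.0
    for group, row in counts.items():
        group_total = sum(row.values())
        width = group_total / grand_total
        y_bottom = 0.0
        for category, n in row.items():
            height = n / group_total
            rects.append((group, category, x_left, width, y_bottom, height))
            y_bottom += height
        x_left += width
    return rects


rects = mosaic_layout(counts)

# Bar width for each group, and the area of each rectangle.
widths = {group: width for group, _, _, width, _, _ in rects}
areas = {(group, cat): width * height for group, cat, _, width, _, height in rects}
total_area = sum(areas.values())

for rect in rects:
    print(rect)
```

Because width times height equals the cell count divided by the grand total, the area of each rectangle is proportional to the number of individuals in that cell, which an ordinary segmented bar graph with equal-width bars does not show.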