Are apparent patterns and associations in data always meaningful?
No. Apparent patterns and associations in data may arise by random chance and need not be meaningful.
What types of graphs can be used to represent one categorical variable broken down by categories of another categorical variable?
Side-by-side bar graphs, segmented bar graphs, and mosaic plots.
How can graphical representations of two categorical variables be used?
They can be used to compare distributions and determine if variables are associated.
What is a two-way table, and what does it summarize?
A two-way table, or contingency table, summarizes two categorical variables, with cell entries showing frequency counts or relative frequencies.
How is joint relative frequency calculated in a two-way table?
It is calculated by dividing a cell frequency by the total for the entire table.
What are marginal relative frequencies in a two-way table?
They are the row and column totals divided by the total for the entire table.
What is a conditional relative frequency?
It is a relative frequency for a specific part of the table, such as cell frequencies in a row divided by the total for that row.
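As a minimal sketch of the three kinds of relative frequency, the example below uses a small two-way table. The counts and the category names ("junior", "senior", "online", "in_person") are made up for illustration.

```python
# Hypothetical two-way table: rows are class years, columns are study modes.
table = {
    "junior": {"online": 30, "in_person": 20},
    "senior": {"online": 10, "in_person": 40},
}
columns = ["online", "in_person"]

grand_total = sum(sum(row.values()) for row in table.values())

# Joint relative frequency: one cell divided by the total for the entire table.
joint = {r: {c: n / grand_total for c, n in row.items()} for r, row in table.items()}

# Marginal relative frequency: a row or column total divided by the table total.
row_marginal = {r: sum(row.values()) / grand_total for r, row in table.items()}
col_marginal = {c: sum(table[r][c] for r in table) / grand_total for c in columns}

# Conditional relative frequency: a cell divided by the total for its own row.
conditional_on_row = {
    r: {c: n / sum(row.values()) for c, n in row.items()} for r, row in table.items()
}

print(joint["junior"]["online"])               # 30 / 100
print(row_marginal["junior"])                  # 50 / 100
print(conditional_on_row["junior"]["online"])  # 30 / 50
```

In this made-up table, the conditional frequency of "online" differs between the two rows (30/50 versus 10/50), which is the kind of difference that suggests the two variables are associated.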
What can summary statistics for two categorical variables be used for?
They can be used to compare distributions and determine if variables are associated.
What is a bivariate quantitative data set?
It consists of observations of two different quantitative variables made on individuals in a sample or population.
What does a scatterplot show?
It shows two numeric values for each observation, with one on the x-axis and one on the y-axis.
What is an explanatory variable?
It is a variable used to explain or predict the values of a response variable.
How do you describe a scatterplot?
By considering its form, direction, strength, and any unusual features.
What are the possible directions of association in a scatterplot?
The association can be positive (as one variable increases, the other increases) or negative (as one variable increases, the other decreases).
How can the form of association in a scatterplot be described?
It can be linear or non-linear.
How is the strength of association in a scatterplot described?
By how closely the points follow a pattern, such as a line; the association can be strong, moderate, or weak.
What are some unusual features in a scatterplot?
Clusters of points, or outliers (points with large discrepancies between actual and predicted values).
What does the correlation coefficient (r) measure?
It measures the direction and strength of the linear association between two quantitative variables.
How is the correlation coefficient (r) usually determined?
The most common way is by using technology.
Does a correlation close to ±1 always indicate a linear relationship?
No, a correlation close to ±1 does not always mean that a linear model is appropriate.
What are key properties of the correlation coefficient (r)?
It is unit-free, always between -1 and 1, and an r of 0 indicates no linear association.
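The properties above can be checked numerically. The sketch below computes r by hand instead of with a statistics package; the function name `correlation` and the data are assumptions made for illustration.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and test scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = correlation(hours, scores)

# Unit-free: measuring study time in minutes instead of hours leaves r unchanged.
r_minutes = correlation([h * 60 for h in hours], scores)

print(r, r_minutes)
```

Swapping the roles of the two variables also leaves r unchanged, because the formula treats x and y symmetrically.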
Does correlation imply causation?
No, a relationship between two variables does not mean that changes in one cause changes in the other.
What is a simple linear regression model?
It’s an equation that uses an explanatory variable (x) to predict a response variable (y).
How is the predicted response value (y-hat) calculated in a linear regression model?
ŷ = a + bx, where ŷ is the predicted response value, a is the y-intercept, b is the slope, and x is the explanatory variable value.
What is extrapolation, and why is it risky?
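Extrapolation is predicting a response value from an explanatory value outside the range of the observed data. It is risky because the pattern seen in the data may not continue there, so the prediction is less reliable.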
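One simple guard, sketched below, is to check whether an x-value falls outside the observed range before trusting a prediction. The helper name `is_extrapolation` and the data are hypothetical.

```python
# Hypothetical observed values of the explanatory variable.
observed_x = [1, 2, 3, 4, 5]

def is_extrapolation(x, xs):
    """Return True when x lies outside the range of the observed x-values."""
    return x < min(xs) or x > max(xs)

print(is_extrapolation(3, observed_x))  # inside the observed range
print(is_extrapolation(8, observed_x))  # beyond the largest observed x
```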
What is a residual in regression?
It’s the difference between the actual value and the predicted value: residual = y - y-hat.
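A short worked example of a prediction and its residual follows. The intercept, slope, and observed point are made-up values chosen for illustration.

```python
# Hypothetical fitted line: y-hat = a + b * x
a = 45.0  # y-intercept (assumed)
b = 6.0   # slope (assumed)

def predict(x):
    """Predicted response for explanatory value x."""
    return a + b * x

# One hypothetical observation.
observed_x, observed_y = 4, 70

y_hat = predict(observed_x)    # 45 + 6 * 4 = 69
residual = observed_y - y_hat  # actual minus predicted = 70 - 69 = 1

print(y_hat, residual)
```

A positive residual means the actual value is above the line (the model underpredicted); a negative residual means the model overpredicted.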
What does apparent randomness in a residual plot indicate?
It suggests that the linear model is appropriate for the data.
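For contrast, the sketch below fits a line to curved data (y = x², values made up for illustration). The correlation is high, yet the residuals follow a clear pattern instead of scattering randomly, which signals that a linear model is not appropriate.

```python
import math

# Hypothetical curved data: y is the square of x.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

r = sxy / math.sqrt(sxx * syy)  # close to 1 even though the data are curved

# Least-squares line; sxy / sxx is algebraically the same as r * s_y / s_x.
b = sxy / sxx
a = mean_y - b * mean_x

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
signs = "".join("+" if e > 0 else "-" for e in residuals)

print(round(r, 3))
print(signs)  # positive at both ends, negative in the middle
```

The residuals are positive at the low and high ends of x and negative in between, a U-shaped pattern that a residual plot would make visible at a glance.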
What does the least-squares regression model do?
It minimizes the sum of the squares of the residuals, and its line passes through the point (x̄, ȳ).
How is the slope (b) of a regression line calculated?
b = r(s_y / s_x), where r is the correlation, s_y is the standard deviation of the response variable, and s_x is the standard deviation of the explanatory variable.
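The slope formula can be worked through on a small data set. In the sketch below the data are made up, and the intercept is obtained from the fact that the least-squares line contains the point (x̄, ȳ), which gives a = ȳ - b·x̄.

```python
import statistics

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = statistics.mean(xs)
mean_y = statistics.mean(ys)
s_x = statistics.stdev(xs)  # sample standard deviation of the explanatory variable
s_y = statistics.stdev(ys)  # sample standard deviation of the response variable

# Correlation from the sample standard deviations.
r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)

b = r * s_y / s_x        # slope
a = mean_y - b * mean_x  # intercept, because the line contains (x-bar, y-bar)

print(round(b, 4), round(a, 4))
```

Plugging x̄ into the fitted line returns ȳ, which is a quick way to check the arithmetic.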
Does the y-intercept always have a logical interpretation in context?
No. The y-intercept might not have a logical meaning in context, for example when x = 0 is impossible or lies far outside the range of the data.
What are the coefficients in the least-squares regression model?
They are the estimated slope and y-intercept.
What does the slope represent in a regression line?
It represents the amount by which the predicted y-value changes for each one-unit increase in x.
What does the y-intercept represent in a regression line?
It represents the predicted value of the response variable when the explanatory variable equals 0.
What is the coefficient of determination (r²) in linear regression?
It’s the square of the correlation (r) and gives the proportion of variation in the response variable that is explained by the explanatory variable in the linear model.
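The "proportion of variation explained" reading can be verified numerically: for a least-squares line, r² equals 1 minus the ratio of the residual sum of squares to the total sum of squares of y. The data below are made up for illustration.

```python
# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total variation in y

r_squared = sxy ** 2 / (sxx * syy)  # square of the correlation

# Least-squares line and the variation it leaves unexplained.
b = sxy / sxx
a = mean_y - b * mean_x
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # sum of squared residuals

explained = 1 - sse / syy  # proportion of variation in y explained by the line

print(round(r_squared, 4), round(explained, 4))
```

Here both quantities come out to 0.6, so about 60% of the variation in the made-up y-values is accounted for by the linear relationship with x.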
Why might transformations of variables be used in regression?
Transformations can make data more linear, improving the model’s fit.
What does increased randomness in residual plots after data transformation suggest?
It suggests that the transformed data is a better fit for a linear model.
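As a rough sketch of why a transformation can help, the example below uses made-up data that grow exponentially. Taking the logarithm of the response turns the curve into a straight line, and the correlation is used here only as a quick numeric stand-in for inspecting the residual plots. The helper name `correlation` is an assumption.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y triples with each one-unit increase in x.
xs = [0, 1, 2, 3, 4, 5]
ys = [2, 6, 18, 54, 162, 486]

r_raw = correlation(xs, ys)  # curved relationship, so r is well below 1

# Transform the response: log(y) is a linear function of x for exponential growth.
log_ys = [math.log(y) for y in ys]
r_log = correlation(xs, log_ys)  # essentially 1 after the transformation

print(round(r_raw, 3), round(r_log, 3))
```

Real data would not fall exactly on a line after transforming, so in practice the residual plot of the transformed data is still the thing to check.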