Univariate Data
Data with a single variable.
Bivariate Data
Data with two variables.
Steps for Data Analysis
Plot the data, pointing out patterns and outliers.
Use a numerical summary, including r and r^2 for bivariate data.
Describe the overall pattern with a simplified model, such as a regression line.
Models for Data
Univariate: Normal distribution; bivariate: linear, power, exponential model.
Explanatory Variable
A variable that predicts or explains changes in a response variable.
Response Variable
A variable that measures the outcome of a study.
Scatterplots
A graph that displays the relationship between two quantitative variables measured on the same individuals.
Describing Scatterplots
Direction: positive, negative, no association.
Form: linear, non-linear.
Strength: weak, moderate, strong.
Context: mention variables, question asked.
Unusual features: outliers, deviation, clusters.
Correlation r
Measures the direction and strength of the linear relationship between two quantitative variables; always between -1 and 1. Does not imply causation, does not describe form, and is not resistant to outliers.
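A minimal sketch of one common way to compute r: the sum of the products of z-scores, divided by n - 1. The function name and the data are made up for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Return r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_swapped = correlation(ys, xs)  # swapping x and y does not change r
print(round(r, 4))
```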
Regression Line
A line that summarizes the relationship between two variables when one variable is explanatory; y hat = a + bx.
Residuals
The difference between an observed value of the response variable and the value predicted by the regression line; residual = y - y hat.
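As a rough illustration, residuals can be computed point by point once a line y hat = a + bx is given. The data and the values of a and b here are assumed for the example.

```python
def residuals(xs, ys, a, b):
    """Return y - y_hat for each point, where y_hat = a + b * x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data and line, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # assumed y-intercept and slope

res = residuals(xs, ys, a, b)
# A positive residual means the point lies above the line;
# a negative residual means it lies below the line.
print([round(e, 2) for e in res])
```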
a
The y-intercept: the predicted value y hat when x is equal to 0.
b
The slope: the predicted change in y hat when x increases by one unit.
Least Squares Regression Line
The line that makes the sum of squared residuals as small as possible.
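A short sketch of the idea, using the standard closed-form formulas for the slope and intercept. The helper names and data are hypothetical; the last lines check that nudging the slope away from the least-squares value makes the sum of squared residuals larger.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """Return the sum of (y - y_hat)^2 over all points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
nudged = sum_squared_residuals(xs, ys, a, b + 0.1)  # any other line does worse
print(a, b, best < nudged)
```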
Residual Plot
A plot that displays residuals on the y-axis and the explanatory variable on the x-axis.
Linear Regression on a Residual Plot
If a linear model is appropriate, the residual plot shows no obvious pattern (in particular, no curve), and the residuals are small and scattered randomly around 0.
Standard Deviation of Residuals
The typical size of a residual; measures the typical distance between y and y hat.
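One way to sketch this calculation, assuming the usual formula s = sqrt(sum of squared residuals / (n - 2)). The data and the line's coefficients are made up for the example.

```python
from math import sqrt

def residual_std_dev(xs, ys, a, b):
    """Return s = sqrt(SSR / (n - 2)) for the line y_hat = a + b * x."""
    n = len(xs)
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(ssr / (n - 2))

# Hypothetical data and line, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # assumed y-intercept and slope

s = residual_std_dev(xs, ys, a, b)
print(round(s, 4))
```

Here s is in the same units as y, so it reads as "predictions are typically off by about s".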
R^2
The coefficient of determination: the percent of the variability in y that is accounted for by the least-squares regression line relating y to x.
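A minimal sketch, assuming the definition r^2 = 1 - (sum of squared residuals) / (total sum of squares about the mean of y). This matches the square of r only when a and b come from the least-squares line, which is the case for the made-up data below.

```python
from statistics import mean

def r_squared(xs, ys, a, b):
    """Return 1 - SSR/SST for the line y_hat = a + b * x."""
    y_bar = mean(ys)
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # left over
    sst = sum((y - y_bar) ** 2 for y in ys)                    # total
    return 1 - ssr / sst

# Hypothetical data with its least-squares line, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

r2 = r_squared(xs, ys, a, b)
print(round(r2, 2))
```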
Computer Regression Output
slope: b (in response variable)
y-intercept: a
s: standard deviation of the residuals
r and r^2
Using Summary Statistics to Create a Regression Line
Slope: b = r(sy/sx)
Y-intercept: a = (mean y) - b(mean x)
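The two formulas can be sketched as a small function. The summary statistics below are invented for the example, and the function name is hypothetical.

```python
def line_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (a, b) using b = r * (s_y / s_x) and a = y_bar - b * x_bar."""
    b = r * (s_y / s_x)
    a = y_bar - b * x_bar
    return a, b

# Hypothetical summary statistics, made up for illustration only.
a, b = line_from_summary(x_bar=10, s_x=2, y_bar=50, s_y=5, r=0.8)

# The least-squares line always passes through (mean x, mean y).
y_hat_at_mean = a + b * 10
print(a, b)
```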
High leverage
A point with a much larger or smaller x-value than the other points.
Outliers
A point that does not follow the overall pattern of the data and has a large residual (it lies far from the line in the y-direction).
Influential Point
A point that, if removed, substantially changes the slope or y-intercept of the regression line.
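To see the effect, one rough check is to fit the least-squares line with and without a suspect point and compare the slopes. The data are invented; the last point has a much larger x-value than the rest (high leverage) and does not follow the pattern.

```python
from statistics import mean

def least_squares_slope(xs, ys):
    """Return the slope b of the least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 5, 4, 5, 2]

b_with = least_squares_slope(xs, ys)               # all six points
b_without = least_squares_slope(xs[:-1], ys[:-1])  # suspect point removed

# A large change in slope suggests the removed point is influential.
print(round(b_with, 3), round(b_without, 3))
```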