Unit 2: Correlation and Regression

Description and Tags

Flashcards covering key vocabulary and concepts from the lecture notes on Correlation and Regression.

101 Terms

1
New cards

Unit 2 Focus

Correlation and Regression

2
New cards

Unit 2 Summary Topics

Association, explanatory variable, response variable, scatterplots, correlation, least squares criterion and least squares regression line, prediction, slope, intercept, r^2, residuals, outliers, influential observations, association vs. causation, lurking variables, and extrapolation.

3
New cards

Unit 1 Statistics

Comparing two or more populations with respect to the same variable.

4
New cards

Unit 2 New question:

How do we examine relationships between two or more variables measured on the same population?

5
New cards

Examine Relationships Focus

The nature of the relationship between the variables.

6
New cards

Examine Relationships Focus

One of the variables might be thought to explain/predict the other one.

7
New cards

Explanatory Variable

Variable thought to explain/predict the other one, denoted by X, values represented by x.

8
New cards

Response Variable

The variable that is being explained/predicted, denoted by Y, values represented by y.

9
New cards

Explanatory Variable Role

To explain or predict the response variable.

10
New cards

Explanatory Variable Example

Number of cups of coffee per day

11
New cards

Response Variable Example

Number of hours of sleep

12
New cards

Explanatory/Response Variable Example

Percentage grades in English and Math courses: neither variable is clearly explanatory, so we are only interested in the nature of the relationship.

13
New cards

Scatterplots Purpose

To visualize/display the relationship between two quantitative variables.

14
New cards

Scatterplots Definition

Displays the values of two different quantitative variables measured on the same individuals, on a Cartesian plane.

15
New cards

Scatterplots Variable Positioning

Explanatory variable on the x-axis, response variable on the y-axis. If there isn't an explanatory/response variable, the choice of axes is arbitrary.

16
New cards

Scatterplots Exam Score Example Question

What do you notice? (re: relationship between classes missed and exam score)

17
New cards

Scatterplots Examination

Form, direction, strength, outliers.

18
New cards

Linear Relationship

A straight line would do a fairly good job at approximating the relationship between the two variables.

19
New cards

Non-Linear Relationships

Quadratic, logarithmic, exponential, etc.

20
New cards

Negative Association

The pattern of points slopes downwards from left to right.

21
New cards

Positive Association

The pattern of points slopes upwards from left to right.

22
New cards

Strength of Relationship

Determined by how close the points lie to a simple form, such as a straight line.

23
New cards

Strong Relationship

Points fall quite close to the line.

24
New cards

Weak Relationship

Points appear to be randomly scattered and many fall far from the approximating line.

25
New cards

Outliers for Bivariate Data

Observation may be outlying in the x-direction, the y-direction, or both. An outlier could simply fall outside the general pattern of points.

26
New cards

Linear Relationship Strength Assessment

Better to use a numerical measure, called correlation.

27
New cards

Correlation Coefficient

Denoted by r, measures both the direction and strength of a linear relationship between two quantitative variables.

28
New cards

Correlation Coefficient Formula

r = (1 / ((n - 1) sx sy)) * Σ (xi - x̄)(yi - ȳ)

29
New cards

Correlation Calculation Steps

Calculate x̄, ȳ, sx and sy; calculate deviations xi - x̄ and yi - ȳ; multiply corresponding deviations; add the n products; divide by (n - 1)sx sy.
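
The calculation steps above can be sketched in Python. This is a minimal illustration, not a reference implementation; the function name `correlation` and the coffee/sleep numbers are made up for the example.

```python
import math

def correlation(xs, ys):
    """Sample correlation r, following the card's steps in order."""
    n = len(xs)
    # Step 1: means and sample standard deviations (divisor n - 1).
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Steps 2-3: deviations, multiplied pairwise.
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    # Steps 4-5: add the n products and divide by (n - 1) * s_x * s_y.
    return sum(products) / ((n - 1) * s_x * s_y)

# Hypothetical data: cups of coffee per day (x) and hours of sleep (y).
cups = [1, 2, 3, 4, 5]
sleep = [8, 7, 7, 6, 5]
r = correlation(cups, sleep)
print(round(r, 3))  # negative: more coffee goes with less sleep in this toy data
```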

30
New cards

Correlation Properties: Positive values

Indicate a positive association.

31
New cards

Correlation Properties: Negative values

Indicate a negative association.

32
New cards

Correlation Properties

r is always a number between -1 and 1 (inclusive).

33
New cards

Correlation Properties: r near 1

Indicates a strong positive linear association.

34
New cards

Correlation Properties: Positive r near 0

Indicates a weak positive linear association.

35
New cards

Correlation Properties: r near -1

Indicates a strong negative linear association.

36
New cards

Correlation Properties: Negative r near 0

Indicates a weak negative linear association.

37
New cards

Correlation Properties: r = 1

Perfect positive linear relationship.

38
New cards

Correlation Properties: r = -1

Perfect negative linear relationship.

39
New cards

Correlation Properties: r = 0

No linear association.

40
New cards

Correlation Properties: Units

r has no units; it's just a number.

41
New cards

Correlation Properties: Distinction

Makes no distinction between X and Y.

42
New cards

Correlation Properties: Units Change

Changing the units of X and Y does not affect the correlation.
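
A quick numerical check of these two properties (no distinction between X and Y, and no effect from changing units) might look like the sketch below. The height and weight values are invented, and `correlation` is a helper written for this example; it uses an algebraically equivalent form of the formula in which the (n - 1) factors cancel.

```python
import math

def correlation(xs, ys):
    """Sample correlation r (equivalent form: the n - 1 factors cancel)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements in metric units.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 74, 88]

# The same measurements converted to inches and pounds.
heights_in = [h / 2.54 for h in heights_cm]
weights_lb = [w * 2.20462 for w in weights_kg]

r_metric = correlation(heights_cm, weights_kg)
r_imperial = correlation(heights_in, weights_lb)
r_swapped = correlation(weights_kg, heights_cm)

print(r_metric, r_imperial, r_swapped)  # all three agree up to rounding error
```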

43
New cards

Correlation Limitations

Measures only the strength of a linear relationship. It can be misleading when the relationship has another form (for example, quadratic), even if that relationship is strong.
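
As an illustration of this limitation, the sketch below uses made-up data in which y is determined exactly by x through y = x^2, yet the correlation is 0. The helper name `correlation` is an assumption for the example.

```python
import math

def correlation(xs, ys):
    """Sample correlation r (the n - 1 factors cancel in this form)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# A perfect quadratic relationship, symmetric about x = 0.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(r)  # 0.0: no linear association, despite an exact relationship
```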

44
New cards

Correlation Caveats

Correlation does not imply causation.

45
New cards

Lurking Variable

A variable that helps explain the relationship between variables in a study but is not included in the study itself.
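
A small simulation can make the idea concrete. In the sketch below, all variable names and numbers are assumptions chosen for illustration: neither x nor y influences the other, but both depend on a third variable z, which plays the role of the lurking variable.

```python
import math
import random

def correlation(xs, ys):
    """Sample correlation r (the n - 1 factors cancel in this form)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# z is the lurking variable; x and y each equal z plus independent noise.
z = [rng.gauss(0, 1) for _ in range(500)]
x = [zi + rng.gauss(0, 0.3) for zi in z]
y = [zi + rng.gauss(0, 0.3) for zi in z]

r_xy = correlation(x, y)

# Once z's contribution is subtracted, only the independent noise remains.
x_noise = [xi - zi for xi, zi in zip(x, z)]
y_noise = [yi - zi for yi, zi in zip(y, z)]
r_noise = correlation(x_noise, y_noise)

print(r_xy, r_noise)  # strong correlation vs. roughly none
```

A study that recorded only x and y would see a strong association even though neither variable causes the other.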

46
New cards

Correlation Affected by Outliers

Yes. The correlation depends on the sample means and standard deviations, which are themselves sensitive to outliers.

47
New cards

Regression Line

A straight line that describes how a response variable Y changes as an explanatory variable X changes.

48
New cards

Regression Line Use

Used to predict values of Y for given values of X.

49
New cards

Least Squares Regression Line

Line that minimizes the sum of squared deviations in the vertical direction: Σ(yi - ŷi)^2.

50
New cards

Least Squares Regression Line Formula

ŷ = b0 + b1 x, where b1 = r (sy / sx) is the slope and b0 = ȳ - b1 x̄ is the intercept.
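
The slope and intercept formulas can be applied directly, as in the sketch below. The function name `least_squares_line` and the classes-missed/exam-score numbers are hypothetical.

```python
import math

def least_squares_line(xs, ys):
    """Return (b0, b1) using b1 = r * (s_y / s_x) and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b1 = r * (s_y / s_x)      # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1

# Hypothetical data: classes missed (x) and exam score (y).
classes_missed = [0, 1, 2, 3, 4]
exam_score = [90, 85, 80, 70, 65]

b0, b1 = least_squares_line(classes_missed, exam_score)
print(b0, b1)  # about 91 and -6.5

# Predicted exam score for a student who missed 2 classes.
y_hat = b0 + b1 * 2
print(y_hat)   # about 78
```

For this toy data the slope says each additional class missed lowers the predicted score by about 6.5 points, and the intercept is the predicted score with no classes missed.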

51
New cards

Slope of Least Squares Regression Line

Predicted change in y when x increases by one unit (an increase if the slope is positive, a decrease if it is negative).

52
New cards

Intercept of Least Squares Regression Line

Predicted value of y when x=0.

53
New cards

Slope Formula

r * (sy / sx)

54
New cards

Intercept Formula

ȳ - b1 * x̄

55
New cards

r^2 Definition

Fraction of variation in Y that is accounted for by its regression on X.

56
New cards

r^2 = 1

All data points fall exactly on the regression line, so the line predicts Y exactly from X.

57
New cards

r^2 = 0

Regression on X tells us absolutely nothing about the value of Y.
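
The interpretation of r^2 as a fraction of variation can be checked numerically. The sketch below uses invented data and computes r^2 two ways: by squaring r, and as 1 minus the ratio of the sum of squared residuals to the total variation in y. The variable names are assumptions for the example.

```python
import math

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y

# Correlation and least squares line.
r = sxy / math.sqrt(sxx * syy)
b1 = sxy / sxx            # equal to r * (s_y / s_x)
b0 = y_bar - b1 * x_bar

# Variation left over after regression: sum of squared residuals.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

r_squared_direct = r ** 2
r_squared_from_fit = 1 - sse / syy

print(r_squared_direct, r_squared_from_fit)  # both about 0.6
```

Here about 60% of the variation in y is accounted for by the regression on x.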

58
New cards

Residual Definition

The value yi-ŷi (for i = 1, 2, 3, …, n): actual value of y - predicted value of y.

59
New cards

Residual

Reflects the error of our prediction.

60
New cards

Positive Residual

An observation falls above the least squares regression line.

61
New cards

Negative Residual

An observation falls below the least squares regression line.

62
New cards

Least Squares Regression Line Minimization

Minimizes the sum of squared residuals.
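
The sketch below computes residuals for a small invented data set and compares the sum of squared residuals for the least squares line against two slightly different lines. The values b0 = 2.2 and b1 = 0.6 are the least squares estimates for this particular toy data; the helper name `sse` is an assumption.

```python
# Hypothetical data and its least squares line (b0 = 2.2, b1 = 0.6).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

# Residual = actual y minus predicted y.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
for x, y, e in zip(xs, ys, residuals):
    position = "above" if e > 0 else "below"
    print(f"x={x}, y={y}, residual={e:+.1f} ({position} the line)")

def sse(intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

best = sse(b0, b1)
steeper = sse(b0, b1 + 0.1)   # same intercept, slightly steeper slope
lower = sse(b0 - 0.2, b1)     # same slope, slightly lower intercept
print(best, steeper, lower)   # the least squares line gives the smallest value
```

The residuals of a least squares line also sum to zero (up to rounding), so positive and negative prediction errors balance out.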

63
New cards

Extrapolation

The process of predicting a value of Y for a value of X that's outside our range of data.

64
New cards

Extrapolation Prediction

Gives unreliable predictions and should be avoided.
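
One way to guard against accidental extrapolation is to flag any prediction whose x value lies outside the observed range. The sketch below is only an illustration; the function `predict`, its parameters, and the fitted values are hypothetical.

```python
def predict(x, b0, b1, x_min, x_max):
    """Return (y_hat, is_extrapolation) for the line y_hat = b0 + b1 * x.

    x_min and x_max are the smallest and largest x values in the data
    used to fit the line.
    """
    y_hat = b0 + b1 * x
    is_extrapolation = x < x_min or x > x_max
    return y_hat, is_extrapolation

# Hypothetical fitted line, with observed x values ranging from 1 to 5.
b0, b1 = 2.2, 0.6

inside = predict(3, b0, b1, x_min=1, x_max=5)
outside = predict(20, b0, b1, x_min=1, x_max=5)

print(inside)   # prediction within the data range
print(outside)  # flagged: x = 20 is far outside the observed range
```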

65
New cards

Scatterplots Y - direction outlier

Generally has little effect on the regression line.

66
New cards

Scatterplots Not outlier

Not an outlier in the x- or y-direction alone, but falls outside the general pattern of points (a bivariate outlier). Generally has little effect on the regression line.

67
New cards

Scatterplots X - direction outlier

Typically has a strong effect on the regression line.

68
New cards

Influential Observation

Removing it from the data set would dramatically alter the position of the least squares regression line (and thus the value of r^2 as well).

69
New cards

Influential Observation Outliers

Outliers in the x-direction are typically influential observations.
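
The effect of an x-direction outlier can be seen by refitting the line with and without it. The data below are invented for illustration, and `fit_line` is a helper written for this sketch.

```python
def fit_line(xs, ys):
    """Least squares (b0, b1) for the line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical base data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
base_b0, base_b1 = fit_line(xs, ys)

# Add one point that is outlying in the x-direction.
x_out_b0, x_out_b1 = fit_line(xs + [15], ys + [2])

# Add one point that is outlying in the y-direction, at the mean of x.
y_out_b0, y_out_b1 = fit_line(xs + [3], ys + [10])

print(base_b1)   # about 0.6
print(x_out_b1)  # negative: the single x-outlier flips the slope
print(y_out_b1)  # about 0.6: the slope is unchanged, only the intercept shifts
```

In this toy example the x-direction outlier is an influential observation, while the y-direction outlier placed at the mean of x shifts the line upward without changing its slope.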

70
New cards

Least Squares Regression Line Property

Always passes through the point (x̄, ȳ).
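
This property follows from the intercept formula b0 = ȳ - b1 x̄. Substituting x = x̄ into the line gives:

```latex
\begin{align*}
\hat{y}(\bar{x}) &= b_0 + b_1 \bar{x} \\
                 &= (\bar{y} - b_1 \bar{x}) + b_1 \bar{x} \\
                 &= \bar{y}
\end{align*}
```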

71
New cards

Observational Study

A study where individuals are simply observed. The observed relationship could be due to one or more lurking variables.

72
New cards

Experiment

The values of the explanatory variable are randomly "assigned" to the sample units, rather than simply being observed prior to the study.

73
New cards

Marijuana Example

Correlation between X and Y was calculated to be r = 0.85 among teens.

74
New cards

Drug Lurking Variable

The availability of drugs in different cities.

75
New cards

Realistic Observational Studies

Realistically, observational studies are often more feasible than experiments.

76
New cards

Scatterplots Categorical Variables

Sometimes a scatterplot is actually displaying two or more distinct relationships, one for each category of a categorical variable.

77
New cards

Categorical Variables Conclusion

Be careful when examining a relationship to ensure that the data belong to only one population.

78
New cards

Association

Relationship between two variables.

79
New cards

Explanatory Variable

Variable used to predict or explain changes in the response variable.

80
New cards

Response Variable

Variable that is explained or predicted by the explanatory variable.

81
New cards

Scatterplot

Graphical representation of the relationship between two quantitative variables.

82
New cards

Correlation

Statistical measure that describes the strength and direction of a linear relationship between two variables.

83
New cards

R^2

Coefficient of determination, indicating the proportion of variance in the response variable that is predictable from the explanatory variable(s).

84
New cards

Residual

Difference between the actual value and the predicted value.

85
New cards

Outlier

Data point that differs significantly from other data points in the set.

86
New cards

Influential Observation

Observation that, if removed, would substantially change the fitted regression line.

87
New cards

Lurking Variable

Variable that is not measured in the study but affects the relationship between the explanatory and response variables.

88
New cards

Correlation Coefficient (r)

A number between -1 and +1 expressing the strength and direction of the linear relationship between two quantitative variables.

89
New cards

Extrapolation

Predicting values outside the range of the data; can lead to unreliable predictions.

90
New cards

Least Squares Criterion

Method of finding the regression line that minimizes the sum of the squares of the vertical distances between the data points and the line.

91
New cards

Strength

How closely the data follow the overall pattern.

92
New cards

Predicted Value

The value ŷ that the regression line gives for a particular value of x.

93
New cards

Direction

Can be positive or negative, depending on the association.

94
New cards

Causation

One variable directly affects the other.

95
New cards

Least-squares Regression Line

The line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

96
New cards

Slope

The amount by which y is predicted to change when x increases by one unit.

97
New cards

Intercept

The predicted value of y when x = 0.

98
New cards

Linear Regression

A statistical method used to fit a linear model to a given data set.

99
New cards

Association vs Causation

Just because two variables are correlated does not imply that one causes the other.

100
New cards

Linear Relationship

When the data is somewhat close to forming a straight line.