1

r

r = Correlation coefficient, the **average cross product of z-scores**

Definition

Measures the **relationship** BETWEEN 2 **numeric** variables

Strength and association

Measures **direction** (+/-) and **strength** (-1 to 1), not shape

HOW closely points **cluster** around the “center” of data

Data

Univariate data → mean

Bivariate data → regression line

Unitless, so changing the units does nothing

r must be BETWEEN -1 and 1, with -1 or 1 meaning a perfect linear correlation

Not affected by which variable (x or y) has its units changed

SAME sign (+/-) as the **direction** of the **slope**

STRONGLY affected by **extreme** values

If 1 variable has an equation → use it

(s - x) and x have a negative correlation because the slope relating them is negative

Math

$r = \dfrac{\sum \left(\dfrac{x - \bar{x}}{s_x}\right)\left(\dfrac{y - \bar{y}}{s_y}\right)}{n - 1}$

$= \dfrac{\sum z_x z_y}{n - 1}$
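
The formula above can be checked by hand with a short sketch. The data (`hours`, `scores`) and the function name `correlation` are made up for illustration; the code multiplies the two z-scores for each point, sums the products, and divides by n - 1.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    z_products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)

# Hypothetical data, chosen only for illustration.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = correlation(hours, scores)
print(round(r, 4))

# Changing units (hours to minutes) should leave r unchanged.
r_minutes = correlation([h * 60 for h in hours], scores)
print(round(r_minutes, 4))
```

Both printed values should match, which is the "unitless" property listed on this card.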

2

Strength of r

STRENGTH of r, the **correlation coefficient**

__Numbers__

|r| from 0 to 0.5 → weak

|r| from 0.5 to 0.8 → moderate

|r| from 0.8 onwards → strong
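
A minimal sketch of the cutoffs above as a lookup. The function name `strength_of_r` is hypothetical, and the handling of the exact boundaries (0.5 counted as moderate, 0.8 counted as strong) is an assumption, since the card does not say which side they fall on.

```python
def strength_of_r(r):
    """Label the strength of r using the card's cutoffs on |r|."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    size = abs(r)  # strength ignores the sign; the sign only gives direction
    if size < 0.5:
        return "weak"
    if size < 0.8:  # assumption: exactly 0.5 counts as moderate
        return "moderate"
    return "strong"  # assumption: exactly 0.8 counts as strong

for value in (0.3, -0.65, -0.9):
    print(value, strength_of_r(value))
```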

A negative: LSRL is **overpredicting** data → negative **association**

A positive: LSRL is **underpredicting** data → positive **association**

3

Least Squares Regression Line (LSRL)

Estimates and **predictions**, not actual values

Reasonable only WITHIN the **domain** of the data (interpolation)

MUST pass through the mean (x̄, ȳ)

Regression OUTLIERS

Indicated by a point falling **far** away from the overall pattern

Points with relatively large **discrepancies** BETWEEN the value of the response variable, y, and a predicted value for the response variable, ŷ

__Math__

LSRL: $\hat{y} = a + bx$

a = y-intercept

b = slope

$b = r\,\dfrac{s_y}{s_x}$

$SSE = \sum (y - \hat{y})^2$

y = actual

ŷ = predicted
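
A rough sketch of these formulas on made-up data. The names `lsrl` and `sse` are hypothetical. The intercept line `a = y_bar - b * x_bar` is not on the card; it follows from the rule that the LSRL must pass through (x̄, ȳ).

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b) for y-hat = a + b*x, using b = r * s_y / s_x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    a = y_bar - b * x_bar  # forces the line through (x-bar, y-bar)
    return a, b

def sse(xs, ys, a, b):
    """Sum of squared differences between actual y and predicted y-hat."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = lsrl(xs, ys)
total_sse = sse(xs, ys, a, b)
print(round(a, 4), round(b, 4), round(total_sse, 4))
```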

4

r²

r² = Coefficient of determination

Calculates the **proportion** of the **variance** (variability) of one variable that is PREDICTED by the other variable

“(r² as a %) of the **total variation** in Y can be explained by the **linear relationship** BETWEEN X and Y in the regression line.”

What % of the **total variation** in the data can be explained by the regression line?

Greater r² % → Better fit

Math

1 - r² = HOW much **variability** in Y is unaccounted for by the regression line.
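
One way to see the "proportion explained" idea is to compare the leftover variation around the line (SSE) with the total variation around ȳ. This is a sketch on made-up data; the function name `r_squared` and the variable `sst` (total sum of squares) are labels introduced here, not terms from the card.

```python
from statistics import mean

def r_squared(xs, ys):
    """r^2 computed as 1 - SSE / SST for the least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # algebraically the same slope as r * s_y / s_x
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)                    # total variation in y
    return 1 - sse / sst

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

explained = r_squared(xs, ys)
unexplained = 1 - explained
print(round(explained, 4), round(unexplained, 4))
```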

5

Describing Scatterplots

**SOFA**

S: Strength (Strong, Moderate, Weak, variability and **Heteroscedasticity**)

O: Outliers (in the x direction, y direction, or BOTH)

F: Form (Linear or curved)

A: Association (Positive, negative, or no association)

Describing SOFA: the **relationship** BETWEEN variables

__STEPS__

Identify the variables, cases, and scale of measurement

Describe overall shape

Describe the trend through the slope

Describe strength

Generalization

Note any lurking variables OR causation

6

Heteroscedasticity

Unequal variation in the plot

“Fanning left/right”

Doesn’t cause bias in the **coefficient estimates**, but makes them less precise.

Lower precision increases the likelihood that the coefficient estimates are **further** from the correct population value.

Tends to produce p-values that are smaller than they should be

7

Scatterplots

Graph

change can be seen in frequency bar charts

clusters → modes (peaks, which can also show bimodality)

Scatterplots are only for bivariate data

8

Z score

Standardised Z

The signs (+/-) of the standardized x and y values determine the location of each point in the 4 quadrants of the coordinate plane, with the origin (0, 0) being the **intersection**
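
A small sketch of the quadrant idea, on made-up data. The helper names `z_scores` and `quadrant` are hypothetical. After standardizing, the origin (0, 0) corresponds to (x̄, ȳ), and the signs of the two z-scores place each point in a quadrant.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize each value: (value - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def quadrant(zx, zy):
    """Name the quadrant of a standardized point from the signs of zx and zy."""
    if zx == 0 or zy == 0:
        return "on an axis"
    if zx > 0:
        return "I" if zy > 0 else "IV"
    return "II" if zy > 0 else "III"

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4]
ys = [1, 4, 2, 6]

quadrants = [quadrant(zx, zy) for zx, zy in zip(z_scores(xs), z_scores(ys))]
print(quadrants)
```

Points in quadrants I and III have a positive z-score product and points in II and IV have a negative one, which is how the signs feed into r.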

9

Regression

HOW 2 numerical variables AFFECT each other

(x, y) are not interchangeable

“Causal” effect, but NOT causation

Positive when independent and dependent variables are **both** increasing or decreasing together

Negative when independent and dependent variables are going **opposite ways** (i.e., one is increasing while the other is decreasing)

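Mean

**Regression to the mean** occurs in ANY elliptical cloud of points whenever the correlation, r, is not perfect

In standardized (z-score) units, the line through the long axis of this elliptical cloud has a **slope** of 1 (when r is positive), while the regression line is flatter, with a slope of r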

10

Interpolation

Predicting a data value **within** the range of the dataset

11

Extrapolation

Predicting a data value **outside** the range of the dataset
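
A minimal sketch of telling interpolation from extrapolation, assuming "the dataset" means the range of observed x values. The function name `prediction_type` and the data are hypothetical.

```python
def prediction_type(x_new, xs):
    """Classify a prediction at x_new against the observed x values."""
    if min(xs) <= x_new <= max(xs):
        return "interpolation"  # inside the observed range, prediction is reasonable
    return "extrapolation"      # outside the observed range, prediction is risky

# Hypothetical observed x values.
observed_x = [1, 2, 3, 4, 5]

print(prediction_type(3.5, observed_x))
print(prediction_type(12, observed_x))
```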

12

Slope interpretation

“For every 1-unit increase in the explanatory variable, x, the predicted response variable, ŷ, increases/decreases by the slope, b.”

13

SSE

The sum of squared residuals (errors): SSE = Σ(y - ŷ)²

14

Residuals

Distance measurement

The sum of the residuals = 0, so the mean of the residuals = 0

The DIFFERENCE between an observed Y value and its predicted value from the regression line

Decreases when the regression line fits MORE data

__Math__

Residual = y - ŷ

Positive output: linear model UNDERestimated the actual response variable

Negative output: linear model OVERestimated the actual response variable
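
A short sketch of the sign rule above, using made-up data. The intercept and slope (2.2 and 0.6) are the least-squares values for this particular data, hard-coded here to keep the example small; the names `residuals` and `describe` are hypothetical.

```python
def residuals(xs, ys, a, b):
    """Residual = actual y minus predicted y-hat, for each point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def describe(residual):
    """Translate the sign of a residual into under/overestimation by the model."""
    if residual > 0:
        return "underestimated"
    if residual < 0:
        return "overestimated"
    return "exact"

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # least-squares intercept and slope for this data

res = residuals(xs, ys, a, b)
labels = [describe(e) for e in res]
print([round(e, 2) for e in res])
print(labels)
print(round(sum(res), 9))  # residuals from a least-squares line sum to about 0
```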

15

Residual Plots

Scatter plot of regression residuals AGAINST the predicted y values

A “barometer” for HOW well the regression line fits the data

Curvature → sign of curvature in the **original** plot, meaning the original relationship was nonlinear
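
A sketch of what a curved residual pattern looks like numerically. The data is made up (y = x² exactly) and a straight line is fitted to it on purpose; `fit_line` is a hypothetical helper. The (predicted, residual) pairs in `points` are what a residual plot would display.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return a, b

# Hypothetical curved data: y = x^2.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
predicted = [a + b * x for x in xs]
resid = [y - p for y, p in zip(ys, predicted)]
points = list(zip(predicted, resid))  # residuals against predicted values

print(points)
```

The residuals run positive, negative, negative, negative, positive: a U shape, which signals that a straight line is not appropriate for this data.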

16

rules for regression

The **sum** of residuals = 0

Horizontal line: **mean** of residuals = 0

Residuals scattered = better fit for data

Residuals have pattern/curve = not an appropriate line

17

Missed features in Scatterplots

These points will change the measurement

Influential points

High leverage points

outliers

lurking variables

18

Influential points

examples: Outliers, high-leverage

removal of points→sharply CHANGE the regression line

**High leverage**: x values are far from x̄

Lines up with pattern: doesn’t influence equation, strengthens correlation, r, and determination, r²

Does not line up with pattern: dramatically CHANGES the equation, an **influential point**

**Outliers** may cause r² and S to CHANGE

**Lurking variables**: Correlation ≠ causation
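
A rough illustration of an influential point, on made-up data. The last point (20, 0) is hypothetical: its x value is far from x̄ (high leverage) and it does not line up with the pattern of the other points. The sketch fits the slope with and without it; `slope` is a hypothetical helper.

```python
from statistics import mean

def slope(xs, ys):
    """Least-squares slope b for y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

# Hypothetical data; the last point is the suspected influential point.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 5, 4, 5, 0]

slope_all = slope(xs, ys)
slope_without = slope(xs[:-1], ys[:-1])  # refit after removing the last point

print(round(slope_all, 4), round(slope_without, 4))
```

Removing the single point flips the slope from negative to positive, which is the "sharply CHANGE the regression line" behavior described on this card.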

19

Slope Changing Transformations

Line of fit to a scatterplot should be considered for a plot with **curvature** → **adjust** the plot using transformations

Nonlinear transformations change the shape of the graph, linear won’t

in terms of slope and correlation, r

ONLY required if a linear model/scatterplot has **curvature**

Use log(ln) or log(log) depending on the plot

Exponential and power

20

Exponential transformation

$y = ab^{x}$

New equation: $\log y = \log a + x \log b$

generally used for growth in

**population**
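
A sketch of the exponential transformation on made-up data (y = 3 · 2^x exactly). It fits a straight line to (x, log y) and then undoes the log to recover a and b. Base-10 logs are an assumption here; the helper names `fit_line` and `fit_exponential` are hypothetical.

```python
import math
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept and slope for y-hat = intercept + slope*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope

def fit_exponential(xs, ys):
    """Fit y = a * b**x by fitting log y = log a + x log b."""
    log_ys = [math.log10(y) for y in ys]
    intercept, slope = fit_line(xs, log_ys)  # intercept = log a, slope = log b
    return 10 ** intercept, 10 ** slope

# Hypothetical growth data: y = 3 * 2**x.
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]

a, b = fit_exponential(xs, ys)
print(round(a, 4), round(b, 4))
```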

21

Power Transformation

$y = ax^{b}$

New equation: $\log y = \log a + b \log x$

Need to log the variable → final answer is $10^{\text{ans}}$

Generally used for **relationships** BETWEEN height and weight
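
A sketch of the power transformation on made-up data (y = 2x³ exactly). It fits a straight line to (log x, log y), then raises 10 to the predicted log value to get back to the original units, which is the "final answer is 10^ans" step. Base-10 logs and the helper names `fit_line` and `fit_power` are assumptions for illustration.

```python
import math
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept and slope for y-hat = intercept + slope*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope

def fit_power(xs, ys):
    """Fit y = a * x**b by fitting log y = log a + b log x."""
    log_xs = [math.log10(x) for x in xs]
    log_ys = [math.log10(y) for y in ys]
    intercept, slope = fit_line(log_xs, log_ys)  # intercept = log a, slope = b
    return 10 ** intercept, slope

# Hypothetical data: y = 2 * x**3.
xs = [1, 2, 3, 4]
ys = [2, 16, 54, 128]

a, b = fit_power(xs, ys)

# Predict at x = 5: compute log y-hat first, then undo the log with 10 ** ans.
log_y_hat = math.log10(a) + b * math.log10(5)
y_at_5 = 10 ** log_y_hat
print(round(a, 4), round(b, 4), round(y_at_5, 2))
```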

22

Common Transformations

23

Power Transformations

24

Calculator interpretation

Variable = x, explanatory variable

Coefficient constant = y-intercept, a

Coefficient with variable = slope, b

Error SS = SSE

Residual: SSE
