Review: Ch 3 Statistics


24 Terms

1

r

r = correlation coefficient, the average cross product of z-scores

Definition

  • Measures the relationship BETWEEN 2 numeric variables

  • Strength and association

    • Measures direction (+/-) and strength (-1 to 1), not shape

  • HOW closely points cluster around the “center” of data

Data

  • Univariate data→ mean

  • bivariate data→ regression line

  • Unitless, so changing the units does nothing

  • r must be BETWEEN -1 and 1, with ±1 meaning perfect linear correlation

  • Not affected by which variable (x or y) changes units

  • SAME sign (+/-) as the direction of the slope

  • STRONGLY affected by extreme values

  • If 1 variable has an equation → use it

    • (s - x) and x have a negative correlation because the coefficient of x is negative

Math

r = Σ [((x - x̄)/s_x) · ((y - ȳ)/s_y)] / (n - 1)

= Σ (z_x · z_y) / (n - 1)
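The formula above can be checked by hand or with a few lines of code. The sketch below is one possible way to compute r as the sum of z-score products divided by (n - 1); the function name `correlation` and the `hours`/`score` data are invented for illustration and are not part of the original card.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r as the sum of z-score products divided by (n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

# Hypothetical data, invented for illustration only.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = correlation(hours, score)

# Changing units (hours -> minutes) should leave r unchanged, since r is unitless.
r_minutes = correlation([h * 60 for h in hours], score)
print(round(r, 4), round(r_minutes, 4))
```

Both printed values should agree, which matches the "unitless" bullet: rescaling x rescales s_x by the same factor, so the z-scores do not change.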

2

Strength of r

STRENGTH of r, correlation coefficient

Numbers

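|r| from 0 to 0.5 → weak

|r| from 0.5 to 0.8 → moderate

|r| from 0.8 onwards → strong

  • A negative r → negative association (a negative residual, by contrast, means the LSRL is overpredicting the data)

  • A positive r → positive association (a positive residual, by contrast, means the LSRL is underpredicting the data)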
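The cutoffs on this card can be turned into a small lookup. This is only a sketch: the function name `describe_r` is made up, and putting the boundary values 0.5 and 0.8 in the higher category is an assumption, since the card does not say which side they belong to.

```python
def describe_r(r):
    """Label the strength and direction of a correlation coefficient."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "no linear association"
    size = abs(r)  # strength depends on the magnitude, not the sign
    if size >= 0.8:      # assumption: 0.8 itself counts as strong
        strength = "strong"
    elif size >= 0.5:    # assumption: 0.5 itself counts as moderate
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} association"

print(describe_r(-0.9))
print(describe_r(0.6))
```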

3

Least-Squares Regression Line (LSRL)

  • Estimates and predictions, not actual values

  • reasonable only WITHIN the domain of the data (interpolation)

  • MUST pass through the point of means (x̄, ȳ)

  • Regression OUTLIERS

    • indicated by a point falling far away from the overall pattern

    • points with relatively large discrepancies BETWEEN the value of the response variable, y, and a predicted value for the response variable, ŷ

Math

LSRL: ŷ = a + bx

  • a = y-intercept

  • b = slope

b = r(s_y / s_x)

SSE = Σ(y - ŷ)²

  • y = actual

  • ŷ = predicted
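The pieces on this card fit together as follows. The sketch below computes the slope from b = r(s_y / s_x) and then gets the intercept from a = ȳ - b·x̄, which is not written on the card but follows from the line passing through the point of means. The function names and the data are hypothetical.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b) for y-hat = a + b*x using b = r * (s_y / s_x)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar)
            for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    a = y_bar - b * x_bar  # forces the line through (x-bar, y-bar)
    return a, b

def sse(xs, ys, a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = lsrl(xs, ys)
total_sse = sse(xs, ys, a, b)
print(round(a, 3), round(b, 3), round(total_sse, 3))
```

For this data the line works out to ŷ = 2.2 + 0.6x with SSE = 2.4.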

4

r²

r² = Coefficient of determination

  • Calculates the proportion of the variance (variability) of one variable that is PREDICTED by the other variable

    • “r² as a % of the total variation in Y can be explained by the linear relationship BETWEEN X and Y in the regression line.”

  • What % of the total variation in the data can be explained by the regression line?

  • Greater r² % → better fit

Math

1 - r² = HOW much of the variability in Y is unaccounted for by the regression line.
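One way to see the "explained vs. unaccounted for" split is the standard identity r² = 1 - SSE/SST, where SST = Σ(y - ȳ)² is the total variation in y. That identity is not on the card; it is brought in here as an assumption for illustration, along with the hypothetical data and the function name `r_squared`.

```python
from statistics import mean

def r_squared(ys, predictions):
    """Coefficient of determination as 1 - SSE/SST."""
    y_bar = mean(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)                    # total variation in y
    sse = sum((y - p) ** 2 for y, p in zip(ys, predictions))   # unexplained variation
    return 1 - sse / sst

# Hypothetical data; predictions come from y-hat = 2.2 + 0.6x for x = 1..5.
ys = [2, 4, 5, 4, 5]
predictions = [2.8, 3.4, 4.0, 4.6, 5.2]
r2 = r_squared(ys, predictions)
unexplained = 1 - r2
print(round(r2, 3), round(unexplained, 3))
```

Here about 60% of the variation in y is explained by the line and about 40% is not.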

5

Describing Scatterplots

SOFA

S: Strength (strong, moderate, weak; variability and heteroscedasticity)

O: Outliers (in the x direction, the y direction, or BOTH)

F: Form (linear or curved)

A: Association (positive, negative, or no association)

  • SOFA describes the relationship BETWEEN variables

STEPS

  1. Identify the variables, cases, and scale of measure

  2. Describe overall shape

  3. Describe the trend through the slope

  4. describe strength

  5. Generalization

  6. Note any lurking variables OR causation

6

Heteroscedasticity

  • Unequal variation in the plot

  • “Fanning left/right”

  • Doesn’t cause bias in the coefficient estimates, but makes them less precise.

    • Lower precision increases the likelihood that the coefficient estimates are further from the correct population value.

  • tends to produce p-values that are smaller than they should be

7

Scatterplots

Graph

  • change can be seen in frequency bar charts

  • clusters→ modes(peaks, which can also show bimodal)

  • Scatterplots are only for bivariate data

8

Z score

  • Standardised Z

  • The signs (+/-) of the standardized x and y values determine which of the 4 quadrants of the coordinate plane a point falls in; the origin (0,0) is the intersection of the axes and corresponds to the point of means (x̄, ȳ)
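The quadrant idea can be sketched in code: standardize each variable, then read off the quadrant from the signs of the two z-scores. The function names and data below are invented for illustration.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def quadrant(z_x, z_y):
    """Quadrant of a standardized point, based only on the signs."""
    if z_x > 0 and z_y > 0:
        return "I"
    if z_x < 0 and z_y > 0:
        return "II"
    if z_x < 0 and z_y < 0:
        return "III"
    if z_x > 0 and z_y < 0:
        return "IV"
    return "on an axis"  # at least one z-score is exactly 0

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
quadrants = [quadrant(zx, zy) for zx, zy in zip(z_scores(xs), z_scores(ys))]
print(quadrants)
```

Points in quadrants I and III have a positive z-score product and push r upward; points in II and IV have a negative product and push r downward.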

9

Regression

  • HOW 2 numerical variables AFFECT each other

  • (x, y) are not interchangeable

  • May suggest a “causal” effect, but does NOT establish causation

  • Positive when independent and dependent variables are both increasing or decreasing together

  • Negative when independent and dependent variables are going opposite ways(ie. one is increasing the other is decreasing)

Mean

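  • Regression to the mean occurs in ANY elliptical cloud of points whenever the correlation, r, is not perfect

    • In standardized units, a line through the long axis of this elliptical cloud has a slope of 1, while the regression line has a slope of r, so predictions are pulled toward the mean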
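Regression to the mean is easiest to see in standardized units, where the least-squares prediction is (predicted z-score of y) = r × (z-score of x). That relationship is a standard result rather than something stated on the card, and the function name and numbers below are hypothetical.

```python
def predicted_z_y(z_x, r):
    """Predicted z-score of y for a given z-score of x (LSRL in standardized units)."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    return r * z_x

r = 0.5      # hypothetical correlation
z_x = 2.0    # x is 2 standard deviations above its mean
z_y_hat = predicted_z_y(z_x, r)
print(z_y_hat)
```

With an imperfect correlation the predicted y is fewer standard deviations from its mean than x is from its own; only when r is exactly 1 or -1 does the prediction stay just as extreme.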

10

Interpolation

Predicting a data value within the range of the dataset

11

Extrapolation

Predicting a data value outside the range of the dataset
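A quick way to tell the two apart is to compare the new x value with the smallest and largest x values in the data. The helper name `prediction_kind` and the data are made up for illustration.

```python
def prediction_kind(x_new, xs):
    """Classify a prediction by whether x_new lies inside the observed x range."""
    if min(xs) <= x_new <= max(xs):
        return "interpolation"
    return "extrapolation"

# Hypothetical x values, invented for illustration only.
xs = [1, 2, 3, 4, 5]
print(prediction_kind(3.5, xs))
print(prediction_kind(20, xs))
```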

12

Slope interpretation

“For every 1-unit increase in the explanatory variable, x, there is a predicted increase/decrease of [slope] in the response variable, y.”

13

SSE

The sum of the squared residuals (errors)

14

Residuals

*A distance measurement (the vertical distance from a point to the regression line)

  • The sum of the residuals = 0, so their mean = 0

  • The DIFFERENCE between an observed Y value and its predicted value from the regression line

    • Decreases in size when the regression line fits the data more closely

Math

Residual = y - ŷ

Positive output: linear model UNDERestimated the actual response variable

Negative output: linear model OVERestimated the actual response variable
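The sign rule above can be illustrated with a short sketch. The data are hypothetical, and the coefficients a = 2.2 and b = 0.6 are the least-squares values for that particular data, so the residuals should sum to (approximately) zero.

```python
def residuals(xs, ys, a, b):
    """Residual = observed y minus predicted y-hat, for y-hat = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # least-squares line for this data

res = residuals(xs, ys, a, b)
# Positive residual: the line underestimated; negative: it overestimated.
labels = ["under" if e > 0 else "over" if e < 0 else "exact" for e in res]
print([round(e, 2) for e in res])
print(labels)
print(round(sum(res), 9))
```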

15

Residual Plots

  • Scatter plot of regression residuals AGAINST the predicted y values

  • a “barometer” for HOW well the regression lines fit the data

  • curvature → sign of curvature in the original plot, meaning the original relationship was nonlinear

16

rules for regression

  1. The sum of residuals=0

  2. Horizontal line at 0 on the residual plot: mean of residuals = 0

  3. Residuals randomly scattered = better fit for the data

  4. Residuals have a pattern/curve = a line is not an appropriate model

17

Missed features in Scatterplots

  • These points will change the measurement

  • Influential points

  • High leverage points

  • outliers

  • lurking variables

18

Influential points

  • examples: Outliers, high-leverage

  • removal of points→sharply CHANGE the regression line

  • High leverage

    • x values are far from x̄

    • Lines up with the pattern: doesn’t influence the equation, strengthens correlation, r, and determination, r²

    • Does not line up with the pattern: dramatically CHANGES the equation, an influential point

  • Outliers

    • may cause r² and S to CHANGE

  • lurking variables

    • Correlation ≠ causation
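The two high-leverage cases above can be compared numerically. In this sketch a far-out x value (x = 20) is added to a small hypothetical dataset twice: once exactly on the existing line, once well off it. The function names and all data values are invented for illustration.

```python
import math
from statistics import mean

def slope(xs, ys):
    """Least-squares slope."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical base data; its least-squares line is y-hat = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# High-leverage point that lines up with the pattern: (20, 14.2) lies on the line.
slope_in_line = slope(xs + [20], ys + [14.2])
r_in_line = correlation(xs + [20], ys + [14.2])

# High-leverage point that does not line up with the pattern: (20, 2).
slope_off_line = slope(xs + [20], ys + [2])

slope_base, r_base = slope(xs, ys), correlation(xs, ys)
print(round(slope_base, 3), round(slope_in_line, 3), round(slope_off_line, 3))
print(round(r_base, 3), round(r_in_line, 3))
```

The in-line point leaves the slope where it was and raises r, while the off-line point drags the slope from positive to negative, which is what makes it influential.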

19

Slope Changing Transformations

  • Line of fit to a scatterplot should be considered for a plot with curvature → adjust the plot using transformations

  • Nonlinear transformations change the shape of the graph, linear won’t

    • in terms of slope and correlation,r

  • ONLY required if a linear model/scatterplot has curvature

    • use log(ln) or log(log) depending on the plot

    • exponential and power

20

Exponential transformation

y = ab^x

New equation: log y=log a +x log b

  • generally used for growth in population
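The linearized equation suggests a recipe: fit a line to (x, log y), then undo the logs to recover a and b. The sketch below follows that recipe with base-10 logs; the function names are made up and the data were generated from y = 3·2^x purely so the answer is easy to check.

```python
import math
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept and slope for ys against xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

def fit_exponential(xs, ys):
    """Fit y = a * b**x by regressing log10(y) on x."""
    log_a, log_b = fit_line(xs, [math.log10(y) for y in ys])
    return 10 ** log_a, 10 ** log_b  # undo the logs

# Hypothetical data generated from y = 3 * 2**x.
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]
a, b = fit_exponential(xs, ys)
print(round(a, 6), round(b, 6))
```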

21

Power Transformation

y = ax^b

New equation: log y = log a + b log x

  • need to log the variable → final answer is 10^ans

  • generally used for relationships BETWEEN height and weight
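For the power model, both variables are logged before fitting a line, and a prediction has to be converted back with 10^ans. The sketch below uses base-10 logs; the function names are hypothetical and the data were generated from y = 2·x³ so the result is easy to verify.

```python
import math
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept and slope for ys against xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

def fit_power(xs, ys):
    """Fit y = a * x**b by regressing log10(y) on log10(x)."""
    log_a, b = fit_line([math.log10(x) for x in xs],
                        [math.log10(y) for y in ys])
    return 10 ** log_a, b

def predict(a, b, x):
    """Predict on the log scale, then back-transform with 10**ans."""
    ans = math.log10(a) + b * math.log10(x)
    return 10 ** ans

# Hypothetical data generated from y = 2 * x**3.
xs = [1, 2, 3, 4, 5]
ys = [2, 16, 54, 128, 250]
a, b = fit_power(xs, ys)
y_at_6 = predict(a, b, 6)
print(round(a, 6), round(b, 6), round(y_at_6, 3))
```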

22

Common Transformations

knowt flashcard image
23

Power Transformations

knowt flashcard image
24

Calculator interpretation

Variable = x, the explanatory variable

Coefficient of the constant = y-intercept, a

Coefficient of the variable = slope, b

Error SS = SSE

Residual: SSE
