Review - Ch 3 Statistics


24 Terms

1

r

r=Correlation coefficient, the average cross product of z scores

Definition

  • Measures the relationship BETWEEN 2 numeric variables

  • Strength and association

    • Measures direction (+/-) and strength (-1 to 1), not shape

  • HOW closely points cluster around the “center” of data

Data

  • Univariate data → mean

  • Bivariate data → regression line

  • Unitless, so changing the units of x or y does nothing to r

  • r must be BETWEEN -1 and 1, with ±1 meaning perfect correlation

  • Not affected by which variable (x, y) has its units changed

  • SAME sign (+/-) as the direction of the slope

  • STRONGLY affected by extreme values

  • If 1 variable has an equation → use it

    • (s - x) and (x) have a negative correlation because the coefficient on x is negative

Math

r = Σ [ ((x - x̄)/sx) · ((y - ȳ)/sy) ] / (n - 1)

= Σ (zx · zy) / (n - 1)
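
A minimal Python sketch of this formula on made-up data (sample standard deviations use n - 1):

# Correlation as the average cross product of z scores (made-up data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_x = (sum((xi - x_bar) ** 2 for xi in x) / (n - 1)) ** 0.5
s_y = (sum((yi - y_bar) ** 2 for yi in y) / (n - 1)) ** 0.5

# r = sum(z_x * z_y) / (n - 1)
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)) / (n - 1)
print(round(r, 4))  # 0.7746 for these values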

2

Strength of r

STRENGTH of r, correlation coefficient

Numbers

0 to 0.5 → weak

0.5 to 0.8 → moderate

0.8 to 1 → strong

  • A negative r → negative association (a negative residual means the LSRL is overpredicting the data)

  • A positive r → positive association (a positive residual means the LSRL is underpredicting the data)
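
A quick Python sketch of this rule of thumb, using the cutoffs above (the cutoffs are a convention and vary by textbook):

def describe_r(r):
    # Strength comes from |r| using the cutoffs above; the sign gives the direction.
    strength = "weak" if abs(r) < 0.5 else "moderate" if abs(r) < 0.8 else "strong"
    direction = "positive" if r > 0 else "negative" if r < 0 else "no"
    return f"{strength} {direction} association"

print(describe_r(0.93))   # strong positive association
print(describe_r(-0.42))  # weak negative association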

3

Least Squares Regression Line (LSRL)

  • Estimates and predictions, not actual values

  • reasonable only WITHIN the domain of the data (interpolation)

  • MUST pass through the mean point (x̄, ȳ)

  • Regression OUTLIERS

    • indicated by a point falling far away from the overall pattern

    • points with relatively large discrepancies BETWEEN the observed value of the response variable, y, and the predicted value of the response variable, ŷ

Math

LSRL: ŷ = a + bx

  • a = y-intercept

  • b = slope

b = r(sy / sx)

SSE = Σ(y - ŷ)²

  • y = actual

  • ŷ = predicted
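
A Python sketch of these formulas on the made-up data from the r card: b = r(sy/sx), and because the line must pass through (x̄, ȳ), the intercept is a = ȳ - b·x̄:

# Build the LSRL y-hat = a + b*x from summary statistics (made-up data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_x = (sum((xi - x_bar) ** 2 for xi in x) / (n - 1)) ** 0.5
s_y = (sum((yi - y_bar) ** 2 for yi in y) / (n - 1)) ** 0.5
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)) / (n - 1)

b = r * s_y / s_x       # slope
a = y_bar - b * x_bar   # intercept, so the line passes through (x-bar, y-bar)
y_hat = [a + b * xi for xi in x]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum of squared residuals
print(round(a, 2), round(b, 2), round(sse, 2))  # 2.2 0.6 2.4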

4

r²

r² = Coefficient of determination

  • Calculates the proportion of the variance (variability) of one variable that is PREDICTED by the other variable

    • “r²% of the total variation in Y can be explained by the linear relationship BETWEEN X and Y in the regression line.”

  • What % of the total variation in Y can be explained by the regression line?

  • Greater r² → better fit

Math

1 - r² = HOW much of the variability in Y is left unexplained by the regression line.
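
A Python sketch checking the equivalent form r² = 1 - SSE/SST on the same made-up data, where SST is the total variation of y around ȳ:

# r^2 as the proportion of variation in y explained by the regression line.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)                     # total variation in y
print(round(1 - sse / sst, 2))  # 0.6 -> "60% of the variation in y is explained by the line"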

5

Describing Scatterplots

SOFA

S: Strength (strong, moderate, or weak; note variability and heteroscedasticity)

O: Outliers (in the x direction, the y direction, or BOTH)

F: Form (linear or curved)

A: Association (positive, negative, or none)

  • Describe the SOFA relationship BETWEEN the variables

STEPS

  1. Identify the variables, cases, and scale of measure

  2. Describe overall shape

  3. Describe the trend through the slope

  4. Describe strength

  5. Generalization

  6. Note any lurking variables (correlation ≠ causation)

6

Heteroscedasticity

  • Unequal variation in the plot

  • “Fanning left/right”

  • Doesn’t cause bias in the coefficient estimates, but makes them less precise.

    • Lower precision increases the likelihood that the coefficient estimates are further from the correct population value.

  • Tends to produce p-values that are smaller than they should be

7

Scatterplots

Graph

  • change can be seen in frequency bar charts

  • clusters → modes (peaks, which can also show a bimodal distribution)

  • Scatterplots are only for bivariate data

8

Z score

  • Standardised Z

  • The signs (+/-) of the standardized x and y values place each point in one of the 4 quadrants of the coordinate plane, with the origin (0, 0) at the intersection
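
A Python sketch of the quadrant idea, using hypothetical summary statistics: when zx and zy share a sign (quadrants I and III) the cross product zx·zy is positive and pulls r up; opposite signs (quadrants II and IV) pull r down:

def z(value, mean, sd):
    # Standardize a single value.
    return (value - mean) / sd

# Hypothetical summary statistics and one observed point.
x_bar, s_x = 50, 10
y_bar, s_y = 100, 20
zx, zy = z(62, x_bar, s_x), z(88, y_bar, s_y)
print(zx, zy, round(zx * zy, 2))  # 1.2 -0.6 -0.72: a quadrant IV point, so it pulls r negative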

9

Regression

  • HOW 2 numerical variables AFFECT each other

  • (x, y) are not interchangeable

  • Suggests a “causal” effect, but does NOT prove causation

  • Positive when independent and dependent variables are both increasing or decreasing together

  • Negative when independent and dependent variables are going opposite ways(ie. one is increasing the other is decreasing)

Mean

  • Regression to the mean: occurs in ANY elliptical cloud of points whenever the correlation, r, is not perfect

    • In standardized (z-score) units, the SD line through the cloud has slope 1, but the regression line has slope r, so it is flatter whenever |r| < 1

10

Interpolation

Predicting a data value within the range of the dataset

11

Extrapolation

Predicting a data value outside the range of the dataset

12

Slope interpretation

“For every 1-unit increase in the explanatory variable, x, there is an increase/decrease of [slope] units in the response variable, y.”

13

SSE

The sum of squared residual errors: SSE = Σ(y - ŷ)²

14

Residuals

*A distance measurement

  • The residuals of an LSRL sum to 0, so their mean is 0

  • The DIFFERENCE between an observed Y value and its predicted value from the regression line

    • Decreases when the regression line fits the data MORE closely

Math

Residual = y - ŷ

Positive output: linear model UNDERestimated the actual response variable

Negative output: linear model OVERestimated the actual response variable
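
A Python sketch of residual = y - ŷ on the made-up data and LSRL from the earlier cards, checking that the residuals of an LSRL sum to (about) 0:

# Residuals for the LSRL fit above: residual = observed y minus predicted y-hat.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # intercept and slope of the LSRL for this data (from the earlier sketch)

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print([round(e, 1) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(sum(residuals))                    # about 0, up to floating-point noise
# positive residual -> the line UNDERestimated y; negative -> the line OVERestimated y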

15

Residual Plots

  • Scatter plot of regression residuals AGAINST the predicted y values

  • a “barometer” for HOW well the regression line fits the data

  • curvature → sign of curvature in the original plot, meaning the original relationship was nonlinear
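
A Python sketch of a residual plot, assuming matplotlib is installed; the predicted values and residuals are the made-up ones from the residuals sketch:

import matplotlib.pyplot as plt

# Made-up fit: predicted values and their residuals.
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]

plt.scatter(y_hat, residuals)
plt.axhline(0)  # residuals of an LSRL center on 0
plt.xlabel("predicted value (y-hat)")
plt.ylabel("residual (y - y-hat)")
plt.title("Residual plot: look for random scatter, no curve, no fanning")
plt.show()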

16

rules for regression

  1. The sum of the residuals = 0

  2. The horizontal line at 0 marks the mean of the residuals

  3. Residuals scattered randomly = the line is a good fit for the data

  4. Residuals with a pattern/curve = a linear model is not appropriate

17

Missed features in Scatterplots

  • These points will change the measurement

  • Influential points

  • High leverage points

  • outliers

  • lurking variables

18

Influential points

  • Examples: outliers, high-leverage points

  • Removal of the point → sharply CHANGES the regression line

  • High leverage

    • x values are far from x̄

    • Lines up with the pattern: doesn’t influence the equation; strengthens the correlation, r, and the determination, r²

    • Doesn’t line up with the pattern: dramatically CHANGES the equation (an influential point)

  • Outliers

    • may cause r² and S to CHANGE

  • lurking variables

    • Correlation ≠ causation
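
A Python sketch of how strongly one extreme value can move r; the added point (15, 1) is made up, has high leverage, and does not follow the pattern:

def corr(x, y):
    # Sample correlation coefficient.
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((xi - xb) ** 2 for xi in x)
    syy = sum((yi - yb) ** 2 for yi in y)
    sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    return sxy / (sxx * syy) ** 0.5

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(corr(x, y), 2))               # 0.77 without the extreme point
print(round(corr(x + [15], y + [1]), 2))  # about -0.57: one influential point flips the sign of r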

19

Slope Changing Transformations

  • For a scatterplot with curvature, adjust the data using transformations before fitting a line

  • Nonlinear transformations change the shape of the graph, linear ones won’t

    • a linear change of units rescales the slope but leaves the correlation, r, unchanged

  • ONLY required if a linear model/scatterplot has curvature

    • take the log of y only (exponential model) or the log of both x and y (power model), depending on the plot

    • exponential and power

20

Exponential transformation

y = a·b^x

New equation: log y = log a + x·log b

  • generally used for growth in population
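
A Python sketch of the transformation on made-up, exactly exponential data (y = 3·2^x), so the fitted line recovers log a and log b and the back-transformed prediction is exact:

import math

# Exactly exponential made-up data: y = 3 * 2**x
x = [0, 1, 2, 3, 4]
y = [3 * 2 ** xi for xi in x]

log_y = [math.log10(yi) for yi in y]  # transform: regress log(y) on x
n = len(x)
xb, lb = sum(x) / n, sum(log_y) / n
b = sum((xi - xb) * (li - lb) for xi, li in zip(x, log_y)) / sum((xi - xb) ** 2 for xi in x)
a = lb - b * xb  # fitted line: log(y-hat) = a + b*x

pred = 10 ** (a + b * 6)  # predict at x = 6, then undo the log
print(round(pred, 1))     # 192.0 = 3 * 2**6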

21

Power Transformation

y = a·x^b

New equation: log y = log a + b·log x

  • need to take the log of the variables → the final answer is 10^ans

  • generally used for relationships BETWEEN height and weight
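
A Python sketch of the power model on made-up, exactly power-law data (y = 2·x³): fit a line to (log x, log y), then undo the log with 10^ans:

import math

# Exactly a power relationship: y = 2 * x**3
x = [1, 2, 3, 4, 5]
y = [2 * xi ** 3 for xi in x]

log_x = [math.log10(xi) for xi in x]
log_y = [math.log10(yi) for yi in y]  # power model: log y = log a + b*log x
n = len(x)
lxb, lyb = sum(log_x) / n, sum(log_y) / n
b = sum((lx - lxb) * (ly - lyb) for lx, ly in zip(log_x, log_y)) / sum((lx - lxb) ** 2 for lx in log_x)
a = lyb - b * lxb

pred = 10 ** (a + b * math.log10(10))  # predict y at x = 10, then undo the log
print(round(pred, 1))                  # 2000.0 = 2 * 10**3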

22

Common Transformations

[image: common transformations chart]
23

Power Transformations

[image: power transformations chart]
24

Calculator interpretation

Variable = x, the explanatory variable

Coefficient of the constant = y-intercept, a

Coefficient of the variable = slope, b

Error SS = SSE

Residual SS = SSE