Ch.3 Stats Vocab

32 Terms

1

univariate data

data on one variable

  • ex: boxplot, ogive, histogram, timeplot, dotplot, ribbon chart, pie chart

  • describe w/ SOCS (shape, outliers, center, spread)

2

bivariate data

two quantitative variables measured on the same individuals

  • always graph data on scatterplots!

  • ex: tables, scatterplots, correlation, LSRL

  • describe w/ FODS (form, outliers, direction, strength)

3

scatterplot

shows the relationship between two QUANTITATIVE variables measured on the same individuals

  • has a horizontal & vertical axis

  • each individual = one point

4

describe distribution (BIVARIATE)

form, strength, direction, outliers/deviations

  • IN CONTEXT!!!!

5

form

general shape of the scatterplot

  • linear or nonlinear

  • nonlinear = curved, exponential, cluster, multiple clusters, etc.

6

strength

describes how closely the two variables are associated; how tightly the points follow the form

  • ex: strong, moderate, weak

  • ALWAYS support it with the r-value (for linear relationships)

7

direction

the type of association; the overall trend of the points as you move from left to right

  • can be:

    • positive - increases in explanatory variable = increases in response variable

    • negative - increases in explanatory variable = decreases in response variable

    • none/no - increases in explanatory variable = no consistent increase or decrease in response variable

8

outliers/deviations

any points that don’t fit the overall pattern, i.e., points with large residuals

  • this is judged approximately (by eye)

9

formula for describing distribution

there is a strong/moderate/weak (r = __) positive/negative linear/nonlinear association between (variable x) and (variable y)

10

correlation coefficient (r)

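measure of the direction and strength of the linear association between two quantitative variables

  • only used for LINEAR relationships

  • does not depend on units of measurement (e.g., switching from inches to cm leaves r unchanged)

  • does not change if you interchange the explanatory and response variables

  • always between -1 and 1

  • sensitive to outliers

  • does NOT imply causation and does not describe form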
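As an illustration of the unit, swapping, and outlier properties above, here is a minimal Python sketch. The data (hours studied vs. quiz score) are made up, and `correlation` is a hypothetical helper that uses the algebraically equivalent form r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²) instead of z-scores.

```python
import math


def correlation(x, y):
    """Hypothetical helper: r as the sum of cross-deviations over sqrt(Sxx * Syy)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: hours studied (x) and quiz score (y)
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r = correlation(hours, score)
r_minutes = correlation([60 * h for h in hours], score)  # same x, new units
r_swapped = correlation(score, hours)                     # x and y interchanged
r_outlier = correlation(hours + [10], score + [0])        # one added outlier at (10, 0)

print(f"r = {r:.4f}, minutes: {r_minutes:.4f}, swapped: {r_swapped:.4f}")
print(f"with outlier: {r_outlier:.4f}")
```

For this data the first three values all come out to about 0.7746, while adding the single point (10, 0) turns r negative.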

11

calculate r-value

sum of the products of the z-scores (x and y), divided by n - 1: r = Σ(z_x · z_y) / (n - 1)
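The z-score recipe can be sketched in Python as below, on made-up data. It assumes sample standard deviations (`statistics.stdev`, which divides by n - 1), matching the n - 1 in the formula.

```python
from statistics import mean, stdev

# Made-up data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # sample SDs (divide by n - 1)

# Standardize each value, then multiply the paired z-scores
z_x = [(xi - x_bar) / s_x for xi in x]
z_y = [(yi - y_bar) / s_y for yi in y]
products = [zx * zy for zx, zy in zip(z_x, z_y)]

for zx, zy, p in zip(z_x, z_y, products):
    print(f"z_x = {zx:+.3f}  z_y = {zy:+.3f}  product = {p:+.3f}")

r = sum(products) / (n - 1)  # add up the products, then divide by n - 1
print(f"r = {r:.4f}")
```

For this data the products add up to about 3.098, so r ≈ 3.098 / 4 ≈ 0.7746.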

12

regression line

summarizes the relationship between two variables, but only in a specific setting: when one variable helps explain the other

  • ŷ = a + bx

  • ŷ = y-hat, the predicted value of the response variable

  • x = the explanatory variable

13

a

y-intercept

14

b

slope, can be calculated w/

  • r × (Sy/Sx)

15

least-squares regression line/LSRL

the line of best fit; the regression line that makes the sum of the squared residuals as small as possible

  • not just any regression line; it is one specific type of regression line
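A quick numerical check of "as small as possible", sketched in Python on made-up data. Working the LSRL formulas for this data gives ŷ = 2.2 + 0.6x; the sketch compares its sum of squared residuals with lines whose intercept and slope are nudged by small, arbitrarily chosen amounts (the helper `sse` is hypothetical).

```python
# Made-up data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def sse(a, b):
    """Hypothetical helper: sum of squared residuals for the line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


best = sse(2.2, 0.6)  # LSRL for this data

# Try nearby lines by shifting the intercept and slope (arbitrary grid)
shifts_a = [-0.5, -0.1, 0.0, 0.1, 0.5]
shifts_b = [-0.2, -0.05, 0.0, 0.05, 0.2]
others = [sse(2.2 + da, 0.6 + db)
          for da in shifts_a for db in shifts_b
          if (da, db) != (0.0, 0.0)]

print(f"LSRL SSE = {best:.2f}")
print(f"smallest SSE among other lines = {min(others):.2f}")
print("LSRL wins:", best < min(others))
```

Here the LSRL's SSE is 2.40 and the best of the nudged lines reaches only about 2.44, so none of them beats it.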

16

how to find LSRL

  • ŷ = a + bx

  • (x̄, ȳ) is always on the line

  • slope = r × (Sy/Sx)

  • plug and chug

    • plug the mean point (x̄, ȳ) into your general equation

    • solve for a

    • write out equation and define variables
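The steps above, sketched in Python on made-up data (`lsrl` is a hypothetical helper name). It computes the slope with b = r × (Sy/Sx), then solves ȳ = a + b·x̄ for a because (x̄, ȳ) is always on the line.

```python
from statistics import mean, stdev


def lsrl(x, y):
    """Hypothetical helper: return (a, b) for y-hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample SDs
    r = sum((xi - x_bar) / s_x * (yi - y_bar) / s_y for xi, yi in zip(x, y)) / (n - 1)
    b = r * (s_y / s_x)    # slope = r * (Sy/Sx)
    a = y_bar - b * x_bar  # plug (x-bar, y-bar) into y = a + bx and solve for a
    return a, b


x = [1, 2, 3, 4, 5]  # made-up explanatory variable
y = [2, 4, 5, 4, 5]  # made-up response variable
a, b = lsrl(x, y)
print(f"y-hat = {a:.2f} + {b:.2f}x")
```

This prints `y-hat = 2.20 + 0.60x`, i.e., ŷ = 2.2 + 0.6x, where x is the explanatory variable and ŷ is the predicted response.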

17

residuals

the leftovers, or prediction errors, in the vertical direction; residual = y - ŷ = actual - predicted

  • positive —> y > ŷ

    • actual is higher than predicted

  • negative —> y < ŷ

    • actual is lower than predicted

  • zero —> y = ŷ

    • actual is the same as predicted
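Computing and labeling residuals for made-up data whose LSRL works out to ŷ = 2.2 + 0.6x, as a short Python sketch. A small tolerance treats floating-point noise as a zero residual (the 1e-9 cutoff is an arbitrary choice).

```python
# Made-up data and its LSRL (y-hat = 2.2 + 0.6x)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

residuals = []
for xi, yi in zip(x, y):
    y_hat = a + b * xi   # predicted value
    resid = yi - y_hat   # actual - predicted
    residuals.append(resid)
    if abs(resid) < 1e-9:  # treat tiny floating-point noise as zero
        kind = "zero (actual = predicted)"
    elif resid > 0:
        kind = "positive (actual above predicted)"
    else:
        kind = "negative (actual below predicted)"
    print(f"x = {xi}: y = {yi}, y-hat = {y_hat:.1f}, residual = {resid:+.1f}, {kind}")
```

The residuals here are -0.8, +0.6, +1.0, -0.6, and -0.2. They add up to 0 (up to rounding), which is always true for the residuals of an LSRL.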

18

slope in context formula

On average, for every increase of 1 (unit of explanatory/x variable), the predicted (response variable) increases/decreases by (slope) (units of response/y variable)

19

y-intercept in context formula

When the (explanatory/x variable) is at 0 (unit of explanatory variable), the predicted (response/y variable) is at “a” (unit of response variable)

20

extrapolation

the use of the regression line for a prediction far outside of the interval of the x-values used to create the line

  • often not accurate

21

residual plot

a scatterplot that displays the residuals on the vertical axis and explanatory variable on the horizontal axis

  • linear model = appropriate if

    • no obvious patterns

    • residuals relatively small in size

    • even scatter above and below x-axis

22

standard error (Se)

the typical size of a residual; measures roughly how far, on average, each actual y-value differs from its predicted value ŷ

  • Se = √(SSE / (n - 2)) = √(Σ(y - ŷ)² / (n - 2))
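A Python sketch of Se on made-up data whose LSRL is ŷ = 2.2 + 0.6x. It uses the formula Se = √(Σ(y - ŷ)² / (n - 2)); because it divides by n - 2 rather than n, Se is only roughly the average residual size.

```python
import math

# Made-up data and its LSRL (y-hat = 2.2 + 0.6x)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # actual - predicted
sse = sum(e ** 2 for e in residuals)                     # sum of squared residuals
s_e = math.sqrt(sse / (len(x) - 2))                      # divide by n - 2, not n
print(f"Se = {s_e:.3f}")
```

Here SSE = 2.4 and n = 5, so Se = √(2.4 / 3) ≈ 0.894.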

23

Se in context formula

On average, the predicted (response variable) differs from the actual (response variable) by about Se (units of response variable)

24

Total Sum of Squares (SST) or Sum of Squares Total

the overall measure of variation in the y-values

  • uses y-bar, not y-hat

  • sum of the squared differences between each actual y-value and the mean: SST = Σ(y - ȳ)²

  • on a scatterplot —> draw the mean as a horizontal line and find each point's vertical distance to it
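The "draw the average as a horizontal line" picture, as a Python sketch on made-up y-values: each vertical distance from a point to the line y = ȳ is squared and the squares are summed.

```python
from statistics import mean

y = [2, 4, 5, 4, 5]  # made-up response values
y_bar = mean(y)      # height of the horizontal "average" line

# Vertical distance from each point to the mean line, squared, then summed
sst = sum((yi - y_bar) ** 2 for yi in y)
print(f"y-bar = {y_bar:g}, SST = {sst:g}")
```

For these values ȳ = 4 and the squared distances are 4, 0, 1, 0, 1, so SST = 6.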

25

standard deviation of y (Sy)

measures how far, on average, each y-value differs from the mean ȳ

  • Sy = square root of the variance = √(SST / (n - 1))
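A Python sketch connecting Sy to SST on made-up y-values: divide SST by n - 1 to get the sample variance, take the square root, and compare with the standard library's `statistics.stdev`.

```python
import math
from statistics import mean, stdev

y = [2, 4, 5, 4, 5]  # made-up response values
n = len(y)
y_bar = mean(y)

sst = sum((yi - y_bar) ** 2 for yi in y)  # total variation around the mean
variance = sst / (n - 1)                  # sample variance divides by n - 1
s_y = math.sqrt(variance)

print(f"Sy = {s_y:.4f}")
print("matches statistics.stdev:", math.isclose(s_y, stdev(y)))
```

With SST = 6 and n = 5, Sy = √1.5 ≈ 1.2247.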

26

Sum of Squared Errors (SSE) or Sum of Squared Residuals

amount of variation left in the residuals: SSE = Σ(y - ŷ)²

  • uses y-hat, not y-bar

27

R-squared/Coefficient of Determination

measures the percent reduction in the sum of squared residuals when using the LSRL to make predictions, rather than the mean value of y

  • the percent of the variability in the response variable that is accounted for by the LSRL

  • R² = 1 - SSE/SST; for the LSRL this equals the correlation coefficient squared (r²)
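A Python sketch of both routes to R² on made-up data whose LSRL is ŷ = 2.2 + 0.6x: the percent reduction 1 - SSE/SST, and the square of the correlation coefficient r.

```python
import math
from statistics import mean

# Made-up data and its LSRL (y-hat = 2.2 + 0.6x)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

x_bar, y_bar = mean(x), mean(y)
sst = sum((yi - y_bar) ** 2 for yi in y)                     # errors when predicting with y-bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # errors when predicting with y-hat
r_squared = 1 - sse / sst                                    # fraction of SST the LSRL removes

# Cross-check against the correlation coefficient squared
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
r = sxy / math.sqrt(sxx * sst)  # SST plays the role of Syy here

print(f"SST = {sst}, SSE = {sse:.2f}, R^2 = {r_squared:.2f}, r^2 = {r ** 2:.2f}")
```

Both routes give 0.60 here (SSE = 2.4, SST = 6), i.e., about 60% of the variation in y is accounted for by the LSRL.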

28

r-squared in context formula

__% of the variation in (response variable) is explained/accounted for by the linear relationship between (response variable) and (explanatory variable)

29

learn how to read computer regression output

be able to find

  • the slope b

  • the y intercept a

  • the value of s

  • the value of r²

30

is the linear model appropriate

  1. residual plot

  2. r²

  3. sum of residuals squared

31

high leverage

points that are extreme in the x direction

32

influential

points that, if removed, substantially change the regression line

  • change in slope, y-intercept, correlation, coefficient of determination, or the standard deviation of the residuals (s)
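To see high leverage and influence together, this Python sketch fits the LSRL to made-up data with and without one hypothetical point at x = 10, far from the other x-values (1 to 5). `lsrl` is a hypothetical helper using b = Sxy/Sxx, which is algebraically the same as r × (Sy/Sx).

```python
def lsrl(x, y):
    """Hypothetical helper: least-squares (a, b) with b = Sxy / Sxx, a = y-bar - b * x-bar."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Made-up data, plus one hypothetical point far out in the x direction
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
extra_x, extra_y = 10, 0

a_without, b_without = lsrl(x, y)
a_with, b_with = lsrl(x + [extra_x], y + [extra_y])

print(f"without the point: a = {a_without:.2f}, b = {b_without:.2f}")
print(f"with the point:    a = {a_with:.2f}, b = {b_with:.2f}")
```

Adding that one point flips the slope from 0.60 to about -0.34, so the point is both high leverage and influential.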