Ch.3 Stats Vocab

38 Terms

1

univariate data

one set of data

  • ex: boxplot, ogive, histogram, timeplot, dotplot, ribbon chart, pie chart

  • describe w/ SOCS

2

bivariate data

two quantitative data sets

  • always graph data on scatterplots!

  • ex: tables, scatterplots, correlation, LSRL

  • describe w/ FODS (form, outliers, direction, strength)

3

scatterplot

shows relationship between two QUANTITATIVE variables that were measured on the same individual

  • has a horizontal & vertical axis

  • each individual = one point

4

describe distribution (BIVARIATE)

form, strength, direction, outliers/deviations

  • IN CONTEXT!!!!

5

form

general shape of the scatterplot

  • linear or nonlinear

  • nonlinear = curved, exponential, clustered, multiple clusters, etc.

6

strength

describes the association between the two variables; how closely related are the two variables

  • ex: strong, moderate, weak

  • ALWAYS use the r-value

7

direction

the type of association; the way the response variable tends to change as the explanatory variable increases

  • can be:

    • positive - increases in explanatory variable = increases in response variable

    • negative - increases in explanatory variable = decreases in response variable

    • none/no - increases in explanatory variable = no consistent increase or decrease in response variable

8

outliers/deviations

any points that don’t fit the overall pattern; they have large residuals

  • this is judged approximately

  • may decrease or increase the correlation coefficient

9

formula for describing distribution

There is a strong/moderate/weak (r = ___) positive/negative linear/nonlinear association between variable x and variable y. In general, as the explanatory variable increases, the response variable increases/decreases.

10

correlation coefficient (r)

measure of the direction and strength of the association

  • only used for LINEAR relationships

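  • does not depend on units of measurement

  • stays the same if the x and y variables are interchanged

  • between -1 and 1

  • sensitive to outliers

  • does NOT imply causation and does NOT describe form

  • requires two quantitative variables; it does not matter which one is explanatory and which is response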

11

calculate r-value

sum of the products of the z-scores (x and y) over n - 1

  • r = Σ(z_x × z_y) / (n - 1)
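
A minimal Python sketch of this calculation, assuming sample standard deviations are used for the z-scores. The data values and the function name `correlation` are made up for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of (z_x * z_y) over all points, divided by (n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

# Hypothetical data, made up for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 4))  # 0.7746
```

Swapping the two lists gives the same r, which matches the idea that r does not care which variable is on which axis.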

12

regression line

summarizes the relationship between two variables, but only in a specific setting: when one variable helps explain the other

  • ŷ = a + bx

  • ŷ = y-hat, the predicted value of the response variable

  • x = the explanatory variable

13

a

y-intercept

14

b

slope, can be calculated w/

  • r × (Sy/Sx)

15

least-squares regression line/LSRL

the line of best fit; the regression line that makes the sum of the squared residuals as small as possible

  • NOT just any regression line; it is one very specific regression line

  • always passes through (x̄, ȳ)

16

how to find LSRL

  • ŷ = a + bx

  • (x̄, ȳ) is always on the line

  • slope = r × (Sy/Sx)

  • plug and chug

    • plug the mean coordinates (x̄, ȳ) into the general equation

    • solve for a

    • write out equation and define variables
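
The steps above can be sketched in Python. The summary statistics here are hypothetical numbers chosen for illustration, and `lsrl_from_summary` is just a helper name invented for this sketch.

```python
def lsrl_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (a, b) for y-hat = a + b*x from summary statistics."""
    b = r * (s_y / s_x)      # slope = r * (Sy / Sx)
    a = y_bar - b * x_bar    # plug in (x-bar, y-bar) and solve for a
    return a, b

# Hypothetical summary statistics, made up for illustration
a, b = lsrl_from_summary(r=0.8, s_x=2.0, s_y=5.0, x_bar=10.0, y_bar=50.0)
print(f"y-hat = {a:.1f} + {b:.1f}x")  # y-hat = 30.0 + 2.0x
```

Plugging x̄ = 10 back in gives 30 + 2(10) = 50 = ȳ, which confirms the line passes through (x̄, ȳ).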

17

residuals

the leftovers or prediction errors in the vertical direction; y - ŷ = actual - predicted

  • positive → y > ŷ

    • actual is higher than predicted

  • negative → y < ŷ

    • actual is lower than predicted

  • zero → y = ŷ

    • actual is the same as predicted
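
A short Python sketch of residuals as actual minus predicted. The data are hypothetical, and the line ŷ = 2.2 + 0.6x is the LSRL for this made-up data set.

```python
def residuals(xs, ys, a, b):
    """Residual = actual y minus predicted y-hat, where y-hat = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data; y-hat = 2.2 + 0.6x is the LSRL for these points
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
res = residuals(xs, ys, a=2.2, b=0.6)

for x, e in zip(xs, res):
    side = "higher than" if e > 0 else "lower than" if e < 0 else "the same as"
    print(f"x = {x}: residual = {e:+.1f} (actual is {side} predicted)")
```

For an LSRL the residuals always add up to 0 (apart from rounding), so positive and negative residuals balance out.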

18

slope in context formula

On average, for every increase of 1 (unit of explanatory/x variable), the predicted (response variable) increases/decreases by slope (unit of response/y variable)

19

y-intercept in context formula

When the (explanatory/x variable) is at 0 (unit of explanatory variable), the predicted (response/y variable) is at “a” (unit of response variable)

20

extrapolation

the use of the regression line for a prediction far outside of the interval of the x-values used to create the line

  • often not accurate
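
As a rough illustration, a prediction helper can flag x-values outside the range of the data used to build the line. This sketch flags anything outside the range, which is stricter than "far outside"; the line ŷ = 30 + 2x and the x-range of 5 to 15 are hypothetical.

```python
def predict(x, a, b, x_min, x_max):
    """Return (y-hat, is_extrapolation) for the line y-hat = a + b*x."""
    y_hat = a + b * x
    is_extrapolation = x < x_min or x > x_max  # outside the observed x-values
    return y_hat, is_extrapolation

# Hypothetical line and observed x-range, made up for illustration
a, b = 30.0, 2.0
x_min, x_max = 5.0, 15.0

print(predict(12, a, b, x_min, x_max))  # (54.0, False)
print(predict(40, a, b, x_min, x_max))  # (110.0, True)
```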

21

residual plot

a scatterplot that displays the residuals on the vertical axis and explanatory variable on the horizontal axis

  • linear model = appropriate if

    • no obvious patterns

    • relatively small in size

    • even scatter above and below x-axis

22

standard error (Se)

the average size of a residual; measures how far, on average, each value differs from the predicted
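
A minimal sketch of the usual calculation for a least-squares line, Se = sqrt(SSE / (n - 2)). The card itself does not give the formula, so treat the n - 2 divisor as the standard convention rather than something stated here. The residuals are from a hypothetical data set.

```python
from math import sqrt

def standard_error(residuals):
    """Se = sqrt(sum of squared residuals / (n - 2))."""
    n = len(residuals)
    sse = sum(e ** 2 for e in residuals)
    return sqrt(sse / (n - 2))

# Hypothetical residuals (actual - predicted), made up for illustration
res = [-0.8, 0.6, 1.0, -0.6, -0.2]
s_e = standard_error(res)
print(round(s_e, 4))  # 0.8944
```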

23

Se in context formula

On average, the predicted (response variable) differs from the actual (response variable) by about Se (units of response variable)

24

Total Sum of Squares (SST) or Sum of Squares Total

the overall measure of variation in the y-values

  • uses y-bar, not y-hat

  • sum of the squared differences between each actual value and the average

  • on a scatterplot → draw the average as a horizontal line and find the difference from each point to it

25

standard deviation from average (Sy)

measures how far, on average, each value differs from the mean

  • Sy = square root of variance

26

Sum of the Squares of Errors (SSE) or Residuals

the sum of the squared residuals; the amount of variation in the y-values left over after using the LSRL

  • uses y-hat, not y-bar

27

R-squared/Coefficient of Determination

measures the percent reduction in the sum of squared residuals when using the LSRL to make predictions, rather than the mean value of y

  • the percent of the variability in the response variable that is accounted for by the LSRL

  • can also use the correlation coefficient squared

  • can only be applied to LINES, not just any curve

  • influential points that lie near LSRL → increase the value
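
The "percent reduction" idea can be written as r² = 1 - SSE/SST. Below is a small Python sketch on hypothetical data; the slope is computed with the equivalent least-squares formula Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², which gives the same value as r × (Sy/Sx).

```python
from statistics import mean

# Hypothetical data, made up for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(xs), mean(ys)

# LSRL for this data: y-hat = a + b*x
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

sst = sum((y - y_bar) ** 2 for y in ys)                     # uses y-bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))   # uses y-hat
r_squared = 1 - sse / sst
print(round(r_squared, 2))  # 0.6
```

Here SST = 6 and SSE = 2.4, so using the LSRL instead of ȳ cuts the squared error by 60%.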

28

r-squared in context formula

the amount of variation that has been explained/accounted for by the linear relationship between (response and explanatory variables) is __%

or

___% of the variation in the (response variable) can be explained by/accounted for by the linear relationship with (explanatory variable)

29

learn how to read computer regression output

be able to find

  • the slope b

  • the y intercept a

  • the value of s

  • the value of r²

30

is the linear model appropriate

  1. residual plot → scattered randomly and evenly

  2. r² → high percentage

  3. sum of squared residuals (SSE) → small

31

comparing models criteria

when comparing two different LSRL models, list the numbers and observations for both models in the explanation.

  • must use these three: residual plot, r², and SSE (sum of squared residuals)

  • optional: comparing point predictions (within range?), nicer scatterplot (curved or not curved?)

32

general formula for comparing LSRL models

  1. general statement - Model #A does a better job of predicting the (response variable) from the (explanatory variable).

  2. three pieces of evidence in context, with justification

    1. SSE - There is less prediction error with Model #A (SSE of Model #A versus SSE of Model #B).

    2. r² - There is more variation explained with Model #A (r² %) than Model #B (r² %).

    3. Residual plot - The residual plot for Model #A indicates a better model because ___ (more scattered? even distribution? less clustering? less pattern?), while Model #B has ___ (less scatter? more clusters? more pattern?).

33

high leverage

points that are extreme in the x direction

34

influential

points that, if removed, substantially change the regression line

  • change in slope, y-intercept, correlation, or coefficient of determination, or an increase in the standard deviation of the residuals

35

exponential growth

when a variable is multiplied by a fixed factor each time the time increases by a fixed amount

36

transforming the data

applying another function (ex: logarithm or square root) to a quantitative variable

  • must be applied to ALL inputs of that variable

  • typically done with the intention of “linearizing” data
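
A small sketch of "linearizing" in Python, using made-up exponential data (y doubles each time x goes up by 1). Taking log base 10 of every y-value turns the curve into a straight line, which shows up as r moving to 1. The helper `correlation` is written out here only so the example is self-contained.

```python
from math import log10
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of (z_x * z_y) divided by (n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical exponential data: y = 3 * 2^x
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]

# The transformation is applied to ALL values of the variable
log_ys = [log10(y) for y in ys]

r_raw = correlation(xs, ys)
r_log = correlation(xs, log_ys)
print(round(r_raw, 3), round(r_log, 3))
```

The second correlation is higher because x versus log(y) is perfectly linear for this data, while x versus y is curved.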

37

finding residuals of transformed data (explanatory variable)

plug values in like normal, making sure to correctly enter the x-value into the function

  • ex:

    • ŷ = a + b × log(x) → y - ŷ

    • ŷ = a + b × e^x → y - ŷ

    • ŷ = a + b × x² → y - ŷ

38

finding residuals of transformed data (response variable)

plug the x-values into the function (if applicable), but make sure to “undo” the function on the y-variable before subtracting

  • compare apples to apples; don’t compare the transformed prediction to the actual value

  • ex:

    • log(ŷ) = a + b × log(x) → y - 10^(a + b × log(x))

    • e^ŷ = a + b × x → y - ln(a + b × x)

    • (ŷ)² = a + b × x² → y - sqrt(a + b × x²)
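
A minimal Python sketch of the first example above (a log-log model), assuming log means log base 10. The coefficients and the data point are hypothetical, and `residual_log_log` is a name invented for this sketch.

```python
from math import log10

def residual_log_log(x, y, a, b):
    """Residual on the original scale for the model log(y-hat) = a + b*log(x)."""
    log_y_hat = a + b * log10(x)  # what the model gives directly
    y_hat = 10 ** log_y_hat       # "undo" the log to get back to y units
    return y - y_hat              # actual - predicted, apples to apples

# Hypothetical coefficients and data point, made up for illustration
res = residual_log_log(x=10, y=300, a=0.5, b=2.0)
print(round(res, 2))  # -16.23
```

Subtracting log(ŷ) = 2.5 directly from y = 300 would give 297.5, which is the apples-to-oranges mistake the card warns about.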