Unit 1 & 2 Review

Description and Tags

Flashcards for reviewing distributions and two-variable data.

31 Terms

1

Shape (Distributions)

Refers to the overall form of the distribution, such as symmetric, skewed (left or right), uniform, or bimodal.

2

Outliers (Distributions)

Values that lie far away from the majority of the data. Represented as dots on modified boxplots.

3

Center (Distributions)

The typical or average value in a dataset. Measures include the mean (non-resistant) and median (resistant).

4

Spread (Distributions)

Describes how the data is dispersed or scattered. Measures include range (max - min, non-resistant), IQR (Q3 - Q1, resistant), and standard deviation (non-resistant).

5

5-Number Summary

Consists of the minimum, Q1, median, Q3, and maximum values of a dataset. Used to create boxplots.
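A minimal sketch of computing the 5-number summary. Quartile conventions vary between textbooks and software; this version assumes the "median of each half" convention (the median is excluded from both halves when the count is odd). The function name and the data are made up for illustration.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention.

    Needs at least two values so that each half is non-empty.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # values below the median position
    upper = xs[half + n % 2:]    # values above the median position
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

scores = [2, 4, 4, 5, 7, 8, 9, 11, 12, 30]   # hypothetical data
summary = five_number_summary(scores)
print(summary)
```

Other conventions (for example, interpolated quartiles) can give slightly different Q1 and Q3 for the same data.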

6

Boxplots

A visual representation of the 5-number summary. Modified boxplots show outliers as individual points.

7

Cumulative Graph

A graph where the x-axis represents the data and the y-axis represents the cumulative relative frequency (percentiles).
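As a rough illustration, the y-values of a cumulative graph can be built by keeping a running total of the frequencies and dividing by the number of observations. The frequency table below is invented for the example.

```python
from itertools import accumulate

# Hypothetical data: quiz score -> number of students with that score
counts = {1: 2, 2: 5, 3: 8, 4: 5}
total = sum(counts.values())

values = sorted(counts)
running = accumulate(counts[v] for v in values)       # running totals of the counts
cumulative_rel_freq = [c / total for c in running]    # y-values of the graph

for v, p in zip(values, cumulative_rel_freq):
    print(f"score <= {v}: {p:.0%} of students")
```

The last cumulative relative frequency is always 1, because every observation is at or below the maximum.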

8

Uniform Distribution

A distribution where all values have approximately equal relative frequency.

9

Right Skewed Distribution

A distribution whose longer tail extends to the right: most values cluster at the low end, and the higher values are more spread out. The mean is typically greater than the median.

10

Left Skewed Distribution

A distribution whose longer tail extends to the left: most values cluster at the high end, and the lower values are more spread out. The mean is typically less than the median.

11

Bimodal Distribution

A distribution with two distinct peaks, indicating two common ranges of values.

12

Unimodal Distribution

A distribution with one distinct peak.

13

Symmetric Distribution

A distribution where the two halves are mirror images of each other.

14

Non-Resistant Measures

Statistical measures that are greatly affected by outliers. Examples: mean, standard deviation, range.

15

Resistant Measures

Statistical measures that are not greatly affected by outliers. Examples: median, IQR.
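A small sketch of the difference between resistant and non-resistant measures: adding one extreme value moves the mean a lot and the median only slightly. The numbers are made up.

```python
from statistics import mean, median

data = [10, 12, 13, 14, 16]      # hypothetical data
with_outlier = data + [95]       # same data plus one extreme value

mean_shift = mean(with_outlier) - mean(data)        # how far the mean moved
median_shift = median(with_outlier) - median(data)  # how far the median moved

print(f"mean moved by {mean_shift:.2f}, median moved by {median_shift:.2f}")
```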

16

IQR (Interquartile Range)

The difference between the third quartile (Q3) and the first quartile (Q1). A measure of statistical dispersion.

17

Outlier Lower Boundary (LB)

Q1 - 1.5(IQR). Any data point below this is considered an outlier.

18

Outlier Upper Boundary (UB)

Q3 + 1.5(IQR). Any data point above this is considered an outlier.
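The two boundaries can be sketched as follows. The function names and data are hypothetical, and the quartiles Q1 = 4 and Q3 = 11 are taken as given for this data set (different quartile conventions could give slightly different values).

```python
def outlier_fences(q1, q3):
    """Return (lower boundary, upper boundary) from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Return the values that fall below the LB or above the UB."""
    lb, ub = outlier_fences(q1, q3)
    return [x for x in data if x < lb or x > ub]

scores = [2, 4, 4, 5, 7, 8, 9, 11, 12, 30]   # hypothetical data
lb, ub = outlier_fences(q1=4, q3=11)         # IQR = 7
outliers = find_outliers(scores, q1=4, q3=11)
print(lb, ub, outliers)
```

Here LB = 4 - 10.5 = -6.5 and UB = 11 + 10.5 = 21.5, so only 30 is flagged.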

19

z-score

A measure of how many standard deviations a data point lies from the mean. z = (data point - mean) / standard deviation
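A quick worked example of the formula, with an invented test score, mean, and standard deviation:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical: a score of 85 on a test with mean 70 and standard deviation 10
z = z_score(85, mean=70, sd=10)
print(z)
```

A positive z-score means the value is above the mean; a negative one means it is below.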

20

Empirical Rule (68-95-99.7 Rule)

In a normal distribution, approximately 68% of values fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3.
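One way to check these percentages is with the normal CDF. This sketch uses `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later) on a standard normal distribution.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Proportion of a normal distribution within k standard deviations of the mean
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} SD: {p:.1%}")
```

The exact proportions are slightly different from 68, 95, and 99.7, which is why the rule says "approximately".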

21

Linear Transformation

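Transforming data using the equation ax + b. Adding 'b' shifts the measures of center (mean, median) by b but leaves the measures of spread unchanged; multiplying by 'a' multiplies the center by a and the spread (range, IQR, standard deviation) by |a|. The shape of the distribution does not change.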
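A brief sketch of these effects, using a Celsius-to-Fahrenheit conversion (a = 1.8, b = 32) on made-up temperatures:

```python
from statistics import mean, stdev

temps_c = [10, 15, 20, 25, 30]          # hypothetical data, degrees Celsius
a, b = 1.8, 32                          # Fahrenheit = 1.8 * Celsius + 32
temps_f = [a * x + b for x in temps_c]

# The center is multiplied by a and then shifted by b
print(mean(temps_c), mean(temps_f))

# The spread is multiplied by |a| only; adding b has no effect on it
print(stdev(temps_c), stdev(temps_f))
```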

22

Correlation

A statistical measure (r) that describes the strength and direction of the linear relationship between two quantitative variables. Ranges from -1 to +1.
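A minimal sketch of one common way to compute the correlation: average the products of the paired z-scores, dividing by n - 1. The function name and data are made up for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r as the sum of z_x * z_y divided by (n - 1)."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)       # sample standard deviations
    products = (((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

hours = [1, 2, 3, 4, 5]                 # hypothetical study hours
scores = [52, 60, 61, 70, 77]           # hypothetical test scores
r = correlation(hours, scores)
print(round(r, 3))
```

A value of r close to +1 or -1 indicates a strong linear relationship; a value near 0 indicates a weak one.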

23

Coefficient of Determination (R-squared)

The proportion of the variance in the dependent variable that is predictable from the independent variable(s).

24

Least Squares Regression Line (LSRL)

The line of best fit, chosen to minimize the sum of the squared residuals. Represented by the equation ŷ = a + bx, where ŷ is the predicted value of y.

25

Slope (b) in LSRL

The predicted change in y for every one-unit increase in x. b = r(s_y / s_x), where s_x and s_y are the standard deviations of x and y.

26

y-intercept (a) in LSRL

The predicted value of y when x is 0. a = ȳ - b·x̄, where x̄ and ȳ are the means of x and y.
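The slope and intercept formulas can be sketched together as below. The function name `lsrl` and the data are hypothetical; the code computes r first, then the slope from r and the two standard deviations, then the intercept from the two means.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b) for the least squares line: predicted y = a + b * x."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    b = r * sy / sx          # slope
    a = my - b * mx          # intercept: the line passes through (mean x, mean y)
    return a, b

hours = [1, 2, 3, 4, 5]                 # hypothetical study hours
scores = [52, 60, 61, 70, 77]           # hypothetical test scores
a, b = lsrl(hours, scores)
predicted = a + b * 6                   # predicted score for 6 hours
print(round(a, 2), round(b, 2), round(predicted, 2))
```

Predicting at x = 6 goes slightly beyond the observed hours (extrapolation), so the result should be read with caution.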

27

Conditional Proportions

The proportion of cases with a given outcome among only those that meet a specific condition (e.g., Agree | Male: the proportion of males who agree).

28

Marginal Proportions

The proportion of a specific event out of the total. (e.g. % that agree)
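To illustrate the difference between marginal and conditional proportions, here is a small sketch using an invented two-way table of counts (sex by response):

```python
# Hypothetical two-way table: (sex, response) -> count
counts = {
    ("Male", "Agree"): 30, ("Male", "Disagree"): 20,
    ("Female", "Agree"): 35, ("Female", "Disagree"): 15,
}
total = sum(counts.values())

# Marginal proportion: everyone who agrees, out of the grand total
agree_total = sum(n for (sex, resp), n in counts.items() if resp == "Agree")
marginal_agree = agree_total / total

# Conditional proportion: those who agree, out of males only (Agree | Male)
male_total = sum(n for (sex, resp), n in counts.items() if sex == "Male")
agree_given_male = counts[("Male", "Agree")] / male_total

print(marginal_agree, agree_given_male)
```

The only difference between the two is the denominator: the grand total for a marginal proportion, the row or column total for a conditional one.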

29

Mosaic Plot

A visual representation of the relationship between two categorical variables, where the width of each bar represents the marginal proportion of one variable, and the height of each segment within a bar represents the conditional proportion of the other variable.

30

Residual Plot

A graph that plots the residuals (the differences between observed and predicted values) against the predictor values.
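A short sketch of computing the residuals that such a plot displays. The data are made up, and the line ŷ = 46 + 6x is assumed to be the least squares line for them.

```python
hours = [1, 2, 3, 4, 5]                 # hypothetical predictor values
scores = [52, 60, 61, 70, 77]           # hypothetical observed values

def predict(x):
    """Assumed least squares line for this data: predicted y = 46 + 6x."""
    return 46 + 6 * x

# residual = observed - predicted
residuals = [y - predict(x) for x, y in zip(hours, scores)]

# The points (x, residual) are what a residual plot shows
for x, res in zip(hours, residuals):
    print(x, res)
```

For a least squares line the residuals sum to zero, and a residual plot with no clear pattern suggests that a linear model is appropriate.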

31

Influential Points

Points that have a disproportionately large impact on the position of the LSRL.
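As an illustration, the sketch below fits the slope with and without one added point that is far from the others in the x-direction. The data are invented, and the slope is computed as Sxy / Sxx, which is algebraically the same as r(s_y / s_x).

```python
from statistics import mean

def slope(xs, ys):
    """Least squares slope: sum of (x - mean x)(y - mean y) over sum of (x - mean x)^2."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

hours = [1, 2, 3, 4, 5]                 # hypothetical data
scores = [52, 60, 61, 70, 77]

b_without = slope(hours, scores)
b_with = slope(hours + [15], scores + [50])   # add one unusual point (15, 50)

print(round(b_without, 2), round(b_with, 2))
```

In this example a single added point is enough to change the sign of the slope, which is what makes it influential.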