Flashcards for reviewing distributions and two-variable data.
Shape (Distributions)
Refers to the overall form of the distribution, such as symmetric, skewed (left or right), uniform, or bimodal.
Outliers (Distributions)
Values that lie far away from the majority of the data. Represented as dots on modified boxplots.
Center (Distributions)
The typical or average value in a dataset. Measures include the mean (non-resistant) and median (resistant).
Spread (Distributions)
Describes how the data is dispersed or scattered. Measures include range (max - min, non-resistant), IQR (Q3 - Q1, resistant), and standard deviation (non-resistant).
5-Number Summary
Consists of the minimum, Q1, median, Q3, and maximum values of a dataset. Used to create boxplots.
Boxplots
A visual representation of the 5-number summary. Modified boxplots show outliers as individual points.
Cumulative Graph
A graph where the x-axis represents the data and the y-axis represents the cumulative relative frequency (percentiles).
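The y-values of a cumulative graph can be sketched as running totals of the relative frequencies. The data set here is hypothetical, and the variable names are chosen only for this example.

```python
from collections import Counter
from itertools import accumulate

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]  # hypothetical data
counts = Counter(data)
values = sorted(counts)                 # x-axis: distinct data values

# Running total of counts, then divide by n to get cumulative relative frequency.
cum_counts = list(accumulate(counts[v] for v in values))
cum_rel = [c / len(data) for c in cum_counts]

for v, p in zip(values, cum_rel):
    print(v, p)
```

The last cumulative relative frequency is always 1.0, since every value is at or below the maximum.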
Uniform Distribution
A distribution where all values have approximately equal relative frequency.
Right Skewed Distribution
A distribution where the tail extends to the right, indicating higher values are more spread out.
Left Skewed Distribution
A distribution where the tail extends to the left, indicating lower values are more spread out.
Bimodal Distribution
A distribution with two distinct peaks, indicating two common ranges of values.
Unimodal Distribution
A distribution with one distinct peak.
Symmetric Distribution
A distribution where the two halves are mirror images of each other.
Non-Resistant Measures
Statistical measures that are greatly affected by outliers. Examples: mean, standard deviation, range.
Resistant Measures
Statistical measures that are not greatly affected by outliers. Examples: median, IQR.
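A small illustration of resistance, using made-up data: adding one extreme value moves the mean noticeably more than the median.

```python
from statistics import mean, median

base = [2, 4, 4, 5, 7, 8, 9, 10]   # hypothetical data
with_outlier = base + [30]         # same data plus one extreme value

mean_shift = mean(with_outlier) - mean(base)        # non-resistant: larger shift
median_shift = median(with_outlier) - median(base)  # resistant: smaller shift

print(round(mean_shift, 3), median_shift)
```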
IQR (Interquartile Range)
The difference between the third quartile (Q3) and the first quartile (Q1). A measure of statistical dispersion.
Outlier Lower Boundary (LB)
Q1 - 1.5(IQR). Any data point below this is considered an outlier.
Outlier Upper Boundary (UB)
Q3 + 1.5(IQR). Any data point above this is considered an outlier.
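The two boundaries can be sketched as a single helper. The function name `outlier_fences` is hypothetical, and the quartiles passed in (Q1 = 4.5, Q3 = 9.5) are assumed values for the made-up data shown.

```python
def outlier_fences(q1, q3):
    """Return (lower boundary, upper boundary) from the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

scores = [2, 4, 4, 5, 7, 8, 9, 10, 30]   # hypothetical data
lower, upper = outlier_fences(4.5, 9.5)  # assumed quartiles for this data

# A value is flagged only if it falls strictly outside the boundaries.
outliers = [x for x in scores if x < lower or x > upper]
print(lower, upper, outliers)  # -3.0 17.0 [30]
```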
z-score
A measure of how many standard deviations a data point is from the mean. z = (data point - mean) / standard deviation
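A one-line sketch of the formula, with hypothetical numbers (a score of 85 in a distribution with mean 70 and standard deviation 10):

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(85, 70, 10)  # hypothetical score, mean, and standard deviation
print(z)  # 1.5
```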
Empirical Rule (68-95-99.7 Rule)
In a normal distribution, approximately 68% of values fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3.
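The three percentages can be checked against a standard normal model with Python's built-in `statistics.NormalDist`. This is only a quick verification sketch; the dictionary name `within` is chosen for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(k, round(area, 4))
```

The exact areas are roughly 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.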
Linear Transformation
Transforming data using the equation ax + b. Adding 'b' shifts measures of center but leaves spread unchanged; multiplying by 'a' multiplies measures of center by a and measures of spread by |a|.
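A short sketch of how a linear transformation affects center and spread, using made-up data and hypothetical constants a = 2 and b = 10:

```python
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]          # hypothetical data
a, b = 2, 10                 # hypothetical multiplier and shift
y = [a * v + b for v in x]   # transformed data

# Center: multiplied by a, then shifted by b.
print(mean(x), mean(y))
# Spread: multiplied by |a|; adding b has no effect.
print(stdev(x), stdev(y))
```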
Correlation
A statistical measure (r) that describes the strength and direction of the linear relationship between two variables. Ranges from -1 to +1.
Coefficient of Determination (R-squared)
The proportion of the variance in the dependent variable that is predictable from the independent variable(s).
Least Squares Regression Line (LSRL)
The line of best fit: the line that minimizes the sum of the squared residuals. Represented by the equation ŷ = a + bx, where ŷ is the predicted value of y.
Slope (b) in LSRL
The predicted change in y for every one-unit increase in x. b = r(Sy/Sx)
y-intercept (a) in LSRL
The predicted value of y when x is 0. a = ȳ - b(x̄), where x̄ and ȳ are the sample means of x and y.
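The slope and intercept formulas can be sketched together as follows. The function name `lsrl` and the data are hypothetical; r is computed here as the average product of z-scores (dividing by n - 1), and the intercept uses the sample means of x and y.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b, r) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Correlation: sum of products of z-scores, divided by n - 1.
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x-bar, y-bar)
    return a, b, r

hours = [1, 2, 3, 4, 5]    # hypothetical explanatory values
scores = [2, 4, 5, 4, 5]   # hypothetical response values
a, b, r = lsrl(hours, scores)
print(round(a, 3), round(b, 3), round(r, 3), round(r ** 2, 3))
```

For simple linear regression with one explanatory variable, squaring r gives the coefficient of determination.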
Conditional Proportions
The proportion of events given a specific condition. (e.g. Agree | Male: the proportion of males who agree)
Marginal Proportions
The proportion of a specific event out of the total. (e.g. % that agree)
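The difference between the two kinds of proportion can be sketched with a small two-way table. The counts and category names below are invented purely for illustration.

```python
# Hypothetical two-way table of counts: rows are groups, columns are responses.
table = {
    "Male":   {"Agree": 30, "Disagree": 20},
    "Female": {"Agree": 45, "Disagree": 5},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal proportion: everyone who agrees, out of the grand total.
marginal_agree = sum(row["Agree"] for row in table.values()) / grand_total

# Conditional proportion (Agree | Male): males who agree, out of all males.
agree_given_male = table["Male"]["Agree"] / sum(table["Male"].values())

print(marginal_agree, agree_given_male)  # 0.75 0.6
```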
Mosaic Plot
A visual representation of the relationship between two categorical variables, where the widths of the bars represent the marginal proportions of one variable, and the heights of the segments within the bars represent the conditional proportions of the other variable.
Residual Plot
A graph that plots the residuals (the differences between observed and predicted values) against the predictor values.
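The points of a residual plot can be sketched as (x, residual) pairs, with each residual taken as observed minus predicted. The data and the line ŷ = 2.2 + 0.6x are assumed values for illustration only.

```python
hours = [1, 2, 3, 4, 5]    # hypothetical predictor values
scores = [2, 4, 5, 4, 5]   # hypothetical observed values
a, b = 2.2, 0.6            # assumed LSRL intercept and slope for this data

# Residual = observed y - predicted y.
residuals = [y - (a + b * x) for x, y in zip(hours, scores)]

# Each (x, residual) pair is one point on the residual plot.
for x, res in zip(hours, residuals):
    print(x, round(res, 2))
```

For a least squares line the residuals sum to zero (up to rounding), and a plot with no clear pattern suggests a linear model is reasonable.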
Influential Points
Points that have a disproportionately large impact on the position of the LSRL.