Describe a distribution
CUSS and BS
Center- mean; if skewed, use the median
Unusual Features- "potential outliers"
Shape- skew, modal, normal, symmetrical, uniform
Spread- SD w/ mean, IQR w/ median
Outlier Rule
A value that falls more than 1.5 × IQR above Q3 or below Q1
Lower Outlier < Q1 - 1.5 × IQR
Upper Outlier > Q3 + 1.5 × IQR
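The outlier rule can be sketched in Python as a quick check. This is a minimal sketch, not a definitive implementation: the function names and the data are made up for illustration, and the quartiles are computed with the median-of-halves convention, which is an assumption (other software may use a slightly different quartile rule).

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves convention (an assumption).

    Needs at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]      # values below the median position
    upper = data[-half:]     # values above the median position
    return median(lower), median(upper)

def outlier_fences(values):
    """Return (lower fence, upper fence) from the 1.5 x IQR rule."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values):
    """Return the values that fall outside either fence."""
    low, high = outlier_fences(values)
    return [v for v in values if v < low or v > high]

data = [2, 4, 5, 6, 7, 8, 9, 30]   # made-up data
print(quartiles(data))       # (4.5, 8.5)
print(outlier_fences(data))  # (-1.5, 14.5)
print(find_outliers(data))   # [30]
```

Here Q1 = 4.5 and Q3 = 8.5, so IQR = 4 and the fences sit at 4.5 - 6 = -1.5 and 8.5 + 6 = 14.5. Only 30 lies beyond a fence.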
How can we use a graph to compare the mean and the median?
The mean is pulled toward the tail; the median stays near the peak
Skewed Left: Mean < median
Roughly Symmetric: mean ≈ median
Skewed Right: Mean > median
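A small numeric check of the mean/median pattern, as a sketch only. The three data sets are made up, and the `tolerance` cutoff for "roughly equal" is an arbitrary assumption chosen for this illustration, not a standard rule.

```python
from statistics import mean, median

def compare_mean_median(values, tolerance=0.5):
    """Describe where the mean sits relative to the median.

    `tolerance` is a hypothetical cutoff for calling the two "close".
    """
    gap = mean(values) - median(values)
    if gap > tolerance:
        return "mean > median (suggests right skew)"
    if gap < -tolerance:
        return "mean < median (suggests left skew)"
    return "mean close to median (suggests rough symmetry)"

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]        # long tail to the right
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]   # long tail to the left
symmetric = [1, 2, 3, 4, 5]

for name, values in [("right", right_skewed),
                     ("left", left_skewed),
                     ("symmetric", symmetric)]:
    print(name, mean(values), median(values), compare_mean_median(values))
```

In the right-skewed set the single large value drags the mean up to 4.75 while the median stays at 3. The comparison only suggests a shape; a graph is still needed to describe it.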
Interpret the standard deviation
The standard deviation is the typical distance of the values from the mean
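To see where that "typical distance" comes from, the sample standard deviation can be built up step by step. This is a sketch with made-up data. It assumes the sample version (dividing by n - 1), which is the Sx value on a graphing calculator.

```python
from math import sqrt
from statistics import mean, stdev

values = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data

center = mean(values)                               # 5
deviations = [v - center for v in values]           # distance of each value from the mean
squared = [d ** 2 for d in deviations]              # squaring removes the signs
sample_sd = sqrt(sum(squared) / (len(values) - 1))  # divide by n - 1, then take the root

print(round(sample_sd, 3))       # 2.138
print(round(stdev(values), 3))   # same value from the standard library
```

In context, the interpretation would read: these values typically vary by about 2.14 units from the mean of 5.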
How do we describe the relationship between two variables (like in a scatterplot)?
Direction- positive/negative
Unusual features- outliers, influential observations
Form- linear or curved
Strength- weak --> strong
Compare two distributions
CUSS + BS
Use comparison words: "similar to" "less/greater than"
How to find the mean, SD, and 5-number summary using a graphing calculator
Enter data in List 1
Stat --> Calc
1-Var Stats
Leave "FreqList" blank. Select Calculate.
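For checking calculator output by other means, the same summary can be sketched in Python. The function name `one_var_stats` and the data are hypothetical, and the quartiles use the median-of-halves convention, assumed here to match what the calculator reports.

```python
from statistics import mean, median, stdev

def one_var_stats(values):
    """Return the mean, sample SD, n, and five-number summary of a list."""
    data = sorted(values)
    half = len(data) // 2
    return {
        "mean": mean(data),
        "Sx": stdev(data),             # sample standard deviation
        "n": len(data),
        "min": data[0],
        "Q1": median(data[:half]),     # median of the lower half
        "median": median(data),
        "Q3": median(data[-half:]),    # median of the upper half
        "max": data[-1],
    }

summary = one_var_stats([2, 4, 4, 4, 5, 5, 7, 9])   # made-up data
for label, value in summary.items():
    print(label, value)
```

For this data the five-number summary is 2, 4, 4.5, 6, 9.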
How to calculate a LSRL using a graphing calculator
Enter the x-values in L1 and the y-values in L2
Stat --> Calc
8: LinReg (a+bx)
Leave "FreqList" blank. Select Calculate.
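The calculator's a + bx output can be reproduced from the least-squares formulas: slope b = sum of (x - x̄)(y - ȳ) divided by sum of (x - x̄)², and intercept a = ȳ - b·x̄. A minimal sketch, with a hypothetical function name and made-up data:

```python
from statistics import mean

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x-bar, y-bar)
    return a, b

xs = [1, 2, 3, 4, 5]   # made-up x-values (List 1)
ys = [2, 4, 5, 4, 5]   # made-up y-values (List 2)
a, b = lsrl(xs, ys)
print(round(a, 4), round(b, 4))   # 2.2 0.6
```

For this data the line is y-hat = 2.2 + 0.6x.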
What is the IQR?
The Interquartile range (IQR) is defined as the difference between the third and first quartiles: Q3 - Q1
Q1 and Q3 form the boundaries for the middle 50% of values in an ordered data set
How do I calculate the percentile of a particular value in a data set?
-Order the data (little Lexi to the left)
-Count the # of values that are less than or equal to the value of interest
-Count the # of values in the data set
Percentile = (# of values less than or equal to the value of interest) / (# of values in the data set) (express the decimal as a percent)
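The percentile steps above can be sketched as a short function. The name `percentile_of` and the scores are made up for illustration. Note that the count does not actually depend on the ordering, which mainly helps when counting by hand.

```python
def percentile_of(value, data):
    """Percent of values in `data` that are less than or equal to `value`."""
    at_or_below = sum(1 for v in data if v <= value)
    return 100 * at_or_below / len(data)

scores = [55, 60, 62, 70, 70, 75, 80, 85, 90, 95]   # made-up test scores
print(percentile_of(70, scores))   # 50.0
```

Five of the ten scores are less than or equal to 70, so a score of 70 is at the 50th percentile.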
Interpret the y-intercept of the Least Squares Regression Line
The PREDICTED y-context when x-context is 0 is (y-intercept value)
Interpret the slope of the Least Squares Regression Line
The PREDICTED y-context increases/decreases by (slope) for each additional 1 unit of x-context
Interpret the coefficient of determination (r^2)
The coefficient of determination gives the percent of the variation in y-context that is explained by the least squares regression line using x = x-context
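One way to see "percent of variation explained" is to compute r^2 as 1 - SSE/SST, where SSE is the sum of squared residuals from the regression line and SST is the sum of squared deviations of y from its mean. This is a sketch with made-up data and a hypothetical function name:

```python
from statistics import mean

def r_squared(xs, ys):
    """Return r^2 for the least-squares line fitted to (xs, ys)."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)                    # total variation in y
    return 1 - sse / sst

xs = [1, 2, 3, 4, 5]   # made-up data
ys = [2, 4, 5, 4, 5]
r2 = r_squared(xs, ys)
print(round(r2, 4))   # 0.6
```

Here SSE = 2.4 and SST = 6, so r^2 = 0.6: about 60% of the variation in y is explained by the regression line using x.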
Properties of correlation (r)
-'r' is unitless
-'r' is always between -1 and 1
-'r' is greatly affected by regression outliers
-If the direction is negative, then 'r' < 0
-If the direction is positive, then 'r' > 0
-The closer 'r' is to -1 or 1, the stronger the relationship
-The closer that 'r' is to 0, the WEAKER the relationship
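A few of these properties can be checked numerically. The sketch below uses made-up heights and weights and a hypothetical `correlation` helper. It shows that changing units leaves r unchanged and that reversing the direction flips the sign.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Return the correlation r between two lists of equal length."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

heights_in = [60, 62, 65, 68, 70]        # made-up heights in inches
weights_lb = [115, 120, 140, 155, 160]   # made-up weights in pounds

r_inches = correlation(heights_in, weights_lb)

# Converting inches to centimeters changes the units but not r.
heights_cm = [h * 2.54 for h in heights_in]
r_cm = correlation(heights_cm, weights_lb)

# Negating y reverses the direction, so r changes sign.
r_reversed = correlation(heights_in, [-w for w in weights_lb])

print(round(r_inches, 4), round(r_cm, 4), round(r_reversed, 4))
```

The first two printed values match, and the third is the same size with the opposite sign.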
Regression Outlier
An outlier in regression is a point that does not follow the general trend shown in the rest of the data and has a large residual
Correlation (r)
gives the strength and direction of the linear relationship between 2 quantitative variables
High-Leverage Point
A high-leverage point in regression has a substantially larger or smaller x-value than other observations have
Influential Point
An influential point in regression is any point that, if removed, changes the relationship substantially (creates big changes to slope and/or intercept)
Outliers and high-leverage points are often influential
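To illustrate, the sketch below fits a least-squares line with and without one added point whose x-value is far from the rest. The data and the `lsrl` helper are made up for this example. The added point (15, 2) is both high-leverage and off the trend.

```python
from statistics import mean

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

xs = [1, 2, 3, 4, 5]   # made-up data with a positive trend
ys = [2, 4, 5, 4, 5]

a_without, b_without = lsrl(xs, ys)
a_with, b_with = lsrl(xs + [15], ys + [2])   # add the point (15, 2)

print("without point:", round(a_without, 3), round(b_without, 3))
print("with point:   ", round(a_with, 3), round(b_with, 3))
```

Without the extra point the slope is 0.6. With it, the slope turns negative, so that single point is influential.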
What is the difference between categorical and quantitative variables?
A categorical variable takes on values that are category names or group labels
A quantitative variable is one that takes on numerical values for a measured or counted quantity
What is the difference between discrete and continuous variables?
A discrete variable can take on a countable number of values. The number of values may be finite or infinite. (THINK: Discrete = countable, e.g., number of people)
A continuous variable can take on infinitely many values, but those values cannot be counted. (THINK: Continuous = must be measured, e.g., height)