Statistics Flashcards
Skewness and Extreme Values
- Skewness is driven by extreme values: a few unusually high or low values stretch one tail of the distribution.
Skewed Right and Income
- In a skewed right distribution, extreme high values pull the mean to the right.
- The average income in the US is around \$55,000.
- The top 1% of income earners make \$250,000 or more per year.
- Most people in the US are in the top 1% of income earners globally.
Uniform Distribution
- All outcomes are equally likely (e.g., rolling a single die).
- Each number on a die has a 1/6 chance of being rolled.
- Uniform distributions are rare in nature.
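As a minimal sketch of the die example above, assuming a fair six-sided die, the uniform probabilities can be written out explicitly. The names `faces` and `probabilities` are illustrative only.

```python
from fractions import Fraction

# Faces of a standard six-sided die (assumed fair).
faces = [1, 2, 3, 4, 5, 6]

# Under a uniform distribution every face gets the same probability.
probabilities = {face: Fraction(1, len(faces)) for face in faces}

for face, p in probabilities.items():
    print(face, p)  # every face is paired with the probability 1/6

# The probabilities of all outcomes must add up to 1.
total = sum(probabilities.values())
```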
Standardized Exams and Skewness
- Standardized exam scores (e.g., SAT) are typically skewed to the left.
- This is because there are always some people who do not study and score poorly, which stretches the tail toward the low scores.
Dice Experiment
- Two fair dice are thrown 100 times.
- Pips are the dots on the dice.
- A frequency histogram records the sum of the pips.
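The experiment above can be simulated roughly as follows. This is only a sketch: the seed is arbitrary, the function name `throw_two_dice` is made up for illustration, and the counts it produces will not match the histogram from class.

```python
import random
from collections import Counter

random.seed(0)  # arbitrary seed, chosen only so the run is repeatable

def throw_two_dice(n_throws):
    """Return a Counter mapping each sum of pips to how often it occurred."""
    sums = [random.randint(1, 6) + random.randint(1, 6) for _ in range(n_throws)]
    return Counter(sums)

frequencies = throw_two_dice(100)

# Text version of the frequency histogram: one '#' per observation.
for total in range(2, 13):
    print(f"{total:>2} | {'#' * frequencies[total]}")

# The most frequent outcome corresponds to the tallest bar.
most_frequent_sum, count = frequencies.most_common(1)[0]
print("Most frequent sum:", most_frequent_sum, "observed", count, "times")
```

Possible sums run from 2 to 12, and the bar heights always add up to the number of throws.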
Analyzing the Histogram
- The most frequent outcome is determined from the highest bar.
- Analyzing the graph requires understanding how information is laid out.
- The difference between two numbers is found by subtracting.
Percentage Calculation
- To determine the percentage of throws in which a sum of three was observed:
- A three was observed 5 times.
- Percentage = (Part / Whole) * 100
- Percentage = (5 / 100) * 100 = 5\%
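The same calculation as a short sketch, using the counts from the notes (5 threes out of 100 throws). The helper name `percentage` is an illustrative choice.

```python
def percentage(part, whole):
    """Return part as a percentage of whole."""
    return (part / whole) * 100

times_three_observed = 5  # from the notes
total_throws = 100        # from the notes

result = percentage(times_three_observed, total_throws)
print(f"{result}%")  # 5.0%
```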
Conversions
- Convert decimals to percentages and percentages to decimals.
- Converting fractions to decimals, or decimals to fractions, is not covered in this class.
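A minimal sketch of the two conversions that are covered: multiply a decimal by 100 to get a percentage, and divide a percentage by 100 to get a decimal. The function names and sample values are illustrative assumptions.

```python
def decimal_to_percent(value):
    """Convert a decimal such as 0.25 to a percentage such as 25.0."""
    return value * 100

def percent_to_decimal(value):
    """Convert a percentage such as 40 to a decimal such as 0.4."""
    return value / 100

print(decimal_to_percent(0.25))  # 25.0
print(percent_to_decimal(40))    # 0.4
```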
Histogram Shape
- The shape of the histogram in this experiment is close to bell-shaped.
Measures of Central Tendency
Average (Mean)
- The average can be calculated as the sum of all data points divided by the number of observations.
- For example, average test grade: (80 + 90 + 70 + 85) / 4 = 81.25
- This specific type of average is called the arithmetic mean.
- Population mean is represented by the symbol \mu.
- Formula: \mu = \frac{\sum x_i}{N}, where \sum x_i is the sum of all values and N is the number of values.
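The formula can be checked against the test-grade example with a short sketch. The function name `population_mean` is an illustrative choice.

```python
def population_mean(values):
    """Sum of all values divided by the number of values (mu = sum(x_i) / N)."""
    return sum(values) / len(values)

grades = [80, 90, 70, 85]  # test grades from the example above

mu = population_mean(grades)
print(mu)  # 81.25
```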
Median
- The median is the middle value in a dataset sorted from least to greatest.
- If the dataset is: 1, 2, 3, 4, 5, the median is 3.
Even Number of Data Points
- When there is an even number of data points, take the two middle numbers and find their average.
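Both cases can be sketched in a few lines. The odd-length dataset is the one from the notes; the even-length dataset 1, 2, 3, 4, 5, 6 is an assumed example added for illustration.

```python
def median(values):
    """Return the middle value of the sorted data.

    With an even number of data points, return the average
    of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 2, 3, 4, 5]))     # 3
print(median([1, 2, 3, 4, 5, 6]))  # 3.5
```

The data is sorted first, so the function also works when the values are given out of order.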
Mean vs. Median
- If the dataset has extreme values, the mean can be significantly affected.
- The mean is sensitive to extreme values; the median is resistant (stubborn).
- If data is symmetric/bell-shaped, the mean is a better representation.
- If data is skewed, the median is a better representation.
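To see why the median is called resistant, here is a small sketch with made-up incomes (hypothetical numbers, not real data). Adding one extreme value moves the mean a long way but barely moves the median.

```python
import statistics

# Hypothetical incomes, invented for illustration only.
incomes = [30_000, 35_000, 40_000, 45_000, 50_000]
with_extreme = incomes + [1_000_000]  # add one extremely high income

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_extreme)
median_after = statistics.median(with_extreme)

# The mean jumps from 40,000 to 200,000.
print("mean:  ", mean_before, "->", mean_after)
# The median only moves from 40,000 to 42,500.
print("median:", median_before, "->", median_after)
```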
Examples
- Household income: Median is better due to the influence of extremely high incomes.
- Height of adult males: Mean is suitable because it is bell-shaped.
- Crime rates in cities: Likely skewed due to disparities.
Natural Processes
- Height of adult males, number of eggs a hen lays, and weight of newborns are approximately bell-shaped.
- The distance of the electron to the proton in an atom has a bell-shaped distribution.
Mode
- Mode is the value that appears most frequently in a dataset (highest repetition).
- Best used for qualitative (categorical) data, where a mean or median cannot be computed.
Examples
- Streaming services used the most.
- Soda people drink the most.
- Cars people drive the most.
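A minimal sketch of finding the mode of qualitative data. The survey responses below are hypothetical placeholders, not real results.

```python
from collections import Counter

# Hypothetical survey answers to "Which streaming service do you use most?"
responses = [
    "Service A", "Service B", "Service A",
    "Service C", "Service A", "Service B",
]

counts = Counter(responses)

# The mode is the category with the highest count.
mode, frequency = counts.most_common(1)[0]
print(mode, frequency)  # Service A 3
```

If two categories tie for the highest count, the data has more than one mode, and `most_common(1)` reports only one of them.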