Statistics Flashcards

Skewness and Extreme Values

  • Skewness is driven by extreme values: a few unusually large or small observations stretch one tail of the distribution.

Skewed Right and Income

  • In a skewed right distribution, extreme high values pull the mean to the right.
  • The average income in the US is around \$55,000.
  • The top 1% of income earners make \$250,000 or more per year.
  • Most people in the US are in the top 1% of income earners globally.

Uniform Distribution

  • All outcomes are equally likely (e.g., rolling a single die).
  • Each number on a die has a 1/6 chance of being rolled.
  • Uniform distributions are rare in nature.
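
As a small illustration of the die example, the sketch below assigns the same probability to every face and checks that the probabilities add up to 1. The variable names are chosen here for illustration only.

```python
from fractions import Fraction

# Faces of a standard six-sided die.
faces = [1, 2, 3, 4, 5, 6]

# In a uniform distribution every outcome gets the same probability:
# 1 divided by the number of possible outcomes.
probabilities = {face: Fraction(1, len(faces)) for face in faces}

for face, p in probabilities.items():
    print(f"P({face}) = {p}")

# The probabilities of all outcomes add up to 1.
print("Total:", sum(probabilities.values()))
```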

Standardized Exams and Skewness

  • Standardized exam scores (e.g., SAT) are typically skewed to the left.
  • This is because there are always some people who do not study and score poorly, which stretches the tail toward the low scores.

Dice Experiment

  • Two fair dice are thrown 100 times.
  • Pips are the dots on the dice.
  • A frequency histogram records the sum of the pips.
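
The experiment can be imitated on a computer. The sketch below is a simulation, not the data collected in class: the seed value and the variable names are assumptions made here so that the run is repeatable.

```python
import random
from collections import Counter

random.seed(0)  # assumed seed, used only so the run is repeatable

NUM_THROWS = 100

# Throw two fair dice NUM_THROWS times and record the sum of the pips each time.
sums = [random.randint(1, 6) + random.randint(1, 6) for _ in range(NUM_THROWS)]

# Count how often each sum occurred.
frequency = Counter(sums)

# Print a simple text histogram: one '#' per observation.
for total in range(2, 13):
    print(f"{total:2d} | {'#' * frequency[total]}")
```

Each row plays the role of one bar of the frequency histogram; a different seed gives a slightly different picture.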

Analyzing the Histogram

  • The most frequent outcome is determined from the highest bar.
  • Analyzing the graph requires understanding how information is laid out.
  • The difference between two numbers is found by subtracting.
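
Reading a histogram can be sketched in code as well. The frequencies below are made up for illustration (they are not the class data); they only keep the detail that a three was observed 5 times out of 100 throws.

```python
# Hypothetical frequencies for the sum of two dice over 100 throws
# (illustrative numbers only, not the histogram from class).
frequency = {2: 3, 3: 5, 4: 8, 5: 11, 6: 14, 7: 18,
             8: 14, 9: 11, 10: 8, 11: 5, 12: 3}

# The most frequent outcome is the sum with the tallest bar.
most_frequent = max(frequency, key=frequency.get)

# The difference between two bars is found by subtracting their heights.
difference = frequency[7] - frequency[3]

print("Most frequent sum:", most_frequent)
print("A seven was observed", difference, "more times than a three")
```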

Percentage Calculation

  • To determine the percentage of time a three was observed:
    • A three was observed 5 times.
    • Percentage = (Part / Whole) * 100
    • Percentage = (5 / 100) * 100 = 5\%
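
The same part-over-whole calculation as a short function. The function and variable names are hypothetical, chosen here for illustration.

```python
def percentage(part, whole):
    """Return part as a percentage of whole: (part / whole) * 100."""
    return part / whole * 100

threes_observed = 5   # a three was observed 5 times
total_throws = 100    # out of 100 throws

print(f"A three was observed {percentage(threes_observed, total_throws)}% of the time")
```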

Conversions

  • Convert decimals to percentages and percentages to decimals.
  • Converting between fractions and decimals is not covered in this class.
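
A minimal sketch of the two conversions that are covered: multiply a decimal by 100 to get a percentage, and divide a percentage by 100 to get a decimal. The function names are assumptions made for this example.

```python
def decimal_to_percent(value):
    """Convert a decimal such as 0.25 to a percentage such as 25."""
    return value * 100

def percent_to_decimal(percent):
    """Convert a percentage such as 5 to a decimal such as 0.05."""
    return percent / 100

print(decimal_to_percent(0.25), "%")
print(percent_to_decimal(5))
```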

Histogram Shape

  • The shape of the histogram in this experiment is close to bell-shaped.
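
One way to see why the histogram comes out close to bell-shaped is to count how many of the 36 equally likely pairs of faces produce each sum. This is a sketch of the theoretical counts, not of the 100 observed throws; strictly speaking the exact shape is a symmetric triangle peaking at 7.

```python
from collections import Counter
from itertools import product

# All 36 equally likely (first die, second die) pairs.
outcomes = list(product(range(1, 7), repeat=2))

# Number of pairs that produce each sum.
ways = Counter(a + b for a, b in outcomes)

for total in range(2, 13):
    print(f"{total:2d} | {'#' * ways[total]} ({ways[total]}/36)")
```

The counts rise from 1 way to roll a 2 up to 6 ways to roll a 7, then fall back symmetrically to 1 way to roll a 12.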

Measures of Central Tendency

Average (Mean)

  • The average can be calculated as the sum of all data points divided by the number of observations.
  • For example, the average of the test grades 80, 90, 70, and 85 is (80 + 90 + 70 + 85) / 4 = 81.25.
  • This specific type of average is called the arithmetic mean.
  • Population mean is represented by the symbol \mu.
  • Formula: \mu = \frac{\sum x_i}{N}, where \sum x_i is the sum of all values and N is the number of values.
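
The formula can be checked on the four test grades from the example above. This sketch assumes the four grades are the whole population, so the result plays the role of \mu.

```python
# The four test grades, treated here as the whole population (an assumption).
grades = [80, 90, 70, 85]

total = sum(grades)   # sum of all values
n = len(grades)       # N, the number of values

mu = total / n        # population mean
print("Sum:", total, "N:", n, "Mean:", mu)
```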

Median

  • The median is the middle value in a dataset sorted from least to greatest.
  • If the dataset is: 1, 2, 3, 4, 5, the median is 3.
Even Number of Data Points
  • When there is an even number of data points, take the two middle numbers and find their average.
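
Both cases of the median can be sketched in one short function. The function name is chosen here for illustration; Python's built-in statistics.median follows the same rule.

```python
def median(values):
    """Return the middle value of the data, sorted from least to greatest."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of data points: take the single middle value.
        return ordered[middle]
    # Even number of data points: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([1, 2, 3, 4, 5]))  # odd count
print(median([4, 1, 3, 2]))     # even count, unsorted input
```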

Mean vs. Median

  • If the dataset has extreme values, the mean can be significantly affected.
  • The mean is sensitive to extreme values; the median is resistant (stubborn).
  • If data is symmetric/bell-shaped, the mean is a better representation.
  • If data is skewed, the median is a better representation.
Examples
  • Household income: Median is better due to the influence of extremely high incomes.
  • Height of adult males: Mean is suitable because it is bell-shaped.
  • Crime rates in cities: Likely skewed due to disparities between cities, so the median is the better representation.
Natural Processes
  • The height of adult males, the number of eggs a chicken lays, and the weight of newborn infants are approximately bell-shaped.
  • The distance from the electron to the proton in an atom also has a bell-shaped distribution.
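
The contrast between the mean and the median in this section can be seen with a few numbers. The incomes below are invented for illustration only; the point is how each measure reacts when one extremely high value is added.

```python
from statistics import mean, median

# Hypothetical household incomes (made-up numbers, roughly symmetric).
incomes = [30_000, 40_000, 50_000, 60_000, 70_000]
mean_before = mean(incomes)
median_before = median(incomes)

# Add one extremely high income, which skews the data to the right.
incomes_with_extreme = incomes + [5_000_000]
mean_after = mean(incomes_with_extreme)
median_after = median(incomes_with_extreme)

print("Before: mean =", mean_before, "median =", median_before)
print("After:  mean =", mean_after, "median =", median_after)
```

The single extreme income moves the mean from 50,000 to 875,000, while the median only moves from 50,000 to 55,000.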

Mode

  • Mode is the value that appears most frequently in a dataset (highest repetition).
  • Best used for qualitative data.
Examples
  • Streaming services used the most.
  • Soda people drink the most.
  • Cars people drive.
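
For qualitative data the mode is found by counting how often each category appears. The survey answers below are hypothetical, made up only to show the counting.

```python
from collections import Counter

# Hypothetical answers to "Which soda do you drink the most?" (invented data).
responses = ["cola", "root beer", "cola", "lemon-lime", "cola", "root beer"]

# Count the repetitions of each category.
counts = Counter(responses)

# The mode is the category with the highest count.
mode, repetitions = counts.most_common(1)[0]

print("Mode:", mode, "with", repetitions, "responses")
```

A mean or a median cannot be computed for categories like these, which is why the mode is the measure used for qualitative data.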