
Basic Statistical Concepts

Chapter Two: Basic Statistical Concepts

Overview of Statistical Concepts

  • Many students feel apprehensive about statistics.

  • The focus is on basic statistical ideas, primarily descriptive statistics: frequency, central tendency, and variability.

Descriptive Statistics

  • Descriptive statistics are methods for summarizing the data collected from experiments so that the data are easier to understand.

  • Components discussed include:

    • Frequency

    • Central tendency

    • Variability

Frequency
  • Frequency refers to how often a particular observation appears in the data.

  • Representation:

    • Generally represented using a histogram (bars) or a frequency polygon (a line graph). Both convey the same information.

    • Axes:

      • X-axis: Variable of interest (e.g., test scores).

      • Y-axis: Frequency of observations (how many times each value occurred).

  • Example:

    • If students take a standardized test, a histogram might show the following score distribution:

      • 1 person scored 350.

      • 2 people scored 400.

      • 3 people scored 450.

      • 4 people scored 500.
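
A minimal sketch of how the frequencies in the test-score example could be counted and drawn as a sideways text histogram, using Python's standard `collections.Counter`. The raw score list is reconstructed from the counts above; the bar format is an illustrative choice, not a standard.

```python
from collections import Counter

# Raw scores reconstructed from the example: one 350, two 400s, three 450s, four 500s
scores = [350, 400, 400, 450, 450, 450, 500, 500, 500, 500]

# Frequency = how many times each distinct score occurs
frequency = Counter(scores)

# Sideways histogram: score values (x-axis) down the left, bar length = frequency (y-axis)
for score in sorted(frequency):
    print(f"{score}: {'#' * frequency[score]}  ({frequency[score]})")
```

Each printed row corresponds to one bar of the histogram, so the 500 row has the longest bar.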

Types of Distributions
  • Many traits approximately follow a normal distribution, which is bell-shaped and symmetric, meaning:

    • If the curve is folded in half at the center, the two sides match.

    • Example: Height is likely to follow a normal distribution, with most people near the average height and progressively fewer toward the extremes.

  • Skewed Distributions:

    • Negatively skewed distribution:

      • Data cluster on the right side of the graph (e.g., a very easy exam where most scores are high).

      • Key feature: The tail points to the left (negative skew).

    • Positively skewed distribution:

      • Data cluster on the left side of the graph (e.g., the time needed to complete a very easy exam: most people finish quickly and a few take much longer).

      • Key feature: The tail points to the right (positive skew).

Recognizing Skewness
  • To determine skewness:

    • Place an imaginary arrowhead at the tail of the distribution:

      • For negative skew, the arrow points left (toward the lower values on the x-axis).

      • For positive skew, the arrow points right (toward the higher values on the x-axis).
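
The arrowhead rule is a visual check. A numerical counterpart, sketched below under stated assumptions, is the moment coefficient of skewness: the average cubed deviation from the mean divided by the standard deviation cubed (population form). Cubing keeps the sign, so a long left tail gives a negative value and a long right tail a positive one. The two data sets are hypothetical, chosen to mimic the easy-exam examples above.

```python
from statistics import fmean

def skewness(data):
    """Population moment coefficient of skewness: m3 / m2**1.5."""
    m = fmean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # average squared deviation
    m3 = sum((x - m) ** 3 for x in data) / n  # average cubed deviation (keeps the sign)
    if m2 == 0:
        return 0.0  # all values identical: no spread, so no skew
    return m3 / m2 ** 1.5

def describe_skew(data, tol=1e-9):
    """Translate the sign of the coefficient into the tail direction."""
    g1 = skewness(data)
    if abs(g1) < tol:
        return "roughly symmetric"
    return "negative: tail points left" if g1 < 0 else "positive: tail points right"

# Hypothetical data: scores on a very easy exam (one low straggler)
easy_exam_scores = [60, 85, 90, 92, 95, 95, 98]
# Hypothetical data: minutes to finish that exam (one slow straggler)
minutes_to_finish = [10, 11, 12, 12, 13, 15, 40]

print(describe_skew(easy_exam_scores))
print(describe_skew(minutes_to_finish))
```

The single extreme value in each list dominates the cubed deviations, which is why one straggler is enough to set the direction of skew.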

Central Tendency

  • Central tendency provides a measure of the center of the data set. It can be measured in three primary ways:

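    • Mean: The arithmetic average, i.e., the sum of all values divided by the number of values.

    • Median: The middle value when the values are arranged in ascending order (with an even number of values, the average of the two middle values).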

    • Mode: The value that appears most frequently.
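
The three definitions can be checked with Python's standard `statistics` module. The data set below is hypothetical and deliberately has an even number of values, so the median is the average of the two middle values.

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7, 10]  # hypothetical data set, already in ascending order

print(mean(data))    # sum / count = 30 / 6 = 5
print(median(data))  # even count: average of the two middle values, (3 + 5) / 2 = 4.0
print(mode(data))    # most frequent value: 3 appears twice
```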

Central Tendency in Normal Distribution
  • In a normal distribution, the mean, median, and mode are all equal:

    • Example: If income were normally distributed with a mean of $30,000:

      • Mode = $30,000

      • Median = $30,000

      • Mean = $30,000
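
A small illustration of the equality, assuming a hypothetical, perfectly symmetric data set centred on 3. Real data that are only approximately normal will show mean, median, and mode that are close but rarely identical.

```python
from statistics import mean, median, mode

# Hypothetical symmetric data: frequencies 1, 2, 3, 2, 1 around the centre value 3
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

print(mean(symmetric), median(symmetric), mode(symmetric))  # 3 3 3
```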

Central Tendency in Skewed Distribution
  • In skewed distributions, the mean, median, and mode differ:

    • Example for a positively skewed distribution:

      • Typically, Mode < Median < Mean.

      • Extreme high values pull the mean up disproportionately while having much less effect on the median.

    • Numerical example:

      • For the set {1, 2, 3, 4, 5}:

        • Mean = 3

        • Median = 3

        • Mode: none (every value appears exactly once).

      • After adding an extreme observation (100), the set is {1, 2, 3, 4, 5, 100}:

        • New mean = 115 / 6 ≈ 19.2

        • New median = (3 + 4) / 2 = 3.5, so the new mean is far greater than the median.

  • Results:

    • The mode is largely unaffected by a few extreme values, because it depends only on which value occurs most often.

    • For skewed distributions, the median is a better reflection of a 'typical' value, because extreme values distort it much less than they distort the mean.
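
The effect of one extreme observation can be reproduced with Python's `statistics` module. `multimode` returns every most-frequent value, which makes it visible that {1, 2, 3, 4, 5} has no single mode. The second set, `has_mode`, is a hypothetical addition used only to show the mode staying put when an outlier is added.

```python
from statistics import mean, median, mode, multimode

base = [1, 2, 3, 4, 5]
with_outlier = base + [100]

print(mean(base), median(base))   # 3 3
print(multimode(base))            # [1, 2, 3, 4, 5]: all values tie, so no single mode
print(mean(with_outlier))         # 115 / 6, about 19.17: pulled far upward
print(median(with_outlier))       # (3 + 4) / 2 = 3.5: barely moves

# Hypothetical set that does have a mode, to show the mode ignoring the outlier
has_mode = [1, 2, 3, 3, 4, 5]
print(mode(has_mode), mode(has_mode + [100]))  # 3 3
```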

Variability

  • Variability measures how spread out the data points are, that is, how much they differ from each other.

  • Groups can share the same central tendency and still differ in variability:

    • Example: Two groups, each with an average score of 15:

      • In one group, scores cluster tightly around 15 (low variability).

      • In the other group, scores are more spread out from 15 (high variability).
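
A sketch of the two-group example with hypothetical scores. Both groups have a mean of 15; the spread is summarized two ways: the range (largest minus smallest value) and the population standard deviation from Python's `statistics.pstdev`.

```python
from statistics import mean, pstdev

tight = [14, 15, 15, 16]   # hypothetical group clustered near 15
spread = [5, 10, 20, 25]   # hypothetical group spread out from 15

for name, group in [("tight", tight), ("spread", spread)]:
    value_range = max(group) - min(group)
    print(name, mean(group), value_range, round(pstdev(group), 2))
# prints:
# tight 15 2 0.71
# spread 15 20 7.91
```

The identical means hide the difference; only the measures of variability reveal it.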

Standard Deviation
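  • One common measure of variability is the standard deviation:

    • It describes the typical distance of observations from the mean (for a population, the square root of the average squared deviation from the mean).

    • Graphical Representation (normal distribution):

      • The region from the mean to one standard deviation above it contains about 34% of observations, and the region from the mean to one standard deviation below it contains another 34%, so about 68% fall within ±1 standard deviation.

      • Each band further from the mean contains a smaller share of observations: about 13.6% between 1 and 2 standard deviations on each side, and about 2.1% between 2 and 3.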

    • Real-world Application:

      • For an exam with a mean score of 75% and a standard deviation of 10 percentage points, with scores roughly normally distributed, about 68% of students scored between:

        • 65% (75% - 10%)

        • 85% (75% + 10%).

      • By contrast, if the standard deviation were 5 percentage points, about 68% would score between:

        • 70% (75% - 5%)

        • 80% (75% + 5%).

  • Implications of Standard Deviation:

    • A smaller standard deviation indicates less variability and a clustering of scores around the mean; larger standard deviations indicate more spread and variability among scores.
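
The percentages and the exam intervals above can be checked with `statistics.NormalDist` from Python's standard library (Python 3.8+), assuming the scores really are normally distributed. `band_share` is a hypothetical helper name introduced for this sketch.

```python
from statistics import NormalDist

def band_share(k_low, k_high):
    """Share of a normal distribution lying between k_low and k_high SDs from the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k_high) - z.cdf(k_low)

# Share in each band above the mean (the bands below the mean are mirror images)
for start, end in [(0, 1), (1, 2), (2, 3)]:
    print(f"{start}-{end} SD: {band_share(start, end):.1%}")
# 0-1 SD: 34.1%, 1-2 SD: 13.6%, 2-3 SD: 2.1%

# Exam example: mean 75, SD 10 versus SD 5 (assuming roughly normal scores)
for sd in (10, 5):
    exam = NormalDist(mu=75, sigma=sd)
    low, high = 75 - sd, 75 + sd
    share = exam.cdf(high) - exam.cdf(low)
    print(f"SD {sd}: {share:.0%} of scores between {low} and {high}")
```

The share within ±1 standard deviation is about 68% regardless of the standard deviation's size; a smaller standard deviation simply makes that interval narrower.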