Statistics Review: Continuous vs. Discrete Variables, Standard Deviation, and Normal Distributions

Continuous vs. Discrete Variables: The PhD Percentage Example

  • Initial Confusion: Students often mistake percentages derived from counts for discrete variables because the underlying counts (e.g., number of people) are discrete.
    • If the question were about the number of faculty members with PhDs, then it would be a discrete variable.
    • This is because you can have 1 or 2 people with PhDs, but not 1.5 or 1.67 people.
  • Clarification: Percentage of PhDs as a Continuous Variable:
    • The percentage of faculty members with PhDs is a continuous variable.
    • Reasoning: Continuous values can take on any number within a range, without gaps or jumps.
    • Example 1: If 3 out of 10 faculty members have PhDs, the percentage is \frac{3}{10} = 30\%.
    • Example 2: If 31 out of 100 faculty members have PhDs, the percentage is \frac{31}{100} = 31\%.
    • Example 3: If 311 out of 1000 faculty members have PhDs, the percentage is \frac{311}{1000} = 31.1\%.
    • Example 4: If 3111 out of 10000 faculty members have PhDs, the percentage is \frac{3111}{10000} = 31.11\%.
    • Observation: As the denominator (total faculty) increases, the possible percentages become more granular. You can have 30\%, 31\%, 31.1\%, 31.11\%, 31.116\%, or even repeating decimals like 33.3\overline{3}\% (from 3 out of 9 faculty).
    • Key Takeaway: Even when dealing with discrete counts of individuals, performing mathematical operations like division (to calculate a percentage) can result in a continuous value, as there are no gaps between possible percentage values.
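
The examples above can be reproduced with a short Python sketch. The function name phd_percentage and the list of (count, total) pairs are illustrative choices, not part of the original notes; exact fractions are used so the 3 out of 9 case keeps its repeating decimal until it is printed.

```python
from fractions import Fraction


def phd_percentage(with_phd: int, total: int) -> Fraction:
    """Return the exact percentage of faculty with PhDs (hypothetical helper)."""
    return Fraction(with_phd, total) * 100


# (faculty with PhDs, total faculty) pairs taken from the examples above
examples = [(3, 10), (31, 100), (311, 1000), (3111, 10000), (3, 9)]

for with_phd, total in examples:
    pct = phd_percentage(with_phd, total)
    # The counts are whole numbers, but the resulting percentage need not be
    print(f"{with_phd}/{total} = {float(pct):.4f}%")
```

Each input is a whole-number count, yet the printed percentages land between whole numbers, and larger totals allow finer steps.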

Estimating Standard Deviation in a Normal Distribution

  • Context: Used for clicker questions or quick estimations.
  • Step 1: Assess Normality:
    • Ask: Does the distribution look normal?
    • Characteristics of a normal distribution: unimodal, symmetric, bell-shaped.
    • Crucial Point: This estimation method only applies to normal distributions. It is not valid for heavily skewed or bimodal distributions.
  • Step 2: Identify the Mean (Center):
    • For a symmetric normal distribution, the mean is in the exact middle.
    • Example: If the histogram balances around 200, then the mean is approximately 200.
  • Step 3: Determine Where the Tails Die Out:
    • Look for the points on either side of the mean where the distribution's frequency approaches zero.
    • Example: If the mean is 200, the tails might taper off around 50 on the lower end and 350 on the upper end.
  • Step 4: Calculate the Distance from the Mean to the Tail's End:
    • Subtract the mean from the upper tail's end: 350 - 200 = 150
    • Subtract the lower tail's end from the mean: 200 - 50 = 150
    • This distance (e.g., 150) represents approximately three standard deviations (3 \times \sigma) in a normal distribution.
  • Step 5: Estimate the Standard Deviation:
    • Divide the distance by 3: \sigma \approx \frac{\text{Distance}}{3}.
    • Example: \sigma \approx \frac{150}{3} = 50.
    • Hence, an estimated standard deviation for the example distribution is about 50.
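
Steps 2 through 5 can be sketched as a small Python function. The name estimate_sigma and its parameters are hypothetical, and averaging the two tail distances is an added assumption for cases where the eyeballed tails are not perfectly symmetric; the notes themselves simply use the one distance of 150.

```python
def estimate_sigma(mean: float, lower_tail: float, upper_tail: float) -> float:
    """Rough standard deviation estimate for a roughly normal histogram.

    Assumes the tails die out about three standard deviations from the mean.
    """
    lower_distance = mean - lower_tail   # e.g., 200 - 50 = 150
    upper_distance = upper_tail - mean   # e.g., 350 - 200 = 150
    # Assumption: average the two distances in case they differ slightly
    distance = (lower_distance + upper_distance) / 2
    return distance / 3


sigma = estimate_sigma(mean=200, lower_tail=50, upper_tail=350)
print(sigma)  # 50.0
```

As in Step 1, this shortcut is only meaningful when the histogram looks approximately normal.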

Characteristics of a Normal Distribution: Not All Symmetric, Unimodal Distributions are Normal

  • Common Misconception: Being unimodal and symmetric does not automatically make a distribution normal.
  • Normal Distribution Defined: It has a very specific bell shape with characteristic tails and curve properties, which can be described by a formal mathematical formula (not necessary to memorize).
  • Examples of Non-Normal, Unimodal, Symmetric Distributions:
    • Data piled up in the middle and then dropping off very quickly (e.g., rectangular or spiky distribution).
    • Distributions that are unimodal and symmetric but lack the gradual bell shape and tapering tails of a true normal distribution (e.g., too flat, too peaked, or too uniform in the middle).
  • Key Point: When we refer to a