Statistics Review: Continuous vs. Discrete Variables, Standard Deviation, and Normal Distributions

Continuous vs. Discrete Variables: The PhD Percentage Example

Initial Confusion: Students often mistake percentages derived from counts for discrete variables, because the individual counts (e.g., number of people) are discrete.
- If the question were about the number of faculty members with PhDs, it would be a discrete variable.
- This is because you can have $1$ or $2$ people with PhDs, but not $1.5$ or $1.67$ people.
Clarification: Percentage of PhDs as a Continuous Variable:
- The percentage of faculty members with PhDs is a continuous variable.
- Reasoning: A continuous variable can take any value within a range, without gaps or jumps.
- Example 1: If $3$ out of $10$ faculty members have PhDs, the percentage is $\frac{3}{10} = 30\%$.
- Example 2: If $31$ out of $100$ faculty members have PhDs, the percentage is $\frac{31}{100} = 31\%$.
- Example 3: If $311$ out of $1000$ faculty members have PhDs, the percentage is $\frac{311}{1000} = 31.1\%$.
- Example 4: If $3111$ out of $10000$ faculty members have PhDs, the percentage is $\frac{3111}{10000} = 31.11\%$.
- Observation: As the denominator (total faculty) increases, the possible percentages become more granular. You can have $30\%$, $31\%$, $31.1\%$, $31.11\%$, $31.116\%$, or even repeating decimals like $33.3\overline{3}\%$ (from $3$ out of $9$ faculty).
- Key Takeaway: Even when the underlying counts of individuals are discrete, dividing them to calculate a percentage produces values whose gaps shrink toward zero as the total grows, so the percentage is treated as a continuous variable.
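To make the shrinking gaps concrete, here is a small sketch that lists every percentage achievable for a given number of faculty and reports the spacing between neighboring values. The function name `possible_percentages` is a hypothetical helper for illustration only.

```python
def possible_percentages(total: int) -> list[float]:
    """Every percentage achievable when `count` out of `total` faculty have PhDs."""
    return [100 * count / total for count in range(total + 1)]


for total in (10, 100, 1000, 10000):
    values = possible_percentages(total)
    # Spacing between neighboring achievable percentages (always 100 / total).
    step = values[1] - values[0]
    print(f"{total:>5} faculty: {len(values)} possible values, step = {step:g}%")
```

Each tenfold increase in the total shrinks the step tenfold, which is why, for practical purposes, a percentage behaves like a continuous variable.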

Estimating Standard Deviation in a Normal Distribution

Context: Used for clicker questions or quick estimations.
Step 1: Assess Normality:
- Ask: Does the distribution look normal?
- Characteristics of a normal distribution: unimodal, symmetric, bell-shaped.
- Crucial Point: This estimation method only applies to normal distributions. It is not valid for heavily skewed or bimodal distributions.
Step 2: Identify the Mean (Center):
- For a symmetric normal distribution, the mean is in the exact middle.
- Example: If the histogram balances around $200$, then the mean is approximately $200$.
Step 3: Determine Where the Tails Die Out:
- Look for the points on either side of the mean where the distribution's frequency approaches zero.
- Example: With a mean of $200$, suppose the tails taper off around $50$ on the lower end and $350$ on the upper end.
Step 4: Calculate the Distance from the Mean to the Tail's End:
- Subtract the mean from the upper tail's end: $350 - 200 = 150$
- Subtract the lower tail's end from the mean: $200 - 50 = 150$
- This distance (e.g., $150$) represents approximately three standard deviations ($3 \times \sigma$) in a normal distribution.
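Why three standard deviations? For a normal distribution, about 99.7% of values fall within $3\sigma$ of the mean, so the visible tails of a histogram usually die out near there. A quick check with Python's standard-library `statistics.NormalDist` (the helper name `fraction_within` is illustrative):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def fraction_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

With real samples a few extreme points can land slightly beyond $3\sigma$, so the "tails end at $3\sigma$" reading is an approximation, not an exact rule.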
Step 5: Estimate the Standard Deviation:
- Divide the distance by $3$: $\sigma \approx \frac{\text{Distance}}{3}$.
- Example: $\sigma \approx \frac{150}{3} = 50$.
- Hence, an estimated standard deviation for the example distribution is about $50$.
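Steps 2 through 5 can be condensed into a small helper. This is a minimal sketch: `estimate_from_tails` is a hypothetical name, and it assumes you have already judged the histogram to be roughly normal and read off by eye where each tail dies out. Because the distribution is assumed symmetric, it takes the mean as the midpoint of the two tail ends.

```python
def estimate_from_tails(lower_end: float, upper_end: float) -> tuple[float, float]:
    """Rough mean and SD of a roughly normal histogram, given where its tails die out."""
    # For a symmetric distribution the center sits midway between the tail ends.
    mean = (lower_end + upper_end) / 2
    # Each tail end is assumed to lie about 3 standard deviations from the mean.
    distance = upper_end - mean
    sigma = distance / 3
    return mean, sigma


mean, sigma = estimate_from_tails(50, 350)
print(mean, sigma)  # 200.0 50.0
```

Equivalently, $\sigma \approx \frac{\text{upper end} - \text{lower end}}{6}$, since the full spread covers about six standard deviations.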

Characteristics of a Normal Distribution: Not All Symmetric, Unimodal Distributions are Normal

Common Misconception: Being unimodal and symmetric does not automatically make a distribution normal.
Normal Distribution Defined: A normal distribution has a specific bell shape, with characteristic tails and curvature, described by a formal mathematical formula (not necessary to memorize).
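For reference only, since it need not be memorized, the standard form of the normal density with mean $\mu$ and standard deviation $\sigma$ is:

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
$$

The negative squared distance from $\mu$ in the exponent is what gives the curve its gradual, symmetric taper on both sides of the mean.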
Examples of Non-Normal, Unimodal, Symmetric Distributions:
- Data piled up sharply in the middle with tails that drop off very quickly (a spiky distribution), or data spread evenly and then cut off abruptly at the edges (a rectangular distribution).
- Distributions that are unimodal and symmetric but lack the gradual bell shape and tapering tails of a true normal distribution (e.g., too flat, too peaked, or too uniform in the middle).
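One way to see that symmetry and a single peak are not enough: the share of values within one standard deviation of the mean is about 68% only for the normal curve. The sketch below compares it against two other unimodal, symmetric distributions, a triangular one (flatter, shorter tails) and a Laplace one (spiky peak). These two distributions are our own illustrative choices, not examples from the notes.

```python
import math
from statistics import NormalDist


def triangular_within_1sd() -> float:
    # Symmetric triangular density on [-1, 1] peaking at 0; its SD is 1/sqrt(6).
    sd = 1 / math.sqrt(6)
    # For 0 <= s <= 1, P(|X| <= s) = 1 - (1 - s)^2.
    return 1 - (1 - sd) ** 2


def laplace_within_1sd() -> float:
    # Laplace (double-exponential) density with scale b has SD = sqrt(2) * b
    # and P(|X| <= s) = 1 - exp(-s / b), so the result does not depend on b.
    return 1 - math.exp(-math.sqrt(2))


def normal_within_1sd() -> float:
    return NormalDist().cdf(1) - NormalDist().cdf(-1)


for name, p in [("triangular", triangular_within_1sd()),
                ("laplace (spiky)", laplace_within_1sd()),
                ("normal", normal_within_1sd())]:
    print(f"{name:16} {p:.3f}")
# triangular       0.650
# laplace (spiky)  0.757
# normal           0.683
```

All three are unimodal and symmetric, yet only the normal one gives the 68% figure, which is why rules of thumb like the $3\sigma$ tail estimate above should be reserved for distributions that actually look normal.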
Key Point: When we refer to a