Statistics Review: Continuous vs. Discrete Variables, Standard Deviation, and Normal Distributions
Continuous vs. Discrete Variables: The PhD Percentage Example
- Initial Confusion: Students often mistake percentages derived from counts for discrete variables because the underlying counts (e.g., number of people) are discrete.
- If the question were about the number of faculty members with PhDs, then it would be a discrete variable.
- This is because you can have 1 or 2 people with PhDs, but not 1.5 or 1.67 people.
- Clarification: Percentage of PhDs as a Continuous Variable:
- The percentage of faculty members with PhDs is a continuous variable.
- Reasoning: A continuous variable can take on any value within a range, without gaps or jumps.
- Example 1: If 3 out of 10 faculty members have PhDs, the percentage is \frac{3}{10} = 30\%.
- Example 2: If 31 out of 100 faculty members have PhDs, the percentage is \frac{31}{100} = 31\%.
- Example 3: If 311 out of 1000 faculty members have PhDs, the percentage is \frac{311}{1000} = 31.1\%.
- Example 4: If 3111 out of 10000 faculty members have PhDs, the percentage is \frac{3111}{10000} = 31.11\%.
- Observation: As the denominator (total faculty) increases, the possible percentages become more granular. You can have 30\%, 31\%, 31.1\%, 31.11\%, 31.116\%, or even repeating decimals like 33.3\overline{3}\% (from 3 out of 9 faculty).
- Key Takeaway: Even when the underlying data are discrete counts of individuals, a mathematical operation like division (to calculate a percentage) can produce a continuous value, because as the total grows there is no fixed gap between possible percentage values.
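The four examples above can be reproduced with a short Python sketch. The function name phd_percentage and the use of exact fractions are illustrative choices, not part of the original notes; the point is only to show the percentages becoming finer as the total faculty count grows.

```python
from fractions import Fraction


def phd_percentage(with_phd, total):
    """Return the exact percentage of faculty with PhDs as a Fraction.

    Hypothetical helper for illustration; uses exact rational arithmetic
    so that no rounding hides the granularity of the result.
    """
    return Fraction(with_phd, total) * 100


# (faculty with PhDs, total faculty) pairs taken from Examples 1-4
examples = [(3, 10), (31, 100), (311, 1000), (3111, 10000)]
percentages = [phd_percentage(k, n) for k, n in examples]

for (k, n), pct in zip(examples, percentages):
    print(f"{k} out of {n}: {float(pct):.2f}%")

# 3 out of 9 gives a repeating decimal, stored exactly as 100/3
repeating = phd_percentage(3, 9)
print(f"3 out of 9: {float(repeating):.4f}% (exactly {repeating})")
```

Each larger denominator adds a decimal place that a smaller denominator could not produce, which is why the percentage is treated as continuous.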
Estimating Standard Deviation in a Normal Distribution
- Context: Used for clicker questions or quick estimations.
- Step 1: Assess Normality:
- Ask: Does the distribution look normal?
- Characteristics of a normal distribution: unimodal, symmetric, bell-shaped.
- Crucial Point: This estimation method only applies to normal distributions. It is not valid for heavily skewed or bimodal distributions.
- Step 2: Identify the Mean (Center):
- Because a normal distribution is symmetric, the mean is in the exact middle.
- Example: If the histogram balances around 200, then the mean is approximately 200.
- Step 3: Determine Where the Tails Die Out:
- Look for the points on either side of the mean where the distribution's frequency approaches zero.
- Example: With a mean of 200, suppose the tails taper off around 50 on the lower end and 350 on the upper end.
- Step 4: Calculate the Distance from the Mean to the Tail's End:
- Subtract the mean from the upper tail's end: 350 - 200 = 150
- Subtract the lower tail's end from the mean: 200 - 50 = 150
- This distance (e.g., 150) represents approximately three standard deviations (3 \times \sigma) in a normal distribution.
- Step 5: Estimate the Standard Deviation:
- Divide the distance by 3: \sigma \approx \frac{\text{Distance}}{3}.
- Example: \sigma \approx \frac{150}{3} = 50.
- Hence, an estimated standard deviation for the example distribution is about 50.
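Steps 2 through 5 can be sketched in a few lines of Python. The function name estimate_sigma and its parameters are hypothetical, and averaging the two mean-to-tail distances is an added assumption for histograms that are not perfectly symmetric; the notes themselves only divide a single distance by 3. The sketch is only meaningful when the distribution already looks normal (Step 1).

```python
def estimate_sigma(mean, lower_tail_end, upper_tail_end):
    """Rough standard deviation estimate for a normal-looking histogram.

    Assumes the tails die out about three standard deviations from the mean.
    Hypothetical helper; averaging the two distances is an illustrative
    assumption, not something stated in the notes.
    """
    upper_distance = upper_tail_end - mean   # Step 4: 350 - 200 = 150
    lower_distance = mean - lower_tail_end   # Step 4: 200 - 50 = 150
    distance = (upper_distance + lower_distance) / 2
    return distance / 3                      # Step 5: distance / 3


# Values read off the example histogram in the notes
sigma = estimate_sigma(mean=200, lower_tail_end=50, upper_tail_end=350)
print(sigma)  # 50.0
```

For the example histogram, both distances are 150, so the average changes nothing and the estimate matches the value of about 50 obtained by hand.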
Characteristics of a Normal Distribution: Not All Symmetric, Unimodal Distributions are Normal
- Common Misconception: Being unimodal and symmetric does not automatically make a distribution normal.
- Normal Distribution Defined: It has a very specific bell shape with characteristic tails and curve properties, which can be described by a formal mathematical formula (not necessary to memorize).
- Examples of Non-Normal, Unimodal, Symmetric Distributions:
- Data piled up in the middle and then dropping off very quickly (e.g., rectangular or spiky distribution).
- Distributions that are unimodal and symmetric but lack the gradual bell shape and tapering tails of a true normal distribution (e.g., too flat, too peaked, or too uniform in the middle).
- Key Point: When we refer to a