Normal (Parametric) Distribution Study Notes

Normal (Parametric) Distribution

Definition and Characteristics

  • Normal Distribution: Commonly referred to as a parametric distribution.

    • Defined by two key parameters:

    • Mean: The average value of the dataset.

    • Standard Deviation (SD): A measure of the spread of data points around the mean.

    • Visual representation in graphs:

    • Histogram: Unimodal, displaying a symmetric bell-shaped curve.

    • Mean, median, and mode coincide at the central value.

    • Box and whisker graph: Symmetrical with the median positioned mid-way between the upper and lower quartiles and extremes.

Areas Under the Curve

  • The area under the normal distribution curve is crucial for determining probabilities and percentages, applicable across all normal distributions:

    • About 68% of the data lies within 1 standard deviation of the mean.

    • About 95% of the data lies within 2 standard deviations of the mean.

    • About 99.7% of the data lies within 3 standard deviations of the mean.
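
These percentages can be checked numerically. The following is a minimal sketch, assuming only Python's standard library; the helper name `fraction_within` is made up for illustration. It relies on the fact that, for any normal distribution, the area within k standard deviations of the mean equals erf(k / √2).

```python
import math

def fraction_within(k: float) -> float:
    """Area under a normal curve within k standard deviations of the mean."""
    # For any normal distribution: P(|X - mean| <= k * SD) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.2%}")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```

Because the result depends only on k, the same percentages hold whatever the mean and standard deviation are.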

Data Distribution Illustration

  • Distribution schematic for parametric data showing:

    • Labels on the horizontal axis:

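    • $\bar{X} - 3s$, $\bar{X} - 2s$, $\bar{X} - s$, $\bar{X}$, $\bar{X} + s$, $\bar{X} + 2s$, $\bar{X} + 3s$, where $\bar{X}$ is the mean and $s$ is the standard deviation.

    • Distribution percentages:

    • About 34% lies between the mean and 1 standard deviation on each side (i.e., $\bar{X}$ to $\bar{X} + s$ and $\bar{X} - s$ to $\bar{X}$).

    • About 14% falls in each of the bands $\bar{X} + s$ to $\bar{X} + 2s$ and $\bar{X} - 2s$ to $\bar{X} - s$.

    • About 2% lies in each of the outer bands ($\bar{X} + 2s$ to $\bar{X} + 3s$ and $\bar{X} - 3s$ to $\bar{X} - 2s$).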
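
The band percentages in the schematic can be reproduced for a concrete case. This is only a sketch: the mean of 100 and standard deviation of 15 are hypothetical values chosen for illustration, and `band_area` is an illustrative helper built on `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

# Hypothetical example values (not from the notes).
mean, s = 100.0, 15.0
dist = NormalDist(mu=mean, sigma=s)

def band_area(lo_k: float, hi_k: float) -> float:
    """Area between mean + lo_k*s and mean + hi_k*s."""
    return dist.cdf(mean + hi_k * s) - dist.cdf(mean + lo_k * s)

# Bands on the right-hand side; the left-hand side is a mirror image.
for lo_k, hi_k in ((0, 1), (1, 2), (2, 3)):
    lo, hi = mean + lo_k * s, mean + hi_k * s
    print(f"{lo:.0f} to {hi:.0f}: {band_area(lo_k, hi_k):.2%}")
# 100 to 115: 34.13%
# 115 to 130: 13.59%
# 130 to 145: 2.14%
```

The rounded figures of 34%, 14%, and 2% add up to 50% per side; the small remainder (about 0.13% per side) lies beyond 3 standard deviations.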

Standard Deviation

  • Standard Deviation (SD): A key statistic for evaluating how widely the data vary around the mean.

    • Formula:
      $$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

    • Where:

      • $s$: Sample standard deviation.

      • $X$: Each data value.

      • $\bar{X}$: Sample mean.

      • $n$: Number of values in the sample.

  • Steps to calculate Standard Deviation:

    1. Calculate the mean: Sum all data points and divide by their count.

    2. Determine deviations: Subtract the mean from each data point.

    3. Square each deviation: Square the result from step 2.

    4. Sum the squares: Total all squared deviations from step 3.

    5. Calculate variance: Divide the sum from step 4 by $(n - 1)$.

    6. Square root of the variance: Take the square root of the variance obtained in step 5 to get $s$.
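
The six steps above can be sketched in Python as follows. The function name `sample_sd` and the data values are hypothetical, chosen so the arithmetic is easy to follow by hand; the result is compared against `statistics.stdev`, which also uses the $n - 1$ denominator.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation, following the six steps literally."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for a sample SD")
    mean = sum(values) / n                    # step 1: mean
    deviations = [x - mean for x in values]   # step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]    # step 3: squared deviations
    total = sum(squared)                      # step 4: sum of squares
    variance = total / (n - 1)                # step 5: sample variance
    return math.sqrt(variance)                # step 6: square root gives s

data = [4.0, 6.0, 8.0, 10.0, 12.0]  # hypothetical measurements
print(round(sample_sd(data), 4))          # 3.1623
print(round(statistics.stdev(data), 4))   # 3.1623 (library check)
```

For this data the mean is 8, the squared deviations are 16, 4, 0, 4, 16 (sum 40), the variance is 40 / 4 = 10, and $s = \sqrt{10} \approx 3.16$.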

Importance of Standard Deviation

  • A lower standard deviation indicates less variability among the measurements: the data cluster more tightly around the mean, so the mean is a more reliable summary of the data.

  • When comparing the means of two datasets, the standard deviation helps to judge whether an observed difference reflects a real difference between the means or merely the spread of values within each dataset.

Standard Error of the Mean (SEM)

  • Standard Error of the Mean (SEM): Measures how precisely the sample mean estimates the population mean.

    • Formula for SEM:
      $$SEM = \frac{s}{\sqrt{n}}$$

  • Characteristics:

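    • Always smaller than the standard deviation whenever the sample contains more than one measurement ($n > 1$).

    • SEM is inversely proportional to the square root of the sample size:

    • Quadrupling the sample size halves the standard error; doubling it reduces the standard error only by a factor of $\sqrt{2}$ (about 29%).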
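
A short sketch of how the SEM scales with sample size, assuming a fixed, hypothetical standard deviation of 12 (the function name `sem` is illustrative). Since the SEM is $s / \sqrt{n}$, going from $n = 4$ to $n = 16$ halves it, while going from $n = 4$ to $n = 8$ shrinks it by a factor of $\sqrt{2}$.

```python
import math

def sem(s: float, n: int) -> float:
    """Standard error of the mean: s / sqrt(n)."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return s / math.sqrt(n)

s = 12.0  # hypothetical sample standard deviation
for n in (4, 8, 16, 64):
    print(f"n = {n:2d}: SEM = {sem(s, n):.3f}")
# n =  4: SEM = 6.000
# n =  8: SEM = 4.243
# n = 16: SEM = 3.000
# n = 64: SEM = 1.500
```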

Implications of Standard Error

  • SEM significance in hypothesis testing:

    • About 68% of sample means fall within 1 SEM of the population mean.

    • About 95% of sample means fall within 2 SEMs of the population mean.

  • Increasing the sample size decreases the SEM, so the sample mean estimates the population mean more precisely.
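
One way to see what the 68% and 95% figures refer to is a small simulation. This is a sketch under stated assumptions: a normally distributed population with a hypothetical mean of 50 and SD of 10, samples of size 30, and an illustrative helper `coverage`. It counts how often the sample mean lands within k SEMs of the true population mean.

```python
import math
import random
import statistics

def coverage(pop_mean, pop_sd, n, trials, k, seed=0):
    """Fraction of simulated samples whose mean lies within k SEMs
    of the population mean (SEM computed from each sample's own s)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        sample_mean = statistics.mean(sample)
        sample_sem = statistics.stdev(sample) / math.sqrt(n)
        if abs(sample_mean - pop_mean) <= k * sample_sem:
            hits += 1
    return hits / trials

# Hypothetical population and sample size.
within_1 = coverage(50.0, 10.0, n=30, trials=2000, k=1)
within_2 = coverage(50.0, 10.0, n=30, trials=2000, k=2)
print(f"within 1 SEM: {within_1:.1%}")  # expected to be roughly 68%
print(f"within 2 SEM: {within_2:.1%}")  # expected to be roughly 95%
```

The fractions describe sample means, not individual data values; for small samples they come out slightly below 68% and 95% because $s$ itself is only an estimate.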

Graphical Representation of Standard Error

  • Error Bars: Visual representation of SEM in graphical data.

    • From the mean value, error bars extend 1 SEM or 2 SEMs up and down depending on required confidence levels (68% or 95% probability respectively).

  • Interpretation of error bars in data:

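    • Overlapping error bars suggest that the difference between the two datasets is probably not statistically significant, i.e., there is no clear effect of the independent variable.

    • Non-overlapping error bars suggest that the difference between the two datasets may be statistically significant; a formal statistical test is needed to confirm it.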
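
The error-bar comparison can be sketched as follows. The two datasets and the helper names `error_bar` and `bars_overlap` are hypothetical, and the overlap check is only the visual rule of thumb described above, not a substitute for a formal significance test.

```python
import math
import statistics

def error_bar(values, k=1):
    """Return (lower, upper) ends of an error bar of mean ± k SEMs."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return (mean - k * sem, mean + k * sem)

def bars_overlap(a, b):
    """True if the intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical measurements for two groups.
control = [5.1, 4.9, 5.3, 5.0, 5.2]
treated = [6.0, 6.2, 5.8, 6.1, 5.9]

bar_c = error_bar(control, k=2)  # about (4.96, 5.24)
bar_t = error_bar(treated, k=2)  # about (5.86, 6.14)
print(bars_overlap(bar_c, bar_t))  # False: the 2-SEM bars do not overlap
```

Here the 2-SEM bars are clearly separated, which points toward a real difference between the groups.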