Descriptive Statistics Notes

Descriptive Statistics

Purpose of Descriptive Statistics

  • Describe Sample: The primary goal is to summarize and describe the characteristics of a given sample of data.

  • Not Population: It is crucial to remember that descriptive statistics characterize only the sample. Generalizing to the population from which the sample was drawn requires further inferential analysis.

  • Sample Characteristics: Provides a concise overview of the data's main features.

  • External Validity: Descriptive statistics do not establish external validity directly, but understanding sample characteristics can later inform judgments about it.

  • Abnormalities: Helps in identifying unusual data points or patterns within the sample.

  • Outliers: Facilitates the detection of outliers, which are scores significantly different from other scores in the dataset.

Central Tendency

Measures of central tendency indicate the center or typical value of a distribution.

  • Mean:

    • The most common average.

    • Calculated by summing all scores and dividing by the number of scores.

    • Formula: $\text{Mean} = \frac{\text{Sum of Scores}}{\text{Number of Scores}}$; for a sample, $\bar{X} = \frac{\sum X}{n}$.

  • Mode:

    • The most frequently occurring score in a dataset.

    • Can be used with all types of data (nominal, ordinal, interval, ratio).

  • Median:

    • The middle-most score in a dataset when scores are arranged in ascending or descending order.

    • If there is an even number of scores, the median is the average of the two middle scores.
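
As a minimal sketch of the three measures above, the following uses Python's standard `statistics` module. The score list and the color list are made-up illustrations (assumptions, not data from these notes), chosen so that the three measures differ.

```python
import statistics

# Hypothetical scores, for illustration only.
scores = [70, 75, 75, 80, 85, 95]

# Mean: sum of scores divided by the number of scores.
mean_score = sum(scores) / len(scores)

# Median: middle score of the sorted data; with an even number of
# scores, the two middle scores (75 and 80) are averaged.
median_score = statistics.median(scores)

# Mode: the most frequently occurring score.
mode_score = statistics.mode(scores)

# The mode also works for nominal data, where a mean or median is undefined.
favorite = statistics.mode(["red", "blue", "red", "green"])

print(mean_score, median_score, mode_score, favorite)
```

Here the mean (80) sits above the median (77.5) because the single high score of 95 pulls the mean upward, while the median depends only on the middle positions.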

Variability

Measures of variability describe the spread or dispersion of scores in a dataset.

  • Range:

    • The simplest measure of variability.

    • Calculated as the difference between the highest and the lowest score in the dataset.

    • Formula: $\text{Range} = \text{Highest Score} - \text{Lowest Score}$.

  • Variance:

    • Represents the average of the squared deviations from the mean.

    • It quantifies how much the individual data points stray from the mean.

    • Calculation Example: Given scores $75, 80, 70, 85, 90$. The mean is $80$.

      1. Calculate deviations from the mean (score - mean):

        • $75 - 80 = -5$

        • $80 - 80 = 0$

        • $70 - 80 = -10$

        • $85 - 80 = 5$

        • $90 - 80 = 10$

      2. Square the deviations, $(\text{score} - \text{mean})^2$:

        • $(-5)^2 = 25$

        • $(0)^2 = 0$

        • $(-10)^2 = 100$

        • $(5)^2 = 25$

        • $(10)^2 = 100$

      3. Sum of Squares (SS): Add the squared deviations: $25 + 0 + 100 + 25 + 100 = 250$.

      4. Calculate Variance: Divide the sum of squares by the number of scores ($N$).

        • $\text{Variance} = \frac{\text{Sum of Squares}}{N} = \frac{250}{5} = 50$

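    • Formula for sample variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n}$. These notes divide by $n$, as in the example above; many textbooks and software packages divide by $n - 1$ instead when the sample is used to estimate the population variance.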

  • Standard Deviation:

    • The square root of the variance.

    • Brings the measure of variability back to the original scale of measurement, making it more interpretable than variance.

    • Formula: $\text{Standard Deviation} = \sqrt{\text{Variance}}$, or $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$.
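
The range, variance, and standard deviation calculations above can be reproduced in a few lines of Python. This is a sketch that follows the notes' own steps and divides by the number of scores; the variable names are illustrative choices, not part of the notes.

```python
import math
import statistics

# Scores from the worked variance example in the notes.
scores = [75, 80, 70, 85, 90]

n = len(scores)
mean = sum(scores) / n                         # 400 / 5 = 80

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)        # 90 - 70 = 20

# Step 1: deviations from the mean.
deviations = [x - mean for x in scores]        # -5, 0, -10, 5, 10

# Step 2: squared deviations.
squared = [d ** 2 for d in deviations]         # 25, 0, 100, 25, 100

# Step 3: sum of squares (SS).
sum_of_squares = sum(squared)                  # 250

# Step 4: variance, dividing by the number of scores as the notes do.
variance = sum_of_squares / n                  # 50

# Standard deviation: square root of the variance, back on the score scale.
std_dev = math.sqrt(variance)                  # about 7.07

# Cross-check against the standard library (divide-by-n version).
library_variance = statistics.pvariance(scores)

print(score_range, sum_of_squares, variance, round(std_dev, 2), library_variance)
```

Note that `statistics.pvariance` divides by $n$ and so matches the hand calculation, whereas `statistics.variance` divides by $n - 1$ and would return 62.5 for the same scores.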

Frequency Distribution

  • A display of the number of occurrences of each score or interval of scores in a dataset.

  • It helps visualize the pattern of scores in a sample, for example, the distribution of grades as shown in the provided