STAT Ch 2.1-2.3

Descriptive Statistics

Histograms

  • Definition: A histogram is a graphical representation of data that groups data into intervals (bins).

  • Properties:

    • Each rectangle represents a bin and its height corresponds to the count of data values in that bin.

    • The first value (lower boundary) of each bin is marked on the horizontal axis.

    • Consecutive bins touch each other.

  • Vertical Axis: The vertical axis of a histogram can show either frequency (counts) or relative frequency (proportions).

  • Bin Width: Changing the bin width affects the shape of the histogram.

    • Smaller bins lead to a spikier histogram.
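
A minimal sketch of how bin width changes the counts behind a histogram. The data values and the helper name bin_counts are hypothetical, chosen only for illustration; each bin is assumed to include its lower boundary and exclude its upper one.

```python
from collections import Counter

def bin_counts(data, start, width):
    """Count the values in each bin [start + k*width, start + (k+1)*width)."""
    counts = Counter((x - start) // width for x in data)
    n_bins = max(counts) + 1
    return [counts.get(k, 0) for k in range(n_bins)]

# Hypothetical data, not taken from the notes.
data = [1, 2, 2, 3, 5, 6, 6, 7, 9]

wide = bin_counts(data, start=0, width=5)    # two bins: 0-4 and 5-9
narrow = bin_counts(data, start=0, width=1)  # ten bins: 0, 1, ..., 9

# Relative frequency is each count divided by the number of observations.
relative = [c / len(data) for c in wide]

for label, counts in (("width 5", wide), ("width 1", narrow)):
    print(label)
    for k, c in enumerate(counts):
        print(f"  bin {k}: {'#' * c}")
```

With width 5 the two bars have nearly equal heights, while width 1 produces many short bars with gaps, which is the spikier picture described above.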

Stemplots

  • Also known as stem-and-leaf plots.

  • Use Cases: Useful for small datasets and when technology is unavailable.

  • Construction:

    • Each observation is divided into a "stem" (all but the last digit) and a "leaf" (the last digit).

  • Example of Stemplot:

    • For the data set: 1, 1, 1…

    • Stemplot would show values split by stems and leaves.
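
The construction rule can be sketched in a few lines. The scores below are hypothetical two-digit values (not the data set from the notes), and the sketch assumes non-negative integers so that the last digit is always the leaf.

```python
from collections import defaultdict

def stemplot(data):
    """Return stemplot lines: stem = all but the last digit, leaf = last digit."""
    groups = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        groups[stem].append(leaf)
    lines = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Hypothetical data for illustration only.
scores = [12, 15, 21, 21, 28, 40, 43]
print("\n".join(stemplot(scores)))
```

Keeping the empty stem 3 in the output preserves the shape of the distribution, much as an empty bin would in a histogram.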

Features of Numerical Distribution

  • Important Features:

    • Shape: Visual appearance (symmetric, skewed, etc.).

    • Center: Typical value (mean or median).

    • Spread: Variability or range of data.

    • Outliers: Data points that differ significantly from others.

Analyzing the Shape of Distribution

  • Characteristics to consider:

    • Is the distribution symmetric or skewed?

    • How many mounds are present? (Unimodal, Bimodal, Multimodal)

    • Are there unusual values (outliers)?

  • Symmetric Distribution: Roughly equal on both sides.

  • Skewed Distribution: Most data on one side with a tail on the other (right or left skewed).

  • Uniform Distribution: All bars in a histogram have similar height, indicating equal frequency.

Measures of Center and Spread

  • For Symmetric Distributions:

    • Measure for Center: Mean (balancing point).

    • Measure for Spread: Standard Deviation.

  • For Skewed Distributions:

    • Measure for Center: Median (middle value).

    • Measure for Spread: Interquartile Range (IQR).

Mean

  • Sample Mean: \bar{x} = \frac{\Sigma x}{n}

  • Population Mean: \mu = \frac{\Sigma x}{N}

  • Interpretation of mean: Represents the average of the dataset.
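
As a small illustration, the formula is the sum of the values divided by how many there are. The sample below is hypothetical. The arithmetic is the same for a sample and a population; only the notation (\bar{x} and n versus \mu and N) differs.

```python
def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

# Hypothetical sample for illustration.
sample = [2, 4, 4, 6, 9]
print(mean(sample))  # 25 / 5 = 5.0
```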

Standard Deviation

  • Measures how far data values typically lie from the mean.

  • About 68% of data falls within one standard deviation of the mean in a normal distribution.

  • Standard Deviation Formula:

    1. Find the deviation (distance) of each observation from the mean: x - \bar{x}.

    2. Square each deviation.

    3. Sum the squared deviations.

    4. Divide by (n-1) (sample) or N (population).

    5. Take the square root of the result.
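
The five steps above can be sketched directly in code. The data and the function name standard_deviation are hypothetical; the sample flag is an assumption added here to switch between the (n-1) and N divisors.

```python
import math

def standard_deviation(values, sample=True):
    """Follow the five steps; divide by n-1 for a sample, N for a population."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]        # step 1: x - mean
    squared = [d ** 2 for d in deviations]         # step 2: square each deviation
    total = sum(squared)                           # step 3: sum the squares
    variance = total / (n - 1 if sample else n)    # step 4: divide
    return math.sqrt(variance)                     # step 5: square root

# Hypothetical data for illustration.
data = [2, 4, 4, 6, 9]
print(standard_deviation(data))                # sample: sqrt(28 / 4)
print(standard_deviation(data, sample=False))  # population: sqrt(28 / 5)
```

For this data the mean is 5 and the squared deviations sum to 28, so the sample version is slightly larger than the population version because it divides by 4 rather than 5.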

Interquartile Range (IQR)

  • Measures the spread of the middle 50% of the data.

  • Calculation:

    • Find the first (Q1) and third quartiles (Q3).

    • Calculate IQR: IQR = Q3 - Q1
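
A minimal sketch of the calculation using Python's statistics.quantiles. The data is hypothetical, and the "inclusive" quartile method is an assumption: textbooks and software use several quartile conventions, so Q1 and Q3 (and therefore the IQR) can differ slightly from a hand calculation.

```python
import statistics

def iqr(values):
    """Return (Q1, Q3, IQR) using the 'inclusive' quartile convention."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3, q3 - q1

# Hypothetical data for illustration.
data = [1, 3, 5, 7, 9, 11, 13, 15, 17]
q1, q3, spread = iqr(data)
print(q1, q3, spread)
```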

Effect of Outliers

  • Outliers can significantly distort the mean.

  • The median is preferred as a measure of center when outliers are present.
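
A short illustration of the point above, using hypothetical values: adding one extreme observation roughly doubles the mean while the median barely moves.

```python
import statistics

# Hypothetical data for illustration.
values = [10, 12, 13, 14, 16]
with_outlier = values + [95]

print(statistics.mean(values), statistics.median(values))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

Here the mean rises from 13 to about 26.7, while the median only shifts from 13 to 13.5.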

Comparing Measures of Center

  • Symmetric Distributions: Use mean and standard deviation.

  • Skewed Distributions: Use median and IQR.