Notes for Topic 1: Describing Data Visually and Numerically

STAT 2020: Topic 1 — Describing Data Visually and Numerically

  • This topic introduces how to describe data both visually and numerically, focusing on foundational ideas used throughout descriptive statistics.

WHAT IS STATISTICS?

  • Statistics is the science of data.
  • Data are numbers with a context.
  • Biostatistics is the application of statistics to biology, including design and analysis of experiments and observational studies.

TWO BRANCHES OF STATISTICS

  • Descriptive Statistics
    • Methods for organizing, summarizing, and presenting data in an informative way.
  • Inferential Statistics
    • Methods for drawing conclusions about a population based on data from a sample.

POPULATION VS SAMPLE

  • Population: all subjects or items of interest. Size is denoted by N.
  • Sample: a group selected from a population. Size is denoted by n.
  • Many different samples can be drawn from a given population; the number of distinct samples depends on both the population size N and the sample size n.
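
To make the last point concrete: if samples are drawn without replacement and order is ignored (an assumption; the notes do not state the sampling scheme), the number of distinct samples of size n from a population of size N is the binomial coefficient "N choose n". The function name below is made up for this sketch.

```python
import math

def count_distinct_samples(N: int, n: int) -> int:
    """Number of unordered samples of size n drawn without replacement from N items."""
    return math.comb(N, n)

# Even modest population sizes give a large number of possible samples.
print(count_distinct_samples(10, 3))   # 120
print(count_distinct_samples(50, 5))   # 2118760
```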

TERMINOLOGY

  • Data: the observations collected (e.g., measurements, genders, survey responses).
  • Parameter: a number describing a population characteristic.
  • Statistic: a number describing a sample characteristic (sample statistic).
  • The observed value of a statistic is used to estimate the unobserved value of a parameter.
  • A statistic is unbiased if the mean of its sampling distribution equals the parameter it estimates.
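
One way to see what "unbiased" means is to list every possible sample from a tiny population and average the resulting statistics. The sketch below does this for the sample mean, using a hypothetical population of five values invented for illustration and assuming sampling without replacement.

```python
from itertools import combinations
from statistics import mean
import math

# Hypothetical tiny population (N = 5), invented for illustration.
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# The sample mean of every possible sample of size n
# (without replacement, order ignored).
sample_means = [mean(sample) for sample in combinations(population, n)]

mu = mean(population)                      # parameter: population mean
mean_of_sample_means = mean(sample_means)  # mean of the sampling distribution

# For an unbiased statistic these two numbers agree.
print(mu, mean_of_sample_means)
print(math.isclose(mu, mean_of_sample_means))
```

Individual sample means differ from the population mean, but their average over all possible samples matches it, which is the property the definition describes.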

INDIVIDUALS AND VARIABLES

  • Individuals: the objects described in a data set (people, animals, plants, things).
    • Examples: freshmen, newborns, golden retrievers, fields of corn, cells.
  • Variable: a property that characterizes an individual; it can take different values for different individuals.
    • Examples: Age, gender, blood pressure, blood type, leaf length, flower color.

TWO TYPES OF VARIABLES

  • Quantitative (numerical) variables
    • A quantity assessed or measured for each individual; we can report the average.
    • Examples: Age (years), blood pressure (mm Hg), leaf length (cm).
  • Categorical (qualitative) variables
    • A characteristic describing each individual; we can report counts or proportions.
    • Examples: Gender (male/female), blood type (A, B, AB, O), flower color (white, yellow, red).
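
The difference in how the two types are summarized can be sketched as follows. The records are hypothetical and exist only to show that a quantitative variable supports an average while a categorical variable is summarized by counts or proportions.

```python
from collections import Counter
from statistics import mean

# Hypothetical records, invented for illustration only.
ages_years = [34, 41, 29, 52, 41]              # quantitative variable
blood_types = ["A", "O", "O", "AB", "A", "O"]  # categorical variable

# Quantitative: an average is meaningful.
average_age = mean(ages_years)

# Categorical: report counts and proportions per category instead.
counts = Counter(blood_types)
proportions = {category: k / len(blood_types) for category, k in counts.items()}

print(average_age)
print(dict(counts))
print(proportions)
```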

PROBLEM: CLASSIFYING VARIABLES

  • Data table example (Patients A–G) with variables such as Diagnosis and Age at death.
  • Question: What is being recorded about these individuals? For each variable, is the recorded value a number (quantitative) or a label (categorical)?

ANSWER: CLASSIFYING VARIABLES (REVIEW)

  • For each listed variable, decide whether it is quantitative (numeric) or categorical (qualitative).

COMMON WAYS TO CHART QUANTITATIVE DATA

  • Histograms
    • A summary graph for a single variable; useful for understanding the pattern of variability, especially for large data sets.
  • Dotplots (or Stem & Leaf plots)
    • Graphs for raw data; useful for describing variability, especially for small data sets.
  • Time Series Plots
    • Graphs with a sequence on the horizontal axis (e.g., time); a line connects points to emphasize changes over time.

VISUALIZING DATA: HISTOGRAMS

  • A histogram is a graph where:
    • Horizontal axis: classes of data values.
    • Vertical axis: frequencies (or relative frequencies).
    • Heights of bars correspond to frequencies, and bars are drawn adjacent to each other.
  • Related variants: Relative Frequency Histogram.

MAKING A HISTOGRAM

1) Divide the range of the quantitative variable into equal-size intervals (classes/bins) to form the horizontal axis.
2) Vertical axis represents either the frequency (counts) or the relative frequency (percent of total).
3) For each class, draw a column whose height is the count or percent in that class.
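
The three steps above can be sketched in a few lines. The data are hypothetical, the function name is made up, and the convention that each class includes its left endpoint but not its right one is an assumption of this sketch; values are also assumed to be no smaller than the starting point.

```python
def histogram_counts(values, class_width, start=0):
    """Count values in equal-width classes [start, start+w), [start+w, start+2w), ..."""
    num_classes = int((max(values) - start) // class_width) + 1
    counts = [0] * num_classes
    for v in values:
        counts[int((v - start) // class_width)] += 1
    return counts

# Hypothetical data, invented for illustration only.
data = [3, 12, 18, 25, 27, 31, 44, 47, 49, 58]
width = 10

counts = histogram_counts(data, class_width=width)   # step 1: classes and counts
total = len(data)

for i, c in enumerate(counts):
    low, high = i * width, (i + 1) * width
    rel_freq = c / total                             # step 2: relative frequency
    # Step 3: one bar per class, drawn here as a row of '#' characters.
    print(f"[{low:>3}, {high:>3})  {'#' * c:<5} count={c}  rel. freq={rel_freq:.0%}")
```

A plotting library would normally draw the bars; the text version is only meant to show how the class counts are obtained.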

MAKING A HISTOGRAM: GUINEA PIG SURVIVAL TIME EXAMPLE

  • Example: Guinea pig survival time (days) after inoculation with a pathogen (n = 72).
  • Build a histogram with classes of width 50, starting at zero (zero is included in the first class).

VISUALIZING DATA: DOTPLOTS

  • A dotplot shows each data value as a point on a scale; equal values are stacked.
  • Example: Height of poplar trees under different treatments; patterns suggest fertilizer increases height.
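
The stacking idea can be sketched as a plain-text dotplot. The values are hypothetical (not the poplar data), the function name is made up, and the sketch assumes single-digit integer values so that the columns line up.

```python
from collections import Counter

def text_dotplot(values):
    """Return the rows of a text dotplot, top row first, axis labels last."""
    counts = Counter(values)
    scale = range(min(values), max(values) + 1)
    rows = []
    # Build from the tallest stack down to the first dot above the axis.
    for level in range(max(counts.values()), 0, -1):
        row = " ".join("o" if counts[x] >= level else " " for x in scale)
        rows.append(row.rstrip())
    rows.append(" ".join(str(x) for x in scale))
    return rows

# Hypothetical data, invented for illustration only.
data = [3, 4, 4, 5, 5, 5, 6, 7, 7]
rows = text_dotplot(data)
print("\n".join(rows))
```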

VISUALIZING DATA: STEM & LEAF PLOT

  • Structure:
    • Stem on top; leaves listed below.
    • Example data shown (stems 7, 6, 5, etc.) with leaves arranged to reveal distribution.
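
A stem-and-leaf display can be built by splitting each value into a stem (here the tens digit) and a leaf (the ones digit). The sketch below uses hypothetical two-digit data and prints one stem per row with its leaves alongside, which is a common layout; the figure in the notes may be oriented differently. Non-negative integers are assumed.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (ones digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

# Hypothetical two-digit data, invented for illustration only.
scores = [52, 58, 61, 63, 63, 67, 70, 74, 75]
plot = stem_and_leaf(scores)

# List the largest stem first (7, 6, 5), one row per stem.
for stem in sorted(plot, reverse=True):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Because every leaf is kept, the display shows the shape of the distribution while preserving the raw values.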

MEASURES OF CENTER

  • The center of a data set is a representative value indicating where the data cluster.
  • Main measures:
    • Mean
    • Median
    • Mode

MEASURE OF CENTER: THE MEAN

  • Definition: The arithmetic average.
  • Formula: $$\bar{x} = \frac{x_1 + x_2 + \