Biostatistics, Chapters I & II

Sampling

  • Population: complete collection of all measurements or data that are being considered.
  • Sample: sub-collection of members selected from a population
  • Simple Random Sample: each member of the population has the same chance of being included, and members are chosen independently of one another
  • Cluster Sampling: divide the population into groups (clusters) by a category, then randomly select one or more whole clusters. All of the individuals within the selected clusters make up the sample.
  • Stratified Random Sampling: divide the population into groups (strata) based on one or more classification criteria, then perform a simple random sample within each stratum
  • Sampling Bias: some members of the population have a higher chance to be selected than others.
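
The three sampling schemes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patient and clinic names are made up, and the helper functions are hypothetical, not taken from any course material.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, rng):
    """Every member has the same chance of being included."""
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Split into strata, then take a simple random sample within each stratum."""
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(population, cluster_of, n_clusters, rng):
    """Pick whole clusters at random; everyone in a chosen cluster is sampled."""
    clusters = defaultdict(list)
    for member in population:
        clusters[cluster_of(member)].append(member)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

# Hypothetical population: 12 patients spread over 3 clinics.
patients = [(f"patient{i}", f"clinic{i % 3}") for i in range(12)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

srs = simple_random_sample(patients, 4, rng)
strat = stratified_sample(patients, lambda p: p[1], 2, rng)
clus = cluster_sample(patients, lambda p: p[1], 1, rng)
```

Here the clinic plays the role of both stratum and cluster only to keep the example short; in practice the grouping depends on the study design.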

Variables

  • Categorical Variables: two or more categories, but no intrinsic ordering (e.g., blood type)
  • Ordinal Variables: categorical variables with a clear ordering (e.g., small/medium/large)
  • Numeric Variables
      * Discrete Variable: a numeric variable for which we can list the possible values (think: integers)
      * Continuous Variable: a numeric variable that is measured on a continuous scale (temperature, height)
  • Bar Charts: frequency distribution for categorical variables
  • Histograms: frequency distribution for numeric variables; the bars are adjacent, with no spaces between them
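
The counts behind a bar chart or a histogram can be tabulated as below. This is a minimal sketch: the blood types, heights, and the bin width of 10 are made-up values chosen for illustration.

```python
from collections import Counter

# Categorical variable: one bar per category, bar height = frequency.
blood_types = ["A", "O", "B", "O", "AB", "O", "A"]
bar_heights = Counter(blood_types)

# Continuous variable: group values into adjacent bins of equal width.
heights_cm = [150.2, 161.0, 158.5, 172.3, 165.1, 169.9]
bin_width = 10  # assumed bin width
# Each key is the lower edge of a bin, e.g. 160 covers [160, 170).
histogram = Counter(int(h // bin_width) * bin_width for h in heights_cm)

print(dict(bar_heights))
print(dict(sorted(histogram.items())))
```

Because histogram bins cover a continuous scale, neighbouring bins share an edge, which is why the bars are drawn without gaps.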

Frequency Variables

  • Mean, denoted by ȳ
      * The average of the observations
      * Only for discrete or continuous data
      * ȳ = (Σ yi)/n
      * Sensitive to outliers
  • Median, denoted by ỹ
      * The middle value once the observations are sorted
      * n is odd: the ((n + 1)/2)th value in sorted order
      * n is even: the average of the (n/2)th and (n/2 + 1)th values in sorted order
  • Symmetric and Unimodal Curve
  • Symmetric and Multimodal Curve
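
As a rough illustration of the mean and median rules, and of the mean's sensitivity to outliers, here is a short sketch. The function names and data values are illustrative assumptions.

```python
def mean(ys):
    """y-bar = (sum of y_i) / n."""
    return sum(ys) / len(ys)

def median(ys):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    s = sorted(ys)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1)/2)th value, which is index n // 2 when counting from 0
        return s[mid]
    # n even: average of the (n/2)th and (n/2 + 1)th values
    return (s[mid - 1] + s[mid]) / 2

data = [2, 3, 5, 7, 8]
with_outlier = [2, 3, 5, 7, 80]  # same data, largest value replaced by an outlier

print(mean(data), median(data))                  # 5.0 5
print(mean(with_outlier), median(with_outlier))  # 19.4 5
print(median([1, 2, 3, 4]))                      # 2.5
```

The outlier pulls the mean from 5.0 up to 19.4 while the median stays at 5.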

Box Plots

  • Quartiles
      * Q1 = 25th Percentile
      * Q2 = 50th Percentile (Median)
      * Q3 = 75th Percentile
  • Fences
      * LF = Q1 - h
      * UF = Q3 + h
      * h = 1.5(Q3 - Q1)
      * Outliers are any points that lie below the LF or above the UF
  • Drawing a Box Plot
      * Central box from Q1 to Q3
      * Line in the middle is Q2
      * Whiskers extend to the data points CLOSEST to the LF & UF that still lie inside the fences (not to the actual values of the fences)
      * Outliers are marked by small circles
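
The numbers needed to draw a box plot can be computed as in the sketch below. One assumption to note: textbooks and software differ in how they interpolate quartiles, and this sketch uses the "inclusive" method of Python's `statistics.quantiles`, which may not match the convention used in class. The function name `box_plot_summary` and the data are illustrative.

```python
from statistics import quantiles

def box_plot_summary(ys):
    # Quartile convention is an assumption; other methods can give slightly different values.
    q1, q2, q3 = quantiles(ys, n=4, method="inclusive")
    h = 1.5 * (q3 - q1)
    lf, uf = q1 - h, q3 + h
    inside = [y for y in ys if lf <= y <= uf]
    return {
        "Q1": q1, "Q2": q2, "Q3": q3,
        "LF": lf, "UF": uf,
        # Whiskers stop at the most extreme data points inside the fences.
        "whiskers": (min(inside), max(inside)),
        # Points below LF or above UF are drawn as small circles.
        "outliers": sorted(y for y in ys if y < lf or y > uf),
    }

summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

For this data the upper fence is 14.5, so the upper whisker ends at 9 (the largest point inside the fence) and 30 is flagged as an outlier.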

Label y axis

Variance

  • Sample variance
      * s^2 = Σ(yi - ȳ)^2 / (n - 1)
      * Remember to subtract one from n in the denominator
  • Sample standard deviation
      * s = sqrt(s^2)
      * Same unit as the original data values
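
A small worked example of the sample variance and standard deviation formulas follows. The function names and the data are illustrative assumptions; the standard library's `statistics.variance` is used only as a cross-check.

```python
import math
from statistics import variance

def sample_variance(ys):
    """s^2 = sum((y_i - y_bar)^2) / (n - 1)."""
    n = len(ys)
    y_bar = sum(ys) / n
    return sum((y - y_bar) ** 2 for y in ys) / (n - 1)

def sample_sd(ys):
    """s = sqrt(s^2); same unit as the data."""
    return math.sqrt(sample_variance(ys))

data = [1, 2, 3, 4, 5]
# y_bar = 3, squared deviations = 4, 1, 0, 1, 4 (sum 10), n - 1 = 4
s2 = sample_variance(data)
s = sample_sd(data)
print(s2)           # 2.5
print(round(s, 4))  # 1.5811
```

Dividing by n instead of n - 1 would give 2.0 here, which is why the reminder about the denominator matters for small samples.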