Biostatistics, Chapters I & II

Sampling

  • Population: complete collection of all measurements or data that are being considered.

  • Sample: sub-collecion of members selected from a population

  • Simple Random Sample: each member of the population has the same change of being included, and samples are chosen independently

  • Cluster Sampling: dividing the population into groups by a category. All of the individuals within the single group are the sample.

  • Stratified Random Sampling: divide the population into groups (strata) based on one+ classification criteria. Then perform a simple random sample within each strata

  • Sampling Bias: some members of the population have a higher chance to be selected than others.

Variables

  • Categorical Variables: two+ categories, but no intrinsic ordering (ex: blood type)

  • Ordinal Variable: categorical variables but with a clear ordering (small/medium/large)

  • Numeric Variables

    • Discrete Variables: a numeric variable for which we can list the possible values (think: integers)

    • Continuous Variable: a numeric variable that is measured on a continuous scale (temperature, height)

  • Bar Charts: frequency distribution for categorical variables

  • Histograms: frequency distribution but no spaces

Frequency Variables

  • Mean, denoted by ȳ

    • Mean: The average of the observations

    • Only for discrete or continuous data

    • ȳ = (Σ yi)/(n)

    • Sensitive to outliers

  • Median, denoted by ỹ

    • N is odd: (n + 1)th largest value

    • N is even: average of (n/2)th largest value and (n/(2) + 1)th

  • Symmetric and Unimodal Curve

  • Symmetric and Multimodal Curve

Box Plots

  • Quartiles

    • Q1 = 25th Percentile

    • Q2 = 50th Percentile (Median)

    • Q3 = 75th Percentile

  • Fences

    • LF = Q1 - h

    • UF = Q3 + h

    • h = 1.5(Q3 - Q1)

    • Outliers are any points that lie outside of the LF and UF

  • Drawing a Box Plot

    • Central box from Q1 to Q3

    • Line in the middle is Q2

    • Whiskers extend to the point CLOSEST to the LF & UF (not the actual values of the fences)

    • Outliers are marked by small circles

Label y axis

Variance

  • Sample variance

    • s^2 = Σ(yi - ȳ)^2 / n - 1

    • Remember to subtract one from n

  • Simple Standard deviation

    • Sqrt(s^2)

    • Same unit as the original data value

robot