Biostatistics, Chapters I & II
Sampling
- Population: complete collection of all measurements or data that are being considered.
- Sample: sub-collection of members selected from a population
- Simple Random Sample: each member of the population has the same chance of being included, and members are selected independently of one another
- Cluster Sampling: divide the population into groups (clusters) by a category, then randomly select a group. All of the individuals within the selected group form the sample.
- Stratified Random Sampling: divide the population into groups (strata) based on one+ classification criteria. Then take a simple random sample within each stratum
- Sampling Bias: some members of the population have a higher chance to be selected than others.
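A minimal Python sketch contrasting the three sampling schemes above, assuming a small hypothetical population of 12 patients grouped by clinic. The `clinic` field, the helper names, and the data are illustrative only. It uses the standard `random` module; cluster sampling here picks one whole clinic at random.

```python
import random

# Hypothetical population: 12 patients, each tagged with a clinic (the grouping category).
population = [
    {"id": i, "clinic": clinic}
    for i, clinic in enumerate(["A"] * 4 + ["B"] * 4 + ["C"] * 4)
]

def group_by(items, key):
    """Split a list of records into groups keyed by the given field."""
    groups = {}
    for item in items:
        groups.setdefault(item[key], []).append(item)
    return groups

def simple_random_sample(items, n, rng):
    # Every member has the same chance of selection; drawn without replacement.
    return rng.sample(items, n)

def cluster_sample(items, key, n_clusters, rng):
    # Randomly choose whole groups, then keep every member of the chosen groups.
    groups = group_by(items, key)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [item for g in chosen for item in groups[g]]

def stratified_sample(items, key, n_per_stratum, rng):
    # Take a simple random sample of the same size within each stratum.
    groups = group_by(items, key)
    return [item for g in sorted(groups) for item in rng.sample(groups[g], n_per_stratum)]

rng = random.Random(42)  # fixed seed so the draws are reproducible
srs = simple_random_sample(population, 3, rng)
cluster = cluster_sample(population, "clinic", 1, rng)
strat = stratified_sample(population, "clinic", 2, rng)
print([p["id"] for p in srs])
print([p["id"] for p in cluster])
print([p["id"] for p in strat])
```

Note the structural difference: the cluster sample contains everyone from a single clinic, while the stratified sample always contains exactly two patients from every clinic.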
Variables
- Categorical Variables: two+ categories, but no intrinsic ordering (ex: blood type)
- Ordinal Variable: categorical variables but with a clear ordering (small/medium/large)
- Numeric Variables
* Discrete Variables: a numeric variable for which we can list the possible values (think: integers)
* Continuous Variable: a numeric variable that is measured on a continuous scale (temperature, height)
- Bar Charts: frequency distribution for categorical variables
- Histograms: frequency distribution for numeric variables; the bars touch (no spaces between them)
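To make the bar chart / histogram distinction concrete, here is a small Python sketch on hypothetical data (the values and the helper `histogram_counts` are illustrative assumptions). A bar chart needs one count per category; a histogram needs counts over adjacent, equal-width numeric bins, and because neighboring bins share an edge, the bars touch.

```python
from collections import Counter

# Hypothetical data for illustration only.
blood_types = ["A", "O", "B", "O", "AB", "A", "O"]               # categorical
heights_cm = [158.2, 171.5, 165.0, 180.3, 162.7, 175.1, 169.9]   # continuous

# Bar chart input: one count per category.
bar_counts = Counter(blood_types)

def histogram_counts(values, start, width, n_bins):
    """Count values in bins [start + k*width, start + (k+1)*width) for k = 0..n_bins-1.

    Each bin's right edge is the next bin's left edge, so there are no gaps.
    """
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

hist = histogram_counts(heights_cm, start=155, width=10, n_bins=3)
print(bar_counts)
print(hist)  # [2, 3, 2]
```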
Frequency Variables
- Mean, denoted by ȳ
* Mean: The average of the observations
* Only for discrete or continuous data
* ȳ = (Σ yi)/(n)
* Sensitive to outliers
- Median, denoted by ỹ
* n is odd: the ((n + 1)/2)th value in sorted order
* n is even: average of the (n/2)th and (n/2 + 1)th values in sorted order
- Symmetric and Unimodal Curve
- Symmetric and Multimodal Curve
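A short Python sketch of the mean and median formulas above (the function names are my own). The second dataset replaces one value with an outlier: the mean jumps while the median does not move, which illustrates the mean's sensitivity to outliers.

```python
def mean(ys):
    # ȳ = (Σ yi) / n
    return sum(ys) / len(ys)

def median(ys):
    s = sorted(ys)
    n = len(s)
    if n % 2 == 1:
        # n odd: the ((n + 1)/2)th value; subtract 1 for Python's 0-based indexing
        return s[(n + 1) // 2 - 1]
    # n even: average of the (n/2)th and (n/2 + 1)th values
    return (s[n // 2 - 1] + s[n // 2]) / 2

data = [2, 3, 3, 4, 5]
with_outlier = [2, 3, 3, 4, 50]
print(mean(data), median(data))                  # 3.4 3
print(mean(with_outlier), median(with_outlier))  # 12.4 3
```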
Box Plots
- Quartiles
* Q1 = 25th Percentile
* Q2 = 50th Percentile (Median)
* Q3 = 75th Percentile
- Fences
* LF = Q1 - h
* UF = Q3 + h
* h = 1.5(Q3 - Q1)
* Outliers are any points that lie outside the fences (below LF or above UF)
- Drawing a Box Plot
* Central box from Q1 to Q3
* Line in the middle is Q2
* Whiskers extend to the most extreme data points that still lie within the LF & UF (not to the fence values themselves)
* Outliers are marked by small circles
* Label the y-axis
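The quantities needed to draw a box plot can be computed with a short Python sketch. It assumes one common quartile convention (Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the overall median when n is odd); other textbooks and software packages use slightly different rules, so Q1 and Q3 may differ a little from your course's method. The data are hypothetical.

```python
def median(ys):
    s = sorted(ys)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(ys):
    # Assumed convention: Q1/Q3 are medians of the lower/upper halves,
    # excluding the overall median when n is odd.
    s = sorted(ys)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return median(lower), median(s), median(upper)

def box_plot_summary(ys):
    q1, q2, q3 = quartiles(ys)
    h = 1.5 * (q3 - q1)
    lf, uf = q1 - h, q3 + h
    inside = [y for y in ys if lf <= y <= uf]
    return {
        "Q1": q1, "Q2": q2, "Q3": q3,
        "LF": lf, "UF": uf,
        # Whiskers stop at the most extreme observations still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Points strictly outside the fences are drawn as outliers (small circles).
        "outliers": sorted(y for y in ys if y < lf or y > uf),
    }

data = [1, 3, 4, 5, 6, 7, 8, 9, 25]
print(box_plot_summary(data))
```

For this data Q1 = 3.5, Q2 = 6, Q3 = 8.5, so h = 7.5, LF = -4 and UF = 16. The upper whisker therefore ends at 9 rather than at 16, and 25 is plotted as an outlier.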
Variance
- Sample variance
* s^2 = Σ(yi - ȳ)^2 / (n - 1)
* Remember to subtract one from n
- Sample Standard Deviation
* s = sqrt(s^2)
* Same units as the original data values
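A minimal Python sketch of the sample variance and standard deviation formulas, checked against the standard library's `statistics.variance` and `statistics.stdev` (which also divide by n - 1). The data are hypothetical: here ȳ = 5 and Σ(yi - ȳ)^2 = 32, so s^2 = 32/7 ≈ 4.57, whereas dividing by n instead would give 4.

```python
import math
import statistics

def sample_variance(ys):
    n = len(ys)
    ybar = sum(ys) / n
    # Divide by n - 1, not n.
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)

def sample_sd(ys):
    # s = sqrt(s^2), in the same units as the data
    return math.sqrt(sample_variance(ys))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data), statistics.variance(data))
print(sample_sd(data), statistics.stdev(data))
```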