Course Name: P8131 Biostatistical Methods II
Instructor: Wenpin Hou, Ph.D.
Department: Biostatistics, Columbia University
Dobson, A. J., & Barnett, A. G. (2008). An Introduction to Generalized Linear Models (3rd Ed.). Chapman & Hall.
Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2011). Applied Longitudinal Analysis (2nd Ed.). Wiley.
Hosmer, D. W., Lemeshow, S., & May, S. (2008). Applied Survival Analysis (2nd Ed.). Wiley.
Faraway, J. J. (2016). Extending the Linear Model with R (2nd Ed.). Chapman & Hall.
Agresti, A. (2015). Foundations of Linear and Generalized Linear Models (1st Ed.). Wiley.
Diggle, P. J., Heagerty, P., Liang, K. Y., & Zeger, S. L. (2013). Analysis of Longitudinal Data (2nd Ed.). Oxford University Press.
Klein, J. P., & Moeschberger, M. L. (2003). Survival Analysis: Techniques for Censored and Truncated Data (2nd Ed.). Springer.
Casella, G., & Berger, R. L. (2024). Statistical Inference (2nd Ed.). Springer.
Homework: 40% (10 assignments, equal weights, submit via Canvas, late submissions not accepted)
Midterm Exam: 30% (Date: March 13, in class, one double-sided A4 reference sheet allowed)
Final Exam: 30% (Date: April 29, in class, one double-sided A4 reference sheet allowed)
Canvas: Check frequently for homework, materials, and grades
Honor Code: Adhere to the Mailman School Student Honor Code; academic integrity violations will be reported.
Instructor Office Hours: Tuesdays, 1:30-2:30 PM
Teaching Assistant Office Hours:
Safiya Sirota: Wednesdays, 10-11 AM, ARB R638
Ting-Hsuan Chang: Thursdays, 1-2 PM, ARB R638
Yuying Lu: Mondays, 1-2 PM, ARB R638
Qin Huang: Tuesdays, 3-4 PM (jointly with Huanyu Chen), ARB R627
Huanyu Chen: Tuesdays, 3-4 PM, ARB R627
Classroom Policy: Attendance highly recommended; participation encouraged; do not share course material online without permission. Direct administrative questions to Paul McCullough (pm2692).
Continuation: This course continues P8130 (Biostatistical Methods I) and covers three key areas:
Generalized Linear Models
Longitudinal Data Analysis
Survival Analysis
Objective: Introduce basic concepts of each topic and demonstrate their application in real-world problems.
Software: The R programming language will primarily be used, but students may also use SAS, Matlab, Python, SPSS, or Excel for homework analysis.
Exponential family distributions
Generalized linear model basics
Logistic regression
Nominal and ordinal logistic regression
Poisson regression
Contingency tables
Case study
Context: Measurements on 81 children who underwent corrective spinal surgery
Response Variable: Kyphosis (presence/absence of deformity)
Covariates:
Age of child (months)
Number of vertebrae involved
Starting point of involved vertebrae
Questions: How are the covariates related to the response? Can they be used to screen patients?
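One way to formalize these questions, assuming (as a modeling choice not stated above) a logistic regression in which the three covariates enter linearly on the log-odds scale; here $Y_i = 1$ if child $i$ has kyphosis after surgery and $\pi_i = P(Y_i = 1)$:

```latex
Y_i \sim \mathrm{Bernoulli}(\pi_i), \qquad
\log\frac{\pi_i}{1-\pi_i} = \beta_0 + \beta_1\,\mathrm{age}_i + \beta_2\,\mathrm{number}_i + \beta_3\,\mathrm{start}_i
```

A fitted model of this kind gives a predicted probability for each child, which is one possible basis for screening.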
Participants: 150 men and 150 women
Focus: Importance of air conditioning and power steering in car buying
Data Representation: Importance ratings by gender and age group
Question: Relationship between sex, age, and car preferences
Objective: Assess the risk of damage to ships in relation to the factors below
Variables:
Ship type (A-E)
Year of construction (1960-79)
Period of operation
Aggregated months of service
Number of damage incidents
Question: How do these variables affect the number of damage incidents?
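One common formulation, sketched here as an assumption ahead of the Poisson regression material: treat the damage count $Y_i$ for row $i$ of the table (one combination of ship type, construction period, and operation period) as Poisson with mean proportional to its aggregated months of service $m_i$, so that $\log m_i$ enters as an offset and $\mathbf{x}_i$ encodes the three factors (for example, as indicator variables):

```latex
Y_i \sim \mathrm{Poisson}(\mu_i), \qquad
\log \mu_i = \log m_i + \mathbf{x}_i^{\top}\boldsymbol{\beta}
\quad\Longleftrightarrow\quad
\frac{\mu_i}{m_i} = \exp\bigl(\mathbf{x}_i^{\top}\boldsymbol{\beta}\bigr)
```

Under this model each $\exp(\beta_j)$ is a multiplicative effect on the damage rate per month of service.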
Definition: A large family of distributions including normal, exponential, Poisson, Bernoulli, binomial, gamma, and beta.
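Parameter Representation: Written as $f(y;\theta) = s(y)\,t(\theta)\exp[a(y)\,b(\theta)]$, where $a$, $b$, $s$, and $t$ are known functions; equivalently, $f(y;\theta) = \exp[a(y)\,b(\theta) + c(\theta) + d(y)]$ with $c(\theta) = \log t(\theta)$ and $d(y) = \log s(y)$.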
Canonical Form: If $a(y) = y$, the distribution is in canonical form and $b(\theta)$ is its natural parameter.
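Integration: For a continuous random variable $Y$, the density must satisfy $\int f(y;\theta)\,dy = 1$ (for discrete $Y$, the integral is replaced by a sum).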
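Derivatives with respect to $\theta$ (assuming differentiation and integration can be interchanged):
$\frac{d}{d\theta}\int f(y;\theta)\,dy = \int \frac{df(y;\theta)}{d\theta}\,dy = \frac{d}{d\theta}[1] = 0$
$\int \frac{d^2 f(y;\theta)}{d\theta^2}\,dy = 0$; together these two conditions yield the expectation and variance of $a(Y)$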
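A sketch of how the two conditions give the moments of $a(Y)$, written in the equivalent form $f(y;\theta) = \exp[a(y)\,b(\theta) + c(\theta) + d(y)]$ and assuming the usual regularity conditions:

```latex
% Assumes f(y;\theta) = \exp[a(y)b(\theta) + c(\theta) + d(y)] and that d/d\theta and \int may be swapped.
\begin{aligned}
\frac{df}{d\theta} &= \bigl[a(y)\,b'(\theta) + c'(\theta)\bigr] f(y;\theta)
  &&\Rightarrow\quad 0 = b'(\theta)\,E[a(Y)] + c'(\theta)
  &&\Rightarrow\quad E[a(Y)] = -\frac{c'(\theta)}{b'(\theta)} \\
\frac{d^2 f}{d\theta^2} &= \bigl[a(y)\,b''(\theta) + c''(\theta)\bigr] f
  + \bigl[a(y)\,b'(\theta) + c'(\theta)\bigr]^2 f \\
a(y)\,b'(\theta) + c'(\theta) &= b'(\theta)\bigl\{a(y) - E[a(Y)]\bigr\}
  &&\Rightarrow\quad 0 = b''(\theta)\,E[a(Y)] + c''(\theta) + [b'(\theta)]^2\operatorname{Var}[a(Y)] \\
\operatorname{Var}[a(Y)] &= \frac{b''(\theta)\,c'(\theta) - c''(\theta)\,b'(\theta)}{[b'(\theta)]^3}
\end{aligned}
```

For example, the normal with $\theta = \mu$ has $b(\mu) = \mu/\sigma^2$ and $c(\mu) = -\mu^2/(2\sigma^2) - \tfrac{1}{2}\log(2\pi\sigma^2)$, so $b' = 1/\sigma^2$, $b'' = 0$, $c' = -\mu/\sigma^2$, $c'' = -1/\sigma^2$, giving $E[Y] = \mu$ and $\operatorname{Var}(Y) = \sigma^2$.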
Normal Distribution: $f(y;\mu) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]$
Parameter of Interest: $\mu$ (the mean), with $\sigma^2$ treated as a nuisance parameter.
Canonical Form: With $\theta = \mu$, $a(y) = y$, $b(\mu) = \mu/\sigma^2$, $c(\mu) = -\mu^2/(2\sigma^2) - \tfrac{1}{2}\log(2\pi\sigma^2)$, and $d(y) = -y^2/(2\sigma^2)$.
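The standard results $E[a(Y)] = -c'(\theta)/b'(\theta)$ and $\operatorname{Var}[a(Y)] = [b''(\theta)c'(\theta) - c''(\theta)b'(\theta)]/[b'(\theta)]^3$ can be checked numerically. A minimal sketch using only the Python standard library: the hypothetical helper `canonical_moments` takes $b$ and $c$ as callables and approximates their derivatives by central finite differences (a numerical shortcut chosen here for illustration, not a course method). The normal case uses $\sigma^2 = 4$, $b(\mu) = \mu/\sigma^2$, $c(\mu) = -\mu^2/(2\sigma^2) - \tfrac{1}{2}\log(2\pi\sigma^2)$; the Poisson case uses $b(\lambda) = \log\lambda$, $c(\lambda) = -\lambda$.

```python
import math


def canonical_moments(b, c, theta, h=1e-3):
    """Approximate E[a(Y)] and Var[a(Y)] for f(y; theta) = exp[a(y)b(theta) + c(theta) + d(y)].

    Uses E[a(Y)] = -c'/b' and Var[a(Y)] = (b''c' - c''b') / b'^3, with the
    derivatives of b and c replaced by central finite differences of step h.
    """
    def first(g):
        return (g(theta + h) - g(theta - h)) / (2 * h)

    def second(g):
        return (g(theta + h) - 2 * g(theta) + g(theta - h)) / h ** 2

    b1, b2 = first(b), second(b)
    c1, c2 = first(c), second(c)
    mean = -c1 / b1
    var = (b2 * c1 - c2 * b1) / b1 ** 3
    return mean, var


if __name__ == "__main__":
    sigma2 = 4.0  # known (nuisance) variance in the normal example

    def normal_b(mu):
        return mu / sigma2

    def normal_c(mu):
        return -mu ** 2 / (2 * sigma2) - 0.5 * math.log(2 * math.pi * sigma2)

    print(canonical_moments(normal_b, normal_c, 2.0))  # approximately (2, 4)

    # Poisson: b(lambda) = log(lambda), c(lambda) = -lambda
    print(canonical_moments(math.log, lambda lam: -lam, 3.0))  # approximately (3, 3)
```

The results match $E[Y] = \mu$, $\operatorname{Var}(Y) = \sigma^2$ for the normal and $E[Y] = \operatorname{Var}(Y) = \lambda$ for the Poisson, up to finite-difference error.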
Bernoulli: $f(y;p) = p^y(1-p)^{1-y}$, $y \in \{0, 1\}$
Binomial: $f(y;n,p) = \binom{n}{y}p^y(1-p)^{n-y}$, $y = 0, 1, \dots, n$
Poisson: $f(y;\lambda) = \frac{\lambda^y e^{-\lambda}}{y!}$, $y = 0, 1, 2, \dots$
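A minimal check that these three probability mass functions fit the canonical form $\exp[y\,b(\theta) + c(\theta) + d(y)]$ with $a(y) = y$. The decompositions are the standard ones (natural parameter $\log[p/(1-p)]$ for Bernoulli and binomial, $\log\lambda$ for Poisson); the helper names are hypothetical, and the sketch uses only the Python standard library (`math.comb` needs Python 3.8+).

```python
import math


def canonical_pmf(y, b, c, d):
    """exp[y*b + c + d(y)]: exponential-family form with a(y) = y."""
    return math.exp(y * b + c + d(y))


def bernoulli_canonical(y, p):
    # b(p) = log(p/(1-p)), c(p) = log(1-p), d(y) = 0
    return canonical_pmf(y, math.log(p / (1 - p)), math.log(1 - p), lambda _k: 0.0)


def binomial_canonical(y, n, p):
    # b(p) = log(p/(1-p)), c(p) = n log(1-p), d(y) = log C(n, y)
    return canonical_pmf(y, math.log(p / (1 - p)), n * math.log(1 - p),
                         lambda k: math.log(math.comb(n, k)))


def poisson_canonical(y, lam):
    # b(lam) = log(lam), c(lam) = -lam, d(y) = -log(y!)
    return canonical_pmf(y, math.log(lam), -lam, lambda k: -math.lgamma(k + 1))


if __name__ == "__main__":
    p, n, lam = 0.3, 10, 2.5
    for y in (0, 1):
        assert math.isclose(bernoulli_canonical(y, p), p ** y * (1 - p) ** (1 - y))
    for y in range(n + 1):
        direct = math.comb(n, y) * p ** y * (1 - p) ** (n - y)
        assert math.isclose(binomial_canonical(y, n, p), direct)
    for y in range(20):
        direct = lam ** y * math.exp(-lam) / math.factorial(y)
        assert math.isclose(poisson_canonical(y, lam), direct)
    print("canonical forms match the direct formulas")
```

Writing each pmf through a single `canonical_pmf` makes the shared structure explicit: only $b$, $c$, and $d$ change between distributions.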
Gamma and Beta: Additional family members illustrating various statistical applications.
Moment Generating Functions: $M_Y(t) = E[e^{tY}]$
Derivatives at $t = 0$ give the moments, $E[Y^k] = M_Y^{(k)}(0)$, expressing each distribution's moments in terms of its parameters
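As a worked illustration (the Poisson is chosen here as an example), its MGF recovers the mean and variance, in agreement with the canonical-form results:

```latex
\begin{aligned}
M_Y(t) &= \sum_{y=0}^{\infty} e^{ty}\,\frac{\lambda^y e^{-\lambda}}{y!}
        = e^{-\lambda}\sum_{y=0}^{\infty}\frac{(\lambda e^t)^y}{y!}
        = \exp\bigl[\lambda(e^t - 1)\bigr] \\
M_Y'(t) &= \lambda e^t M_Y(t) \quad\Rightarrow\quad E[Y] = M_Y'(0) = \lambda \\
M_Y''(t) &= \bigl(\lambda e^t + \lambda^2 e^{2t}\bigr) M_Y(t) \quad\Rightarrow\quad E[Y^2] = M_Y''(0) = \lambda + \lambda^2 \\
\operatorname{Var}(Y) &= E[Y^2] - (E[Y])^2 = \lambda
\end{aligned}
```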
Formulas: Useful formulas for transformations of distributions to ease computations.