Biostatistics: Introduction to Biostatistics (B300) - Practice Flashcards

Course logistics and access

Course: Introduction to Biostatistics (B300), with in-person and online components
Canvas layout overview:
- Lectures posted just before the lecture (usually the night before)
- Lectures accessible via a PowerPoint file for today’s topic
- Assignments: Homework and quizzes posted under Assignments
- Quizzes: graded online; Homework: graded by TA (to be assigned)
- Syllabus and course information available in Canvas; a separate official syllabus document is posted
- Zoom link available for remote attendance or one-on-one help
- Kaltura Media Gallery: video recordings of lectures (videos go there after posting)
Class participation and attendance:
- Recording the class may influence some students to stay home; the instructor encourages attendance for better learning outcomes
- Attendance tracking may be implemented in the future, but attendance is not strictly required for all sessions
Office hours and contact:
- Instructor: Nayus (pronounced like “my goose” as a mnemonic); open to scheduling by appointment
- Office location (RG 6056) and phone; email address: a9@iu.edu
- No fixed schedule of formal office hours; contact for meetings as needed
Class format notes:
- The instructor also teaches two online classes (B300 Fort Wayne and B301 - Biostatistics for Health Information Management) and records lectures for those classes as well
- The class uses a blended format: in-person meetings, online access, and recorded lectures
Textbook and software overview:
- Textbook: OpenIntro Statistics, 4th edition (open-access, low cost)
- OpenIntro website: openintro.org; supports a free PDF download and low-cost paperback options
- R software for statistics: introduced in Week 2; setup guidance provided via posit.cloud (free plan available)
- OpenIntro labs: Introduction to R and RStudio available on the textbook site; recommended to complete before heavy usage in class
Emphasis on self-directed learning:
- The instructor will refer to OpenIntro labs and slides; slides are edited for clarity and supplemented with instructor notes
- One two-sided 8.5 × 11 cheat sheet of essential notes is allowed during exams; exams are otherwise not open-book or open-notes
Exam and assessment structure:
- Three in-class exams (must be taken in person unless a very good reason is approved in advance)
- Final exam scheduled for the last day of the semester (December 19) at 3:00 PM
- Homework due two weeks after each lecture; quizzes due at 11:59 PM on the due date
- A placeholder policy: if a student is unable to attend an exam for a legitimate reason, prior approval is required; otherwise, attendance is expected
Grading and adjustments:
- The grading scheme uses a percentage scale; if overall grades are too low, z-score adjustments may be applied to raise some grades
- The instructor discourages copying or “homework leeches”; collaboration is allowed but copying is forbidden
Additional course notes:
- The syllabus includes sections on academic integrity, sexual misconduct, accommodations (AES), course withdrawal, and incompletes
- If you request disability accommodations, contact AES (Accessible Educational Services)
Opening video: statistics in the real world
- A short video introduces descriptive statistics (data organization and presentation) and inferential statistics (drawing conclusions about populations from samples), with examples across occupations and contexts
- Probability and interpretation of results (confidence and significance) are introduced as part of the foundation for later topics
Personal context from the instructor:
- Background: long career in statistics; experience at a major hospital, Lilly (pharmaceuticals), and as a university lecturer since 1990
- Personal anecdotes illustrate real-world applications of statistics (clinical trials, hospital quality control, travel data, zoo attendance, etc.)
- A note on teaching style: the instructor sometimes teaches seated due to prior surgeries but aims to remain engaging and accessible
Opening case study overview (Chronic Fatigue Syndrome):
- Objective: evaluate cognitive behavioral therapy (CBT) vs relaxation for chronic fatigue syndrome
- Participant pool: 142 patients recruited; 60 entered the study; exclusions for criteria/health status; some declined
- Study design: randomized assignment to two groups; treatment group (CBT) vs control (relaxation); $n_t = 27$, $n_c = 26$
- Outcome measure: proportion of patients with “good results”
- Results:
- Treatment group: 19/27 good results (CBT)
- Control group: 5/26 good results (relaxation)
- Key computations:
- Proportion in treatment: $p_t = \frac{19}{27} \approx 0.7037$
- Proportion in control: $p_c = \frac{5}{26} \approx 0.1923$
- Percentage form: about 70% (treatment) vs about 19% (control)
- Difference in proportions: $p_t - p_c \approx 0.7037 - 0.1923 = 0.5114$ (about 51 percentage points)
- Interpretive notes:
- The large observed difference suggests the CBT treatment may have a real effect beyond random fluctuation, but external generalizability is limited due to the specific volunteer sample
- The concept of random variation is introduced to explain why not all samples yield identical results; a larger or more diverse sample could be explored in future studies
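A minimal Python sketch of the case-study arithmetic, using only the counts reported above (19 of 27 and 5 of 26). The variable and function names are illustrative choices, not something from the lecture.

```python
# Counts reported in the chronic fatigue syndrome case study.
good_treatment, n_treatment = 19, 27   # CBT group
good_control, n_control = 5, 26        # relaxation group


def proportion(successes, total):
    """Return successes / total as a float."""
    return successes / total


p_t = proportion(good_treatment, n_treatment)
p_c = proportion(good_control, n_control)
difference = p_t - p_c

print(f"p_t = {p_t:.4f} ({p_t:.0%})")
print(f"p_c = {p_c:.4f} ({p_c:.0%})")
print(f"difference = {difference:.4f} "
      f"(about {difference * 100:.0f} percentage points)")
```

Rounding $p_t$ to 0.70 before subtracting gives 0.5077, while carrying full precision gives about 0.5114; either way the difference is about 51 percentage points.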
Data basics: types of variables and data structure
- Data collection context: survey of students in an introductory statistics course; variables collected include:
- Gender (categorical)
- Introvert vs extrovert (categorical; often treated as ordinal when using a scale)
- Sleep hours (numerical, continuous)
- Bedtime category (categorical, ordinal)
- Number of countries visited (numerical, discrete)
- Dread level for the class (ordinal scale 1–5; can be treated numerically for averages but inherently categorical)
- Data matrix concept:
- Rows = observations (e.g., students)
- Columns = variables (features)
- Variable types:
- Numerical vs categorical:
  - Numerical: measurements or counts (continuous or discrete)
  - Categorical: groups or categories (nominal) or ordered categories (ordinal)
- Continuous vs discrete (numeric types):
  - Discrete numerical: counts (e.g., number of children, number of classes taken)
  - Continuous numerical: measurements (e.g., height, weight, age)
- Examples from the lecture:
  - Age is often treated as discrete but is technically continuous (you can have fractional ages)
  - ZIP code is categorical (a code, not a numeric quantity with intrinsic magnitude)
  - Area code is categorical
  - Ordinal categorical variables have a natural order (e.g., cancer stages: 1 to 4; dread level 1–5)
- Explanatory vs response variables (for potential causal interpretation):
- Explanatory (independent) variable: e.g., hours of study
- Response (dependent) variable: e.g., GPA
- Note: labeling does not prove causation; relationships observed in data may be correlational or due to confounding factors
- Observational studies vs experiments:
- Observational: data collected without random assignment; can reveal associations but not causal conclusions
- Experiment: random assignment to treatments; can establish causal relationships
- Association vs independence:
- Associated/dependent: there is a relationship or connection between variables
- Independent: no association observed between variables in the data
- Scatter plots: example with head length vs skull width shows a positive association (the variables tend to move together)
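The data matrix idea and the variable types in this section can be sketched in Python. The three survey rows are invented for illustration (they are not the class's actual data); the type labels follow the classification in the notes.

```python
# Hypothetical survey responses; values are invented for illustration only.
# Each dict is one row (observation); each key is one column (variable).
data_matrix = [
    {"gender": "F", "sleep_hours": 7.5, "countries_visited": 3, "dread_level": 2},
    {"gender": "M", "sleep_hours": 6.0, "countries_visited": 0, "dread_level": 4},
    {"gender": "F", "sleep_hours": 8.25, "countries_visited": 12, "dread_level": 1},
]

# Variable types as classified in the notes.
variable_types = {
    "gender": "categorical (nominal)",
    "sleep_hours": "numerical (continuous)",
    "countries_visited": "numerical (discrete)",
    "dread_level": "categorical (ordinal)",
}

n_observations = len(data_matrix)   # number of rows
n_variables = len(data_matrix[0])   # number of columns

# A mean is a sensible summary for a numerical variable such as sleep hours.
mean_sleep = sum(row["sleep_hours"] for row in data_matrix) / n_observations

# For a nominal variable, counting how often each category occurs is the
# sensible summary; averaging category labels would be meaningless.
gender_counts = {}
for row in data_matrix:
    gender_counts[row["gender"]] = gender_counts.get(row["gender"], 0) + 1

print(n_observations, "observations,", n_variables, "variables")
print("mean sleep hours:", mean_sleep)
print("gender counts:", gender_counts)
```

The variable type drives the choice of summary: the same reasoning explains why ZIP codes and area codes, although written with digits, should be counted rather than averaged.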
Sampling concepts: populations, samples, and census
- Population: all members of the group of interest (e.g., all adults, all women with lupus, all Americans in an election population)
- Sample: a subset of the population actually measured or surveyed
- Census: complete enumeration of the population; often impractical due to cost, time, and locating difficult-to-reach individuals (e.g., immigrants)
- Example scenarios:
- Population for a lupus drug study: all people with lupus; sample might be 2,000 people from around the world
- Election polling: population = all registered voters; sample might be ~1,000 respondents per survey
- Why sampling is used: practical constraints make census infeasible; sampling aims to infer population characteristics from a representative subset
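The population, census, and sample ideas can be illustrated with a small simulation. This is only a sketch: the population of 100,000 voters and the 52% support figure are made-up assumptions, and the seed is arbitrary.

```python
import random

random.seed(300)  # arbitrary seed so the sketch is repeatable

# Hypothetical population: 100,000 registered voters, of whom 52,000
# favor candidate A (1 = favors A, 0 = does not). Numbers are invented.
population_size = 100_000
n_favor = 52_000
population = [1] * n_favor + [0] * (population_size - n_favor)

# A census measures every member of the population.
census_proportion = sum(population) / population_size

# A simple random sample of 1,000 respondents, drawn without replacement.
sample = random.sample(population, k=1_000)
sample_proportion = sum(sample) / len(sample)

print(f"census proportion: {census_proportion:.3f}")
print(f"sample proportion: {sample_proportion:.3f}")
```

The sample proportion will usually land close to the census value without matching it exactly, which is the practical trade-off behind sampling: a small, randomly chosen subset gives a usable estimate at a fraction of the cost.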
Anecdotal evidence vs evidence-based conclusions
- Anecdotal evidence:
- Based on limited, non-representative examples (e.g., uncle who smoked for decades without health issues)
- Historically used in early smoking research but later found to be insufficient for general conclusions
- Evidence-based conclusions:
- Based on large, representative samples and systematic analysis showing consistent patterns (e.g., smoking linked to lung cancer and heart disease)
Practical implications for biostatistics and research design
- Distinguish between descriptive statistics (summarizing data) and inferential statistics (drawing conclusions about populations from samples)
- Understand the limitations of observational studies and the value of randomized experiments for causal inference
- Recognize the importance of data quality, sample representativeness, and potential biases (e.g., nonresponse, selection bias)
- Use of standardization and z-scores for grade adjustments when necessary; understanding how adjustments affect interpretation of results
Quick reference concepts to memorize for quizzes/exams
- Proportion and percentage definitions:
- Proportion: $p = \frac{\text{number of successes}}{\text{total}}$
- Percentage: $\text{Percentage} = p \times 100\%$
- Example from the case study:
- Treatment success proportion: $p_t = \frac{19}{27} \approx 0.7037 \; (\approx 70\%)$
- Control success proportion: $p_c = \frac{5}{26} \approx 0.1923 \; (\approx 19\%)$
- Observed difference: $\Delta p = p_t - p_c \approx 0.7037 - 0.1923 = 0.5114 \approx 0.51 \; (51\text{ percentage points})$
- Probability and random variation intuition (e.g., coin flip analogy) to understand why observed differences may reflect true effects or random fluctuation
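One way to build intuition for random variation is a shuffling simulation, sketched below in Python. This is an illustrative technique chosen for these notes, not a method described in the lecture: it assumes the treatment made no difference, randomly reassigns the 24 good results among all 53 patients, and checks how often chance alone produces a gap as large as the observed one. The seed and the number of simulations are arbitrary.

```python
import random

random.seed(1)  # arbitrary seed for repeatability

n_treatment, n_control = 27, 26
total_good = 19 + 5                      # 24 good results overall
observed_diff = 19 / 27 - 5 / 26         # about 0.51

# 1 = good result, 0 = otherwise, for all 53 patients pooled together.
outcomes = [1] * total_good + [0] * (n_treatment + n_control - total_good)


def shuffled_difference(outcomes, n_treatment, n_control):
    """Randomly reassign outcomes to the two groups and return the
    difference in proportions (treatment minus control)."""
    shuffled = outcomes[:]
    random.shuffle(shuffled)
    treatment = shuffled[:n_treatment]
    control = shuffled[n_treatment:]
    return sum(treatment) / n_treatment - sum(control) / n_control


n_simulations = 10_000
simulated = [
    shuffled_difference(outcomes, n_treatment, n_control)
    for _ in range(n_simulations)
]

# Fraction of shuffles whose difference is at least as large as observed
# (small tolerance guards against floating-point ties).
fraction_as_large = (
    sum(d >= observed_diff - 1e-12 for d in simulated) / n_simulations
)

print(f"observed difference: {observed_diff:.4f}")
print(f"fraction of shuffles at least that large: {fraction_as_large:.4f}")
```

If shuffles rarely reach the observed difference, random fluctuation alone is an unconvincing explanation; if they reach it often, the observed gap could easily be chance.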
Notable formulas to remember for later use
- Z-score (conceptual form; used for grading adjustments): $z = \frac{X - \mu}{\sigma}$ (where X is an observed score, μ is the mean, σ is the standard deviation)
- Basic proportion and percentage as shown above
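A short Python sketch of the z-score formula. The five exam scores are hypothetical, and the sketch only shows how a z-score is computed; it does not claim to reproduce how the instructor would adjust grades.

```python
from statistics import mean, pstdev

# Hypothetical exam scores, invented for illustration.
scores = [62, 70, 75, 81, 92]

mu = mean(scores)        # mean of the scores
sigma = pstdev(scores)   # population standard deviation of the scores


def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean."""
    return (x - mu) / sigma


z_scores = [z_score(x, mu, sigma) for x in scores]

for x, z in zip(scores, z_scores):
    print(f"score {x}: z = {z:+.2f}")
```

A positive z-score marks a score above the mean and a negative one a score below it. Here `pstdev` treats the five scores as the whole population; `statistics.stdev` would be the choice if they were a sample from a larger group.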
Summary takeaways
- The course blends lectures, online materials, and recorded videos; attendance is encouraged for engagement and better performance
- OpenIntro Statistics is the core textbook; a free PDF and affordable paperback options are available, along with supporting online labs for R and RStudio
- Three in-class exams plus a final exam; homework due on a two-week cycle; one two-sided cheat sheet is allowed during exams
- Core concepts introduced include descriptive vs inferential statistics, variability, probability, and the logic of sampling, observation, and experimentation
- Data types and variable classification are foundational for choosing appropriate analyses
- Real-world context and ethical considerations are integrated (e.g., avoiding data fabrication, respecting participants, and recognizing limitations of evidence)
Connections to prior and future topics
- This first lecture lays the groundwork: subsequent lectures move from exploratory data analysis to probability, sampling distributions, and inference, building on the data basics established here
Real-world relevance and applications
- Statistics informs medical decision-making, public health policy, market research, sports analytics, and many other fields
- Statistical literacy helps you evaluate claims in news, research articles, and policy debates
Ethical and practical implications
- Emphasizes the importance of proper study design, avoiding inappropriate generalizations, and recognizing biases in data collection
- Highlights the role of transparency (sharing data, methods) and the limits of extrapolating from small or non-representative samples
Quick tips for study preparation
- Review the difference between population and sample and be able to identify the population for a given study
- Practice identifying variable types from sample survey questions and data examples
- Work through the case study proportions and practice converting to percentages and percentage point differences
- Familiarize yourself with the OpenIntro OpenBook strategy and R/Lab resources prior to hands-on weeks
Next steps in course progression
- Expect to begin using R and RStudio after week 1; complete Intro to R labs on OpenIntro before deep statistical analyses
- Prepare for the first in-class exam and the associated cheat sheet; plan assignments and quizzes on the two-week timeline
- Follow the schedule: Lecture 1 today (Introduction to Data); Lecture 2 on September 8 (Summarizing and Reading assignments); and continue with planned exam dates and breaks