Biostatistics: Introduction to Biostatistics (B300) - Practice Flashcards

Course logistics and access

  • Course: B300, Introduction to Biostatistics (in-person and online components)

  • Canvas layout overview:

    • Lectures posted just before the lecture (usually the night before)

    • Lectures accessible via a PowerPoint file for today’s topic

    • Assignments: Homework and quizzes posted under Assignments

    • Quizzes: graded online; Homework: graded by TA (to be assigned)

    • Syllabus and course information available in Canvas; a separate official syllabus document is posted

    • Zoom link available for remote attendance or one-on-one help

    • Kaltura Media Gallery: lecture video recordings appear here once they are posted

  • Class participation and attendance:

    • Because classes are recorded, some students may be tempted to stay home; the instructor encourages in-person attendance for better learning outcomes

    • Attendance tracking may be introduced later, but attendance is not strictly required at every session

  • Office hours and contact:

    • Instructor: Nayus (pronounced like “my goose” as a mnemonic); open to scheduling by appointment

    • Office location (RG 6056) and phone; email address: a9@iu.edu

    • No fixed schedule of formal office hours; contact for meetings as needed

  • Class format notes:

    • The instructor also teaches two online classes (B300 Fort Wayne and B301 - Biostatistics for Health Information Management) and records lectures for those classes as well

    • The class uses a blended format: in-person meetings, online access, and recorded lectures

  • Textbook and software overview:

    • Textbook: OpenIntro Statistics, 4th edition (open-access, low cost)

    • OpenIntro website: openintro.org; supports a free PDF download and low-cost paperback options

    • R software for statistics: introduced in Week 2; setup guidance provided via posit.cloud (free plan available)

    • OpenIntro labs: the "Introduction to R and RStudio" lab is available on the textbook site; complete it before R is used heavily in class

  • Emphasis on self-directed learning:

    • The instructor will refer to OpenIntro labs and slides; slides are edited for clarity and supplemented with instructor notes

    • One cheat sheet (a single 8.5 × 11 inch page, two-sided) with essential notes is allowed during exams; exams are otherwise not open-book or open-notes

  • Exam and assessment structure:

    • Three in-class exams (must be taken in person unless a very good reason is approved in advance)

    • Final exam scheduled for the last day of the semester (December 19) at 3:00 PM

    • Homework due two weeks after each lecture; quizzes due at 11:59 PM on the due date

    • Missed-exam policy: a student who cannot attend an exam for a legitimate reason must get approval in advance; otherwise, attendance is expected

  • Grading and adjustments:

    • The grading scheme uses a percentage scale; if overall grades are too low, z-score adjustments may be applied to raise some grades

    • The instructor discourages copying or “homework leeches”; collaboration is allowed but copying is forbidden

  • Additional course notes:

    • The syllabus includes sections on academic integrity, sexual misconduct, accommodations (AES), course withdrawal, and incompletes

    • If you request disability accommodations, contact AES (Accessible Educational Services)

  • Opening video: statistics in the real world

    • A short video introduces descriptive statistics (data organization and presentation) and inferential statistics (drawing conclusions about populations from samples), with examples across occupations and contexts

    • Probability and interpretation of results (confidence and significance) are introduced as part of the foundation for later topics

  • Personal context from the instructor:

    • Background: long career in statistics; experience at a major hospital, Lilly (pharmaceuticals), and as a university lecturer since 1990

    • Personal anecdotes illustrate real-world applications of statistics (clinical trials, hospital quality control, travel data, zoo attendance, etc.)

    • A note on teaching style: the instructor sometimes teaches seated due to prior surgeries but aims to remain engaging and accessible

  • Opening case study overview (Chronic Fatigue Syndrome):

    • Objective: evaluate cognitive behavioral therapy (CBT) vs relaxation for chronic fatigue syndrome

    • Participant pool: 142 patients recruited; 60 entered the study; exclusions for criteria/health status; some declined

    • Study design: randomized assignment to two groups; treatment group (CBT) vs control (relaxation); n_t = 27, n_c = 26

    • Outcome measure: proportion of patients with “good results”

    • Results:

      • Treatment group: 19/27 good results (CBT)

      • Control group: 5/26 good results (relaxation)

    • Key computations:

      • Proportion in treatment: p_t = \frac{19}{27} \approx 0.7037

      • Proportion in control: p_c = \frac{5}{26} \approx 0.1923

        • Percentage form: ~70% (treatment) vs ~19% (control)

      • Difference in proportions: p_t - p_c \approx 0.7037 - 0.1923 = 0.5114 \text{ (about 51 percentage points)}

    • Interpretive notes:

      • The large observed difference suggests the CBT treatment may have a real effect beyond random fluctuation, but external generalizability is limited due to the specific volunteer sample

      • The concept of random variation is introduced to explain why not all samples yield identical results; a larger or more diverse sample could be explored in future studies
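The case-study arithmetic above can be checked with a few lines of code. This is a minimal sketch in Python (the course itself uses R); the variable and function names are illustrative, and the counts are the ones reported in the notes.

```python
# Counts reported in the case study notes
good_treatment, n_treatment = 19, 27   # CBT group
good_control, n_control = 5, 26        # relaxation group


def proportion(successes, total):
    """Return successes / total as a float."""
    return successes / total


p_t = proportion(good_treatment, n_treatment)
p_c = proportion(good_control, n_control)

# Difference in proportions, computed from the unrounded values
difference = p_t - p_c

print(f"p_t = {p_t:.4f} ({p_t:.0%})")
print(f"p_c = {p_c:.4f} ({p_c:.0%})")
print(f"difference = {difference:.4f} ({difference * 100:.0f} percentage points)")
```

Subtracting the unrounded proportions gives about 0.5114; rounding p_t to 0.70 first gives 0.5077. Both round to 51 percentage points.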

  • Data basics: types of variables and data structure

    • Data collection context: survey of students in an introductory statistics course; variables collected include:

      • Gender (categorical)

      • Introvert vs extrovert (categorical; often treated as ordinal when using a scale)

      • Sleep hours (numerical, continuous)

      • Bedtime category (categorical, ordinal)

      • Number of countries visited (numerical, discrete)

      • Dread level for the class (ordinal scale 1–5; can be treated numerically for averages but inherently categorical)

    • Data matrix concept:

      • Rows = observations (e.g., students)

      • Columns = variables (features)

    • Variable types:

      • Numerical vs categorical:

        • Numerical: measurements or counts (continuous or discrete)

        • Categorical: groups or categories (nominal) or ordered categories (ordinal)

      • Continuous vs discrete (numeric types):

        • Discrete numerical: counts (e.g., number of children, number of classes taken)

        • Continuous numerical: measurements (e.g., height, weight, age)

      • Examples from the lecture:

        • Age is often treated as discrete but is technically continuous (you can have fractional ages)

        • ZIP code is categorical (a code, not a numeric quantity with intrinsic magnitude)

        • Area code is categorical

        • Ordinal categorical variables have a natural order (e.g., cancer stages: 1 to 4; dread level 1–5)

    • Explanatory vs response variables (for potential causal interpretation):

      • Explanatory (independent) variable: e.g., hours of study

      • Response (dependent) variable: e.g., GPA

      • Note: labeling does not prove causation; relationships observed in data may be correlational or due to confounding factors

    • Observational studies vs experiments:

      • Observational: data collected without random assignment; can reveal associations but not causal conclusions

      • Experiment: random assignment to treatments; can establish causal relationships

    • Association vs independence:

      • Associated/dependent: there is a relationship or connection between variables

      • Independent: no association observed between variables in the data

    • Scatter plots: example with head length vs skull width shows a positive association (the variables tend to move together)
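The data matrix and variable types described above can be illustrated with a small sketch. The rows below are made-up example values, not data from the course survey, and the helper name `column` is hypothetical. Each row is one observation (a student) and each key is one variable.

```python
from collections import Counter

# Hypothetical survey rows: each dict is one observation (one student).
students = [
    {"gender": "F", "sleep_hours": 7.5, "bedtime": "10pm-12am", "countries_visited": 3, "dread": 2},
    {"gender": "M", "sleep_hours": 6.0, "bedtime": "after 12am", "countries_visited": 0, "dread": 4},
    {"gender": "F", "sleep_hours": 8.0, "bedtime": "before 10pm", "countries_visited": 5, "dread": 1},
    {"gender": "M", "sleep_hours": 5.5, "bedtime": "after 12am", "countries_visited": 1, "dread": 5},
]

# Variable type for each column, following the classification in the notes.
variable_types = {
    "gender": "categorical (nominal)",
    "sleep_hours": "numerical (continuous)",
    "bedtime": "categorical (ordinal)",
    "countries_visited": "numerical (discrete)",
    "dread": "categorical (ordinal)",
}


def column(rows, name):
    """Extract one variable (column) from the data matrix."""
    return [row[name] for row in rows]


n_observations = len(students)       # number of rows
n_variables = len(variable_types)    # number of columns

# A mean is a natural summary for a numerical variable ...
mean_sleep = sum(column(students, "sleep_hours")) / n_observations

# ... while counts per category are a natural summary for a categorical one.
gender_counts = Counter(column(students, "gender"))

print(n_observations, "observations x", n_variables, "variables")
print("mean sleep hours:", mean_sleep)
print("gender counts:", dict(gender_counts))
```

The variable type guides the choice of summary: averaging sleep hours is meaningful, while averaging ZIP codes or gender labels is not.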

  • Sampling concepts: populations, samples, and census

    • Population: all members of the group of interest (e.g., all adults, all women with lupus, all Americans in an election population)

    • Sample: a subset of the population actually measured or surveyed

    • Census: complete enumeration of the population; often impractical due to cost, time, and locating difficult-to-reach individuals (e.g., immigrants)

    • Example scenarios:

      • Population for a lupus drug study: all people with lupus; sample might be 2,000 people from around the world

      • Election polling: population = all registered voters; sample might be ~1,000 respondents per survey

    • Why sampling is used: practical constraints make census infeasible; sampling aims to infer population characteristics from a representative subset
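The idea of inferring a population characteristic from a sample can be sketched with a simulation. Everything below is hypothetical: the population size, the 52% "yes" rate, and the sample size of 1,000 (chosen to echo the polling example) are assumptions made for illustration only.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 voters, 52,000 of whom answer "yes" (1).
POPULATION_SIZE = 100_000
N_YES = 52_000
population = [1] * N_YES + [0] * (POPULATION_SIZE - N_YES)

# A census would measure everyone; here that is just the known proportion.
population_proportion = sum(population) / len(population)

# A simple random sample of 1,000 people, drawn without replacement.
SAMPLE_SIZE = 1_000
sample = random.sample(population, SAMPLE_SIZE)
sample_proportion = sum(sample) / len(sample)

print(f"population proportion: {population_proportion:.3f}")
print(f"sample proportion:     {sample_proportion:.3f}")
```

The sample proportion will usually land close to, but not exactly on, the population value. That gap is the random variation the notes refer to, and it appears even when the sample is drawn fairly.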

  • Anecdotal evidence vs evidence-based conclusions

    • Anecdotal evidence:

      • Based on limited, non-representative examples (e.g., uncle who smoked for decades without health issues)

      • Historically used in early smoking research but later found to be insufficient for general conclusions

    • Evidence-based conclusions:

      • Based on large, representative samples and systematic analysis showing consistent patterns (e.g., smoking linked to lung cancer and heart disease)

  • Practical implications for biostatistics and research design

    • Distinguish between descriptive statistics (summarizing data) and inferential statistics (drawing conclusions about populations from samples)

    • Understand the limitations of observational studies and the value of randomized experiments for causal inference

    • Recognize the importance of data quality, sample representativeness, and potential biases (e.g., nonresponse, selection bias)

    • Use of standardization and z-scores for grade adjustments when necessary; understanding how adjustments affect interpretation of results

  • Quick reference concepts to memorize for quizzes/exams

    • Proportion and percentage definitions:

      • Proportion: p = \frac{\text{number of successes}}{\text{total}}

      • Percentage: \text{Percentage} = p \times 100\%

    • Example from the case study:

      • Treatment success proportion: p_t = \frac{19}{27} \approx 0.7037 \; (\approx 70\%)

      • Control success proportion: p_c = \frac{5}{26} \approx 0.1923 \; (\approx 19\%)

      • Observed difference: \Delta p = p_t - p_c \approx 0.7037 - 0.1923 = 0.5114 \approx 0.51 \; (51\text{ percentage points})

    • Probability and random variation intuition (e.g., coin flip analogy) to understand why observed differences may reflect true effects or random fluctuation
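One way to build intuition for "true effect or random fluctuation" is a shuffling simulation, often called a randomization test. The notes do not say the lecture used this method; it is swapped in here as an illustration. The sketch pretends the treatment made no difference, reshuffles the 24 good results among all 53 patients many times, and counts how often chance alone produces a difference at least as large as the observed one. The seed and the number of shuffles are arbitrary choices.

```python
import random

random.seed(42)  # arbitrary seed, for reproducibility only

# 24 good results (19 + 5) among 53 patients (27 + 26), as in the case study.
outcomes = [1] * 24 + [0] * 29
N_TREATMENT = 27

observed_difference = 19 / 27 - 5 / 26

N_SHUFFLES = 10_000
count_as_extreme = 0
for _ in range(N_SHUFFLES):
    # Randomly reassign outcomes to groups, as if treatment had no effect.
    random.shuffle(outcomes)
    treatment = outcomes[:N_TREATMENT]
    control = outcomes[N_TREATMENT:]
    diff = sum(treatment) / len(treatment) - sum(control) / len(control)
    # Small tolerance guards against floating-point ties.
    if diff >= observed_difference - 1e-12:
        count_as_extreme += 1

fraction_as_extreme = count_as_extreme / N_SHUFFLES
print(f"observed difference: {observed_difference:.4f}")
print(f"fraction of shuffles at least as extreme: {fraction_as_extreme:.4f}")
```

If only a tiny fraction of shuffles reach the observed difference, random fluctuation alone is an unlikely explanation. This is the same logic as asking whether a run of heads is plausible for a fair coin.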

  • Notable formulas to remember for later use

    • Z-score (conceptual form; used for grading adjustments): z = \frac{X - \mu}{\sigma} (where X is an observed score, μ is the mean, σ is the standard deviation)

    • Basic proportion and percentage as shown above
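The z-score formula can be tried on a small set of scores. The scores below are invented, and using the population standard deviation (`statistics.pstdev`) is an assumption; the notes do not specify how the instructor would apply any grade adjustment.

```python
import statistics

# Hypothetical exam scores (not course data).
scores = [62, 70, 75, 81, 88]

mu = statistics.mean(scores)
sigma = statistics.pstdev(scores)  # population SD; an assumption for this sketch


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_scores = [z_score(x, mu, sigma) for x in scores]

for x, z in zip(scores, z_scores):
    print(f"score {x}: z = {z:+.2f}")
```

By construction, z-scores have mean 0 and standard deviation 1, which is what lets scores from different scales be compared.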

  • Summary takeaways

    • The course blends lectures, online materials, and recorded videos; attendance is encouraged for engagement and better performance

    • OpenIntro Statistics is the core textbook; a free PDF and affordable paperback options are available, along with supporting online labs for R and RStudio

    • Three in-class exams plus a final exam; homework is due two weeks after each lecture; one two-sided cheat sheet is allowed during exams

    • Core concepts introduced include descriptive vs inferential statistics, variability, probability, and the logic of sampling, observation, and experimentation

    • Data types and variable classification are foundational for choosing appropriate analyses

    • Real-world context and ethical considerations are integrated (e.g., avoiding data fabrication, respecting participants, and recognizing limitations of evidence)

  • Connections to prior and future topics

    • This first lecture lays the groundwork for exploratory data analysis, later moving toward inference, probability, and sampling distributions

    • Subsequent lectures will cover exploratory analysis through to inference, building on the data basics established here

  • Real-world relevance and applications

    • Statistics informs medical decision-making, public health policy, market research, sports analytics, and many other fields

    • Statistical literacy helps you evaluate claims in news, research articles, and policy debates

  • Ethical and practical implications

    • Emphasizes the importance of proper study design, avoiding inappropriate generalizations, and recognizing biases in data collection

    • Highlights the role of transparency (sharing data, methods) and the limits of extrapolating from small or non-representative samples

  • Quick tips for study preparation

    • Review the difference between population and sample and be able to identify the population for a given study

    • Practice identifying variable types from sample survey questions and data examples

    • Work through the case study proportions and practice converting to percentages and percentage point differences

    • Familiarize yourself with the OpenIntro open-access textbook and the R lab resources before the hands-on weeks

  • Next steps in course progression

    • Expect to begin using R and RStudio after week 1; complete Intro to R labs on OpenIntro before deep statistical analyses

    • Prepare for the first in-class exam and the associated cheat sheet; plan assignments and quizzes on the two-week timeline

    • Follow the schedule: Lecture 1 today (Introduction to Data); Lecture 2 on September 8 (Summarizing and Reading assignments); and continue with planned exam dates and breaks