Stats 240 Lecture Notes: Error, Bias, and the Research Process

Course context and teaching team

  • Stats 240, a course on research design and structured data; taught by a three-lecturer team: Andrew Spall, Thomas Lumley, Beatrix Jones.
  • Thomas Lumley: a leading biostatistician and a key contributor to the R software, highly respected; Beatrix Jones: an excellent statistician; Andrew focuses on practical interpretation over heavy Greek-letter formulas.
  • No class reps in this course yet; intention to appoint a class rep next week to connect students with the department and lecturers; class reps seen as a conduit between students and lecturers, with access to staff and a venue for feedback.
  • Course origin: Stats 240 shifted from a 300-level course to a 200-level course (content reduced); this shift motivates strong feedback loops to ensure the course still fits students’ needs.
  • Emphasis on feedback: feedback channels include lecturers and the class rep; meetings (catered, with lunch) for undergraduate class reps to discuss course updates and issues.

Course structure and logistics

  • Canvas as the course’s “bible”: central hub for materials, updates, and communications; slides posted by lecturers; note pages available as PDFs.
  • Team dynamics and schedule:
    • Three hours of contact per week: 2-hour lecture on Monday + 1-hour lecture on Wednesday.
    • Course split into three parts across the semester:
      • Part 1 taught by Andrew (his initial weeks of the semester).
      • Part 2 taught by Thomas Lumley (the following four weeks, completing the first half).
      • Part 3 taught by Beatrix Jones (after the mid-semester break).
  • Content focus by lecturer:
    • Andrew: understanding how research is done and where errors can arise; emphasis on interpreting results rather than on heavy formula work.
    • Thomas Lumley: statistical analyses, design and analyses related to survey sampling; noted as a leading figure in survey statistics.
    • Beatrix Jones: experimental design.
  • Course level and language: higher English-literacy demands than Stage 1; heavy reading load; no formal course book; emphasis on provided readings and writing-based assessment.
  • Assessment structure:
    • Five assignments worth 4 marks each (spread across the term).
    • Quizzes: 15% of the grade; multiple attempts allowed until correct; intended as learning feedback rather than a punitive test.
    • Midterm: cumulative on Andrew’s and Thomas’s material (first half of the course).
    • Final exam: 50% of the grade; must pass to pass the course.
    • A timetable with exact dates and weights is available on Canvas.
  • Reading approach:
    • Readings are essential; a reading guide is provided to identify key themes rather than rote memorization of minutiae.
    • Lectures are designed to lead into readings; readings inform assignments, tests, and the final exam.

Focus of the course and key themes

  • Core objective: understand how research is structured, where errors can arise, and how to evaluate data quality in real-world contexts.
  • Central theme for Andrew’s six lectures: what can go wrong in research; developing a well-honed ability to assess data quality and detect unreliable work (i.e., a “bullshit detector”).
  • Important distinction: the course aims to build foundational knowledge even if some readings are revisited multiple times.
  • Real-world relevance: skills for decision makers (policymakers, business leaders) to assess the quality of information; not just how to perform calculations but how results were produced.
  • Career relevance: surveys and experiments still dominate in many settings; AI and big data amplify the need for data-quality literacy rather than replacing traditional survey/experimental skills.
  • Examples and anecdotes used to illustrate points: polling mistakes, real-world data collection failures, and the importance of design and framing.

Practical class activity: class survey demonstration

  • Three volunteers selected to act as enumerators (counting survey data in class).
  • Task: determine which faculties students are enrolled in by counting responses:
    • Faculties listed: Science (Stats), Arts, Engineering, Business, Education, Medicine & Health Sciences, Creative Arts & Industries (Dance, etc.), Law, etc.
  • Process:
    • Enumerators record counts on paper against the list of faculty categories.
    • Class participants indicate their faculty affiliation by raising hands.
  • Purpose: illustrate the data-collection process and the potential sources of error (e.g., non-response, misclassification, miscounts) that can arise in a real survey.
  • Lesson: the counts from this exercise can be used as an assignment question to analyze the data-collection process and identify potential issues.
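The arithmetic behind the exercise can be sketched in a few lines of Python. The counts below are hypothetical stand-ins for whatever the enumerators actually recorded, chosen only to show how non-response changes what the percentages mean:

```python
# Hypothetical tally from the hand-raising exercise (illustrative numbers,
# not the real classroom data).
tally = {"Science": 62, "Arts": 8, "Engineering": 5, "Business": 10, "Other": 4}
no_response = 6  # students present who raised no hand (non-response)

counted = sum(tally.values())  # number of responders
total = counted + no_response  # number of students present

# The percentages describe *responders*, not the whole class; non-response
# and miscounts are exactly the error sources the exercise illustrates.
shares = {faculty: round(100 * count / counted, 1)
          for faculty, count in tally.items()}
response_rate = round(100 * counted / total, 1)

print(shares)
print("response rate:", response_rate, "%")
```

If the six non-responders are not spread across faculties the way responders are, the shares above are biased estimates of the class composition, no matter how carefully the hands were counted.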

Science of knowledge production: two main pathways

  • Surveys: structured observation used to gather information about a population; data collection via questionnaires or structured observation.
  • Experiments: active manipulation of variables to observe effects; not just observing but intervening and testing cause-and-effect.
  • Implication: both approaches produce knowledge, but each has distinct error sources and design considerations; this course emphasizes evaluating the quality of information from both paths.

Relevance to modern data practice and ethics

  • AI and data science: data quality remains crucial; modern tools can amplify or propagate biases if built on flawed data.
  • Official statistics and the public sector: government data (e.g., Stats NZ, census considerations) and the rise of administrative data fusion in policy (e.g., social investment approach).
  • Global context: EU increasing research budgets; NZ considering shifts away from census data toward administrative data; debates about data representativeness and measurement.
  • Practical career paths for statisticians and analysts: from data analysts to senior analysts and research fellows; the importance of a master’s degree and specialized training in official statistics and applied stats.
  • Cross-disciplinary work: statisticians often collaborate with product teams, managers, and data professionals in multidisciplinary environments.

Key statistical concepts introduced in this course

  • The research process as a cycle with multiple decision points and potential errors at each step.
  • Non-sampling vs sampling errors:
    • Sampling error (random error): arises from observing a sample rather than the entire population; decreases with larger sample size.
    • Non-sampling error (bias and other systematic errors): arises from design, measurement, processing, and population frame issues; often biased and not reduced simply by increasing sample size.
  • The concept of total survey error: the combination of sampling error and non-sampling errors that together determine the accuracy of a survey estimate.
  • Bias (systematic error): a non-random, directional deviation that pulls results away from the truth in a consistent direction; can be caused by many sources and can accumulate across steps of the research process.
  • Random error (sampling error): variability due to sampling; can be reduced by larger samples and can be quantified with standard errors.
  • Accuracy vs precision (diagrammatic concept):
    • Accuracy: closeness of the measured value to the true value.
    • Precision: closeness of repeated measurements to one another.
    • Ideally, measurements should be both accurate and precise; in practice, there is often a trade-off.
  • The three big non-sampling error categories:
    • Coverage errors: arising when the sampling frame does not cover the target population well (selection bias).
    • Measurement errors (information bias): errors in how data are collected or recorded (question wording, interviewer effects, respondent misunderstandings).
    • Processing errors and combined errors: mistakes in data handling or combining data sources; especially relevant in epidemiology (numerator/denominator bias).
  • Roles in design and conduct to prevent bias:
    • Involve statisticians early in survey design; intuitive prevention is much more effective than post-hoc corrections.
    • Pretesting and piloting: test measurement instruments and data-collection processes before full rollout.
    • Training and supervision, robust data processing protocols, and quality control.
    • If bias is present, it should be identified, its direction and potential magnitude appraised, and discussed in reporting.
  • Conceptual framework for evaluating data quality:
    • Theory drives research questions, populations, data sources, variables, analysis, and reporting; theory shapes what is included and how results are interpreted.
    • Trade-offs in resources (time, data sources, money) impact quality; increasing precision/accuracy often requires more time and larger samples, with diminishing returns.
  • Population concepts and sampling frames (diagrammatic thinking supported by text):
    • Target population: the full group about which we want to learn.
    • Frame population: the population from which the sampling frame is drawn (often a practical subset).
    • Sampling frame: a list or mechanism from which the sample is drawn; ideally matches the target population.
    • Sampled population: the group actually selected for participation.
    • Respondents: those within the sampled population who actually respond and meet eligibility criteria.
    • Ineligible vs not included: ineligible refers to people who do not meet the eligibility criteria; not included refers to those who are not part of the target population or who were not reached.
  • Visualizing the relationships: target population vs frame vs sample vs respondents (with mismatches causing bias).
  • Eligibility strategies:
    • Include questions in the survey to screen for eligibility to ensure respondents belong to the target population.
    • If respondents are not eligible, thank them and do not continue with questions.
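The central contrast in this section, that sampling error shrinks with sample size while coverage bias does not, can be made concrete with a small simulation. All numbers here are invented for illustration: a target population in which 20% of people are missing from the sampling frame and differ systematically from those covered:

```python
import random
import statistics

random.seed(42)

# Hypothetical target population of 100,000; the 20,000 people missing from
# the frame (e.g., no listed telephone) differ systematically from the rest.
frame_pop = [random.gauss(50, 10) for _ in range(80_000)]  # covered by frame
excluded = [random.gauss(35, 10) for _ in range(20_000)]   # missing from frame
true_mean = statistics.mean(frame_pop + excluded)          # target quantity

def survey(n):
    """Mean of a random sample of size n drawn from the frame only."""
    return statistics.mean(random.sample(frame_pop, n))

# Repeat each survey 200 times: the spread (random sampling error) shrinks
# as n grows, but the coverage bias stays roughly constant.
for n in (100, 1_000, 10_000):
    estimates = [survey(n) for _ in range(200)]
    bias = statistics.mean(estimates) - true_mean
    spread = statistics.stdev(estimates)
    print(f"n={n:>6}: bias={bias:+.2f}, sampling error={spread:.2f}")
```

The bias stays near +3 at every sample size, while the sampling error falls roughly as 1/√n; this is why increasing n cannot fix a coverage problem.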

Illustrative historical examples to contextualize error and bias

  • 1948 U.S. election polling (the Chicago Tribune’s “Dewey Defeats Truman”):
    • Telephone-based polling favored wealthier respondents who had telephones; the sampling frame was biased toward a subset of the population.
    • Result: Dewey was predicted to win, but Truman won; this illustrates how frame and sampling bias can lead to incorrect conclusions.
  • Space Shuttle Challenger O-ring failure:
    • Discussion of non-random, design-related errors; failing to account for joint temperature effects led to catastrophic failure.
    • Example used to illustrate how design and testing conditions can introduce non-random error if the right questions are not asked and the data are not properly analyzed.
  • Live TV recount and public discourse on election results:
    • Demonstrates how misinterpretation of data and poor statistical reasoning (e.g., misreading polling information) can derail credible discourse.
  • “Garbage in, garbage out” (GIGO):
    • Emphasizes that data quality determines the reliability of analysis; poor input yields poor output, especially in AI and automated systems.

Ethical and practical implications

  • Responsible data collection and reporting: the importance of transparency about data sources, sampling frames, and potential biases.
  • Data quality in public policy: decisions rely on official statistics and surveys; the shift toward administrative data makes assessing representativeness and validity even more critical.
  • Equity considerations: historical missteps (e.g., numerator/denominator bias in Maori health data) highlight the need for careful measurement and inclusive data practices.
  • Career ethics and professional practice: statisticians must balance design quality, resource constraints, and the dissemination of accurate interpretations.

Practical guidance for exam preparation and study tips

  • Core takeaways to remember:
    • Error vs bias: bias is a systematic, directional deviation from the truth; random error is sampling-based variability.
    • Total survey error combines sampling and non-sampling errors; neither type can be ignored.
    • Early involvement of statisticians in survey design greatly reduces bias and improves data quality.
    • Pretesting, piloting, and rigorous data-management practices are essential.
    • Theory informs every stage of research planning, data collection, and reporting.
  • Key formulas to recall (conceptual, not computation-heavy):
    • Let the true value be $\mu$ and observed values be $X_i$ for $i = 1,\dots,n$.
    • Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
    • Bias: $\mathrm{Bias}(\bar{X}) = E[\bar{X}] - \mu$
    • Variance of the sample mean: $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$
    • Standard error: $\mathrm{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$
    • Accuracy vs precision intuition: a diagrammatic balance between closeness to truth (accuracy) and repeatability (precision).
  • Study strategy:
    • Do the readings; use the reading guide to extract key themes.
    • Focus on understanding where errors can originate rather than memorizing details.
    • Prepare for the final exam by ensuring you can discuss how design choices affect bias and error, not just compute statistics.
    • Use real-world examples from the lecture (e.g., sampling frames, eligibility, pretesting) as anchors for concepts.
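The standard-error formula above can be checked empirically. This sketch (with assumed values for μ, σ, and n, not anything from the lecture) draws many samples and compares the spread of the sample means with the theoretical $\sigma/\sqrt{n}$:

```python
import math
import random
import statistics

random.seed(1)

mu, sigma, n = 47.0, 10.0, 400  # assumed population parameters and sample size

# Draw 2,000 independent samples of size n and record each sample mean.
means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(2_000)]

theoretical_se = sigma / math.sqrt(n)          # SE(X-bar) = sigma / sqrt(n)
empirical_se = statistics.stdev(means)          # observed spread of the means
empirical_bias = statistics.mean(means) - mu    # near 0: X-bar is unbiased here

print(f"theoretical SE={theoretical_se:.3f}, empirical SE={empirical_se:.3f}, "
      f"bias={empirical_bias:+.3f}")
```

The empirical spread of the sample means lands close to the theoretical 0.5, and the bias hovers near zero, matching the distinction the course draws: random error is quantified by the standard error, while bias must be prevented by design.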

Contact, resources, and next steps

  • Lecturer contact: Andrew can be reached by email; he travels and works internationally; his notes and slides are posted on Canvas.
  • Suggested further reading and resources: polling guides and different case studies available on the instructor’s site; these materials complement the course readings and provide practical illustrations.
  • Breaks and logistics: there is a ten-minute break between lectures; a lost-property note and other classroom logistics were mentioned during the session.
  • Preview of next lecture (Lecture 2): distinction between error and bias; emphasis on the role of theory; readings will be available via Canvas; focus on how theory informs data collection and analysis.

Summary in a single line

  • This lecture lays the groundwork for Stats 240 by outlining the course structure, the distinction between survey and experimental data, the central importance of data quality and bias, the practicalities of data collection (including a live class polling activity), and the theoretical framework that underpins how to evaluate and design robust research.