Biostatistics Notes

Statistics Overview

Learning Objectives

  • Define statistics.

  • Describe the range of applications of statistics.

  • Identify situations in which statistics can be misleading.

  • Distinguish between different types of data.

  • Describe and graph categorical data.

Examples Illustrating Statistical Interpretation

  • Example 1: A new advertisement for Yang's ice cream introduced in late May of last year resulted in a 30% increase in ice cream sales for the following three months. The conclusion is that the advertisement was effective.

  • Example 2: 75% more interracial marriages are occurring this year than 25 years ago. The conclusion is that our society accepts interracial marriages.

What is Statistics?

  • Statistics is a field of study concerned with:

    • The collection, organization, summarization, and analysis of data.

    • The drawing of inferences about a body of data when only a part of the data is observed.

  • Statistics provides:

    • A way of organizing information on a more formal basis than relying on the exchange of anecdotes and personal experience.

    • A way of taking variation into account.

What is Biostatistics?

  • Biostatistics is the application of statistics to a wide range of topics in medicine, public health, or biology.

  • It encompasses:

    • The design of experiments, especially in medicine, public health, pharmacy, and agriculture.

    • The collection, summarization, and analysis of data from those experiments.

    • The interpretation of, and inference from, the results.

Applications of Biostatistics

  • Health and medicine, including epidemiology, health services research, nutrition, environmental health, and healthcare policy and management.

  • Design and analysis of clinical trials in medicine.

  • Assessment of the severity of a patient's condition and prognosis of the outcome of a disease.

  • How are risk factors or characteristics that might be related to the development or progression of disease identified?

  • How is the rate of development of new disease estimated?

  • How is the extent of a disease in a group or region assessed?

  • How is the effectiveness of a new drug determined?

Questions to Consider

  • Which of the following are part(s) of statistics?

    • A. Numerical calculations

    • B. Graphs

    • C. Interpretations and decisions based on the numbers and graphs

  • In a city, higher rates of ice cream consumption coincide with higher rates of drowning. This leads people to believe that eating ice cream can somehow put you at risk of drowning. What could be another interpretation?

  • You hear in a commercial that 80% of children prefer to eat a certain kind of cereal for breakfast. What can you conclude?

    • A. This cereal is superior to all others…at least according to kids.

    • B. You need to know more about where these data came from before making any conclusions.

    • C. 20% of kids prefer to eat other cereals.

  • What should you take into consideration when evaluating statistical claims?

    • A. The statistics presented

    • B. The sources of the statistical findings

    • C. The procedures used to generate the claims

Descriptive vs. Inferential Statistics

  • Descriptive Statistics

    • Numbers that are used to summarize and describe data.

    • Example: If we are analyzing birth certificates, a descriptive statistic might be the percentage of certificates issued in Thimphu, or the average age of the mother.

    • Descriptive statistics are just descriptive. They do not involve generalizing beyond the data at hand.

  • Inferential Statistics

    • Data from a sample are used to draw inferences about a population.

    • Tools for generalizing beyond actual observations.

    • Generalize from a sample to a population.

    • A sample will not be identical to the population, so generalizations will have some error.
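
The birth-certificate example above can be sketched in a few lines of Python. This is only an illustration: the records, the field names `district` and `mother_age`, and the helper functions are all hypothetical and invented here. The two numbers it produces describe these five records only and say nothing about certificates that were not examined.

```python
from statistics import mean

# Hypothetical birth-certificate records (invented for illustration only).
certificates = [
    {"district": "Thimphu", "mother_age": 24},
    {"district": "Paro", "mother_age": 31},
    {"district": "Thimphu", "mother_age": 28},
    {"district": "Punakha", "mother_age": 22},
    {"district": "Thimphu", "mother_age": 35},
]


def percent_issued_in(records, district):
    """Percentage of records whose district matches the given name."""
    matches = sum(1 for r in records if r["district"] == district)
    return 100 * matches / len(records)


def average_mother_age(records):
    """Arithmetic mean of the mothers' ages in the records."""
    return mean(r["mother_age"] for r in records)


# Descriptive statistics: summaries of the data at hand, nothing more.
pct_thimphu = percent_issued_in(certificates, "Thimphu")
avg_age = average_mother_age(certificates)

print(f"Issued in Thimphu: {pct_thimphu:.1f}%")
print(f"Average age of mother: {avg_age:.1f} years")
```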

Inferential Statistics: Estimation vs. Hypothesis Testing

  • Estimation

    • Use a statistic calculated from a sample to estimate the corresponding value in the population.

    • A sample will not be identical to the population, so the estimate will have some error.

  • Hypothesis testing

    • Test a hypothesis about the population from which the sample or samples are drawn.
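
A minimal sketch of both ideas, using only the Python standard library. The sample of mothers' ages and the hypothesized mean of 25 are invented for illustration. The interval and test use a normal approximation for simplicity; with a sample this small, a t distribution would usually be the more appropriate choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of mothers' ages (invented for illustration only).
sample_ages = [24, 31, 28, 22, 35, 27, 29, 26, 30, 28]

n = len(sample_ages)
x_bar = mean(sample_ages)      # point estimate of the population mean
s = stdev(sample_ages)         # sample standard deviation (n - 1 denominator)
std_error = s / sqrt(n)        # estimated standard error of the mean

# Estimation: approximate 95% confidence interval (normal approximation).
z_crit = NormalDist().inv_cdf(0.975)
ci_low = x_bar - z_crit * std_error
ci_high = x_bar + z_crit * std_error

# Hypothesis testing: H0 states that the population mean age is mu_0.
mu_0 = 25                      # hypothetical value chosen for this sketch
z_stat = (x_bar - mu_0) / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))  # two-sided p-value

print(f"Estimate: {x_bar:.1f}, 95% CI: ({ci_low:.2f}, {ci_high:.2f})")
print(f"z = {z_stat:.2f}, two-sided p = {p_value:.4f}")
```

The interval expresses the error that comes from generalizing from a sample; the p-value measures how surprising the sample mean would be if the hypothesized population mean were true.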

Sampling

  • When doing population research, we often need to perform an analysis on a sample of the larger population.

Parameter & Statistic

  • Parameter: A value calculated from a population.

    • Examples: Mean [µ], Standard Deviation [σ]

    • A value denoted with a Greek letter is called a PARAMETER.

  • Statistic: A value calculated from a sample.

    • Examples: Mean [x̄], Standard Deviation [s]

    • A value denoted with a Roman letter is called a STATISTIC.
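
The distinction can be illustrated with a small sketch in which the whole population is known, which is rarely the case in practice. The population values, the sample size of 20, and the random seed are all arbitrary assumptions made for this example.

```python
import random
from statistics import mean, pstdev, stdev

# Hypothetical population of 100 measurements (invented for illustration only).
population = list(range(100, 200))

# Parameters: calculated from the entire population.
mu = mean(population)          # population mean
sigma = pstdev(population)     # population standard deviation (N denominator)

# Statistics: calculated from a sample drawn from that population.
rng = random.Random(0)         # fixed seed so the sketch is repeatable
sample = rng.sample(population, 20)
x_bar = mean(sample)           # sample mean
s = stdev(sample)              # sample standard deviation (n - 1 denominator)

print(f"Parameters: mu = {mu}, sigma = {sigma:.2f}")
print(f"Statistics: x_bar = {x_bar}, s = {s:.2f}")
```

The parameters are fixed numbers, while the statistics change whenever a different sample is drawn.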

Inference

  • We perform a statistical analysis upon a sample, from which we infer characteristics of the larger population, which is also called the "reference population."

  • The statistical and epidemiologic approaches we use are affected by the assumptions of the sampling strategies used.

  • Statistical analyses are performed on a representative sample in order to infer properties of the reference population.

Sampling Bias

  • If the sample is not representative of the population, then we have sampling bias, and our inferences can be systematically wrong.

Sampling Example: Telephone Survey in Bhutan

  • Let’s say you want to run a telephone survey in Bhutan to measure the prevalence of perceived back pain.

    • Reference population: All adults in Bhutan

    • Study/Accessible population: Those with telephones

    • Sampling frame: A block of listed numbers purchased from a phone company

    • Sample: Those who answer the phone and agree to participate

  • We end up with people who have listed phone numbers and agree to be interviewed representing all adults in Bhutan. If they differ systematically from other adults, this introduces BIAS.
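
The effect of the sampling frame can be sketched with a made-up population. Every number below is invented for illustration and is not an estimate for Bhutan: the sketch simply assumes that adults without listed phones report back pain more often than adults with them. It does not model non-response, which would be a further source of bias.

```python
import random

# Hypothetical population of 10,000 adults (all counts invented for illustration).
# Each adult is a tuple: (has_listed_phone, has_back_pain).
population = (
    [(True, True)] * 1200       # listed phone, back pain
    + [(True, False)] * 4800    # listed phone, no back pain
    + [(False, True)] * 1400    # no listed phone, back pain
    + [(False, False)] * 2600   # no listed phone, no back pain
)


def prevalence(people):
    """Proportion of the given people who report back pain."""
    return sum(pain for _, pain in people) / len(people)


# Reference population: all adults.
true_prevalence = prevalence(population)

# Sampling frame: only adults with listed numbers can be reached.
sampling_frame = [person for person in population if person[0]]
frame_prevalence = prevalence(sampling_frame)

rng = random.Random(42)         # fixed seed so the sketch is repeatable
phone_sample = rng.sample(sampling_frame, 500)   # what the survey can draw
random_sample = rng.sample(population, 500)      # ideal simple random sample

print(f"True prevalence:          {true_prevalence:.3f}")
print(f"Prevalence in the frame:  {frame_prevalence:.3f}")
print(f"Phone sample estimate:    {prevalence(phone_sample):.3f}")
print(f"Random sample estimate:   {prevalence(random_sample):.3f}")
```

Under these assumptions, even surveying everyone in the sampling frame would understate the prevalence, so drawing a larger phone sample does not remove the bias.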

More Examples

  • Example #1: You have been hired by the National Election Commission to examine how the Bhutanese people feel about the fairness of the voting procedures in Bhutan. Whom will you ask?

  • Example #2: A substitute teacher wants to know how students in the class did on their last test. The teacher asks the 10 students sitting in the front row to state their latest test score. He concludes from their report that the class did extremely well.

    • What is the sample? What is the population? Can you identify any problems with choosing the sample in the way that the teacher did?

  • Example #3: A coach is interested in how many cartwheels the average college freshman at his university can do. Eight volunteers from the freshman class step forward. After observing their performance, the coach concludes that college freshmen can do an average of 16 cartwheels in a row without stopping.