Biometry 01: Introduction

Introduction to Statistics

Learning Outcomes

  • Differentiate between Populations and Samples

  • Identify types of data:

    • Qualitative/Categorical: Non-numeric categories

    • Quantitative/Numeric: Numeric measurements

  • Identify sources of experimental bias and confounding

What is Statistics?

  • Definition: Statistics is the Science of Data that focuses on collecting, analyzing, interpreting, presenting, and organizing data.

  • Focus Areas:

    • Extracting meaningful information from data

    • Managing and dealing with uncertainty in data analysis

    • Answering questions using limited and potentially unreliable information

Study Design

  • Statistics begins before data collection, emphasizing the importance of proper study design.

  • Methods of Data Collection:

    • Experiments: Deliberately generating data to answer specific research questions through controlled conditions.

    • Observations: Monitoring and recording data from the natural environment without manipulation.

    • Surveys: Collecting responses from individuals through questionnaires or interviews.

Importance of Statistics in Research

  • Crucial during:

    • Formulating the research question

    • Designing the study or experiment effectively

Formulating a Research Question

  • Criteria:

    • Must be relevant to the field of study, clear and specific, and answerable through empirical methods.

  • Examples of Bad Research Questions:

    • "How long is a piece of string?" (Vague and not measurable)

    • "What is the quality of this wine?" (Subjective and lacks clarity)

  • Examples of Good Research Questions:

    • "What is the speed of light in a vacuum?" (Clear and measurable)

    • "What is the sugar content per unit mass of juice?" (Precise and quantifiable)

    • "Does lichen coverage of trees vary by aspect?" (Specific and measurable)

Populations and Samples

  • Research questions typically refer to properties of a population.

  • Population: The entire set of individuals or items of interest to a researcher.

    • Example: For the sugar content question, the population might be a specific harvest or variety of plants.

    • Example: For the tree lichen question, the population might be a specific type of tree or all trees within a designated area.

  • Clear definition of the population is essential for researchers to ensure results are applicable.

Understanding Samples

  • Population Size: Populations may be too large to measure in full, or purely theoretical, which makes sampling necessary.

  • Working with Samples: Analysis generally uses a fraction of the total population to draw conclusions about the whole.

  • Estimation and Inference: Using sample data to make generalizations or predictions about the overall population.
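
The relationship between a population and a sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is simulated rather than measured, and the mean of 12% sugar, the spread, and the sample size of 50 are all hypothetical choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: sugar content (%) of 10,000 fruit (simulated, not real data).
population = [random.gauss(12.0, 1.5) for _ in range(10_000)]

# In practice only a sample is measured; here 50 fruit are chosen at random.
sample = random.sample(population, 50)

population_mean = statistics.mean(population)  # normally unknown to the researcher
sample_mean = statistics.mean(sample)          # the estimate of the population mean

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two printed values should be close but not identical, which is the uncertainty that estimation and inference have to account for.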

Types of Data

  • Quantitative/Numeric Data:

    • Consists of numerical measurements or counts.

    • Continuous examples: Heights of trees or percent sugar content.

    • Discrete examples: Population counts or the number of cars in a parking lot.

  • Qualitative/Categorical Data:

    • Classifies individuals into a fixed set of categories (levels).

    • Examples: Gender, left-handedness, or hair color.

Characteristics of Data Types

  • Quantitative Data:

    • Supports arithmetic such as addition and subtraction, so it can be summarized with quantities like means and differences.

  • Qualitative Data:

    • Summarized by categorizing observations and counting the frequency of each category.
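
The different ways of summarizing the two data types might look like this in Python. The values below are made up for illustration, and the variable names are hypothetical.

```python
import statistics
from collections import Counter

# Illustrative values only (hypothetical, not from any study).
tree_heights_m = [12.4, 15.1, 9.8, 14.0, 11.7]               # quantitative
hair_color = ["brown", "black", "brown", "blond", "brown"]   # qualitative

# Quantitative data: arithmetic is meaningful.
mean_height = statistics.mean(tree_heights_m)
height_range = max(tree_heights_m) - min(tree_heights_m)

# Qualitative data: count how often each category occurs.
color_counts = Counter(hair_color)

print(f"mean height: {mean_height:.2f} m, range: {height_range:.2f} m")
print(color_counts.most_common())
```

Taking the mean of `hair_color` would raise an error, which reflects the point that categories can be counted but not averaged.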

Bias, Variation, and Confounding

  • A sample should ideally be representative of the population from which it is drawn.

  • Common Issues:

    • Bias: A systematic error in how a sample is collected that pushes results away from the true population value.

      • Example: Studying the speed of rabbits by chasing them can mean that only the slower rabbits are caught.

      • Example: Sampling cholesterol levels solely from people in a fast-food parking lot can yield skewed results.

    • Variation: Most data exhibits some form of variation, which is inherent in natural processes.

      • Includes natural variation (like differences in heights of trees) and measurement variation (errors occurring during data collection).

    • Reducing variation is important, but complete elimination is often impossible.
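
The rabbit example above can be simulated to show how a biased collection method shifts an estimate. This is a rough sketch: the speeds are simulated, and the 38 km/h chaser speed is an assumption made purely for illustration.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical rabbit speeds in km/h (simulated, not real data).
rabbit_speeds = [random.gauss(40.0, 5.0) for _ in range(5_000)]

# Biased method: only rabbits slower than the chaser (assumed 38 km/h) get caught.
caught = [speed for speed in rabbit_speeds if speed < 38.0]
biased_sample = random.sample(caught, 100)

# Unbiased method: every rabbit is equally likely to be chosen.
random_sample = random.sample(rabbit_speeds, 100)

true_mean = statistics.mean(rabbit_speeds)
biased_mean = statistics.mean(biased_sample)
random_mean = statistics.mean(random_sample)

print(f"true mean:          {true_mean:.1f} km/h")
print(f"biased sample mean: {biased_mean:.1f} km/h")
print(f"random sample mean: {random_mean:.1f} km/h")
```

Taking a larger biased sample would not help here: the estimate stays below the true mean because the error comes from the collection method, not from the sample size.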

Sources of Variation

  • A measurement is made up of several components that must be accounted for:

    • Population average (e.g., average heights of different tree species).

    • Natural variation arising from genetics or environmental factors.

    • Temporal variation depending on the life-cycle stage of the organism.

    • Measurement error, leading to different results with repeated measures of the same quantity.
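
One way to picture these components is as terms added together to produce each recorded value. The sketch below assumes a simple additive model with normally distributed variation and leaves out the temporal component for brevity; all constants and function names are hypothetical.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

SPECIES_MEAN_M = 15.0    # population average height (hypothetical)
NATURAL_SD_M = 2.0       # tree-to-tree natural variation (hypothetical)
MEASUREMENT_SD_M = 0.3   # error of the measuring instrument (hypothetical)

def true_height():
    """Population average plus natural variation for one tree."""
    return SPECIES_MEAN_M + random.gauss(0.0, NATURAL_SD_M)

def measure(height):
    """A recorded value: the true height plus measurement error."""
    return height + random.gauss(0.0, MEASUREMENT_SD_M)

# Repeated measurements of ONE tree differ only through measurement error.
one_tree = true_height()
repeats = [measure(one_tree) for _ in range(200)]

# Single measurements of MANY trees include both sources of variation.
many_trees = [measure(true_height()) for _ in range(200)]

sd_repeats = statistics.stdev(repeats)
sd_many = statistics.stdev(many_trees)

print(f"spread of repeated measures of one tree: {sd_repeats:.2f} m")
print(f"spread across many trees:                {sd_many:.2f} m")
```

Comparing the two spreads gives a feel for how much of the observed variation is due to the instrument and how much is due to real differences between trees.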

Understanding Confounding

  • Confounding occurs when the effects of one variable are mixed with the effects of another, complicating analysis.

    • Example: In a study of lichen growth on different sides of trees, geographic location can act as a confounder: if the trees are sampled on different campuses, differences between campuses can be mistaken for differences between sides.

    • Careful study design helps to mitigate confounding issues and improve the accuracy of results.
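
The lichen example can be turned into a small simulation. The sketch assumes, purely for illustration, that coverage depends on campus only and not on aspect at all; the campus means and sample sizes are hypothetical.

```python
import random
import statistics

random.seed(4)  # fixed seed so the sketch is repeatable

# Hypothetical model: coverage (%) depends on campus only, NOT on aspect.
CAMPUS_MEAN = {"A": 30.0, "B": 50.0}

def coverage(campus):
    """Simulated lichen coverage for one tree side on the given campus."""
    return random.gauss(CAMPUS_MEAN[campus], 5.0)

# Confounded design: north sides sampled only on campus A, south sides only on campus B.
north_confounded = [coverage("A") for _ in range(100)]
south_confounded = [coverage("B") for _ in range(100)]
confounded_diff = statistics.mean(south_confounded) - statistics.mean(north_confounded)

# Balanced design: both aspects sampled equally on both campuses.
north_balanced = [coverage(campus) for campus in ("A", "B") for _ in range(50)]
south_balanced = [coverage(campus) for campus in ("A", "B") for _ in range(50)]
balanced_diff = statistics.mean(south_balanced) - statistics.mean(north_balanced)

print(f"apparent aspect effect, confounded design: {confounded_diff:.1f}")
print(f"apparent aspect effect, balanced design:   {balanced_diff:.1f}")
```

In the confounded design the campus difference shows up as an apparent aspect effect even though none was built into the model, while the balanced design gives a difference near zero.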

Experimental Design Principles

  • Bias, variation, and confounding can be minimized through:

    • Random Sampling: Gives every individual in the population an equal chance of being selected, which reduces selection bias.

    • Blocking: Groups similar experimental units into blocks and compares treatments within each block, so that differences between blocks do not obscure the treatment effect.
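
Both ideas can be sketched with Python's standard `random` module. The tree list, the two slopes used as blocks, and the treatment labels are hypothetical; the blocking part shows one common form (random assignment within each block), which is an assumption rather than the only way to block.

```python
import random

random.seed(5)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 40 trees, the first 20 on a shaded slope, the rest on a sunny one.
trees = [{"id": i, "slope": "shaded" if i < 20 else "sunny"} for i in range(40)]

# Simple random sample: each tree has the same chance of being chosen.
simple_sample = random.sample(trees, 8)

# Blocking: within each slope (block), randomly assign half the trees to each group,
# so the groups are compared under similar conditions.
assignment = {}
for slope in ("shaded", "sunny"):
    block = [tree["id"] for tree in trees if tree["slope"] == slope]
    random.shuffle(block)
    half = len(block) // 2
    for tree_id in block[:half]:
        assignment[tree_id] = "treatment"
    for tree_id in block[half:]:
        assignment[tree_id] = "control"

print(sorted(tree["id"] for tree in simple_sample))
print(sum(1 for group in assignment.values() if group == "treatment"), "trees treated")
```

Because each slope contributes the same number of treated and control trees, a difference between slopes cannot masquerade as a treatment effect.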

Statistical Methods

  • Statistics involves:

    • The collection, description, display, and analysis of data.

  • Main focuses:

    • Estimating population parameters based on sample data.

    • Making inferential conclusions about the population from sample analysis.

  • Estimation: Involves point estimates (single value) and interval estimates (range of values).

  • Inference: Relates to hypothesis testing, which assesses claims based on data.
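
The difference between a point estimate and an interval estimate can be shown with a short calculation. The measurements are invented for illustration, and the interval uses a normal approximation to stay within the standard library; for a sample this small a t-based critical value would normally be used, which gives a somewhat wider interval.

```python
import math
import statistics

# Illustrative sugar-content measurements in % (hypothetical values).
sample = [11.8, 12.4, 12.1, 13.0, 11.5, 12.7, 12.2, 11.9, 12.6, 12.3]

n = len(sample)
point_estimate = statistics.mean(sample)                  # single value
standard_error = statistics.stdev(sample) / math.sqrt(n)  # uncertainty of the mean

# Approximate 95% interval using the normal critical value (about 1.96).
# Assumption: normal approximation; a t critical value is more appropriate for small n.
z = statistics.NormalDist().inv_cdf(0.975)
lower = point_estimate - z * standard_error
upper = point_estimate + z * standard_error

print(f"point estimate:    {point_estimate:.2f}")
print(f"interval estimate: {lower:.2f} to {upper:.2f}")
```

The point estimate answers "what is the best single guess?", while the interval conveys how far the true population mean could plausibly be from that guess.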

Statistics and Mathematics

  • Though grounded in mathematics, statistics centers on data analysis, and its techniques are practical skills comparable to laboratory methods.

Statistical Software

  • Most statistical tasks now utilize computational tools.

  • Common Software Packages:

    • Excel: Useful for basic statistical analysis through the Analysis ToolPak.

    • SPSS, Minitab, SAS: Comprehensive commercial software for extensive statistical functionalities.

    • R: An open-source statistical programming language widely adopted in both academic and professional settings.

Course Content Overview

  • Topics will encompass:

    • Descriptive statistics and graphics

    • Probability theory

    • Single-sample estimation and inference techniques

    • Two-sample estimation and inference

    • Multiple sample inference approaches

    • Inference concerning counts

    • Exploring relationships between two measurements
