Stats

What is Statistics?

  • Definition: Statistics is the science of collecting, organizing, analyzing, and interpreting data to make decisions or draw conclusions

    • It is not merely about numbers but about what those numbers mean

Understanding Data

  • Data is a plural term; singular is datum

  • Definition: Data refers to collections of facts or information, such as

    • Profit margins of a company

    • Vote counts from an election

Descriptive vs. Inferential Statistics

  • Descriptive Statistics:

    • Summarizes and visualizes observations using graphs, tables, and numerical summaries

    • Aim: Clarity

  • Inferential Statistics:

    • Generalizes from sample data to a broader population and quantifies uncertainty

    • Aim: Justification of conclusions

Key Statistical Terms

  • Population: The entire set of individuals or items being studied

    • Example: All Baylor first-year students this fall

  • Sample: A subset of the population that is observed

  • Parameter: A number, usually unknown, that describes the population, e.g., true average battery life

  • Statistic: A number computed from a sample used to learn about the parameter

Application of Statistics: Case Study

  • Example of a clinic testing a flu prevention program:

    • 18% of the usual-care group got the flu, compared with 12% of the new-program group

    • Descriptively, the new program appears better; inferentially, the question is whether a difference this large could be due to chance alone
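
The inferential question in the clinic example can be sketched with a two-proportion z-test. The notes do not give the group sizes, so the 200 patients per group used below are purely an assumption for illustration, and the function name `two_proportion_z` is hypothetical. This is one common way to ask whether a gap like 18% vs. 12% could arise by chance, not necessarily the method the clinic used.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for H0: the two population proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# ASSUMPTION: 200 patients per group (group sizes are not stated in the notes).
n_usual, n_new = 200, 200
flu_usual = 36  # 18% of 200
flu_new = 24    # 12% of 200

z, p = two_proportion_z(flu_usual, n_usual, flu_new, n_new)
print(f"observed difference: {flu_usual / n_usual - flu_new / n_new:.2f}")
print(f"z = {z:.2f}, two-sided p-value = {p:.3f}")
```

With these assumed sizes the p-value comes out near 0.09, so chance is not ruled out at the conventional 0.05 level. The same percentages with larger groups would give a smaller p-value, which is why the descriptive gap alone cannot settle the question.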

Steps in Using Statistics

  • Start with a clear question and identify the observational units

  • Decide how to collect data: survey, experiment, or database

  • Use descriptive statistics to summarize and visualize the data

  • Make a reasoned argument based on inference

  • Communicate findings in context

Understanding Populations and Samples

  • Population vs. Target Population vs. Accessible Population:

    • Target population: The group the research question is about, e.g., all incoming first-year students

    • Accessible population: Portion of the target population realistically observable

    • Mismatches can lead to biased results

The Role of Samples

  • A sample must represent the population well for the statistic computed to reflect the parameter accurately

  • Census vs. sampling:

    • Census: Attempts to measure every unit in the population; rare in practice because it is complex

    • Sampling: Measures only a subset; more practical and makes it possible to learn about large populations

Parameters and Statistics

  • Parameter: A numerical summary describing the population, e.g., a mean income, a proportion of supporters, or a standard deviation

    • Parameters are fixed and usually unknown

  • Statistic: A numerical summary computed from a sample; its value varies from sample to sample
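
The contrast between a fixed parameter and a varying statistic can be seen in a small simulation. The population below (10,000 battery lives with mean near 40 hours) is made up for illustration; in real work the population mean would not be available to compute directly.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# HYPOTHETICAL population: 10,000 battery lives in hours (invented values).
population = [random.gauss(40, 5) for _ in range(10_000)]
parameter = statistics.mean(population)  # fixed; usually unknown in practice

# Draw five separate samples of 50 units and compute the statistic for each.
sample_means = [
    statistics.mean(random.sample(population, 50)) for _ in range(5)
]

print(f"parameter (population mean): {parameter:.2f}")
for i, m in enumerate(sample_means, start=1):
    # The gap between statistic and parameter is the sampling error.
    print(f"sample {i} mean: {m:.2f}  (sampling error: {m - parameter:+.2f})")
```

Each sample gives a slightly different mean while the population mean stays put; the printed gaps are what sampling error looks like in a single study.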

Sampling Error

  • Sampling error is the difference between a statistic and the parameter it estimates; it reflects the natural variability that comes from observing only a subset of the population

  • A major goal of statistics is to measure and communicate this uncertainty

Simulation of Sample Means

  • Example: a population with true mean 10 and standard deviation 2:

    • The distribution of sample means becomes less variable as the sample size grows (n=20, n=50, n=200)
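
A minimal sketch of such a simulation follows. The notes give only the mean (10), the standard deviation (2), and the sample sizes; the normal shape of the population and the 2,000 repetitions per sample size are assumptions made here.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA = 10, 2  # population mean and standard deviation from the example
N_REPS = 2000      # ASSUMPTION: number of simulated samples per sample size

def simulate_sample_means(n, reps=N_REPS):
    """Return `reps` sample means, each computed from n draws of Normal(MU, SIGMA)."""
    return [
        statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(reps)
    ]

spread = {}
for n in (20, 50, 200):
    means = simulate_sample_means(n)
    spread[n] = statistics.stdev(means)  # variability of the sample means
    print(
        f"n={n:>3}: centre = {statistics.fmean(means):.3f}, "
        f"SD of sample means = {spread[n]:.3f}, "
        f"sigma/sqrt(n) = {SIGMA / n ** 0.5:.3f}"
    )
```

The simulated spread should track the theoretical standard error sigma/sqrt(n), which shrinks from about 0.45 at n=20 to about 0.14 at n=200 while the centre stays near 10.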

Case and Variable Definitions

  • Case: A row in the data set, representing one observational unit

  • Variable: A column describing a characteristic recorded for each case

    • Clearly defined variables are essential for understanding the data

Types of Variables

  • Categorical Variables:

    • Assign cases to groups or categories, e.g., blood types

    • Subtypes:

      • Nominal (no ordering, e.g., eye color)

      • Ordinal (meaningful order, e.g., survey responses)

      • Binary (two categories)

  • Quantitative Variables:

    • Numerical values where arithmetic operations are meaningful, e.g., height, weight

    • Subtypes:

      • Discrete (counting values)

      • Continuous (measuring values)

Importance of Measurement Scales

  • Nominal, Ordinal, Interval, Ratio Scales:

    • The scale determines which comparisons and arithmetic operations are meaningful for a variable

    • E.g., temperature in Celsius or Fahrenheit is on an interval scale: differences are meaningful, but ratios are not (20°C is not "twice as hot" as 10°C)
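
The temperature point can be checked with a short calculation. The sketch below compares the same two temperatures in Celsius, Fahrenheit, and Kelvin; the specific values (10, 20, 30 degrees Celsius) are chosen only for illustration.

```python
import math

def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def c_to_k(c):
    """Convert degrees Celsius to kelvins."""
    return c + 273.15

low_c, high_c = 10.0, 20.0

# Interval scales: the ratio changes with the arbitrary zero point.
ratio_c = high_c / low_c
ratio_f = c_to_f(high_c) / c_to_f(low_c)

# Kelvin is a ratio scale (true zero), so this ratio is physically meaningful.
ratio_k = c_to_k(high_c) / c_to_k(low_c)

# Comparisons of differences agree across Celsius and Fahrenheit.
diff_ratio_c = (30.0 - 20.0) / (20.0 - 10.0)
diff_ratio_f = (c_to_f(30.0) - c_to_f(20.0)) / (c_to_f(20.0) - c_to_f(10.0))

print(f"ratio in Celsius:    {ratio_c:.3f}")
print(f"ratio in Fahrenheit: {ratio_f:.3f}")
print(f"ratio in Kelvin:     {ratio_k:.3f}")
print(f"difference comparison, C vs F: {diff_ratio_c:.3f} vs {diff_ratio_f:.3f}")
```

The "twice as hot" ratio of 2 in Celsius becomes 1.36 in Fahrenheit and about 1.04 in Kelvin, while the comparison of differences gives the same answer in both interval scales.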

Importance of Data Context

  • Identifiers and date/time variables must be clearly defined to avoid analytical mistakes

  • Identifier variables (e.g., ID numbers) should not be treated as quantities
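
As a small illustration of both points, the sketch below loads a few made-up student records. The field names and values are hypothetical; the idea is simply to store identifiers as text, parse dates into a date type, and reserve arithmetic for genuine quantities.

```python
import statistics
from datetime import date

# HYPOTHETICAL raw rows: (student_id, enrollment_date, credit_hours)
raw_rows = [
    ("00123", "2024-08-19", "15"),
    ("00457", "2024-08-20", "12"),
    ("01002", "2024-08-19", "16"),
]

records = [
    {
        "student_id": sid,                  # kept as text: a label, not a quantity
        "enrolled": date.fromisoformat(d),  # parsed so date arithmetic is well defined
        "credit_hours": int(hours),         # a genuine quantity
    }
    for sid, d, hours in raw_rows
]

# Averaging credit hours is meaningful; averaging student IDs would not be.
mean_hours = statistics.mean(r["credit_hours"] for r in records)

# Date arithmetic works because the dates were parsed, not left as strings.
enrolled_dates = [r["enrolled"] for r in records]
span_days = (max(enrolled_dates) - min(enrolled_dates)).days

print(f"mean credit hours: {mean_hours:.2f}")
print(f"enrollment window: {span_days} day(s)")
print("IDs:", [r["student_id"] for r in records])
```

Keeping the ID as a string also preserves leading zeros, which would be lost if the column were read as a number.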

The Research Process

  • Research is not just about running tests; it is a disciplined, iterative process

  • Clear articulation of the research question is essential

Workflow of Research Design

  • Step 1: Specify the problem clearly

  • Step 2: Design the study (observational or experimental)

  • Step 3: Collect quality data

  • Step 4: Perform exploratory data analysis

  • Step 5: Model and infer

  • Step 6: Quantify uncertainty

  • Step 7: Communicate findings

  • Step 8: Ensure reproducibility

Conclusion

  • The process is rarely linear; iterating on the study design and exercising careful judgment are key to sound statistical research and conclusions.