Stat 211: Elementary Inferential Statistics - Study Notes

Unit 1: Data and Collection

Identifying Data
  • Topic: Introduction to Statistics

Definitions: Statistics and Statistic
  • Statistics (plural): The science of learning from data. It involves collecting, organizing, analyzing, and interpreting information to understand patterns, make decisions, and draw conclusions in the presence of uncertainty.

  • Statistic (singular): A numerical summary that describes a characteristic of a SAMPLE.

Population vs Sample
  • Population: The entire group of individuals or cases that we are interested in studying or drawing conclusions about.

  • Sample: A subset of the population that is actually observed or collected for analysis. It is used to estimate information about the population.

  • Parameters & Statistics:

    • Populations have parameters: numerical measurements describing characteristics of a population (usually unknown, denoted by Greek letters).

    • Samples have statistics: numerical measurements describing characteristics of a sample (calculated to estimate parameters).
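The parameter/statistic distinction above can be sketched in a few lines of Python. This is only an illustration: the population of exam scores is invented, and in a real study the parameter would usually be unknown.

```python
import random
import statistics

# Hypothetical population: exam scores for all 1,000 students (made-up data).
rng = random.Random(211)  # fixed seed so the sketch is reproducible
population = [rng.randint(50, 100) for _ in range(1000)]

# Parameter: computed from the ENTIRE population (usually unknown in practice).
mu = statistics.mean(population)

# Statistic: computed from a sample and used to estimate the parameter.
sample = rng.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"parameter (population mean): {mu:.2f}")
print(f"statistic (sample mean):     {x_bar:.2f}")
```

The two numbers will usually be close but not identical; that gap is what inferential statistics tries to quantify.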

Types of Statistics
  • A statistic is a number that summarizes data from a sample and serves various roles:

    • Descriptive: Describes what was observed.

    • Summary: Condenses raw data into a meaningful value.

    • Explanatory: Helps explore relationships between variables.

    • Predictive: Used to forecast outcomes.

Statistical Analysis
  • Descriptive Statistics: Summarizing and organizing data.

  • Inferential Statistics: Making predictions or decisions about a population based on a sample.

The Process of a Statistical Study
  1. Identify Goals

  2. Define the Population

  3. Draw a Sample from the Population

  4. Summarize Sample Statistics

  5. Draw Conclusions about the Population

Applications of Statistics

Data Collection Examples

  • Application: Facebook

    • Data Collection: Gathers user behavior (likes, shares, clicks, time on posts).

    • Descriptive Statistics: Summarizes activity (average time spent, most liked posts, user demographics).

    • Inferential Statistics: Makes predictions (What posts will you like next? Who might you know?).

  • Application: Texting while Driving

    • Study Details: 40 drivers divided into three groups: sober, drunk, and texting.

    • Measured: Reaction times in simulated emergencies.

    • Findings: Reaction times of texting drivers were significantly slower than both sober and drunk drivers.

  • Application: Driving While Black

    • Study Details: 2,533 traffic stops in Cincinnati, OH.

    • Focus: Investigated disproportionate stops and searches by race.

    • Discussion Point: Raises the question: Can these outcomes be explained by random variation alone?

Data Structures

  • Organization: Tables, matrices, and data frames keep data tidy and accessible.

  • Efficiency: Enables faster searching, sorting, grouping, and computation.

  • Interpretation: Makes it easier to summarize, graph, and derive insights.

  • Analysis-Ready: Most tools require data in specific formats (e.g., rows = observations, columns = variables).

Datasets, Variables, and Cases

  • Cases: The units or subjects being studied.

  • Variables: The characteristics or attributes measured on each case.
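As a rough sketch of the "rows = observations, columns = variables" layout, the snippet below stores a tiny dataset as a list of dictionaries. The student records and variable names are invented for illustration.

```python
# Each dictionary is one case (a row); each key is a variable (a column).
# All values are made up for illustration.
students = [
    {"id": "S001", "major": "Biology", "class_year": "Sophomore", "credits": 15, "gpa": 3.4},
    {"id": "S002", "major": "History", "class_year": "Senior", "credits": 12, "gpa": 3.0},
    {"id": "S003", "major": "Physics", "class_year": "First-year", "credits": 16, "gpa": 3.8},
]

n_cases = len(students)               # number of rows
variables = list(students[0].keys())  # column names

# Pulling out one variable gives one value per case.
gpas = [row["gpa"] for row in students]
mean_gpa = sum(gpas) / len(gpas)

print(f"{n_cases} cases, variables: {variables}")
print(f"mean GPA: {mean_gpa:.2f}")
```

Here "id" plays the role of an identifier, "major" and "class_year" are categorical, and "credits" and "gpa" are quantitative.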

Variable Types
  • Categorical (Qualitative): Places values into groups or categories.

  • Quantitative: Numerical values where magnitude matters.

  • Identifier: Unique labels for each case (not used in analysis).

Categorical Variables (Qualitative)
  • Nominal: No natural order (e.g., types of fruit).

  • Ordinal: Categories with a meaningful order (e.g., education level).

Quantitative Variables
  • Discrete: Countable values (e.g., number of students).

  • Continuous: Any value in an interval (e.g., weight).

Quantitative Measurement Types
  • Interval: Equal spacing, but no true zero (e.g., temperature in Celsius).

  • Ratio: Equal spacing with a true zero (e.g., height).
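One way to see why the "true zero" matters is to check whether ratios survive a change of units. The sketch below is illustrative only: the temperatures and heights are arbitrary values, and the Kelvin conversion is brought in here as a scale that does have a true zero.

```python
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return c + 273.15

# Interval scale: the ratio 20 C / 10 C looks like "twice as hot" ...
naive_ratio = 20 / 10
# ... but on a scale with a true zero the ratio is only about 1.035.
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Ratio scale: height ratios are the same in centimeters and in inches.
height_ratio_cm = 180 / 90
height_ratio_in = (180 / 2.54) / (90 / 2.54)

print(f"Celsius ratio: {naive_ratio:.3f}, Kelvin ratio: {kelvin_ratio:.3f}")
print(f"Height ratio (cm): {height_ratio_cm:.3f}, (in): {height_ratio_in:.3f}")
```

Differences are meaningful on both scales, but statements like "twice as much" only make sense for ratio data.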

Practice: Classify These Variables

  1. Age: Quantitative (Continuous)

  2. Musical Genres: Categorical (Nominal)

  3. Price of Computer: Quantitative (Continuous)

  4. Marital Status: Categorical (Nominal)

  5. Number of Pixels Displayed: Quantitative (Discrete)

  6. Time Needed to Complete Exam: Quantitative (Continuous)

Sampling Techniques

Sampling Overview

  • Goal: Learn about the entire group of individuals called the population.

  • Problem: Collecting data on the entire population (called a census) is usually impractical or impossible.

  • Compromise: Collect data on a smaller group of individuals (called a sample) selected from the population.

  • Challenge: Obtain a sample that is as representative of the population as possible while avoiding bias.

  • Bias: The over- or under-emphasizing of some characteristic that is pertinent to the study.

Census
  • A census is when data is collected for the entire population.

  • Reasons for Not Conducting a Census:

    • Difficult to complete.

    • Hard/expensive to locate everyone.

    • Impractical in manufacturing, where testing an item may destroy it.

    • Populations change over time.

    • Opinions may change.

  • The U.S. Census: Conducted every 10 years as required by the U.S. Constitution. Used for congressional representation, federal funding, research, and planning.

Randomization
  • Definition: Randomizing protects us from influences in the data, both those we know about and those we don't.

  • Key Idea: On average, a randomized sample will look like the population.
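The key idea above can be checked with a small simulation. This is a sketch under invented assumptions: a hypothetical population of 10,000 students in which exactly 30% live on campus, sampled repeatedly with samples of size 100.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed for reproducibility

# Hypothetical population: 1 = lives on campus, 0 = does not (30% are 1s).
population = [1] * 3000 + [0] * 7000
true_proportion = sum(population) / len(population)

def sample_proportion(n):
    """Proportion of 1s in one simple random sample of size n."""
    return sum(rng.sample(population, n)) / n

# Any single sample may miss, but the average over many samples is close.
proportions = [sample_proportion(100) for _ in range(2000)]
average_proportion = statistics.mean(proportions)

print(f"true proportion:          {true_proportion:.3f}")
print(f"one sample:               {proportions[0]:.3f}")
print(f"average of 2,000 samples: {average_proportion:.3f}")
```

Individual samples bounce around the true value, while their long-run average sits very near it.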

Sample Size
  • Sample Size Definition: The number of individuals in the sample.

  • Guideline: A few hundred may be enough for a proportion. Larger samples give more precise estimates, but a large sample cannot fix a biased sampling method.

  • Incomplete Sampling Frame: Not all individuals in the population are included in the list from which the sample is taken.
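To see why a few hundred individuals can be enough for a proportion, the sketch below uses the standard large-sample approximation for a 95% margin of error, 1.96 * sqrt(p(1 - p) / n). That formula is not stated in these notes; it is assumed here, along with a simple random sample from a large population and the worst case p = 0.5.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    Assumes a simple random sample from a large population.
    p = 0.5 is the worst case (it gives the widest margin).
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):
    print(f"n = {n:>4}: margin of error is about {margin_of_error(n):.3f}")
```

Going from 100 to 400 individuals cuts the margin roughly from 0.098 to 0.049, so quadrupling the sample size only halves the margin of error. This is the sense in which bigger is better, but with diminishing returns.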

Sampling Methods

  1. Simple Random Sampling (SRS): Choosing a subset of a population so that each member, and each possible sample of the same size, has an equal chance of being selected.

    • Example: Dining Hall - Use Excel's =RAND() function to assign each person a random value, then sort by that value and take the first n.

  2. Stratified Random Sampling: Dividing the population into homogeneous groups called strata, then randomly selecting proportionate amounts from each group.

    • Benefits: More precise estimates, less variability, detect differences among groups.

    • Simpson's Paradox: A trend appears in separate groups but disappears or reverses when groups are combined.

  3. Cluster Sampling: Divide the population into clusters that are similar to one another yet internally heterogeneous, then sample whole clusters at random.

    • Benefits: Less time and cost, natural groupings (e.g., dorms, classes, majors), easier administration (e.g., professors can mandate responses).

    • Analogy: Stratified Sampling = takes a bite from each layer of a cake; Cluster Sampling = takes a vertical slice through the whole cake.

  4. Systematic Sampling: Choose every nth person after selecting a random starting point (the order must not be related to the outcome).

    • Example: Select students in a dorm systematically.

  5. Multistage Sampling: Sampling schemes that combine several methods are called multistage samples.
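Simpson's Paradox, mentioned under stratified sampling above, is easier to believe with numbers. The sketch below uses invented pass counts for two hypothetical tutoring methods (A and B) in an easy course and a hard course; none of it comes from a real study.

```python
# (passed, enrolled) counts, invented for illustration.
results = {
    "A": {"easy": (18, 20), "hard": (40, 80)},
    "B": {"easy": (64, 80), "hard": (8, 20)},
}

def rate(passed, enrolled):
    return passed / enrolled

# Within each course, method A has the higher pass rate.
for course in ("easy", "hard"):
    a = rate(*results["A"][course])
    b = rate(*results["B"][course])
    print(f"{course} course: A = {a:.0%}, B = {b:.0%}")

# Combining the courses reverses the comparison.
overall = {}
for method, courses in results.items():
    passed = sum(p for p, _ in courses.values())
    enrolled = sum(n for _, n in courses.values())
    overall[method] = rate(passed, enrolled)
    print(f"overall: {method} = {overall[method]:.0%}")
```

Method A wins in both courses (90% vs 80%, 50% vs 40%) yet loses overall (58% vs 72%), because most of A's students took the hard course. Keeping the groups separate, as stratifying does, is what reveals this.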

Visualization of Sampling Methods

  • SRS: Randomly selected sample.

  • Stratified: Grouped by strata attributes.

  • Systematic: Selected at intervals.

  • Cluster: Whole groups sampled.

  • Multistage: Combination of methods.
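The methods summarized above can be sketched with Python's standard random module. The population below is hypothetical (120 students, 4 class years of 30, 6 dorms of 20), and the function names are made up for this illustration.

```python
import random
from collections import defaultdict

rng = random.Random(7)  # fixed seed for reproducibility

YEARS = ["First-year", "Sophomore", "Junior", "Senior"]
# Hypothetical population of 120 students.
population = [
    {"id": i, "year": YEARS[i // 30], "dorm": "ABCDEF"[i % 6]}
    for i in range(120)
]

def group_by(rows, key):
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups

def simple_random_sample(rows, n):
    """SRS: every set of n rows is equally likely."""
    return rng.sample(rows, n)

def random_sort_sample(rows, n):
    """Spreadsheet-style SRS: give each row a random value, sort, take the first n."""
    return sorted(rows, key=lambda row: rng.random())[:n]

def systematic_sample(rows, k):
    """Random starting point, then every k-th row."""
    start = rng.randrange(k)
    return rows[start::k]

def stratified_sample(rows, key, fraction):
    """Random sample of the same fraction from every stratum."""
    sample = []
    for stratum in group_by(rows, key).values():
        sample.extend(rng.sample(stratum, round(len(stratum) * fraction)))
    return sample

def cluster_sample(rows, key, n_clusters):
    """Pick whole clusters at random and keep everyone in them."""
    clusters = group_by(rows, key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [row for label in chosen for row in clusters[label]]

def multistage_sample(rows, key, n_clusters, n_per_cluster):
    """Stage 1: random clusters. Stage 2: SRS within each chosen cluster."""
    clusters = group_by(rows, key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for label in chosen:
        sample.extend(rng.sample(clusters[label], n_per_cluster))
    return sample

srs = simple_random_sample(population, 12)
sorted_srs = random_sort_sample(population, 12)
systematic = systematic_sample(population, 10)
stratified = stratified_sample(population, "year", 0.10)
cluster = cluster_sample(population, "dorm", 2)
multistage = multistage_sample(population, "dorm", 2, 5)

print(len(srs), len(systematic), len(stratified), len(cluster), len(multistage))
```

Note the contrast between the last three: the stratified sample contains a few students from every class year, the cluster sample contains every student from just two dorms, and the multistage sample contains a few students from each of two dorms.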

Practice Quiz: Name that Sampling Method
  1. Pick every 10th passenger on a flight → Systematic

  2. Randomly choose 5 from first class, 25 from coach → Stratified

  3. Randomly generate 30 seat numbers → Simple Random Sampling

  4. Randomly pick a seat position (e.g., window) and survey everyone sitting in those seats → Cluster Sampling

Bad Sampling Techniques

  • Voluntary Response Sampling: Individuals choose whether to participate, which makes the sample prone to bias.

    • Example: "Tell Us What You Think" website.

  • Convenience Sampling: The sample is drawn from those easiest to reach, which makes it prone to bias.

    • Example: Stopping people outside a dining hall.

Bias in Statistical Studies

What is Bias?
  • Definition: Bias is the degree to which a procedure systematically over- or under-estimates a population value.

  • A procedure is unbiased if it produces the true population parameter on average.
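The phrase "on average" can be made concrete by listing every possible sample from a tiny population. In this sketch the five population values are invented; it compares the sample mean (as an estimate of the population mean) with the sample maximum (as an estimate of the population maximum).

```python
from itertools import combinations
from statistics import mean

# Tiny hypothetical population, small enough to list every sample of size 2.
population = [2, 4, 6, 8, 10]
all_samples = list(combinations(population, 2))  # 10 possible samples

# Average each estimate over ALL possible samples.
avg_of_sample_means = mean(mean(s) for s in all_samples)
avg_of_sample_maxes = mean(max(s) for s in all_samples)

print(f"population mean: {mean(population):.1f}")
print(f"average sample mean: {avg_of_sample_means:.1f}")
print(f"population max:  {max(population):.1f}")
print(f"average sample max:  {avg_of_sample_maxes:.1f}")
```

The sample means average out to 6, exactly the population mean, so the sample mean is unbiased here. The sample maximums average out to 8, below the true maximum of 10, so the sample maximum systematically under-estimates and is biased.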

Major Categories of Bias
  1. Selection Bias: Affects the sample selection process.

  2. Response Bias: Affects the responses provided by respondents.

Types of Selection Bias
  • Voluntary Response Bias:

    • Occurs when people choose whether to participate in a survey, so the results often reflect only those with strong opinions.

    • Example: A survey asking people to call in their support for a new issue.

  • Non-Response Bias:

    • Occurs when selected people are unwilling or unable to participate, and those who do not respond may hold different opinions from those who do.

  • Sampling Bias:

    • Some individuals are more likely to be selected than others.

    • Example: Favoring a specific group unintentionally.

Types of Response Bias
  • Social Acceptability Bias:

    • People may give answers they think are more socially acceptable.

    • Example: Over-reporting favorable behaviors like recycling.

  • Leading Question Bias:

    • The wording of the question suggests a preferred response.

    • Example: "Don’t you agree that our policy is beneficial?"

  • Acquiescence Bias:

    • The tendency to agree with statements, regardless of true beliefs.

    • Example: Likely in surveys that use scales (e.g., Strongly Agree to Strongly Disagree).

  • Self-Interest Bias:

    • Arises when individuals or organizations have a self-interest in the outcome, which can influence both the study and how the results are analyzed.

Key Takeaway for Types of Bias

  Type of Bias           Description
  --------------------   ---------------------------------------------------------------
  Voluntary Response     Only people with strong opinions tend to participate
  Non-Response           Those who don't respond may differ meaningfully from responders
  Sampling Bias          Certain groups are over- or underrepresented in the way the sample is drawn
  Social Acceptability   People give socially acceptable answers
  Leading Question       Wording nudges toward a preferred answer
  Acquiescence           Tendency to agree regardless of true belief
  Self-Interest          Result influenced by parties with something to gain

Practice: Identify the Type of Bias
  1. Local business owner asks residents to call a hotline to show support for a new stadium → Voluntary Response Bias

  2. Police chief sends uniformed officers to ask if residents think the police are doing a bad job → Response Bias (Social Acceptability Bias)

  3. Candidate's campaign website claims only 11% support a rival's policy → Self-Interest Bias

  4. Bank mails 8,000 surveys; only 500 are returned → Non-Response Bias

  5. Fitness center asks, “Why do YOU love our new 24-hour access policy?” → Leading Question Bias
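For the bank survey in item 4, a quick calculation shows how much room non-response leaves for error. The mailed and returned counts come from the practice item; the number of satisfied respondents (400) is a hypothetical figure added for this sketch.

```python
mailed = 8000
returned = 500
response_rate = returned / mailed

# Hypothetical: suppose 400 of the 500 respondents say they are satisfied.
satisfied_respondents = 400
non_respondents = mailed - returned

observed = satisfied_respondents / returned
# Extreme cases: no non-respondent is satisfied, or all of them are.
lowest_possible = satisfied_respondents / mailed
highest_possible = (satisfied_respondents + non_respondents) / mailed

print(f"response rate: {response_rate:.2%}")
print(f"satisfied among respondents: {observed:.0%}")
print(f"possible range among all 8,000: {lowest_possible:.2%} to {highest_possible:.2%}")
```

With only 6.25% responding, the 80% observed among respondents is compatible with anything from 5% to 98.75% across everyone surveyed. The extremes are unlikely, but nothing in the returned surveys rules out a large gap between responders and non-responders.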