Statistics and Data Analysis

Audience Engagement

  • Queries posed to students:

    • Who wants to learn R?

    • Who wants to become a data scientist?

    • Who wants to learn statistics?

  • Affiliation: UCONN Department of Statistics

  • Social Media Handle: @statsystem

The Current Context

  • Increased importance of statistics and data analytics:

    • Surrounding issues with "fake news" and "alternative facts"

    • Example: Claims and statistics surrounding hunger among American children

    • Expressing the necessity for statistical literacy in modern society

Examining Statistical Credibility

  • Example of Misleading Data:

    • One in four American children under age 12 at risk of hunger

    • Source: Food Research and Action Center

    • Based on the following questions:

      • “Do you ever eat less than you feel you should?”

      • “Did you ever rely on limited numbers of foods to feed your children because you were running out of money to buy food for a meal?”

  • Example of Misleading Advertising:

    • Kellogg’s Frosted Mini Wheats advertisement:

    • Claim: “clinically shown to improve kids’ attentiveness by nearly 20%”

    • Reality: Only half of the kids who ate the cereal showed any improvement in attentiveness, and the comparison group was kids who had water for breakfast

  • Health Insurance Example:

    • Prior to the Health Care Reform Act, 30% of surveyed employers predicted they would “definitely” or “probably” stop offering health coverage

      • Based on leading survey questions to private-sector employers (sample size: 1,329)

Statistical Thinking

  • Definition (American Society for Quality, ASQ):

    • A philosophy of learning and action combining process thinking with statistics

  • Fundamental Principles of Statistical Thinking:

    • All work occurs in a system of interconnected processes

    • Variation exists in all processes

    • Understanding and reducing variation is the key to success

  • Involves applying rational thought and statistics to critically assess data and inferences

  • Connection to Lean Six Sigma discussed

Overview of Statistics

  • Definition:

    • The science of data

  • Key Activities in Statistics:

    • Collecting: Surveys, Randomized Experiments, etc.

    • Classifying: Distinguishing between Quantitative and Qualitative variables

    • Organizing: Using Tables and Databases

    • Summarizing: Using measures like Means, Medians, Standard Deviations

    • Analyzing: Methods like Hypothesis Tests, Confidence Intervals, Regression

    • Interpreting: Drawing Conclusions and making Decisions
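
The summarizing step listed above can be sketched in a few lines of Python. This is only a minimal illustration using the standard-library statistics module; the scores are hypothetical values invented for the example, not data from the lecture.

```python
import statistics

# Hypothetical quiz scores, invented purely for illustration.
scores = [72, 85, 90, 66, 85, 78, 94]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted data
sd_score = statistics.stdev(scores)       # sample standard deviation (n - 1 divisor)

print(f"mean={mean_score:.2f}, median={median_score}, sd={sd_score:.2f}")
```

The mean and median describe the center of the data, while the standard deviation describes how much the values vary around the mean.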

Types of Statistical Methods

  • Descriptive Statistics:

    • Uses numerical and graphical methods to explore, summarize, and present data

  • Inferential Statistics:

    • Uses sample data, often summarized with descriptive statistics, to draw conclusions about populations

Distinction Between Descriptive and Inferential Statistics

  • Descriptive Statistics:

    • Converts raw data into meaningful summaries and visuals

  • Inferential Statistics:

    • Involves making conclusions beyond the data available in a sample

    • Example: Inferring support for a political candidate based on a sample
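
As a rough sketch of the candidate-support example, the code below estimates a population proportion from a sample and attaches a large-sample (Wald) confidence interval. The poll counts and the helper name proportion_ci are assumptions made for illustration; the interval formula is the usual normal approximation, which is reasonable only for fairly large samples.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Large-sample (Wald) confidence interval for a population proportion."""
    p_hat = successes / n                            # sample proportion (a statistic)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # normal critical value
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)  # margin of error
    return p_hat, p_hat - margin, p_hat + margin

# Hypothetical poll: 540 of 1,000 sampled voters support the candidate.
p_hat, low, high = proportion_ci(540, 1000)
print(f"estimate={p_hat:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

The interval is the inferential part: it says something about all voters, not just the 1,000 who were sampled.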

Key Definitions

  • Population: The entire set of units or objects of interest

  • Sample: A subset of the population that researchers have access to

  • Variable: An attribute of a unit or object, e.g., a voter's choice for president

  • Parameter: A summary measure calculated from a population, e.g., the proportion of all voters who support a candidate

  • Statistic: A summary measure calculated from a sample, e.g., the proportion of voters supporting a candidate in a sample

  • Symbol Usage:

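    • Parameters are generally represented using Greek letters, e.g., μ (population mean), σ (population standard deviation)

    • Statistics are typically represented using Latin letters, e.g., x̄ (sample mean), s (sample standard deviation)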
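
The difference between a parameter and a statistic can be illustrated with a small simulation. Everything here is hypothetical: the population of 100,000 voters, the 52% support level, the sample size of 500, and the variable names p and p_hat are assumptions chosen for the sketch.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 100,000 voters; 1 = supports the candidate.
population = [1] * 52_000 + [0] * 48_000

# Parameter: computed from the whole population (usually unknown in practice).
p = sum(population) / len(population)

# Statistic: computed from a random sample of 500 voters.
sample = rng.sample(population, 500)
p_hat = sum(sample) / len(sample)

print(f"parameter p = {p:.3f}, statistic p_hat = {p_hat:.3f}")
```

The parameter is a fixed number, while the statistic changes from sample to sample; rerunning with a different seed gives a different p_hat but the same p.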

Practical Examples and Inferences

  • Example of Hypothesis Testing:

    • Study of FOX viewers' average age

    • Hypothesis tested: Is the average age of FOX viewers greater than 60?

    • Components:

      • Population: All FOX viewers

      • Variable of Interest: Age (years) of each viewer

      • Sample: 200 selected FOX viewers

      • Inference: Estimate average age and determine if it exceeds 60 years
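
One way to carry out the inference in this example is an upper-tailed test of the mean. The sketch below assumes a large-sample z approximation (plausible with 200 viewers) and uses simulated ages as a stand-in for the real sample, which is not given in these notes. The function name upper_tail_z_test and the simulation settings are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def upper_tail_z_test(sample, mu0):
    """Test H0: mu = mu0 against Ha: mu > mu0 with a large-sample z approximation."""
    n = len(sample)
    x_bar = mean(sample)                    # sample mean (statistic)
    s = stdev(sample)                       # sample standard deviation
    z = (x_bar - mu0) / (s / math.sqrt(n))  # standardized distance from mu0
    p_value = 1 - NormalDist().cdf(z)       # upper-tail probability
    return x_bar, z, p_value

# Simulated stand-in for the 200 sampled viewers; NOT real survey data.
rng = random.Random(1)
ages = [rng.gauss(62, 12) for _ in range(200)]

x_bar, z, p_value = upper_tail_z_test(ages, mu0=60)
print(f"sample mean={x_bar:.1f}, z={z:.2f}, p-value={p_value:.4f}")
```

A small p-value would suggest that the population average age exceeds 60; a large one means the sample does not provide convincing evidence of that.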

Types of Data

  • Data Sources:

    • Published data (books, journals, websites)

    • Survey data

    • Designed experimental data

    • Observational study data

  • Quantitative Data:

    • Recorded on a numerical scale, e.g., unemployment rates

    • Example: Number of male and female executives in a sample

  • Qualitative Data:

    • Cannot be measured on a numerical scale; can only be classified into categories

    • Example: Political affiliation of a sample of CEOs

  • Ordinal Data:

    • A subcategory of qualitative data with a hierarchy, e.g., low to high

Case Study Variables Classification

  • Study Example:

    • U.S. Army Corps of Engineers study on fish in the Tennessee River

  • Variables Measured:

    1. River/Creek (Qualitative)

    2. Species (Qualitative)

    3. Length (Quantitative)

    4. Weight (Quantitative)

    5. DDT Concentration (Quantitative)

  • Measure Classifications:

    • Measurement variables are typically quantitative

    • Classification variables are typically qualitative
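
The classification above can be mimicked in code by checking whether each recorded value is numeric. This is a simplified sketch: the record, its values, the units in the field names (cm, g, ppm), and the classify helper are all assumptions made for illustration, not details from the study.

```python
# One hypothetical fish record; values and units are invented for illustration.
record = {
    "river_creek": "Tennessee River",
    "species": "channel catfish",
    "length_cm": 42.5,
    "weight_g": 980.0,
    "ddt_ppm": 3.1,
}

def classify(value):
    """Label a value quantitative if it is numeric, otherwise qualitative."""
    # bool is excluded because True/False act as category labels, not measurements.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return "quantitative" if is_number else "qualitative"

types = {name: classify(value) for name, value in record.items()}
for name, kind in types.items():
    print(f"{name}: {kind}")
```

A type check like this is only a rule of thumb: numeric codes such as ZIP codes or ID numbers are stored as numbers but are still qualitative variables.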

Summary

  • This concludes Part I of Chapter 1, which covered the foundations of statistics, the types of statistical methods, and the importance of statistical literacy in the face of modern data challenges.