Statistics and Data Analysis
Audience Engagement
Queries posed to students:
Who wants to learn R?
Who wants to become a data scientist?
Who wants to learn statistics?
Affiliation: UCONN Department of Statistics
Social Media Handle: @statsystem
The Current Context
Increased importance of statistics and data analytics:
Surrounding issues with "fake news" and "alternative facts"
Example: Claims and statistics surrounding hunger among American children
Expressing the necessity for statistical literacy in modern society
Examining Statistical Credibility
Example of Misleading Data:
One in four American children under age 12 at risk of hunger
Source: Food Research and Action Center
Based on the following questions:
“Do you ever eat less than you feel you should?”
“Did you ever rely on limited numbers of foods to feed your children because you were running out of money to buy food for a meal?”
Example of Misleading Advertising:
Kellogg’s Frosted Mini Wheats advertisement:
Claim: “clinically shown to improve kids’ attentiveness by nearly 20%”
Reality: Only half of the kids showed any improvement in attentiveness, and the comparison group was kids who had only water for breakfast
Health Insurance Example:
Prior to the Health Care Reform Act: 30% of employers predicted they would “definitely” or “probably” stop offering health coverage
Based on leading survey questions put to a sample of 1,329 private-sector employers
Statistical Thinking
Definition (American Society for Quality, ASQ):
A philosophy of learning and action combining process thinking with statistics
Fundamental Principles of Statistical Thinking:
All work occurs in a system of interconnected processes
Variation exists in all processes
Understanding and reducing variation is the key to success
Involves applying rational thought and statistics to critically assess data and inferences
Connection to Lean Six Sigma discussed
Overview of Statistics
Definition:
The science of data
Key Activities in Statistics:
Collecting: Surveys, Randomized Experiments, etc.
Classifying: Distinguishing between Quantitative and Qualitative variables
Organizing: Using Tables and Databases
Summarizing: Using measures like Means, Medians, Standard Deviations
Analyzing: Methods like Hypothesis Tests, Confidence Intervals, Regression
Interpreting: Drawing Conclusions and making Decisions
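The summarizing step in the list above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the `scores` list and the `summarize` helper are hypothetical and not part of the original notes.

```python
import statistics

# Hypothetical sample of exam scores, invented purely for illustration.
scores = [72, 85, 90, 66, 85, 78, 94]

def summarize(values):
    """Return the mean, median, and sample standard deviation of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1 divisor)
    }

summary = summarize(scores)
print(summary)
```

The mean and median describe the center of the data, while the standard deviation describes how spread out the values are around the mean.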
Types of Statistical Methods
Descriptive Statistics:
Uses numerical and graphical methods to explore, summarize, and present data
Inferential Statistics:
Draws conclusions about a population from a sample, building on descriptive statistics computed from the sample data
Distinction Between Descriptive and Inferential Statistics
Descriptive Statistics:
Converts raw data into meaningful summaries and visuals
Inferential Statistics:
Involves making conclusions beyond the data available in a sample
Example: Inferring support for a political candidate based on a sample
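The candidate-support example can be sketched as follows. This is one possible approach, assuming a simple random sample and the large-sample (Wald) confidence interval for a proportion; the poll counts and the `proportion_ci` helper are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Large-sample (Wald) confidence interval for a population proportion."""
    p_hat = successes / n  # descriptive: the proportion observed in the sample
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # inferential: a range of plausible values for the population proportion
    return p_hat, p_hat - margin, p_hat + margin

# Hypothetical poll: 540 of 1,000 sampled voters support the candidate.
p_hat, low, high = proportion_ci(540, 1000)
print(f"sample proportion = {p_hat:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The sample proportion alone is a descriptive summary; attaching an interval that is meant to cover the proportion among all voters is the inferential step.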
Key Definitions
Population: The entire set of units or objects of interest
Sample: A subset of the population that researchers have access to
Variable: An attribute of a unit or object, e.g., a voter's choice for president
Parameter: A summary measure calculated from a population, e.g., the proportion of all voters who support a candidate
Statistic: A summary measure calculated from a sample, e.g., the proportion of voters supporting a candidate in a sample
Symbol Usage:
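Parameters are generally represented using Greek letters, e.g., μ for a population mean and σ for a population standard deviation
Statistics are typically represented using Latin letters, e.g., x̄ for a sample mean and s for a sample standard deviation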
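The difference between a parameter and a statistic can be illustrated with a small simulation. This is only a sketch: the population of ages is simulated, and the names `mu` and `x_bar` are chosen to mirror the Greek and Latin letter convention.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: simulated ages of 10,000 people (invented for illustration).
population = [random.gauss(45, 15) for _ in range(10_000)]

# Parameter: computed from the whole population (usually unknown in practice).
mu = statistics.mean(population)

# Statistic: computed from a random sample drawn from that population.
sample = random.sample(population, 200)
x_bar = statistics.mean(sample)

print(f"parameter mu = {mu:.2f}, statistic x_bar = {x_bar:.2f}")
```

In real studies only the sample is observed, so the statistic serves as an estimate of the parameter; drawing a different sample would give a slightly different value of `x_bar`.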
Practical Examples and Inferences
Example of Hypothesis Testing:
Study of FOX viewers' average age
Hypothesis tested: Is the average age of FOX viewers greater than 60?
Components:
Population: All FOX viewers
Variable of Interest: Age (years) of each viewer
Sample: 200 selected FOX viewers
Inference: Estimate average age and determine if it exceeds 60 years
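The inference in this example could be carried out with a one-sided, large-sample z test, sketched below. The notes do not say which test was used, so this choice is an assumption; the ages are simulated stand-ins for real viewer data, and `one_sided_z_test` is a hypothetical helper.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical ages for 200 sampled viewers; real data would replace this list.
ages = [random.gauss(62, 12) for _ in range(200)]

def one_sided_z_test(values, mu0):
    """Test H0: mu = mu0 against Ha: mu > mu0 using a large-sample z statistic."""
    n = len(values)
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    z = (x_bar - mu0) / (s / math.sqrt(n))
    p_value = 1 - NormalDist().cdf(z)  # upper-tail probability
    return x_bar, z, p_value

x_bar, z, p_value = one_sided_z_test(ages, mu0=60)
print(f"sample mean = {x_bar:.1f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value would count as evidence that the average age of all viewers exceeds 60; a large one would mean the sample does not support that claim.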
Types of Data
Data Sources:
Published data (books, journals, websites)
Survey data
Designed experimental data
Observational study data
Quantitative Data:
Recorded on a numerical scale, e.g., unemployment rates
Example: Number of male and female executives in a sample
Qualitative Data:
Cannot be measured on a natural numerical scale; classified into categories
Example: Political affiliation of a sample of CEOs
Ordinal Data:
A subcategory of qualitative data with a hierarchy, e.g., low to high
Case Study Variables Classification
Study Example:
U.S. Army Corps of Engineers study on fish in the Tennessee River
Variables Measured:
River/Creek (Qualitative)
Species (Qualitative)
Length (Quantitative)
Weight (Quantitative)
DDT Concentration (Quantitative)
Measure Classifications:
Measurement variables are typically quantitative
Classification variables are typically qualitative
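The classification above can be mimicked with a rough rule of thumb in code. This is a sketch only: the record values, the units in the field names, and the `classify` helper are all hypothetical.

```python
from numbers import Number

# One hypothetical fish record using the variables listed above; the values are invented.
fish = {
    "river_creek": "Tennessee River",
    "species": "channel catfish",
    "length_cm": 42.5,
    "weight_g": 732.0,
    "ddt_ppm": 10.0,
}

def classify(value):
    """Label a value quantitative if it is numeric, otherwise qualitative."""
    # bool is a Number subclass in Python, so exclude it explicitly.
    if isinstance(value, Number) and not isinstance(value, bool):
        return "quantitative"
    return "qualitative"

types = {name: classify(value) for name, value in fish.items()}
print(types)
```

The rule is only a heuristic: numeric codes such as ZIP codes or ID numbers are stored as numbers but are still qualitative, so the final classification depends on what the variable means, not on how it is stored.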
Summary
Conclusion of Part I of Chapter 1, covering a foundational understanding of statistics, types of statistical methods, and the importance of statistical literacy in the context of modern data challenges.