Chapter 1: Collecting Data
Section 1.1: The Structure of Data
Key Concepts Covered:
What are data, and how do we collect them?
Understanding Cases and Variables
Differentiating between Categorical and Quantitative Variables
Recognizing Explanatory and Response Variables
Utilizing data to answer questions
Understanding Data
Data are pervasive in every field and discipline.
Statistics provides methods for:
Collecting data
Describing data (organizing, summarizing, visualizing)
Analyzing data
Drawing conclusions from data
Defining Data
Data: A collection of measurements or observations recorded on individual units (cases).
Examples of Data Sets:
Ages of adults in an apartment building
Gender identities of adults
Types of flowers in a garden
Daily temperatures in June
Cases and Variables
Cases (Units): The subjects or objects about which we gather information.
Example: In a survey asking Los Angeles residents about composting, the cases are the residents surveyed.
Variable: Any characteristic recorded for each case.
Example: In the same survey, the variable is whether each resident composts or not.
Dataset Structure:
Each case corresponds to a row and each variable to a column.
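A minimal Python sketch of this row/column layout, using the composting survey. The residents and their answers are invented for illustration, and storing the table as a list of tuples (with a hypothetical helper `column`) is just one convenient choice: each tuple is one case, and reading the same position from every tuple gives one variable.

```python
# Hypothetical composting survey: the residents and answers are invented
# for illustration only; they are not real survey results.
variables = ["composts"]  # column headers: one per variable

# One row per case. The first entry labels the case (which resident);
# the remaining entries are that case's values, in the order of `variables`.
rows = [
    ("Resident 1", "Yes"),
    ("Resident 2", "No"),
    ("Resident 3", "Yes"),
]

def column(name):
    """Return one variable's values across all cases (a single column)."""
    i = variables.index(name) + 1  # +1 skips the case label
    return [row[i] for row in rows]

print(len(rows), "cases x", len(variables), "variable(s)")
print(column("composts"))  # ['Yes', 'No', 'Yes']
```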
Creating Your Dataset
Consider a potential dataset of interest:
Identify cases
Identify variables
Formulate interesting questions for analysis
Examples of Variables
Example 1: Shelf Life of Apples
Cases: Barrels of apples
Variable: Number of days until apples spoil
Categorical vs Quantitative Variables
Categorical Variable: Places each case into one of several groups (categories) identified by names or labels.
Example: Political affiliation (Democratic, Republican, etc.)
Quantitative Variable: Measures a numerical quantity for each case; arithmetic such as averaging is meaningful.
Example: Student GPAs
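The distinction shows up in which summaries make sense. A minimal sketch with invented values (the affiliations and GPAs below are made up for illustration): a categorical variable is summarized by counting cases per group, while a quantitative variable can be averaged.

```python
from collections import Counter
from statistics import mean

# Invented values for illustration only.
affiliation = ["Democratic", "Republican", "Democratic", "Independent"]  # categorical
gpa = [3.2, 3.8, 2.9, 3.5]  # quantitative

# Categorical: count how many cases fall into each group.
print(Counter(affiliation))

# Quantitative: numerical operations such as averaging are meaningful.
print(round(mean(gpa), 2))
```

Averaging the affiliation labels, by contrast, has no meaning, which is one quick way to tell the two kinds of variable apart.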
Student Survey Data
Example 2: Data Overview
Survey Variables:
Year, Gender, Higher SAT, SAT score, GPA, Siblings, Height, Weight, Exercise, TV, Pulse, Award
Classifying Variables:
Year in School: Categorical
Gender: Categorical
Higher SAT: Categorical
SAT Score: Quantitative
GPA: Quantitative
Siblings: Quantitative
Height: Quantitative
Weight: Quantitative
Exercise: Quantitative
TV: Quantitative
Pulse Rate: Quantitative
Award Preference: Categorical
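One way to keep this classification next to the data is to record it per column. A minimal sketch, with the variable names from the survey list and the types from the classification above; the dictionary layout itself is an illustrative choice, not part of the survey.

```python
# Variable types as classified in the student survey list.
variable_types = {
    "Year": "Categorical",
    "Gender": "Categorical",
    "Higher SAT": "Categorical",
    "SAT score": "Quantitative",
    "GPA": "Quantitative",
    "Siblings": "Quantitative",
    "Height": "Quantitative",
    "Weight": "Quantitative",
    "Exercise": "Quantitative",
    "TV": "Quantitative",
    "Pulse": "Quantitative",
    "Award": "Categorical",
}

# Split the column names by type, e.g. to decide between counts and averages.
categorical = [v for v, t in variable_types.items() if t == "Categorical"]
quantitative = [v for v, t in variable_types.items() if t == "Quantitative"]

print("Categorical:", categorical)
print("Quantitative:", quantitative)
```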
Analyzing Movies and Ratings
Example 3: Movie Ratings Comparison
Objective: Determine whether comedies tend to receive higher audience ratings than dramas.
Cases: Movies
Variables:
Type of movie (Comedy/Drama)
Audience rating
Variable Types:
Movie Type: Categorical
Audience Rating: Quantitative
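Because movie type is categorical and audience rating is quantitative, a natural first look is to compare the average rating within each group. A minimal sketch, assuming a hypothetical 0 to 100 rating scale; the five movies and their ratings are invented for illustration, and `mean_rating` is a hypothetical helper.

```python
from statistics import mean

# Hypothetical movies (ratings invented, 0-100 scale assumed).
# Each case has a categorical variable (type) and a quantitative one (rating).
movies = [
    ("Comedy", 72),
    ("Drama", 65),
    ("Comedy", 58),
    ("Drama", 81),
    ("Comedy", 69),
]

def mean_rating(movie_type):
    """Average audience rating among the movies of one type."""
    return mean(rating for kind, rating in movies if kind == movie_type)

comedy, drama = mean_rating("Comedy"), mean_rating("Drama")
print(f"Comedy mean: {comedy:.1f}, Drama mean: {drama:.1f}")
```

A difference between two sample averages is only a starting point; on its own it does not establish that comedies in general are rated higher.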
Understanding Relationships between Variables
Explanatory vs Response Variables:
Explanatory Variable: Helps explain or predict the values of another variable
Response Variable: The outcome whose values the explanatory variable may help explain or predict
Example: Whether a student studies with music (explanatory) and that student's exam score (response)
Investigating Organic Foods
Example 4: Organic vs Conventional Foods
Explanatory Variable: Type of food (organic or conventional)
Response Variable: Pesticide status (whether pesticide residue is present)
Blood Alcohol Content Study
Example 5: Identifying the Variables
Explanatory Variable: Number of drinks consumed
Response Variable: Blood alcohol content
Survey of Female Gamers
Example 6: Gamer Survey
Cases: 1141 female gamers in Great Britain
Variables to Consider:
Whether she has received obscene messages: Categorical
Hours played per week: Quantitative
Whether she feels there are sufficient strong female characters: Categorical
Dataset Organization: 1141 rows (cases) and 3 columns (variables)
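A minimal sketch of this layout as a column schema with a simple row check. The column names are hypothetical shorthand for the three survey variables, the two rows are invented for illustration, and `check_row` is a hypothetical helper; the real dataset would have 1141 such rows.

```python
# Three columns (variables) with their types; names are hypothetical shorthand.
schema = {
    "received_obscene_messages": "categorical",        # e.g. "Yes" / "No"
    "hours_per_week": "quantitative",                  # hours played per week
    "enough_strong_female_characters": "categorical",  # e.g. "Yes" / "No"
}

# Invented example rows (cases), for illustration only.
rows = [
    {"received_obscene_messages": "Yes", "hours_per_week": 10,
     "enough_strong_female_characters": "No"},
    {"received_obscene_messages": "No", "hours_per_week": 4.5,
     "enough_strong_female_characters": "Yes"},
]

def check_row(row):
    """True if the row has exactly the schema's columns with matching value kinds."""
    if set(row) != set(schema):
        return False
    for name, kind in schema.items():
        value = row[name]
        if kind == "quantitative" and not isinstance(value, (int, float)):
            return False
        if kind == "categorical" and not isinstance(value, str):
            return False
    return True

print(all(check_row(r) for r in rows))
print(f"{len(rows)} rows x {len(schema)} columns")
```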