Chapter 1: Collecting Data

Section 1.1: The Structure of Data

  • Key Concepts Covered:

    • What are data, and how do we collect them?

    • Understanding Cases and Variables

    • Differentiating between Categorical and Quantitative Variables

    • Recognizing Explanatory and Response Variables

    • Using data to answer questions

Understanding Data

  • Data are pervasive in every field and discipline.

  • Statistics supports every stage of working with data:

    • Collecting data

    • Describing data (organizing, summarizing, visualizing)

    • Analyzing data

    • Drawing conclusions from data

Defining Data

  • Data: A collection of measurements taken on individual units.

    • Examples of Data Sets:

      • Ages of adults in an apartment building

      • Gender identities of adults

      • Types of flowers in a garden

      • Daily temperatures in June

Cases and Variables

  • Cases (Units): The subjects or objects about which we gather information.

    • Example: In a survey asking Los Angeles residents about composting, the cases are the residents surveyed.

  • Variable: Any characteristic recorded for each case.

    • Example: In the same survey, the variable is whether each resident composts or not.

  • Dataset Structure:

    • Each case corresponds to a row and each variable to a column.
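
The row-and-column layout above can be sketched in plain Python. This is only an illustration: the values are made up, and the `household_size` variable and the helper names `shape` and `column` are hypothetical additions, not part of the survey described above.

```python
# Hypothetical composting survey: each dict is one case (a resident),
# and each key is one variable. All values are made up for illustration.
survey = [
    {"composts": "yes", "household_size": 3},
    {"composts": "no", "household_size": 1},
    {"composts": "yes", "household_size": 4},
]


def shape(rows):
    """Return (number of cases, number of variables) for a list of row dicts."""
    n_cases = len(rows)
    n_variables = len(rows[0]) if rows else 0
    return n_cases, n_variables


def column(rows, name):
    """Collect one variable's values across all cases (one column)."""
    return [row[name] for row in rows]


print(shape(survey))               # (3, 2): 3 cases, 2 variables
print(column(survey, "composts"))  # ['yes', 'no', 'yes']
```

Reading across a row gives everything recorded for one case; reading down a column gives one variable for every case.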

Creating Your Dataset

  • Consider a potential dataset of interest:

    • Identify cases

    • Identify variables

    • Formulate interesting questions for analysis

Examples of Variables

Example 1: Shelf Life of Apples

  • Cases: Barrels of apples

  • Variable: Number of days until apples spoil

Categorical vs Quantitative Variables

  • Categorical Variable: Divides cases into groups with names or labels.

    • Example: Political affiliation (Democratic, Republican, etc.)

  • Quantitative Variable: Measures a numerical quantity for each case.

    • Example: Student GPAs
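
As a rough sketch of the distinction, the function below guesses a variable's type from its recorded values. The name `variable_type` and the sample values are assumptions made for illustration, and the rule is only a heuristic: numeric codes such as zip codes are numbers but still act as categorical labels.

```python
def variable_type(values):
    """Guess 'quantitative' if every value is a number, else 'categorical'.

    Heuristic only: numeric codes (e.g. zip codes) would be misclassified.
    """
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if all_numeric else "categorical"


# Made-up values for the two examples above.
political_affiliation = ["Democratic", "Republican", "Independent"]
gpa = [3.2, 3.8, 2.9]

print(variable_type(political_affiliation))  # categorical
print(variable_type(gpa))                    # quantitative
```

A useful check when the heuristic is in doubt: if averaging the values would be meaningless, the variable is probably categorical.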

Student Survey Data

Example 2: Data Overview

  • Survey Variables:

    • Year, Gender, Higher SAT, SAT score, GPA, Siblings, Height, Weight, Exercise, TV, Pulse, Award

  • Classifying Variables:

    • Year in School: Categorical

    • Gender: Categorical

    • Higher SAT: Categorical

    • SAT Score: Quantitative

    • GPA: Quantitative

    • Siblings: Quantitative

    • Height: Quantitative

    • Weight: Quantitative

    • Exercise: Quantitative

    • TV: Quantitative

    • Pulse Rate: Quantitative

    • Award Preference: Categorical
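
The classification above can be recorded as a simple lookup table and tallied. This is a minimal sketch; the dictionary name `variable_types` and the helper `variables_of_type` are hypothetical, and the keys follow the variable names listed in the survey.

```python
from collections import Counter

# Variable name -> type, as classified in the list above.
variable_types = {
    "Year": "categorical",
    "Gender": "categorical",
    "Higher SAT": "categorical",
    "SAT score": "quantitative",
    "GPA": "quantitative",
    "Siblings": "quantitative",
    "Height": "quantitative",
    "Weight": "quantitative",
    "Exercise": "quantitative",
    "TV": "quantitative",
    "Pulse": "quantitative",
    "Award": "categorical",
}


def variables_of_type(types, wanted):
    """Return the names of all variables with the given type."""
    return [name for name, kind in types.items() if kind == wanted]


counts = Counter(variable_types.values())
print(counts["categorical"], counts["quantitative"])  # 4 8
print(variables_of_type(variable_types, "categorical"))
```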

Analyzing Movies and Ratings

Example 3: Movie Ratings Comparison

  • Objective: Assess whether comedies receive higher audience ratings than dramas.

    • Cases: Movies

    • Variables:

      • Type of movie (Comedy/Drama)

      • Audience rating

    • Variable Types:

      • Movie Type: Categorical

      • Audience Rating: Quantitative
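
One way to start on this question is to use the categorical variable to split the cases into groups and then summarize the quantitative variable within each group. The sketch below assumes made-up ratings on a 0 to 100 scale; the data, the field names, and the function `mean_rating_by_type` are all hypothetical and say nothing about real movies.

```python
from statistics import mean

# Made-up cases for illustration only: one dict per movie.
movies = [
    {"type": "Comedy", "audience_rating": 70},
    {"type": "Comedy", "audience_rating": 60},
    {"type": "Comedy", "audience_rating": 80},
    {"type": "Drama", "audience_rating": 65},
    {"type": "Drama", "audience_rating": 75},
    {"type": "Drama", "audience_rating": 85},
]


def mean_rating_by_type(rows):
    """Group ratings by movie type, then average within each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row["type"], []).append(row["audience_rating"])
    return {movie_type: mean(ratings) for movie_type, ratings in groups.items()}


averages = mean_rating_by_type(movies)
print(averages["Comedy"], averages["Drama"])
```

A difference between group averages in a sample is only a starting point; it does not by itself settle whether one type of movie is rated higher in general.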

Understanding Relationships in Variables

  • Explanatory vs Response Variables:

    • Explanatory Variable: The variable used to predict or explain the response

    • Response Variable: The outcome we want to predict or explain

    • Example: Whether a student studies with music (explanatory) and the student's exam score (response)

Investigating Organic Foods

Example 4: Organic vs Conventional Foods

  • Explanatory Variable: Type of food (organic or conventional)

  • Response Variable: Pesticide status

Blood Alcohol Content Study

Example 5: Identifying the Variables

  • Explanatory Variable: Number of drinks consumed

  • Response Variable: Blood alcohol content

Survey of Female Gamers

Example 6: Gamer Survey Findings

  • Cases: 1141 female gamers in Great Britain

  • Variables to Consider:

    • Received obscene messages: Categorical

    • Hours played per week: Quantitative

    • Sufficient strong female characters: Categorical

  • Dataset Organization: 1141 rows (cases) and 3 columns (variables)
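
The dataset organization above can be sketched as a record type with one field per variable. Everything here is an assumption made for illustration: the class name `GamerResponse`, the field names, the "yes"/"no" coding, and the placeholder row, which is not real survey data.

```python
from dataclasses import dataclass, fields


@dataclass
class GamerResponse:
    """One case: a single respondent's answers (field names are hypothetical)."""
    received_obscene_messages: str        # categorical, assumed "yes"/"no"
    hours_played_per_week: float          # quantitative
    enough_strong_female_characters: str  # categorical, assumed "yes"/"no"


def dataset_shape(rows):
    """Return (rows, columns): number of cases and number of variables."""
    return len(rows), len(fields(GamerResponse))


# Placeholder rows only, used to check the shape; NOT the actual responses.
placeholder = GamerResponse("no", 0.0, "no")
rows = [placeholder] * 1141

print(dataset_shape(rows))  # (1141, 3)
```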