Ch. 1
Types of Data Variables
Overview of Data Variables
Data Variable: A characteristic or attribute that can take on different values.
Types of Data Variables
Quantitative Variables:
Represent numerical values that can be counted or measured. These variables yield numerical data that supports mathematical operations like addition or averaging, answering questions like "how much?" or "how many?".
Can be further divided into:
Discrete Variables:
These are countable variables that take specific, distinct values and are often the result of counting. They cannot assume any value between two consecutive values.
For example: The number of words in a Tweet, the number of students in a class, the number of cars passing a certain point in an hour, the roll of a die.
Continuous Variables:
These can take on any value within a given range and usually arise from measurement. Between any two values there are infinitely many possible values; the precision actually recorded is limited only by the measurement instrument.
For example: Temperature, height of a person, weight of an object, time taken to complete a task, amount of rainfall, pH level of a solution.
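A minimal Python sketch of the discrete/continuous distinction. The tweet text and temperature readings are made-up values for illustration, not course data. The word count is discrete (the result of counting, so always a whole number), while the temperatures are continuous (measurements that can fall anywhere in a range).

```python
from statistics import mean

# Hypothetical example values, invented for illustration.
tweet = "Statistics is the grammar of science"
temperatures_c = [21.4, 22.05, 19.875]  # continuous: any value in a range

# Discrete: counting words always yields a whole number.
word_count = len(tweet.split())

# Quantitative variables support arithmetic such as averaging.
average_temperature = mean(temperatures_c)

print(word_count)
print(round(average_temperature, 3))
```

Note that the average of a discrete variable need not be a whole number, even though each individual value is.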
Qualitative Variables:
Represent categories or attributes that describe characteristics or qualities. They cannot be measured numerically but can be classified, answering "what type?" or "which kind?" rather than "how much?".
Examples include:
Categorical Variables:
Non-numerical categories. The categories may be unordered (nominal, such as hair color) or ordered (ordinal, such as level of education).
For example: Country of residence, hair color (blonde, brown, black), marital status (single, married, divorced), type of car (sedan, SUV, truck), level of education (high school, bachelor's, master's, PhD).
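The difference between unordered categories (often called nominal) and ordered ones (often called ordinal) can be sketched in Python as follows. The responses and the `education_rank` mapping are assumptions made up for this illustration.

```python
from collections import Counter

# Hypothetical survey responses, invented for illustration.
hair_colors = ["brown", "black", "brown", "blonde", "brown"]
education = ["master's", "high school", "PhD", "bachelor's"]

# Unordered categories: counting frequencies is meaningful, averaging is not.
hair_counts = Counter(hair_colors)
most_common_hair = hair_counts.most_common(1)[0][0]

# Ordered categories: the order matters, so sort by an explicit rank
# (assumed ranking for this sketch).
education_rank = {"high school": 0, "bachelor's": 1, "master's": 2, "PhD": 3}
education_sorted = sorted(education, key=education_rank.get)

print(most_common_hair)
print(education_sorted)
```

The rank numbers only encode order. The gap between ranks carries no meaning, so arithmetic on them (such as averaging) is still not appropriate.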
Data Examples
Identifier Variables:
These are unique labels or codes used to identify individual observations or records.
Examples: Social Security Number, Student ID, Email Address, Employee ID.
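One way to check whether a column could serve as an identifier is to test that every value is present and unique. The helper `is_identifier` and the sample records below are hypothetical, written only for this sketch.

```python
def is_identifier(values):
    """Return True if every value is present and appears exactly once."""
    values = list(values)
    no_missing = all(v is not None for v in values)
    all_unique = len(set(values)) == len(values)
    return no_missing and all_unique

# Hypothetical records, invented for illustration.
student_ids = ["S001", "S002", "S003"]
hair_colors = ["brown", "black", "brown"]

print(is_identifier(student_ids))  # True
print(is_identifier(hair_colors))  # False
```

Uniqueness is necessary but not sufficient: a measured variable can happen to have no repeated values in a small sample without being an identifier.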
Quantitative Variable Examples:
Temperature (a continuous variable).
Number of words in a Tweet (a discrete variable).
Lab Assignments
Assignment submission:
Instructions: Submit answers on Canvas under Assignments for extra credit.
Datasets Overview
Types of Datasets
Cross-Sectional Datasets:
Data collected from multiple subjects (individuals, firms, countries, etc.) at a single point in time, or over a defined period that is treated as a single snapshot. Each subject is observed once.
Examples:
Number of residents in each U.S. state in the year 2020.
Average income of employees in various companies in 2023.
Number of tornadoes that touched down in each U.S. county in 2021.
Number of emails received yesterday for each individual student in this class.
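As a rough illustration, a cross-sectional dataset can be laid out with one row per subject and a single time period. The county names and counts below are invented placeholders, not real 2021 figures.

```python
# Hypothetical counts, invented for illustration (not real 2021 data).
tornadoes_2021 = [
    {"county": "County A", "year": 2021, "tornadoes": 3},
    {"county": "County B", "year": 2021, "tornadoes": 0},
    {"county": "County C", "year": 2021, "tornadoes": 1},
]

# A cross-section has many subjects but only one time period.
subjects = {row["county"] for row in tornadoes_2021}
periods = {row["year"] for row in tornadoes_2021}

print(len(subjects), len(periods))  # 3 1
```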
Panel Datasets (Longitudinal Data):
Data collected from the same subjects repeatedly over multiple time periods, so that changes within each subject can be observed.
Examples:
Total parking citations issued within each zip code in D.C. for each month from October to December 2021.
The unemployment rate for each of the G7 countries measured annually for the past 10 years.
The GPA of each student in a graduating class, recorded every semester throughout their college careers.
The annual sales figures for a specific group of companies over a five-year period.
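The two dataset types can be told apart by counting distinct subjects and distinct time periods. The function `classify_dataset` below is a hypothetical helper sketched for this purpose, and the citation counts are invented; it is a simplified rule of thumb, not a formal test.

```python
def classify_dataset(rows, subject_key, period_key):
    """Label rows as 'cross-sectional', 'panel', or 'other'.

    Hypothetical helper: several subjects in one period is a cross-section;
    several subjects observed in several periods is a panel.
    """
    subjects = {row[subject_key] for row in rows}
    periods = {row[period_key] for row in rows}
    if len(subjects) > 1 and len(periods) == 1:
        return "cross-sectional"
    if len(subjects) > 1 and len(periods) > 1:
        return "panel"
    return "other"  # e.g. one subject over time (a single time series)

# Hypothetical citation counts, invented for illustration.
citations = [
    {"zip": "20001", "month": "2021-10", "citations": 120},
    {"zip": "20001", "month": "2021-11", "citations": 95},
    {"zip": "20002", "month": "2021-10", "citations": 80},
    {"zip": "20002", "month": "2021-11", "citations": 101},
]
october_only = [row for row in citations if row["month"] == "2021-10"]

print(classify_dataset(citations, "zip", "month"))     # panel
print(classify_dataset(october_only, "zip", "month"))  # cross-sectional
```

This sketch does not check that every subject appears in every period, which a stricter definition of a (balanced) panel would require.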
Lab Assignment Classification
Instructions:
Classify the following datasets as Cross-Sectional or Panel and submit answers on Canvas under Assignments.