Different Types of Data and Variables
Data
Economic Data
Understanding the type of data typically seen in economics is crucial for analysis and interpretation.
Why Data?
Figure: Share of respondents, 2018 to 2023, on data management trends in organizations worldwide (data culture, innovation, and competition).
Questions posed in class may not always require data, but they can be answered more effectively when data is incorporated.
Practice: Homework assignments will involve using data to support answers.
What is Data?
Data can take many forms. Examples:
GPA
Year in school
Major
Music preferences
Time spent watching a specific show (e.g., Prison Break in ECON 203)
Core Idea: Anything can be treated as data, depending on the context.
Usable Data: How useful data is depends on whether the available tools suit its type. The structure and nature of the individual observations determine how the data can be interpreted.
Economic Data
Impact of Raising Prices:
Studying how higher prices affect consumer behavior raises ethical concerns, because manipulating prices directly could be problematic.
Observational studies can sometimes provide insight without direct intervention.
Economic Data Types
Observational Studies
Definition:
A data set collected without researcher intervention.
Researchers have no control over the actions taken by study participants.
Subjects are observed acting independently.
Avoids the ethical complications of direct intervention, but without randomization it is harder to rule out bias.
Randomized Experiments
Definition:
A data set derived from a scenario where a researcher assigns different values to variables to observe subsequent effects.
Randomization serves to reduce bias in the findings.
Facilitates the establishment of causal relationships, although ethical concerns may limit applications.
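To make random assignment concrete, here is a minimal sketch in Python. The function name assign_treatment, the participant IDs, and the even split into two groups are all assumptions made for illustration; real experiments often use more elaborate designs.

```python
import random

def assign_treatment(unit_ids, seed=0):
    """Randomly split units into treatment and control groups of (nearly) equal size."""
    rng = random.Random(seed)      # seeded so the split is reproducible
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)          # chance alone decides who lands in which group
    half = len(shuffled) // 2
    return {
        "treatment": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }

# Hypothetical participant IDs 1 through 10
groups = assign_treatment(range(1, 11), seed=42)
print(groups)
```

Because the researcher, not the participant, decides who receives the treatment, and does so by chance, the two groups should be similar on average in everything except the treatment itself.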
Data Characteristics in Economics
Economists typically use data that is already available rather than conducting randomized controlled trials (RCTs).
Economic data therefore comes mostly from observational studies rather than experiments.
Different Types of Data
Data Organization Methods:
Cross-Sectional Data
Time Series Data
Pooled Cross-Sectional Data
Panel Data
Other data types exist as well.
Data Classification Questions
The type of a data set depends on:
Variation in surveyed units (e.g., individuals or objects)
Timing of data collection
Cross-Sectional Data
Definition:
Contains characteristics of many different units (e.g., surveyed individuals), all collected at a single point in time.
Example:
Characteristics collected today from a specific group.
Cross-Sectional Data Example
House Size vs. Lot Size:
Data presented on housing metrics.
Weekly Wages and Hours Worked (2010)
Structured data representing gender, year, wage, and hours worked, demonstrating characteristics across different individuals.
Time Series Data
Definition:
Data collected at multiple points in time on the same single unit or individual.
If data is collected on multiple subjects over time, it is no longer a time series.
Time Series Data Example
S&P 500 Closing Prices:
Data illustrating the stock market’s performance over specified dates with closing values indicated.
Pooled Cross-Sectional Data
Definition:
Combines features of time series and cross-sectional data: observations vary both across units and across time, and the units surveyed can differ from one period to the next.
Example:
Survey characteristics from varying student cohorts across multiple semesters.
Pooled Cross-Sectional Data Example
Log wage plotted against age in two separate years, showing how the relationship shifts over time.
Panel Data
Definition:
Follows the same subjects repeatedly across different time periods. Keeping the same subjects is critical; if subjects are replaced or drop out, the data set's classification changes.
Panel Data Example
Wages and hours worked tracked for the same study participants across years, illustrating both changes in their characteristics and participants dropping out.
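The four definitions above can be summarized as a rule of thumb based on which units are observed and when. The sketch below is one possible way to encode that rule, assuming each observation is recorded as a (unit, period) pair; the function name classify_dataset and the sample names are hypothetical.

```python
def classify_dataset(observations):
    """Classify a data set from (unit_id, period) pairs, one pair per observation."""
    observations = list(observations)
    if not observations:
        raise ValueError("need at least one observation")

    # Group the observed units by the period in which they were observed.
    units_by_period = {}
    for unit_id, period in observations:
        units_by_period.setdefault(period, set()).add(unit_id)

    all_units = set().union(*units_by_period.values())

    if len(units_by_period) == 1:
        return "cross-sectional"       # one point in time
    if len(all_units) == 1:
        return "time series"           # one unit followed over time
    if all(units == all_units for units in units_by_period.values()):
        return "panel"                 # same units in every period
    return "pooled cross-sectional"    # units differ across periods

print(classify_dataset([("ann", 2010), ("bob", 2010)]))
# cross-sectional
print(classify_dataset([("S&P 500", "day 1"), ("S&P 500", "day 2")]))
# time series
print(classify_dataset([("ann", 2010), ("bob", 2010), ("ann", 2011), ("bob", 2011)]))
# panel
print(classify_dataset([("ann", 2010), ("bob", 2010), ("cat", 2011), ("dan", 2011)]))
# pooled cross-sectional
```

Note that under this rule a panel in which one subject drops out is no longer classified as a panel, which matches the point that retaining the same subjects is critical.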
iClicker Question Example
A scenario asks which data set type results from tracking class retention over a semester:
(A) Panel Data
(B) Cross-Sectional
(C) Pooled Cross-Sectional
(D) Time Series
Importance of Data Type Understanding
Distinguishing between data types is essential because different types of data sets call for different modeling techniques. The course will focus primarily on cross-sectional and pooled cross-sectional data.
Types of Variables
Identifying the type of each variable is key to representing data correctly in a data set.
Example variables include GPA, major, and year in school, each of which needs to be categorized properly.
Some numbers support meaningful calculations, whereas others act only as labels and do not.
Variable Classification
Numeric Variable:
Represents a measurable quantity that can be compared across observations.
Example: Course grade expressed as a percentage.
Not every variable that contains numbers is numeric.
Numeric Variables
Types:
Continuous Variables:
Can take any value within a range, so in theory there are infinitely many possible values (e.g., age measured exactly, down to the instant).
Discrete Variables:
Take only separate, countable values (e.g., age recorded in whole years only).
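The age example can be illustrated with a short Python sketch. The function names and dates below are hypothetical, and the continuous version uses 365.25 days per year as an approximation.

```python
from datetime import date

def age_continuous(birth_date, on_date):
    """Age in years as a real number (approximation: 365.25 days per year)."""
    return (on_date - birth_date).days / 365.25

def age_discrete(birth_date, on_date):
    """Age in completed whole years."""
    had_birthday = (on_date.month, on_date.day) >= (birth_date.month, birth_date.day)
    return on_date.year - birth_date.year - (0 if had_birthday else 1)

birth = date(2003, 9, 15)      # hypothetical birth date
as_of = date(2024, 3, 1)       # hypothetical survey date

print(age_continuous(birth, as_of))   # a fractional number between 20 and 21
print(age_discrete(birth, as_of))     # 20
```

The same underlying quantity (age) can therefore be stored as either a continuous or a discrete variable, depending on how it is measured.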
Categorical Variables
Definition:
Indicate which group an observation belongs to.
Examples include geographical information.
Categorical variables may be coded as numbers or stored as text.
Types of Categorical Variables
Ordinal Variables:
Rank observations, but the distance between ranks is not meaningful (e.g., economic status categories).
Nominal Variables:
Offer descriptive categorizations where order is irrelevant (e.g., state of birth).
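The difference between ordinal and nominal variables shows up in what operations make sense. The sketch below uses made-up respondents and a hypothetical three-level income scale: sorting is meaningful for the ordinal variable, while the nominal variable supports only counting.

```python
from collections import Counter

# Hypothetical ordinal scale: the order matters, the spacing between levels does not.
INCOME_ORDER = ["low", "middle", "high"]
RANK = {level: i for i, level in enumerate(INCOME_ORDER)}

# Made-up observations for illustration only.
respondents = [
    {"name": "A", "income_group": "high", "state_of_birth": "Illinois"},
    {"name": "B", "income_group": "low", "state_of_birth": "Ohio"},
    {"name": "C", "income_group": "middle", "state_of_birth": "Illinois"},
    {"name": "D", "income_group": "low", "state_of_birth": "Texas"},
]

# Ordinal: sorting by rank is meaningful.
by_income = sorted(respondents, key=lambda r: RANK[r["income_group"]])
print([r["name"] for r in by_income])   # ['B', 'D', 'C', 'A']

# Nominal: there is no order, so we can only count category membership.
state_counts = Counter(r["state_of_birth"] for r in respondents)
print(state_counts["Illinois"])         # 2
```

The numeric codes in RANK exist only to express order; averaging them would implicitly assume equal spacing between categories, which an ordinal variable does not guarantee.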
Dummy Variables
Definition:
Indicate whether a certain criterion is met, taking the value 1 if it is and 0 otherwise.
Example: A dummy variable for gender could be designated as 1 for male, 0 otherwise.
Useful for segmenting categorical variables in regression analyses.
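As a minimal sketch of how dummy variables can be built from a categorical variable, the Python below uses two hypothetical helper functions, make_dummy and make_dummies, and made-up sample values.

```python
def make_dummy(values, target):
    """Return 1 where the value equals the target category, 0 otherwise."""
    return [1 if v == target else 0 for v in values]

def make_dummies(values):
    """Return one 0/1 dummy per category, keyed by category name."""
    return {category: make_dummy(values, category) for category in sorted(set(values))}

# Made-up sample data for illustration only.
gender = ["male", "female", "female", "male"]
male = make_dummy(gender, "male")
print(male)                      # [1, 0, 0, 1]

year = ["freshman", "senior", "junior", "senior"]
year_dummies = make_dummies(year)
print(year_dummies["senior"])    # [0, 1, 0, 1]
```

When a full set of dummies like year_dummies is used in a regression with an intercept, one category is usually left out to serve as the baseline, since the dummies for all categories always sum to 1.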
iClicker Questions on Variable Types
A question asking for the type of a variable related to the minimum wage in New Jersey.
A question about group assignments, where the variable's value indicates which group an observation belongs to.
Conclusion: Roadmap
How data is generated, why distinguishing variable types matters, and how data can be used to answer questions will be covered throughout the course. Understanding probability will lay the groundwork for working with data in more depth.