Probability and Statistics for Economists
Descriptive Statistics: Graphical Analysis
Presented by: Cristina Blanco-Perez, University of Ottawa
Outline
Descriptive Statistics
Graphical Analysis
Data Classification
Analyzing Data Graphically
Understanding Measurement Scales
Measurement Scales
Measurement scales describe how precisely variables are recorded and analyzed. The choice of measurement scale determines the amount of information contained in the data and influences the appropriate summarization and statistical analysis techniques used to extract insights.
Data Classification
Types of Measurement Scales:
Categorical Data (Qualitative): Data represented by labels or names that identify an attribute of each element. The labels can be non-numeric (e.g., gender) or numeric codes (e.g., gender coded as 1 and 2). Numeric codes are only labels, so arithmetic on them (such as averaging the codes) is not meaningful.
Quantitative Data (Numerical): Data consisting of numeric values that indicate a quantity or amount, so arithmetic operations (sums, differences, averages) are meaningful.
Types of Variable Data
Categorical Variables:
Nominal: Categories that do not follow any natural order (e.g., gender identity, geographical locations). For instance, categories like 'Asian', 'European', and 'African' are nominal as they are mere labels without a ranking.
Ordinal: Categories with a natural order (e.g., survey ratings from very poor to very good, as in customer satisfaction surveys), where the rank carries meaning but the differences between ranks are not necessarily equal or measurable.
Quantitative Variables:
Discrete: Numeric values that can be counted, typically whole numbers (e.g., the number of students in a class).
Continuous: Numeric values that can take on any value within a specified range (e.g., height, weight, temperature), allowing for measurements that are not restricted to whole numbers.
Examples of Data Types
Health Rating: 1 = Very Good, 2 = Good, 3 = Fair, 4 = Poor, 5 = Very Poor. (Ordinal)
Age Range: Typically from 0 to 120 years inclusive. (Discrete if recorded in completed years; continuous in principle)
Height Measurement: Recorded in centimeters, allowing for precision in analysis. (Continuous)
Gender Identifier: Coded as 1 = Female, 2 = Non-Female, which can include various gender identities based on the study's focus. (Nominal)
Year of Birth: Ranges from 1900 to 2016, often analyzed to study generational trends. (Discrete)
Income: Recorded in dollars from $0 to a specified maximum, or grouped into income ranges; often used in economic studies of the income distribution. (Numerical continuous if recorded as an amount; categorical if grouped into ranges)
Types of Data Classification According to Time-Span
Time Series Data: Data tracked over successive time points (e.g., yearly income, or stock and bond market prices tracked over time), valuable for identifying trends over time.
Cross-Sectional Data: Data collected at a single point in time across multiple subjects or entities (e.g., the income of various households in a particular year), useful for comparative analysis.
Longitudinal Data (Panel Data): Data that follows the same subjects over time, allowing researchers to study changes over several periods, such as multiple households’ income tracked annually.
(The census is an imperfect example of this type of data.)
Analyzing Data Graphically
Graphical analysis uses visual representations to extract key insights from data more easily than from tables of raw numbers. Once the data are collected, each variable must be classified (e.g., categorical or quantitative) so that the appropriate graph or table can be chosen.
Analyzing Categorical Data
Methods include:
For tabulating:
Frequency Distribution Tables: Organize data by listing each category and its frequency (the number of observations in that category).
For graphing:
Bar Charts: Visual representation highlighting the frequency of categories. Each bar’s height reflects the count of occurrences.
Pie Charts: Serve to show the proportionate makeup of categories within the total dataset, allowing comparisons of their relative sizes.
Frequency Distribution Table
This table lists the possible responses (classes) and the number of observations in each class (the frequency). It may also include relative frequencies, the proportion of observations in each class (frequency divided by the total number of observations), which show how each category relates to the whole.
Example: Health Data Frequency Distribution
Very Poor: 1000 (6.5%)
Poor: 3450 (22.5%)
Fair: 6000 (39.1%)
Good: 3000 (19.6%)
Very Good: 1890 (12.3%)
Total Observations: 15340
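Each relative frequency is the category's count divided by the total number of observations. A minimal Python sketch that reproduces the percentages above from the counts (the dictionary and function names are illustrative, not from the course materials):

```python
# Counts from the health-rating example above
health_counts = {
    "Very Poor": 1000,
    "Poor": 3450,
    "Fair": 6000,
    "Good": 3000,
    "Very Good": 1890,
}

def relative_frequencies(counts):
    """Return each category's share of the total: frequency / n."""
    n = sum(counts.values())
    return {category: f / n for category, f in counts.items()}

rel = relative_frequencies(health_counts)
for category, share in rel.items():
    # The ".1%" format multiplies by 100 and rounds to one decimal place
    print(f"{category}: {health_counts[category]} ({share:.1%})")
print("Total observations:", sum(health_counts.values()))
```

The relative frequencies always sum to 1 (100%), up to rounding in the displayed percentages.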
Presenting Main Indicators
Using Tables:
Number each table and give it a caption so it can be referenced easily.
Example indicator: Gross Domestic Product (GDP) revisions by province, presented in a table to illustrate economic changes over time.
Bar Chart and Pie Chart
Bar Chart: Shows the frequency of each category as a bar, making it easy to identify the most common categories.
Pie Chart: Highlights the proportions of frequencies relative to the total dataset.
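Both charts are built from the same frequency table: bar lengths are proportional to the frequencies, and each pie slice's angle is the category's relative frequency times 360 degrees. A rough text-only sketch using the health-rating counts; the scale of one `#` per 500 observations is an arbitrary display choice, and a real chart would come from a spreadsheet or plotting tool:

```python
health_counts = {
    "Very Poor": 1000,
    "Poor": 3450,
    "Fair": 6000,
    "Good": 3000,
    "Very Good": 1890,
}
n = sum(health_counts.values())

# Pie chart: slice angle = relative frequency * 360 degrees
pie_angles = {c: 360 * f / n for c, f in health_counts.items()}

# Bar chart (text version): bar length proportional to frequency
SCALE = 500  # assumed display scale: one '#' per 500 observations
for category, f in health_counts.items():
    print(f"{category:>9} | {'#' * round(f / SCALE)}")

for category, angle in pie_angles.items():
    print(f"{category}: {angle:.1f} degrees")
```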
Cross Table
A cross table (cross-tabulation) counts the observations for each combination of categories of two categorical variables, making it easier to see relationships between them.
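A cross table can be built by counting each (row category, column category) pair. A minimal sketch with hypothetical survey responses pairing a health rating with a gender label; the data and the `cross_table` helper are illustrative assumptions, and categories are sorted alphabetically for simplicity (so the ordinal order of the ratings is not preserved):

```python
from collections import Counter

# Hypothetical paired observations: (health rating, gender) for each respondent
responses = [
    ("Good", "Female"), ("Fair", "Non-Female"), ("Good", "Non-Female"),
    ("Poor", "Female"), ("Fair", "Female"), ("Good", "Female"),
]

def cross_table(pairs):
    """Return row labels, column labels, and joint counts for each combination."""
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    counts = Counter(pairs)  # missing combinations count as 0
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return rows, cols, table

rows, cols, table = cross_table(responses)
print("".ljust(6), *(c.rjust(11) for c in cols))
for r in rows:
    print(r.ljust(6), *(str(table[r][c]).rjust(11) for c in cols))
```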
Analyzing Time-Series Data
Time-series data are displayed with line charts (time-series plots), which plot the observations in time order and connect them with lines. These plots reveal trends, cycles, and other variations over time.
Analyzing Quantitative Variables
Quantitative data analysis can be accomplished through:
Histograms: Visual representations that display distributions, showing counts along the y-axis and intervals on the x-axis.
Frequency Distributions - Key Considerations
When determining frequencies for numerical analysis, consider:
The necessary number of intervals (k) to categorize the data.
Setting a suitable interval width (w) that fits the data’s natural distribution.
Ensuring intervals are inclusive (every observation falls in some interval) and non-overlapping (no observation falls in more than one), so that each observation is counted exactly once.
Example: a salary frequency distribution lists predetermined salary intervals and the number of observations in each, giving insight into how income is distributed.
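A minimal sketch of counting observations into predetermined intervals. The salaries and intervals are hypothetical, and the intervals are assumed to be half-open, [lower, upper), which is one common way to keep them non-overlapping. The function raises an error if an observation falls in no interval (the intervals are not inclusive) or in more than one (they overlap):

```python
# Hypothetical salary intervals in thousands of dollars, each half-open: [lower, upper)
intervals = [(30, 40), (40, 50), (50, 60), (60, 70)]
salaries = [32, 45, 47, 51, 58, 59, 61, 38, 44, 55]  # hypothetical data, $1,000s

def frequency_distribution(data, intervals):
    """Count how many observations fall in each interval."""
    counts = [0] * len(intervals)
    for x in data:
        hits = [i for i, (lower, upper) in enumerate(intervals) if lower <= x < upper]
        if len(hits) == 0:
            raise ValueError(f"{x} falls in no interval: intervals are not inclusive")
        if len(hits) > 1:
            raise ValueError(f"{x} falls in several intervals: intervals overlap")
        counts[hits[0]] += 1
    return counts

counts = frequency_distribution(salaries, intervals)
for (lower, upper), f in zip(intervals, counts):
    print(f"[{lower}, {upper}): {f}")
```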
Histogram
A graph of a frequency distribution, with the intervals on the x-axis and the corresponding frequencies on the y-axis. The shape of the histogram reveals characteristics of the distribution, such as symmetry or skewness.
For histograms:
w = (Largest Observation - Smallest Observation)/k, where k is the number of intervals and w is the width of each interval (in practice, w is often rounded up to a convenient value). The width determines how the data are grouped and how the histogram looks.
Make sure the intervals are inclusive and non-overlapping to accurately represent the frequency of observations within each range, ensuring that each data point is accounted for without duplication.
Ensure that you understand cumulative frequency and relative cumulative frequency.
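A sketch that applies the width formula and then accumulates the frequencies. The cumulative frequency of an interval is the number of observations in that interval or any earlier one; the relative cumulative frequency divides it by the total number of observations n. Assumptions: the data are hypothetical, w is used unrounded, and the intervals are half-open [lower, lower + w) except the last, which also includes the largest observation (otherwise it would sit just outside):

```python
from itertools import accumulate

def histogram_table(data, k):
    """Return width w, frequencies, cumulative and relative cumulative frequencies."""
    smallest, largest = min(data), max(data)
    w = (largest - smallest) / k
    freq = [0] * k
    for x in data:
        # Index of the interval [smallest + i*w, smallest + (i+1)*w) containing x
        i = int((x - smallest) // w) if w > 0 else 0
        # The largest observation lies on the upper edge: put it in the last interval
        freq[min(i, k - 1)] += 1
    cum = list(accumulate(freq))            # running total of frequencies
    rel_cum = [c / len(data) for c in cum]  # running total as a share of n
    return w, freq, cum, rel_cum

data = [10, 14, 19, 20, 23, 27, 29, 31, 35, 38, 42, 50]  # hypothetical observations
w, freq, cum, rel_cum = histogram_table(data, k=4)
for i, (f, c, rc) in enumerate(zip(freq, cum, rel_cum)):
    lower = min(data) + i * w
    close = "]" if i == len(freq) - 1 else ")"  # last interval is closed on the right
    print(f"[{lower:g}, {lower + w:g}{close}: freq={f}, cum={c}, rel_cum={rc:.3f}")
```

The last cumulative frequency equals n, so the last relative cumulative frequency is always 1.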
Distribution Shapes
Analyzing the shape of a distribution in a histogram can reveal:
Symmetric Distribution: The left and right halves of the histogram are approximately mirror images of each other.
Skewed Right Distribution: A longer tail on the right side; most observations are at lower values, with a few unusually high values pulling the average up.
Skewed Left Distribution: A longer tail on the left side; most observations are at higher values, with a few unusually low values pulling the average down.
Scatter Plot
This is used to analyze the relationship between two numerical variables. For instance, examining Income vs. Years of Education may uncover correlations between educational attainment and income levels.
Software for Analysis
Use tools such as Excel or Calc to create charts and analyze data. Be cautious, however: graphs can mislead (for example, a vertical axis that does not start at zero can exaggerate differences), so follow guidelines for constructing and interpreting graphs.
Suggested Readings and Exercises
Read Chapter 2, Sections 2.1, 2.2, 2.3 in the textbook for detailed understanding.
Suggested exercises include Section 2.2, Questions 12 and 20, providing practical applications of the concepts discussed.