Lecture-3 Visualisation and Presentation of Data 061124

Page 1: Lecture Information

Lecture Title: Visualization and Presentation of Data (Continued)
Lecturer: Dr. Lei Xu
Office Location: BE.203 in Sir Richard Morris
Feedback and Consultation Hours:
- Tuesday 16:00 - 17:00
- Wednesday 09:30 - 10:30
Email: L.Xu2@lboro.ac.uk

Page 2: Learning Objectives and Readings

Learning Objectives

Measure of Dispersion (Variability)
Presentation and Descriptive Analysis
Correlation
Causal Analysis

Recommended Readings

Koop, G. (2013). Analysis of Economic Data. John Wiley & Sons. Chapter 3.
Cunningham, S. (2021). Causal Inference: The Mixtape. Yale University Press.
- Sections: 'Introduction', 'Directed Acyclic Graphs', and 'Potential Outcomes Causal Model'.
Anderson, D. R., Williams, T. A., & Cochran, J. J. (2020). Statistics for Business & Economics. Cengage Learning. Chapters 2-3.

Page 3: Central Tendency and Measures of Dispersion

Measure of Location (Central Tendency)

Mean
Median
Mode

Measure of Dispersion (Variability)

Range and Percentile
Quartiles and Interquartile Range (IQR)
Mean Deviation
Variance
Standard Deviation

Page 4: Range and Percentile

Range

Definition: The simplest measure of dispersion, calculated as the difference between the maximum and minimum values in a data set.
Formula: Range = Maximum value - Minimum value

Example 1

Data Set: 1000, 1050, 3000, 2500, 1780, 2210, 2540, 1980, 3650, 4970, 5000, 8500, 7010
Solution: Range = 8500 - 1000 = 7500

Percentile

Definition: The 𝑝th percentile is a value such that at least 𝑝 percent of observations are less than or equal to it and at least (100 − 𝑝) percent are greater than or equal to it.

Calculation Steps for Percentile

Arrange data in ascending order.
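Compute an index: 𝑖 = (𝑝/100) × 𝑛, where 𝑛 is the number of observations.
If 𝑖 is not an integer, round up: the next integer above 𝑖 gives the position of the percentile. If 𝑖 is an integer, take the average of the values in positions 𝑖 and 𝑖 + 1.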

Page 5: Exercise on Percentile Calculation

Goal: Find the 75th percentile of the data set: 1000, 1050, 3000, 2500, 1780, 2210, 2540, 1980, 3650, 4970, 5000, 8500, 7010.

Solution Steps

Arrange Data: 1000, 1050, 1780, 1980, 2210, 2500, 2540, 3000, 3650, 4970, 5000, 7010, 8500
Calculate Index: 𝑖 = 75/100 * 13 = 9.75.
Determine Position: Because 9.75 is not an integer, round up to position 10.
- 75th Percentile: 4970 (10th position).
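
The steps above can be sketched in Python as follows. This is a minimal illustration, not a library routine: the function name `percentile` is hypothetical, and the rule for an integer index (averaging positions 𝑖 and 𝑖 + 1) is assumed from the textbook convention rather than stated in the exercise.

```python
import math

def percentile(values, p):
    """p-th percentile via the index rule i = (p/100) * n (hypothetical helper)."""
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    ordered = sorted(values)          # Step 1: arrange in ascending order
    n = len(ordered)
    i = p * n / 100                   # Step 2: compute the index
    if i.is_integer():
        # Assumed convention: integer index -> average positions i and i+1 (1-based)
        k = int(i)
        return (ordered[k - 1] + ordered[k]) / 2
    # Non-integer index -> round up to the next position (1-based)
    return ordered[math.ceil(i) - 1]

data = [1000, 1050, 3000, 2500, 1780, 2210, 2540,
        1980, 3650, 4970, 5000, 8500, 7010]

p75 = percentile(data, 75)   # index 9.75 -> position 10
p50 = percentile(data, 50)   # index 6.5  -> position 7 (the median)
print(p75, p50)
```

Spreadsheet functions such as PERCENTILE.INC interpolate between positions instead of rounding, so they can return a different value for other data sets.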

Note

The median represents the 50th percentile.
Excel Formula: =PERCENTILE.INC(array, k)

Page 6: Quartiles

Definition: Data is divided into four parts, each containing approximately 25% of observations.
Quartiles are Defined as:
- Q1: First quartile (25th percentile)
- Q2: Second quartile (50th percentile, median)
- Q3: Third quartile (75th percentile)
Calculation: Same method as percentiles.
Excel Formula: =QUARTILE.INC(array, quart)

Page 7: Interquartile Range (IQR)

Definition: Difference between the third quartile (Q3) and the first quartile (Q1).
Significance: Measures variability, focusing on the middle 50% of the data.
Illustration: See Figure 3.2.
Excel Formula: =Q3 - Q1 or =QUARTILE(array,3) - QUARTILE(array,1)
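
As a rough Python counterpart to the Excel formulas, the sketch below computes the quartiles and the IQR for the data set from the percentile exercise. It assumes that `statistics.quantiles` with `method="inclusive"` interpolates in the same way as QUARTILE.INC; for this particular data set the interpolation lands exactly on data points, so the result also agrees with the round-up index rule.

```python
import statistics

data = [1000, 1050, 3000, 2500, 1780, 2210, 2540,
        1980, 3650, 4970, 5000, 8500, 7010]

# n=4 splits the data into four parts and returns the three cut points
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1   # spread of the middle 50% of the data
print(q1, q2, q3, iqr)
```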

Page 8: Box Plot

Overview: A box plot (or box-and-whisker plot) is used in descriptive statistics for data visualization.
Purpose: Shows the distribution and skewness of numerical data through its quartiles and median.
Five-Number Summary Components:
- Minimum value
- First quartile (Q1)
- Median (Q2)
- Third quartile (Q3)
- Maximum value

Page 9: Percentile Point Visualization

Source: ONS
Graphical Representation: Percentile points related to total income before tax with intervals.

Page 10: Central Tendency and Distributional Measures

Central Tendency

Definition: A single value representing the "center" or typical value of a dataset.
Common Measures:
- Mean: Average of all data points.
- Median: Middle value when data is ordered.
- Mode: Most frequently occurring value.

Distributional Measures

Definition: Describe the spread (dispersion) and shape of the data across its values.
Common Measures:
- Range: Difference between max and min values.
- Variance: Average of squared differences from the mean.
- Standard Deviation: Square root of variance.
- Skewness: Measure of data distribution asymmetry.
- Kurtosis: Measure of how heavy the distribution's tails are (often described as its peakedness).

Purpose of Measures

Central Tendency: Identifies a typical value.
Distributional Measures: Describe the variability and shape of the dataset.
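
The measures listed on this page can be illustrated with Python's standard library, using the data set from the range example. This is a sketch under two assumptions: the skewness line uses the sample-adjusted formula that Excel's SKEW is understood to use, and the variable names are illustrative only. Note that "average of squared differences" corresponds to the population variance (divide by 𝑛), whereas VAR.S and STDEV.S divide by 𝑛 − 1.

```python
import statistics

data = [1000, 1050, 3000, 2500, 1780, 2210, 2540,
        1980, 3650, 4970, 5000, 8500, 7010]
n = len(data)

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)

# Dispersion
data_range = max(data) - min(data)
pop_variance = statistics.pvariance(data)   # divides by n
variance = statistics.variance(data)        # divides by n - 1, like VAR.S
std_dev = statistics.stdev(data)            # square root of the sample variance

# Shape: sample-adjusted skewness (assumed to match Excel's SKEW)
skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / std_dev) ** 3 for x in data)

print(mean, median, data_range, variance, std_dev, skew)
```

A mean above the median together with a positive skewness value indicates a longer right tail, driven here by the two largest observations.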

Page 11: Excel Formulas for Measures

Measures of Location

Mean: =AVERAGE(number1, number2,…)
Percentile: =PERCENTILE(array, k)
Quartile: =QUARTILE(array, quart)
Median: =MEDIAN(number1, number2,…)
Skewness: =SKEW(number1, number2,…)
Kurtosis: =KURT(number1, number2,…)

Measures of Dispersion

Standard Deviation: =STDEV.S(number1, number2,…)
Variance: =VAR.S(number1, number2,…)
Range: =MAX(number1, number2,…)-MIN(number1, number2,…)
Interquartile Range: =QUARTILE(array,3)-QUARTILE(array,1)

Page 12: Summary Statistics

Definition of Summary Statistics: Provide a summary of data on a numerical variable.
Sample Insight: Of nearly 181,000 male employees, 95.3% are white; this group has the fewest years of schooling yet the highest potential work experience.
Wage Statistics: Average hourly wages in January 2018 prices are:
- Whites: £19.5
- Minority Natives: £18.7
- Minority Immigrants: £18.3

Page 13: Tabular and Graphical Methods for Summarizing Data

Data Types

Qualitative Data
Quantitative Data

Tabular Methods

Frequency Distribution
Relative Frequency Distribution
Percentage Frequency Distribution
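
The three tabular methods differ only in how the counts are scaled. A minimal sketch, using a made-up list of job sectors (the data and variable names are hypothetical, not taken from the lecture):

```python
from collections import Counter

# Hypothetical qualitative data: sector of employment for ten respondents
responses = ["Retail", "Finance", "Retail", "Health", "Finance",
             "Retail", "Education", "Health", "Retail", "Finance"]

counts = Counter(responses)
total = sum(counts.values())

# One row per category: frequency, relative frequency, percentage frequency
table = {
    category: {
        "frequency": freq,
        "relative": freq / total,
        "percent": 100 * freq / total,
    }
    for category, freq in counts.most_common()
}

for category, row in table.items():
    print(f"{category:<10} {row['frequency']:>3} {row['relative']:.2f} {row['percent']:.0f}%")
```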

Graphical Methods

Bar Chart
Pie Chart
Histogram
Scatter Diagram
Ogive

Page 14: Scatter Plot

Definition: A two-dimensional visualization that uses dots to represent values of two different variables.
Purpose: Shows the relationship between two variables.

Page 15: Line Graph

Definition: A graph for visualizing values over time, using a horizontal axis (x-axis) and a vertical axis (y-axis).
Axes: x-axis for time intervals and y-axis for corresponding values (e.g., revenue).
Data Representation: Data points connected in a "dot-to-dot" fashion.

Page 16: Presentation Example

Context: Create a figure depicting life expectancy trends of white and Black males over time.

Page 17: Life Expectancy Presentation Data

Observations by Year

Data for White Males:
Data for Black Males:
Life expectancy trends displayed graphically.
Notable dip in 1918 attributed to the influenza pandemic.

Page 18: Wage Presentation

Sector Analysis: Overview of average hourly wage across various sectors, illustrated graphically.

Page 19: Average Hourly Wage Presentation

Detailed distribution of hourly wages for women aged 34 to 46 across various sectors, correlating with job types.

Page 20: Research Question

Inquiry: Does attending university lead to higher earnings?

Page 21: Earnings by Graduation Cohort

Data Analysis: Real earnings tracked over time after graduation for female and male cohorts.
Cohorts Analyzed: 2008, 2009, 2010, 2011, 2012.

Page 22: Earnings by Education Level

Graphical Analysis: Real earnings distinguished by GCSE results and higher education attendance.

Page 23: Mean Earnings Post-Graduation

Summary of Findings: Mean earnings corresponding to education levels 5 years post-graduation.

Page 24: Earnings by Subject Studied

Context: Average earnings by subject studied, accounting for dropouts.

Page 25: Ice Cream Consumption Case Study

Scenario: Matt's ice-cream sales over 30 days collected to assess effects of temperature on sales.

Page 26: Univariate Description

Visual graphs showing frequency of ice cream consumption against temperature.

Page 27: Bivariate Descriptive Statistics

Summary statistics for ice cream consumption and associated temperatures, detailing means, standard errors, medians, modes, etc.

Page 28: Bivariate Description: Scatter Plot

Plot showing relationship between temperature and ice cream consumption, indicating a strong positive correlation.

Page 29: Covariance Analysis

Discussion on the interpretation of data points' quadrants in a covariance context, evaluating positive and negative relationships.

Page 30: Covariance Formula

Statistical formulas for calculating the covariance between two variables with explanations regarding population and sample covariance.
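
For reference, the standard textbook forms of the two formulas are given below. The notation is an assumption, since the slide's own symbols are not reproduced in these notes: 𝑁 and 𝑛 are the population and sample sizes, μ denotes a population mean, and a bar denotes a sample mean.

```latex
\[
\sigma_{xy} = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)
\qquad \text{(population covariance)}
\]
\[
s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
\qquad \text{(sample covariance)}
\]
```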

Page 31: Covariance Calculation for Ice Cream Sales

Detailed calculations presented from the collected data on ice-cream sales and temperature.
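
The lecture's 30-day data set is not reproduced in these notes, so the sketch below uses five made-up observations purely to show the mechanics of the calculation. All values and names are hypothetical.

```python
# Hypothetical data: daily temperature (°C) and ice-cream sales (units)
temperature = [15, 18, 21, 24, 27]
sales = [20, 26, 30, 38, 41]

n = len(temperature)
mean_t = sum(temperature) / n
mean_s = sum(sales) / n

# Sum of products of deviations from the two means
cross_products = sum((t - mean_t) * (s - mean_s)
                     for t, s in zip(temperature, sales))

sample_cov = cross_products / (n - 1)   # divide by n - 1 for a sample
population_cov = cross_products / n     # divide by n for a population
print(sample_cov, population_cov)
```

A positive result means that days with above-average temperature tend to have above-average sales; the magnitude depends on the units of both variables.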

Page 32: Covariance Properties

Key properties of covariance explaining relationships: positive, negative, or no relationship.

Page 33: Correlation Definition

Coefficient of Correlation (r): Normalized index indicating the strength and direction of a linear relationship between two variables.
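
In its usual sample form, the normalization divides the covariance by the product of the two standard deviations. The symbols below are assumed notation: s_{xy} is the sample covariance, and s_x and s_y are the sample standard deviations.

```latex
\[
r_{xy} = \frac{s_{xy}}{s_x \, s_y}, \qquad -1 \le r_{xy} \le 1
\]
```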

Page 34: Correlation Calculation for Ice Cream Sales

Data-driven analysis of correlation using previously mentioned dataset, illustrating calculation steps and results.
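
Since the lecture data set is not reproduced here, the following sketch uses five made-up observations to show the calculation steps. The function name `correlation` and all values are hypothetical. It also checks that, unlike covariance, the coefficient does not change when a variable is converted to different units.

```python
import math

def correlation(x, y):
    """Sample correlation: covariance divided by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (sd_x * sd_y)

# Hypothetical data: daily temperature (°C) and ice-cream sales (units)
temperature_c = [15, 18, 21, 24, 27]
sales = [20, 26, 30, 38, 41]

r = correlation(temperature_c, sales)

# Converting to Fahrenheit rescales the covariance but leaves r unchanged
temperature_f = [c * 9 / 5 + 32 for c in temperature_c]
r_fahrenheit = correlation(temperature_f, sales)
print(round(r, 4), round(r_fahrenheit, 4))
```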

Page 35: Correlation Properties

An examination of correlation values denoting their interpretations about linear relationships between variables.

Page 36: Correlation Examples

Graphical examples demonstrating varying correlation strengths and their interpretations from scattered data plots.

Page 37: Causal Relationships

Causal inference asks whether a change in one variable brings about a change in another, rather than how strongly the two are associated; the slide emphasizes how difficult causation is to establish.

Page 38: Rubin Causal Model

Explanation of the model where treatment variables and potential outcomes are linked to a causal effect analysis based on individual observations.
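
In the potential-outcomes notation of the recommended reading (assumed here, since the slide's own symbols are not reproduced), D_i is the treatment indicator and Y_i^1 and Y_i^0 are individual i's outcomes with and without treatment. The observed outcome and the individual causal effect can then be written as:

```latex
\[
Y_i = D_i \, Y_i^1 + (1 - D_i) \, Y_i^0,
\qquad
\delta_i = Y_i^1 - Y_i^0
\]
```

Only one of the two potential outcomes is ever observed for a given individual, so δ_i cannot be computed directly.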

Page 39: Average Treatment Effect (ATE)

Importance of understanding average effects across populations through conditional expectations, focusing on wage premiums related to educational attainment.
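
Written out under assumed potential-outcomes notation (Y_i^1 is the wage with the qualification, Y_i^0 the wage without it), the average treatment effect is the expected individual effect across the population:

```latex
\[
\text{ATE} = E[\delta_i] = E[Y_i^1 - Y_i^0] = E[Y_i^1] - E[Y_i^0]
\]
```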

Page 40: Challenges in Causal Inference

Discussion of the limitations of observational data and the missing-data problem in estimating causal effects: only one potential outcome is ever observed for each individual.

Page 41: Wage Differential Analysis

Key Comparisons: College vs. Non-College wage differentials with focus on selection bias and how this affects causal interpretations.
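
The role of selection bias can be made explicit with the standard decomposition of the observed wage gap. The notation is assumed: D_i = 1 for college attendance, and Y_i^1 and Y_i^0 are wages with and without college.

```latex
\[
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]
= \underbrace{E[Y_i^1 - Y_i^0 \mid D_i = 1]}_{\text{average effect on the treated}}
+ \underbrace{E[Y_i^0 \mid D_i = 1] - E[Y_i^0 \mid D_i = 0]}_{\text{selection bias}}
\]
```

The second term is non-zero whenever those who attend college would have earned different wages from non-attendees even without attending.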

Page 42: Random Assignment in Causal Studies

Definition and importance of random assignment in minimizing selection bias within experimental causal inference.
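
A small simulation can illustrate why random assignment matters. Everything in this sketch is hypothetical: the wage equation, the true effect of 3, and the "ability" variable are invented for illustration and are not taken from the lecture.

```python
import random

rng = random.Random(0)    # fixed seed so the sketch is reproducible
TRUE_EFFECT = 3.0         # hypothetical wage premium from treatment
N = 20_000                # simulated individuals per scenario

def mean(values):
    return sum(values) / len(values)

def naive_difference(assign):
    """Treated-minus-untreated difference in mean observed outcomes."""
    treated, untreated = [], []
    for _ in range(N):
        ability = rng.gauss(0, 1)
        y0 = 10 + 2 * ability + rng.gauss(0, 1)   # outcome without treatment
        y1 = y0 + TRUE_EFFECT                     # outcome with treatment
        # Only one potential outcome is observed per individual
        if assign(ability):
            treated.append(y1)
        else:
            untreated.append(y0)
    return mean(treated) - mean(untreated)

# Self-selection: high-ability individuals choose treatment
self_selected = naive_difference(lambda ability: ability > 0)

# Random assignment: a coin flip, unrelated to ability
randomized = naive_difference(lambda ability: rng.random() < 0.5)

print(f"self-selected: {self_selected:.2f}, randomized: {randomized:.2f}")
```

Under self-selection the naive comparison overstates the effect because the treated group would have earned more anyway; under random assignment the two groups are comparable on average, so the difference in means lands close to the true effect.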

Page 43: Randomized Controlled Trials (RCT)

Overview of RCT as an essential method for determining causal relationships, including its design and execution fundamentals.

Page 44: Boxplot Analysis of Returns by Subject

Visualization and discussion of earnings based on academic institutions.

Page 45: Distribution of UCAS Points by University

Comparative analysis of UCAS tariff across various recognized UK universities.

Page 46: Subject Coefficients by A-Level Impact

Assessment of adjusted coefficients corresponding to subject intake based on mathematical A-levels.

Page 47: Adjusted Coefficients Summary

Detailed examination of subject-wise coefficients among various UK universities, with statistical significance metrics.

Page 48: Statistical Summary of Gender-Stratified Earnings

Data presenting overall impacts of higher education on earnings, segmenting by gender and number of individuals observed.