Lecture-3 Visualisation and Presentation of Data 061124
Page 1: Lecture Information
Lecture Title: Visualization and Presentation of Data (Continued)
Lecturer: Dr. Lei Xu
Office Location: BE.203 in Sir Richard Morris
Feedback and Consultation Hours:
Tuesday 16:00 - 17:00
Wednesday 09:30 - 10:30
Email: L.Xu2@lboro.ac.uk
Page 2: Learning Objectives and Readings
Learning Objectives
Measure of Dispersion (Variability)
Presentation and Descriptive Analysis
Correlation
Causal Analysis
Recommended Readings
Koop, G. (2013). Analysis of Economic Data. John Wiley & Sons. Chapter 3.
Cunningham, S. (2021). Causal Inference: The Mixtape. Yale University Press.
Sections: 'Introduction', 'Directed Acyclic Graphs', and 'Potential Outcomes Causal Model'.
Anderson, D. R., Williams, T. A., & Cochran, J. J. (2020). Statistics for Business & Economics. Cengage Learning. Chapters 2-3.
Page 3: Central Tendency and Measures of Dispersion
Measure of Location (Central Tendency)
Mean
Median
Mode
Measure of Dispersion (Variability)
Range and Percentile
Quartiles and Interquartile Range (IQR)
Mean Deviation
Variance
Standard Deviation
Page 4: Range and Percentile
Range
Definition: The simplest measure of dispersion, calculated as the difference between the maximum and minimum values in a data set.
Formula: Range = Maximum value - Minimum value
Example 1
Data Set: 1000, 1050, 3000, 2500, 1780, 2210, 2540, 1980, 3650, 4970, 5000, 8500, 7010
Solution: Range = 8500 - 1000 = 7500
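A minimal Python sketch of the range calculation applied to the Example 1 data; `data_range` is a hypothetical helper name, not a library function.

```python
# Example 1 data from the lecture
data = [1000, 1050, 3000, 2500, 1780, 2210, 2540, 1980, 3650, 4970, 5000, 8500, 7010]

def data_range(values):
    """Range = maximum value - minimum value (hypothetical helper)."""
    return max(values) - min(values)

print(data_range(data))  # 7500
```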
Percentile
Definition: A value such that at least 𝑝 percent of observations are less than or equal to this value and at least (100 - 𝑝) percent are greater than or equal to it.
Calculation Steps for Percentile
Arrange data in ascending order.
Compute an index: 𝑖 = 𝑝/100 * 𝑛, where 𝑛 is the number of observations.
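If 𝑖 is not an integer, round up: the next integer greater than 𝑖 gives the position of the 𝑝th percentile. If 𝑖 is an integer, the 𝑝th percentile is the average of the values in positions 𝑖 and 𝑖 + 1.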
Page 5: Exercise on Percentile Calculation
Goal: Find the 75th percentile of the data set: 1000, 1050, 3000, 2500, 1780, 2210, 2540, 1980, 3650, 4970, 5000, 8500, 7010.
Solution Steps
Arrange Data: 1000, 1050, 1780, 1980, 2210, 2500, 2540, 3000, 3650, 4970, 5000, 7010, 8500
Calculate Index: 𝑖 = 75/100 * 13 = 9.75.
Determine Position: 9.75 is not an integer, so round up to position 10.
75th Percentile: 4970 (10th position).
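The percentile steps and the exercise above can be reproduced with a short Python sketch. It follows the index rule (round up when 𝑖 is not an integer; average positions 𝑖 and 𝑖 + 1 when it is). The function name `percentile_textbook` is hypothetical, and the sketch assumes 0 < 𝑝 < 100.

```python
import math

# Example 1 data from the lecture
data = [1000, 1050, 3000, 2500, 1780, 2210, 2540, 1980, 3650, 4970, 5000, 8500, 7010]

def percentile_textbook(values, p):
    """p-th percentile by the index method i = (p / 100) * n (hypothetical helper).

    Positions are 1-based, as in the lecture. Assumes 0 < p < 100.
    """
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    ordered = sorted(values)              # Step 1: arrange in ascending order
    n = len(ordered)
    i = p * n / 100                       # Step 2: compute the index
    if i.is_integer():                    # Step 3a: integer index
        k = int(i)
        return (ordered[k - 1] + ordered[k]) / 2   # average positions i and i + 1
    return ordered[math.ceil(i) - 1]      # Step 3b: round up to the next position

print(percentile_textbook(data, 75))  # 4970
```

Calling it with p = 50 returns 2540, which is also the median of the data. Excel's PERCENTILE.INC interpolates between positions instead of rounding up, so the two approaches can disagree on other data sets.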
Note
The median represents the 50th percentile.
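Excel Formula: =PERCENTILE.INC(array, k), where k is the percentile expressed as a fraction between 0 and 1 (e.g. 0.75). Excel interpolates between positions, so its result can differ from the hand method above.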
Page 6: Quartiles
Definition: Data is divided into four parts, each containing approximately 25% of observations.
Quartiles are Defined as:
Q1: First quartile (25th percentile)
Q2: Second quartile (50th percentile, median)
Q3: Third quartile (75th percentile)
Calculation: Same method as percentiles.
Excel Formula: =QUARTILE.INC(array, quart)
Page 7: Interquartile Range (IQR)
Definition: Difference between the third quartile (Q3) and the first quartile (Q1): IQR = Q3 - Q1.
Significance: Measures variability, focusing on the middle 50% of the data.
Illustration: See Figure 3.2.
Excel Formula: =QUARTILE(array,3) - QUARTILE(array,1), or subtract the cells that hold Q3 and Q1.
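In Python, the standard-library function `statistics.quantiles` (Python 3.8+) with `method="inclusive"` interpolates between ordered values in what we understand to be the same way as Excel's QUARTILE.INC. A minimal sketch for the Example 1 data:

```python
import statistics

# Example 1 data from the lecture
data = [1000, 1050, 3000, 2500, 1780, 2210, 2540, 1980, 3650, 4970, 5000, 8500, 7010]

# n=4 returns the three cut points Q1, Q2 (median) and Q3;
# the function sorts the data internally
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # spread of the middle 50% of the data

print(q1, q2, q3)  # 1980.0 2540.0 4970.0
print(iqr)         # 2990.0
```

For this data set the interpolation positions fall exactly on observations, so the results coincide with the round-up hand method; on other data they may differ slightly.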
Page 8: Box Plot
Overview: A box plot (or box-and-whisker plot) is used in descriptive statistics for data visualization.
Purpose: Shows the distribution and skewness of numerical data through its quartiles; some versions also mark the mean.
Five-Number Summary Components:
Minimum value
First quartile (Q1)
Median (Q2)
Third quartile (Q3)
Maximum value
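A box plot is drawn from the five-number summary. The sketch below computes it in Python. `five_number_summary` is a hypothetical helper, and its quartiles come from `statistics.quantiles(..., method="inclusive")`, so other quartile conventions may give slightly different Q1 and Q3.

```python
import statistics

# Example 1 data from the lecture
data = [1000, 1050, 3000, 2500, 1780, 2210, 2540, 1980, 3650, 4970, 5000, 8500, 7010]

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) (hypothetical helper)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

print(five_number_summary(data))  # (1000, 1980.0, 2540.0, 4970.0, 8500)
```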
Page 9: Percentile Point Visualization
Source: ONS
Graphical Representation: Percentile points related to total income before tax with intervals.
Page 10: Central Tendency and Distributional Measures
Central Tendency
Definition: A single value representing the "center" or typical value of a dataset.
Common Measures:
Mean: Average of all data points.
Median: Middle value when data is ordered.
Mode: Most frequently occurring value.
Distributional Measures
Definition: Describe the spread or dispersion of the data across its values.
Common Measures:
Range: Difference between max and min values.
Variance: Average of squared differences from the mean (for a sample, the sum of squared differences is divided by 𝑛 - 1 rather than 𝑛).
Standard Deviation: Square root of variance.
Skewness: Measure of data distribution asymmetry.
Kurtosis: Measure of how heavy the distribution's tails are, often described as its peakedness or sharpness.
Purpose of Measures
Central Tendency: Identifies a typical value.
Distributional Measures: Describe the variability and shape of the dataset.
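To make the variance and standard deviation definitions concrete, the sketch below computes them from the squared deviations for the Example 1 data, showing both the population divisor 𝑛 and the sample divisor 𝑛 - 1 (the one used by Excel's VAR.S and STDEV.S). The standard-library `statistics` functions serve only as a cross-check. The mode is omitted because every value in this data set occurs once.

```python
import math
import statistics

# Example 1 data from the lecture
data = [1000, 1050, 3000, 2500, 1780, 2210, 2540, 1980, 3650, 4970, 5000, 8500, 7010]
n = len(data)

mean = sum(data) / n                      # mean: average of all values
median = statistics.median(data)          # middle value of the ordered data

# Squared deviations from the mean
squared_devs = [(x - mean) ** 2 for x in data]
pop_variance = sum(squared_devs) / n           # population variance: divide by n
sample_variance = sum(squared_devs) / (n - 1)  # sample variance: divide by n - 1
sample_sd = math.sqrt(sample_variance)         # sample standard deviation

# Cross-check against the standard library
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(pop_variance, statistics.pvariance(data))

print(round(mean, 2), median)
print(round(sample_variance, 2), round(sample_sd, 2))
```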
Page 11: Excel Formulas for Measures
Measures of Location
Mean: =AVERAGE(number1, number2,…)
Percentile: =PERCENTILE(array, k)
Quartile: =QUARTILE(array, quart)
Median: =MEDIAN(number1, number2,…)
Measures of Shape
Skewness: =SKEW(number1, number2,…)
Kurtosis: =KURT(number1, number2,…)
Measures of Dispersion
Standard Deviation: =STDEV.S(number1, number2,…)
Variance: =VAR.S(number1, number2,…)
Range: =MAX(number1, number2,…)-MIN(number1, number2,…)
Interquartile Range: =QUARTILE(array,3)-QUARTILE(array,1)
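Excel's SKEW and KURT return sample-adjusted measures. Assuming the formulas given in Excel's documentation, they can be reproduced in Python as below; `excel_skew` and `excel_kurt` are hypothetical helper names. KURT reports excess kurtosis, so a normal distribution scores about 0.

```python
import statistics

def excel_skew(values):
    """Sample skewness, following the formula documented for Excel's SKEW (n >= 3)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

def excel_kurt(values):
    """Sample excess kurtosis, following the formula documented for Excel's KURT (n >= 4)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * sum(((x - mean) / s) ** 4 for x in values) - correction

print(excel_skew([1, 2, 3, 4, 5]))  # close to 0 for symmetric data
print(excel_kurt([1, 2, 3, 4, 5]))  # approximately -1.2
```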
Page 12: Summary Statistics
Definition of Summary Statistics: Values that summarize the data on a numerical variable.
Sample Insight: Of nearly 181,000 male employees in the sample, 95.3% are White; White employees have the fewest years of schooling but the highest potential work experience.
Wage Statistics: Average hourly wages in January 2018 prices are:
Whites: £19.5
Minority Natives: £18.7
Minority Immigrants: £18.3
Page 13: Tabular and Graphical Methods for Summarizing Data
Data Types
Qualitative Data
Quantitative Data
Tabular Methods
Frequency Distribution
Relative Frequency Distribution
Percentage Frequency Distribution
Graphical Methods
Bar Chart
Pie Chart
Histogram
Scatter Diagram
Ogive
Page 14: Scatter Plot
Definition: A two-dimensional visualization that uses dots to represent values of two different variables.
Purpose: Shows the relationship between two variables.
Page 15: Line Graph
Definition: A graph for visualizing values over time, using a horizontal axis (x-axis) and a vertical axis (y-axis).
Axes: x-axis for time intervals and y-axis for corresponding values (e.g., revenue).
Data Representation: Data points connected in a "dot-to-dot" fashion.
Page 16: Presentation Example
Context: Create a figure depicting life expectancy trends of White and Black males over time.
Page 17: Life Expectancy Presentation Data
Table: Life expectancy observations by year for White males and Black males.
Life expectancy trends displayed graphically.
Notable dip in 1918 attributed to the influenza pandemic.
Page 18: Wage Presentation
Sector Analysis: Overview of average hourly wage across various sectors, illustrated graphically.
Page 19: Average Hourly Wage Presentation
Distribution of hourly wages for women aged 34 to 46 across sectors, broken down by job type.
Page 20: Research Question
Inquiry: Does attending university lead to higher earnings?
Page 21: Earnings by Graduation Cohort
Data Analysis: Real earnings tracked over time after graduation for female and male cohorts.
Cohorts Analyzed: 2008, 2009, 2010, 2011, 2012.
Page 22: Earnings by Education Level
Graphical Analysis: Real earnings distinguished by GCSE results and higher education attendance.
Page 23: Mean Earnings Post-Graduation
Summary of Findings: Mean earnings by education level five years after graduation.
Page 24: Earnings by Subject Studied
Context: Average earnings by subject studied, taking account of dropouts.
Page 25: Ice Cream Consumption Case Study
Scenario: Matt collected his ice-cream sales over 30 days to assess the effect of temperature on sales.
Page 26: Univariate Description
Graphs showing the frequency distribution of each variable (ice cream consumption and temperature) on its own.
Page 27: Bivariate Descriptive Statistics
Summary statistics for ice cream consumption and associated temperatures, detailing means, standard errors, medians, modes, etc.
Page 28: Bivariate Description: Scatter Plot
Plot showing relationship between temperature and ice cream consumption, indicating a strong positive correlation.
Page 29: Covariance Analysis
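Interpretation of where data points fall in the four quadrants defined by the two means: points in the upper-right and lower-left quadrants (both values above, or both below, their means) contribute positively to the covariance, while points in the other two quadrants contribute negatively.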
Page 30: Covariance Formula
Statistical formulas for the covariance between two variables, explaining the difference between population and sample covariance.
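For reference, the usual textbook forms are shown below, with 𝑁 observations for a population and 𝑛 for a sample; the symbols are the conventional ones and may not match the slide exactly.

```latex
\sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}
\qquad \text{(population covariance)}

s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}
\qquad \text{(sample covariance)}
```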
Page 31: Covariance Calculation for Ice Cream Sales
Detailed calculations presented from the collected data on ice-cream sales and temperature.
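A minimal Python sketch of the sample covariance calculation. Matt's 30-day data set is not reproduced here, so the five observations below are hypothetical, chosen only to show the mechanics; `sample_covariance` is a hypothetical helper name. Each product of deviations is positive for points in the upper-right or lower-left quadrant around the means and negative for points in the other two.

```python
# Hypothetical illustrative data (NOT Matt's actual 30-day sample)
temperature = [20, 22, 25, 28, 30]   # degrees Celsius
sales = [100, 110, 130, 150, 160]    # ice creams sold

def sample_covariance(x, y):
    """s_xy = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1) (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Positive products: upper-right or lower-left quadrant relative to (x_bar, y_bar);
    # negative products: upper-left or lower-right quadrant.
    products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]
    return sum(products) / (n - 1)

print(sample_covariance(temperature, sales))  # 105.0
```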
Page 32: Covariance Properties
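Key properties of covariance: a positive value indicates a positive linear relationship, a negative value a negative linear relationship, and a value near zero no linear relationship. Its magnitude depends on the units of measurement, so it does not by itself show how strong the relationship is.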
Page 33: Correlation Definition
Coefficient of Correlation (r): Normalized index indicating the strength and direction of a linear relationship between two variables.
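In symbols, the sample correlation coefficient divides the sample covariance by the product of the two sample standard deviations (conventional notation, assumed rather than taken from the slide); the population version uses the population quantities.

```latex
r_{xy} = \frac{s_{xy}}{s_x \, s_y}, \qquad -1 \le r_{xy} \le 1

\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \, \sigma_y}
```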
Page 34: Correlation Calculation for Ice Cream Sales
Step-by-step calculation of the correlation coefficient for the ice-cream sales and temperature data, with the result.
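A minimal Python sketch of the correlation calculation. The lecture's 30-day data set is not reproduced here, so the five observations below are hypothetical, chosen only to show the steps; `pearson_r` is a hypothetical helper name. Because the 𝑛 - 1 divisors in the covariance and the two standard deviations cancel, the sketch works with plain sums of deviations.

```python
import math

# Hypothetical illustrative data (NOT the lecture's dataset)
temperature = [20, 22, 25, 28, 30]
sales = [100, 110, 130, 150, 160]

def pearson_r(x, y):
    """r = s_xy / (s_x * s_y); the (n - 1) divisors cancel (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(temperature, sales)
print(round(r, 3))  # 0.999
```

Unlike the covariance, r is unit-free, so rescaling temperature (say, to Fahrenheit) would leave it unchanged.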
Page 35: Correlation Properties
Interpretation of correlation values: r lies between -1 and +1; values near +1 or -1 indicate a strong positive or negative linear relationship, and values near 0 indicate little or no linear relationship.
Page 36: Correlation Examples
Graphical examples of scatter plots with different correlation strengths, and how to interpret them.
Page 37: Causal Relationships
Causal inference: moving from measuring how strongly variables are related to asking whether one causes the other, and why causation is difficult to establish.
Page 38: Rubin Causal Model
Explanation of the model: each individual has two potential outcomes, one with the treatment and one without; the treatment variable determines which one is observed, and the individual causal effect is the difference between the two.
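In the potential outcomes notation used by Cunningham (2021), with D_i = 1 if individual i is treated (for example, attends university) and D_i = 0 otherwise, the observed outcome and the individual causal effect can be written as follows (the slide's own symbols may differ):

```latex
Y_i = D_i \, Y_i^1 + (1 - D_i) \, Y_i^0

\delta_i = Y_i^1 - Y_i^0
```

Only one of Y_i^1 and Y_i^0 is ever observed for a given individual, which is why δ_i itself cannot be measured directly.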
Page 39: Average Treatment Effect (ATE)
Why we focus on average effects across a population, expressed through (conditional) expectations, with the wage premium from educational attainment as the example.
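Writing Y_i^1 and Y_i^0 for individual i's potential outcomes with and without treatment (notation as in Cunningham, 2021), the average treatment effect is the mean of the individual effects, which by linearity of expectation equals the difference in mean potential outcomes:

```latex
\text{ATE} = E[\delta_i] = E[Y_i^1 - Y_i^0] = E[Y_i^1] - E[Y_i^0]
```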
Page 40: Challenges in Causal Inference
Limitations of observational data and the missing data problem: for each individual only one potential outcome is observed, so individual causal effects cannot be estimated directly.
Page 41: Wage Differential Analysis
Key Comparisons: College vs. non-college wage differentials, focusing on selection bias and how it affects causal interpretation.
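A standard way to see the problem: with D_i = 1 for college graduates and Y_i^1, Y_i^0 the potential wages with and without college, the raw wage gap can be split by adding and subtracting E[Y_i^0 | D_i = 1], the (unobservable) average wage graduates would have earned without college:

```latex
\begin{aligned}
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]
  &= E[Y_i^1 \mid D_i = 1] - E[Y_i^0 \mid D_i = 0] \\
  &= \underbrace{E[Y_i^1 - Y_i^0 \mid D_i = 1]}_{\text{effect on the treated}}
   + \underbrace{E[Y_i^0 \mid D_i = 1] - E[Y_i^0 \mid D_i = 0]}_{\text{selection bias}}
\end{aligned}
```

If graduates would have earned more than non-graduates even without college, the selection-bias term is positive and the raw gap overstates the causal effect. When D_i is assigned independently of the potential outcomes, the selection-bias term is zero.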
Page 42: Random Assignment in Causal Studies
Definition and importance of random assignment in minimizing selection bias within experimental causal inference.
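A small simulation can illustrate why random assignment matters. Everything in it is hypothetical: wages without college (Y0) are drawn at random, college adds a constant £2 per hour, and in the "self-selection" scenario only people with above-average Y0 attend. The naive difference in mean wages is then compared under self-selection and under random (coin-flip) assignment.

```python
import random
import statistics

random.seed(42)
N = 100_000
TRUE_EFFECT = 2.0  # hypothetical constant wage premium (pounds per hour)

# Hypothetical potential outcomes: y0 = wage without college, y1 = wage with college
y0 = [random.gauss(15.0, 3.0) for _ in range(N)]
y1 = [w + TRUE_EFFECT for w in y0]
ate = statistics.fmean(a - b for a, b in zip(y1, y0))  # knowable only in a simulation

def diff_in_means(d):
    """Naive comparison: mean observed wage of treated minus mean of untreated."""
    treated = [y1[i] for i in range(N) if d[i] == 1]
    control = [y0[i] for i in range(N) if d[i] == 0]
    return statistics.fmean(treated) - statistics.fmean(control)

# Self-selection: people with high y0 (e.g. higher ability) choose college
d_selected = [1 if y0[i] > 15.0 else 0 for i in range(N)]
# Random assignment: a coin flip, independent of the potential outcomes
d_random = [random.randint(0, 1) for _ in range(N)]

print(round(ate, 2))                        # 2.0
print(round(diff_in_means(d_selected), 2))  # well above 2.0: selection bias
print(round(diff_in_means(d_random), 2))    # close to 2.0
```

Under self-selection the naive gap mixes the true effect with the pre-existing wage advantage of those who chose college; under random assignment that advantage averages out, so the gap recovers the true effect up to sampling noise.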
Page 43: Randomized Controlled Trials (RCTs)
Overview of RCTs as an essential method for establishing causal relationships, including the fundamentals of their design and execution.
Page 44: Boxplot Analysis of Returns by Subject
Visualization and discussion of earnings based on academic institutions.
Page 45: Distribution of UCAS Points by University
Comparative analysis of UCAS tariff across various recognized UK universities.
Page 46: Subject Coefficients by A-Level Impact
Assessment of adjusted coefficients corresponding to subject intake based on mathematical A-levels.
Page 47: Adjusted Coefficients Summary
Detailed examination of subject-wise coefficients among various UK universities, with statistical significance metrics.
Page 48: Statistical Summary of Gender-Stratified Earnings
Overall impact of higher education on earnings, by gender, with the number of individuals observed.