DS421 MOD2 EX 1 - Anscombe's Data
Module 3: Introduction to Tableau
Overview of Module Content
Objective: Connect to data and set variable types in Tableau.
Key Features: Utilize the Tableau user interface for visual analytics, create charts, dashboards, and stories.
Reading Assignment:
Chapter 1 from "Learning Tableau 2022" by Joshua Milligan.
The book includes all datasets, completed workbooks, and detailed instructions.
Midterm Preparation:
Create a one-page cheat sheet for the midterm exam (June 30).
Cheat sheet can be typed or handwritten and must be submitted with the exam.
Recitation Quiz Focus: Basic Tableau and related energy constants.
Example Analysis: Anscombe's Quartet
Significance of Visualization
Anscombe's quartet is a famous dataset from 1973 illustrating the importance of visualizing data:
Contains four pairs of XY values with identical summary statistics.
Visualization shows that simple linear regression is appropriate for only one pair.
Highlights the inadequacy of relying solely on statistical calculations.
Details of the Dataset
Properties of each pair:
11 observations per pair.
Summary statistics (means, variances, correlation) agree to at least one decimal place.
The fitted regression line and the coefficient of determination ($R^2$) are nearly identical across the pairs.
The coefficient of determination ($R^2$) is the proportion of the variation in Y that is explained by the regression line on X.
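These properties can be checked outside Tableau. The following is a minimal Python sketch, assuming the standard published values of Anscombe's quartet (the course's Excel file is assumed to hold the same numbers). The names `QUARTET` and `fit_line` are illustrative and not part of the course material.

```python
from statistics import mean

# Standard published values of Anscombe's quartet
# (assumed to match the course's Excel file).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)  # explained variation / total variation
    return slope, intercept, r_squared

for pair, (xs, ys) in QUARTET.items():
    slope, intercept, r2 = fit_line(xs, ys)
    print(f"Pair {pair}: n={len(xs)}, mean x={mean(xs):.1f}, mean y={mean(ys):.1f}, "
          f"slope={slope:.1f}, intercept={intercept:.1f}, R^2={r2:.2f}")
```

All four lines of output show the same rounded values, which is exactly why the numbers alone cannot tell the pairs apart.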
Connecting to the Data
Data format: Excel, titled "Anscombe Data Analysis Ready."
Steps to Connect in Tableau:
Open Tableau and navigate to the data source folder.
Connect to the Excel file.
Tableau shows a summary of the data on the left and a spreadsheet-style preview on the right.
Note on Live Connection:
A live connection reflects updates to the original data source as they occur.
When working with large data, turn off the live connection (for example, by working from an extract) to save RAM.
Adjusting Variable Types
Confirm that each data field is correctly identified as numeric or categorical.
The "Pair" variable is a label, not a quantity, so it should be treated as categorical and moved from Measures to Dimensions.
In Tableau, dimensions generally hold categorical data, and measures hold numeric fields that can be aggregated.
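As a rough analogy outside Tableau (a sketch, not a description of Tableau's internals): a measure is a field you aggregate, and a dimension is a field you group by. The row layout and variable names below are hypothetical; the four rows are taken from the standard published quartet values.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format rows: (pair, x, y). Only a few rows are shown.
rows = [
    (1, 10, 8.04), (1, 8, 6.95),
    (2, 10, 9.14), (2, 8, 8.14),
]

# "Pair" treated as a measure: the labels get summed, which means nothing.
pair_as_measure = sum(pair for pair, _, _ in rows)

# "Pair" treated as a dimension: it partitions the rows, and the real
# measure (here y) is aggregated within each group.
y_by_pair = defaultdict(list)
for pair, _, y in rows:
    y_by_pair[pair].append(y)
avg_y_by_pair = {pair: mean(ys) for pair, ys in y_by_pair.items()}

print(pair_as_measure)  # a number, but not a meaningful one
print(avg_y_by_pair)    # one average per pair
```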
Creating Visualizations
Objective: Create scatter plots to examine relationships.
Steps to Create Scatter Plots:
Use the "Pages" shelf to create separate plots for each pair.
Place the dependent variable Y on the Rows shelf and the independent variable X on the Columns shelf.
Disaggregate the measures so that each observation is plotted as its own mark, then compare the plots pair by pair.
Observations from Visualization:
Pair 1: Linear relationship (appropriate for simple linear regression).
Pair 2: Non-linear (quadratic) relationship.
Pair 3: Strong linear association, but the fitted line is pulled off by a single outlier.
Pair 4: X is constant except for one extreme point, which alone determines the fitted line; regression is not appropriate.
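Two of these observations can also be confirmed numerically. The sketch below assumes the standard published values for pairs 2 and 4; the variable names are illustrative. It relies on the fact that the x values of pair 2 are the consecutive integers 4 through 14, so a quadratic relationship appears as nearly constant second differences in y.

```python
# Pairs 2 and 4 from the standard published quartet
# (assumed to match the course's Excel file).
x2 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

# Pair 2: sort by x, then take differences of y twice. For equally spaced x,
# a straight line gives second differences near 0; a parabola gives a
# nonzero constant.
y_sorted = [y for _, y in sorted(zip(x2, y2))]
first_diff = [b - a for a, b in zip(y_sorted, y_sorted[1:])]
second_diff = [round(b - a, 2) for a, b in zip(first_diff, first_diff[1:])]
print(second_diff)

# Pair 4: count how many observations share each x value.
x4_counts = {x: x4.count(x) for x in set(x4)}
print(x4_counts)
```

The second differences for pair 2 all sit close to the same negative value, consistent with a downward-opening parabola, and the counts for pair 4 show that only one observation has a different x.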
Statistical Analysis and Model Fitting
Understanding Statistical Outputs
Aggregates available in Tableau for X and Y values:
Sum, average, standard deviation, etc.
Differences in Model Fit:
Pair 3 shows the impact of an outlier: a single point changes the fitted regression line substantially.
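The size of that effect can be estimated by refitting pair 3 with and without the suspect point. This is a minimal sketch, assuming the standard published values for pair 3; `fit_line` is an illustrative helper, and dropping the point with the largest residual is one simple choice, not a general rule for handling outliers.

```python
from statistics import mean

# Pair 3 from the standard published quartet (assumed to match the course file).
x3 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]

def fit_line(xs, ys):
    """Least-squares fit; returns (slope, intercept, r_squared)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy ** 2 / (sxx * syy)

with_outlier = fit_line(x3, y3)

# Find the observation with the largest absolute residual from the full fit.
slope, intercept, _ = with_outlier
residuals = [abs(y - (intercept + slope * x)) for x, y in zip(x3, y3)]
worst = residuals.index(max(residuals))

# Refit without that observation.
x_trim = x3[:worst] + x3[worst + 1:]
y_trim = y3[:worst] + y3[worst + 1:]
without_outlier = fit_line(x_trim, y_trim)

print("dropped point:", (x3[worst], y3[worst]))
print("with outlier:    slope=%.3f intercept=%.3f R^2=%.3f" % with_outlier)
print("without outlier: slope=%.3f intercept=%.3f R^2=%.3f" % without_outlier)
```

Without the outlier the remaining ten points fall almost exactly on a line with a noticeably flatter slope, so the single point accounts for most of the lack of fit.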
Outlier Influence
Definitions:
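Outlier: A point that lies far from the overall pattern of the other points and can distort regression estimates.
Influential Point: A point whose removal would noticeably change the fitted line; points that are extreme in the direction of the explanatory variable are often influential.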
Managing Outliers:
Datasets often lack row identifiers; best practice is to include a unique ID for each observation, when available, so that the data can be disaggregated.
If the data is static (with a limited number of entries), the recommendation is to disaggregate to the level of the individual observation.
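To see why a unique ID matters, consider two observations that share the same values. The sketch below is a plain-Python analogy with made-up rows (not course data and not Tableau's internals): without an identifier the duplicates collapse into one entry, while a row ID keeps one entry per observation.

```python
# Hypothetical rows (group, x, y); the first two are deliberate duplicates.
rows = [("A", 8, 6.5), ("A", 8, 6.5), ("A", 10, 7.2)]

# Without an identifier, identical rows are indistinguishable.
marks_without_id = set(rows)

# Adding a running row ID makes every observation unique.
rows_with_id = [(row_id, *row) for row_id, row in enumerate(rows, start=1)]
marks_with_id = set(rows_with_id)

print(len(marks_without_id), len(marks_with_id))
```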
Saving Work in Tableau
Importance of frequent saving.
Save formats in Tableau:
Default: Tableau workbook (.twb), which saves the visualizations but not the underlying data.
Packaged workbook (.twbx), which saves the data together with the visualizations and is simpler for sharing.
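A packaged workbook is a ZIP archive that bundles the .twb file with its data files, so its contents can be inspected with standard tools. The sketch below uses Python's `zipfile` module; the function name and the file name in the usage comment are hypothetical.

```python
import zipfile

def list_packaged_contents(twbx_path):
    """Return the names of the files bundled inside a .twbx packaged workbook."""
    with zipfile.ZipFile(twbx_path) as archive:
        return archive.namelist()

# Usage (hypothetical file name):
# print(list_packaged_contents("anscombe.twbx"))
```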
Visualization Techniques
Tips for Better Visuals:
Remove grid lines for clarity.
Annotate visuals for context and conclusions.
Use captions to enhance accessibility (e.g., for screen readers).
Creating Dashboards
Organize elements on dashboards:
Use horizontal/vertical containers to drag and arrange sheets for visual coherence.
Adding titles and descriptions enhances user understanding.
Creating Stories
Importance of Storytelling with Data
A story works like a presentation, walking the audience through the visualized data step by step.
Captions summarize findings or describe insights observed.
Stories can navigate through multiple sheets, allowing comparisons and conclusions to be made intuitively.