DS421 MOD2 EX 1 - Anscombe's Data

Module 3: Introduction to Tableau

Overview of Module Content

  • Objective: Connect to data and set variable types in Tableau.

  • Key Features: Utilize the Tableau user interface for visual analytics, create charts, dashboards, and stories.

  • Reading Assignment:

    • Chapter 1 from "Learning Tableau 2022" by Joshua Milligan.

    • The book includes all datasets, completed workbooks, and detailed instructions.

  • Midterm Preparation:

    • Create a one-page cheat sheet for the midterm exam (June 30).

    • Cheat sheet can be typed or handwritten and must be submitted with the exam.

  • Recitation Quiz Focus: Basic Tableau and related energy constants.

Example Analysis: Anscombe's Quartet

Significance of Visualization
  • Anscombe's quartet is a famous dataset from 1973 illustrating the importance of visualizing data:

    • Contains four pairs of XY values with identical summary statistics.

    • Visualization shows that simple linear regression is appropriate for only one pair.

    • Highlights the inadequacy of relying solely on statistical calculations.

Details of the Dataset
  • Statistics for each pair:

    • 11 observations per pair.

    • Summary statistics (means and variances of X and Y, correlation) are identical to at least one decimal place.

    • Fitted regression line and coefficient of determination ($R^2$) identical across the pairs.

    • The coefficient of determination ($R^2$) is the proportion of the variation in Y that is explained by the regression line on X.
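
The claim that the four pairs share the same summary statistics and fitted line can be checked outside Tableau. The following is a minimal sketch in plain Python, assuming the standard published values of the quartet; the variable names (quartet, fit_line, summary) are illustrative and the layout of the course Excel file may differ.

```python
# Standard published values of Anscombe's quartet (Anscombe, 1973).
# Pairs 1-3 share the same X values; pair 4 has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    1: (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 equals the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Collect (mean of X, mean of Y, slope, intercept, R^2) for each pair.
summary = {}
for pair, (x, y) in quartet.items():
    slope, intercept, r2 = fit_line(x, y)
    summary[pair] = (sum(x) / len(x), sum(y) / len(y), slope, intercept, r2)
    print(pair, [round(value, 2) for value in summary[pair]])
```

The printed rows should agree closely across the four pairs, which is exactly why the numbers alone cannot distinguish them.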

Connecting to the Data
  • Data format: Excel, titled "Anscombe Data Analysis Ready."

  • Steps to Connect in Tableau:

    • Open Tableau, navigate to data source folder.

    • Connect to the Excel file.

    • Tableau shows a summary of the data source on the left and a spreadsheet-style preview of the data on the right.

  • Note on Live Connection:

    • A live connection reflects updates to the original data source as they occur.

    • When working with large data, switch from the live connection (for example, to an extract) to reduce memory use.

Adjusting Variable Types
  • Confirm data fields are correctly identified as numeric or categorical.

    • "Pair" variable should be treated as categorical, moved from measures to dimensions.

  • In Tableau, dimensions hold categorical data used to group or split the view, and measures are numeric fields that can be aggregated.

Creating Visualizations
  • Objective: Create scatter plots to examine relationships.

  • Steps to Create Scatter Plots:

    • Use the "Pages" shelf to create separate plots for each pair.

    • Place dependent variable Y on the rows shelf, independent variable X on the columns shelf.

    • Disaggregate data by pair to visualize differences effectively.

  • Observations from Visualization:

    • Pair 1: Linear relationship (appropriate for simple linear regression).

    • Pair 2: Non-linear (quadratic relationship).

    • Pair 3: Strong linear association, but affected by an outlier.

    • Pair 4: X is constant except for a single extreme point, which alone determines the fitted line; simple linear regression is not appropriate.
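
The disaggregation step above matters because Tableau aggregates measures by default (for example, SUM(X) against SUM(Y)), which collapses each pair to a single mark. The sketch below imitates the two views in plain Python. It assumes a long (pair, x, y) layout for the workbook and uses only the first three rows of pairs 1 and 4 from the standard Anscombe values; the names rows, aggregated, and disaggregated are illustrative.

```python
# First three rows of pairs 1 and 4 from the standard Anscombe values,
# in an assumed long layout: (pair, x, y).
rows = [
    (1, 10, 8.04), (1, 8, 6.95), (1, 13, 7.58),
    (4, 8, 6.58), (4, 8, 5.76), (4, 8, 7.71),
]

# Aggregated view: one mark per pair, similar to plotting SUM(X) vs SUM(Y).
aggregated = {}
for pair, x, y in rows:
    sum_x, sum_y = aggregated.get(pair, (0, 0.0))
    aggregated[pair] = (sum_x + x, sum_y + y)

# Disaggregated view: one mark per row, which is what a scatter plot needs.
disaggregated = {}
for pair, x, y in rows:
    disaggregated.setdefault(pair, []).append((x, y))

for pair in sorted(aggregated):
    print("pair", pair, "aggregated mark:", aggregated[pair])
    print("pair", pair, "individual marks:", disaggregated[pair])
```

In Tableau itself, the equivalent switch is clearing Aggregate Measures in the Analysis menu; the shape of each pair only becomes visible in the disaggregated view.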

Statistical Analysis and Model Fitting

Understanding Statistical Outputs
  • Aggregates available in Tableau for X and Y values:

    • Sum, average, standard deviation, etc.

  • Differences in Model Fit:

    • Pair 3 shows the impact of an outlier: a single point pulls the regression line noticeably away from the trend of the other ten observations.
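
To make the effect in Pair 3 concrete, the following sketch refits the line with and without the point that has the largest residual. It is plain Python using the standard published values for pair 3; dropping the point is done here only to illustrate its influence, not as a recommended way of handling outliers, and the helper name fit_line is illustrative.

```python
# Standard published values for pair 3 of Anscombe's quartet.
x3 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def fit_line(x, y):
    """Least-squares fit of y on x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Fit on all 11 observations.
slope_all, intercept_all = fit_line(x3, y3)

# Locate the observation with the largest absolute residual from that fit.
residuals = [yi - (intercept_all + slope_all * xi) for xi, yi in zip(x3, y3)]
outlier_index = max(range(len(x3)), key=lambda i: abs(residuals[i]))

# Refit without that single observation.
x_trimmed = [xi for i, xi in enumerate(x3) if i != outlier_index]
y_trimmed = [yi for i, yi in enumerate(y3) if i != outlier_index]
slope_trimmed, intercept_trimmed = fit_line(x_trimmed, y_trimmed)

print("outlier:", (x3[outlier_index], y3[outlier_index]))
print("all points:      slope %.3f, intercept %.3f" % (slope_all, intercept_all))
print("without outlier: slope %.3f, intercept %.3f" % (slope_trimmed, intercept_trimmed))
```

Removing one observation out of eleven changes both the slope and the intercept substantially, which is the behavior the scatter plot makes visible at a glance.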

Outlier Influence
  • Definitions:

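    • Outlier: A point that lies far from the overall pattern of the other observations and can distort regression estimates.

    • Influential Point: A point whose removal would markedly change the regression line; points that are extreme in the direction of the explanatory variable are often influential.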

  • Managing Outliers:

    • Datasets often lack an identifier column; best practice is to include a unique ID for each observation, when available, so the data can be disaggregated.

    • If the data is stationary (with limited entries), researchers recommend disaggregating it to the level of the individual observation.

Saving Work in Tableau

  • Importance of frequent saving.

  • Save formats in Tableau:

    • Default: Tableau workbook (.twb) - saves visualizations separately from data.

    • Packaged workbook (.twbx) - saves data with visualizations, simpler for sharing.

Visualization Techniques

  • Tips for Better Visuals:

    • Remove grid lines for clarity.

    • Annotate visuals for context and conclusions.

    • Use captions to enhance accessibility (e.g., for screen readers).

Creating Dashboards
  • Organize elements on dashboards:

    • Use horizontal/vertical containers to drag and arrange sheets for visual coherence.

  • Adding titles and descriptions enhances user understanding.

Creating Stories

Importance of Storytelling with Data
  • A story works like a presentation: it walks the audience through a sequence of visualizations while still allowing them to explore the data.

    • Captions summarize findings or describe insights observed.

    • Stories can navigate through multiple sheets, allowing comparisons and conclusions to be made intuitively.