Data Visualization

  • Pie Charts

    • Show what percentage of the whole falls into each category of a single variable

  • Bar Graphs

    • Show what percentage or frequency of the whole falls into each category

    • Can be used for two or three variables simultaneously

  • Pictograms

    • Bar graph that uses pictures related to the topic in place of bars
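
The frequencies and percentages that pie charts and bar graphs display can be tallied before any plotting. A minimal sketch, assuming a made-up list of survey answers (the `responses` data and all variable names are hypothetical):

```python
from collections import Counter

# Hypothetical categorical data: one answer per respondent.
responses = ["bus", "car", "car", "bike", "car", "bus", "walk", "car"]

# Frequency of each category (the bar heights in a frequency bar graph).
counts = Counter(responses)

# Percentage of the whole in each category (the slice sizes in a pie chart).
total = sum(counts.values())
percentages = {category: 100 * count / total for category, count in counts.items()}

for category, count in counts.most_common():
    print(f"{category}: {count} ({percentages[category]:.1f}%)")
```

The percentages always sum to 100, which is why a pie chart suits a single variable whose categories cover the whole.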

Exploratory Data Analysis

Preliminary analysis of data sets to summarize (or discover) their main characteristics.

It may involve data visualization, helps determine what stories the data might tell, and suggests future analyses.

Explore the data without assumptions, models, or preconceived expectations about what the data will contain.

Discover what questions are worth asking

Visualize the processes that generated the data

Goal: reveal patterns

Correlation

A measure of the strength and direction of the linear relationship between two measurement variables.

r = correlation coefficient

-1 ≤ r ≤ 1

If r = -1 or r = 1, all the points fall exactly on a straight line (sloping downward for r = -1, upward for r = 1)
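
To make the definition concrete, here is a minimal sketch that computes r directly from the usual sample formula. The function name `pearson_r` and the small data sets are assumptions chosen for illustration, and the sketch does not handle a constant variable, for which r is undefined.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2 * v + 1 for v in x])      # points on a rising line
r_down = pearson_r(x, [-3 * v + 10 for v in x])  # points on a falling line
# A perfect but nonlinear (quadratic) relationship.
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
```

The last case is a reminder that r measures only the linear relationship: the y values are fully determined by x, yet r comes out as 0.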

Time series plot: x = time, y = the variable(s) being tracked

  • trends?

  • seasonal effects?

  • change points?

Regular scatter plot

  • linear vs. nonlinear relationships

  • constant variance?

  • outliers?

  • Anything unexpected or surprising?

Box Plots

Based on the five-number summary: min, lower quartile, median, upper quartile, max.

Interquartile range = upper quartile – lower quartile

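Whiskers are drawn either out to the most extreme data points within 1.5 × IQR of the quartiles (points beyond are plotted individually as potential outliers) or all the way to the min and max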
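
The quantities behind a box plot can be computed without drawing anything. The sketch below is one possible version: the function name `box_plot_summary` and the data are hypothetical, and the "inclusive" quartile method is an assumption, since textbooks and software use several slightly different quartile conventions.

```python
import statistics

def box_plot_summary(values):
    """Five-number summary plus whisker ends under the 1.5 * IQR rule."""
    data = sorted(values)
    # Assumption: "inclusive" is just one of several quartile conventions.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        "iqr": iqr,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }

summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

In this made-up data set the value 30 lies beyond the upper fence, so the upper whisker stops at 9 and 30 would be drawn as a separate point.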

Quantile-Quantile (QQ) Plots

  • Used to determine whether data are consistent with a sample from a particular distribution

  • Commonly used to assess normality

  • Plot the quantiles of the empirical distribution (observed in the sample) against the quantiles of a theoretical distribution

  • If they align, the points fall close to a straight line; the slope is 1 when both sets of quantiles are on the same scale (for example, after standardizing the sample)
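
The points of a normal QQ plot can be built by hand, which shows where the "slope of 1" comes from. This is a sketch under stated assumptions: the helper names are hypothetical, the sample is simulated, and the plotting positions (i - 0.5) / n are one common convention among several.

```python
import random
import statistics

def normal_qq_points(sample):
    """Return (theoretical, observed) quantiles for a normal QQ plot."""
    n = len(sample)
    observed = sorted(sample)
    mean = statistics.fmean(observed)
    sd = statistics.stdev(observed)
    # Standardize so a normal sample should track the line y = x.
    observed_z = [(v - mean) / sd for v in observed]
    std_normal = statistics.NormalDist()  # mean 0, standard deviation 1
    # Assumption: plotting positions (i - 0.5) / n for i = 1..n.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, observed_z

def least_squares_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the simulated sample is reproducible
sample = [rng.gauss(50, 10) for _ in range(500)]
theoretical, observed_z = normal_qq_points(sample)
slope = least_squares_slope(theoretical, observed_z)
print(f"slope of QQ points: {slope:.3f}")
```

Plotting `observed_z` against `theoretical` as a scatter plot gives the QQ plot itself. Without the standardizing step the points of a normal sample would still fall near a straight line, but its slope would reflect the sample's standard deviation rather than 1.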