Data Visualization
Pie Charts
Show what percentage of the whole falls into each category for a single variable
Bar Graphs
Show what percentage or frequency of the whole falls into each category
Can be used for two or three variables simultaneously
Pictograms
A bar graph that uses pictures related to the topic in place of plain bars
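The quantities these charts display (a count and a percentage of the whole for each category) can be sketched in a few lines. This is only an illustration: the function name category_shares and the pet data are made up for the example, and the row of symbols is a crude text stand-in for a pictogram.

```python
from collections import Counter

def category_shares(values):
    """Return {category: (count, percent_of_total)} for one categorical variable."""
    counts = Counter(values)
    total = sum(counts.values())
    # most_common() orders categories from most to least frequent.
    return {cat: (n, 100.0 * n / total) for cat, n in counts.most_common()}

# Hypothetical survey answers, invented for illustration only.
pets = ["dog", "cat", "dog", "bird", "dog", "cat", "dog", "fish"]
shares = category_shares(pets)

for cat, (n, pct) in shares.items():
    # One symbol per observation: a rough text pictogram / bar.
    print(f"{cat:<5} {'*' * n:<5} {n:>2}  {pct:5.1f}%")
```

The percentages are what a pie chart would show as slice sizes; the counts (or the same percentages) are what a bar graph would show as bar heights.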
Exploratory Data Analysis
Preliminary analysis of data sets to summarize (or discover) their main characteristics
May involve data visualization; helps determine what stories the data might tell and suggests future analysis.
Explore the data without assumptions, models, or preconceived expectations about what the data will contain.
Discover what questions are worth asking
Visualize the processes that generated the data
Goal: reveal patterns
Correlation
A measure of the strength and direction of the linear relationship between two measurement variables.
r = coefficient of correlation
-1<=r<=1
If r = -1 or r = 1, all the points fall exactly on a straight line (sloping downward for r = -1, upward for r = 1)
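As a rough sketch of how r behaves, the snippet below computes the Pearson correlation coefficient from its usual definition (covariance divided by the product of the standard deviations). The helper name pearson_r and the data are illustrative assumptions, not part of these notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
perfect_up = [2 * x + 1 for x in xs]     # exactly on a rising line
perfect_down = [10 - 3 * x for x in xs]  # exactly on a falling line
noisy = [2.1, 3.9, 6.2, 7.8, 10.1]       # close to a line, not exactly on it

r_up = pearson_r(xs, perfect_up)
r_down = pearson_r(xs, perfect_down)
r_noisy = pearson_r(xs, noisy)
print(r_up, r_down, r_noisy)
```

Points lying exactly on a line give r = 1 or r = -1 depending on the direction of the slope; the noisy series gives a value just below 1.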
Time series plot: x=time, y=variables
trends?
seasonal effects?
change points?
Regular scatter plot
linear vs. nonlinear relationships
constant variance?
outliers?
Anything unexpected or surprising?
Box Plots
Based on five number summary: min, lower quartile, median, upper quartile, max.
Interquartile range = upper quartile – lower quartile
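Whiskers are drawn out to the most extreme data points within 1.5(IQR) of the quartiles, or to the min and max if no points lie beyond that range; points beyond the whiskers are plotted individually as outliers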
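The box plot quantities above can be sketched as follows. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of Python's statistics.quantiles, and the function name box_plot_stats and the data are invented for illustration.

```python
import statistics

def box_plot_stats(data, k=1.5):
    """Five-number summary, IQR, whisker ends, and outliers for a box plot."""
    xs = sorted(data)
    # Assumption: "inclusive" quartiles; other conventions give slightly different values.
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "min": xs[0], "q1": q1, "median": median, "q3": q3, "max": xs[-1],
        "iqr": iqr,
        # Whiskers stop at the most extreme points still inside the fences.
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }

# Made-up data with one unusually large value.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
for name, value in stats.items():
    print(f"{name:>12}: {value}")
```

Here the upper whisker stops at 9 rather than at the maximum, because 30 lies more than 1.5(IQR) above the upper quartile and is reported as an outlier instead.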
Quantile-Quantile (QQ) Plots
used to determine if data is consistent with a sample from a particular distribution
commonly used to assess normality
Plot the quantiles of the empirical distribution (observed in the sample) against the quantiles of a theoretical distribution
If they align, the points fall approximately on a straight line with slope 1
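A minimal sketch of a normal QQ check, under stated assumptions: the theoretical distribution is a normal with the sample's own mean and standard deviation, the plotting positions are (i + 0.5)/n (one of several common choices), and the sample is simulated. The function names are hypothetical.

```python
import random
import statistics
from statistics import NormalDist

def normal_qq_points(sample):
    """Pair each sorted observation with the matching normal quantile."""
    xs = sorted(sample)
    n = len(xs)
    # Reference normal fitted to the sample (assumption for this sketch).
    ref = NormalDist(statistics.fmean(xs), statistics.stdev(xs))
    theoretical = [ref.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, xs))

def least_squares_slope(points):
    """Slope of the least-squares line through (x, y) points."""
    mx = statistics.fmean(p[0] for p in points)
    my = statistics.fmean(p[1] for p in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx

random.seed(0)  # fixed seed so the simulated sample is reproducible
sample = [random.gauss(50, 10) for _ in range(500)]
points = normal_qq_points(sample)
slope = least_squares_slope(points)
print(f"slope of QQ points: {slope:.3f}")
```

For a sample that really is close to normal, the slope should come out near 1. If the sample were instead compared with a standard normal (mean 0, standard deviation 1), the points would still fall near a straight line, but its slope would reflect the sample's standard deviation rather than 1.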