Topic Five Spring 2020

DESCRIBING DATA: VISUAL AND NUMERICAL DESCRIPTIONS

Dr. Erin K. Freeman



TOPIC FIVE OBJECTIVES

  • Explain the purpose of visual and numerical descriptions

  • Distinguish between good and poor visual descriptions

  • Create and interpret visual charts and graphs

  • Define and distinguish measures of center in data distributions

  • Define and distinguish measures of variability in data distributions

  • Understand the concepts of shape and outliers in data distributions


WHY VISUAL DESCRIPTIONS?

  • Simplify interpretation of large information sets

  • Visual aids often more effective than words in conveying messages

  • Statistical graphics preferred for data summarization

  • Creating and interpreting data visualizations is an essential skill


DATA VISUALIZATION

  • Characteristics: simple, thorough, accurate, impactful

  • Innovations informed by neuroscience

  • Includes descriptive and exploratory data analysis


BANDWIDTH OF SENSES (David McCandless)

  • Sight: 1250 MB/s

  • Touch: 125 MB/s

  • Hearing: 12.5 MB/s

  • Sight has roughly the same bandwidth as a computer network


COMMON GRAPHICAL ERRORS

  • Omitting baselines (zero points)

  • Manipulating axes

  • Cherry-picking data

  • Using inappropriate chart types

  • Excessive grid lines or irrelevant labels

  • Failing to adjust dollar amounts for inflation
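
To make the first error concrete, here is a minimal sketch of how a non-zero baseline exaggerates a difference between two bars. The function name `drawn_height_ratio` and the values 50, 55, and 45 are hypothetical, chosen only for illustration.

```python
def drawn_height_ratio(smaller, larger, baseline=0.0):
    """Ratio of the two bar heights as drawn when the axis starts at `baseline`."""
    if baseline >= smaller:
        raise ValueError("baseline must be below both values")
    return (larger - baseline) / (smaller - baseline)

# Hypothetical values, invented for illustration only.
a, b = 50, 55

honest = drawn_height_ratio(a, b, baseline=0)      # axis starts at zero
truncated = drawn_height_ratio(a, b, baseline=45)  # axis starts at 45

print(f"true ratio: {b / a:.2f}")
print(f"drawn ratio with zero baseline: {honest:.2f}")
print(f"drawn ratio with baseline 45: {truncated:.2f}")
```

With a zero baseline the drawn bars keep the true 10% difference; with the axis starting at 45, the taller bar is drawn twice as high as the shorter one.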


DESCRIPTIVE STATISTICS

Measures of Center

  • Mode: Most frequently occurring score

  • Median: Midpoint that divides data; robust against outliers

  • Mean: Arithmetic average, sensitive to extremes
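
The three measures of center can be computed with Python's standard `statistics` module. This is a minimal sketch; the score list and the helper name `center_summary` are hypothetical. Adding one extreme score shows why the mean is called sensitive and the median robust.

```python
from statistics import mean, median, multimode

# Hypothetical scores, invented for illustration only.
scores = [2, 3, 3, 4, 5, 6, 7]

def center_summary(data):
    """Return the mode(s), median, and mean of a list of numbers."""
    return {
        "modes": multimode(data),  # most frequent value(s); there may be several
        "median": median(data),    # middle value of the sorted data
        "mean": mean(data),        # arithmetic average
    }

typical = center_summary(scores)
with_outlier = center_summary(scores + [100])  # add one extreme score

print(typical)
print(with_outlier)
```

In this example the extreme score moves the median only from 4 to 4.5, while the mean jumps from about 4.29 to 16.25. The mode stays at 3.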

Measures of Variability

  • Range: Difference between max and min values

  • Interquartile Range (IQR): Spread of middle 50% of data

  • Variance: Average squared deviation of scores from the mean

  • Standard Deviation: Square root of the variance; the typical distance of scores from the mean
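
The measures of variability can be sketched the same way. The data and the helper name `spread_summary` are hypothetical. Two assumptions are worth flagging: the population formulas (dividing by N) are used here, whereas sample formulas divide by N - 1, and quartiles are computed with the "inclusive" method, one of several conventions that can give slightly different values.

```python
from statistics import pstdev, pvariance, quantiles

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

def spread_summary(data):
    """Return the range, IQR, population variance, and population SD."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "range": max(data) - min(data),
        "iqr": q3 - q1,               # spread of the middle 50% of the data
        "variance": pvariance(data),  # mean squared deviation from the mean
        "sd": pstdev(data),           # square root of the variance
    }

spread = spread_summary(scores)
print(spread)
```

For these scores the mean is 5, the squared deviations sum to 32, and dividing by 8 gives a variance of 4 and a standard deviation of 2. For sample statistics, `statistics.variance` and `statistics.stdev` would be used instead.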


SHAPE OF DISTRIBUTION

  • Normal Distribution: Symmetric, unimodal

  • Skewness: Named for the direction of the longer tail; influences which measure of central tendency to choose

  • Commonly skewed distributions: income (positive), grades (negative)
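
A small sketch can show how skew pulls the mean away from the median. The `skewness` helper below uses the moment coefficient of skewness, which is one of several possible definitions and is an assumption here rather than something specified in these notes. Both data sets are hypothetical.

```python
from statistics import mean, median

def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5, using population moments."""
    m = mean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data, invented for illustration only.
incomes = [20, 25, 30, 35, 40, 45, 250]  # long right tail (positive skew)
grades = [35, 80, 85, 88, 90, 92, 95]    # long left tail (negative skew)

for name, data in [("incomes", incomes), ("grades", grades)]:
    print(name, "skewness:", round(skewness(data), 2),
          "mean:", round(mean(data), 2), "median:", median(data))
```

In the positively skewed set the mean lies above the median, and in the negatively skewed set it lies below, which is why the median is often preferred for skewed data.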


CENTRAL TENDENCY

  • Summary statistic reflecting typical value

  • Different measures minimize different kinds of error (the mean minimizes squared error; the median minimizes absolute error)

  • Central tendency decisions influenced by data shape
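
One way to read the error-minimization point is as a statement about which single "typical value" gives the smallest total error. The sketch below checks this by brute force over a grid of whole-number candidates. The data and the function names are hypothetical.

```python
from statistics import mean, median

# Hypothetical scores, invented for illustration only.
scores = [1, 2, 3, 4, 10]

def sum_squared_error(data, guess):
    """Total squared distance between each score and a single guess."""
    return sum((x - guess) ** 2 for x in data)

def sum_absolute_error(data, guess):
    """Total absolute distance between each score and a single guess."""
    return sum(abs(x - guess) for x in data)

candidates = range(0, 11)  # whole-number guesses from 0 to 10

best_squared = min(candidates, key=lambda g: sum_squared_error(scores, g))
best_absolute = min(candidates, key=lambda g: sum_absolute_error(scores, g))

print("guess minimizing squared error:", best_squared, "| mean:", mean(scores))
print("guess minimizing absolute error:", best_absolute, "| median:", median(scores))
```

For these scores the best guess under squared error is 4, the mean, and the best guess under absolute error is 3, the median.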


FIVE NUMBER SUMMARY

  • Max, Upper Hinge (Q3), Median, Lower Hinge (Q1), Min

  • Visualized via box plots to assess shape and outliers
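
A five number summary can be sketched in a few lines. The data and the function name `five_number_summary` are hypothetical, and the quartiles use the "inclusive" method from Python's `statistics` module as a stand-in for hinges. Hinges and quartiles can differ slightly depending on the convention used.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return (min(data), q1, med, q3, max(data))

# Hypothetical data, invented for illustration only.
data = [1, 3, 5, 7, 9, 11, 13, 15, 40]

summary = five_number_summary(data)
print(summary)
```

For this data the summary is min 1, Q1 5, median 9, Q3 13, max 40. The large gap between Q3 and the maximum hints at a long right tail or an outlier.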


BOX PLOT

  • Represents five number summary

  • Helps visualize data shape and identify outliers
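
These notes do not say how a box plot decides which points are outliers. A common convention, assumed here, is Tukey's rule: flag any point more than 1.5 × IQR below Q1 or above Q3, and draw the whiskers to the most extreme points that are not flagged. The data and the function name `tukey_outliers` are hypothetical.

```python
from statistics import quantiles

def tukey_outliers(data, k=1.5):
    """Return (outliers, whisker_ends) using the k * IQR fence rule."""
    ordered = sorted(data)
    q1, _, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Points inside the fences determine where the whiskers end.
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return outliers, (inside[0], inside[-1])

# Hypothetical data, invented for illustration only.
data = [1, 3, 5, 7, 9, 11, 13, 15, 40]

outliers, whiskers = tukey_outliers(data)
print("outliers:", outliers)
print("whiskers end at:", whiskers)
```

Here Q1 is 5 and Q3 is 13, so the IQR is 8 and the fences fall at -7 and 25. Only the value 40 is flagged, and the whiskers run from 1 to 15.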


IMPLICATIONS

  • Assessing variability provides context beyond central tendency

  • Knowing central tendency alone can be misleading

  • Visual representations are important for interpreting data


NEXT TOPIC

  • Topic 6: Normal Distributions