Lecture 2 - Data Summary and Visualization


18 Terms

1

How do measures of central tendency fall?

In a normal distribution, mean = median = mode

In a skewed or multimodal distribution, the mean, median, and mode differ

  • when distributions are highly skewed, the median is a better measure

  • when distributions are multimodal, this indicates the presence of multiple groups

  • we can use modes to describe the ‘center’ of non-numeric data
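As a rough illustration of the points on this card, the sketch below compares the three measures on a small right-skewed dataset. The numbers and variable names are made up for the example and are not from the lecture.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: most values are small, one is very large.
incomes = [30, 32, 35, 35, 38, 40, 250]

avg = mean(incomes)     # pulled upward by the single extreme value
mid = median(incomes)   # middle value, barely affected by the extreme value
most = mode(incomes)    # most frequent value

print(f"mean={avg:.1f}, median={mid}, mode={most}")

# The mode also works for non-numeric data, where mean and median are undefined.
favorite_color = mode(["red", "blue", "blue", "green"])
print(favorite_color)
```

Here the mean lands well above most of the observations, while the median stays with the bulk of the data, which is why the median is preferred for highly skewed distributions.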

2

Standard deviation features

  • when the sd is low, points are clustered close to the mean

  • when there’s high variability, the points are more spread out

  • observations that are a lot higher or lower than the mean are going to be common in a distribution with high variability
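A minimal sketch of these features, using two made-up datasets that share the same mean but differ in spread. The cutoff of 10 units for "far from the mean" is an arbitrary choice for the example.

```python
from statistics import mean, pstdev

# Two hypothetical datasets with the same mean but different variability.
tight = [48, 49, 50, 51, 52]
spread = [10, 30, 50, 70, 90]

def far_from_mean(values, cutoff=10):
    """Count observations more than `cutoff` units away from the mean."""
    m = mean(values)
    return sum(1 for v in values if abs(v - m) > cutoff)

for name, data in [("tight", tight), ("spread", spread)]:
    print(name, "mean:", mean(data), "sd:", round(pstdev(data), 2),
          "far from mean:", far_from_mean(data))
```

Both datasets have a mean of 50, but only the high-variability one has observations far above or below it.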

3

Good vs bad graph features

Good

  1. clean

  2. informative

  3. AND easy to understand

Bad

  1. confusing

  2. misleading

  3. OR meaningless

4

Pie Charts

to show proportions of different categorical and mutually-exclusive options

  • Proportions need to add up to 1 (percentages need to add up to 100%)
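The sum-to-100% rule can be checked before drawing a pie chart. The helper below is a hypothetical sketch (the function name and the rounding tolerance are assumptions, not part of the lecture).

```python
import math

def valid_pie_shares(percentages, tolerance=0.5):
    """Return True if the shares are non-negative and sum to 100 (within rounding)."""
    non_negative = all(p >= 0 for p in percentages)
    sums_to_100 = math.isclose(sum(percentages), 100, abs_tol=tolerance)
    return non_negative and sums_to_100

print(valid_pie_shares([50, 30, 20]))   # mutually exclusive options
print(valid_pie_shares([60, 50, 40]))   # overlapping options, e.g. "select all that apply"
```

Shares that sum to more than 100 usually mean the categories are not mutually exclusive, in which case a bar plot is the safer choice.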

5

Good or bad graph?

Good graph! It’s possible to describe the graph

6

Good or bad graph?

Bad graph! The percentages need to add up to 100%

7

Bar plots

to compare values associated with separate (usually categorical) variables

  • The plotted value can be any numeric summary that we want to compare across groups, such as means, counts/frequencies, medians, or percentages

    • In some cases, either a bar plot or a pie chart can be used (i.e. when we can calculate percentages that sum to 100)

  • Can be vertical or horizontal

  • Can quickly become overwhelming if too many categories

8

Error bars

show uncertainty (or confidence) in the average

  • Wider bar = more variation/uncertainty

  • Narrower bar = less uncertainty (or more confidence)

  • *Imagine a distribution on the side of the bar
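The card does not say which quantity the error bars represent. The sketch below assumes they show the mean plus or minus one standard error, which is one common convention; the data and function name are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def mean_with_error_bar(values):
    """Return (mean, standard error); assumes the bar is drawn as mean +/- 1 SE."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values))
    return m, se

# Two hypothetical groups with the same average but different variation.
consistent = [9, 10, 10, 11, 10]
variable = [2, 18, 5, 15, 10]

for name, group in [("consistent", consistent), ("variable", variable)]:
    m, se = mean_with_error_bar(group)
    print(f"{name}: {m} +/- {se:.2f}")
```

The more variable group gets the wider bar, signalling less confidence in its average even though both averages are equal.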

9

Good or bad graph?

Bad graph! Too much and redundant info, could use a line plot.

10

Good or bad graph?

Bad graph! Too much and redundant info

11

Line graphs

only appropriate when x values are linear (i.e. they follow a meaningful order along a continuous scale, such as time)

  • Frequently used for time-series data

  • Can involve 2 y-axes (with caution)

  • Can have error bars/bands

12

Scatterplots

shows every data point; can help show relationships between variables while looking at all the data

  • They can easily be overwhelming when there’s lots of data

13

Histograms

bar plots that specifically plot frequency (i.e. the number of times a certain value occurs in the data)

  • The x-axis shows the variable, and the y-axis is always frequency

  • Plot shape and readability can depend heavily on bin widths
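To see how bin width changes what a histogram shows, the sketch below counts the same made-up scores with two different bin widths and prints a text version of each histogram. The data and helper name are assumptions for the example.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall in each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical exam scores.
scores = [51, 53, 58, 62, 64, 65, 67, 71, 72, 78, 85, 93]

narrow = histogram_counts(scores, 5)    # many small bins: detailed but noisy
wide = histogram_counts(scores, 20)     # few large bins: smooth but coarse

for label, counts in [("bin width 5", narrow), ("bin width 20", wide)]:
    print(label)
    for start, n in counts.items():
        print(f"{start:>3} | {'#' * n}")
```

With wide bins the scores form a single peak, while narrow bins break the same data into many short bars, so the apparent shape depends on the binning choice.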

14

Guide for Choosing a graph

  1. Are we looking at percentages of different categories on a single variable that sum to 100? (one variable)

    1. Pie chart

  2. Are we looking at a numeric variable by groups of different categories? (two variables)

    1. Bar graph

  3. Are we looking at a numeric variable over a series of time? (two variables)

    1. Line graph

  4. Are we looking at the relationship between two different numeric variables? (two variables)

    1. Scatterplot

  5. Are we looking at the distribution of a numeric variable? (one variable)

    1. Histogram
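The guide above can be written as a simple lookup. The goal labels used as keys are invented shorthand for the five questions, not terms from the lecture.

```python
# Hypothetical shorthand for each question in the guide, mapped to its graph type.
GRAPH_GUIDE = {
    "percentages of categories summing to 100": "pie chart",
    "numeric variable by category": "bar graph",
    "numeric variable over time": "line graph",
    "relationship between two numeric variables": "scatterplot",
    "distribution of a numeric variable": "histogram",
}

def choose_graph(goal):
    """Return the suggested graph type, or None if the goal is not in the guide."""
    return GRAPH_GUIDE.get(goal)

print(choose_graph("numeric variable over time"))
print(choose_graph("distribution of a numeric variable"))
```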

15

When is it okay to truncate axes?

  • it’s okay to truncate axes to the reasonable range for a measure

    • Ex: 95 is a good temp, 99.5 and above indicates fever → when values are so close, it makes sense to truncate to get a sense of what the real meaningful difference is

    • it’s okay when you want to show change over time
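One way to see why truncation helps in the temperature example is to compute how much of the axis the difference between two values occupies. The axis limits below are assumptions chosen for illustration; only the two temperature values come from the card.

```python
def visible_fraction(a, b, axis_min, axis_max):
    """Fraction of the axis length taken up by the gap between a and b."""
    return abs(b - a) / (axis_max - axis_min)

normal, fever = 95, 99.5

full_axis = visible_fraction(normal, fever, 0, 110)    # axis starting at zero
truncated = visible_fraction(normal, fever, 94, 101)   # axis limited to a reasonable range

print(f"zero-based axis: {full_axis:.0%} of the axis")
print(f"truncated axis:  {truncated:.0%} of the axis")
```

On a zero-based axis the meaningful difference is nearly invisible, while the truncated axis makes it readable.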

16

Basic elements of color for representation

  • Magnitude (low to high; 0-50) = light to dark shade of same color (ex: light red to dark red)

  • Divergent/bidirectional (negative to positive; -15 to 15) = dark shade of color 1, through a light neutral midpoint, to dark shade of color 2 (ex: dark blue to white to dark red)

  • Categorical = different colors

  • Consider audience and accessibility (ex: colorblind population)
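As a sketch of the divergent scheme, the function below maps a value between -15 and 15 onto a dark blue to white to dark red scale using linear interpolation. The RGB values and function names are assumptions for the example; plotting libraries provide ready-made scales, including colorblind-friendly ones.

```python
DARK_BLUE = (0, 0, 139)
WHITE = (255, 255, 255)
DARK_RED = (139, 0, 0)

def lerp_color(c1, c2, t):
    """Blend two RGB colors; t=0 gives c1, t=1 gives c2."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))

def diverging_color(value, limit=15):
    """Map a value in [-limit, limit] to dark blue -> white -> dark red."""
    t = max(-1.0, min(1.0, value / limit))   # clamp out-of-range values
    if t < 0:
        return lerp_color(WHITE, DARK_BLUE, -t)
    return lerp_color(WHITE, DARK_RED, t)

for v in (-15, 0, 15):
    print(v, diverging_color(v))
```

Values near zero come out close to white, and the color darkens toward either extreme, so both direction and magnitude are readable.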

17

Aspect ratio

ratio of width to height

  • Widening the range shown on the x-axis (see x-axis numbers on left panel) squeezes the data horizontally, which makes the slope look steeper, implying a greater increase, and affects our interpretation
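The effect can be estimated by computing the angle at which a trend line is drawn on the page. This sketch assumes that "widening the x-axis" means showing a larger range of x values in a panel of the same physical size; all numbers are made up.

```python
import math

def apparent_angle(rise, run, x_range, y_range, width, height):
    """Drawn angle (degrees) of a line in a width x height panel.

    rise/run are in data units; x_range/y_range are the spans of the axes.
    """
    dx = run / x_range * width     # horizontal length on the page
    dy = rise / y_range * height   # vertical length on the page
    return math.degrees(math.atan2(dy, dx))

# Same trend (up 10 units over 10 x-units), same 4 x 4 panel, different x-axis ranges.
tight_axis = apparent_angle(10, 10, x_range=10, y_range=20, width=4, height=4)
wide_axis = apparent_angle(10, 10, x_range=40, y_range=20, width=4, height=4)

print(f"x-axis spans 10 units: {tight_axis:.1f} degrees")
print(f"x-axis spans 40 units: {wide_axis:.1f} degrees")
```

The data are identical in both cases; only the axis range changes how steep the increase appears.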

18

How to make a good plot

  • Logical representation

  • Labeled axes

  • No redundancy

  • All plot elements are clearly visible

  • Clean presentation

    • Sufficiently large font

    • Avoid clutter

    • No unnecessary color 

  • Include a title

  • Include a legend (if necessary)