Data Visualization

53 Terms

1

General Rules About Pie Charts

1. Appropriate for non-technical audiences
2. Widely used
3. Use only a few categories
4. Use when the data adds up to 100% (parts of a whole)
5. Add percentage labels
6. Keep it simple
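
A minimal Python sketch of rules 4 and 5, assuming the categories together make up the whole: each category's share of the total becomes both its percentage label and its wedge angle (share × 360 degrees). The function name `pie_wedges` and the example counts are hypothetical.

```python
def pie_wedges(counts: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (category, percent, angle in degrees) for each pie wedge."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("a pie chart needs a positive total (the whole = 100%)")
    wedges = []
    for name, value in counts.items():
        share = value / total  # fraction of the whole
        wedges.append((name, 100 * share, 360 * share))
    return wedges


# Hypothetical example data: three categories that make up one whole.
for name, pct, angle in pie_wedges({"A": 50, "B": 30, "C": 20}):
    print(f"{name}: {pct:.0f}% ({angle:.0f} degrees)")
```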

2

Time

The goal is to see how one or several quantities change over time.
- Often referred to as time-series data

3

How to visualize discrete points in time?

- bar graph
- stacked bar graph
- points

4

How to visualize continuous time?

- time series chart
- step chart
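
The two continuous-time charts differ in what they assume happens between recorded points. A step chart holds each value flat until the next change (suited to things like prices or rates that change at discrete moments), while a time-series line chart connects consecutive points. A minimal Python sketch; the change log and function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical change log: (day, new value).
changes = [(0, 10.0), (5, 12.0), (9, 8.0)]


def step_value(t: float) -> float:
    """Value a step chart shows at time t: the last change at or before t."""
    days = [day for day, _ in changes]
    i = bisect_right(days, t) - 1
    if i < 0:
        raise ValueError("t is before the first recorded change")
    return changes[i][1]


def line_value(t: float) -> float:
    """Value a time-series line chart shows at t: linear interpolation between points."""
    for (t0, v0), (t1, v1) in zip(changes, changes[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the recorded range")


print(step_value(7), line_value(7))  # the step holds 12.0; the line is partway to 8.0
```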

5

Why visualize data?

1. Find patterns and see data in context
2. Expand memory
3. Make data accessible to everyone
4. Answer questions (or discover them)
5. Make decisions / Persuade others to make decisions

6

How to see distributions?

- histogram
- continuous density plot
- box plot

7

Microsoft Excel Pros

Supports processing of data
Compatible with Word and PowerPoint
Relatively easy to learn
Widely used

8

Excel Cons

Good for basic visualization, but not interactive
Requires customization to adhere to design standards
May not process large datasets (~1 GB)

9

GEOSPATIAL VISUALIZATION TOOLS

ArcGIS
- Built for desktop mapping
- User interface, no coding required
- Used by professional cartographers and graphics departments

Google, Yahoo, and Microsoft Maps
- Easiest online solution; requires some programming

Modest Maps
- Flash and ActionScript library for tile-based maps; coding required

Other tools such as Tableau and R

10

PROGRAMMING TOOLS - R

Pros
Free, open-source statistical programming language
Built and maintained for statisticians by statisticians
Capable of both data analysis and data graphics
Libraries used for visualization in R: graphics (base R), ggplot2, car, lattice, ndtv, plotly
Can write your own functions and packages to make graphics the way you want
Cons
Default chart outputs require design refinements: lack of titles, undefined scales for axes
Use R to create graphs, then edit and refine them in design software such as Adobe Illustrator or Inkscape
R is great for exploratory data visualization (analysis) but may not be the best tool for explanatory data visualization (presenting results and storytelling)

11

PROGRAMMING TOOLS - PYTHON

Pros
Can handle large amounts of data without crashing
Useful for analyses and heavy computation
Clean, easy-to-read syntax
Some of Python's data visualization libraries: Matplotlib, seaborn, geoplotlib, ggplot

Cons
Great starting point for data exploration, but default charts are not very polished aesthetically

12

PROGRAMMING TOOLS - JAVASCRIPT

PROS
- web-based scripting language
- some JavaScript libraries: D3, rCharts, Highcharts, Chart.js, dimple.js
- many are freely available and allow users to create sophisticated web-based visualizations

CONS
- steep learning curve
- require skills in working with HTML and JSON

13

CHECKLIST FOR DATA VISUALIZATION TOOLS

- preparing data before visualization
- integration
- ease of use
- ease of collaboration
- visualization types
- communication
- performance
- privacy
- price

14

categorical data

Data that consists of names, labels, or other nonnumerical values

15

ordinal data

Data in categories that are ordered, but the differences between them cannot be determined or are meaningless (example: 1st, 2nd, 3rd)

16

Quantitative data collection

Measurable, using only factual content

17

Exploratory

testing a hypothesis (visual confirmation) and mining for patterns, trends, and anomalies (visual exploration)

18

Explanatory

usually simple, everyday visualizations (line charts, bar charts, pies, and scatter plots) conveying a single message

19

NINE VISUAL CUES

position, length, angle, direction, shapes, area, volume, color saturation, color hue

20

VISUAL CUES - POSITION

- commonly used on scatter plots
- you compare values based on where they are placed in the coordinate system
- easy to notice outliers and clustering

21

VISUAL CUES - LENGTH

- Commonly used on bar charts
- Length of bars in bar graph provides visual cues
- The longer the bar, the greater the absolute value
- Start the axis at zero, because people visually compare the distance from 0 to the end of the bar
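
Why the zero baseline matters, as a small arithmetic sketch in Python: readers judge bars by their drawn length, so the ratio they perceive is the ratio of lengths measured from where the axis starts. The function `visual_ratio` and the values 50 and 55 are illustrative.

```python
def visual_ratio(a: float, b: float, axis_start: float = 0.0) -> float:
    """Ratio of drawn bar lengths (b relative to a) when the value axis starts at axis_start."""
    if min(a, b) <= axis_start:
        raise ValueError("both bars must extend past the start of the axis")
    return (b - axis_start) / (a - axis_start)


honest = visual_ratio(50, 55)                    # axis at zero: 55 / 50
truncated = visual_ratio(50, 55, axis_start=45)  # bars drawn as 5 and 10 units long
print(honest, truncated)  # the truncated axis makes a 10% difference look like 2x
```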

22

VISUAL CUES - ANGLES

- Commonly used for pie charts
- Commonly used to represent parts of a whole
- Donut charts do not use angles, since the center of the circle is cut out; arc lengths are used as the visual cue instead

23

VISUAL CUES - DIRECTION

- Commonly noticed in line graphs
- Direction provides one basic visual cue
- Direction helps with noticing trends
- Slope can be used to signal sharp or drastic changes in direction

24

VISUAL CUES - AREA AND VOLUME

Bigger objects represent greater values

25

VISUAL CUES - SHAPES

- shapes can be used to denote categories and objects
- shapes are readily recognized visually

26

VISUAL CUES - COLOR

Hue - refers to the different colors
Saturation - refers to the intensity of a given color, e.g. gradients
Color can be used to show categories
Color can be used to highlight certain aspects of your data visualization
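
A sketch of a saturation-based (sequential) color scale in Python, using the standard-library `colorsys` module: the hue is held fixed and saturation grows with the data value, so low values fade toward white and high values are fully saturated. The fixed hue (0.6, a blue) and the function name are assumptions for illustration; categorical data would instead get one distinct hue per category.

```python
import colorsys


def saturation_color(value: float, vmin: float, vmax: float, hue: float = 0.6) -> str:
    """Map value to a hex color of one fixed hue whose saturation grows with value."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp out-of-range values to the ends of the scale
    r, g, b = colorsys.hsv_to_rgb(hue, t, 1.0)  # saturation t, full brightness
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


for v in (0, 5, 10):
    print(v, saturation_color(v, 0, 10))
```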

27

steps to preparing data

Structuring data
Cleaning data
Aggregating data from different sources
Validating data
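
The four steps can be sketched in plain Python on made-up records. The field names, the allowed regions, and the choice to drop incomplete rows are all assumptions for illustration; real pipelines often use a library such as pandas instead.

```python
# Hypothetical raw inputs from two different sources.
source_a = ["2024-01-03,North, 120", "2024-01-04,south,", "2024-01-05,north,95"]
source_b = [{"date": "2024-01-06", "region": "South", "sales": "80"}]


def structure(lines):
    """Structuring: split CSV-like strings into records with named fields."""
    records = []
    for line in lines:
        date, region, sales = line.split(",")
        records.append({"date": date, "region": region, "sales": sales})
    return records


def clean(records):
    """Cleaning: trim whitespace, normalize case, drop records missing sales."""
    cleaned = []
    for rec in records:
        sales = rec["sales"].strip()
        if not sales:
            continue  # this sketch drops incomplete rows rather than filling them in
        cleaned.append({"date": rec["date"].strip(),
                        "region": rec["region"].strip().title(),
                        "sales": float(sales)})
    return cleaned


def validate(records, allowed_regions=("North", "South")):
    """Validating: fail loudly on implausible values before anything is plotted."""
    for rec in records:
        if rec["sales"] < 0:
            raise ValueError(f"negative sales: {rec}")
        if rec["region"] not in allowed_regions:
            raise ValueError(f"unknown region: {rec}")
    return records


# Aggregating data from different sources: both cleaned sources end up in one table.
data = validate(clean(structure(source_a)) + clean(source_b))
print(data)
```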

28

What is special about a bar chart?

- intuitive
- appropriate for non-technical audience
- useful to visualize discrete data
- start axis at zero

29

wedge

each portion of the pie, representing one category's share of the whole

30

What are the kinds of coordinate systems?

Cartesian, polar, geographic
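
Each coordinate system needs a mapping onto the flat screen. A minimal Python sketch: polar coordinates convert to Cartesian with cosine and sine, and geographic coordinates (longitude, latitude) need a map projection. The simple equirectangular projection used here is only one of many and is an assumption for illustration, as are the function names and the 360 × 180 map size.

```python
import math


def polar_to_cartesian(r: float, theta_deg: float) -> tuple[float, float]:
    """Polar (radius, angle in degrees from the +x axis) -> Cartesian (x, y)."""
    theta = math.radians(theta_deg)
    return r * math.cos(theta), r * math.sin(theta)


def equirectangular(lon_deg: float, lat_deg: float,
                    width: float = 360, height: float = 180) -> tuple[float, float]:
    """Geographic (longitude, latitude) -> flat map position; y grows downward as on screens."""
    x = (lon_deg + 180) / 360 * width
    y = (90 - lat_deg) / 180 * height
    return x, y


print(polar_to_cartesian(2, 90))  # approximately (0, 2)
print(equirectangular(0, 0))      # the center of the map
```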

31

Types of scales

Linear, logarithmic, categorical, ordinal, percent, time
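
Linear and logarithmic scales differ only in how a data value maps to a position on the axis. A minimal Python sketch mapping a data domain onto a pixel range; the function names and the 0 to 400 pixel range are hypothetical.

```python
import math


def linear_scale(v, domain, rng):
    """Equal steps in the data get equal distances on the axis."""
    (d0, d1), (r0, r1) = domain, rng
    return r0 + (v - d0) / (d1 - d0) * (r1 - r0)


def log_scale(v, domain, rng):
    """Equal ratios in the data (e.g. each power of ten) get equal distances on the axis."""
    (d0, d1), (r0, r1) = domain, rng
    if min(v, d0, d1) <= 0:
        raise ValueError("log scales need strictly positive values")
    return r0 + (math.log10(v) - math.log10(d0)) / (math.log10(d1) - math.log10(d0)) * (r1 - r0)


print(linear_scale(100, (1, 10_000), (0, 400)))  # near the bottom of the axis
print(log_scale(100, (1, 10_000), (0, 400)))     # exactly halfway up
```

On the linear scale, 100 sits close to the bottom of the axis; on the log scale it sits halfway, because 1 to 100 and 100 to 10,000 each span two powers of ten.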

32

What is a histogram?

- encodes data using height as the visual cue
- frequency (count) or density is on the vertical axis
- the horizontal axis shows the values, grouped into bins

33

Histogram bin size

- the best bin size depends on the dataset
- you want bins small enough that you can see the variability in the data
- but not so small that the histogram is too noisy to interpret
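
A minimal Python sketch of histogram binning with bin width as the tunable parameter. Each bin reports both its count and its density (count divided by n × bin width, so bar areas sum to 1). The data values and the widths tried are made up.

```python
import math


def histogram(values, bin_width, start=None):
    """Return (left_edge, count, density) for each bin [left_edge, left_edge + bin_width)."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    start = min(values) if start is None else start
    if start > min(values):
        raise ValueError("start must not be greater than the smallest value")
    n_bins = math.floor((max(values) - start) / bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    n = len(values)
    return [(start + i * bin_width, c, c / (n * bin_width)) for i, c in enumerate(counts)]


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
for width in (8, 2, 0.5):
    print(width, [count for _, count, _ in histogram(data, width)])
```

With a width of 8 nearly everything lands in one bar and the shape disappears; with 0.5 most bars are empty gaps and the picture is noisy. A width of 2 sits in between.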

34

Continuous density plot

Like a histogram, but shows a continuous curve instead of bins
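
Density plots are commonly drawn with a kernel density estimate: a small smooth bump (here a Gaussian) is placed on every data point and the bumps are averaged. A minimal Python sketch; the bandwidth plays the same role bin width plays for a histogram, and the value 0.8 is an arbitrary choice for the example.

```python
import math


def gaussian_kde(values, bandwidth):
    """Return f(x), a density estimate that averages one Gaussian bump per data point."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
f = gaussian_kde(data, bandwidth=0.8)
# Sample the curve on a grid, as a plotting library would before drawing the line.
curve = [(x / 10, f(x / 10)) for x in range(-20, 121)]
```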

35

What are the three types of distribution shapes?

symmetric distribution, left skewed distribution, right skewed distribution
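
One way to tell the three shapes apart numerically is the skewness (third standardized moment): near zero for symmetric data, positive when the long tail is on the right, negative when it is on the left. A minimal Python sketch; the ±0.5 cut-off is an arbitrary rule of thumb chosen for illustration, not a standard threshold.

```python
import statistics


def skewness(values):
    """Population skewness: mean cubed deviation divided by the standard deviation cubed."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


def distribution_shape(values, tolerance=0.5):
    """Classify a sample as symmetric, right skewed, or left skewed."""
    g = skewness(values)
    if g > tolerance:
        return "right skewed"  # long tail toward larger values
    if g < -tolerance:
        return "left skewed"   # long tail toward smaller values
    return "symmetric"
```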

36

Box Plot

Shows range, median and quartiles of the data
Uses position and height/length visual cues
You can use multiple box plots to compare distributions
Less specific than histograms or density plots
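
A minimal Python sketch of the numbers behind one box, using `statistics.quantiles` from the standard library. Tools differ in how they compute quartiles and where they end the whiskers; this sketch assumes the "inclusive" quartile method and Tukey's convention of whiskers reaching the most extreme points within 1.5 × IQR of the box, with anything beyond plotted as an outlier. The data values are made up.

```python
import statistics


def box_plot_stats(values):
    """Quartiles, whisker ends, and outliers for one box in a box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "whisker_low": inside[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


print(box_plot_stats([1, 2, 2, 3, 3, 3, 4, 4, 5, 9]))
```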

37

Union

Combining data that is spread across several files, sheets, or tables by appending the rows of one table to another

38

What are the kinds of joins?

- Inner join
- Left join
- Right join
- Full outer join
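
The difference between a union (stacking rows from tables with the same columns) and the four join types (matching rows on a shared key) can be sketched in plain Python over lists of dicts. This is an illustrative sketch, not how Tableau or a database implements them: the function names and example tables are hypothetical, and unmatched rows get `None` for the other table's fields, much as a database fills in NULL.

```python
def union(*tables):
    """Union: append the rows of tables that share the same columns."""
    return [row for table in tables for row in table]


def join(left, right, key, how="inner"):
    """Join two lists of dict rows on `key`; how is inner, left, right, or full."""
    if how not in {"inner", "left", "right", "full"}:
        raise ValueError(f"unknown join type: {how}")
    left_cols = {c for row in left for c in row}
    right_cols = {c for row in right for c in row}
    right_by_key = {}
    for rrow in right:
        right_by_key.setdefault(rrow[key], []).append(rrow)
    left_keys = {row[key] for row in left}

    out = []
    for lrow in left:
        matches = right_by_key.get(lrow[key], [])
        if matches:
            out.extend({**lrow, **rrow} for rrow in matches)  # matched: fields from both
        elif how in ("left", "full"):
            out.append({**{c: None for c in right_cols}, **lrow})  # keep unmatched left row
    if how in ("right", "full"):
        for rrow in right:
            if rrow[key] not in left_keys:
                out.append({**{c: None for c in left_cols}, **rrow})  # keep unmatched right row
    return out


customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}]
orders = [{"id": 1, "total": 30}, {"id": 3, "total": 15}]
for how in ("inner", "left", "right", "full"):
    print(how, join(customers, orders, "id", how))
```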

39

What does the pill color indicate?

continuous or discrete

40

what color are discrete pills?

blue

41

what color are continuous pills?

green

42

How do dimensions come out?

Dimensions come out onto the view as themselves

43

How do measures come out?

Measures come out onto the view as aggregates
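
The dimension/measure behavior can be mimicked in plain Python: each distinct value of a dimension becomes one mark in the view, and the measure values that fall under it are collapsed into a single aggregate. SUM is assumed here as the aggregation; the field names and rows are made up.

```python
from collections import defaultdict

# Made-up rows: "region" is a dimension, "sales" is a measure.
rows = [
    {"region": "East", "sales": 100},
    {"region": "West", "sales": 40},
    {"region": "East", "sales": 60},
]


def aggregate(rows, dimension, measure, agg=sum):
    """One entry per dimension member; the measure values under it are aggregated."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[measure])
    return {member: agg(values) for member, values in groups.items()}


print(aggregate(rows, "region", "sales"))  # {'East': 160, 'West': 40}
```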

44

What is a scatter plot used for?

Often used to visualize the relationship between two variables
Scatter plots use position as the visual cue
Each dot has x- and y-coordinates that match the axes

45

correlation

means one thing tends to change a certain way as another thing changes.

46

Direction

positive or negative correlation

47

magnitude

strong or weak correlation

48

coefficient of correlation

quantifies how tightly coupled the values of two variables are with respect to each other
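
The most common coefficient is Pearson's r, which ranges from -1 to 1: its sign gives the direction and its absolute value the magnitude. A minimal Python sketch; the 0.7 and 0.3 cut-offs for "strong" and "weak" are rough rules of thumb chosen for illustration, and conventions vary by field.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def describe_correlation(r, strong=0.7, weak=0.3):
    """Label direction (sign of r) and magnitude (size of |r|)."""
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    size = abs(r)
    magnitude = "strong" if size >= strong else "moderate" if size >= weak else "weak"
    return direction, magnitude
```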

49

Scatterplot Matrix

Scatter plot matrix is useful to see relationships among multiple variables
Allows comparison across multiple dimensions
Plot every variable pair and look for correlation
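
A scatterplot matrix has one panel per pair of variables, so k variables give k(k-1)/2 distinct pairs (each usually shown twice, above and below the diagonal). A minimal Python sketch that enumerates those pairs and computes Pearson's r for each, as a numeric companion to scanning the panels; the table contents are made up.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(table):
    """table maps column name -> values; returns r for every distinct pair of columns."""
    return {(a, b): pearson_r(table[a], table[b]) for a, b in combinations(table, 2)}


table = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]}
for (a, b), r in pairwise_correlations(table).items():
    print(f"{a} vs {b}: r = {r:+.2f}")
```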

50

bubble chart

Allows you to compare 3 variables at once: x variable, y variable, and area variable
Bubbles should be sized based on area, not radius, diameter, or circumference
Example - Hans Rosling's tool
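
Sizing by area means the radius must grow with the square root of the value, since a circle's area is πr². A minimal Python sketch; the function name and the 30-pixel maximum radius are arbitrary assumptions.

```python
import math


def bubble_radius(value, max_value, max_radius=30.0):
    """Radius that makes bubble area proportional to value (area = pi * r**2)."""
    if value < 0 or max_value <= 0:
        raise ValueError("need a non-negative value and a positive max_value")
    return max_radius * math.sqrt(value / max_value)


small, big = bubble_radius(25, 100), bubble_radius(100, 100)
print(small, big)  # 15.0 30.0: a quarter of the value gets a quarter of the area
```

Scaling the radius linearly instead (7.5 vs 30) would make the smaller bubble 16 times smaller in area rather than 4 times, exaggerating the difference.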

51

symbol maps

Specific geographic locations are marked with circles, squares, or custom shapes
Form, size or color of these marks can vary according to a measure or dimension

52

choropleth maps

Geographic areas are shaded according to a measure or dimension
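
Shading a choropleth usually means sorting each area's value into a small number of classes and giving each class one shade, light for low and dark for high. A minimal Python sketch using equal-interval classes (one of several classification schemes; quantile classes are a common alternative). The region names, values, and hex palette are hypothetical.

```python
def equal_interval_class(value, vmin, vmax, n_classes):
    """Index 0..n_classes-1 of the equal-width class that value falls into."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    i = int((value - vmin) / (vmax - vmin) * n_classes)
    return min(max(i, 0), n_classes - 1)  # clamp so value == vmax lands in the top class


shades = ["#f7fbff", "#c6dbef", "#6baed6", "#2171b5", "#08306b"]  # light -> dark
rates = {"Region A": 3.1, "Region B": 7.8, "Region C": 12.4}
fill = {area: shades[equal_interval_class(v, 0, 15, len(shades))] for area, v in rates.items()}
print(fill)
```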

53

density maps

Areas of relative concentration are colored intensely, while those with sparse occurrences of the dimension are colored lightly
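
A simple way to compute what a density map shows is to count points per grid cell and scale each count by the busiest cell, so dense cells get intense color and sparse cells light color. A minimal Python sketch assuming planar (already projected) x/y coordinates and a fixed square cell size; many tools instead smooth the points with a kernel density estimate, which this sketch does not do. The point locations are made up.

```python
from collections import Counter


def density_grid(points, cell_size):
    """Count points per square grid cell, keyed by (column, row) cell index."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)


def intensities(counts):
    """Scale counts to 0..1 so the densest cell is drawn most intensely."""
    peak = max(counts.values())
    return {cell: n / peak for cell, n in counts.items()}


points = [(0.5, 0.5), (0.7, 0.2), (0.9, 0.9), (5.5, 5.1)]  # hypothetical locations
print(intensities(density_grid(points, cell_size=1.0)))
```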