Chapter 2 Visualization of Data

26 Terms

Line Plot

used to plot the relationship or dependence of one variable on another. To plot the relationship between the two variables, we can simply call the plot function.
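As a minimal sketch of what "calling the plot function" can look like, the example below assumes the matplotlib library (the card does not name a library) and uses made-up data for hours studied and test scores.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 71, 80]

fig, ax = plt.subplots()
lines = ax.plot(hours, scores, marker="o")  # plot() returns a list of line objects
ax.set_xlabel("Hours studied")
ax.set_ylabel("Test score")
ax.set_title("Score versus hours studied")
```

Calling fig.savefig("line_plot.png") or plt.show() afterwards would write or display the figure.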

Bar Chart

used for comparing the quantities of different categories or groups. The value of each category is represented by a bar, drawn either vertically or horizontally, with the height or length of the bar representing the value.
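The vertical and horizontal variants might be drawn as follows. This is a sketch that assumes matplotlib; the categories and values are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical categories and their quantities.
categories = ["A", "B", "C"]
values = [12, 30, 21]

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(8, 3))
vertical = ax_v.bar(categories, values)     # bar height encodes the value
horizontal = ax_h.barh(categories, values)  # bar length encodes the value
ax_v.set_title("Vertical bars")
ax_h.set_title("Horizontal bars")
```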

Pie and Donut Charts

used to compare the parts of a whole. They are most effective when there are only a few components and when text and percentages are included to describe the content. However, they can be difficult to interpret because the human eye has a hard time estimating areas and comparing visual angles.
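Because slice areas are hard to judge by eye, the percentages are usually computed and printed on the chart. A small sketch of that step, using a hypothetical monthly budget:

```python
# Hypothetical parts of a whole (a monthly budget).
parts = {"Rent": 1200, "Food": 600, "Transport": 200}

total = sum(parts.values())
# Share of each part as a percentage of the whole, rounded to one decimal.
shares = {name: round(100 * amount / total, 1) for name, amount in parts.items()}
# Text labels that combine the name and the percentage.
labels = [f"{name} ({pct}%)" for name, pct in shares.items()]
```

Labels built this way could be passed to whichever charting tool draws the pie or donut.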

Histogram Plot

represents the distribution of a continuous variable over a given interval or period of time; it is one of the most frequently used data visualization techniques in machine learning. It plots the data by chunking it into intervals called 'bins' and showing how many values fall into each one. It is used to inspect the underlying frequency distribution, outliers, skewness, and so on.
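The binning step can be sketched in plain Python. The function name histogram_counts and the equal-width binning are assumptions made for illustration; plotting libraries offer several binning rules.

```python
def histogram_counts(values, num_bins):
    """Split the range of values into equal-width bins and count each bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins  # assumes hi > lo
    counts = [0] * num_bins
    for v in values:
        # The maximum value is placed in the last bin instead of a new one.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts

# Hypothetical data with one value far from the rest.
edges, counts = histogram_counts([1, 2, 2, 3, 3, 3, 4, 4, 9], num_bins=4)
```

Here the empty third bin followed by a single count in the last bin is the kind of pattern that hints at an outlier.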

Scatter Plot

a two-dimensional plot representing the joint variation of two data items. Each marker (a symbol such as a dot, square, or plus sign) represents an observation, and the marker's position indicates the values for that observation. When you assign more than two measures, a scatter plot matrix is produced: a series of scatter plots displaying every possible pairing of the measures assigned to the visualization. Used for examining the relationship, or correlation, between X and Y variables.
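The pairing behind a scatter plot matrix can be sketched as below. The measure names and numbers are hypothetical, and the sketch lists each unordered pair once, whereas some tools draw both orderings.

```python
from itertools import combinations

# Hypothetical measures, one value per observation.
observations = {
    "height": [150, 160, 170, 180],
    "weight": [50, 58, 69, 80],
    "age": [31, 25, 40, 36],
}

# Every pairing of two measures; each pair would get its own scatter plot.
pairs = list(combinations(observations, 2))

# For each pair, the marker positions: one (x, y) point per observation.
panels = {(x, y): list(zip(observations[x], observations[y])) for x, y in pairs}
```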

Box and Whisker Plot

A binned box plot with whiskers shows the distribution of a large data set and makes outliers easy to see.
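The numbers behind a box and whisker plot can be sketched with the standard library. The 1.5 x IQR rule for flagging outliers is a common convention assumed here, not something the card specifies, and the data are made up.

```python
from statistics import quantiles

# Hypothetical data with one unusually large value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]

q1, median, q3 = quantiles(data, n=4)  # the box spans q1 to q3
iqr = q3 - q1                          # interquartile range

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in data if v < low_fence or v > high_fence]

# The whiskers extend to the most extreme points that are not outliers.
inside = [v for v in data if low_fence <= v <= high_fence]
whisker_low, whisker_high = min(inside), max(inside)
```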

Word Clouds and Network Diagrams

The variety of big data brings challenges because semi-structured and unstructured data require new visualization techniques. A word cloud represents the frequency of a word within a body of text by its relative size in the cloud. This technique is used on unstructured data as a way to display high- or low-frequency words.
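A sketch of the counting that drives a word cloud: word frequencies are tallied and then mapped to a display size. The sample text, the size limits, and the linear scaling are all assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical body of text.
text = ("data visualization turns data into pictures "
        "and pictures make data easier to read")

words = re.findall(r"[a-z']+", text.lower())
counts = Counter(words)  # frequency of each word

# Assumed scaling: size grows linearly with frequency up to MAX_SIZE.
MIN_SIZE, MAX_SIZE = 10, 40
max_count = max(counts.values())
sizes = {
    word: MIN_SIZE + (MAX_SIZE - MIN_SIZE) * count / max_count
    for word, count in counts.items()
}
```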

Network Diagram

Another visualization technique that can be used for semi-structured or unstructured data. It represents relationships as nodes (individual actors within the network) and ties (relationships between the individuals). Network diagrams are used in many applications, for example the analysis of social networks or the mapping of product sales across geographic areas.
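The node-and-tie structure can be sketched as a simple adjacency mapping. The names and ties below are invented, and a real diagram would pass this structure to a graph-drawing tool.

```python
from collections import defaultdict

# Hypothetical ties: each pair is a relationship between two actors.
ties = [("Ana", "Ben"), ("Ana", "Cy"), ("Ben", "Cy"), ("Cy", "Dee")]

# Build the set of neighbors for every node (ties are undirected here).
neighbors = defaultdict(set)
for a, b in ties:
    neighbors[a].add(b)
    neighbors[b].add(a)

nodes = sorted(neighbors)
# Degree: how many ties each node has, often used to size the node.
degree = {node: len(neighbors[node]) for node in nodes}
```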

Correlation Matrices

A table showing correlation coefficients between variables: each cell in the table represents the relationship between two variables. It allows quick identification of relationships between variables by combining big data and fast response times.
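A minimal sketch of building such a table, assuming the Pearson correlation coefficient (the card does not say which coefficient is meant) and three made-up variables:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical variables: y rises with x, z falls as x rises.
data = {"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10], "z": [5, 4, 3, 2, 1]}
names = list(data)

# Each cell [i][j] holds the correlation between variable i and variable j.
matrix = [[pearson(data[a], data[b]) for b in names] for a in names]
```

The diagonal is always 1 because every variable is perfectly correlated with itself.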

Descriptive Statistics

It describes data. It is a way to summarize and organize the data you have collected into something more manageable and easier to understand.

Central Tendency

first type of descriptive statistics; mostly represented by the mean, median, or mode.

Mean

the average of a data set: the sum of the values divided by the number of values.

Median

the value of the data point in the middle of the set once the values are sorted.

Mode

the value which occurs most frequently.
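The three measures of central tendency (mean, median, and mode) can be computed with Python's standard statistics module. The scores are hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical test scores.
scores = [70, 85, 85, 90, 100]

average = mean(scores)        # sum of the values divided by their count
middle = median(scores)       # middle value of the sorted data
most_common = mode(scores)    # value that occurs most often
```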

Frequency

second type of descriptive statistics; it’s a measure of how frequently something happens.
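A frequency table can be sketched with collections.Counter. The survey responses are made up, and the relative frequency column is an addition for illustration.

```python
from collections import Counter

# Hypothetical survey responses.
responses = ["yes", "no", "yes", "yes", "maybe", "no"]

freq = Counter(responses)  # how many times each response occurs
# Relative frequency: the share of all responses taken by each answer.
relative = {answer: count / len(responses) for answer, count in freq.items()}
```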

Measure of Position

third type of descriptive statistics; it includes quartile and percentile ranks. Essentially, this type of descriptive statistical analysis helps to describe how different data points relate to each other. The measure of position is best used to compare data points with one another.
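Quartiles and a percentile rank might be computed as below. The helper percentile_rank is hypothetical and uses one common definition (the percentage of values at or below a score); other definitions exist.

```python
from statistics import quantiles

def percentile_rank(values, score):
    """Percentage of values that are at or below the given score."""
    at_or_below = sum(1 for v in values if v <= score)
    return 100 * at_or_below / len(values)

# Hypothetical test scores.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

q1, q2, q3 = quantiles(scores, n=4)  # cut points that split the data in four
rank_of_80 = percentile_rank(scores, 80)
```

quantiles uses its default "exclusive" method here; passing method="inclusive" gives slightly different cut points.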

Variation or Dispersion

fourth and final type of descriptive statistics; most commonly used to determine the range of values that the data covers by identifying the maximum and minimum values.
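The range follows directly from the minimum and maximum. The sketch below uses hypothetical temperatures and also computes the population standard deviation, another common dispersion measure that the card does not mention.

```python
from statistics import pstdev

# Hypothetical daily temperatures.
temps = [18, 21, 19, 25, 30, 22]

lowest, highest = min(temps), max(temps)
value_range = highest - lowest  # spread between the extremes

# Additional measure (not from the card): population standard deviation.
spread = pstdev(temps)
```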

Data visualization

taking the data you have and converting it into a more visual form.

Stephanie Evergreen and Ann Emery (2014)

they provided a strategy checklist to enhance the user’s experience for data visualization.

The five key ideas when designing visualizations, according to Evergreen and Emery:

(1) supporting text description

(2) arrangement

(3) colors

(4) lines

(5) overall meaning

Supporting Text Description

Adding a text description to support the visualization may help the user. The purpose of the text is to clarify the graphics.

Arrangement

Improper arrangement of graph elements can confuse viewers at best and mislead them at worst. The goal of arrangement is to get the viewer to focus on the substance of the visualization rather than on how the visualization was developed.

Colors

Colors are an important part of any visualization, and we must think about them when we apply visualization to statistical analysis. Colors are the visual perceptual properties corresponding to the categories called red, blue, yellow, and others. According to Evergreen and Emery (2014), colors are used to highlight key patterns. Action colors should guide the viewer to key parts of the display. Less important or supporting information should be in muted colors: mix the color with white or grey to make it less bright.
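One way to apply the action-versus-muted idea is to assign colors from the data. In this sketch the function name, the hex codes, and the choice to highlight the largest value are all assumptions for illustration.

```python
# Hypothetical palette: one strong action color and one muted grey.
ACTION_COLOR = "#d62728"
MUTED_COLOR = "#b0b0b0"

def highlight_colors(values):
    """Give the largest value the action color and every other value grey."""
    peak = max(values)
    return [ACTION_COLOR if v == peak else MUTED_COLOR for v in values]

# Hypothetical quarterly sales figures.
sales = [12, 30, 21, 18]
bar_colors = highlight_colors(sales)
```

A list like bar_colors can be handed to a plotting call, for example as the color argument of a matplotlib bar chart.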

Lines

Also an important part of the visualization. Excessive use of lines (gridlines, borders, tick marks, and axes) can add clutter or noise to a graph, so eliminate them whenever they are not useful for interpreting the data.
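Removing unneeded lines might look like the sketch below, which assumes matplotlib and made-up data. Which lines count as clutter is a judgment call that depends on the chart.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [3, 5, 4, 6])  # hypothetical data

ax.grid(False)                       # no gridlines
for side in ("top", "right"):
    ax.spines[side].set_visible(False)  # drop the border lines not used as axes
ax.tick_params(length=0)             # hide tick marks but keep the tick labels
```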

Overall Meaning

While the meaning of a visualization can be difficult to determine, Evergreen and Emery recommend providing more detail to help the user better understand the visualization.
