IDSC 4210 Midterm Review


88 Terms

1
New cards

One goal of data visualization

reduce the “time to insight”

2
New cards

For data analysts, visualization helps with:

  • Exploring data structure

  • Detecting Outliers and unusual Groups

  • Identifying trends and clusters

  • Presenting your results

3
New cards

Data Visualization allows you to

  • Simplify complex information (even from large datasets)

  • Enhance the decision-making process

4
New cards

Steps in Creating Data Visualization

  1. What is your point: identify business objective (hypothesis)

  2. How can you emphasize your point in your graph: identify data types, then choose graph type (proof)

  3. What does the final graph show exactly: aesthetic mapping (color, shape, size) (explaining)

5
New cards

What is the 15 second rule?

Data visualizations should convey meaning in 15 seconds or less.

6
New cards

How quickly is a first impression made

7 seconds or less

7
New cards

select()

select variables (columns)

8
New cards

filter()

keep rows that pass a logical test

9
New cards

arrange()

order rows by variable

10
New cards

mutate()

create new variable

11
New cards

summarize()

aggregate the data

12
New cards

group_by()

group rows by variable
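
The six verbs above can be sketched on a small example. This is a minimal illustration, assuming the dplyr package is installed; the built-in mtcars data set and the column name hp_per_wt are choices made here for illustration, not part of the course material.

```r
library(dplyr)

# mtcars ships with base R; each verb takes a data frame and returns a new one.
cars_small <- select(mtcars, mpg, cyl, hp, wt)     # select(): keep four columns
four_cyl   <- filter(cars_small, cyl == 4)         # filter(): keep rows where cyl is 4
by_mpg     <- arrange(four_cyl, mpg)               # arrange(): sort rows by mpg, ascending
with_ratio <- mutate(by_mpg, hp_per_wt = hp / wt)  # mutate(): add a derived column

# group_by() + summarize(): collapse to one summary row per cylinder count
mpg_by_cyl <- summarize(group_by(cars_small, cyl),
                        mean_mpg = mean(mpg),
                        n_cars   = n())
print(mpg_by_cyl)
```

None of the verbs modifies mtcars itself; each result has to be assigned to a name to be kept.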

13
New cards

starts_with()

select based on prefix

14
New cards

ends_with()

select based on suffix

15
New cards

num_range()

select based on prefix & numeric range

16
New cards

contains()

selects variables whose names contain a literal string

17
New cards

matches()

more general matching using regular expressions

18
New cards

one_of()

selects columns from a character vector of names (superseded by any_of() and all_of() in current tidyselect)
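
The selection helpers on the preceding cards can be tried side by side. A minimal sketch, assuming dplyr is installed; it uses the built-in iris data, and the one-row data frame toy is a made-up example that exists only to give num_range() something to match.

```r
library(dplyr)

# Helpers are used inside select() and match on column names, not on values.
sepal_cols  <- names(select(iris, starts_with("Sepal")))   # prefix
width_cols  <- names(select(iris, ends_with("Width")))     # suffix
petal_cols  <- names(select(iris, contains("etal")))       # literal substring
regex_cols  <- names(select(iris, matches("^Petal\\.")))   # regular expression
listed_cols <- names(select(iris, one_of(c("Species", "Petal.Width"))))

# Hypothetical toy data: columns x1..x3 share a prefix and a numeric suffix.
toy <- data.frame(x1 = 1, x2 = 2, x3 = 3, y = 4)
range_cols <- names(select(toy, num_range("x", 1:2)))      # x1 and x2 only

print(sepal_cols)
print(range_cols)
```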

19
New cards

filter(data, logical test)

Comparison operators for the logical test: ==, >, >=, <, <=, != (not equal to)

20
New cards

%>%

pipes: used to chain commands

21
New cards

desc()

sort variable in descending order
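
Logical tests, the pipe, and desc() usually appear together. The sketch below is one way to chain them, assuming dplyr is installed; the mtcars data and the 30 mpg cutoff are arbitrary choices for illustration.

```r
library(dplyr)

# Read the pipe as "then": take mtcars, then filter, then select, then sort.
top_four_cyl <- mtcars %>%
  filter(cyl == 4, mpg >= 30) %>%   # both logical tests must be TRUE
  select(mpg, hp, wt) %>%
  arrange(desc(mpg))                # desc() sorts from highest to lowest

print(top_four_cyl)
```

Without the pipe the same steps would be nested inside one another, which reads from the inside out.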

22
New cards

Pre-attentive attributes

Visual properties such as color, form, and position that viewers notice without conscious effort; generally the best way to present data because the patterns register before any deliberate processing

23
New cards

What do pre-attentive attributes do?

  • Draw the user's attention to the intended place

  • Create visual hierarchy

24
New cards

Pre-attentive attributes for bar charts

Form (length, width), color

25
New cards

Pre-attentive attributes for line graphs

Orientation, length, curvature, color, shape

26
New cards

Pre-attentive attributes for scatter plots

2D position, color, form (shape, size)

27
New cards

What is the most important color consideration in data visualization?

Contrast

28
New cards

Color Best Practice

Limit number of colors

29
New cards

What is the 60-30-10 rule?

Use 60% of a neutral color, 30% of a supplementary color, and 10% of a color that pops

30
New cards

Color in Mobile apps?

you can just use one single primary color with different brightness and saturations

31
New cards

Sequential Color in data visualization

used when data is ordered from low to high; typically one hue that runs from light to dark

32
New cards

Diverging Color

two sequential colors with a neutral midpoint

33
New cards

Categorical Color

contrasting colors for individual comparison
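
Sequential, diverging, and categorical color each map to a different kind of ggplot2 scale. A minimal sketch, assuming ggplot2 is installed (with its RColorBrewer palettes); the mtcars data, the palette names, and the choice of mean mpg as the neutral midpoint are illustrative assumptions.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Sequential: one hue from light to dark for an ordered measure (horsepower)
p_seq <- base + geom_point(aes(color = hp)) +
  scale_color_distiller(palette = "Blues", direction = 1)

# Diverging: two hues that meet at a neutral midpoint (0 = the mean mpg)
p_div <- base + geom_point(aes(color = mpg - mean(mpg))) +
  scale_color_gradient2(low = "firebrick", mid = "grey90", high = "steelblue",
                        midpoint = 0, name = "mpg vs. mean")

# Categorical: contrasting hues for unordered groups (cylinder count)
p_cat <- base + geom_point(aes(color = factor(cyl))) +
  scale_color_brewer(palette = "Set2", name = "Cylinders")
```

Printing any of the three objects (for example print(p_seq)) draws the plot.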

34
New cards

Highlight Color

color used to highlight something

35
New cards

Alert Color

color used to get the reader's attention

36
New cards

Tufte’s 3 design principles

  1. Graphical integrity: the graph must be an honest portrayal of the data

  2. Maximize the data-ink ratio

  3. Avoid chartjunk: every element must be necessary

37
New cards

4 data types

  • Nominal/Categorical Data

  • Ordinal Data

  • Interval Data

  • Ratio Data

38
New cards

Nominal (categorical) Data

  • Data placed in categories according to a specified characteristic 

  • Categories bear no quantitative relationship to one another 

39
New cards

Ordinal Data

  • Data that is ranked or ordered according to some relationship with one another 

  • No fixed units of measurement (the athlete that wins gold could be way better than the silver but in terms of rank it won’t matter) 

40
New cards

Interval Data

  • Ordinal data but with constant differences between observations 

  • No true zero point: 0 degrees (Celsius or Fahrenheit) does not mean an absence of temperature

  • Ratios are not meaningful

41
New cards

Ratio Data

  • Continuous values that have a natural, meaningful zero point

  • Ratios are meaningful
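
The four data types can be mirrored in base R. This is a small sketch under assumed values: the team colors, medals, and temperatures are made up purely to show how each type behaves.

```r
# Nominal: unordered categories; comparing with < or > makes no sense.
team <- factor(c("Red", "Blue", "Red"))

# Ordinal: ordered categories; rank comparisons work, distances do not.
medal <- factor(c("gold", "bronze", "silver"),
                levels = c("bronze", "silver", "gold"), ordered = TRUE)
gold_beats_silver <- medal[1] > medal[3]   # TRUE: gold ranks above silver

# Interval: differences are meaningful, ratios are not (no true zero).
temp_c <- c(10, 20)
diff_c <- temp_c[2] - temp_c[1]            # 10 degrees warmer: meaningful

# Ratio: Kelvin has a true zero, so ratios become meaningful.
temp_k  <- temp_c + 273.15
ratio_k <- temp_k[2] / temp_k[1]           # about 1.035, not "twice as hot"

print(c(ordered_team = is.ordered(team), ordered_medal = is.ordered(medal)))
print(ratio_k)
```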

42
New cards

In Tableau, which data types are dimensions?

Nominal (categorical) and Ordinal

43
New cards

In Tableau, which data types would be considered measures?

Interval and Ratio Data

44
New cards

What are line graphs used to convey

TREND DATA

y is a measure, x is a time series (dimension)

45
New cards

What are scatterplots used to convey?

CORRELATIONS

x and y are measures

46
New cards

What are bar graphs used to convey?

MEASURES ACROSS DIMENSIONS

y is measure

x is a dimension

Histogram: distribution of a single measure

47
New cards

aes()

Aesthetic mappings: which variables map to attributes such as x, y, color, size, and shape

48
New cards

geom_*()

Geometric objects (for example geom_point(), geom_bar(), geom_line())

49
New cards

stat_*()

Statistical transformations (for example stat_summary(), stat_smooth())

50
New cards

coord_*()

Coordinate systems (for example coord_flip(), coord_polar())

51
New cards

facet_*()

Facets: one panel per value of a variable (facet_grid(), facet_wrap())

52
New cards

labs()

Labels: title, axis labels, and legend titles
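
The grammar pieces on the cards above stack with + into a single plot. A minimal sketch, assuming ggplot2 is installed; the mtcars data and every label are illustrative choices, not taken from the course.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +                  # aes(): map columns to x and y
  geom_point(aes(color = factor(cyl))) +                     # geom: one point per car
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # stat: fitted straight line
  coord_cartesian(ylim = c(10, 35)) +                        # coord: zoom the y axis
  facet_wrap(~ am) +                                         # facet: one panel per transmission
  labs(title = "Fuel economy by weight",                     # labs: titles and legend name
       x = "Weight (1000 lbs)", y = "Miles per gallon",
       color = "Cylinders")
```

Each + adds one layer or setting; print(p) draws the result.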

53
New cards

Design Thinking

a philosophy and toolkit that helps you solve
problems through a creative, and human-centered lens.

54
New cards

What is Step 1 in design thinking?

Empathize: Understand people/organizations, within the given context.
It is your effort to understand the way they do things

55
New cards

What is step 2 in design thinking?

Define: Define the problem you are taking on, based on what you have
learned about your user and about the context in Step 1

56
New cards

What is step 3 in design thinking?

Ideate: Suggest ideas that can help the organization find a solution or insight
(Hypotheses Generation).

57
New cards

What is step 4 in design thinking?

Prototype: Generate artifacts that can provide an answer or insight that solves the
problem

58
New cards

What is step 5 in design thinking?

Test: Refine the prototype.

59
New cards

How do you apply design thinking to data visualization?

knowt flashcard image
60
New cards

alpha

A solution to overplotting: the alpha argument (0 = fully transparent, 1 = opaque) adds transparency to the data points, which helps reveal overlapping data

61
New cards

What does this code return?
prvfc + geom_point(alpha=.6, aes(color=QTY))

knowt flashcard image
62
New cards

What does the following code return?
prvfc + geom_point(alpha=.6, aes(color=QTY)) + scale_color_distiller(palette = "Spectral", name = "#Tickets")


knowt flashcard image
63
New cards

What does this code return?
prvfc + geom_point(alpha=.3, aes(size=QTY)) + scale_size(breaks=c(1,5,15))

knowt flashcard image
64
New cards

What does this code return?
tpg + geom_bar(stat="summary", fun="mean", aes(fill=Game), position="dodge") + scale_fill_manual(values=c(Premium="Green", Regular="Orange", Value="Red")) + theme(axis.text.x = element_text(angle=45, hjust=1, size=6))


knowt flashcard image
65
New cards
66
New cards

What does this code return?
prvfc + geom_point(alpha=.6, aes(color=QTY)) + scale_color_gradient(low="Pink", high="Purple") + facet_grid(~Game)


knowt flashcard image
67
New cards

What does this code return?
pg + geom_bar(stat="summary", fun="mean", aes(fill=Game), position="dodge") + theme_bw() + coord_flip()


knowt flashcard image
68
New cards

How many categories can you compare in a bar graph?

No more than 7 categories

69
New cards

How many categories can a horizontal bar chart have?

7-15 categories, sorted in ascending order.

70
New cards

What is a correlation?

a statistical measure that indicates the
strength and direction of the relationship between two
quantitative variables, ranging from -1 to 1

71
New cards

What does a correlation of zero mean?

no correlation, random assortment of points

72
New cards

What does a correlation of 1 or -1 mean?

Perfect correlation: all points fall on a single straight line (rising for 1, falling for -1)
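
The correlation cards can be checked in base R with cor(). A small sketch; the mtcars columns and the made-up vector x are illustrative choices.

```r
# Real data: heavier cars tend to get fewer miles per gallon.
r_wt_mpg <- cor(mtcars$wt, mtcars$mpg)   # strong negative correlation

# Constructed data: points that sit exactly on a line give +1 or -1.
x <- 1:10
r_up   <- cor(x,  2 * x + 1)   # rising line
r_down <- cor(x, -3 * x)       # falling line

print(round(c(r_wt_mpg, r_up, r_down), 3))
```

The sign gives the direction of the relationship and the absolute value gives its strength.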

73
New cards

What gives the magnitude of the slope of a line?

The slope coefficient m in y = mx + b

74
New cards

R-squared (coefficient of determination)

  • % of the variability in the dependent variable Y explained by the independent variable X 

  • In essence, it shows how well X explains the variation in Y 

  • If R^2 = 0.75, 75% of the variation in Y can be explained by X 

  • The remaining 25% is due to other factors 
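
Slope and R-squared both come out of a simple linear regression. A minimal base R sketch, assuming mtcars as an illustrative data set with weight as X and miles per gallon as Y.

```r
# Fit y = mx + b with mpg as Y and weight as X.
fit <- lm(mpg ~ wt, data = mtcars)

slope     <- coef(fit)[["wt"]]           # m: change in mpg per extra 1000 lbs
intercept <- coef(fit)[["(Intercept)"]]  # b
r_squared <- summary(fit)$r.squared      # share of the variation in mpg explained by wt

# With a single predictor, R-squared equals the squared correlation.
same_as_cor <- isTRUE(all.equal(r_squared, cor(mtcars$wt, mtcars$mpg)^2))

print(round(c(slope = slope, intercept = intercept, r_squared = r_squared), 3))
```

Here roughly three quarters of the variation in mpg is explained by weight; the rest is due to other factors.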

75
New cards

Tricks for overplotting

  • First and foremost, check the stats: correlation and regression (this tells us about the relationship)

  • Take a sub-sample

  • Gradients (use alpha = )

  • No fill

  • Jittering

  • Faceting
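
The tricks in this list can be compared on one data set. A sketch, assuming dplyr and ggplot2 are installed; it uses the diamonds data bundled with ggplot2 (about 54,000 rows, so a plain scatter plot overplots badly), and the seed, alpha, and jitter values are arbitrary illustrative choices.

```r
library(dplyr)
library(ggplot2)

set.seed(4210)  # arbitrary seed so the sub-sample is reproducible

# Stats first: the correlation describes the relationship before any plotting.
r <- cor(diamonds$carat, diamonds$price)

base <- ggplot(diamonds, aes(x = carat, y = price))

p_alpha  <- base + geom_point(alpha = 0.1)                  # gradient via transparency
p_hollow <- base + geom_point(shape = 21)                   # hollow circles, no fill
p_jitter <- base + geom_jitter(width = 0.05, height = 0,    # small random horizontal shift
                               alpha = 0.1)
p_facet  <- base + geom_point(alpha = 0.1) +                # one panel per cut quality
  facet_grid(. ~ cut)

# Sub-sample: plot 1000 randomly chosen rows instead of all of them.
small    <- sample_n(diamonds, 1000)
p_sample <- ggplot(small, aes(x = carat, y = price)) + geom_point()
```

Printing each object in turn shows how much structure each trick recovers.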


76
New cards

What will this code return?
aialpha<- exp %>% ggplot(aes(x=age, y=Income))+geom_point(alpha=.3)
aialpha

gradient

77
New cards

What does this code return?
expsample <- sample_n(exp,1000)
aisam <- expsample %>% ggplot(aes(x=age, y=Income))+geom_point()+geom_smo
()

Sampling: a quick and easy trick for overcoming overplotting

78
New cards

What does this code return?
aismall <- exp %>% ggplot(aes(x=age, y=Income)) + geom_point(shape=21)


no fill (often the best choice)

79
New cards

What does this code return?
aijitter <- exp %>% ggplot(aes(x=age, y=Income)) + geom_jitter(height=8, width=8, alpha=.3)


Jittering

80
New cards

What does this code return?
gexpf <- mutate(exp, sex=ifelse(gender==0,"F","M"))
aig <- gexpf %>% ggplot(aes(x=age, y=Income)) + geom_point(alpha=.3) + facet_grid(.~sex)


Faceting (a matrix of panels, one per value of sex)

81
New cards

Which part-to-whole chart is generally not the best choice?

Pie chart

  1. Hard to perceive area

  2. Hard to show multiple data points

  3. Hard to perceive angles

82
New cards

When there are 2 or more categories and we want to show parts-to-whole, what graph should we consider?

Stacked bar chart
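
A stacked bar chart for parts-to-whole might look like the sketch below. It assumes ggplot2 is installed and uses the mpg data bundled with it; the labels are illustrative.

```r
library(ggplot2)

# Stacked counts: each bar is a vehicle class, split by drive type.
p_stack <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "stack") +
  labs(x = "Vehicle class", y = "Number of models", fill = "Drive")

# Same data as shares: position = "fill" rescales every bar to 100%.
p_fill <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "fill") +
  labs(x = "Vehicle class", y = "Share of models", fill = "Drive")
```

The stacked version keeps the totals comparable across bars, while the filled version makes the proportions easier to compare.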

83
New cards

If you want to use a stacked column chart with time intervals, what do you have to do?

Make sure dates have the same time interval

84
New cards

What does this code return?
ggplot(tk, aes(x = reorder(Opponent, Opponent, function(x) -length(x)), fill = Reseller)) + geom_bar(position = "stack") + labs(title = "Ticket Sales by Opponent and Reseller", x = "Opponent", y = "Number of Transactions", fill = "Reseller")

knowt flashcard image
85
New cards

What does this return:orprice<-tk %>% ggplot(aes(x=reorder(Opponent,Price),y=Price)) + geom_bar(stat="summar
()

knowt flashcard image
86
New cards

When looking at a time series what visualizations should we consider?

Line graph, bar graph, dot plot

87
New cards

When do you use dot plots?

when you have irregular intervals of time,
and/or when connection of dots is inappropriate
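
The difference between a line graph and a dot plot for time series can be sketched as follows. It assumes ggplot2 is installed and uses the economics data bundled with it; the six row positions picked for the irregular series are an arbitrary illustration.

```r
library(ggplot2)

# Regular monthly series: connecting the observations with a line is appropriate.
p_line <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  labs(x = "Month", y = "Unemployed (thousands)")

# Irregularly spaced observations: dots avoid implying a trend between points.
irregular <- economics[c(1, 2, 10, 40, 41, 90), c("date", "unemploy")]
gaps      <- diff(irregular$date)   # uneven gaps between observations

p_dots <- ggplot(irregular, aes(x = date, y = unemploy)) +
  geom_point(size = 3) +
  labs(x = "Observation date", y = "Unemployed (thousands)")
```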

88
New cards