CP321 Knowledge Flashcards


Last updated 12:53 AM on 4/17/26

157 Terms

1
New cards

3 Types of Information Visualization Models

Theoretical Models - explain underlying principles of how people understand visual data (pattern, color) “WHY”
Descriptive Models - create taxonomies (lists/classes) of chart types, data types “WHAT”
Prescriptive Models - how to design and evaluate a visualization “HOW”

2
New cards

CLT (Cognitive Load Theory) in Data Visualization (Theoretical Models)

CLT says human working (short-term) memory is limited, so:
- Present information in a comprehensible manner
- Respect these limitations and make use of the affordances CLT suggests about people's ability to take in information

3
New cards

Multimedia Principle (Theoretical Models)

People learn better from words and pictures than from words alone

4
New cards

Split attention principle (Theoretical Models)

Avoid formats that require learners to split their attention between, and mentally integrate, multiple sources of information

5
New cards

Modality Principle (Theoretical Models)

People learn better from graphics and narration (audio) than from graphics and on-screen text

6
New cards

Redundancy Principle (Theoretical Models)

Do not present the same information in several forms at once (e.g., narration plus identical on-screen text)

7
New cards

Spatial Contiguity Principle (Theoretical Models)

People learn more deeply from a multimedia message when corresponding words and pictures are presented near rather than far from each other on the page or screen

8
New cards

Temporal Contiguity Principle (Theoretical Models)

People learn better when corresponding words and pictures are presented simultaneously rather than successively

9
New cards

Coherence Principle (Theoretical Models)

People learn more deeply from a multimedia message when extraneous material is excluded rather than included

10
New cards

Signaling Principle (Theoretical Models)

People learn more deeply from a multimedia message when cues are added that highlight the critical aspects of the presented information (headings, highlighting)

11
New cards

Personalization Principle (Theoretical Models)

People learn more deeply when the words in a multimedia presentation are in conversational style rather than formal style.

12
New cards

Pre-training Principle (Theoretical Models)

People learn more deeply from a multimedia message when they know the names and characteristics of the main concepts

13
New cards

Abstraction

Removes unnecessary details based on the task

14
New cards

ASSERT Model (Prescriptive Models)

Ask → Search → Structure → Envision → Represent → Tell

15
New cards

Which model would you trust more? Why?

We pick the simpler model
Based on CLT, our working memory is limited

Too many elements at once overload it:
Too many colours → Higher cognitive load
Too many charts → Split attention

16
New cards

Which map would you use to drive?

Map 1

Abstraction removes details based on the task

17
New cards

Good visualizations are driven by ______

good questions

18
New cards

Turning Topics into Questions (Ask ASSERT)

Questions that offer more complex answers than a simple yes or no tend to be more interesting pursuits

ex: “Was slavery the cause of the Civil War?” → “What degree of economic advantage did slavery offer the South to maintain it?”

19
New cards

Strategies for methodically generating and assessing the quality of questions (Ask ASSERT)

The three-part query
KWL

20
New cards

Three-part query (Ask ASSERT)

identifying topic, questions, and reason to care

We are looking at [topic] because we want to find [who/where/when/why/what/how] in order for our audience to understand [significance/reason]

21
New cards

KWL (Ask ASSERT)

What we already KNOW
What we WANT to learn
What we have LEARNED

22
New cards

CPS (Creative Problem-Solving) (Ask ASSERT)

Generate ideas (options) then systematically evaluate the options

23
New cards

CPS tools

Brainstorming, Group discussion, Concept map

24
New cards

Types of Raw Data

Numbers (populations, rainfall amounts)
Documents (court records, birth and death records)
Multimedia (photos, drawings, audio)

25
New cards

Types of Data Sources

Primary sources: taken directly from the event, such as a first-hand witness
Secondary sources: books and resources that aggregate info from primary sources
Tertiary sources: encyclopedias and similar works that draw on primary and secondary sources to provide a broad but shallow overview

26
New cards

Effective Data

knowt flashcard image
27
New cards

Types of Data

Quantitative Continuous → Continuous Scale (1.3, 5.7, 83)
Quantitative Discrete → Discrete Scale (1, 2, 3, 4) (Fixed Units)
Qualitative Unordered → Discrete Scale (dog, cat, fish)
Qualitative Ordered → Discrete Scale (Good, Fair, Poor)
Date or time → Continuous or Discrete
Text → None or Discrete

28
New cards

Continuous vs Discrete

Continuous = can take infinitely many values within a range

Discrete = countable values in fixed steps, often whole numbers

(0.5, 1.0, 1.5 could be discrete if no intermediate values can exist)

29
New cards

Which of the following variables is quantitative and continuous?

A. Number of students in a class

B. Temperature measured in Celsius

C. Types of animals (dog, cat, fish)

D. Satisfaction level (good, fair, poor)

B. Temperature measured in Celsius

A is Quantitative but discrete (you can’t have 25.5 students)

30
New cards

Which variable is quantitative but discrete?

A. Height of a person

B. Time of day

C. Number of emails received per day

D. Color of a car

C. Number of emails received per day

31
New cards

Which example represents qualitative categorical unordered data?

A. Rankings (1st, 2nd, 3rd)

B. Exam scores

C. Weather type (sunny, rainy, snowy)

D. Dates on a calendar

C. Weather type (sunny, rainy, snowy)

32
New cards

Which option correctly classifies the variables below?

• Number of website visits per day

• Satisfaction level (poor, fair, good)

• Exact time of a transaction

A. Discrete – Unordered – Discrete

B. Discrete – Ordered – Continuous

C. Continuous – Ordered – Discrete

D. Discrete – Continuous – Ordered

B. Discrete – Ordered – Continuous

33
New cards

A Scale must be one-to-one

Each data value → one unique visual

No overlaps or confusion

1 → circle

2 → square

3 → diamond
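
A minimal sketch of checking that a scale is one-to-one. The function name `is_one_to_one` and the `broken_scale` example are hypothetical, added only for illustration:

```python
def is_one_to_one(scale):
    """Return True if no two data values map to the same visual value."""
    visuals = list(scale.values())
    return len(visuals) == len(set(visuals))


# The scale from the card: each data value gets its own shape.
shape_scale = {1: "circle", 2: "square", 3: "diamond"}

# Hypothetical broken scale: values 1 and 3 share a shape, so a reader
# cannot tell them apart.
broken_scale = {1: "circle", 2: "square", 3: "circle"}

print(is_one_to_one(shape_scale))
print(is_one_to_one(broken_scale))
```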

34
New cards

What if we want to visualize highly skewed data?

Use a nonlinear axis (a different scale, such as log or square-root): even spacing in data units corresponds to uneven spacing in the visualization

35
New cards

log-scale

Issue: Doesn’t allow 0 (the log of 0 is undefined)

36
New cards

square-root scale

Like a log scale, compresses larger numbers into a smaller range
Unlike a log scale, allows the presence of 0
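
A small sketch comparing where values land on a square-root axis versus a log axis. The helper names `sqrt_scale` and `log_scale` and the sample values are assumptions for illustration:

```python
import math


def sqrt_scale(value):
    """Position on a square-root axis; defined for 0 and above."""
    return math.sqrt(value)


def log_scale(value):
    """Position on a base-10 log axis; undefined for 0 and below."""
    if value <= 0:
        raise ValueError("a log scale cannot place values <= 0")
    return math.log10(value)


# Hypothetical skewed data spanning several orders of magnitude.
for value in [0, 1, 10, 100, 1000, 10000]:
    sqrt_pos = round(sqrt_scale(value), 2)
    log_pos = round(log_scale(value), 2) if value > 0 else "undefined"
    print(value, sqrt_pos, log_pos)
```

Both scales pull large values closer together, but only the square-root scale has a position for 0.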

37
New cards

Polar Coordinate System

Pole (Center Point, usually 0,0)
Radius (Distance from pole)
Polar Angle (angle, in deg or rad)
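
A minimal sketch of how a polar point (radius, polar angle) relates to Cartesian x/y, assuming the pole sits at (0, 0) and the angle is measured counterclockwise from the positive x-axis. The function name is hypothetical:

```python
import math


def polar_to_cartesian(radius, angle_deg):
    """Convert (radius, polar angle in degrees) to (x, y), pole at (0, 0)."""
    angle_rad = math.radians(angle_deg)
    return radius * math.cos(angle_rad), radius * math.sin(angle_rad)


x, y = polar_to_cartesian(2.0, 90.0)
print(round(x, 6), round(y, 6))
```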

38
New cards

Geospatial Data

Maybe a good question: show a map, which map type is this?

39
New cards

Most accurate way to represent numerical data?

Position (Coordinate Systems)

40
New cards

Shape and line type are limited to ___ data, while color and size can represent both ______

Shape and line type are limited to discrete data, while color and size can represent both discrete and continuous data

41
New cards

Three Fundamental Use Cases for Colour in Data Visualizations

  1. Distinguish groups of data from each other

  2. Represent data values

  3. Highlight

42
New cards

3 Types of Color Scales

Qualitative (countries on a map)
Sequential (data values, e.g. heat on a map); includes Diverging (e.g. percent identifying as white, brown → blue)
Accent (highlight specific data)

43
New cards

Colour as a tool to distinguish

Use color to distinguish discrete items or groups that don’t have order (countries on map)
- Qualitative Color Scale, finite set, should look clearly distinct
- No color should stand out relative to others

44
New cards

Color to represent data values

Use color to represent data values, such as income, temperature, or speed
Sequential Color Scale should indicate:
- Which values are larger or smaller than others
- How distant two specific values are from each other

45
New cards

Diverging Colour Scale

Same category as Sequential (represents data values), but shows deviation in two directions from a neutral midpoint

46
New cards

Color as a tool to highlight

Accent Color Scales
Highlight specific values or categories in dataset

Contain a set of subdued colors and a matching set of stronger, darker, more saturated colors

47
New cards

Which chart better highlights one or two teams while still allowing easy comparison across all teams?

A. The left chart, because assigning a unique color to each bar helps viewers remember all teams equally.

B. The left chart, because more colors always improve clarity and engagement.

C. The right chart, because using mostly neutral colors with selective highlighting creates a clear visual hierarchy.

D. The right chart, because color should never be used in bar charts.

C. The right chart, because using mostly neutral colors with selective highlighting creates a clear visual hierarchy.

48
New cards
term image

????

49
New cards

Color Scale Mapping

Categories → Qualitative
Values → Sequential or Diverging
Emphasis → Accent Colours

50
New cards

Unstructured Text → Structured Data

knowt flashcard image
51
New cards

Domain Space Abstraction

Data needs to be abstracted from the domain space to quantitative and qualitative info

52
New cards

Simple Random Sampling

Every member of the population has an equal chance of being selected

53
New cards

Systematic/Interval Sampling

Every member has an index; pick every x-th member, always using the same interval

54
New cards

Stratified Sample

Divide the population into strata (subgroups) and calculate how many people should be sampled from each subgroup

Use random or systematic to pick sample from each subgroup

55
New cards

Cluster Sampling

Divide the population into clusters (subgroups), then randomly select entire clusters
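
The four sampling methods (simple random, systematic, stratified, cluster) can be sketched side by side with Python's standard `random` module. All function names, the population of 100 indices, and the strata and cluster groupings are hypothetical, and proportional allocation is assumed for the stratified case:

```python
import random


def simple_random_sample(population, n, rng):
    """Every member has an equal chance of being selected."""
    return rng.sample(population, n)


def systematic_sample(population, step, rng):
    """Pick a random start, then every step-th member (fixed interval)."""
    start = rng.randrange(step)
    return population[start::step]


def stratified_sample(strata, fraction, rng):
    """Sample the same fraction at random from within each stratum."""
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(100))
strata = {"first_year": list(range(0, 60)), "upper_year": list(range(60, 100))}
clusters = {name: list(range(i * 25, (i + 1) * 25)) for i, name in enumerate("ABCD")}

print(len(simple_random_sample(population, 10, rng)))
print(systematic_sample(population, 10, rng))
print(len(stratified_sample(strata, 0.1, rng)))
print(len(cluster_sample(clusters, 2, rng)))
```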

56
New cards

Data Profiling

Computing summary statistics on the data before cleaning, to get a good idea of data quality (part of the data cleaning process)

57
New cards

Data Cleaning actions (TSTMO)

Type conversion
Standardize
Transformation
Missing values: (1) drop (2) Flag
Outliers: innocent until proven guilty
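
A rough sketch of three of these actions (type conversion, standardizing, and flagging or dropping missing values) on plain Python dictionaries. The records, field names, and `clean_row` helper are hypothetical:

```python
# Hypothetical raw records: everything arrives as text, with messy
# capitalization, stray spaces, and one missing temperature.
raw_rows = [
    {"city": " toronto ", "temp_c": "21.5"},
    {"city": "TORONTO", "temp_c": ""},
    {"city": "Waterloo", "temp_c": "19"},
]


def clean_row(row):
    city = row["city"].strip().title()    # standardize spelling of labels
    text = row["temp_c"].strip()
    temp = float(text) if text else None  # type conversion: text -> number
    # Missing values, option 2: keep the row but flag it.
    return {"city": city, "temp_c": temp, "temp_missing": temp is None}


cleaned = [clean_row(row) for row in raw_rows]

# Missing values, option 1: drop rows with no temperature.
complete = [row for row in cleaned if not row["temp_missing"]]

print(cleaned)
print(len(complete))
```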

58
New cards

Join Types

Inner Join
Left Join
Right Join
Full Join
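
A sketch of what each join type keeps, using two plain dictionaries keyed by an id. The `join` helper and the sample data are hypothetical. In pandas the same idea is expressed with `DataFrame.merge(..., how=...)`, where a full join is spelled `how="outer"`:

```python
def join(left, right, how="inner"):
    """Join two dicts on their keys; a missing side is filled with None."""
    if how == "inner":    # keys present in both tables
        keys = [k for k in left if k in right]
    elif how == "left":   # every key from the left table
        keys = list(left)
    elif how == "right":  # every key from the right table
        keys = list(right)
    elif how == "full":   # every key from either table
        keys = list(left) + [k for k in right if k not in left]
    else:
        raise ValueError(f"unknown join type: {how}")
    return {k: (left.get(k), right.get(k)) for k in keys}


students = {1: "Ana", 2: "Ben", 3: "Cy"}
grades = {2: 88, 3: 92, 4: 75}

for how in ["inner", "left", "right", "full"]:
    print(how, join(students, grades, how))
```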

59
New cards

Dataframe

A Python object in pandas, essentially a table

60
New cards

Dataframe basic functions

knowt flashcard image
61
New cards

Bar Chart/Plot

knowt flashcard image
62
New cards

Problem commonly encountered with vertical bar charts

Labels for each bar take up a lot of horizontal space

Fix: make the bars run horizontally

63
New cards

Natural Ordering

knowt flashcard image
64
New cards

Grouped Bar Plot

For two categorical variables at the same time

65
New cards

Stacked Bar Plot

For when the sum of the stacked bars is meaningful

66
New cards

Dot Plot

knowt flashcard image
67
New cards

Heatmap

knowt flashcard image
68
New cards

Title, Captions, Annotations

Title → Only one title per graph
Caption → Brief description that appears next to the image and credits the source
Annotations → Axis labels, legend titles

69
New cards

Histogram

Displays the shape and spread of continuous sample data; its appearance depends on the bin width

70
New cards

Histogram vs Bar Chart

Bar chart: bars are positioned over labels that represent a categorical variable; gaps between bars; axis shows categories
Histogram: bar heights represent observed frequencies; no gaps between bars; axis shows a number range
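
A small sketch of why a histogram depends on bin width: the same data counted into bins of width 5 and width 10. The `histogram_counts` helper and the ages are hypothetical:

```python
def histogram_counts(data, bin_width, start):
    """Count observations per bin; keys are the left edges of the bins."""
    counts = {}
    for x in data:
        index = int((x - start) // bin_width)
        left_edge = start + index * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))


ages = [1, 2, 4, 7, 8, 12, 13, 14, 19, 21]

narrow_bins = histogram_counts(ages, bin_width=5, start=0)
wide_bins = histogram_counts(ages, bin_width=10, start=0)

print(narrow_bins)  # {0: 3, 5: 2, 10: 3, 15: 1, 20: 1}
print(wide_bins)    # {0: 5, 10: 4, 20: 1}
```

The narrower bins show a dip between 5 and 10 that the wider bins smooth away.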

71
New cards

Kernel Density Estimation

Estimates a smooth distribution; the bandwidth matters more than the kernel type

72
New cards

Density Plot

depends on bandwidth

73
New cards

Gaussian Kernel Bandwidth Meaning

Higher bandwidth = smoother curve
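
A minimal sketch of a Gaussian kernel density estimate written from scratch, to show the effect of bandwidth. The `gaussian_kde` helper and the two-cluster data are hypothetical; plotting libraries provide their own implementations:

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a density function: the average of Gaussian bumps on each point."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# Hypothetical data with two clusters, near 1 and near 4.
data = [1.0, 1.1, 1.2, 4.0, 4.1]

narrow = gaussian_kde(data, bandwidth=0.2)
wide = gaussian_kde(data, bandwidth=2.0)

# Density in the gap between the clusters (x = 2.5).
print(narrow(2.5))
print(wide(2.5))
```

With the small bandwidth the gap between the clusters stays close to zero density; with the large bandwidth the curve is smooth enough to fill the gap and hide the two peaks.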

74
New cards

ECDF Empirical cumulative distribution function

For each value x, the fraction of observations in the sample that are less than or equal to x
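
A short sketch of an ECDF, using the standard definition (the fraction of observations less than or equal to x). The `ecdf` helper and the scores are hypothetical:

```python
def ecdf(sample):
    """Return F, where F(x) is the fraction of observations <= x."""
    n = len(sample)

    def F(x):
        return sum(1 for xi in sample if xi <= x) / n

    return F


scores = [55, 62, 62, 70, 88]
F = ecdf(scores)

print(F(50))  # 0.0
print(F(62))  # 0.6
print(F(88))  # 1.0
```

Unlike a histogram or density plot, this needs no bin width or bandwidth choice.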

75
New cards

Q-Q Plots (Quantile-Quantile)

knowt flashcard image
76
New cards

Boxplot

knowt flashcard image
77
New cards

Boxplot Key

knowt flashcard image
78
New cards

Violin Plots

knowt flashcard image
79
New cards

Violin Plot Key

knowt flashcard image
80
New cards

Strip Chart

Plot all individual data points of the variable directly

81
New cards

Jittering

Something you can do with strip charts to avoid overplotting (plotting points on top of each other): add a small random offset to each point's position
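
A minimal sketch of jittering, assuming the random offset is applied only along the grouping axis so the data values themselves stay untouched. The `jitter` helper and the numbers are hypothetical:

```python
import random


def jitter(positions, width, rng):
    """Shift each plotting position by a random offset in [-width, width]."""
    return [p + rng.uniform(-width, width) for p in positions]


rng = random.Random(42)  # fixed seed so the figure can be recreated

# Five points in group 1; several share the same value and would overlap.
group_x = [1, 1, 1, 1, 1]
values = [3.0, 3.0, 3.0, 3.1, 3.1]

jittered_x = jitter(group_x, width=0.2, rng=rng)

# Only the grouping-axis positions move; the plotted values are unchanged.
for x, y in zip(jittered_x, values):
    print(round(x, 3), y)
```

A larger `width` spreads points further and can change how dense a region appears.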

82
New cards

Sina plot

Hybrid between violin plot and jittered points

83
New cards

Ridgeline Plot

Visualizing multiple distributions along the horizontal axis

84
New cards

Reproducible vs Repeatable

Reproducible: the overarching finding of the work remains the same when a different group does it
Repeatable: very similar or identical measurements can be obtained by the same person repeating the same procedure on the same equipment

85
New cards

A visualization is reproducible if…

the plotted data are available and any data transformations that may have been applied are exactly specified.

86
New cards

A visualization is repeatable…

if it is possible to recreate the exact same visual appearance, down to the last pixel, from the raw data.

87
New cards

Two distinct phases of data visualization

Data exploration → how we want to visualize, transformations, type of plot
Data presentation → prepare actual figure, use software, reproducing and repeating

88
New cards

Separation of content and design

Content → specific dataset, data transformations, mappings, scales, ranges
Design → foreground and background colours, fonts, shapes, placements

89
New cards

What is the main advantage of visualizing full distributions instead of only means with error bars?
A. It reduces visual clutter

B. It eliminates the need for grouping variables

C. It reveals shape, spread, and potential multimodality

D. It guarantees statistical significance

C. It reveals shape, spread, and potential multimodality

90
New cards

Why can jittering improve a strip chart but also introduce risk if overused?
A. It modifies the underlying data values

B. It can distort perceived density patterns

C. It hides the grouping variable

D. It removes outliers

B. It can distort perceived density patterns

If you overdo it, points get pushed too far apart, which can make areas look more or less dense than they really are

91
New cards

The two plots below show exam score distributions for three classes (A, B, and C). Which plot is more helpful for identifying which class has the highest median exam score, and why?

A) Plot 1, because it shows every individual data point very clearly.

B) Plot 2, because it clearly summarizes the center and spread of the distributions.

B) Plot 2, because it clearly summarizes the center and spread of the distributions.

92
New cards

Nested Pie Chart

From broader categories on the inside to more specific ones on the outside

93
New cards

Mosaic Plot

knowt flashcard image
94
New cards

Treemap

Type of mosaic chart

95
New cards

Parallel Sets

knowt flashcard image
96
New cards

Proportional ink

Visual size MUST match the data

97
New cards

Nested Data can be shown using (4)

Mosaic Plots
Treemaps
Nested Pies
Parallel sets

98
New cards

Side-by-side bar charts are best for

accurate comparison of proportions

99
New cards

Pie charts and stacked bars show composition but are weak for _____

precise comparisons

100
New cards

Visualizing Associations

How quantitative variables relate to each other
2 Variables (ex: height and weight) → scatter plot
>2 Variables → bubble chart, scatter plot matrix, correlogram
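
A correlogram displays pairwise correlation coefficients. As a rough sketch, the Pearson coefficient for one pair of variables can be computed as below; the `pearson_r` helper and the height and weight numbers are hypothetical:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up measurements for illustration only.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 77, 85]

print(round(pearson_r(heights_cm, weights_kg), 3))
```

A value near 1 or -1 indicates a strong linear association; a value near 0 indicates little linear association.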