Information Visualization Final Exam Review


Machine Learning

The use and development of computer systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyze and draw inferences from patterns in data.


Types of Machine Learning

Supervised learning
Unsupervised learning
Reinforcement learning
Semi-supervised learning


Supervised learning

Uses labeled data for training so the model can make predictions.
A model is trained on a labeled dataset until it can accurately predict outcomes for new, unseen data.
The algorithm is given labeled examples with the correct answers and learns from them until it can make reliable predictions.
Imagine a teacher supervising the learning process of a student:
- The labeled training dataset: The teacher
- The machine: The student
- Learning is done repeatedly/iteratively
Areas of specialization
- Image and speech recognition
- Fraud detection
- Recommendation systems
- Medical diagnostics
- Weather forecasting
Algorithms
- K-nearest neighbors (KNN)
- Logistic Regression
- Decision Trees
- Support Vector Machine (SVM)
- Random Forest
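The algorithms listed above can be compared on a small labeled dataset. This is a minimal sketch that assumes scikit-learn is installed and uses its bundled iris dataset; the hyperparameters (`n_neighbors=5`, `max_iter=1000`, `random_state=0`) are illustrative, untuned choices. Each model learns from the labeled training split and is then scored on data it has not seen.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Labeled data: X holds the measurements, y holds the correct answers (species)
X, y = load_iris(return_X_y=True)

# Hold back 25% of the labeled data to check predictions on unseen examples
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Illustrative, untuned settings for each algorithm
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # learn from the labeled examples
    scores[name] = model.score(X_test, y_test)   # accuracy on unseen data
    print(f"{name}: {scores[name]:.2f}")
```

Iris is an easy dataset, so all five models score highly here; on real data the ranking between algorithms can differ.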


Unsupervised learning

A type of machine learning that works with unlabeled data.
It tries to learn the patterns and structure in the data on its own.
Areas of specialization
- Clustering similar data items together
  - Types of clustering
    - Partitioning
    - Hierarchical
    - Density-based methods
- Finding meaningful groups within a given dataset
- Large datasets, where grouping the data manually would be time-consuming and expensive
- Identifying hidden relationships that may not be immediately obvious
- Customer segmentation
- Discovering new patterns

Algorithms
- K-means clustering
  - Partitions the dataset into k clusters, where k is chosen in advance (for example, with the elbow method).
  - Assigns each data point to the cluster whose mean (centroid) is nearest.
  - Recomputes each cluster's mean from the points assigned to it.
  - Repeats the assign and update steps until the assignments stop changing.
  - Advantages
    - Computationally efficient
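A minimal NumPy sketch of standard K-means (Lloyd's algorithm) makes the assign/update loop concrete. The function name `kmeans`, its parameters, and the six sample points are hypothetical; the caller chooses k, and centroids are initialized from randomly chosen data points.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Hypothetical helper: cluster `points` (shape (n, d)) into k groups."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: distance from every point to every centroid, shape (n, k)
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move (assignments are stable)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated groups of hypothetical 2-D points
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Each iteration costs roughly n × k distance computations, which is why K-means is considered computationally efficient. In practice, scikit-learn's `KMeans` adds smarter initialization and multiple restarts.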


Reinforcement learning

An agent learns by interacting with its environment.
The agent performs actions and then observes the rewards or consequences.
- Learns from mistakes.
  - Trial and error
Unlike supervised learning, there is no labeled correct answer to mimic; the agent relies on the reward signal instead.
Areas of specialization
- Simulations
- Statistical analysis
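The trial-and-error loop can be sketched with an epsilon-greedy multi-armed bandit, one of the simplest reinforcement-learning settings (a full RL problem also has changing states). The arm reward probabilities, `epsilon=0.1`, and the 2000 steps are hypothetical values, and `run_bandit` is an illustrative helper, not a library function.

```python
import random

def run_bandit(true_probs, steps=2000, epsilon=0.1, seed=0):
    """Hypothetical helper: an agent learns which arm pays off best by trial and error."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    estimates = [0.0] * n_arms   # the agent's current guess of each arm's average reward
    counts = [0] * n_arms        # how often each arm has been tried
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore: random action
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit: best guess so far
        # The environment returns a reward (1) or nothing (0); no "correct answer" is given
        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        # Incremental average: nudge the estimate toward the observed reward
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward

estimates, counts, total_reward = run_bandit([0.2, 0.5, 0.8])
print("Estimated payoffs:", [round(e, 2) for e in estimates])
print("Times each arm was chosen:", counts)
```

The agent is never told which arm is best; it discovers it from the rewards, and mistakes early on (pulling weaker arms) are what drive the learning.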


Semi-supervised learning

Combines the benefits of supervised and unsupervised learning.
Used when obtaining a fully labeled dataset is time-consuming and/or expensive.
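A small semi-supervised sketch, assuming scikit-learn is installed: it hides about 70% of the iris labels (scikit-learn marks unlabeled samples with `-1`) and lets `LabelSpreading` propagate the remaining labels through a k-nearest-neighbor graph. The 70% fraction and `n_neighbors=7` are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)

# Pretend labels are expensive: hide roughly 70% of them
rng = np.random.default_rng(0)
hidden = rng.random(len(y)) < 0.7
y_partial = y.copy()
y_partial[hidden] = -1  # -1 marks an unlabeled sample for scikit-learn

# Few labeled points plus the structure of all points (labeled and unlabeled)
model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y_partial)

# transduction_ holds the label the model inferred for every training sample
accuracy = (model.transduction_[hidden] == y[hidden]).mean()
print(f"Accuracy on the hidden labels: {accuracy:.2f}")
```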


Machine learning using Python

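```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the iris dataset (150 flowers, 4 measurements each, 3 species)
iris = load_iris()

# Split the dataset into training and test sets (75% / 25% by default)
X_train, X_test, y_train, y_test = train_test_split(iris['data'], iris['target'], random_state=0)

# Initialize the classifier: predict using the single nearest neighbor
knn = KNeighborsClassifier(n_neighbors=1)

# Fit the model on the labeled training data
knn.fit(X_train, y_train)

# Check accuracy on the held-out test data
print("Test accuracy: ", knn.score(X_test, y_test))

# Predict the species of a new flower (sepal length, sepal width, petal length, petal width in cm)
prediction = knn.predict([[5, 2.9, 1, 0.2]])
print("Prediction: ", prediction, iris['target_names'][prediction])
```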


Perception

The ability to capture, process, and actively make sense of information that’s being received by our senses.
A cognitive process that lets you interpret your surroundings.


Data Visualization

Abstract: describes information that is not physical.
Graphical display of abstract information, used for:
- Making sense/sense-making = data analysis
- Communication
A powerful tool to discover, analyze, understand, and present the stories in your data.
The goal is to translate abstract information into visual representations that can be easily, efficiently, accurately, and meaningfully decoded.
Statistical information is abstract.
It can also display relationships between non-quantitative values, for example as connected nodes.


Cognition

The mental processes of acquiring knowledge and understanding through thought, experience, and the senses.


Pictures for the eyes and mind

Data visualization is only successful to the degree that it encodes information in a manner that our eyes can discern and our brains can understand.
Consider a case when you need to help people understand the primary causes of death in America.
- To achieve this goal, the display should:
  - Clearly indicate how the values relate to one another, which in this case is a part-to-whole relationship: the number of deaths per cause, when summed, equals all deaths during the year.
  - Represent the quantities accurately.
  - Make it easy to compare the quantities.
  - Make it easy to see the ranked order of values, such as from the leading cause of death to the least.
  - Make obvious how people should use the information (what they should use it to accomplish) and encourage them to do this.
- The traditional way to display this information graphically is a pie chart.


Gestalt principles of perception

Proximity
Similarity
Enclosure
Closure
Continuity
Connection


Information Visualization

Information: Processed data
Visualization: Images or graphics to communicate information
- Also known as graphics visualization: any technique for creating images, diagrams, or animations to communicate a message. Visualization through visual imagery has been an effective way to communicate both abstract and concrete ideas since the dawn of humanity.
Input: Data
- Data: Unprocessed information
Output: Information
Information visualization
- The practice of representing data in a meaningful, visual way that users can interpret and easily comprehend. This includes data visualizations and dashboards. Information visualization is an effective way to share insights in a digestible format for non-experts.


Advantages of Information/Data Visualization

Eyes are drawn to colors and patterns.
Our culture is visual, including everything from art and advertisements to TV and movies.
A chart allows us to quickly see trends/patterns and outliers.
If we can see something, we internalize it quickly.
Storytelling with a purpose.
Helps keep interest in the subject.
More effective than a simple spreadsheet.
Makes information easy to share.
Lets users interactively explore opportunities.
Helps with decision-making (data-driven decisions).
Visualize patterns and relationships.


Disadvantages of Information/Data Visualization

Sometimes data can be misrepresented or misinterpreted when placed in the wrong style of data visualization.
When viewing a visualization with many different data points, it’s easy to make an inaccurate assumption.
Visualizations can be poorly designed, making them biased or confusing.
- Biased or inaccurate information
Correlation doesn’t always mean causation.
Core messages can get lost in translation.


Elements of visualizations

Images
Spreadsheets
Animations
- Videos
Maps
- Geospatial
  - Proportional symbol maps
  - Choropleth, Isopleth, Area Maps
- Heatmaps
- Treemaps
Dashboards
Tables
Diagrams
- Graphs
  - Bar Graphs
  - Bullet Graphs
  - Box-and-whisker Plot
- Infographic
- Charts
  - Pie Charts
  - Bar Charts
    - Gantt Charts
    - Histograms


Geospatial

A visualization that shows data in map form using different shapes and colors to show the relationship between pieces of data and specific locations.
Focuses on the relationship between data and its physical location to create insight.
Geovisualization overlays variables on a map using latitude and longitude to foster insight.
Maps are the primary focus; they act as a container for extra data. Shapes and color create context and shift the visual focus. Geospatial visualizations help identify problems, track change, understand trends, and forecast events related to specific places and times.


Heat maps

A type of geospatial visualization in map form that displays specific data values as different colors (this doesn’t need to be temperatures, but that is a common use).


Treemaps

A type of chart that shows different, related values in the form of rectangles nested together.


Bullet graphs

A bar marked against a background to show progress or performance against a goal, denoted by a line on the graph.
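A bullet graph can be sketched in Matplotlib by layering shaded background bands, a thin bar for the actual value, and a short line for the goal. All numbers below (actual 270, goal 250, band limits 150/225/300) are hypothetical.

```python
import matplotlib.pyplot as plt

actual, target = 270, 250          # hypothetical measured value and goal
ranges = [150, 225, 300]           # hypothetical qualitative bands: poor, satisfactory, good

fig, ax = plt.subplots(figsize=(6, 1.5))
# Draw the widest (lightest) band first so the narrower, darker bands sit on top
for limit, shade in zip(reversed(ranges), ["0.85", "0.7", "0.55"]):
    ax.barh(0, limit, height=0.8, color=shade)
# The performance bar: narrower and dark so it reads against the bands
ax.barh(0, actual, height=0.3, color="black")
# The goal marker: a short vertical line across the performance bar
ax.vlines(target, -0.3, 0.3, colors="red", linewidth=3)
ax.set_yticks([])
ax.set_xlim(0, ranges[-1])
ax.set_title("Revenue vs. goal (hypothetical)")
plt.show()
```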


Box-and-whisker Plot

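These show the distribution of a set of values: the box spans the middle 50% of the data (the interquartile range, from the first to the third quartile) with a line at the median, and the whiskers extend toward the smallest and largest values, with extreme outliers often drawn as separate points.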
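The parts of a box plot can be computed directly with NumPy and then drawn with Matplotlib. The ten data values are hypothetical, and the 1.5 × IQR outlier rule shown here is the common convention that Matplotlib's `boxplot` uses by default (`whis=1.5`).

```python
import matplotlib.pyplot as plt
import numpy as np

data = np.array([2, 4, 4, 5, 6, 7, 8, 9, 10, 30])  # hypothetical sample with one extreme value

# The box: first quartile, median, third quartile
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # interquartile range = height of the box

# Points beyond 1.5 * IQR from the box are treated as outliers
outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]
print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}, outliers={outliers}")

plt.boxplot(data)
plt.title("Box-and-whisker plot")
plt.show()
```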


Gantt charts

Typically used in project management, Gantt charts are a bar chart depiction of timelines and tasks.
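A basic Gantt chart is a horizontal bar chart where each bar starts at a task's start time and its length is the task's duration. The tasks and dates below are hypothetical; positions are converted to days since the project start to keep the axis simple.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical project plan
tasks = pd.DataFrame({
    "task": ["Plan", "Design", "Build", "Test"],
    "start": pd.to_datetime(["2022-01-03", "2022-01-10", "2022-01-17", "2022-02-07"]),
    "end": pd.to_datetime(["2022-01-10", "2022-01-21", "2022-02-11", "2022-02-18"]),
})
project_start = tasks["start"].min()
tasks["start_day"] = (tasks["start"] - project_start).dt.days    # bar's left edge
tasks["duration_days"] = (tasks["end"] - tasks["start"]).dt.days # bar's length

fig, ax = plt.subplots()
ax.barh(range(len(tasks)), tasks["duration_days"], left=tasks["start_day"])
ax.set_yticks(range(len(tasks)))
ax.set_yticklabels(tasks["task"])
ax.invert_yaxis()  # first task at the top, as in a typical timeline
ax.set_xlabel("Days since project start")
ax.set_title("Project timeline")
plt.show()
```

Overlapping bars (Design and Build here) show tasks that run in parallel.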


Ten important factors of information visualization

Information becomes easily shareable
Decision making
Identify trends and patterns
Optimize resources
Resource allocation
Easier for stakeholders to understand/internalize information
Customer satisfaction
Enhanced efficiency
Cost reduction
Innovation and competitiveness
Social implications


Role of Python in data analysis and information visualization

Widely used for business information analysis
User-friendly syntax
Ecosystem of libraries
Community support
Scalability
Integrability and interpretability
Pandas library
- Helps with data analysis and manipulation
- Data can be transformed


Time Series Analysis

Definition: A specific way of analyzing a sequence of data points collected over an interval of time.
Analysts record data points at consistent intervals over a set period of time rather than just recording the data points intermittently or randomly.
Time is a crucial variable because it shows how the data changes across the sequence of data points as well as the final results. It provides an additional source of information and a set order of dependencies between the data.
Typically requires a large amount of data, which helps ensure that the trends or patterns discovered are not outliers.


Why organizations use time series data analysis

Time series analysis helps organizations understand the underlying causes of trends or systemic patterns over time and predict future events.


Examples of use cases for time series analysis

Weather data
Rainfall measurements
Temperature readings
Heart rate monitoring (EKG)
Brain monitoring (EEG)
Quarterly sales
Stock prices
Automated stock trading
Industry forecasts
Interest rates


Time Series Analysis considerations

Variability
Rate of Change
- Usually measured as the percentage change between two points.
Covariance
Cycles
- Patterns that repeat over a regular period of time (for example, seasonal patterns).
Exceptions
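Rate of change as a percentage can be computed with pandas' `pct_change`, which divides each value by the previous one and subtracts 1. The values reuse the hypothetical stock prices from the time-series plotting example in this review.

```python
import pandas as pd

# Hypothetical daily prices
prices = pd.Series(
    [1, 2, 3, 4, 3, 4, 5, 6, 7, 8],
    index=pd.date_range(start="2022-01-01", periods=10, freq="D"),
)

# Percentage change from each point to the next; the first value has no predecessor (NaN)
pct_change = prices.pct_change() * 100
print(pct_change.round(1))
```

Note how the same absolute step of +1 is a 100% change early on (1 to 2) but only about 14% later (7 to 8), which is why rate of change is often a better comparison than raw differences.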


Time Series Analysis models

Classification
Curve fitting
Descriptive analysis
Explanative analysis
Exploratory analysis
Forecasting
Intervention analysis
Segmentation


Time Series model displays

Line graph (Works best for time series analysis)
- Analyzing patterns and exceptions
Bar plots
- Compares individual values
Dot plots / Box plots
- Analyze distribution changes
Radar graph
- Comparing cycles
Heatmap
- Analyze high-volume cyclical patterns and exceptions
- Uses color to encode quantitative values
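A heatmap for a cyclical pattern can be built by pivoting a time series into a grid (here, weeks as rows and weekdays as columns) and mapping each value to a color. The data below is hypothetical, generated to have busier weekends and a slight upward drift.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 4 full weeks (Monday 3 Jan 2022 to Sunday 30 Jan 2022)
dates = pd.date_range(start="2022-01-03", periods=28, freq="D")
week = np.arange(28) // 7          # 0..3
weekday = dates.dayofweek          # 0 = Monday ... 6 = Sunday
# Assumed pattern: weekends are busier, with a small upward drift each week
values = np.where(weekday >= 5, 20, 10) + week

# Rows = weeks, columns = weekdays, so a repeating weekly cycle lines up vertically
table = pd.DataFrame({"week": week, "weekday": weekday, "value": values}).pivot(
    index="week", columns="weekday", values="value"
)

fig, ax = plt.subplots()
image = ax.imshow(table.to_numpy(), cmap="viridis", aspect="auto")  # color encodes the value
ax.set_xticks(range(7))
ax.set_xticklabels(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])
ax.set_yticks(range(len(table.index)))
ax.set_yticklabels([f"Week {w + 1}" for w in table.index])
fig.colorbar(image, ax=ax, label="Value")
ax.set_title("Weekly cycle as a heatmap")
plt.show()
```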


Time Series techniques and best practices

Aggregating to various time intervals
- Examples:
  - Quarterly
  - Monthly
  - Weekly
  - Daily
Viewing time periods in context
Grouping related time intervals
Using running averages to enhance the perception of high-level patterns
Omitting missing values from the display
Optimizing a graph’s aspect ratio
Using a logarithmic scale to compare rates of change
Overlapping time scales to compare cyclical patterns
Using cycle plots to examine trends and cycles together
Combining individual and cumulative values to compare actuals to targets
Stacking line graphs to compare multiple variables
Expressing time as 0-100% to compare asynchronous processes
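Two of the techniques above, aggregating to coarser time intervals and using a running average, map directly onto pandas' `resample` and `rolling`. The daily series below is hypothetical: a steady trend plus a weekend bump that the 7-day running average smooths away.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series for Jan-Mar 2022: a steady upward trend plus a weekend bump
dates = pd.date_range(start="2022-01-01", periods=90, freq="D")
daily = pd.Series(np.arange(90) + 5 * (dates.dayofweek >= 5), index=dates)

# Aggregate to coarser time intervals
weekly = daily.resample("W").sum()     # weekly totals (weeks ending Sunday)
monthly = daily.resample("MS").mean()  # monthly averages, labeled by month start

# 7-day running average: smooths out the weekly bump so the trend stands out
rolling = daily.rolling(window=7).mean()

print(monthly)
print(rolling.tail())
```

The first six values of the running average are missing because a 7-day window needs seven observations; omitting them from the display, rather than plotting zeros, follows the "omitting missing values" practice.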


Time Series Analysis Python example

```python
import pandas as pd
import matplotlib.pyplot as plt

# Simple time-series plot: 10 consecutive daily values starting on 1 Jan 2022
time_series_data = pd.DataFrame({
    'Date': pd.date_range(start='1/1/2022', periods=10, freq='D'),
    'Stock_Price': [1, 2, 3, 4, 3, 4, 5, 6, 7, 8]
})

# A line graph: time on the x-axis, the measured value on the y-axis
time_series_data.plot(x='Date', y='Stock_Price', kind='line')
plt.title('Time-Series Data')
plt.show()
```



When designing interaction with any type of navigation menu, we have to consider the following six aspects:

Symbols
Target areas
Interaction event
Layout
Levels
Functional context


Symbols

Users often rely on small visual clues, such as icons and symbols, to guide them through a website’s interface. Creating a system of symbolic communication throughout the website that is unambiguous and consistent is important.
The first principle in designing a drop-down navigation menu is to make users aware that it exists in the first place.
- Triangle symbol
- Plus symbol
- Three-line symbol
Consistent use of symbols


Target areas

A simple yet important rule is that links in a navigation menu should be easy to read, large and consistently located. The area in the interface that is assigned to and activates a link is typically referred to as the target area.
Legibility
Size
Consistency of location
Interaction event
- Four most common events:
  - Hovering
  - Clicking
  - Scrolling
  - Typing


Levels

Designing a single-level navigation menu is hard enough as it is. Incorporating multiple levels complicates the matter, especially on small screens.
- Removing navigation levels
- Levels and mobiles
- Levels and mega-menus
- Dynamic filters
- Breadcrumbs
- Mega-sites


Arrays

Collection of data/values of the same data type.
- Example:
  - Array of student scores
    - int array_score[10] = {75, 88, 69, 90, 66, 81, 98, 77, 85, 70}; (C syntax)
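The same array of scores in Python, using NumPy (which the document covers later), shows the "same data type" rule: every element shares one dtype, and mixing types forces a conversion.

```python
import numpy as np

# Array of student scores (same values as the C example above)
array_score = np.array([75, 88, 69, 90, 66, 81, 98, 77, 85, 70])

print(array_score.dtype)   # one integer dtype for every element (exact width is platform-dependent)
print(array_score.size)    # 10
print(array_score.mean())  # 79.9
print(array_score.max())   # 98

# NumPy enforces a single type: mixing ints and floats upcasts everything to float
mixed = np.array([1, 2.5])
print(mixed.dtype)         # float64
```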


Matrices

A two-dimensional data structure where numbers are arranged into rows and columns.
Example: a 3 × 4 matrix (3 rows, 4 columns)
 1  2  3  4
 5  6  7  8
 9 10 11 12
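In NumPy, the example matrix is a two-dimensional array; indexing takes a row and then a column, both counted from zero.

```python
import numpy as np

matrix = np.arange(1, 13).reshape(3, 4)  # numbers 1..12 laid out row by row
print(matrix)
print(matrix.shape)    # (3, 4): 3 rows, 4 columns
print(matrix[1, 2])    # 7: row index 1, column index 2 (zero-based)
print(matrix[:, 0])    # [1 5 9]: the first column
print(matrix.T.shape)  # (4, 3): the transpose swaps rows and columns
```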


NumPy (Python library)

A library used to process numerical data
Complex analysis of arrays
Multi-dimensional arrays
Data manipulation
Handles large datasets efficiently.
Mathematical functions to operate on data structures
- Basic statistical operations/functions
  - Mean
  - Standard deviation
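  - Skewness (computed from NumPy array operations, or with scipy.stats.skew)
  - Kurtosis (computed from NumPy array operations, or with scipy.stats.kurtosis)
- Broadcasting
  - Automatically stretches arrays with compatible shapes (each dimension equal, or one of them equal to 1) so they can be combined element-wise without copying data.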
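A sketch of the statistical functions and broadcasting in NumPy. Mean and standard deviation are built in; skewness and kurtosis are computed here from array operations using the population (biased) formulas, since NumPy has no dedicated functions for them. The five data values are hypothetical.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

mean = data.mean()                     # 3.0
std = data.std()                       # population standard deviation (ddof=0): sqrt(2)
z = (data - mean) / std                # standardized values (broadcasting a scalar)
skewness = np.mean(z ** 3)             # 0 for this symmetric data
excess_kurtosis = np.mean(z ** 4) - 3  # about -1.3: flatter than a normal distribution

# Broadcasting: a (2, 1) column and a (3,) row combine into a (2, 3) result
column = np.array([[1], [2]])
row = np.array([10, 20, 30])
grid = column + row
print(grid)
# [[11 21 31]
#  [12 22 32]]
```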


Pandas (Python library)

Data manipulation capability
Data format and data types
Cleaning datasets
Transforms data into visualizations
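A small cleaning pipeline illustrates the capabilities listed above: removing duplicate rows, dropping rows with a missing key, converting text to numbers, and filling a remaining gap. The table is hypothetical, and filling with the column mean is one choice among several (dropping the row is another).

```python
import pandas as pd

# Hypothetical messy data: a duplicate row, a missing city, numbers stored as text
raw = pd.DataFrame({
    "city": ["Austin", "Boston", "Boston", "Denver", None],
    "sales": ["100", "250", "250", None, "80"],
})

clean = (
    raw.drop_duplicates()                                       # remove the repeated Boston row
       .dropna(subset=["city"])                                 # drop rows without a city
       .assign(sales=lambda d: pd.to_numeric(d["sales"]))       # text -> numbers (None -> NaN)
       .assign(sales=lambda d: d["sales"].fillna(d["sales"].mean()))  # fill the gap with the mean
       .reset_index(drop=True)
)
print(clean)
```

From here, `clean.plot(kind="bar", x="city", y="sales")` would draw the result using Matplotlib under the hood.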


DataFrame

A highly versatile data structure that is essentially a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).
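df.head()
- Returns the first 5 rows of the DataFrame (df.head(n) returns the first n)
df.tail()
- Returns the last 5 rows of the DataFrame (df.tail(n) returns the last n)
df.info()
- Prints a concise summary of the DataFrame: column names, non-null counts, data types, and memory usage
df.describe()
- Returns summary statistics (count, mean, standard deviation, min, quartiles, max) for the numerical columns of the DataFrame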
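The four inspection methods above, applied to a small hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical student data
df = pd.DataFrame({
    "student": ["Ana", "Ben", "Cal", "Dee", "Eve", "Fay", "Gus"],
    "score": [75, 88, 69, 90, 66, 81, 98],
    "hours": [5, 8, 4, 9, 3, 7, 10],
})

print(df.head())      # first 5 rows
print(df.tail(2))     # last 2 rows
df.info()             # prints columns, non-null counts, dtypes, memory usage
print(df.describe())  # summary statistics for the numeric columns only
```

`describe()` skips the text column `student` by default, since mean and quartiles only make sense for numbers.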


Series

A type of data structure in the Pandas library. It is a one-dimensional labeled array that can hold data of any type. It can be thought of as a single column in a DataFrame.
- This means that a Series can store a single column of data, such as a list of numbers, names, or any other data type. In pandas, you can create a Series from a list, array, or dictionary.
s.size
- Returns the number of elements in the Series (an attribute, not a method)
s.mean()
- Returns the mean (average) value of the Series
s.std()
- Returns the standard deviation of the Series (the sample standard deviation, by default)
s.unique()
- Returns an array of the unique values in the Series, in order of appearance
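Creating a Series from a list and from a dictionary, then applying the attributes and methods above. The values are hypothetical.

```python
import pandas as pd

# From a list: pandas assigns a default integer index 0..7
s = pd.Series([3, 1, 4, 1, 5, 9, 2, 6], name="values")
print(s.size)      # 8 (attribute, no parentheses)
print(s.mean())    # 3.875
print(s.std())     # sample standard deviation (ddof=1)
print(s.unique())  # [3 1 4 5 9 2 6]

# From a dictionary: the keys become the index labels
from_dict = pd.Series({"a": 10, "b": 20})
print(from_dict["b"])  # 20
```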


Matplotlib (Python library)

One of the most widely used and versatile Python libraries available for creating static, interactive, and animated visualizations. With Matplotlib, users can easily create a wide range of visualizations, including line plots, scatter plots, bar plots, histograms, and more. Additionally, Matplotlib provides a high degree of customization, allowing users to tailor their visualizations to their precise needs.
- Additional features
  - Subplots
  - Legends
  - Annotations
  - Error bars
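The four additional features can be shown in one figure: two subplots, a legend, an annotation with an arrow, and error bars. The data and the constant error of ±1.5 are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 6)
y = x ** 2
err = np.full(5, 1.5)  # hypothetical uncertainty for every point

# Subplots: one row, two side-by-side axes
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3.5))

# Left: two lines, a legend, and an annotation pointing at where they meet
ax1.plot(x, y, marker="o", label="y = x²")
ax1.plot(x, 3 * x, linestyle="--", label="y = 3x")
ax1.legend()
ax1.annotate("lines meet", xy=(3, 9), xytext=(1.2, 20), arrowprops=dict(arrowstyle="->"))
ax1.set_title("Legend and annotation")

# Right: the same points with vertical error bars
ax2.errorbar(x, y, yerr=err, fmt="s", capsize=4)
ax2.set_title("Error bars")

fig.tight_layout()
plt.show()
```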


Seaborn (Python library)

A powerful library built on top of Matplotlib that offers a high-level, user-friendly interface. It integrates closely with Pandas data structures and incorporates best practices for effective data visualization. With Seaborn, you'll have access to a wider range of color palettes, more visually appealing plots, and simpler syntax.
- Create more complex visualizations
- More customization options
  - Tweaking color schemes
  - Adjusting axis limits
  - Adding annotations
- Offers a variety of statistical plots
  - Bar plots
  - Pair plots
  - Heat maps
  - Violin plots
  - Facet grids
  - Joint plots
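Two of the statistical plots listed above, a violin plot and a heat map, take only one call each in Seaborn when the data is in a pandas DataFrame. This sketch assumes Seaborn is installed; the data is randomly generated for illustration (groups A, B, C and columns x, y, z are hypothetical).

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)

# Randomly generated example data: three groups with different spreads
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 50),
    "value": np.concatenate([
        rng.normal(0, 1, 50), rng.normal(1, 1.5, 50), rng.normal(2, 0.5, 50)
    ]),
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Violin plot: the distribution shape of "value" for each group
sns.violinplot(data=df, x="group", y="value", ax=ax1)

# Heat map: a correlation matrix with each cell's value annotated
corr = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x", "y", "z"]).corr()
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax2)

fig.tight_layout()
plt.show()
```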