Information Visualization Final Exam Review

44 Terms

1

Machine Learning

The use and development of computer systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyze and draw inferences from patterns in data.

2

Types of Machine Learning

  • Supervised learning

  • Unsupervised learning

  • Reinforcement learning

  • Semi-supervised learning

3

Supervised learning

  • Uses labeled data for training so the model can make predictions.

  • A model is trained on a labeled dataset until it can accurately predict the labels of new, unseen data.

  • The algorithm is given examples paired with their correct answers and learns the mapping from inputs to answers.

  • Analogy: imagine a teacher supervising a student's learning process.

    • The labeled training dataset: The teacher

    • The machine: The student

    • Learning is done repeatedly/iteratively

  • Areas of specialization

    • Image and speech recognition

    • Fraud detection

    • Recommendation systems

    • Medical diagnostics

    • Weather forecasting

  • Algorithms

    • K-nearest neighbors (KNN)

    • Logistic Regression

    • Decision Trees

    • Support Vector Machine (SVM)

    • Random Forest

4

Unsupervised learning

  • A type of machine learning that works with unlabeled data.

  • Tries to learn the patterns and structure of the data on its own.

  • Areas of specialization

    • Clustering similar data items together

      • Types of clustering

        • Partitioning

        • Hierarchical

        • Density-based methods

    • Finding meaningful groups within a given dataset

    • Analyzing large datasets, where grouping the data manually would be time-consuming and expensive

    • Identifying hidden relationships that may not be immediately obvious

    • Customer segmentation

    • Discovering new patterns

  • Algorithms

    • K-means clustering

      • Partitions the dataset into k clusters, where k is chosen in advance (for example, with the elbow method).

      • Assigns each data point to the cluster whose mean (centroid) is nearest.

      • Recomputes each cluster's mean from the points assigned to it.

      • Repeats the assign and update steps until the assignments stop changing.

      • Advantages

        • Computationally efficient
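K-means can be sketched in a few lines of NumPy as an assign-and-update loop. This is a minimal illustration, not scikit-learn's `KMeans`. The helper name `kmeans`, the six toy points, and the choice to initialize the centroids from randomly chosen data points are assumptions made for the example. The caller passes in the number of clusters `k`.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Hypothetical minimal k-means helper: returns (centroids, labels)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: distance from every point to every centroid, pick the nearest
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Converged: no point changed cluster since the last iteration
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Two well-separated groups of (hypothetical) 2-D points
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                   [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
centroids, labels = kmeans(points, k=2)
print("Labels:", labels)
print("Centroids:", centroids)
```

The explicit convergence check mirrors the "repeat until assignments stop changing" step. `max_iter` is only a safety limit.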

5

Reinforcement learning

  • An agent learns by interacting with its environment.

  • The agent performs certain actions and then observes the rewards or consequences.

    • Learns from mistakes.

      • Trial and error

  • Unlike supervised learning, there is no labeled correct answer to mimic; the agent learns only from the rewards it receives.

  • Areas of specialization

    • Simulations

    • Statistical analysis
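A multi-armed bandit is one of the simplest reinforcement learning settings, and it illustrates the trial-and-error loop: act, observe a reward, adjust. The sketch below uses an epsilon-greedy agent. The three reward means, the exploration rate of 0.1, and the Gaussian reward noise are illustrative assumptions. Full reinforcement learning problems also include states that change as the agent acts, which this sketch omits.

```python
import random

def run_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Hypothetical epsilon-greedy agent on a multi-armed bandit."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how many times each action was tried
    estimates = [0.0] * n_arms   # running average reward per action
    total_reward = 0.0
    for step in range(steps):
        if step < n_arms:
            action = step                       # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(n_arms)      # explore: random action
        else:
            action = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        # The environment responds with a noisy reward for that action
        reward = rng.gauss(true_means[action], 0.1)
        total_reward += reward
        # Learn from the consequence: update the running average incrementally
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts, total_reward

estimates, counts, total = run_bandit([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("Estimated rewards:", [round(e, 2) for e in estimates])
print("Times each action was chosen:", counts)
print("Best action found:", best)
```

The agent never sees the true means. It discovers the best action only by trying actions and observing rewards.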

6

Semi-supervised learning

  • Combines the benefits of supervised and unsupervised learning by training on a small amount of labeled data together with a larger amount of unlabeled data.

  • Used when obtaining a fully labeled dataset is time-consuming and/or expensive.
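Label spreading is one common semi-supervised technique. It propagates the few known labels to nearby unlabeled points. The sketch below uses scikit-learn's `LabelSpreading` on the iris data. Hiding about 70% of the labels simulates an expensive-to-label dataset, and that fraction is an assumption. Unlabeled samples are marked with `-1`, the convention scikit-learn uses.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

iris = load_iris()
X, y = iris['data'], iris['target']

# Pretend labeling is expensive: hide about 70% of the labels (-1 = unlabeled)
rng = np.random.RandomState(0)
unlabeled = rng.rand(len(y)) < 0.7
y_partial = y.copy()
y_partial[unlabeled] = -1

# Fit on labeled and unlabeled points together, using a k-nearest-neighbor graph
model = LabelSpreading(kernel='knn', n_neighbors=7)
model.fit(X, y_partial)

# transduction_ holds the label inferred for every training point
accuracy = (model.transduction_[unlabeled] == y[unlabeled]).mean()
print(f"Labeled points used: {(~unlabeled).sum()} of {len(y)}")
print(f"Accuracy on the hidden labels: {accuracy:.2f}")
```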

7

Machine learning using Python

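from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the iris dataset (150 flowers, 4 measurements each, 3 species)
iris = load_iris()

# Split the dataset into training and test sets (75% / 25% by default)
X_train, X_test, y_train, y_test = train_test_split(
    iris['data'], iris['target'], random_state=0)

# Initialize the classifier (k = 1 nearest neighbor)
knn = KNeighborsClassifier(n_neighbors=1)

# Fit the model on the labeled training data
knn.fit(X_train, y_train)

# Evaluate accuracy on the held-out test data
print("Test accuracy:", knn.score(X_test, y_test))

# Predict the species of a new flower (sepal length, sepal width, petal length, petal width in cm)
prediction = knn.predict([[5, 2.9, 1, 0.2]])
print("Prediction:", iris['target_names'][prediction])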


8

Perception

  • The ability to capture, process, and actively make sense of information that’s being received by our senses.

  • A cognitive process that lets you interpret your surroundings.

9

Data Visualization

  • Abstract: describes information that is not physical.

  • Graphical display of abstract information.

    • Sense-making (data analysis)

    • Communication

  • Powerful tool to discover, analyze, understand, and present your stories.

  • The goal is to translate abstract information into visual representations that can be easily, efficiently, accurately, and meaningfully decoded.

  • Statistical information is abstract

  • Can display relationships between non-quantitative values using nodes.

10

Cognition

In an era where technology is rapidly reshaping the way we interact with the world, understanding the intricacies of AI is not just a skill, but a necessity for designers.

11

Pictures for the eyes and mind

  • Data visualization is only successful to the degree that it encodes information in a manner that our eyes can discern and our brains can understand.

  • Consider a case where you need to help people understand the primary causes of death in America.

    • To achieve this goal, a well-designed display:

      • Clearly indicates how the values relate to one another, which in this case is a part-to-whole relationship: the number of deaths per cause, when summed, equals all deaths during the year.

      • Represents the quantities accurately.

      • Makes it easy to compare the quantities.

      • Makes it easy to see the ranked order of values, such as from the leading cause of death to the least.

      • Makes obvious how people should use the information (what they should use it to accomplish) and encourages them to do so.

    • The traditional way to display this information graphically is a pie chart.

12

Gestalt principles of perception

  • Proximity

  • Similarity

  • Enclosure

  • Closure

  • Continuity

  • Connection

13

Information Visualization

  • Information: Processed data

  • Visualization: Images or graphics to communicate information

    • Also known as graphics visualization: any technique for creating images, diagrams, or animations to communicate a message. Visual imagery has been an effective way to communicate both abstract and concrete ideas since the dawn of humanity.

  • Input: Data

    • Data: Unprocessed information

  • Output: Information

  • Information visualization

    • The practice of representing data in a meaningful, visual way that users can easily interpret and comprehend. This includes data visualizations and dashboards. Information visualization is an effective way to share insights with non-experts in a digestible format.

14

Advantages of Information/Data Visualization

  • Eyes are drawn to colors and patterns.

  • Our culture is visual, including everything from art and advertisements to TV and movies.

  • A chart allows us to quickly see trends/patterns and outliers.

  • If we can see something, we internalize it quickly.

  • Storytelling with a purpose.

  • Helps keep interest in the subject.

  • More effective than a simple spreadsheet.

  • Makes information easy to share.

  • Lets users interactively explore opportunities.

  • Helps with decision-making (data-driven decisions).

  • Visualize patterns and relationships.

15

Disadvantages of Information/Data Visualization

  • Sometimes data can be misrepresented or misinterpreted when placed in the wrong style of data visualization.

  • When viewing a visualization with many different data points, it’s easy to make an inaccurate assumption.

  • Visualizations can be designed poorly, making them biased or confusing.

    • Biased or inaccurate information

  • Correlation doesn’t always mean causation.

  • Core messages can get lost in translation.

16

Elements of visualizations

  • Images

  • Spreadsheets

  • Animations

    • Videos

  • Maps

    • Geospatial

      • Proportional symbol maps

      • Choropleth, Isopleth, Area Maps

    • Heatmaps

    • Treemaps

  • Dashboards

  • Tables

  • Diagrams

    • Graphs

      • Bar Graphs

      • Bullet Graphs

      • Box-and-whisker Plot

    • Infographic

    • Charts

      • Pie Charts

      • Bar Charts

        • Gantt Charts

        • Histograms

17

Geospatial

  • A visualization that shows data in map form using different shapes and colors to show the relationship between pieces of data and specific locations. 

  • Focus on the relationship between data and its physical location to create insight.

  • Geovisualization overlays variables on a map using latitude and longitude to foster insight.

  • Maps are the primary focus; they act as a container for extra data. Shapes and color add context and shift the visual focus. Maps help identify problems, track change, understand trends, and forecast events related to specific places and times.

18

Heat maps

A type of geospatial visualization in map form that displays specific data values as different colors (this doesn’t need to be temperatures, but that is a common use).

19

Treemaps

A type of chart that shows different, related values in the form of rectangles nested together.

20

Bullet graphs

A bar marked against a background to show progress or performance against a goal, denoted by a line on the graph.

21

Box-and-whisker Plot

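Shows the distribution of a set of values. The box spans the middle 50% of the data, from the first quartile to the third quartile, with a line at the median. The whiskers extend toward the minimum and maximum values, often stopping at 1.5 × the interquartile range (IQR), and points beyond the whiskers are plotted as outliers.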
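The quantities a box plot draws can be computed directly and then passed to Matplotlib. This sketch uses the ten student scores from the Arrays card. It relies on `np.percentile` (default linear interpolation) and `Axes.boxplot`, whose whiskers by default reach the most extreme points within 1.5 × IQR of the box.

```python
import numpy as np
import matplotlib.pyplot as plt

scores = np.array([75, 88, 69, 90, 66, 81, 98, 77, 85, 70])

# Box edges: first quartile (Q1) and third quartile (Q3); the line inside is the median
q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1

# Whiskers reach the most extreme values that lie within 1.5 * IQR of the box
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
inside = scores[(scores >= lower_fence) & (scores <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]
print(f"Q1={q1}, median={median}, Q3={q3}")
print(f"Whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")

# Matplotlib's default whis=1.5 uses the same fences
fig, ax = plt.subplots()
ax.boxplot(scores)
ax.set_ylabel("Score")
ax.set_title("Box-and-whisker plot of test scores")
plt.show()
```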

22

Gantt charts

Typically used in project management, Gantt charts are a bar chart depiction of timelines and tasks.

23

Ten important factors of information visualization

  • Information becomes easily shareable

  • Decision making

  • Identify trends and patterns

  • Optimize resources

  • Resource allocation

  • Easier for stakeholders to understand/internalize information

  • Customer satisfaction

  • Enhanced efficiency

  • Cost reduction

  • Innovation and competitiveness

  • Social implications

24

Role of Python in data analysis and information visualization

  • Widely adopted for business information analysis

  • User-friendly syntax

  • Ecosystem libraries

  • Community support

  • Scalability

  • Integrability and interpretability

  • Pandas library

    • Helps with data analysis and manipulation

    • Data can be transformed

25

Time Series Analysis

  • Definition: A specific way of analyzing a sequence of data points collected over an interval of time.

  • Analysts record data points at consistent intervals over a set period of time rather than just recording the data points intermittently or randomly.

  • Time is a crucial variable because it shows how the data changes over the course of the observations, not only the final result. It provides an additional source of information and a set order of dependencies between the data points.

  • Typically requires a large amount of data.

  • A large amount of data helps ensure that the trends or patterns discovered are reliable rather than outliers.

26

Why organizations use time series data analysis

Time series analysis helps organizations understand the underlying causes of trends or systemic patterns over time and predict future events.

27

Examples of use cases for time series analysis

  • Weather data

  • Rainfall measurements

  • Temperature readings

  • Heart rate monitoring (EKG)

  • Brain monitoring (EEG)

  • Quarterly sales

  • Stock prices

  • Automated stock trading

  • Industry forecasts

  • Interest rates

28

Time Series Analysis considerations

  • Variability

  • Rate of Change

    • Often measured as the percentage change between two points.

  • Covariance

  • Cycles

    • Repeating patterns viewed within a period of time (for example, sales that peak every December).

  • Exceptions
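Pandas' `Series.pct_change()` computes the percentage rate of change between consecutive points. The monthly sales figures below are hypothetical.

```python
import pandas as pd

# Hypothetical monthly sales figures
sales = pd.Series([200, 220, 198, 250],
                  index=pd.period_range('2023-01', periods=4, freq='M'))

# Rate of change between each point and the previous one, as a percentage:
# (current - previous) / previous * 100; the first month has no predecessor (NaN)
pct = sales.pct_change() * 100
print(pct.round(1))
```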

29

Time Series Analysis models

  • Classification

  • Curve fitting

  • Descriptive analysis

  • Explanative analysis

  • Exploratory analysis

  • Forecasting

  • Intervention analysis

  • Segmentation

30

Time Series model displays

  • Line graph (Works best for time series analysis)

    • Analyzing patterns and exceptions

  • Bar plots

    • Compares individual values

  • Dot plots / Box plots

    • Analyze distribution changes

  • Radar graph

    • Comparing cycles

  • Heatmap

    • Analyze high-volume cyclical patterns and exceptions

    • Uses color to encode quantitative values

31

Time Series techniques and best practices

  • Aggregating to various time intervals

    • Examples:

      • Quarterly

      • Monthly

      • Weekly

      • Daily

  • Viewing time periods in context

  • Grouping related time intervals

  • Using running averages to enhance the perception of high-level patterns

  • Omitting missing values from the display

  • Optimizing a graph’s aspect ratio

  • Using a logarithmic scale to compare rates of change

  • Overlapping time scales to compare cyclical patterns

  • Using cycle plots to examine trends and cycles together

  • Combining individual and cumulative values to compare actuals to targets

  • Stacking line graphs to compare multiple variables

  • Expressing time as 0 to 100% to compare asynchronous processes
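Two of these techniques can be sketched with pandas: aggregating to a coarser interval with `resample()` and smoothing with a running average via `rolling()`. The 90 days of values are synthetic, built as a slow upward trend plus a weekly cycle purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: upward trend + 7-day cycle (hypothetical data)
days = pd.date_range('2023-01-01', periods=90, freq='D')
t = np.arange(90)
daily = pd.Series(100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 7), index=days)

# Aggregate to weekly totals ('W' bins end on Sundays by default)
weekly = daily.resample('W').sum()

# 7-day running average: averaging over one full cycle cancels the weekly
# swing and leaves the underlying trend; the first 6 values are NaN
running_avg = daily.rolling(window=7).mean()

print(weekly.head())
print(running_avg.dropna().head())
```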

32

Time Series Analysis Python example

import pandas as pd
import matplotlib.pyplot as plt

# Simple time-series plot: 10 consecutive daily values
time_series_data = pd.DataFrame({
    'Date': pd.date_range(start='2022-01-01', periods=10, freq='D'),
    'Stock_Price': [1, 2, 3, 4, 3, 4, 5, 6, 7, 8]
})

# Plot price against date as a line graph
time_series_data.plot(x='Date', y='Stock_Price', kind='line')
plt.title('Time-Series Data')
plt.show()


33

When designing interaction with any type of navigation menu, we have to consider the following six aspects:

  • Symbols

  • Target areas

  • Interaction event

  • Layout

  • Levels

  • Functional context

34

Symbols

  • Users often rely on small visual clues, such as icons and symbols, to guide them through a website’s interface. It is important to create a system of symbolic communication that is unambiguous and consistent throughout the website.

  • The first principle in designing a drop-down navigation menu is to make users aware that it exists in the first place.

    • Triangle symbol

    • Plus symbol

    • Three-line symbol

  • Consistent use of symbols

35

Target areas

  • A simple yet important rule is that links in a navigation menu should be easy to read, large, and consistently located. The area of the interface that is assigned to a link and activates it is typically called the target area.

  • Legibility

  • Size

  • Consistency of location

  • Interaction event

    • Four most common events:

      • Hovering

      • Clicking

      • Scrolling

      • Typing

36

Levels

  • Designing a single-level navigation menu is hard enough. Incorporating multiple levels complicates matters, especially on small screens.

    • Removing navigation levels

    • Levels and mobiles

    • Levels and mega-menus

    • Dynamic filters

    • Breadcrumbs

    • Mega-sites

37

Arrays

  • Collection of data/values of the same data type.

    • Example:

      • Array of student scores

        • int array_score[10] = {75, 88, 69, 90, 66, 81, 98, 77, 85, 70}; (C syntax)

38

Matrices

  • A two-dimensional data structure where numbers are arranged into rows and columns.

  • Example:

    • [ 1  2  3  4 ]
      [ 5  6  7  8 ]
      [ 9 10 11 12 ]
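In Python, this 3 × 4 matrix can be represented as a two-dimensional NumPy array. The sketch builds it with `np.arange` and `reshape`, then shows row, column, and element access using zero-based indexing.

```python
import numpy as np

# Numbers 1..12 arranged into 3 rows and 4 columns
matrix = np.arange(1, 13).reshape(3, 4)
print(matrix)
print("Shape (rows, columns):", matrix.shape)
print("Second row:", matrix[1])
print("Third column:", matrix[:, 2])
print("Element at row 3, column 4:", matrix[2, 3])
print("Transposed shape:", matrix.T.shape)
```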


39

NumPy (Python library)

  • Used for numerical computing (processing numbers)

  • Complex analysis of arrays

  • Multi-dimensional arrays

  • Data manipulation

  • Capable of handling large datasets with ease.

  • Mathematical functions to operate on data structures

    • Basic statistical operations/functions

      • Mean

      • Standard deviation

      • Skewness

      • Kurtosis

    • Broadcasting

      • Automatically expands smaller arrays to match the shape of larger ones.
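The sketch below applies these operations to the student scores from the Arrays card. NumPy provides `mean` and `std` directly. It has no built-in skewness or kurtosis function (`scipy.stats` provides `skew` and `kurtosis`), so here both are computed from their standard moment-based definitions. The `base` and `offsets` arrays are a hypothetical broadcasting example.

```python
import numpy as np

scores = np.array([75, 88, 69, 90, 66, 81, 98, 77, 85, 70])

mean = scores.mean()
std = scores.std()                 # population standard deviation (ddof=0 by default)
z = (scores - mean) / std          # broadcasting: the scalars apply to every element
skewness = (z ** 3).mean()         # moment-based skewness
kurtosis = (z ** 4).mean() - 3     # excess kurtosis (0 for a normal distribution)

print(f"Mean: {mean:.1f}, Std: {std:.2f}")
print(f"Skewness: {skewness:.3f}, Excess kurtosis: {kurtosis:.3f}")

# Broadcasting between arrays: a (2, 1) column and a (3,) row give a (2, 3) result
base = np.array([[0], [10]])
offsets = np.array([1, 2, 3])
print(base + offsets)
# [[ 1  2  3]
#  [11 12 13]]
```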

40

Pandas (Python library)

  • Data manipulation capability

  • Data format and data types

  • Cleaning datasets

  • Transforms data into visualizations
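A small cleaning pass shows the data-type and cleaning capabilities listed above. The table is hypothetical. Filling unknown sales with 0 is an illustrative assumption; in practice, choose a fill rule that suits the data. `drop_duplicates`, `dropna`, `pd.to_numeric(..., errors='coerce')`, and `fillna` are standard pandas calls.

```python
import pandas as pd

# Hypothetical raw data: numbers stored as text, a missing city, a duplicate row
raw = pd.DataFrame({
    'city': ['Austin', 'Boston', 'Boston', 'Denver', None],
    'sales': ['100', '250', '250', 'n/a', '75'],
})

clean = (
    raw.drop_duplicates()          # remove the repeated Boston row
       .dropna(subset=['city'])    # drop rows with no city
       # convert text to numbers; unparseable values such as 'n/a' become NaN
       .assign(sales=lambda df: pd.to_numeric(df['sales'], errors='coerce'))
)
clean['sales'] = clean['sales'].fillna(0)  # assumption: unknown sales count as 0
print(clean.dtypes)
print(clean)
```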

41

DataFrame

  • A highly versatile data structure that is essentially a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

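  • df.head()

    • Returns the first 5 rows of the DataFrame (df.head(n) returns the first n)

  • df.tail()

    • Returns the last 5 rows of the DataFrame (df.tail(n) returns the last n)

  • df.info()

    • Prints a concise summary of the DataFrame: column names, data types, non-null counts, and memory usage

  • df.describe()

    • Generates descriptive statistics (count, mean, standard deviation, minimum, quartiles, maximum) for the numerical columns of the DataFrame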
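Here the four methods run on a small, hypothetical table of student scores. `df.info()` prints its summary directly and returns `None`, so it is not wrapped in `print`.

```python
import pandas as pd

# Hypothetical table: 8 students with midterm and final scores
df = pd.DataFrame({
    'student': [f'S{i}' for i in range(1, 9)],
    'midterm': [75, 88, 69, 90, 66, 81, 98, 77],
    'final':   [80, 85, 72, 94, 70, 79, 95, 82],
})

print(df.head())      # first 5 rows
print(df.tail(3))     # last 3 rows
df.info()             # column names, dtypes, non-null counts, memory usage
print(df.describe())  # count, mean, std, min, 25%, 50%, 75%, max per numeric column
```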

42

Series

  • A type of data structure in the Pandas library: a one-dimensional labeled array that can hold data of any type. It can be thought of as a single column in a DataFrame.

    • This means that the Series can be used to store a single column of data, such as a list of numbers, names, or any other data type. In pandas, you can create a Series from a list, array, or dictionary.

  • s.size

    • Returns the number of elements in the Series (an attribute, so no parentheses)

  • s.mean()

    • Returns the mean (average) value of the Series

  • s.std()

    • Returns the standard deviation of the Series (sample standard deviation by default)

  • s.unique()

    • Returns an array of the unique values in the Series
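The sketch builds two Series, one from a list and one from a dictionary whose keys become the index, and calls the members above. `size` is an attribute, not a method. `std` and `unique` are methods, so they are called with parentheses. The values are arbitrary examples.

```python
import pandas as pd

# From a list (default integer index) and from a dict (keys become the index)
s = pd.Series([3, 1, 4, 1, 5, 9, 2, 6])
grades = pd.Series({'Ana': 'A', 'Ben': 'B', 'Cy': 'A'})

print(s.size)           # 8
print(s.mean())         # 3.875
print(s.std())          # sample standard deviation (ddof=1)
print(s.unique())       # [3 1 4 5 9 2 6], in order of first appearance
print(grades.unique())  # works for non-numeric data too
```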

43

Matplotlib (Python library)

  • One of the most widely used and versatile Python libraries available for creating static, interactive, and animated visualizations. With Matplotlib, users can easily create a wide range of visualizations, including line plots, scatter plots, bar plots, histograms, and more. Additionally, Matplotlib provides a high degree of customization, allowing users to tailor their visualizations to their precise needs.

    • Additional features

      • Subplots

      • Legends

      • Annotations

      • Error bars
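The additional features above can be combined in a single figure. This sketch uses hypothetical monthly values. It calls `plt.subplots` for the subplots, `errorbar` for error bars, `legend` for the legend, and `annotate` for an arrow label.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical measurements: a mean value per month and its uncertainty
months = np.arange(1, 7)
means = np.array([3.1, 3.8, 4.6, 5.2, 4.9, 6.0])
errors = np.array([0.3, 0.4, 0.3, 0.5, 0.4, 0.3])

# Subplots: one figure containing two side-by-side axes
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Left: line with error bars, a reference line, a legend, and an annotation
ax1.errorbar(months, means, yerr=errors, marker='o', capsize=4, label='Monthly mean')
ax1.axhline(means.mean(), linestyle='--', color='gray', label='Overall mean')
ax1.annotate('Dip', xy=(5, 4.9), xytext=(4.2, 3.5),
             arrowprops=dict(arrowstyle='->'))
ax1.set_xlabel('Month')
ax1.set_ylabel('Value')
ax1.legend()

# Right: histogram of the same values
ax2.hist(means, bins=4)
ax2.set_title('Distribution of monthly means')

fig.tight_layout()
plt.show()
```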

44

Seaborn (Python library)

  • A powerful library built on top of Matplotlib that offers a high-level, user-friendly interface. It integrates closely with Pandas data structures and incorporates best practices for effective data visualization. With Seaborn, you'll have access to a wider range of color palettes, more visually appealing plots, and simpler syntax.

    • Create more complex visualizations

    • More customization options

      • Tweaking color schemes

      • Adjusting axis limits

      • Adding annotations

    • Offers a variety of statistical plots

      • Bar plots

      • Pair plots

      • Heat maps

      • Violin plots

      • Facet grids

      • Joint plots
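This sketch draws two of the statistical plots listed above side by side: a violin plot and an annotated heat map. The data are randomly generated and hypothetical. `sns.violinplot` and `sns.heatmap` are called with their standard `data`/`x`/`y`, `annot`, and `ax` parameters, so Seaborn draws onto ordinary Matplotlib axes.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical data: three groups of 50 values centered at 0, 1, and 3
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'group': np.repeat(['A', 'B', 'C'], 50),
    'value': np.concatenate([rng.normal(loc, 1.0, 50) for loc in (0, 1, 3)]),
    'noise': rng.normal(0, 1, 150),
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Violin plot: distribution of 'value' within each group
sns.violinplot(data=df, x='group', y='value', ax=ax1)

# Heat map of the correlation matrix, with each cell's value annotated
corr = df[['value', 'noise']].corr()
sns.heatmap(corr, annot=True, cmap='viridis', vmin=-1, vmax=1, ax=ax2)

fig.tight_layout()
plt.show()
```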