Data Visualization and Analysis Concepts

Description and Tags

These flashcards cover essential vocabulary and concepts related to data visualization and analysis for effective exam preparation.

87 Terms

New cards

Paired data

A sample of n observations recorded as pairs (x, y), allowing comparison between two related variables.

New cards

Correlation Coefficient (r)

A numerical measure that indicates the strength and direction of a linear relationship between two variables.
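As a minimal sketch of this definition, the sample correlation coefficient can be computed from paired data with the standard library alone. The function name `pearson_r` and the data (hours studied versus exam score) are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 79]
r = pearson_r(hours, scores)
print(round(r, 4))
```

A value near +1 or -1 indicates a strong linear relationship, and the sign gives the direction; a value near 0 indicates little or no linear relationship.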

New cards

Regression Analysis

A statistical process for estimating the relationships among variables.

New cards

Simple Linear Regression

A method to model the relationship between a single predictor variable and a response variable.
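One common way to fit such a model is ordinary least squares. The sketch below assumes that method and uses hypothetical names (`fit_line`) and made-up data; it estimates the intercept b0 and slope b1 of the line y = b0 + b1x.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the predictor must take at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1

# Hypothetical data: hours studied (predictor) vs. exam score (response).
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 79]
b0, b1 = fit_line(hours, scores)
print(round(b0, 2), round(b1, 2))
```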

New cards

Deterministic Regression Model

A regression model that does not account for error terms.

New cards

Probabilistic Regression Model

A regression model that incorporates error terms.

New cards

Extrapolation

The process of estimating unknown values by extending a fitted relationship beyond the range of the observed data.

New cards

Residual

The difference between the observed value and the predicted value in a regression model.

New cards

Sum of Square Fit

A measure of how well a statistical model explains the variation in the data.

New cards

Cognitive Load

The mental effort required to process and understand information from a data visualization.

New cards

Preattentive Attributes

Visual properties that are processed effortlessly and automatically.

New cards

Color Psychology

The study of how colors influence human behavior and emotions.

New cards

Complementary Colors

Colors that are opposite each other on the color wheel, creating contrast.

New cards

Analogous Colors

Colors that are next to each other on the color wheel, creating harmony.

New cards

Gestalt Principles

Principles explaining how people perceive visual elements as unified wholes.

New cards

Similarity (Gestalt Principle)

The principle where objects with similar characteristics are perceived as belonging to the same group.

New cards

Proximity (Gestalt Principle)

The principle whereby objects physically close to each other are perceived as a group.

New cards

Enclosure (Gestalt Principle)

The principle that suggests objects enclosed together are perceived as a single group.

New cards

Connection (Gestalt Principle)

The principle that connected objects are seen as related or part of the same group.

New cards

Frequency Distribution

A summary of how often each category occurs within a dataset.

New cards

Bubble Chart

A data visualization that represents three quantitative variables: two by the position of each circle on the x- and y-axes and a third by the size of the circle.

New cards

Heat Map

A graphical representation of data where values are represented by colors.

New cards

Natural Language Processing (NLP)

A field of artificial intelligence that focuses on the interaction between computers and human language.

New cards

Tokenization

The process of breaking down text into individual words or phrases.
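A minimal sketch of word-level tokenization follows. The regular expression is an assumption made for illustration (lowercase letters, digits, and apostrophes count as part of a word); real tokenizers handle punctuation, hyphens, and languages other than English with more care.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

tokens = tokenize("The cat sat. The cat's mat!")
print(tokens)
```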

New cards

Term Frequency (TF)

A measure of how often a term appears in a document relative to the total number of terms.

New cards

Inverse Document Frequency (IDF)

A measure of how rare a term is across the entire document set; terms that appear in fewer documents receive a higher weight.

New cards

TF-IDF

A statistical measure that evaluates the importance of a word in a document relative to a collection of documents.
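The three measures above can be sketched in a few lines. Several IDF variants exist; this example assumes the plain form log(N / df), where N is the number of documents and df is the number of documents containing the term. The function names and the tiny corpus are hypothetical.

```python
import math

def tf(term, doc_tokens):
    """Term frequency: count of the term divided by the document length."""
    return doc_tokens.count(term) / len(doc_tokens)

def idf(term, corpus):
    """Inverse document frequency, assuming the variant log(N / df)."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0

def tf_idf(term, doc_tokens, corpus):
    """TF-IDF weight of a term in one document of the corpus."""
    return tf(term, doc_tokens) * idf(term, corpus)

# Hypothetical corpus of three already-tokenized documents.
corpus = [
    ["the", "movie", "was", "good"],
    ["the", "movie", "was", "bad"],
    ["the", "plot", "was", "good", "good"],
]
print(tf_idf("the", corpus[0], corpus))   # term found in every document
print(tf_idf("plot", corpus[2], corpus))  # term found in one document only
```

A term that occurs in every document gets an IDF of zero under this variant, so its TF-IDF weight is zero no matter how often it appears.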

New cards

Sentiment Analysis

The process of determining the emotional tone behind a series of words.

New cards

Trend Line

A line fitted through the points of a scatter plot to show the overall pattern. A positive slope indicates a positive association between the two variables; the more tightly the points cluster around the line, the stronger the relationship.

New cards

Sum of Squares Due to Error

SSE

New cards

Total Sum of Squares

SST

New cards

Sum of Squares Due to Regression

SSR

New cards

Coefficient of Determination

A statistical measure that represents the proportion of the variance for a dependent variable that's explained by an independent variable or variables in a regression model.
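The sketch below ties the three sums of squares to the coefficient of determination for a simple linear regression fitted by least squares. The data and all names are hypothetical; the identities used are SST = SSR + SSE and R² = SSR / SST.

```python
# Hypothetical paired data: hours studied vs. exam score.
xs = [1, 2, 3, 4, 5]
ys = [52, 58, 65, 70, 79]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept.
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
b0 = mean_y - b1 * mean_x
predicted = [b0 + b1 * x for x in xs]

# Residual = observed value minus predicted value.
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

sse = sum(e ** 2 for e in residuals)                      # unexplained variation
sst = sum((y - mean_y) ** 2 for y in ys)                  # total variation
ssr = sum((y_hat - mean_y) ** 2 for y_hat in predicted)   # explained variation

r_squared = ssr / sst
print(round(sse, 2), round(ssr, 2), round(sst, 2), round(r_squared, 4))
```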

New cards

Multiple Regression

Regression analysis with two or more independent variables, or with at least one nonlinear predictor.

New cards

Multiple Regression Equation

An equation that models the relationship between multiple independent variables and a dependent variable, typically expressed in the form Y = b0 + b1X1 + b2X2 + … + bnXn + e.
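As a small illustration, the deterministic part of this equation (everything except the error term e) can be evaluated as follows. The function name `predict` and the coefficient values are hypothetical.

```python
def predict(intercept, coefficients, x_values):
    """Evaluate b0 + b1*X1 + ... + bn*Xn for one observation."""
    if len(coefficients) != len(x_values):
        raise ValueError("need exactly one x value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, x_values))

# Hypothetical fitted model: Y = 10 + 2*X1 - 1.5*X2
y_hat = predict(10.0, [2.0, -1.5], [3.0, 4.0])
print(y_hat)
```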

New cards

Adjusted R²

A version of R² that penalizes additional independent variables, helping to avoid adding extra variables that do not really belong; this value is typically listed in regression outputs.
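The usual formula is adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1), where n is the number of observations and k is the number of independent variables. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R² for n observations and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Same R², but more predictors gives a lower adjusted value.
print(adjusted_r_squared(0.9, n=11, k=2))
print(adjusted_r_squared(0.9, n=11, k=5))
```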

New cards

Data Visualization

The graphical representation of information and data, using visual elements such as charts, graphs, and maps to communicate complex ideas.

New cards

Color

A preattentive attribute that can be processed by iconic memory; it is the property of an object that results from the way the object reflects or emits light.

New cards

Hues

The base of a color, that is, the pure color itself (such as red or blue).

New cards

RYB

The traditional artists' color model with the primaries red, yellow, and blue; like other pigment models, it mixes colors subtractively.

New cards

CMY

A subtractive color model with the primaries cyan, magenta, and yellow, used in color printing.

New cards

Unnecessary use of Color

Color that serves no purpose and can distract from the main message of a visualization.

New cards

Excessive use of color

So much color that it overwhelms the viewer and leads to confusion instead of clarity in a visualization.

New cards

Insufficient contrast

Too little contrast between colors, which makes elements difficult to distinguish or interpret in visualizations.

New cards

Inconsistency across related charts

Failing to maintain a coherent design or color scheme across related charts, which can confuse viewers and hinder effective comparison.

New cards

Orientation

The relative positioning (direction or angle) of an object within a data visualization.

New cards

Size

The amount of space an object occupies in a visualization; viewers often struggle to estimate relative size differences.

New cards

Shape

The form of the objects used in a data visualization, often varied to distinguish different groups.

New cards

Length

the distance of a line or bar/column

New cards

Width

the thickness of a line or bar/column

New cards

Spatial Positioning

A preattentive attribute that refers to the location of an object within some defined space.

New cards

Frequency distribution

Bar chart

New cards

One continuous (numerical) variable

Histogram

New cards

Two categorical variables

Contingency table / stacked column chart

New cards

Two continuous (numerical) variables

Scatter plot

New cards

Three continuous (numerical) variables

Bubble Plot

New cards

Time series

Line chart

New cards

Matrix Array

Heat Map

New cards

Geographic Map

A chart that shows the characteristics and arrangement of the geography of our physical reality.

New cards

Data Dashboards

visual interfaces that display key metrics and trends, allowing users to analyze data from multiple sources.

New cards

Corpus

The entire body of text material to be analyzed (collection of documents)

New cards

Documents

The unit of text, chosen by the analyst, that contains the tokens.

New cards

Text Analytics

A broader concept than text mining that also includes information retrieval; text mining primarily focuses on discovering new and useful knowledge from textual data sources.

New cards

Text Mining

Knowledge discovery in textual data.

New cards

Information Extraction

Identifying key phrases and relationships within text by looking for predefined objects and sequences by way of pattern matching.

New cards

Topic Tracking

Predicting other documents of interest to a user based on the user's profile and the documents the user has viewed.

New cards

Summarization

Producing a summary of documents to save the reader time.

New cards

Clustering

Grouping similar documents without predefined categories, letting themes emerge organically.

New cards

Question Answering

Finding the best answer to a given question through knowledge-driven pattern matching.

New cards

Stop Word Removal

Paring down the data by removing common words that do not add any analytical value.
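A minimal sketch of this step is shown below. The stop word list is a tiny hypothetical one; in practice analysts use a longer, curated list suited to the corpus.

```python
# Hypothetical, deliberately short stop word list.
STOP_WORDS = {"the", "a", "an", "and", "of", "is", "was", "to", "in"}

def remove_stop_words(tokens):
    """Return the tokens that are not in the stop word list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

filtered = remove_stop_words(["The", "plot", "of", "the", "movie", "was", "good"])
print(filtered)
```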

New cards

Stemming

The process of removing prefixes or suffixes, chopping related words down to the letters they have in common (the stem).

New cards

Lemmatization

Reducing the word to its lemma (dictionary entry) form

New cards

Term-Document Matrix (TDM)

A bag-of-words technique that counts the occurrences of words in each document while ignoring word order and grammar.

New cards

Binary Approach

The cells of the matrix are populated with either a one (if the token is present in the document) or a zero (if the token is not present).

New cards

Term Frequency Approach

The cells of the matrix reflect the word count (frequency) in the document instead of just a zero or a one.
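The binary and term frequency approaches differ only in what goes into each cell, as the sketch below shows. It assumes rows are terms and columns are documents; the function name and the three tokenized documents are hypothetical.

```python
def term_document_matrix(docs, binary=False):
    """Build a term-document matrix (rows = terms, columns = documents)."""
    vocabulary = sorted({token for doc in docs for token in doc})
    matrix = []
    for term in vocabulary:
        row = []
        for doc in docs:
            count = doc.count(term)
            # Binary approach: 1 if present, else 0. Otherwise the raw count.
            row.append(min(count, 1) if binary else count)
        matrix.append(row)
    return vocabulary, matrix

# Hypothetical tokenized documents.
docs = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]

vocabulary, counts = term_document_matrix(docs)
_, presence = term_document_matrix(docs, binary=True)
for term, count_row, binary_row in zip(vocabulary, counts, presence):
    print(term, count_row, binary_row)
```

Even in this tiny example many cells are zero, which is why term-document matrices built from real corpora tend to be very sparse.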

New cards

Sparse Entry

A situation in a matrix where most of the entries are zero, indicating that only a small number of token occurrences are present compared to the total number of possible tokens.

New cards

TF-IDF

Combines both measures: the TF value is specific to a single document, whereas the IDF value depends on the entire corpus.

New cards

Text Exploration

Consists of techniques used to look for patterns or find relationships in text.

New cards

Frequency Bar Chart

A bar chart in which the x-axis represents terms and the y-axis represents the frequency with which each term occurs.

New cards

Word Cloud

A visual representation of text data in which the size of each word indicates its frequency or importance in the given text.

New cards

Text Modeling

The stage in which preprocessed text data is used to build models.

New cards

Classification

The most common knowledge discovery task in analyzing complex data sources: assigning objects to predefined categories.

New cards

Clustering

An unsupervised process in which objects are classified into “natural groups”; the problem is to group an unlabeled collection of objects into meaningful clusters.

New cards

Topic Modeling

Enables the analyst to discover hidden thematic structures in the text

New cards

Latent Dirichlet Allocation (LDA)

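A topic modeling technique that treats each document as a mixture of topics and each topic as a distribution over words; informally, the estimated topics should be well separated from one another and consistent within themselves.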

New cards

Sentiment Polarity

Classification of text as positive, negative, or neutral based on the emotional tone.
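One simple way to assign polarity is to count words from a sentiment lexicon. The sketch below assumes that approach; the word lists and function name are hypothetical, and practical systems also handle negation, intensity, and context.

```python
# Hypothetical, very small sentiment lexicons.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def polarity(tokens):
    """Label tokenized text as positive, negative, or neutral."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity(["great", "plot", "and", "good", "acting"]))
print(polarity(["terrible", "ending"]))
print(polarity(["the", "movie", "was", "long"]))
```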