These flashcards cover essential vocabulary and concepts related to data visualization and analysis for effective exam preparation.
Paired data
A sample of size n in which the observations come in pairs, allowing comparison between two related variables.
Correlation Coefficient (r)
A numerical measure that indicates the strength and direction of a linear relationship between two variables.
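As a minimal sketch of how r can be computed from paired data, the function below uses the standard Pearson formula. The data values and the names `pearson_r`, `hours`, and `scores` are made up for illustration and do not come from the flashcards.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data: each (hours, score) pair belongs to one observation
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
print(round(r, 4))
```

The sign of r gives the direction of the linear relationship, and values closer to -1 or 1 indicate a stronger one.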
Regression Analysis
A statistical process for estimating the relationships among variables.
Simple Linear Regression
A method to model the relationship between a single predictor variable and a response variable.
Deterministic Regression Model
A regression model that does not account for error terms.
Probabilistic Regression Model
A regression model that incorporates error terms.
Extrapolation
The process of estimating values outside the range of the observed data by extending the relationship found in the known values.
Residual
The difference between the observed value and the predicted value in a regression model.
Sum of Square Fit
A measure of how well a statistical model explains the variation in the data.
Cognitive Load
The mental effort required to process and understand information from a data visualization.
Preattentive Attributes
Visual properties that are processed effortlessly and automatically.
Color Psychology
The study of how colors influence human behavior and emotions.
Complementary Colors
Colors that are opposite each other on the color wheel, creating contrast.
Analogous Colors
Colors that are next to each other on the color wheel, creating harmony.
Gestalt Principles
Principles explaining how people perceive visual elements as unified wholes.
Similarity (Gestalt Principle)
The principle where objects with similar characteristics are perceived as belonging to the same group.
Proximity (Gestalt Principle)
The principle whereby objects physically close to each other are perceived as a group.
Enclosure (Gestalt Principle)
The principle that suggests objects enclosed together are perceived as a single group.
Connection (Gestalt Principle)
The principle that connected objects are seen as related or part of the same group.
Frequency Distribution
A summary of how often each category occurs within a dataset.
Bubble Chart
A data visualization that uses circles of varying sizes to represent three quantitative variables.
Heat Map
A graphical representation of data where values are represented by colors.
Natural Language Processing (NLP)
A field of artificial intelligence that focuses on the interaction between computers and human language.
Tokenization
The process of breaking down text into individual words or phrases.
Term Frequency (TF)
A measure of how often a term appears in a document relative to the total number of terms.
Inverse Document Frequency (IDF)
A measure of how rare a term is across the entire document set; terms that appear in fewer documents receive a higher weight.
TF-IDF
A statistical measure that evaluates the importance of a word in a document relative to a collection of documents.
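The tokenization, TF, IDF, and TF-IDF cards above can be tied together in a short sketch. It assumes whitespace tokenization and the common IDF variant log(N / df); other variants (smoothed or base-10 logs) are also in use. The corpus and the function names are hypothetical.

```python
import math

def tokenize(text):
    """Split text into lowercase word tokens (whitespace-based, an assumption)."""
    return text.lower().split()

def tf(term, tokens):
    """Term frequency: occurrences of the term relative to document length."""
    return tokens.count(term) / len(tokens)

def idf(term, corpus_tokens):
    """Inverse document frequency using log(N / df), one common variant."""
    df = sum(1 for tokens in corpus_tokens if term in tokens)
    if df == 0:
        return 0.0  # term never appears; avoid division by zero
    return math.log(len(corpus_tokens) / df)

def tf_idf(term, tokens, corpus_tokens):
    return tf(term, tokens) * idf(term, corpus_tokens)

# Hypothetical three-document corpus
corpus = ["the chart shows sales", "the map shows regions", "sales rose sharply"]
corpus_tokens = [tokenize(doc) for doc in corpus]

chart_score = tf_idf("chart", corpus_tokens[0], corpus_tokens)
the_score = tf_idf("the", corpus_tokens[0], corpus_tokens)
print(chart_score > the_score)
```

In the first document, "chart" and "the" have the same TF, but "chart" appears in fewer documents, so its IDF and therefore its TF-IDF are higher.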
Sentiment Analysis
The process of determining the emotional tone behind a series of words.
Trend Line
A line fitted through the points of a scatter plot. A positive slope indicates a positive association between the two variables; the more tightly the points cluster around the line, the stronger the relationship.
Sum of Squares Due to Error
SSE
Total Sum of Squares
SST
Sum of Squares Due to Regression
SSR
Coefficient of Determination
A statistical measure that represents the proportion of the variance for a dependent variable that's explained by an independent variable or variables in a regression model.
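The residual, SSE, SST, SSR, and coefficient of determination cards fit together through the identity SST = SSR + SSE, which holds for a least-squares fit with an intercept. The sketch below fits a simple linear regression by hand on made-up data; the variable names and values are illustrative assumptions.

```python
# Hypothetical paired data (x = predictor, y = response)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares estimates for the line y_hat = b0 + b1 * x
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
      / sum((x - mean_x) ** 2 for x in xs))
b0 = mean_y - b1 * mean_x

predicted = [b0 + b1 * x for x in xs]
# Residual = observed value minus predicted value
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

sse = sum(e ** 2 for e in residuals)                      # sum of squares due to error
ssr = sum((y_hat - mean_y) ** 2 for y_hat in predicted)   # sum of squares due to regression
sst = sum((y - mean_y) ** 2 for y in ys)                  # total sum of squares

r_squared = ssr / sst  # coefficient of determination
print(round(sse, 2), round(ssr, 2), round(sst, 2), round(r_squared, 2))
```

Here R² is the share of the total variation in y that the fitted line accounts for; equivalently, R² = 1 - SSE/SST.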
Multiple Regression
Regression analysis with two or more independent variables, or with at least one nonlinear predictor.
Multiple Regression Equation
An equation that models the relationship between multiple independent variables and a dependent variable, typically expressed in the form Y = b0 + b1X1 + b2X2 + … + bnXn + e.
Adjusted R²
A version of R² that penalizes the addition of independent variables, which helps avoid adding extra variables that do not really belong. This value is typically listed in regression output.
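A minimal sketch of the usual adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of independent variables. The function name and the example numbers are assumptions for illustration.

```python
def adjusted_r2(r_squared, n, p):
    """Adjusted R^2 for n observations and p independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Hypothetical comparison: a second predictor raises R^2 only slightly
one_predictor = adjusted_r2(0.60, n=20, p=1)
two_predictors = adjusted_r2(0.61, n=20, p=2)
print(one_predictor > two_predictors)
```

In this made-up case the plain R² rises from 0.60 to 0.61, yet the adjusted value falls, signaling that the extra variable may not belong.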
Data Visualization
The graphical representation of information and data, using visual elements like charts, graphs, and maps to communicate complex ideas.
Color
A feature that can be processed by iconic memory; the property of an object that results from the way the object reflects or emits light.
Hues
Base of color
RYB
The traditional artists' color model, a subtractive pigment-mixing model with primaries red, yellow, and blue.
CMY
A subtractive color model with primaries cyan, magenta, and yellow, used in computer color printing.
Unnecessary use of color
Color that adds no information and can distract from the main message of a visualization.
Excessive use of color
Too much color, which overwhelms the viewer and leads to confusion instead of clarity in a visualization.
Insufficient contrast
Too little difference between colors, which makes elements difficult to distinguish or interpret in visualizations.
Inconsistency across related charts
Failing to maintain a coherent design or color scheme across related charts, which confuses viewers and hinders effective comparison.
Orientation
The positioning of an object within a data visualization.
Size
The amount of space an object occupies in a visualization; viewers often struggle to estimate relative size differences.
Shape
The form of objects used in a data visualization to distinguish different groups.
Length
The distance spanned by a line or bar/column.
Width
The thickness of a line or bar/column.
Spatial Positioning
The preattentive attribute concerned with the location of an object within some defined space.
Frequency distribution
Bar chart
One continuous (numerical) variable
Histogram
Two categorical variables
Contingency table or stacked column chart
Two continuous (numerical) variables
Scatter plot
Three continuous (numerical) variables
Bubble Plot
Time series
Line chart
Matrix Array
Heat Map
Geographic Map
A chart that shows the characteristics and arrangement of the geography of our physical reality.
Data Dashboards
Visual interfaces that display key metrics and trends, allowing users to analyze data from multiple sources.
Corpus
The entire body of text material to be analyzed (collection of documents)
Documents
The container of tokens, as chosen by the analyst.
Text Analytics
A broader concept that includes information retrieval, whereas text mining focuses primarily on discovering new and useful knowledge from textual data sources.
Text Mining
Knowledge discovery in textual data.
Information Extraction
Identifying key phrases and relationships within text by looking for predefined objects and sequences through pattern matching.
Topic Tracking
Based on user profiles and the documents a user views, text mining can predict other documents of interest to that user.
Summarization
Producing a summary of a document to save the reader time.
Clustering
Grouping similar documents without predefined categories, letting themes emerge organically.
Question Answering
Finding the best answer to a given question through knowledge-driven pattern matching.
Stop Word Removal
Paring down the data by removing common words that don’t add any meaningful value to the analysis.
Stemming
The process of removing prefixes or suffixes, chopping each word down to the letters it has in common with related words.
Lemmatization
Reducing the word to its lemma (dictionary entry) form
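The three preprocessing cards above can be contrasted in a small sketch. Everything here is a deliberately naive assumption: the stop word list, the suffix rules, and the lemma lookup table are tiny hypothetical stand-ins, whereas real stemmers and lemmatizers use far larger rule sets and dictionaries.

```python
STOP_WORDS = {"the", "a", "is", "are", "of", "and"}   # tiny illustrative list
SUFFIXES = ("ing", "ed", "s")                          # naive suffix rules
LEMMAS = {"mice": "mouse", "ran": "run"}               # hypothetical dictionary lookup

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(word):
    """Chop the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def lemmatize(word):
    """Look up the dictionary form; fall back to the word itself."""
    return LEMMAS.get(word, word)

tokens = "the dogs are chasing the mice".split()
content = remove_stop_words(tokens)
stems = [naive_stem(t) for t in content]
lemmas = [lemmatize(t) for t in content]
print(content, stems, lemmas)
```

The stem of "chasing" comes out as "chas", which is not a real word, while the lemma of "mice" is the dictionary entry "mouse"; that difference is the practical distinction between the two techniques.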
Term-Document Matrix (TDM)
A bag-of-words technique that counts the occurrences of words in a document while ignoring the order and grammar of the words.
Binary Approach
The cells of the matrix are populated with either a one (if the token is present in the document) or a zero (if the token is not present).
Term Frequency Approach
Cells of matrix reflect the word count (frequency) in the document instead of just a zero or a one
Sparse Entry
A situation in a matrix where most of the entries are zero, indicating that only a small number of token occurrences are present compared to the total number of possible tokens.
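To make the binary approach, the term frequency approach, and sparsity concrete, here is a minimal sketch that builds a small term-document matrix with terms as rows and documents as columns. The three documents are made up, and whitespace tokenization is an assumption.

```python
# Hypothetical corpus of three short documents
docs = ["good food good service", "bad service", "good value"]
tokenized = [doc.split() for doc in docs]

# Vocabulary: every distinct token in the corpus, sorted for stable row order
vocab = sorted({token for tokens in tokenized for token in tokens})

# Term frequency approach: each cell holds the count of the term in the document
freq_matrix = [[tokens.count(term) for tokens in tokenized] for term in vocab]

# Binary approach: each cell is 1 if the term is present, else 0
binary_matrix = [[1 if count > 0 else 0 for count in row] for row in freq_matrix]

# Share of cells that are zero, a simple measure of sparsity
zero_cells = sum(row.count(0) for row in freq_matrix)
sparsity = zero_cells / (len(vocab) * len(docs))

for term, row in zip(vocab, freq_matrix):
    print(term, row)
print(round(sparsity, 3))
```

Even in this tiny corpus more than half of the cells are zero; with realistic vocabularies the proportion is usually far higher.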
TF-IDF
The TF value is specific to a single document, whereas the IDF value depends on the entire corpus.
Text Exploration
Consists of techniques used to look for patterns or find relationships in text.
Frequency Bar Chart
A chart in which the x-axis represents terms and the y-axis represents the frequency with which each term occurs.
Word Cloud
A visual representation of text data where the size of each word indicates its frequency or importance in the given text.
Text Modeling
The stage in which preprocessed data is used to build models.
Classification
The most common knowledge discovery topic in analyzing complex data sources.
Clustering
An unsupervised process in which objects are classified into “natural groups”; the problem is to group an unlabeled collection of objects into meaningful clusters.
Topic Modeling
Enables the analyst to discover hidden thematic structures in the text
Latent Dirichlet Allocation (LDA)
Goal is to maximize the separation between the estimated topics and minimize the variance within each projected topic
Sentiment Polarity
Classification of text as positive, negative, or neutral based on the emotional tone.
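One simple way to assign polarity is a lexicon count, sketched below. This is only an illustration of the idea: the word lists are tiny hypothetical lexicons, and practical sentiment analysis typically relies on much larger lexicons or trained models.

```python
POSITIVE = {"good", "great", "excellent", "love"}   # hypothetical positive lexicon
NEGATIVE = {"bad", "poor", "terrible", "hate"}      # hypothetical negative lexicon

def polarity(text):
    """Label text by comparing counts of positive and negative words."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("Great food and good service"))
print(polarity("The wait was terrible"))
print(polarity("The order arrived on Tuesday"))
```

A count like this ignores negation and context ("not good" would still score as positive), which is one reason it is only a starting point.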