Crosstab/Contingency table
slices data by 2 categorical variables, bivariate
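A minimal sketch of how a crosstab counts pairs of categories, using only the standard library. The survey records and the crosstab helper are made up for illustration.

```python
from collections import Counter

# Hypothetical survey records: (region, preferred channel)
records = [
    ("East", "Online"),
    ("East", "Store"),
    ("West", "Online"),
    ("West", "Online"),
    ("East", "Online"),
]

def crosstab(pairs):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for pairs that never occur, so empty cells are filled in.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = crosstab(records)
for row, cells in table.items():
    print(row, cells)
```

Each cell holds the count for one combination of the two variables, which is what makes the table bivariate.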
Bar chart
utilizes a frequency table and categorical data, univariate (one variable), uses height/length
Histogram
usually used for continuous (numerical) data, univariate (one variable), uses height/length
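A rough sketch of the binning step behind a histogram: continuous values are grouped into equal-width intervals and the bar height is the count per interval. The heights, bin width, and histogram_counts helper are assumptions for illustration.

```python
def histogram_counts(values, bin_width, start):
    """Count values per bin [start + k*width, start + (k+1)*width)."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)   # index of the bin that contains v
        left = start + k * bin_width        # left edge of that bin
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical continuous measurements (heights in cm)
heights_cm = [151.2, 158.9, 160.0, 163.4, 167.8, 169.9, 171.5, 182.3]
bins = histogram_counts(heights_cm, bin_width=10, start=150)

# Text version of the chart: bar length encodes the count
for left, n in bins.items():
    print(f"{left}-{left + 10}: " + "#" * n)
```

Unlike a bar chart, the bins here are adjacent numeric ranges, so the bars of a histogram touch.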
Positively/Right skewed
heavy to left side, light on right side (TAIL ON RIGHT)
Negatively/Left skewed
heavy to right side, light on left side (TAIL ON LEFT)
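One way to check the direction of skew numerically is the moment coefficient of skewness, which is positive when the tail is on the right and negative when it is on the left. This is a sketch with made-up data; the skewness helper is not from the source.

```python
from statistics import fmean

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data: most values are small, one large value stretches the right tail
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]
# Mirror image of the same data, so the tail is on the left
left_tailed = [-x for x in right_tailed]

print("right-tailed skewness is positive:", skewness(right_tailed) > 0)
print("left-tailed skewness is negative:", skewness(left_tailed) < 0)
```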
Stacked bar/column chart
utilizes a contingency table, can have both data types, usually categorical though, multivariate
Line chart
measures 2 things over time
Scatterplot
relationship between 2 numeric variables; a third variable can be categorical and shown with a legend; uses position; can be bivariate or multivariate
Business analytics
data analyses for business applications
Data science
develop applications for end users
Sequence/Types of Analytics
Descriptive → Diagnostic → Predictive → Prescriptive
Statistical inference
is the process of using data from a sample to gain information about the population
Sampling bias
occurs when the method of selecting a sample causes the sample to differ from the population in some relevant way
Time series data
data values observed over time
Cross sectional data
values observed at the same point in time
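The difference between the two can be sketched by slicing the same records two ways. The store names, months, and sales figures below are invented for illustration.

```python
# Hypothetical records: (store, month, sales)
sales = [
    ("A", "2024-01", 100), ("A", "2024-02", 110), ("A", "2024-03", 125),
    ("B", "2024-01", 80),  ("B", "2024-02", 95),  ("B", "2024-03", 90),
]

# Time series: one subject (store A) observed over successive months
time_series = [(month, amount) for store, month, amount in sales if store == "A"]

# Cross-sectional: many subjects (all stores) observed in the same month
cross_section = [(store, amount) for store, month, amount in sales if month == "2024-02"]

print(time_series)
print(cross_section)
```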
Structured Data
Reside in a pre-defined, row-column format; Spreadsheet or database applications; Enter, store, query, and analyze
Unstructured Data
Do not conform to a pre-defined, row-column format; Textual; Multimedia content
Discrete data
numerical, countable values (usually whole numbers, no decimals), more rigid, would be a more jagged graph
Continuous data
numerical, yes to decimals; there is no single "next" number above 1 because any value in a range is possible
How to determine if it’s numerical or categorical
if you can perform a meaningful calculation on it, such as an avg/mean, then it’s numerical (the avg of zip codes is meaningless, so zip codes are nominal)
Nominal data
categorical, no order, can be numeric but usually words ex: 1=yes 0=no, uniform numbers, zip code
Ordinal data
ranked, not necessarily a preference, ORDER
Lollipop chart
variation of a bar chart, uses height/length
Bullet graph
Encodes data using length/height, position and color to show actual compared to target and performance bands
Dot plot
is a Univariate plot for Continuous data, uses position
Box and whisker plot
univariate, for continuous data, uses position and height/length
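A box and whisker plot is usually drawn from the five-number summary (minimum, first quartile, median, third quartile, maximum). The sketch below computes it with the standard library; the delivery times are made up, and the "inclusive" quartile method is one of several conventions, so other tools may give slightly different quartiles.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # n=4 splits the data into quartiles and returns the three cut points
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical delivery times in days
delivery_days = [2, 3, 3, 4, 5, 5, 6, 7, 12]
summary = five_number_summary(delivery_days)
print(summary)
```

The box spans Q1 to Q3 (length), while the median and whisker ends are marked by position.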
Pie chart
uses angle, area and arc to show a part-to-whole comparison, univariate, can be categorical or continuous
Line chart
uses position and often shows trend over time, usually bivariate, time usually on x-axis and y-axis is usually numerical
Sparkline / Sparkbar
using position (line) or height/length (bar) in a small, word-sized graphic
Bubble plot
Allows adding more variables to a scatterplot, can use color and size to visualize other (likely numerical) data, multivariate
Heat Map
uses color, uses numerical data but does not use numbers in the visualization, bivariate
Visual perception
the brain's ability to receive, interpret, and act upon visual stimuli
Preattentive attributes
visual properties that we notice without using conscious effort to do so
Important preattentive attributes used in graphs
Length, width, orientation (is it angled differently from the others), size, shape, color hue, color intensity, position, texture
Marks to encode quantitative values
Points, lines, bars, boxes, shapes with 2-d areas, shapes with color intensity
Encoding categorical items
Hue, point shape, 2d position
Pie charts are
bad! we don’t like to use them
Business intelligence
Data + tools + brains
As data analytics changes from descriptive to diagnostic to predictive to prescriptive, more human input is required for making decisions and enacting them.
FALSE
The use of historical information to predict what could happen in the future describes prescriptive analytics.
FALSE - predictive analytics
Social media data, such as data from Facebook, Instagram, and TikTok, is an example of structured data.
FALSE - unstructured data
Supervised learning
Input & output data, classification, regression, predictive and prescriptive models
Unsupervised learning
Input data, clustering, association, PATTERN/structure discovery
Four Vs of Big Data
velocity, variety, volume, veracity (accuracy of data)
Descriptive Data Analytics
What is happening in my business?
Diagnostic Data Analytics
Why is it happening?
Prescriptive Data Analytics
What should be done?
Predictive Data Analytics
What will happen in the future?
Data analytics
the science of examining raw data to draw conclusions about that information; the process of inspecting, cleansing, transforming, and modeling data to discover useful information for decision-making.
Big Data
massive complex structured and unstructured data sets that are rapidly generated and transmitted from a wide variety of sources
Data Mining
a set of statistical and machine learning methods that inform decision-making. (Digging through vast stores of data in search of something interesting)
Information
a set of data that are organized and processed in a meaningful and purposeful way.