Crosstab/Contingency table

slices data by 2 categorical variables, bivariate
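
A minimal sketch of how a crosstab can be built by hand, assuming a small made-up list of (region, plan) records; the data and the names `records` and `table` are illustrative only:

```python
from collections import Counter

# Hypothetical survey records: (region, plan) pairs, invented for illustration.
records = [
    ("East", "Basic"), ("East", "Premium"), ("West", "Basic"),
    ("West", "Basic"), ("East", "Basic"), ("West", "Premium"),
]

# Count how often each (row category, column category) pair occurs.
counts = Counter(records)

rows = sorted({region for region, _ in records})
cols = sorted({plan for _, plan in records})

# Nested dict: table[row][col] holds the frequency for that cell.
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Print a simple text version of the contingency table.
print("Region".ljust(8) + "".join(c.rjust(9) for c in cols))
for r in rows:
    print(r.ljust(8) + "".join(str(table[r][c]).rjust(9) for c in cols))
```

Each cell counts the records that share one value of each categorical variable, which is what makes the table bivariate.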

Bar chart

utilizes frequency table and categorical data, univariate (one variable), uses height/length

Histogram

usually used for continuous (numerical) data, univariate (one variable), uses height/length
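
A rough sketch of the binning step behind a histogram, assuming made-up delivery times and an arbitrary bin width of 5; the names `times`, `start`, and `bin_width` are illustrative:

```python
# Hypothetical delivery times in minutes (continuous data), invented for illustration.
times = [12.5, 14.1, 15.0, 17.8, 18.2, 19.9, 21.4, 22.0, 26.3, 29.7]

bin_width = 5  # assumed bin width
start = 10     # assumed left edge of the first bin

# Count how many values fall into each bin [left, left + bin_width).
bins = {}
for t in times:
    left = start + int((t - start) // bin_width) * bin_width
    bins[left] = bins.get(left, 0) + 1

# Draw a text histogram: bar length encodes the count in each bin.
for left in sorted(bins):
    print(f"{left:>2}-{left + bin_width:<2} " + "#" * bins[left])
```

Unlike a bar chart, the bars stand for numeric ranges rather than categories, so changing the bin width changes the shape of the chart.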

Positively/Right skewed

heavy to left side, light on right side (TAIL ON RIGHT)

Negatively/Left skewed

heavy to right side, light on left side (TAIL ON LEFT)
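
One common rule of thumb (a heuristic, not a formal test) is to compare the mean with the median: the tail pulls the mean toward it. A small sketch, assuming two made-up data sets and a hypothetical helper `skew_direction`:

```python
import statistics

# Hypothetical right-skewed data: most values are small, a few are very large.
right_skewed = [20, 22, 23, 25, 25, 27, 30, 95, 120]
# Hypothetical left-skewed data: most values are large, a few are very small.
left_skewed = [5, 30, 95, 98, 100, 100, 102, 103, 105]

def skew_direction(values):
    """Rule of thumb only: the mean is pulled toward the tail."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right (positive) skew"
    if mean < median:
        return "left (negative) skew"
    return "roughly symmetric"

print(skew_direction(right_skewed))
print(skew_direction(left_skewed))
```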

Stacked bar/column chart

utilizes contingency table, can have both data types, usually categorical though, multivariate

Line chart

measures 2 things over time

Scatterplot

relationship between 2 numeric variables; a third variable can be categorical and shown with a legend, uses position, can be bivariate or multivariate

Business analytics

data analyses for business applications

Data science

develop applications for end users

Sequence/Types of Analytics

Descriptive → Diagnostic → Predictive → Prescriptive

Statistical inference

the process of using data from a sample to gain information about the population

Sampling bias

occurs when the method of selecting a sample causes the sample to differ from the population in some relevant way

Time series data

data values observed over time

Cross sectional data

values observed at the same point in time

Structured Data

Reside in a pre-defined, row-column format; Spreadsheet or database applications; Enter, store, query, and analyze

Unstructured Data

Do not conform to a pre-defined, row-column format; Textual; Multimedia content

Discrete data

numerical, countable values (typically whole numbers, no decimals), more rigid, would be a more jagged graph

Continuous data

numerical, yes to decimals; you cannot name the "next" number above 1 because any value in between is possible

How to determine if it’s numerical or categorical

if you can perform a relevant calculation then it’s numerical ex: avg/mean (you don’t need the avg of zip codes, so it’s nominal)
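
A small sketch of the "relevant calculation" test, assuming made-up order totals and zip codes; the variable names are illustrative:

```python
import statistics

# Hypothetical order data, invented for illustration.
order_totals = [25.0, 40.0, 32.5, 18.5]    # numerical: an average is meaningful
zip_codes = [30301, 30301, 60614, 94105]   # digits, but really labels (nominal)

# A mean of order totals answers a real question: the typical order size.
mean_total = statistics.mean(order_totals)

# A mean of zip codes would run without error but mean nothing.
# For labels, a count or the most common value is the useful summary.
most_common_zip = statistics.mode(zip_codes)

print(mean_total, most_common_zip)
```

The point is that a column stored as digits is not automatically numerical; what matters is whether arithmetic on it is meaningful.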

Nominal data

categorical, no order, can be numeric but usually words ex: 1=yes 0=no, uniform (jersey) numbers, zip code

Ordinal data

ranked, not necessarily a preference, ORDER
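
A brief sketch of what "ORDER" buys you, assuming a made-up Low/Medium/High scale; the names `order` and `rank` are illustrative:

```python
# Hypothetical survey responses on an ordinal scale, invented for illustration.
responses = ["High", "Low", "Medium", "Low", "High", "Medium", "Medium"]

# The order must be stated explicitly; alphabetical order would be wrong here.
order = ["Low", "Medium", "High"]
rank = {level: i for i, level in enumerate(order)}

# Sorting by rank is meaningful for ordinal data (it is not for nominal data).
sorted_responses = sorted(responses, key=rank.__getitem__)

# With an odd number of responses, the middle item is the median category.
middle = sorted_responses[len(sorted_responses) // 2]
print(sorted_responses, middle)
```

The ranks only say which category comes first; the distance between Low and Medium is not assumed to equal the distance between Medium and High.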

Lollipop chart

variation of a bar chart, uses height/length

Bullet graph

Encodes data using length/height, position and color to show actual compared to target and performance bands

Dot plot

is a Univariate plot for Continuous data, uses position

Box and whisker plot

univariate, for continuous data, uses position and height/length
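
A box and whisker plot is usually drawn from a five-number summary. A minimal sketch, assuming made-up exam scores; note that quartile conventions differ between tools, and this uses the default method of Python's `statistics.quantiles`:

```python
import statistics

# Hypothetical exam scores, invented for illustration.
scores = [55, 61, 64, 68, 70, 72, 75, 79, 83, 98]

# With n=4, statistics.quantiles returns the three quartile cut points.
q1, median, q3 = statistics.quantiles(scores, n=4)

# The box spans q1 to q3; its length is the interquartile range.
iqr = q3 - q1

# Minimum, Q1, median, Q3, maximum.
five_number = (min(scores), q1, median, q3, max(scores))
print(five_number, iqr)
```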

Pie chart

uses angle, area and arc to show a part-to-whole comparison, univariate, can be categorical or continuous

Line chart

uses position and often shows trend over time, usually bivariate, time usually on x-axis and y-axis is usually numerical

Sparkline / Sparkbar

using position (line) or height/length (bar) in a small, word-sized graphic

Bubble plot

Allows adding more variables to a scatterplot, can use color and size to visualize other (likely numerical) data, multivariate

Heat Map

uses color, uses numerical data but does not use numbers in the visualization, bivariate

Visual perception

the brain's ability to receive, interpret, and act upon visual stimuli

Preattentive attributes

visual properties that we notice without using conscious effort to do so

Important preattentive attributes used in graphs

Length, width, orientation (is it angled differently than the others), size, shape, color hue, color intensity, position, texture

Marks to encode quantitative values

Points, lines, bars, boxes, shapes with 2-d areas, shapes with color intensity

Encoding categorical items

Hue, point shape, 2d position

Pie charts are

bad! we don’t like to use them

Business intelligence

Data + tools + brains

As data analytics changes from descriptive to diagnostic to predictive to prescriptive, more human input is required for making decisions and enacting them.

FALSE - less human input is required

The use of historical information to predict what could happen in the future describes prescriptive analytics.

FALSE - predictive analytics

Social media data, such as data from Facebook, Instagram, and TikTok, are examples of structured data.

FALSE - unstructured data

Supervised learning

Input & output data, classification, regression, predictive and prescriptive models

Unsupervised learning

Input data, clustering, association, PATTERN/structure discovery

Four Vs of Big Data

velocity, variety, volume, veracity (accuracy of data)

Descriptive Data Analytics

What is happening in my business?

Diagnostic Data Analytics

Why is it happening?

Prescriptive Data Analytics

What should be done?

Predictive Data Analytics

What will happen in the future?

Data analytics

the science of examining raw data to draw conclusions about that information; the process of inspecting, cleansing, transforming, and modeling data to discover useful information for decision-making.

Big Data

massive complex structured and unstructured data sets that are rapidly generated and transmitted from a wide variety of sources

Data Mining

a set of statistical and machine learning methods that inform decision-making. (Digging through vast stores of data in search of something interesting)

Information

a set of data that are organized and processed in a meaningful and purposeful way.

