What are the four main categories of data analytics?
Descriptive Analytics
Diagnostic Analytics
Predictive Analytics
Prescriptive Analytics
Descriptive Analytics
Are procedures that summarize existing data to determine what has happened in the past
Diagnostic Analytics
Are procedures that explore the current data to determine why something has happened the way it has, typically comparing the data to a benchmark
Predictive Analytics
Are procedures used to generate a model that can be used to determine what is likely to happen in the future
Prescriptive Analytics
Are procedures that model data to enable recommendations for what should be done in the future
From lowest to highest value and difficulty?
Descriptive Analytics
Diagnostic Analytics
Predictive Analytics
Prescriptive Analytics
Descriptive analytics examples:
Summary statistics
Data reduction or filtering
Summary Statistics
Describe a set of data in terms of their location (mean, median), range (standard deviation, min, max), shape (quartile), and size (count)
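As a minimal sketch of the measures listed above, the following uses Python's standard `statistics` module. The invoice amounts are made up for illustration and are not from the source.

```python
import statistics

# Hypothetical invoice amounts, used only for illustration.
amounts = [120.0, 80.0, 100.0, 95.0, 105.0, 400.0]

summary = {
    "count": len(amounts),                            # size
    "mean": statistics.mean(amounts),                 # location
    "median": statistics.median(amounts),             # location
    "stdev": statistics.stdev(amounts),               # spread (sample standard deviation)
    "min": min(amounts),
    "max": max(amounts),
    "quartiles": statistics.quantiles(amounts, n=4),  # three cut points: Q1, Q2, Q3
}

for name, value in summary.items():
    print(name, value)
```

The mean (150.0) sits well above the median (102.5) here because one large amount pulls it upward, which is the kind of signal summary statistics are meant to surface.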
Data Reduction or Filtering
Is used to reduce the number of observations to focus on relevant items (that is, highest cost, highest risk, largest impact, etc.)
How is data reduction/filtering done?
It does this by taking a large set of data (perhaps the population) and reducing it to a smaller set that has the vast majority of the critical information of the larger set
Diagnostic Analytics examples:
Profiling
Clustering
Similarity Matching
Co-occurrence grouping
Profiling
Identifies the “typical” behavior of an individual, group, or population by compiling summary statistics about the data (including mean, standard deviations, etc.) and comparing individuals to the population
Used to discover patterns of behavior
Clustering
Helps identify groups (or clusters) of individuals (such as customers) that share common underlying characteristics - identifying groups of similar data elements and the underlying drivers of those groups
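One common way to find such groups is k-means clustering. The source does not name a specific algorithm, so treat the following as an assumed illustration on one-dimensional data (hypothetical customer spend), with starting centers chosen by hand.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny k-means for 1-D data: assign each value to its nearest center,
    then move each center to the mean of its assigned values."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical monthly spend per customer.
spend = [1, 2, 3, 20, 21, 22]
centers, groups = kmeans_1d(spend, centers=[0, 10])
print(centers)
print(groups)
```

The final centers describe the "typical" member of each cluster, which is a starting point for asking what drives the difference between the groups.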
Similarity Matching
Is a grouping technique used to identify similar individuals based on data known about them
Co-occurrence grouping
Discovers associations between individuals based on common events, such as transactions they are involved in
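A minimal sketch of the idea: count how often each pair of individuals appears in the same transaction. The names and transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Each set lists the individuals involved in one hypothetical transaction.
transactions = [
    {"alice", "bob"},
    {"alice", "bob", "carol"},
    {"bob", "carol"},
    {"dave", "alice"},
]

pair_counts = Counter()
for people in transactions:
    # Sort so that each pair is always counted under the same key.
    for pair in combinations(sorted(people), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.most_common():
    print(pair, count)
```

Pairs with high counts are the associations worth a closer look.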
Predictive analytics examples:
Regression
Classification
Link Prediction
Regression
Estimates or predicts the numerical value of a dependent variable based on the slope and intercept of a line and the value of an independent variable
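For a single independent variable, the line can be fitted by ordinary least squares. The sketch below assumes that method and uses made-up data; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one independent variable.
    Returns (slope, intercept) of the best-fitting line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: units sold (x) and revenue (y).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
predicted = slope * 5 + intercept  # prediction for a new x value of 5
print(slope, intercept, predicted)
```

With this data the points lie exactly on a line, so the fit recovers a slope of 2 and an intercept of 1.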
Classification
Predicts a class or category for a new observation based on the manual identification of classes from previous observations
Link Prediction
Predicts a relationship between two data items, such as members of a social media platform
Prescriptive analytics examples:
Decision support systems
Machine Learning and Artificial Intelligence
Decision Support Systems
Are rule-based systems that gather data and recommend actions based on the input
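A rule-based recommendation can be as simple as a chain of conditions. The rules, thresholds, and field names below are hypothetical and only illustrate the shape of such a system.

```python
def recommend(invoice):
    """Return a recommended action for an invoice based on fixed rules.
    Rules are checked in order; the first one that applies wins."""
    if invoice["amount"] > 10_000:
        return "require manager approval"
    if invoice["days_overdue"] > 30:
        return "send to collections"
    return "approve automatically"

print(recommend({"amount": 15_000, "days_overdue": 0}))
print(recommend({"amount": 500, "days_overdue": 45}))
print(recommend({"amount": 500, "days_overdue": 0}))
```

Unlike a machine learning model, these rules do not change unless someone edits them.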
Machine Learning and Artificial Intelligence
Are learning models or intelligent agents that adapt to new external data to recommend a course of action
Summary Statistics
Describe the location, spread, shape, and dependence of a set of observations
Data reduction involves the following steps:
Identify the attribute you would like to reduce on or focus on
Filter the results
Interpret the results
Follow up on results
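The first two steps above can be sketched as a sort and a cut. The records, field names, and the helper `reduce_to_top` are assumptions for illustration; interpreting and following up remain manual steps.

```python
# Hypothetical transactions.
transactions = [
    {"id": 1, "amount": 250.0},
    {"id": 2, "amount": 12000.0},
    {"id": 3, "amount": 75.5},
    {"id": 4, "amount": 9800.0},
    {"id": 5, "amount": 430.0},
]

def reduce_to_top(records, attribute, n):
    """Keep the n records with the largest value of the chosen attribute."""
    return sorted(records, key=lambda r: r[attribute], reverse=True)[:n]

# Step 1: focus on "amount". Step 2: filter to the two largest items.
top = reduce_to_top(transactions, "amount", 2)
top_ids = [r["id"] for r in top]
print(top_ids)
```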
Fuzzy Matching
Locates approximate matches
Useful for identifying relationships in imperfect data
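Python's standard `difflib` module offers one way to find approximate matches. The vendor names, the 0.6 cutoff, and the helper `fuzzy_match` are assumptions for this sketch.

```python
import difflib

# Hypothetical master list of vendor names.
vendors = ["Acme Supply Co", "Globex Corporation", "Initech LLC"]

def fuzzy_match(name, candidates, cutoff=0.6):
    """Return the closest candidate to `name`, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(fuzzy_match("Acme Suply Co.", vendors))  # misspelled entry
print(fuzzy_match("Zzz", vendors))             # nothing similar
```

Raising the cutoff reduces false matches at the cost of missing more heavily misspelled entries.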
What can diagnostic analytics provide?
Insight into why things happened or how individual data values relate to the general population
What type of data is profiling primarily using?
Structured data - data that are stored in a database or spreadsheet and are readily searchable
Profiling relies on gathering summary statistics and identifying outliers:
Identify the objects or activity you want to profile
Determine the types of profiling you want to perform
Set boundaries or thresholds for the activity
Interpret the results and monitor the activity and/or generate a list of exceptions
Follow up on exceptions
What can help show spread and outliers?
Z-scores and box plots
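Both tools can be sketched numerically. The example below assumes two common conventions that the source does not state: flagging values whose z-score exceeds 2 in absolute value, and using the 1.5 × IQR fences that a box plot draws. The data are made up.

```python
import statistics

# Hypothetical daily transaction counts with one unusual day.
values = [10, 12, 11, 13, 12, 11, 10, 12, 40]

# Z-score: how many standard deviations a value lies from the mean.
mean = statistics.mean(values)
stdev = statistics.stdev(values)
z_scores = [(v - mean) / stdev for v in values]
z_outliers = [v for v, z in zip(values, z_scores) if abs(z) > 2]

# Box-plot fences: values beyond 1.5 * IQR from the quartiles are outliers.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
box_outliers = [v for v in values if v < lower or v > upper]

print(z_outliers, box_outliers)
```

Both approaches flag the same value here, although on other data they can disagree because a large outlier inflates the standard deviation but has little effect on the quartiles.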
What is an example of data profiling?
Variance Analysis
Benford’s law
Is a diagnostic analytic that compares actual to expected values, specifically the observed frequency of leading digits to the frequency the law predicts
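Benford's law gives the expected share of values with leading digit d as log10(1 + 1/d). The sketch below compares that expectation to the actual shares in a small, made-up list of positive whole-number amounts; the function names are hypothetical.

```python
import math
from collections import Counter

def benford_expected(digit):
    """Expected share of values whose leading digit is `digit` (1 to 9)."""
    return math.log10(1 + 1 / digit)

def leading_digit_profile(amounts):
    """Return {digit: (actual share, expected share)}.
    Assumes every amount is a positive integer."""
    counts = Counter(int(str(a)[0]) for a in amounts)
    total = len(amounts)
    return {d: (counts[d] / total, benford_expected(d)) for d in range(1, 10)}

# Hypothetical invoice amounts in whole currency units.
amounts = [100, 150, 200, 900]
profile = leading_digit_profile(amounts)

for digit, (actual, expected) in profile.items():
    print(digit, round(actual, 3), round(expected, 3))
```

A sample this small proves nothing; in practice the comparison is only meaningful on large data sets, where big gaps between actual and expected shares become exceptions to follow up.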
What does regression help with?
Helps predict expected outcomes
Classification: Training data
Are existing data that have been manually evaluated and assigned a class
Classification: Test Data
Are existing data used to evaluate the model
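To show how the two data sets are used, here is a sketch with a nearest-neighbor classifier. That technique is chosen only because it is short; the source does not name it. All observations and labels are hypothetical.

```python
import math

# Training data: observations that have already been assigned a class.
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((5.0, 5.0), "high"),
    ((6.0, 5.5), "high"),
]

# Test data: labeled observations held back to evaluate the model.
test = [
    ((1.2, 1.5), "low"),
    ((5.5, 5.0), "high"),
    ((3.0, 3.0), "high"),
]

def predict(features):
    """Predict the class of the closest training observation."""
    nearest = min(training, key=lambda row: math.dist(row[0], features))
    return nearest[1]

correct = sum(predict(features) == label for features, label in test)
accuracy = correct / len(test)
print(accuracy)
```

The model gets two of the three test observations right. Accuracy measured on the test data, not on the training data, is the estimate of how well the model will handle new observations.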
Classification: Decision Trees
Are used to divide data into similar groups
Classification: Decision Boundaries
Mark the split between one class and another
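A single split of a decision tree on one numeric attribute can be sketched as a search for the threshold that misclassifies the fewest observations. That threshold is the decision boundary. The data and the helper `best_split` are hypothetical, and the sketch assumes the `True` class lies above the threshold.

```python
def best_split(rows):
    """rows: list of (value, label) pairs with boolean labels.
    Try the midpoints between neighboring values and return the threshold
    with the fewest errors when predicting True for values above it."""
    values = sorted({value for value, _ in rows})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((value > threshold) != label for value, label in rows)

    return min(candidates, key=errors)

# Hypothetical data: (account age in months, whether the account was approved).
rows = [(20, False), (25, False), (30, False), (45, True), (50, True), (60, True)]
boundary = best_split(rows)
print(boundary)
```

A full decision tree repeats this kind of search on each resulting group, which is why unpruned trees can keep splitting until they overfit.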
Classification: Pruning
Removes branches from a decision tree to avoid overfitting the model
Classification: Linear Classifiers
Are useful for ranking items rather than simply predicting class probability
Useful for determining the important values
Classification: Support Vector Machine
Is a discriminating classifier that is defined by a separating hyperplane; it works first to find the widest margin between classes and then to find the middle line
How do we evaluate classifiers?
Try to avoid overfitting, that is, models that fit the training data too closely. Such models are bad at predicting future observations
Look for the sweet spot where we maximize accuracy on the testing data
What do we do once other diagnostics and predictive analyses have been performed?
The decision process can be aided by rules-based decision support systems or machine learning models, or the results can be added to an existing artificial intelligence model to improve future predictions