Descriptive Analysis
Investigates what is happening currently or has occurred in the past
Diagnostic
Helps understand why something happened
Predictive
Forecasts what might happen in the future
Prescriptive
Helps understand what should happen to meet goals and objectives
Example of Descriptive Analysis
What were gross sales by region for the past two years?
Example of Diagnostic Analysis
Why did sales decrease in Region 1 in the prior year?
Example of Predictive Analysis
What will sales be next year if we increase market share by 10%?
Example of Prescriptive Analysis
What is the most cost-effective way to ship our products?
Stages of the data analysis process
Plan, Analyze, Report
Stage 1: Plan
Identifying the motivation for the analysis, determining the objective and questions to answer, and devising a strategy
Stage 2: Analyze
Data preparation, building information models, and exploring the data
Stage 3: Report
Interpreting the analyses and communicating the results effectively
What is motivation in data analysis?
The reason the analysis is being performed: why are we doing the analysis?
What are the four risks with data analysis?
Data, Analysis, Assumptions, Biases
Data
Choosing inappropriate data, or data that are incomplete or incorrect
Analysis
Choosing an inappropriate method, or applying a data analysis method incorrectly
Assumptions
Not understanding or evaluating assumptions about the data or the results
Biases
Mental shortcuts that can affect decisions
Relational Database
A collection of logically related data that can be retrieved, manipulated, and updated to meet users’ needs.
Inner Join
Returns only the rows from both tables that have matching values in the join columns
Left Join
Returns all the rows from the left table plus any matching data from the right table; left rows with no match show nulls in the right table's columns
Right Join
Returns all the rows from the right table plus any matching data from the left table; right rows with no match show nulls in the left table's columns
Full Join
Returns all the rows from both tables, combining rows that match and filling in nulls where there is no match
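The four join types can be sketched in plain Python. This is only an illustration, not how a database engine implements joins: the `join` helper, the `customers` and `orders` tables, and the `customer_id` column are all hypothetical, and `None` stands in for a SQL null.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dict rows on `key`; how is inner, left, right, or full."""
    left_cols = {col for row in left for col in row}
    right_cols = {col for row in right for col in row}
    rows, matched_right = [], set()
    for lrow in left:
        hits = [i for i, rrow in enumerate(right) if rrow[key] == lrow[key]]
        matched_right.update(hits)
        # Matching rows are combined for every join type.
        rows.extend({**lrow, **right[i]} for i in hits)
        if not hits and how in ("left", "full"):
            # Keep the unmatched left row; right-only columns become None.
            rows.append({**dict.fromkeys(right_cols), **lrow})
    if how in ("right", "full"):
        for i, rrow in enumerate(right):
            if i not in matched_right:
                # Keep the unmatched right row; left-only columns become None.
                rows.append({**dict.fromkeys(left_cols), **rrow})
    return rows


# Hypothetical sample data: customer 1 has no order, order for customer 3 has no customer.
customers = [{"customer_id": 1, "name": "Ana"}, {"customer_id": 2, "name": "Ben"}]
orders = [{"customer_id": 2, "amount": 50}, {"customer_id": 3, "amount": 75}]

for how in ("inner", "left", "right", "full"):
    print(how, len(join(customers, orders, "customer_id", how)))
# inner 1, left 2, right 2, full 3
```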
Primary Key
A column (or set of columns) whose value uniquely identifies each row in a table
Foreign Key
A column in one table that holds the primary key of another table, linking the two tables
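A short sketch of how a database enforces both kinds of key, using Python's built-in `sqlite3` module. The `region` and `sale` tables and the `try_insert` helper are hypothetical; note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE region (region_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE sale ("
    " sale_id INTEGER PRIMARY KEY,"
    " region_id INTEGER REFERENCES region(region_id),"  # foreign key to region
    " amount REAL)"
)
conn.execute("INSERT INTO region VALUES (1, 'North')")
conn.execute("INSERT INTO sale VALUES (100, 1, 250.0)")  # valid: region 1 exists


def try_insert(sql):
    """Run an INSERT and report whether a key constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


# A second region with region_id 1 violates the primary key (not unique).
duplicate_key = try_insert("INSERT INTO region VALUES (1, 'South')")
# A sale pointing at region 99 violates the foreign key (no such region).
orphan_row = try_insert("INSERT INTO sale VALUES (101, 99, 80.0)")
print(duplicate_key, orphan_row)  # rejected rejected
```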
Attribute
A column of a table, when the source of the data is a database
Record
Each row in a data set from a database
Fields
The individual columns in a data set
Descriptive Statistics
Help understand characteristics of data
Application of descriptive stats
Average observations in the data, the data’s shape, the distribution of the data
Correlation Analysis
Reveals relationships in data by measuring the linear relationship between two variables
How does correlation analysis work?
The measure is a number between -1 and +1. The sign gives the direction of the relationship, and the higher the absolute value, the stronger the relationship
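As a minimal sketch, the measure can be computed as the Pearson correlation coefficient (the usual measure of linear relationship; the card itself does not name a formula). The function name and the sample figures are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sum((x - mean_x) ** 2 for x in xs)
    dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / sqrt(dev_x * dev_y)


# Hypothetical data: sales rise in step with ad spend, returns fall in step.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]
returns = [10, 8, 6, 4, 2]

print(pearson_r(ad_spend, sales))    # 1.0, a perfect positive linear relationship
print(pearson_r(ad_spend, returns))  # -1.0, a perfect negative linear relationship
```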
Skewness graph
Positive skew shows a tail off to the right, whereas the negative skew shows a tail off to the left
Positive skew
From left to right goes mode, median, mean
Negative skew
From left to right goes mean, median, mode
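The ordering of the three measures can be checked on a small made-up sample. This is a sketch using Python's `statistics` module; the data are invented, and the left-skewed sample is simply the mirror image of the right-skewed one.

```python
from statistics import mean, median, mode

# Hypothetical sample with a long tail to the right (positive skew).
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image of the same sample, so the tail runs to the left (negative skew).
left_skewed = [16 - x for x in right_skewed]

print(mode(right_skewed), median(right_skewed), mean(right_skewed))  # mode < median < mean
print(mean(left_skewed), median(left_skewed), mode(left_skewed))     # mean < median < mode
```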
Coefficient of skewness
|CS| > 1: high degree of skewness. 0.5 < |CS| < 1: moderate skewness. |CS| < 0.5: relative symmetry
Coefficient of kurtosis
If CK > 3, the data are somewhat peaked with less dispersion. If CK < 3, the data are somewhat flat with high dispersion
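A sketch of how both coefficients might be computed and read against the cutoffs above. The cards give no formulas, so this assumes the population moment definitions (third and fourth central moments divided by the standard deviation cubed and to the fourth power); spreadsheet functions that apply sample corrections will give slightly different numbers. The function names and data are hypothetical.

```python
def skew_and_kurtosis(data):
    """Return (CS, CK) using population moments (an assumed definition)."""
    n = len(data)
    mu = sum(data) / n
    sigma = (sum((x - mu) ** 2 for x in data) / n) ** 0.5
    cs = sum((x - mu) ** 3 for x in data) / n / sigma ** 3
    ck = sum((x - mu) ** 4 for x in data) / n / sigma ** 4
    return cs, ck


def describe_skew(cs):
    """Apply the cutoffs on |CS|; values exactly at 0.5 or 1 fall in the lower band here."""
    size = abs(cs)
    if size > 1:
        return "high skewness"
    if size > 0.5:
        return "moderate skewness"
    return "relative symmetry"


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

for sample in (symmetric, right_skewed):
    cs, ck = skew_and_kurtosis(sample)
    shape = "peaked" if ck > 3 else "flat"
    print(round(cs, 2), describe_skew(cs), round(ck, 2), shape)
```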
What does motivation include?
Opportunity, professional issues, problem solving, process and performance assessment
Four types of objectives
Descriptive, diagnostic, predictive, and prescriptive
Descriptive Objectives
Designed to better understand data to answer business questions
Diagnostic Objectives
To identify a problem or issue to understand why an outcome occurred
Predictive Objectives
Focus on what may happen in the future
Prescriptive objectives
Goal is to investigate how to take advantage of future opportunities or mitigate a future risk outcome
When given the objective what questions should be asked?
Questions that address the objective, focus on a single issue, are measurable, and can be answered with data that are available
Given the question, what relevant analysis should be used?
N/A
Regression analysis
Linear regression builds mathematical and statistical models to explain the relationship between a dependent variable and one or more independent variables
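A minimal sketch of simple linear regression (one independent variable) fitted by ordinary least squares. The `fit_line` helper and the market-share and sales figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: market share (%) is the independent variable, sales the dependent one.
market_share = [10, 12, 14, 16, 18]
sales = [105, 125, 145, 165, 185]

slope, intercept = fit_line(market_share, sales)
forecast = intercept + slope * 20  # predicted sales at a 20% market share
print(slope, intercept, forecast)  # 10.0 5.0 205.0
```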
Linear optimization
The process of selecting values of decision variables that minimize or maximize some quantity of interest, subject to constraints
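To make the idea concrete, here is a tiny shipping problem solved by brute-force enumeration. This is a sketch, not a real linear programming solver (tools such as spreadsheet solvers use far more efficient methods), and all costs, capacities, and the demand figure are hypothetical.

```python
# Hypothetical problem: ship 10 units at minimum cost using truck and rail.
COST_TRUCK, COST_RAIL = 4, 6  # cost per unit shipped
CAP_TRUCK, CAP_RAIL = 6, 8    # maximum units each mode can carry
DEMAND = 10                   # units that must be shipped in total

best = None
for truck in range(CAP_TRUCK + 1):
    for rail in range(CAP_RAIL + 1):
        if truck + rail < DEMAND:
            continue  # constraint not met: demand would go unfilled
        cost = COST_TRUCK * truck + COST_RAIL * rail  # the quantity to minimize
        if best is None or cost < best[0]:
            best = (cost, truck, rail)

print(best)  # (48, 6, 4): fill the cheaper truck first, send the rest by rail
```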
Data Plan Analysis
Focus on the objective, select a data strategy, select an analysis strategy, consider risks, embed controls
Measured raw data
Data created or captured by a controlled process that records the value of the data. Their format can be discrete or continuous
Nonmeasured raw data
Data often created automatically by the computer or by company policy for control purposes. These fields are typically formatted as discrete data
Calculated Data
Data created when one or more fields in a particular row have any number of mathematical operators applied. These fields are formatted as discrete or continuous data.
Measurement scale descriptions
Categorical, ordinal, interval, and ratio
Categorical Data
Labeled or named data that can be sorted into groups according to specific characteristics
Ordinal Data
Ordered or ranked categorical data
Interval Data
Ordinal data that have equal and constant differences between observations and an arbitrary zero point. Examples include temperature, time, and credit score
Ratio Data
Interval data with a natural zero point. Examples include economic data, such as amounts in dollars or euros
Analysis Strategies for descriptive and diagnostic analytics
Data dispersion, visualizations, correlation, calculations
Data Dispersion
Minimum, maximum, range, variance, standard deviation, skewness, and kurtosis
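The basic dispersion measures can be sketched with Python's `statistics` module. The sample is made up, and this uses the population versions (`pvariance`, `pstdev`); the sample versions are `variance` and `stdev`.

```python
from statistics import pstdev, pvariance

# Hypothetical observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "min": min(data),
    "max": max(data),
    "range": max(data) - min(data),
    "variance": pvariance(data),  # population variance
    "std_dev": pstdev(data),      # population standard deviation
}
print(summary)  # min 2, max 9, range 7, variance 4, standard deviation 2
```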
Visualizations
Bar charts, pie charts, histograms, and box plots
Correlation
Scatterplot
Calculations
Totals, subtotals, percent change, percent of total
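These calculations can be sketched on a small made-up table. The region names and sales figures are hypothetical; percent change is computed as (new - old) / old and percent of total as part / total.

```python
# Hypothetical sales by region for two years.
sales = {
    "Region 1": {"last_year": 200.0, "this_year": 150.0},
    "Region 2": {"last_year": 300.0, "this_year": 350.0},
}

# Total: sum of this year's sales across all regions.
total_this_year = sum(row["this_year"] for row in sales.values())

report = {}
for region, row in sales.items():
    report[region] = {
        "pct_change": 100 * (row["this_year"] - row["last_year"]) / row["last_year"],
        "pct_of_total": 100 * row["this_year"] / total_this_year,
    }

print(total_this_year)     # 500.0
print(report["Region 1"])  # sales fell 25% and make up 30% of the total
```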
Risk and embedded controls
Implement controls to reduce risks within the data and analysis strategies
Data profiling
The process of investigating data quality and structure. It has 3 components: Data quality, data structure, and deciding/informing
Table Structures
Aggregate and slice. Aggregate: calculate the total sales amount. Slice: break the total down by region to examine the regional sales in more detail.
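A minimal sketch of the two operations on a hypothetical list of (region, amount) sales rows.

```python
from collections import defaultdict

# Hypothetical sales transactions: (region, amount).
sales = [("North", 120.0), ("South", 80.0), ("North", 50.0), ("West", 30.0)]

# Aggregate: one total across every row.
total = sum(amount for _, amount in sales)

# Slice: break the same total down by region.
by_region = defaultdict(float)
for region, amount in sales:
    by_region[region] += amount

print(total)            # 280.0
print(dict(by_region))  # North 170.0, South 80.0, West 30.0
```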
Quality Issues
Issues within a spreadsheet, such as inconsistent values (Gender recorded as both "Male" and "M"), missing values (an empty Date), and invalid formats (an Email with no domain such as @gmail.com)
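Checks like these can be automated during data profiling. The sketch below is one possible approach; the column names, the allowed gender codes, and the deliberately simple email pattern are all assumptions made for the example.

```python
import re

# Hypothetical rows; the second one has all three kinds of problem.
rows = [
    {"gender": "Male", "date": "2024-01-15", "email": "ana@gmail.com"},
    {"gender": "M", "date": "", "email": "ben"},
    {"gender": "Female", "date": "2024-02-01", "email": "cy@gmail.com"},
]

ALLOWED_GENDER = {"Male", "Female"}  # assumed standard codes for this data set
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simple shape check only


def find_issues(rows):
    """Return (row index, column, problem) for each quality issue found."""
    issues = []
    for i, row in enumerate(rows):
        if row["gender"] not in ALLOWED_GENDER:
            issues.append((i, "gender", "inconsistent code"))
        if not row["date"].strip():
            issues.append((i, "date", "missing"))
        if not EMAIL_PATTERN.match(row["email"]):
            issues.append((i, "email", "invalid format"))
    return issues


print(find_issues(rows))
```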
Imputation
Substituting estimated values for missing data
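One common way to estimate the substitute value is the mean of the observed values. The sketch below assumes that choice (the median or a model-based estimate are alternatives) and uses `None` to mark missing data; the `impute_mean` helper is hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


print(impute_mean([10, None, 20, 30, None]))  # [10, 20, 20, 30, 20]
```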
Data matching
A process that compares data and determines whether they describe the same entity
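A rough sketch of data matching on names: normalize each value, then compare with a similarity score from Python's `difflib`. The normalization rules and the 0.85 threshold are arbitrary assumptions for illustration; real matching usually combines several fields.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop periods and commas, and collapse repeated spaces."""
    return " ".join(name.lower().replace(".", "").replace(",", "").split())


def same_entity(a, b, threshold=0.85):
    """Guess whether two names describe the same entity (threshold is an assumption)."""
    score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return score >= threshold


print(same_entity("Acme Corp.", "ACME  corp"))  # True: identical after normalization
print(same_entity("Acme Corp.", "Zenith Ltd"))  # False
```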
Star Schema
The recommended data structure for analytical databases. It consists of fact tables and dimension tables
Dimension Table
Provides context for the analysis and gives meaning to the facts.
Fact tables
Correspond to business transactions such as orders, sales, purchases, and payments
Cardinalities
The relationship between two tables, specifying how many rows in one table can be associated with how many rows in another table
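A small star-schema sketch using Python's built-in `sqlite3` module, with one fact table and one dimension table in a one-to-many relationship (one region row, many sales rows). The table names, column names, and figures are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension table: gives context (which region) to the facts.
    CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, region_name TEXT);
    -- Fact table: one row per sales transaction.
    CREATE TABLE fact_sales (
        sale_id   INTEGER PRIMARY KEY,
        region_id INTEGER REFERENCES dim_region(region_id),
        amount    REAL
    );
    INSERT INTO dim_region VALUES (1, 'North'), (2, 'South');
    INSERT INTO fact_sales VALUES (1, 1, 100.0), (2, 1, 40.0), (3, 2, 75.0);
    """
)

# Join facts to their dimension, then total and count the sales per region.
totals = conn.execute(
    """
    SELECT d.region_name, SUM(f.amount), COUNT(*)
    FROM fact_sales AS f
    JOIN dim_region AS d ON d.region_id = f.region_id
    GROUP BY d.region_name
    ORDER BY d.region_name
    """
).fetchall()
print(totals)  # [('North', 140.0, 2), ('South', 75.0, 1)]
```

The count column shows the cardinality in this sample: each region appears once in the dimension table but can be referenced by many rows in the fact table.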