Phase 1- Discovery
The team learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past from which it can learn.
Phase 1- Discovery
The team assesses the resources available to support the project in terms of people, technology, time, and data.
Phase 1- Discovery
Important activities in this phase include framing the business problem as an analytics challenge that can be addressed in subsequent phases and formulating initial hypotheses (IHs) to test and begin learning the data.
Phase 1- Discovery
-Learning the business domain
-Resources
-Framing the problem
-Identifying Key Stakeholders
-Interviewing analytics sponsor
-Developing Initial Hypotheses
-Identifying Potential Data Sources
Phase 2- Data preparation
Requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project.
Phase 2- Data preparation
The team needs to execute extract, load, and transform (ELT) or extract, transform, and load (ETL) to get data into the sandbox. ELT and ETL are sometimes combined under the abbreviation ETLT. Data should be transformed in the ETLT process so the team can work with it and analyze it. In this phase, the team also needs to familiarize itself thoroughly with the data and take steps to condition it.
Phase 2- Data preparation
-Preparing analytics sandbox
-Performing ETL
-Learning about the data
-Uses Hadoop, Alpine Miner, OpenRefine, Data Wrangler
Phase 3-Model planning
The team determines the methods, techniques, and workflow it intends to follow for the subsequent model-building phase.
Phase 3-Model planning
The team explores the data to learn about the relationships between variables and subsequently selects key variables and the most suitable models.
Phase 3-Model planning
-Data exploration & variable selection
-Model Selection
-Uses R, SQL Analysis Services, SAS/ACCESS
Phase 4-Model building:
The team develops data sets for testing, training, and production purposes.
Phase 4-Model building:
In addition, in this phase the team builds and executes models based on the work done in the model planning phase.
Phase 4-Model building:
The team also considers whether its existing tools will suffice for running the models, or if it will need a more robust environment for executing models and workflows (for example, fast hardware and parallel processing, if applicable).
Phase 4-Model building:
-Develop data sets for training, testing, and production purposes
-Ensure that the training and test datasets are sufficiently robust for the model and analytical techniques
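One way to picture the data-set development step on this card is a simple random split. The following is only a minimal sketch: the function name `split_dataset`, the 70/20/10 fractions, and the fixed seed are illustrative assumptions, and the third slice merely stands in for data set aside for production-style use.

```python
import random

def split_dataset(records, train_frac=0.7, test_frac=0.2, seed=42):
    """Shuffle records and split them into training, testing, and remaining slices.

    The fractions are arbitrary illustrative defaults (assumption).
    """
    if train_frac <= 0 or test_frac <= 0 or train_frac + test_frac > 1:
        raise ValueError("fractions must be positive and sum to at most 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    remaining = shuffled[n_train + n_test:]  # stand-in for the production set
    return train, test, remaining

# Hypothetical data: 100 record IDs.
train, test, production = split_dataset(range(100))
```

Keeping the three slices disjoint is what lets the test set give an honest estimate of how the model behaves on data it has not seen.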
Phase 5-Communicate results
The team, in collaboration with major stakeholders, determines if the results of the project are a success or a failure based on the criteria developed in Phase 1.
Phase 5-Communicate results
The team should identify key findings, quantify the business value, and develop a narrative to summarize and convey findings to stakeholders.
Phase 5-Communicate results
-Articulate!
-Determine if the results are statistically significant and valid
-Validate the IH
Phase 6-Operationalize
The team delivers final reports, briefings, code, and technical documents.
Phase 6-Operationalize
In addition, the team may run a pilot project to implement the models in a production environment.
Phase 6-Operationalize
-Pilot testing
-Monitoring; retraining the model
Null hypothesis
There is no correlation between hours of sleep and productivity.
Null hypothesis
There is no significant difference in the academic performance among the three specialization tracks of BSIT students in BulSU.
Alternative hypothesis
There is a significant difference in the academic performance among the three specialization tracks of BSIT students in BulSU.
Alternative hypothesis
There is a correlation between hours of sleep and productivity.
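The sleep-and-productivity hypotheses above could be checked with a correlation test. The sketch below is one possible approach, not a prescribed method: it computes Pearson's r and estimates a two-sided p-value by permutation. The data values, the function names, the 5000 permutations, and the 0.05 threshold are all made-up assumptions for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def permutation_p_value(xs, ys, n_permutations=5000, seed=0):
    """Estimate how often shuffled data shows a correlation at least as strong."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any real pairing between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Made-up illustrative observations (assumption, not real data).
hours_of_sleep = [4, 5, 5, 6, 6, 7, 7, 8, 8, 9]
productivity = [52, 55, 60, 58, 65, 70, 68, 75, 80, 78]

r = pearson_r(hours_of_sleep, productivity)
p_value = permutation_p_value(hours_of_sleep, productivity)
reject_null = p_value < 0.05  # 0.05 is a conventional, assumed threshold
```

If `reject_null` is true, the data are inconsistent with the null hypothesis of no correlation at the chosen threshold; otherwise the null hypothesis is retained.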
Data conditioning
Process of preparing data for analysis or for use in a system.
cleaning and transforming
Data conditioning may involve tasks such as ______________ and ______________ the data, handling missing or incorrect values, standardizing or normalizing the data, and converting data into a suitable format for a specific application.
ensure that the data is ready to be used
The goal of data conditioning is to ___________________________________ effectively and accurately, without any issues that could affect the results of the analysis or the performance of the system.
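The conditioning tasks named on these cards (cleaning, handling missing or incorrect values, normalizing) can be sketched for a single numeric column. This is only one possible treatment: imputing with the median and using min-max scaling are assumptions chosen for illustration, and `condition_column` is a hypothetical helper name.

```python
import math
from statistics import median

def condition_column(raw_values):
    """Parse a raw column, fill missing entries, and min-max normalize to [0, 1]."""
    parsed = []
    for value in raw_values:
        try:
            number = float(str(value).strip())
        except ValueError:
            number = None  # unparseable entries are treated as missing
        if number is not None and not math.isfinite(number):
            number = None  # NaN and infinity are also treated as missing
        parsed.append(number)

    present = [v for v in parsed if v is not None]
    if not present:
        raise ValueError("column has no usable values")

    fill = median(present)  # assumed imputation strategy
    filled = [fill if v is None else v for v in parsed]

    low, high = min(filled), max(filled)
    if high == low:
        return [0.0 for _ in filled]  # constant column: nothing to scale
    return [(v - low) / (high - low) for v in filled]

# Hypothetical raw column with stray whitespace and missing markers.
raw = ["10", " 20 ", "", "n/a", 40, None]
conditioned = condition_column(raw)
```

Whether median imputation is appropriate depends on why values are missing, so the choice is worth reviewing with someone who knows the data.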
Considerations in data conditioning
-What are the data sources? Target fields?
-How clean is the data?
-How consistent are the contents and files? Missing or inconsistent values?
-Assess the consistency of the data types - numeric, nominal, ordinal, scale?
-Review the contents to ensure the data makes sense
-Look for evidence of systematic error
calculations remained consistent
Data Preparation (Survey and Visualize)
In data visualization, the following guidelines and considerations are recommended.
Review data to ensure that _______________________ within columns or across tables for a given data field.
distribution stay consistent
Data Preparation (Survey and Visualize)
In data visualization, the following guidelines and considerations are recommended.
Does the data ________________________ over all the data?
granularity & aggregation
Data Preparation (Survey and Visualize)
In data visualization, the following guidelines and considerations are recommended.
Assess the _________________ of the data, the range of values, and the level of __________________ of the data.
time-related
Data Preparation (Survey and Visualize)
In data visualization, the following guidelines and considerations are recommended.
For _______________ variables, are the measurements daily, weekly, or monthly? Is that good enough?
standardized/normalized & scales consistent
Data Preparation (Survey and Visualize)
In data visualization, the following guidelines and considerations are recommended.
Is the data __________________? Are the _____________________? If not, how consistent or irregular is the data?
geospatial & abbreviations consistent
Data Preparation (Survey and Visualize)
In data visualization, the following guidelines and considerations are recommended.
For __________________ datasets, are state or country ___________________ across the data? Are personal names normalized? English units? Metric units?
population of interest
Data Preparation (Survey and Visualize)
In data visualization, the following guidelines and considerations are recommended.
Does the data represent the _____________________ ?
valid and accurate
Model Building
Creating robust models that are suitable to a specific situation requires thoughtful consideration to ensure the models being developed ultimately meet the objectives
Does the model appear _____________ and _____________ on the test data?
domain experts
Model Building
Creating robust models that are suitable to a specific situation requires thoughtful consideration to ensure the models being developed ultimately meet the objectives
Does the model output/behavior make sense to the ______________? That is, does it appear as if the model is giving answers that make sense in this context?
parameter values
Model Building
Creating robust models that are suitable to a specific situation requires thoughtful consideration to ensure the models being developed ultimately meet the objectives
Do the ___________________ of the fitted model make sense in the context of the domain?
accurate
Model Building
Creating robust models that are suitable to a specific situation requires thoughtful consideration to ensure the models being developed ultimately meet the objectives
Is the model sufficiently _______________ to meet the goal?
intolerable
Model Building
Creating robust models that are suitable to a specific situation requires thoughtful consideration to ensure the models being developed ultimately meet the objectives
Does the model avoid _______________ mistakes? Depending on the context, false positives may be more serious or less serious than false negatives, for instance.
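To make the point about false positives and false negatives concrete, the sketch below counts each kind of error on a binary test set and weights them by cost. The labels, the predictions, and the cost figures are invented for illustration, and both function names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1  # false alarm
        elif a == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1  # missed positive
    return counts

def total_error_cost(counts, fp_cost, fn_cost):
    """Weight each kind of mistake by how serious it is in the given context."""
    return counts["fp"] * fp_cost + counts["fn"] * fn_cost

# Made-up test labels and model predictions (assumption).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)
# Context where a missed positive is five times worse than a false alarm.
cost_when_fn_is_worse = total_error_cost(counts, fp_cost=1, fn_cost=5)
# Context where a false alarm is five times worse than a missed positive.
cost_when_fp_is_worse = total_error_cost(counts, fp_cost=5, fn_cost=1)
```

The same predictions yield different total costs under the two weightings, which is why the tolerable error type has to be decided from the business context rather than from accuracy alone.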
data & inputs
Model Building
Creating robust models that are suitable to a specific situation requires thoughtful consideration to ensure the models being developed ultimately meet the objectives
Are more __________ or more __________ needed? Do any of the inputs need to be transformed or eliminated?
model
Model Building
Creating robust models that are suitable to a specific situation requires thoughtful consideration to ensure the models being developed ultimately meet the objectives
Will the kind of ____________ chosen support the runtime requirements?
Null Hypothesis
Discovery (Developing IH)
Accuracy Forecast
Model X does not predict better than the existing model.
Alternative Hypothesis
Discovery (Developing IH)
Accuracy Forecast
Model X predicts better than the existing model.
Null Hypothesis
Discovery (Developing IH)
Recommendation Engine
Algorithm Y does not produce better recommendations than the current algorithm being used.
Alternative Hypothesis
Discovery (Developing IH)
Recommendation Engine
Algorithm Y produces better recommendations than the current algorithm being used.
Null Hypothesis
Discovery (Developing IH)
Regression Modelling
This variable does not affect the outcome because its coefficient is zero.
Alternative Hypothesis
Discovery (Developing IH)
Regression Modelling
This variable affects the outcome because its coefficient is not zero.
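For the regression cards, the null hypothesis is that a variable's coefficient equals zero and the alternative is that it does not. A rough sketch of how that might be tested for a single predictor follows: it fits a simple least-squares line and forms the t-statistic for the slope. The data are made up, `slope_t_statistic` is a hypothetical name, and 2.306 is the standard two-sided 5% critical value of the t distribution with 8 degrees of freedom.

```python
import math

def slope_t_statistic(xs, ys):
    """Fit y = intercept + slope * x by least squares; return (slope, t-statistic)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Standard error of the slope uses n - 2 degrees of freedom.
    std_error = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, slope / std_error

# Made-up noisy observations (assumption, not real data).
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

slope, t_stat = slope_t_statistic(xs, ys)
T_CRITICAL_DF8 = 2.306  # two-sided, alpha = 0.05, df = n - 2 = 8
reject_null = abs(t_stat) > T_CRITICAL_DF8
```

Rejecting the null here supports the alternative that the coefficient is nonzero; it does not by itself show that the variable causes the outcome.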
Data available and accessible
Data Preparation (Learning the Data)
Products shipped
Data available, but not accessible
Data Preparation (Learning the Data)
Product Financials
Data available, but not accessible
Data Preparation (Learning the Data)
Product call center data
Data to collect
Data Preparation (Learning the Data)
Live product feedback surveys
Data to obtain from third party sources
Data Preparation (Learning the Data)
Product sentiment from social media
Consumer packaged goods
Model Planning in Industry Verticals
multiple linear regression, automatic relevance determination (ARD), and decision tree
Retail banking
Model Planning in Industry Verticals
multiple regression
Retail business
Model Planning in Industry Verticals
logistic regression, ARD, decision tree
Wireless Telecom
Model Planning in Industry Verticals
neural network, decision tree, hierarchical neurofuzzy system, rule evolver, logistic regression