BABA - ELECTIVE 3: ANALYTICS LIFE CYCLE

57 Terms

1

Phase 1- Discovery

the team learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past from which they can learn

2

Phase 1- Discovery

The team assesses the resources available to support the project in terms of people, technology, time, and data.

3

Phase 1- Discovery

Important activities in this phase include framing the business problem as an analytics challenge that can be addressed in subsequent phases and formulating initial hypotheses (IHs) to test and begin learning the data.

4

Phase 1- Discovery

-Learning the business domain

-Resources

-Framing the problem

-Identifying Key Stakeholders

-Interviewing analytics sponsor

-Developing Initial Hypotheses

-Identifying Potential Data Sources

5

Phase 2- Data preparation

requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project

6

Phase 2- Data preparation

The team needs to execute extract, load, and transform (ELT) or extract, transform and load (ETL) to get data into the sandbox. The ELT and ETL are sometimes abbreviated as ETLT. Data should be transformed in the ETLT process so the team can work with it and analyze it. In this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition the data.
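
As a rough illustration of the ETL pattern this card describes, the sketch below extracts rows from CSV text, transforms them, and loads them into an in-memory SQLite table that stands in for the sandbox. The column names, the sample rows, and the `run_etl` helper are hypothetical and not part of the course material.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract; a real project would pull this from a source system.
RAW_CSV = """order_id,region,amount
1, north ,100.50
2,SOUTH,
3,north,20
"""

def run_etl(raw_text: str) -> sqlite3.Connection:
    # Extract: parse the raw CSV text into dictionaries keyed by column name.
    rows = list(csv.DictReader(io.StringIO(raw_text)))

    # Transform: trim and lowercase regions, convert types, skip rows with no amount.
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        cleaned.append(
            (int(row["order_id"]), row["region"].strip().lower(), float(amount))
        )

    # Load: write the conditioned rows into the sandbox (an in-memory database here).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", cleaned)
    conn.commit()
    return conn

conn = run_etl(RAW_CSV)
total_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region"
).fetchall()
print(total_by_region)
```

In an ELT variant, the raw rows would be loaded into the sandbox first and the trimming and type conversion would run afterwards inside the database.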

7

Phase 2- Data preparation

-Preparing analytics sandbox

-Performing ETL

-Learning about the data

-Uses Hadoop, Alpine Miner, OpenRefine, Data Wrangler

8

Phase 3-Model planning

the team determines the methods, techniques, and workflow it intends to follow for the subsequent model-building phase

9

Phase 3-Model planning

The team explores the data to learn about the relationships between variables and subsequently selects key variables and the most suitable models.
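
One simple way to explore relationships between variables is to rank candidate inputs by the strength of their correlation with the outcome. The sketch below does this with a hand-written Pearson correlation. The variable names and values are made up for illustration, and correlation is only one of several possible selection criteria.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical candidate inputs and outcome (sales).
candidates = {
    "ad_spend": [1, 2, 3, 4, 5],
    "store_visits": [2, 3, 5, 4, 6],
    "rainfall": [5, 1, 4, 2, 3],
}
sales = [10, 20, 30, 40, 50]

# Rank candidates by absolute correlation with the outcome, strongest first.
ranked = sorted(
    ((name, pearson(values, sales)) for name, values in candidates.items()),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: r = {r:.2f}")
```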

10

Phase 3-Model planning

-Data exploration & variable selection

-Model Selection

-Uses R, SQL Analysis Services, SAS/ACCESS

11

Phase 4-Model building:

the team develops data sets for testing, training, and production purposes

12

Phase 4-Model building:

In addition, in this phase the team builds and executes models based on the work done in the model planning phase.

13

Phase 4-Model building:

The team also considers whether its existing tools will suffice for running the models, or if it will need a more robust environment for executing models and workflows (for example, fast hardware and parallel processing, if applicable).

14

Phase 4-Model building:

-Develop data sets for training, testing, and production purposes

-Ensure that the training and test datasets are sufficiently robust for the model and analytical techniques
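
Developing separate training and test data sets can be sketched as a seeded random split. The `split_dataset` helper, the 80/20 ratio, and the placeholder records below are assumptions chosen for illustration; the cards do not prescribe a particular split.

```python
import random

def split_dataset(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical records: integers standing in for rows of a data set.
records = list(range(100))
train_set, test_set = split_dataset(records)
print(len(train_set), len(test_set))
```

Keeping the two sets disjoint is what lets the test data give an honest estimate of how the model behaves on records it has not seen.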

15

Phase 5-Communicate results

the team, in collaboration with major stakeholders, determines if the results of the project are a success or a failure based on the criteria developed in Phase 1

16

Phase 5-Communicate results

The team should identify key findings, quantify the business value, and develop a narrative to summarize and convey findings to stakeholders.

17

Phase 5-Communicate results

-Articulate!

-Determine if the results are statistically significant and valid

-Validate the IH

18

Phase 6-Operationalize

the team delivers final reports, briefings, code, and technical documents

19

Phase 6-Operationalize

In addition, the team may run a pilot project to implement the models in a production environment.

20

Phase 6-Operationalize

-Pilot testing

-Monitoring; retraining the model

21

Null hypothesis

There is no correlation between hours of sleep and productivity

22

Null hypothesis

There is no significant difference in the academic performance among the three specialization tracks of BSIT students in BulSU.

23

Alternative hypothesis

There is a significant difference in the academic performance among the three specialization tracks of BSIT students in BulSU.
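
A hypothesis pair about differences among three groups, like the one on academic performance across specialization tracks, is commonly tested with one-way ANOVA. The sketch below computes the F statistic by hand. The grades are invented, and the choice of ANOVA is an assumption, since the cards do not name a test.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical grades for three specialization tracks.
track_a = [85, 86, 87]
track_b = [88, 89, 90]
track_c = [91, 92, 93]

f_stat = one_way_f([track_a, track_b, track_c])
print(f_stat)
```

The statistic is compared with a critical value from the F distribution with (k - 1, n - k) degrees of freedom, here (2, 6), which is about 5.14 at the 0.05 level in standard tables. A larger F leads to rejecting the null hypothesis of no difference.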

24

Alternative hypothesis

There is a correlation between hours of sleep and productivity.
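
The sleep and productivity hypotheses can be checked with a permutation test: if the null hypothesis of no correlation holds, shuffling one variable should produce correlations as strong as the observed one fairly often. The data, the number of permutations, and the helper names below are hypothetical.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided p-value for the null hypothesis of no correlation."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real pairing between xs and ys
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical observations for ten people.
sleep_hours = [4, 5, 6, 6, 7, 7, 8, 8, 9, 9]
productivity = [50, 55, 60, 58, 70, 72, 80, 78, 85, 88]

r = pearson(sleep_hours, productivity)
p_value = permutation_p_value(sleep_hours, productivity)
print(f"r = {r:.3f}, p = {p_value:.4f}")
```

A p-value below the chosen significance level (0.05 is a common convention) supports rejecting the null hypothesis in favor of the alternative.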

25

Data conditioning

the process of preparing data for analysis or for use in a system

26

cleaning and transforming

Data conditioning may involve tasks such as ______________ and ______________ the data, handling missing or incorrect values, standardizing or normalizing the data, and converting data into a suitable format for a specific application.
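
A minimal sketch of these conditioning tasks on a single numeric column: convert values to a common type, treat unparseable entries as missing, fill the gaps with the mean, and rescale to the range 0 to 1. The `condition` helper, the mean-fill strategy, and the sample values are assumptions; other imputation and scaling choices are equally valid.

```python
def condition(values):
    """Clean, impute, and min-max normalize a column of raw values."""
    # Convert to float; anything that cannot be parsed counts as missing.
    parsed = []
    for v in values:
        try:
            parsed.append(float(v))
        except (TypeError, ValueError):
            parsed.append(None)

    # Handle missing values by substituting the mean of the present ones.
    present = [v for v in parsed if v is not None]
    mean = sum(present) / len(present)
    filled = [mean if v is None else v for v in parsed]

    # Normalize to [0, 1]; assumes the column is not constant.
    lo, hi = min(filled), max(filled)
    return [(v - lo) / (hi - lo) for v in filled]

# Hypothetical raw column mixing strings, numbers, and missing markers.
raw = ["10", None, "30", "n/a", 20]
conditioned = condition(raw)
print(conditioned)
```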

27

ensure that the data is ready to be used

The goal of data conditioning is to ___________________________________effectively and accurately, without any issues that could affect the results of the analysis or the performance of the system.

28

Considerations in data conditioning

-What are the data sources? Target fields?

-How clean is the data?

-How consistent are the contents and files? Missing or inconsistent values?

-Assess the consistency of the data types - numeric, nominal, ordinal, scale?

-Review the contents to ensure the data makes sense

-Look for evidence of systematic error

29

calculations remained consistent

Data Preparation (Survey and Visualize)

In data visualization, the following guidelines and considerations are recommended.

Review data to ensure that _______________________ within columns or across tables for a given data field.

30

distribution stay consistent

Data Preparation (Survey and Visualize)

In data visualization, the following guidelines and considerations are recommended.

Does the data ________________________ over all the data?

31

granularity & aggregation

Data Preparation (Survey and Visualize)

In data visualization, the following guidelines and considerations are recommended.

Assess the _________________ of the data, the range of values, and the level of __________________ of the data.

32

time-related

Data Preparation (Survey and Visualize)

In data visualization, the following guidelines and considerations are recommended.

For _______________ variables, are the measurements daily, weekly, or monthly? Is that good enough?

33

standardized/normalized & scales consistent

Data Preparation (Survey and Visualize)

In data visualization, the following guidelines and considerations are recommended.

Is the data __________________? Are the _____________________? If not, how consistent or irregular is the data?

34

geospatial & abbreviations consistent

Data Preparation (Survey and Visualize)

In data visualization, the following guidelines and considerations are recommended.

For __________________ datasets, are state or country ___________________ across the data? Are personal names normalized? English units? Metric units?

35

population of interest

Data Preparation (Survey and Visualize)

In data visualization, the following guidelines and considerations are recommended.

Does the data represent the _____________________ ?

36

valid and accurate

Model Building

Creating robust models that are suitable to a specific situation requires thoughtful consideration to ensure the models being developed ultimately meet the objectives

Does the model appear _____________ and _____________ on the test data?

37

domain experts

Model Building

Creating robust models that are suitable to a specific situation requires thoughtful consideration to ensure the models being developed ultimately meet the objectives

Does the model output/behavior make sense to the ______________? That is, does it appear as if the model is giving answers that make sense in this context?

38

parameter values

Model Building

Creating robust models that are suitable to a specific situation requires thoughtful consideration to ensure the models being developed ultimately meet the objectives

Do the ___________________ of the fitted model make sense in the context of the domain?

39

accurate

Model Building

Creating robust models that are suitable to a specific situation requires thoughtful consideration to ensure the models being developed ultimately meet the objectives

Is the model sufficiently _______________ to meet the goal?

40

intolerable

Model Building

Creating robust models that are suitable to a specific situation requires thoughtful consideration to ensure the models being developed ultimately meet the objectives

Does the model avoid _______________ mistakes? Depending on the context, false positives may be more serious or less serious than false negatives, for instance.
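
To make the point about false positives and false negatives concrete, the sketch below counts each kind of error and weights them with different costs. The labels, predictions, and cost values are hypothetical; in practice the costs come from the business context.

```python
def confusion_counts(actual, predicted):
    """Return (true_pos, false_pos, true_neg, false_neg) for binary labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1
        elif p == 1 and a == 0:
            fp += 1
        elif p == 0 and a == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def total_cost(actual, predicted, fp_cost, fn_cost):
    """Weight each kind of mistake by how serious it is in this context."""
    _, fp, _, fn = confusion_counts(actual, predicted)
    return fp * fp_cost + fn * fn_cost

# Hypothetical test labels and model predictions (1 = positive class).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)
# Assume a missed positive is ten times as costly as a false alarm.
cost = total_cost(actual, predicted, fp_cost=1, fn_cost=10)
print(counts, cost)
```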

41

data or more inputs

Model Building

Creating robust models that are suitable to a specific situation requires thoughtful consideration to ensure the models being developed ultimately meet the objectives

Are more __________ or more __________ needed? Do any of the inputs need to be transformed or eliminated?

42

model

Model Building

Creating robust models that are suitable to a specific situation requires thoughtful consideration to ensure the models being developed ultimately meet the objectives

Will the kind of ____________ chosen support the runtime requirements?

43

Null Hypothesis

Discovery (Developing IH)

Accuracy Forecast

Model X does not predict better than the existing model.

44

Alternative Hypothesis

Discovery (Developing IH)

Accuracy Forecast

Model X predicts better than the existing model.

45

Null Hypothesis

Discovery (Developing IH)

Recommendation Engine

Algorithm Y does not produce better recommendations than the current algorithm being used.

46

Alternative Hypothesis

Discovery (Developing IH)

Recommendation Engine

Algorithm Y produces better recommendations than the current algorithm being used.

47

Null Hypothesis

Discovery (Developing IH)

Regression Modelling

This variable does not affect the outcome because its coefficient is zero.

48

Alternative Hypothesis

Discovery (Developing IH)

Regression Modelling

This variable affects the outcome because its coefficient is not zero.
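
In regression modelling, the null hypothesis says the coefficient equals zero and the alternative says it does not. A common check is the t statistic of the estimated coefficient, sketched below for a single-variable linear regression. The data points and the `slope_t_statistic` helper are hypothetical.

```python
from math import sqrt

def slope_t_statistic(xs, ys):
    """Least-squares slope and its t statistic for H0: slope = 0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_error = sqrt(residual_ss / (n - 2) / sxx)
    return slope, slope / std_error

# Hypothetical input variable and outcome.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, t_stat = slope_t_statistic(xs, ys)
print(f"slope = {slope:.2f}, t = {t_stat:.3f}")
```

The t statistic is compared with a critical value from the t distribution with n - 2 degrees of freedom. With 3 degrees of freedom, the two-sided 0.05 critical value is about 3.182, so a t of roughly 2.12 would not be enough to reject the null hypothesis for this small sample.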

49

Data available and accessible

Data Preparation (Learning the Data)

Products shipped

50

Data available, but not accessible

Data Preparation (Learning the Data)

Product Financials

51

Data available, but not accessible

Data Preparation (Learning the Data)

Product call center data

52

Data to collect

Data Preparation (Learning the Data)

Live product feedback surveys

53

Data to obtain from third party sources

Data Preparation (Learning the Data)

Product sentiment from social media

54

Consumer packaged goods

Model Planning in Industry Verticals

multiple linear regression, automatic relevance determination (ARD), and decision tree

55

Retail banking

Model Planning in Industry Verticals

multiple regression

56

Retail business

Model Planning in Industry Verticals

logistic regression, ARD, decision tree

57

Wireless Telecom

Model Planning in Industry Verticals

neural network, decision tree, hierarchical neurofuzzy system, rule evolver, logistic regression