M1: Define data-driven decision making (analytics)
Using facts, metrics, and data to guide strategic business decisions that align with your goals, objectives, and initiatives. Asking the right questions.
It is the science of applying a structured method to solve a business problem using data and analysis to drive impact.
M1: Define data
A collection of facts used to identify patterns, draw conclusions, make predictions, and make decisions.
M1: Distinguish between data science and decision science: DATA SCIENCE
Toward Insight:
This is the technical track, designed to derive insights from data.
M1: Distinguish between data science and decision science: DECISION SCIENCE
Toward Impact:
This is the business track, designed to align stakeholders so that the valuable insights produced using the data science track can be inserted into the decision-making process and converted into action.
M1: Describe the tasks an analyst may need to perform and the software they might use
Data analysts focus on business analytics and perform tasks such as:
-Accessing, Transforming, and Manipulating (MySQL, Microsoft Excel)
-Statistical Analyses (R, Python)
-Visualizing (Tableau, Power BI Desktop)
M1: Describe and give examples of metric data
Quantitative, continuous values (Mean, Median, Variance, Standard Deviation):
-Interval Scale (Common Arithmetic Operations -- Numerical rating of how service was today; % score supervisors assign to performers: 0% = bad, 100% = good; Low temperature = Bad attitude and high temperature = Good attitude)
-Ratio Scale (All Arithmetic Operations -- Amount purchased, Salesperson Sales volume, Likelihood of performing some act: 0% = No Likelihood to 100% = Certainty, Number of stores visited, Time spent viewing a particular web page, Number of web pages viewed)
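As a rough illustration of the statistics listed for metric data, the sketch below summarizes a small ratio-scale variable with Python's standard `statistics` module. The `amounts` values are made up for the example and are not from the course material.

```python
import statistics

# Hypothetical ratio-scale data: amount purchased by five customers.
amounts = [20, 35, 35, 50, 60]

summary = {
    "mean": statistics.mean(amounts),
    "median": statistics.median(amounts),
    "variance": statistics.variance(amounts),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(amounts),      # square root of the sample variance
}
print(summary)
```

These summaries are only meaningful for metric data; for nominal or ordinal data, counts and modes are the appropriate measures.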
M1: Describe and give examples of non-metric data
Qualitative, discrete values:
-Ordinal Scale: ranking scale with counting and ordering (Frequency, Mode, Median, Range)
EX: Dissatisfied to Delighted, or HS Diploma up to Graduate Degree
-Nominal Scale (absolute value): only counting (Frequency, Mode)
EX: Yes-No, Female-Male, Buy-Did Not Buy, Postal Code ______
M1: Identify the three characteristics of big data
Volume, Variety, Velocity
M1: Identify the characteristics of valuable data
Relevance, Completeness, Accuracy, Timeliness
M1: Describe the components of a balanced scorecard
Both financial and nonfinancial metrics matter. A balanced scorecard looks forward, backward, internally, and externally.
M1: Identify Financial Metrics
Profit
Net Present Value (NPV)
Internal Rate of Return (IRR)
Payback
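A minimal sketch of how three of these metrics are commonly computed, assuming one cash flow per period with the initial outlay at period 0. The function names, the bisection bounds, and the `flows` figures are illustrative assumptions, not part of the course material.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] is the period-0 flow (usually negative)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def payback_period(cash_flows):
    """First period in which cumulative cash flow is non-negative, else None."""
    cumulative = 0
    for t, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return t
    return None


def irr(cash_flows, lo=0.0, hi=1.0, tol=1e-6):
    """Rate at which NPV is zero, found by bisection.

    Assumes NPV falls as the rate rises and that the root lies in [lo, hi].
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cash_flows) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


flows = [-1000, 500, 500, 500]  # hypothetical project
print(round(npv(0.10, flows), 2), payback_period(flows), round(irr(flows), 4))
```

Payback ignores the time value of money, which is why it is usually read alongside NPV and IRR rather than on its own.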
M1: Identify Non-Financial Metrics
Brand Awareness
Product Trials
Churn
Customer Satisfaction (CSAT)
Customer Lifetime Value (CLTV)
Conversions
M1: Identify Customer Metrics
Customer Behavior -- (Frequency of Firm Desired Behavior, Strength of Firm Desired Behavior, Behavioral Intentions) and
Customer Evaluations -- (of Service Provider, of Service Experience, of Goods, of Firm, of Self)
M2: Name the steps in the BADIR process
1) Business question
2) Analysis Plan
3) Data Collection
4) Insights
5) Recommendation
M2: Identify the advantages of taking time to establish the Business Question
1) Reduction of iterations
2) Contributions with actionable recommendations
3) Recognition as a valued partner
4) Solutions originate from discussion not data
5) Quality of decision is proportional to the time invested in fully exploring what the problem is
M2: Name Information Seeking Questions
Who? What? When? Where? Why? How?
M2: Differentiate between questions that establish business intent and business considerations: INTENT
Context: What happened? Why are you interested? What is the problem or opportunity?
Impacted Segment: When did it take place? Where did it happen? Who is impacted?
Potential Reasons: What might have caused this? What do you think drives this?
M2: Differentiate between questions that establish business intent and business considerations: BUSINESS CONSIDERATIONS
Timelines: What decisions need to be taken and by When?
Stakeholder: Who is asking? Who is the decision maker? Who will take action?
Actions: What action are you going to take based on this analysis? Is this a one-time (ad hoc) request or a recurring one (dashboard)?
M2: Describe when to use descriptive analysis and statistical analysis
A) Descriptive Analysis (What is this like?): Categorization, Identifying Patterns and Themes. Examples: Aggregate analysis, trend analysis, sizing/estimation, segmentation, customer life cycle
B) Statistical Analysis (Investigating the Why or What if?): Identifying relationships, Determining Causality. Examples: Correlation analysis, trend analysis, predictive analytics, segmentation
M2: Identify questions that would require descriptive analysis vs. statistical analysis --
1. Why has conversion dropped postlaunch of a product?
Statistical
M2: Identify questions that would require descriptive analysis vs. statistical analysis --
2. How many elementary schools exist in New York State?
Descriptive
M2: Identify questions that would require descriptive analysis vs. statistical analysis --
3. Determine if and why revenue growth for "Toys and All" has slowed down over the last few weeks?
Both Descriptive and Statistical
M2: Identify questions that would require descriptive analysis vs. statistical analysis --
4. Can you tell me which offer worked best in the last marketing campaign?
Both Descriptive and Statistical
M2: Identify questions that would require descriptive analysis vs. statistical analysis --
5. Are our London office employees younger than our Singapore office employees?
Descriptive
M2: Identify questions that would require descriptive analysis vs. statistical analysis --
6. What are the time cycles for our customers to go from hearing about us to downloading the free game and then paying for the premium features?
Descriptive
M2: Identify questions that would require descriptive analysis vs. statistical analysis --
7. Of our one million customers, to which 200K should I send the next marketing campaign to get the best ROI?
Statistical
M2: Identify questions that would require descriptive analysis vs. statistical analysis --
8. What are the different use cases for which our customer is using our printers? What does it mean for us?
Descriptive
M2: Identify three open-ended questions:
These types of questions prompt people to answer with sentences, lists, and stories. They give deeper and new insights.
1. What is your current understanding?
2. What have you considered?
3. What surprised you?
Closed-ended questions limit the possible answers, which yields tighter statistics.
M2: Identify divergent questions: Go/No-go
What decision are you thinking about now?
M2: Identify divergent questions: Clarification
What do you mean?
M2: Identify divergent questions: Assumptions
What are your assumptions?
M2: Identify divergent questions: Foundational
How do we know this to be true?
M2: Identify divergent questions: Action
What could or should be done?
M2: Identify divergent questions: Cause
What is the context? Why did this happen?
M2: Identify divergent questions: Effect
What will be the impact or outcome of deciding?
M2: Describe the benefits and steps involved in IWIK questioning
Benefits:
Clarifies priorities, Uncovers essential information needed, Identifies Knowledge gaps, Defines assumptions, Reveals Biases.
Steps:
1) Preparing questions before data
2) Asking the right people
3) Assessing Needs
4) Working Backwards
5) Examples
M3: Identify the steps involved in developing an analysis plan
This plan has five building blocks:
1) Analysis Goals (research objective)
2) Hypotheses
3) Methodology (how we are collecting the data, where we are collecting it from and techniques to analyze it)
4) Data Required (Specification, what variables do we need to measure)
5) Project Plan.
M3: Describe the ideal characteristics of an analysis goal
Create SMART analysis goals (research objectives) to answer the business question. These help us define the project, what we are trying to get out of it, and what we want to accomplish:
-Specific
-Measurable
-Attainable
-Relevant
-Time bound
M3: Distinguish between a hypothesis and an analysis goal
Analysis Goal: More specific and more measurable. Determine and define research objectives. It lays out what you can answer directly with the data you have.
Hypothesis: An informed guess as to what is causing the issue you are trying to address with your data analysis. It proposes a relationship between two variables (if "x" goes up, then "y" goes down).
-Generated through brainstorming sessions
-Captures a potential answer, driver, or reason to address the business question
-Set criteria that will prove or disprove each of your hypotheses
Hypothesis testing is based on probability theory:
-We take a sample and make an inference about the population from which that sample was drawn
-We cannot make any statement about the population with complete certainty
-Usually stated as supported or not supported
M3: Distinguish between independent and dependent variables: INDEPENDENT
Unknowns that may have a relationship with the dependent variable and no relationship with each other.
These are determined by the hypotheses developed to solve the business question. (The blue button and the presence of the banner in the checkout process described in the earlier example are independent variables.)
M3: Distinguish between independent and dependent variables: DEPENDENT
A variable that is the object of the particular predictive analysis. It is determined by the business question that the model is designed to solve.
Example: Conversion
M3: Describe the difference between type 1 and type 2 error: TYPE 1 ERROR
Occurs when sample data suggests that a relationship does exist when in fact a relationship does not exist
M3: Describe the difference between type 1 and type 2 error: TYPE 2 ERROR
Occurs when the sample data suggests that a relationship does not exist when in fact a relationship does exist
M3: Understand when a hypothesis is supported or not supported based on p-value
P-Value: The probability value, or the observed (computed) significance level. P-values are compared to prespecified significance levels (0.1, 0.05, 0.01) to test a hypothesis.
Traditionally, researchers have specified an acceptable significance level for a test prior to the analysis.
Most typically, researchers set the acceptable amount of error, and therefore the acceptable significance level, at 0.1, 0.05, or 0.01.
If the p-value resulting from a statistical test is less than the prespecified significance level, the results support a hypothesis implying differences.
To illustrate, if an analyst is comparing sales in two districts and sets the acceptable Type I error at 0.1 and the p-value resulting from the test is 0.03, then the results support a hypothesis suggesting differences in sales in the two districts.
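The decision rule described above can be written as a one-line comparison. This is only a sketch; the function name and the default `alpha` are illustrative choices.

```python
def hypothesis_supported(p_value, alpha=0.05):
    """True when the p-value is below the prespecified significance level."""
    return p_value < alpha


# The two-district sales example: acceptable Type I error 0.1, p-value 0.03.
print(hypothesis_supported(0.03, alpha=0.1))   # hypothesis of a difference is supported
print(hypothesis_supported(0.03, alpha=0.01))  # same p-value, stricter level: not supported
```

The second call shows why the significance level has to be fixed before the analysis: the same p-value leads to a different conclusion under a stricter threshold.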
M3: Describe the steps involved in specifying the methodology
Specify how we are going to collect the data, where it is coming from, and what analysis we are going to perform on it:
1) Determine level of granularity
2) Assign unique ID
3) Aggregate it.
Data collection only begins once the complete analysis plan is agreed upon by the key stakeholders:
1) Only relevant data that is useful to prove or disprove a hypothesis should be collected. Determine the granularity needed to answer the question.
2) Data specifications should be written before you go into data collection.
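To make the granularity, unique ID, and aggregation steps concrete, here is a small sketch that rolls transaction-level rows up to customer level. The field names (`customer_id`, `amount`) and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical transaction-level data (finer granularity than the analysis needs).
transactions = [
    {"customer_id": "C1", "amount": 20.0},
    {"customer_id": "C2", "amount": 35.0},
    {"customer_id": "C1", "amount": 15.0},
]


def aggregate_by_id(rows, id_field, value_field):
    """Sum value_field for each unique ID, producing one record per ID."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[id_field]] += row[value_field]
    return dict(totals)


customer_totals = aggregate_by_id(transactions, "customer_id", "amount")
print(customer_totals)
```

Here the unique ID defines the level of granularity of the result: three transactions become two customer-level records.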
M3: Identify questions that could be answered with a Correlation
Look at variables that correlate with something that the business is trying to impact.
This analysis methodology is used most frequently to solve business problems related to understanding drivers of the business or an event (Best with Continuous variables).
Correlation is the statistical measure of the linear relationship between two or more metric variables, as represented by the correlation coefficient (r), which takes a value between −1 and +1 inclusive.
A t-test can be used to identify whether a correlation is statistically significant by providing a p-value, allowing the analyst to determine if the hypothesis is supported or not.
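A minimal sketch of the correlation coefficient and its t statistic, written with the standard library only. The data are invented, and the p-value itself is not computed here: it would come from a t distribution with n − 2 degrees of freedom (from a statistics package or a t table).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_stat(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical data: ad spend (x) and units sold (y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
t = correlation_t_stat(r, len(x))
print(round(r, 4), round(t, 4))
```

A fairly large r can still fail to reach significance when the sample is this small, which is why the t-test is needed in addition to the coefficient.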
M3: Identify questions that could be answered with a Cross-tabulation
Cross-tabulation, also known as contingency table analysis, is most often used to analyze categorical (nominal measurement scale) data. The Chi-square statistic is the primary statistic used for testing the statistical significance of the cross-tabulation table. Chi-square tests determine whether or not the two variables are independent.
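As a sketch of how the Chi-square statistic is built from a cross-tabulation, the function below compares observed counts with the counts expected if the two variables were independent. The table values are invented, and converting the statistic to a p-value (via the chi-square distribution) is left out.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical cross-tab: rows = offer A / offer B, columns = bought / did not buy.
observed = [[30, 20],
            [20, 30]]
stat, dof = chi_square(observed)
print(stat, dof)
```

The larger the statistic relative to its degrees of freedom, the less plausible it is that the two variables are independent.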
M3: Identify questions that could be answered with a Linear Regression
An approach to modeling the linear relationship between a scalar dependent variable and one or more independent variables. Usually applied to customer lifetime value and cost of acquisition. Can be used to predict change in an outcome.
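For the single-predictor case, the fitted line can be sketched with ordinary least squares in a few lines. The data and names below are illustrative only; a real analysis with several independent variables would normally use a statistics package.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: months as a customer (x) and cumulative spend (y).
months = [1, 2, 3, 4]
spend = [3, 5, 7, 9]
slope, intercept = fit_line(months, spend)
predicted = intercept + slope * 5  # predicted outcome at x = 5
print(slope, intercept, predicted)
```

The slope is the predicted change in the dependent variable for a one-unit change in the independent variable.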
M3: Identify questions that could be answered with a Logistic Regression
A special case of regression in which the dependent variable is not continuous. Instead, it is discrete, or categorical, and mostly binary (0/1). It is commonly used when there are a number of independent decisions, or discrete actions, like churn and fraud prediction
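To show how a logistic model turns independent variables into a binary prediction, the sketch below applies the logistic function to a linear score. The coefficients are placeholders, not fitted values, and the 0.5 cut-off is an assumed convention.

```python
import math


def predicted_probability(intercept, coefficients, features):
    """Logistic function applied to a linear combination of the features."""
    score = intercept + sum(c * f for c, f in zip(coefficients, features))
    return 1 / (1 + math.exp(-score))


def classify(probability, threshold=0.5):
    """Map a probability to the binary outcome (1 = churn, 0 = no churn)."""
    return 1 if probability >= threshold else 0


# Hypothetical coefficients for [support_tickets, months_inactive].
p = predicted_probability(-2.0, [0.5, 1.0], [2, 3])
print(round(p, 3), classify(p))
```

Unlike linear regression, the output is bounded between 0 and 1, which is what makes it suitable for a discrete dependent variable.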
M4: Describe the two steps of data collection in the BADIR process: DATA PULL
Collect the data as per the data specification. Pull a small sample and eyeball to make sure you are getting what you want. Make sure it matches metrics.
M4: Describe the two steps of data collection in the BADIR process: DATA CLEANSING and VALIDATION
Clean the data to make it usable and validate it to make sure it is accurate. Validate the data as you go: is it what you were expecting? Triangulate by taking some key metrics from the data you have and matching them against other, distinctly different sources.
M4: Describe the impact of missing data: (PRACTICAL IMPACT)
The practical impact of missing data is the reduction of the sample size available for analysis, possibly to an inadequate size. In such situations, the researcher must either gather additional observations or find a remedy for the missing data in the original sample.
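The sample-size reduction is easy to see when only complete cases are kept (listwise deletion, used here purely as an illustration, not as a recommended remedy). The rows and field names are hypothetical.

```python
# Hypothetical survey rows; None marks a missing value.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 45, "income": None},
    {"age": 29, "income": 48000},
]


def complete_cases(records):
    """Keep only records with no missing values (listwise deletion)."""
    return [r for r in records if all(v is not None for v in r.values())]


usable = complete_cases(rows)
print(len(rows), len(usable))
```

Each variable here is missing in only one row, yet half of the sample is lost once both variables are required together.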
M4: Describe the impact of missing data: (SUBSTANTIVE PERSPECTIVE)
Any statistical results based on data with a nonrandom missing data process could be inaccurate. The effects of missing data are sometimes termed "hidden" due to the fact that we still get results from the analyses even without the missing data.
M4: Define the concept of an outlier
Outliers, also known as anomalies in the parlance of data mining, are observations with a unique combination of characteristics identifiable as distinctly different from what is "normal." One must define the "context" of the data to establish what is "normal" in order to detect outliers.
The designation of an outlier occurs in two distinct stages of the analysis: pre-analysis and post-analysis. Defining "normal" will vary in each stage as the objectives of outlier designation change.
M4: Describe the impact of outliers
Outliers cannot be categorically characterized as either beneficial or problematic, but instead must be viewed within the context of the analysis and evaluated by the types of information they may provide.
-Beneficial outliers, although different from the majority of the sample, may be indicative of characteristics of the population that would not be discovered in the normal course of analysis.
-Problematic outliers are not representative of the population, are counter to the objectives of the analysis, and can seriously distort statistical tests.
May provide feedback for necessary adjustments to the analysis
3 Types:
Error outliers. These are observations/cases that differ from the “normal” because of inaccuracies in data collection, etc. The remedy for this type of outlier is to correct the error or if not possible, remove the observation from the analysis.
Interesting outliers. These observations are different and/or unique such that they may bring new insight into the analysis. The suggestion to study these observations underscores the need for domain knowledge of the context of the analysis to understand whether these observations add to existing knowledge of the context.
Influential outliers. These observations are defined in terms of their impact on the analysis and are identified in the post-analysis stage. At this point, they had already been considered representative of the population in the pre-analysis stage, thus the researcher must either accommodate them in the analysis (perhaps through some robust methodology) or delete them from the analysis.
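One simple pre-analysis screen, offered only as an assumption since the cards do not name a detection method, is to flag values that lie far from the mean in standard-deviation units. The threshold of 2 and the data are illustrative.

```python
import statistics


def flag_outliers(values, threshold=2.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) / sd > threshold]


# Hypothetical order sizes with one unusual observation.
order_sizes = [10, 12, 11, 13, 12, 11, 95]
print(flag_outliers(order_sizes))
```

A screen like this only nominates candidates. Deciding whether a flagged value is an error, an interesting case, or an influential one still depends on domain knowledge and on the stage of the analysis.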
M4: Describe the concept of a dummy variable and when it is used
Dummy variables act as replacement variables for a nonmetric variable. Example: Gender is a nonmetric variable. Create and assign dummy variable X1 to Females and X2 to Males.
Dummy variables are used most often in regression and discriminant analysis, where the coefficients have direct interpretation.
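A small sketch of dummy coding, assuming 0/1 indicators named after each category (the `is_` prefix and the function are illustrative). It also shows the common regression practice of dropping one category as the reference, so that the remaining coefficients are interpreted relative to it.

```python
def dummy_code(values, reference=None):
    """Turn a nonmetric variable into 0/1 indicator columns.

    If reference is given, that category gets no column of its own.
    """
    categories = [c for c in sorted(set(values)) if c != reference]
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


gender = ["Female", "Male", "Female"]
print(dummy_code(gender))                    # one indicator per category
print(dummy_code(gender, reference="Male"))  # Male is the reference category
```

With a reference category, a nonmetric variable with k categories needs only k − 1 dummy variables.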