classification tree
classifies a categorical outcome variable by splitting observations into groups via a sequence of hierarchical rules
data preparation definition
the manipulation of data with the goal of putting it into a form suitable for formal modeling
what is the step in data mining that includes addressing missing or erroneous data, reducing the number of variables, defining new variables, and data exploration?
data preparation
overfitting
occurs when the analyst builds a model that does a great job of explaining the sample data on which it is based but fails to accurately predict outside the sample data
confusion matrix
displays the counts of a model’s correct and incorrect classifications
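A minimal sketch of how the four cells of a two-class confusion matrix can be tallied. The labels below are made-up illustration data, and `confusion_matrix` is a hypothetical helper, not a library function.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

# Hypothetical labels for six observations (1 = positive, 0 = negative).
actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
true_pos, false_neg = cm[(1, 1)], cm[(1, 0)]
false_pos, true_neg = cm[(0, 1)], cm[(0, 0)]

# Share of observations classified correctly.
accuracy = (true_pos + true_neg) / len(actual)
```

The correct classifications sit in the (1, 1) and (0, 0) cells; the other two cells hold the errors.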
estimation of continuous outcome example
determining a freshman’s first year GPA from their SAT score, high school GPA, and number of extracurriculars
observation
or record, is the set of recorded values of variables associated with a single entity; it is displayed as a row of values in a spreadsheet or database in which the columns correspond to the variables
what does the y-axis of a decile chart show?
ratio of decile mean to overall mean
goal seek (excel)
allows users to determine the value of an input cell that will cause the value of the related output cell to equal some specified value
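The idea behind goal seek can be imitated outside Excel with a simple root search. The sketch below uses bisection, which is an assumption chosen for clarity and not necessarily the method Excel uses; `goal_seek` and the `profit` model are hypothetical.

```python
def goal_seek(f, target, lo, hi, tol=1e-9, max_iter=200):
    """Find x in [lo, hi] with f(x) close to target, by bisection.

    Assumes f is continuous and that f(lo) - target and f(hi) - target
    have opposite signs (the target is bracketed).
    """
    g_lo = f(lo) - target
    g_hi = f(hi) - target
    if g_lo * g_hi > 0:
        raise ValueError("target is not bracketed by lo and hi")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        g_mid = f(mid) - target
        if abs(g_mid) < tol or (hi - lo) / 2 < tol:
            return mid
        if g_lo * g_mid <= 0:
            hi = mid                 # the root lies in the lower half
        else:
            lo, g_lo = mid, g_mid    # the root lies in the upper half
    return (lo + hi) / 2

# Hypothetical output cell: profit as a function of production volume.
def profit(volume):
    return 20 * volume - 3000

# Which input volume makes the output (profit) equal 0?
breakeven_volume = goal_seek(profit, target=0, lo=0, hi=1000)
```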
what does separating the parameters from the spreadsheet model do?
enables the user to update the model parameters without the risk of mistakenly creating an error in a formula
what is the mathematical expression for Total Revenue?
Total Revenue = Production Volume × Revenue per Unit
Navigation in a spreadsheet model can be facilitated by what?
using clear labels and proper formatting and alignment
what is a good way to proceed with the influence diagram building for a problem?
The influence diagram for a portion of the problem is built first and then expanded until the total problem is conceptually modeled
influence diagram
a visual representation that shows which entities affect others in a model
an arrow
the visual depiction of the influence in an influence diagram
make-versus-buy
a decision in which companies have to decide whether they should manufacture a product or outsource production to another firm
controllable input
An input to a simulation model that is selected by the decision maker
NORM.INV
an Excel function that, given a probability, a mean, and a standard deviation, returns the corresponding value of a normally distributed random variable; supplying RAND() as the probability generates a random value from that distribution
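For comparison, a rough Python analogue of NORM.INV using the standard library's `statistics.NormalDist`. The wrapper name `norm_inv` and the mean and standard deviation values are illustrative assumptions.

```python
import random
from statistics import NormalDist

def norm_inv(probability, mean, standard_dev):
    """Inverse normal CDF, analogous to Excel's NORM.INV."""
    return NormalDist(mu=mean, sigma=standard_dev).inv_cdf(probability)

rng = random.Random(42)  # fixed seed so runs are repeatable
# Note: inv_cdf requires 0 < probability < 1; random() can in principle
# return exactly 0.0, which a production version would need to guard against.
u = rng.random()
sample = norm_inv(u, mean=100, standard_dev=15)

# A probability of 0.5 maps to the mean of the distribution.
median = norm_inv(0.5, mean=100, standard_dev=15)
```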
continuous probability
in this distribution, a random variable can take any value in a specified range (not just a discrete set of values)
Monte Carlo simulation
a simulation that uses repeated random sampling to represent uncertainty in a model of a real system and computes the resulting values of the model outputs
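A small sketch of the Monte Carlo idea: draw the uncertain input many times, run the model on each draw, and summarize the outputs. The demand range, unit margin, and fixed cost are invented for illustration.

```python
import random
from statistics import mean

def simulate_profit(n_trials, seed=0):
    """Sample uncertain demand and compute the model output for each trial."""
    rng = random.Random(seed)
    profits = []
    for _ in range(n_trials):
        demand = rng.randint(100, 200)   # hypothetical uncertain input
        profit = 20 * demand - 2500      # hypothetical model: margin 20, fixed cost 2500
        profits.append(profit)
    return profits

profits = simulate_profit(10_000)
average_profit = mean(profits)
# Estimated probability of losing money across the simulated trials.
prob_loss = sum(p < 0 for p in profits) / len(profits)
```

The output is a distribution of results, so risk measures such as the probability of a loss can be read off alongside the average.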
LN(RAND())*(-m)
excel expression that generates an exponential random variable with mean m
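The same inverse-transform idea can be sketched in Python. The function name and the mean of 5 are assumptions; the code draws from (0, 1] rather than [0, 1) only to avoid taking the log of zero.

```python
import math
import random

def exponential_sample(m, rng):
    """Inverse-transform sample with mean m, mirroring LN(RAND())*(-m)."""
    u = 1.0 - rng.random()   # uniform on (0, 1], so log(u) is always defined
    return -m * math.log(u)

rng = random.Random(1)
samples = [exponential_sample(5.0, rng) for _ in range(20_000)]
sample_mean = sum(samples) / len(samples)  # should land near m = 5
```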
Verification
is largely a debugging task to make sure that no errors are in the computer procedure that implements the simulation
Uniform distribution
a type of continuous probability distribution in which the variable of interest takes any value in a specified range with equal probability
=RANDBETWEEN(0, 100)
Excel function that generates a random integer between 0 and 100, inclusive
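A rough Python counterpart, shown only for comparison: `random.randint` also includes both endpoints. The seed and the number of draws are arbitrary.

```python
import random

rng = random.Random(7)
# randint includes both endpoints, like RANDBETWEEN(0, 100).
draws = [rng.randint(0, 100) for _ in range(1_000)]
```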
binding constraint
a constraint that holds as an equality at the optimal solution
extreme points
The points where constraints intersect on the boundary of the feasible region
objective function contour
a set of points that yield a fixed value of the objective function
objective function
the expression that defines the quantity to be maximized or minimized in a linear programming model
optimal point
geometrically, the extreme point where the binding constraints intersect
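The terms extreme point, optimal point, and binding constraint can be tied together with a tiny two-variable example. The linear program below is made up, and the brute-force enumeration of extreme points is a teaching sketch, not the simplex algorithm.

```python
from itertools import combinations

# Hypothetical LP: maximize 3x + 5y subject to the constraints below,
# each written as a*x + b*y <= rhs.
constraints = [
    (1, 1, 4),    # x + y  <= 4
    (1, 3, 6),    # x + 3y <= 6
    (-1, 0, 0),   # x >= 0
    (0, -1, 0),   # y >= 0
]

def objective(x, y):
    return 3 * x + 5 * y

def feasible(x, y, tol=1e-9):
    return all(a * x + b * y <= rhs + tol for a, b, rhs in constraints)

def extreme_points():
    """Intersect every pair of constraint lines; keep the feasible points."""
    points = []
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:          # parallel lines never intersect
            continue
        x = (r1 * b2 - r2 * b1) / det   # Cramer's rule
        y = (a1 * r2 - a2 * r1) / det
        if feasible(x, y):
            points.append((x, y))
    return points

# The optimal point is the extreme point with the best objective value.
best = max(extreme_points(), key=lambda p: objective(*p))

# Binding constraints hold as equalities at the optimal point.
binding = [c for c in constraints
           if abs(c[0] * best[0] + c[1] * best[1] - c[2]) < 1e-9]
```

In this example the two resource constraints are binding at the optimum, while the nonnegativity constraints are not.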
Problem formulation
or modeling, is the process of translating a verbal statement of a problem into a mathematical statement
Simplex algorithm
developed by George Dantzig, is quite effective at investigating extreme points in an intelligent way to find the optimal solution to even very large linear programs
conservative
an approach that evaluates each decision alternative in terms of the worst payoff that can occur. It leads to choosing the decision alternative that provides the best of the worst possible payoffs
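A short sketch of the conservative rule on a made-up payoff table, assuming payoffs are profits to be maximized. The alternatives, states of nature, and numbers are hypothetical.

```python
# Hypothetical payoff table: payoffs[alternative][state of nature].
payoffs = {
    "small plant": {"low demand": 8, "high demand": 7},
    "medium plant": {"low demand": 5, "high demand": 14},
    "large plant": {"low demand": -9, "high demand": 20},
}

def conservative_choice(payoffs):
    """Pick the alternative whose worst payoff is the best (maximin)."""
    worst = {alt: min(by_state.values()) for alt, by_state in payoffs.items()}
    best_alt = max(worst, key=worst.get)
    return best_alt, worst[best_alt]

choice, guaranteed = conservative_choice(payoffs)
```

Here the worst payoffs are 7, 5, and -9, so the conservative approach selects the small plant.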
Conditional probability
the probability of one event, given the known outcome of a (possibly) related event
payoff
a measure of the outcome of a decision such as profit, cost, or time
Bayes’ theorem
enables the use of sample information to revise prior probabilities
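A minimal sketch of revising prior probabilities with Bayes' theorem. The states, priors, and conditional probabilities are invented for illustration.

```python
def posterior(priors, likelihoods):
    """Revise priors given a sample result.

    priors[s] = P(s); likelihoods[s] = P(sample result | s).
    """
    joint = {s: priors[s] * likelihoods[s] for s in priors}
    total = sum(joint.values())           # P(sample result)
    return {s: joint[s] / total for s in joint}

# Hypothetical prior probabilities for two states of nature.
priors = {"strong market": 0.6, "weak market": 0.4}
# Hypothetical P(favorable report | state).
likelihoods = {"strong market": 0.9, "weak market": 0.25}

revised = posterior(priors, likelihoods)
```

After a favorable report, the probability of a strong market rises from 0.6 to about 0.84.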
node
An intersection or junction point of a decision tree
Brett wants to sell throw blankets for the holiday season at a local flea market. Brett purchases the throws for $15 and sells them to his customers for $35. The rental space is fixed fee of $1,500 for the season. Assume there is no leftover value for unsold units. If he orders 200 and demand is 150, what is the payoff?
Payoff = 150($35) – [$1,500 + 200($15)] = $5,250 – $4,500 = $750
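The same payoff arithmetic as a small function. The `min` on units sold is an assumption added so the function also covers demand above the order quantity, a case the card does not address.

```python
def payoff(order_qty, demand, price=35, unit_cost=15, fixed_fee=1500):
    """Season payoff for Brett's throws, with no salvage value for leftovers."""
    units_sold = min(order_qty, demand)   # assumption: cannot sell more than ordered
    revenue = price * units_sold
    total_cost = fixed_fee + unit_cost * order_qty
    return revenue - total_cost

result = payoff(order_qty=200, demand=150)
```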
collectively exhaustive
The states of nature are defined so that they are mutually exclusive (no more than one can occur) and collectively exhaustive (at least one must occur)
perfect
A special case of sample information where the information tells the decision maker exactly which state of nature is going to occur
regression
a statistical technique used to model the relationship between a dependent variable (target) and one or more independent variables (predictors) to forecast outcomes and determine the impact of specific factors
what are the three types of data analytics?
descriptive (what happened), predictive (what will happen), and prescriptive (what should be done)
what are patterns of time series?
trend, seasonality, cyclical patterns, and irregular variations (or noise)
Seasonality
Predictable patterns that repeat at fixed, regular intervals (such as daily, weekly, monthly, or yearly)
Example: A spike in retail sales every December or increased website traffic on Monday mornings
Cyclical Patterns
rises and falls in data that do not occur at a fixed frequency; they are often linked to broader economic conditions or business cycles and typically last longer than a year
Example: Periods of economic expansion or recession affecting industry-wide sales
Irregular Variations (Noise/Residuals)
Unpredictable, random fluctuations that cannot be explained by trends or cycles
Example: A sudden, one-time spike in calls due to an unexpected news event
what is hierarchical clustering?
an unsupervised machine learning method that organizes data into a tree-like structure of clusters called a dendrogram
Agglomerative (Bottom-Up) dendrogram
Starts with each data point as its own cluster and merges the closest pairs until only one cluster remains
Divisive (Top-Down) dendrogram
Starts with all data points in one cluster and recursively splits them into smaller, distinct clusters
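A bare-bones sketch of the agglomerative (bottom-up) procedure on one-dimensional points. Single linkage (distance between the closest members of two clusters) is an assumption, since the cards do not name a linkage rule; the data are made up.

```python
def agglomerative(points):
    """Single-linkage agglomerative clustering on 1-D points.

    Returns the merges in the order performed, each as
    (cluster_a, cluster_b, distance).
    """
    clusters = [(p,) for p in points]      # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

merges = agglomerative([1.0, 2.0, 6.0, 7.5, 20.0])
```

The merge distances, read in order, are the heights at which branches join in the dendrogram.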
euclidean distance
distance = square root of: (x2 - x1)² + (y2 - y1)²
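The two-dimensional distance formula as a short function; the two points are arbitrary illustration values.

```python
import math

def euclidean_distance(p1, p2):
    """Straight-line distance between two 2-D points."""
    (x1, y1), (x2, y2) = p1, p2
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

# Horizontal gap 3, vertical gap 4: a 3-4-5 right triangle.
d = euclidean_distance((1, 2), (4, 6))
```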
linear regression calculation
ŷ = B0 + B1x
ŷ = the predicted value of the dependent variable
B0 = intercept: the value of ŷ when x is zero
B1 = slope (coefficient): how much ŷ changes for every one-unit increase in x
x = the independent variable
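A sketch of how the intercept and slope can be estimated from data. It assumes the usual ordinary least squares estimates, which the card itself does not specify; the data points are made up and lie exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates of intercept (b0) and slope (b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]           # hypothetical data, exactly y = 1 + 2x
b0, b1 = fit_line(xs, ys)

# Predicted value of the dependent variable at x = 6.
y_hat = b0 + b1 * 6
```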