QMMS - FINALS
Module 2: Data Collection
Overview
Raw data must be processed to provide useful information.
Data reduction simplifies data by highlighting important features and patterns, providing a concise yet accurate view.
Lesson 1: Collecting Data
A framework is a formal structure that lists the key elements or factors used in enterprise architecture.
Data and Information
Managers require relevant information for decision-making, which starts with data collection.
Data is then processed into information and presented in suitable formats.
Data collection is essential for informed decision-making within an organization.
The Three Main Steps in Preparing Information:
Data collection
Processing to give information
Presentation
Data collection requires careful planning, not chance.
Detailed Steps in Data Collection Planning:
Define the purpose of the data.
Describe the data needed to achieve this purpose.
Check available secondary data for usefulness.
Define the population and sampling frame for primary data.
Choose the best sampling method and sample size.
Identify an appropriate sample.
Design a questionnaire or other method of data collection.
Run a pilot study and check for problems.
Train interviewers, observers, or experimenters.
Conduct the main data collection.
Perform follow-up, such as contacting non-respondents.
Analyze and present the results.
Amount of Data
Three important questions for data collection are the amount of data, its source, and the means of collection.
Marginal cost is the extra cost of collecting one more bit of data and increases with the amount of data collected.
Marginal benefit is the benefit from the last bit of data collected and decreases with the amount collected.
The optimal amount of data is collected when marginal cost equals marginal benefit.
Collecting less data results in lost potential benefit, while collecting more data wastes resources.
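The stopping rule above can be sketched in Python. The marginal cost and marginal benefit curves are hypothetical linear functions (in $ per item) chosen only for illustration. In practice they would have to be estimated for each study.

```python
def optimal_amount(marginal_cost, marginal_benefit, max_items=10_000):
    """Keep collecting while the next item's benefit still covers its cost."""
    n = 0
    while n < max_items and marginal_benefit(n + 1) >= marginal_cost(n + 1):
        n += 1
    return n

# Hypothetical curves: marginal cost rises, marginal benefit falls.
def mc(n):
    return 10 + n

def mb(n):
    return 100 - 2 * n

print(optimal_amount(mc, mb))  # 30: here mc(30) == mb(30) == 40
```

Stopping earlier leaves items whose benefit exceeds their cost uncollected. Going further spends more on each extra item than it returns.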
Types of Data
Nominal data: Cannot be quantified with meaningful units.
Ordinal data: Categories can be ranked in a meaningful order.
Cardinal data: Can be measured directly.
Cardinal Data Types:
Discrete: Takes only integer values.
Continuous: Can take any value, not restricted to integers.
Examples:
(a) Nominal data:
Percentage of respondents who would vote for political party X: 35%
Percentage of respondents who would vote for political party Y: 40%
Percentage of respondents who would vote for political party Z: 20%
Percentage of respondents who do not know who they would vote for: 5%
(b) Ordinal data:
Percentage of people who feel 'very strongly' in favor of a proposal: 8%
Percentage of people who feel 'strongly' in favor of a proposal: 14%
Percentage of people who feel 'neutral' about a proposal: 49%
Percentage of people who feel 'strongly' against a proposal: 22%
Percentage of people who feel 'very strongly' against a proposal: 7%
(c) Cardinal data:
Percentage of people in a club who are less than 20 years old: 12%
Percentage of people in a club who are between 20 and 35 years old: 18%
Percentage of people in a club who are between 35 and 50 years old: 27%
Percentage of people in a club who are between 50 and 65 years old: 29%
Percentage of people in a club who are more than 65 years old: 14%
Primary and Secondary Data
Primary data: New data collected by an organization for a specific purpose ('field research').
Secondary data: Existing data collected by other organizations or for other purposes ('desk research').
Using Samples to Collect Data
A population consists of all the people or items that could supply data.
These are listed in a sampling frame.
A census collects data from the whole population.
A sample collects data from a representative sample of the population and uses it to estimate values for the whole population.
Sampling is usually more cost- and time-effective than a census.
There are several types of samples including random, systematic, stratified, quota, multi-stage and cluster samples.
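Three of these sampling methods (random, systematic and stratified) can be sketched in Python as follows. The sampling frame of 100 customer IDs and the "urban"/"rural" strata are hypothetical. Quota, multi-stage and cluster samples are not shown.

```python
import random

def simple_random_sample(frame, n, rng):
    # Every member of the sampling frame has the same chance of selection.
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    # Pick a random start, then take every k-th member of the frame.
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]

def stratified_sample(strata, n, rng):
    # Take a random sample from each stratum in proportion to its size.
    # Rounding means the total can differ slightly from n for awkward sizes.
    population = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(n * len(members) / population)
        sample.extend(rng.sample(members, share))
    return sample

# Hypothetical sampling frame of 100 customer IDs, split into two strata.
rng = random.Random(42)  # fixed seed so the example is repeatable
frame = list(range(1, 101))
strata = {"urban": frame[:60], "rural": frame[60:]}

print(simple_random_sample(frame, 10, rng))
print(systematic_sample(frame, 10, rng))
print(stratified_sample(strata, 10, rng))
```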
Organizing Data Collection
There are two alternatives for collecting data from a sample: observation and questionnaires.
Questionnaires can be administered through personal interview, telephone interview, the Internet, postal survey, panel survey, or longitudinal survey.
Questionnaire Design Guidelines
A questionnaire should ask a series of related questions in a logical sequence.
Make the questionnaire as short as possible to reduce collection and analysis costs.
Questions should be short, simple, unambiguous, easy to understand, and phrased in everyday terms.
Even simple changes to phrasing can give very different results.
Avoid leading questions that encourage conformity over truthful answers.
Use phrases that are as neutral as possible.
Phrase all personal questions carefully.
Do not give warnings, as they discourage responses.
Avoid vague questions that lack clarity.
Ask positive questions rather than less definite ones.
Avoid hypothetical questions that lead to speculative and unrealistic answers.
Avoid asking two or more questions in one, which can confuse respondents.
Open questions collect general views but favor articulate and quick-thinking individuals and are difficult to analyze.
Ask questions with precoded answers from a set of alternatives.
Be prepared for unexpected effects, such as sensitivity to the color and format of the questionnaire, or different types of interviewer getting different responses.
Always run a pilot survey before starting the whole survey to identify problems and improve the questionnaire design.
Examine non-respondents to make sure they do not introduce bias.
Potential Errors in Data Collection
Failure to identify an appropriate population
Choosing a sample that does not represent this population
Mistakes in contacting members of the sample
Mistakes in collecting data from the sample
Introducing bias from non-respondents
Mistakes made during data analysis
Drawing invalid conclusions from the analysis
Lesson 2: Using Numbers to Describe Data
Numerical measures give more objective and accurate data descriptions.
Two key measures describe the location and spread of data.
Measures of Location
Show where the center of the data is, giving some kind of typical or average value.
The most common measure is the arithmetic mean. Alternatives are the median (the middle value when the values are ranked in order of size) and the mode (the most frequently occurring value).
Measures of Spread
Show how the data is scattered around this center, giving an idea of the range of values.
Arithmetic Mean
Add all the values together to get the sum
Divide this sum by the number of values to get the mean.
Formula: $\bar{x} = \frac{\sum x}{n}$, where $n$ is the number of values.
Median Calculation
Arranging the values in order of size.
Counting the number of values.
Identifying the middle value, giving the median.
Mode Calculation
Drawing a frequency distribution of the data.
Identifying the most frequent value, giving the mode.
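The calculation steps for the mean, median and mode translate directly into Python. The data values are hypothetical. The steps above do not say what to do with an even number of values; this sketch assumes the common convention of averaging the two middle values.

```python
from collections import Counter

def mean(values):
    # Add all the values together, then divide by the number of values.
    return sum(values) / len(values)

def median(values):
    # Arrange the values in order of size and count them.
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # odd count: a single middle value
    # Even count: take the mean of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

def mode(values):
    # Build a frequency distribution and return the most frequent value.
    # If several values tie, this returns the first one encountered.
    frequencies = Counter(values)
    return frequencies.most_common(1)[0][0]

data = [3, 7, 5, 7, 9, 4, 7, 6]  # hypothetical observations
print(mean(data), median(data), mode(data))  # 6.0 6.5 7
```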
Choice of Measure of Location
Each of these three measures for location gives a different view:
The mean is the simple average.
The median is the middle value.
The mode is the most frequent value.
Measures of Spread
Other measures are needed for the spread of data.
The obvious measure is the range, but this can be affected by a few outlying results.
More reliable values come from the interquartile range, or quartile deviation.
Range
The simplest measure of spread is the range, which is the difference between the largest and smallest values in a set of data.
The broader the range, the more spread out the data.
Formula: $\text{range} = \text{largest value} - \text{smallest value}$
Interquartile Range
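Formula: $\text{interquartile range} = Q_3 - Q_1$, where $Q_1$ and $Q_3$ are the first and third quartiles (the values a quarter and three-quarters of the way through the ordered data).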
Quartile Deviation
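Formula: $\text{quartile deviation} = \frac{Q_3 - Q_1}{2}$, half the interquartile range, where $Q_1$ and $Q_3$ are the first and third quartiles.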
Mean Absolute Deviation
The deviation is the difference between a value and the mean.
A basic measure is the mean absolute deviation, $\frac{\sum |x - \bar{x}|}{n}$, which is the average size of the deviations ignoring their signs.
Alternatively, we can square the deviations and calculate the mean squared deviation - or the variance.
Variance
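Formula: $\text{variance} = \frac{\sum (x - \bar{x})^2}{n}$, the mean squared deviation of $n$ values from their mean $\bar{x}$.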
Standard Deviation
The square root of the variance is the standard deviation, which is the most widely used measure of spread.
We can usually estimate the number of observations expected within a certain number of standard deviations of the mean.
The standard deviation is used for other analyses, such as the coefficient of variation (which gives a relative view of spread) and the coefficient of skewness (which describes the shape of a distribution).
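The measures of spread can be collected in one Python function. Two assumptions: the variance divides by $n$, as in the formula above, and the quartiles come from `statistics.quantiles` with its default "exclusive" method. Textbooks and spreadsheets use several slightly different quartile conventions, so hand-calculated quartiles may differ a little. The coefficient of variation is returned as a ratio and is often quoted as a percentage.

```python
import math
import statistics

def spread_measures(values):
    n = len(values)
    mean = sum(values) / n
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    deviations = [x - mean for x in values]
    variance = sum(d * d for d in deviations) / n   # mean squared deviation
    sd = math.sqrt(variance)
    return {
        "range": max(values) - min(values),
        "interquartile_range": q3 - q1,
        "quartile_deviation": (q3 - q1) / 2,
        "mean_absolute_deviation": sum(abs(d) for d in deviations) / n,
        "variance": variance,
        "standard_deviation": sd,
        "coefficient_of_variation": sd / mean,
    }

data = [3, 7, 5, 7, 9, 4, 7, 6]  # hypothetical observations
for name, value in spread_measures(data).items():
    print(f"{name}: {value:.3f}")
```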
Lesson 4: Describing Changes with Index Numbers
Managers often have to monitor the way in which some value changes over time.
Index numbers give a way of monitoring such changes.
This lesson continues the theme of data presentation, using index numbers to describe the way that values change over time.
Measuring Change
The values of most variables change over time.
You can use an index to monitor these changes.
Indices
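An index is defined as the ratio of the current value of a variable to its base value, which is its value in the base period.
This is normally multiplied by 100 to give a more convenient figure: $\text{index for period } n = \frac{\text{value in period } n}{\text{value in base period}} \times 100$.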
The difference in an index between periods shows the percentage point change.
Calculations with Indices
The base period of an index can be any convenient point, but it should be revised periodically.
To change the base period, multiply every value of the old index by the same constant: $\frac{100}{\text{old index value in the new base period}}$.
Changing the Base Period
An index can use any convenient base period, but rather than keep the same one for a long time, it is often best to update it periodically.
There are two reasons for this:
Changing circumstances: significant changes that make comparisons with earlier periods meaningless.
An index becomes too large: after many periods the values become inconvenient to work with.
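A simple index and a change of base period can be sketched in Python. The price series is hypothetical.

```python
def simple_index(values, base_period=0):
    # Index = value / base value x 100 for every period.
    base = values[base_period]
    return [100 * v / base for v in values]

def rebase(index, new_base_period):
    # Multiply every old index value by the same constant,
    # 100 / (old index value in the new base period).
    old_value_at_new_base = index[new_base_period]
    return [100 * i / old_value_at_new_base for i in index]

prices = [20, 22, 25, 26, 30]  # hypothetical prices for periods 0-4
index = simple_index(prices)   # [100.0, 110.0, 125.0, 130.0, 150.0]
print(rebase(index, 2))        # [80.0, 88.0, 100.0, 104.0, 120.0]
```

Between periods 2 and 3 the original index rises by 5 percentage points (125 to 130), which is a 4% increase. Percentage points and percentages are not the same thing.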
Indices for More Than One Variable
Simple indices monitor changes in a single variable, but sometimes you are concerned with changes in a combination of different variables.
Indices that measure changes in a number of variables are called aggregate indices.
Aggregate Indices
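Two basic aggregate indices are the simple aggregate index and the mean price relative index, where $p_0$ is the base-period price of an item and $p_n$ its price in period $n$:
Simple aggregate index: an index based on the total cost, $\frac{\sum p_n}{\sum p_0} \times 100$.
Mean price relative index: the mean of the separate indices for each item, $\frac{1}{N} \sum \frac{p_n}{p_0} \times 100$ for $N$ items.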
A reasonable aggregate index must take into account two factors:
The price paid for each unit of a product.
The number of units of each product used.
Better options use base-period weighting and current-period weighting.
Base-Weighted Index
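Formula: $\frac{\sum p_n q_0}{\sum p_0 q_0} \times 100$, where $p_0$ and $q_0$ are the price and quantity bought in the base period and $p_n$ is the price in period $n$.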
This is sometimes called the Laspeyres index after its inventor.
It assumes that amounts bought do not change over time, and it does not respond to general trends in buying habits or responses to specific changes in price.
Base-weighted indices do not notice that people substitute cheaper items for ones whose price is rising, so they tend to be too high.
Current-Weighted Index
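Formula: $\frac{\sum p_n q_n}{\sum p_0 q_n} \times 100$, where $p_n$ and $q_n$ are the price and quantity bought in the current period $n$ and $p_0$ is the base-period price.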
This is sometimes called the Paasche index, which has the advantage of giving an accurate measure of changes in the costs of current purchases.
The calculation changes each period, so it does not give a direct comparison over time.
A Paasche index gives more weight to products that are relatively cheaper now than they were in the base period, so it tends to be too low.
Other Weighted Indices
Base-weighting and current-weighting indices both assign weights to prices according to the quantities bought.
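We can assign other weights, $w$, to the prices to reflect some other measure of importance, and define a weighted index as $\frac{\sum w \, p_n / p_0}{\sum w} \times 100$, where $p_0$ and $p_n$ are each item's base-period and current prices.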
The Retail Price Index is a widely accepted measure of price increase based on the expenditure of a typical family.
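The aggregate indices in this lesson can be compared side by side in Python. The two-item basket (prices, quantities and the weights 3 and 1) is entirely hypothetical.

```python
def simple_aggregate_index(p0, pn):
    # Total of current prices over total of base prices.
    return 100 * sum(pn) / sum(p0)

def mean_price_relative_index(p0, pn):
    # Mean of the separate price indices (relatives) for each item.
    return 100 * sum(b / a for a, b in zip(p0, pn)) / len(p0)

def laspeyres_index(p0, pn, q0):
    # Base-weighted: base-period quantities valued at current and base prices.
    return 100 * sum(p * q for p, q in zip(pn, q0)) / sum(p * q for p, q in zip(p0, q0))

def paasche_index(p0, pn, qn):
    # Current-weighted: current quantities valued at current and base prices.
    return 100 * sum(p * q for p, q in zip(pn, qn)) / sum(p * q for p, q in zip(p0, qn))

def weighted_index(p0, pn, w):
    # Weighted mean of the price relatives using arbitrary weights w.
    return 100 * sum(wi * b / a for wi, a, b in zip(w, p0, pn)) / sum(w)

# Hypothetical two-item basket.
p0, pn = [10, 20], [12, 22]   # base and current prices
q0, qn = [5, 2], [4, 3]       # base and current quantities bought
print(simple_aggregate_index(p0, pn))     # about 113.33
print(mean_price_relative_index(p0, pn))  # about 115.0
print(laspeyres_index(p0, pn, q0))        # about 115.56
print(paasche_index(p0, pn, qn))          # about 114.0
print(weighted_index(p0, pn, [3, 1]))     # about 117.5
```

In this made-up basket, buyers shift toward the second item, whose price rose by a smaller percentage, so the Paasche value ends up below the Laspeyres value.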
Module 3: Solving Management Problems
Overview
Quantitative methods are used to solve real-world management problems.
The module introduces uncertainty through probability and statistical analyses.
Topics covered include financial calculations, performance measurement, regression analysis, forecasting, and linear programming.
Lesson Outcomes
Appreciate the importance of measuring performance.
Calculate performance ratios.
Find break-even points.
Understand the purpose of regression.
Measure the strength of relationships.
Measure errors introduced by noise.
Understand multiple regression.
Use curve fitting for complex functions.
Appreciate the importance of forecasting.
List forecasting methods.
Discuss judgmental forecasting.
Understand time series.
Describe linear programming steps.
Interpret computer package printouts.
Lesson 1: Finance and Performance
This lesson describes a range of common performance measures and financial calculations.
Measures of Performance
Managers measure performance to assess organizational function, target achievement, and improvement rates.
Measures include operations (capacity, output) and finance (profit, share price).
Capacity depends on resource organization and management.
Capacity varies over time due to factors like employee fatigue.
Performance Ratios
Absolute measures have limitations; performance ratios add context.
Common ratios relate to operations (productivity, utilization) and finance (profit margins, return on assets).
Four Main Types of Partial Productivity:
Equipment productivity.
Labor productivity.
Capital productivity.
Energy productivity.
Financial Ratios
Profit is a crucial measure.
Common Profit Ratios:
Ratios Important for Investors:
Break-Even Point
The break-even point is the production quantity at which revenue covers all costs, and the organization starts making a profit.
Extensions consider economies of scale, average, and marginal costs.
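Profit and Loss Calculation:
With a selling price $P$ per unit, fixed cost $FC$ and variable cost $VC$ per unit, making $N$ units gives revenue $P \times N$ and total cost $FC + VC \times N$.
Profit when $N >$ break-even point.
Revenue equals total cost at the break-even point: $P \times N = FC + VC \times N$, so $\text{break-even point} = \frac{FC}{P - VC}$.
Loss when $N <$ break-even point.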
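The break-even calculation can be sketched in Python, assuming a constant selling price $P$ and variable cost $VC$ per unit. The product figures are hypothetical.

```python
def break_even_point(fixed_cost, price, variable_cost):
    # Revenue P*N equals total cost FC + VC*N when N = FC / (P - VC).
    if price <= variable_cost:
        raise ValueError("price must exceed the variable cost per unit")
    return fixed_cost / (price - variable_cost)

def profit(units, fixed_cost, price, variable_cost):
    # Revenue minus total cost; negative values are losses.
    return price * units - (fixed_cost + variable_cost * units)

# Hypothetical product: fixed cost 12,000, price 50, variable cost 30 per unit.
print(break_even_point(12_000, 50, 30))  # 600.0
print(profit(700, 12_000, 50, 30))       # 2000 (profit above break-even)
print(profit(500, 12_000, 50, 30))       # -2000 (loss below break-even)
```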
Value of Money Over Time
The value of money changes over time due to interest.
An amount available now can earn interest and grow over time, usually by compound interest.
Compound Interest
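Formula: $A_n = A_0 (1 + i)^n$, where $A_0$ is the amount invested now, $i$ is the interest rate per period and $A_n$ is the value after $n$ periods.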
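The compound interest formula $A_n = A_0(1 + i)^n$ can be checked against a year-by-year calculation in Python. The principal and rate are hypothetical.

```python
def compound_value(principal, rate, periods):
    # A_n = A_0 * (1 + i) ** n
    return principal * (1 + rate) ** periods

def compound_by_year(principal, rate, periods):
    # The same result built up one period at a time: each period's
    # interest is added to the balance and then earns interest itself.
    balance = principal
    for _ in range(periods):
        balance += balance * rate
    return balance

print(round(compound_value(1000, 0.10, 3), 2))  # 1331.0
```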
Discounting to Present Value
You can compare amounts of money available at different times by discounting to their present values.
Subtracting the present value of all costs from the present value of all revenues gives a net present value.
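Formula: $\text{present value} = \frac{A_n}{(1 + i)^n}$, where $A_n$ is an amount available after $n$ periods and $i$ is the discount rate per period.
net present value = sum of discounted revenues - sum of discounted costs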
Internal Rate of Return
The internal rate of return (IRR) is the discount rate that gives a net present value of zero.
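Net present value and the internal rate of return can be sketched in Python. Assumptions: revenues and costs arrive at the end of each period, with period 0 meaning now. The IRR is found by bisection, one simple root-finding method among several, and the search assumes NPV changes sign exactly once between the two starting rates. The project figures are hypothetical.

```python
def present_value(amount, rate, periods):
    # Discount an amount available after `periods` periods back to now.
    return amount / (1 + rate) ** periods

def net_present_value(rate, revenues, costs):
    # revenues[t] and costs[t] arrive at the end of period t (t = 0 is now).
    pv_revenues = sum(present_value(r, rate, t) for t, r in enumerate(revenues))
    pv_costs = sum(present_value(c, rate, t) for t, c in enumerate(costs))
    return pv_revenues - pv_costs

def internal_rate_of_return(revenues, costs, low=0.0, high=1.0, tol=1e-9):
    # Bisection search for the rate where NPV = 0. Assumes NPV changes
    # sign exactly once between `low` and `high`.
    def npv(rate):
        return net_present_value(rate, revenues, costs)
    if npv(low) * npv(high) > 0:
        raise ValueError("NPV does not change sign between low and high")
    while high - low > tol:
        mid = (low + high) / 2
        if npv(low) * npv(mid) <= 0:
            high = mid   # the root lies in the lower half
        else:
            low = mid    # the root lies in the upper half
    return (low + high) / 2

# Hypothetical project: cost 1,000 now, revenue 600 at the end of years 1 and 2.
revenues, costs = [0, 600, 600], [1000, 0, 0]
print(round(net_present_value(0.10, revenues, costs), 2))  # 41.32
print(round(internal_rate_of_return(revenues, costs), 4))  # 0.1307
```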
Depreciation
Organizations write down the value of their assets each year, meaning that they reduce the book value by an amount of depreciation.
Straight-Line Depreciation
This reduces the value of equipment by a fixed amount each year.
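Formula: $\text{annual depreciation} = \frac{C - S}{N}$, where $C$ is the initial cost, $S$ is the scrap (resale) value and $N$ is the life in years.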
Reducing-Balance Depreciation
This reduces the value of equipment by a fixed percentage of its residual value each year.
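Formula: $\text{value after } n \text{ years} = C(1 - r)^n$, where $C$ is the initial cost and $r$ is the fixed percentage (as a fraction) written off each year.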
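Both depreciation methods can be compared in Python by listing the book value at the end of each year. The machine's cost, scrap value, life and 30% rate are hypothetical.

```python
def straight_line_values(cost, scrap_value, life_years):
    # Book value falls by the same amount, (C - S) / N, every year.
    annual = (cost - scrap_value) / life_years
    return [cost - annual * year for year in range(life_years + 1)]

def reducing_balance_values(cost, rate, years):
    # Book value falls by a fixed fraction `rate` of the remaining value.
    return [cost * (1 - rate) ** year for year in range(years + 1)]

# Hypothetical machine: cost 20,000, scrap value 2,000 after 6 years.
print(straight_line_values(20_000, 2_000, 6))
# [20000.0, 17000.0, 14000.0, 11000.0, 8000.0, 5000.0, 2000.0]
print([round(v) for v in reducing_balance_values(20_000, 0.3, 3)])
# [20000, 14000, 9800, 6860]
```

Reducing-balance depreciation writes off more in the early years, when the residual value is highest.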
Lesson 2: Regression and Curve Fitting
This lesson will show how to find and measure the relationships between variables.
Measuring Relationships
A relationship between two variables means that values of a dependent variable, y, are related to values of an independent variable, x.
In practice, few relationships are perfect and there is inevitably some random noise.
Errors
The noise means that there is a difference between expected values and observed values.
The amount of noise determines the strength of a relationship - and we can consider the noise as an error.
Stronger relationships have less noise.
You can measure the error using the mean error, mean absolute error and mean squared error.
The mean squared error is the most widely used.
Error Calculations
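For each observation $i$, the error is $E_i = y_i - \hat{y}_i$, where $y_i$ is the observed value and $\hat{y}_i$ the expected value. Then:
mean error $= \frac{\sum E_i}{n}$
mean absolute error $= \frac{\sum |E_i|}{n}$
mean squared error $= \frac{\sum E_i^2}{n}$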
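The three error measures can be computed in Python from pairs of observed and expected values. The numbers are hypothetical.

```python
def error_measures(observed, expected):
    # E_i = y_i - y_hat_i for each observation i.
    errors = [y - y_hat for y, y_hat in zip(observed, expected)]
    n = len(errors)
    return {
        "mean_error": sum(errors) / n,
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        "mean_squared_error": sum(e * e for e in errors) / n,
    }

# Hypothetical observed and expected values.
print(error_measures([10, 12, 15, 11], [11, 12, 13, 12]))
# {'mean_error': 0.0, 'mean_absolute_error': 1.0, 'mean_squared_error': 1.5}
```

The mean error here is zero even though three of the four values are wrong, because positive and negative errors cancel. This is why the absolute and squared versions are more useful.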
Linear Relationships
Linear regression finds the line of best fit through a set of data.
This line is defined as the one that minimizes the sum of squared errors.
The main use of linear regression is to predict the value of a dependent variable for a known value of an independent variable.
Basic Approach of Linear Regression:
Draw a scatter diagram.
Identify a linear relationship.
Find the equation for the line of best fit through the data.
Use this line to predict a value for the dependent variable from a known value of the independent variable.
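Equation: $\hat{y} = a + bx$, where $a$ is the intercept and $b$ is the gradient.
Equation calculations: $b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$ and $a = \bar{y} - b\bar{x}$.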
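The least-squares formulas for $a$ and $b$ can be applied directly in Python. The advertising and sales figures are hypothetical.

```python
def linear_regression(x, y):
    # Least-squares line y_hat = a + b*x.
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n   # a = y_bar - b * x_bar
    return a, b

# Hypothetical data: advertising spend (x) and sales (y).
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = linear_regression(x, y)
print(f"y_hat = {a:.2f} + {b:.2f}x")   # y_hat = 1.09 + 1.97x
print("prediction for x = 6:", a + b * 6)
```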
Measuring the Strength of a Relationship
Linear regression finds the line of best fit through a set of data, but we really need to measure how good the fit is.
If observations are close to the line, the errors are small and the line is a good fit to the data.
If observations are some way away from the line, errors are large and even the best line is not very good.
Coefficient of Determination
The coefficient of determination measures the proportion of the total variation from the mean explained by the regression line.
A value close to 1 shows that the regression line gives a good fit, while a value close to zero shows a poor fit.
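total sum of squares $= \sum (y_i - \bar{y})^2$
explained sum of squares $= \sum (\hat{y}_i - \bar{y})^2$
unexplained sum of squares $= \sum (y_i - \hat{y}_i)^2$, so total = explained + unexplained
coefficient of determination $= r^2 = \frac{\text{explained sum of squares}}{\text{total sum of squares}}$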
Coefficient of Correlation
Pearson's correlation coefficient shows how strong the linear relationship is between two variables.
A value close to 1 or -1 shows a strong relationship, while a value close to zero shows a weak one.
Spearman's coefficient gives a correlation for ranked data.
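Formulas: Pearson's $r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$; Spearman's $r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$, where $D$ is the difference between the two ranks of each item.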
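The coefficient of determination and the two correlation coefficients can be sketched in Python. The `coefficient_of_determination` function assumes `y_hat` comes from a least-squares line with an intercept, the case where explained + unexplained = total. The Spearman function assumes rankings without ties. The data and the line $\hat{y} = 2.2 + 0.6x$ (the least-squares line for this data) are hypothetical.

```python
import math

def pearson_r(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

def coefficient_of_determination(y, y_hat):
    # Proportion of the total variation explained by the regression line.
    mean_y = sum(y) / len(y)
    total = sum((v - mean_y) ** 2 for v in y)
    explained = sum((v - mean_y) ** 2 for v in y_hat)
    return explained / total

def spearman_rs(rank_a, rank_b):
    # D is the difference between the two ranks given to each item.
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * v for v in x]      # least-squares line for this data
print(pearson_r(x, y))                   # about 0.775
print(coefficient_of_determination(y, y_hat))  # about 0.6 (= r squared)
print(spearman_rs([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # about 0.8
```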
Multiple Regression
Sometimes a dependent variable is related to several independent variables.
Then the relationship takes the form $y = a + b_1x_1 + b_2x_2 + \dots + b_kx_k$, and multiple regression finds the best values for the constants $a$ and $b_i$.
Many packages do these calculations automatically, but the interpretation of results can be difficult.
Curve Fitting
Sometimes relationships are clearly not linear - you can use curve fitting to find more complex functions through data.
To fit a more complicated function through a set of data we use non-linear regression - or more generally curve fitting.
Many packages have functions for fitting more complicated curves to data, and spreadsheets typically fit:
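linear models: $y = a + bx$
multiple linear curves: $y = a + b_1x_1 + b_2x_2 + b_3x_3 + \dots$
polynomials: $y = a + bx + cx^2 + dx^3 + \dots$
exponential curves: $y = ae^{bx}$
growth curves: $y = ab^x$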
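One common way to fit an exponential curve of the form $y = ae^{bx}$ is to take natural logarithms, $\ln y = \ln a + bx$, and fit a straight line by linear regression. The Python sketch below uses that transformation. It minimizes squared errors in $\ln y$ rather than in $y$, so it is not identical to a full non-linear least-squares fit. The data are generated from a known curve so that the result can be checked.

```python
import math

def fit_exponential(x, y):
    # Fit y = a * exp(b * x) by a least-squares straight line through
    # (x, ln y). All y values must be positive.
    ln_y = [math.log(v) for v in y]
    n = len(x)
    sum_x, sum_ly = sum(x), sum(ln_y)
    sum_xly = sum(xi * li for xi, li in zip(x, ln_y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xly - sum_x * sum_ly) / (n * sum_x2 - sum_x ** 2)
    ln_a = sum_ly / n - b * sum_x / n
    return math.exp(ln_a), b

# Hypothetical data generated from y = 2 * exp(0.5 x), so the fit should
# recover a close to 2 and b close to 0.5.
x = [0, 1, 2, 3, 4]
y = [2 * math.exp(0.5 * xi) for xi in x]
a, b = fit_exponential(x, y)
print(round(a, 6), round(b, 6))  # 2.0 0.5
```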
Lesson 3: Forecasting
This lesson will discuss methods of forecasting, which is an essential function in every organization.
Methods of Forecasting
There are many different ways of forecasting.
No method is always best, and managers have to choose the most appropriate for particular circumstances.
Forecasting methods can be classified in several ways, including the length of time they cover in the future.
The most useful classification refers to causal, judgmental and projective methods.
Time-Based Classifications:
Long-term forecasts look ahead several years.
Medium-term forecasts look ahead between three months and two years.
Short-term forecasts cover the next few weeks.
Causal forecasting looks for a relationship between variables and then forecasts values for a dependent variable from known values of the independent variable.
Judgmental Forecasts
The key feature of judgmental forecasts is that they use opinions and subjective views of informed people.
Most Widely Used Methods Are:
Personal insight.
Panel consensus.
Market surveys.
Historical analogy.
The Delphi method.
Projective Forecasts
Projective forecasts look only at historical observations and project the underlying patterns into the future.
A basic form of projective forecasting uses simple averages, but this is insensitive and has limited practical use.
Moving averages are more flexible, setting forecasts as the average of the latest n observations and ignoring all older values.
Time Series
Projective forecasts often work with time series, which are series of observations taken at regular intervals.
Common Patterns in Time Series:
Constant series.
Series with a trend.
Seasonal series.
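Defining error
$E_t = y(t) - F(t)$, or error in period $t$ = actual value - forecast value
where $F(t)$ is the forecast for period $t$ and $y(t)$ is the value that actually occurs.
Mean error: $\frac{\sum E_t}{n}$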
Simple Averages
Formula: $F(t+1) = \frac{\sum_{i=1}^{t} y(i)}{t}$, the mean of all $t$ observations so far.
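Moving Averages
Formula: $F(t+1) = \frac{y(t) + y(t-1) + \dots + y(t-n+1)}{n}$, the mean of the latest $n$ observations.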
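Simple-average and moving-average forecasts can be sketched in Python. The weekly demand figures and the choice of $n = 3$ are hypothetical.

```python
def simple_average_forecast(history):
    # F(t+1) = mean of all observations so far.
    return sum(history) / len(history)

def moving_average_forecast(history, n):
    # F(t+1) = mean of the latest n observations; older values are ignored.
    if len(history) < n:
        raise ValueError("need at least n observations")
    return sum(history[-n:]) / n

demand = [20, 22, 19, 25, 30, 28]  # hypothetical weekly demand
print(simple_average_forecast(demand))               # 24.0
print(round(moving_average_forecast(demand, 3), 2))  # 27.67
```

The moving average responds to the recent rise in demand, while the simple average is held back by the older, lower values.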
Exponential Smoothing
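Exponential smoothing is an efficient forecasting method that adds a portion of the latest observation to the previous forecast: $F(t+1) = \alpha y(t) + (1 - \alpha) F(t)$, where $y(t)$ is the latest observation, $F(t)$ is the previous forecast and $\alpha$ is the smoothing constant.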
This automatically reduces the weight given to data as it gets older.
A smoothing constant that typically has a value between 0.1 and 0.2 determines the sensitivity of the forecast.
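We can monitor the performance of a forecast using a tracking signal: $\text{tracking signal} = \frac{\text{sum of forecast errors}}{\text{mean absolute deviation of the errors}}$. A value that drifts far from zero suggests the forecast is biased.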
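Exponential smoothing and the tracking signal can be sketched in Python. Two assumptions: the first forecast is supplied (or defaults to the first observation), and a zero mean absolute deviation gives a tracking signal of zero. The series and $\alpha = 0.5$ are hypothetical. The larger-than-usual $\alpha$ keeps the arithmetic easy to follow.

```python
def exponential_smoothing(history, alpha, initial_forecast=None):
    """Return forecasts F(0)..F(len(history)); the last one is for the next period."""
    forecast = history[0] if initial_forecast is None else initial_forecast
    forecasts = [forecast]
    for actual in history:
        # F(t+1) = alpha * y(t) + (1 - alpha) * F(t)
        forecast = alpha * actual + (1 - alpha) * forecast
        forecasts.append(forecast)
    return forecasts

def tracking_signal(history, forecasts):
    # Sum of errors divided by their mean absolute deviation.
    errors = [y - f for y, f in zip(history, forecasts)]
    mad = sum(abs(e) for e in errors) / len(errors)
    return 0.0 if mad == 0 else sum(errors) / mad

history = [10, 12, 14, 16]  # hypothetical steadily rising series
forecasts = exponential_smoothing(history, alpha=0.5, initial_forecast=10)
print(forecasts[-1])                         # 14.25, forecast for the next period
print(tracking_signal(history, forecasts))   # 4.0
```

Every error here is positive because the forecasts lag behind the trend, so the tracking signal reaches its largest possible value for four periods. This is the kind of bias the signal is meant to reveal.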
Forecasts with Seasonality and Trend
The methods we have described so far give good results for constant time series, but they need adjusting for other patterns.
Common Models:
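Additive model: actual value = underlying trend + seasonal variation + noise
Multiplicative model: actual value = underlying trend × seasonal index × noise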
Finding the Underlying Trends
Linear regression with time as the independent variable.
Moving averages with a period equal to the length of a season.
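Finding the Seasonal Indices
For a multiplicative model, divide each actual value by the underlying trend value for that period, then average these ratios over all periods in the same season.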
Making Forecasts
Project the trend into the future to find the deseasonalised values.
Multiply this by the appropriate seasonal index.
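The forecasting steps for a multiplicative model can be sketched in Python. Assumptions: the trend is found by linear regression with time as the independent variable (one of the two options listed above). Each seasonal index is the average ratio of actual value to trend value for that season, and the indices are not rescaled to average exactly 1. The quarterly sales figures are hypothetical.

```python
def linear_trend(values):
    # Least-squares line value = a + b*t, with t = 0, 1, 2, ...
    n = len(values)
    t = range(n)
    st, sy = sum(t), sum(values)
    sty = sum(i * v for i, v in zip(t, values))
    st2 = sum(i * i for i in t)
    b = (n * sty - st * sy) / (n * st2 - st ** 2)
    a = sy / n - b * st / n
    return a, b

def seasonal_indices(values, season_length, a, b):
    # Average of actual / trend for every period in the same season.
    ratios = [[] for _ in range(season_length)]
    for t, v in enumerate(values):
        ratios[t % season_length].append(v / (a + b * t))
    return [sum(r) / len(r) for r in ratios]

def seasonal_forecast(values, season_length, periods_ahead):
    a, b = linear_trend(values)
    indices = seasonal_indices(values, season_length, a, b)
    n = len(values)
    # Project the trend, then multiply by the appropriate seasonal index.
    return [(a + b * t) * indices[t % season_length]
            for t in range(n, n + periods_ahead)]

sales = [80, 130, 110, 90, 90, 140, 120, 100]  # hypothetical quarterly sales
print([round(f, 1) for f in seasonal_forecast(sales, 4, 4)])
```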
Lesson 4: Linear Programming
Constrained Optimization
Key Characteristics:
An aim of optimizing.
A set of constraints.
Stages of Solving a Linear Program
Formulation.
Solution.
Sensitivity analysis.
Formulation Components
Decision variables
An objective function
A set of constraints
A non-negativity constraint
Graphical Solutions
Each constraint becomes a line on a graph, showing those values that satisfy the constraint.
Together, the constraints define the feasible region, which is a convex area bounded by the constraint lines, with extreme points at its corners.
The optimal solution, if it exists, is always at one of the extreme points.
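Because the optimal solution lies at an extreme point, a small two-variable linear program can be solved in Python by checking every extreme point. This is a sketch for illustration, not a substitute for a linear programming package. It assumes a bounded feasible region and writes every constraint, including non-negativity, as $a x + b y \le c$. The problem itself (maximize $3x + 5y$) is hypothetical.

```python
from itertools import combinations

def solve_2d_lp(objective, constraints, eps=1e-9):
    """Maximize objective[0]*x + objective[1]*y subject to a*x + b*y <= c
    for every (a, b, c) in constraints, by checking every extreme point."""
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue  # parallel lines do not meet at a single point
        # Intersection of the two constraint lines (Cramer's rule).
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        # Keep the point only if it satisfies every constraint.
        if all(a * x + b * y <= c + eps for a, b, c in constraints):
            value = objective[0] * x + objective[1] * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best  # (objective value, x, y), or None if nothing is feasible

# Hypothetical problem: maximize 3x + 5y subject to
# x <= 4, 2y <= 12, 3x + 2y <= 18, x >= 0, y >= 0.
constraints = [(1, 0, 4), (0, 2, 12), (3, 2, 18), (-1, 0, 0), (0, -1, 0)]
print(solve_2d_lp((3, 5), constraints))  # (36.0, 2.0, 6.0)
```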
Sensitivity of Solutions to Changes
Calculate shadow prices, which show how much the objective function changes with one more unit of a constrained resource.
Assess the effects of changing constraints and the objective function.
Module 4: Introducing Statistics
Overview
This module introduces basic statistical concepts for dealing with uncertainty in management problems.
Lesson Outcomes
Appreciate deterministic vs. stochastic problems.
Define probability and its importance.
Understand probability distributions.
Calculate combinations and permutations.
Work with Normal distributions.
Understand sampling.
Appreciate statistical inference.
Calculate confidence intervals.
Use one-sided distributions.
Use t-distributions for small samples.
Understand hypothesis testing.
List hypothesis testing steps.
Understand errors and significance levels.
Use one- and two-tail tests.
Consider non-parametric tests.
Lesson 1: Uncertainty and Probabilities
Measuring Uncertainty
Deterministic problems have known features with certainty.
Stochastic or probabilistic problems have uncertainty measured by probabilities.
Defining Probability
Probabilities measure the likelihood of an event occurring.
A probability is defined on a scale of 0 to 1.
Probabilities can be calculated a priori or observed empirically. Less reliable estimates are subjective probabilities.
Calculations with Probabilities
An important concept for probabilities is the independence of events.
Independent Events
For independent events, "AND" means that you multiply the separate probabilities: $P(a \text{ AND } b) = P(a) \times P(b)$.
For mutually exclusive events, "OR" means that you add the separate probabilities: $P(a \text{ OR } b) = P(a) + P(b)$.
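The two rules can be checked with exact fractions in Python. The coin and die examples are hypothetical illustrations, and the functions are only valid when the events really are independent or mutually exclusive.

```python
from fractions import Fraction

def p_and_independent(*probabilities):
    # Independent events: P(a AND b AND ...) = P(a) * P(b) * ...
    result = Fraction(1)
    for p in probabilities:
        result *= p
    return result

def p_or_mutually_exclusive(*probabilities):
    # Mutually exclusive events: P(a OR b OR ...) = P(a) + P(b) + ...
    total = sum(probabilities, Fraction(0))
    if total > 1:
        raise ValueError("mutually exclusive probabilities cannot sum to more than 1")
    return total

# Tossing a head AND rolling a six; rolling a one OR a two on a single die.
print(p_and_independent(Fraction(1, 2), Fraction(1, 6)))        # 1/12
print(p_or_mutually_exclusive(Fraction(1, 6), Fraction(1, 6)))  # 1/3
```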
Conditional Probabilities
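Conditional probabilities occur when two events are dependent.
Events are considered dependent when the occurrence of one changes the probability of the other, so that $P(a \mid b) \neq P(a)$, where $P(a \mid b)$ is the probability of $a$ given that $b$ has occurred.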
Bayes' Theorem: $$