QMMS - FINALS

Module 2: Data Collection

Overview

Raw data must be processed to provide useful information.
Data reduction simplifies data by highlighting important features and patterns, providing a concise yet accurate view.

Lesson 1: Collecting Data

A framework is a formal structure that lists the key elements or factors used in enterprise architecture.

Data and Information

Managers require relevant information for decision-making, which starts with data collection.
Data is then processed into information and presented in suitable formats.
Data collection is essential for informed decision-making within an organization.

The Three Main Steps in Preparing Information:

Data collection
Processing to give information
Presentation

Data collection requires careful planning, not chance.

Detailed Steps in Data Collection Planning:

Define the purpose of the data.
Describe the data needed to achieve this purpose.
Check available secondary data for usefulness.
Define the population and sampling frame for primary data.
Choose the best sampling method and sample size.
Identify an appropriate sample.
Design a questionnaire or other method of data collection.
Run a pilot study and check for problems.
Train interviewers, observers, or experimenters.
Conduct the main data collection.
Perform follow-up, such as contacting non-respondents.
Analyze and present the results.

Amount of Data

Three important questions for data collection are the amount of data, its source, and the means of collection.
Marginal cost is the extra cost of collecting one more bit of data and increases with the amount of data collected.
Marginal benefit is the benefit from the last bit of data collected and decreases with the amount collected.
The optimal amount of data is collected when marginal cost equals marginal benefit.
Collecting less data results in lost potential benefit, while collecting more data wastes resources.

Types of Data

Nominal data: Cannot be quantified with meaningful units.
Ordinal data: Categories can be ranked in a meaningful order.
Cardinal data: Can be measured directly.

Cardinal Data Types:

Discrete: Takes only integer values.
Continuous: Can take any value, not restricted to integers.

Examples:

(a) Nominal data:
- Percentage of respondents who would vote for political party X: 35%
- Percentage of respondents who would vote for political party Y: 40%
- Percentage of respondents who would vote for political party Z: 20%
- Percentage of respondents who do not know who they would vote for: 5%
(b) Ordinal data:
- Percentage of people who feel 'very strongly' in favor of a proposal: 8%
- Percentage of people who feel 'strongly' in favor of a proposal: 14%
- Percentage of people who feel 'neutral' about a proposal: 49%
- Percentage of people who feel 'strongly' against a proposal: 22%
- Percentage of people who feel 'very strongly' against a proposal: 7%
(c) Cardinal data:
- Percentage of people in a club who are less than 20 years old: 12%
- Percentage of people in a club who are between 20 and 35 years old: 18%
- Percentage of people in a club who are between 35 and 50 years old: 27%
- Percentage of people in a club who are between 50 and 65 years old: 29%
- Percentage of people in a club who are more than 65 years old: 14%

Primary and Secondary Data

Primary data: New data collected by an organization for a specific purpose ('field research').
Secondary data: Existing data collected by other organizations or for other purposes ('desk research').

Using Samples to Collect Data

A population consists of all the people or items that could supply data.
These are listed in a sampling frame.
A census collects data from the whole population.
A sample collects data from a representative sample of the population and uses it to estimate values for the whole population.
Sampling is usually more cost- and time-effective than a census.
There are several types of samples including random, systematic, stratified, quota, multi-stage and cluster samples.

Organizing Data Collection

There are two alternatives for collecting data from a sample: observation and questionnaires.
Questionnaires can be administered through personal interview, telephone interview, the Internet, postal survey, panel survey, or longitudinal survey.

Questionnaire Design Guidelines

A questionnaire should ask a series of related questions in a logical sequence.
Make the questionnaire as short as possible to reduce collection and analysis costs.
Questions should be short, simple, unambiguous, easy to understand, and phrased in everyday terms.
Even simple changes to phrasing can give very different results.
Avoid leading questions that encourage conformity over truthful answers.
Use phrases that are as neutral as possible.
Phrase all personal questions carefully.
Do not give warnings, as they discourage responses.
Avoid vague questions that lack clarity.
Ask positive questions rather than less definite ones.
Avoid hypothetical questions that lead to speculative and unrealistic answers.
Avoid asking two or more questions in one, which can confuse respondents.
Open questions collect general views but favor articulate and quick-thinking individuals and are difficult to analyze.
Ask questions with precoded answers from a set of alternatives.
Be prepared for unexpected effects, such as sensitivity to the color and format of the questionnaire, or different types of interviewer getting different responses.
Always run a pilot survey before starting the whole survey to identify problems and improve the questionnaire design.
Examine non-respondents to make sure they do not introduce bias.

Potential Errors in Data Collection

Failure to identify an appropriate population
Choosing a sample that does not represent this population
Mistakes in contacting members of the sample
Mistakes in collecting data from the sample
Introducing bias from non-respondents
Mistakes made during data analysis
Drawing invalid conclusions from the analysis

Lesson 2: Using Numbers to Describe Data

Numerical measures give more objective and accurate data descriptions.
Two key measures describe the location and spread of data.

Measures of Location

Measures of location show where the center of the data is, giving some kind of typical or average value.
The most common measure is the arithmetic mean; alternatives are the median (the middle value when the data are ranked in order of size) and the mode (the most frequently occurring value).

Measures of Spread

Show how the data is scattered around this centre, giving an idea of the range of values.

Arithmetic Mean

Add all the values together to get the sum
Divide this sum by the number of values to get the mean.
Formula: $\text{mean} = \bar{x} = \frac{\Sigma x}{n}$

Median Calculation

Arranging the values in order of size.
Counting the number of values.
Identifying the middle value, giving the median (with an even number of values, take the mean of the two middle values).

Mode Calculation

Drawing a frequency distribution of the data.
Identifying the most frequent value, giving the mode.

Choice of Measure of Location

Each of these three measures for location gives a different view:
- The mean is the simple average.
- The median is the middle value.
- The mode is the most frequent value.
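As a minimal sketch of the three measures of location, the snippet below uses Python's standard `statistics` module. The sample `values` is a made-up illustration, not data from the course.

```python
from statistics import mean, median, multimode

# Hypothetical sample: number of items sold on nine days.
values = [3, 5, 5, 6, 7, 8, 5, 9, 6]

print("mean:", mean(values))       # sum of the values divided by their count
print("median:", median(values))   # middle value once the data are sorted
print("mode:", multimode(values))  # most frequent value(s), returned as a list
```

`multimode` is used rather than `mode` so that data with several equally frequent values returns all of them.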

Measures of Spread

Other measures are needed for the spread of data.
The obvious measure is range, but this can be affected by a few outlying results.
More reliable values come from the interquartile range, or quartile deviation.

Range

The simplest measure of spread is the range, which is the difference between the largest and smallest values in a set of data.
The broader the range, the more spread out the data.
Formula: $\text{range} = \text{largest value} - \text{smallest value}$

Interquartile Range

Formula: $\text{interquartile range} = Q3 - Q1$

Quartile Deviation

Formula: $\text{Quartile deviation} = \frac{Q3 - Q1}{2}$
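The range, interquartile range and quartile deviation can be sketched as follows. The data are hypothetical, and the quartiles come from `statistics.quantiles` with `method="inclusive"`; other quartile conventions exist and can give slightly different values for Q1 and Q3.

```python
from statistics import quantiles

# Hypothetical observations.
values = [2, 4, 4, 5, 7, 8, 9, 10, 12]

# Range: largest value minus smallest value.
data_range = max(values) - min(values)

# Quartiles Q1, Q2 (the median) and Q3, using the "inclusive" convention.
q1, q2, q3 = quantiles(values, n=4, method="inclusive")

interquartile_range = q3 - q1
quartile_deviation = interquartile_range / 2

print(data_range, q1, q3, interquartile_range, quartile_deviation)
```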

Mean Absolute Deviation

The deviation is the difference between a value and the mean.
The mean deviation is always zero, because positive and negative deviations cancel, so it says nothing about spread.
A basic measure of spread is therefore the mean absolute deviation (MAD).
Alternatively, we can square the deviations and calculate the mean squared deviation, or variance.
$\text{mean deviation} = \frac{\Sigma(x - \bar{x})}{n} = 0$
$\text{mean absolute deviation} = MAD = \frac{\Sigma |x - \bar{x}|}{n}$

Variance

Formula: $\text{variance} = \frac{\Sigma(x-\bar{x})^2}{n}$

Standard Deviation

The square root of the variance is the standard deviation, which is the most widely used measure of spread.
We can usually estimate the number of observations expected within a certain number of standard deviations of the mean.
$\text{standard deviation} = \sqrt{\frac{\Sigma(x-\bar{x})^2}{n}} = \sqrt{\text{variance}}$
The standard deviation is used for other analyses, such as the coefficient of variation (which gives a relative view of spread) and the coefficient of skewness (which describes the shape of a distribution).
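The deviation-based measures of spread can be computed directly from their definitions. This is a sketch on made-up data; it divides by n, as the formulas in these notes do, rather than by n - 1 as is common for sample estimates.

```python
from math import sqrt

# Hypothetical observations.
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)

mean = sum(values) / n
deviations = [x - mean for x in values]

# Positive and negative deviations cancel, so this is zero.
mean_deviation = sum(deviations) / n

# Mean absolute deviation: average size of the deviations.
mad = sum(abs(d) for d in deviations) / n

# Variance: mean squared deviation; standard deviation: its square root.
variance = sum(d ** 2 for d in deviations) / n
std_dev = sqrt(variance)

print(mean_deviation, mad, variance, std_dev)
```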

Lesson 4: Describing Changes with Index Numbers

Managers often have to monitor the way in which some value changes over time.
Index numbers give a way of monitoring such changes.
This lesson continues the theme of data presentation, using index numbers to describe the way that values change over time.

Measuring Change

The values of most variables change over time.
You can use an index to monitor these changes.

Indices

An index is defined as the ratio of the current value of a variable over its base value, which is its value in the base period.
This is normally multiplied by 100 to give a more convenient figure.
$\text{index for the time} = \frac{\text{value at the time}}{\text{value in base period}} \times 100 = \frac{\text{value at the time}}{\text{base value}} \times 100$
The difference in an index between periods shows the percentage point change.
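A short sketch of a simple index, using hypothetical yearly prices with 2020 as the base period. It also separates a percentage point change (the difference between two index values) from a percentage change (that difference relative to the earlier index value).

```python
# Hypothetical prices of one item, by year.
prices = {2020: 50.0, 2021: 55.0, 2022: 60.0, 2023: 57.0}
base_year = 2020

# index = value at the time / base value * 100
index = {year: 100 * price / prices[base_year] for year, price in prices.items()}

# Change between 2021 and 2022, expressed in two ways.
point_change = index[2022] - index[2021]
percent_change = 100 * point_change / index[2021]

print(index)
print(point_change, round(percent_change, 2))
```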

Calculations with Indices

The base period of an index can be any convenient point, but it should be revised periodically.
To calculate a new index, you multiply the old index by a constant.
$\text{index in period N} = \frac{\text{value in period N}}{\text{base value}} \times 100$

Changing the Base Period

An index can use any convenient base period, but rather than keep the same one for a long time, it is often best to update it periodically.
There are two reasons for this:
- Changing circumstances: significant changes that make comparisons with earlier periods meaningless.
- An index becomes too large.
$\text{old index} = \frac{\text{value in period M}}{\text{old base value}} \times 100$
$\text{new index} = \frac{\text{value in period M}}{\text{new base value}} \times 100$
$\text{new index} = \text{old index} \times \frac{\text{old base value}}{\text{new base value}}$
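Rebasing multiplies every old index value by the same constant, old base value / new base value. The sketch below assumes hypothetical figures: an old base value of 50 (year 2020) and a new base value of 60 (year 2022).

```python
# Hypothetical index series calculated on the old base (2020 = 100).
old_index = {2020: 100.0, 2021: 110.0, 2022: 120.0, 2023: 114.0}

old_base_value = 50.0  # value of the variable in the old base period
new_base_value = 60.0  # value of the variable in the new base period

# new index = old index * (old base value / new base value)
factor = old_base_value / new_base_value
new_index = {year: value * factor for year, value in old_index.items()}

print({year: round(value, 1) for year, value in new_index.items()})
```

The new base period comes out at 100, which is a quick check that the constant is right.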

Indices for More Than One Variable

Simple indices monitor changes in a single variable, but sometimes you are concerned with changes in a combination of different variables.
Indices that measure changes in a number of variables are called aggregate indices.

Aggregate Indices

Two basic aggregate indices are the mean price relative index and the simple aggregate index.
The mean price relative index is the mean of the separate indices for each item:
$\text{mean price relative index for period} = \frac{\text{sum of separate indices for period}}{\text{number of indices}}$
The simple aggregate index is based on the total cost:
$\text{simple aggregate index for period n} = \frac{\text{sum of prices in period n}}{\text{sum of prices in base period}} \times 100$
A reasonable aggregate index must take into account two factors:
- The price paid for each unit of a product.
- The number of units of each product used.
Better options use base-period weighting and current-period weighting.

Base-Weighted Index

Formula: $\text{base-weighted index} = \frac{\text{cost of base period quantities at current prices}}{\text{cost of base period quantities at base period prices}} \times 100 = \frac{\Sigma Q_0 P_n}{\Sigma Q_0 P_0} \times 100$
This is sometimes called the Laspeyres index after its inventor.
It assumes that amounts bought do not change over time, and it does not respond to general trends in buying habits or responses to specific changes in price.
Base-weighted indices do not notice that people substitute cheaper items for ones whose price is rising, so they tend to be too high.

Current-Weighted Index

Formula: $\text{current-weighted index} = \frac{\text{cost of current quantities at current prices}}{\text{cost of current quantities at base period prices}} \times 100 = \frac{\Sigma Q_n P_n}{\Sigma Q_n P_0} \times 100$
This is sometimes called the Paasche index, which has the advantage of giving an accurate measure of changes in the costs of current purchases.
The calculation changes each period, so it does not give a direct comparison over time.
A Paasche index introduces new products that are relatively cheaper than they were in the base period, so it tends to be too low.
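To compare the two weighting schemes, the sketch below computes a base-weighted (Laspeyres) and a current-weighted (Paasche) index for a hypothetical basket of two items. All prices and quantities are invented for illustration.

```python
# Hypothetical basket: prices (P) and quantities (Q) in the base period (0)
# and the current period (n), listed per item.
p0 = [2.0, 1.0]
q0 = [10, 20]
pn = [3.0, 1.2]
qn = [8, 25]

def cost(quantities, prices):
    """Total cost of buying the given quantities at the given prices."""
    return sum(q * p for q, p in zip(quantities, prices))

# Base-weighted: base-period quantities at current vs base-period prices.
laspeyres = cost(q0, pn) / cost(q0, p0) * 100

# Current-weighted: current quantities at current vs base-period prices.
paasche = cost(qn, pn) / cost(qn, p0) * 100

print(round(laspeyres, 1), round(paasche, 1))
```

In this example buyers shift toward the item whose price rose less, so the base-weighted index comes out higher than the current-weighted one.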

Other Weighted Indices

Base-weighting and current-weighting indices both assign weights to prices according to the quantities bought.
We can assign other weights, w, to the prices to reflect some other measure of importance, and define a weighted index as:
$\text{weighted index} = \frac{\Sigma w (P_n / P_0)}{\Sigma w} \times 100$
The Retail Price Index is a widely accepted measure of price increase based on the expenditure of a typical family.

Module 3: Solving Management Problems

Overview

Quantitative methods are used to solve real-world management problems.
The module introduces uncertainty through probability and statistical analyses.
Topics covered include financial calculations, performance measurement, regression analysis, forecasting, and linear programming.

Lesson Outcomes

Appreciate the importance of measuring performance.
Calculate performance ratios.
Find break-even points.
Understand the purpose of regression.
Measure the strength of relationships.
Measure errors introduced by noise.
Understand multiple regression.
Use curve fitting for complex functions.
Appreciate the importance of forecasting.
List forecasting methods.
Discuss judgmental forecasting.
Understand time series.
Describe linear programming steps.
Interpret computer package printouts.

Lesson 1: Finance and Performance

This lesson covers a range of common performance measures and financial calculations.

Measures of Performance

Managers measure performance to assess organizational function, target achievement, and improvement rates.
Measures include operations (capacity, output) and finance (profit, share price).
Capacity depends on resource organization and management.
Capacity varies over time due to factors like employee fatigue.

Performance Ratios

Absolute measures have limitations; performance ratios add context.
Common ratios relate to operations (productivity, utilization) and finance (profit margins, return on assets).
$\text{utilisation} = \frac{\text{amount of capacity used}}{\text{available capacity}}$
$\text{partial productivity} = \frac{\text{amount of products made}}{\text{units of a single resource used}}$

Four Main Types of Partial Productivity:

Equipment productivity.
Labor productivity.
Capital productivity.
Energy productivity.

Financial Ratios

Profit is a crucial measure.
$\text{profit} = \text{revenue} - \text{costs}$

Common Profit Ratios:

$\text{profit margin} = \frac{\text{profit before tax and interest}}{\text{sales}} \times 100$
$\text{return on assets} = \frac{\text{profit before interest and tax}}{\text{fixed assets + current assets}} \times 100$
$\text{acid test} = \frac{\text{liquid assets}}{\text{current liabilities}}$

Ratios Important for Investors:

$\text{return on equity} = \frac{\text{profit after tax}}{\text{shareholders' money}} \times 100$
$\text{gearing} = \frac{\text{borrowed money}}{\text{shareholders' money}}$
$\text{earnings per share} = \frac{\text{profit after tax}}{\text{number of shares}}$
$\text{dividends per share} = \frac{\text{amount distributed as dividends}}{\text{number of shares}}$
$\text{price-earnings ratio} = \frac{\text{share price}}{\text{earnings per share}}$
$\text{dividend cover} = \frac{\text{profit after tax}}{\text{profit distributed to shareholders}}$
$\text{yield} = \frac{\text{dividend}}{\text{share price}} \times 100$

Break-Even Point

The break-even point is the production quantity at which revenue covers all costs, and the organization starts making a profit.
Extensions consider economies of scale, average, and marginal costs.
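With a selling price per unit $P$, a variable cost per unit $C$, a fixed cost $F$ and $N$ units made and sold, revenue equals total cost at the break-even point:
$PN = F + CN$
$\text{break-even point} = N = \frac{F}{P-C}$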

Profit and Loss Calculation:

Profit when $N > \text{break-even point}$: $\text{profit} = N(P-C) - F$
Revenue equals total cost at the break-even point: $N(P-C) = F$
Loss when $N < \text{break-even point}$: $\text{loss} = F - N(P-C)$
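A minimal sketch of the break-even calculation. The function names and the figures (fixed cost 20,000, price 50, unit cost 30) are hypothetical.

```python
def break_even_point(fixed_cost, price, unit_cost):
    """Number of units N at which revenue P*N equals total cost F + C*N."""
    if price <= unit_cost:
        raise ValueError("price must exceed unit cost to break even")
    return fixed_cost / (price - unit_cost)

def profit(units, fixed_cost, price, unit_cost):
    """Profit N(P - C) - F; a negative result is a loss."""
    return units * (price - unit_cost) - fixed_cost

n_break_even = break_even_point(fixed_cost=20_000, price=50, unit_cost=30)
print(n_break_even)                   # units needed to cover all costs
print(profit(1_200, 20_000, 50, 30))  # above break-even: profit
print(profit(800, 20_000, 50, 30))    # below break-even: loss (negative)
```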

Value of Money Over Time

The value of money changes over time due to interest.
An amount available now can earn interest and grow over time, usually by compound interest.

Compound Interest

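Formula: $A_t = A_0 \times (1+i)^t$
where $A_0$ is the amount now, $i$ is the interest rate per period and $A_t$ is the amount after $t$ periods.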

Discounting to Present Value

You can compare amounts of money available at different times by discounting to their present values.
Subtracting the present value of all costs from the present value of all revenues gives a net present value.
Formula: $A_0 = \frac{A_t}{(1+i)^t} = A_t \times (1+i)^{-t}$
net present value = sum of discounted revenues - sum of discounted costs

Internal Rate of Return

The internal rate of return (IRR) is the discount rate that gives a net present value of zero.
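Discounting, net present value and the internal rate of return can be sketched as below. The cash flows are hypothetical (a cost of 1,000 now, then net revenues of 600 at the end of years 1 and 2), and the IRR is found by simple bisection, which is one possible search method and assumes the NPV changes sign exactly once between the two starting rates.

```python
def present_value(amount, rate, t):
    """Discount an amount received t periods from now: A_t / (1 + i)^t."""
    return amount / (1 + rate) ** t

def net_present_value(cash_flows, rate):
    """cash_flows[t] is revenue minus cost in period t (period 0 is now)."""
    return sum(present_value(cf, rate, t) for t, cf in enumerate(cash_flows))

def internal_rate_of_return(cash_flows, low=0.0, high=1.0, tol=1e-9):
    """Discount rate giving NPV = 0, found by bisection on [low, high]."""
    npv_low = net_present_value(cash_flows, low)
    if npv_low * net_present_value(cash_flows, high) > 0:
        raise ValueError("NPV does not change sign between low and high")
    while high - low > tol:
        mid = (low + high) / 2
        npv_mid = net_present_value(cash_flows, mid)
        if npv_low * npv_mid > 0:
            low, npv_low = mid, npv_mid  # root lies in the upper half
        else:
            high = mid                   # root lies in the lower half
    return (low + high) / 2

cash_flows = [-1000.0, 600.0, 600.0]
npv_at_10_percent = net_present_value(cash_flows, 0.10)
irr = internal_rate_of_return(cash_flows)
print(round(npv_at_10_percent, 2), round(irr, 4))
```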

Depreciation

Organizations write down the value of their assets each year, meaning that they reduce the book value by an amount of depreciation.

Straight-Line Depreciation

This reduces the value of equipment by a fixed amount each year.
Formula: $\text{annual depreciation} = \frac{\text{cost of equipment} - \text{scrap value}}{\text{life of equipment}}$

Reducing-Balance Depreciation

This reduces the value of equipment by a fixed percentage of its residual value each year.
Formula: $A_t = A_0 \times (1-i)^t$
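The two depreciation methods give different book values year by year. The sketch below uses hypothetical figures: equipment costing 10,000, a scrap value of 2,000 after 4 years for the straight-line method, and a 20% rate for the reducing-balance method.

```python
def straight_line_values(cost, scrap_value, life):
    """Book value at the end of each year, falling by a fixed amount."""
    annual_depreciation = (cost - scrap_value) / life
    return [cost - annual_depreciation * t for t in range(life + 1)]

def reducing_balance_values(cost, rate, life):
    """Book value at the end of each year: A_t = A_0 * (1 - i)^t."""
    return [cost * (1 - rate) ** t for t in range(life + 1)]

straight = straight_line_values(cost=10_000, scrap_value=2_000, life=4)
reducing = reducing_balance_values(cost=10_000, rate=0.2, life=4)

print([round(v) for v in straight])
print([round(v) for v in reducing])
```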

Lesson 2: Regression and Curve Fitting

This lesson will show how to find and measure the relationships between variables.

Measuring Relationships

A relationship between two variables means that values of a dependent variable, y, are related to values of an independent variable, x.
In practice, few relationships are perfect and there is inevitably some random noise.

Errors

The noise means that there is a difference between expected values and observed values.
The amount of noise determines the strength of a relationship - and we can consider the noise as an error.
Stronger relationships have less noise.
You can measure the error using the mean error, mean absolute error and mean squared error.
The mean squared error is the most widely used.

Error Calculations

For each observation i, $\text{error}, E_i = \text{actual value} - \text{expected value from the relationship}$
$\text{mean error} = \frac{\Sigma E_i}{n} = \frac{\Sigma (y_i - \hat{y}_i)}{n}$
$\text{mean absolute error} = \frac{\Sigma |E_i|}{n} = \frac{\Sigma |y_i - \hat{y}_i|}{n}$
$\text{mean squared error} = \frac{\Sigma (E_i)^2}{n} = \frac{\Sigma (y_i - \hat{y}_i)^2}{n}$
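A small sketch of the three error measures on made-up actual and expected values. The data are chosen so that positive and negative errors cancel in the mean error, which shows why the absolute and squared versions are more informative.

```python
# Hypothetical observations and the values expected from a relationship.
actual = [10, 12, 15, 19]
expected = [11, 12, 13, 20]

errors = [y - y_hat for y, y_hat in zip(actual, expected)]
n = len(errors)

mean_error = sum(errors) / n                           # errors can cancel
mean_absolute_error = sum(abs(e) for e in errors) / n  # average error size
mean_squared_error = sum(e ** 2 for e in errors) / n   # penalises big errors

print(mean_error, mean_absolute_error, mean_squared_error)
```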

Linear Relationships

Linear regression finds the line of best fit through a set of data.
This line is defined as the one that minimizes the sum of squared errors.
The main use of linear regression is to predict the value of a dependent variable for a known value of an independent variable.

Basic Approach of Linear Regression:

Draw a scatter diagram.
Identify a linear relationship.
Find the equation for the line of best fit through the data.
Use this line to predict a value for the dependent variable from a known value of the independent variable.

Equation: $y = a + bx$

Equation calculations:

$b = \frac{\Sigma xy - \frac{\Sigma x \Sigma y}{n}}{\Sigma x^2 - \frac{(\Sigma x)^2}{n}}$
$a = \bar{y} - b\bar{x}$
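The formulas for a and b translate directly into code. This is a sketch with a hypothetical function name `fit_line` and invented data; in practice a spreadsheet or statistics package would normally do the calculation.

```python
def fit_line(xs, ys):
    """Return (a, b) for the line of best fit y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # b = (Sxy - Sx*Sy/n) / (Sx2 - (Sx)^2/n), then a = mean(y) - b*mean(x)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b

# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
prediction = a + b * 6  # predicted y for a known x of 6
print(round(a, 2), round(b, 2), round(prediction, 2))
```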

Measuring the Strength of a Relationship

Linear regression finds the line of best fit through a set of data, but we really need to measure how good the fit is.
If observations are close to the line, the errors are small and the line is a good fit to the data.
If observations are some way away from the line, errors are large and even the best line is not very good.

Coefficient of Determination

The coefficient of determination measures the proportion of the total variation from the mean explained by the regression line.
A value close to 1 shows that the regression line gives a good fit, while a value close to zero shows a poor fit.
total $SSE = \Sigma(y_i - \bar{y})^2$
explained $SSE = \Sigma(\hat{y}_i - \bar{y})^2$
unexplained $SSE = \Sigma(y_i - \hat{y}_i)^2$
coefficient of determination = $\frac{\text{explained SSE}}{\text{total SSE}}$
$r^2 = \frac{[n\Sigma xy - \Sigma x \Sigma y]^2}{[n\Sigma x^2 - (\Sigma x)^2] \times [n\Sigma y^2 - (\Sigma y)^2]}$
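The split of the total variation into explained and unexplained parts can be checked numerically. The sketch below fits a line to hypothetical data (using the deviation form of the slope, which is algebraically equivalent to the sums form) and computes the coefficient of determination as explained SSE / total SSE.

```python
# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Line of best fit y = a + b*x.
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x
predicted = [a + b * x for x in xs]

total_sse = sum((y - mean_y) ** 2 for y in ys)
explained_sse = sum((p - mean_y) ** 2 for p in predicted)
unexplained_sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))

r_squared = explained_sse / total_sse
print(round(total_sse, 2), round(explained_sse, 2),
      round(unexplained_sse, 2), round(r_squared, 2))
```

For a least-squares line, the explained and unexplained parts add up to the total, so here 60% of the variation is explained by the line.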

Coefficient of Correlation

Pearson's correlation coefficient shows how strong the linear relationship is between two variables.
A value close to 1 or -1 shows a strong relationship, while a value close to zero shows a weak one.
Spearman's coefficient gives a correlation for ranked data.
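Formula: $r_s = 1 - \frac{6 \Sigma D^2}{n(n^2 - 1)}$
where $D$ is the difference between the two rankings of each item and $n$ is the number of items ranked.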
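Spearman's coefficient needs only the two sets of ranks. A minimal sketch, assuming five items ranked by two hypothetical judges with no tied ranks:

```python
def spearman_coefficient(ranks_a, ranks_b):
    """r_s = 1 - 6*sum(D^2) / (n*(n^2 - 1)), for rankings without ties."""
    n = len(ranks_a)
    sum_d_squared = sum((ra - rb) ** 2 for ra, rb in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Hypothetical rankings of the same five items by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

r_s = spearman_coefficient(judge_a, judge_b)
print(round(r_s, 2))
```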

Multiple Regression

Sometimes a dependent variable is related to several independent variables.
Then multiple regression finds the best values for the constants $a$ and $b_i$.
Many packages do these calculations automatically, but the interpretation of results can be difficult.
$y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5 + \dots$
$\text{sales} = a + b_1 \times \text{advertising} + b_2 \times \text{price} + b_3 \times \text{unemployment rate} + b_4 \times \text{income} + b_5 \times \text{competition}$

Curve Fitting

Sometimes relationships are clearly not linear - you can use curve fitting to find more complex functions through data.
To fit a more complicated function through a set of data we use non-linear regression - or more generally curve fitting.
Many packages have functions for fitting more complicated curves to data, and spreadsheets typically fit:
- linear models: $y = a + bx$
- multiple linear curves: $y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots$
- polynomials: $y = a + bx + cx^2 + dx^3 + …$
- exponential curves: $y = a x b^x$
- growth curves: $y = a b^x$

Lesson 3: Forecasting

This lesson will discuss methods of forecasting, which is an essential function in every organization.

Methods of Forecasting

There are many different ways of forecasting.
No method is always best, and managers have to choose the most appropriate for particular circumstances.
Forecasting methods can be classified in several ways, including the length of time they cover in the future.
The most useful classification refers to causal, judgmental and projective methods.

Time-Based Classifications:

Long-term forecasts look ahead several years.
Medium-term forecasts look ahead between three months and two years.
Short-term forecasts cover the next few weeks.
Causal forecasting looks for a relationship between variables and then forecasts values for a dependent variable from known values of the independent variable.

Judgmental Forecasts

The key feature of judgmental forecasts is that they use opinions and subjective views of informed people.

Most Widely Used Methods Are:

Personal insight.
Panel consensus.
Market surveys.
Historical analogy.
The Delphi method.

Projective Forecasts

Projective forecasts look only at historical observations and project the underlying patterns into the future.
A basic form of projective forecasting uses simple averages, but this is insensitive and has limited practical use.
Moving averages are more flexible, setting forecasts as the average of the latest n observations and ignoring all older values.

Time Series

Projective forecasts often work with time series, which are series of observations taken at regular intervals.

Common Patterns in Time Series:

Constant series.
Series with a trend.
Seasonal series.
Defining error: $E_t = \text{error in the forecast in period } t = \text{actual observation in period } t - \text{forecast value}$
or $E_t = y_t - F_t$
where $F_t$ is the forecast for period t and $y_t$ is the value that actually occurs.

Mean Error:

$\text{mean error} = \frac{\Sigma E_t}{n} = \frac{\Sigma (y_t - F_t)}{n}$

Simple Averages

Formula: $\text{forecast} = F_{t+1} = \frac{\Sigma y_t}{n}$

Moving Averages

$\text{forecast} = \text{average of n most recent observations} = \frac{\text{latest demand} + \text{next latest} + … + \text{nth latest}}{n}$
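The simple average and the moving average differ only in how much history they use. A sketch with hypothetical demand figures that show a rising trend, so the difference between the two forecasts is visible:

```python
def simple_average_forecast(observations):
    """Forecast for the next period: mean of all observations so far."""
    return sum(observations) / len(observations)

def moving_average_forecast(observations, n):
    """Forecast for the next period: mean of the n most recent observations."""
    if n < 1 or n > len(observations):
        raise ValueError("n must be between 1 and the number of observations")
    return sum(observations[-n:]) / n

# Hypothetical demand in five successive periods.
demand = [100, 110, 120, 130, 140]

print(simple_average_forecast(demand))     # uses all five values
print(moving_average_forecast(demand, 3))  # uses only the latest three
```

The moving average sits closer to recent demand because it ignores the older, lower values.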

Exponential Smoothing

Exponential smoothing is an efficient forecasting method, which adds portions of the latest observation to the previous forecast.
This automatically reduces the weight given to data as it gets older.
$\text{new forecast} = \alpha \times \text{latest observation} + (1 - \alpha) \times \text{last forecast}$
$F_{t+1} = \alpha y_t + (1 - \alpha)F_t$
The smoothing constant, $\alpha$, which typically has a value between 0.1 and 0.2, determines the sensitivity of the forecast.
We can monitor the performance of a forecast using a tracking signal.
$\text{tracking signal} = \frac{\text{sum of errors}}{\text{mean absolute error}}$
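Exponential smoothing and the tracking signal can be sketched together. The observations, the smoothing constant of 0.2 and the initial forecast of 100 are all assumptions made for illustration; in particular, the method needs some starting forecast, and how to choose it is not covered here.

```python
def exponential_smoothing(observations, alpha, initial_forecast):
    """Return forecasts F_1..F_{n+1}; the last entry is for the next period."""
    forecasts = [initial_forecast]
    for y in observations:
        # new forecast = alpha * latest observation + (1 - alpha) * last forecast
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts

def tracking_signal(observations, forecasts):
    """Sum of errors divided by mean absolute error (undefined if all errors are 0)."""
    errors = [y - f for y, f in zip(observations, forecasts)]
    mean_absolute_error = sum(abs(e) for e in errors) / len(errors)
    return sum(errors) / mean_absolute_error

# Hypothetical observations for three periods.
observations = [100, 120, 110]
forecasts = exponential_smoothing(observations, alpha=0.2, initial_forecast=100)
signal = tracking_signal(observations, forecasts)

print([round(f, 1) for f in forecasts], round(signal, 2))
```

A tracking signal that drifts far from zero suggests the forecast is consistently too low or too high.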

Forecasts with Seasonality and Trend

The methods we have described so far give good results for constant time series, but they need adjusting for other patterns.

Common Models:

Additive model: $F = T + S$
Multiplicative model: $F = T \times S$

Finding the Underlying Trends

Linear regression with time as the independent variable.
Moving averages with a period equal to the length of a season.

Finding the Seasonal Indices

$\text{seasonal index}, S = \frac{\text{seasonal value}}{\text{deseasonalised value}}$

Making Forecasts

Project the trend into the future to find the deseasonalised values.
Multiply this by the appropriate seasonal index.
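The two forecasting steps can be sketched for a multiplicative model. The linear trend (intercept 100, slope 5 per period) and the four quarterly seasonal indices are hypothetical values, standing in for results that would come from regression and from averaging seasonal ratios.

```python
def seasonal_forecast(period, intercept, slope, seasonal_indices):
    """Multiplicative forecast: projected trend value times seasonal index."""
    deseasonalised = intercept + slope * period  # project the trend
    # Periods are numbered from 1; seasons repeat in a fixed cycle.
    index = seasonal_indices[(period - 1) % len(seasonal_indices)]
    return deseasonalised * index

# Hypothetical seasonal indices for quarters 1 to 4.
quarterly_indices = [0.8, 1.1, 1.3, 0.8]

forecast_period_9 = seasonal_forecast(9, 100, 5, quarterly_indices)    # quarter 1
forecast_period_10 = seasonal_forecast(10, 100, 5, quarterly_indices)  # quarter 2
print(round(forecast_period_9, 1), round(forecast_period_10, 1))
```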

Lesson 4: Linear Programming

Constrained Optimization

Key Characteristics:

An aim of optimizing.
A set of constraints.

Stages of Solving a Linear Program

Formulation.
Solution.
Sensitivity analysis.

Formulation Components

Decision variables
An objective function
A set of constraints
A non-negativity constraint

Graphical Solutions

Each constraint becomes a line on a graph, showing those values that satisfy the constraint.
Together, the constraints define the feasible region, which is a convex space surrounded by extreme points.
The optimal solution, if it exists, is always at one of the extreme points.
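Because the optimum lies at an extreme point, a small two-variable problem can be solved by listing the corners of the feasible region and evaluating the objective at each. The sketch below does this for a hypothetical problem (maximise 3x + 5y subject to x + y <= 4, x + 3y <= 6, x >= 0, y >= 0). It is an illustration of the graphical idea only; real problems are solved with dedicated software.

```python
from itertools import combinations

# Each constraint (a, b, c) means a*x + b*y <= c.
# The last two rows are the non-negativity constraints x >= 0 and y >= 0.
constraints = [(1, 1, 4), (1, 3, 6), (-1, 0, 0), (0, -1, 0)]
objective = (3, 5)  # maximise 3x + 5y

def extreme_points(constraints, tol=1e-9):
    """Intersect every pair of constraint lines and keep the feasible points."""
    points = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel lines never cross
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + tol for a, b, c in constraints):
            points.append((x, y))
    return points

def objective_value(point):
    return objective[0] * point[0] + objective[1] * point[1]

corners = extreme_points(constraints)
best = max(corners, key=objective_value)
print(best, objective_value(best))
```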

Sensitivity of Solutions to Changes

Calculations of shadow prices
Assess the effects of changing constraints and the objective function.

Module 4: Introducing Statistics

Overview

This module introduces basic statistical concepts for dealing with uncertainty in management problems.

Lesson Outcomes

Appreciate deterministic vs. stochastic problems.
Define probability and its importance.
Understand probability distributions.
Calculate combinations and permutations.
Work with Normal distributions.
Understand sampling.
Appreciate statistical inference.
Calculate confidence intervals.
Use one-sided distributions.
Use t-distributions for small samples.
Understand hypothesis testing.
List hypothesis testing steps.
Understand errors and significance levels.
Use one- and two-tail tests.
Consider non-parametric tests.

Lesson 1: Uncertainty and Probabilities

Measuring Uncertainty

Deterministic problems have features that are known with certainty.
Stochastic or probabilistic problems have uncertainty measured by probabilities.

Defining Probability

Probabilities measure the likelihood of an event occurring.
$\text{probability of an event} = \frac{\text{number of ways that the event can occur}}{\text{number of possible outcomes}}$
A probability is defined on a scale of 0 to 1.
Probabilities can be calculated a priori or observed empirically. Less reliable estimates are subjective probabilities.

Calculations with Probabilities

An important concept for probabilities is the independence of events.
Independent events: $P(a) = P(a/b) = P(a/\bar{b})$
For independent events, "AND" means that you multiply the separate probabilities.
$P(a \text{ AND } b) = P(a) \times P(b)$
For mutually exclusive events, "OR" means that you add the separate probabilities: $P(a \text{ OR } b) = P(a) + P(b)$
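Both rules can be checked by counting outcomes. The sketch below uses a made-up experiment (rolling one fair die and tossing one fair coin) and the definition of probability as the number of ways an event can occur divided by the number of possible outcomes.

```python
from fractions import Fraction
from itertools import product

# All 12 equally likely outcomes of one die roll and one coin toss.
outcomes = list(product(range(1, 7), ["H", "T"]))

def probability(event):
    """Ways the event can occur divided by the number of possible outcomes."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

# Independent events: the die shows a six AND the coin shows a head.
p_six = probability(lambda o: o[0] == 6)
p_head = probability(lambda o: o[1] == "H")
p_six_and_head = probability(lambda o: o[0] == 6 and o[1] == "H")

# Mutually exclusive events: the die shows a one OR a two.
p_one = probability(lambda o: o[0] == 1)
p_two = probability(lambda o: o[0] == 2)
p_one_or_two = probability(lambda o: o[0] in (1, 2))

print(p_six_and_head, p_six * p_head)
print(p_one_or_two, p_one + p_two)
```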

Conditional Probabilities

Occur when two events are dependent.
Events are considered dependent when $P(a) \neq P(a/b) \neq P(a/\bar{b})$
Bayes' Theorem: $$