Output Analysis for Simulation Models

Output analysis is the examination of data from simulation runs to predict the performance of a system or to compare alternative system designs.
In stochastic simulations (systems with random variables), multiple runs are essential, because a single run yields only one sample of the random output.
Output analysis addresses the variability in simulation output that is introduced by random or pseudorandom number generators.
It estimates the means and variances of random variables and determines the number of observations needed to reach a desired precision.
Output data from a simulation model is used to evaluate performance measures, which reflect characteristics of the modeled stochastic process.
The trace, the record of values produced during a run, is the most basic form of output data.
Common measures for representing simulation results:
- Mean (Average): Fundamental characteristic of output data.
- Standard Deviation: Summarizes variability of output data.
- Frequency Plot (Histogram): Represents the distribution of output data.
The mean and standard deviation can be computed during the simulation run by accumulating values of the random variable.
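As a minimal sketch of accumulating these statistics during a run, the class below updates the mean and sample standard deviation one observation at a time. It uses Welford's update rather than plain sums and sums of squares, which is a substitution made here for numerical stability; the class name `RunningStats` and the sample values are illustrative assumptions.

```python
import math


class RunningStats:
    """Accumulates the mean and sample standard deviation incrementally."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, y):
        """Fold one new observation into the running statistics."""
        self.n += 1
        delta = y - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (y - self.mean)

    def std(self):
        """Sample standard deviation (divides by n - 1)."""
        if self.n < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.n - 1))


# Illustrative observations, as if recorded one by one during a run
stats = RunningStats()
for y in [4.0, 7.0, 13.0, 16.0]:
    stats.add(y)

print(stats.n, stats.mean, stats.std())
```

Because only three numbers are stored, the trace itself does not have to be kept in memory to report these two measures.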
Two main methods to statistically analyze simulation output:
- Point Estimation
- Interval Estimation
Both methods are often needed for meaningful data interpretation.

Point Estimation

Point estimators are functions that approximate a population parameter from a random sample, producing a single value.
Main characteristics of point estimators:
- Bias: The difference between the expected value of the estimator and the true parameter value.
  - An unbiased estimator has an expected value equal to the parameter being estimated.
- Consistency: Whether the point estimator converges to the true parameter value as the sample size increases.
  - For a consistent estimator, larger sample sizes yield estimates that are closer to the true value.
- Efficiency: Depends on the population distribution. The most efficient estimator has the smallest variance among all unbiased and consistent estimators.
Point estimation for discrete-time data $\{Y_1, Y_2, \ldots, Y_n\}$ with ordinary mean $\theta$ uses the sample mean $\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} Y_i$ as the point estimator:
- Biased if: $E(\hat{\theta}) \neq \theta$
- Unbiased if: $E(\hat{\theta}) = \theta$ , where:
  - $Y_i$ = The $i$-th observed value from the simulation output
  - $n$ = Number of observations (the sample size)
  - $E$ = The expected-value operator
Point estimation for continuous-time data $\{Y(t), 0 \leq t \leq T_E\}$ with time-weighted mean $\phi$ uses the time average $\hat{\phi} = \frac{1}{T_E} \int_0^{T_E} Y(t)\,dt$ as the point estimator:
- Generally biased: $E(\hat{\phi}) \neq \phi$
- An unbiased or low-bias estimator is still desired, where:
  - $T_E$ = Duration of the simulation run
  - $Y(t)$ = Value of the output process at time $t$
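The time-weighted mean can be illustrated with a short sketch. It assumes the output process is piecewise constant (for example, a queue length that changes only at event times) and estimates $\phi$ as the area under $Y(t)$ divided by $T_E$. The function name `time_weighted_mean` and the sample trace are hypothetical.

```python
def time_weighted_mean(change_times, values, t_end):
    """Time average of a piecewise-constant trace over [0, t_end].

    change_times[i] is the time at which the process jumps to values[i];
    the trace must start at time 0.
    """
    if not change_times or change_times[0] != 0:
        raise ValueError("trace must start at time 0")
    area = 0.0
    for i, (t, y) in enumerate(zip(change_times, values)):
        # Each value holds until the next change, or until the run ends
        t_next = change_times[i + 1] if i + 1 < len(change_times) else t_end
        area += y * (min(t_next, t_end) - min(t, t_end))
    return area / t_end


# Illustrative queue length: 0 on [0,2), 1 on [2,5), 2 on [5,6), 1 on [6,10]
phi_hat = time_weighted_mean([0, 2, 5, 6], [0, 1, 2, 1], t_end=10)
print(phi_hat)  # 0.9
```

Note that the ordinary mean of the four recorded values would be 1.0, which ignores how long each value was held; weighting by duration is what distinguishes the two estimators.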

Interval Estimation

Interval estimation brackets a parameter between two values, $l$ and $u$ , with a given probability. The interval $[l, u]$ is the confidence interval.
The confidence interval is the interval estimate most commonly used in output analysis.
It gives a range of values around the sample mean that is likely to contain the parameter.
A confidence interval is an interval estimate for a parameter with a stated level of confidence, which provides a way of quantifying imprecision.
Its goal is to form an interval, with endpoints determined by the samples, that will contain or “cover” the target parameter with a pre-specified (high) probability called the confidence level.
The Central Limit Theorem (CLT) states that as the sample size $n$ increases, the distribution of the sample mean approaches a normal distribution; $n \geq 30$ is a common rule of thumb for the approximation to be adequate.
Applying CLT:
- Each single run of a stochastic simulation model can be considered as a single sample.
- Each independent model replication, performed using different random numbers, produces another sample point.
Mathematical representation of confidence interval:
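- $\bar{Y} \pm t_{\alpha/2, R-1} \frac{S}{\sqrt{R}}$ , where:
  - $\bar{Y}$ : sample mean across the $R$ replications
  - $t_{\alpha/2, R-1}$ : the t-multiplier for confidence level $1 - \alpha$ with $R - 1$ degrees of freedom
  - $\frac{S}{\sqrt{R}}$ : standard error, where $S$ is the sample standard deviation and $R$ is the number of replications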
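The formula above can be sketched in a few lines of Python. The replication results below are made-up illustrative numbers, not measured data, and the t-multiplier is passed in as a value read from a standard t table (2.262 for 95% confidence with 9 degrees of freedom) so that the example needs only the standard library.

```python
import math
import statistics


def confidence_interval(replication_means, t_multiplier):
    """Return (lower, upper) for Y_bar +/- t * S / sqrt(R)."""
    r = len(replication_means)
    y_bar = statistics.mean(replication_means)
    s = statistics.stdev(replication_means)  # sample standard deviation
    half_width = t_multiplier * s / math.sqrt(r)
    return y_bar - half_width, y_bar + half_width


# Hypothetical average waiting times (minutes) from R = 10 replications
waits = [4.2, 5.1, 3.8, 4.9, 5.6, 4.4, 4.0, 5.3, 4.7, 5.0]
T_95_DF9 = 2.262  # t_{0.025, 9} from a standard t table

low, high = confidence_interval(waits, T_95_DF9)
print(f"95% CI for the mean: [{low:.3f}, {high:.3f}]")
```

The half-width shrinks in proportion to $1/\sqrt{R}$, so halving it takes roughly four times as many replications.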
A prediction interval is a measure of risk or uncertainty.
It is a range of values that is likely to contain the value of a single new observation given a specific setting or initial condition.
Prediction intervals predict the spread for individual observations rather than the mean.
A prediction interval is wider than the corresponding confidence interval because of the additional uncertainty involved in predicting a single response.
Mathematical representation of prediction interval:
- $\bar{Y} \pm t_{\alpha/2, R-1} S \sqrt{1 + \frac{1}{R}}$ , where:
  - $\bar{Y}$ : sample estimate or predicted value
  - $t_{\alpha/2, R-1}$ : the t-multiplier
  - $S \sqrt{1 + \frac{1}{R}}$ : standard error of prediction
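A minimal sketch of the prediction interval follows, set beside the confidence-interval half-width for comparison. The replication values are made-up illustrative numbers, and the t-multiplier 2.262 (95%, 9 degrees of freedom) is taken from a standard t table; both function names are hypothetical.

```python
import math
import statistics


def ci_half_width(values, t_multiplier):
    """Half-width of the confidence interval: t * S / sqrt(R)."""
    return t_multiplier * statistics.stdev(values) / math.sqrt(len(values))


def pi_half_width(values, t_multiplier):
    """Half-width of the prediction interval: t * S * sqrt(1 + 1/R)."""
    return t_multiplier * statistics.stdev(values) * math.sqrt(1 + 1 / len(values))


# Hypothetical average waiting times (minutes) from R = 10 replications
waits = [4.2, 5.1, 3.8, 4.9, 5.6, 4.4, 4.0, 5.3, 4.7, 5.0]
t_mult = 2.262  # t_{0.025, 9} from a standard t table

y_bar = statistics.mean(waits)
ci = ci_half_width(waits, t_mult)
pi = pi_half_width(waits, t_mult)
print(f"confidence interval: {y_bar:.3f} +/- {ci:.3f}")
print(f"prediction interval: {y_bar:.3f} +/- {pi:.3f}")
print(f"width ratio PI/CI:   {pi / ci:.3f}")
```

The ratio of the two half-widths is $\sqrt{R + 1}$, so adding replications narrows the confidence interval but leaves the prediction interval bounded below by roughly $t \cdot S$.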

Types of Simulation with Respect to Output Analysis

The type of simulation, either terminating or non-terminating, depends on:
- The objective(s) of the modeling and simulation study.
- The nature of the system.

Terminating Simulation

A terminating simulation is carried out to study the behavior of a system over a particular time interval.
The simulation starts and ends at a defined state or time.
The majority of service systems are modeled as terminating systems.
The analysis of terminating simulations involves multiple runs, each using a different seed for the random or pseudorandom number generator.
The data is gathered for successive time intervals during the simulation period.
Examples:
1. A car manufacturer receives a contract to produce 120 cars, which must be delivered within 18 months. The company would like to simulate various manufacturing configurations to see which can meet the delivery request at the least cost.
2. A shop that sells a single product would like to decide how many items it should keep in inventory for the next 120 months. Given some initial inventory data, the objective is to determine the monthly order quantity that minimizes the expected average cost per month of the inventory system.
Statistical analysis formulas applicable for terminating simulations:
- Unbiased estimator for the population mean ( $\mu$ )
- Sample variance
- Confidence interval for the population mean ( $\mu$ )
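The sketch below illustrates independent replications of a terminating simulation, each with its own seed, followed by the sample mean and sample variance across replications. The model is an assumed example, not one from the text above: a single-server queue that starts empty and stops admitting customers after 480 minutes, with exponential interarrival and service times. All names and parameter values are hypothetical.

```python
import random
import statistics


def run_replication(seed, close_time=480.0, mean_interarrival=5.0, mean_service=4.0):
    """One terminating run; returns the average wait in queue (minutes)."""
    rng = random.Random(seed)  # separate generator per replication
    clock = 0.0
    server_free_at = 0.0
    waits = []
    while True:
        clock += rng.expovariate(1.0 / mean_interarrival)  # next arrival
        if clock > close_time:
            break  # terminating condition: doors close
        start = max(clock, server_free_at)  # wait if the server is busy
        waits.append(start - clock)
        server_free_at = start + rng.expovariate(1.0 / mean_service)
    return statistics.mean(waits) if waits else 0.0


# Ten replications, each started from the same empty state with its own seed
results = [run_replication(seed) for seed in range(1, 11)]

mu_hat = statistics.mean(results)   # unbiased estimator of the population mean
s2 = statistics.variance(results)   # sample variance (divides by R - 1)
print(f"mean of replication averages: {mu_hat:.3f}")
print(f"sample variance:              {s2:.3f}")
```

Because each replication starts from the same initial state and uses a different seed, the replication averages can be treated as independent observations, which is what the estimators listed above require.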

Non-Terminating Simulation

A non-terminating simulation is carried out to study the steady-state and/or long-term average behavior of a system.
Analyzing the long-term average behavior of the system requires determining an adequate length for the simulation period.
The simulation runs are performed to gather data for the statistical analysis of the steady-state behavior of the system.
Each simulation run begins in a warm-up state, also known as the transient state, in which the output process depends on the initial conditions. The run then gradually moves to a steady state, in which the output random variables have approximately the same distribution from a specific point onward.
Examples:
1. Telecommunications systems
2. Assembly lines that rarely halt operations
3. Emergency rooms in hospitals
Statistical analysis methods applicable for non-terminating simulations:
- Welch Method
- Replication-Deletion Approach
- Batch Means Method
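As a rough sketch of the batch means idea, the code below makes one long run, discards an initial warm-up portion, splits the remaining observations into equal batches, and treats the batch averages as approximately independent observations. The queue model, the warm-up length of 1000, and the choice of 20 batches are assumptions made for illustration; in practice the warm-up length would be chosen with a procedure such as the Welch method. The t-multiplier 2.093 is the standard table value for 95% confidence with 19 degrees of freedom.

```python
import math
import random
import statistics


def simulate_waits(n, seed, mean_interarrival=5.0, mean_service=4.0):
    """Successive waiting times in a single-server queue (Lindley recursion)."""
    rng = random.Random(seed)
    wait, waits = 0.0, []
    for _ in range(n):
        waits.append(wait)
        service = rng.expovariate(1.0 / mean_service)
        gap = rng.expovariate(1.0 / mean_interarrival)
        # The next customer waits for whatever work remains when it arrives
        wait = max(0.0, wait + service - gap)
    return waits


def batch_means(observations, warmup, num_batches):
    """Drop the warm-up observations, then average equal-sized batches."""
    steady = observations[warmup:]
    size = len(steady) // num_batches
    return [
        statistics.mean(steady[i * size:(i + 1) * size])
        for i in range(num_batches)
    ]


waits = simulate_waits(n=21000, seed=42)
means = batch_means(waits, warmup=1000, num_batches=20)

grand_mean = statistics.mean(means)
half_width = 2.093 * statistics.stdev(means) / math.sqrt(len(means))
print(f"steady-state mean wait: {grand_mean:.3f} +/- {half_width:.3f}")
```

The replication-deletion approach differs in that it makes several shorter independent runs and deletes the warm-up from each one, whereas batch means pays the warm-up cost only once but relies on batches long enough to be nearly uncorrelated.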