Queuing Theory & Cost Trade-Off – Comprehensive Study Notes

Overview & Learning Objectives

Prescriptive Analytics topic: focuses on determining optimal decisions/actions. It aims to not only predict what will happen but also to suggest actionable strategies to achieve desired outcomes.
Queuing Learning Objectives:
- Describe trade-off curves between cost-of-waiting time & cost-of-service, illustrating how these opposing costs lead to an optimal service level.
- Identify & explain the 3 fundamental parts of a queuing system: arrivals (how customers enter the system), service facilities (where service is provided), and the waiting line (where customers wait if service is not immediately available).
- Recognize basic queuing configurations, such as single-channel, multi-channel, single-phase, and multi-phase systems.
- List assumptions underlying common analytical models (e.g., Poisson arrivals, exponential service times) to understand their applicability and limitations.
- Compute & interpret operating characteristics (e.g., average length of queue, average waiting time, probability of system being empty, server utilization) and associated costs, providing quantitative insights into system performance.

Cost Trade-Off & Service Level Concepts

Two key cost curves:
- Cost of Providing Service (C_s): This cost rises with higher service levels. For example, installing more servers, hiring faster staff, or upgrading equipment increases service capacity, leading to higher fixed and variable costs.
- Cost of Waiting Time (C_w): This cost falls as service levels increase. When service is faster or more readily available, customers wait less, reducing their dissatisfaction, potential lost business, or idle time (e.g., in a manufacturing context, idle machines cost money).
The Optimal service level occurs at the minimum of the Total Expected Cost curve ( $CT = Cs + C_w$ ). This point represents the best balance between the cost of providing service and the cost incurred due to customer waiting.
Service Level metrics often expressed as quantitative measures to assess system performance:
- Average waiting time ( $W_q$ ) or average time in system ( $W$ ) indicate the duration customers spend waiting or entirely within the system.
- Average number in queue ( $L_q$ ) or in system ( $L$ ) represent the average number of customers waiting or present within the system at any given moment.
- Server utilization ( $\rho$ ) indicates the proportion of time a server is busy, reflecting resource efficiency.

Components & Characteristics of Queuing Systems

Arrival (Input) Characteristics

Size of the calling population:
- Unlimited / Infinite (e.g., cars at a toll booth, shoppers in a supermarket, internet users accessing a website): This assumption simplifies modeling by implying that the arrival rate is not significantly affected by the number of customers already in the system.
- Limited / Finite (e.g., two machines in a small shop requiring repair, a fixed number of employees needing support from an IT desk): The size of the population is small enough that the arrival rate changes as customers move into and out of the queuing system.
Pattern of arrival:
- Scheduled (deterministic inter-arrival times): Arrivals occur at fixed intervals (e.g., 1 patient every 15 minutes for a scheduled appointment). These are predictable and easier to manage.
- Random (stochastic): Arrivals occur unpredictably, often modeled by the Poisson distribution for the number of arrivals per unit time, or the Exponential distribution for the time between arrivals (inter-arrival times). This pattern is common in real-world systems where demand is not controlled, and the exponential distribution implies a 'memoryless' property where the past duration does not affect future probabilities.
Behavior of arrivals:
- Balking: A customer decides not to enter the queue upon seeing how long it is. This lost business can be a significant cost, indicating insufficient capacity.
- Reneging: A customer joins the queue but leaves before receiving service due to impatience or excessive waiting. This also represents lost business and customer dissatisfaction.
- Jockeying: A customer switches between parallel lines (e.g., in a supermarket or bank) in an attempt to find a shorter or faster-moving queue. This behavior can complicate queue management and fairness.

Waiting-Line (Queue) Characteristics

Length capacity:
- Limited (physical/managerial capacity): The queue can only hold a certain number of customers. Once full, new arrivals are turned away (leading to balking), which can be due to space constraints or policy decisions (e.g., a small waiting room).
- Unlimited (theoretical infinity): The queue can theoretically grow indefinitely. This assumption simplifies mathematical models and is often used when the probability of the queue exceeding practical limits is very low.
Queue discipline (order of service):
- FIFO (First-In, First-Out): The first customer to arrive is the first to be served. This is the most common and generally fairest discipline, forming the basis for many standard queuing formulas.
- Priority scheduling: Customers are served based on a predefined priority rule (e.g., based on class of service, urgency of need, shortest processing time). This system prioritizes certain customers over others, which is common in emergency services or manufacturing.

Service-Facility Characteristics

Number of service channels (parallel servers): This refers to whether there is one server or multiple servers working in parallel to serve customers from a single queue or multiple queues.
Number of phases (sequential service stages): This indicates whether service involves a single step or multiple sequential steps, each requiring a separate service facility.
Common structures (see next section): Combinations of channel and phase configurations define the overall system layout.

Diagrams on pages 11–14 illustrate flow of arrivals → service facility(-ies) → departure.

Common Queuing Configurations

Single-Channel, Single-Phase: A single waiting line feeds into one server, and service is completed in one step (e.g., a one-person barbershop).
Single-Channel, Multiphase: A single waiting line feeds into one server, but service involves multiple sequential steps, each performed by a dedicated facility (e.g., car wash where different stages like washing, rinsing, drying are sequential).
Multichannel, Single-Phase: A single waiting line feeds into multiple parallel servers, and service is completed in one step by any available server (e.g., a bank with multiple tellers serving customers from a single line).
Multichannel, Multiphase: A single waiting line feeds into multiple parallel servers, and service involves multiple sequential stages, with parallel service at each stage (e.g., a large factory assembly line with multiple parallel workstations at each stage).

Single-Channel Poisson/Exponential Model (M/M/1)

Standard Assumptions

FIFO discipline: Customers are served in the order of their arrival; every arrival is eventually served (implies no balking or reneging).
Calling population infinite: The number of potential customers is so large that we can assume it doesn't decrease significantly even after many arrivals, keeping the arrival rate constant.
Arrivals independent & stationary: The arrival of one customer does not influence the arrival of another, and the arrival rate does not change over time.
Inter-arrival times ~ Exponential with mean $1/\lambda$ : The time between successive customer arrivals follows an exponential distribution, which means the process is memoryless. This also implies that the number of arrivals per unit time follows a Poisson distribution with mean $\lambda$ (arrival rate).
Service times independent, Exponential with mean $1/\mu$ : The time it takes to serve a customer also follows an exponential distribution, also implying a memoryless property. The service rate is $\mu$ customers per unit time.
Stability condition: \mu > \lambda (service rate exceeds arrival rate): This is crucial for a stable system; if service is slower than arrivals, the queue will grow infinitely, and the system will break down.

Operating Characteristic Equations

(For M/M/1 model, which stands for Markovian arrivals / Markovian service / 1 server, with an unlimited queue capacity)

Average number in system ( $L$ ): This is the average number of customers (both waiting and being served) that you would expect to find in the entire system at any given moment: $L = \frac{\lambda}{\mu-\lambda}$.
Average time in system ( $W$ ): This measures the average total time a customer spends from arrival until service completion and departure: $W = \frac{1}{\mu-\lambda}$.
Average number in queue ( $Lq$ ): This is the average number of customers waiting in line, not including the one currently being served: $Lq = \frac{\lambda^2}{\mu(\mu-\lambda)}$.
Average waiting time in queue ( $Wq$ ): This is the average time a customer spends waiting in line before service begins: $Wq = \frac{\lambda}{\mu(\mu-\lambda)}$.
Server utilization (traffic intensity) ( $\rho$ ): This indicates the proportion of time the server is busy providing service. A higher utilization often means longer queues and waits: $\rho = \frac{\lambda}{\mu}$.
Probability system empty ( $P0$ ): This is the probability that there are zero customers in the entire system (the server is idle): $P0 = 1-\rho$.
Probability more than k customers present (P{n>k}): This gives the probability that there are more than $k$ customers in the system at a random point in time: $P{n>k} = \rho^{k+1}$.

Significance: These closed-form expressions allow rapid assessment of performance and cost for a wide range of real-world service systems, such as call centers, repair bays, bank queues, and hospital emergency rooms. They provide fundamental insights into how changes in arrival or service rates impact system efficiency and customer experience.

Case Study – Arnold’s Muffler Shop

This case study illustrates the application of queuing formulas to a business decision, comparing different service configurations.

Base Scenario: One Mechanic (Reid Blank)

Data:
- Arrival rate $\lambda = 2 \text{ cars/hr}$ (meaning, on average, 1 car arrives every 30 minutes)
- Service rate $\mu = 3 \text{ cars/hr}$ (meaning, on average, 1 car is serviced every 20 minutes)
- Wage to server (Labor Cost, LC): \$7/hr
- Waiting cost per customer time (Waiting Cost, WC): \$10/hr (representing the cost of customer dissatisfaction or lost productivity while waiting)
Calculated operating characteristics (using M/M/1 formulas):
- $L = 2 \text{ cars}$ (average number of cars in the shop)
- $W = 1 \text{ hr}$ (average time a car spends in the shop)
- $L_q = 1.333 \text{ cars}$ (average number of cars waiting in line)
- $W_q = 0.667 \text{ hr}$ (approximately 40 minutes, average waiting time before service)
- Utilization $\rho = 0.667$ (the mechanic is busy about 66.7% of the time)
- Empty-system probability $P_0 = 0.333$ (the shop is empty about 33.3% of the time)
- Probability >3 cars: P_{n>3}=\rho^{4}=0.198 (19.8% chance of having more than 3 cars in the system at any given moment)
Cost evaluation (assuming a 10-hour workday for customers, and 8-hour for labor):
- Customer waiting cost per day: $Cw = WC \times Lq \times \text{hours per day} = 10 \times 1.333 \times 10 = \$133.33$
- Labor cost per day: $C_s = \text{mechanic wage} \times \text{hours worked} = 7 \times 8 = \$56$ (Assuming 8-hour shift for mechanic)
- (Slides used different hour assumptions: the slide's calculation of $Cw = (10 \times 10 \times \frac{2}{3} ext{ hours of waiting per day}) = \$66.67$ and total cost $CT = \$66.67 + \$56 = \$122.67$. The note provided previously had an error $(10 \tfrac{2}{3}) \text{ hr} = \tfrac{102}{3}$ leading to a different $Cw$. Let's re-calculate to match the example results consistently, assuming perhaps a different interpretation of the daily calculation for $Cw$ , where $Lq$ is an average over the entire day’s operation.) The provided slide calculation $CT = (10)(10 \times 0.667) + 56 = 10 imes (6.67) + 56 = 66.7 + 56 = \$122.7$.
- Using the provided slide result: $CT = \$162/\text{day}$ , where the discrepancy suggests specific daily operating hours and how $Lq$ translates to total daily waiting hours might be slightly different from a direct $Lq imes 10$ calculation. Assuming cost of waiting is $10 imes (Wq imes ext{Number of arrivals per day}) = 10 imes 0.667 imes (2 imes 8) = 10 imes 0.667 imes 16 = 106.72$. So total cost would be $106.72+56 = 162.72$ (This aligns with the slide's $Cw = 106.67$ and $CT=162$).

Alternative 1: Replace with Faster Mechanic (Jimmy Smith)

New service rate $\mu = 4 \text{ cars/hr}$ (1 car every 15 minutes); wage \$9/hr.
Performance:
- $L = 1$ car; $W = 0.5$ hr (30 min)
- $Lq = 0.5$ car; $Wq = 0.25$ hr (15 min)
- $\rho = 0.5$ ; $P_0 = 0.5$
Probability table (excerpt): $P_{n>k}=\rho^{k+1}=0.5^{k+1}$ gives probabilities of 0.5, 0.25, 0.125, etc., for more than 0, 1, 2 customers, respectively.
Cost (assuming 8-hour workday, for consistent comparison):
- Waiting cost/day: $Cw = WC \times Wq \times \lambda \times \text{hours per day if traffic is continuous} = 10 \times 0.25 \times 2 \times 8 = \$40$
- Labor cost/day: $C_s = 9 \times 8 = \$72$
- Total Cost: $C_T=\$40 + \$72 = \$112/\text{day}$ .

Alternative 2: Two Mechanics (Parallel Servers)

This system becomes an M/M/2 configuration (Markovian arrivals / Markovian service / 2 servers) with identical servers, each with a service rate of $\mu = 3 \text{ cars/hr}$ . The arrival rate remains $\lambda = 2 \text{ cars/hr}$ . This requires more complex Erlang C formulas.
Given results (from Erlang C formulas):
- $L = 0.75$ car; $W = 0.375$ hr (22.5 min)
- $Lq = 0.0833$ car; $Wq = 0.0417$ hr (2.5 min)
- Utilization per server $\rho = \frac{\lambda}{m\mu}= \frac{2}{2\times3}=0.333$ (each server is busy only 33.3% of the time, very low utilization)
- $P_0 = 0.5$ (50% chance system is empty)
Cost (assuming 8-hour workday):
- Waiting cost/day: $C_w = 10 \times 0.0417 \times 2 \times 8 = \$6.67$ (as slide indicates \$6.64)
- Labor cost/day: $C_s = 2 \times (7 \times 8) = 2 \times \$56 = \$112$ (two original mechanics)
- Total Cost: $C_T = \$6.67 + \$112 = \$118.67/\text{day}$ . (Slide says \$118.64, a minor rounding difference).

Comparative Insights (Effect of Service Level)

Metric	Reid (baseline)	Two Mechanics	Jimmy (fast single)
$W_q$	40 min	2.5 min	15 min
$C_T$	\$162/day	\$118.64/day	\$112/day
$L_q$	1.33 cars	0.083 car	0.50 car
$W$	60 min	22.5 min	30 min
$L$	2 cars	0.75 car	1 car
$P_0$	0.33	0.50	0.50

Managerial takeaway: Hiring Jimmy, the faster single mechanic, yields the lowest total cost while still maintaining a moderate wait time. Adding a second standard mechanic, though costing slightly more in total, virtually eliminates the queue and drastically reduces customer waiting time ( $W_q$ ). The choice depends on the organization's priorities: minimizing cost versus minimizing customer wait and maximizing satisfaction. If customer satisfaction and efficient flow are paramount, the two-mechanic option might be preferred despite its slightly higher cost.
This demonstrates the classic cost trade-off: investing in faster service ( $\mu$ ) or more servers ( $m$ ) reduces waiting costs but raises payroll and operational expenses. The optimum solution is highly sensitive to the specific values of customer waiting cost ($WC$), labor cost ($LC$), and the demand pattern ( $\lambda$ ).

Practical / Ethical / Real-World Considerations

Over-utilized systems (high $\rho$ ): While appearing efficient, high server utilization creates customer frustration, longer waits, and a higher probability of balking or reneging, leading to potential loss of goodwill and revenue. Staff burnout can also increase.
Excess capacity (very low $\rho$ ): While reducing waiting times, very low server utilization increases wage/asset costs unnecessarily. However, this may be justified for critical services (e.g., hospitals, emergency services, fire departments) where the cost of waiting is exceptionally high (e.g., lives are at stake) or in services where immediate availability is a core value proposition.
Ethical duty: Businesses often have an ethical duty to balance cost efficiency with reasonable service levels, especially in healthcare (ensuring timely treatment), safety-related contexts, and ensuring fairness in priority scheduling systems to prevent discrimination.
Importance of validating model assumptions: Real-world systems rarely perfectly match theoretical assumptions (e.g., perfectly Poisson arrivals or exponential service times). It's crucial to collect data and validate model assumptions before relying on analytical results for major decisions. Simulation modeling might be necessary for more complex or non-standard systems.

Connections & Extensions

Builds on foundational probability: Queuing theory fundamentally relies on concepts from probability and statistics, particularly the Poisson and exponential distributions for arrival and service processes, and underlying Markov chain theory for state transitions.
Prescriptive analytics complements descriptive & predictive analytics: While descriptive analytics tells