Amortized Analysis and Array Resizing Strategies Study Notes

Source Context: These notes are based on the curriculum for CMSC 341 - Data Structures, revised from Prof. Donyaee's slides by Keke Chen.
Core Concept: Many data structures perform operations where the majority of executions are very fast, but occasional operations are significantly more expensive due to maintenance or resizing. Amortized analysis provides a method to calculate the average cost of these operations over time to give a more realistic performance bound.

Implementation Efficiency: Using an array to implement a Stack or Queue Abstract Data Type (ADT) typically allows for very efficient operations. For example, push, pop, enqueue, and dequeue usually run in $O(1)$ time.
The Fixed-Size Constraint: The primary limitation is that standard arrays have a predefined, fixed size.
The Practical Problem: In many real-world applications, it is impossible to predict the maximum growth of a stack or queue. Once the predefined array size is reached, further insertions are impossible without a strategy to expand the capacity.
General Solution: The standard practice is to increase the size of the array dynamically once it becomes full and additional space is required.

Algorithm Logic: The logic for a push(x) operation in a stack with a resizable array follows these steps: 1. Check if the stack is full: if the stack is full. 2. Invoke resizing: resize(Array). 3. Increment the pointer: top ++. 4. Assign the new element: Array[top] = x.
Data Migration Process: Whenever a resize is triggered, the system must create a new, larger array. It is then required to copy every piece of current data from the original (old) array into the new array. Only after this copy operation is complete can the new data element be inserted and subsequent operations continued on the new array.

Definition: In this strategy, the array size is increased by exactly one element every time it becomes full.
Trace of Operations: * Case 1 (Transition from size 1 to 2): Suppose the original array is size 1. To insert a second element, a new array of size 2 is created. The one existing element is copied, and the new data is inserted. Total operations: $2$ (1 copy + 1 insertion). * Case 2 (Transition from size 2 to 3): To insert a third element, a new array of size 3 is created. The two existing elements are copied, and the new data is inserted. Total operations: $3$ (2 copies + 1 insertion).
Mathematical Modeling of Total Cost: * If this algorithm continues for the pushing of $n$ data elements, the total number of operations is represented by the arithmetic series: $1 + 2 + 3 + 4 + 5 + \dots + n$ . * The sum of this series is: $\frac{n(n + 1)}{2}$ .
Amortized Cost Calculation: * The amortized cost (average per element) is: $\frac{\frac{n(n + 1)}{2}}{n} = \frac{n + 1}{2}$ . * Conclusion: This strategy results in a linear average cost, which is inefficient as it scales directly with the number of elements.

Definition of Amortized Cost: It is the average running time calculated over a sequence of $n$ such operations. It answers the fundamental question: "Even if some operations are expensive, do things average out over time?"
Realistic Bound Estimation: In some data structures, while worst-case analysis might suggest high costs, those high-cost rebalancing operations are infrequent. Amortized analysis provides a tighter and more realistic bound in these scenarios.
Optimization Goal: Designers should create infrequent operations smartly to minimize this amortized cost.
Numerical Example of Amortization: * Scenario: 100 operations occur at a cost of $1$ each, followed by one operation with a cost of $100$ . * Total count of operations: $101$ . * Total cost: $100 + 100 = 200$ . * Amortized cost per operation: $\frac{200}{101} < 2$ . * This shows that even a costly maintenance step can result in an average operation cost of less than 2.

Definition: This strategy involves doubling the array size every time the array reaches capacity.
Mathematical Context: Assume the initial size of the stack is $s$ , where $s \ll n$ .
Iteration Breakdown: * The operation that triggers the resize for $n$ elements implies that in the previous iteration, $\frac{n}{2}$ elements were copied into the new array. * Before that, $\frac{n}{4}$ elements were copied. * Before that, $\frac{n}{8}$ elements were copied, continuing down to the initial size.
Quantifying Copy Operations: * The total number of copy operations is the sum of a geometric series: $\frac{n}{2} + \frac{n}{4} + \frac{n}{8} + \dots + \text{initial array size} < n$ . * Example: If resizing occurred only three times ( $s = \frac{n}{8}$ ), the total copy operations would be $\frac{n}{2} + \frac{n}{4} + \frac{n}{8} = \frac{7n}{8}$ . Note that $\frac{7n}{8} < n$ .
Total Operations and Constant Time: * Total operations = (Total copy operations) + (Actual $n$ insertions). * Since Total Copy Operations $< n$ , the Total Operations $< 2n$ . * Average Cost Calculation: $\frac{2n}{n} = 2$ . * Conclusion: Doubling the array size results in an amortized constant running time, or $O(1)$ . This is a significant improvement over the +1 resizing strategy.

Amortized Cost Analysis: * Focuses on the total cost over a sequence divided by the number of operations. * Asks if things average out over multiple steps regardless of input randomness.
Average Case Time Complexity: * Different from amortized analysis; it considers the distribution of the input. * Assume there are $m$ cases for $n$ items, where each case has a probability $p_i$ (e.g., $p_1, p_2, \dots, p_m$ ). * Formula: $\text{Average cost} = p_1 \times \text{cost}_1 + \dots + p_m \times \text{cost}_m$ . * Search Example: To calculate the average cost of searching a binary tree, one must know the probability of each value in the domain being searched and determine the specific cost for each value (whether it exists in the tree or not). * Core Question: It answers: "What is the expected cost, if my input is random following some distribution?"