Amortized Analysis and Array Resizing Strategies Study Notes
Introduction to Amortized Analysis
- Source Context: These notes are based on the curriculum for CMSC 341 - Data Structures, revised from Prof. Donyaee's slides by Keke Chen.
- Core Concept: Many data structures perform operations where the majority of executions are very fast, but occasional operations are significantly more expensive due to maintenance or resizing. Amortized analysis provides a method to calculate the average cost of these operations over time to give a more realistic performance bound.
Case Study: Array Implementation of Stack and Queue ADTs
- Implementation Efficiency: Using an array to implement a Stack or Queue Abstract Data Type (ADT) typically allows for very efficient operations. For example,
push, pop, enqueue, and dequeue usually run in O(1) time. - The Fixed-Size Constraint: The primary limitation is that standard arrays have a predefined, fixed size.
- The Practical Problem: In many real-world applications, it is impossible to predict the maximum growth of a stack or queue. Once the predefined array size is reached, further insertions are impossible without a strategy to expand the capacity.
- General Solution: The standard practice is to increase the size of the array dynamically once it becomes full and additional space is required.
Resizable Array Mechanics for the push() Method
- Algorithm Logic: The logic for a
push(x) operation in a stack with a resizable array follows these steps:
1. Check if the stack is full: if the stack is full.
2. Invoke resizing: resize(Array).
3. Increment the pointer: top ++.
4. Assign the new element: Array[top] = x. - Data Migration Process: Whenever a resize is triggered, the system must create a new, larger array. It is then required to copy every piece of current data from the original (old) array into the new array. Only after this copy operation is complete can the new data element be inserted and subsequent operations continued on the new array.
Analysis of Resizing Strategy 1: Increment by One (+1 Strategy)
- Definition: In this strategy, the array size is increased by exactly one element every time it becomes full.
- Trace of Operations:
* Case 1 (Transition from size 1 to 2): Suppose the original array is size 1. To insert a second element, a new array of size 2 is created. The one existing element is copied, and the new data is inserted. Total operations: 2 (1 copy + 1 insertion).
* Case 2 (Transition from size 2 to 3): To insert a third element, a new array of size 3 is created. The two existing elements are copied, and the new data is inserted. Total operations: 3 (2 copies + 1 insertion).
- Mathematical Modeling of Total Cost:
* If this algorithm continues for the pushing of n data elements, the total number of operations is represented by the arithmetic series: 1+2+3+4+5+⋯+n.
* The sum of this series is: 2n(n+1).
- Amortized Cost Calculation:
* The amortized cost (average per element) is: n2n(n+1)=2n+1.
* Conclusion: This strategy results in a linear average cost, which is inefficient as it scales directly with the number of elements.
Justification and Strategic Use of Amortized Analysis
- Definition of Amortized Cost: It is the average running time calculated over a sequence of n such operations. It answers the fundamental question: "Even if some operations are expensive, do things average out over time?"
- Realistic Bound Estimation: In some data structures, while worst-case analysis might suggest high costs, those high-cost rebalancing operations are infrequent. Amortized analysis provides a tighter and more realistic bound in these scenarios.
- Optimization Goal: Designers should create infrequent operations smartly to minimize this amortized cost.
- Numerical Example of Amortization:
* Scenario: 100 operations occur at a cost of 1 each, followed by one operation with a cost of 100.
* Total count of operations: 101.
* Total cost: 100+100=200.
* Amortized cost per operation: 101200<2.
* This shows that even a costly maintenance step can result in an average operation cost of less than 2.
Analysis of Resizing Strategy 2: Array Doubling
- Definition: This strategy involves doubling the array size every time the array reaches capacity.
- Mathematical Context: Assume the initial size of the stack is s, where s≪n.
- Iteration Breakdown:
* The operation that triggers the resize for n elements implies that in the previous iteration, 2n elements were copied into the new array.
* Before that, 4n elements were copied.
* Before that, 8n elements were copied, continuing down to the initial size.
- Quantifying Copy Operations:
* The total number of copy operations is the sum of a geometric series: 2n+4n+8n+⋯+initial array size<n.
* Example: If resizing occurred only three times (s=8n), the total copy operations would be 2n+4n+8n=87n. Note that 87n<n.
- Total Operations and Constant Time:
* Total operations = (Total copy operations) + (Actual n insertions).
* Since Total Copy Operations <n, the Total Operations <2n.
* Average Cost Calculation: n2n=2.
* Conclusion: Doubling the array size results in an amortized constant running time, or O(1). This is a significant improvement over the +1 resizing strategy.
Comparison: Amortized Cost vs. Average Case Complexity
- Amortized Cost Analysis:
* Focuses on the total cost over a sequence divided by the number of operations.
* Asks if things average out over multiple steps regardless of input randomness.
- Average Case Time Complexity:
* Different from amortized analysis; it considers the distribution of the input.
* Assume there are m cases for n items, where each case has a probability pi (e.g., p1,p2,…,pm).
* Formula: Average cost=p1×cost1+⋯+pm×costm.
* Search Example: To calculate the average cost of searching a binary tree, one must know the probability of each value in the domain being searched and determine the specific cost for each value (whether it exists in the tree or not).
* Core Question: It answers: "What is the expected cost, if my input is random following some distribution?"