Chapter 9 Study Notes – Fundamental Sorting Algorithms
Overview of Sorting Algorithms
- Sorting = systematic rearrangement of a sequence according to an ordering relation (numerical ↑/↓, lexicographical, etc.)
- Corner-stone of algorithm design; exposes ideas such as divide-and-conquer, data-structures interaction and randomisation.
- Typical scenario: records contain many fields but we sort only on a key (e.g. Name, Class, Examscore → key = Examscore).
- Real-world uses:
- Electronic mail clients – sort by date, subject, sender…
- Telephone directories – alphabetical names for fast lookup.
- Dictionaries – lexicographic ordering of words.
Fundamental Operations
- Every comparison-based sort repeatedly performs two primitives:
- Compare – determines relative order of two items.
- Swap – exchanges misplaced items, implemented with a temporary variable:
- Step 1 temp←A, Step 2 A←B, Step 3 B←temp – prevents loss of the overwritten original value.
Bubble Sort
- Concept: adjacent-pair comparisons; larger values “bubble” to the end.
- Descriptive algorithm:
- For each pass i=0…N−2
- Compare A[j] with A[j+1] for j=0…N−2−i, swap if out of order.
- Worked example – [4,7,3,1,6] turns into [1,3,4,6,7] after 4 passes (detailed pair-by-pair trace included in transcript).
- Python (basic) implementation:
for i in range(n-1):
for j in range(n-1-i):
if A[j] > A[j+1]:
A[j],A[j+1] = A[j+1],A[j]
- Optimisations
- Keep flag swapFlag; if a complete pass produces no swap → already sorted → terminate early.
- Last i elements are fixed after pass i, therefore inner loop length shortens by i.
- Worst-case passes: after N−1 passes the remaining element must also be in place because all N−1 others are.
Insertion Sort
- Analogy: insert new playing card into already ordered hand.
- Maintains two partitions:
- Left part sorted (initially first element).
- Right part unsorted.
- For each element in unsorted part:
- Scan sorted part from right to left until correct position found; shift larger elements rightward; insert.
- Iterative pseudocode provided (variables current, i, j loops) and equivalent recursive version.
- Trace example on [40,35,22,16,37] shows progressive growth of sorted prefix.
Memory Management: In-place vs Not-in-place
- In-place sort: uses only O(1) auxiliary space (apart from small constants). E.g. bubble, insertion, quick.
- External / not-in-place: requires extra arrays or files; used when data don’t all fit in RAM. Classic example: merge sort.
- Comparative table:
- In-place advantages – memory efficiency, good for nearly sorted data.
- In-place drawbacks – side effects, possible instability, poor big-data performance for O(n2) variants.
- Not-in-place advantages – stability (merge sort), often simpler implementation.
- Not-in-place drawbacks – extra memory, overhead for small inputs.
Stability in Sorting
- Stable ⇒ equal keys keep original relative order.
- Bubble and insertion sorts are stable; quicksort (standard forms) often not, merge sort stable.
- Importance: multi-key sorting (e.g. keep earlier order while resorting on another key) and reporting word-frequency counts while maintaining alphabetic order among equal counts.
Quick Sort (Partition-Exchange)
- Invented by Tony Hoare (1959). Divide-and-conquer, very fast on average.
- High-level steps:
- Choose pivot pi.
- Partition array into < pivot and > pivot (elements equal to pivot can be placed either side or grouped).
- Recursively quicksort left and right parts.
- Base case = subarray size ≤1.
- Pivot selection strategies: first, last, middle, median, random – choice affects performance.
- Partitioning (first-element pivot) pseudocode given – indices i (pivot position) and j scan.
- Full trace on array [9,5,8,2,−6,23] provided (chronological recursive call list and array states).
- Stability test: introduce duplicate keys with unique IDs to see if order retained.
- Alternate quicksort: choose middle element pivot; scans inward from both ends using one recursive subroutine.
- Not-in-place version (Python) builds two new sublists (less-than, greater-than) and concatenates recursively – simpler but O(n) extra space.
Merge Sort
- Efficient stable sort using divide-and-conquer; repeatedly halves list until singletons, then merges back.
- Process:
- Divide: recursively split array until size 1.
- Conquer: merge two sorted halves into larger sorted list.
- Illustration on [38,16,27,39,12,27] and on [38,27,3,9,82,7,24] with tree diagrams.
- Merging rule: repeatedly compare first elements of two queues and append smaller to output list; when one queue empty, append remainder of other.
- Not-in-place merge sort pseudocode:
- mergesort() returns new array; merge() builds target array c with three while-loops for leftover items.
- Exercises require Python translation, tree tracing, discussion of not-in-place nature (creation of new arrays list1, list2, c).
- In-place merge sort (advanced): keep smallest elements in first subarray by swapping with B[0] and re-inserting B[0] into correct place of B – repeatedly produces fully sorted A + B without external buffer.
- Example walk-through starting A=[9,11,12], B=[1,2,3,4] results in combined [1,2,3,4,9,11,12].
Complexity & Pass Counts (Quick Reference)
- Bubble sort: worst/average O(n2) time, best O(n) if early-exit used; space O(1).
- Insertion sort: worst/average O(n2), best O(n) when nearly sorted; space O(1).
- Quicksort: average O(nlogn), worst O(n2) (bad pivots); space O(1) extra in-place, O(n) not-in-place.
- Merge sort: always O(nlogn) time; space O(n) (standard) or O(1) (in-place variant).
Coding & Exercise Highlights
- Improved bubbleSortBetter pseudocode uses REPEAT-UNTIL with swapFlag and decrements n each pass.
- Python exercise array A=[5,3,1,9,8,2,4,7] – number of comparisons required as output.
- Recursive bubbleSortRecur template provided – correct call should swap arr[j], arr[j+1].
- Bubble-sort pseudocode fill-in ( swap lines; UNTIL Posn = ArraySize-Pass).
- Merge sort worksheet tasks: create task11(list) returning sorted list; task12(infile,outfile) reading admissions, writing sorted list, then searching.
- Top-down mergeSort(ar,low,high) trace: list all function calls until size 1 segments; answer counts self-invocations.
Ethical / Practical Notes
- Swapping changes original memory; programmers must beware of inadvertent data loss when stability or previous order is semantically relevant.
- Choice between in-place and not-in-place influenced by available RAM, data volume, and requirement for stability.
- For huge datasets external ‑> use merge sort with disk files (k-way merges), map-reduce frameworks, etc.