Lecture 4: Interesting sorting algorithms
Shellsort
- Shellsort is a refined insertion sort that attempts to improve upon the slowness of regular insertion sort, which spends considerable time shifting elements by one place.
- It was invented by Donald Shell.
- The core idea involves choosing a stride length (a number) to create subsequences within the array.
- For example, with a stride length of 7, the subsequence a[0], a[7], a[14], and so on, would be insertion sorted.
- This is repeated for a[1], a[8], a[15], and so on, and then for a[2], a[9], a[16], and so forth.
- Sorting with a stride length of 7 is referred to as 7-sorting.
- After 7-sorting, the process is repeated with a shorter stride length, like 3, eventually ending with 1-sorting, which is equivalent to regular insertion sort.
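The passes described above can be sketched in Python. This is a minimal illustration, not the lecture's official code; the function name and the choice of Shell's original halving gap sequence are our own:

```python
# Shellsort sketch using Shell's halving gap sequence n/2, n/4, ..., 1.
def shellsort(a):
    n = len(a)
    gap = n // 2
    while gap >= 1:
        # One gap-sort pass: insertion-sort every stride-`gap` subsequence.
        for i in range(gap, n):
            key = a[i]
            j = i
            # Shift larger elements `gap` places to the right.
            while j >= gap and a[j - gap] > key:
                a[j] = a[j - gap]
                j -= gap
            a[j] = key
        gap //= 2
    return a
```

Note that the final pass, with gap 1, is exactly a regular insertion sort, which is what guarantees the result is fully sorted.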
Example
- Starting with a stride length of 3:
- Original array: 3 6 2 1 5 4
- 3-sorted array: 1 5 2 3 6 4
- Moving to stride 1 (regular insertion sort):
- 1 5 2 3 6 4
- 1 2 5 3 6 4
- 1 2 3 5 6 4
- 1 2 3 5 4 6
- 1 2 3 4 5 6
Why is Shellsort Faster?
- When sorting with a long stride length, the subarrays being sorted are small, making sorting faster.
- When sorting with a short stride, the array is already almost sorted, which makes insertion sort faster.
- If an array is h-sorted and then k-sorted, the array remains h-sorted. This means each pass increases the sortedness of the array.
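This preservation property can be checked empirically. A minimal sketch, with helper names of our own choosing: 7-sort a random array, then 3-sort it, and confirm it is still 7-sorted.

```python
import random

def gap_insertion_sort(a, gap):
    # Insertion-sort every stride-`gap` subsequence of `a` in place.
    for i in range(gap, len(a)):
        key = a[i]
        j = i
        while j >= gap and a[j - gap] > key:
            a[j] = a[j - gap]
            j -= gap
        a[j] = key

def is_gap_sorted(a, gap):
    # True if a[i] <= a[i + gap] for every valid i.
    return all(a[i] <= a[i + gap] for i in range(len(a) - gap))

random.seed(0)
a = [random.randrange(100) for _ in range(50)]
gap_insertion_sort(a, 7)    # 7-sort
gap_insertion_sort(a, 3)    # then 3-sort
print(is_gap_sorted(a, 7))  # still 7-sorted: True
```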
Complexity
- The complexity of Shellsort depends on the sequence of stride lengths used, known as the gap sequence.
- Shell's original gap sequence used the numbers n/2, n/4, n/8, …, 1, halving every step, resulting in a worst-case complexity of O(n^2).
- Many gap sequences give a worst-case complexity of O(n^(3/2)). For example, using numbers of the form 2^k − 1 (that is: 1, 3, 7, 15, 31, …) can achieve this.
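A small helper (our own naming, a sketch) that generates this 2^k − 1 gap sequence below a given array length, largest gap first:

```python
def gaps_2k_minus_1(n):
    # Gaps of the form 2^k - 1 that are smaller than n, largest first.
    gaps = []
    k = 1
    while (1 << k) - 1 < n:
        gaps.append((1 << k) - 1)
        k += 1
    return gaps[::-1]

print(gaps_2k_minus_1(100))  # [63, 31, 15, 7, 3, 1]
```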
Correctness
- Shellsort's correctness is based on the fact that the only operations performed are swaps, ensuring the array is always a permutation of the original.
- The final pass, which is a regular insertion sort, guarantees that the array is fully sorted.
Divide and Conquer: Mergesort
- Fast algorithms can be obtained by a divide and conquer strategy: split the input data into pieces, work on those, then combine the results somehow.
Mergesort
- Mergesort is a divide-and-conquer sorting algorithm.
- It works by recursively splitting the input array into two halves, sorting the two halves using mergesort, and then merging the two sorted halves together while preserving the order.
Mergesort Example
- Starting situation: M E R G E S O R T T H I S
- Split: M E R G E | S O R T T H I S
- Sort by recursive call: E E G M R | H I O R S S T T
- Merge: E E G H I M O R R S S T T
Merging
- Merging two sorted arrays of total length n takes O(n) steps.
- The process involves comparing the first unmerged element in each input array, adding the smaller of the two to the output array, and repeating until all elements have been added to the output.
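The merging process described above can be sketched in Python (the function name is our own):

```python
def merge(left, right):
    # Merge two sorted lists into one sorted list in O(len(left) + len(right)) steps.
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Compare the first unmerged element of each input;
        # move the smaller of the two to the output.
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    # One input is exhausted; the rest of the other is already sorted.
    out.extend(left[i:])
    out.extend(right[j:])
    return out
```

Each loop iteration adds exactly one element to the output, so the total number of steps is linear in the combined length.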
Mergesort Description
- Given an array a of length n (where n is a power of 2):
- If n = 1, stop.
- Otherwise, mergesort elements a[0] to a[n/2 − 1], and mergesort elements a[n/2] to a[n − 1].
- Merge the two sorted halves.
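The description above can be sketched in Python. This version (function name ours) splits at the midpoint, so it handles any length, not just powers of 2:

```python
def mergesort(a):
    # Base case: an array of length 0 or 1 is already sorted.
    if len(a) <= 1:
        return a
    # Recursively sort the two halves.
    mid = len(a) // 2
    left = mergesort(a[:mid])
    right = mergesort(a[mid:])
    # Merge the two sorted halves, preserving order.
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

print("".join(mergesort(list("MERGESORTTHIS"))))  # EEGHIMORRSSTT
```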
Complexity of Mergesort
If splitting the input array into two equal halves, the time complexity of mergesort is described by the recurrence:
T(1) = O(1)
T(n) = 2·T(n/2) + O(n)
Unrolling the second line once: T(n) = 2·(2·T(n/4) + O(n/2)) + O(n) = 4·T(n/4) + O(n) + O(n).
The recurrence stops when we reach T(1), after log2 n unrollings, so T(n) = O(n log n). This is better than O(n^2) or O(n^(3/2)).
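For powers of 2 the recurrence can be unrolled numerically. Taking the merge cost as exactly n and T(1) = 1 (simplifying assumptions of ours), the recurrence works out to T(n) = n·(log2 n + 1), which this sketch confirms:

```python
import math

def T(n):
    # Comparison-count recurrence for mergesort, with merge cost taken as n.
    if n == 1:
        return 1
    return 2 * T(n // 2) + n

for n in (16, 256, 1024):
    # Recurrence value vs. the closed form n * (log2(n) + 1).
    print(n, T(n), n * (math.log2(n) + 1))
```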
Computational Example
- Consider a computer working at GHz rates (10^9 instructions per second), with 10^8 comparisons per second.
- To sort 10^9 elements:
- An O(n^2) algorithm needs 10^18 operations; 10^10 seconds ≈ 316 years.
- An O(n log n) algorithm needs about 3 × 10^10 operations: 300 seconds = 5 minutes.
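The arithmetic above can be checked directly with a rough back-of-the-envelope sketch (using log2(10^9) ≈ 30):

```python
rate = 10**8       # comparisons per second
n = 10**9          # elements to sort

quadratic_ops = n**2     # 10^18 operations for an O(n^2) algorithm
nlogn_ops = n * 30       # ~3 * 10^10 operations for an O(n log n) algorithm

print(quadratic_ops / rate / (3600 * 24 * 365))  # a few hundred years
print(nlogn_ops / rate)                          # 300.0 seconds
```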
Summary
- Algorithms covered: selection sort O(n^2), insertion sort O(n^2), Shellsort (depends on the gap sequence, e.g. O(n^(3/2))), and mergesort O(n log n).
- Some algorithms' run times depend on the input data; consider best-case, worst-case, and average-case complexity.
- Reasoning about loops using invariants is an important theoretical idea.