Sorting Algorithms: Selection, Bubble, and Tree Sort

Introduction to Sorting Algorithms

  • Conceptual Overview:

    • Sorting algorithms are fundamental to computer science and a core skill expected of software developers.
    • The primary goal is to take a set of unsorted data and organize it into a sorted order.
    • Studying different sorting algorithms provides excellent practice for general problem-solving and algorithmic thinking.
    • Rather than sticking with a naive or slow first solution, developers should dig deeper to find the most efficient approach for a given problem.
  • Factors Influencing Algorithm Choice:

    • Input State: Whether the data is already mostly sorted, completely unsorted, or reverse-sorted impacts performance.
    • Memory Restrictions: Some environments have limited memory (e.g., embedded systems) where sorting must be done "in-place."
    • Output Requirements: Sometimes only a general sort is required, or specific elements are more important than others.
  • Big O Notation in Sorting:

    • Big O measures how long an algorithm takes to run relative to the number of elements, denoted as nn.
    • Common complexities include O(n2)O(n^2), O(nlog(n))O(n \log(n)), and sometimes O(1)O(1).
    • An O(n2)O(n^2) complexity often occurs because the algorithm must compare every element against every other element.

Selection Sort

  • Definition: A naive, simple algorithm that sorts an array or list of elements by repeatedly finding the smallest remaining element.

  • Mechanism:

    1. The algorithm scans the entire list to find the smallest element.
    2. This element is placed in the first position (index 0).
    3. The algorithm then scans the remaining unsorted list to find the next smallest element.
    4. This element is placed in the second position (index 1).
    5. This process repeats until the list is fully sorted, with the range of the scan getting smaller each time.
  • Implementation Logic:

    • Utilizes nested loops.
    • The outer loop moves the boundary between the sorted and unsorted portions of the array.
    • The inner loop searches for the smallest value in the unsorted portion.
    • Smallest Index Tracker: Instead of storing the smallest value directly, it is often better to store the smallest_index and swap values after the inner loop finishes.
    • Swap Logic: A temporary variable is required to swap the current element with the found smallest element to prevent overwriting data.
      • temp=array[i]\text{temp} = \text{array}[i]
      • array[i]=array[smallest_index]\text{array}[i] = \text{array}[\text{smallest\_index}]
      • array[smallest_index]=temp\text{array}[\text{smallest\_index}] = \text{temp}
  • Performance Metrics:

    • Best Case: O(n2)O(n^2)
    • Average Case: O(n2)O(n^2)
    • Worst Case: O(n2)O(n^2)
  • Use Cases and Advantages:

    • Very easy to implement for quick solutions.
    • Appropriate for small arrays (e.g., 10 elements) where the difference between O(n2)O(n^2) and O(nlog(n))O(n \log(n)) is negligible.
    • In-place Sorting: It does not require extra memory or the allocation of a second array.

Bubble Sort

  • Definition: An algorithm that sorts a list by repeatedly stepping through the list, comparing adjacent elements and swapping them if they are in the wrong order.

  • Mechanism:

    1. Start at the first index and compare the current element with the one directly next to it.
    2. If the left element is greater than the right element (left>right\text{left} > \text{right}), swap them.
    3. Continue this until the end of the list; the largest element will "bubble up" to the final position.
    4. Repeat the process for the remaining unsorted portion of the list.
  • Implementation Variations:

    • While Loop with Boolean Flag: A sorted boolean can be used as a stopping condition. If a full pass is completed without any swaps, the list is sorted, and the algorithm can stop early.
    • Nested For Loops: Alternatively, two for loops can be used. The outer loop tracks the number of passes, and the inner loop handles comparisons.
    • Optimization: Each pass places the next largest element in its correct final position, so the inner loop range can decrease by one (nin - i) with each pass.
  • Performance Metrics:

    • Best Case: O(n)O(n) (This occurrs when the list is already sorted and the algorithm uses a stopping condition).
    • Average Case: O(n2)O(n^2)
    • Worst Case: O(n2)O(n^2) (Occurs when the list is in reverse order).
  • Advantages:

    • Effective for data sets that are "mostly sorted" or when only one or two elements are out of place.
    • In-place sorting (no extra memory required).
    • Highly readable and simple logic.

Tree Sort

  • Definition: A sorting algorithm that utilizes a Binary Search Tree (BST) structure to organize data, followed by an in-order traversal to retrieve elements in sorted order.

  • Mechanism:

    1. Create an empty Binary Search Tree.
    2. Iterate through the original array and insert every element into the BST.
    3. Perform an In-Order Traversal of the BST (Left Subtree → Current Node → Right Subtree).
    4. As nodes are visited during the traversal, place them back into the array or print them.
  • Performance Metrics:

    • Average and Best Case: O(nlog(n))O(n \log(n)) (Due to the logarithmic efficiency of BST insertions).
    • Worst Case: O(n2)O(n^2) (Occurs if data is entered in sorted order, creating a "linked list" style tree).
    • Space Complexity: Worst case is O(n)O(n) extra space because a complete copy of the data is stored in the tree structure.
  • Optimizations:

    • Using height-balanced trees like AVL Trees or Red-Black Trees can guarantee the O(nlog(n))O(n \log(n)) complexity even in the worst case by using rotations to prevent the tree from becoming a linked list.
  • Considerations:

    • Loading data from an array into a BST just to pull it back out into an array can be redundant if the data can simply remain in the BST for fast retrieval and manipulation.

Questions & Discussion

  • Question: How many people have seen Selection Sort before?
    • Answer: Most of the class raised their hands.
  • Question: Why use Selection Sort if it is slow?
    • Answer: It is useful for small data sets where speed is not critical, for making quick solutions without needing built-in libraries, or if memory is highly restricted since it is an in-place sort.
  • Interaction (Selection Sort Implementation):
    • The speaker encountered a Java error: Cannot find smallest index. This was because the variable was initially named smallest but referenced as smallest index later.
  • Interaction (Bubble Sort Logic):
    • The speaker initially placed an incrementer i++ inside the wrong loop. A student pointed out that incrementing i inside the for-loop would cause it to reach the length of the array too quickly, breaking the logic of the while-loop.
    • The speaker encountered a java.lang.ArrayIndexOutOfBoundsException: Index 14 out of bounds for length 14. This was an "off-by-one" error caused by comparing j to j+1 at the very end of the array.
  • Question: Would the Big O notation for Tree Sort be different if it was an AVL tree?
    • Answer: Yes, it would be faster in the worst case (guaranteed O(nlog(n))O(n \log(n))) because rotations keep the tree balanced, preventing the "linked list" scenario.
  • Algorithmic "Trick": To avoid the worst-case scenario for some algorithms, one can shuffle the data completely at random before sorting. This converts a potential worst-case scenario into an average-case scenario.

News & Administrative Updates

  • Schedule:

    • Wednesday/Friday: Lectures on Quick Sort, Merge Sort, and Heap Sort.
    • Next Monday: Public holiday (No class).
    • Next Wednesday: Test preparation session.
    • Next Friday: Test 2.
  • Test 2 Format Changes:

    • The True/False section from last year is being removed because it was awkward and felt like a "coin flip" for students.
    • The test will focus less on memorizing random small facts and more on applying algorithms.
    • New Content: Expect diagram-based questions, such as traversing a graph or inserting elements into a heap (handling the "up" or "down" movement of elements).