Sorting and Searching - Comprehensive Study Notes

Brute Force Approach

  • A brute-force algorithm is a basic algorithm that works through every possible answer until the correct one is found.
  • It can solve a problem easily when the size of the problem is small, and can be preferable over crafting a more elegant solution in such cases.
  • Because it exhaustively checks every candidate, it is guaranteed to find a solution if one exists, including solutions that heuristic approaches might miss.
  • Essence: trial and error, letting a computer systematically explore a range of possibilities until the correct solution is reached.
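  The idea can be sketched with a small example. This is a minimal illustration, not from the slides: the function name find_pair_sum and the pair-sum task are assumptions chosen only to show exhaustive search over all candidates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Brute-force pair search: try every (i, j) pair until one sums to target.
 * Returns true and stores the indices if such a pair exists. */
bool find_pair_sum(const int a[], size_t n, int target,
                   size_t *out_i, size_t *out_j) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (a[i] + a[j] == target) {   /* test this candidate answer */
                *out_i = i;
                *out_j = j;
                return true;
            }
        }
    }
    return false;   /* exhausted every possibility */
}
```

  For small n this is perfectly adequate; the point is that correctness comes from checking every possibility, not from any insight into the data.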

Insertion Sort

  • Insertion Sort Algorithm (overview):
    1. Treat the first element as an already-sorted sublist of size 1.
    2. Pick the next element and store it separately in a key.
    3. Compare the key with the elements of the sorted sublist, from right to left.
    4. Shift each sorted element greater than the key one position to the right.
    5. Insert the key into the resulting gap.
    6. Repeat until the array is sorted.
  • Example starting array (Arr) and subarrays:
    • Initial: Arr → [0] [1] [2] [3] [4] [5]
    • We take the first element Arr[0] as the sorted subarray; the rest is unsorted.
  • Key concepts:
    • key: the first element of the unsorted subarray (e.g., Arr[1] initially).
    • As we proceed, the sorted subarray grows; the unsorted subarray shrinks.
  • Step-by-step evolution (illustrative):
    • 17 35 15 21 4 8 (Initial)
    • Sorted vs Unsorted regions shift as we insert each key into its correct position.
  • Insertion Sort implementation (C):
  void insertion_sort(int A[], int n) {
    for (int i = 1; i < n; i++) {
      int curr = A[i];       /* key: next element to insert */
      int prev = i - 1;
      /* shift sorted elements greater than curr one slot to the right */
      while (prev >= 0 && A[prev] > curr) {
        A[prev + 1] = A[prev];
        prev--;
      }
      A[prev + 1] = curr;    /* insert the key into the gap */
    }
  }
  • Original step-by-step demonstrations (as shown in slides):
    • Page 5: Initial array and labeling of Sorted / Unsorted regions.
    • Page 6: After first insertion, two elements are in the sorted subarray; n-2 elements unsorted.
    • Pages 7-8: Code listing for insertion_sort(A[], n)
  • Remarks:
    • Insertion sort builds the final sorted array one item at a time by inserting each new element into its proper place within the already-sorted portion.
    • Suitable for nearly-sorted data due to fewer shifts.

Bubble Sort

  • Bubble sort idea: repeatedly compare adjacent pairs and swap them if they are in the wrong order.
  • Pseudocode (as given):
  Bubble Sort(Array a[])
  Begin
    for i = 0 to n − 1
    for j = 0 to n − i − 2
        if (a[j] > a[j + 1]) then
          Swap (a[j], a[j + 1])
  End
  • Explanatory steps (illustrated in slides):
    • Each pass bubbles the largest remaining element to its correct position at the end.
    • Example traces show comparisons and swaps per iteration, with number of comparisons decreasing as the end portion becomes sorted.
  • Bubble sort implementation (C) shown:
  void bubble_sort(int A[], int n) {
    for (int i = 0; i < n - 1; i++) {
      /* after pass i, the largest i+1 elements are already in place */
      for (int j = 0; j < n - i - 1; j++) {
        if (A[j] > A[j + 1]) {
          int tmp = A[j];          /* swap adjacent out-of-order pair */
          A[j] = A[j + 1];
          A[j + 1] = tmp;
        }
      }
    }
  }
  • Key takeaway:
    • Simple, but has O(n^2) time in worst and average cases; performs well for small arrays or nearly-sorted data, but not scalable.

Merge Sort

  • Divide and Conquer approach (high-level):
    • Divide: split problem recursively into smaller subproblems.
    • Solve: solve each subproblem independently.
    • Combine: merge the solutions to form the final answer.
  • DC abstraction (control flow):
    • DC(P): if P is small enough then return Solution of P
    • else divide P into k subproblems P1,…,Pk; solve each Pi using DC; return combined solution DC(P1), DC(P2), …, DC(Pk)
  • Why DC works well here:
    • Subproblems are identical in structure to the main problem but with smaller parameters; recursive approach fits naturally.
    • Each subproblem is solved to the smallest possible size, then results are merged.
  • Time complexity recurrence (general DC):
    • T(n) = a·T(n/b) + f(n)
    • where a is the number of subproblems, each of size n/b, and f(n) is the divide/combine cost.
  • Merge sort algorithm structure:
    • MergeSort(A, start, end) recursively splits while start < end, then merges:
    • Pseudocode snippet:
      MergeSort(A, start, end):
        if start < end:
          mid = start + (end − start) / 2
          MergeSort(A, start, mid)
          MergeSort(A, mid + 1, end)
          merge(A, start, mid, end)
  • Merge operation details:
    • Given arrays L and R derived from A[start..mid] and A[mid+1..end], merge in sorted order into A[start..end].
    • Typical merge skeleton creates sentinels ∞ at the end of L and R to simplify logic.
    • Merge pseudocode (from slides):
      merge(A, start, mid, end):
        n1 = mid − start + 1
        n2 = end − mid
        let L[1..n1+1] and R[1..n2+1] be new arrays
        for i = 1 to n1: L[i] = A[start + i − 1]
        for j = 1 to n2: R[j] = A[mid + j]
        L[n1+1] = ∞; R[n2+1] = ∞
        i = 1; j = 1
        for k = start to end:
          if L[i] ≤ R[j]: A[k] = L[i]; i = i + 1
          else: A[k] = R[j]; j = j + 1
  • Midpoint choice nuance (Merge Sort example):
    • Mid computed as mid = start + (end − start) / 2
    • Some slides emphasize using the ceiling for mid: Mid = ⌈(start + end) / 2⌉
  • Example of a Merge Sort run (n=10) shows progressive halving and merging of sublists, culminating in a fully sorted array.
  • Merge sort implementation reference (C-like) is shown under a section labeled "Merge Sort - Implementation".
  • Key conclusions:
    • Merge sort has stable O(n log n) time and requires extra space for merging.
    • Recursion splits the array top-down until sublists of size 1, then merges the sorted sublists back up.

Quick Sort

  • QuickSort(A, start, end) overview:
    • If start < end:
    • Partition A[start..end] into two parts around a pivot.
    • Recursively quicksort the left part and the right part.
  • Partition step (as given):
    • pivot = A[end]
    • i = start - 1
    • for j = start to end - 1:
    • if A[j] <= pivot: i = i + 1; swap A[i] with A[j]
    • swap A[i+1] with A[end]
    • return i+1 as pivot position
  • Pseudocode snippet (as shown):
  QuickSort(A, start, end)
    if start < end:
      pivot = PARTITION(A, start, end)
      QuickSort(A, start, pivot - 1)
      QuickSort(A, pivot + 1, end)
  • Partition function snippet (as shown):
  PARTITION(A, start, end)
    pivot = A[end]
    i = start - 1
    for j = start to end - 1
      if A[j] <= pivot
        i = i + 1
        swap A[i] with A[j]
    swap A[i+1] with A[end]
    return i+1
  • Quick sort notes:
    • Provides efficient average performance, but worst-case can degrade to O(n^2) without optimizations like randomized pivoting or median-of-three.
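  The pseudocode above translates almost line for line into C. This is a sketch assuming 0-based indices; the helper swap_int is an assumption (the slides write "swap" without defining it):

```c
/* Exchange two ints through pointers. */
static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Lomuto partition, matching the slides: pivot = A[end].
 * Returns the pivot's final position. */
static int partition(int A[], int start, int end) {
    int pivot = A[end];
    int i = start - 1;                  /* boundary of the <= pivot region */
    for (int j = start; j < end; j++) {
        if (A[j] <= pivot) {
            i++;
            swap_int(&A[i], &A[j]);
        }
    }
    swap_int(&A[i + 1], &A[end]);       /* place pivot between the regions */
    return i + 1;
}

void quick_sort(int A[], int start, int end) {
    if (start < end) {
        int p = partition(A, start, end);
        quick_sort(A, start, p - 1);    /* sort left of the pivot */
        quick_sort(A, p + 1, end);      /* sort right of the pivot */
    }
}
```

  Call it as quick_sort(A, 0, n - 1); choosing A[end] as the pivot is simple but is exactly what makes already-sorted input a worst case.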

Searching

What is Searching

  • Searching is the process of finding whether a given Key exists in a collection.
  • Applications include locating elements in arrays, lists, or data structures.
  • Main methods discussed: Linear (Sequential) Search and Binary Search.

Linear Search

  • Also called Sequential Search.
  • Assumes A is an array of n elements and the target is the Key.
  • Process:
    • Compare Key to each element of A one by one until a match is found or the array is exhausted.
    • Stop when the key element is discovered or after checking all elements.
  • Linear search algorithm (pseudocode from slides):
  LINEAR_SEARCH(A, Key)
    // Description: Perform a linear search on array A to search element Key
    // Input: Array of length n and Key to be searched
    // Output: Success / failure message
    flag = 0
    for i = 1 to n:
      if Key == A[i]:
        print "Element Found on Location", i
        flag = 1
        break
    if flag == 0:
      print "Element Not Found"
  • Linear search code example (as shown in slides):
    • A problem statement with two examples:
      1) Key = 71
      2) Key = 42
    • Includes writing the code for these examples.
  • Complexity (as described):
    • Best Case: If the key is at the first location, only 1 comparison is needed, so T(n) = O(1).
    • Worst Case: If the key is at the last position or not present, all elements are checked; T(n) = O(n).
    • Recurrence: T(n) = T(n - 1) + 1 which solves to O(n) in the worst case.
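  A C sketch of the algorithm above. Returning the 0-based index of the first match (or -1 when absent) is an assumption made here so the result is easy to check; the slides print a message instead:

```c
/* Linear search: compare key with each element in turn.
 * Returns the 0-based index of the first match, or -1 if absent. */
int linear_search(const int A[], int n, int key) {
    for (int i = 0; i < n; i++) {
        if (A[i] == key) {
            return i;   /* element found at position i */
        }
    }
    return -1;          /* checked all n elements: not found */
}
```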

Binary Search

  • Binary search is more efficient than linear search, but requires a sorted array.
  • Strategy: divide the search space in half at each step and compare the middle element to the key.
  • Key concepts:
    • Maintain low and high indices (initially low = 1, high = n).
    • Compute mid = ⌊(low + high) / 2⌋ (integer division).
    • If A[mid] equals key, search is successful; if key < A[mid], search left (high = mid - 1); if key > A[mid], search right (low = mid + 1).
    • Repeat until low > high or key is found.
  • Binary search explanations and example shots show the movement of low, high, and mid and the decision process at each step.
  • Binary search algorithm (pseudocode):
  BINARY_SEARCH(A, Key)
    low <- 1
    high <- n
    while low <= high:
      mid <- (low + high) / 2
      if A[mid] == Key:
        return mid
      else if A[mid] < Key:
        low <- mid + 1
      else:
        high <- mid - 1
    return 0
  • Complexity (as described):
    • Best Case: 1 comparison if the key is the middle element on the first check; T(n) = O(1).
    • Worst Case: Each iteration halves the search space; number of comparisons grows as O(log n).
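  A C sketch of the loop above, using 0-based indices (the slides use 1-based indices and return 0 on failure; -1 is used here instead so that index 0 remains a valid result):

```c
/* Binary search over a sorted array.
 * Returns the index of key, or -1 when not found. */
int binary_search(const int A[], int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  /* avoids overflow of low + high */
        if (A[mid] == key)
            return mid;
        else if (A[mid] < key)
            low = mid + 1;                 /* key must be in the right half */
        else
            high = mid - 1;                /* key must be in the left half */
    }
    return -1;                             /* interval empty: not found */
}
```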

Divide and Conquer (DC) Overview

  • Core idea: divide a big problem into smaller subproblems, solve them, then combine their solutions.
  • Three stages:
    • Divide: recursively split the problem into smaller subproblems.
    • Solve: solve each subproblem independently.
    • Combine: merge results to form the final answer.
  • DC abstraction (control flow):
    • Algorithm DC(P):
    • if P is small enough then return Solution of P
    • else divide P into k smaller subproblems P1, P2, …, Pk
    • solve each Pi using DC strategy
    • return combined solution (DC(P1), DC(P2), …, DC(Pk))
  • Properties:
    • Subproblems are independent and can be solved in parallel or sequentially.
    • Subproblems are often smaller versions of the original problem.
  • Recurrence for DC problems (generic):
    • T(n) = a·T(n/b) + f(n)
    • a = number of subproblems, each of size n/b; f(n) = cost to divide and combine.
  • DC illustration (hierarchical view):
    • A problem of size n splits into subproblems of size n/2, then subproblems are solved and combined step by step to yield the solution for size n.
  • Important note:
    • Because subproblems can be identical in structure to the main problem, DC often uses recursion; subproblems can be solved independently and combined at the end.
  • DC abstractions appear as general-purpose templates for many sorting and searching algorithms (e.g., Merge Sort, Quick Sort).
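  As a concrete instance, merge sort has a = 2 subproblems of size n/2 and linear combine cost f(n) = cn for some constant c; unrolling the recurrence shows where the n log n bound comes from (each of the log₂ n levels contributes cn work):

```latex
T(n) = 2\,T(n/2) + cn
     = 4\,T(n/4) + 2cn
     = 8\,T(n/8) + 3cn
     \;\vdots
     = n\,T(1) + cn\log_2 n
     = O(n \log n)
```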

Merge Sort Details

  • Merge sort relies on the DC paradigm:
    • Split the list into halves repeatedly until size 1; then merge back in sorted order.
  • Example (n = 10) shows step-by-step splitting:
    • Initial: 410 385 279 752 451 523 961 354 550 620
    • Split into halves and continue recursively until single-element lists.
  • Merge step mechanics:
    • During merge, two sorted lists are merged into one sorted list by repeatedly selecting the smaller head element.
    • Implementation uses temporary arrays L and R with sentinels ∞ at the end to simplify comparison logic.
  • Merge Sort pseudocode (high-level):
    • MergeSort(A, start, end):
    • if start < end:
      • mid = start + (end - start) / 2
      • MergeSort(A, start, mid)
      • MergeSort(A, mid+1, end)
      • merge(A, start, mid, end)
  • Merge function details (pseudocode excerpt):
    • Compute n1 = mid - start + 1
    • Compute n2 = end - mid
    • Build L[1..n1+1], R[1..n2+1]
    • Copy values into L and R, set L[n1+1] = ∞, R[n2+1] = ∞
    • Merge back into A[start..end] by comparing L[i] and R[j]
  • Implementation notes:
    • Midpoint calculation nuance: mid = start + (end − start)/2 is mathematically equal to ⌊(start + end)/2⌋ but avoids overflow when start + end exceeds the integer range.
    • The slides also present a dedicated MergeSort implementation under a section titled "Merge Sort - Implementation".
  • Key properties:
    • Stability: Merge sort is typically stable when merge ensures stable ordering.
    • Complexity: O(n log n) time, O(n) auxiliary space (typical implementation).
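  A C sketch of the merge and MergeSort pseudocode above, using INT_MAX in place of the ∞ sentinel. Heap-allocating the temporary arrays is an implementation choice made here; the slides' exact listing may differ:

```c
#include <limits.h>
#include <stdlib.h>

/* Merge sorted runs A[start..mid] and A[mid+1..end] back into A,
 * using INT_MAX as the sentinel at the end of each temporary array. */
static void merge(int A[], int start, int mid, int end) {
    int n1 = mid - start + 1;
    int n2 = end - mid;
    int *L = malloc((n1 + 1) * sizeof *L);
    int *R = malloc((n2 + 1) * sizeof *R);
    for (int i = 0; i < n1; i++) L[i] = A[start + i];
    for (int j = 0; j < n2; j++) R[j] = A[mid + 1 + j];
    L[n1] = INT_MAX;                /* sentinels: never the smaller head */
    R[n2] = INT_MAX;
    int i = 0, j = 0;
    for (int k = start; k <= end; k++) {
        if (L[i] <= R[j])           /* <= keeps equal elements stable */
            A[k] = L[i++];
        else
            A[k] = R[j++];
    }
    free(L);
    free(R);
}

void merge_sort(int A[], int start, int end) {
    if (start < end) {
        int mid = start + (end - start) / 2;
        merge_sort(A, start, mid);      /* sort left half */
        merge_sort(A, mid + 1, end);    /* sort right half */
        merge(A, start, mid, end);      /* combine */
    }
}
```

  Call it as merge_sort(A, 0, n - 1). The sentinels remove the need to test whether either run is exhausted inside the merge loop.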

Quick Sort (Details)

  • QuickSort structure (as shown):
    • QuickSort(A, start, end)
    • if start < end:
    • pivot = PARTITION(A, start, end)
    • QuickSort(A, start, pivot - 1)
    • QuickSort(A, pivot + 1, end)
  • Partition function (as shown):
    • PARTITION(A, start, end)
    • pivot = A[end]
    • i = start - 1
    • for j = start to end - 1:
    • if A[j] <= pivot:
      • i = i + 1
      • swap A[i] with A[j]
    • swap A[i+1] with A[end]
    • return i+1
  • Notes:
    • The partition step places the pivot in its final position and ensures elements to its left are ≤ the pivot and elements to its right are greater.
    • Quick sort performance is highly dependent on pivot choice; average-case is O(n log n).

Complexity and Practical Considerations

  • Linear Search: best-case O(1), worst-case O(n); recurrence T(n) = T(n-1) + 1.
  • Binary Search: best-case O(1), worst-case O(log n).
  • Divide and Conquer: general recurrence T(n) = a T(n/b) + f(n); used to analyze algorithms like Merge Sort and Quick Sort.
  • Merge Sort: stable runtime behavior with O(n log n) time; extra space for merging.
  • Quick Sort: typically O(n log n) average, but worst-case O(n^2) with poor pivot choices; in practice, optimizations (randomized pivots, median-of-three) help.
  • Copying and merging costs, as well as memory usage, are important in the practical selection of sorting algorithms.

Key Formulas and Notations

  • Divide and Conquer recurrence:
    • T(n) = a·T(n/b) + f(n)
  • Merge step midpoint (common form):
    • mid = start + (end − start) / 2
  • Midpoint (alternative expression from slides):
    • Mid = ⌈(low + high) / 2⌉
  • Binary search core concept recap:
    • Maintain search interval [low, high], repeatedly halve until key is found or interval is empty.

Quick Reference: Pseudocode Highlights (C-like)

  • Insertion Sort (as above)
  • Bubble Sort (as above)
  • Selection Sort (brief outline):
    • Find the minimum element in the unsorted portion and swap it with the element at the current position; repeat for each position.
  • Merge Sort (as above)
  • Quick Sort (as above)
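  Since the slides give Selection Sort only as an outline, here is a minimal C sketch of that outline (an assumption of this summary, not a listing from the slides):

```c
/* Selection sort: on each pass, find the minimum of the unsorted
 * portion A[i..n-1] and swap it into position i. */
void selection_sort(int A[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int min = i;
        for (int j = i + 1; j < n; j++) {
            if (A[j] < A[min])
                min = j;            /* remember smallest seen so far */
        }
        if (min != i) {             /* swap it into the current position */
            int t = A[i];
            A[i] = A[min];
            A[min] = t;
        }
    }
}
```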

Connections and Practical Implications

  • Brute force is straightforward and robust for small problems but scales poorly.
  • Insertion, Bubble, and Selection sorts are simple to implement and reason about but have higher time complexity for large datasets; their educational value lies in illustrating sorting mechanics.
  • Merge Sort and Quick Sort demonstrate the Divide and Conquer paradigm; Merge Sort is stable and predictable, while Quick Sort tends to be faster in practice with good pivot strategies.
  • Linear vs Binary search illustrates the advantage of pre-processing (sorting) against straightforward search; linear search is universal but slower on large datasets, while binary search halves the search space per step but requires sorted data.

References to Provided Slides (Content Overview)

  • Page 3: Brute force description and use-cases.
  • Pages 4-8: Insertion Sort steps, key concept, and C implementation.
  • Pages 9-12: Bubble Sort descriptions, step-by-step explanation, and C implementation.
  • Pages 13-16: Selection Sort process, iteration details, and sorted results.
  • Pages 17-18: Introduction to Searching and methods.
  • Pages 19-22: Linear Search definitions, algorithm, and complexity discussion.
  • Pages 23-27: Binary Search concepts, algorithm, and complexity.
  • Pages 28-33: Divide and Conquer overview and abstraction with recurrence discussion.
  • Pages 34-41: Merge Sort example, algorithm, and implementation notes.
  • Pages 42-44: Quick Sort algorithm and partition details.
  • Page 45: Final framing/closing.