Sorting and Searching - Comprehensive Study Notes

Brute Force Approach

  • A brute-force algorithm is a basic algorithm that works through every possible answer until the correct one is found.
  • It can solve a problem easily when the size of the problem is small, and can be preferable over crafting a more elegant solution in such cases.
  • Because it exhaustively checks every candidate, it is guaranteed to find a solution if one exists, including solutions that heuristic approaches might miss.
  • Essence: trial and error, letting a computer systematically explore a range of possibilities until the correct solution is reached.
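  The idea can be sketched with a small example. This is a minimal illustration, not from the slides: the function name find_pair_sum and the pair-sum task are assumptions chosen only to show exhaustive search over all candidates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Brute-force pair search: try every (i, j) pair until one sums to target.
 * Returns true and stores the indices if such a pair exists. */
bool find_pair_sum(const int a[], size_t n, int target,
                   size_t *out_i, size_t *out_j) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (a[i] + a[j] == target) {   /* test this candidate answer */
                *out_i = i;
                *out_j = j;
                return true;
            }
        }
    }
    return false;   /* exhausted every possibility */
}
```

  For small n this is perfectly adequate; the point is that correctness comes from checking every possibility, not from any insight into the data.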

Insertion Sort

  • Insertion Sort Algorithm (overview):
    1. Treat the first element as an already-sorted sublist of size 1.
    2. Pick the next element and store it separately in a key.
    3. Compare the key with the elements of the sorted sublist, from right to left.
    4. Shift each sorted element greater than the key one position to the right.
    5. Insert the key into the resulting gap.
    6. Repeat until the array is sorted.
  • Example starting array (Arr) and subarrays:
    • Initial: Arr → [0] [1] [2] [3] [4] [5]
    • We take the first element Arr[0] as the sorted subarray; the rest is unsorted.
  • Key concepts:
    • key: the first element of the unsorted subarray (e.g., Arr[1] initially).
    • As we proceed, the sorted subarray grows; the unsorted subarray shrinks.
  • Step-by-step evolution (illustrative):
    • 17 35 15 21 4 8 (Initial)
    • Sorted vs Unsorted regions shift as we insert each key into its correct position.
  • Insertion Sort implementation (C):
  void insertion_sort(int A[], int n) {
    for (int i = 1; i < n; i++) {
      int curr = A[i];       /* key: next element to insert */
      int prev = i - 1;
      /* shift sorted elements greater than curr one slot to the right */
      while (prev >= 0 && A[prev] > curr) {
        A[prev + 1] = A[prev];
        prev--;
      }
      A[prev + 1] = curr;    /* insert the key into the gap */
    }
  }
  • Original step-by-step demonstrations (as shown in slides):
    • Page 5: Initial array and labeling of Sorted / Unsorted regions.
    • Page 6: After first insertion, two elements are in the sorted subarray; n-2 elements unsorted.
    • Pages 7-8: Code listing for insertion_sort(A[], n)
  • Remarks:
    • Insertion sort builds the final sorted array one item at a time by inserting each new element into its proper place within the already-sorted portion.
    • Suitable for nearly-sorted data due to fewer shifts.

Bubble Sort

  • Bubble sort idea: repeatedly compare adjacent pairs and swap them if they are in the wrong order.
  • Pseudocode (as given):
  Bubble Sort(Array a[])
  Begin
    for i = 0 to n − 1
    for j = 0 to n − i − 2
        if (a[j] > a[j + 1]) then
          Swap (a[j], a[j + 1])
  End
  • Explanatory steps (illustrated in slides):
    • Each pass bubbles the largest remaining element to its correct position at the end.
    • Example traces show comparisons and swaps per iteration, with number of comparisons decreasing as the end portion becomes sorted.
  • Bubble sort implementation (C) shown:
  void bubble_sort(int A[], int n) {
    for (int i = 0; i < n - 1; i++) {
      /* after pass i, the largest i+1 elements are already in place */
      for (int j = 0; j < n - i - 1; j++) {
        if (A[j] > A[j + 1]) {
          int tmp = A[j];          /* swap adjacent out-of-order pair */
          A[j] = A[j + 1];
          A[j + 1] = tmp;
        }
      }
    }
  }
  • Key takeaway:
    • Simple, but has O(n^2) time in worst and average cases; performs well for small arrays or nearly-sorted data, but not scalable.

Merge Sort

  • Divide and Conquer approach (high-level):
    • Divide: split problem recursively into smaller subproblems.
    • Solve: solve each subproblem independently.
    • Combine: merge the solutions to form the final answer.
  • DC abstraction (control flow):
    • DC(P): if P is small enough then return Solution of P
    • else divide P into k subproblems P1,…,Pk; solve each Pi using DC; return combined solution DC(P1), DC(P2), …, DC(Pk)
  • Why DC works well here:
    • Subproblems are identical in structure to the main problem but with smaller parameters; recursive approach fits naturally.
    • Each subproblem is solved to the smallest possible size, then results are merged.
  • Time complexity recurrence (general DC):
    • T(n) = a·T(n/b) + f(n)
    • where a is the number of subproblems, each of size n/b, and f(n) is the divide/combine cost.
  • Merge sort algorithm structure:
    • MergeSort(A, start, end) recursively splits while start < end, then merges:
    • Pseudocode snippet:
      MergeSort(A, start, end):
        if start < end:
          mid = start + (end − start) / 2
          MergeSort(A, start, mid)
          MergeSort(A, mid + 1, end)
          merge(A, start, mid, end)
  • Merge operation details:
    • Given arrays L and R derived from A[start..mid] and A[mid+1..end], merge in sorted order into A[start..end].
    • Typical merge skeleton creates sentinels ∞ at the end of L and R to simplify logic.
    • Merge pseudocode (from slides):
      merge(A, start, mid, end):
        n1 = mid − start + 1
        n2 = end − mid
        let L[1..n1+1] and R[1..n2+1] be new arrays
        for i = 1 to n1: L[i] = A[start + i − 1]
        for j = 1 to n2: R[j] = A[mid + j]
        L[n1+1] = ∞; R[n2+1] = ∞
        i = 1; j = 1
        for k = start to end:
          if L[i] ≤ R[j]: A[k] = L[i]; i = i + 1
          else: A[k] = R[j]; j = j + 1
  • Midpoint choice nuance (Merge Sort example):
    • Mid computed as mid = start + (end − start) / 2
    • Some slides emphasize using the ceiling for mid: Mid = ⌈(start + end) / 2⌉
  • Example of a Merge Sort run (n=10) shows progressive halving and merging of sublists, culminating in a fully sorted array.
  • Merge sort implementation reference (C-like) is shown under a section labeled "Merge Sort - Implementation".
  • Key conclusions:
    • Merge sort has stable O(n log n) time and requires extra space for merging.
    • Recursion splits the array top-down until sublists of size 1, then merges the sorted sublists back up.

Quick Sort

  • QuickSort(A, start, end) overview:
    • If start < end:
    • Partition A[start..end] into two parts around a pivot.
    • Recursively quicksort the left part and the right part.
  • Partition step (as given):
    • pivot = A[end]
    • i = start - 1
    • for j = start to end - 1:
    • if A[j] <= pivot: i = i + 1; swap A[i] with A[j]
    • swap A[i+1] with A[end]
    • return i+1 as pivot position
  • Pseudocode snippet (as shown):
  QuickSort(A, start, end)
    if start < end:
      pivot = PARTITION(A, start, end)
      QuickSort(A, start, pivot - 1)
      QuickSort(A, pivot + 1, end)
  • Partition function snippet (as shown):
  PARTITION(A, start, end)
    pivot = A[end]
    i = start - 1
    for j = start to end - 1
      if A[j] <= pivot
        i = i + 1
        swap A[i] with A[j]
    swap A[i+1] with A[end]
    return i+1
  • Quick sort notes:
    • Provides efficient average performance, but worst-case can degrade to O(n^2) without optimizations like randomized pivoting or median-of-three.
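  The pseudocode above translates almost line for line into C. This is a sketch assuming 0-based indices; the helper swap_int is an assumption (the slides write "swap" without defining it):

```c
/* Exchange two ints through pointers. */
static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Lomuto partition, matching the slides: pivot = A[end].
 * Returns the pivot's final position. */
static int partition(int A[], int start, int end) {
    int pivot = A[end];
    int i = start - 1;                  /* boundary of the <= pivot region */
    for (int j = start; j < end; j++) {
        if (A[j] <= pivot) {
            i++;
            swap_int(&A[i], &A[j]);
        }
    }
    swap_int(&A[i + 1], &A[end]);       /* place pivot between the regions */
    return i + 1;
}

void quick_sort(int A[], int start, int end) {
    if (start < end) {
        int p = partition(A, start, end);
        quick_sort(A, start, p - 1);    /* sort left of the pivot */
        quick_sort(A, p + 1, end);      /* sort right of the pivot */
    }
}
```

  Call it as quick_sort(A, 0, n - 1); choosing A[end] as the pivot is simple but is exactly what makes already-sorted input a worst case.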

Searching

What is Searching

  • Searching is the process of finding whether a given Key exists in a collection.
  • Applications include locating elements in arrays, lists, or data structures.
  • Main methods discussed: Linear (Sequential) Search and Binary Search.

Linear Search

  • Also called Sequential Search.
  • Assumes A is an array of n elements and the target is the Key.
  • Process:
    • Compare Key to each element of A one by one until a match is found or the array is exhausted.
    • Stop when the key element is discovered or after checking all elements.
  • Linear search algorithm (pseudocode from slides):
  LINEAR_SEARCH(A, Key)
    // Description: Perform a linear search on array A to search element Key
    // Input: Array of length n and Key to be searched
    // Output: Success / failure message
    flag = 0
    for i = 1 to n:
      if Key == A[i]:
        print "Element Found on Location", i
        flag = 1
        break
    if flag == 0:
      print "Element Not Found"
  • Linear search code example (as shown in slides):
    • A problem statement with two examples:
      1) Key = 71
      2) Key = 42
    • Includes writing the code for these examples.
  • Complexity (as described):
    • Best Case: If the key is at the first location, only 1 comparison is needed, so T(n) = O(1).
    • Worst Case: If the key is at the last position or not present, all elements are checked; T(n) = O(n).
    • Recurrence: T(n) = T(n - 1) + 1 which solves to O(n) in the worst case.
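  A C sketch of the algorithm above. Returning the 0-based index of the first match (or -1 when absent) is an assumption made here so the result is easy to check; the slides print a message instead:

```c
/* Linear search: compare key with each element in turn.
 * Returns the 0-based index of the first match, or -1 if absent. */
int linear_search(const int A[], int n, int key) {
    for (int i = 0; i < n; i++) {
        if (A[i] == key) {
            return i;   /* element found at position i */
        }
    }
    return -1;          /* checked all n elements: not found */
}
```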

Binary Search

  • Binary search is more efficient than linear search, but requires a sorted array.
  • Strategy: divide the search space in half at each step and compare the middle element to the key.
  • Key concepts:
    • Maintain low and high indices (initially low = 1, high = n).
    • Compute mid = ⌊(low + high) / 2⌋ (integer division).
    • If A[mid] equals key, search is successful; if key < A[mid], search left (high = mid - 1); if key > A[mid], search right (low = mid + 1).
    • Repeat until low > high or key is found.
  • Binary search explanations and example shots show the movement of low, high, and mid and the decision process at each step.
  • Binary search algorithm (pseudocode):
  BINARY_SEARCH(A, Key)
    low <- 1
    high <- n
    while low <= high:
      mid <- (low + high) / 2
      if A[mid] == Key:
        return mid
      else if A[mid] < Key:
        low <- mid + 1
      else:
        high <- mid - 1
    return 0
  • Complexity (as described):
    • Best Case: 1 comparison if the key is the middle element on the first check; T(n) = O(1).
    • Worst Case: Each iteration halves the search space; number of comparisons grows as O(log n).
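  A C sketch of the loop above, using 0-based indices (the slides use 1-based indices and return 0 on failure; -1 is used here instead so that index 0 remains a valid result):

```c
/* Binary search over a sorted array.
 * Returns the index of key, or -1 when not found. */
int binary_search(const int A[], int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  /* avoids overflow of low + high */
        if (A[mid] == key)
            return mid;
        else if (A[mid] < key)
            low = mid + 1;                 /* key must be in the right half */
        else
            high = mid - 1;                /* key must be in the left half */
    }
    return -1;                             /* interval empty: not found */
}
```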

Divide and Conquer (DC) Overview

  • Core idea: divide a big problem into smaller subproblems, solve them, then combine their solutions.
  • Three stages:
    • Divide: recursively split the problem into smaller subproblems.
    • Solve: solve each subproblem independently.
    • Combine: merge results to form the final answer.
  • DC abstraction (control flow):
    • Algorithm DC(P):
    • if P is small enough then return Solution of P
    • else divide P into k smaller subproblems P1, P2, …, Pk
    • solve each Pi using DC strategy
    • return combined solution (DC(P1), DC(P2), …, DC(Pk))
  • Properties:
    • Subproblems are independent and can be solved in parallel or sequentially.
    • Subproblems are often smaller versions of the original problem.
  • Recurrence for DC problems (generic):
    • T(n) = a·T(n/b) + f(n)
    • a = number of subproblems, each of size n/b; f(n) = cost to divide and combine.
  • DC illustration (hierarchical view):
    • A problem of size n splits into subproblems of size n/2, then subproblems are solved and combined step by step to yield the solution for size n.
  • Important note:
    • Because subproblems can be identical in structure to the main problem, DC often uses recursion; subproblems can be solved independently and combined at the end.
  • DC abstractions appear as general-purpose templates for many sorting and searching algorithms (e.g., Merge Sort, Quick Sort).
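  As a concrete instance, merge sort has a = 2 subproblems of size n/2 and linear combine cost f(n) = cn for some constant c; unrolling the recurrence shows where the n log n bound comes from (each of the log₂ n levels contributes cn work):

```latex
T(n) = 2\,T(n/2) + cn
     = 4\,T(n/4) + 2cn
     = 8\,T(n/8) + 3cn
     \;\vdots
     = n\,T(1) + cn\log_2 n
     = O(n \log n)
```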

Merge Sort Details

  • Merge sort relies on the DC paradigm:
    • Split the list into halves repeatedly until size 1; then merge back in sorted order.
  • Example (n = 10) shows step-by-step splitting:
    • Initial: 410 385 279 752 451 523 961 354 550 620
    • Split into halves and continue recursively until single-element lists.
  • Merge step mechanics:
    • During merge, two sorted lists are merged into one sorted list by repeatedly selecting the smaller head element.
    • Implementation uses temporary arrays L and R with sentinels ∞ at the end to simplify comparison logic.
  • Merge Sort pseudocode (high-level):
    • MergeSort(A, start, end):
    • if start < end:
      • mid = start + (end - start) / 2
      • MergeSort(A, start, mid)
      • MergeSort(A, mid+1, end)
      • merge(A, start, mid, end)
  • Merge function details (pseudocode excerpt):
    • Compute n1 = mid - start + 1
    • Compute n2 = end - mid
    • Build L[1..n1+1], R[1..n2+1]
    • Copy values into L and R, set L[n1+1] = ∞, R[n2+1] = ∞
    • Merge back into A[start..end] by comparing L[i] and R[j]
  • Implementation notes:
    • Midpoint calculation nuance: mid = start + (end − start)/2 is mathematically equal to ⌊(start + end)/2⌋ but avoids overflow when start + end exceeds the integer range.
    • The slides also present a dedicated MergeSort implementation under a section titled "Merge Sort - Implementation".
  • Key properties:
    • Stability: Merge sort is typically stable when merge ensures stable ordering.
    • Complexity: O(n log n) time, O(n) auxiliary space (typical implementation).
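  A C sketch of the merge and MergeSort pseudocode above, using INT_MAX in place of the ∞ sentinel. Heap-allocating the temporary arrays is an implementation choice made here; the slides' exact listing may differ:

```c
#include <limits.h>
#include <stdlib.h>

/* Merge sorted runs A[start..mid] and A[mid+1..end] back into A,
 * using INT_MAX as the sentinel at the end of each temporary array. */
static void merge(int A[], int start, int mid, int end) {
    int n1 = mid - start + 1;
    int n2 = end - mid;
    int *L = malloc((n1 + 1) * sizeof *L);
    int *R = malloc((n2 + 1) * sizeof *R);
    for (int i = 0; i < n1; i++) L[i] = A[start + i];
    for (int j = 0; j < n2; j++) R[j] = A[mid + 1 + j];
    L[n1] = INT_MAX;                /* sentinels: never the smaller head */
    R[n2] = INT_MAX;
    int i = 0, j = 0;
    for (int k = start; k <= end; k++) {
        if (L[i] <= R[j])           /* <= keeps equal elements stable */
            A[k] = L[i++];
        else
            A[k] = R[j++];
    }
    free(L);
    free(R);
}

void merge_sort(int A[], int start, int end) {
    if (start < end) {
        int mid = start + (end - start) / 2;
        merge_sort(A, start, mid);      /* sort left half */
        merge_sort(A, mid + 1, end);    /* sort right half */
        merge(A, start, mid, end);      /* combine */
    }
}
```

  Call it as merge_sort(A, 0, n - 1). The sentinels remove the need to test whether either run is exhausted inside the merge loop.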

Quick Sort (Details)

  • QuickSort structure (as shown):
    • QuickSort(A, start, end)
    • if start < end:
    • pivot = PARTITION(A, start, end)
    • QuickSort(A, start, pivot - 1)
    • QuickSort(A, pivot + 1, end)
  • Partition function (as shown):
    • PARTITION(A, start, end)
    • pivot = A[end]
    • i = start - 1
    • for j = start to end - 1:
    • if A[j] <= pivot:
      • i = i + 1
      • swap A[i] with A[j]
    • swap A[i+1] with A[end]
    • return i+1
  • Notes:
    • The partition step places the pivot in its final position and ensures elements to its left are ≤ the pivot and elements to its right are greater.
    • Quick sort performance is highly dependent on pivot choice; average-case is O(n log n).

Complexity and Practical Considerations

  • Linear Search: best-case O(1), worst-case O(n); recurrence T(n) = T(n-1) + 1.
  • Binary Search: best-case O(1), worst-case O(log n).
  • Divide and Conquer: general recurrence T(n) = a T(n/b) + f(n); used to analyze algorithms like Merge Sort and Quick Sort.
  • Merge Sort: stable runtime behavior with O(n log n) time; extra space for merging.
  • Quick Sort: typically O(n log n) average, but worst-case O(n^2) with poor pivot choices; in practice, optimizations (randomized pivots, median-of-three) help.
  • Copying and merging costs, as well as memory usage, are important in the practical selection of sorting algorithms.

Key Formulas and Notations

  • Divide and Conquer recurrence:
    • T(n) = a·T(n/b) + f(n)
  • Merge step midpoint (common form):
    • mid = start + (end − start) / 2
  • Midpoint (alternative expression from slides):
    • Mid = ⌈(low + high) / 2⌉
  • Binary search core concept recap:
    • Maintain search interval [low, high], repeatedly halve until key is found or interval is empty.

Quick Reference: Pseudocode Highlights (C-like)

  • Insertion Sort (as above)
  • Bubble Sort (as above)
  • Selection Sort (brief outline):
    • Find the minimum element in the unsorted portion and swap it with the element at the current position; repeat for each position.
  • Merge Sort (as above)
  • Quick Sort (as above)
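  Since the slides give Selection Sort only as an outline, here is a minimal C sketch of that outline (an assumption of this summary, not a listing from the slides):

```c
/* Selection sort: on each pass, find the minimum of the unsorted
 * portion A[i..n-1] and swap it into position i. */
void selection_sort(int A[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int min = i;
        for (int j = i + 1; j < n; j++) {
            if (A[j] < A[min])
                min = j;            /* remember smallest seen so far */
        }
        if (min != i) {             /* swap it into the current position */
            int t = A[i];
            A[i] = A[min];
            A[min] = t;
        }
    }
}
```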

Connections and Practical Implications

  • Brute force is straightforward and robust for small problems but scales poorly.
  • Insertion, Bubble, and Selection sorts are simple to implement and reason about but have higher time complexity for large datasets; their educational value lies in illustrating sorting mechanics.
  • Merge Sort and Quick Sort demonstrate the Divide and Conquer paradigm; Merge Sort is stable and predictable, while Quick Sort tends to be faster in practice with good pivot strategies.
  • Linear vs Binary search illustrates the advantage of pre-processing (sorting) against straightforward search; linear search is universal but slower on large datasets, while binary search halves the search space per step but requires sorted data.

References to Provided Slides (Content Overview)

  • Page 3: Brute force description and use-cases.
  • Pages 4-8: Insertion Sort steps, key concept, and C implementation.
  • Pages 9-12: Bubble Sort descriptions, step-by-step explanation, and C implementation.
  • Pages 13-16: Selection Sort process, iteration details, and sorted results.
  • Pages 17-18: Introduction to Searching and methods.
  • Pages 19-22: Linear Search definitions, algorithm, and complexity discussion.
  • Pages 23-27: Binary Search concepts, algorithm, and complexity.
  • Pages 28-33: Divide and Conquer overview and abstraction with recurrence discussion.
  • Pages 34-41: Merge Sort example, algorithm, and implementation notes.
  • Pages 42-44: Quick Sort algorithm and partition details.
  • Page 45: Final framing/closing.