Fundamental Algorithms – Search Algorithms: Comprehensive Study Notes

Overview of Search Algorithms

  • Definition of an Algorithm
    • Finite set of instructions that completes a task and is guaranteed to terminate.
  • Purpose of a Searching Algorithm
    • Locate the position/index of a specific item in a data structure, or verify its absence.
  • Search Strategies Covered in Chap 8
    • Linear (Sequential) Search
    • Binary Search
    • Hash-Table Search (hash tables to be taught in Semester 2)
  • Typical Scenarios Requiring Search
    • Validate that an element exists in an array before use / processing.
    • Find location in order to edit or delete an element.
    • Retrieve information from a single-field array.
    • Retrieve paired information across two parallel arrays.

Linear (Sequential) Search

  • Conceptual Description
    • Examine elements one-by-one from a chosen starting point, never skipping elements, till the target is found or the entire structure is exhausted.
  • Possible Data Conditions
    • Works on unsorted (unordered) and sorted (ordered) datasets.
    • Simple to code; minimal overhead.
  • Input/Process/Output Model
    • Input:
    • Search key (target)
    • Data set (sequence)
    • Process: Use linear walk to compare each element to key.
    • Output:
    • “Found” / “Not Found”
    • Index or memory address when found.
Linear Search on an Unordered Data Set
  • Descriptive Algorithm
    • Read first element, compare with key, advance pointer count while not at end and array element not NULL.
  • Representative Pseudocode (0-based, 20-slot global array ar)
  FUNCTION linearSearchUnordered(key : STRING) RETURNS BOOLEAN
      DECLARE found, size, count : BOOLEAN, INTEGER, INTEGER
      found ← FALSE
      count ← 0
      size ← LENGTH(ar)
      WHILE count < size AND ar[count] <> NULL
          IF key = ar[count] THEN
              found ← TRUE
              RETURN found
          ELSE
              count ← count + 1
          ENDIF
      ENDWHILE
      RETURN found
  ENDFUNCTION
  • Exercise 1 Highlights
    • (a) WHILE / REPEAT-UNTIL preferred over FOR because loop may need early exit once key is found; flexible stopping condition.
    • (b) Example array ["Dewi","Ali","Sean","Olaf","Cindi","Feng"]
    • Searching for "Olaf" ⇒ 4 iterations (indices 0–3 fail, index 3 success).
    • Searching for "Dill" ⇒ 6 iterations (checks all 6 elements, not found).
    • (c) Dry-run traces state/variable changes step-by-step.
Lexicographic Ordering Refresher (side note)
  • Alphabetical / dictionary order of strings; examples: "ABC" < "ACB" < "BAC" …
Linear Search on an Ordered Data Set
  • Optimisation Possibility
    • Because data are sorted, search can terminate early if current array value exceeds the key (for ascending order).
  • Descriptive Steps
    1. Start at first item.
    2. Compare current value with key.
    3. If equal ⇒ found.
    4. Else move to next item.
    5. Stop if last element reached or key < current value.
  • Skeleton Pseudocode to Complete (Exercise 2)
  DECLARE length, count : INTEGER
  length ← LENGTH(ar)
  count  ← 0
  WHILE count < length AND key >= ar[count]
      IF ar[count] = key THEN
          RETURN TRUE
      ENDIF
      count ← count + 1
  ENDWHILE
  RETURN FALSE
  • Sorted Example Array ["Ali","Cindi","Dewi","Feng","Olaf","Sean"]
    • Searching "Dill" terminates when pointer reaches "Feng" (> "Dill").
Advantages & Disadvantages of Linear Search
  • Advantages
    • Conceptually simple; trivial to implement.
    • Requires no prior sorting; works on any list.
    • For very small lists, overhead of more complex algorithms may outweigh gains; linear search can be faster.
  • Disadvantages
    • Time complexity: Θ(n)\Theta(n) comparisons in worst & average case.
    • Inefficient for large datasets; must inspect every element if key absent.
Implementation Exercises (Page 5)
  • Iterative pseudocode returning index or 1-1 (unsorted strings).
  • Python iterative returning Boolean on sorted ascending integers.
  • Python recursive returning index or 1-1 on sorted descending integers.

Binary Search

  • Pre-Conditions
    • Data must be sorted (ascending or descending consistent with comparison logic).
  • Core Idea
    • Repeatedly split search space in half by comparing middle element to key; discard half where key cannot lie.
  • Midpoint Formula (integer indices)mid=low+highlow2\text{mid} = \big\lfloor \text{low} + \frac{\text{high} - \text{low}}{2} \big\rfloor
    • Integer truncation (INT) removes fractional part.
  • Illustrative Example (Ascending array, key = 52)
    • Initial low = 1, high = 14.
    • mid = 7 → arr[7] = 70 (> 52) ⇒ search left sub-array.
    • New low = 1, high = 6.
    • mid = 3 → arr[3] = 18 (< 52) ⇒ search right sub-array.
    • New low = 4, high = 6.
    • mid = 5 → arr[5] = 34 (< 52).
    • New low = 6, high = 6.
    • mid = 6 → arr[6] = 52 ⇒ found.
Iterative Binary Search (Pseudocode)
FUNCTION BinarySearch_Iterative(search_key: INTEGER) RETURNS BOOLEAN
    DECLARE low, high, mid : INTEGER
    low  ← 1            // using 1-based indexing
    high ← LENGTH(arr) // arr sorted ascending
    WHILE low <= high DO
        mid ← INT((low + high)/2)
        IF search_key = arr[mid] THEN
            RETURN TRUE
        ELSE IF search_key < arr[mid] THEN
            high ← mid - 1
        ELSE
            low ← mid + 1
        ENDIF
    ENDWHILE
    RETURN FALSE
ENDFUNCTION
Recursive Binary Search (Pseudocode)
FUNCTION binSearch_re(low, high, key: INTEGER, arr: ARRAY) RETURNS BOOLEAN
    IF low > high THEN // base case: NOT FOUND
        RETURN FALSE
    ENDIF
    DECLARE mid : INTEGER
    mid ← INT(low + (high - low)/2)
    IF arr[mid] = key THEN // base case: FOUND
        RETURN TRUE
    ENDIF
    IF arr[mid] < key THEN
        RETURN binSearch_re(mid + 1, high, key, arr)
    ELSE // arr[mid] > key
        RETURN binSearch_re(low, mid - 1, key, arr)
    ENDIF
ENDFUNCTION
  • Exercise Question (Page 9)
    • Number of self-calls for key = 52 vs. 53 depends on array length; for the demo array above:
    • key = 52 ⇒ 3 recursive calls (levels) before success.
    • key = 53 ⇒ 4 recursive calls before hitting base case low > high.
Advantages & Disadvantages of Binary Search
  • Advantages
    • Time complexity: Θ(log2n)\Theta(\log_2 n) comparisons.
    • Drastically fewer checks for large nn; e.g. n=1024n = 1024 ⇒ at most log21024=10\lceil \log_2 1024 \rceil = 10 steps.
  • Disadvantages
    • Prerequisite: dataset must be sorted.
    • Sorting overhead if original data unsorted.
    • Duplicate keys complicate localisation (may revert to local linear scan to find first/last occurrence).

Comparative Summary

  • Linear Search
    • Pros: no assumptions; constant memory O(1)O(1); works on small lists.
    • Cons: O(n)O(n) time.
  • Binary Search
    • Pros: O(logn)O(\log n) time; scalable.
    • Cons: Require sorted input; may not suit dynamic datasets that change frequently.

Worksheet-Style Applications & Tracing

  • Trace Table Variables (Binary Search Example key = 95)
    • Columns: low, high, condition low <= high, mid, arr[mid], key = arr[mid], OUTPUT (TRUE/FALSE / continue).
    • Populate row-by-row while running algorithm.
  • Trace Tree Diagram (Recursive)
    • Nodes labelled with subset ar[low..high] + computed mid; leaves represent base cases.

Practical Programming Exercise – Inventory Management

  • File inventory.txt
    • Each line: ProductID,ProductName,Quantity (comma-separated).
    • Example entries: 101,Apple,50, 102,Raddish,100, 103,Orange,75.
Task 1 – Iterative Linear Search in Python
  • Function signature: linear_search(target_name: str, filename: str) -> int
  • Steps:
    1. Read file line-by-line; parse into (id, name, qty).
    2. Compare name to target_name sequentially.
    3. Return qty if found; else return 1-1.
Task 2 – Binary Search in Python
  • Function signature: binary_search(target_name: str, filename: str) -> int
  • Steps:
    1. Read file and load tuples (name, qty) into list.
    2. Sort list lexicographically by name.
    3. Apply iterative/recursive binary search on sorted list.
    4. Return qty when found, else 1-1.
Task 3 – Main Program Flow
  • Prompt user once for product name.
  • Call linear_search; display result.
  • Call binary_search; display result.
  • Assumptions: file well-formatted, unique product names.

Key Equations & Complexity Results

  • Midpoint (integer index): mid=low+highlow2\text{mid} = \big\lfloor \text{low} + \frac{\text{high} - \text{low}}{2} \big\rfloor
  • Linear search worst-case comparisons: nnΘ(n)\Theta(n).
  • Binary search worst-case comparisons: log2n\lceil \log_2 n \rceilΘ(logn)\Theta(\log n).

Ethical & Practical Implications

  • Data Integrity: Always validate file inputs (inventory) to avoid incorrect search results.
  • Performance vs. Simplicity Trade-Off: Choose algorithm appropriate to dataset size and mutability.
  • User Experience: Immediate feedback for small arrays may not justify sorting overhead.
  • Future Extensibility: Hash tables (covered later) may offer O(1)O(1) average search but introduce collision management complexities.