Fundamental Algorithms – Search Algorithms: Comprehensive Study Notes

Overview of Search Algorithms

Definition of an Algorithm
- Finite set of instructions that completes a task and is guaranteed to terminate.
Purpose of a Searching Algorithm
- Locate the position/index of a specific item in a data structure, or verify its absence.
Search Strategies Covered in Chap 8
- Linear (Sequential) Search
- Binary Search
- Hash-Table Search (hash tables to be taught in Semester 2)
Typical Scenarios Requiring Search
- Validate that an element exists in an array before use / processing.
- Find location in order to edit or delete an element.
- Retrieve information from a single-field array.
- Retrieve paired information across two parallel arrays.

Linear (Sequential) Search

Conceptual Description
- Examine elements one-by-one from a chosen starting point, never skipping elements, till the target is found or the entire structure is exhausted.
Possible Data Conditions
- Works on unsorted (unordered) and sorted (ordered) datasets.
- Simple to code; minimal overhead.
Input/Process/Output Model
- Input:
- Search key (target)
- Data set (sequence)
- Process: Use linear walk to compare each element to key.
- Output:
- “Found” / “Not Found”
- Index or memory address when found.

Linear Search on an Unordered Data Set

Descriptive Algorithm
- Read first element, compare with key, advance pointer count while not at end and array element not NULL.
Representative Pseudocode (0-based, 20-slot global array ar)

  FUNCTION linearSearchUnordered(key : STRING) RETURNS BOOLEAN
      DECLARE found, size, count : BOOLEAN, INTEGER, INTEGER
      found ← FALSE
      count ← 0
      size ← LENGTH(ar)
      WHILE count < size AND ar[count] <> NULL
          IF key = ar[count] THEN
              found ← TRUE
              RETURN found
          ELSE
              count ← count + 1
          ENDIF
      ENDWHILE
      RETURN found
  ENDFUNCTION

Exercise 1 Highlights
- (a) WHILE / REPEAT-UNTIL preferred over FOR because loop may need early exit once key is found; flexible stopping condition.
- (b) Example array ["Dewi","Ali","Sean","Olaf","Cindi","Feng"]
- Searching for "Olaf" ⇒ 4 iterations (indices 0–3 fail, index 3 success).
- Searching for "Dill" ⇒ 6 iterations (checks all 6 elements, not found).
- (c) Dry-run traces state/variable changes step-by-step.

Lexicographic Ordering Refresher (side note)

Alphabetical / dictionary order of strings; examples: "ABC" < "ACB" < "BAC" …

Linear Search on an Ordered Data Set

Optimisation Possibility
- Because data are sorted, search can terminate early if current array value exceeds the key (for ascending order).
Descriptive Steps
1. Start at first item.
2. Compare current value with key.
3. If equal ⇒ found.
4. Else move to next item.
5. Stop if last element reached or key < current value.
Skeleton Pseudocode to Complete (Exercise 2)

  DECLARE length, count : INTEGER
  length ← LENGTH(ar)
  count  ← 0
  WHILE count < length AND key >= ar[count]
      IF ar[count] = key THEN
          RETURN TRUE
      ENDIF
      count ← count + 1
  ENDWHILE
  RETURN FALSE

Sorted Example Array ["Ali","Cindi","Dewi","Feng","Olaf","Sean"]
- Searching "Dill" terminates when pointer reaches "Feng" (> "Dill").

Advantages & Disadvantages of Linear Search

Advantages
- Conceptually simple; trivial to implement.
- Requires no prior sorting; works on any list.
- For very small lists, overhead of more complex algorithms may outweigh gains; linear search can be faster.
Disadvantages
- Time complexity: $\Theta(n)$ comparisons in worst & average case.
- Inefficient for large datasets; must inspect every element if key absent.

Implementation Exercises (Page 5)

Iterative pseudocode returning index or $-1$ (unsorted strings).
Python iterative returning Boolean on sorted ascending integers.
Python recursive returning index or $-1$ on sorted descending integers.

Binary Search

Pre-Conditions
- Data must be sorted (ascending or descending consistent with comparison logic).
Core Idea
- Repeatedly split search space in half by comparing middle element to key; discard half where key cannot lie.
Midpoint Formula (integer indices) $\text{mid} = \big\lfloor \text{low} + \frac{\text{high} - \text{low}}{2} \big\rfloor$
- Integer truncation (INT) removes fractional part.
Illustrative Example (Ascending array, key = 52)
- Initial low = 1, high = 14.
- mid = 7 → arr[7] = 70 (> 52) ⇒ search left sub-array.
- New low = 1, high = 6.
- mid = 3 → arr[3] = 18 (< 52) ⇒ search right sub-array.
- New low = 4, high = 6.
- mid = 5 → arr[5] = 34 (< 52).
- New low = 6, high = 6.
- mid = 6 → arr[6] = 52 ⇒ found.

Iterative Binary Search (Pseudocode)

FUNCTION BinarySearch_Iterative(search_key: INTEGER) RETURNS BOOLEAN
    DECLARE low, high, mid : INTEGER
    low  ← 1            // using 1-based indexing
    high ← LENGTH(arr) // arr sorted ascending
    WHILE low <= high DO
        mid ← INT((low + high)/2)
        IF search_key = arr[mid] THEN
            RETURN TRUE
        ELSE IF search_key < arr[mid] THEN
            high ← mid - 1
        ELSE
            low ← mid + 1
        ENDIF
    ENDWHILE
    RETURN FALSE
ENDFUNCTION

Recursive Binary Search (Pseudocode)

FUNCTION binSearch_re(low, high, key: INTEGER, arr: ARRAY) RETURNS BOOLEAN
    IF low > high THEN // base case: NOT FOUND
        RETURN FALSE
    ENDIF
    DECLARE mid : INTEGER
    mid ← INT(low + (high - low)/2)
    IF arr[mid] = key THEN // base case: FOUND
        RETURN TRUE
    ENDIF
    IF arr[mid] < key THEN
        RETURN binSearch_re(mid + 1, high, key, arr)
    ELSE // arr[mid] > key
        RETURN binSearch_re(low, mid - 1, key, arr)
    ENDIF
ENDFUNCTION

Exercise Question (Page 9)
- Number of self-calls for key = 52 vs. 53 depends on array length; for the demo array above:
- key = 52 ⇒ 3 recursive calls (levels) before success.
- key = 53 ⇒ 4 recursive calls before hitting base case low > high.

Advantages & Disadvantages of Binary Search

Advantages
- Time complexity: $\Theta(\log_2 n)$ comparisons.
- Drastically fewer checks for large $n$ ; e.g. $n = 1024$ ⇒ at most $\lceil \log_2 1024 \rceil = 10$ steps.
Disadvantages
- Prerequisite: dataset must be sorted.
- Sorting overhead if original data unsorted.
- Duplicate keys complicate localisation (may revert to local linear scan to find first/last occurrence).

Comparative Summary

Linear Search
- Pros: no assumptions; constant memory $O(1)$ ; works on small lists.
- Cons: $O(n)$ time.
Binary Search
- Pros: $O(\log n)$ time; scalable.
- Cons: Require sorted input; may not suit dynamic datasets that change frequently.

Worksheet-Style Applications & Tracing

Trace Table Variables (Binary Search Example key = 95)
- Columns: low, high, condition low <= high, mid, arr[mid], key = arr[mid], OUTPUT (TRUE/FALSE / continue).
- Populate row-by-row while running algorithm.
Trace Tree Diagram (Recursive)
- Nodes labelled with subset ar[low..high] + computed mid; leaves represent base cases.

Practical Programming Exercise – Inventory Management

File inventory.txt
- Each line: ProductID,ProductName,Quantity (comma-separated).
- Example entries: 101,Apple,50, 102,Raddish,100, 103,Orange,75.

Task 1 – Iterative Linear Search in Python

Function signature: linear_search(target_name: str, filename: str) -> int
Steps:
1. Read file line-by-line; parse into (id, name, qty).
2. Compare name to target_name sequentially.
3. Return qty if found; else return $-1$ .

Task 2 – Binary Search in Python

Function signature: binary_search(target_name: str, filename: str) -> int
Steps:
1. Read file and load tuples (name, qty) into list.
2. Sort list lexicographically by name.
3. Apply iterative/recursive binary search on sorted list.
4. Return qty when found, else $-1$ .

Task 3 – Main Program Flow

Prompt user once for product name.
Call linear_search; display result.
Call binary_search; display result.
Assumptions: file well-formatted, unique product names.

Key Equations & Complexity Results

Midpoint (integer index): $\text{mid} = \big\lfloor \text{low} + \frac{\text{high} - \text{low}}{2} \big\rfloor$
Linear search worst-case comparisons: $n$ → $\Theta(n)$ .
Binary search worst-case comparisons: $\lceil \log_2 n \rceil$ → $\Theta(\log n)$ .

Ethical & Practical Implications

Data Integrity: Always validate file inputs (inventory) to avoid incorrect search results.
Performance vs. Simplicity Trade-Off: Choose algorithm appropriate to dataset size and mutability.
User Experience: Immediate feedback for small arrays may not justify sorting overhead.
Future Extensibility: Hash tables (covered later) may offer $O(1)$ average search but introduce collision management complexities.