Overview of Search Algorithms
- Definition of an Algorithm
- Finite set of instructions that completes a task and is guaranteed to terminate.
- Purpose of a Searching Algorithm
- Locate the position/index of a specific item in a data structure, or verify its absence.
- Search Strategies Covered in Chap 8
- Linear (Sequential) Search
- Binary Search
- Hash-Table Search (hash tables to be taught in Semester 2)
- Typical Scenarios Requiring Search
- Validate that an element exists in an array before use / processing.
- Find location in order to edit or delete an element.
- Retrieve information from a single-field array.
- Retrieve paired information across two parallel arrays.
Linear (Sequential) Search
- Conceptual Description
- Examine elements one-by-one from a chosen starting point, never skipping elements, till the target is found or the entire structure is exhausted.
- Possible Data Conditions
- Works on unsorted (unordered) and sorted (ordered) datasets.
- Simple to code; minimal overhead.
- Input/Process/Output Model
- Input:
- Search key (target)
- Data set (sequence)
- Process: Use linear walk to compare each element to key.
- Output:
- “Found” / “Not Found”
- Index or memory address when found.
Linear Search on an Unordered Data Set
- Descriptive Algorithm
- Read first element, compare with key, advance pointer
count while not at end and array element not NULL.
- Representative Pseudocode (0-based, 20-slot global array
ar)
FUNCTION linearSearchUnordered(key : STRING) RETURNS BOOLEAN
DECLARE found, size, count : BOOLEAN, INTEGER, INTEGER
found ← FALSE
count ← 0
size ← LENGTH(ar)
WHILE count < size AND ar[count] <> NULL
IF key = ar[count] THEN
found ← TRUE
RETURN found
ELSE
count ← count + 1
ENDIF
ENDWHILE
RETURN found
ENDFUNCTION
- Exercise 1 Highlights
- (a)
WHILE / REPEAT-UNTIL preferred over FOR because loop may need early exit once key is found; flexible stopping condition. - (b) Example array
["Dewi","Ali","Sean","Olaf","Cindi","Feng"] - Searching for "Olaf" ⇒ 4 iterations (indices 0–3 fail, index 3 success).
- Searching for "Dill" ⇒ 6 iterations (checks all 6 elements, not found).
- (c) Dry-run traces state/variable changes step-by-step.
Lexicographic Ordering Refresher (side note)
- Alphabetical / dictionary order of strings; examples: "ABC" < "ACB" < "BAC" …
Linear Search on an Ordered Data Set
- Optimisation Possibility
- Because data are sorted, search can terminate early if current array value exceeds the key (for ascending order).
- Descriptive Steps
- Start at first item.
- Compare current value with key.
- If equal ⇒ found.
- Else move to next item.
- Stop if last element reached or key < current value.
- Skeleton Pseudocode to Complete (Exercise 2)
DECLARE length, count : INTEGER
length ← LENGTH(ar)
count ← 0
WHILE count < length AND key >= ar[count]
IF ar[count] = key THEN
RETURN TRUE
ENDIF
count ← count + 1
ENDWHILE
RETURN FALSE
- Sorted Example Array
["Ali","Cindi","Dewi","Feng","Olaf","Sean"]- Searching "Dill" terminates when pointer reaches "Feng" (> "Dill").
Advantages & Disadvantages of Linear Search
- Advantages
- Conceptually simple; trivial to implement.
- Requires no prior sorting; works on any list.
- For very small lists, overhead of more complex algorithms may outweigh gains; linear search can be faster.
- Disadvantages
- Time complexity: Θ(n) comparisons in worst & average case.
- Inefficient for large datasets; must inspect every element if key absent.
Implementation Exercises (Page 5)
- Iterative pseudocode returning index or −1 (unsorted strings).
- Python iterative returning Boolean on sorted ascending integers.
- Python recursive returning index or −1 on sorted descending integers.
Binary Search
- Pre-Conditions
- Data must be sorted (ascending or descending consistent with comparison logic).
- Core Idea
- Repeatedly split search space in half by comparing middle element to key; discard half where key cannot lie.
- Midpoint Formula (integer indices)mid=⌊low+2high−low⌋
- Integer truncation (
INT) removes fractional part.
- Illustrative Example (Ascending array, key = 52)
- Initial low = 1, high = 14.
- mid = 7 → arr[7] = 70 (> 52) ⇒ search left sub-array.
- New low = 1, high = 6.
- mid = 3 → arr[3] = 18 (< 52) ⇒ search right sub-array.
- New low = 4, high = 6.
- mid = 5 → arr[5] = 34 (< 52).
- New low = 6, high = 6.
- mid = 6 → arr[6] = 52 ⇒ found.
Iterative Binary Search (Pseudocode)
FUNCTION BinarySearch_Iterative(search_key: INTEGER) RETURNS BOOLEAN
DECLARE low, high, mid : INTEGER
low ← 1 // using 1-based indexing
high ← LENGTH(arr) // arr sorted ascending
WHILE low <= high DO
mid ← INT((low + high)/2)
IF search_key = arr[mid] THEN
RETURN TRUE
ELSE IF search_key < arr[mid] THEN
high ← mid - 1
ELSE
low ← mid + 1
ENDIF
ENDWHILE
RETURN FALSE
ENDFUNCTION
Recursive Binary Search (Pseudocode)
FUNCTION binSearch_re(low, high, key: INTEGER, arr: ARRAY) RETURNS BOOLEAN
IF low > high THEN // base case: NOT FOUND
RETURN FALSE
ENDIF
DECLARE mid : INTEGER
mid ← INT(low + (high - low)/2)
IF arr[mid] = key THEN // base case: FOUND
RETURN TRUE
ENDIF
IF arr[mid] < key THEN
RETURN binSearch_re(mid + 1, high, key, arr)
ELSE // arr[mid] > key
RETURN binSearch_re(low, mid - 1, key, arr)
ENDIF
ENDFUNCTION
- Exercise Question (Page 9)
- Number of self-calls for key = 52 vs. 53 depends on array length; for the demo array above:
- key = 52 ⇒ 3 recursive calls (levels) before success.
- key = 53 ⇒ 4 recursive calls before hitting base case
low > high.
Advantages & Disadvantages of Binary Search
- Advantages
- Time complexity: Θ(log2n) comparisons.
- Drastically fewer checks for large n; e.g. n=1024 ⇒ at most ⌈log21024⌉=10 steps.
- Disadvantages
- Prerequisite: dataset must be sorted.
- Sorting overhead if original data unsorted.
- Duplicate keys complicate localisation (may revert to local linear scan to find first/last occurrence).
Comparative Summary
- Linear Search
- Pros: no assumptions; constant memory O(1); works on small lists.
- Cons: O(n) time.
- Binary Search
- Pros: O(logn) time; scalable.
- Cons: Require sorted input; may not suit dynamic datasets that change frequently.
Worksheet-Style Applications & Tracing
- Trace Table Variables (Binary Search Example key = 95)
- Columns:
low, high, condition low <= high, mid, arr[mid], key = arr[mid], OUTPUT (TRUE/FALSE / continue). - Populate row-by-row while running algorithm.
- Trace Tree Diagram (Recursive)
- Nodes labelled with subset
ar[low..high] + computed mid; leaves represent base cases.
Practical Programming Exercise – Inventory Management
- File inventory.txt
- Each line:
ProductID,ProductName,Quantity (comma-separated). - Example entries:
101,Apple,50, 102,Raddish,100, 103,Orange,75.
Task 1 – Iterative Linear Search in Python
- Function signature:
linear_search(target_name: str, filename: str) -> int - Steps:
- Read file line-by-line; parse into
(id, name, qty). - Compare
name to target_name sequentially. - Return
qty if found; else return −1.
Task 2 – Binary Search in Python
- Function signature:
binary_search(target_name: str, filename: str) -> int - Steps:
- Read file and load tuples
(name, qty) into list. - Sort list lexicographically by
name. - Apply iterative/recursive binary search on sorted list.
- Return
qty when found, else −1.
Task 3 – Main Program Flow
- Prompt user once for product name.
- Call
linear_search; display result. - Call
binary_search; display result. - Assumptions: file well-formatted, unique product names.
Key Equations & Complexity Results
- Midpoint (integer index): mid=⌊low+2high−low⌋
- Linear search worst-case comparisons: n → Θ(n).
- Binary search worst-case comparisons: ⌈log2n⌉ → Θ(logn).
Ethical & Practical Implications
- Data Integrity: Always validate file inputs (inventory) to avoid incorrect search results.
- Performance vs. Simplicity Trade-Off: Choose algorithm appropriate to dataset size and mutability.
- User Experience: Immediate feedback for small arrays may not justify sorting overhead.
- Future Extensibility: Hash tables (covered later) may offer O(1) average search but introduce collision management complexities.