SCC.121: Linear and Binary Searching

SCC.121: Fundamentals of Computer Science

Linear and Binary Searching

Instructor Information
  • Name: Amit Chopra

  • Email: amit.chopra@lancaster.ac.uk

Data Storage and Retrieval

  • Fundamental operations:

    • Put data item in store: This is the action of placing data into a storage medium.

    • Retrieve specific item: This process involves fetching a particular piece of data from storage.

  • Performance Characteristics:

    • If storage is quick: Retrieval will likely be slow.

    • If storage is careful: Retrieval will be fast.

  • Storage Methods:

    • Fast storage: For instance, appending the latest data at the end of a list is quick.

    • Slow retrieval:

    • Requires a linear scan to find a specific item.

    • Maintaining data in sorted order makes it slower because it necessitates identifying the precise position for insertion, which may require additional arrangements depending on the data structure, such as arrays versus linked lists.

    • Fast retrieval: Achieved through algorithms like binary search.

Theoretical Time Complexity

  • Definition: Theoretical time complexity defines the relationship between the size of the input and the algorithm's running time.

  • Input Size Representation: Let N represent the input size.

  • Characterizing Runtime: The runtime of the algorithm is characterized as a function of N, which helps evaluate performance as data size increases. It answers questions such as:

    • If the amount of data is doubled, does the algorithm take twice as long to run? If yes, that is a reasonable scenario.

    • If it takes much longer than twice as long, it is considered inefficient.

    • If it takes much less time than twice as long, the algorithm is efficient.

Big-O Classes

  • Definition: Big O notation captures the upper bound on time required for input processing, classifying algorithms into distinct complexity classes based on their performance.

  • Complexity Classifications:

    • O(1) – Constant time: The execution time doesn’t change regardless of N.

    • O(log N) – Logarithmic time: Achieves efficiency better than linear time.

    • O(N) – Linear time: Performance grows in direct proportion to N.

    • O(N log N) – Worse than linear time, commonly found in efficient sorting algorithms.

    • O(N²) – Quadratic time: Time complexity scales with the square of the input size.

    • O(N³) – Cubic time: Time complexity scales with the cube of the input size.

    • O(2^N) – Exponential time: A very slow growth rate making it impractical for large inputs.

Linear Searching: Process

  • Objective: To find the index of the first occurrence of integer x within an unsorted linear array A containing N integers.

  • Return value: Return -1 if x is not present in the array.

  • Algorithm:

    • Pseudocode:
      ls(a[], x) { for i=0 to a.length { if(a[i] == x) return i; } return -1; }

Linear Search Algorithm

  • Input Size: Let N denote the size of the input array.

  • Complexity: The linear search algorithm has a time complexity of O(N), indicating linear complexity where runtime increases directly with the input size.

  • Question: Can we improve on this method?

Binary Search

  • Usage: Binary search can only be used on sorted arrays and demonstrates better theoretical time complexity than linear searching.

  • Linear search complexity remains at O(N) even for sorted inputs.

  • Process to find X in sorted array A:

    1. Identify the mid-point M of A and check A[M]:

    • If $X = A[M]$, X is found.

    • If $size(A) = 1$ and $X
      eq A[M]$, X is not in A.

    • If $X < A[M]$, focus on the left half (LH) of A; repeat step 1.

    • If $X > A[M]$, focus on the right half (RH) of A; repeat step 1.

Binary Searching: Left and Right Halves

  • Searching Process:

    • If $X < A[M]$, search in the left half of A.

    • If $X > A[M]$, search in the right half of A.

  • Illustration:

    • For an array indexed from 0 to (N-1):

    • The left half consists of elements $A[0..M-1]$.

    • The right half consists of elements $A[M+1..N-1]$.

Practical Example: Searching for 6♦️

  • Analysis: In a given scenario, we only look at 4 cards while searching for a specific card (in this case, the 6 of diamonds).

  • Questions:

    • How many cards would we examine using linear search for:

    • 5♦️?

    • 3♦️?

    • 7♦️?

    • 4♦️?

Binary Searching: Algorithm

  • Implementing Binary Search:

// search for X in sorted array A 
int lo = 0;  
int hi = N-1;  
int mid;  
// next element to try 
while (lo <= hi) {  
    mid = (lo + hi)/2; // typically using floor function  
    if (A[mid] == X) return mid; 
    if (X < A[mid]) hi = mid - 1; // go left  
    else lo = mid + 1; // go right 
} 
return -1;  // not found 

Binary Searching: O(log₂N)

  • Efficiency: For input sizes of N=1,2,4,8,16,32,64,128… at most the following number of comparisons are made:

    • For N=1, 1 comparison.

    • For N=2, 2 comparisons.

    • For N=4, 3 comparisons.

    • For N=8, 4 comparisons.

    • This sequence follows: at most $(log_2 N) + 1$ comparisons are needed.

  • Example with N=15: The maximum comparisons needed would be 4.

Visual Representation of Binary Search

  • Structure: An indexed array with nodes representing elements being tested, and arrows indicating subsequent elements to test after a non-match.

Comparing Search Efficiencies

  • Conclusion:

    • Binary search complexity class: O(log N)

    • Linear search complexity class: O(N)

Summary
  • Efficient searching methods are critical in computer science for optimizing data retrieval processes.

  • Understanding and applying the correct algorithm based on input data structure and size can greatly influence performance outcomes.