Galil Rule

Galil Rule Introduction

  • Instructor: Dr. Mary Hudachek-Buswell

  • Developed by: Quill Healey

  • Contributions by: Ivan Leung, Rena Li, Mai Anh Lê Hoàng, Ruston Shome

Prerequisites for Galil Rule

  • Must have implemented the Boyer-Moore algorithm.

  • Must have implemented the Failure Table used in Knuth-Morris-Pratt (KMP) algorithm.

  • Reviewing these concepts before proceeding is highly recommended.

Historical Context

  • The Galil Rule was developed by Zvi Galil, who served as the former Dean of the College of Computing.

  • The foundational article is titled "On improving the worst case running time of the Boyer-Moore string matching algorithm" and was published by ACM in 1979.

  • The Galil Rule linearizes the worst-case runtime of the full Boyer-Moore algorithm.

Boyer-Moore Algorithm and Heuristics

  • The Boyer-Moore algorithm employs two heuristics:

    • Bad Character Heuristic

    • Good Suffix Heuristic

  • In this course, we focus on implementing the partial Boyer-Moore algorithm, which utilizes only the Bad Character heuristic.

  • The addition of the Galil Rule heuristic to the full Boyer-Moore algorithm linearizes its worst-case runtime, and similarly, it also linearizes the worst-case in certain scenarios when added to the partial version of Boyer-Moore.

Overview of the Galil Rule

  • The Galil Rule modifies the Boyer-Moore algorithm after a complete match has been found.

  • Traditional Boyer-Moore shifts the pattern over by only one character after a complete match; the Galil Rule enhances this by allowing for a more intelligent shift of the pattern.

  • This improvement allows the Boyer-Moore algorithm to approach linear time complexity in specific cases.

Understanding Repeated Prefixes

  • Definition: The period of a string $s1$ is defined as the length of the shortest prefix of $s1$ such that if a new string $s2$ is formed by repeating this prefix, then $s1$ would be a prefix of $s_2$.

  • Key Insight: When a full match is found, the Galil Rule leverages the period of the pattern to minimize unnecessary comparisons.

Example 1: Identifying the Period

  • Pattern: "abababa"

  • The prefix that fulfills the period condition is identified as "ab", which can be repeatedly concatenated to create the infinite string: "abababababababa…".

  • Therefore, the period of the pattern is the length of "ab", which is $2$.

Example 2: Identifying the Period

  • Pattern: "abcabcabc"

  • The shortest prefix fulfilling the period condition here is "abc", allowing the construction of the repeated string: "abcabcabcabcabcabc".

  • The period length is therefore $3$.

Example 3: Identifying the Period

  • Pattern: "abcdab"

  • No prefix shorter than "abcd" can be used to form a suitable string where the given pattern is a prefix.

  • Thus, the period is the length of "abcd", which is $4$.

Calculating the Period

  • To compute the period of a pattern efficiently, we must perform it in $O(m)$ time, where $m$ is the length of the pattern.

  • The failure table from the KMP algorithm can be utilized for this computation.

  • Period Calculation Formula: $k = m - ext{ft}[m - 1]$, where $ ext{ft}$ denotes the failure table.

Example 1: Computing k

  • Pattern: "abababa"

  • Failure Table:

    i

    0

    1

    2

    3

    4

    5

    6


    ft[i]

    0

    0

    1

    2

    3

    4

    5

    • Calculate:

    • $k = m - ext{ft}[m - 1]$

    • Where $m = 7$ and $ ext{ft}[6] = 5$, thus:

    • $k = 7 - 5 = 2$

  • This suggests we can repetitively use the prefix "ab".

Example 2: Computing k

  • Pattern: "abcabcabc"

  • Failure Table:

    i

    0

    1

    2

    3

    4

    5

    6

    7

    8


    ft[i]

    0

    0

    0

    1

    2

    3

    4

    5

    6

    • Calculate:

    • $k = m - ext{ft}[m - 1] = 9 - 6 = 3$

Example 3: Computing k

  • Pattern: "abcdab"

  • Failure Table:

    i

    0

    1

    2

    3

    4

    5


    ft[i]

    0

    0

    0

    0

    1

    2

    • Calculate:

    • $k = m - ext{ft}[m - 1] = 6 - 2 = 4$

Applying the Galil Rule

  • The Galil Rule can be applied following a complete match only.

  • Upon a mismatch, the Galil Rule does not apply, necessitating a return to the last occurrence table for shifting the pattern.

Example Demonstration of the Galil Rule


  • Text: "abcdeabcdabcdeabcdabcdeabcd", where $n=27$


  • Pattern: "abcdeabcd", where $m=9$.


  • Failure Table:

    i

    0

    0

    0

    0

    0

    1

    2

    3

    4


    ft[i]

    0

    0

    0

    0

    0

    1

    2

    3

    4

    • Compute k:

    • $k = m - ext{ft}[m - 1] = 9 - 4 = 5$.

    • The subsequent comparisons and realignments can be counted for the process of applying the Galil rule.

    No Matches Scenario

    • The Galil Rule enhances the Boyer-Moore algorithm efficiency when a pattern appears at least once.

    • In instances where there are no full matches, reliance on the Galil Rule is not possible, reverting to a worst-case time complexity of $O(mn)$.

    Algorithm for Galil Rule Application

    • Step 1: Set $n$ as the length of the text and $m$ as the pattern length.

    • Step 2: Initialize matches as an empty list.

    • Step 3: Calculate $k = m - ext{failureTable}[m - 1]$.

    • Step 4: Initialize $i = 0, w = 0$.

    • Step 5: While $i ≤ n - m$:

      • Step 6: Set $j = m - 1$.

      • Step 7: While $j ≥ w$ and pattern[j] = text[i + j]:

      • Step 8: Decrement $j$ by 1.

      • Step 9: End While.

      • Step 10: If $j < w$, then:

      • Step 11: Add $i$ to matches.

      • Step 12: Set $w = m - k$.

      • Step 13: Increment $i$ by $k$.

      • Step 14: Else:

      • Step 15: Set $w = 0$.

      • Step 16: Shift using the last occurrence table.

      • Step 17: End Else.

      • Step 18: End While.