Galil Rule

Galil Rule Introduction

Instructor: Dr. Mary Hudachek-Buswell
Developed by: Quill Healey
Contributions by: Ivan Leung, Rena Li, Mai Anh Lê Hoàng, Ruston Shome

Prerequisites for Galil Rule

Must have implemented the Boyer-Moore algorithm.
Must have implemented the Failure Table used in Knuth-Morris-Pratt (KMP) algorithm.
Reviewing these concepts before proceeding is highly recommended.

Historical Context

The Galil Rule was developed by Zvi Galil, who served as the former Dean of the College of Computing.
The foundational article is titled "On improving the worst case running time of the Boyer-Moore string matching algorithm" and was published by ACM in 1979.
The Galil Rule linearizes the worst-case runtime of the full Boyer-Moore algorithm.

Boyer-Moore Algorithm and Heuristics

The Boyer-Moore algorithm employs two heuristics:
- Bad Character Heuristic
- Good Suffix Heuristic
In this course, we focus on implementing the partial Boyer-Moore algorithm, which utilizes only the Bad Character heuristic.
The addition of the Galil Rule heuristic to the full Boyer-Moore algorithm linearizes its worst-case runtime, and similarly, it also linearizes the worst-case in certain scenarios when added to the partial version of Boyer-Moore.

Overview of the Galil Rule

The Galil Rule modifies the Boyer-Moore algorithm after a complete match has been found.
Traditional Boyer-Moore shifts the pattern over by only one character after a complete match; the Galil Rule enhances this by allowing for a more intelligent shift of the pattern.
This improvement allows the Boyer-Moore algorithm to approach linear time complexity in specific cases.

Understanding Repeated Prefixes

Definition: The period of a string $s1$ is defined as the length of the shortest prefix of $s1$ such that if a new string $s2$ is formed by repeating this prefix, then $s1$ would be a prefix of $s_2$.
Key Insight: When a full match is found, the Galil Rule leverages the period of the pattern to minimize unnecessary comparisons.

Example 1: Identifying the Period

Pattern: "abababa"
The prefix that fulfills the period condition is identified as "ab", which can be repeatedly concatenated to create the infinite string: "abababababababa…".
Therefore, the period of the pattern is the length of "ab", which is $2$.

Example 2: Identifying the Period

Pattern: "abcabcabc"
The shortest prefix fulfilling the period condition here is "abc", allowing the construction of the repeated string: "abcabcabcabcabcabc".
The period length is therefore $3$.

Example 3: Identifying the Period

Pattern: "abcdab"
No prefix shorter than "abcd" can be used to form a suitable string where the given pattern is a prefix.
Thus, the period is the length of "abcd", which is $4$.

Calculating the Period

To compute the period of a pattern efficiently, we must perform it in $O(m)$ time, where $m$ is the length of the pattern.
The failure table from the KMP algorithm can be utilized for this computation.
Period Calculation Formula: $k = m - ext{ft}[m - 1]$, where $ ext{ft}$ denotes the failure table.

Example 1: Computing k

Pattern: "abababa"
Failure Table:
i
0
1
2
3
4
5
6

ft[i]
0
0
1
2
3
4
5
Calculate:
- $k = m - ext{ft}[m - 1]$
- Where $m = 7$ and $ ext{ft}[6] = 5$, thus:
- $k = 7 - 5 = 2$
This suggests we can repetitively use the prefix "ab".

i	0	1	2	3	4	5	6
ft[i]	0	0	1	2	3	4	5
Calculate:

Example 2: Computing k

Pattern: "abcabcabc"
Failure Table:
i
0
1
2
3
4
5
6
7
8

ft[i]
0
0
0
1
2
3
4
5
6
Calculate:
- $k = m - ext{ft}[m - 1] = 9 - 6 = 3$

i	0	1	2	3	4	5	6	7	8
ft[i]	0	0	0	1	2	3	4	5	6
Calculate:

Example 3: Computing k

Pattern: "abcdab"
Failure Table:
i
0
1
2
3
4
5

ft[i]
0
0
0
0
1
2
Calculate:
- $k = m - ext{ft}[m - 1] = 6 - 2 = 4$

i	0	1	2	3	4	5
ft[i]	0	0	0	0	1	2
Calculate:

Applying the Galil Rule

The Galil Rule can be applied following a complete match only.
Upon a mismatch, the Galil Rule does not apply, necessitating a return to the last occurrence table for shifting the pattern.

Example Demonstration of the Galil Rule

Text: "abcdeabcdabcdeabcdabcdeabcd", where $n=27$
Pattern: "abcdeabcd", where $m=9$.
Failure Table:
i
0
0
0
0
0
1
2
3
4

ft[i]
0
0
0
0
0
1
2
3
4
Compute k:
$k = m - ext{ft}[m - 1] = 9 - 4 = 5$.
The subsequent comparisons and realignments can be counted for the process of applying the Galil rule.
No Matches Scenario
- The Galil Rule enhances the Boyer-Moore algorithm efficiency when a pattern appears at least once.
- In instances where there are no full matches, reliance on the Galil Rule is not possible, reverting to a worst-case time complexity of $O(mn)$.
Algorithm for Galil Rule Application
- Step 1: Set $n$ as the length of the text and $m$ as the pattern length.
- Step 2: Initialize matches as an empty list.
- Step 3: Calculate $k = m - ext{failureTable}[m - 1]$.
- Step 4: Initialize $i = 0, w = 0$.
- Step 5: While $i ≤ n - m$:
  - Step 6: Set $j = m - 1$.
  - Step 7: While $j ≥ w$ and pattern[j] = text[i + j]:
  - Step 8: Decrement $j$ by 1.
  - Step 9: End While.
  - Step 10: If $j < w$, then:
  - Step 11: Add $i$ to matches.
  - Step 12: Set $w = m - k$.
  - Step 13: Increment $i$ by $k$.
  - Step 14: Else:
  - Step 15: Set $w = 0$.
  - Step 16: Shift using the last occurrence table.
  - Step 17: End Else.
  - Step 18: End While.

i	0	0	0	0	0	1	2	3	4
ft[i]	0	0	0	0	0	1	2	3	4
Compute k:

$k = m - ext{ft}[m - 1] = 9 - 4 = 5$.
The subsequent comparisons and realignments can be counted for the process of applying the Galil rule.

No Matches Scenario

Galil Rule

Galil Rule Introduction

Prerequisites for Galil Rule

Historical Context

Boyer-Moore Algorithm and Heuristics

Overview of the Galil Rule

Understanding Repeated Prefixes

Example 1: Identifying the Period

Example 2: Identifying the Period

Example 3: Identifying the Period

Calculating the Period

Example 1: Computing k

Example 2: Computing k

Example 3: Computing k

Applying the Galil Rule

Example Demonstration of the Galil Rule

No Matches Scenario

Algorithm for Galil Rule Application