Exploring the Rabin-Karp Algorithm in Java

The Rabin-Karp algorithm is a string searching algorithm that efficiently finds occurrences of a pattern within a larger text. It does this by using hashing to compare the pattern with substrings of the text. Let's walk through it step by step, with an implementation and explanation in Java.

Understanding the Rabin-Karp Algorithm:

The algorithm computes a hash of the pattern and compares it with the hash of each substring of the text that has the same length. Because a rolling hash lets each substring hash be derived from the previous one in constant time, the algorithm avoids comparing the pattern character by character against every substring. A full character comparison is needed only when the hash values match.

  1. Hashing:
    The first step in the Rabin-Karp algorithm is to establish a hash function. We need a hash function that can efficiently compute the hash value of the pattern and subsequent substrings of the same length in the text. The choice of a good hash function plays a crucial role in the effectiveness of the algorithm.
  2. Rolling Hash:
    To efficiently calculate the hash value of each substring, the Rabin-Karp algorithm uses a rolling hash technique. It calculates the hash value of a substring based on the hash value of the previous substring. By doing so, it avoids redundant calculations and achieves better performance.
  3. Hash Comparisons:
    Once we have the hash values of the pattern and the substrings, we compare them to identify potential matches. However, due to the possibility of hash collisions (different substrings producing the same hash value), we need to verify potential matches character by character to ensure accurate results.
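The three steps above can be sketched in isolation before looking at the full search. The following is a minimal illustration, not part of the implementation below: the class name RollingHashDemo, the constants BASE and PRIME, and the helper names hash and roll are assumptions chosen for this example. It computes a window hash from scratch, then updates it in constant time, and prints both values so they can be compared.

```java
public class RollingHashDemo {
    static final int BASE = 256;  // assumed base, matching the alphabet size used in this article
    static final int PRIME = 101; // assumed small prime modulus

    // Hash of the window s[start .. start + len - 1], computed from scratch.
    public static int hash(String s, int start, int len) {
        int h = 0;
        for (int i = 0; i < len; i++) {
            h = (BASE * h + s.charAt(start + i)) % PRIME;
        }
        return h;
    }

    // Slide the window one position to the right in O(1):
    // drop the outgoing character, shift, and append the incoming one.
    // highPow must equal BASE^(len - 1) mod PRIME.
    public static int roll(int oldHash, char outgoing, char incoming, int highPow) {
        int h = (BASE * (oldHash - outgoing * highPow) + incoming) % PRIME;
        return h < 0 ? h + PRIME : h; // Java's % can return a negative value
    }

    public static void main(String[] args) {
        String text = "ABABDABACD";
        int len = 3;

        int highPow = 1;
        for (int i = 0; i < len - 1; i++) {
            highPow = (highPow * BASE) % PRIME;
        }

        int rolling = hash(text, 0, len);
        for (int i = 0; i + len < text.length(); i++) {
            rolling = roll(rolling, text.charAt(i), text.charAt(i + len), highPow);
            int fresh = hash(text, i + 1, len);
            // The rolled value and the freshly computed value should agree.
            System.out.println(text.substring(i + 1, i + 1 + len)
                    + " rolled=" + rolling + " fresh=" + fresh);
        }
    }
}
```

The point of the comparison is that roll does a fixed amount of work per position, while hash does work proportional to the window length.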

Java Implementation with Step-by-Step Explanation:
Let’s now dive into the Java implementation of the Rabin-Karp algorithm, explaining each step in detail:

public class RabinKarp {
    private int prime = 101; // A prime number used in the hash function
    private int d = 256; // Number of characters in the input alphabet

    public void search(String pattern, String text) {
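        int patternLength = pattern.length();
        int textLength = text.length();

        // A pattern longer than the text cannot match; returning early also
        // keeps the initial hash loop from reading past the end of the text
        if (patternLength > textLength)
            return;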
        int patternHash = 0; // Hash value for pattern
        int textHash = 0; // Hash value for text
        int h = 1;

        // Calculate the value of h
        for (int i = 0; i < patternLength - 1; i++)
            h = (h * d) % prime;

        // Calculate the hash value of the pattern and the first substring of the text
        for (int i = 0; i < patternLength; i++) {
            patternHash = (d * patternHash + pattern.charAt(i)) % prime;
            textHash = (d * textHash + text.charAt(i)) % prime;
        }

        // Slide the pattern over the text one by one
        for (int i = 0; i <= textLength - patternLength; i++) {
            // Check if the hash values match
            if (patternHash == textHash) {
                // Verify the potential match character by character
                int j;
                for (j = 0; j < patternLength; j++) {
                    if (text.charAt(i + j) != pattern.charAt(j))
                        break;
                }
                if (j == patternLength)
                    System.out.println("Pattern found at index " + i);
            }

            // Calculate the hash value for the next substring of text
            if (i < textLength - patternLength) {
                textHash = (d * (textHash - text.charAt(i) * h) + text.charAt(i + patternLength)) % prime;

                // In case of a negative hash value, convert it to positive
                if (textHash < 0)
                    textHash += prime;
            }
        }
    }

    public static void main(String[] args) {
        String text = "ABABDABACDABABCABAB";
        String pattern = "ABABCABAB";
        RabinKarp rabinKarp = new RabinKarp();
        rabinKarp.search(pattern, text);
    }
}
  1. We start by defining the RabinKarp class and initializing the prime number (prime) and the number of characters in the input alphabet (d).
  2. The search method takes the pattern and text as input and performs the Rabin-Karp algorithm. It initializes variables such as pattern length, text length, pattern hash, text hash, and a rolling hash factor (h).
  3. We calculate the rolling hash factor h, which equals d raised to the power patternLength - 1, modulo prime. It is built up by multiplying by d and reducing modulo prime on each iteration, and it represents the weight of the leading character of a window.
  4. Next, we calculate the initial hash values for the pattern and the first substring of the text. Both are computed from scratch with the same hash function, one character at a time.
  5. We slide the pattern over the text one position at a time and compare the hash values. If they match, we verify the potential match character by character to avoid false positives due to hash collisions.
  6. To calculate the hash value for the next substring of the text, we use the rolling hash technique again. We subtract the contribution of the first character of the previous substring, multiply the remaining hash value by d, and add the next character of the text. We adjust the hash value to be positive if it becomes negative.
  7. Finally, in the main method, we create an instance of the RabinKarp class, define a sample text and pattern, and invoke the search method to find occurrences of the pattern within the text.
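Printing from inside search is convenient for a demo, but callers often want the match positions back. The following is a sketch of one possible variant under a few assumptions: the class name RabinKarpIndices and the method findAll are hypothetical, the method treats an empty pattern as having no matches, and it uses String.regionMatches for the verification step instead of a hand-written loop.

```java
import java.util.ArrayList;
import java.util.List;

public class RabinKarpIndices {
    private static final int PRIME = 101; // same prime as in the listing above
    private static final int BASE = 256;  // same base (d) as in the listing above

    // Returns the start index of every occurrence of pattern in text.
    public static List<Integer> findAll(String pattern, String text) {
        List<Integer> matches = new ArrayList<>();
        int m = pattern.length();
        int n = text.length();
        // Assumption for this sketch: an empty or too-long pattern yields no matches.
        if (m == 0 || m > n) {
            return matches;
        }

        // h = BASE^(m - 1) mod PRIME, the weight of the leading character
        int h = 1;
        for (int i = 0; i < m - 1; i++) {
            h = (h * BASE) % PRIME;
        }

        int patternHash = 0;
        int textHash = 0;
        for (int i = 0; i < m; i++) {
            patternHash = (BASE * patternHash + pattern.charAt(i)) % PRIME;
            textHash = (BASE * textHash + text.charAt(i)) % PRIME;
        }

        for (int i = 0; i <= n - m; i++) {
            // Equal hashes only mark a candidate; regionMatches confirms it.
            if (patternHash == textHash && text.regionMatches(i, pattern, 0, m)) {
                matches.add(i);
            }
            if (i < n - m) {
                textHash = (BASE * (textHash - text.charAt(i) * h) + text.charAt(i + m)) % PRIME;
                if (textHash < 0) {
                    textHash += PRIME;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Same sample input as the main method above
        System.out.println(findAll("ABABCABAB", "ABABDABACDABABCABAB")); // prints [10]
    }
}
```

For the sample input the only occurrence starts at index 10, which agrees with the "Pattern found at index 10" line printed by the original search method.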

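The Rabin-Karp algorithm provides an efficient approach to searching for patterns within a text. By combining hashing with a rolling update, it replaces most character-by-character comparisons with a single integer comparison per position, which gives an expected running time of O(n + m) for a text of length n and a pattern of length m. In the worst case, when many hash values collide, it degrades to O(nm), so for real workloads a prime larger than the 101 used here is advisable, with the arithmetic widened to long to avoid overflow. In this blog post, we explored the inner workings of the Rabin-Karp algorithm and provided a step-by-step explanation of its implementation in Java. You can now apply this algorithm to search for patterns in your Java applications.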
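As a rough way to see how much the hash comparison filters out, the sketch below counts how many positions pass the hash check and how many of those are real matches, for a modulus supplied by the caller. The class name HashHitCounter and the method count are hypothetical, and the arithmetic uses long so that a larger prime does not overflow. The difference between the two counts is the number of collisions that needed a wasted verification.

```java
public class HashHitCounter {
    // Returns {hashHits, verifiedMatches} for the given prime modulus.
    public static int[] count(String pattern, String text, long prime) {
        final long base = 256; // same base (d) as in the listing above
        int m = pattern.length();
        int n = text.length();
        int hits = 0;
        int matches = 0;
        if (m == 0 || m > n) {
            return new int[] {0, 0};
        }

        // h = base^(m - 1) mod prime
        long h = 1;
        for (int i = 0; i < m - 1; i++) {
            h = (h * base) % prime;
        }

        long patternHash = 0;
        long textHash = 0;
        for (int i = 0; i < m; i++) {
            patternHash = (base * patternHash + pattern.charAt(i)) % prime;
            textHash = (base * textHash + text.charAt(i)) % prime;
        }

        for (int i = 0; i <= n - m; i++) {
            if (patternHash == textHash) {
                hits++; // candidate position: hashes agree
                if (text.regionMatches(i, pattern, 0, m)) {
                    matches++; // confirmed by character comparison
                }
            }
            if (i < n - m) {
                textHash = (base * (textHash - text.charAt(i) * h) + text.charAt(i + m)) % prime;
                if (textHash < 0) {
                    textHash += prime;
                }
            }
        }
        return new int[] {hits, matches};
    }

    public static void main(String[] args) {
        String text = "ABABDABACDABABCABAB";
        String pattern = "ABABCABAB";
        // 101 is the article's prime; 1_000_000_007 is a commonly used larger prime.
        for (long prime : new long[] {101L, 1_000_000_007L}) {
            int[] result = count(pattern, text, prime);
            System.out.println("prime=" + prime
                    + " hashHits=" + result[0]
                    + " verifiedMatches=" + result[1]);
        }
    }
}
```

On longer inputs, a small modulus such as 101 should produce noticeably more hash hits than verified matches, while a large prime should keep the two counts close.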
