Rabin-Karp Algorithm
Rabin-Karp algorithm is pretty useful tool in a pocket of an Engineer. Its use
cases are mostly finding repeated patterns in a string. For example: find all
repeated strings of length 3 in a string ABCDABCDDASDADABCDASDDDABCABC
.
The algorithm consists of following steps:
- Iterate the string from index 0 to
length - 3
(3 is a window size) - For each window compute a hash of the characters in the window
- Check if the hash was already used (for example store hashes in the hashset).
The trick here is to pick a hash function so that for next window we can, in
constant time, calculate the hash based on the previous hash and the letters which
are added and removed (during slide) to the window. This algorithm has O(N - L)
time complexity and O(N - L)
space complexity, which for constant L
(window
size), gives O(N)
for both time and space.
Example hash function which works pretty well is to pick some relatively large prime number, like 101. Then we convert every letter to ASCII character and for each letter we are multiplying it by our prime number. Then we are summing up all the multiplied characters.
Next, when we slide the window, all we need to do is:
- Get the character which fall out the window.
- Multiply it by our prime.
- Substitute the result from the hash of previous window.
- Get the new character in the window.
- Multiply it by out prime number.
- Add it to our hash which we got in step 3.
This way we are able to calculate Sliding Hash in constant time.