Trying to solve the following problem:
Given a string of arbitrary length, find the longest substring that occurs more than one time within the string, with no overlaps.
For example, if the input string was ABCABCAB, the correct output would be ABC.  You couldn't say ABCAB, because that only occurs twice where the two substrings overlap, which is not allowed.
Is there any way to solve this reasonably quickly for strings containing a few thousand characters?
(And before anyone asks, this is not homework. I'm looking at ways to optimize the rendering of Lindenmayer fractals, because they tend to take excessive amounts of time to draw at high iteration levels with a naive turtle graphics system.)
Here's an example for a string of length 11, which you can generalize
Set chunk length to floor(11/2) = 5
Scan the string in chunks of 5 characters left to looking for repeats. There will be 3 comparisons
    Left      Right
    Offset    Offset
     0         5
     0         6
     1         5
Here's some (obviously untested) pseudocode:
String s
int len = floor(s.length/2)
for int i=len; i>0; i--
    for j=0; j<=len-(2*i); j++
        for k=j+i; k<=len-i; k++
            if s.substr(j,j+i) == s.substr(k,k+i)
                return s.substr(j,j+i)
return null
There may be an off-by-one error in there, but the approach should be sound (and minimal).
it looks like a suffix tree problem. Create the suffix tree, then find the biggest compressed branch with more than one child (occurs more than once in the original string). The number of letters in that compressed branch should be the size of the biggest subsequence.
i found something similar here: http://www.coderanch.com/t/370396/java/java/Algorithm-wanted-longest-repeating-substring
Looks like it can be done in O(n).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With