Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Longest Repeating Subarray (With Overlapping)

Given an array of arbitrary numbers, I am looking to quickly [with time and space complexity lower than O(n²) at minimum, n = length of array] find the longest subarray (a contiguous segment of elements within array) that is repeated (appears more than once, or at least twice).
Please don't assume any bounds on the numbers in the array apart from all the values are valid JavaScript numbers and isFinite is true for every number in the array.

Overlapping is allowed, that is partial occurrences of a subarray within itself is allowed. for ex, in the array [1, 2, 1, 2, 1, 2] the longest repeating subarray is [1, 2, 1, 2]. Look here

[1, 2, 1, 2, 1, 2]
^         ^        occurrence #1
       ^        ^  occurrence #2

The two occurrences overlap, but that is a-ok as long as the two occurrence differ in either starting or ending index.

In case there are multiple distinct repeated subarrays that all have the longest length, the answer can be any of them. However, I suspect that any way that can find one answer can find all of them with the same ease.
A sample for this scenario is the array [1, 1, 2, 2]. The subarray [1] and [2] both appear twice. Either of those may be returned as the result.

Finally, if no subarray occurs more than once, the answer is [] (the empty array).

I recently came across the question Find Repeat Sequence in an Array but all the answers there are ridiculously slow. Everyone's using o(n³) solutions, with only one answer in o(n²).

I'm seeking a method that can handle arrays with length up to a couple hundred thousand (100,000) in a few seconds. Sub quadratic time complexity at least. It'd be great if you can provide a sample implementation in JS of the solution too because my understanding of some data structures is not that solid

Sorry for the long question, let me know if anything is unclear. If you have an idea but not a full solution just drop it in the comments, it can still be helpful to me.

I made some more examples if it helps:

Array Result
[1, 2, 5, 7, 1, 2, 4, 5] [1, 2]
[9, 7, 0, 1, 6, 3, 6, 5, 2, 1, 6, 3, 8, 3, 6, 1, 6, 3] [1, 6, 3]
[1, 2, 1, 2, 7, 0, -1, 7, 0, -1] [7, 0, -1]
[1, 1, 1, 1] [1, 1, 1]
[1, 1, 1] [1, 1]
[1, 2, 3, 4, 2, 5] [2]
[1, 2, 3, 4, 5] []
[1, 6, 2, 1, 6, 2, 1, 5] [1, 6, 2, 1]

Here's a function to generate large test case with array size 100,000 for reference, the answer is [1, 2, 3, ..., 100] (the 100 consecutive integers from 1 to 100)

function make_case() {
    let array = [];
    let number = 200;
    for (let i = 0; i < 500; i++) {
        array.push(number); array.push(number);
        number++;
    }
    for (let i = 1; i < 101; i++) {
        array.push(i);
    }
    for (let i = 0; i < 1700; i++) {
        array.push(number); array.push(number);
        number++;
    }
    for (let i = 1; i < 101; i++) {
        array.push(i);
    }
    for (let i = 0; i < (100_000 - 500 * 2 - 100 - 1700 * 2 - 100) / 2; i++) {
        array.push(number); array.push(number);
        number++;
    }
    return array;
}

Another important case is new Array(100_000).fill(0), the answer is the array with 99,999 zeros.

like image 283
user23274861 Avatar asked Jun 24 '26 00:06

user23274861


1 Answers

This can be solved by building a suffix array over a general alphabet. The longest repeating subarray will be the longest common prefix (LCP) between some two adjacent elements in the suffix array.

The following method has a time complexity of O(N log^2 (N)) and space complexity of O(N). The running time is dominated by the creation of the suffix array, so it can optimized further by using a faster suffix array construction algorithm, e.g. DC3.

function longestRepeatingSubarray(arr) {
    const n = arr.length, suff = [...arr.keys()], rank = [...arr], lcp = Array(n), tempRank = Array(n);
    tempRank[0] = 0;
    for (let jump = 1; jump < n; jump <<= 1) {
        const comp = (i, j) => rank[i] - rank[j] || (Math.max(i, j) + jump < n ? rank[i + jump] - rank[j + jump] : j - i);
        suff.sort(comp);
        for (let i = 1; i < n; i++) tempRank[i] = tempRank[i - 1] + !!comp(suff[i - 1], suff[i]);
        for (let i = 0; i < n; i++) rank[suff[i]] = tempRank[i];
    }
    for (let i = 0, k = 0; i < n; i++)
        if (rank[i] !== n - 1) {
            for (let j = suff[rank[i] + 1]; arr[i + k] === arr[j + k];) ++k;
            lcp[rank[i]] = k;
            if (k) --k;
        }
    let best = 0;
    for (let i = 1; i < n - 1; i++) if (lcp[i] > lcp[best]) best = i;
    return arr.slice(suff[best], suff[best] + lcp[best]);
}

console.log(...longestRepeatingSubarray([1, 2, 5, 7, 1, 2, 4, 5]));
console.log(...longestRepeatingSubarray([1, 2, 1, 2, 7, 0, -1, 7, 0, -1]));
function make_case(){let $=[],u=200;for(let e=0;e<500;e++)$.push(u),$.push(u),u++;for(let s=1;s<101;s++)$.push(s);for(let h=0;h<1700;h++)$.push(u),$.push(u),u++;for(let p=1;p<101;p++)$.push(p);for(let t=0;t<47700;t++)$.push(u),$.push(u),u++;return $}
const arr = make_case();
console.time(`${arr.length} elements`);
console.log(...longestRepeatingSubarray(arr));
console.timeEnd(`${arr.length} elements`);
like image 115
Unmitigated Avatar answered Jun 26 '26 13:06

Unmitigated