I'm trying to solve this MaxCollatzLength kata but I'm struggling to optimise it to run fast enough for really large numbers.
In this kata we will take a look at the length of collatz sequences. And how they evolve. Write a function that take a positive integer n and return the number between 1 and n that has the maximum Collatz sequence length and the maximum length. The output has to take the form of an array [number, maxLength] For exemple the Collatz sequence of 4 is [4,2,1], 3 is [3,10,5,16,8,4,2,1], 2 is [2,1], 1 is [ 1 ], so MaxCollatzLength(4) should return [3,8]. If n is not a positive integer, the function have to return [].
As you can see, numbers in Collatz sequences may exceed n. The last tests use random big numbers so you may consider some optimisation in your code:
You may get very unlucky and get only hard numbers: try submitting 2-3 times if it times out; if it still does, probably you need to optimize your code more;
Optimisation 1: when calculating the length of a sequence, if n is odd, what 3n+1 will be ?
Optimisation 2: when looping through 1 to n, take i such that i < n/2, what will be the length of the sequence for 2i ?
A recursive solution quickly blows the stack, so I'm using a while loop. I think I've understood and applied the first optimisation. I also spotted that for n that is a power of 2, the max length will be (log2 of n) + 1 (that only shaves off a very small amount of time for an arbirtarily large number). Finally I have memoised the collatz lengths computed so far to avoid recalculations.
I don't understand what is meant by the second optimisation, however. I've tried to notice a pattern with a few random samples and loops and I've plotted the max collatz lengths for n < 50000. I noticed it seems to roughly follow a curve but I don't know how to proceed - is this a red herring?
I'm ideally looking for a hints in the right direction so I can work towards the solution myself.
function collatz(n) {
let result = [];
while (n !== 1) {
result.push(n);
if (n % 2 === 0) n /= 2;
else {
n = n * 3 + 1;
result.push(n);
n = n / 2;
}
}
result.push(1);
return result;
}
function collatzLength(n) {
if (n <= 1) return 1;
if (!collatzLength.precomputed.hasOwnProperty(n)) {
// powers of 2 are logarithm2 + 1 long
if ((n & (n - 1)) === 0) {
collatzLength.precomputed[n] = Math.log2(n) + 1;
} else {
collatzLength.precomputed[n] = collatz(n).length;
}
}
return collatzLength.precomputed[n];
}
collatzLength.precomputed = {};
function MaxCollatzLength(n) {
if (typeof n !== 'number' || n === 0) return [];
let maxLen = 0;
let numeralWithMaxLen = Infinity;
while (n !== 0) {
let lengthOfN = collatzLength(n);
if (lengthOfN > maxLen) {
maxLen = lengthOfN;
numeralWithMaxLen = n;
}
n--;
}
return [numeralWithMaxLen, maxLen];
}
Memoization is the key to good performance here. You memoize the end results of the function that calculates the Collatz sequence. This will help you on repeated calls to maxCollatzLength
, but not when you determine the length of the sequence for the first time.
Also, as @j_random_hacker mentioned, there is no need to actually create the sequence as list; it is enough to store its length. An integer result is light-weight enough to be memoized easily.
You can make use of precalculated results already when you determine the length of a Collatz sequence. Instead of following the sequence all the way down, follow it until you hit a number for which the length is known.
The other optimizations you make are micro-optimizations. I'm not sure that calculating the log for powers of two really buys you anything. It rather burdens you with an extra test.
The memoized implementation below even forgoes the check for 1 by putting 1 in the dictionary of precalculated values initially.
var precomp = {1: 1};
function collatz(n) {
var orig = n;
var len = 0;
while (!(n in precomp)) {
n = (n % 2) ? 3*n + 1 : n / 2;
len++;
}
return (precomp[orig] = len + precomp[n]);
}
function maxCollatz(n) {
var res = [1, 1];
for (var k = 2; k <= n; k++) {
var c = collatz(k);
if (c > res[1]) {
res[0] = k;
res[1] = c;
}
}
return res;
}
I haven't used node.js, but the JavaScript in my Firefox. It gives reasonable performance. I first had collatz
as a recursive function, which made the implementation only slightly faster than yours.
The second optimization mentioned in the question means that if you know C(n)
, you also know that C(2*n) == C(n) + 1
. You could use that knowledge to precalculate the values for all even n
in a bottom-up approach.
It would be nice if the lengths of the Collatz sequences could be calculated from the bottom up, a bit like the sieve of Erathostenes. You have to know where you come from instead of where you go to, but it is hard to know ehen to stop, because for finding the longest sequence for n < N
, you will have to calculate many sequences out of bound with n > N
. As is, the memoization is a good way to avoid repetition in an otherwise straightforwad iterative approach.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With