
Why is String.replace() with lambda slower than a while-loop repeatedly calling RegExp.exec()?

One problem:

I want to process a string (str) so that any parenthesised digits (matched by rgx) are replaced by values taken from the appropriate place in an array (sub):

var rgx = /\((\d+)\)/,
    str = "this (0) a (1) sentence",
    sub = [
            "is",
            "test"
        ],
    result;

The result, given the variables declared above, should be 'this is a test sentence'.

Two solutions:

This works:

var mch,
    parsed = '',
    remainder = str;
while (mch = rgx.exec(remainder)) { // Not JSLint approved.
    parsed += remainder.substring(0, mch.index) + sub[mch[1]];
    remainder = remainder.substring(mch.index + mch[0].length);
}
result = (parsed) ? parsed + remainder : str;

But I thought the following code would be faster. It has fewer variables, is much more concise, and uses an anonymous function expression (or lambda):

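// rgx has no g flag, so replace() would stop after the first match;
// a global version of the same pattern replaces every occurrence.
result = str.replace(/\((\d+)\)/g, function () {
    return sub[arguments[1]];
});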

This works too, but I was wrong about the speed; in Chrome it's surprisingly (~50%, last time I checked) slower!

...

Three questions:

  1. Why does this process appear to be slower in Chrome and (for example) faster in Firefox?
  2. Is there a chance that the replace() method will be faster compared to the while() loop given a bigger string or array? If not, what are its benefits outside Code Golf?
  3. Is there a way to optimise this process, making it both more efficient and as fuss-free as the functional second approach?

I'd welcome any insights into what's going on behind these processes.

...

[Fo(u)r the record: I'm happy to be called out on my uses of the words 'lambda' and/or 'functional'. I'm still learning about the concepts, so don't assume I know exactly what I'm talking about and feel free to correct me if I'm misapplying the terms here.]

asked Nov 26 '25 16:11 by guypursey


2 Answers

Why does this process appear to be slower in Chrome and (for example) faster in Firefox?

Because replace() has to call a (non-native) function for every match, which is costly. Firefox's engine might be able to optimize that away by recognizing and inlining the lookup.

Is there a chance that the replace() method will be faster compared to the while() loop given a bigger string or array?

Yes, it has to do fewer string concatenations and assignments, and - as you said - there are fewer variables to initialize. Still, only testing can confirm these assumptions (also have a look at http://jsperf.com/match-and-substitute/4 for other snippets; there you can see, for example, Opera optimizing lambda-replace2, which does not use arguments).
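A replace() callback that avoids `arguments` might look like the sketch below. The parameter names `match` and `index` are illustrative, and whether this matches the jsperf snippet named lambda-replace2 exactly is an assumption. The g flag is included so that every placeholder is replaced.

```javascript
var str = "this (0) a (1) sentence",
    sub = ["is", "test"];

// The callback receives the whole match first, then each capture group.
// Naming the parameters avoids touching the `arguments` object.
var result = str.replace(/\((\d+)\)/g, function (match, index) {
    return sub[index];
});

console.log(result); // "this is a test sentence"
```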

If not, what are its benefits outside Code Golf?

I don't think code golf is the right term. Software quality is about readability and comprehensibility, and by those measures the conciseness and elegance of the functional code (subjective though they are) are the reasons to use this approach. I have actually never seen a replacement implemented with exec, substring and re-concatenation.

Is there a way optimise this process, making it both more efficient and as fuss-free as the functional second approach?

You don't need that remainder variable. Once rgx has the g flag, its lastIndex property automatically advances the match through str on each exec() call.
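A small sketch of how lastIndex behaves, assuming the same input string as in the question. Note that lastIndex only advances when the regex carries the g (or y) flag, which the rgx in the question does not; the names `globalRgx`, `plainRgx` and `positions` are made up for this illustration.

```javascript
var str = "this (0) a (1) sentence";
var globalRgx = /\((\d+)\)/g;   // g flag: exec() resumes from lastIndex
var plainRgx = /\((\d+)\)/;     // no g flag: exec() always starts at index 0

// Record where each match starts and where the next search will resume.
var positions = [];
var mch;
while ((mch = globalRgx.exec(str)) !== null) {
    positions.push([mch.index, globalRgx.lastIndex]);
}
console.log(positions); // [[5, 8], [11, 14]]

// Without the g flag, exec() leaves lastIndex untouched.
plainRgx.exec(str);
console.log(plainRgx.lastIndex); // 0
```

Once exec() returns null on a global regex, lastIndex is reset to 0, so the same regex object can be reused for another pass.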

answered Nov 28 '25 05:11 by Bergi


Your while loop with exec() is slightly slower than it should be, since you are doing extra work (substring) as you use exec() on a non-global regex. If you need to loop through all matches, you should use a while loop on a global regex (g flag enabled); this way, you avoid doing extra work trimming the processed part of the string.

var rgR = /\((\d+)\)/g;
var mch,
    result = '',
    lastAppend = 0;

while ((mch = rgR.exec(str)) !== null) {
    result += str.substring(lastAppend, mch.index) + sub[mch[1]];
    lastAppend = rgR.lastIndex;
}
result += str.substring(lastAppend);

This does not change the performance disparity between different browsers, though.

The performance difference seems to come from the browsers' implementations. Since I am not familiar with those implementations, I cannot say where exactly the difference comes from.

In terms of expressive power, exec() and replace() are equivalent. This includes the cases where you don't use the value returned by replace(). Example 1. Example 2.
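To illustrate the equivalence (this is not one of the linked examples, just a minimal sketch with made-up variable names), the same captures can be collected either with an exec() loop or with a replace() call whose return value is thrown away:

```javascript
var str = "this (0) a (1) sentence";
var rgx = /\((\d+)\)/g;

// Collect the captured digits with exec().
var viaExec = [];
var mch;
while ((mch = rgx.exec(str)) !== null) {
    viaExec.push(mch[1]);
}

// Collect the same captures with replace(), ignoring what it returns.
var viaReplace = [];
str.replace(rgx, function (match, digits) {
    viaReplace.push(digits);
    return match; // leave the text unchanged; only the side effect matters
});

console.log(viaExec, viaReplace); // both are ["0", "1"]
```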

The replace() method is more readable (the intention is clearer) than a while loop with exec() if you are using the value returned by the function (i.e. you are doing a real replacement in the anonymous function). You also don't have to reconstruct the replaced string yourself. This is where replace() is preferred over exec(). (I hope this answers the second part of question 2.)

I would expect exec() to be used for purposes other than replacement (except in very special cases such as this one). Replacement, where possible, should be done with replace().

Optimization is only necessary if performance degrades badly on actual input. I don't have any optimization to show, since the only two options have already been analyzed and their relative performance is contradictory between two different browsers. This may change in the future, but for now you can choose whichever has the better worst-case performance across browsers.
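To check how the two approaches compare on your own input and engine, a rough timing sketch like the following could be used. The helper names (`withReplace`, `withExec`, `time`) and the run count are arbitrary choices for illustration, and micro-benchmarks like this are noisy, so treat the numbers as a hint at best.

```javascript
var str = "this (0) a (1) sentence",
    sub = ["is", "test"];

// Approach 1: replace() with a callback.
function withReplace(input) {
    return input.replace(/\((\d+)\)/g, function (match, index) {
        return sub[index];
    });
}

// Approach 2: exec() loop on a global regex, rebuilding the string by hand.
function withExec(input) {
    var rgx = /\((\d+)\)/g, out = "", last = 0, mch;
    while ((mch = rgx.exec(input)) !== null) {
        out += input.substring(last, mch.index) + sub[mch[1]];
        last = rgx.lastIndex;
    }
    return out + input.substring(last);
}

// Hypothetical helper: run fn on str `runs` times, return elapsed milliseconds.
function time(fn, runs) {
    var start = Date.now();
    for (var i = 0; i < runs; i++) {
        fn(str);
    }
    return Date.now() - start;
}

var replaceMs = time(withReplace, 100000);
var execMs = time(withExec, 100000);
console.log("replace: " + replaceMs + " ms, exec: " + execMs + " ms");
```

Running it in each browser you care about, ideally with realistic input instead of the short sample string, gives a better basis for the choice than a single published benchmark.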

answered Nov 28 '25 06:11 by nhahtdh


