
For loop performance

Bottom line: speed matters.

I have looked over my code and decided to look for more ways to improve its efficiency (even a millisecond would be great). All these data members, methods, useless object creation - we have all been taught to follow guidelines about what to do and what not to do.

Except for loops.

We were always encouraged to use them because they improve the readability of the code and help the user. THE USER. After repeating that definition in my head, this idea came to mind:

for (int i = 0; i < 100; i++)
{
    //whatever code
}

Let's assume a case where we know the length. This executes the body 100 times, but it also performs roughly 200 extra operations (one initialization, 101 comparisons, and 100 increments) that could be omitted to help the machine. What if we copy-paste the body 100 times, throwing away the initialization, condition and increment:

//Code[0]
//Code[1]
//Code[2]
//...

This is a small bit, but still...
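As a concrete sketch of the idea (the function name and data are mine, just for illustration): summing four elements with the "loop" fully unrolled, so there is no counter, comparison, or branch at all:

```c
/* Hypothetical example: a fully unrolled sum of 4 elements.
   Each line is one copy of the loop body; the loop machinery
   (init, condition, increment) is gone entirely. */
int sum4_unrolled(const int *a)
{
    int s = 0;
    s += a[0];  /* Code[0] */
    s += a[1];  /* Code[1] */
    s += a[2];  /* Code[2] */
    s += a[3];  /* Code[3] */
    return s;
}
```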

Is this a common practice for efficiency lunatics?

asked Mar 22 '26 by Alexey

1 Answer

This is a common optimization called loop unrolling and a good compiler should do it for you automatically. Depending on the code, many compilers will actually unroll loops that don't have an absolute upper bound as well.

For an extreme example, you might be interested in Duff's device. It lets you unroll a loop with a variable upper bound without having to worry about the "left over" iterations at the end.
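For illustration (`duff_copy` is a made-up name, not from any particular library), a classic Duff's-device copy in C looks roughly like this:

```c
#include <stddef.h>

/* Hypothetical sketch of Duff's device: copy `count` ints from src to dst.
   The switch jumps into the middle of the loop body to handle the
   left-over (count % 8) iterations first; the do/while then runs
   8 copies per pass, paying for one branch instead of eight. */
void duff_copy(int *dst, const int *src, size_t count)
{
    if (count == 0)
        return;
    size_t n = (count + 7) / 8;  /* number of do/while passes */
    switch (count % 8) {
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--n > 0);
    }
}
```

The interleaving of the `switch` and the `do`/`while` is legal C precisely because `case` labels may appear anywhere inside the enclosing `switch` body.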

If you're looping from 0 to 200, unrolling your loop 200x won't necessarily improve your performance that much. There's a trade-off between how much code you generate and how much you gain by avoiding branching. In a lot of code it's uncommon to unroll by more than a factor of about 10, though I don't have a citation to back up that number.
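A sketch of that kind of partial unrolling (factor of 4, with a tail loop for the left-over iterations; `sum_unroll4` is a hypothetical name):

```c
#include <stddef.h>

/* Hypothetical sketch: sum n ints with the loop unrolled 4x by hand.
   The main loop does four additions per iteration (one branch every
   four elements instead of every element), and a short tail loop
   handles the remaining 0-3 elements. */
long sum_unroll4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)  /* remainder iterations */
        s += a[i];
    return s;
}
```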

Most desktops and laptops today run x86-64 processors, which are superscalar architectures that do crazy things like out-of-order execution and branch prediction. Between all the crazy things your compiler can do and all the crazy things your CPU is doing, there isn't much need to hand-tweak little things like this.

Actually, you need to be careful about over-optimizing. I've run a lot of experiments where compiling with -O3 instead of -O2 actually slowed down my application rather than speeding it up. I also ran a set of experiments with LLVM where I tested how the performance of the same programs (common compiler benchmarks) varied as individual optimizations were turned on and off. For most of the optimizations outside the -O2 set, they hurt as often as they helped. You really need to test optimizations with your particular application to see whether they help or hurt.

However, you usually get the most performance by choosing the right data structures and algorithms, not with little bit-twiddling optimizations like this. I find it best to take the following approach to optimization:

  1. Write the code so it's readable.
  2. Profile the code to see which part eats up the most time.
  3. See if there's an algorithmic / data-structure change I can make to that section to speed it up.
  4. When all else fails, resort to this type of low-level code transformation (which you'd hope the compiler would take care of for you).
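To illustrate step 3 with a made-up example (`has_duplicate` is mine, not from the question): replacing an O(n^2) nested-loop duplicate check with sort-then-scan is the kind of algorithmic change that dwarfs anything loop unrolling could buy you:

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for ints, avoiding overflow from subtraction. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical sketch: does the array contain a repeated value?
   Two nested loops would be O(n^2); sorting a copy and scanning
   adjacent elements is O(n log n). Returns 1 if a duplicate exists,
   0 if not, -1 on allocation failure. */
int has_duplicate(const int *a, size_t n)
{
    if (n < 2)
        return 0;
    int *copy = malloc(n * sizeof *copy);
    if (!copy)
        return -1;
    memcpy(copy, a, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, cmp_int);
    int dup = 0;
    for (size_t i = 1; i < n; i++) {
        if (copy[i] == copy[i - 1]) {
            dup = 1;
            break;
        }
    }
    free(copy);
    return dup;
}
```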
answered Mar 24 '26 by DaoWen
