Context: I have several loops in an Objective-C library I am writing which deal with processing large text arrays. As far as I can tell, they currently run as plain scalar code, processing one element at a time.
I understand that LLVM is now capable of auto-vectorising loops, as described in Apple's session at WWDC. However, it is very cautious about doing so, one reason being the possibility that variables are modified unexpectedly, for example through another pointer that refers to the same memory (aliasing).
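The usual source of that caution is aliasing: if two pointers might refer to overlapping memory, a store through one can change what a later iteration loads through the other. Here is a minimal C sketch (the function names are made up for illustration) of the standard remedy, the C99 restrict qualifier, which promises the compiler that the arrays don't overlap. Clang may still vectorize the unqualified version by adding runtime overlap checks, so treat this as an illustration of the concept rather than a guarantee about what a particular compiler version does.

```c
#include <assert.h>
#include <stddef.h>

/* Without restrict, dst may overlap a or b, so the compiler must either
   prove there is no overlap or guard a vector loop with runtime checks. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* restrict promises that dst, a and b do not overlap, which removes the
   aliasing question. Passing overlapping arrays here is undefined behaviour. */
void add_arrays_restrict(float *restrict dst, const float *restrict a,
                         const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Both functions compute the same result; the qualifier only changes what the compiler is allowed to assume.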
My question: how can I see where LLVM has vectorised my code and, more usefully, how can I get diagnostic messages explaining why it can't vectorise it? If the compiler knows why a loop can't be auto-vectorised, it should be able to point that out, and I could then make the manual adjustments needed to make the loop vectorisable.
I would be remiss if I didn't point out that this question has more or less been asked already, though rather obliquely, here.
Interleaving means that the iterations of an unrolled loop are interleaved within the loop body. In this example, the loop first loads 4 ymmwords (256 bits each) from memory, starting all 4 iterations in parallel, and then performs 4 additions, again largely in parallel.
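Interleaving can be imitated by hand, which makes the idea easier to see. This C sketch (function names are hypothetical) splits a sum across four independent accumulators so that four loads and four additions can be in flight at once; the vectorizer applies the same idea to whole vector registers, such as the 256-bit ymm registers mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Plain reduction: every addition depends on the previous one,
   so the additions form a single serial chain. */
long sum_serial(const int *x, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Hand-interleaved by 4: four independent accumulators break the chain,
   so consecutive additions no longer wait on each other. */
long sum_interleaved(const int *x, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; i++)          /* remainder when n is not a multiple of 4 */
        s0 += x[i];
    return s0 + s1 + s2 + s3;
}
```

Integer addition is associative, so both functions always agree. For a float sum the regrouped additions can round differently, which is why, as far as I know, Clang won't reorder a floating-point reduction like this unless reassociation is permitted (e.g. with -ffast-math).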
Loop vectorization transforms procedural loops so that a single instruction processes several pairs of operands at once. Many programs spend most of their time in such loops, so vectorization can accelerate them significantly, especially over large data sets.
Vectorization is the process of converting an algorithm from operating on a single value at a time to operating on a set of values at one time. Modern CPUs provide direct support for vector operations where a single instruction is applied to multiple data (SIMD).
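To make "a single instruction applied to multiple data" concrete, here is a sketch using the GCC/Clang vector extensions (the vector_size attribute). This is explicit vectorization written by hand, not something the auto-vectorizer produces; the type and function names are hypothetical. On targets with SIMD units the compiler can lower the single + on the vector type to one vector addition instruction.

```c
#include <assert.h>

/* GCC/Clang vector extension: a type holding four floats (16 bytes). */
typedef float v4f __attribute__((vector_size(16)));

/* One '+' on v4f adds all four lanes: one operation, multiple data. */
v4f add4(v4f a, v4f b)
{
    return a + b;
}

/* Scalar equivalent: four separate additions, one value at a time. */
void add4_scalar(float out[4], const float a[4], const float b[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
}
```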
The standard LLVM toolchain shipped with Xcode doesn't seem to support getting debug output from the optimizer. However, if you build your own LLVM and use that, you should be able to pass the flags mishr suggested above. Here's the workflow I used:
1. Install LLVM using Homebrew
brew tap homebrew/versions
brew install llvm33 --with-clang --with-asan
This should install the full, relatively current LLVM toolchain. It's linked into /usr/local/bin/*-3.3 (e.g. clang++-3.3). The actual on-disk location is shown by brew info llvm33; it's probably /usr/local/Cellar/llvm33/3.3/bin.
2. Build the single file you're optimizing with the Homebrew LLVM and extra flags
If you've built in Xcode, you can copy the compiler invocation for that file from the build log and run it with your clang++-3.3 instead of Xcode's own clang.
Appending -mllvm -debug-only=loop-vectorize will give you the auto-vectorization report. Note: this will likely NOT work with any remotely complex build (e.g. if you use precompiled headers), but it is a simple way to check that a single source file is vectorizing correctly.
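A small test file for step 2 might look like this; the file name, the function names and the clang-3.3 command are assumptions based on the Homebrew layout described in step 1. The first loop has independent iterations and is a plain vectorization candidate; the second carries a dependence from each iteration to the next, which typically stops the vectorizer, so the report should have something to say about each. The report's exact wording varies between LLVM versions, so no output is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical demo file, compiled with something like:
 *   clang-3.3 -O3 -c vec_demo.c -mllvm -debug-only=loop-vectorize
 */

/* Independent iterations: a straightforward vectorization candidate. */
void scale(float *restrict out, const float *restrict in, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k;
}

/* Loop-carried dependence: iteration i reads the a[i - 1] written by
   iteration i - 1, so the iterations cannot simply run side by side. */
void prefix_sum(float *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```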
3. Create a compiler plugin from the new LLVM
I was able to build my entire project with Homebrew LLVM by:
/Library/Application Support/Developer/5.0/Xcode/Plug-ins/
Relaunching Xcode should show this plugin in the list of available compilers. At this point, the -mllvm -debug-only=loop-vectorize flag will show the auto-vectorization report.
I have no idea why this isn't exposed in the Apple builds.
UPDATE: This is exposed in current (8.x) versions of Xcode. All that is required is to enable one or more of the loop-vectorize remark flags below, for example in the target's Other C Flags build setting.
Identifies loops that were successfully vectorized:
clang -Rpass=loop-vectorize
Identifies loops that failed vectorization and indicates if vectorization was specified:
clang -Rpass-missed=loop-vectorize 
Identifies the statements that caused vectorization to fail:
clang -Rpass-analysis=loop-vectorize
Source: http://llvm.org/docs/Vectorizers.html#diagnostics
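Since the question is about text processing, here is a hypothetical C file to try these flags on (file and function names are made up). The upper-casing loop writes every element unconditionally, a shape the vectorizer can usually handle; the histogram loop may increment the same counter twice within one vector's worth of bytes, which is a common reason for a missed vectorization. The #pragma clang loop vectorize(enable) line explicitly requests vectorization, which, as I read the LLVM documentation, is what -Rpass-missed reports as vectorization having been specified. Exact remark text depends on the Clang version, so none is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical demo, compiled with something like:
 *   clang -O2 -c text_demo.c -Rpass=loop-vectorize \
 *         -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize
 */

/* ASCII upper-casing written as an unconditional store of a selected
   value rather than a conditional store. */
void ascii_upper(unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char c = s[i];
        s[i] = (c >= 'a' && c <= 'z') ? (unsigned char)(c - ('a' - 'A')) : c;
    }
}

/* Character histogram: equal bytes update the same counter, so the
   increments cannot safely be combined into one vector store.
   The pragma asks Clang to vectorize this loop if it can. */
void byte_histogram(unsigned counts[256], const unsigned char *s, size_t n)
{
#pragma clang loop vectorize(enable)
    for (size_t i = 0; i < n; i++)
        counts[s[i]]++;
}
```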