I have a quad-core processor with hyper-threading. When I use make -j8, it is faster than make -j4 (I read the number of cores in Java and then called make -j<number of cores>).
I don't understand why make -j32 is faster than make -j8 when Java reports just 8 cores (hyper-threading presents each of the 4 physical cores as two logical cores). How is that possible?
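One way to reproduce the comparison is to time a clean build at several job counts. The sketch below is a minimal Python harness under the assumption that the project has a working `make clean` target; the helper names `time_command` and `benchmark` are hypothetical and only use `subprocess.run` and `time.perf_counter` from the standard library.

```python
import subprocess
import time
from typing import Callable, Dict, Iterable, List


def time_command(cmd: List[str]) -> float:
    """Run cmd to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def benchmark(job_counts: Iterable[int],
              clean_cmd: List[str],
              build_cmd_for: Callable[[int], List[str]]) -> Dict[int, float]:
    """For each job count: clean the tree, then time one full build."""
    results: Dict[int, float] = {}
    for jobs in job_counts:
        subprocess.run(clean_cmd, check=True, stdout=subprocess.DEVNULL)
        results[jobs] = time_command(build_cmd_for(jobs))
    return results


# Example call (assumes a Makefile with a "clean" target in the current directory):
#   benchmark([4, 8, 16, 32], ["make", "clean"], lambda j: ["make", f"-j{j}"])
```

The first build may read its sources from a cold disk cache and therefore look slower than later ones, so it is worth running each job count more than once before drawing conclusions.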
A higher clock speed mainly makes each individual task finish sooner (for example, loading an application), while more cores let more programs run at the same time and make switching between them smoother.
Having more cores means your CPU can execute instructions from multiple tasks at once, while strong single-thread performance means it executes each of those tasks individually, and quickly.
Every process has at least one thread, and there is no small fixed maximum on how many threads a process can use; practical limits come from the operating system and available memory. For workloads that split well into independent pieces, more threads can improve performance, up to what the hardware can actually run in parallel. With multiple threads, a single process can work on several tasks concurrently.
As the number of threads running on the cores of a multicore processor grows, they contend for the shared cache, and the resulting increase in cache misses caused by that contention becomes a main reason for performance decline.
There's more to compiling than CPU speed and number of cores available: disk bandwidth and memory bandwidth matter a lot too.
In your case, I imagine that each HT sibling is getting roughly 4 processes to execute (32 jobs spread over 8 logical CPUs). It starts one, which blocks on disk I/O, so the sibling moves on to the next process. The second one tries to open a second file, blocks on disk I/O, and the sibling moves on to the next process. It wouldn't surprise me if four compilers get started before the first disk I/O completes.
So when the first one finally reads in the program source, the compiler must start hunting through directories to find the #included files. Each one requires some open() calls followed by read() calls, all of which can block, and all of which hand the sibling over to other processes.
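The effect can be put into numbers with a deliberately simple model, a sketch rather than a measurement: assume each compiler job alternates between `cpu_s` seconds of computation and `wait_s` seconds blocked on I/O, that the waits overlap freely (the disk is never the bottleneck), and that every logical CPU counts as a full CPU (hyper-threading siblings really are weaker than that). A single job then keeps a CPU busy only `cpu_s / (cpu_s + wait_s)` of the time, so about `cores * (1 + wait_s / cpu_s)` jobs are needed to keep every CPU busy, which is a common thread-pool sizing heuristic. All function names and the numbers in the example are hypothetical.

```python
import math


def effective_parallelism(cores: int, jobs: int, cpu_s: float, wait_s: float) -> float:
    """Average number of CPUs kept busy when `jobs` processes run at once.

    Each job computes for cpu_s seconds, then blocks for wait_s seconds, so it
    alone uses cpu_s / (cpu_s + wait_s) of a CPU. Demand beyond `cores` is capped.
    """
    demand = jobs * cpu_s / (cpu_s + wait_s)
    return min(demand, cores)


def estimated_build_time(total_cpu_s: float, cores: int, jobs: int,
                         cpu_s: float, wait_s: float) -> float:
    """Wall-clock seconds needed to get through total_cpu_s of compiler CPU time."""
    return total_cpu_s / effective_parallelism(cores, jobs, cpu_s, wait_s)


def jobs_to_saturate(cores: int, cpu_s: float, wait_s: float) -> int:
    """Smallest job count whose CPU demand covers every core: cores * (1 + W/C)."""
    return math.ceil(cores * (cpu_s + wait_s) / cpu_s)


# Hypothetical numbers: 8 logical CPUs, 800 s of total compile CPU time,
# and jobs that spend 3 s blocked for every 1 s of computation.
for j in (4, 8, 32, 64):
    print(f"make -j{j}: {estimated_build_time(800, 8, j, 1.0, 3.0):.0f} s")
print("jobs needed to saturate:", jobs_to_saturate(8, 1.0, 3.0))
```

With these made-up numbers the model predicts 800 s for -j4, 400 s for -j8, 100 s for -j32, and no further gain at -j64, because 32 jobs already keep all 8 logical CPUs busy. Real builds have a varying compute-to-wait ratio, so the point is the shape of the curve, not the exact figures.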
Now multiply that by eight siblings. Within each physical core, one sibling runs until it stalls on a memory access, at which point the core switches over to the other sibling and runs it for a while. By the time the first sibling's data has been fetched into the cache, the second sibling is probably stalling while it waits for memory of its own.
There is an upper limit on how much faster you can get your compiles to run by using make -j, but twice the number of CPUs has been a good starting point for me in the past.
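The twice-the-CPUs starting point is easy to compute automatically. The sketch below assumes Python's `os.cpu_count()`, which counts logical CPUs (hyper-threading siblings included), typically the same figure that Java's `Runtime.getRuntime().availableProcessors()` reports. The helpers `suggested_jobs` and `make_command` are hypothetical names, and the factor of 2 is only the rule of thumb from the answer, not a tuned value.

```python
import os
from typing import List, Optional


def suggested_jobs(logical_cpus: Optional[int] = None, factor: int = 2) -> int:
    """Rule-of-thumb make job count: factor x the number of logical CPUs.

    If logical_cpus is not given, os.cpu_count() is used; it may return None,
    in which case this falls back to a single CPU.
    """
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    return max(1, factor * logical_cpus)


def make_command(jobs: int) -> List[str]:
    """Argument list suitable for subprocess.run, e.g. ["make", "-j16"]."""
    return ["make", f"-j{jobs}"]


# On a machine where os.cpu_count() returns 8 (a quad-core CPU with
# hyper-threading), this prints ['make', '-j16'].
print(make_command(suggested_jobs()))
```

Treat the result as the first point to try: timing a few values around it on your own build shows where the upper limit is for your disk, memory, and source tree.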