I have 4 executables that do some very complex tasks, each of these programs alone might take nearly 100% of the power of a single core of a quad core CPU, thus resulting in almost 25% of total CPU power. Since all of these programs use hardware resources that can't be shared between mutiple processes, I wish to run a single executable that spawns 3 child processes which, in turn, occupy the other three cores. I'm on Linux and I'm using C++11. Most of the complex code is running in its own class and the hardest part runs in a function that I usually call Process(), so I have 4 objects, each with its own Process() that, when running, takes 100% of a single core.
I tried using OpenMP but I don't think it's the best solution as I have no control over CPU affinity. Also using std::thread is not a good idea, because threads inherit the main process' CPU affinity. In Linux I think I can do this with fork() but I have no idea how the whole structure is made.
This might be related to my other question that was partly left unanswered, maybe because I was trying the wrong approach that works in some cases but not in my case.
An example of pseudo-code could be this:
int main()
{
// ...init everything...
// This alone takes 100% of a single core
float out1 = object1->Process();
// This should be spawned as a child process running on another core
float out2 = object2->Process();
// on another core...
float out3 ...
// yet another core...
float out4 ...
// This should still run in the parent process
float total_output = out1 + out2 + out3 + out4;
}
You can use std::thread, that's a front-end to pthread_create().
Then set its affinity with sched_setaffinity() from the thread itself as well.
As you asked, here a working stub:
#include <sched.h>
#include <thread>
#include <list>
void thread_func(int cpu_index) {
cpu_set_t cpuSet;
CPU_ZERO(&cpuSet);
CPU_SET(cpu_index, &cpuSet);
sched_setaffinity(0, sizeof( cpu_set_t), &cpuSet);
/* the rest of the thread body here */
}
using namespace std;
int main(int argc, char **argv) {
if (argc != 2) exit(1);
int n_cpus = atoi(argv[1]);
list< shared_ptr< thread > > lot;
for (int i=0; i<n_cpus; ++i) {
lot.push_back( shared_ptr<thread>(new thread(thread_func, i)));
}
for(auto tptr = lot.begin(); tptr != lot.end(); ++tptr) {
(*tptr)->join();
}
}
Note that for optimal behaviour it's important that each thread initialises its memory (that is, constructs its objects) in the thread body, if you want that your code is optimized also on multi-processors, because in case you are working on a NUMA system, memory pages are allocated on memory close to the CPU using them.
For example you can have a look to this blog.
However this is not an issue in your specific case, since your are dealing with a single processor system, or more specifically a system with just one numa node (many current AMD processors do contain two numa nodes, even if within a single physical package), and all the memory banks are attached there.
The final effect of using sched_setaffinity() in this context will be just to pin down each thread to a specific core.
You don't have to program anything. The command taskset changes the CPU affinity of a currently running process or creates and sets it for a new process.
Running a single executable that spawns other programs is no different than executing the programs directly, except for the common initialization implied by the comments in your stub.
taskset 1 program_for_cpu_0 arg1 arg2 arg3...
taskset 2 program_for_cpu_1 arg1 arg2 arg3...
taskset 4 program_for_cpu_2 arg1 arg2 arg3...
taskset 8 program_for_cpu_3 arg1 arg2 arg3...
I am suspicious of setting CPU affinities. I have yet to find an actual use for doing so (other than satisfying some inner need for control).
Unless the CPUs are not identical in some way, there should be no need to restrict a process to a particular CPU.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With