What is the time complexity of MPI barriers? Do they scale for large number of cores (>> 10k)?
Barrier complexity is highly implementation-specific. It could be linear, it could be logarithmic, or it could be better or worse. Some architectures provide dedicated networks for some collective operations, e.g. IBM's Blue Gene has a specialised global interrupt network which allows for very fast MPI_BARRIER implementation with almost constant complexity but only when performed over MPI_COMM_WORLD.
While Hristo Iliev is correct, you can assume that any reasonable MPI implementation used on these scales has logarithmic complexity on collective operations. Yes this does scale >> 10k cores. There can still be a dramatic factor between different modern implementations. Also at this scale OS noise can have a very significant impact on collective operations (see e.g. [1]).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With