
Distributed Data Parallel (DDP) Batch size

Tags:

pytorch

ddp

Suppose I use 2 GPUs in a DDP setting.

If I would use a batch size of 16 when running the experiment on a single GPU,

should I pass 8 or 16 as the batch size when using 2 GPUs with DDP?

Is 16 divided into 8 and 8 automatically?

Thank you!

asked Mar 23 '26 by Seungwoo Ryu


2 Answers

No, it won't be split automatically. When you set batch_size=8 under DDP, each GPU receives batches of size 8 from its own dataloader, so the global batch size is 16.
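The arithmetic behind this answer can be sketched as follows (a minimal illustration; the helper name global_batch_size is ours, not a PyTorch API):

```python
# Sketch: under DDP each process builds its own DataLoader, so the
# batch_size you pass is PER PROCESS, not global.
def global_batch_size(per_gpu_batch_size, num_gpus):
    """Effective number of samples seen per optimizer step across all DDP ranks."""
    return per_gpu_batch_size * num_gpus

# With 2 GPUs and batch_size=8 per process, one step sees 16 samples.
print(global_batch_size(8, 2))   # -> 16
# Passing 16 per process instead would give a global batch of 32.
print(global_batch_size(16, 2))  # -> 32
```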

answered Mar 25 '26 by Gabriel


I don't agree with Deusy94's answer.

If I understand correctly, according to PyTorch's official example using distributed data parallel (DDP), at line 160:

args.batch_size = int(args.batch_size / ngpus_per_node)

the batch size you pass when you instantiate the DataLoader is the batch size for a single process on a single node.

Note that in the argparser the comment was:

parser.add_argument('-b', '--batch-size', default=256, type=int,
                metavar='N',
                help='mini-batch size (default: 256), this is the total '
                     'batch size of all GPUs on the current node when '
                     'using Data Parallel or Distributed Data Parallel')

Hence, say you pass --batch-size 16 here and you have two GPUs: args.batch_size is updated to 8 (divided by the number of GPUs) at line 160 above, and the dataloader actually created has a batch_size of 8 - the dataloader for an individual GPU.
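The per-rank split itself comes from DistributedSampler, which each process attaches to its dataloader. A small sketch of how that split can be inspected (DistributedSampler accepts num_replicas/rank explicitly, so no process group is needed just to see the sizes; the 32-sample dataset is illustrative):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Illustrative dataset of 32 samples.
dataset = TensorDataset(torch.arange(32).float())

for rank in range(2):
    # Passing num_replicas/rank explicitly mimics a 2-GPU DDP run.
    sampler = DistributedSampler(dataset, num_replicas=2, rank=rank, shuffle=False)
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)
    print(f"rank {rank}: {len(sampler)} samples, {len(loader)} batches")
# Each rank sees 16 of the 32 samples, i.e. 2 batches of 8, so one
# synchronized step across both ranks consumes 16 samples globally.
```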

Therefore, if you create the dataloader with DataLoader(dataset, batch_size=16) and start DDP with 2 GPUs, each GPU will proceed with batch_size=16 and your global batch size will be 32.

This is different from DataParallel, which has a gather/scatter procedure that automatically splits your batch into equal-sized chunks for each GPU (i.e., DataLoader(dataset, batch_size=16) --> each GPU gets 8).

Either way, it's easy to verify: iterate the dataloader with a progress bar (e.g., tqdm) to log how many steps it takes to traverse all batches (i.e., the number of batches), then check which equation holds: batch_size * num_batches == dataset_size or num_gpus * batch_size * num_batches == dataset_size.
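The verification described above can be sketched as a tiny bookkeeping check (the helper name check_batch_semantics is ours, purely illustrative):

```python
# Sketch: given the observed number of batches per epoch, decide whether
# the batch_size you passed was per-GPU or global.
def check_batch_semantics(dataset_size, batch_size, num_batches, num_gpus):
    if num_gpus * batch_size * num_batches == dataset_size:
        return "per-GPU batch size"   # DDP-style: each rank got its own batches
    if batch_size * num_batches == dataset_size:
        return "global batch size"    # DataParallel-style: batch was scattered
    return "neither (check drop_last / uneven split)"

# 32 samples, 2 GPUs, batch_size=8 passed to each rank's DataLoader
# -> 2 batches per rank, so 2*8*2 == 32 and the size was per-GPU.
print(check_batch_semantics(32, 8, 2, 2))   # -> per-GPU batch size
```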

answered Mar 25 '26 by M Ciel

