When I am training with Keras + tensorflow-gpu, I have to set batch_size to 128, which is the largest size the GPU will accept; anything bigger gives an OOM error. My question is: with batch_size 128, the batch of images is 128 × 224 × 224 × 3 × 4 bytes (224 × 224 RGB images stored as float32), which is only about 77 MB, tiny compared to the GPU's total memory. Is there an explanation for this?
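As a quick sanity check on the arithmetic, the raw batch tensor alone (assuming float32, i.e. 4 bytes per element) works out to roughly 73 MiB:

```python
# Size of one input batch: batch × height × width × channels × bytes/element
batch, h, w, c, bytes_per_elem = 128, 224, 224, 3, 4  # float32 = 4 bytes
batch_bytes = batch * h * w * c * bytes_per_elem
print(batch_bytes, f"bytes = {batch_bytes / 1024**2:.1f} MiB")
```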
You are forgetting several other things that also consume GPU memory:

- Your model's weights (plus any optimizer state).
- The activations from the forward pass, which are kept around for backpropagation.
- Temporary tensors created while computing gradients.

These take up a far larger chunk of memory than the input batch itself, which is why a batch that only consumes ~77 MB can still push the GPU out of memory.
Also watch the dtype: the images start out as uint8, but the tensors are floating point. With float64 each element is eight times larger (with the Keras default of float32, four times). On top of that, the forward pass, the gradients, and other intermediate tensors use a significant chunk of memory.
You can compute the memory required for your model as described here.
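To make the point concrete, here is a back-of-the-envelope estimator. All the factors are assumptions, not a Keras API: `optimizer_slots=2` mimics Adam (which keeps two extra per-parameter buffers), and `activation_multiplier` is a hand-wavy stand-in for activation memory, which in reality depends entirely on the architecture. The 138M parameter count used in the usage line is roughly VGG16-sized.

```python
import math

def estimate_training_memory(batch_size, input_shape, n_params,
                             bytes_per_float=4, optimizer_slots=2,
                             activation_multiplier=10):
    """Very rough GPU memory estimate for training, in MiB.

    Counts: the input batch, the weights plus gradients plus optimizer
    state ((2 + optimizer_slots) copies of the parameters), and the
    activations, approximated as activation_multiplier times the input
    batch. All factors are illustrative assumptions.
    """
    input_bytes = batch_size * math.prod(input_shape) * bytes_per_float
    param_bytes = n_params * bytes_per_float * (2 + optimizer_slots)
    activation_bytes = input_bytes * activation_multiplier
    return (input_bytes + param_bytes + activation_bytes) / 1024**2

# Hypothetical VGG16-sized model (~138M parameters), batch of 128 images:
print(f"{estimate_training_memory(128, (224, 224, 3), 138_000_000):.0f} MiB")
```

Even with these crude factors, the estimate lands in the gigabytes, while the input batch alone is only ~73 MiB; the weights, gradients, optimizer state, and activations dominate.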