I am using CUDA with PyTorch on a Linux server that has multiple CUDA devices.
The problem is that even though I specify which GPUs should be visible, the program keeps using only the first GPU.
(Other programs on the same system work fine and are allocated the GPUs I specify, so I don't think it is an NVIDIA or system problem. nvidia-smi shows all GPUs correctly. I have never had trouble allocating GPUs with the code below before, except when the system itself was down.)
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBILE_DEVICES"] = str(args.gpu)
I run this before the main function, and the same approach works fine for other programs on the same system.
I printed the args.gpu variable and confirmed that its value is not "0".
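For completeness, here is a minimal, self-contained version of what I am doing; the --gpu flag and its default value are placeholders for my actual arguments:

import os
import argparse

# placeholder argument parsing; my real script has more flags
parser = argparse.ArgumentParser()
parser.add_argument("--gpu", type=str, default="2,3")
args = parser.parse_args()

# must be set before torch initializes CUDA, so before importing torch
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = str(args.gpu)

import torch
# with CUDA_VISIBLE_DEVICES="2,3", torch renumbers them as cuda:0 and cuda:1
print(torch.cuda.device_count())  # should print 2, not the total number of GPUs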
Have you tried something like this?
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # GPU ids start from 0
model = CreateModel()
model = nn.DataParallel(model, device_ids=[0, 1])  # replicate the model across GPUs 0 and 1
model.to(device)
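As a quick sanity check, something like the following should split a batch across both GPUs (CreateModel here is just a stand-in for your own model class):

import torch
import torch.nn as nn

# stand-in for your own model class
class CreateModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 4)

    def forward(self, x):
        return self.fc(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CreateModel()
if torch.cuda.device_count() > 1:
    # DataParallel splits dim 0 of the input batch across the listed GPUs
    # and gathers the outputs back on the first device (cuda:0)
    model = nn.DataParallel(model, device_ids=[0, 1])
model.to(device)

x = torch.randn(8, 16, device=device)
print(model(x).shape)  # torch.Size([8, 4])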
Let me know if this works for you.