TensorFlow ignores the RTX 3000 series GPU

Question

I am trying to train my model on an RTX 3090 GPU.
To be able to use it at all, I had to install TensorFlow==2.4.0-rc0. However, there is a problem with actually using the GPU.

(Yes, I have downclocked the memory because it gets really toasty at the stock 19.5 GHz; that is why the memory bandwidth is 60 Gbps lower.)

First of all, it detects the GPU and reports:

tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 24.00GiB deviceMemoryBandwidth: 871.81GiB/s

Then it says:

Adding visible gpu devices: 0

But a couple of lines below that message, this message is displayed:

Created TensorFlow device 
(/job:localhost/replica:0/task:0/device:GPU:0 with 21821 MB memory) -> 
physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)

And then it just continues to hammer the CPU without actually using the GPU at all. Most importantly, when training is done purely on the CPU, one epoch takes around 80 seconds; when the GPU is used, it cannot complete even a single epoch.

This is the complete text output of my Jupyter Notebook (while it is running):

[I 04:06:47.194 NotebookApp] Kernel started: e4bec12d-3d85-4019-9b5a-67d34a45acfc
[I 04:06:50.799 NotebookApp] Starting buffering for e4bec12d-3d85-4019-9b5a-67d34a45acfc:591585a545fe4d33977dac034060b33c
[I 04:06:51.031 NotebookApp] Kernel restarted: e4bec12d-3d85-4019-9b5a-67d34a45acfc
[I 04:06:51.557 NotebookApp] Restoring connection for e4bec12d-3d85-4019-9b5a-67d34a45acfc:591585a545fe4d33977dac034060b33c
[I 04:06:51.558 NotebookApp] Replaying 3 buffered messages
2020-11-06 04:06:53.766169: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2020-11-06 04:07:01.412837: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-11-06 04:07:01.420283: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2020-11-06 04:07:01.438547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 24.00GiB deviceMemoryBandwidth: 871.81GiB/s
2020-11-06 04:07:01.438675: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2020-11-06 04:07:01.450544: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2020-11-06 04:07:01.450698: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2020-11-06 04:07:01.453610: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2020-11-06 04:07:01.454496: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2020-11-06 04:07:01.457436: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2020-11-06 04:07:01.459702: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2020-11-06 04:07:01.460296: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2020-11-06 04:07:01.460439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2020-11-06 04:07:01.461093: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-06 04:07:01.461751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 24.00GiB deviceMemoryBandwidth: 871.81GiB/s
2020-11-06 04:07:01.461854: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2020-11-06 04:07:01.462144: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2020-11-06 04:07:01.462407: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2020-11-06 04:07:01.462690: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2020-11-06 04:07:01.462941: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2020-11-06 04:07:01.464597: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2020-11-06 04:07:01.464843: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2020-11-06 04:07:01.465087: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2020-11-06 04:07:01.465348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2020-11-06 04:07:01.838515: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-06 04:07:01.838596: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0
2020-11-06 04:07:01.838999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N
2020-11-06 04:07:01.839431: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21821 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
2020-11-06 04:07:01.842196: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-11-06 04:07:10.441807: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2020-11-06 04:07:11.435159: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2020-11-06 04:07:12.026347: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2020-11-06 04:07:12.044635: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
[I 04:08:47.169 NotebookApp] Saving file at /train_model.ipynb
2020-11-06 04:13:24.212460: I tensorflow/stream_executor/cuda/cuda_blas.cc:1838] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.

P.S. Update #1
It took 579 seconds to complete a single epoch using the GPU, whereas it takes only 80 seconds on the CPU.

Vedant Joshi · Accepted Answer

It's because the RTX 3090 uses the Ampere architecture, which is compatible with CUDA 11 and cuDNN 8, while TensorFlow v2.3 doesn't yet cover the CUDA 11 requirements ...

I'm facing the same issue, and I've figured out that it's a compatibility issue. Waiting for v2.4 may be the best option; otherwise, you can try compiling TensorFlow from source.

You can refer to: https://medium.com/@dun.chwong/the-simple-guide-deep-learning-with-rtx-3090-cuda-cudnn-tensorflow-keras-pytorch-e88a2a8249bc
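
To see which CUDA and cuDNN versions an installed TensorFlow wheel was built against, something like the following may help. It is only a sketch: `describe_tf_build` is a hypothetical helper name, and it assumes TensorFlow 2.3 or newer, where `tf.sysconfig.get_build_info()` is available. The keys are read with `.get()` because a CPU-only build may not include them.

```python
def describe_tf_build() -> dict:
    """Summarize the CUDA-related build settings of the installed TensorFlow."""
    # Imported lazily so this helper can be defined even where TF is missing.
    import tensorflow as tf

    info = tf.sysconfig.get_build_info()
    return {
        "tf_version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # These keys may be absent on CPU-only builds, hence .get().
        "cuda_version": info.get("cuda_version"),
        "cudnn_version": info.get("cudnn_version"),
    }


# Usage (requires TensorFlow to be installed):
# print(describe_tf_build())
```

If the reported CUDA version is older than 11, the build predates Ampere support in the sense described above.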

Theodore Popp · Answer

Adding visible gpu devices: 0 is misleading and actually means one device was added. The portion after the colon is a comma-separated list of device indices, not the number of devices.
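
As an illustration of how to read that line, here is a small sketch that pulls the device indices out of the log message. `visible_gpu_ids` is a hypothetical helper, and the two-device line in the second call is made up for illustration; only the first line comes from the logs in the question.

```python
import re
from typing import List


def visible_gpu_ids(log_line: str) -> List[int]:
    """Return the device indices listed after 'Adding visible gpu devices:'."""
    match = re.search(r"Adding visible gpu devices:\s*([\d,\s]*)", log_line)
    if match is None:
        return []
    # The captured text is a comma-separated list of indices, e.g. "0" or "0, 1".
    return [int(tok) for tok in match.group(1).split(",") if tok.strip()]


line = ("2020-11-06 04:07:01.460439: I tensorflow/core/common_runtime/gpu/"
        "gpu_device.cc:1862] Adding visible gpu devices: 0")
print(visible_gpu_ids(line))  # [0]: one device, whose index is 0

# Hypothetical two-GPU log line, for comparison.
print(visible_gpu_ids("Adding visible gpu devices: 0, 1"))  # [0, 1]
```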

Setting an environment variable TF_CPP_MIN_VLOG_LEVEL=10 will show a lot of information, some of which might help you debug this case.
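
In a Jupyter notebook, one way to do this is to set the variable from Python before TensorFlow is imported. This is a sketch: `enable_tf_vlog` is a hypothetical helper, and it assumes the variable has to be in place before the TensorFlow runtime loads, so the kernel should be restarted first if TensorFlow was already imported.

```python
import os


def enable_tf_vlog(level: int = 10) -> None:
    """Set TF_CPP_MIN_VLOG_LEVEL for this process (call before importing TF)."""
    os.environ["TF_CPP_MIN_VLOG_LEVEL"] = str(level)


enable_tf_vlog(10)

# Import TensorFlow only after the variable is set, for example:
# import tensorflow as tf
```

Expect the output to be very verbose at level 10; a lower level can be passed if it becomes unmanageable.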

Given that your logs show the device was available, the cuBLAS libraries were loaded, no other relevant error messages were shown, and there's a very noticeable timing change, the most likely answer is that TensorFlow is not ignoring your GPU; your model is just not optimized to run quickly on GPUs.

My recommended next step would be to look at the VLOGs to see whether the GPU is being used to execute any ops. It's possible, though I think unlikely, that they would show library mismatch issues causing the CPU to be used instead of your GPU, along with a slowdown while the process discovers this.
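
A lighter-weight check than reading VLOGs is TensorFlow's device placement logging. The sketch below uses `tf.debugging.set_log_device_placement` and `tf.config.list_physical_devices`; `check_gpu_placement` is a hypothetical helper, and the matrix size is arbitrary.

```python
def check_gpu_placement(size: int = 1024) -> str:
    """Run one matmul with placement logging on and return the device it ran on."""
    # Imported lazily so this helper can be defined even where TF is missing.
    import tensorflow as tf

    # Log the device chosen for every op that is executed from now on.
    tf.debugging.set_log_device_placement(True)
    print("Physical GPUs:", tf.config.list_physical_devices("GPU"))

    a = tf.random.normal((size, size))
    b = tf.random.normal((size, size))
    product = tf.matmul(a, b)
    # For an eager tensor, .device names the device that holds the result.
    return product.device


# Usage (requires TensorFlow to be installed):
# print(check_gpu_placement())
```

If the returned device name ends in GPU:0, ops are reaching the GPU, and the slowdown is more likely a performance problem than a detection problem.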

After confirming that the GPU is being used, I would advise looking here to confirm that all the ops you expect to run on the GPU actually do, and to debug why your model does not perform well on a GPU: https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras
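
A minimal sketch of the profiling setup from that guide, assuming a Keras model trained with `model.fit`. `make_profiler_callback`, the log directory, and the profiled batch number are all illustrative choices, and `model`, `x_train` and `y_train` in the usage comment stand in for your own objects.

```python
def make_profiler_callback(log_dir: str = "logs/profile", batch: int = 5):
    """Build a TensorBoard callback that profiles a single training batch."""
    # Imported lazily so this helper can be defined even where TF is missing.
    import tensorflow as tf

    # profile_batch selects which batch to profile; 0 would disable profiling.
    return tf.keras.callbacks.TensorBoard(log_dir=log_dir, profile_batch=batch)


# Usage (requires TensorFlow and your own model and data):
# model.fit(x_train, y_train, epochs=1, callbacks=[make_profiler_callback()])
```

After training, pointing TensorBoard at the same log directory should show a profile view; depending on the installed versions, the separate profiler plugin described in the linked guide may also be needed.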

Tags: python, tensorflow

Ivan Zhivolupov
