I'm reaching out to the community for help with an issue I'm encountering in llama.cpp. Previously, the program ran successfully on the GPU, but recently it appears to have switched to CPU execution.
Observations:
BLAS=1 is set, indicating the use of BLAS routines (likely for linear algebra operations).
llm_load_print_meta: LF token = 13 '<0x0A>' (output related to loading metadata; its specific meaning may be context-dependent).
llm_load_tensors: ggml ctx size = 0.11 MiB (the size of the ggml memory context, which seems relatively small).
llm_load_tensors: offloading 0 repeating layers to GPU (no repeating layers are being offloaded to the GPU).
llm_load_tensors: offloaded 0/33 layers to GPU (no layers have been offloaded to the GPU).
llm_load_tensors: CPU buffer size = 7338.64 MiB (a significant amount of data is being loaded into CPU buffers).
Questions:
Has anyone else encountered a similar situation with llama.cpp switching from GPU to CPU execution?
Are there any known configuration changes or environmental factors that might be causing this behavior?
Could there be specific conditions in my code that are preventing GPU offloading?
Try, for example, the parameter -ngl 100 (for the llama.cpp main binary) or --n_gpu_layers 100 (for llama-cpp-python) to offload layers to the GPU. Your log shows "offloaded 0/33 layers to GPU", which means no offload count was requested, so all layers stay on the CPU.
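For reference, here is a minimal llama-cpp-python sketch of the same idea. It assumes your llama-cpp-python build was compiled with GPU support; the model path is a placeholder you would replace with your own GGUF file.

```python
# Minimal sketch: request GPU offload via n_gpu_layers in llama-cpp-python.
# Assumes a GPU-enabled build of llama-cpp-python; the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/your-model.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=100,  # more than the model's 33 layers, so all layers are offloaded
)

out = llm("Q: What is the capital of France? A:", max_tokens=16)
print(out["choices"][0]["text"])
```

If offloading is active, the load log should report a non-zero count in the "offloaded N/33 layers to GPU" line instead of 0/33.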