This is something llama.cpp actually supports. I set the GGML_VK_VISIBLE_DEVICES environment variable like this on the Linux command line:
GGML_VK_VISIBLE_DEVICES=0 ./llama-server -hf ggml-org/Qwen3-1.7B-GGUF -c 9000 --port 8081 -cram 0
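(Note the space after the 0: without it the shell treats the whole thing as a variable assignment and never runs the server.) If you're not sure which index maps to which GPU, you can enumerate the Vulkan devices first, e.g. with vulkaninfo from your distro's Vulkan tools; llama.cpp also prints the devices its Vulkan backend found when it starts up:
vulkaninfo --summary | grep -i deviceName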
You can replace the 0 in the environment variable with the index of the device you want to use. This forces llama.cpp onto that device, which matters because if you also have an NVIDIA GPU it will default to that one.
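For example, if device 0 turned out to be the NVIDIA card and device 1 the iGPU (the ordering here is hypothetical; it depends on how Vulkan enumerates devices on your machine, so check first), you would pin the server to the iGPU with:
GGML_VK_VISIBLE_DEVICES=1 ./llama-server -hf ggml-org/Qwen3-1.7B-GGUF -c 9000 --port 8081 -cram 0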
I compiled llama.cpp with the Vulkan backend, so I am running it on a GCN5 AMD iGPU: a Radeon Vega 7 on a Ryzen 5 5600H. The iGPU can be slower than the CPU for short question-style prompts, but it frees up the CPU, and on prompts of over 1000 tokens I find the iGPU's speed comparable to the CPU's, in the 5 to 11 tok/s range, since both are limited by DDR4 memory bandwidth.
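For reference, the Vulkan build is the standard CMake flow with the backend option switched on (assuming the Vulkan SDK/headers and a GLSL compiler are installed; GGML_VULKAN is the current option name, older checkouts used LLAMA_VULKAN):
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release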
I use Vulkan because OpenCL and ROCm seem to be unusable on these integrated GPUs: ROCm doesn't ship a supported LLVM compilation target for them.