This is something llama.cpp actually supports. I set the GGML_VK_VISIBLE_DEVICES environment variable like this on the Linux command line:
GGML_VK_VISIBLE_DEVICES=0 ./llama-server -hf ggml-org/Qwen3-1.7B-GGUF -c 9000 --port 8081 -cram 0
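(Note the space after the 0: without it the shell treats the whole thing as a variable assignment and never runs the server.) If you're not sure which index maps to which GPU, you can enumerate the Vulkan devices first, e.g. with vulkaninfo from your distro's Vulkan tools; llama.cpp also prints the devices its Vulkan backend found when it starts up:
vulkaninfo --summary | grep -i deviceName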
You can replace the 0 in the environment variable with the index of the device you want to use. This forces llama.cpp onto that device, which matters because if you also have an NVIDIA GPU it will default to that one.
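For example, if device 0 turned out to be the NVIDIA card and device 1 the iGPU (the ordering here is hypothetical; it depends on how Vulkan enumerates devices on your machine, so check first), you would pin the server to the iGPU with:
GGML_VK_VISIBLE_DEVICES=1 ./llama-server -hf ggml-org/Qwen3-1.7B-GGUF -c 9000 --port 8081 -cram 0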
I compiled llama.cpp with the Vulkan backend, so I am running it on a GCN5 AMD iGPU: a Radeon Vega 7 on a Ryzen 5 5600H. The iGPU can be slower than the CPU for short question-style prompts, but it frees up the CPU, and on prompts of over 1000 tokens I find the iGPU's speed comparable to the CPU's, in the 5 to 11 tok/s range, since both are limited by DDR4 memory bandwidth.
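For reference, the Vulkan build is the standard CMake flow with the backend option switched on (assuming the Vulkan SDK/headers and a GLSL compiler are installed; GGML_VULKAN is the current option name, older checkouts used LLAMA_VULKAN):
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release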
I use Vulkan because OpenCL and ROCm seem to be unusable on these integrated GPUs: ROCm doesn't ship a supported LLVM compilation target for them.