
Conversation

@GermanAizek (Contributor) commented Dec 17, 2025

@am17an, hi again. Thanks a lot for the earlier tests; they helped me test my changes more thoroughly. When you have some free time, could you test this branch?

Reference: #1595 (review)

All CTest tests pass except one; strangely, it fails because the models are not found:

14 - test-tokenizers-ggml-vocabs (Failed)
Reason:

14/43 Test #14: test-tokenizers-ggml-vocabs .......***Failed    0.36 sec
Already up to date.
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/PLaMo2/ggml-vocab-plamo2.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/PLaMo2/ggml-vocab-plamo2.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/PLaMo2/ggml-vocab-plamo2.gguf'
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/RWKV/ggml-vocab-rwkv-7-world.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/RWKV/ggml-vocab-rwkv-7-world.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/RWKV/ggml-vocab-rwkv-7-world.gguf'
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/SPM/ggml-vocab-gemma-3.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/SPM/ggml-vocab-gemma-3.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/SPM/ggml-vocab-gemma-3.gguf'
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/UGM/ggml-vocab-nomic-bert-moe.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/UGM/ggml-vocab-nomic-bert-moe.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/UGM/ggml-vocab-nomic-bert-moe.gguf'
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/WPM/ggml-vocab-jina-v2-en.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/WPM/ggml-vocab-jina-v2-en.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/WPM/ggml-vocab-jina-v2-en.gguf'

ctest_all_output.txt

My hyperfine results on a NUMA system with two Xeon E5-2699 CPUs:

devuan@devuan:/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/cmake-build-release/bin$ hyperfine --warmup 1 -r 5 "./llama-bench -m Llama-3.2-1B-Instruct-Q2_K.gguf -p 512 -n 128"
Benchmark 1: ./llama-bench -m Llama-3.2-1B-Instruct-Q2_K.gguf -p 512 -n 128
  Time (mean ± σ):     32.360 s ±  0.182 s    [User: 1150.270 s, System: 1.218 s]
  Range (min … max):   32.049 s … 32.514 s    5 runs
 
devuan@devuan:/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/cmake-build-release/bin$ hyperfine --warmup 1 -r 5 "./llama-bench -m Llama-3.2-1B-Instruct-Q2_K.gguf -p 512 -n 128"
Benchmark 1: ./llama-bench -m Llama-3.2-1B-Instruct-Q2_K.gguf -p 512 -n 128
  Time (mean ± σ):     28.896 s ±  0.267 s    [User: 1024.634 s, System: 1.303 s]
  Range (min … max):   28.568 s … 29.183 s    5 runs

Single run (not accurate enough, in my case):

tg128 increased, and likewise in hyperfine the average llama-bench execution time fell.

cpu-vec-simd

| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 1B Q2_K - Medium | 546.50 MiB | 1.24 B | CPU | 4 | pp512 | 76.53 ± 0.09 |
| llama 1B Q2_K - Medium | 546.50 MiB | 1.24 B | CPU | 4 | tg128 | 28.97 ± 0.85 |

build: be23f5f (7424)

master

| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 1B Q2_K - Medium | 546.50 MiB | 1.24 B | CPU | 4 | pp512 | 77.00 ± 0.10 |
| llama 1B Q2_K - Medium | 546.50 MiB | 1.24 B | CPU | 4 | tg128 | 27.05 ± 0.73 |

build: d674212 (7421)

@taronaeo (Collaborator) commented

> CTest successful all, but model not found strange

Did you pull via Git LFS? It looks like the models were not downloaded via LFS.
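That would explain the `invalid magic characters: 'vers'` error: an un-fetched LFS object is a small text pointer whose first line starts with `version https://git-lfs.github.com/spec/v1`, so the GGUF loader reads `vers` where it expects the `GGUF` magic. A minimal sketch (the filename here is made up) showing what such a pointer looks like:

```shell
# Simulate an un-fetched Git LFS pointer file (hypothetical filename);
# real pointers are small text files starting with a "version" line.
printf 'version https://git-lfs.github.com/spec/v1\n' > fake-vocab.gguf

# The first four bytes are 'vers', not the 'GGUF' magic the loader expects.
head -c 4 fake-vocab.gguf
echo

# To fetch the real model files, run inside the repo:
#   git lfs install
#   git lfs pull

rm fake-vocab.gguf
```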

@am17an (Collaborator) commented Dec 18, 2025

If I understand correctly, this change only affects the variance calculation, which is used only in GGML_OP_NORM. The model you are testing (llama-1B) uses RMS norm (i.e. GGML_OP_RMS_NORM), so I wouldn't expect a change in performance.

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Dec 18, 2025