
Conversation

@GermanAizek (Contributor) commented Dec 17, 2025

@am17an, hi again. Thanks a lot for the earlier tests; they helped me test my changes more thoroughly. When you have some free time, could you test this branch?

Reference: #1595 (review)

All CTest tests pass except one; strangely, it fails because the models are not found:

14 - test-tokenizers-ggml-vocabs (Failed)
Reason:

14/43 Test #14: test-tokenizers-ggml-vocabs .......***Failed    0.36 sec
Already up to date.
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/PLaMo2/ggml-vocab-plamo2.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/PLaMo2/ggml-vocab-plamo2.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/PLaMo2/ggml-vocab-plamo2.gguf'
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/RWKV/ggml-vocab-rwkv-7-world.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/RWKV/ggml-vocab-rwkv-7-world.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/RWKV/ggml-vocab-rwkv-7-world.gguf'
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/SPM/ggml-vocab-gemma-3.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/SPM/ggml-vocab-gemma-3.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/SPM/ggml-vocab-gemma-3.gguf'
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/UGM/ggml-vocab-nomic-bert-moe.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/UGM/ggml-vocab-nomic-bert-moe.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/UGM/ggml-vocab-nomic-bert-moe.gguf'
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/WPM/ggml-vocab-jina-v2-en.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/WPM/ggml-vocab-jina-v2-en.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/WPM/ggml-vocab-jina-v2-en.gguf'

ctest_all_output.txt

My hyperfine results on a NUMA system with two Xeon E5-2699 CPUs:

devuan@devuan:/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/cmake-build-release/bin$ hyperfine --warmup 1 -r 5 "./llama-bench -m Llama-3.2-1B-Instruct-Q2_K.gguf -p 512 -n 128"
Benchmark 1: ./llama-bench -m Llama-3.2-1B-Instruct-Q2_K.gguf -p 512 -n 128
  Time (mean ± σ):     32.360 s ±  0.182 s    [User: 1150.270 s, System: 1.218 s]
  Range (min … max):   32.049 s … 32.514 s    5 runs
 
devuan@devuan:/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/cmake-build-release/bin$ hyperfine --warmup 1 -r 5 "./llama-bench -m Llama-3.2-1B-Instruct-Q2_K.gguf -p 512 -n 128"
Benchmark 1: ./llama-bench -m Llama-3.2-1B-Instruct-Q2_K.gguf -p 512 -n 128
  Time (mean ± σ):     28.896 s ±  0.267 s    [User: 1024.634 s, System: 1.303 s]
  Range (min … max):   28.568 s … 29.183 s    5 runs

Single run (not accurate enough, in my case):

tg128 increased, and likewise in hyperfine the average llama-bench execution time fell.

cpu-vec-simd

| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 1B Q2_K - Medium | 546.50 MiB | 1.24 B | CPU | 4 | pp512 | 76.53 ± 0.09 |
| llama 1B Q2_K - Medium | 546.50 MiB | 1.24 B | CPU | 4 | tg128 | 28.97 ± 0.85 |

build: be23f5f (7424)

master

| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 1B Q2_K - Medium | 546.50 MiB | 1.24 B | CPU | 4 | pp512 | 77.00 ± 0.10 |
| llama 1B Q2_K - Medium | 546.50 MiB | 1.24 B | CPU | 4 | tg128 | 27.05 ± 0.73 |

build: d674212 (7421)

@taronaeo (Collaborator) commented

> CTest successful all, but model not found strange

Did you pull via Git LFS? It looks like the models were not downloaded via LFS.
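That would explain the `invalid magic characters: 'vers'` error: an un-fetched LFS object is a small text pointer whose first line starts with `version https://git-lfs.github.com/spec/v1`, so the GGUF loader reads `vers` where it expects the `GGUF` magic. A minimal sketch (the filename here is made up) showing what such a pointer looks like:

```shell
# Simulate an un-fetched Git LFS pointer file (hypothetical filename);
# real pointers are small text files starting with a "version" line.
printf 'version https://git-lfs.github.com/spec/v1\n' > fake-vocab.gguf

# The first four bytes are 'vers', not the 'GGUF' magic the loader expects.
head -c 4 fake-vocab.gguf
echo

# To fetch the real model files, run inside the repo:
#   git lfs install
#   git lfs pull

rm fake-vocab.gguf
```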

@am17an (Collaborator) commented Dec 18, 2025

If I understand correctly, this change only affects the variance calculation, which is used only in GGML_OP_NORM. The model you are testing (llama-1B) uses RMS norm (i.e. GGML_OP_RMS_NORM), so I wouldn't expect a change in performance.

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Dec 18, 2025