Eval bug: MMAP off causes load model to fail #18198

@askmyteapot

Description

Name and Version

D:\llama.cpp(llvmB)>llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 1: Tesla P40, compute capability 6.1, VMM: no
version: 7478 (0a271d8)
built with Clang 20.1.8 for Windows x86_64

D:\llama.cpp(llvmB)>llama-cli -m D:\text-generation-webui\models\magnum-v4-22b-Q6_K.gguf -ngl 99 -fa 1 -dev cuda0 --no-mmap
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 1: Tesla P40, compute capability 6.1, VMM: no

Loading model...
llama_model_load: error loading model: read error: An attempt was made to move the file pointer before the beginning of the file.

llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'D:\text-generation-webui\models\magnum-v4-22b-Q6_K.gguf'
srv    load_model: failed to load model, 'D:\text-generation-webui\models\magnum-v4-22b-Q6_K.gguf'

D:\llama.cpp(llvmB)>llama-cli -m D:\text-generation-webui\models\magnum-v4-22b-Q6_K.gguf -ngl 99 -fa 1 -dev cuda0 --mmap
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 1: Tesla P40, compute capability 6.1, VMM: no

Loading model...


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b7478-0a271d82b
model      : magnum-v4-22b-Q6_K.gguf
modalities : text

Operating systems

Windows

GGML backends

CUDA

Hardware

Ryzen 5800x + 64GB DDR4
Tesla P40 + RTX 3090

Models

All models I tested (MoE and dense), e.g. Granite 4 H Tiny

Problem description & steps to reproduce

Loading any model with --no-mmap fails with the read error shown above; the identical command with --mmap loads fine. See the sketch below for what the Windows error text means.
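
The error text in the failing run is Windows' ERROR_NEGATIVE_SEEK (error 131), raised when a file handle is asked to seek to an offset before the start of the file. The sketch below is illustrative only, not llama.cpp's actual loader code: it reproduces the same error message by calling SetFilePointerEx with a negative offset (the path and the offset value are made up for the demonstration).

```cpp
// Illustrative sketch: reproduce the "move the file pointer before the
// beginning of the file" error (ERROR_NEGATIVE_SEEK, 131) seen in the log.
// This is NOT llama.cpp's no-mmap read path.
#include <windows.h>
#include <cstdio>

int main(int argc, char ** argv) {
    // Any existing file works; "model.gguf" is a placeholder.
    const char * path = argc > 1 ? argv[1] : "model.gguf";

    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) {
        std::fprintf(stderr, "open failed: error %lu\n", GetLastError());
        return 1;
    }

    LARGE_INTEGER off;
    off.QuadPart = -1; // hypothetical bad offset, e.g. from a signed/unsigned mix-up

    if (!SetFilePointerEx(h, off, nullptr, FILE_BEGIN)) {
        // GetLastError() == 131 (ERROR_NEGATIVE_SEEK):
        // "An attempt was made to move the file pointer before the beginning of the file."
        std::fprintf(stderr, "seek failed: error %lu\n", GetLastError());
    }

    CloseHandle(h);
    return 0;
}
```

If that reading is right, the no-mmap path ends up computing a negative seek offset somewhere between the two commits below, which would also explain why the mmap path (which maps the file instead of seeking through it) is unaffected.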

First Bad Commit

57c1e05
4d4f4ca

Relevant log output

See the terminal output above.
