Eval bug: MMAP off causes load model to fail #18198

@askmyteapot

Description

Name and Version

D:\llama.cpp(llvmB)>llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 1: Tesla P40, compute capability 6.1, VMM: no
version: 7478 (0a271d8)
built with Clang 20.1.8 for Windows x86_64

D:\llama.cpp(llvmB)>llama-cli -m D:\text-generation-webui\models\magnum-v4-22b-Q6_K.gguf -ngl 99 -fa 1 -dev cuda0 --no-mmap
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 1: Tesla P40, compute capability 6.1, VMM: no

Loading model...
llama_model_load: error loading model: read error: An attempt was made to move the file pointer before the beginning of the file.

llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'D:\text-generation-webui\models\magnum-v4-22b-Q6_K.gguf'
srv    load_model: failed to load model, 'D:\text-generation-webui\models\magnum-v4-22b-Q6_K.gguf'

D:\llama.cpp(llvmB)>llama-cli -m D:\text-generation-webui\models\magnum-v4-22b-Q6_K.gguf -ngl 99 -fa 1 -dev cuda0 --mmap
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 1: Tesla P40, compute capability 6.1, VMM: no

Loading model...


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b7478-0a271d82b
model      : magnum-v4-22b-Q6_K.gguf
modalities : text

Operating systems

Windows

GGML backends

CUDA

Hardware

Ryzen 5800x + 64GB DDR4
Tesla P40 + RTX 3090

Models

All models I tested (MoE and dense), e.g. Granite 4 H Tiny

Problem description & steps to reproduce

Loading any model with --no-mmap fails with the read error shown above; the identical command with --mmap loads fine. See the sketch below for what the Windows error text means.
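
The error text in the failing run is Windows' ERROR_NEGATIVE_SEEK (error 131), raised when a file handle is asked to seek to an offset before the start of the file. The sketch below is illustrative only, not llama.cpp's actual loader code: it reproduces the same error message by calling SetFilePointerEx with a negative offset (the path and the offset value are made up for the demonstration).

```cpp
// Illustrative sketch: reproduce the "move the file pointer before the
// beginning of the file" error (ERROR_NEGATIVE_SEEK, 131) seen in the log.
// This is NOT llama.cpp's no-mmap read path.
#include <windows.h>
#include <cstdio>

int main(int argc, char ** argv) {
    // Any existing file works; "model.gguf" is a placeholder.
    const char * path = argc > 1 ? argv[1] : "model.gguf";

    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) {
        std::fprintf(stderr, "open failed: error %lu\n", GetLastError());
        return 1;
    }

    LARGE_INTEGER off;
    off.QuadPart = -1; // hypothetical bad offset, e.g. from a signed/unsigned mix-up

    if (!SetFilePointerEx(h, off, nullptr, FILE_BEGIN)) {
        // GetLastError() == 131 (ERROR_NEGATIVE_SEEK):
        // "An attempt was made to move the file pointer before the beginning of the file."
        std::fprintf(stderr, "seek failed: error %lu\n", GetLastError());
    }

    CloseHandle(h);
    return 0;
}
```

If that reading is right, the no-mmap path ends up computing a negative seek offset somewhere between the two commits below, which would also explain why the mmap path (which maps the file instead of seeking through it) is unaffected.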

First Bad Commit

57c1e05
4d4f4ca

Relevant log output

See the terminal output above.
