
Commit aa89595

kylesayrs and dsikka authored
[Example] [Model Free] Remove Mistral Experimental, point to model_free_ptq (#2130)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
1 parent e25cbf1 commit aa89595

File tree

2 files changed: +1 -164 lines


experimental/mistral/README.md

Lines changed: 1 addition & 44 deletions
@@ -1,45 +1,2 @@
 # Mistral-format model compression (experimental)
-
-This folder contains tools for compressing Mistral-format models, like `mistralai/Devstral-Small-2505` and `mistralai/Magistral-Small-2506`.
-
-## FP8 W8A8 Quantization
-
-This script quantizes Mistral-format models to FP8. It is not for use with HuggingFace-format models.
-
-### 1. Download the model
-
-Download the model and save it to a new "FP8" folder. We use `mistralai/Magistral-Small-2506` as an example.
-
-```bash
-huggingface-cli download mistralai/Magistral-Small-2506 --local-dir Magistral-Small-2506-FP8
-```
-
-### 2. Clean up HuggingFace-specific files
-
-Models from the Hub often include files for both the native Mistral format and the HuggingFace `transformers` format. This script works on the native format, so the `transformers` files should be removed to avoid confusion.
-
-The HuggingFace-specific files are typically `config.json`, `model-000*-of-000*.safetensors`, and `model.safetensors.index.json`. The `params.json`, `tekken.json`, and `consolidated.safetensors` files belong to the native format.
-
-Before deleting, it's a good idea to look at the files in the directory to understand what you're removing.
-
-Once you're ready, remove the `transformers`-specific files:
-
-```bash
-rm Magistral-Small-2506-FP8/config.json Magistral-Small-2506-FP8/model.safetensors.index.json Magistral-Small-2506-FP8/model-000*
-```
-
-### 3. Run the quantization script
-
-Now, run the FP8 quantization script on the directory. This modifies the `.safetensors` files in place and updates `params.json` and `consolidated.safetensors`.
-
-```bash
-python fp8_quantize.py Magistral-Small-2506-FP8
-```
-
-### 4. Use the quantized model
-
-The model should now be ready to use in vLLM!
-
-```bash
-vllm serve Magistral-Small-2506-FP8 --tokenizer-mode mistral --config-format mistral --load-format mistral --tool-call-parser mistral --enable-auto-tool-choice
-```
+For quantizing Mistral models which do not have a HuggingFace model definition, such as `mistralai/Devstral-Small-2505`, `mistralai/Magistral-Small-2506`, and `mistralai/mistral-large-3`, please use the [`model_free_ptq`](/src/llmcompressor/entrypoints/model_free/) entrypoint.
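
For orientation, a minimal sketch of what a call to the `model_free_ptq` entrypoint might look like. The import path and argument names below are assumptions for illustration; the authoritative signature lives in `src/llmcompressor/entrypoints/model_free/`.

```python
# Hypothetical sketch only -- the import path and argument names are assumptions;
# consult src/llmcompressor/entrypoints/model_free/ for the actual API.
from llmcompressor import model_free_ptq

model_free_ptq(
    model_stub="mistralai/Magistral-Small-2506",  # Mistral-format checkpoint on the Hub
    save_directory="Magistral-Small-2506-FP8",    # where the quantized model is written
    scheme="FP8_BLOCK",                           # quantization scheme (assumed value)
)
```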

experimental/mistral/fp8_quantize.py

Lines changed: 0 additions & 120 deletions
This file was deleted.
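
The deleted script's body is not shown in this diff. As background only, here is a generic sketch of per-tensor FP8 (E4M3) weight quantization, the kind of transform such a script applies; it is not the removed `fp8_quantize.py`.

```python
# Generic per-tensor FP8 (E4M3) weight quantization sketch -- background only,
# not the contents of the deleted fp8_quantize.py. Requires PyTorch >= 2.1.
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for E4M3


def fp8_quantize_tensor(weight: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Quantize a weight tensor to FP8 with a single per-tensor scale."""
    scale = weight.abs().max().clamp(min=1e-12) / FP8_MAX
    qweight = (weight / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return qweight, scale  # dequantize as qweight.float() * scale
```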

0 commit comments
