Error in loading quantized models without quantize_config.json #144

@srikumar003

Description

Issue:
Trying to load a quantized model (e.g. https://huggingface.co/RedHatAI/granite-3.1-2b-instruct-quantized.w4a16) through fms-hf-tuning fails with the following error:

Traceback (most recent call last):
  File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/wrapper_fms_hf_tuning/scripts/wrapper_sfttrainer.py", line 467, in main
    module.parse_arguments_and_execute_wrapper(
  File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/wrapper_fms_hf_tuning/tuning_versions/at_least_2_5_0.py", line 51, in parse_arguments_and_execute_wrapper
    return tuning.sft_trainer.train(
  File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/tuning/sft_trainer.py", line 278, in train
    model = model_loader(
  File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/fms_acceleration/framework.py", line 183, in model_loader
    return plugin.model_loader(model_name, **kwargs)
  File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/fms_acceleration_peft/framework_plugin_autogptq.py", line 121, in model_loader
    quantize_config = QuantizeConfig.from_pretrained(model_name)
  File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/fms_acceleration_peft/gptqmodel/quantization/config.py", line 297, in from_pretrained
    with open(resolved_config_file, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/hf-models-pvc/granite-3.1-2b-instruct-int4-gptq/quantize_config.json'

Expected behaviour:
The model should load successfully.

Additional information

To quote an internal Slack message: these models are produced through llm-compressor and carry a quantization_config section inside config.json instead of a separate quantize_config.json (see the illustrative sketch below).
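
For reference, a minimal sketch of what such a config.json can carry. The field names follow typical llm-compressor / compressed-tensors output and are assumptions, not copied from this exact model:

```json
{
  "model_type": "granite",
  "quantization_config": {
    "quant_method": "compressed-tensors",
    "format": "pack-quantized",
    "config_groups": {
      "group_0": {
        "weights": {
          "num_bits": 4,
          "type": "int",
          "symmetric": true
        }
      }
    }
  }
}
```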

While the new configuration format is supported in fms-acceleration, that support is never exercised: in the from_pretrained method in fms_acceleration_peft/gptqmodel/quantization/config.py, the for loop that is supposed to iterate over all supported config files exits after the first file name (quantize_config.json) is converted to a path, so config.json is never considered. See the sketch after this paragraph.
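
A minimal sketch of the reported behaviour; the function and file names below are illustrative, not copied from the actual source:

```python
import os

def resolve_config_file_buggy(save_dir: str) -> str:
    resolved = None
    for filename in ["quantize_config.json", "config.json"]:
        # The loop is meant to try every supported config file, but it
        # breaks unconditionally on the first iteration...
        resolved = os.path.join(save_dir, filename)
        break
    # ...so config.json is never considered, and the resolved path may not
    # exist, surfacing later as the FileNotFoundError in the traceback above.
    return resolved
```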

A simple existence check on quantize_config.json before breaking out of the loop would resolve this, for example as sketched below.
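
A hedged sketch of the suggested fix, again with illustrative names: only accept a candidate config file that actually exists on disk, so the lookup can fall through to config.json for llm-compressor-produced models.

```python
import os

def resolve_config_file(save_dir: str) -> str:
    for filename in ["quantize_config.json", "config.json"]:
        candidate = os.path.join(save_dir, filename)
        if os.path.isfile(candidate):  # the missing existence check
            return candidate
    raise FileNotFoundError(
        f"No supported quantization config file found in {save_dir}"
    )
```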
