Error in loading quantized models without quantize_config.json #144

@srikumar003

Description

Issue:
Trying to load a quantized model (e.g. https://huggingface.co/RedHatAI/granite-3.1-2b-instruct-quantized.w4a16) through fms-hf-tuning fails with the following error:

Traceback (most recent call last):
  File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/wrapper_fms_hf_tuning/scripts/wrapper_sfttrainer.py", line 467, in main
    module.parse_arguments_and_execute_wrapper(
  File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/wrapper_fms_hf_tuning/tuning_versions/at_least_2_5_0.py", line 51, in parse_arguments_and_execute_wrapper
    return tuning.sft_trainer.train(
  File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/tuning/sft_trainer.py", line 278, in train
    model = model_loader(
  File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/fms_acceleration/framework.py", line 183, in model_loader
    return plugin.model_loader(model_name, **kwargs)
  File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/fms_acceleration_peft/framework_plugin_autogptq.py", line 121, in model_loader
    quantize_config = QuantizeConfig.from_pretrained(model_name)
  File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/fms_acceleration_peft/gptqmodel/quantization/config.py", line 297, in from_pretrained
    with open(resolved_config_file, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/hf-models-pvc/granite-3.1-2b-instruct-int4-gptq/quantize_config.json'

Expected behaviour:
The model should load successfully.

Additional information

To quote an internal Slack message: these models are produced through llm-compressor and carry a quantization_config section inside config.json instead of a separate quantize_config.json (see the illustrative sketch below).
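
For reference, a minimal sketch of what such a config.json can carry. The field names follow typical llm-compressor / compressed-tensors output and are assumptions, not copied from this exact model:

```json
{
  "model_type": "granite",
  "quantization_config": {
    "quant_method": "compressed-tensors",
    "format": "pack-quantized",
    "config_groups": {
      "group_0": {
        "weights": {
          "num_bits": 4,
          "type": "int",
          "symmetric": true
        }
      }
    }
  }
}
```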

While the new configuration format is supported in fms-acceleration, that support is never exercised: in the from_pretrained method in fms_acceleration_peft/gptqmodel/quantization/config.py, the for loop that is supposed to iterate over all supported config files exits after the first file name (quantize_config.json) is converted to a path, so config.json is never considered. See the sketch after this paragraph.
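
A minimal sketch of the reported behaviour; the function and file names below are illustrative, not copied from the actual source:

```python
import os

def resolve_config_file_buggy(save_dir: str) -> str:
    resolved = None
    for filename in ["quantize_config.json", "config.json"]:
        # The loop is meant to try every supported config file, but it
        # breaks unconditionally on the first iteration...
        resolved = os.path.join(save_dir, filename)
        break
    # ...so config.json is never considered, and the resolved path may not
    # exist, surfacing later as the FileNotFoundError in the traceback above.
    return resolved
```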

A simple existence check on quantize_config.json before breaking out of the loop would resolve this, for example as sketched below.
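
A hedged sketch of the suggested fix, again with illustrative names: only accept a candidate config file that actually exists on disk, so the lookup can fall through to config.json for llm-compressor-produced models.

```python
import os

def resolve_config_file(save_dir: str) -> str:
    for filename in ["quantize_config.json", "config.json"]:
        candidate = os.path.join(save_dir, filename)
        if os.path.isfile(candidate):  # the missing existence check
            return candidate
    raise FileNotFoundError(
        f"No supported quantization config file found in {save_dir}"
    )
```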
