Description
Issue:
Trying to load a quantized model (e.g. https://huggingface.co/RedHatAI/granite-3.1-2b-instruct-quantized.w4a16) through fms-hf-tuning fails with the following error:
(launch_finetune pid=20238, ip=10.48.25.68) Traceback (most recent call last):
(launch_finetune pid=20238, ip=10.48.25.68)   File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/wrapper_fms_hf_tuning/scripts/wrapper_sfttrainer.py", line 467, in main
(launch_finetune pid=20238, ip=10.48.25.68)     module.parse_arguments_and_execute_wrapper(
(launch_finetune pid=20238, ip=10.48.25.68)   File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/wrapper_fms_hf_tuning/tuning_versions/at_least_2_5_0.py", line 51, in parse_arguments_and_execute_wrapper
(launch_finetune pid=20238, ip=10.48.25.68)     return tuning.sft_trainer.train(
(launch_finetune pid=20238, ip=10.48.25.68)   File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/tuning/sft_trainer.py", line 278, in train
(launch_finetune pid=20238, ip=10.48.25.68)     model = model_loader(
(launch_finetune pid=20238, ip=10.48.25.68)   File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/fms_acceleration/framework.py", line 183, in model_loader
(launch_finetune pid=20238, ip=10.48.25.68)     return plugin.model_loader(model_name, **kwargs)
(launch_finetune pid=20238, ip=10.48.25.68)   File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/fms_acceleration_peft/framework_plugin_autogptq.py", line 121, in model_loader
(launch_finetune pid=20238, ip=10.48.25.68)     quantize_config = QuantizeConfig.from_pretrained(model_name)
(launch_finetune pid=20238, ip=10.48.25.68)   File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/fms_acceleration_peft/gptqmodel/quantization/config.py", line 297, in from_pretrained
(launch_finetune pid=20238, ip=10.48.25.68)     with open(resolved_config_file, "r", encoding="utf-8") as f:
(launch_finetune pid=20238, ip=10.48.25.68) FileNotFoundError: [Errno 2] No such file or directory: '/hf-models-pvc/granite-3.1-2b-instruct-int4-gptq/quantize_config.json'
Expected behaviour:
The model should load successfully.
Additional information
To quote from an internal Slack message, these models are produced through llm-compressor and carry a quantization_config section inside config.json, rather than a separate quantize_config.json.
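For illustration, the two layouts look roughly like this (the field names follow common GPTQ conventions, but the exact values are assumptions, not copied from the model repo):

```python
# Older AutoGPTQ layout: a standalone quantize_config.json next to the weights.
quantize_config_json = {
    "bits": 4,
    "group_size": 128,
    "desc_act": False,
    "sym": True,
}

# llm-compressor layout: equivalent settings embedded in config.json under
# the "quantization_config" key (other model fields omitted).
config_json = {
    "model_type": "granite",
    "quantization_config": {
        "bits": 4,
        "group_size": 128,
    },
}
```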
While the new configuration format is supported in fms-acceleration, that code path is never reached, because the from_pretrained method in fms_acceleration_peft/gptqmodel/quantization/config.py fails first. The culprit is the for loop that is supposed to iterate over all supported config files: it exits after the first candidate (quantize_config.json) is converted to a path, without checking whether that file actually exists.
A simple existence check on quantize_config.json before settling on it would resolve this.
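A minimal sketch of that kind of fix, assuming the loop resolves candidate file names in a local directory (the actual variable names and structure in fms_acceleration_peft/gptqmodel/quantization/config.py may differ):

```python
import os

# Hypothetical reconstruction of the candidate-file loop in
# QuantizeConfig.from_pretrained; names here are assumptions, not the real code.
QUANT_CONFIG_FILENAMES = ["quantize_config.json", "config.json"]

def resolve_quant_config_file(model_path: str) -> str:
    resolved = None
    for name in QUANT_CONFIG_FILENAMES:
        candidate = os.path.join(model_path, name)
        # Buggy behaviour: returning the first candidate unconditionally means
        # quantize_config.json is chosen even when it does not exist, so the
        # config.json fallback is never tried.
        # Fix: only accept a candidate that is actually present on disk.
        if os.path.isfile(candidate):
            resolved = candidate
            break
    if resolved is None:
        raise FileNotFoundError(
            f"No quantization config found in {model_path}; "
            f"looked for {QUANT_CONFIG_FILENAMES}"
        )
    return resolved
```

With a check like this, a model that only ships a quantization_config inside config.json would fall through to the existing config.json handling instead of raising FileNotFoundError.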