
Bug: lora_post_process_for_vllm has no effect #414

@achillefokoue

Description


Describe the bug

The option "lora_post_process_for_vllm" does not seem to have any effect. It is described in https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/build/README.md#configuration as "If tuning for inference on vLLM, set lora_post_process_for_vllm to true. Post process LoRA adapters to allow inferencing on vLLM. vLLM needs new token embedding weights added during tuning to be moved to a new file new_embeddings.safetensors."

When fine-tuning mistralai/Mixtral-8x7B-v0.1, setting "lora_post_process_for_vllm": true does not result in the creation of the new file new_embeddings.safetensors. Later, when the fine-tuned model is served by vLLM, the following error occurs:

raise ValueError(f"{name} is unsupported LoRA weight")
ValueError: base_model.model.lm_head.weight is unsupported LoRA weight
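
For reference, the documented post-processing essentially amounts to moving the new token embedding weights (the LoRA-saved lm_head / embed_tokens tensors that vLLM refuses to load from the adapter, per the error above) out of the adapter checkpoint into a separate new_embeddings.safetensors file. The following is a minimal illustrative sketch of that split, assuming a standard PEFT adapter_model.safetensors layout; the file paths, key-name heuristics, and function name are assumptions, not fms-hf-tuning's actual implementation:

# Illustrative sketch only: split new-token embedding weights out of a LoRA
# adapter checkpoint so vLLM can load them from new_embeddings.safetensors.
# Paths and key-name heuristics are assumptions, not fms-hf-tuning's API.
import os
from safetensors.torch import load_file, save_file

def split_new_embeddings(adapter_dir: str) -> None:
    adapter_path = os.path.join(adapter_dir, "adapter_model.safetensors")
    tensors = load_file(adapter_path)

    # Weights vLLM rejects when left inside the adapter (see error above).
    embedding_keys = [k for k in tensors if "lm_head" in k or "embed_tokens" in k]
    if not embedding_keys:
        return  # nothing to move

    new_embeddings = {k: tensors.pop(k) for k in embedding_keys}
    save_file(new_embeddings, os.path.join(adapter_dir, "new_embeddings.safetensors"))
    save_file(tensors, adapter_path)  # rewrite adapter without the moved weights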

Platform

  • Interpreter version: Python 3.12.7
  • Library version: fms-hf-tuning main branch as of December 9, 4:27 pm ET

Sample Code

export SFT_TRAINER_CONFIG_JSON_PATH=config.json

accelerate launch --num_processes=5 --config_file fixtures/accelerate_fsdp_defaults.yaml tuning/sft_trainer.py

where the content of config.json is as follows:

{
    "config_file": "fixtures/accelerate_fsdp_defaults.yaml",
    "model_name_or_path": "mistralai/Mixtral-8x7B-v0.1",
    "training_data_path": $TRAINING_PATH,
    "output_dir": $OUTPUT_PATH,
    "num_train_epochs": 10.0,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "torch_dtype": "float16",
    "peft_method": "lora",
    "r": 8,
    "lora_dropout": 0.05,
    "target_modules": "all-linear",
    "lora_post_process_for_vllm": true
}
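
One quick way to confirm the flag had no effect is to inspect the tuning output: if post-processing had run, new_embeddings.safetensors should exist next to the adapter, and the adapter itself should no longer contain lm_head / embed_tokens keys. A small diagnostic sketch (the directory argument is whatever $OUTPUT_PATH above points to; the key-name checks are assumptions):

# Diagnostic sketch: check whether the saved adapter was post-processed for vLLM.
# Pass the directory given as "output_dir" in config.json.
import os
import sys
from safetensors import safe_open

output_dir = sys.argv[1]  # e.g. the value of $OUTPUT_PATH
print("new_embeddings.safetensors present:",
      os.path.exists(os.path.join(output_dir, "new_embeddings.safetensors")))

with safe_open(os.path.join(output_dir, "adapter_model.safetensors"),
               framework="pt") as f:
    leftover = [k for k in f.keys() if "lm_head" in k or "embed_tokens" in k]
print("embedding-related keys still in adapter:", leftover)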

Expected behavior

The expected behavior is described in https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/build/README.md#configuration: "If tuning for inference on vLLM, set lora_post_process_for_vllm to true. Post process LoRA adapters to allow inferencing on vLLM. vLLM needs new token embedding weights added during tuning to be moved to a new file new_embeddings.safetensors."

Observed behavior

When fine-tuning mistralai/Mixtral-8x7B-v0.1, setting "lora_post_process_for_vllm": true does not result in the creation of the new file new_embeddings.safetensors. Later, when the fine-tuned model is served by vLLM, the following error occurs:

raise ValueError(f"{name} is unsupported LoRA weight")
ValueError: base_model.model.lm_head.weight is unsupported LoRA weight

