
Bug: lora_post_process_for_vllm has no effect #414

@achillefokoue

Description


Describe the bug

The option "lora_post_process_for_vllm" does not seem to have any effect. It is described in https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/build/README.md#configuration as "If tuning for inference on vLLM, set lora_post_process_for_vllm to true. Post process LoRA adapters to allow inferencing on vLLM. vLLM needs new token embedding weights added during tuning to be moved to a new file new_embeddings.safetensors."

When fine-tuning mistralai/Mixtral-8x7B-v0.1, setting "lora_post_process_for_vllm": true does not result in the creation of the new file new_embeddings.safetensors. Later, when the fine-tuned model is served by vLLM, the following error occurs:

raise ValueError(f"{name} is unsupported LoRA weight")
ValueError: base_model.model.lm_head.weight is unsupported LoRA weight
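
For reference, the documented post-processing essentially amounts to moving the new token embedding weights (the LoRA-saved lm_head / embed_tokens tensors that vLLM refuses to load from the adapter, per the error above) out of the adapter checkpoint into a separate new_embeddings.safetensors file. The following is a minimal illustrative sketch of that split, assuming a standard PEFT adapter_model.safetensors layout; the file paths, key-name heuristics, and function name are assumptions, not fms-hf-tuning's actual implementation:

# Illustrative sketch only: split new-token embedding weights out of a LoRA
# adapter checkpoint so vLLM can load them from new_embeddings.safetensors.
# Paths and key-name heuristics are assumptions, not fms-hf-tuning's API.
import os
from safetensors.torch import load_file, save_file

def split_new_embeddings(adapter_dir: str) -> None:
    adapter_path = os.path.join(adapter_dir, "adapter_model.safetensors")
    tensors = load_file(adapter_path)

    # Weights vLLM rejects when left inside the adapter (see error above).
    embedding_keys = [k for k in tensors if "lm_head" in k or "embed_tokens" in k]
    if not embedding_keys:
        return  # nothing to move

    new_embeddings = {k: tensors.pop(k) for k in embedding_keys}
    save_file(new_embeddings, os.path.join(adapter_dir, "new_embeddings.safetensors"))
    save_file(tensors, adapter_path)  # rewrite adapter without the moved weights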

Platform

  • Interpreter version: Python 3.12.7
  • Library version: fms-hf-tuning main branch as of December 9, 4:27 pm ET

Sample Code

export SFT_TRAINER_CONFIG_JSON_PATH=config.json

accelerate launch --num_processes=5 --config_file fixtures/accelerate_fsdp_defaults.yaml tuning/sft_trainer.py

where the content of config.json is as follows:

{
    "config_file": "fixtures/accelerate_fsdp_defaults.yaml",
    "model_name_or_path": "mistralai/Mixtral-8x7B-v0.1",
    "training_data_path": $TRAINING_PATH,
    "output_dir": $OUTPUT_PATH,
    "num_train_epochs": 10.0,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "torch_dtype": "float16",
    "peft_method": "lora",
    "r": 8,
    "lora_dropout": 0.05,
    "target_modules": "all-linear",
    "lora_post_process_for_vllm": true
}
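
One quick way to confirm the flag had no effect is to inspect the tuning output: if post-processing had run, new_embeddings.safetensors should exist next to the adapter, and the adapter itself should no longer contain lm_head / embed_tokens keys. A small diagnostic sketch (the directory argument is whatever $OUTPUT_PATH above points to; the key-name checks are assumptions):

# Diagnostic sketch: check whether the saved adapter was post-processed for vLLM.
# Pass the directory given as "output_dir" in config.json.
import os
import sys
from safetensors import safe_open

output_dir = sys.argv[1]  # e.g. the value of $OUTPUT_PATH
print("new_embeddings.safetensors present:",
      os.path.exists(os.path.join(output_dir, "new_embeddings.safetensors")))

with safe_open(os.path.join(output_dir, "adapter_model.safetensors"),
               framework="pt") as f:
    leftover = [k for k in f.keys() if "lm_head" in k or "embed_tokens" in k]
print("embedding-related keys still in adapter:", leftover)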

Expected behavior

The expected behavior is described in https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/build/README.md#configuration: "If tuning for inference on vLLM, set lora_post_process_for_vllm to true. Post process LoRA adapters to allow inferencing on vLLM. vLLM needs new token embedding weights added during tuning to be moved to a new file new_embeddings.safetensors."

Observed behavior

When fine-tuning mistralai/Mixtral-8x7B-v0.1, setting "lora_post_process_for_vllm": true does not result in the creation of the new file new_embeddings.safetensors. Later, when the fine-tuned model is served by vLLM, the following error occurs:

raise ValueError(f"{name} is unsupported LoRA weight")
ValueError: base_model.model.lm_head.weight is unsupported LoRA weight

