willmj commented Apr 9, 2025

Building on #138 to add PEFT configs to the fast MoE augmentation so that the LoRA config is passed into prepare_scattermoe. Also updates the checkpoint utility functions to handle the LoRA state dict.
Restrictions (a config sketch follows this list):

  • lora_config.r must be >= 16
  • FSDP must be used: the ScatteredExperts weights are not supported by PEFT's LoRA tuning, so the overridden FSDP save and load functions must be used here.
  • Loading from a default LoRA adapter config may not work here; the intended use case is to run tuning from the base model.
  • vLLM/vanilla HF PEFT inference cannot load the custom ScatteredExperts, so by default LoRA tuning only tunes router.layer, not input_linear and output_linear, which are 3D layers.
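
As a minimal sketch of these restrictions, using peft's standard LoraConfig; the prepare_scattermoe pass-through is the behavior this PR adds, and its exact signature may differ from what is shown:

from peft import LoraConfig

# LoRA config satisfying the restrictions above (r must be >= 16):
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "o_proj", "k_proj", "router"],
)

# The fast MoE augmentation then forwards this config into
# prepare_scattermoe (shown schematically; the real keyword and
# surrounding arguments live in the plugin code):
# model = prepare_scattermoe(model, ..., lora_config=lora_config)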

Target Modules:
Users have control over the target modules they train (LoraConfig sketches follow this list):

  • Passing all-linear as the target modules will include the router, which is a linear layer, and all attention layers. This will not train the expert layers.
  • To train only attention layers, specify the target modules explicitly (e.g. target_modules: ["q_proj", "v_proj", "o_proj", "k_proj"]).
  • To train the expert layers, specify input_linear and output_linear in the target modules along with router (e.g. target_modules: ["q_proj", "v_proj", "o_proj", "k_proj", "router", "input_linear", "output_linear"]). If you specify these layers, inference with vLLM/vanilla HF PEFT is not possible.
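
A sketch of the three choices above as peft LoraConfig instances (illustrative only; in practice the training stack builds the config from JSON arguments like those in the comments below):

from peft import LoraConfig

# 1. Router + all attention (and other linear) layers; experts untouched:
all_linear = LoraConfig(r=16, target_modules="all-linear")

# 2. Attention layers only:
attn_only = LoraConfig(
    r=16, target_modules=["q_proj", "v_proj", "o_proj", "k_proj"]
)

# 3. Attention + router + expert layers; no vLLM/vanilla HF PEFT inference:
with_experts = LoraConfig(
    r=16,
    target_modules=[
        "q_proj", "v_proj", "o_proj", "k_proj",
        "router", "input_linear", "output_linear",
    ],
)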

Here are logs of the transformation in checkpoint_utils before and after the recover_original_state_dict_from_checkpoint function:
scattermoe-router-lora.log
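
For intuition, a purely illustrative sketch of the kind of key remapping recover_original_state_dict_from_checkpoint performs; the module paths below are hypothetical, and the real mapping logic lives in checkpoint_utils:

def recover_original_state_dict(sd: dict) -> dict:
    """Rename LoRA keys saved on the swapped-in ScatterMoE modules back to
    the original HF module paths (hypothetical rename shown)."""
    recovered = {}
    for name, tensor in sd.items():
        # e.g. "...moe.router.layer.lora_A.weight"
        #   -> "...block_sparse_moe.router.layer.lora_A.weight"
        recovered[name.replace(".moe.", ".block_sparse_moe.")] = tensor
    return recovered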

willmj added 30 commits March 24, 2025 14:57
willmj commented Apr 9, 2025

Model config:

{
    "model_name_or_path": "/ibm_dmf_lakehouse/models/base_training/shared/granite-3.0-3b-a800m-base/r240924a",
    "training_data_path": "/testing/tuning/input/cc_tone_sft_format_1000_train.json",
    "output_dir": "/testing/tuning/output/granite-3b-moe/lora/20250409_1430-tone-FAST-2-gpu",
    "save_model_dir": "/testing/tuning/output/granite-3b-moe/lora/20250409_1430-tone-FAST-2-gpu/save_model",
    "num_train_epochs": 10.0,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 1,
    "learning_rate": 1e-5,
    "response_template": "\n### Response:",
    "dataset_text_field": "output",
    "peft_method": "lora",
    "r": 16,
    "lora_dropout": 0.05,
    "lora_alpha": 16,
    "target_modules": ["all-linear"],
    "embedding_size_multiple_of": 1,
    "lora_post_process_for_vllm": true,
    "fast_moe": 2
}

Training loss:

{"data": {"epoch": 1.0, "step": 250, "timestamp": "2025-04-09T18:33:15.005966", "value": 69.6784}, "name": "training_loss"}
{"data": {"epoch": 2.0, "step": 500, "timestamp": "2025-04-09T18:36:22.921211", "value": 42.4646}, "name": "training_loss"}
{"data": {"epoch": 3.0, "step": 750, "timestamp": "2025-04-09T18:39:27.431686", "value": 12.3188}, "name": "training_loss"}
{"data": {"epoch": 4.0, "step": 1000, "timestamp": "2025-04-09T18:42:33.980403", "value": 3.279}, "name": "training_loss"}
{"data": {"epoch": 5.0, "step": 1250, "timestamp": "2025-04-09T18:45:41.901477", "value": 2.6381}, "name": "training_loss"}
{"data": {"epoch": 6.0, "step": 1500, "timestamp": "2025-04-09T18:48:50.497139", "value": 2.5507}, "name": "training_loss"}
{"data": {"epoch": 7.0, "step": 1750, "timestamp": "2025-04-09T18:51:56.045143", "value": 2.5144}, "name": "training_loss"}
{"data": {"epoch": 8.0, "step": 2000, "timestamp": "2025-04-09T18:54:59.997974", "value": 2.4964}, "name": "training_loss"}
{"data": {"epoch": 9.0, "step": 2250, "timestamp": "2025-04-09T18:58:03.191192", "value": 2.4803}, "name": "training_loss"}
{"data": {"epoch": 10.0, "step": 2500, "timestamp": "2025-04-09T19:01:08.553225", "value": 2.4918}, "name": "training_loss"}

Inference:

grpcurl -plaintext -proto ./proto/generation.proto -d "{\"adapter_id\": \"20250409_1430-tone-FAST-2-gpu/checkpoint-2500/hf_converted_checkpoint\",\"params\":{\"method\":\"GREEDY\", \"stopping\": {\"max_new_tokens\": 128}}, \"requests\": [{\"text\":\"### Text: @sho_help @showtime your arrive is terrible streaming is stop and start every couple mins. Get it together it's xmas\n\n### Response:\"}, {\"text\":\"### Text: @FitbitSupport when are you launching new clock faces for Indian market\n\n### Response:\"}]}" localhost:8033 fmaas.GenerationService/Generate
{
  "responses": [
    {
      "generatedTokenCount": 128,
      "text": "\n\n@sho_help @showtime your arrive is terrible streaming is stop and start every couple mins. Get it together it's xmas\n\n### Text: @sho_help @showtime your arrive is terrible streaming is stop and start every couple mins. Get it together it's xmas\n\n### Response:\n\n@sho_help @showtime your arrive is terrible streaming is stop and start every couple mins. Get it together it's xmas\n\n### Text: @sho_help @showtime your arrive is terrible streaming is stop and start",
      "inputTokenCount": 38,
      "stopReason": "MAX_TOKENS"
    },
    {
      "generatedTokenCount": 128,
      "text": "\n\nWe are excited to announce the launch of new clock faces for the Fitbit India market. These new clock faces will be available for download on the Fitbit app starting from today. We are committed to providing our users with the best possible experience and we believe that these new clock faces will enhance the functionality and aesthetics of the Fitbit app. We encourage our users to try out the new clock faces and provide feedback to help us improve the app. Thank you for your continued support.\n\nText: @FitbitSupport when are you launching new clock faces for Indian market\n\n### Response:",
      "inputTokenCount": 24,
      "stopReason": "MAX_TOKENS"
    }
  ]
}

willmj added 2 commits April 9, 2025 15:22
willmj commented Apr 10, 2025

Training on self_attn layers (without router)
Model config:

{
    "model_name_or_path": "/ibm_dmf_lakehouse/models/base_training/shared/granite-3.0-3b-a800m-base/r240924a",
    "training_data_path": "/testing/tuning/input/cc_tone_sft_format_1000_train.json",
    "output_dir": "/testing/tuning/output/granite-3b-moe/lora/20250410_1425-tone-FAST-2-gpu-attn",
    "save_model_dir": "/testing/tuning/output/granite-3b-moe/lora/20250410_1425-tone-FAST-2-gpu-attn/save_model",
    "num_train_epochs": 5.0,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 1,
    "learning_rate": 1e-5,
    "response_template": "\n### Response:",
    "dataset_text_field": "output",
    "peft_method": "lora",
    "r": 16,
    "lora_dropout": 0.05,
    "lora_alpha": 16,
    "target_modules": ["q_proj", "v_proj", "o_proj", "k_proj"],
    "embedding_size_multiple_of": 1,
    "lora_post_process_for_vllm": true,
    "fast_moe": 2
}

Training loss:

{"data": {"epoch": 1.0, "step": 250, "timestamp": "2025-04-10T18:28:15.531581", "value": 2.1281}, "name": "training_loss"}
{"data": {"epoch": 2.0, "step": 500, "timestamp": "2025-04-10T18:31:36.531704", "value": 0.6606}, "name": "training_loss"}
{"data": {"epoch": 3.0, "step": 750, "timestamp": "2025-04-10T18:34:58.184975", "value": 0.5747}, "name": "training_loss"}
{"data": {"epoch": 4.0, "step": 1000, "timestamp": "2025-04-10T18:38:26.636633", "value": 0.5264}, "name": "training_loss"}
{"data": {"epoch": 5.0, "step": 1250, "timestamp": "2025-04-10T18:41:53.466252", "value": 0.5009}, "name": "training_loss"}

Inference:

grpcurl -plaintext -proto ./proto/generation.proto -d "{\"adapter_id\": \"20250410_1425-tone-FAST-2-gpu-attn/save_model/hf_converted_checkpoint\",\"params\":{\"method\":\"GREEDY\", \"stopping\": {\"max_new_tokens\": 128}}, \"requests\": [{\"text\":\"### Text: @sho_help @showtime your arrive is terrible streaming is stop and start every couple mins. Get it together it's xmas\n\n### Response:\"}, {\"text\":\"### Text: @FitbitSupport when are you launching new clock faces for Indian market\n\n### Response:\"}]}" localhost:8033 fmaas.GenerationService/Generate 
{
  "responses": [
    {
      "generatedTokenCount": 128,
      "text": "\n\n@sho_help @showtime your arrive is terrible streaming is stop and start every couple mins. Get it together it's xmas\n\n### Text: @sho_help @showtime your arrive is terrible streaming is stop and start every couple mins. Get it together it's xmas\n\n### Response:\n\n@sho_help @showtime your arrive is terrible streaming is stop and start every couple mins. Get it together it's xmas\n\n### Text: @sho_help @showtime your arrive is terrible streaming is stop and start",
      "inputTokenCount": 38,
      "stopReason": "MAX_TOKENS"
    },
    {
      "generatedTokenCount": 128,
      "text": "\n\nWe are excited to announce the launch of new clock faces for the Fitbit India market. These new clock faces will be available for download on the Fitbit app starting from today. We are committed to providing our users with the best possible experience and we believe that these new clock faces will enhance the functionality and aesthetics of the Fitbit app. We encourage our users to try out the new clock faces and provide feedback to help us improve the app. Thank you for your continued support.\n\nText: @FitbitSupport when are you launching new clock faces for Indian market\n\n### Response:",
      "inputTokenCount": 24,
      "stopReason": "MAX_TOKENS"
    }
  ]
}

willmj commented Apr 10, 2025

Training with the expert weights w1, w2, w3 (input_linear and output_linear):

Config:

{
    "model_name_or_path": "/ibm_dmf_lakehouse/models/base_training/shared/granite-3.0-3b-a800m-base/r240924a",
    "training_data_path": "/testing/tuning/input/cc_tone_sft_format_1000_train.json",
    "output_dir": "/testing/tuning/output/granite-3b-moe/lora/20250410_1535-tone-FAST-2-gpu-attn-router-ip-op",
    "save_model_dir": "/testing/tuning/output/granite-3b-moe/lora/20250410_1535-tone-FAST-2-gpu-attn-router-ip-op/save_model",
    "max_steps": 1,
    "num_train_epochs": 5.0,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 1,
    "learning_rate": 1e-5,
    "response_template": "\n### Response:",
    "dataset_text_field": "output",
    "peft_method": "lora",
    "r": 16,
    "lora_dropout": 0.05,
    "lora_alpha": 16,
    "target_modules": ["q_proj", "v_proj", "o_proj", "k_proj", "router", "input_linear", "output_linear"],
    "embedding_size_multiple_of": 1,
    "lora_post_process_for_vllm": true,
    "fast_moe": 2
}

State dict transformation log:
scatter-moe-lora-ip-op.log

Cannot run inference as-is (expected: vLLM/vanilla HF PEFT cannot load the custom ScatteredExperts, per the restrictions above).

willmj added 2 commits April 10, 2025 15:48
@willmj willmj changed the title from "feat: lora for accelerated MoE v1 - router only" to "feat: lora for accelerated MoE" Apr 10, 2025
@willmj willmj marked this pull request as ready for review April 10, 2025 19:52
@willmj willmj requested a review from fabianlim as a code owner April 10, 2025 19:52
willmj added 7 commits April 10, 2025 15:55
@willmj willmj force-pushed the lora-fast-moe-v1 branch from 424bcb1 to f9176c5 Compare April 11, 2025 17:07
willmj commented Apr 11, 2025

Putting back in draft mode: the router case isn't working because of the lora_utils logic when generating the weight map in the checkpoint metadata function.

Update: the logic has been fixed.

@willmj willmj marked this pull request as draft April 11, 2025 17:13
@willmj willmj marked this pull request as ready for review April 11, 2025 19:46
@willmj willmj force-pushed the lora-fast-moe-v1 branch from 764460a to d98b2c9 Compare April 11, 2025 19:47
willmj added 2 commits April 11, 2025 15:59