
Conversation


@willmj willmj commented Apr 17, 2025

Description of the change

A more limited version of #523 that enables LoRA but explicitly blocks off the router linear and expert layers.
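
For illustration, a rough sketch of the intended guard written against the peft LoraConfig API; the blocked module names and the helper function below are hypothetical placeholders, not the exact code in this PR:

    from peft import LoraConfig

    # Hypothetical set of ScatterMoE submodule names that LoRA must not target.
    # Real names depend on the model architecture; these are illustrative only.
    BLOCKED_MOE_MODULES = {"router", "input_linear", "output_linear"}

    def build_lora_config(target_modules):
        """Build a LoraConfig, rejecting MoE router and expert layers as targets."""
        blocked = BLOCKED_MOE_MODULES.intersection(target_modules)
        if blocked:
            raise ValueError(
                f"LoRA on MoE router/expert layers is not supported: {sorted(blocked)}"
            )
        return LoraConfig(r=8, lora_alpha=16, target_modules=list(target_modules))

    # Attention projections are accepted; router/expert linears raise an error.
    lora_cfg = build_lora_config(["q_proj", "v_proj"])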

Related issue number

How to verify the PR

[screenshots of local verification runs]

Was the PR tested

  • I have added >=1 unit test(s) for every new method I have added.
  • I have ensured all unit tests pass

willmj added 19 commits March 24, 2025 13:39
@github-actions

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

@willmj willmj changed the title from "Save peft fast moe limited" to "feat: save lora config fast moe -limited" Apr 17, 2025
@github-actions github-actions bot added the feat label Apr 17, 2025
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
@willmj willmj force-pushed the save-peft-fast-moe-limited branch from 6b70cc8 to da81f93 on April 18, 2025 01:35
@kmehant kmehant changed the title from "feat: save lora config fast moe -limited" to "feat: Enable LoRA saving only for non MoE linear layers training with kernels." Apr 18, 2025
Comment on lines +127 to +129
model = self.trainer.model
if hasattr(model, "module"):
model = model.module
Collaborator:

Is this needed for the non-LoRA use case? Since after this step, it would directly reach the block below:

                        else:
                            model.config.save_pretrained(hf_converted_output_dir)

Collaborator (Author):

It's needed to determine whether there is a PEFT config for the following if statement; in the fine-tuning case, that if statement should be false.
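
For reference, a sketch of the unwrap-then-check pattern being discussed; the names mirror the snippets above (using a bare trainer reference instead of self.trainer), and the peft_config handling is illustrative of the check, not necessarily the PR's exact saving logic:

    # Unwrap a distributed wrapper (e.g. DistributedDataParallel) to reach the
    # underlying model, then branch on whether a PEFT config is attached.
    model = trainer.model
    if hasattr(model, "module"):
        model = model.module

    if getattr(model, "peft_config", None) is not None:
        # LoRA run: save the adapter config next to the converted checkpoint.
        model.peft_config["default"].save_pretrained(hf_converted_output_dir)
    else:
        # Full fine-tuning run: only the base model config needs to be written.
        model.config.save_pretrained(hf_converted_output_dir)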

@kmehant

kmehant commented Apr 18, 2025

@willmj thanks for diligently taking my inputs. After addressing my comments, please run all tests related to MoE kernels on your local setup and paste screenshots here for the record. Thanks, appreciate this PR!

willmj added 8 commits April 18, 2025 09:28
willmj added 2 commits April 18, 2025 10:56
model, train_args, modifiable_args=(peft_config,)
)
# For LoRa ScatterMoE, if expert layers are included, disable grad
if peft_config is not None:
Collaborator:

I think it's better to just do an instance check using ScatterMoE and then freeze everything inside.

Collaborator:

Maybe you should add a comment that this is a workaround; in the future, when ScatterMoE LoRAs can be tuned, this code needs to be removed.
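
A minimal sketch of the instance-check-and-freeze workaround suggested here; the ScatterMoE class and its import path are stand-ins for whatever the MoE kernels package actually exposes:

    # Workaround: until ScatterMoE LoRAs can be tuned, freeze every parameter
    # inside ScatterMoE modules so only the remaining LoRA layers train.
    def freeze_scattermoe_params(model, scattermoe_cls):
        for module in model.modules():
            if isinstance(module, scattermoe_cls):
                for param in module.parameters():
                    param.requires_grad = False

    # Usage (ScatterMoE imported from the kernels library):
    # freeze_scattermoe_params(model, ScatterMoE)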

willmj added 6 commits April 18, 2025 14:48
fabianlim previously approved these changes Apr 21, 2025

@fabianlim fabianlim left a comment


Since @kmehant is not around, I'll approve it, but TBH I'm not that familiar with the recent updates in this repo. Overall it looks OK. Just one more comment.


@willmj willmj merged commit 179da0a into foundation-model-stack:main Apr 21, 2025
9 checks passed
kmehant pushed a commit that referenced this pull request Apr 28, 2025
… kernels. (#530)

* save peft
* post process hf converted dir
* fix: convert hf converted checkpoint
* lora config
* save adapter config
* fix: add input linear and output linear to target modules
* fix: extend instead of append
* fix: if hasattr peft config
* fix: remove unneeded target modules
* test: lora for scattermoe
* explicitly don't support router layer
* docs: update documentation
* fix: simplify accelerate launch post processing
* tests: more target modules + ep_degree
* fix: only restrict all-linear, raise warning for other modules
* fix: augmentation test
* fix: raise error
* turn off requires grad if using scattermoe with lora
* fix: freeze scattermoe params
* fix: safer freezing

---------

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
dushyantbehl pushed a commit to dushyantbehl/fms-hf-tuning that referenced this pull request Jun 23, 2025
… kernels. (foundation-model-stack#530)
