feat: Enable LoRA saving only for non MoE linear layers training with kernels. #530
Conversation
Thanks for making a pull request! 😃
Force-pushed from 6b70cc8 to da81f93.
    model = self.trainer.model
    if hasattr(model, "module"):
        model = model.module
Is this needed for the non-LoRA use case? After this step it would directly reach the block below:

    else:
        model.config.save_pretrained(hf_converted_output_dir)
It's needed to determine whether there is a PEFT config for the following if statement, but in the fine-tuning case that if statement should be false.
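A minimal sketch of the pattern under discussion, assuming the DDP-style `module` unwrap and a `peft_config` attribute check; names such as `hf_converted_output_dir` come from the snippet above, and this is not the exact merged code:

```python
# Sketch: unwrap a distributed wrapper, then branch on whether a PEFT (LoRA)
# config is attached to the model before deciding what to save.
model = self.trainer.model
if hasattr(model, "module"):  # e.g. DistributedDataParallel wraps the real model
    model = model.module

if getattr(model, "peft_config", None) is not None:
    # LoRA case: save the adapter weights and adapter config
    model.save_pretrained(hf_converted_output_dir)
else:
    # Full fine-tuning case: save the base model config as before
    model.config.save_pretrained(hf_converted_output_dir)
```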
@willmj thanks for diligently taking my inputs. After addressing my comments, please run all tests related to the MoE kernels on your local setup and paste screenshots here for the record. Thanks, appreciate this PR!
        model, train_args, modifiable_args=(peft_config,)
    )
    # For LoRA ScatterMoE, if expert layers are included, disable grad
    if peft_config is not None:
I think it's better to just do an instance check using ScatterMoE and then freeze everything inside.
Maybe you should add a comment that this is a workaround, and that in the future, once ScatterMoE LoRAs can be tuned, this code needs to be removed.
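A hedged sketch of the freezing workaround discussed above; the `freeze_scattermoe_params` helper name is hypothetical, and the ScatterMoE class is passed in as an argument rather than imported, since the real import path isn't shown here:

```python
import torch.nn as nn

def freeze_scattermoe_params(model: nn.Module, scatter_moe_cls: type) -> None:
    """Workaround: LoRA tuning of ScatterMoE expert/router layers is not yet
    supported, so freeze every parameter inside ScatterMoE modules.
    Remove this once ScatterMoE LoRA tuning is supported."""
    for module in model.modules():
        if isinstance(module, scatter_moe_cls):
            for param in module.parameters():
                param.requires_grad = False
```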
fabianlim left a comment:
Since @kmehant is not around, I approve it. TBH I'm not that familiar with the recent updates in this repo, but overall it looks OK. Just one more comment.
feat: Enable LoRA saving only for non MoE linear layers training with kernels. (#530)

* save peft
* post process hf converted dir
* fix: convert hf converted checkpoint
* lora config
* save adapter config
* fix: add input linear and output linear to target modules
* fix: extend instead of append
* fix: if hasattr peft config
* fix: remove unneeded target modules
* test: lora for scattermoe
* explicitly don't support router layer
* docs: update documentation
* fix: simplify accelerate launch post processing
* tests: more target modules + ep_degree
* fix: only restrict all-linear, raise warning for other modules
* fix: augmentation test
* fix: raise error
* turn off requires grad if using scattermoe with lora
* fix: freeze scattermoe params
* fix: safer freezing

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
Description of the change
A more limited version of #523: it enables LoRA training with ScatterMoE kernels but explicitly blocks the router linear and expert layers from being targeted.
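A rough sketch of that restriction; the module names in `BLOCKED_MODULES` and the helper name are illustrative assumptions, not the PR's exact implementation:

```python
import warnings

# Assumed ScatterMoE layer names that LoRA must not target.
BLOCKED_MODULES = {"router", "input_linear", "output_linear"}

def validate_lora_target_modules(target_modules):
    # "all-linear" would sweep the router and expert linears into LoRA, so reject it.
    if target_modules == "all-linear":
        raise ValueError(
            "target_modules='all-linear' is not supported with ScatterMoE kernels; "
            "list the supported linear layers explicitly instead."
        )
    # Explicitly listed ScatterMoE layers are skipped with a warning.
    blocked = BLOCKED_MODULES.intersection(target_modules)
    if blocked:
        warnings.warn(
            f"LoRA is not applied to ScatterMoE layers {sorted(blocked)}; "
            "they will be ignored."
        )
```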
Related issue number
How to verify the PR
Was the PR tested