Enable unsloth as training backend #667
Conversation
Definitely want to have a nice unsloth integration, and potentially open to adding something like this, but could we restructure this PR in a way where it's less breaking/intrusive? We shouldn't add unsloth as a required dependency, or remove the existing 'rl' dependencies (maybe use a new dependency group?); config classes should live in the same 'rl' subfolder and follow the established style/dev patterns. In general, we're not planning on doing many feature changes to the included RLTrainer, which is intended as a "minimal demo trainer" that people can easily read + modify -- for performance + customizability, we generally recommend https://github.com/PrimeIntellect-ai/prime-rl, but it doesn't have all of the fancy Unsloth features for quantized models (and likely won't any time soon, below fp8 at least). Also confused by some of the other changes, which seem unrelated (e.g. https protocol option)?
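The reviewer's dependency-group suggestion could be sketched in `pyproject.toml` roughly as follows. This is a hypothetical layout, not the project's actual file: the group name `unsloth`, the extra name, and the package spec are all assumptions for illustration.

```toml
# Hypothetical sketch: keep unsloth out of the required install and out of
# the existing 'rl' group, exposing it as an opt-in extra instead, e.g.:
#   uv sync --extra unsloth
#   pip install "verifiers[unsloth]"   # package name assumed
[project.optional-dependencies]
rl = [
    # existing 'rl' dependencies stay untouched here
]
unsloth = [
    "unsloth",  # exact version pin / extras are an assumption
]
```

With this shape, users who never install the `unsloth` extra are unaffected, which addresses the "less breaking/intrusive" concern.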
Thank you @willccbb for your reply! I know the PR was not very well structured. I did it quickly and started it as a draft, and was waiting for your feedback. Let me know if I can help with other issues for 'verifiers' or 'prime-rl'.
Description
The goal of this PR is to enable unsloth as the training backend in order to improve GPU usage efficiency.
Type of Change
Testing
Ran `uv run pytest` locally.
Checklist
Additional Notes