Enable unsloth as training backend #667
Conversation
Definitely want to have a nice unsloth integration, and potentially open to adding something like this, but could we restructure this PR in a way where it's less breaking/intrusive? We shouldn't add unsloth as a required dependency, or remove the existing 'rl' dependencies (maybe use a new dependency group?); config classes should live in the same 'rl' subfolder and follow the established style/dev patterns. In general, we're not planning on doing many feature changes to the included RLTrainer, which is intended as a "minimal demo trainer" that people can easily read + modify -- for performance + customizability, we generally recommend https://github.com/PrimeIntellect-ai/prime-rl, but it doesn't have all of the fancy Unsloth features for quantized models (and likely won't any time soon, below fp8 at least). Also confused by some of the other changes, which seem unrelated (e.g. https protocol option)?
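The reviewer's dependency-group suggestion could be sketched in `pyproject.toml` roughly as follows. This is a hypothetical layout, not the project's actual file: the group name `unsloth`, the extra name, and the package spec are all assumptions for illustration.

```toml
# Hypothetical sketch: keep unsloth out of the required install and out of
# the existing 'rl' group, exposing it as an opt-in extra instead, e.g.:
#   uv sync --extra unsloth
#   pip install "verifiers[unsloth]"   # package name assumed
[project.optional-dependencies]
rl = [
    # existing 'rl' dependencies stay untouched here
]
unsloth = [
    "unsloth",  # exact version pin / extras are an assumption
]
```

With this shape, users who never install the `unsloth` extra are unaffected, which addresses the "less breaking/intrusive" concern.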
Thank you @willccbb for your reply! I know the PR was not very well structured. I did it quickly and started it as a draft, and was waiting for your feedback. Let me know if I can help with other issues for 'verifiers' or 'prime-rl'.
Description
The goal of this PR is to enable unsloth as the training backend in order to improve GPU usage efficiency.
Type of Change
Testing
Ran `uv run pytest` locally.
Checklist
Additional Notes