GRPO loss #454

jlamypoirier · 2026-01-20T14:55:59Z

✨ Description

rafapi · 2026-01-20T18:12:51Z

fast_llm/layers/language_model/loss/config.py

+
+        # Policy loss
+        # TODO: advantages or rewards?
+        log_ratio_old = torch.exp(target_logprobs - old_logprobs)


This would read better as ratio_new_old = torch.exp(target_logprobs - old_logprobs): after the exp, the value is the new/old probability ratio, not a log ratio.
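A toy illustration of the naming point, as a sketch only: the probabilities below are made up, and the name log_ratio is introduced here for contrast and does not appear in the PR.

```python
import torch

# Hypothetical per-token probabilities of the sampled tokens (illustrative values).
target_logprobs = torch.log(torch.tensor([0.50, 0.20, 0.10]))  # current policy
old_logprobs = torch.log(torch.tensor([0.25, 0.20, 0.40]))     # policy that produced the samples

# The difference of log-probabilities is the log of the probability ratio...
log_ratio = target_logprobs - old_logprobs

# ...and exponentiating gives the ratio itself, which equals 1 when the policies agree.
ratio_new_old = torch.exp(log_ratio)
```

Since the clamp bounds in the diff are 1 - epsilon_low and 1 + epsilon_high, the clamped quantity has to be this ratio, which is why a name without "log" is less misleading.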

rafapi · 2026-01-20T18:13:38Z

fast_llm/layers/language_model/loss/config.py

+        target_logprobs = torch.gather(logprobs, dim=2, index=labels.unsqueeze(2)).squeeze(2)
+
+        # Policy loss
+        # TODO: advantages or rewards?


Answer: advantages.
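For context, a minimal sketch of how group-relative advantages are commonly derived from rewards in GRPO-style training. This is not taken from the PR: the function name group_advantages, the (num_prompts, group_size) layout, the eps value and the use of the population standard deviation are all assumptions made for illustration.

```python
import torch


def group_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Normalise rewards within each group of samples drawn for the same prompt.

    rewards: (num_prompts, group_size), one scalar reward per sampled completion.
    Returns a tensor of the same shape with zero mean per group.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    # Population standard deviation; whether to use the unbiased estimator
    # is an implementation choice (assumption here).
    std = rewards.std(dim=1, keepdim=True, unbiased=False)
    return (rewards - mean) / (std + eps)
```

The loss in the diff would then consume these values, broadcast to the token level, rather than the raw rewards.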

rafapi · 2026-01-20T18:14:37Z

fast_llm/layers/language_model/loss/config.py

+        )
+
+        # TODO: tokens_weights = 1/batch_size ?
+        # TODO: Reduce loss?


We need to apply the loss mask and then sum over tokens.
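A minimal sketch of that reduction, assuming a per-token loss and a 0/1 mask of the same shape. The helper name masked_token_sum and the argument name loss_mask are hypothetical, not identifiers from the PR.

```python
import torch


def masked_token_sum(per_token_loss: torch.Tensor, loss_mask: torch.Tensor) -> torch.Tensor:
    """Zero out masked positions (e.g. prompt or padding tokens), then sum the rest."""
    return (per_token_loss * loss_mask.to(per_token_loss.dtype)).sum()
```

Multiplying by the mask rather than indexing with it keeps the tensor shape fixed, which tends to be friendlier to compiled or fused kernels.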

rafapi · 2026-01-20T18:15:12Z

fast_llm/layers/language_model/loss/config.py

+            torch.clamp(log_ratio_old, 1 - self.epsilon_low, 1 + self.epsilon_high) * advantage,
+        )
+
+        # TODO: tokens_weights = 1/batch_size ?


I think so; we do that for the simple case.

rafapi · 2026-01-20T18:15:56Z

fast_llm/layers/language_model/loss/config.py

+
+        # TODO: tokens_weights = 1/batch_size ?
+        # TODO: Reduce loss?
+        loss = loss / batch_size  # 1 x (BxL) x 1


The sign needs flipping here, i.e. loss = -loss: we want to maximise the objective, so the quantity we minimise is its negative.
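Putting the review points together, a hedged sketch of what the clipped policy loss could look like. This is not the PR's final code: the function name grpo_policy_loss, the (B, L, V) and (B, L) shapes, the loss_mask argument and the default epsilon values are assumptions, while target_logprobs, old_logprobs, epsilon_low, epsilon_high and batch_size follow the names in the diff.

```python
import torch


def grpo_policy_loss(
    logits: torch.Tensor,        # (B, L, V) current-policy logits
    labels: torch.Tensor,        # (B, L) sampled token ids, dtype long
    old_logprobs: torch.Tensor,  # (B, L) log-probs under the sampling policy
    advantages: torch.Tensor,    # (B, L) advantages broadcast to tokens
    loss_mask: torch.Tensor,     # (B, L) 1 for tokens that count, 0 otherwise
    epsilon_low: float = 0.2,
    epsilon_high: float = 0.2,
) -> torch.Tensor:
    logprobs = torch.log_softmax(logits.float(), dim=-1)
    target_logprobs = torch.gather(logprobs, dim=2, index=labels.unsqueeze(2)).squeeze(2)

    # Probability ratio between the current and the sampling policy.
    ratio_new_old = torch.exp(target_logprobs - old_logprobs)

    # Clipped surrogate objective: the pessimistic minimum of the two terms.
    objective = torch.min(
        ratio_new_old * advantages,
        torch.clamp(ratio_new_old, 1 - epsilon_low, 1 + epsilon_high) * advantages,
    )

    # Apply the mask, sum over tokens, and weight by 1 / batch_size.
    batch_size = logits.shape[0]
    objective = (objective * loss_mask.to(objective.dtype)).sum() / batch_size

    # Negate: the objective is maximised, so the loss is its negative.
    return -objective
```

With uniform logits over two tokens, old probability 0.25 and advantage 1, the ratio is 2, the clamp caps it at 1.2, and the returned loss is about -1.2.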

jlamypoirier added 4 commits January 20, 2026 09:54

GRPO loss

baa0944

GRPO loss

966e151

simplify

7b24f94

simplify

978be16

rafapi reviewed Jan 20, 2026

jlamypoirier added 6 commits January 20, 2026 13:17

simplify

9d25147

Loss class

58a3191

Loss class

5f245a8

stuff

e10cf4d

Merge branch 'jlp_entropy_loss' into jlp_grpo

58f1316

fixes

ed57346

3 participants
