normalize messages from sub-LLM calls to prevent errors #664

snimu · 2025-12-23T18:42:31Z

Description

The sub-LLMs can receive more than just strings from the RLM. This PR introduces a normalization step of these messages, so that sub-LLMs work in all valid cases, which they didn't before.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Test improvement

Testing

All existing tests pass when running uv run pytest locally.
New tests have been added to cover the changes

Checklist

My code follows the style guidelines of this project as outlined in AGENTS.md
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published

Additional Notes

Note

Improves robustness of sub-LLM chat calls by sanitizing message content formats.

Adds _normalize_message_content() in verifiers/envs/experimental/rlm_env.py to coerce content into API-accepted forms (extract nested {role, content}, wrap {type: ...} content-part objects, fallback wrap unknown dicts)
Uses normalized messages in _call_sub_llm_api() instead of raw messages to avoid malformed payload errors

^{Written by Cursor Bugbot for commit c5b18a8. This will update automatically on new commits. Configure here.}

cursor · 2025-12-23T18:45:30Z

verifiers/envs/experimental/rlm_env.py

+                # Check if content is a nested message dict (has 'role' and 'content' keys)
+                # This happens when model passes message dicts to llm_batch instead of strings
+                if "role" in content and "content" in content:
+                    msg_copy["content"] = content["content"]


Nested content extraction skips further normalization checks

When extracting inner content from a nested message dict (one with both role and content keys), the extracted content["content"] is assigned directly without further normalization. If the inner content is itself a malformed dict (e.g., a content part object with type key, or another nested message), it won't be wrapped in an array or recursively normalized. This means the final content could still be an invalid bare dict, violating the stated invariant that the API expects content to be a string, array of objects, or None.

Seems overly defensive, I've never seen that happen. Also, maybe the models just shouldn't nest too deep, and failing on this edgecase is fine.

normalize messages from sub-LLM calls to prevent errors

c5b18a8

cursor bot reviewed Dec 23, 2025

View reviewed changes

snimu requested a review from willccbb December 23, 2025 18:54

willccbb approved these changes Dec 27, 2025

View reviewed changes

willccbb merged commit 1a428e7 into main Jan 3, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

normalize messages from sub-LLM calls to prevent errors #664

normalize messages from sub-LLM calls to prevent errors #664

Uh oh!

snimu commented Dec 23, 2025 •

edited by cursor bot

Loading

Uh oh!

cursor bot Dec 23, 2025

Uh oh!

snimu Dec 23, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

normalize messages from sub-LLM calls to prevent errors #664

normalize messages from sub-LLM calls to prevent errors #664

Uh oh!

Conversation

snimu commented Dec 23, 2025 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of Change

Testing

Checklist

Additional Notes

Uh oh!

cursor bot Dec 23, 2025

Choose a reason for hiding this comment

Nested content extraction skips further normalization checks

Uh oh!

snimu Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

snimu commented Dec 23, 2025 •

edited by cursor bot

Loading