Skip to content

Conversation

@garrett4wade
Copy link
Collaborator

@garrett4wade garrett4wade commented Dec 23, 2025

Description

The rollout controller should maintain a globally unique task_id for request tracing. To implement this feature, the following changes are made:

  • perf_tracer.register_task() no longer produces task_id and only accepts task_id parameter for session registeration
  • Add _register_task() method with sequential counter to WorkflowExecutor and RolloutController
  • Add optional task_id parameter to submit() in InferenceEngine, RemoteInfEngine, RemoteSGLangEngine, RemotevLLMEngine, and RolloutController
  • Remove uuid-based task_id generation in favor of centralized sequential IDs because 128 bit UUID integers cannot be serialized by orjson and is hard to read

In the SPMD mode, task IDs are generated by the workflow executor, and its value is the same as the one generated by perf_tracer. In the single-controller mode, the ID is generated by the rollout controller monotonically from zero. It will be passed to engines and workflow executors. No breaking change is involved in this PR.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not
    work as expected)
  • Documentation update
  • Code refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools (pre-commit or manual)
  • I have run relevant unit tests and they pass
  • I have added tests for new functionality
  • I have updated documentation if needed
  • My branch is up to date with main
  • This PR introduces breaking changes (if yes, fill out details below)
  • If this PR changes documentation, I have built and previewed it locally with
    jb build docs
  • No critical issues raised by AI reviewers (/gemini review)

- Change perf_tracer.register_task() to accept task_id parameter
- Add _register_task() method with sequential counter to WorkflowExecutor and RolloutController
- Add optional task_id parameter to submit() in InferenceEngine, RemoteInfEngine, RemoteSGLangEngine, RemotevLLMEngine, and RolloutController
- Remove uuid-based task_id generation in favor of centralized sequential IDs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @garrett4wade, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the internal mechanism for generating and managing task_ids within the system. The primary goal is to establish a globally unique and sequential task_id for request tracing, enhancing consistency and simplifying debugging. This is achieved by centralizing the generation logic and modifying various submission points to either accept a pre-generated task_id or utilize the new sequential counter. The change also resolves practical issues related to UUID serialization and improves the overall clarity of task identification.

Highlights

  • Centralized Task ID Generation: The system now uses a single source for generating task_id values, moving away from distributed UUID generation to a centralized sequential counter within WorkflowExecutor and RolloutController.
  • Updated perf_tracer.register_task(): The perf_tracer.register_task() function no longer generates a task_id but instead accepts an existing task_id as a parameter for session registration, reflecting the new centralized generation approach.
  • New _register_task() Method: A new private method, _register_task(), has been introduced in WorkflowExecutor and RolloutController to handle the sequential generation of unique task_ids.
  • Optional task_id in submit() Methods: The submit() methods across InferenceEngine, RemoteInfEngine, RemoteSGLangEngine, and RemotevLLMEngine have been updated to accept an optional task_id parameter, allowing for external provision of task identifiers.
  • UUID Removal for Task IDs: UUID-based task_id generation has been removed due to issues with orjson serialization of 128-bit integers and to improve readability, favoring sequential integer IDs.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the ability to explicitly pass a task_id through various submit methods across the engine_api, remote_inf_engine, workflow_executor, rollout_controller, sglang_remote, and vllm_remote modules. If a task_id is not provided, a new sequential ID is generated internally using a dedicated counter (_task_cnt) in both RolloutController and WorkflowExecutor, replacing the previous uuid based generation. The perf_tracer module was updated to accept an existing task_id for registration instead of generating its own. Review comments highlighted the duplication of the _register_task method's logic between RolloutController and WorkflowExecutor, suggesting a common mixin or helper, and recommended using a dedicated lock for the task counter in RolloutController for better separation of concerns. Additionally, a review comment proposed encapsulating the perf_tracer.register_task call within the _register_task method in WorkflowExecutor to streamline task ID generation and registration, and adjusted the call site in the submit method accordingly.

@garrett4wade garrett4wade added the safe-to-test Ready to run unit-tests in a PR. label Dec 23, 2025
@garrett4wade garrett4wade added safe-to-test Ready to run unit-tests in a PR. and removed safe-to-test Ready to run unit-tests in a PR. labels Dec 23, 2025
@garrett4wade garrett4wade added safe-to-test Ready to run unit-tests in a PR. and removed safe-to-test Ready to run unit-tests in a PR. labels Dec 24, 2025
Copy link
Collaborator

@rchardx rchardx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rchardx rchardx merged commit 42b9244 into main Dec 24, 2025
7 checks passed
@rchardx rchardx deleted the fw/task_id branch December 24, 2025 03:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

safe-to-test Ready to run unit-tests in a PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants