[Feat][1/N] support async_rl in replaybuffer by YanhuiDua · Pull Request #1337 · InternLM/xtuner

YanhuiDua · 2025-12-05T09:56:45Z

This PR introduces asynchronous RL support to the replay buffer system, enabling partial rollouts and version-based sample management for more efficient training data generation. This is the first part of a multi-part feature implementation.

Key changes:

Added async-related configuration parameters: enable_partial_rollout, tail_batch_candidate_steps, tail_batch_trigger_size, and staleness_threshold

staleness_threshold: The maximum allowed fraction of stale (expired) samples in a training batch. Must be between 0.0 and 1.0.
enable_partial_rollout: Whether to enable partial rollout for asynchronous data generation.
tail_batch_candidate_steps: Number of rollout steps after which a sample becomes a candidate for the tail batch. Set to 0 to disable the tail batch.
tail_batch_trigger_size: Number of candidate samples needed in the queue to trigger a tail batch operation. Defaults to global_batch_size when not provided by the user.
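
For illustration, the constraints described for these parameters could be checked as in the sketch below. The class name `AsyncRolloutConfig` and its layout are hypothetical and are not taken from this PR; only the four parameter names, the 0.0 to 1.0 range, the "0 disables" rule, and the `global_batch_size` fallback come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AsyncRolloutConfig:  # hypothetical name, not the class used in this PR
    global_batch_size: int
    enable_partial_rollout: bool = False
    staleness_threshold: float = 0.0
    tail_batch_candidate_steps: int = 0  # 0 disables the tail batch
    tail_batch_trigger_size: Optional[int] = None

    def __post_init__(self) -> None:
        # The description requires a value in the closed range [0.0, 1.0].
        if not 0.0 <= self.staleness_threshold <= 1.0:
            raise ValueError("staleness_threshold must be between 0.0 and 1.0")
        # Negative step counts are rejected here as an assumption of this sketch.
        if self.tail_batch_candidate_steps < 0:
            raise ValueError("tail_batch_candidate_steps must be >= 0")
        # Fall back to global_batch_size when the user gives no trigger size.
        if self.tail_batch_trigger_size is None:
            self.tail_batch_trigger_size = self.global_batch_size

    @property
    def tail_batch_enabled(self) -> bool:
        return self.tail_batch_candidate_steps > 0
```

Constructing `AsyncRolloutConfig(global_batch_size=128)` in this sketch leaves the tail batch disabled and sets the trigger size to 128.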

Refactored replay buffer storage to support versioned samples with bucketed tracking of completed, aborted, and expired states
Renamed Sampler to DatasetSampler and separated dataset sampling logic from replay buffer sampling
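
As a rough picture of what "bucketed tracking" of versioned samples might look like, here is a minimal sketch. It is not the storage code from this PR: the names `SampleState`, `BucketedStorage`, and `expire_aborted`, and the rule that an aborted sample expires once its version lags the current one by more than `max_age`, are all assumptions made for illustration.

```python
from collections import defaultdict
from enum import Enum


class SampleState(Enum):  # hypothetical enum for the three states named above
    COMPLETED = "completed"
    ABORTED = "aborted"
    EXPIRED = "expired"


class BucketedStorage:  # hypothetical class, not the PR's replay buffer storage
    def __init__(self) -> None:
        # One bucket per state; each bucket maps version -> list of sample ids.
        self._buckets = {state: defaultdict(list) for state in SampleState}

    def add(self, sample_id: str, version: int, state: SampleState) -> None:
        self._buckets[state][version].append(sample_id)

    def count(self, state: SampleState) -> int:
        return sum(len(ids) for ids in self._buckets[state].values())

    def expire_aborted(self, current_version: int, max_age: int) -> int:
        """Move aborted samples older than max_age versions into the expired bucket."""
        aborted = self._buckets[SampleState.ABORTED]
        stale_versions = [v for v in aborted if current_version - v > max_age]
        moved = 0
        for version in stale_versions:
            ids = aborted.pop(version)
            self._buckets[SampleState.EXPIRED][version].extend(ids)
            moved += len(ids)
        return moved
```

Keying each bucket by version keeps the expiry pass proportional to the number of distinct versions rather than the number of samples, which is one plausible reason to bucket this way.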

Copilot

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| xtuner/v1/ray/dataflow/replay_buffer.py | Major refactoring: added version tracking to ReplayMeta, introduced bucketed storage for different sample states, renamed and split Sampler class, updated storage management methods |
| xtuner/v1/ray/dataflow/flow.py | Added async-related config parameters, updated DataFlow initialization to pass async configs to replay buffer, renamed `_reset_internal_states` to `_prepare` with prerun state fetching |

YanhuiDua requested a review from Copilot December 5, 2025 09:57

Copilot started reviewing on behalf of YanhuiDua December 5, 2025 09:57

Copilot finished reviewing on behalf of YanhuiDua December 5, 2025 10:01

Copilot AI reviewed Dec 5, 2025

YanhuiDua force-pushed the support_async_rl branch 5 times, most recently from 2191802 to 87c9c14 Compare December 8, 2025 05:39

[Feat][1/N] support async_rl in replaybuffer by refactoring sampler and storage

6cff996

YanhuiDua force-pushed the support_async_rl branch from 87c9c14 to 2f6fc68 Compare December 8, 2025 07:43

[Feat][1/N] support async_rl in replaybuffer by supporting expired storage

4c6d2fc

YanhuiDua force-pushed the support_async_rl branch from 2f6fc68 to 4c6d2fc Compare December 8, 2025 07:52

YanhuiDua mentioned this pull request Dec 15, 2025

[Feature] support async rl #1360

Merged

YanhuiDua closed this Jan 30, 2026

YanhuiDua wants to merge 2 commits into InternLM:main from YanhuiDua:support_async_rl
