Multi turn rollout by tastelikefeet · Pull Request #193 · modelscope/twinkle

tastelikefeet · 2026-05-18T09:05:17Z

PR type

Bug Fix
New Feature
Document Updates
More Models or Datasets Support

PR information

Features

Add twinkle_agent package to hold multi-turn rollouts and tool callings
Add notifier to notify user when training is failed
Add GRPO entropy loss/ref logps loss
Add GRPO series metrics
Add a patch to qwen3 series models to fix a bug that special tokens will cause jinja encode error
Support sample with base model when enable_lora is True
Support lora model id used in sampling
Support Qwen models tool parsing and cleaning
Support selective_log_softmax returns entropy
Support api/vllm multi turn rollouts and trace saving
Support tool manager
[Experimental] Support passage chunker and condenser and tool

Bug Fix

Fix dataset multi-process preprocessing
Fix a bug that mm model templates cause cache failed

Experiment results

Paste your experiment result here(if needed).

gemini-code-assist

Code Review

This pull request introduces a comprehensive framework for multi-turn agentic rollouts with trajectory compression, including a new MultiTurnCondenseRollout class, chunking utilities, and a keyword-based condenser. It also adds support for per-token oracle bonuses, entropy bonuses, and improved logging diagnostics for GRPO training. Additionally, it patches the Qwen3 chat template to resolve parsing issues with orphan </think> tags and includes a dataset builder for condensed SFT training.

…ti-turn-rollout # Conflicts: # src/twinkle/template/__init__.py # src/twinkle/template/base.py # src/twinkle/utils/transformers_utils.py

hjh0119 · 2026-05-18T13:44:30Z

-        clamped_ratios = torch.clamp(ratio, max=1 + self.epsilon).detach()
+
+        # Two-sided IS clamp with asymmetric epsilon, matching MiniMax CISPO spec.
+        clamped_ratios = torch.clamp(ratio, min=1 - self.epsilon, max=1 + self.epsilon_high).detach()


hjh0119 · 2026-05-18T13:47:57Z

        per_token_logps = torch.stack(per_token_logps)
+        if return_entropy:
+            return per_token_logps, torch.stack(per_token_entropy)
    return per_token_logps


Suggest unifying the return type: when return_entropy=False,
return (per_token_logps, None) instead of just per_token_logps.

tpx818 · 2026-05-18T15:37:33Z

+        self.trace_dir = trace_dir
+        self.trace_callback = trace_callback
+        self.success_callback = success_callback
+        os.makedirs(self.trace_dir, exist_ok=True)


这个操作是否放在call中if self.trace_dir后处理

tastelikefeet added 30 commits May 9, 2026 11:52

wip

6aade99

wip

99394a2

wip

27cd090

fix

9e31c07

fix

33b8b32

fix

bbed39d

fix

504cfa0

fix

2393272

fix

5b731ea

fix

7576ef7

fix

eb85331

fix

1c0a093

fix

af4a892

fix

04565b6

wip

95d47f4

fix

88ceb1d

fix

e14e582

fix

56182f3

fix

e4dee4a

fix

f728a8d

fix

1ee5235

fix

2bfda3d

fix

b6f6b8b

fix

73d828b

fix

7cb1845

fix

34e6b44

fix

ce46d94

fix

e0e836e

fix

5ab035b

fix

e265980

tastelikefeet added 8 commits May 17, 2026 19:59

fix

d1da15d

fix

dd03790

fix

f8c7129

fix

519afd7

fix

aba84b2

fix

ea32a03

revert files

c357b83

revert files

a9dad48

gemini-code-assist Bot reviewed May 18, 2026

View reviewed changes

tastelikefeet added 16 commits May 18, 2026 19:58

fix

8dca215

fix

f299ae4

fix

9494a6c

fix

bfe3838

fix

75484e2

fix

573812b

fix

200bb57

fix

b0d0fe2

fix

1bd27e6

fix

52bcf0e

fix

2260906

fix

3c8f04e

fix

f30ffe8

lint

b8f0f0a

Merge commit '513a625b913790a3cfb1d3bf8b706dc44a1f89a4' into feat/mul…

46f38e0

…ti-turn-rollout # Conflicts: # src/twinkle/template/__init__.py # src/twinkle/template/base.py # src/twinkle/utils/transformers_utils.py

lint code

f98c9ea

tastelikefeet changed the title ~~[WIP]Multi turn rollout~~ Multi turn rollout May 18, 2026

hjh0119 approved these changes May 18, 2026

View reviewed changes

tpx818 reviewed May 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multi turn rollout#193

Multi turn rollout#193
tastelikefeet wants to merge 54 commits into
modelscope:mainfrom
tastelikefeet:feat/multi-turn-rollout

tastelikefeet commented May 18, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

hjh0119 May 18, 2026

Uh oh!

hjh0119 May 18, 2026

Uh oh!

tpx818 May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

tastelikefeet commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR type

PR information

Features

Bug Fix

Experiment results

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

hjh0119 May 18, 2026

Choose a reason for hiding this comment

Uh oh!

hjh0119 May 18, 2026

Choose a reason for hiding this comment

Uh oh!

tpx818 May 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tastelikefeet commented May 18, 2026 •

edited

Loading