
Python: Add checkpoint save and restore hooks to executor #2097

Merged

TaoChenOSU merged 11 commits into microsoft:main from TaoChenOSU:taochen/python-add-checkpoint-save-and-restore-hooks on Nov 17, 2025


TaoChenOSU (Contributor) commented Nov 11, 2025

Motivation and Context

Closes #1816

Description

This PR includes the following:

  1. on_checkpoint_save and on_checkpoint_restore hooks on the Executor base class, invoked when a checkpoint is created and when a workflow is restored from a checkpoint, respectively. These give users a clear contract for saving and restoring executor state during checkpointing.
  2. Backward compatibility with existing executor state-management code is preserved.
  3. Tests and a sample.
  4. Two new event types: SuperStepStartedEvent and SuperStepCompletedEvent.
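The hook contract in item 1 can be sketched as follows. This is a minimal, self-contained illustration rather than the agent_framework source: the stand-in Executor base class, the async signatures, the dict[str, Any] state shape, and the hypothetical CounterExecutor with its handle method are all assumptions made for this sketch. It shows the intended round trip: state returned by on_checkpoint_save is later handed to on_checkpoint_restore on a fresh instance.

```python
import asyncio
from typing import Any


class Executor:
    """Hypothetical stand-in for the framework's Executor base class.

    Only the two checkpoint hooks are modeled. The defaults are no-ops, so
    executors without state need not override anything.
    """

    def __init__(self, executor_id: str) -> None:
        self.id = executor_id

    async def on_checkpoint_save(self) -> dict[str, Any]:
        # Called when a checkpoint is created; return serializable state.
        return {}

    async def on_checkpoint_restore(self, state: dict[str, Any]) -> None:
        # Called when resuming from a checkpoint with the previously saved state.
        pass


class CounterExecutor(Executor):
    """Hypothetical executor whose only state is a count of handled messages."""

    def __init__(self, executor_id: str) -> None:
        super().__init__(executor_id)
        self.count = 0

    async def handle(self, message: str) -> None:
        self.count += 1

    async def on_checkpoint_save(self) -> dict[str, Any]:
        return {"count": self.count}

    async def on_checkpoint_restore(self, state: dict[str, Any]) -> None:
        self.count = state.get("count", 0)


async def main() -> tuple[dict[str, Any], int]:
    original = CounterExecutor("counter")
    for msg in ("a", "b", "c"):
        await original.handle(msg)
    saved = await original.on_checkpoint_save()

    # A fresh instance (e.g. after a process restart) is rehydrated from the checkpoint.
    restored = CounterExecutor("counter")
    await restored.on_checkpoint_restore(saved)
    return saved, restored.count


saved_state, restored_count = asyncio.run(main())
print(saved_state, restored_count)  # {'count': 3} 3
```

Keeping the defaults as no-ops means only stateful executors need to opt in, which matches the backward-compatibility goal in item 2.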

Contribution Checklist

  • The code builds cleanly without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add the "[BREAKING]" prefix to the PR title.

TaoChenOSU self-assigned this on Nov 11, 2025
TaoChenOSU added the python and workflows (Related to Workflows in agent-framework) labels on Nov 11, 2025
TaoChenOSU moved this to In Progress in Agent Framework on Nov 11, 2025
github-actions bot changed the title from "Add checkpoint save and restore hooks to executor" to "Python: Add checkpoint save and restore hooks to executor" on Nov 11, 2025
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR introduces a new checkpoint state management contract for executors by adding on_checkpoint_save() and on_checkpoint_restore() hooks to the Executor base class. This replaces the pattern of using ctx.get_executor_state() and ctx.set_executor_state() for managing executor state during checkpointing.
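Deprecating ctx.get_executor_state() and ctx.set_executor_state() while keeping them functional typically follows Python's standard warnings pattern. The sketch below is hypothetical: WorkflowContextSketch, its in-memory store, and the warning messages are illustrative and are not the framework's actual WorkflowContext implementation. It only shows how deprecated methods can keep working while steering callers toward the new executor hooks.

```python
import warnings
from typing import Any, Optional


class WorkflowContextSketch:
    """Hypothetical context that keeps the old state API working but deprecated."""

    def __init__(self) -> None:
        self._executor_state: Optional[dict[str, Any]] = None

    def set_executor_state(self, state: dict[str, Any]) -> None:
        # stacklevel=2 attributes the warning to the caller, not this method.
        warnings.warn(
            "set_executor_state is deprecated; override Executor.on_checkpoint_save instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        self._executor_state = dict(state)

    def get_executor_state(self) -> Optional[dict[str, Any]]:
        warnings.warn(
            "get_executor_state is deprecated; override Executor.on_checkpoint_restore instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._executor_state


ctx = WorkflowContextSketch()
with warnings.catch_warnings(record=True) as caught:
    # DeprecationWarning is hidden by default outside __main__; record every one.
    warnings.simplefilter("always")
    ctx.set_executor_state({"step": 2})
    value = ctx.get_executor_state()

print(value, [w.category.__name__ for w in caught])
# {'step': 2} ['DeprecationWarning', 'DeprecationWarning']
```

Existing callers keep their behavior, and the warning text carries the migration guidance instead of a hard failure.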

Key Changes:

  • New checkpoint hooks (on_checkpoint_save and on_checkpoint_restore) on the Executor base class
  • Deprecation of set_executor_state and get_executor_state methods in WorkflowContext
  • New workflow events (SuperStepStartedEvent and SuperStepCompletedEvent) for tracking superstep boundaries
  • Backward compatibility maintained by supporting both old (snapshot_state/restore_state) and new checkpoint methods
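One way a runner can honor both contracts is to prefer the new hooks and fall back to the legacy snapshot_state/restore_state methods only when an executor still defines them and has not overridden the new hooks. This is a sketch under assumptions: the helper names save_executor_state and restore_executor_state, the priority order, the override check, and support for both sync and async legacy methods are illustrative choices, not code taken from _runner.py.

```python
import asyncio
import inspect
from typing import Any


class Executor:
    """Hypothetical base class with no-op checkpoint hooks."""

    async def on_checkpoint_save(self) -> dict[str, Any]:
        return {}

    async def on_checkpoint_restore(self, state: dict[str, Any]) -> None:
        return None


class NewStyle(Executor):
    """Executor written against the new hooks."""

    def __init__(self) -> None:
        self.items: list[str] = []

    async def on_checkpoint_save(self) -> dict[str, Any]:
        return {"items": list(self.items)}

    async def on_checkpoint_restore(self, state: dict[str, Any]) -> None:
        self.items = list(state.get("items", []))


class LegacyStyle(Executor):
    """Executor written against the older snapshot_state/restore_state methods."""

    def __init__(self) -> None:
        self.total = 0

    def snapshot_state(self) -> dict[str, Any]:
        return {"total": self.total}

    def restore_state(self, state: dict[str, Any]) -> None:
        self.total = state["total"]


def _overrides(executor: Executor, name: str) -> bool:
    # True when the executor's class replaces the base-class hook.
    return getattr(type(executor), name) is not getattr(Executor, name)


async def _maybe_await(value: Any) -> Any:
    # Legacy methods may be sync or async; await only when needed.
    return await value if inspect.isawaitable(value) else value


async def save_executor_state(executor: Executor) -> dict[str, Any]:
    if not _overrides(executor, "on_checkpoint_save") and hasattr(executor, "snapshot_state"):
        return await _maybe_await(executor.snapshot_state())
    return await executor.on_checkpoint_save()


async def restore_executor_state(executor: Executor, state: dict[str, Any]) -> None:
    if not _overrides(executor, "on_checkpoint_restore") and hasattr(executor, "restore_state"):
        await _maybe_await(executor.restore_state(state))
    else:
        await executor.on_checkpoint_restore(state)


async def main() -> tuple[list[str], int]:
    new, legacy = NewStyle(), LegacyStyle()
    new.items.append("hello")
    legacy.total = 7
    checkpoint = {
        "new": await save_executor_state(new),
        "legacy": await save_executor_state(legacy),
    }

    # Fresh instances are restored through the same dispatch helpers.
    new2, legacy2 = NewStyle(), LegacyStyle()
    await restore_executor_state(new2, checkpoint["new"])
    await restore_executor_state(legacy2, checkpoint["legacy"])
    return new2.items, legacy2.total


items, total = asyncio.run(main())
print(items, total)  # ['hello'] 7
```

Checking for an override, rather than only for the presence of the new method, matters because every executor inherits the base-class hooks.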

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 12 comments.

| File | Description |
|---|---|
| _executor.py | Added on_checkpoint_save() and on_checkpoint_restore() hooks to Executor base class with documentation |
| _runner.py | Updated checkpoint save/restore logic to call new hooks while maintaining backward compatibility; added superstep event emission |
| _workflow_context.py | Deprecated set_executor_state() and get_executor_state() methods with migration guidance |
| _events.py | Added SuperStepEvent, SuperStepStartedEvent, and SuperStepCompletedEvent classes; simplified ExecutorInvokedEvent and ExecutorCompletedEvent |
| _workflow_executor.py | Refactored WorkflowExecutor to use new checkpoint hooks instead of manual state loading; removed _state_loaded flag |
| _agent_executor.py | Updated AgentExecutor to use on_checkpoint_save() and on_checkpoint_restore() |
| _magentic.py | Converted all magentic executors from snapshot_state/restore_state to new checkpoint hooks |
| _handoff.py | Updated HandoffCoordinator to use base class checkpoint methods via pattern metadata hooks |
| _base_group_chat_orchestrator.py | Updated base orchestrator to use new checkpoint hooks |
| __init__.py / __init__.pyi | Exported new SuperStep event types |
| Sample files | Updated all checkpoint samples to demonstrate new executor state management pattern |
| Test files | Updated tests to use new checkpoint hook names; converted sync test to async |
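The superstep event emission noted for _runner.py can be pictured with a toy Pregel-style loop, in which messages produced during one superstep are delivered in the next. Everything here is an assumption for illustration: the dataclass event shapes, the iteration field, the on_boundary callback, and run_supersteps are hypothetical and are not the framework's SuperStepStartedEvent/SuperStepCompletedEvent definitions. The sketch only shows how a start event and a completion event bracket each superstep, with the boundary as a natural point to take a checkpoint.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class SuperStepEvent:
    """Hypothetical base event carrying the superstep index."""
    iteration: int


@dataclass
class SuperStepStartedEvent(SuperStepEvent):
    pass


@dataclass
class SuperStepCompletedEvent(SuperStepEvent):
    pass


Handler = Callable[[str], Awaitable[list[str]]]


async def run_supersteps(
    initial: list[str],
    handler: Handler,
    on_boundary: Callable[[int], None],
    max_steps: int = 10,
) -> list[object]:
    """Deliver all pending messages per superstep; outputs run in the next step."""
    events: list[object] = []
    pending = list(initial)
    step = 0
    while pending and step < max_steps:
        events.append(SuperStepStartedEvent(iteration=step))
        # Every message queued for this superstep is processed concurrently.
        results = await asyncio.gather(*(handler(m) for m in pending))
        pending = [out for batch in results for out in batch]
        events.append(SuperStepCompletedEvent(iteration=step))
        on_boundary(step)  # hypothetical hook, e.g. where a checkpoint could be written
        step += 1
    return events


async def countdown(message: str) -> list[str]:
    # Toy handler: "2" produces "1", "1" produces "0", "0" produces nothing.
    n = int(message)
    return [str(n - 1)] if n > 0 else []


boundaries: list[int] = []
events = asyncio.run(run_supersteps(["2"], countdown, boundaries.append))
print(len(events), boundaries)  # 6 [0, 1, 2]
```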

markwallace-microsoft (Member) commented Nov 12, 2025

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| packages/core/agent_framework/_workflows | | | | |
| _agent_executor.py | 153 | 27 | 82% | 25, 92, 104–106, 131–132, 134–135, 150–151, 202–203, 205–206, 237–239, 249–251, 253, 257, 261, 265–266, 287 |
| _base_group_chat_orchestrator.py | 78 | 8 | 89% | 18, 83, 98, 174, 184, 191, 247, 274 |
| _conversation_history.py | 11 | 1 | 90% | 20 |
| _events.py | 128 | 14 | 89% | 59–60, 78, 86, 90, 180–181, 232, 257, 294, 312, 337, 376, 388 |
| _executor.py | 141 | 6 | 95% | 208, 324, 339, 341, 456, 466 |
| _handoff.py | 477 | 136 | 71% | 55, 68–70, 77–78, 80, 82, 156, 164–169, 172–173, 187, 196, 215–218, 227–229, 240, 243, 253–264, 266, 272, 278, 308, 353, 364–366, 422, 431–436, 438–439, 450, 467, 485, 509, 537, 549–551, 557, 581–583, 586–589, 591–593, 827, 833, 837, 843, 847, 859, 907, 910, 998, 1003, 1013, 1019–1022, 1030–1031, 1035–1037, 1039–1049, 1051–1052, 1054, 1056, 1071–1072, 1075–1076, 1079, 1099–1105, 1107, 1113, 1147–1148, 1201–1202, 1384, 1392, 1460, 1473, 1477, 1482–1483 |
| _magentic.py | 896 | 256 | 71% | 48, 53, 75–84, 89, 93–104, 328, 333, 350, 352, 367, 375–384, 462, 466, 480, 486, 501, 581, 594, 611, 620–621, 623–625, 627, 638, 780–783, 786–790, 792–794, 801, 840, 887, 923–925, 927, 935–938, 942–945, 1009, 1067–1068, 1085, 1087–1088, 1096, 1132, 1141–1143, 1163, 1216, 1236, 1239, 1268, 1271, 1279–1283, 1289, 1317–1319, 1321, 1323, 1331–1334, 1336, 1340–1341, 1344–1347, 1349–1350, 1356–1358, 1361–1362, 1367–1368, 1376, 1384, 1399, 1411, 1423–1426, 1455–1456, 1461–1463, 1494, 1520, 1535, 1551, 1568, 1635, 1642, 1645, 1647–1648, 1651–1652, 1656, 1659, 1680, 1715–1716, 1718, 1722–1724, 1739, 1750, 1760, 1804, 1809–1810, 2136–2137, 2141, 2156, 2161, 2164, 2218, 2229, 2240–2242, 2255–2256, 2261, 2272–2274, 2285–2287, 2299–2306, 2308–2309, 2317, 2325–2326, 2328–2330, 2332–2335, 2339–2347, 2351–2352, 2355–2359, 2361–2362, 2364–2366, 2368–2371, 2373–2375, 2377–2378, 2380–2382, 2397–2400, 2411–2414, 2426–2429, 2433 |
| _runner.py | 215 | 36 | 83% | 112, 114–117, 159–162, 166, 206–208, 230–231, 233–234, 268–270, 294–298, 302, 337, 341, 343, 347, 355–358, 371, 411 |
| _workflow.py | 248 | 18 | 92% | 94, 267–269, 271–272, 290, 314, 316, 409, 658, 692, 697, 700, 719–721, 786 |
| _workflow_context.py | 164 | 26 | 84% | 60–61, 69, 73, 87, 163, 188, 295, 397, 406, 411, 424–426, 428, 430–431, 433–434, 443–445, 447–449, 451 |
| _workflow_executor.py | 160 | 44 | 72% | 30, 96, 384, 401, 405, 411, 415, 426, 430, 450, 462–465, 468–470, 473–474, 476, 479–481, 484–488, 492–493, 502, 507, 541, 565–570, 573, 576, 584, 589, 600 |
| TOTAL | 14616 | 2096 | 85% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 2027 | 126 💤 | 0 ❌ | 0 🔥 | 38.011s ⏱️ |

TaoChenOSU requested a review from moonbox3 on Nov 13, 2025 at 19:40
TaoChenOSU enabled auto-merge on Nov 17, 2025 at 17:58
TaoChenOSU added this pull request to the merge queue on Nov 17, 2025
Merged via the queue into microsoft:main with commit c361ad8 on Nov 17, 2025
23 checks passed
github-project-automation bot moved this from In Progress to Done in Agent Framework on Nov 17, 2025
arisng pushed a commit to arisng/agent-framework that referenced this pull request Feb 2, 2026
…2097)

* Add checkpoint hooks

* Deprecate get_executor_state and set_executor_state

* Fix tests and samples

* Add doc strings

* Add sample

* Fix import

* Address comments and fix tests

* Address comments

* conditional import

Labels

python, workflows (Related to Workflows in agent-framework)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Python: Save & Restore states hooks for executors

5 participants