Skip to content

Switch to use the new scheduler & remove old unused Scheduler & Launcher logic#172

Open
AlexJones0 wants to merge 10 commits intolowRISC:masterfrom
AlexJones0:switch_scheduler
Open

Switch to use the new scheduler & remove old unused Scheduler & Launcher logic#172
AlexJones0 wants to merge 10 commits intolowRISC:masterfrom
AlexJones0:switch_scheduler

Conversation

@AlexJones0
Copy link
Copy Markdown
Contributor

@AlexJones0 AlexJones0 commented Apr 6, 2026

Note: this PR is currently a draft as it depends on #170, #135, and #173, which have not yet been merged; the first commit is from #170, the 2nd-4th commits are from #135, and the 5th commit is from #173, all of which can be safely ignored. Only the last 5 commits are relevant. It is otherwise ready to review.

This PR is the twentieth of a series of PRs to rewrite DVSim's core scheduling functionality (Scheduler, status display, launchers / runtime backends) to use an async design, with key goals of long term maintainability and extensibility.

This PR makes the major change to switch to the new scheduler that was previously behind the --experimental-enable-async-scheduler flag, and removes the old scheduler and launcher logic that has been ported. From this point onwards, the new scheduler will be the default when using DVSim.

Note: ideally we would want to test the new scheduler a bit more thoroughly with general usage before this PR is merged, to give some confidence. Also, it is an open question whether it is fine to switch like this, or whether we should still support the old scheduler for some time through an optional flag. This would likely depend on perceived stability.

This commit ports the FakeLauncher over to the RuntimeBackend async
interface to allow faking jobs via the new interface.

Additionally, by defining a method for attaching a "fake policy" to the
backend, flows can tell the fake backend "how" it should fake their
jobs, without the backend itself needing to know any of the internal
details (e.g. job types).

The default/base flow just always returns PASSED since it can't make
assumptions, but the sim flow can randomly choose between PASSED and
FAILED for its sim results. Furthermore, because we can use the callback
to access the deploy object, we can work around the hacky coupling of
the `cov_results_dict` results existing on the Deploy object for now by
manually finding the corresponding Deploy object via a search and
injecting some faked coverage results. This is a hack, but fixes the
immediate issue and can be resolved with a more widespread refactor
later on.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
The new scheduler uses an async model, so it's helpful for testing to
pull in the asyncio pytest plugin.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
This commit performs the changes necessary to port the scheduler tests
to use the new async scheduler. This involves:
- Creating a Mock RuntimeBackend. For now, to keep changes minimal and
  simple, we just use the LauncherAdapter with the MockLauncher. In the
  future it would be nice to make a mock RuntimeBackend as well though.
- Mark all the tests as being asyncio with async def and use the new
  scheduler interface.
- Update a couple of tests that were weirdly constructed (e.g. in terms
  of targets/ordering) due to constraints of the old scheduler.

With these changes, _all_ scheduler tests are now passing with the new
async scheduler across multiple iterations.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Extend an existing test case for launcher / runtime backend parallelism
to be able to also consider global scheduler-level parallelism.

Introduce a new test to check that we can provide a custom
prioritization function, and that jobs are indeed scheduled according to
the priorities assigned by it if not blocked.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
@AlexJones0 AlexJones0 changed the title Switch to use the new scheduler, and remove old unused Scheduler & Launcher logic Switch to use the new scheduler & remove old unused Scheduler & Launcher logic Apr 6, 2026
This was being applied in the command-line interface to the launchers,
but the relevant logic was not ported to the `RuntimeBackend` base class
when it was introduced.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
This commit makes the full transition to use the new async scheduler in
place of the old scheduler, instead of hiding it behind an experimental
environment variable.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
These have now been replaced by equivalent / improved asynchronous
interfaces.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Now that the async scheduler and status printer have been made the
defaults, and the old versions have been removed, rename the
`async_core` and `async_status_printer` modules to just be `core` and
`status_printer` as before, respectively.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
This `Timer` class was only used by the scheduler and is not
particularly useful, especially in async contexts. Deprecate and remove
this util module.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Now that everything goes through the Runtime Backend registry instead,
there is no need for the Launcher factory to exist any more, as it has
been superseded. Likewise, the LocalLauncher and FakeLauncher have been
replaced by the LocalRuntimeBackend and FakeRuntimeBackend respectively
- as such, these legacy launchers can also be removed.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Copy link
Copy Markdown
Collaborator

@machshev machshev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @AlexJones0! Really good work, this is a huge improvement overall.

In terms of migration I'd suggest merging this sooner rather than later. Opentitan and Mocha both use pinned versions of the package, so I don't think there is much risk in releasing. By default the OT nightly regressions will run the latest package and we can use that as a final check (it's more difficult to do this with a feature flag, although still possible). If there are issues then we can pin the nightly to the last known good and focus on patching any issues.

Personally I'd opt for the moving fast approach here as trying to maintain two schedulers in parallel is likely to introduce more risk than just switching. The OT and mocha pinned version of DVSim can be updated after a successful nightly regression run, potentially delayed a couple of days if people want to test it out locally themselves.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants