Switch to use the new scheduler & remove old unused Scheduler & Launcher logic#172
Switch to use the new scheduler & remove old unused Scheduler & Launcher logic#172AlexJones0 wants to merge 10 commits intolowRISC:masterfrom
Conversation
This commit ports the FakeLauncher over to the RuntimeBackend async interface to allow faking jobs via the new interface. Additionally, by defining a method for attaching a "fake policy" to the backend, flows can tell the fake backend "how" it should fake their jobs, without the backend itself needing to know any of the internal details (e.g. job types). The default/base flow just always returns PASSED since it can't make assumptions, but the sim flow can randomly choose between PASSED and FAILED for its sim results. Furthermore, because we can use the callback to access the deploy object, we can work around the hacky coupling of the `cov_results_dict` results existing on the Deploy object for now by manually finding the corresponding Deploy object via a search and injecting some faked coverage results. This is a hack, but fixes the immediate issue and can be resolved with a more widespread refactor later on. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
The new scheduler uses an async model, so it's helpful for testing to pull in the asyncio pytest plugin. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
This commit performs the changes necessary to port the scheduler tests to use the new async scheduler. This involves: - Creating a Mock RuntimeBackend. For now, to keep changes minimal and simple, we just use the LauncherAdapter with the MockLauncher. In the future it would be nice to make a mock RuntimeBackend as well though. - Mark all the tests as being asyncio with async def and use the new scheduler interface. - Update a couple of tests that were weirdly constructed (e.g. in terms of targets/ordering) due to constraints of the old scheduler. With these changes, _all_ scheduler tests are now passing with the new async scheduler across multiple iterations. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Extend an existing test case for launcher / runtime backend parallelism to be able to also consider global scheduler-level parallelism. Introduce a new test to check that we can provide a custom prioritization function, and that jobs are indeed scheduled according to the priorities assigned by it if not blocked. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
This was being applied in the command-line interface to the launchers, but the relevant logic was not ported to the `RuntimeBackend` base class when it was introduced. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
This commit makes the full transition to use the new async scheduler in place of the old scheduler, instead of hiding it behind an experimental environment variable. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
These have now been replaced by equivalent / improved asynchronous interfaces. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Now that the async scheduler and status printer have been made the defaults, and the old versions have been removed, rename the `async_core` and `async_status_printer` modules to just be `core` and `status_printer` as before, respectively. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
This `Timer` class was only used by the scheduler and is not particularly useful, especially in async contexts. Deprecate and remove this util module. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Now that everything goes through the Runtime Backend registry instead, there is no need for the Launcher factory to exist any more, as it has been superseded. Likewise, the LocalLauncher and FakeLauncher have been replaced by the LocalRuntimeBackend and FakeRuntimeBackend respectively - as such, these legacy launchers can also be removed. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
c0e6b1f to
bd6d6f4
Compare
machshev
left a comment
There was a problem hiding this comment.
Thanks @AlexJones0! Really good work, this is a huge improvement overall.
In terms of migration I'd suggest merging this sooner rather than later. Opentitan and Mocha both use pinned versions of the package, so I don't think there is much risk in releasing. By default the OT nightly regressions will run the latest package and we can use that as a final check (it's more difficult to do this with a feature flag, although still possible). If there are issues then we can pin the nightly to the last known good and focus on patching any issues.
Personally I'd opt for the moving fast approach here as trying to maintain two schedulers in parallel is likely to introduce more risk than just switching. The OT and mocha pinned version of DVSim can be updated after a successful nightly regression run, potentially delayed a couple of days if people want to test it out locally themselves.
Note: this PR is currently a draft as it depends on #170, #135, and #173, which have not yet been merged; the first commit is from #170, the 2nd-4th commits are from #135, and the 5th commit is from #173, all of which can be safely ignored. Only the last 5 commits are relevant. It is otherwise ready to review.
This PR is the twentieth of a series of PRs to rewrite DVSim's core scheduling functionality (Scheduler, status display, launchers / runtime backends) to use an async design, with key goals of long term maintainability and extensibility.
This PR makes the major change to switch to the new scheduler that was previously behind the
--experimental-enable-async-schedulerflag, and removes the old scheduler and launcher logic that has been ported. From this point onwards, the new scheduler will be the default when using DVSim.Note: ideally we would want to test the new scheduler a bit more thoroughly with general usage before this PR is merged, to give some confidence. Also, it is an open question whether it is fine to switch like this, or whether we should still support the old scheduler for some time through an optional flag. This would likely depend on perceived stability.