Update JIT stage timing heuristic #3143

Open
timcassell wants to merge 1 commit into master from jit-stage-heuristic

Conversation

@timcassell
Collaborator

Bails out after the second invocation if it detects a long-running benchmark (using the same calculation as the pilot stage), instead of continuing to invoke it for 10 seconds.

Fixes #3114

@timcassell timcassell added this to the v0.16.0 milestone May 23, 2026
@timcassell timcassell requested a review from adamsitnik May 23, 2026 11:51
Member

@adamsitnik adamsitnik left a comment

Hello @timcassell !

Thank you for addressing my feedback. I haven’t reviewed the original changes in #2806, so I’d like to ask a few questions first to better understand the current design and behavior.

  • Is the new tiering heuristic based on deterministic signals such as JIT events emitted by the runtime or disassembly data, or is it intended to approximate the CLR’s current heuristic?
  • How is this handled across different runtimes? For example, is the new heuristic enabled for AOT and .NET Framework, and how does it behave on older versus newer Mono? Also, do we respect all environment variables that allow users to disable tiered JIT, such as DOTNET_TieredCompilation and COMPlus_TieredCompilation?
  • How does it affect the total time it takes to run a typical micro benchmark?
  • How does this interact with other stages, especially Warmup? For example, if the JITting phase runs long enough to satisfy the warmup heuristic, do we then skip the warmup phase?
  • How can users configure the JITting phase? Do we expose new Job APIs, attributes, and command-line arguments?

Thanks,
Adam

@timcassell
Collaborator Author

timcassell commented May 25, 2026

  • Is the new tiering heuristic based on deterministic signals such as JIT events emitted by the runtime or disassembly data, or is it intended to approximate the CLR’s current heuristic?

It does not listen to JIT events; it simply queries the values the JIT uses for the number of invocations at each stage, with an approximate wait time between JIT stages. You can see how those values are queried in JitInfo.cs.

  • How is this handled across different runtimes? For example, is the new heuristic enabled for AOT and .NET Framework, and how does it behave on older versus newer Mono? Also, do we respect all environment variables that allow users to disable tiered JIT, such as DOTNET_TieredCompilation and COMPlus_TieredCompilation?

It only handles JIT stages in CoreCLR, and only when tiering is enabled. In all other runtimes JitInfo.IsTiered returns false. I'm not sure if new Mono has a tiered JIT; I didn't look into it or test it. And yes, we respect all env vars and knobs that allow users to control JIT behavior.

  • How does it affect the total time it takes to run a typical micro benchmark?

It depends on how fast the benchmark is. By default, it invokes it 30x per jit stage promotion, with 250ms wait time between stages. On the latest runtime there are 3 calculated stage promotions (it should be 2, but there is a runtime bug that we account for).

  • How does this interact with other stages, especially Warmup? For example, if the JITting phase runs long enough to satisfy the warmup heuristic, do we then skip the warmup phase?

Other stages were left unchanged. Warmup stage has existed since even before tiered JIT was a thing and I didn't want to mess with it. We could definitely improve things there, though it's tracked by #1993.

  • How can users configure the JITting phase? Do we expose new Job APIs, attributes, and command-line arguments?

Via environment variables or MSBuild properties that the runtime exposes. No new APIs were added to BDN for it.
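
As an illustration of the runtime-level knobs mentioned above, here is a minimal sketch of a config that turns tiered compilation off for a job through the runtime's own DOTNET_TieredCompilation environment variable. The class name and job id are made up for the example; it assumes the existing Job.WithEnvironmentVariable extension and is not something this PR adds.

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

// Sketch only: NoTieringConfig and the "TieringDisabled" id are illustrative names.
public class NoTieringConfig : ManualConfig
{
    public NoTieringConfig()
    {
        // The benchmark process is launched with tiered compilation disabled,
        // so the JIT is expected to produce fully optimized code on first compile.
        AddJob(Job.Default
            .WithEnvironmentVariable("DOTNET_TieredCompilation", "0")
            .WithId("TieringDisabled"));
    }
}
```

The MSBuild counterpart would be setting the TieredCompilation property to false in the benchmark project file.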

@timcassell
Collaborator Author

how does it behave on older versus newer Mono?

It only handles JIT stages in CoreCLR

The CoreCLR detection was broken, fixed in #3150.

@adamsitnik
Member

The CoreCLR detection was broken, fixed in #3150.

Great, thanks for fixing it!

It depends on how fast the benchmark is. By default, it invokes it 30x per jit stage promotion, with 250ms wait time between stages. On the latest runtime there are 3 calculated stage promotions (it should be 2, but there is a runtime bug that we account for).

So for benchmarks that are neither very short nor long-running, like:

[Benchmark]
public void Sleep() => Thread.Sleep(100);

We are going to spend a lot of time trying to promote them. In this case it was 9+ seconds while the whole benchmarking took 25s.

WorkloadJitting  1: 1 op, 103811800.00 ns, 103.8118 ms/op
WorkloadJitting  2: 30 op, 3247958900.00 ns, 108.2653 ms/op
WorkloadJitting  3: 30 op, 3259087200.00 ns, 108.6362 ms/op
WorkloadJitting  4: 30 op, 3247723900.00 ns, 108.2575 ms/op
WorkloadJitting  5: 1 op, 108950800.00 ns, 108.9508 ms/op

16s -> 25s is quite a lot. Do you see any way to reduce the time it takes to run it?
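
For reference, the "9+ seconds" figure can be reproduced by summing the WorkloadJitting lines above. This is only a worked calculation over the logged values; it does not include the 250ms waits between stages, and the variable names are illustrative.

```csharp
using System;
using System.Globalization;
using System.Linq;

// (operations, total nanoseconds) copied from the WorkloadJitting log lines.
(int Ops, long Nanoseconds)[] jittingIterations =
{
    (1, 103_811_800L),
    (30, 3_247_958_900L),
    (30, 3_259_087_200L),
    (30, 3_247_723_900L),
    (1, 108_950_800L),
};

int totalOps = jittingIterations.Sum(i => i.Ops);
long totalNs = jittingIterations.Sum(i => i.Nanoseconds);
double totalSeconds = totalNs / 1e9;

Console.WriteLine(string.Format(
    CultureInfo.InvariantCulture,
    "{0} invocations, {1:F2} s in the JIT stage",
    totalOps, totalSeconds));
// prints "92 invocations, 9.97 s in the JIT stage"
```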

Via environment variables or MSBuild properties that the runtime exposes. No new APIs were added to BDN for it.

I think it would be nice to offer the ability to disable (or hardcode) it. For example, if I have plenty of benchmarks similar to the 100ms sleep described above, I can avoid regressing the time it takes to run all of them. Or I can just have a config to use during a conference talk. FWIW the dotnet/performance repo has been using these settings for years.

config.WithJittingCount(1);
[JittingCount(1)]

@timcassell
Collaborator Author

The only way to reduce the time is to reduce the CallCountThreshold. Originally I had set it to 1 via TC_AggressiveTiering, but Egor explained in that PR why that was a bad idea, so I went with a timeout instead. We already have config options to skip the JIT stage (RunStrategy.Monitoring/ColdStart).

Generally users are most interested in measuring tier1 performance, which is the default Throughput strategy, so it doesn't make sense to not push it all the way there unless the user selects a different run strategy.
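
For the 100ms sleep case discussed above, a minimal sketch of opting out via an existing run strategy could look like the following. It assumes the SimpleJob attribute overload that takes a RunStrategy; the class name is illustrative, and whether the JIT stage is skipped follows from the statement above rather than from anything shown here.

```csharp
using System.Threading;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;

// Sketch only: selects the Monitoring strategy instead of the default Throughput,
// so the benchmark is measured without first being pushed through all JIT tiers.
[SimpleJob(RunStrategy.Monitoring)]
public class SleepBenchmarks
{
    [Benchmark]
    public void Sleep() => Thread.Sleep(100);
}
```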

config.WithJittingCount(1);
[JittingCount(1)]

What is that supposed to do? Just run a single JIT iteration? I guess it would kinda make sense to have it get to tier0 before the pilot stage runs. But probably it should just be a bool value for whether to go through all the tiers or not, because I don't know what JittingCount(2) would even mean.

@timcassell
Collaborator Author

timcassell commented May 28, 2026

FWIW the dotnet/performance repo has been using these settings for years.

It seems that with those settings you were actually measuring tier0 performance in the first iterations (unless the pilot stage invoked it enough times to bump it up), and possibly higher tiers in later iterations depending on how fast the benchmark is. So each iteration was not measuring the same thing.

But the perf repo is on a version of BDN that has the new JIT stage, so it should always be measuring tier1 now. I don't have access to the data, so I don't know what effect that had on the total runtime and measurement stability.

@timcassell
Collaborator Author

Do you see any way to reduce the time it takes to run it?

Also if the runtime would fix the duplicated tier0 with OSR enabled we could reduce the time by 1/3, but that's out of our control.

@adamsitnik
Member

Generally users are most interested in measuring tier1 performance, which is the default Throughput strategy, so it doesn't make sense to not push it all the way there unless the user selects a different run strategy.

They are, but Tiered JIT was introduced 7(?) years ago and they have figured out various ways of dealing with this problem. Most of them were lucky enough to just run micro-benchmarks that were getting promoted to Tier 1 with the default Pilot/Warmup settings. Others increased the warmup count or just disabled Tiered JIT.

It seems that with those settings you were actually measuring tier0 performance in the first iterations (unless the pilot stage invoked it enough times to bump it up), and possibly higher tiers in later iterations depending on how fast the benchmark is. So each iteration was not measuring the same thing.

But the perf repo is on a version of BDN that has the new JIT stage, so it should always be measuring tier1 now.

In the case of dotnet/performance, we went back and forth about Tiered JIT when the feature was introduced (dotnet/performance#247, dotnet/performance#320). What we actually ended up doing (dotnet/performance#1536, dotnet/performance#666) was setting an explicit min/max warmup count for just a couple of benchmarks that were not getting promoted to Tier 1 with our custom settings.
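
A per-benchmark override of that kind might look like the sketch below. The class, the workload, and the counts 6 and 10 are illustrative rather than taken from the linked PRs; it assumes BenchmarkDotNet's MinWarmupCount and MaxWarmupCount attributes with their forceAutoWarmup parameter.

```csharp
using BenchmarkDotNet.Attributes;

// Sketch only: forces the automatic warmup to run between 6 and 10 iterations
// for the benchmarks in this class, even if the global config fixes a warmup count.
[MinWarmupCount(6, forceAutoWarmup: true)]
[MaxWarmupCount(10, forceAutoWarmup: true)]
public class PromotionSensitiveBenchmarks
{
    private readonly int[] data = new int[1024];

    [Benchmark]
    public int Sum()
    {
        int sum = 0;
        foreach (int value in data)
            sum += value;
        return sum;
    }
}
```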

And what is very important in the case of dotnet/performance is the time it takes to run all the benchmarks. Assuming that we run only nano-benchmarks (which is not true) and the new logic takes 1s per benchmark, our 5k benchmarks will need an additional 83 minutes to run. And we are aiming to run the benchmarks multiple times per day, for multiple architectures. So in our (.NET Team) case, it really matters not to prolong the time it takes to run all the benchmarks.
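
The 83-minute figure is simply 5,000 benchmarks times 1 second each. A small worked calculation, with illustrative variable names:

```csharp
using System;
using System.Globalization;

int benchmarkCount = 5000;                          // "5k benchmarks"
TimeSpan perBenchmark = TimeSpan.FromSeconds(1);    // assumed extra cost per benchmark

// Total extra wall-clock time for one full run of the suite.
TimeSpan extra = TimeSpan.FromTicks(perBenchmark.Ticks * benchmarkCount);

Console.WriteLine(extra.TotalMinutes.ToString("F1", CultureInfo.InvariantCulture) + " minutes");
// prints "83.3 minutes"
```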

But probably it should just be a bool value for whether to go through all the tiers or not, because I don't know what JittingCount(2) would even mean.

Bool is enough for our needs.

Also if the runtime would fix the duplicated tier0 with OSR enabled we could reduce the time by 1/3, but that's out of our control.

@EgorBo @AndyAyersMS Is this something that can be easily fixed? Do we have any plans to do it?

Other stages were left unchanged. Warmup stage has existed since even before tiered JIT was a thing and I didn't want to mess with it. We could definitely improve things there, though it's tracked by #1993.

Thank you for providing the link. Ideally all of this should be combined and the total time it takes to get the perfect invocation count, code promoted to Tier 1 and warmed up should be minimal.

Another thing to consider here is the "worst case" benchmarks: those that take less than one iteration time to execute (so we are not going to skip the extra warmup), but for which running them 3x30 times takes a lot of time. Most of them just delegate the work to smaller methods, and these small, hot path methods are usually executed multiple times even for a single invocation. And they usually get promoted to Tier 1 with the default settings. If my memory serves me well, this was the case for all the C# Computer Game benchmarks. What I am trying to say is that based on the evidence we had in the past, very few benchmarks required the extra warmup.

@timcassell
Collaborator Author

They are, but Tiered JIT was introduced 7(?) years ago and they have figured out various ways of dealing with this problem. Most of them were lucky enough to just run micro-benchmarks that were getting promoted to Tier 1 with the default Pilot/Warmup settings. Others increased the warmup count or just disabled Tiered JIT.

Right, but that involved extra knowledge of the JIT tiering strategy and how the benchmarks were being invoked. New users run into the same problem. Now that's no longer a concern.

In the case of dotnet/performance, we went back and forth about Tiered JIT when the feature was introduced (dotnet/performance#247, dotnet/performance#320). What we actually ended up doing (dotnet/performance#1536, dotnet/performance#666) was setting an explicit min/max warmup count for just a couple of benchmarks that were not getting promoted to Tier 1 with our custom settings.

I think now the recommended config can be updated to set warmup count to 0, and remove all the specific benchmark warmup settings since tiering up is automatically handled now.
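
A minimal sketch of what such a config could look like, assuming the existing Job.WithWarmupCount extension; the class name and job id are illustrative, and this is not the actual dotnet/performance recommended config.

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

// Sketch only: relies on the JIT stage to reach tier1 and skips warmup iterations.
public class NoWarmupConfig : ManualConfig
{
    public NoWarmupConfig()
    {
        AddJob(Job.Default
            .WithWarmupCount(0)
            .WithId("NoWarmup"));
    }
}
```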

And what is very important in the case of dotnet/performance is the time it takes to run all the benchmarks. Assuming that we run only nano-benchmarks (which is not true) and the new logic takes 1s per benchmark, our 5k benchmarks will need an additional 83 minutes to run. And we are aiming to run the benchmarks multiple times per day, for multiple architectures. So in our (.NET Team) case, it really matters not to prolong the time it takes to run all the benchmarks.

dotnet/performance updated to the new JIT stage in dotnet/performance#5073. Do you have the numbers from before and after?

Bool is enough for our needs.

I think that option will probably not be needed. With the JIT stage already promoting to tier1, the pilot stage should be able to stabilize more quickly, and with the warmup count set to 0, less time is spent "guessing" at tiering up. So I would expect the total time to be not much more than before (except for the duplicated tier0, of course).

Thank you for providing the link. Ideally all of this should be combined and the total time it takes to get the perfect invocation count, code promoted to Tier 1 and warmed up should be minimal.

I agree. @AndreyAkinshin mentioned combining the pilot and warmup stages in #2787 (also #1210), which I think would also help here, but I'm not sure about concrete plans for how to go about it.

Another thing to consider here is the "worst case" benchmarks: those that take less than one iteration time to execute (so we are not going to skip the extra warmup), but for which running them 3x30 times takes a lot of time.

I mean that's what this PR is for, to skip the tiering-up for long-running benchmarks. We can adjust the heuristic for what we consider to be the optimal cutoff value. I don't know what that cutoff value should be, though, so I just started with copying the pilot stage's heuristic.

@EgorBo
Member

EgorBo commented May 29, 2026

Also if the runtime would fix the duplicated tier0 with OSR enabled we could reduce the time by 1/3, but that's out of our control.

@EgorBo @AndyAyersMS Is this something that can be easily fixed? Do we have any plans to do it?

If I understand it correctly, it's dotnet/runtime#76402. We probably should indeed address it for .NET 11.0 (I'm not promising, but will try).

@timcassell
Collaborator Author

timcassell commented May 29, 2026

If I understand it correctly, it's dotnet/runtime#76402. We probably should indeed address it for .NET 11.0 (I'm not promising, but will try).

I don't think it's the same issue. It was described in dotnet/runtime#117787 (comment); I'm not sure if there's a dedicated issue for it. [Edit] Also see a later comment in that issue: dotnet/runtime#76402 (comment).

@timcassell
Collaborator Author

Also, while I was working with Claude, it found that we can listen to JIT events in-process, so I'm going to try building a prototype that does that for deterministic tiering-up instead of waiting 250ms.
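
For context, in-process listening is possible with System.Diagnostics.Tracing.EventListener. The sketch below is not the prototype mentioned above. It is a hypothetical helper that assumes the runtime provider name Microsoft-Windows-DotNETRuntime, its JIT keyword (0x10), and the MethodLoadVerbose events with a MethodName payload field; it only records method names and does not decode the optimization tier.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics.Tracing;

// Hypothetical helper: collects the names of methods the runtime reports as compiled.
sealed class JitEventListener : EventListener
{
    // JIT keyword of the runtime event provider (assumed value).
    private const EventKeywords JitKeyword = (EventKeywords)0x10;

    // Field initializers run before the base constructor, so the queue exists
    // by the time OnEventSourceCreated is first called.
    public ConcurrentQueue<string> CompiledMethods { get; } = new();

    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "Microsoft-Windows-DotNETRuntime")
            EnableEvents(source, EventLevel.Verbose, JitKeyword);
    }

    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        if (e.EventName is null ||
            !e.EventName.StartsWith("MethodLoadVerbose", StringComparison.Ordinal))
            return;

        // Look the field up by name instead of relying on a fixed payload position.
        int index = e.PayloadNames?.IndexOf("MethodName") ?? -1;
        if (index >= 0 && e.Payload?[index] is string name)
            CompiledMethods.Enqueue(name);
    }
}
```

A caller would create the listener before invoking the workload and dispose it afterwards. Deciding that a given method has reached tier1 would additionally require reading tier information from the event payload, which this sketch leaves out.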

@timcassell timcassell force-pushed the jit-stage-heuristic branch from 39adb6f to bb136af Compare May 30, 2026 01:20
Development

Successfully merging this pull request may close these issues.

WorkloadJitting executed more than once

3 participants