Investigate GetUnicodeCategory performance for non-ASCII characters by Copilot · Pull Request #124320 · dotnet/runtime

Copilot · 2026-02-12T06:25:57Z

Description

Investigated the reported 1-3ns regression in char.GetUnicodeCategory(char) for non-ASCII characters since .NET 5.0. The original issue suggested calling CharUnicodeInfo.GetUnicodeCategory(char) instead of CharUnicodeInfo.GetUnicodeCategory((int)c) to avoid a redundant IsValidCodePoint check.

Changes

Two approaches were attempted and benchmarked via EgorBot:

Char overload change: Changed Char.GetUnicodeCategory(char c) to call CharUnicodeInfo.GetUnicodeCategory(c) (char overload) instead of CharUnicodeInfo.GetUnicodeCategory((int)c) (int overload)
AggressiveInlining: Added [MethodImpl(MethodImplOptions.AggressiveInlining)] to CharUnicodeInfo.GetUnicodeCategory(char)

Both changes were reverted after benchmark results showed they caused a regression rather than an improvement.

Benchmark Results

EgorBot results (run 1, run 2):

x64 (AMD EPYC 9V74): PR was 1.88x slower across all non-ASCII characters:

Method	Main	PR	Ratio
Hebrew	0.29 ns	0.55 ns	1.88
Arabic	0.29 ns	0.55 ns	1.89
Chinese	0.29 ns	0.55 ns	1.88
Cyrillic	0.29 ns	0.55 ns	1.88

ARM64 (Neoverse-N2): Neutral (ratio ~0.98-1.02).

Adding AggressiveInlining made no difference — both runs produced identical x64 results.

Analysis

The JIT already optimizes the (int)c → CharUnicodeInfo.GetUnicodeCategory(int) path extremely well. The IsValidCodePoint check is trivially eliminated for char-width values, and the resulting code is more efficient than routing through the char overload. Main's ~0.29ns is already ~10x faster than the 3-4ns the original issue reported for .NET 5.0, indicating the regression has already been resolved by subsequent JIT/runtime optimizations.

All changes have been reverted. No code modifications remain in this PR.

Original prompt

This section details on the original issue you should resolve

<issue_title>Minor regression in System.Tests.Perf_Char.GetUnicodeCategory for non-ascii characters</issue_title>
<issue_description>I am now running an independent 3.1 vs 5.0 comparison and I've confirmed that it's actual regression introduced in 5.0.

The issue has been originally reported by a bot owned by @DrewScoggins in DrewScoggins/performance-2#574

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp3.1 netcoreapp5.0 --filter 'System.Tests.Perf_Char.GetUnicodeCategory'

Details

System.Tests.Perf_Char.GetUnicodeCategory(c: '?')

Conclusion	Base	Diff	Base/Diff	Operating System	Bit	Processor Name	Base Runtime	Diff Runtime
Slower	5.82	7.89	0.74	Windows 10.0.18363.1016	Arm	Microsoft SQ1 3.0 GHz	.NET Core 3.1.6	5.0.100-rc.1.20413.9
Slower	3.15	4.08	0.77	Windows 10.0.18363.959	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	.NET Core 3.1.6	5.0.100-rc.1.20404.3
Slower	3.03	4.10	0.74	Windows 10.0.18363.959	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	.NET Core 3.1.6	5.0.100-rc.1.20418.3
Slower	3.40	4.80	0.71	Windows 10.0.19041.450	X64	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	.NET Core 3.1.6	5.0.100-rc.1.20413.9
Slower	2.63	3.43	0.77	Windows 10.0.19042	X64	Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)	.NET Core 3.1.6	5.0.100-rc.1.20418.3
Slower	1.86	4.54	0.41	Windows 10.0.18363.959	X86	Intel Xeon CPU E5-1650 v4 3.60GHz	.NET Core 3.1.6	5.0.100-rc.1.20420.2
Slower	2.27	5.28	0.43	Windows 10.0.19041.450	X86	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	.NET Core 3.1.6	5.0.100-rc.1.20419.5
Slower	3.12	3.77	0.83	ubuntu 18.04	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	.NET Core 3.1.6	5.0.100-rc.1.20403.23
Slower	3.52	5.28	0.67	macOS Mojave 10.14.5	X64	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	.NET Core 3.1.6	5.0.100-rc.1.20404.2
Slower	2.01	3.01	0.67	ubuntu 18.04	X64	Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)	.NET Core 3.1.6	5.0.100-rc.1.20418.3

No regressions for ascii chars:

System.Tests.Perf_Char.GetUnicodeCategory(c: '.')

Conclusion	Base	Diff	Base/Diff	Modality	Operating System	Bit	Processor Name	Base Runtime	Diff Runtime
Slower	0.66	2.31	0.29		Windows 10.0.18363.1016	Arm	Microsoft SQ1 3.0 GHz	.NET Core 3.1.6	5.0.100-rc.1.20413.9
Same	0.51	0.54	0.94		Windows 10.0.18363.959	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	.NET Core 3.1.6	5.0.100-rc.1.20404.3
Same	0.52	0.55	0.95		Windows 10.0.18363.959	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	.NET Core 3.1.6	5.0.100-rc.1.20418.3
Same	0.61	0.61	1.00		Windows 10.0.19041.450	X64	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	.NET Core 3.1.6	5.0.100-rc.1.20413.9
Same	0.53	0.27	1.92		Windows 10.0.19042	X64	Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)	.NET Core 3.1.6	5.0.100-rc.1.20418.3
Slower	0.43	0.76	0.56		Windows 10.0.18363.959	X86	Intel Xeon CPU E5-1650 v4 3.60GHz	.NET Core 3.1.6	5.0.100-rc.1.20420.2
Slower	0.50	0.87	0.57		Windows 10.0.19041.450	X86	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	.NET Core 3.1.6	5.0.100-rc.1.20419.5
Same	0.47	0.52	0.92		ubuntu 18.04	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	.NET Core 3.1.6	5.0.100-rc.1.20403.23
Same	0.58	0.59	0.98		macOS Mojave 10.14.5	X64	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	.NET Core 3.1.6	5.0.100-rc.1.20404.2
Same	0.23	0.47	0.48	several?	ubuntu 18.04	X64	Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)	.NET Core 3.1.6	5.0.100-rc.1.20418.3

System.Tests.Perf_Char.GetUnico...

Fixes Minor regression in System.Tests.Perf_Char.GetUnicodeCategory for non-ascii characters #41107

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>

jkotas · 2026-02-12T07:04:52Z

@copilot Validate the performance improvement using performance-benchmark skill

Copilot · 2026-02-12T07:10:16Z

@copilot Validate the performance improvement using performance-benchmark skill

Running benchmark to validate the performance improvement:

@EgorBot -x64 -arm

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Globalization;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

[MemoryDiagnoser]
public class Bench
{
    private char _hebrewChar;
    private char _arabicChar;
    private char _chineseChar;
    private char _cyrillicChar;

    [GlobalSetup]
    public void Setup()
    {
        _hebrewChar = '\x05D0';   // Hebrew letter Aleph (from original issue)
        _arabicChar = '\x0623';   // Arabic letter Alef with Hamza above
        _chineseChar = '\u4E00';  // CJK Unified Ideograph
        _cyrillicChar = '\u0410'; // Cyrillic Capital Letter A
    }

    [Benchmark]
    public UnicodeCategory GetUnicodeCategory_Hebrew()
    {
        return char.GetUnicodeCategory(_hebrewChar);
    }

    [Benchmark]
    public UnicodeCategory GetUnicodeCategory_Arabic()
    {
        return char.GetUnicodeCategory(_arabicChar);
    }

    [Benchmark]
    public UnicodeCategory GetUnicodeCategory_Chinese()
    {
        return char.GetUnicodeCategory(_chineseChar);
    }

    [Benchmark]
    public UnicodeCategory GetUnicodeCategory_Cyrillic()
    {
        return char.GetUnicodeCategory(_cyrillicChar);
    }
}

This will compare the PR against the base branch to confirm the ~1-3ns improvement for non-ASCII characters.

EgorBot · 2026-02-12T07:43:40Z

@copilot, please review the benchmark results and logs at this URL: #124320

jkotas · 2026-02-12T07:58:32Z

@copilot Analyze why the fix has not produced the expected improvements.

Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>

Copilot · 2026-02-12T08:05:46Z

@copilot Analyze why the fix has not produced the expected improvements.

I analyzed why the initial fix didn't produce the expected improvement. The issue was that CharUnicodeInfo.GetUnicodeCategory(char) was missing the [MethodImpl(MethodImplOptions.AggressiveInlining)] attribute.

Root cause: While the fix correctly avoids the IsValidCodePoint validation check by calling the char overload, the JIT compiler may not have been inlining the char overload aggressively enough. This meant the call overhead could offset the benefit of skipping the validation.

Solution: Added [MethodImpl(MethodImplOptions.AggressiveInlining)] to CharUnicodeInfo.GetUnicodeCategory(char) to match the pattern used by similar methods in the same file (e.g., ToUpper(char), ToLower(char)).

This ensures that:

The char overload gets inlined into Char.GetUnicodeCategory(char)
The entire call chain collapses to just the core logic without validation overhead
The performance improvement becomes measurable

The change is in commit ca339b9. Please re-run the benchmark to verify the improvement.

jkotas · 2026-02-12T08:11:13Z

@copilot Run the above microbenchmark #124320 (comment) using performance-benchmark skill

Copilot · 2026-02-12T08:19:11Z

@copilot Run the above microbenchmark #124320 (comment) using performance-benchmark skill

Running the benchmark with the updated code (including AggressiveInlining):

@EgorBot -x64 -arm

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Globalization;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

[MemoryDiagnoser]
public class Bench
{
    private char _hebrewChar;
    private char _arabicChar;
    private char _chineseChar;
    private char _cyrillicChar;

    [GlobalSetup]
    public void Setup()
    {
        _hebrewChar = '\x05D0';   // Hebrew letter Aleph (from original issue)
        _arabicChar = '\x0623';   // Arabic letter Alef with Hamza above
        _chineseChar = '\u4E00';  // CJK Unified Ideograph
        _cyrillicChar = '\u0410'; // Cyrillic Capital Letter A
    }

    [Benchmark]
    public UnicodeCategory GetUnicodeCategory_Hebrew()
    {
        return char.GetUnicodeCategory(_hebrewChar);
    }

    [Benchmark]
    public UnicodeCategory GetUnicodeCategory_Arabic()
    {
        return char.GetUnicodeCategory(_arabicChar);
    }

    [Benchmark]
    public UnicodeCategory GetUnicodeCategory_Chinese()
    {
        return char.GetUnicodeCategory(_chineseChar);
    }

    [Benchmark]
    public UnicodeCategory GetUnicodeCategory_Cyrillic()
    {
        return char.GetUnicodeCategory(_cyrillicChar);
    }
}

This will verify the ~1-3ns improvement now that both fixes are in place (char overload + AggressiveInlining).

EgorBot · 2026-02-12T08:53:34Z

@copilot, please review the benchmark results and logs at this URL: #124320

stephentoub · 2026-02-12T13:19:23Z

@copilot, please review the benchmark results and logs at this URL: #124320

@EgorBo, what configuration do we need to do for @copilot to recognize / respond to your bot account?

stephentoub · 2026-02-12T13:20:07Z

Also, @EgorBo, it looks like the bot is including the wrong link in the comment it posts.

The EgorBot benchmark results show the changes caused a 1.88x regression on x64 (AMD EPYC), not the expected improvement. Main: ~0.29ns, PR: ~0.55ns per call for non-ASCII chars. On ARM64 results were neutral (ratio ~1.00). Reverting both changes: - Char.cs: restore call to CharUnicodeInfo.GetUnicodeCategory((int)c) - CharUnicodeInfo.cs: remove AggressiveInlining attribute Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>

Copilot · 2026-02-12T13:27:46Z

@copilot, please review the benchmark results and logs at this URL: #124320

@EgorBo, what configuration do we need to do for @copilot to recognize / respond to you...

I reviewed the benchmark results from both EgorBot runs (#612, #613).

Both runs show the changes caused a regression, not an improvement:

x64 (AMD EPYC 9V74): PR is 1.88x slower across all characters:

Method	Main	PR	Ratio
Hebrew	0.29 ns	0.55 ns	1.88
Arabic	0.29 ns	0.55 ns	1.89
Chinese	0.29 ns	0.55 ns	1.88
Cyrillic	0.29 ns	0.55 ns	1.88

ARM64 (Neoverse-N2): Neutral (ratio ~0.98-1.02).

Adding AggressiveInlining made no difference — both runs produced identical x64 results.

The JIT is already optimizing the (int)c path extremely well — the IsValidCodePoint check is trivially eliminated for char-width values, and the resulting code is more efficient than going through the char overload. Main's ~0.29ns is already ~10x faster than the 3-4ns the original issue reported for .NET 5.0.

I've reverted all changes in 8a07341. The original regression from the issue appears to have already been resolved by subsequent JIT/runtime optimizations.

Initial plan

ecdf1ac

Copilot AI assigned Copilot and jkotas Feb 12, 2026

Copilot started work on behalf of jkotas February 12, 2026 06:26 View session

github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Feb 12, 2026

Fix GetUnicodeCategory performance regression for non-ASCII chars

91dbaa6

Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix regression in GetUnicodeCategory for non-ascii characters~~ Fix GetUnicodeCategory performance regression for non-ASCII characters Feb 12, 2026

Copilot AI requested a review from jkotas February 12, 2026 06:53

Copilot finished work on behalf of jkotas February 12, 2026 06:53

Copilot started work on behalf of jkotas February 12, 2026 07:07 View session

EgorBot mentioned this pull request Feb 12, 2026

Benchmarks for #124320 (Copilot) EgorBot/runtime-utils#612

Open

Copilot finished work on behalf of jkotas February 12, 2026 07:11

Copilot started work on behalf of jkotas February 12, 2026 07:58 View session

Add AggressiveInlining to CharUnicodeInfo.GetUnicodeCategory(char)

ca339b9

Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>

Copilot finished work on behalf of jkotas February 12, 2026 08:15

Copilot started work on behalf of jkotas February 12, 2026 08:15 View session

EgorBot mentioned this pull request Feb 12, 2026

Benchmarks for #124320 (Copilot) EgorBot/runtime-utils#613

Open

Copilot finished work on behalf of jkotas February 12, 2026 08:20

Copilot started work on behalf of stephentoub February 12, 2026 13:19 View session

Copilot AI changed the title ~~Fix GetUnicodeCategory performance regression for non-ASCII characters~~ Investigate GetUnicodeCategory performance for non-ASCII characters Feb 12, 2026

Copilot AI requested a review from stephentoub February 12, 2026 13:28

Copilot finished work on behalf of stephentoub February 12, 2026 13:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate GetUnicodeCategory performance for non-ASCII characters#124320

Investigate GetUnicodeCategory performance for non-ASCII characters#124320
Copilot wants to merge 4 commits intomainfrom
copilot/fix-unicode-category-regression

Copilot AI commented Feb 12, 2026 •

edited

Loading

Uh oh!

jkotas commented Feb 12, 2026

Uh oh!

Copilot AI commented Feb 12, 2026

Uh oh!

EgorBot commented Feb 12, 2026

Uh oh!

jkotas commented Feb 12, 2026

Uh oh!

Copilot AI commented Feb 12, 2026

Uh oh!

jkotas commented Feb 12, 2026

Uh oh!

Copilot AI commented Feb 12, 2026

Uh oh!

EgorBot commented Feb 12, 2026

Uh oh!

stephentoub commented Feb 12, 2026

Uh oh!

stephentoub commented Feb 12, 2026

Uh oh!

Copilot AI commented Feb 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

Copilot AI commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes

Benchmark Results

Analysis

Repro

System.Tests.Perf_Char.GetUnicodeCategory(c: '?')

System.Tests.Perf_Char.GetUnicodeCategory(c: '.')

System.Tests.Perf_Char.GetUnico...

Uh oh!

jkotas commented Feb 12, 2026

Uh oh!

Copilot AI commented Feb 12, 2026

Uh oh!

EgorBot commented Feb 12, 2026

Uh oh!

jkotas commented Feb 12, 2026

Uh oh!

Copilot AI commented Feb 12, 2026

Uh oh!

jkotas commented Feb 12, 2026

Uh oh!

Copilot AI commented Feb 12, 2026

Uh oh!

EgorBot commented Feb 12, 2026

Uh oh!

stephentoub commented Feb 12, 2026

Uh oh!

stephentoub commented Feb 12, 2026

Uh oh!

Copilot AI commented Feb 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Copilot AI commented Feb 12, 2026 •

edited

Loading