@steveisok
Member

Add a pre-configured CMake cache file for macOS ARM64 (Apple Silicon) that eliminates redundant feature detection checks during the configure phase.

Performance improvement:

  • CMake configure time: 105s → 12s (89% faster)
  • Full clean build (clr+libs): 9:51 → 7:36 (18% faster)

The build currently runs CMake configuration 3 times (coreclr, native libs, host), with 597 total checks of which 395 are duplicated across configurations. The cache file pre-populates known results for macOS ARM64, similar to the existing tryrun.browser.cmake for WebAssembly builds.

Valid for:

  • macOS 14.0+ (Sonoma and later)
  • Xcode 15.0+ / AppleClang 15.0+
  • Architecture: arm64 (Apple Silicon)

To disable if issues arise (e.g., after Xcode upgrade):
export CLR_CMAKE_SKIP_PLATFORM_CACHE=1

@steveisok steveisok requested review from a team and Copilot February 5, 2026 14:09
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Feb 5, 2026
@steveisok
Member Author

steveisok commented Feb 5, 2026

@dotnet/runtime-infrastructure @jkotas @am11 I have this in draft because I think we need to strike the right balance in terms of caching and make sure there isn't a great deal of friction in using and updating it.

I do believe this is the right path for most of our configurations. If you're all on board, it's easiest to start with one, and we'll methodically go through the rest.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds a CMake tryrun cache file for macOS ARM64 builds to significantly reduce CMake configure time. The change introduces a pre-configured cache file (tryrun.osx-arm64.cmake) containing 337 feature detection results that are known for macOS 14.0+ with Xcode 15.0+, eliminating redundant checks during the build configuration phase. The cache is automatically applied for macOS ARM64 builds and can be disabled via the CLR_CMAKE_SKIP_PLATFORM_CACHE=1 environment variable.

Changes:

  • Added a new CMake cache file with pre-populated feature detection results for macOS ARM64 platform
  • Modified the build system to automatically load this cache file when building for macOS ARM64
  • Implemented an opt-out mechanism for users who encounter issues

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
eng/native/tryrun.osx-arm64.cmake | New CMake cache file containing 337 pre-configured feature detection results for macOS ARM64 (Sonoma 14.0+, Xcode 15.0+) to eliminate redundant checks during configuration
eng/native/gen-buildsys.sh | Added logic to automatically load the macOS ARM64 cache file when target_os is "osx" and host_arch is "arm64", with opt-out via the CLR_CMAKE_SKIP_PLATFORM_CACHE=1 environment variable

Comment on lines 88 to 93
# Use platform-specific tryrun cache to speed up CMake configure (opt-out via CLR_CMAKE_SKIP_PLATFORM_CACHE=1)
if [[ "$CLR_CMAKE_SKIP_PLATFORM_CACHE" != "1" ]]; then
    if [[ "$target_os" == "osx" && "$host_arch" == "arm64" && -f "$scriptroot/tryrun.osx-arm64.cmake" ]]; then
        cmake_extra_defines="-C $scriptroot/tryrun.osx-arm64.cmake $cmake_extra_defines"
    fi
fi

Copilot AI Feb 5, 2026

When cross-compiling to macOS ARM64, both the new platform-specific cache (line 91) and the cross-compilation cache (line 75) will be loaded. Since tryrun.osx-arm64.cmake is prepended, CMake processes it first, but values can be overridden by the later tryrun.cmake.

There's a specific value discrepancy: HAVE_SHM_OPEN_THAT_WORKS_WELL_ENOUGH_WITH_MMAP_EXITCODE is set to 255 in tryrun.osx-arm64.cmake (line 233) but 1 in tryrun.cmake for Darwin ARM64 (line 75 context). In cross-compile scenarios, tryrun.cmake's value of 1 will override. However, for native macOS ARM64 builds, the value of 255 from tryrun.osx-arm64.cmake will be used.

Additionally, tryrun.osx-arm64.cmake is missing several _EXITCODE variables that are defined in tryrun.cmake for Darwin (HAVE_BROKEN_FIFO_SELECT_EXITCODE, HAVE_CLOCK_MONOTONIC_COARSE_EXITCODE, HAVE_CLOCK_GETTIME_NSEC_NP_EXITCODE, HAVE_MMAP_DEV_ZERO_EXITCODE, HAVE_SCHED_GETCPU_EXITCODE, MMAP_ANON_IGNORES_PROTECTION_EXITCODE, SEM_INIT_MODIFIES_ERRNO_EXITCODE).

Consider adding a check to skip loading tryrun.osx-arm64.cmake when CROSSCOMPILE is set to avoid any potential conflicts, or verify that these value differences are intentional and correct for the different build scenarios.
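A minimal sketch of that guard, assuming the variable names already used in gen-buildsys.sh (target_os, host_arch, scriptroot, CROSSCOMPILE, CLR_CMAKE_SKIP_PLATFORM_CACHE). The helper name platform_cache_args is hypothetical and not part of the PR; it prints the -C argument only for native osx-arm64 configures, so cross builds keep relying on tryrun.cmake alone.

```bash
# Hypothetical helper: print "-C <cache file>" when the platform cache should be
# used, or nothing otherwise.
# Usage: platform_cache_args <target_os> <host_arch> <scriptroot>
# Environment: CLR_CMAKE_SKIP_PLATFORM_CACHE, CROSSCOMPILE
platform_cache_args() {
    local target_os="$1" host_arch="$2" scriptroot="$3"
    local cache_file="$scriptroot/tryrun.$target_os-$host_arch.cmake"

    # Explicit opt-out always wins.
    if [ "${CLR_CMAKE_SKIP_PLATFORM_CACHE:-}" = "1" ]; then
        return 0
    fi
    # Cross builds already load tryrun.cmake; adding the platform cache could
    # override its TRY_RUN results, so skip it.
    if [ "${CROSSCOMPILE:-}" = "1" ]; then
        return 0
    fi
    # Only osx-arm64 has a committed cache in this PR.
    if [ "$target_os" = "osx" ] && [ "$host_arch" = "arm64" ] && [ -f "$cache_file" ]; then
        printf '%s\n' "-C $cache_file"
    fi
}
```

In gen-buildsys.sh it could be used as cmake_extra_defines="$(platform_cache_args "$target_os" "$host_arch" "$scriptroot") $cmake_extra_defines"; when nothing is printed, only a leading space is added.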

Member

This looks like a valid point.

We are effectively maintaining a smaller version of this cache for the runtime checks for cross-builds. We should avoid the duplication.

The runtime checks have a high likelihood of being forward compatible (if an API works on version X, we assume that it works on version X+1 too). They do not have the fragility that comes with compiler checks.

I am wondering how much we can improve this through a combination of:

  • Use the existing cross-build runtime check cache even for native builds
  • Delete checks that are no longer relevant
  • Restructure the build to reduce duplicated checks

Comment on lines +1 to +4
# CMake pre-configured cache for macOS ARM64 (Apple Silicon) native builds
#
# This file caches the results of CMake feature detection checks to significantly
# speed up the CMake configure phase for macOS ARM64 builds.

Copilot AI Feb 5, 2026

The comment states this is for "macOS ARM64 (Apple Silicon) native builds" but the file can also be loaded during cross-compilation to macOS ARM64 (when CROSSCOMPILE=1 is set). Consider updating the comment to clarify that it applies to both native and cross-compilation scenarios, or alternatively, modify the loading logic in gen-buildsys.sh to skip this cache file when CROSSCOMPILE=1 to avoid the potential conflicts noted in the other comment.

Suggested change
# CMake pre-configured cache for macOS ARM64 (Apple Silicon) native builds
#
# This file caches the results of CMake feature detection checks to significantly
# speed up the CMake configure phase for macOS ARM64 builds.
# CMake pre-configured cache for macOS ARM64 (Apple Silicon) builds
#
# This file caches the results of CMake feature detection checks to significantly
# speed up the CMake configure phase when targeting macOS ARM64 (both native
# builds on Apple Silicon and cross-compilation with CROSSCOMPILE=1).

@jkotas
Member

jkotas commented Feb 5, 2026

I think we need some sort of detection for when the checked-in configs get out of sync before adding more of them.

We seem to have bugs due to messed-up checked-in configs #123950

set(COMPILER_SUPPORTS_W_RESERVED_IDENTIFIER 1 CACHE INTERNAL "")
set(FNO_LTO_AVAILABLE 1 CACHE INTERNAL "")
set_cache_value(HAS_POSIX_SEMAPHORES_EXITCODE 1)
set(HAS_POSIX_SEMAPHORES "" CACHE INTERNAL "")
Member

We may want to check how many of these are actually needed. I checked a few and immediately discovered a cluster of dead code: #124049

@steveisok
Member Author

> I think we need some sort of detection for when the checked-in configs get out of sync before adding more of them.
>
> We seem to have bugs due to messed-up checked-in configs #123950

I think the kinds of checks we need to do are likely to be different per target. With wasm, it's when we bump emscripten. What do you think is appropriate here? Xcode major/minor versions? OS version?

@steveisok
Member Author

> I think the kinds of checks we need to do are likely to be different per target. With wasm, it's when we bump emscripten. What do you think is appropriate here? Xcode major/minor versions? OS version?

I think we should have a min version the cache is valid for and a check that we don't get too far ahead (2 major versions?). That way, regeneration would likely occur when we bump above the min version on our CI machines.

Do we want to warn or error when we violate the cache checks?

@am11
Member

am11 commented Feb 6, 2026

Caching the introspection results is fine for CI-like environments, which are deterministic. But for the dev inner loop, we can't predict what a user's machine has installed. I honestly don't think saving a few seconds on non-critical use cases is worth the hassle, and it also raises questions like "why only macOS?"

@jkotas
Member

jkotas commented Feb 6, 2026

> What do you think is appropriate here? Xcode major/minor versions? OS version?

I do not know what the best practices around this are.

Copilot highlighted some of the discrepancies with what's getting added here in https://github.com/dotnet/runtime/pull/124046/files#r2769414845

@steveisok
Member Author

steveisok commented Feb 6, 2026

> Caching the introspection results is fine for CI-like environments, which are deterministic. But for the dev inner loop, we can't predict what a user's machine has installed. I honestly don't think saving a few seconds on non-critical use cases is worth the hassle, and it also raises questions like "why only macOS?"

I think we have to try, because the various build-time savings (this plus ninja, to start) are hard to ignore. I think around 18% for caching is about what you'd find in most cases (in addition to the ~8% ninja boost). Those savings add up.

> "why only macOS?"

I want to apply this approach everywhere that makes sense. Starting with one allows us to keep the discussion focused and then we can go down the line.

@steveisok
Member Author

steveisok commented Feb 6, 2026

> Copilot highlighted some of the discrepancies with what's getting added here in https://github.com/dotnet/runtime/pull/124046/files#r2769414845

Nice! I wondered if there were differences between when we cross-build on CI and when we build natively. Apple does a fairly decent job of making it appear seamless, so this is easy to miss.

> I do not know what the best practices around this are.

I double-checked with Copilot, and here's what it suggests (I pretty much agree).

Summary: Sync Detection for macOS ARM64 CMake Cache

Approach: Track minimum Xcode version, warn on mismatch

  1. Add version tracking to the cache file: set(TRYRUN_OSX_ARM64_MIN_XCODE_VERSION "15" CACHE INTERNAL "Minimum Xcode version this cache is valid for")
  2. Detection logic in gen-buildsys.sh:
     - Detect current Xcode major version (via xcodebuild -version or clang version)
     - Warn if older than minimum → cache may reference features that don't exist
     - Silent if newer → expected for local dev, cache is conservative
     - Optional: warn if 2+ major versions ahead → gentle reminder to consider regenerating
  3. Regeneration trigger:
     - When CI updates its Xcode image, bump the minimum version and regenerate the cache
     - This is a manual process tied to infrastructure updates (similar to emscripten bumps for WASM)
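The detection logic in step 2 could be sketched as a few small shell functions. This is an illustration under assumptions, not code from the PR: the function names are hypothetical, the major version is parsed from the first line of xcodebuild -version (which reads like "Xcode 15.4"), and the "far ahead" threshold is the optional two-major-versions rule from the list.

```bash
# Hypothetical helper: extract the major version from a line such as "Xcode 15.4".
xcode_major_from_version_line() {
    printf '%s\n' "$1" | sed -n 's/^Xcode \([0-9][0-9]*\).*/\1/p'
}

# Succeeds if the argument is a non-empty string of digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical min-version policy.
# Usage: classify_xcode_version <current_major> <min_major>
# Prints one of:
#   unknown   - a version could not be parsed (caller decides what to do)
#   older     - current < minimum: warn, the cache may assume missing features
#   far-ahead - current >= minimum + 2: use, but suggest regenerating
#   ok        - otherwise: use the cache silently
classify_xcode_version() {
    local current="$1" minimum="$2"
    if ! is_uint "$current" || ! is_uint "$minimum"; then
        echo unknown
    elif [ "$current" -lt "$minimum" ]; then
        echo older
    elif [ "$current" -ge $((minimum + 2)) ]; then
        echo far-ahead
    else
        echo ok
    fi
}
```

For example, classify_xcode_version "$(xcode_major_from_version_line "$(xcodebuild -version | head -n 1)")" 15 would print ok on Xcode 15 or 16 and far-ahead on Xcode 17 and later. This encodes only the min-version proposal; it does not address the exact-toolset concern raised later in the thread.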

Why this works:

  - CI runs at the minimum version → perfect match
  - Local devs run ahead → no friction, cache remains valid (features only grow)
  - Devs running old Xcode get a warning before hitting mysterious build issues
  - The CLR_CMAKE_SKIP_PLATFORM_CACHE=1 escape hatch remains for edge cases

Differences from browser/WASM:

  - WASM has a version file (emscripten-version.txt) to check against
  - macOS doesn't have an equivalent, so we detect Xcode at configure time instead

We also need to account for cross-compiling.

@steveisok
Member Author

One more thing... There are comments in the wasm cache file explaining how to reconstruct the cache. We need to automate that part to make it easier on whoever is charged with updating the cache.

@jkotas
Member

jkotas commented Feb 6, 2026

> I think we have to try because the various build time savings

An alternative way to address this is to ask why we have so many of these and why they are so slow to evaluate.

@am11
Member

am11 commented Feb 6, 2026

> I think we have to try, because the various build-time savings (this plus ninja, to start) are hard to ignore. I think around 18% for caching is about what you'd find in most cases (in addition to the ~8% ninja boost). Those savings add up.

I am thinking about the potential risks. The whole point of CMake introspection is to adapt to the machine, or to ensure the "desired state" of the machine. Otherwise, we could just have a hand-rolled config.h file. :)

In CMake, there is no single type of introspection. In places, we raise a manual error in the CMake script for an unexpected state, and caching would paper over those situations where something can go wrong. IOW, it has the potential to break stuff in non-obvious ways.

Perhaps we can add a ./build.sh --validate-config mode, so that if someone runs into a weird issue (or we recommend that users run it every once in a while, e.g. after upgrading the system), they can validate whether their machine matches the cached preset.

@steveisok
Member Author

> I am thinking about the potential risks. The whole point of CMake introspection is to adapt to the machine, or to ensure the "desired state" of the machine. Otherwise, we could just have a hand-rolled config.h file. :)

Agreed - I think part of what we're also trying to figure out is where the line is. It's likely different for every platform.

> Perhaps we can add a ./build.sh --validate-config mode, so that if someone runs into a weird issue (or we recommend that users run it every once in a while, e.g. after upgrading the system), they can validate whether their machine matches the cached preset.

I think that's a good idea.

@steveisok
Member Author

steveisok commented Feb 6, 2026

> An alternative way to address this is to ask why we have so many of these and why they are so slow to evaluate.

Summary of analysis from the coreclr part of configure:

  1. Runtime checks are ~2.8s (6.3%) - These test behaviors standardized 20-30 years ago. Extremely safe to cache.
  2. The rest of configure time (~42s) is spent on:
     • try_compile / check_*_source_compiles (~20s) - Compile-only checks
     • fetchcontent (~4.4s) - Fetching zlib-ng, zstd, brotli
     • CMake processing overhead
  3. CMake already caches these results in CMakeCache.txt for incremental builds - the problem is only for clean builds (like CI or fresh clones).

The fundamental issue is that cmake has to invoke the compiler many times (~300+ try_compile calls) to probe the toolchain/platform. There's no way to make those faster other than:

  • Caching (the PR 124046 approach)
  • Eliminating obsolete checks (minor gains, code churn)
  • Precomputing results for known CI environments

So yes - caching is the right lever to pull. The values are stable, and for CI where build times matter most, a checked-in cache makes sense.

https://gist.githubusercontent.com/steveisok/97a54cfd082562ac5893a3e4fc49d49e/raw/28361a98f1da445e9f022edc0747496951634df1/cmake-macarm64-analysis.md

Add options to validate and regenerate platform-specific CMake cache files:

- --validate-config: Runs cmake configure without the cache and compares
  detected values against the cached file, reporting any differences.

- --regenerate-config: Same as validate, but updates the cache file if
  differences are found.

The new validate-platform-cache.sh script supports multiple platforms:
- osx-arm64, osx-x64, linux-x64, linux-arm64, browser, ios/tvos

This addresses feedback about providing a way to verify cached preset
values match the current system configuration.
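A rough sketch of the comparison that --validate-config performs, written as one standalone function. Assumptions: the cache file uses the set(NAME VALUE CACHE ...) and set_cache_value(NAME VALUE) forms seen in this PR; the second argument is the CMakeCache.txt produced by configuring without the cache (entries shaped like NAME:TYPE=VALUE); and names starting with TRYRUN_ or CLR_CMAKE_PLATFORM_CACHE_ are treated as metadata and skipped. The function name compare_platform_cache is hypothetical, and this is not the PR's validate-platform-cache.sh.

```bash
# Hypothetical sketch of the --validate-config comparison (not the PR's script).
# Usage: compare_platform_cache <tryrun cache file> <CMakeCache.txt configured WITHOUT the cache>
# Prints "NAME: cached='X' detected='Y'" for each difference; returns 1 if any differ.
compare_platform_cache() {
    local cache_file="$1" cmake_cache="$2"
    local status=0 pairs name value detected

    # Normalize the three supported forms into "NAME VALUE" lines:
    #   set(NAME "VALUE" CACHE ...)   set(NAME VALUE CACHE ...)   set_cache_value(NAME VALUE)
    pairs="$(sed -n \
        -e 's/^[[:space:]]*set(\([A-Za-z0-9_]*\)[[:space:]][[:space:]]*"\([^"]*\)"[[:space:]].*CACHE.*/\1 \2/p' \
        -e 's/^[[:space:]]*set(\([A-Za-z0-9_]*\)[[:space:]][[:space:]]*\([^"[:space:]][^[:space:]]*\)[[:space:]].*CACHE.*/\1 \2/p' \
        -e 's/^[[:space:]]*set_cache_value(\([A-Za-z0-9_]*\)[[:space:]][[:space:]]*\([^)]*\)).*/\1 \2/p' \
        "$cache_file")"

    while read -r name value; do
        [ -n "$name" ] || continue
        # Metadata such as TRYRUN_BROWSER_EMSCRIPTEN_VERSION never appears in a
        # configure run without the cache, so comparing it would always fail.
        case "$name" in TRYRUN_*|CLR_CMAKE_PLATFORM_CACHE_*) continue ;; esac
        value="${value#\"}"; value="${value%\"}"
        # CMakeCache.txt entries look like NAME:TYPE=VALUE; take the first match.
        detected="$(sed -n "s/^${name}:[A-Z]*=//p" "$cmake_cache" | head -n 1)"
        if [ "$value" != "$detected" ]; then
            printf "%s: cached='%s' detected='%s'\n" "$name" "$value" "$detected"
            status=1
        fi
    done <<EOF
$pairs
EOF
    return "$status"
}
```

A variable missing from the detected CMakeCache.txt compares as an empty string, so a cached empty value for a check that no longer runs is not reported.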
@steveisok
Member Author

> > Perhaps we can add a ./build.sh --validate-config mode, so that if someone runs into a weird issue (or we recommend that users run it every once in a while, e.g. after upgrading the system), they can validate whether their machine matches the cached preset.
>
> I think that's a good idea.

I added a validation and regeneration switch along with a script to carry it out. Is the concept in line with what you were thinking? We may want to tweak the regen script, but what matters first is if this is what we want.

Comment on lines +310 to +311
echo "Run with --regenerate to update the cache file:"
echo " $0 --regenerate $platform"
Member

Suggested change
echo "Run with --regenerate to update the cache file:"
echo " $0 --regenerate $platform"
echo "Run with --regenerate-config to update the cache file:"

Maybe it's better to use the arg name in the top-level script.

@am11
Member

am11 commented Feb 6, 2026

Just tried and it has caught one difference:

$ ./build.sh --validate-config
...
============================================================
Results
============================================================
Total variables checked: 312
Matched: 311
Differences: 1

Differences found:

  - HAVE_SYS_ENDIAN_H: cached='' detected='1'

Run with --regenerate to update the cache file:
  /Users/adeel/projects/runtime5/eng/native/validate-platform-cache.sh --regenerate osx-arm64

@steveisok
Member Author

steveisok commented Feb 6, 2026

> Just tried and it has caught one difference:

Curious, what is your mac setup?

I was able to reproduce it on my setup. I generated the cache on macOS 15.4 / Xcode 16.3; my other Mac has 26.2, which has a newer SDK.

The initial cache as part of this PR is closer to the floor that we want. You're too new.

@am11
Member

am11 commented Feb 6, 2026

Mine is macOS 26.2 (25C56) with Xcode 26.2 (17C52).

@jkotas
Member

jkotas commented Feb 7, 2026

> The initial cache as part of this PR is closer to the floor that we want

The checked-in cache should match our CI and official build environments. That means it needs to match the floor that we target for a given release.

We may want to disable the cache for source builds.

@jkoritzinsky
Member

I think we need to disable it for source builds, as we can't know the capabilities of a given distro.

We should only use the cache for Microsoft cross builds and macOS (with matching settings for the min macOS version, not the current macOS version).

This pipeline runs weekly to validate that platform cache files are
up-to-date with CI pool images. If validation fails, it regenerates
the cache and publishes it as an artifact.

- Runs on schedule (Sunday 8 AM UTC) for all platforms
- Can be triggered manually for specific platforms
- Supports: osx_arm64, osx_x64, linux_x64, browser_wasm, ios_arm64
Copilot AI review requested due to automatic review settings February 8, 2026 13:04
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Comment on lines +59 to +60
# First validate; if that fails, regenerate
buildArgs: --validate-config || $(Build.SourcesDirectory)/build.sh --regenerate-config

Copilot AI Feb 8, 2026

buildArgs uses --validate-config || $(Build.SourcesDirectory)/build.sh --regenerate-config, but the regeneration invocation does not pass -ci, -os, or -arch. This will regenerate the cache for the agent’s default OS/arch instead of the matrix platform (notably wrong for ios_arm64, and can also mismatch for hosted build setups). Pass the same platform args to the second build.sh invocation, or move regeneration into a separate step that reuses the same command-line parameters.

Suggested change
# First validate; if that fails, regenerate
buildArgs: --validate-config || $(Build.SourcesDirectory)/build.sh --regenerate-config
# First validate; if that fails, regenerate with the same platform/arch as the matrix job
buildArgs: --validate-config || $(Build.SourcesDirectory)/build.sh -ci -os $(osGroup) -arch $(archType) --regenerate-config

Comment on lines +79 to +83
filepath="$(Build.SourcesDirectory)/eng/native/$cachefile"

# Check if cache file was regenerated by comparing to git
if git diff --quiet -- "$filepath" 2>/dev/null; then
    echo "✓ Cache file is up-to-date: $cachefile"

Copilot AI Feb 8, 2026

git diff is run with an absolute path ($filepath = $(Build.SourcesDirectory)/eng/native/...). Git pathspecs are repo-relative, so this can yield “outside repository” / pathspec errors and incorrectly mark caches as regenerated. Use a repo-relative path (e.g., eng/native/$cachefile) or run git -C $(Build.SourcesDirectory) diff ....

Comment on lines +29 to +33
- osx_x64
- linux_x64
- browser_wasm
- ios_arm64

Copilot AI Feb 8, 2026

The default platform list includes osx_x64 and linux_x64, but this PR doesn’t add eng/native/tryrun.osx-x64.cmake or eng/native/tryrun.linux-x64.cmake. Either add the missing cache files (and keep them updated) or remove these entries from the default list so the pipeline validates only platforms with committed caches.

Suggested change
- osx_x64
- linux_x64
- browser_wasm
- ios_arm64
- browser_wasm
- ios_arm64

Comment on lines +199 to +203
# Extract all set() and set_cache_value() calls from the cache file and compare
while IFS= read -r line; do
    var_name=""
    cached_value=""

Copilot AI Feb 8, 2026

The validation loop compares every set(...) entry in the cache file against CMakeCache.txt. Cache-only metadata variables (e.g., TRYRUN_BROWSER_EMSCRIPTEN_VERSION in tryrun.browser.cmake) won’t exist when configuring without the cache, so this will report permanent false differences. Consider filtering out known metadata vars (or only validating vars present in the detected cache).

Comment on lines +258 to +262
cat > "$new_cache" << HEADER
# CMake pre-configured cache for $platform builds
#
# This file caches the results of CMake feature detection checks to significantly
# speed up the CMake configure phase.

Copilot AI Feb 8, 2026

Regeneration overwrites the entire cache file with a generic auto-generated header. For caches that intentionally contain hand-maintained documentation (notably tryrun.browser.cmake), this will discard important guidance. Consider preserving existing headers or using per-platform templates instead of rewriting from scratch.

# Extract all relevant variables from CMakeCache.txt and write to new cache
# Sort for consistent output
grep -E "^(HAVE_|HAS_|COMPILER_|LINKER_|MMAP_|ONE_SHARED|PTHREAD_|REALPATH_|SEM_INIT|NEON_|FNO_|KEVENT_|IPV6MR_|INOTIFY_|LD_FLAG)" "$cmake_cache" | sort | \

Copilot AI Feb 8, 2026

The regeneration filter (grep -E "^(HAVE_|HAS_|...)") excludes cache metadata like TRYRUN_BROWSER_EMSCRIPTEN_VERSION, so regenerating tryrun.browser.cmake would silently drop version tracking. Either extend the filter to retain required metadata or handle browser cache regeneration separately.

Suggested change
grep -E "^(HAVE_|HAS_|COMPILER_|LINKER_|MMAP_|ONE_SHARED|PTHREAD_|REALPATH_|SEM_INIT|NEON_|FNO_|KEVENT_|IPV6MR_|INOTIFY_|LD_FLAG)" "$cmake_cache" | sort | \
grep -E "^(HAVE_|HAS_|COMPILER_|LINKER_|MMAP_|ONE_SHARED|PTHREAD_|REALPATH_|SEM_INIT|NEON_|FNO_|KEVENT_|IPV6MR_|INOTIFY_|LD_FLAG|TRYRUN_BROWSER_)" "$cmake_cache" | sort | \
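A sketch of the regeneration direction, turning selected CMakeCache.txt entries back into set() lines. It assumes the prefix list from the suggested grep, widened to a plain TRYRUN_ prefix so version metadata survives, and it keeps each entry's original CMake type. The name cmake_cache_to_tryrun is hypothetical; values are always quoted and embedded double quotes are not escaped, so it is not a drop-in replacement for the PR's script.

```bash
# Hypothetical sketch: convert selected CMakeCache.txt entries (NAME:TYPE=VALUE)
# into set() lines for a tryrun cache file, sorted for stable output.
# Usage: cmake_cache_to_tryrun <CMakeCache.txt> > tryrun.<platform>.cmake.new
cmake_cache_to_tryrun() {
    local cmake_cache="$1"
    local prefixes='HAVE_|HAS_|COMPILER_|LINKER_|MMAP_|ONE_SHARED|PTHREAD_|REALPATH_|SEM_INIT|NEON_|FNO_|KEVENT_|IPV6MR_|INOTIFY_|LD_FLAG|TRYRUN_'
    local entry name rest type value

    # NAME may only contain [A-Za-z0-9_], which also drops CMake's
    # "NAME-ADVANCED" marker entries.
    grep -E "^(${prefixes})[A-Za-z0-9_]*:[A-Z]+=" "$cmake_cache" | LC_ALL=C sort |
    while IFS= read -r entry; do
        name="${entry%%:*}"   # before the first ':'
        rest="${entry#*:}"    # TYPE=VALUE
        type="${rest%%=*}"    # before the first '='
        value="${rest#*=}"    # after the first '=' (may be empty)
        printf 'set(%s "%s" CACHE %s "")\n' "$name" "$value" "$type"
    done
}
```

Sorting with LC_ALL=C keeps the output byte-for-byte stable across machines with different locales, which keeps diffs of the regenerated file small.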

When -sourcebuild is passed, export CLR_CMAKE_SKIP_PLATFORM_CACHE=1
to ensure source builds don't use the pre-configured cache. This
addresses reproducibility concerns for source builds.
@steveisok
Member Author

> > The initial cache as part of this PR is closer to the floor that we want
>
> The checked-in cache should match our CI and official build environments. That means it needs to match the floor that we target for a given release.
>
> We may want to disable the cache for source builds.

I updated the PR to disable the cache for source builds. I think your point about releases makes sense, but I would prefer to handle that in a follow-up.

I also added a pipeline that we can run on a schedule (weekly) and be triggered manually. Here's what it does:

  1. Runs ./build.sh --validate-config to compare current system state against cached values
  2. If validation fails, runs ./build.sh --regenerate-config to create an updated cache
  3. Publishes only the caches that changed as pipeline artifacts
  4. Someone downloads the artifact and commits the updated cache file

If that makes sense, I'll ask for the pipeline to be created.

@am11
Member

am11 commented Feb 8, 2026

On developer machines, we rarely start from scratch. Most of the time we rebuild incrementally, and CMake's built-in caching skips the configuration step anyway. When we are in the middle of development and add or modify CMake options, the next rebuild only runs the new or modified checks; that is a built-in mechanism and already optimized. The main use case for this change seems to be CI, where builds always start from a clean state and run dozens of times per day, so saving a few seconds makes sense.

I think it makes sense to tie this behavior to the --ci option, since CI machines typically run a consistent toolchain and OS versions and receive announced updates in a controlled way, something we cannot guarantee on developers' systems.

@steveisok
Member Author

> I think it makes sense to tie this behavior to the --ci option, since CI machines typically run a consistent toolchain and OS versions and receive announced updates in a controlled way, something we cannot guarantee on developers' systems.

I may not be typical, but I find myself routinely needing to blow away the artifacts directory and start from scratch. I'd prefer to leave this on if we're able to find a good floor for each configuration.

@am11
Member

am11 commented Feb 8, 2026

> good floor for each configuration

Even if we find one now, as soon as someone updates their system (brew/apt/apk/winget update), there is no guarantee it won't go out of sync. We cannot predict others' dev environments. The only way is to limit the build to specific supported versions of the toolchain and OS (more restrictive than what we allow today, which is all macOS versions currently supported by Apple), and I don't think we want to change that. cc @janvorli

#
# Valid for:
# - macOS 14.0+ (Sonoma and later)
# - Xcode 15.0+ / AppleClang 15.0+
Member

@jkotas jkotas Feb 8, 2026

The cache is tied to the exact version of the compiler and system headers. I do not think it is safe to assume that a cache generated with Xcode 15 is going to work for Xcode 16; ditto for system headers.

For example, look at the COMPILER_SUPPORTS_W... checks. If the cached value differs from what the compiler on your machine supports in either direction, you are going to see build breaks.

Folks would need to be very careful about managing the Xcode versions on their machines to avoid build breaks. For example, if they need to build a servicing fix, they would need to switch to the matching Xcode version first and then switch back once they are done. They would need to be careful to install a new Xcode version at the exact same time as we install a new Xcode version in CI.

I am not sure how many would find the build time savings worth the pain. It would not be worth the pain for me personally. If the build takes more than a minute, I am going to task-switch and come back to it only after an hour or so anyway.

Member Author

Ok. As @am11 mentioned, most of the benefit is in CI. I'm happy to take the win there.

How about we make it on by default in CI and opt in locally?

Member

Do we know that the CI machine updates happen atomically?

IIRC, the update is rolling. We have seen non-deterministic build breaks caused by some CI machines being on updated version and some still waiting for being updated.

I think the use of this cache would have to be preconditioned on the exact toolset and system header versions to make the setup reliable.

Member Author

There should be a version embedded in the cache file that we check against what is running. If they don't match, the cache is skipped.

Address feedback about rolling CI updates causing build breaks when
cached compiler check results don't match the current toolchain.

- Add CLR_CMAKE_PLATFORM_CACHE_COMPILER_VERSION to cache file
- Check version in gen-buildsys.sh before loading cache
- Skip cache with informative message if versions don't match
- Update validate-platform-cache.sh to embed version when regenerating
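The version gate described in this commit could be split into small functions so that each piece can be tested on its own. The names are hypothetical; the sed expressions closely follow the ones in the gen-buildsys.sh excerpt in this PR, and the sample input assumes AppleClang's clang --version output starts with a line shaped like "Apple clang version 17.0.0 (clang-1700.0.13.3)".

```bash
# Hypothetical helpers for an exact-match version gate.
# clang_major_from_version_line "Apple clang version 17.0.0 (...)" prints "17".
clang_major_from_version_line() {
    printf '%s\n' "$1" | sed -n 's/.*clang version \([0-9][0-9]*\)\..*/\1/p'
}

# Prints the value of CLR_CMAKE_PLATFORM_CACHE_COMPILER_VERSION from a cache file.
cache_compiler_version() {
    sed -n 's/.*CLR_CMAKE_PLATFORM_CACHE_COMPILER_VERSION "\([0-9][0-9]*\)".*/\1/p' "$1" | head -n 1
}

# Usage: platform_cache_matches <cache file> <first line of clang --version>
# Succeeds only when both versions are known and equal.
platform_cache_matches() {
    local current cached
    current="$(clang_major_from_version_line "$2")"
    cached="$(cache_compiler_version "$1")"
    [ -n "$current" ] && [ -n "$cached" ] && [ "$current" = "$cached" ]
}
```

In gen-buildsys.sh, platform_cache_matches "$scriptroot/tryrun.osx-arm64.cmake" "$(clang --version 2>/dev/null | head -n 1)" would decide whether to prepend the -C argument. An exact major-version match is stricter than the min-version idea discussed earlier, in line with the concern about rolling CI updates.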
Copilot AI review requested due to automatic review settings February 8, 2026 20:27
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comment on lines +88 to +101
# Use platform-specific tryrun cache to speed up CMake configure (opt-out via CLR_CMAKE_SKIP_PLATFORM_CACHE=1)
if [[ "$CLR_CMAKE_SKIP_PLATFORM_CACHE" != "1" ]]; then
    if [[ "$target_os" == "osx" && "$host_arch" == "arm64" && -f "$scriptroot/tryrun.osx-arm64.cmake" ]]; then
        # Extract current AppleClang major version
        current_clang_version=$(clang --version 2>/dev/null | head -1 | sed -n 's/.*clang version \([0-9]*\)\..*/\1/p')
        # Extract expected version from cache file
        cache_clang_version=$(grep -o 'CLR_CMAKE_PLATFORM_CACHE_COMPILER_VERSION "[0-9]*"' "$scriptroot/tryrun.osx-arm64.cmake" 2>/dev/null | sed 's/.*"\([0-9]*\)".*/\1/')

        if [[ -n "$current_clang_version" && -n "$cache_clang_version" && "$current_clang_version" == "$cache_clang_version" ]]; then
            cmake_extra_defines="-C $scriptroot/tryrun.osx-arm64.cmake $cmake_extra_defines"
        elif [[ -n "$current_clang_version" && -n "$cache_clang_version" ]]; then
            echo "Skipping platform cache: AppleClang version mismatch (current: $current_clang_version, cache: $cache_clang_version)"
        fi
    fi

Copilot AI Feb 8, 2026


This block can run during cross-compiles to macOS arm64 as well (when CROSSCOMPILE=1, target_os=osx, host_arch=arm64). In that case tryrun.cmake is already loaded earlier, and adding tryrun.osx-arm64.cmake can override TRY_RUN results (e.g., Darwin values in eng/native/tryrun.cmake), leading to inconsistent configuration between cross and native paths. Consider skipping the platform cache when CROSSCOMPILE=1 (or otherwise ensuring the two caches cannot conflict).

Copilot uses AI. Check for mistakes.
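One way to act on this, sketched under the assumption that `CROSSCOMPILE` is visible where the platform cache is chosen: make native osx-arm64 the only case that loads `tryrun.osx-arm64.cmake`, so cross-builds keep using `eng/native/tryrun.cmake` alone. `should_use_platform_cache` is a hypothetical helper, not code from the PR.

```bash
# Hypothetical guard: load the platform cache only for native macOS arm64
# builds. Cross-builds rely on eng/native/tryrun.cmake alone, so only one of
# the two caches is ever loaded for a given configure.
should_use_platform_cache() {
    local target_os="$1" host_arch="$2" crosscompile="${3:-0}"
    # Existing opt-out from the PR.
    [[ "${CLR_CMAKE_SKIP_PLATFORM_CACHE:-0}" != "1" ]] || return 1
    # Never combine with the cross-build cache.
    [[ "$crosscompile" != "1" ]] || return 1
    [[ "$target_os" == "osx" && "$host_arch" == "arm64" ]]
}
```

With this, the two outer conditions in gen-buildsys.sh could become `if should_use_platform_cache "$target_os" "$host_arch" "${CROSSCOMPILE:-0}" && [[ -f "$scriptroot/tryrun.osx-arm64.cmake" ]]; then`, leaving the version check inside unchanged.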
Comment on lines +18 to +20
# AppleClang major version this cache was generated with
set(CLR_CMAKE_PLATFORM_CACHE_COMPILER_VERSION "17" CACHE STRING "AppleClang version for this cache")


Copilot AI Feb 8, 2026


The cache is marked as generated with AppleClang major version "17", but the PR description states it is valid for Xcode 15 / AppleClang 15+. With the current logic in gen-buildsys.sh, this mismatch will cause the cache to be skipped for the described target toolchain. Please regenerate the cache with the intended AppleClang version or update the stated validity/toolchain requirements accordingly.
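To see the mismatch concretely, the loop below applies the same `sed` expression gen-buildsys.sh uses to two sample `clang --version` banners. `clang_major` is a hypothetical wrapper and the banners are illustrative. With the cache stamped "17", the AppleClang 15 banner takes the mismatch branch, so that machine would configure without the cache.

```bash
# Hypothetical wrapper around the extraction used in gen-buildsys.sh:
# take the first line of a `clang --version` banner and keep the major version.
clang_major() {
    printf '%s\n' "$1" | head -n 1 | sed -n 's/.*clang version \([0-9]*\)\..*/\1/p'
}

cache_version="17"   # value stamped in tryrun.osx-arm64.cmake
for banner in \
    "Apple clang version 15.0.0 (clang-1500.3.9.4)" \
    "Apple clang version 17.0.0 (clang-1700.0.13.5)"; do
    major=$(clang_major "$banner")
    if [[ "$major" == "$cache_version" ]]; then
        echo "AppleClang $major: cache used"
    else
        echo "AppleClang $major: cache skipped (expects $cache_version)"
    fi
done
```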

Member Author


I generated this locally; I'll address it with an official CI run before marking this PR as ready.

Comment on lines +128 to +129
echo "Nothing to validate."
exit 0

Copilot AI Feb 8, 2026


When the cache file is missing and --regenerate is not specified, the script exits 0 (“Nothing to validate”). That makes --validate-config succeed even though there is no cache to validate, and it prevents the pipeline’s validate || regenerate pattern from ever regenerating missing caches. Consider treating a missing cache as a validation failure (non-zero exit) or adding an explicit option for “allow missing cache” so pipelines can choose the behavior.

Suggested change
- echo "Nothing to validate."
- exit 0
+ echo "Nothing to validate; validation failed because cache is missing."
+ exit 1
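The difference matters for the `validate || regenerate` pattern. In the sketch below, `validate_cache`, `regenerate_cache`, and `ensure_cache` are hypothetical stand-ins for the script's modes, and regeneration only writes the stamp instead of rerunning the CMake checks. Because a missing file now returns non-zero, `ensure_cache` regenerates it; with the original `exit 0`, the regenerate branch would never run for a missing cache.

```bash
# Hypothetical stand-ins for the validate/regenerate modes of
# validate-platform-cache.sh, reduced to the stamp check.
read_stamp() {
    sed -n 's/^set(CLR_CMAKE_PLATFORM_CACHE_COMPILER_VERSION "\([^"]*\)".*/\1/p' "$1"
}

validate_cache() {
    local cache_file="$1" expected="$2"
    # A missing cache is a failure, not "nothing to validate".
    if [[ ! -f "$cache_file" ]]; then
        echo "No cache at $cache_file; validation failed." >&2
        return 1
    fi
    [[ "$(read_stamp "$cache_file")" == "$expected" ]]
}

regenerate_cache() {
    # Placeholder: a real regeneration would rerun the CMake checks and
    # write their results; here only the stamp line is written.
    printf 'set(CLR_CMAKE_PLATFORM_CACHE_COMPILER_VERSION "%s" CACHE STRING "")\n' "$2" > "$1"
}

# The pipeline pattern: regenerate only when validation fails.
ensure_cache() {
    validate_cache "$1" "$2" || regenerate_cache "$1" "$2"
}
```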

@janvorli
Member

janvorli commented Feb 9, 2026

I personally prefer correctness to speed in this case. I don't do clean builds very often, and even though the saving in the configure phase looks large, an 18% gain on the whole build doesn't feel that substantial. At least on my machine, the build spends much more time in the MSBuild-driven parts anyway: package restoration and the build itself.
There were many times in the past when I burned much more time because of earlier attempts to skip the configuration phase, which left stale results behind. The problem is that you usually don't realize something has changed in a way that requires reconfiguring, so you spend time debugging mysterious issues, thinking they are caused by your change, only to find out after half a day that it was just a stale check.

@steveisok
Member Author

I wish there were a way in the UI to "version" a discussion and keep it clean and focused with a recap.

Here's where I think we are...

Summary of Discussion

Points of Agreement ✅

  1. CI is the primary beneficiary - The main value is in CI where clean builds happen constantly. Local dev builds are incremental and already benefit from CMake's built-in caching.

  2. Disable for source builds - Agreed and implemented. Source builds need reproducibility guarantees we can't provide with cached values.

  3. Validation/regeneration tooling is needed - --validate-config and --regenerate-config options added and well-received.

  4. Automated detection pipeline - Weekly validation pipeline that regenerates cache when out of sync is the right approach.

  5. Runtime checks are stable - @jkotas noted "runtime checks have a high likelihood of being forward compatible (if an API works on version X, we assume it works on X+1 too)" - these are safer to cache than compiler checks.

  6. Dead code cleanup is orthogonal - @jkotas already opened "Remove dead thread suspension code from PAL" (#124049) to remove dead checks. This is complementary work.

Points of Contention ⚠️

  1. Developer machines: on by default vs opt-in
     • @am11, @janvorli, @jkotas: Prefer correctness over speed. Local devs have unpredictable environments. Debugging stale cache issues wastes more time than the cache saves.
     • @steveisok: Would like it on for local dev too, with version gating providing safety.
  2. Version matching granularity
     • @jkotas: Wants an exact toolset version match to avoid rolling update issues in CI
     • Current implementation: Major AppleClang version match
  3. Cross-compilation conflicts
     • Copilot reviewer flagged that tryrun.osx-arm64.cmake can conflict with tryrun.cmake during cross-builds
     • @jkotas suggested: Use the existing cross-build cache for native builds too, reducing duplication

Proposed Middle Ground 🤝

| Scenario | Behavior | Rationale |
|---|---|---|
| CI builds | Cache enabled, gated on exact compiler version match | CI environments are controlled; version match ensures safety |
| Local dev | Cache disabled by default, opt-in via env var | Correctness > speed for devs; avoids stale cache debugging |
| Source builds | Cache disabled | Reproducibility requirement |
| Cross-builds | Skip platform cache, use existing tryrun.cmake | Avoid conflicts/duplication |

To enable locally: export CLR_CMAKE_USE_PLATFORM_CACHE=1
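The proposed table can be read as a small decision function. This is only a sketch: `platform_cache_policy` and its flag arguments are hypothetical, and it assumes local opt-in is still subject to the version check, which the table leaves open.

```bash
# Hypothetical policy mirroring the proposed table. Arguments are 1/0 flags:
#   is_ci  is_source_build  is_cross  versions_match
# Prints "on" or "off:<reason>".
platform_cache_policy() {
    local is_ci="$1" is_source_build="$2" is_cross="$3" versions_match="$4"
    if [[ "$is_source_build" == "1" ]]; then
        echo "off:source-build"        # reproducibility requirement
    elif [[ "$is_cross" == "1" ]]; then
        echo "off:cross-build"         # existing tryrun.cmake is used instead
    elif [[ "$is_ci" != "1" && "${CLR_CMAKE_USE_PLATFORM_CACHE:-0}" != "1" ]]; then
        echo "off:local-default"       # local dev must opt in
    elif [[ "$versions_match" != "1" ]]; then
        echo "off:version-mismatch"    # assumption: opt-in is gated too
    else
        echo "on"
    fi
}
```

In Azure Pipelines, the predefined `TF_BUILD` variable (set to `True` when a script runs in a build) is one possible source for the CI flag, e.g. `is_ci=0; [[ "${TF_BUILD:-}" == "True" ]] && is_ci=1`.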

Open Items

  1. Separate compiler checks from runtime checks - @jkotas's suggestion to use existing tryrun.cmake for runtime checks (stable) and be more conservative about compiler checks (version-sensitive)

  2. Release branch handling - Need to ensure cache matches floor version for each release

@akoeplinger
Member

I think rather than checking the values into the repo it'd be easier to write the contents to some predictable file path and then use https://learn.microsoft.com/en-us/azure/devops/pipelines/release/caching?view=azure-devops&tabs=bundler#configure-the-cache-task to cache that file between runs.

We could then configure it to throw away the cache by including the OS, Xcode version, etc. in the cache key.
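A minimal sketch of the key side of this idea, assuming the cache task is given a file whose contents change whenever the OS, Xcode, or architecture does (the Azure Pipelines `Cache@2` task hashes the contents of file paths used as key segments). `write_cache_key_file` and its output path are hypothetical; on a Mac the inputs could come from `sw_vers -productVersion`, `xcodebuild -version | head -n 1`, and `uname -m`.

```bash
# Hypothetical helper: write the values that should invalidate the cached
# configure results into one small file, for use as a CI cache key segment.
write_cache_key_file() {
    local out="$1" os_version="$2" xcode_version="$3" arch="$4"
    mkdir -p "$(dirname "$out")" || return 1
    {
        printf 'os=%s\n' "$os_version"
        printf 'xcode=%s\n' "$xcode_version"
        printf 'arch=%s\n' "$arch"
    } > "$out"
}
```

A key made of a fixed prefix, `"$(Agent.OS)"`, and this file's path would then miss whenever any of the recorded values changes, so no cached values need to live in the repo.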

@jkotas
Member

jkotas commented Feb 11, 2026

@agocke is looking into a more holistic, ccache-like caching solution for CI. You may want to sync.


Labels

needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners


6 participants