
Config Server stability improvements#1667

Open
bart-vmware wants to merge 22 commits into main from config-server-options

bart-vmware commented Mar 31, 2026

Description

  • Fix various multi-threading issues, such as torn/stale reads (by swapping snapshots) and various race conditions
  • Reduce test flakiness by replacing Sandbox usage with MemoryFileProvider (without sleeps)
  • Fix detection of local configuration changes and atomically update internal state
  • Fix local configuration reloads to discard stale settings from earlier updates when keys are removed
  • Fix injected IOptionsMonitor<ConfigServerClientSettings> to return the same settings as the provider uses internally
  • Fix the preservation of the certificate issuer chain during configuration reloads
  • Recreate HttpClientHandler for each request because reusing with changing certificates is not thread-safe
  • Fix unbounded timer allocation when using HashiCorp Vault token lease renewal
  • Shut down gracefully, signaling in-flight timer callbacks to terminate instead of throwing unobserved exceptions
  • Rerun service discovery during polled reloads (cached in Eureka)
  • Fix race condition where multiple temporary service containers were built for service discovery
  • Fix accumulation of leaked change callbacks on repeated configuration reloads
  • Fix threadpool starvation during retries
  • Fix duplicate log messages and improve exceptions (messages and stack traces)
  • Add overloads to post-configure Config Server options from code
  • Postpone polling reloads to fix duplicate refresh on startup
  • Fix configuration change detection when combined with placeholder/decryption providers
  • Improve test coverage

Quality checklist

  • Your code complies with our Coding Style.
  • You've updated unit and/or integration tests for your change, where applicable.
  • You've updated documentation for your change, where applicable.
    If your change affects other repositories, such as Documentation, Samples and/or MainSite, add linked PRs here.
  • There's an open issue for the PR that you are making. If you'd like to propose a new feature or change, please open an issue to discuss the change or find an existing issue.
  • You've added required license files and/or file headers (explaining where the code came from with proper attribution), where code is copied from StackOverflow, a blog, or OSS.

`ConfigServerClientOptions` are now re-evaluated from initial options + configuration on every settings change, instead of being mutated in-place. This prevents stale or torn reads when the provider runs concurrently (e.g. polling timer vs reload).

Key changes:
- The provider clones options on each configuration reload, starting from the initial options passed via code, then applying configuration on top. This ensures code-level defaults are restored when keys are removed from configuration.
- Client settings (spring:cloud:config:*) are no longer written into the provider's data dictionary, eliminating a circular feedback loop where the provider's own output could influence its input.
- Discovery lookup results are tracked separately and applied on top of the options snapshot at load time, rather than mutating shared state.
- Client certificate configuration is moved from the source's `Build` method into `ConfigureConfigServerClientOptions`, so it participates in the options pipeline.
- `ConfigServerClientOptions.Clone()` is introduced to produce isolated snapshots, preventing tearing when options are read during a concurrent reload. A bug was fixed that prevented using global certificates.
- An internal `HttpClientHandler` parameter is threaded through the source and builder extensions, enabling direct handler injection for tests without reflection.
- `IOptionsChangeTokenSource<ConfigServerClientOptions>` is registered in DI so that `IOptionsMonitor` properly triggers on configuration changes.
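
The snapshot approach in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `ClientOptions`, `SnapshotHolder`, the two properties, and the two configuration keys are hypothetical stand-ins for `ConfigServerClientOptions` and its bindings. The point is only that each reload starts from a clone of the initial options and publishes a complete new object, so readers never see a half-updated instance.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical options type; the real options class has many more properties.
public sealed class ClientOptions
{
    public string Uri { get; set; } = "http://localhost:8888";
    public int Timeout { get; set; } = 60_000;

    public ClientOptions Clone() => new ClientOptions { Uri = Uri, Timeout = Timeout };
}

// Hypothetical holder that publishes immutable-by-convention snapshots.
public sealed class SnapshotHolder
{
    private readonly ClientOptions _initialOptions;
    private ClientOptions _current;

    public SnapshotHolder(ClientOptions initialOptions)
    {
        _initialOptions = initialOptions.Clone();
        _current = _initialOptions.Clone();
    }

    // Readers get one complete snapshot; they must not mutate it.
    public ClientOptions Current => Volatile.Read(ref _current);

    // Rebuild from the initial options, apply configuration on top, then swap the reference.
    public void Reload(IReadOnlyDictionary<string, string> configuration)
    {
        ClientOptions next = _initialOptions.Clone();

        if (configuration.TryGetValue("spring:cloud:config:uri", out var uri))
        {
            next.Uri = uri;
        }

        if (configuration.TryGetValue("spring:cloud:config:timeout", out var timeout) &&
            int.TryParse(timeout, out int parsedTimeout))
        {
            next.Timeout = parsedTimeout;
        }

        Volatile.Write(ref _current, next);
    }
}
```

Because every reload starts from `_initialOptions`, a key that disappears from configuration falls back to the code-level default instead of keeping the value from an earlier reload.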

The provider had several concurrency and lifecycle issues:

- Vault token renewal timers were created unboundedly on every HTTP request that carried a token, leaking timers that were never disposed. Vault renewal is now managed as a single timer with the same lifecycle as the polling timer.
- Timer management, handler configuration, and disposal could race with each other without synchronization. A lifecycle lock now guards these operations, while non-blocking try-enter locks prevent timer callbacks from queueing up.
- The HTTP client handler's certificates were reconfigured on every HTTP request, racing with concurrent requests sharing the same handler. Certificate and validation configuration is now applied once per settings change under the lifecycle lock, with certificates cleared before re-adding to prevent unbounded accumulation across reloads.
- Options cloning did not copy the certificate issuer chain, losing intermediate CA certificates across reloads.
- Disposal is now coordinated via a volatile flag so that in-flight timer callbacks and Load() calls exit gracefully instead of throwing unobserved exceptions.
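
The non-blocking try-enter idea mentioned in the list above can be sketched like this. It is an illustration only: `PollingWorker` and its members are hypothetical names, and the PR does not show which locking primitive it uses. The sketch uses `Monitor.TryEnter` so that a timer tick that arrives while the previous tick is still running is skipped rather than queued behind it.

```csharp
using System;
using System.Threading;

// Hypothetical worker whose OnTimerTick method would be invoked by a timer.
public sealed class PollingWorker
{
    private readonly object _callbackLock = new object();
    private int _runCount;
    private int _skippedCount;

    public int RunCount => Volatile.Read(ref _runCount);
    public int SkippedCount => Volatile.Read(ref _skippedCount);

    public void OnTimerTick(Action work)
    {
        // TryEnter returns immediately; a tick from another thread that overlaps a running one is dropped.
        if (!Monitor.TryEnter(_callbackLock))
        {
            Interlocked.Increment(ref _skippedCount);
            return;
        }

        try
        {
            work();
            Interlocked.Increment(ref _runCount);
        }
        finally
        {
            Monitor.Exit(_callbackLock);
        }
    }
}
```

Note that `Monitor` is reentrant, so the skip only happens when the overlapping tick runs on a different thread, which is the usual case for thread-pool timer callbacks.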

Polled configuration reloads now refresh the discovery service lookup before fetching from Config Server, enabling detection of server address changes between polling intervals. Unlike initial load, polled reload does not apply FailFast or retry semantics.

Fixed a race where concurrent calls to LoadInternalAsync could create multiple ConfigServerDiscoveryService instances, each constructing their own temporary DI container and discovery clients.
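
The description does not say how the single-instance guarantee is implemented. One common way to get it in .NET is `Lazy<T>` with `LazyThreadSafetyMode.ExecutionAndPublication`, sketched below; `SingleInstanceHolder<T>` is a hypothetical name and the factory stands in for whatever builds the discovery service.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: the factory runs at most once, even under concurrent access.
public sealed class SingleInstanceHolder<T> where T : class
{
    private readonly Lazy<T> _lazy;

    public SingleInstanceHolder(Func<T> factory)
    {
        // ExecutionAndPublication lets only one thread run the factory; the others wait for its result.
        _lazy = new Lazy<T>(factory, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    public T Value => _lazy.Value;
}
```

With this mode, an exception thrown by the factory is cached and rethrown on later accesses, which may or may not be desirable for a service that should be retried.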

Each configuration reload re-registered a new change callback via RegisterChangeCallback, without disposing the previous one. Rapid reloads could accumulate stale callbacks, causing redundant OnSettingsChanged invocations on multiple threads. This is replaced with a single ChangeToken.OnChange registration in the constructor, which automatically handles re-registration and is disposed on shutdown.
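
A minimal sketch of the single-registration pattern follows. It assumes the `Microsoft.Extensions.Configuration` and `Microsoft.Extensions.Primitives` packages are referenced; `SettingsWatcher` and `ChangeCount` are hypothetical names used only for illustration, not types from this PR.

```csharp
using System;
using System.Threading;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Primitives;

// Hypothetical watcher that counts configuration reloads.
public sealed class SettingsWatcher : IDisposable
{
    private readonly IDisposable _registration;
    private int _changeCount;

    public int ChangeCount => Volatile.Read(ref _changeCount);

    public SettingsWatcher(IConfiguration configuration)
    {
        // OnChange asks the producer for a fresh token after each change,
        // so this one registration covers all future reloads.
        _registration = ChangeToken.OnChange(configuration.GetReloadToken, OnSettingsChanged);
    }

    private void OnSettingsChanged() => Interlocked.Increment(ref _changeCount);

    // Disposing the registration stops further callbacks.
    public void Dispose() => _registration.Dispose();
}
```

Compared with calling `RegisterChangeCallback` again inside the callback, there is only one `IDisposable` to track, so nothing accumulates across reloads.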

Replace shared mutable HttpClientHandler with a per-request factory, eliminating the race where OnSettingsChanged reconfigured a handler concurrently in use by HTTP requests. Production code creates and disposes a fresh handler per request; tests inject a factory for mocking.
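
The per-request factory can be pictured roughly as below. This is a sketch under assumptions: `ConfigFetcher`, `FetchAsync`, and `StubHandler` are hypothetical names, and the real provider's request logic is more involved. Production code would pass something like `() => new HttpClientHandler()`, configured from the current settings snapshot, while a test passes a factory that returns a stub.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical fetcher that never shares a handler between requests.
public sealed class ConfigFetcher
{
    private readonly Func<HttpMessageHandler> _handlerFactory;

    public ConfigFetcher(Func<HttpMessageHandler> handlerFactory)
    {
        _handlerFactory = handlerFactory;
    }

    public async Task<string> FetchAsync(Uri uri, CancellationToken cancellationToken)
    {
        // A fresh handler per request; disposing the client also disposes the handler.
        using var client = new HttpClient(_handlerFactory(), disposeHandler: true);
        using HttpResponseMessage response = await client.GetAsync(uri, cancellationToken);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync(cancellationToken);
    }
}

// Hypothetical test double that returns a canned body without touching the network.
public sealed class StubHandler : HttpMessageHandler
{
    private readonly string _body;

    public bool Disposed { get; private set; }

    public StubHandler(string body) => _body = body;

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK) { Content = new StringContent(_body) };
        return Task.FromResult(response);
    }

    protected override void Dispose(bool disposing)
    {
        Disposed = true;
        base.Dispose(disposing);
    }
}
```

Since no handler outlives its request, a settings change can only affect handlers created afterwards.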

Replace _isDisposed flag with CancellationToken-based shutdown, enabling in-flight HTTP requests in timer callbacks to terminate promptly on disposal instead of running to completion.
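
The difference between a flag and a token is that a token can interrupt work that is already awaiting. A minimal sketch, assuming a hypothetical `PollingProvider` whose `PollOnceAsync` uses `Task.Delay` as a stand-in for an in-flight HTTP request:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical provider; only the shutdown mechanics are shown.
public sealed class PollingProvider : IDisposable
{
    private readonly CancellationTokenSource _shutdown = new CancellationTokenSource();
    private readonly CancellationToken _shutdownToken;
    private int _disposed;

    public PollingProvider()
    {
        // Capture the token once, so it stays usable after the source is disposed.
        _shutdownToken = _shutdown.Token;
    }

    // Returns true when the simulated request completed, false when shutdown interrupted it.
    public async Task<bool> PollOnceAsync(TimeSpan simulatedRequestDuration)
    {
        try
        {
            await Task.Delay(simulatedRequestDuration, _shutdownToken);
            return true;
        }
        catch (OperationCanceledException)
        {
            // Shutdown was requested: exit quietly instead of surfacing an exception.
            return false;
        }
    }

    public void Dispose()
    {
        // Guard against double disposal; Cancel would throw on a disposed source.
        if (Interlocked.Exchange(ref _disposed, 1) == 1)
        {
            return;
        }

        _shutdown.Cancel();
        _shutdown.Dispose();
    }
}
```

A boolean flag would only be checked before or after the request, so a long request would still run to completion after disposal.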

github-actions bot commented Mar 31, 2026

Summary - All Code Coverage (ubuntu-latest)

| Assembly | Line coverage | Branch coverage |
|---|---|---|
| Steeltoe.Bootstrap.AutoConfiguration | 97.4% | 100% |
| Steeltoe.Common | 84.3% | 77.8% |
| Steeltoe.Common.Certificates | 97.2% | 85.9% |
| Steeltoe.Common.Hosting | 84% | 70% |
| Steeltoe.Common.Http | 100% | 85.2% |
| Steeltoe.Common.Logging | 81.1% | 56.2% |
| Steeltoe.Common.Net | 64.5% | 66.6% |
| Steeltoe.Configuration.Abstractions | 96.1% | 90.7% |
| Steeltoe.Configuration.CloudFoundry | 99.1% | 91.8% |
| Steeltoe.Configuration.ConfigServer | 90.7% | 85.9% |
| Steeltoe.Configuration.Encryption | 97.6% | 92.4% |
| Steeltoe.Configuration.Kubernetes.ServiceBindings | 95.1% | 89.3% |
| Steeltoe.Configuration.Placeholder | 93.8% | 84.7% |
| Steeltoe.Configuration.RandomValue | 93.2% | 90% |
| Steeltoe.Configuration.SpringBoot | 98.3% | 95% |
| Steeltoe.Connectors | 93.9% | 89.4% |
| Steeltoe.Connectors.EntityFrameworkCore | 81.5% | 75% |
| Steeltoe.Discovery.Configuration | 92.3% | 100% |
| Steeltoe.Discovery.Consul | 97.6% | 96.5% |
| Steeltoe.Discovery.Eureka | 92.2% | 85.3% |
| Steeltoe.Discovery.HttpClients | 94.3% | 95.4% |
| Steeltoe.Logging.Abstractions | 99.4% | 96.9% |
| Steeltoe.Logging.DynamicConsole | 100% | 95.4% |
| Steeltoe.Logging.DynamicSerilog | 99.1% | 95.4% |
| Steeltoe.Management.Abstractions | 100% | 100% |
| Steeltoe.Management.Endpoint | 95.8% | 89% |
| Steeltoe.Management.Prometheus | 95.8% | 91.6% |
| Steeltoe.Management.Tasks | 100% | **** |
| Steeltoe.Management.Tracing | 100% | 75% |
| Steeltoe.Security.Authentication.JwtBearer | 100% | 100% |
| Steeltoe.Security.Authentication.OpenIdConnect | 73.8% | 59% |
| Steeltoe.Security.Authorization.Certificate | 96.7% | 75% |
| Steeltoe.Security.DataProtection.Redis | 100% | **** |

- Improve log messages (and avoid logging the same message multiple times); fix exception stack traces
- Avoid fetching from the remote server twice at startup (the initial load plus a timer that fired immediately)
- Move options binding and remote fetch from the constructor to Load; fix the combination with the placeholder provider

sonarqubecloud bot commented Apr 2, 2026

bart-vmware marked this pull request as ready for review April 2, 2026 15:13
bart-vmware requested a review from TimHess April 2, 2026 15:13