-
Notifications
You must be signed in to change notification settings - Fork 1.5k
fix(macros-core): more consistent env loading and reading logic #4018
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(macros-core): more consistent env loading and reading logic #4018
Conversation
a8a9e0c to
89858d7
Compare
abonander
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great start, but needs a bit more work.
sqlx-macros-core/src/query/mod.rs
Outdated
| static LOADED_ENV_VARS: Mutex<HashMap<String, String, BuildHasherDefault<DefaultHasher>>> = | ||
| Mutex::new(HashMap::with_hasher(BuildHasherDefault::new())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't actually use a single global static here, because of Rust-Analzyer. RA will load the proc-macro dylib into its process and keep it resident for all compiler invocations, so we can't assume the state is reset between different crates.
This was the cause of #3738 and the solution was to store all relevant context per-crate keyed by CARGO_MANIFEST_DIR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the same reason, we might actually want to store the modified time of the .env files we load and check for changes to that. If the file has been modified, we need to re-load it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking about our approach to cache invalidation within this macro, and to be honest, I'm not sure I precisely understand what the goals and non-goals here are.
From what I can tell, our code for reacting to environment variable and .env file changes depends on unstable proc_macro features gated by procmacro2_semver_exempt. Since it can be assumed that most people are using a stable Rust toolchain, most people aren't getting any sort of change tracking in the macro either, so why the suggestion to handle .env files in a somewhat special way by manually checking their modification times? My understanding is that the intention is to rely on proc_macro::tracked_path for refreshing these, right?
Now, onto the elephant in the room: the static METADATA cache. While it is keyed by crate manifest path, cache entries are never invalidated, so the way I see it, that means the following sequence of events could happen even without this PR:
- The proc macro runs once for crate A with
procmacro2_semver_exemptenabled, populatingMETADATAand instructing the compiler to re-evaluate the macro when environment variables change. - The value of
DATABASE_URLis modified. - The proc macro runs again for crate A after the compiler/RA notices the change. However, since a
METADATAentry for crate A already exists, the proc macro library may not be unloaded between executions, and anyset_envacross threads is not necessarily thread-safe, the stale database URL from the cache may get used instead.
That may not be a huge issue in practice, since users always can restart RA, not use RA, or clear build caches if things go wrong. But it does raise the question of this PR's scope. Should we be trying to tackle cache invalidation here, or is that rabbit hole best left out of scope?
While giving the new `sqlx.toml` feature a try, I discovered that its `database_url_var` setting behaved inconsistently compared to other environment variables: it wasn't loaded from `.env` files, causing it to be effectively ignored in favor of the default `DATABASE_URL` variable. This is an inconvenient outcome for the multi-database workspaces that this feature is designed to support. To address this issue, I reworked the `.env` loading logic to account for this configuration and make it simpler to use to achieve consistent behavior. The new algorithm works as follows: - First, it generates lists of environment variables that can be loaded from `.env` files and candidate `.env` file paths. When applicable, the compiler is informed to track changes to their elements. - Next, it loads the set of potentially tracked environment variables using `set_var`, similar to how `dotenvy::dotenv()` operates. When a variable is defined in both the process environment and a `.env` file, the process environment takes precedence, as before. - Macro code can now use the `env` function freely to read environment variable values, abstracting itself away from their source, which results in simpler, less error-prone code. Trivially, this rework resolves the issue I encountered because the `database_url_var` value is now part of the list of loadable environment variables. Future code can easily make such additions as necessary.
2b5e2fd to
57c1dcf
Compare
|
I've finally addressed most of the review comments in the latest commits I pushed to this PR, which I also rebased on top of the latest As requested, the static map of environment variables loaded from To preserve the ergonomics of the existing See also PR #4039 for a different implementation take on this problem. |
Context, problem statement and description
While giving the new
sqlx.tomlfeature a try, I discovered that itsdatabase_url_varsetting behaved inconsistently compared to other environment variables: it wasn't loaded from.envfiles, causing it to be effectively ignored in favor of the defaultDATABASE_URLvariable. This is an inconvenient outcome for the multi-database workspaces that this feature is designed to support.To address this issue, I reworked the
.envloading logic to account for this configuration and make it simpler to use to achieve consistent behavior. The new algorithm works as follows:.envfiles and candidate.envfile paths. When applicable, the compiler is informed to track changes to their elements.set_envwas used, but I changed it as a response to the review comments below). When a variable is defined in both the process environment and a.envfile, the process environment takes precedence, as before.envfunction freely to read environment variable values, abstracting itself away from their source, which results in simpler, less error-prone code.Trivially, this rework resolves the issue I encountered because the
database_url_varvalue is now part of the list of loadable environment variables. Future code can easily make such additions as necessary.Does your PR solve an issue?
To my knowledge, this PR doesn't directly address any previously reported issue in this repository.
Is this a breaking change?
Technically yes when compared to the released
0.9.0alpha version, as environment variables likedatabase_url_varmay now be loaded from another source. However, I don't think this technically breaking change will cause significant inconvenience for most users, especially since the new behavior is more consistent and useful, and the affected variables are bound to not be widely used due to them being published as an alpha release so far.