Skip to content

Cache hook state to skip redundant pyenv sh-activate calls#523

Open
jakelodwick wants to merge 2 commits intopyenv:masterfrom
jakelodwick:cache-prompt-hook
Open

Cache hook state to skip redundant pyenv sh-activate calls#523
jakelodwick wants to merge 2 commits intopyenv:masterfrom
jakelodwick:cache-prompt-hook

Conversation

@jakelodwick
Copy link

@jakelodwick jakelodwick commented Mar 14, 2026

Problem

_pyenv_virtualenv_hook calls pyenv sh-activate --quiet on every prompt, spawning ~10 subprocesses through the pyenv dispatcher regardless of whether anything has changed. On macOS this adds ~200ms of latency per keystroke+Enter (#259, #490, #338).

Approach

Make the hook detect "nothing changed" and short-circuit before forking, rather than skipping the check entirely (the approach that #456 took, which broke pyenv local).

The hook now caches five values using shell builtins (zero forks):

Cache key Detects
$PWD directory changes (cd)
$PYENV_VERSION pyenv shell changes
$PWD/.python-version content pyenv local changes
$PYENV_ROOT/version content pyenv global changes
$VIRTUAL_ENV manual venv activation/deactivation

When all five match their cached values, the hook returns immediately. When any value changes, the full pyenv sh-activate path runs and the cache is refreshed.

Limitation

.python-version files in parent directories are not tracked. pyenv walks up the directory tree to find .python-version; this cache only reads the file in $PWD. If a parent directory's file changes while the user is in a subdirectory, the next cd triggers a full recheck. Walking up directories from the hook would replicate pyenv's version resolution logic.

Measurement

Test plan

  • All 104 existing bats tests pass
  • Three exact-output tests updated (bash, fish, zsh)
  • Cache invalidation verified for all five state transitions

Acknowledgments

This approach builds on prior work:

Track five values between prompts ($PWD, $PYENV_VERSION,
.python-version content, $PYENV_ROOT/version content, $VIRTUAL_ENV)
using shell builtins. When all match, return immediately without
forking. Covers cd, pyenv shell/local/global, and manual
venv activate/deactivate.

Implemented for bash/zsh/ksh (shared POSIX path) and fish.
@jakelodwick jakelodwick marked this pull request as ready for review March 14, 2026 12:31
Copy link
Member

@native-api native-api left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good for a start!

The "limitations" you've outlined are exactly why we haven't done something like this yet! Their complexity has intimidated all the prospective contributors before you! And a stale cache would be a bug -- so we'd have to revert any fundamentally flawed solution in order to fix it.

I think we actually can check for on-disk changes with absolute minimum overhead -- by caching mtimes of the active .pyenv-version file, and of any directories without a .pyenv-version file leading up to it. Then all the disk activity we'll have to do at a prompt is up to a few stat calls.

  • As you can see, this makes it unnecessary to actually read the files each time
  • Moreover, if the current version is set by a higher-priority mechanism (shell -> local -> global) -- we don't have to check for changes in the lower-priority mechanisms at all!
  • stat is not a builtin. It is possible to do with the test -nt builtin -- but then we'd have to create a marker file (which will be per-shell-session) and somehow maintain it. We can create our own builtin if process spawns are really as big of a deal as you make it look...

@native-api
Copy link
Member

P.S. since this is a highly requested change, you are eligible for a payout (non-taxable) from donated money upon completion if you're interested!

@jakelodwick
Copy link
Author

Thanks for the thorough review. The detail on mtime caching and the priority chain was exactly what I needed to get this right.

Revised commit addresses all three points:

  1. Replaced content reads with test -nt against a per-session marker file. The hook no longer reads version file contents at all.
  2. Full directory walk from $PWD to /, checking .python-version at each level. Directory mtimes catch file creation/deletion.
  3. Priority chain: $PYENV_VERSION set → skip all disk checks. Local .python-version found → skip global.

Cache variables simplified from five to three (_PYENV_VH_PWD, _PYENV_VH_VERSION, _PYENV_VH_VENV) plus a marker file ($PYENV_ROOT/.pyenv-vh-marker-$$, 0 bytes, updated with : > in bash/zsh or command touch in fish).

Shared sourced file in pyenv proper left for a follow-up, per your suggestion.

set -g _PYENV_VH_PWD "\$PWD"
set -g _PYENV_VH_VERSION "\$PYENV_VERSION"
set -g _PYENV_VH_VENV "\$VIRTUAL_ENV"
set -g _PYENV_VH_MARKER "\$PYENV_ROOT/.pyenv-vh-marker-\$fish_pid"
Copy link
Member

@native-api native-api Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we going to maintain these marker files? We have to delete them at some point, otherwise they'll keep accumulating. I don't have an idea of a reliable way atm, that's why I wrote "somehow".

Is calling stat really that bad? It accepts multiple arguments so a single call would be enough regardless of how many entries we have to check.
Then, it's easy to check for changes by simply comparing outputs:

LOCAL_VERSION_PATHS=<paths to .python-version and dirs if any; can be an array>
SAVED_MTIMES="$(stat -c %Y $LOCAL_VERSION_PATHS)"
<...>
if [[ "$(stat -c %Y $LOCAL_VERSION_PATHS)" != "$SAVED_MTIMES" ]]; then <the cache is stale>; fi

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eliminated the marker file entirely. Cache is now a single string: stat output of all paths, compared against a fresh stat call on each prompt. Format detected at init (-c %Y GNU, -f %m BSD).

Condition Cost
PYENV_VERSION set, env vars unchanged 0 forks
PWD/env vars/mtimes all match 1 fork (~2ms)
Cache miss same as upstream

Path list: every directory + .python-version from $PWD to /, plus $PYENV_ROOT/version. Built with builtins, single stat call with all paths. 104/104 tests pass.

@native-api
Copy link
Member

I'm currently busy with other RL stuff so I may be replying with a delay. Sorry about that.

@jakelodwick
Copy link
Author

I'm currently busy with other RL stuff so I may be replying with a delay. Sorry about that.

No problem, also busy with some RL work, should have response within 24h.

Copy link
Member

@native-api native-api left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Indeed, 2ms is not 40us, forks (or rather, spawns, we don't yet have figures on subshell overhead) do show to matter. But that's still way fast enough to be unnoticeable (and worlds apart from hundreds of ms as it is now). We can always change it back to marker files if we have a solution for them, or to custom builtins if such extreme performance here turns out to really be that critical.
  • I also looked through whether we actually need to refactor this into Pyenv, given that this PR seems to be dragging out and someone may lose interest if this happens for too long. Given how much damage this slowdown deals to our product, judging by user commentary, I would not want that.
    • The $PYENV_VERSION logic and pyenv local logic are unlikely to change in the foreseeable future
    • For pyenv global logic, file location or name might change -- but that's also highly unlikely
    • The curveball is pyenv version-name hooks. If they are present, they may make the above copied logic inconsistent with the subcommand.
      • AFAICS, in that case, we have to either abandon caching entirely, or delegate to pyenv version-name proper to see if the active version has changed for parts that we copy from it.
      • Hook entry points are likely to change unpredictably in the future
    • So, refactoring does seem necessary -- but not immediately and critically so.
      • So we can split the refactoring part into a separate PR. But the payout will then be split, too.

Comment on lines +171 to +174
_pvh_paths="\${_pvh_paths} \${_pvh_d} \${_pvh_d}/.python-version"
[ "\${_pvh_d}" = "/" ] && break
_pvh_d="\${_pvh_d%/*}"
[ -z "\${_pvh_d}" ] && _pvh_d="/"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to construct _pvh_paths every time?
AFAICT it's enough to construct it once when saving the cache.

Besides, it's also sufficient to:

  • only continue up until the first .python-version found
    • but if there's a broken symlink named ".python-version" found on the way, it should be included, too: pyenv version will ignore it, but if the link's target is later created, it'll magically "appear" without the directory's mtime changing)
  • directories with a .python-version (including the broken symlinks) don't need to be checked: we check directories to catch that file being created later.

This will stat fewer entities every time and not unnecessarily reset the cache from entities that do not matter -- but constructing the list initially would take somewhat longer. Not sure if the economy is worth it.

Comment on lines +178 to +179
&& [ "\${PYENV_VERSION-}" = "\${_PYENV_VH_VERSION-}" ] \\
&& [ "\${VIRTUAL_ENV-}" = "\${_PYENV_VH_VENV-}" ] \\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already checked PYENV_VERSION and VIRTUAL_ENV earlier

if [ "\${PWD}" = "\${_PYENV_VH_PWD-}" ] \\
&& [ "\${PYENV_VERSION-}" = "\${_PYENV_VH_VERSION-}" ] \\
&& [ "\${VIRTUAL_ENV-}" = "\${_PYENV_VH_VENV-}" ] \\
&& [ "\$(stat ${_stat_fmt} \${_pvh_paths} 2>/dev/null)" = "\${_PYENV_VH_MTIMES-}" ]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Should probably stat -L: if any of the dirs or .pyenv-version is a symlink, its mtime won't change if the target's contents change.
  • Alternatively, we can resolve links beforehand when composing the list -- but then we'd need to check both the link and the target 'cuz either can change. With -L, we assume that if a link changes, it's extremely unlikely that the new target will have exactly the same mtime as the old one.
  1. Now that we unpack _pvh_paths into an arglist, it shows that it has to be an array -- or paths with spaces will break the logic (e.g. I've seen spaces in home directory names in user reports here). Kinda sad 'cuz arrays can't be exported for pyenv version to potentially use (barring serialization) -- but that's not a present concern.

_pvh_d="\${_pvh_d%/*}"
[ -z "\${_pvh_d}" ] && _pvh_d="/"
done
_pvh_paths="\${_pvh_paths} \${PYENV_ROOT}/version"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the above, the global file only needs checking if no local version is active. Same possible caveats apply.

else
_stat_fmt="-f %m"
fi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite an elegant solution! The comment is very on point, too! (conveys information required for the task that is not in the code itself)

@@ -9,6 +9,13 @@
set -e
[ -n "$PYENV_DEBUG" ] && set -x

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As my analysis in the summary shows, we'll have to allow for pyenv version-name hooks in the cache check.
I reckon it's reasonable to check for hooks presence only once during init and thus require the user to restart their sessions if they make drastic changes to their Pyenv installation like installing a new plugin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants