Skip to content

fix: Scrub GIT_* env vars in hooks that shell to git clone#993

Open
yuriipolishchuk wants to merge 3 commits into
antonbabenko:masterfrom
yuriipolishchuk:fix/scrub-git-env
Open

fix: Scrub GIT_* env vars in hooks that shell to git clone#993
yuriipolishchuk wants to merge 3 commits into
antonbabenko:masterfrom
yuriipolishchuk:fix/scrub-git-env

Conversation

@yuriipolishchuk

Copy link
Copy Markdown

Description

Fixes #992.

Hooks that invoke tofu init / terraform init (terraform_validate, terraform_tflint, terraform_docs) end up running git clone <module> under the hood for each registry/git-source module. When the operator's git commit runs from a linked git worktree, the parent git sets GIT_INDEX_FILE (pointing at .git/worktrees/<name>/index) in the hook subprocess env. The child git clone inherits this and writes the cloned module's blob OIDs into the parent worktree's index. The next git commit then fails at tree-build:

error: invalid object 100644 <oid> for '<path>'
error: Error building trees

Other GIT_* vars (GIT_DIR, GIT_WORK_TREE, GIT_OBJECT_DIRECTORY) are overridden by git clone's own target-dir setup; only GIT_INDEX_FILE slips through. Scrubbing all four matches the four documented offenders in pre-commit framework's own no_git_env helper.

Why hook-side scrub (not framework-side)

pre-commit framework deliberately does NOT scrub GIT_* from user hook subprocesses (only its own internal git calls) per the maintainer's stance in pre-commit/pre-commit#1849:

they need the same code as in our [no_git_env] helper if they are dealing with doing git writes

asottile reaffirmed this in pre-commit/pre-commit#3492 — framework-level scrubbing breaks valid use cases (e.g., gitleaks-docker wants GIT_INDEX_FILE passed through). So per-hook-author opt-in is the design intent.

This patch applies that recommendation in pre-commit-terraform's hooks that perform git writes via tofu init / terraform init.

A/B reproduction (same worktree, same .tf edit, only repo: URL changes)

Same .pre-commit-config.yaml shape, same staged .tf change, same git commit invocation. Switch repo: between antonbabenko/pre-commit-terraform@v1.106.0 and yuriipolishchuk/pre-commit-terraform@fix/scrub-git-env:

Run Config Result
Unpatched rev: v1.106.0 ❌ Commit fails after 27.4s with Error building trees. 5 phantom blobs in worktree index. git fsck reports invalid sha1 in cache-tree.
Patched rev: fix/scrub-git-env ✅ Commit succeeds in 4.7s. 0 phantom blobs. git fsck clean. Terraform validate: Passed.

Full reproduction script:

# Setup
mkdir /tmp/repro && cd /tmp/repro
git init && git commit --allow-empty -m init
mkdir stack
cat > stack/main.tf <<'TF'
module "test" {
  source  = "terraform-aws-modules/iam/aws"
  version = "5.x"
}
TF
cat > .pre-commit-config.yaml <<'YAML'
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.106.0
    hooks:
      - id: terraform_validate
        args:
          - --hook-config=--retry-once-with-cleanup=true
          - --tf-init-args=-backend=false
YAML
git add -A && git commit -m baseline

# Linked worktree is the trigger
git worktree add ../wt && cd ../wt
pre-commit install
echo "" >> stack/main.tf
git add stack/main.tf
git commit -m test           # → Error building trees on v1.106.0
                             # → Passes when rev is fix/scrub-git-env

git fsck --no-progress       # invalid sha1 pointer pre-fix; clean post-fix

Changes

File Change
hooks/_common.sh New common::scrub_git_env helper. Bare 4-var unset (the four offenders documented in pre-commit's no_git_env). Detailed docstring covering bug class, mechanism, and references.
hooks/terraform_validate.sh One-line common::scrub_git_env call in main() after env setup, before common::per_dir_hook.
hooks/terraform_tflint.sh Same. Defensive — tflint --init may shell to git for plugin sources.
hooks/terraform_docs.sh Same. terraform-docs may invoke tofu init to resolve module schemas.

Net diff: +35 / -0 across 4 files. Zero behavior change for non-worktree users (their GIT_INDEX_FILE is empty for these hooks → unset is a no-op).

Test plan

  • Reproduce bug on master (v1.106.0) from a worktree with a module-using stack → commit fails with Error building trees, index has phantom blobs
  • Apply this patch via fork rev → same scenario passes, commit succeeds, index clean
  • Confirm non-worktree (main checkout) behavior unchanged — unset is a no-op when env var absent
  • Confirm SSH-source modules still authenticate — GIT_SSH, GIT_ASKPASS, GIT_SSL_* etc. are NOT touched (only the four documented dangerous vars)
  • shellcheck clean (no new warnings)

Other notes

  • I considered the allowlist mirror of no_git_env (~15 lines case block in bash) but went with the minimal denylist (4 vars) since only GIT_INDEX_FILE actually causes the observed failure; the others are documented defensive offenders. Happy to expand to allowlist if you'd prefer a stricter mirror.
  • The hook subprocess inherits GIT_* from the parent git commit; this means the unset only takes effect for the duration of the hook (and its child processes). The parent git commit env is unaffected.
  • Hooks that don't shell to git (terraform_fmt, terraform_trivy, terraform_checkov, etc.) are not touched — they don't have this failure mode.

References

Hooks that invoke `tofu init` / `terraform init` (terraform_validate,
terraform_tflint, terraform_docs) end up running `git clone <module>`
under the hood for each registry/git-source module. When the
operator's `git commit` runs from a linked worktree, the parent git
sets GIT_INDEX_FILE (pointing at `.git/worktrees/<name>/index`) in the
hook subprocess env. The child `git clone` inherits this and writes the
cloned module's blob OIDs into the parent worktree's index. The next
`git commit` then fails at tree-build:

    error: invalid object 100644 <oid> for '<path>'
    error: Error building trees

Reproduction:

    git init && git commit --allow-empty -m init
    mkdir stack
    cat > stack/main.tf <<'TF'
    module "test" {
      source  = "terraform-aws-modules/iam/aws"
      version = "5.x"
    }
    TF
    cat > .pre-commit-config.yaml <<'YAML'
    repos:
      - repo: https://github.com/antonbabenko/pre-commit-terraform
        rev: v1.106.0
        hooks:
          - id: terraform_validate
            args:
              - --hook-config=--retry-once-with-cleanup=true
              - --tf-init-args=-backend=false
    YAML
    git add -A && git commit -m baseline
    git worktree add ../wt && cd ../wt
    pre-commit install
    echo "" >> stack/main.tf
    git add stack/main.tf
    git commit -m test       # fails with 'Error building trees'
    git fsck --no-progress   # invalid sha1 pointer in cache-tree

Other GIT_* vars (GIT_DIR, GIT_WORK_TREE, GIT_OBJECT_DIRECTORY) are
overridden by `git clone`'s own target-dir setup; only GIT_INDEX_FILE
slips through. Scrubbing all four matches the four documented offenders
in pre-commit framework's own `no_git_env` helper
(`pre_commit/git.py:20-38`), which explicitly states:

    # GIT_DIR: Causes git clone to clone wrong thing
    # GIT_INDEX_FILE: Causes 'error invalid object ...' during commit

pre-commit framework deliberately does NOT scrub GIT_* from user hook
subprocesses (only its own internal git calls) per the maintainer in
pre-commit/pre-commit#1849: "they need the same code as in our
no_git_env helper if they are dealing with doing git writes". This
patch applies that recommendation in pre-commit-terraform's hooks
that perform git writes via `tofu init` / `terraform init`.

Verified by reproducing the failure on v1.106.0 from a worktree with
a module-using stack, then confirming the fix removes both the index
pollution and the tree-build error. SSH/TLS/config-related GIT_* env
vars are preserved (only the four documented "dangerous" ones are
unset) so SSH-source modules authenticate normally.

Net change: one new helper `common::scrub_git_env` in _common.sh,
called once from `main()` of each affected hook. Zero behavior change
for non-worktree users — their GIT_INDEX_FILE is empty for these hooks
so the unset is a no-op.

Refs: antonbabenko#992
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cb9e681b-f13a-4c46-9a3a-f2f52cda3b52

📥 Commits

Reviewing files that changed from the base of the PR and between 211d9cc and 63ab2bc.

📒 Files selected for processing (1)
  • hooks/_common.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • hooks/_common.sh

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes
    • Fixed an issue where Git environment variables inherited during hook execution could interfere with Terraform and OpenTofu initialization processes.

Walkthrough

Adds a common::scrub_git_env helper function to unset inherited Git environment variables that can cause child git clone processes spawned by tofu init or terraform init to corrupt the parent worktree's index. Integrates the helper into terraform_validate to scrub these variables before hook execution.

Changes

Git environment scrubbing across hooks

Layer / File(s) Summary
Git environment scrubbing helper
hooks/_common.sh
Documentation block explains the failure mode where child git clone operations inherit GIT_INDEX_FILE, GIT_DIR, GIT_WORK_TREE, and GIT_OBJECT_DIRECTORY from the parent hook environment, causing blob writes to the parent worktree's index. Implementation of common::scrub_git_env unsets these four problematic variables.
Hook integration of environment scrubbing
hooks/terraform_validate.sh
terraform_validate calls common::scrub_git_env in main after parsing environment variables and before running the per-directory hook logic.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

bug

Suggested reviewers

  • antonbabenko
  • yermulnik
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding GIT_* environment variable scrubbing to hooks that invoke git clone.
Description check ✅ Passed The description thoroughly explains the bug, root cause, fix rationale, reproduction steps, and test plan, all directly related to the changeset.
Linked Issues check ✅ Passed The changes implement the proposed fix from #992 by adding common::scrub_git_env to unset four GIT_* variables in terraform_validate.sh, preventing worktree index corruption.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing #992: adding the scrub helper and calling it from affected hooks; no unrelated modifications present.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@hooks/_common.sh`:
- Around line 119-149: The comment above function common::scrub_git_env is
misleading because the function only unsets a targeted subset of GIT_* vars
(GIT_INDEX_FILE, GIT_DIR, GIT_WORK_TREE, GIT_OBJECT_DIRECTORY) rather than
reproducing pre-commit's no_git_env behavior; update the docblock to explicitly
state that this function intentionally clears only those specific offending
variables (and why each is included) and clarify that pre-commit's no_git_env
removes all GIT_* except a specific allowlist (e.g., GIT_CONFIG_KEY_*,
GIT_CONFIG_VALUE_*, GIT_EXEC_PATH, GIT_SSH, GIT_SSH_COMMAND, GIT_SSL_CAINFO,
GIT_SSL_NO_VERIFY, GIT_CONFIG_COUNT, GIT_HTTP_PROXY_AUTHMETHOD,
GIT_ALLOW_PROTOCOL, GIT_ASKPASS), so the comment does not claim parity with
no_git_env and instead documents the targeted subset behavior of
common::scrub_git_env.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ee9f744b-e72b-42de-9342-5072872f64a8

📥 Commits

Reviewing files that changed from the base of the PR and between d61ded2 and a96a283.

📒 Files selected for processing (4)
  • hooks/_common.sh
  • hooks/terraform_docs.sh
  • hooks/terraform_tflint.sh
  • hooks/terraform_validate.sh

Comment thread hooks/_common.sh
Address CodeRabbit feedback on PR antonbabenko#993: the previous docstring read as
though common::scrub_git_env mirrored pre-commit's no_git_env helper.
It does not.

no_git_env is allowlist-based: it scrubs all GIT_* except a whitelist
of ~11 known-safe vars (GIT_EXEC_PATH, GIT_SSH, GIT_SSH_COMMAND,
GIT_ASKPASS, GIT_SSL_*, GIT_CONFIG_KEY_*, GIT_CONFIG_VALUE_*,
GIT_CONFIG_COUNT, GIT_HTTP_PROXY_AUTHMETHOD, GIT_ALLOW_PROTOCOL) and
runs only on pre-commit's internal git calls.

common::scrub_git_env is denylist-based: it unsets only four specific
vars that pre_commit/git.py:20-38 documents as dangerous when leaked
into child git (GIT_INDEX_FILE, GIT_DIR, GIT_WORK_TREE,
GIT_OBJECT_DIRECTORY), and runs in user hook subprocesses.

New docstring spells this out, explains why each of the four is
included, and notes why we chose a denylist over an allowlist mirror
(bash portability, observed bug fix scope, lower blast radius for
hook authors with custom registry/proxy env vars).
@yuriipolishchuk yuriipolishchuk changed the title fix: scrub GIT_* env vars in hooks that shell to git clone fix: Scrub GIT_* env vars in hooks that shell to git clone Jun 10, 2026
Comment thread hooks/_common.sh Outdated
Comment thread hooks/_common.sh Outdated
Comment thread hooks/_common.sh Outdated
Comment thread hooks/_common.sh Outdated
# rely on inheritance of vars an aggressive allowlist might drop.
#######################################################################
function common::scrub_git_env {
unset GIT_INDEX_FILE GIT_DIR GIT_WORK_TREE GIT_OBJECT_DIRECTORY

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Please put vars in alphabetical order.
  • Might it worth of handling unset failure? E.g. if by any reason the var was declared readonly outside of the hook? 🤔
  • I'd also suggest to unset Git vars only when the hook is run from inside the Git worktree so that we don't mess up non-worktree envs.
    • Here's how I identify this locally:
    local _git_repo_git_file="$(git rev-parse --show-cdup).git"
    [[ -f $_git_repo_git_file ]] && $(grep -qE ^gitdir: "$_git_repo_git_file") && local _is_git_worktree="true"
  • Would we want to let user know what's happening as it's sort of intrusion to user-controlled area? 🤔 Maybe not unconditionally, but e.g. when PCT_LOG is set to e.g. debug@MaxymVlasov WDYT
  • Would it make sense to apply common::scrub_git_env() to all hooks e.g. by making it part of common::initialize() (or being called from common::initialize())? 🤔 This way we would not need to care of, for example, adding it to any new hooks (if any). @MaxymVlasov WDYT?

Comment thread hooks/terraform_docs.sh Outdated
Comment thread hooks/terraform_tflint.sh Outdated
common::parse_cmdline "$@"
common::export_provided_env_vars "${ENV_VARS[@]}"
common::parse_and_export_env_vars
common::scrub_git_env

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't reproduce the worktree bug in this hook.

Suggested change
common::scrub_git_env

Comment thread hooks/terraform_docs.sh Outdated
common::parse_cmdline "$@"
common::export_provided_env_vars "${ENV_VARS[@]}"
common::parse_and_export_env_vars
common::scrub_git_env

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't reproduce the worktree bug in this hook.

Suggested change
common::scrub_git_env

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There another thing that shows, both for terraform_docs and terraform_fmt hooks.

But it's not an issue at all.

18:57 repro-hoo git:(repro-hoo +) 
➜ gc       
Terraform docs...........................................................Passed
Terraform validate with tflint...........................................Passed
Terraform fmt............................................................Failed
- hook id: terraform_fmt
- files were modified by this hook

main.tf


18:57 repro-hoo git:(repro-hoo !+) 
✘1 ➜ gc
Terraform docs...........................................................Passed
Terraform validate with tflint...........................................Passed
Terraform fmt............................................................Passed
Terraform docs.......................................(no files to check)Skipped
Terraform validate with tflint.......................(no files to check)Skipped
Terraform fmt........................................(no files to check)Skipped
[repro-hoo 44cd287] fmt
 2 files changed, 2 insertions(+), 6 deletions(-)

18:58 repro-hoo git:(repro-hoo) took 9s 
➜ git fsck --no-progress | head
dangling tree bba236455e65937967031c5d25b02e11f5c252f5

18:58 repro-hoo git:(repro-hoo) 
➜ git fsck --no-progress | head
dangling tree bba236455e65937967031c5d25b02e11f5c252f5

18:58 repro-hoo git:(repro-hoo) 
➜ gc                           
Terraform docs...........................................................Failed
- hook id: terraform_docs
- files were modified by this hook
Terraform validate with tflint...........................................Passed
Terraform fmt............................................................Passed

18:58 repro-hoo git:(repro-hoo !+) 
✘1 ➜ gc
Terraform docs...........................................................Passed
Terraform validate with tflint...........................................Passed
Terraform fmt............................................................Passed
Terraform docs.......................................(no files to check)Skipped
Terraform validate with tflint.......................(no files to check)Skipped
Terraform fmt........................................(no files to check)Skipped
[repro-hoo c115ce4] docs
 2 files changed, 2 insertions(+), 2 deletions(-)

18:58 repro-hoo git:(repro-hoo) took 6s 
➜ git fsck --no-progress | head
dangling tree a13f3782d35b8ce91a9d45f5b57acd390ea671a2
dangling tree bba236455e65937967031c5d25b02e11f5c252f5

Per review on PR antonbabenko#993:

- MaxymVlasov could not reproduce the worktree bug in terraform_tflint
  or terraform_docs (terraform-docs does not run init), so the
  common::scrub_git_env calls are removed from both. The confirmed
  reproduction path is terraform_validate -> tofu/terraform init ->
  git clone, which keeps its call.
- Docstring reworded per yermulnik: drop 'dangerous' (the vars are
  incompatible with pre-commit's hook env, not dangerous), drop the
  var count, capitalize Git as a proper name, use full clickable URLs
  instead of repo-relative paths and issue shorthand, drop line-range
  references that rot as upstream moves.
- GIT_OBJECT_DIRECTORY provenance corrected: it is not named in
  pre-commit's no_git_env docstring; it comes from Git's own
  repository-location environment variables (man 1 git). The docstring
  now cites that source for all four vars.
- unset arguments alphabetized.
@yuriipolishchuk

Copy link
Copy Markdown
Author

Thanks both for the thorough review - pushed 63ab2bc addressing it.

Scope (@MaxymVlasov)

Accepted both suggestions: common::scrub_git_env calls removed from terraform_tflint.sh and terraform_docs.sh. The helper is now called only from terraform_validate.sh - the confirmed reproduction path (validate -> tofu/terraform init -> git clone). I also re-verified on my side that the validate-only scrub fully fixes the original worktree failure (fresh module clone during commit, index stays clean, git fsck clean, follow-up commits healthy).

The dangling trees you observed with terraform_docs/terraform_fmt make sense as a separate, harmless artifact: pre-commit's own stash/restore cycle writes tree objects that end up unreferenced when the first commit attempt is rejected with "files were modified". No index corruption involved.

Wording / references (@yermulnik)

All applied in 63ab2bc:

  • Function summary now uses your suggested line verbatim.
  • "dangerous" dropped; reworded around the parent-repository-location leak.
  • Var count removed from the docstring (won't go stale if the list changes).
  • Full clickable URLs instead of repo-relative paths and issue shorthand; line-range references dropped.
  • "Git" capitalized as a proper name (kept lowercase only in literal commands).
  • unset arguments alphabetized.
  • GIT_OBJECT_DIRECTORY provenance corrected - you're right that pre-commit's no_git_env docstring names only three vars. It comes from Git's own repository-location environment variables (https://git-scm.com/docs/git#_environment_variables); the docstring now cites that source for all four. Happy to drop it down to the three if you'd rather keep the list minimal.

Open design questions - deferring to you two

  1. Scrub in common::initialize for all hooks vs per-hook opt-in - I see the appeal of central placement (no per-hook bookkeeping), but it would also apply the unset to hooks that demonstrably don't need it, and Maxym's reproduction results argue for the narrow version. Either shape is a small diff; I'll implement whichever you align on.
  2. Worktree-detection gating - the unset is a no-op when the vars aren't set (non-worktree git commit doesn't export them into hooks in a way that breaks anything we've seen), so the gate would add code without changing behavior. But if you prefer the explicit guard for the "don't touch user env unless necessary" principle, I can add your rev-parse --show-cdup check.
  3. PCT_LOG=debug notice when scrubbing - happy to add if you want it; suggest doing it together with whatever logging convention you settle on for (2).
  4. readonly unset failure - a var marked readonly outside the hook would make unset error under set -e. I can add 2> /dev/null || true, though a readonly GIT_INDEX_FILE in a pre-commit env would already be broken in stranger ways. Your call.
  5. terraform_providers_lock - it runs init-adjacent provider fetches but not module git clone to my knowledge; I didn't have a failing reproduction there, so I left it out per the same evidence standard Maxym applied to tflint/docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

terraform_validate corrupts worktree index when tofu init clones modules (GIT_INDEX_FILE leakage)

3 participants