Skip to content

Add skip on git clone when local version matches bundle version#63814

Open
ronaldorcampos wants to merge 4 commits intoapache:mainfrom
ronaldorcampos:feature/git-check
Open

Add skip on git clone when local version matches bundle version#63814
ronaldorcampos wants to merge 4 commits intoapache:mainfrom
ronaldorcampos:feature/git-check

Conversation

@ronaldorcampos
Copy link
Copy Markdown
Contributor

This PR adds a simple check if the local git folder matches the version expected and is already initialized, to avoid re-cloning on every task run.


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@kaxil kaxil requested a review from Copilot April 2, 2026 00:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to avoid unnecessary git clone/fetch work by detecting when a local, already-initialized repo is already at the requested version and short-circuiting initialization.

Changes:

  • Add _local_repo_has_version() and early-return from _initialize() when the local repo’s HEAD matches the requested version.
  • Add unit tests asserting clone steps are skipped when the expected version is already present, and that initialization still works when versions differ.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
providers/git/src/airflow/providers/git/bundles/git.py Adds local-repo version check and short-circuit in initialization to reduce repeated cloning/fetching work.
providers/git/tests/unit/git/bundles/test_git.py Adds tests around skipping clone when local repo already matches the requested version.

Comment thread providers/git/src/airflow/providers/git/bundles/git.py Outdated
Comment thread providers/git/src/airflow/providers/git/bundles/git.py
Comment thread providers/git/src/airflow/providers/git/bundles/git.py Outdated
Comment thread providers/git/tests/unit/git/bundles/test_git.py Outdated
Copy link
Copy Markdown
Contributor

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable for me except that (1) I have no detailled knowledge about the Bundle loading... and (2) asking myself what if the git tree is "dirty" from a previous task execution? Will tree be cleaned in between?

Therefore approving from my side but would request another maintainer pair of eyes prior merge.

@potiuk potiuk requested a review from ephraimbuddy April 6, 2026 22:30
@potiuk potiuk added the ready for maintainer review Set after triaging when all criteria pass. label Apr 6, 2026
@kaxil kaxil requested a review from Copilot April 10, 2026 19:55
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

"Using existing local repo with correct version",
repo_path=self.repo_path,
version=self.version,
)
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When _local_repo_has_version() is true, _initialize() returns early without setting self.repo. For versioned bundles with prune_dotgit_folder=False, the normal path sets self.repo = Repo(self.repo_path) and get_current_version() then returns HEAD’s full SHA; with this early-return path get_current_version() will instead return the raw self.version string (e.g. a tag/short SHA), which is an observable behavior change compared to a first-time initialize. Consider opening/assigning self.repo (and ensuring it’s closed) before returning, or adjusting get_current_version() to resolve HEAD when .git exists even if self.repo isn’t set; adding a regression test for the tag/short-SHA case would prevent this from reappearing.

Suggested change
)
)
if not self.prune_dotgit_folder and self.repo is None:
try:
self.repo = Repo(self.repo_path)
except InvalidGitRepositoryError as e:
raise RuntimeError(f"Invalid git repository at {self.repo_path}") from e
if self.repo is not None:
self.repo.close()

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:git ready for maintainer review Set after triaging when all criteria pass.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants