Add skip on git clone when local version matches bundle version#63814
Add skip on git clone when local version matches bundle version#63814ronaldorcampos wants to merge 4 commits intoapache:mainfrom
Conversation
b165f72 to
e1e502c
Compare
81520c9 to
56f8d7a
Compare
There was a problem hiding this comment.
Pull request overview
This PR aims to avoid unnecessary git clone/fetch work by detecting when a local, already-initialized repo is already at the requested version and short-circuiting initialization.
Changes:
- Add
_local_repo_has_version()and early-return from_initialize()when the local repo’s HEAD matches the requested version. - Add unit tests asserting clone steps are skipped when the expected version is already present, and that initialization still works when versions differ.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| providers/git/src/airflow/providers/git/bundles/git.py | Adds local-repo version check and short-circuit in initialization to reduce repeated cloning/fetching work. |
| providers/git/tests/unit/git/bundles/test_git.py | Adds tests around skipping clone when local repo already matches the requested version. |
There was a problem hiding this comment.
Looks reasonable for me except that (1) I have no detailled knowledge about the Bundle loading... and (2) asking myself what if the git tree is "dirty" from a previous task execution? Will tree be cleaned in between?
Therefore approving from my side but would request another maintainer pair of eyes prior merge.
| "Using existing local repo with correct version", | ||
| repo_path=self.repo_path, | ||
| version=self.version, | ||
| ) |
There was a problem hiding this comment.
When _local_repo_has_version() is true, _initialize() returns early without setting self.repo. For versioned bundles with prune_dotgit_folder=False, the normal path sets self.repo = Repo(self.repo_path) and get_current_version() then returns HEAD’s full SHA; with this early-return path get_current_version() will instead return the raw self.version string (e.g. a tag/short SHA), which is an observable behavior change compared to a first-time initialize. Consider opening/assigning self.repo (and ensuring it’s closed) before returning, or adjusting get_current_version() to resolve HEAD when .git exists even if self.repo isn’t set; adding a regression test for the tag/short-SHA case would prevent this from reappearing.
| ) | |
| ) | |
| if not self.prune_dotgit_folder and self.repo is None: | |
| try: | |
| self.repo = Repo(self.repo_path) | |
| except InvalidGitRepositoryError as e: | |
| raise RuntimeError(f"Invalid git repository at {self.repo_path}") from e | |
| if self.repo is not None: | |
| self.repo.close() |
This PR adds a simple check if the local git folder matches the version expected and is already initialized, to avoid re-cloning on every task run.
Was generative AI tooling used to co-author this PR?
{pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.