@ahal ahal commented Sep 11, 2025

No description provided.

@ahal ahal self-assigned this Sep 11, 2025
@ahal ahal force-pushed the run-task-git branch 3 times, most recently from e4b6b56 to bc5cc0f Compare September 17, 2025 15:58
@ahal ahal force-pushed the run-task-git branch 4 times, most recently from 7cba9a9 to d58fa3c Compare September 23, 2025 15:10
@ahal ahal force-pushed the run-task-git branch 2 times, most recently from edeb769 to 5387210 Compare October 3, 2025 15:27
@ahal ahal added the BREAKING CHANGE Backwards incompatible request that will require major version bump label Oct 3, 2025
@ahal ahal changed the title WIP: Improve git clone speeds in run-task WIP: Support shallow clones with Git Oct 3, 2025
@ahal ahal force-pushed the run-task-git branch 10 times, most recently from d609819 to 1848a5e Compare October 7, 2025 16:17
@ahal ahal changed the title WIP: Support shallow clones with Git Support shallow clones with Git Oct 7, 2025
@ahal ahal marked this pull request as ready for review October 7, 2025 16:22
@ahal ahal requested a review from a team as a code owner October 7, 2025 16:22
@ahal ahal requested a review from abhishekmadan30 October 7, 2025 16:22
@ahal ahal force-pushed the run-task-git branch 3 times, most recently from 50f6468 to 3a746a0 Compare October 7, 2025 17:54
@ahal ahal force-pushed the run-task-git branch 4 times, most recently from dde7615 to 3f72a12 Compare October 15, 2025 18:09
@ahal ahal requested review from a team and jcristau and removed request for abhishekmadan30 October 15, 2025 18:33
@ahal ahal marked this pull request as ready for review October 15, 2025 18:34

# If we have a shallow clone and specific commit, we need to fetch it too.
if shallow and head_rev and head_rev != head_ref:
    git_fetch(
Contributor

nit: ideally we'd call git fetch just once against the head repo, i.e. combine this with the head_ref fetch

if not targets or shallow:
    # If head_ref wasn't provided, we fall back to head_rev. If we have a
    # shallow clone, head_rev needs to be fetched independently regardless.
    targets.append(head_rev)
Contributor

Should we assert somewhere that if shallow is True then we have a head_rev?

Collaborator Author

Huh, good point. I guess a head_rev isn't necessary for shallow clones either, though. I'll fix this up.
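To make the case discussed in this thread concrete, here is a minimal sketch of how the fetch targets could be chosen when `head_rev` is optional. The helper name `collect_fetch_targets` and its signature are hypothetical and are not taken from run-task; the sketch only mirrors the condition quoted above and is not necessarily the fix that landed.

```python
from typing import List, Optional


def collect_fetch_targets(
    head_ref: Optional[str], head_rev: Optional[str], shallow: bool
) -> List[str]:
    """Hypothetical helper: decide what to fetch from the head repository."""
    targets: List[str] = []
    if head_ref:
        targets.append(head_ref)
    # Fall back to head_rev when no ref was given. In a shallow clone the
    # revision is fetched as well, since it may not be the tip of head_ref.
    # A missing head_rev is simply skipped instead of being appended as None.
    if head_rev and (not targets or shallow) and head_rev not in targets:
        targets.append(head_rev)
    return targets
```

With this shape, a shallow clone that only has a `head_ref` yields a single target, and an empty list signals that the caller has nothing explicit to fetch.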

ahal added 10 commits October 17, 2025 11:23
…head_rev

This makes the naming consistent with what we use in .taskcluster.yml
and the rest of Taskgraph. Previously, I always had to look up where
"ref" and "commit" / "revision" were coming from to double check they
were the values I was expecting. This rename makes that much more
obvious.
If the condition in the if statement is true, then we've already fetched
ref from head_repo. There's no need to do so again.
BREAKING CHANGE: `base_ref` will no longer be fetched or checked out by
run-task

Taskgraph uses base_rev anyway for computing files changed, so there's
no need to additionally fetch base_ref. Some tasks may need to be
updated to not rely on base_ref being present in the local clone.
BREAKING CHANGE: omitting `head_ref` no longer fetches all heads

Previously we were fetching all heads in this case so that we could then
run `git checkout <head_rev>` successfully. But it's much faster to
just explicitly fetch `<head_rev>` in the first place.

This also refactors `git_fetch` to be able to fetch multiple targets at
once.
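As an illustration of fetching several targets in one call, the following sketch builds a single `git fetch` command line. The function `build_fetch_command` and its parameters are assumptions made for this example; the real `git_fetch` in run-task may look quite different. It relies only on the standard `git fetch [--depth=<n>] <repository> <refspec>...` form.

```python
from typing import List, Optional, Sequence


def build_fetch_command(
    repo_url: str, targets: Sequence[str], depth: Optional[int] = None
) -> List[str]:
    """Hypothetical helper: one `git fetch` invocation for all targets."""
    args = ["git", "fetch"]
    if depth is not None:
        # --depth limits the history fetched for each target (shallow fetch).
        args.append(f"--depth={depth}")
    args.append(repo_url)
    # Every target (branch name, full ref, or commit hash) becomes a refspec.
    args.extend(targets)
    return args


# Example with a placeholder URL and revision:
print(build_fetch_command("https://example.com/repo.git", ["main", "abc123"], depth=1))
```

Whether a server accepts a bare commit hash as a fetch target depends on its configuration, so this should be read as a sketch rather than a guarantee.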
This fixes the case where head_ref is passed in with a `refs/heads`
prefix.
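The commit message does not say how the prefix is handled. One plausible approach, shown here purely as an assumption, is to normalize `head_ref` to a short branch name before using it. The helper `short_branch_name` is hypothetical.

```python
def short_branch_name(head_ref: str) -> str:
    """Hypothetical helper: strip a leading `refs/heads/` from a ref."""
    prefix = "refs/heads/"
    if head_ref.startswith(prefix):
        return head_ref[len(prefix):]
    # Other refs (plain branch names, tags, and so on) are left untouched.
    return head_ref
```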
Shallow clones yield a massive improvement to clone performance, at the
expense of making it tricky to determine the files that were modified.
`git log BASE..HEAD` says, show me commits reachable from HEAD, but not
reachable from BASE. In a shallow clone where we only fetch BASE and
HEAD (which is what run-task does), this means the command will only
return `HEAD`. In other words, we're only returning files changed by the
tip commit of the push and ignoring everything else.
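
The effect described above can be reproduced with a small model of the commit graph. This is a toy sketch, not Git itself: commits are plain strings, `parents` maps each commit to its parents, and a shallow clone is modelled by cutting off the parents of the fetched commits. All names are made up for the illustration.

```python
from typing import Dict, List, Set


def reachable(parents: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every commit reachable from `start`, including itself."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def log_range(parents: Dict[str, List[str]], base: str, head: str) -> Set[str]:
    """Model of `BASE..HEAD`: reachable from head but not from base."""
    return reachable(parents, head) - reachable(parents, base)


# Linear history A <- B <- C <- D, with BASE = A and HEAD = D.
full_history = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
# Shallow clone containing only BASE and HEAD, with their parents cut off.
shallow_history = {"A": [], "D": []}

print(sorted(log_range(full_history, "A", "D")))     # ['B', 'C', 'D']
print(sorted(log_range(shallow_history, "A", "D")))  # ['D']
```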

By switching to `git diff BASE HEAD`, we're instead comparing the
snapshots of both revisions. Sometimes this is what we want, e.g. for
force pushes, it'll be the interdiff of files modified between the two
pushes (though some developers might expect it to contain the files
modified since the merge base).

Sometimes it's not what we want, e.g. for PRs, it'll be the files changed
between the PR and the latest commit on `main`.

Either way, this behaviour is at least somewhat more accurate than git
log when we don't have full history. Likely we'll need to fetch the
proper changed files using the GitHub API in the future, but for now
this is better than nothing.
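
The PR caveat in this commit message can be sketched with a toy snapshot comparison. Each revision is modelled as a mapping from path to file content, and `changed_paths` is a hypothetical stand-in for what a two-revision diff reports. It is not how Git or Taskgraph computes changed files.

```python
from typing import Dict, Set


def changed_paths(base: Dict[str, str], head: Dict[str, str]) -> Set[str]:
    """Paths that were added, removed, or differ between two snapshots."""
    return {p for p in base.keys() | head.keys() if base.get(p) != head.get(p)}


merge_base = {"a.py": "1", "b.py": "1"}  # where the PR branched off
pr_head = {"a.py": "2", "b.py": "1"}     # the PR only touches a.py
main_tip = {"a.py": "1", "b.py": "2"}    # main changed b.py afterwards

# Comparing against the tip of main also picks up main's own change to b.py.
print(sorted(changed_paths(main_tip, pr_head)))    # ['a.py', 'b.py']
# Comparing against the merge base reports only what the PR modified.
print(sorted(changed_paths(merge_base, pr_head)))  # ['a.py']
```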
@ahal ahal merged commit 3a80bed into taskcluster:main Oct 17, 2025
21 checks passed
@ahal ahal deleted the run-task-git branch October 17, 2025 15:44
