Skip to content

fix: clean PVC source workspace before each on-cluster build#3583

Draft
Ankitsinghsisodya wants to merge 5 commits intoknative:mainfrom
Ankitsinghsisodya:fix/pvc-workspace-cleanup-between-builds
Draft

fix: clean PVC source workspace before each on-cluster build#3583
Ankitsinghsisodya wants to merge 5 commits intoknative:mainfrom
Ankitsinghsisodya:fix/pvc-workspace-cleanup-between-builds

Conversation

@Ankitsinghsisodya
Copy link
Copy Markdown
Contributor

@Ankitsinghsisodya Ankitsinghsisodya commented Apr 4, 2026

Root cause

source/ and cache/ share a single 256 Mi PVC. The buildpack layer cache (cache/) grows across builds while source/ is already cleaned by git-clone's deleteExisting default. Once cache/ consumes all available space, git-clone fails with "No space left on device" during git-init — affecting both the standard pipeline and PAC paths.

Fix

Separate source and cache into two distinct PVCs so they can no longer starve each other:

Path Source workspace Cache workspace
Standard pipeline (non-Git upload) Named source PVC — deleted and recreated before every build Separate named cache PVC — persistent across builds
Standard pipeline (Git URL) Same — fresh PVC every run Same persistent cache PVC
PAC pipeline volumeClaimTemplate — Tekton provisions a fresh ephemeral PVC per run and auto-deletes it Separate named cache PVC — persistent across builds

Buildpack layer cache is preserved. Every build starts from a clean source workspace. The two volumes can never fill each other's disk.

Changes

  • pkg/k8s/persistent_volumes.go — add DeletePersistentVolumeClaim (by name) and CleanAndUploadToVolume
  • pkg/pipelines/tekton/resources.go — add getPipelineCachePvcName
  • pkg/pipelines/tekton/pipelines_provider.go — delete+recreate source PVC before each standard build; create cache PVC separately
  • pkg/pipelines/tekton/pipelines_pac_provider.go — create cache PVC at repo-config time
  • templates.go — add CachePvcName, PvcSize fields; add pipelinePvcSize helper
  • templates_pack.go / templates_s2i.go — standard runs reference separate PVCs; PAC runs use volumeClaimTemplate for source
  • task-buildpack.yaml.tmpl / task-s2i.yaml.tmpl — revert redundant deleteExisting: "true" (already the default in the pinned StepAction)

Fixes #3516

Test plan

  • Run func deploy (non-Git) 10+ times; confirm source PVC is recreated fresh each time and PVC never fills
  • Run func deploy (Git URL) 10+ times; confirm git-clone always succeeds and cache PVC grows independently
  • Trigger 10+ PAC builds via git push; confirm each run gets an ephemeral source PVC that disappears after completion
  • Confirm func delete removes both source and cache PVCs
  • go test ./pkg/k8s/... ./pkg/pipelines/tekton/...

The named PVC reused across builds was accumulating stale source files,
eventually causing 'No space left on device' errors during git-init.

- Add CleanAndUploadToVolume which removes the source/ directory before
  re-extracting the tar stream, leaving the cache/ subpath intact
- Switch the direct-upload path to use CleanAndUploadToVolume
- Pass deleteExisting: "true" to the git-clone StepAction in both
  buildpack and s2i task templates so the git path also gets a clean slate

Fixes knative#3516
Copilot AI review requested due to automatic review settings April 4, 2026 17:56
@knative-prow
Copy link
Copy Markdown

knative-prow bot commented Apr 4, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Ankitsinghsisodya
Once this PR has been reviewed and has the lgtm label, please assign gauron99 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow bot requested review from dsimansk and jrangelramos April 4, 2026 17:56
@knative-prow knative-prow bot added the size/S 🤖 PR changes 10-29 lines, ignoring generated files. label Apr 4, 2026
@knative-prow
Copy link
Copy Markdown

knative-prow bot commented Apr 4, 2026

Hi @Ankitsinghsisodya. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@knative-prow knative-prow bot added the needs-ok-to-test 🤖 Needs an org member to approve testing label Apr 4, 2026
@Ankitsinghsisodya Ankitsinghsisodya marked this pull request as draft April 4, 2026 17:57
@knative-prow knative-prow bot added the do-not-merge/work-in-progress 🤖 PR should not merge because it is a work in progress. label Apr 4, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses PVC workspace bloat in on-cluster Tekton builds by ensuring each build starts from a clean source/ workspace while preserving the cache/ path for build caching.

Changes:

  • Added k8s.CleanAndUploadToVolume to remove the PVC’s source/ directory before extracting a new source tar stream (leaving cache/ intact).
  • Switched the non-Git (direct upload) pipeline path to use CleanAndUploadToVolume.
  • Configured Tekton git-clone StepAction to wipe the existing checkout via deleteExisting: "true" for Git-based builds.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
pkg/k8s/persistent_volumes.go Adds a new clean+upload helper that removes source/ before untarring content.
pkg/pipelines/tekton/pipelines_provider.go Updates direct-upload pipeline flow to use the clean+upload helper.
pkg/pipelines/tekton/task-buildpack.yaml.tmpl Sets deleteExisting: "true" for git-clone to avoid accumulating workspace content.
pkg/pipelines/tekton/task-s2i.yaml.tmpl Sets deleteExisting: "true" for git-clone to avoid accumulating workspace content.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +82 to +88
// CleanAndUploadToVolume removes the "source" directory from the volume root,
// then extracts the provided tar stream into the volume. The "cache"
// subdirectory is intentionally left intact so that build-layer caches
// accumulated by previous runs are preserved.
func CleanAndUploadToVolume(ctx context.Context, content io.Reader, claimName, namespace string) error {
return runWithVolumeMounted(ctx, TarImage, []string{"sh", "-c", "umask 0000 && rm -rf source && exec tar -xmf -"}, content, claimName, namespace)
}
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CleanAndUploadToVolume introduces new cleanup behavior (deleting the existing source/ directory while preserving cache/), but there’s no test covering that contract. Consider adding an integration test that (1) pre-populates both source/ and cache/, (2) calls CleanAndUploadToVolume, and (3) asserts source/ is replaced and cache/ remains intact, to prevent regressions that could reintroduce PVC bloat or accidentally wipe caches.

Copilot uses AI. Check for mistakes.
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 4, 2026

Codecov Report

❌ Patch coverage is 27.27273% with 40 lines in your changes missing coverage. Please review.
✅ Project coverage is 47.11%. Comparing base (3e493ac) to head (86cd2c5).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
pkg/k8s/persistent_volumes.go 0.00% 29 Missing ⚠️
pkg/pipelines/tekton/pipelines_provider.go 11.11% 8 Missing ⚠️
pkg/pipelines/tekton/templates.go 82.35% 2 Missing and 1 partial ⚠️

❗ There is a different number of reports uploaded between BASE (3e493ac) and HEAD (86cd2c5). Click for more details.

HEAD has 9 uploads less than BASE
Flag BASE (3e493ac) HEAD (86cd2c5)
e2e 3 2
e2e springboot 1 0
e2e node 1 0
e2e typescript 1 0
e2e go 1 0
e2e rust 1 0
e2e python 1 0
integration 1 0
e2e quarkus 1 0
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3583      +/-   ##
==========================================
- Coverage   56.24%   47.11%   -9.14%     
==========================================
  Files         180      180              
  Lines       20465    20501      +36     
==========================================
- Hits        11511     9659    -1852     
- Misses       7755     9951    +2196     
+ Partials     1199      891     -308     
Flag Coverage Δ
e2e 22.67% <0.00%> (-13.43%) ⬇️
e2e go ?
e2e node ?
e2e python ?
e2e quarkus ?
e2e rust ?
e2e springboot ?
e2e typescript ?
e2e-config-ci 18.10% <0.00%> (-0.04%) ⬇️
integration ?
unit macos-14 43.31% <31.91%> (-0.06%) ⬇️
unit macos-latest 43.31% <31.91%> (-0.06%) ⬇️
unit ubuntu-24.04-arm 43.51% <27.27%> (-0.07%) ⬇️
unit ubuntu-latest 44.19% <31.91%> (-0.06%) ⬇️
unit windows-latest 43.33% <31.91%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…full

Root cause: source/ and cache/ shared a single 256Mi PVC. The buildpack
layer cache (cache/) grows across builds while source/ is already cleaned
by git-clone's deleteExisting default. Once cache/ consumes all available
space, git-clone fails with "No space left on device" during git-init.

Changes:
- Add getPipelineCachePvcName and a by-name DeletePersistentVolumeClaim
- Standard pipeline: delete and recreate the source PVC before every build
  for a guaranteed clean workspace; create a separate persistent cache PVC
  so the buildpack layer cache survives across builds
- PAC pipeline: use volumeClaimTemplate for the source workspace so Tekton
  provisions a fresh, ephemeral source PVC per run and cleans it up
  automatically; create a persistent cache PVC at repo-config time
- Revert the redundant deleteExisting: "true" additions to both task
  templates — the pinned git-clone StepAction already defaults to true

Fixes knative#3516
@knative-prow knative-prow bot added size/L 🤖 PR changes 100-499 lines, ignoring generated files. and removed size/S 🤖 PR changes 10-29 lines, ignoring generated files. labels Apr 4, 2026
- Restore subPath: source on the standard PipelineRun source-workspace
  binding; sourcesAsTarStream archives under source/ so the workspace
  mount must point at that subPath for the task to find func.yaml at
  the expected path

- Honour build.remoteStorageClass in PAC volumeClaimTemplate by adding
  a StorageClassName field to templateData and conditionally emitting
  storageClassName in both pack and s2i PAC PipelineRun templates

- Remove the dead source PVC creation from createClusterPACResources;
  PAC PipelineRuns now bind source-workspace via volumeClaimTemplate so
  the named source PVC was provisioned but never mounted
Two bugs:

1. Race condition: DeletePersistentVolumeClaim is asynchronous. The PVC
   enters Terminating state and the subsequent Create call was receiving
   AlreadyExists, treating it as success, and proceeding against a
   terminating object. Fix: add WaitForPVCDeletion which polls until the
   PVC is NotFound before returning, and call it after every delete.

2. Cache accumulation: the dedicated cache PVC was explicitly kept across
   builds and never trimmed. CNB does not automatically evict unused layers
   so the cache PVC would grow unboundedly over time — the same disk-full
   failure the fix was meant to prevent, just deferred. Fix: rotate both
   source and cache PVCs before every standard build. For PAC, switch cache
   workspace to volumeClaimTemplate so both workspaces are ephemeral per
   run with no long-lived PVCs on the cluster at all.
…create

Two bugs:

1. coschedule constraint: both standard pipeline templates bound source and
   cache to distinct named PVCs. Under Tekton's stable coschedule:workspaces
   mode the affinity assistant creates one pod per PVC and tries to pin the
   same TaskRun to both nodes — an impossible constraint. Fix: switch cache
   to volumeClaimTemplate in both standard templates (matching PAC). Now
   each run has exactly one named PVC (source, required for pre-upload) and
   one Tekton-managed ephemeral PVC (cache), eliminating the conflicting
   affinity. Remove the now-dead createPipelineCachePersistentVolumeClaim,
   getPipelineCachePvcName, and CachePvcName templateData field.

2. AlreadyExists race: after delete+WaitForPVCDeletion a concurrent deploy
   can create the PVC in the window before our Create call. The suppressed
   IsAlreadyExists check would silently reuse that PVC, giving both deploys
   a false clean-workspace guarantee. Fix: remove the IsAlreadyExists
   suppression — return any Create error and let the caller retry. Update
   the corresponding test to assert the new wantErr:true behaviour.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress 🤖 PR should not merge because it is a work in progress. needs-ok-to-test 🤖 Needs an org member to approve testing size/L 🤖 PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

On-cluster build PersistentVolume fills up across builds

2 participants