Skip to content

fix(install): retry download body stream, verification, and extraction#1719

Merged
fengmk2 merged 6 commits into
mainfrom
fix/flaky-download-retry
Jun 1, 2026
Merged

fix(install): retry download body stream, verification, and extraction#1719
fengmk2 merged 6 commits into
mainfrom
fix/flaky-download-retry

Conversation

@fengmk2
Copy link
Copy Markdown
Member

@fengmk2 fengmk2 commented May 30, 2026

Problem

CI intermittently failed with three different errors that share one root cause: the HTTP retry logic wrapped only the request setup (send().await?.error_for_status()), which returns as soon as the response headers arrive. The body streaming to disk, hash verification, and tar/archive extraction all happened after the retry closure returned, so a truncated or corrupt download surfaced immediately and was never retried.

Flaky tests observed across recent runs:

Test Crate Symptom
test_detect_package_manager_with_yarn_config_cjs vite_install TarError UnexpectedEof
test_download_success_package_manager_with_sha1_and_sha224 vite_install HashMismatch
test_execute_node_version vite_js_runtime (via vite_global_cli) node download failed

Fix

crates/vite_install/src/request.rs

  • download_file: the request and the body stream are now a single retried unit, plus a Content-Length truncation check (bytes written != advertised length → error → re-download).
  • download_and_extract_tgz_with_hash: wraps download → verify → extract in a content-integrity retry (is_retryable_extract_error retries only Io/HashMismatch; it deliberately excludes Reqwest — already retried inside download_file — so the caller's 404 → PackageManagerVersionNotFound mapping is preserved). Each attempt resets the target dir.

crates/vite_js_runtime/src/download.rs (node)

  • download_file: same network-layer fix — request + body stream as one retried unit with the Content-Length truncation check.

crates/vite_js_runtime/src/runtime.rs (node)

  • Wraps download → verify → extract in a content-integrity retry mirroring the install path, so a complete-but-corrupt node archive is re-downloaded instead of aborting. (download_file surfaces an exhausted download as DownloadFailed, which the predicate excludes, so there's no retry multiplication.)

Tests

  • Added two deterministic httpmock regression tests proving a corrupt archive and a hash mismatch are retried by the full pipeline (they assert the server receives >1 hit; both failed before the fix with "only attempted 1 time(s)").
  • cargo clippy -p vite_install -p vite_js_runtime --all-targets -- -D warnings → clean.
  • vite_js_runtime lib → 104 passed (incl. the real test_download_node_integration).
  • All three originally-flaky tests pass; the negative test_download_failed_package_manager_with_hash still returns HashMismatch.

Follow-ups (not in this PR)

  • Centralize the retry + truncation logic into a shared vite_shared download helper, which would also cover get_bytes/vp upgrade and prevent the next download site from regressing.
  • The Content-Length truncation check is a no-op for chunked responses (no Content-Length); the hash/extract retry is the backstop there.

Note

Medium Risk
Touches core download/install paths for package managers and Node with layered retries; behavior changes (404 handling preserved) but wrong predicates could over-retry or skip needed retries.

Overview
Fixes intermittent install/CI failures where HTTP retries only covered headers (send + error_for_status), so truncated bodies, hash mismatches, and bad archives failed once with no retry.

Network layer: download_file in vite_install and vite_js_runtime now retries request + body write as one unit and errors when bytes written ≠ advertised Content-Length (Node downloads also reset the progress bar each attempt).

Integrity layer: Package-manager tarballs use a single pipeline retry (download → hash → extract) via download_and_extract_tgz_with_hash_once, with is_retryable_download_error (retries I/O/hash/transient HTTP; not 404). Inner downloads use HttpClient::with_config(0, 0) to avoid nested retries. Node runtime install gets the same pattern with is_retryable_runtime_error after SHASUMS are resolved once.

Adds httpmock tests asserting corrupt archives and hash mismatches trigger more than one download attempt.

Reviewed by Cursor Bugbot for commit 0183c8f. Configure here.

CI intermittently failed with three different errors that share one root
cause: the HTTP retry logic wrapped only the request setup
(`send().await?.error_for_status()`), which returns as soon as the response
headers arrive. The body streaming to disk, hash verification, and tar/archive
extraction all happened *after* the retry closure returned, so a truncated or
corrupt download surfaced immediately as `UnexpectedEof` (tar extraction),
`HashMismatch` (verification), or a failed node download — none of which
triggered a retry.

Affected flaky tests:
- vite_install: test_detect_package_manager_with_yarn_config_cjs (UnexpectedEof)
- vite_install: test_download_success_package_manager_with_sha1_and_sha224 (HashMismatch)
- vite_js_runtime: test_execute_node_version (download failed)

Fix:
- request.rs: `download_file` now retries the request + body stream as a single
  unit and detects truncation (bytes written != advertised Content-Length).
  `download_and_extract_tgz_with_hash` retries the full download → verify →
  extract pipeline on transient content-integrity errors (Io / HashMismatch),
  resetting the target dir each attempt. Network errors stay inside
  `download_file`'s retry and the 404 -> PackageManagerVersionNotFound mapping
  is preserved.
- download.rs (node): `download_file` retries request + body stream with the
  same Content-Length truncation check.
- runtime.rs (node): wraps download → verify → extract in a content-integrity
  retry mirroring the install path, so a complete-but-corrupt node archive is
  re-downloaded instead of aborting.

Adds deterministic httpmock regression tests proving a corrupt archive and a
hash mismatch are retried by the full pipeline.
@fengmk2 fengmk2 self-assigned this May 30, 2026
@netlify
Copy link
Copy Markdown

netlify Bot commented May 30, 2026

Deploy Preview for viteplus-preview canceled.

Name Link
🔨 Latest commit cd020be
🔍 Latest deploy log https://app.netlify.com/projects/viteplus-preview/deploys/6a1cddfd6af9670008136686

@fengmk2
Copy link
Copy Markdown
Member Author

fengmk2 commented May 30, 2026

@cursor review

Comment thread crates/vite_install/src/request.rs
Cursor Bugbot flagged retry multiplication: a truncated download surfaced as
`Error::Io` from `download_file`'s own retry, and the outer
`download_and_extract_tgz_with_hash` retry also matched `Error::Io`, so a
persistent failure could trigger up to 4×4 = 16 download attempts.

Collapse to a single retry layer: `_once` now downloads with a no-retry client
(`HttpClient::with_config(0, 0)`), and the pipeline retry owns all resilience.
The predicate (`is_retryable_download_error`) retries transient network/HTTP
errors (except 404), truncated downloads, corrupt-archive extraction, and hash
mismatches; a 404 and permanent config errors fail fast and propagate unchanged
so the caller still maps 404 → PackageManagerVersionNotFound (now without
wasted retries).
@fengmk2
Copy link
Copy Markdown
Member Author

fengmk2 commented May 30, 2026

@cursor review

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 0183c8f. Configure here.

@fengmk2 fengmk2 marked this pull request as ready for review May 30, 2026 14:47
@fengmk2 fengmk2 requested review from Brooooooklyn and wan9chi May 30, 2026 14:47
@fengmk2 fengmk2 added test: e2e Auto run e2e tests test: install-e2e run vite install e2e test test: create-e2e Run `vp create` e2e tests labels May 30, 2026
@fengmk2 fengmk2 requested a review from cpojer June 1, 2026 01:18
@fengmk2 fengmk2 merged commit 4eaa478 into main Jun 1, 2026
93 checks passed
@fengmk2 fengmk2 deleted the fix/flaky-download-retry branch June 1, 2026 01:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

test: create-e2e Run `vp create` e2e tests test: e2e Auto run e2e tests test: install-e2e run vite install e2e test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants