Fix ZIP CRC regression in OffloadingOutputStream #431

Merged
slachiewicz merged 3 commits into master from gh-423 on Jun 8, 2026

slachiewicz (Member) commented Jun 8, 2026

Avoid CachingOutputStream for temporary scatter fragments: it adds unnecessary complexity and may corrupt data under high load on Linux. This PR switches to a plain BufferedOutputStream. Fixes #423

Root Cause Analysis
OffloadingOutputStream, which was introduced to reduce heap usage by offloading large ZIP entry fragments to disk, was using Streams.fileOutputStream to create its temporary backing files. Streams.fileOutputStream was recently updated to always use CachingOutputStream (from plexus-utils).

CachingOutputStream is designed for final output files; it writes to a separate temporary file and only renames it to the target path upon closing if the content has changed. Using it for internal, temporary scatter fragments in a multi-threaded archiving process introduced unnecessary complexity, specifically redundant temporary files and renames. This double layering was the likely source of the transient data corruption and visibility issues on Linux that led to the reported CRC mismatches.

Fix
I modified OffloadingOutputStream to use Files.newOutputStream directly, wrapped in a BufferedOutputStream for performance. This bypasses CachingOutputStream for these temporary fragments, ensuring a more direct and reliable write path to disk.
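
For illustration, the direct write path described above can be sketched with JDK classes only. This is not the actual OffloadingOutputStream code: the class name `TempFragmentSketch`, the helpers `openFragment`, `crcOf` and `roundTrip`, and the temp-file prefix are all hypothetical. It writes a payload to a temporary file through a BufferedOutputStream over Files.newOutputStream, reads it back, and compares CRC32 values.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical sketch; names are illustrative and not taken from the PR.
class TempFragmentSketch {

    /** Opens a buffered stream that writes straight to the given path (no intermediate file). */
    static OutputStream openFragment(Path fragment) throws IOException {
        return new BufferedOutputStream(Files.newOutputStream(fragment));
    }

    /** Computes the CRC32 checksum of a byte array. */
    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Writes the payload to a fresh temp file, reads it back, and reports whether the CRCs match. */
    static boolean roundTrip(byte[] payload) {
        try {
            Path fragment = Files.createTempFile("scatter-fragment", ".tmp");
            try {
                // Closing the stream flushes the buffer before the file is read back.
                try (OutputStream out = openFragment(fragment)) {
                    out.write(payload);
                }
                byte[] readBack = Files.readAllBytes(fragment);
                return crcOf(payload) == crcOf(readBack);
            } finally {
                Files.deleteIfExists(fragment);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = "example entry data".getBytes(StandardCharsets.UTF_8);
        System.out.println("CRC matches after round trip: " + roundTrip(payload));
    }
}
```

The point of the sketch is that the fragment path is the only file involved: bytes go from the buffer to that path, and nothing is renamed on close.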

…fragments

CachingOutputStream is inappropriate for temporary scatter fragments as it
adds unnecessary complexity and potential for data corruption under high
load on Linux. Switching to a plain BufferedOutputStream.
@slachiewicz slachiewicz added the bug label Jun 8, 2026
- Explicitly manage and shut down the ExecutorService in ConcurrentJarCreator to avoid resource leaks after parallel archiving is complete.
- Refine ByteArrayOutputStream.reset() to safely handle empty buffers.
- Use Path and Files API in DeferredScatterOutputStream and OffloadingOutputStream for better performance and reliability.
- Expose Path in OffloadingOutputStream to avoid unnecessary File object conversions.
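
As a rough illustration of the ExecutorService item, the sketch below shows one common way to own a pool for the duration of a batch of work and shut it down afterwards. It does not reproduce ConcurrentJarCreator: the class `ExecutorShutdownSketch` and the methods `runAll` and `shutdownAndAwait` are hypothetical, and the shutdown sequence follows the usage pattern from the ExecutorService Javadoc.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names are illustrative and not taken from the PR.
class ExecutorShutdownSketch {

    /** Shuts the pool down and waits for it to terminate; returns true once it has terminated. */
    static boolean shutdownAndAwait(ExecutorService pool, long timeoutSeconds) {
        pool.shutdown(); // reject new tasks, let queued ones finish
        try {
            if (!pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // interrupt tasks that are still running
                return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
            }
            return true;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    /** Runs all tasks on a pool owned by this method and sums their results. */
    static int runAll(List<Callable<Integer>> tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int total = 0;
            // invokeAll blocks until every task has completed.
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            // The pool is released whether the tasks succeeded or failed.
            shutdownAndAwait(pool, 10);
        }
    }
}
```

Putting the shutdown in a `finally` block means worker threads are released even when a task fails, which is the leak this kind of change guards against.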
@slachiewicz slachiewicz merged commit 1ff67b2 into master Jun 8, 2026
17 checks passed
@slachiewicz slachiewicz deleted the gh-423 branch June 8, 2026 19:24

Linked issue: Broken ZIP archive with bad CRC on Linux (possible regression)