Fix ZIP CRC regression in OffloadingOutputStream #431
Merged
…fragments
CachingOutputStream is inappropriate for temporary scatter fragments because it adds unnecessary complexity and potential for data corruption under high load on Linux. Switch to a plain BufferedOutputStream.
Explicitly manage and shut down the ExecutorService in ConcurrentJarCreator to avoid resource leaks after parallel archiving is complete.
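The ConcurrentJarCreator change itself is not reproduced here. As a generic sketch of the shutdown pattern this commit describes, assuming a fixed thread pool and using hypothetical names (ExecutorLifecycleSketch, runParallel), the executor can be released in a finally block so its worker threads never outlive the archiving run, even when a task fails:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the ExecutorService lifecycle; not ConcurrentJarCreator's code.
public class ExecutorLifecycleSketch {

    // Stand-in for parallel work: squares 0..tasks-1 on a pool and sums the results.
    static int runParallel(int tasks) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int n = i;
                futures.add(executor.submit(() -> n * n));
            }
            int sum = 0;
            for (Future<Integer> f : futures) {
                sum += f.get(); // rethrows a task failure as ExecutionException
            }
            return sum;
        } finally {
            executor.shutdown(); // stop accepting new tasks
            try {
                if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
                    executor.shutdownNow(); // cancel anything still running
                }
            } catch (InterruptedException e) {
                executor.shutdownNow();
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runParallel(4)); // 0 + 1 + 4 + 9 = 14
    }
}
```

Putting shutdown in finally means the pool is released on both the success path and the failure path, which is where leaks usually hide.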
- Refine ByteArrayOutputStream.reset() to safely handle empty buffers.
- Use the Path and Files API in DeferredScatterOutputStream and OffloadingOutputStream for better performance and reliability.
- Expose Path in OffloadingOutputStream to avoid unnecessary File object conversions.
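To illustrate the offloading idea together with the Path and Files API changes, here is a minimal sketch of a threshold-based stream that keeps small fragments in memory and moves larger ones to a temporary file, exposing that file as a Path. ThresholdOffloadingStream, its threshold parameter and all of its methods are hypothetical; this is not plexus-archiver's OffloadingOutputStream, only an instance of the same general technique.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: buffers in memory until `threshold` bytes, then offloads to a temp file.
public class ThresholdOffloadingStream extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream current = memory;
    private Path path; // stays null while the data fits in memory
    private long size;

    public ThresholdOffloadingStream(int threshold) {
        this.threshold = threshold;
    }

    private void switchToDiskIfNeeded(int incoming) throws IOException {
        if (path == null && size + incoming > threshold) {
            path = Files.createTempFile("fragment-", ".tmp");
            // Plain buffered write path via Files.newOutputStream, no caching layer.
            OutputStream disk = new BufferedOutputStream(Files.newOutputStream(path));
            memory.writeTo(disk); // move the bytes buffered so far onto disk
            memory = null;
            current = disk;
        }
    }

    @Override
    public void write(int b) throws IOException {
        switchToDiskIfNeeded(1);
        current.write(b);
        size++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        switchToDiskIfNeeded(len);
        current.write(b, off, len);
        size += len;
    }

    @Override
    public void flush() throws IOException {
        current.flush();
    }

    @Override
    public void close() throws IOException {
        current.close();
    }

    // Returned as a Path so callers can use the Files API without a File conversion.
    public Path getPath() {
        return path;
    }

    public boolean isInMemory() {
        return path == null;
    }

    public long getSize() {
        return size;
    }

    // Reads the content back from memory or from the temp file; call after close().
    public InputStream openInputStream() throws IOException {
        return path == null
                ? new ByteArrayInputStream(memory.toByteArray())
                : Files.newInputStream(path);
    }
}
```

Because the spill target is a Path, callers can hand it straight to Files.size, Files.newInputStream or Files.deleteIfExists. In this sketch the caller is responsible for deleting the temporary file once the fragment has been consumed.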
Avoid CachingOutputStream for temporary scatter fragments, as it adds unnecessary complexity and potential for data corruption under high load on Linux. Switch to a plain BufferedOutputStream instead.
Fixes #423
Root Cause Analysis
The investigation revealed that OffloadingOutputStream, which was introduced to reduce heap usage by offloading large ZIP entry fragments to disk, was using Streams.fileOutputStream to create its temporary backing files. Streams.fileOutputStream had recently been changed to always use CachingOutputStream (from plexus-utils).
CachingOutputStream is designed for final output files: it writes to a separate temporary file and only renames it to the target path on close if the content has changed. Using it for internal, temporary scatter fragments in a multi-threaded archiving process added unnecessary complexity, namely redundant temporary files and renames. This double layering was the likely source of the transient data corruption and visibility issues on Linux that led to the reported CRC mismatches.
Fix
I modified OffloadingOutputStream to use Files.newOutputStream directly, wrapped in a BufferedOutputStream for performance. This bypasses CachingOutputStream for these temporary fragments, ensuring a more direct and reliable write path to disk.
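Since the reported symptom was a CRC mismatch, one cheap sanity check for this write path is to compute a CRC32 while writing through BufferedOutputStream(Files.newOutputStream(...)) and compare it with a CRC32 of the bytes read back from disk. FragmentCrcCheck, writeFragment and readBackCrc are hypothetical names used only for illustration; CRC32, CheckedOutputStream and CheckedInputStream are the standard java.util.zip classes.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

// Hypothetical helpers illustrating the direct write path; not plexus-archiver's code.
public class FragmentCrcCheck {

    // Writes data via BufferedOutputStream(Files.newOutputStream(path)) and returns its CRC32.
    static long writeFragment(Path path, byte[] data) throws IOException {
        CRC32 crc = new CRC32();
        try (OutputStream out = new CheckedOutputStream(
                new BufferedOutputStream(Files.newOutputStream(path)), crc)) {
            out.write(data); // the checksum is updated as bytes pass through
        }
        return crc.getValue();
    }

    // Reads the fragment back and returns the CRC32 of what is actually on disk.
    static long readBackCrc(Path path) throws IOException {
        CRC32 crc = new CRC32();
        try (InputStream in = new CheckedInputStream(Files.newInputStream(path), crc)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // CheckedInputStream updates crc as a side effect of each read
            }
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("fragment-", ".tmp");
        try {
            byte[] data = new byte[100_000];
            new Random(42).nextBytes(data);
            long written = writeFragment(tmp, data);
            long onDisk = readBackCrc(tmp);
            System.out.println(written == onDisk ? "CRC match" : "CRC MISMATCH");
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

A single-threaded round trip like this will not reproduce a load-dependent problem on its own, but it gives a simple invariant that a concurrent stress test could assert per fragment.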