[WIP][SPARK-54571][CORE] Use LZ4 safeDecompressor #53290
dbtsai wants to merge 1 commit into apache:master from
|
It's more involved than I thought, as LZ4BlockInputStream doesn't accept a safeDecompressor. I will take a deeper look tomorrow. |
|
It seems that the fix for CVE-2025-12183 wasn't implemented until version 1.8.1, but Spark is still using version 1.8.0. |
|
Note that LZ4BlockInputStream does not support safeDecompressor in lz4-java 1.8.1. If you upgrade to that version, it will still work and be secure, but performance will be much worse than in 1.8.0. lz4-java 1.10.0 introduces a new builder for LZ4BlockInputStream that accepts a safeDecompressor. |
|
It is published, but only under the new group id |
|
@yawkat, which group id does 1.10.0 publish? |
|
we need change to use |
|
Thank you for the updated info, @yawkat , @LuciferYang . Could you update this PR, @dbtsai ? |
|
To all, in order to help this PR, I made an independent PR for dependency upgrade. |
|
We upgraded to |
|
I recommend you wait a few hours with releasing this. Another (smaller, unrelated) CVE has been found in lz4-java. |
|
CVE-2025-66566 has been published and fixed in 1.10.1. I suggest you move to that version. Note, though, that Cloudflare seems to be having some trouble that is breaking Maven Central at the moment. |
Just FYI, we have no intention to hurry this, @yawkat . To be safe, this will be tested in That's the main reason why LZ4 1.10.0 PR is only in |
|
Gentle ping once more, @dbtsai . |
|
Gentle ping, @dbtsai . |
|
The PR description mentions that |
|
I am not aware of spark benchmarks, but as of 1.8.1, safeDecompressor is substantially faster than fastDecompressor. In earlier versions, the difference was minor. |
|
@yawkat, how about the performance for 1.10.0? @dongjoon-hyun has already bumped to 1.10.0. |
|
Performance between 1.8.1 and 1.10.1 has not changed substantially. |
|
@mridulm I found a perf report at yawkat/lz4-java#3 (comment), but it does not include the underlying data. |
|
@mridulm @yawkat @dbtsai @dongjoon-hyun @SteNicholas I created #53453 to add an lz4 benchmark based on TPCDS |
|
The underlying lz4 library was updated in 1.9.0 so a performance difference is possible. |
|
While this is being merged, would you consider "setting As far as I saw, only this config key has LZ4 as default. Also, it seems like |
|
@Dzeri96, setting |
|
Closing in favor of #54496 |
…gression

### What changes were proposed in this pull request?

Previously, lz4-java was upgraded to 1.10.x to address CVEs:

- #53327
- #53347
- #53971

However, this causes a significant performance drop; see the benchmark report at:

- #53453

This PR follows the [suggestion](#53290 (comment)) to migrate to safeDecompressor.

### Why are the changes needed?

Mitigate performance regression.

### Does this PR introduce _any_ user-facing change?

No, except for performance.

### How was this patch tested?

GHA for functionality, [benchmark](#53453 (comment)) for performance.

> TL;DR - my test results show lz4-java 1.10.1 is about 10~15% slower on lz4 compression than 1.8.0, and about 5% slower on lz4 decompression even after migrating to the suggested safeDecompressor

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #53454 from pan3793/SPARK-54571.

Lead-authored-by: Cheng Pan <chengpan@apache.org>
Co-authored-by: pan3793 <pan3793@users.noreply.github.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
… CVE-2025-12183 and CVE-2025-66566

### What changes were proposed in this pull request?

- Bump lz4-java version from 1.8.0 to 1.10.4 to resolve CVE-2025-12183 and CVE-2025-66566.
- `Lz4Decompressor` follows the [suggestion](apache/spark#53290 (comment)) to move from `fastDecompressor` to `safeDecompressor` to mitigate the performance regression.

Backport:

- apache/spark#53327
- apache/spark#53347
- apache/spark#53971
- apache/spark#53454
- apache/spark#54585

### Why are the changes needed?

- [CVE-2025-12183](https://sites.google.com/sonatype.com/vulnerabilities/cve-2025-12183): Various lz4-java compression and decompression implementations do not guard against out-of-bounds memory access. Untrusted input may lead to denial of service and information disclosure. Vulnerable Maven coordinates: org.lz4:lz4-java up to and including 1.8.0.
- [CVE-2025-66566](GHSA-cmp6-m4wj-q63q): Insufficient clearing of the output buffer in Java-based decompressor implementations in lz4-java 1.10.0 and earlier allows remote attackers to read previous buffer contents via crafted compressed input. In applications where the output buffer is reused without being cleared, this may lead to disclosure of sensitive data. JNI-based implementations are not affected.

Therefore, lz4-java should be upgraded to 1.10.4.

### Does this PR resolve a correctness bug?

No.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

Closes #3555 from SteNicholas/CELEBORN-2218.

Lead-authored-by: SteNicholas <programgeek@163.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: SteNicholas <programgeek@163.com>
… CVE-2025-12183 and CVE-2025-66566

### What changes were proposed in this pull request?

- Bump lz4-java version from 1.8.0 to 1.10.4 to resolve CVE-2025-12183 and CVE-2025-66566.
- `Lz4Decompressor` follows the [suggestion](apache/spark#53290 (comment)) to move from `fastDecompressor` to `safeDecompressor` to mitigate the performance regression.

Backport:

- apache/spark#53327
- apache/spark#53347
- apache/spark#53971
- apache/spark#53454
- apache/spark#54585

### Why are the changes needed?

- [CVE-2025-12183](https://sites.google.com/sonatype.com/vulnerabilities/cve-2025-12183): Various lz4-java compression and decompression implementations do not guard against out-of-bounds memory access. Untrusted input may lead to denial of service and information disclosure. Vulnerable Maven coordinates: org.lz4:lz4-java up to and including 1.8.0.
- [CVE-2025-66566](GHSA-cmp6-m4wj-q63q): Insufficient clearing of the output buffer in Java-based decompressor implementations in lz4-java 1.10.0 and earlier allows remote attackers to read previous buffer contents via crafted compressed input. In applications where the output buffer is reused without being cleared, this may lead to disclosure of sensitive data. JNI-based implementations are not affected.

Therefore, lz4-java should be upgraded to 1.10.4.

### Does this PR resolve a correctness bug?

No.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

Closes #3555 from SteNicholas/CELEBORN-2218.

Lead-authored-by: SteNicholas <programgeek@163.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: SteNicholas <programgeek@163.com>
(cherry picked from commit dca3749)
Signed-off-by: SteNicholas <programgeek@163.com>
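
For readers trying to tell which of the two CVEs a given lz4-java version is still exposed to, here is a minimal sketch in plain Java. It relies only on the fix versions reported in this thread (1.8.1 for CVE-2025-12183, 1.10.1 for CVE-2025-66566). The class `Lz4JavaAdvisories` and its methods are hypothetical names made up for illustration; they are not part of lz4-java, Spark, or Celeborn, and the sketch assumes purely numeric dotted versions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of lz4-java or Spark.
final class Lz4JavaAdvisories {
    // First fixed releases, as reported in this thread.
    static final String FIX_CVE_2025_12183 = "1.8.1";
    static final String FIX_CVE_2025_66566 = "1.10.1";

    /** Compares dotted numeric versions component by component (assumes digits only). */
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            // Missing components count as 0, so "1.10" equals "1.10.0".
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** Returns the CVEs from this thread that are not yet fixed in the given version. */
    static List<String> openAdvisories(String version) {
        List<String> open = new ArrayList<>();
        if (compareVersions(version, FIX_CVE_2025_12183) < 0) {
            open.add("CVE-2025-12183");
        }
        if (compareVersions(version, FIX_CVE_2025_66566) < 0) {
            open.add("CVE-2025-66566");
        }
        return open;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.8.0", "1.8.1", "1.10.0", "1.10.1", "1.10.4"}) {
            System.out.println(v + " -> " + openAdvisories(v));
        }
    }
}
```

The comparison is numeric on purpose: a plain string comparison would sort 1.10.0 before 1.8.1 and wrongly report it as still affected by CVE-2025-12183. Under these assumptions, 1.8.0 is reported as open to both CVEs, 1.8.1 and 1.10.0 only to CVE-2025-66566, and 1.10.1 and 1.10.4 to neither.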


What changes were proposed in this pull request?
In recent LZ4 versions, safeDecompressor has become highly optimized and can be as fast as, or sometimes even faster than, fastDecompressor. So it makes sense to switch to safeDecompressor.
Why are the changes needed?
Per https://sites.google.com/sonatype.com/vulnerabilities/cve-2025-12183, it is recommended to switch to .safeDecompressor(), which is not vulnerable and provides better performance.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests
Was this patch authored or co-authored using generative AI tooling?
No