@wangyum
Member

@wangyum wangyum commented Apr 7, 2025

What changes were proposed in this pull request?

Bump Parquet to 1.15.1.

Why are the changes needed?

To fix critical CVE: https://www.cve.org/CVERecord?id=CVE-2025-30065

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass GHA.

Was this patch authored or co-authored using generative AI tooling?

No.

Fokko and others added 7 commits April 7, 2025 16:40
Fixes quite a few bugs on the Parquet side: https://github.com/apache/parquet-mr/blob/master/CHANGES.md#version-1140

No

Using the existing unit tests

No

Closes apache#46447 from Fokko/fd-bump-parquet.

Authored-by: Fokko Driesprong <fokko@tabular.io>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

This PR aims to upgrade Parquet to 1.14.2.

To bring the latest bug fixes.
- https://mvnrepository.com/artifact/org.apache.parquet/parquet-common/1.14.2

No.

Pass the CIs.

No.

Closes apache#47807 from Fokko/fd-parquet.

Lead-authored-by: Fokko <fokko@apache.org>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

This PR aims to upgrade `Parquet` from `1.14.2` to `1.14.3`.

The full release notes: https://github.com/apache/parquet-java/releases/tag/apache-parquet-1.14.3
apache/parquet-java#3007: Ensure version specific Jackson classes are shaded
apache/parquet-java#3013: Fix potential ClassCastException at reading DELTA_BYTE_ARRAY encoding

No.

Pass GA.

No.

Closes apache#48378 from panbingkun/SPARK-49903.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

Bumping Apache Parquet to 1.14.4 because of a critical bug when writing a dictionary larger than 8kb. For a full overview of bugfixes, see: https://github.com/apache/parquet-java/releases/tag/apache-parquet-1.14.4

A serious issue was discovered in the 1.14.x line: https://github.com/apache/parquet-java/releases/tag/apache-parquet-1.14.4-rc2

No

Existing unit tests.

No

Closes apache#48790 from Fokko/fd-bump-parquet-java.

Lead-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Fokko <fokko@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

Bumps to the latest version of Parquet.

For the full list of changes, please check the pre-release:

https://github.com/apache/parquet-java/releases/tag/apache-parquet-1.15.0

Including some interesting patches for Spark, such as apache/parquet-java#3030

To bring the latest features and bug fixes for Apache Spark 4.0.0.

No.

Pass the CIs.

No.

Closes apache#48970 from Fokko/fd-parquet-1-15-0.

Authored-by: Fokko <fokko@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

Bump Parquet to 1.15.1.

Release notes: https://github.com/apache/parquet-java/releases/tag/apache-parquet-1.15.1

No.

Pass GHA.

No

Closes apache#50319 from pan3793/SPARK-51549.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

Comment on lines +2666 to +2671
<exclusions>
<exclusion>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
</exclusion>
</exclusions>
Member Author

To fix:

 build/sbt "sql/testOnly *.JDBCSuite"

sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.UnsupportedClassVersionError: org/h2/Driver has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
	at java.lang.ClassLoader.defineClass1(Native Method)
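
The class file versions in the error above map to Java releases: 52.0 is Java 8 and 55.0 is Java 11, so the H2 driver on the test classpath was compiled for Java 11 while the tests ran on Java 8. As a minimal sketch of how to check this for any class, the snippet below reads the major version from a class file header. The class name `ClassFileVersion` and its method names are hypothetical and are not part of Spark or H2.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of Spark or H2.
public class ClassFileVersion {

    /** Maps a class file major version to its Java release (valid for Java 5 and later). */
    public static int javaRelease(int major) {
        if (major < 49) {
            throw new IllegalArgumentException("major versions below 49 predate Java 5: " + major);
        }
        return major - 44; // 52 -> 8, 55 -> 11
    }

    /** Extracts the major version from the first 8 bytes of a class file. */
    public static int majorVersion(byte[] header) {
        if (header.length < 8) {
            throw new IllegalArgumentException("need at least 8 header bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(header); // class files are big-endian, the ByteBuffer default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file (bad magic number)");
        }
        buf.getShort();                  // minor version, ignored here
        return buf.getShort() & 0xFFFF;  // major version as an unsigned 16-bit value
    }

    public static void main(String[] args) throws IOException {
        byte[] header = new byte[8];
        // Inspect this class itself so the example runs without extra jars.
        try (InputStream in = ClassFileVersion.class.getResourceAsStream("ClassFileVersion.class")) {
            if (in == null) {
                throw new IOException("class resource not found");
            }
            new DataInputStream(in).readFully(header);
        }
        int major = majorVersion(header);
        System.out.println("major " + major + " -> Java " + javaRelease(major));
    }
}
```

To inspect the H2 driver instead, the resource path `/org/h2/Driver.class` could be passed to `getResourceAsStream`, assuming the H2 jar is on the classpath.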

LuciferYang pushed a commit that referenced this pull request Apr 9, 2025
### What changes were proposed in this pull request?

Bump Parquet to 1.15.1.

### Why are the changes needed?

To fix critical CVE: https://www.cve.org/CVERecord?id=CVE-2025-30065

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

Pass GHA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50528 from wangyum/parquet-branch-3.5.

Lead-authored-by: yumwang@ebay.com <yumwang@ebay.com>
Co-authored-by: Fokko <fokko@apache.org>
Co-authored-by: Fokko Driesprong <fokko@tabular.io>
Co-authored-by: panbingkun <panbingkun@baidu.com>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
@LuciferYang
Contributor

Merged into branch-3.5. Thanks @wangyum @HyukjinKwon @dongjoon-hyun @pan3793

@LuciferYang LuciferYang closed this Apr 9, 2025
@wangyum wangyum deleted the parquet-branch-3.5 branch April 9, 2025 05:07
@CarterFendley
Contributor

@LuciferYang @dongjoon-hyun Do you know when we will get a release of this?

The latest 3.5.5 is still critically vulnerable.

@CarterFendley
Contributor

@dongjoon-hyun @LuciferYang Bumping. Is there anything I can do to help speed a release along?

@wangyum
Member Author

wangyum commented Apr 17, 2025

Hi @CarterFendley, a deadlock appears to occur after upgrading Parquet to 1.15.1. We are still investigating this issue. Please wait a few days.

@CarterFendley
Contributor

@wangyum Good to know, thanks for the update.

I am curious to know in what situations the deadlock appears. I have already patched some systems which use Spark 3.5 by replacing the Parquet jars directly. It would be nice to know what to look for to tell whether those systems will be affected. Are there specific test cases which I should run?

Also, is there a GitHub or JIRA issue where progress on the deadlock is being tracked? It would be very helpful to be able to follow along there.
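
While the conditions that trigger the deadlock are not described in this thread, a generic JVM-level check is possible. The sketch below uses the JDK's `ThreadMXBean` to list threads that the JVM reports as deadlocked. It is not specific to Parquet or Spark, the class name `DeadlockCheck` is hypothetical, and it only detects cycles over object monitors and `java.util.concurrent` ownable synchronizers, so a hang of another kind would not show up.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; a generic JVM check, not a Parquet-specific test.
public class DeadlockCheck {

    /** Returns one description per thread the JVM currently reports as deadlocked. */
    public static List<String> deadlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is detected
        List<String> result = new ArrayList<>();
        if (ids == null) {
            return result;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) { // a thread may have exited between the two calls
                result.add(info.getThreadName() + " waiting on " + info.getLockName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> found = deadlockedThreads();
        System.out.println(found.isEmpty() ? "no deadlock detected" : "deadlocked: " + found);
    }
}
```

The same information is available without code changes by taking a thread dump of the running JVM with `jstack <pid>`, which prints a deadlock section when it finds one.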

Really appreciate your work on this!

@HyukjinKwon
Member

For the record, we're not affected by the CVE. The jar isn't included in the release. We can just revert this upgrade if the deadlock matters. Please correct me if I am wrong.

@dongjoon-hyun
Member

After checking again, I agree with @HyukjinKwon's comment. Let me revert this from branch-3.5.

@dongjoon-hyun
Member

dongjoon-hyun commented Apr 18, 2025

This is reverted from branch-3.5 via the following commit.

To be clear, Parquet 1.15.1 is reverted to 1.13.1 only from the maintenance branch branch-3.5 while keeping master and branch-4.0.

@CarterFendley
Contributor

Also agreed here. I think Spark is in the clear.

Linking the other PR here for anyone interested: #50583 (comment)
