@CarterFendley
Contributor

What changes were proposed in this pull request?

Bump Parquet to 1.15.1. This is a backport of #50319.

Why are the changes needed?

Release notes: https://github.com/apache/parquet-java/releases/tag/apache-parquet-1.15.1

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass GHA.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the BUILD label Apr 14, 2025
@CarterFendley
Contributor Author

Tagging @pan3793 @LuciferYang @cloud-fan @yaooqinn @the-sakthi @dongjoon-hyun for visibility.

@github-actions github-actions bot added the INFRA label Apr 14, 2025
@HyukjinKwon
Member

branch-3.4 is EOL

@dongjoon-hyun
Member

Yes, as mentioned above, according to the Apache Spark versioning policy, no more 3.4.x releases should be expected after that point, even for bug fixes. Let me close this PR to prevent accidental merging.

We can continue the discussion on this closed PR, @CarterFendley.

@CarterFendley
Contributor Author

@dongjoon-hyun @HyukjinKwon Yes, I understand this.

However, I would recommend that an exception be made in this case. This is not a bug fix; it is a security patch.

The issue in question, CVE-2025-30065, currently has the maximum severity rating. Spark 3.4 is only a little over a year old and is likely to still have many active users.

@CarterFendley
Contributor Author

@dongjoon-hyun May I get a comment here?

@CarterFendley
Contributor Author

Especially given that it currently has the maximum 10/10 CVSS-B score, I think this vulnerability may warrant an exception.

@dongjoon-hyun
Member

I'd recommend using the latest supported versions once Parquet is ready. AFAIK, there is no complete fix (or ETA) yet, is there, @CarterFendley?

@HyukjinKwon
Member

@CarterFendley The CVE only affects parquet-avro it says. Would you mind assessing this, and how it affects Apache Spark itself, before arguing for an exception to its release policy?

@CarterFendley
Contributor Author

Sorry for the late reply @HyukjinKwon, again I really appreciate your responsiveness on this.

The CVE only affects parquet-avro it says.

Yep. I am unsure whether parquet-avro is used by the other modules (parquet-column, parquet-hadoop, etc.) that Spark depends on. It seems that Apache Drill is also updating out of caution, and another user opened an issue, SPARK-51795, to do the same thing for Spark.

Can you assure me that these other packages do not use the vulnerable parts of the parquet-avro module?

@HyukjinKwon
Member

As the author of this proposed change, would you mind investigating how it affects Apache Spark and sharing the findings, since you are asking for an exception to make a release from an EOL branch?

@CarterFendley
Contributor Author

@HyukjinKwon I can try to take a look; it may be that those modules are unconnected. That said, with a CVE of this severity, I would feel better if someone were able to double-check my findings.

@HyukjinKwon
Member

I already roughly checked, and it doesn't actually affect Apache Spark. But I am asking so that we can double-check :-).

@CarterFendley
Contributor Author

Okay, I think I agree.

It looks like the only other module in parquet-java that depends on parquet-avro is the parquet-cli module. So parquet-column and parquet-hadoop, the Apache Parquet modules that Spark does depend on, appear to be unconnected to the vulnerable parquet-avro module.

There is a test dependency on parquet-avro, but not one that causes it to be distributed with Spark. I have double-checked some systems with Spark 3.4 installed, and the parquet-avro module is not present there. Good news 😄 🥳

The only suggestion I have would be to update this Spark example, which may lead users to install vulnerable versions of parquet-avro. Since that is not a matter of Spark distributing parquet-avro and is more of a user-side issue, there is probably less need for it to be backported. I would be happy to open a PR against master to update it if that would be helpful, @HyukjinKwon.
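
For anyone who wants to repeat that check on their own installation, here is a minimal sketch. It is not part of Spark or Parquet tooling. It assumes the jars follow the usual `parquet-avro-<version>.jar` naming, and it treats 1.15.1 (the version this PR bumps to) as the threshold; both `find_parquet_avro` and the `fixed_version` parameter are hypothetical names made up for illustration.

```python
import re
from pathlib import Path

# Matches file names such as "parquet-avro-1.13.1.jar".
# The naming pattern is an assumption; jars with classifiers or
# SNAPSHOT suffixes are skipped and should be inspected by hand.
JAR_PATTERN = re.compile(r"^parquet-avro-(\d+(?:\.\d+)*)\.jar$")


def parse_version(text):
    """Turn "1.15.1" into (1, 15, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def find_parquet_avro(jars_dir, fixed_version="1.15.1"):
    """Return (file name, version, older_than_fixed) for each parquet-avro jar.

    fixed_version is an assumption taken from this PR; adjust it if a
    later Parquet release is the one you need.
    """
    threshold = parse_version(fixed_version)
    results = []
    for jar in sorted(Path(jars_dir).glob("parquet-avro-*.jar")):
        match = JAR_PATTERN.match(jar.name)
        if match is None:
            continue
        version = match.group(1)
        results.append((jar.name, version, parse_version(version) < threshold))
    return results
```

Calling `find_parquet_avro` on the `jars` directory of a Spark installation (the exact path depends on your setup) and getting an empty list back would be consistent with what I saw on the 3.4 systems I checked.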

Thank you maintainers, appreciate the feedback here ❤️

3 participants