@joeyutong
Contributor

Rationale for this change

Add a builder option to enable ByteStreamSplit encoding for a specific column when building a ParquetWriter.
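
For context, here is a rough sketch of what the encoding does, assuming the layout described in the Parquet format specification for BYTE_STREAM_SPLIT: byte k of every fixed-width value goes into stream k, and the streams are stored back to back. The class and method names (ByteStreamSplitSketch, encodeInt32, decodeInt32) are made up for illustration and are not parquet-java code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Illustrative only: a standalone model of the byte layout, not the parquet-java implementation.
public class ByteStreamSplitSketch {

    // Scatter the 4 little-endian bytes of each value into 4 streams,
    // laid out as [stream 0 | stream 1 | stream 2 | stream 3].
    public static byte[] encodeInt32(int[] values) {
        int n = values.length;
        byte[] out = new byte[n * 4];
        ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < n; i++) {
            buf.clear();
            buf.putInt(values[i]);
            for (int b = 0; b < 4; b++) {
                out[b * n + i] = buf.get(b); // byte b of value i lands in stream b
            }
        }
        return out;
    }

    // Gather byte i of each stream back into the i-th value.
    public static int[] decodeInt32(byte[] encoded) {
        int n = encoded.length / 4;
        int[] values = new int[n];
        ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < n; i++) {
            buf.clear();
            for (int b = 0; b < 4; b++) {
                buf.put(encoded[b * n + i]);
            }
            values[i] = buf.getInt(0);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3};
        byte[] encoded = encodeInt32(values);
        System.out.println(Arrays.toString(encoded));
        System.out.println(Arrays.toString(decodeInt32(encoded)));
    }
}
```

The transform does not shrink the data by itself. Grouping bytes of equal significance tends to produce longer runs of similar bytes, which a later compression codec can exploit. That is why it is useful to enable it only on columns where it helps.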

What changes are included in this PR?

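A new config method on the writer builder, withByteStreamSplitEncoding(String columnPath, boolean enableByteStreamSplit).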
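
Judging by the diff excerpt in the review thread, the method is a self-typed builder setter that forwards to an encoding-properties builder. Below is a minimal standalone sketch of that pattern. The classes EncodingProps, WriterBuilder and ExampleBuilder are hypothetical stand-ins, and the "off unless enabled" default is an assumption rather than confirmed parquet-java behaviour.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins that model the pattern; these are not parquet-java classes.
public class BuilderSketch {

    // Holds a per-column flag with a global default (assumed to be "off" here).
    public static class EncodingProps {
        private final Map<String, Boolean> perColumn;

        private EncodingProps(Map<String, Boolean> perColumn) {
            this.perColumn = perColumn;
        }

        public boolean isByteStreamSplitEnabled(String columnPath) {
            return perColumn.getOrDefault(columnPath, false);
        }

        public static class Builder {
            private final Map<String, Boolean> perColumn = new HashMap<>();

            public Builder withByteStreamSplitEncoding(String columnPath, boolean enable) {
                perColumn.put(columnPath, enable);
                return this;
            }

            public EncodingProps build() {
                return new EncodingProps(new HashMap<>(perColumn));
            }
        }
    }

    // Self-typed builder: setters return SELF so subclasses keep their own type when chaining.
    public abstract static class WriterBuilder<SELF extends WriterBuilder<SELF>> {
        protected final EncodingProps.Builder encodingPropsBuilder = new EncodingProps.Builder();

        protected abstract SELF self();

        public SELF withByteStreamSplitEncoding(String columnPath, boolean enableByteStreamSplit) {
            encodingPropsBuilder.withByteStreamSplitEncoding(columnPath, enableByteStreamSplit);
            return self();
        }

        public EncodingProps buildProps() {
            return encodingPropsBuilder.build();
        }
    }

    public static class ExampleBuilder extends WriterBuilder<ExampleBuilder> {
        @Override
        protected ExampleBuilder self() {
            return this;
        }
    }

    public static void main(String[] args) {
        EncodingProps props = new ExampleBuilder()
                .withByteStreamSplitEncoding("int32_field", true)
                .buildProps();
        System.out.println(props.isByteStreamSplitEnabled("int32_field"));
        System.out.println(props.isByteStreamSplitEnabled("other_field"));
    }
}
```

Returning SELF rather than the base type keeps the call chainable from a concrete builder, which matches the usage shown in the PR description.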

Are these changes tested?

No

Are there any user-facing changes?

Yes. When building a ParquetWriter, the encoding can now be enabled per column, e.g.:

ParquetWriter<Group> writer =
            ExampleParquetWriter.builder(...)
                .withByteStreamSplitEncoding("int32_field", true)
                ...

Closes #3213

}

public SELF withByteStreamSplitEncoding(String columnPath, boolean enableByteStreamSplit) {
encodingPropsBuilder.withByteStreamSplitEncoding(columnPath, enableByteStreamSplit);
Member

Thanks for adding this! Could you please check if there is any test case covering it? If not, it would be good to add one to make sure it takes effect.

Contributor Author

Added the corresponding test. Thanks for the reminder.

Member

@wgtmac wgtmac left a comment

@wgtmac
Member

wgtmac commented Jul 11, 2025

Could you rebase to re-trigger the failed CI checks? I'll merge once all checks have passed.

@joeyutong
Contributor Author

@wgtmac Sorry for the late response. Would you still be available to take another look when you have a moment?

@wgtmac wgtmac merged commit dfc025e into apache:master Nov 9, 2025
7 checks passed
@wgtmac
Member

wgtmac commented Nov 9, 2025

Thanks @joeyutong!

Development

Successfully merging this pull request may close these issues.

Add the configuration for ByteStreamSplit encoding
