Skip to content

Conversation

@ArnavBalyan
Copy link
Member

@ArnavBalyan ArnavBalyan commented Sep 8, 2025

Rationale for this change

  • Move away from shared singleton DEFAULT_VALUES_WRITER_FACTORY to new instance per builder to prevent concurrent threads from corrupting each other's delegateFactory state.
  • Currently there is a bug where multiple builders can corrupt each others delegateFactory that can cause files to get written with wrong format.
  • Since valuesWriterFactory.initialize() uses the singleton state which can be altered by another writer thread.
  • Fixes: ParquetProperties.valuesWriterFactory is not thread-safe #3224
  • Added UT to ensure multiple builders dont share the same ValuesWriterFactory state.

Are these changes tested?

  • Yes

Are there any user-facing changes?

  • Partially

Closes: #3224

@ArnavBalyan
Copy link
Member Author

cc @gszadovszky thanks! :)

Copy link
Contributor

@gszadovszky gszadovszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, @ArnavBalyan.
Unfortunately, ParquetProperties.DEFAULT_VALUES_WRITER_FACTORY is public. But at least, we should deprecate it.

@ArnavBalyan
Copy link
Member Author

Thanks for working on this, @ArnavBalyan. Unfortunately, ParquetProperties.DEFAULT_VALUES_WRITER_FACTORY is public. But at least, we should deprecate it.

Ack thanks, have marked the method as deprecated. Did you also mean to remove the change altogether and just mark it deprecated with a comment?

Copy link
Contributor

@gszadovszky gszadovszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you also mean to remove the change altogether and just mark it deprecated with a comment?

No, the change looks good to me now.

@gszadovszky gszadovszky merged commit 9db6236 into apache:master Sep 16, 2025
12 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ParquetProperties.valuesWriterFactory is not thread-safe

2 participants