Skip to content

Conversation

@HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Jan 30, 2026

Rationale for this change

The CSV converter was building identical Trie data structures (for null/true/false values) in every decoder instance, causing duplicate memory allocation and initialization overhead.

What changes are included in this PR?

  • Introduced TrieCache struct to hold shared Trie instances (null_trie, true_trie, false_trie)
  • Updated ValueDecoder and all decoder subclasses to accept and reference a shared TrieCache instead of building their own Tries
  • Updated Converter base class to create one TrieCache per converter and pass it to all decoders

Are these changes tested?

Yes, all existing tests. I ran a simple benchmark showing roughly 2-4% faster converter creation, and obviously less memory usage.

Are there any user-facing changes?

No.

@github-actions
Copy link

⚠️ GitHub issue #49069 has been automatically assigned in GitHub to PR creator.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

const ConvertOptions& options)
: type_(type), options_(options) {}
const ConvertOptions& options, const TrieCache* trie_cache)
: type_(type), options_(options), trie_cache_(*trie_cache) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using trie_cache_ as a pointer not a reference?

const ConvertOptions& options_;
MemoryPool* pool_;
std::shared_ptr<DataType> type_;
std::shared_ptr<void> trie_cache_; // Opaque pointer to TrieCache
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TrieCache's destructor is still called with this, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Let me add a comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm, I checked:

The destructor of shared_ptr decrements the number of shared owners of the control block. If that counter reaches zero, the control block calls the destructor of the managed object. The control block does not deallocate itself until the std::weak_ptr counter reaches zero as well.

https://en.cppreference.com/w/cpp/memory/shared_ptr.html

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting review Awaiting review labels Jan 30, 2026
@kou kou merged commit c6090ed into apache:main Jan 30, 2026
51 of 52 checks passed
@kou kou removed the awaiting merge Awaiting merge label Jan 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants