-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-49069: [C++] Share Trie instances across CSV value decoders #49070
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
cpp/src/arrow/csv/converter.cc
Outdated
| const ConvertOptions& options) | ||
| : type_(type), options_(options) {} | ||
| const ConvertOptions& options, const TrieCache* trie_cache) | ||
| : type_(type), options_(options), trie_cache_(*trie_cache) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using trie_cache_ as a pointer not a reference?
cpp/src/arrow/csv/converter.h
Outdated
| const ConvertOptions& options_; | ||
| MemoryPool* pool_; | ||
| std::shared_ptr<DataType> type_; | ||
| std::shared_ptr<void> trie_cache_; // Opaque pointer to TrieCache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TrieCache's destructor is still called with this, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Let me add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, I checked:
The destructor of shared_ptr decrements the number of shared owners of the control block. If that counter reaches zero, the control block calls the destructor of the managed object. The control block does not deallocate itself until the std::weak_ptr counter reaches zero as well.
Rationale for this change
The CSV converter was building identical Trie data structures (for null/true/false values) in every decoder instance, causing duplicate memory allocation and initialization overhead.
What changes are included in this PR?
TrieCachestruct to hold shared Trie instances (null_trie, true_trie, false_trie)ValueDecoderand all decoder subclasses to accept and reference a sharedTrieCacheinstead of building their own TriesConverterbase class to create oneTrieCacheper converter and pass it to all decodersAre these changes tested?
Yes, all existing tests. I ran a simple benchmark showing roughly 2-4% faster converter creation, and obviously less memory usage.
Are there any user-facing changes?
No.