Skip to content

feat: Add Dictionary schema read support in IPC reader#738

Merged
paleolimbot merged 22 commits intoapache:mainfrom
paleolimbot:ipc-dictionary-support
Mar 6, 2026
Merged

feat: Add Dictionary schema read support in IPC reader#738
paleolimbot merged 22 commits intoapache:mainfrom
paleolimbot:ipc-dictionary-support

Conversation

@paleolimbot
Copy link
Copy Markdown
Member

@paleolimbot paleolimbot commented Mar 29, 2025

To start #622, this PR adds read support for Schema messages that contain dictionary-encoded columns. I opened #844 and #845 to track the next steps.

@paleolimbot paleolimbot changed the title feat: Add Dictionary read support in IPC readder feat: Add Dictionary read support in IPC reader Mar 29, 2025
@torchss
Copy link
Copy Markdown

torchss commented Jun 15, 2025

This looks great!

@paleolimbot paleolimbot force-pushed the ipc-dictionary-support branch from 9724dd2 to 3083eb9 Compare February 3, 2026 20:45
@paleolimbot paleolimbot changed the title feat: Add Dictionary read support in IPC reader feat: Add Dictionary schema read support in IPC reader Feb 4, 2026
@paleolimbot paleolimbot force-pushed the ipc-dictionary-support branch from a3a0d31 to 4885b74 Compare March 5, 2026 20:18
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 51.92308% with 50 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.67%. Comparing base (f11d517) to head (4885b74).
⚠️ Report is 22 commits behind head on main.

Files with missing lines Patch % Lines
src/nanoarrow/ipc/decoder.c 51.57% 24 Missing and 22 partials ⚠️
src/nanoarrow/ipc/reader.c 55.55% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #738      +/-   ##
==========================================
- Coverage   79.96%   79.67%   -0.30%     
==========================================
  Files         105      106       +1     
  Lines       15461    15726     +265     
  Branches     1738     1769      +31     
==========================================
+ Hits        12364    12529     +165     
- Misses       1998     2076      +78     
- Partials     1099     1121      +22     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment on lines -608 to -615
#if defined(NANOARROW_BUILD_TESTS_WITH_ARROW)
class ArrowTypeParameterizedTestFixture
: public ::testing::TestWithParam<std::shared_ptr<arrow::DataType>> {
protected:
std::shared_ptr<arrow::DataType> data_type;
};

TEST_P(ArrowTypeParameterizedTestFixture, NanoarrowIpcArrowTypeRoundtrip) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are just moved because I found the order of the tests was hard to parse with respect to where the parameter s were defined.

Comment on lines +838 to +844
TEST(NanoarrowIpcTest, NanoarrowIpcFooterDecodingErrors) {
struct ArrowError error;

nanoarrow::ipc::UniqueDecoder decoder;
ArrowIpcDecoderInit(decoder.get());

// not enough data to get the size+magic
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are also just moved around

Comment on lines +593 to +609
struct ArrowBufferView data;
data.data.as_uint8 = kDictionarySchema;
data.size_bytes = sizeof(kDictionarySchema);

ASSERT_EQ(ArrowIpcDecoderInit(&decoder), NANOARROW_OK);

EXPECT_EQ(ArrowIpcDecoderDecodeHeader(&decoder, data, &error), NANOARROW_OK);
ASSERT_EQ(decoder.message_type, NANOARROW_IPC_MESSAGE_TYPE_SCHEMA);

ASSERT_EQ(ArrowIpcDecoderDecodeSchema(&decoder, &schema, &error), NANOARROW_OK);
ASSERT_EQ(schema.n_children, 1);
EXPECT_STREQ(schema.children[0]->name, "some_col");
EXPECT_EQ(schema.children[0]->flags, ARROW_FLAG_NULLABLE);
EXPECT_STREQ(schema.children[0]->format, "c");

ASSERT_NE(schema.children[0]->dictionary, nullptr);
EXPECT_STREQ(schema.children[0]->dictionary->format, "u");
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the crux of what this PR does: it can decode a schema message that contains a dictionary instead of erroring.

@paleolimbot paleolimbot requested a review from Copilot March 5, 2026 20:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds initial Arrow IPC dictionary support by decoding dictionary-encoded fields in Schema messages and exposing DictionaryBatch headers in the decoder API, while keeping dictionary array decoding itself unsupported for now.

Changes:

  • Introduces a public ArrowIpcDictionaryBatch struct and exposes the last decoded DictionaryBatch on ArrowIpcDecoder.
  • Adds dictionary encoding handling when decoding Schema fields (including extension-metadata placement).
  • Updates reader behavior/tests to recognize DictionaryBatch messages and return ENOTSUP where full decoding isn’t implemented yet.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/nanoarrow/nanoarrow_ipc.h Exposes DictionaryBatch metadata via new struct and decoder pointer.
src/nanoarrow/ipc/reader.c Detects DictionaryBatch during stream read and returns an explicit not-supported error.
src/nanoarrow/ipc/files_test.cc Updates not-supported expectations for dictionary-related streams.
src/nanoarrow/ipc/decoder_test.cc Adds tests for dictionary schema/batch header decoding; extends type coverage to dictionaries/extensions with skips where unsupported.
src/nanoarrow/ipc/decoder.c Implements schema dictionary-encoding decoding + dictionary batch header decoding; explicitly rejects dictionary array decode.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/nanoarrow/ipc/decoder.c Outdated
Comment thread src/nanoarrow/nanoarrow_ipc.h Outdated
Comment thread src/nanoarrow/nanoarrow_ipc.h
Comment thread src/nanoarrow/ipc/decoder_test.cc Outdated
Comment thread src/nanoarrow/ipc/decoder_test.cc Outdated
paleolimbot and others added 4 commits March 5, 2026 14:33
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@paleolimbot paleolimbot marked this pull request as ready for review March 6, 2026 05:46
@paleolimbot
Copy link
Copy Markdown
Member Author

If there are no objections I'll merge sometime tomorrow and get started on the harder bits of actually reading dictionaries!

@paleolimbot paleolimbot merged commit 5b7184c into apache:main Mar 6, 2026
42 checks passed
@paleolimbot paleolimbot deleted the ipc-dictionary-support branch March 6, 2026 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants