Skip to content

[rust] typed column-vector dispatch#578

Open
fresh-borzoni wants to merge 1 commit into
apache:mainfrom
fresh-borzoni:fluss-543
Open

[rust] typed column-vector dispatch#578
fresh-borzoni wants to merge 1 commit into
apache:mainfrom
fresh-borzoni:fluss-543

Conversation

@fresh-borzoni
Copy link
Copy Markdown
Member

@fresh-borzoni fresh-borzoni commented May 29, 2026

Summary

closes #543

ColumnarRow now holds typed Arrow column vectors built once per batch instead of downcasting on every read. Nested ARRAY/MAP/ROW return lazy slice views - no copy unless the caller asks for a binary form via try_into_binary().

Mirrors Java's VectorizedColumnBatch.

@fresh-borzoni fresh-borzoni force-pushed the fluss-543 branch 6 times, most recently from 915b80b to 8ea41cf Compare May 29, 2026 10:51
@fresh-borzoni
Copy link
Copy Markdown
Member Author

@charlesdong1991 @leekeiabstraction PTAL 🙏

Copy link
Copy Markdown
Contributor

@leekeiabstraction leekeiabstraction left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ty for the PR. Haven't managed to read it all but left a question. Will pick it up again tomorrow

self.row_id
}

/// Returns the source `RecordBatch`. Panics on a nested cursor (returned
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious on why panic as opposed to Error? Both ColumnarRow and this function are pub. IMO we should reduce foot guns if possible.

Copy link
Copy Markdown
Contributor

@charlesdong1991 charlesdong1991 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for great PR, i will need some more time to read it through and learn a bit better about current behaviour in java, just a quick look, will take another look later 🙏

None => arrow_row_column_indices(&record_batch),
};
let schema = fluss_row_type.as_deref().unwrap_or(&row_type);
let typed = TypedBatch::build(&record_batch, schema)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so now build is fallible, and a malformed batch panics the scan instead of surfacing error?

self.outer_array().is_null(row_id)
}

pub(crate) fn get_boolean(&self, row_id: usize) -> Result<bool> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder now since all is centralized in TypedColumn, probably it is cheap to bound check for top level getters?

self.element.get_bytes(self.absolute_index(pos)?)
}

fn get_array(&self, pos: usize) -> Result<ArrayView<'_>> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is public right? should we update api-reference since the returned type has changed

}
}

fn get_map(&self, pos: usize) -> Result<MapView<'_>> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[rust] Port Java's typed Arrow column-vector pattern to scan path

3 participants