-
Notifications
You must be signed in to change notification settings - Fork 127
feat[layout]: list-of-struct nested projection #6012
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat[layout]: list-of-struct nested projection #6012
Conversation
|
I wasn't able to apply labels from a fork (permission denied). Could a maintainer please add the ix label for the changelog/CI gating? |
| // List-of-struct field access: `$.items.a` where `items: list<struct{a,b}>`. | ||
| match input.dtype() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't really want to tie the get_item to have this for_each element behaviour.
I am not sure we want get_item to pierce thought lists, we would want a this to be map_list(get_item("a", _), get_item("items", root()) I am not sure we have designed the map_list or the _ expression yet?
|
This is a great contribution, however we need to design the list expression more carefully |
Codecov Report❌ Patch coverage is ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Started a discussion: #6030, we can reopen once we find a good answer there. Maybe you can expand on your use case there. |
8a8803c to
11f06b6
Compare
| row_range: &Range<u64>, | ||
| splits: &mut BTreeSet<u64>, | ||
| ) -> VortexResult<()> { | ||
| splits.insert(row_range.end); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one thing I struggled with the last time I tried to do ListLayout is that it's hard to split naturally.
Ideally you'd have a way to split based on the elements, then turn that back into row ranges...
d5a8576 to
87d9553
Compare
| fn propagate_nullability(parent_nullability: Nullability, field_dtype: DType) -> DType { | ||
| if matches!( | ||
| (parent_nullability, field_dtype.nullability()), | ||
| (Nullability::Nullable, Nullability::NonNullable) | ||
| ) { | ||
| return field_dtype.with_nullability(Nullability::Nullable); | ||
| } | ||
|
|
||
| field_dtype | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
union_nullability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need these changes?
| fn propagate_nullability(parent_nullability: Nullability, field_dtype: DType) -> DType { | ||
| if matches!( | ||
| (parent_nullability, field_dtype.nullability()), | ||
| (Nullability::Nullable, Nullability::NonNullable) | ||
| ) { | ||
| return field_dtype.with_nullability(Nullability::Nullable); | ||
| } | ||
|
|
||
| field_dtype | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
| field_dtype | ||
| } | ||
|
|
||
| impl VTable for GetItemList { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc str please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sure its clear this should never be seralized and relied on.
| (element_dtype.as_ref(), *list_nullability, Some(*list_size)) | ||
| } | ||
| _ => { | ||
| return Err(vortex_err!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vortex_bail
|
|
||
| match input.dtype() { | ||
| DType::List(..) => { | ||
| let list = input.to_listview(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use execute, I think needs to be updated too
| let elements = list.elements(); | ||
| let elements = elements.to_struct(); | ||
|
|
||
| let field = elements.field_by_name(field_name).cloned()?; | ||
| let field = match elements.dtype().nullability() { | ||
| Nullability::NonNullable => field, | ||
| Nullability::Nullable => mask(&field, &elements.validity_mask().not())?, | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we share this between list and fsl
| pb::GetItemOpts { | ||
| path: field_name.to_string(), | ||
| } | ||
| .encode_to_vec(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should panic with an error message
| let scan_projection = scan_projection | ||
| .optimize_recursive(vxf.dtype()) | ||
| .map_err(|e| { | ||
| exec_datafusion_err!( | ||
| "Couldn't simplify the underlying Vortex scan projection: {e}" | ||
| ) | ||
| })?; | ||
| let scan_dtype = scan_projection.return_dtype(vxf.dtype()).map_err(|e| { | ||
| exec_datafusion_err!("Couldn't get the dtype for the underlying Vortex scan: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do this
| col: &Expression, | ||
| scope_dtype: &DType, | ||
| ) -> VortexResult<Option<Expression>> { | ||
| let col = col.optimize_recursive(scope_dtype)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again try?
| _ => None, | ||
| }) | ||
| .expect("items layout"); | ||
| assert_eq!(items_layout.encoding_id().as_ref(), "vortex.list"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we do an assert_array_eq! that the result in what is expected?
| // | ||
| // NOTE: `vortex.get_item_list` is a temporary list-of-struct projection expression; | ||
| // when pushing down we construct the element projection and pass it into the elements reader. | ||
| let (is_pushdown, element_expr) = if let Some(field_name) = expr.as_opt::<GetItemList>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you also need to teach the struct layout reader partitioner about this expression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
annotate_scope_access and (/Users/joeisaacs/git/spiraldb/vortex-4/vortex-array/src/expr/transform/partition.rs) otherwise the struct partitioning at the top level will be over cautious
| } else if expr.vtable().id().as_ref() == "vortex.select" { | ||
| (true, expr.clone()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should never be here, also use the expr id please
| self.lazy_children.get(idx) | ||
| } | ||
|
|
||
| fn list_slice_future( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc str please
6a2a8f7 to
2973c55
Compare
|
@godnight10061 mark this as ready when it is |
a7a6b7d to
da77c0c
Compare
| if expr.is::<GetItemList>() | ||
| && let Some(field_name) = expr.child(0).as_opt::<GetItem>() | ||
| && expr.child(0).child(0).is::<Root>() | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you justify this?
76f2c73 to
830efb6
Compare
CodSpeed Performance ReportMerging this PR will degrade performance by 33.07%Comparing
|
That's a brutal regression. -33% is rough. The get_item_list overhead is definitely killing the vectorization. I think we need to pivot: instead of wrapping this, we should push the projection directly into the list reader's child scans. I'll check the flamegraphs to confirm. |
830efb6 to
cc87962
Compare
6c6326a to
3690d75
Compare
|
The CI was modified again. I am going to close and and I suggest we move this to discussion where we can refine the List Layout carefully |
Summary
Tests
Local (Windows):
Notes:
Fork CI (preflight):
Compatibility
References