Flatten Expanded Calcite Representation of Virtual Tables w/ Complex Expressions when Converting to Substrait #631

@gord02

Description

Nested Struct is implemented in this PR.

The motivation for this PR was to implement the change in the proto representation of VirtualTable, in which rows move from struct literals to Nested Structs so that they can store non-literal expressions. This removes the need for a Project rel above the VirtualTable to hold the non-literal expressions.

This PR currently converts the Substrait VirtualTable into the following Calcite format:

      // LogicalValues relation. Instead, we create a LogicalProject for each row to compute its
      // values, and combine them together using a LogicalUnion. For example the following:
      
       VirtualTable
           (e1, e2)
           (e3, e4)
      
      Becomes:
        LogicalProject(exprs=[0, 1])
          LogicalUnion(all=[true])
            LogicalProject(exprs=[e1, e2])
              <Empty Row>
            LogicalProject(exprs=[e3, e4])
              <Empty Row>

      // where each of e1-e4 can be either a literal or a non-literal expression
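
To make the shape of this expansion concrete, here is a minimal, self-contained sketch. It does not use the Calcite or Substrait APIs: `Node`, `expand`, and `render` are hypothetical names invented for illustration, and expressions are modeled as plain strings. It only mirrors the tree shown above, with one project per row, a union over them, and an outer project that references the output columns by index.

```java
import java.util.ArrayList;
import java.util.List;

public class VirtualTableExpansion {
    /** Toy plan node (hypothetical, not a Calcite RelNode): a label plus inputs. */
    static final class Node {
        final String label;
        final List<Node> inputs = new ArrayList<>();

        Node(String label) { this.label = label; }

        Node with(Node input) { inputs.add(input); return this; }

        /** Renders the subtree with two spaces of indentation per level. */
        String render(int depth) {
            StringBuilder sb = new StringBuilder();
            sb.append("  ".repeat(depth)).append(label).append('\n');
            for (Node in : inputs) sb.append(in.render(depth + 1));
            return sb.toString();
        }
    }

    /** Expands rows of expressions into Project(Union(Project(<Empty Row>)...)). */
    static Node expand(List<List<String>> rows) {
        Node union = new Node("LogicalUnion(all=[true])");
        for (List<String> row : rows) {
            // One project per row, computing that row's expressions over an empty row.
            union.with(new Node("LogicalProject(exprs=[" + String.join(", ", row) + "])")
                    .with(new Node("<Empty Row>")));
        }
        // Outer project that passes the union's columns through by index.
        int width = rows.isEmpty() ? 0 : rows.get(0).size();
        List<String> refs = new ArrayList<>();
        for (int i = 0; i < width; i++) refs.add(Integer.toString(i));
        return new Node("LogicalProject(exprs=[" + String.join(", ", refs) + "])").with(union);
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(List.of("e1", "e2"), List.of("e3", "e4"));
        System.out.print(expand(rows).render(0));
    }
}
```

Running `main` prints the same tree as the comment above, which is why the expanded form, rather than a VirtualTable, is what the conversion back to Substrait currently sees.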

This implementation does not allow for round-trip testing, because the above format is the final output: converting it back to Substrait does not reproduce the original VirtualTable. Ideally, the conversion back should return the original VirtualTable using Nested Structs, so that the round-trip tests pass.

The desired behavior is that a VirtualTableScan can be created and the same VirtualTable is returned after conversion to, and back from, Calcite, as follows:

Creation of VirtualTableScan using NestedStruct:
 VirtualTableScan:
    names=[col1, col2]
    NestedStruct: [ fields: [(1+1), (3)] ]
    NestedStruct: [ fields: [(4), (4+5)] ]

to Calcite:
 LogicalUnion(all=[true])
   LogicalProject(exprs=[1+1, 3])
     <Empty Row>
   LogicalProject(exprs=[4, 4+5])
     <Empty Row>

Back to Substrait:
 VirtualTableScan:
    names=[col1, col2]
    NestedStruct: [ fields: [(1+1), (3)] ]
    NestedStruct: [ fields: [(4), (4+5)] ]
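
One possible way to recover the rows on the way back is to pattern-match the expanded tree: if the root is a union whose inputs are all projects over an empty row, each project's expressions become one Nested Struct row. The sketch below is only an illustration of that idea under simplifying assumptions. `Node`, `union`, `project`, `emptyRow`, and `flatten` are hypothetical names, expressions are plain strings, and it does not check the `all=[true]` flag or use any Calcite or Substrait API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class UnionFlattening {
    /** Toy plan node (hypothetical): kind is "Union", "Project", or "EmptyRow". */
    static final class Node {
        final String kind;
        final List<String> exprs;   // only meaningful for "Project"
        final List<Node> inputs;

        Node(String kind, List<String> exprs, List<Node> inputs) {
            this.kind = kind;
            this.exprs = exprs;
            this.inputs = inputs;
        }
    }

    static Node emptyRow() { return new Node("EmptyRow", List.of(), List.of()); }

    static Node project(List<String> exprs, Node input) {
        return new Node("Project", exprs, List.of(input));
    }

    static Node union(List<Node> inputs) { return new Node("Union", List.of(), inputs); }

    /**
     * Returns one row of expressions per union input when the tree has the
     * expanded virtual-table shape, or Optional.empty() when it does not.
     */
    static Optional<List<List<String>>> flatten(Node root) {
        if (!root.kind.equals("Union")) return Optional.empty();
        List<List<String>> rows = new ArrayList<>();
        for (Node in : root.inputs) {
            boolean isRowProject = in.kind.equals("Project")
                    && in.inputs.size() == 1
                    && in.inputs.get(0).kind.equals("EmptyRow");
            if (!isRowProject) return Optional.empty(); // not the expanded shape; leave the plan alone
            rows.add(in.exprs);
        }
        return Optional.of(rows);
    }

    public static void main(String[] args) {
        Node plan = union(List.of(
                project(List.of("1+1", "3"), emptyRow()),
                project(List.of("4", "4+5"), emptyRow())));
        System.out.println(flatten(plan));
    }
}
```

Returning an empty `Optional` for any other shape keeps the fallback explicit: a plan that is a genuine union of unrelated projects would be converted as before instead of being mistaken for a VirtualTable.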
