Skip to content

feat(datafusion): expor table provider constructor#2123

Open
askalt wants to merge 1 commit intoapache:mainfrom
tarantool:askalt/provider-builder
Open

feat(datafusion): expor table provider constructor#2123
askalt wants to merge 1 commit intoapache:mainfrom
tarantool:askalt/provider-builder

Conversation

@askalt
Copy link

@askalt askalt commented Feb 8, 2026

At the moment, the provider is only available through the Iceberg implementation of the DF catalog, which can be inconvenient when using a custom catalog implementation. This patch adds a public builder, allowing external users to create the provider.

@askalt askalt force-pushed the askalt/provider-builder branch from deeb6b8 to 9d7be3b Compare February 8, 2026 09:13
@CTTY
Copy link
Collaborator

CTTY commented Feb 9, 2026

Hi @askalt, thanks for raising this PR!

when using a custom catalog implementation

I'm assuming you are using a custom Catalog/SchemaProvider with IcebergTableProvider?

I'm trying to understand your use case better. Some examples on how a custom Catalog/SchemaProvider with IcebergTableProvider would benefit you would be very helpful!

Also I was wondering if making IcebergTableProvider::try_new a pub function would be an easier solution for you?

@askalt
Copy link
Author

askalt commented Feb 10, 2026

Hi @askalt, thanks for raising this PR!

when using a custom catalog implementation

I'm assuming you are using a custom Catalog/SchemaProvider with IcebergTableProvider?

I'm trying to understand your use case better. Some examples on how a custom Catalog/SchemaProvider with IcebergTableProvider would benefit you would be very helpful!

Also I was wondering if making IcebergTableProvider::try_new a pub function would be an easier solution for you?

Hi! We actually use MemorySchemaProvider from DF and support iceberg tables creation with some known arrow schema (it is why IcebergTableProviderBuilder::with_schema is present). The code looks roughly like:

async fn create_table(
    &self,
    ctx: &SessionContext,
    name: String,
    schema: SchemaRef,
) -> Result<()> {
    let inner_schema = arrow_schema_to_schema(&schema)?;
    let table_creation = TableCreation::builder()
        .name(name.clone())
        .schema(inner_schema)
        .build();
    let table = self
        .catalog
        .create_table(&self.nmid, table_creation)
        .await?;
    let table_provider = IcebergTableProviderBuilder::new(
        Arc::clone(&self.catalog) as _,
        table.identifier().clone(),
    )
    .with_schema(Some(schema))
    .build()
    .await?;
    session_state.register_table(name, Arc::new(table_provider));
    Ok(())
}

@CTTY
Copy link
Collaborator

CTTY commented Feb 11, 2026

Thanks for the example! But I'm still a bit confused about why you need to pass schema into IcebergTableProvider. The only benefit I'm seeing here is that we can avoid one conversion from iceberg schema to arrow schema. Maybe I'm overlooking something?

Imo, the schema should come from iceberg table metadata otherwise there may be unexpected behaviors.

@askalt askalt force-pushed the askalt/provider-builder branch from 9d7be3b to 85d6ce3 Compare February 11, 2026 06:21
@askalt askalt changed the title feat(datafusion): add table provider builder feat(datafusion): expor table provider constructor Feb 11, 2026
@askalt askalt force-pushed the askalt/provider-builder branch from 85d6ce3 to 497ff3b Compare February 11, 2026 06:22
@askalt
Copy link
Author

askalt commented Feb 11, 2026

Thanks for the example! But I'm still a bit confused about why you need to pass schema into IcebergTableProvider. The only benefit I'm seeing here is that we can avoid one conversion from iceberg schema to arrow schema. Maybe I'm overlooking something?

Imo, the schema should come from iceberg table metadata otherwise there may be unexpected behaviors.

Yes, it seems we could avoid explicit schema passing. Reworked to just export the constructor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants