feat: implement native S3 write support #4547

Open
kazantsev-maksim wants to merge 58 commits into apache:main from kazantsev-maksim:support_native_s3_write

@kazantsev-maksim kazantsev-maksim commented May 31, 2026

Contributor

Which issue does this PR close?

Part of: #1625

Rationale for this change

Currently, when Comet executes ETL queries that read from Parquet, perform a transformation, and then write back to Parquet (or S3-backed Parquet), a columnar-to-row conversion is required before the write step, because the write path falls back to the JVM Spark writer. This conversion adds unnecessary overhead and negates the performance benefits of native execution.

What changes are included in this PR?

  • native/core/src/execution/operators/parquet_writer.rs — Extended the native Parquet writer to support writing to S3-compatible object storage. Added S3 object store registration and wired it into the DataFusion execution context so that output paths with s3:// / s3a:// schemes are handled natively via the object_store crate.

  • spark/src/main/scala/org/apache/comet/serde/operator/CometDataWritingCommand.scala — Updated the Scala-side CometDataWritingCommand to detect S3/S3A output paths and route them through the new native write code path instead of delegating to the JVM Spark writer. Passes the necessary S3 credentials and configuration from Hadoop/Spark config to the native layer.
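To make the second bullet concrete, here is a minimal, self-contained sketch of the two decisions it describes: recognising an S3/S3A output path and translating Hadoop-style S3A settings into option names for the native layer. It is written in Rust with the standard library only and is not the code from this PR. The function names `is_s3_path` and `to_native_s3_options` are hypothetical. The key mapping is also an assumption: the `fs.s3a.*` keys follow the usual Hadoop S3A naming and the `aws_*` names follow the string keys that the object_store crate's S3 builder is understood to accept.

```rust
use std::collections::HashMap;

/// Returns the lowercase URI scheme of `path`, if it has one.
fn scheme_of(path: &str) -> Option<String> {
    path.split_once("://")
        .map(|(scheme, _)| scheme.to_ascii_lowercase())
}

/// Hypothetical helper: true when the output path uses the `s3://` or
/// `s3a://` scheme, i.e. the paths this PR routes to the native writer.
fn is_s3_path(output_path: &str) -> bool {
    matches!(scheme_of(output_path).as_deref(), Some("s3") | Some("s3a"))
}

/// Hypothetical helper: copies the S3-related Hadoop settings into a map
/// keyed by object_store-style option names. The mapping table below is an
/// assumption for illustration, not the mapping used by the PR.
fn to_native_s3_options(hadoop_conf: &HashMap<String, String>) -> HashMap<String, String> {
    const KEY_MAP: [(&str, &str); 5] = [
        ("fs.s3a.access.key", "aws_access_key_id"),
        ("fs.s3a.secret.key", "aws_secret_access_key"),
        ("fs.s3a.session.token", "aws_session_token"),
        ("fs.s3a.endpoint", "aws_endpoint"),
        ("fs.s3a.endpoint.region", "aws_region"),
    ];
    KEY_MAP
        .iter()
        .filter_map(|(hadoop_key, native_key)| {
            // Settings that are absent from the Hadoop config are skipped.
            hadoop_conf
                .get(*hadoop_key)
                .map(|value| ((*native_key).to_string(), value.clone()))
        })
        .collect()
}
```

With these helpers, `is_s3_path("s3a://bucket/out")` is true while `is_s3_path("hdfs://nn/out")` and a bare local path are false, and only the recognised `fs.s3a.*` entries survive the translation. Keeping the mapping in a single table makes it easy to audit which credentials cross the JVM/native boundary.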

How are these changes tested?

Tested locally.

@kazantsev-maksim kazantsev-maksim marked this pull request as draft May 31, 2026 15:33
@kazantsev-maksim kazantsev-maksim changed the title from "feat: Native s3 write support" to "feat: implement native S3 write support" May 31, 2026
@kazantsev-maksim kazantsev-maksim marked this pull request as ready for review June 7, 2026 14:03
@kazantsev-maksim commented

Contributor, Author

@andygrove @mbutrovich @comphead Could you please give feedback on these changes – do they make sense to you?

@mbutrovich commented

Contributor

Are there any limitations from doing it with object_store instead of opendal? I'd like to stop using both in Comet, eventually.

@comphead comphead left a comment

Contributor

Thanks @kazantsev-maksim. I think we need to proceed with #3209 first to make sure the writer works properly with the Spark tests. I'll prioritize it this week!

3 participants