@andygrove andygrove commented Feb 9, 2026

Summary

Need to rebase once #3446 is merged

  • Fix fallback logic for input file name metadata
  • Enable more tests

andygrove and others added 3 commits February 9, 2026 12:47
The native_datafusion scan now correctly falls back to Spark's
FileSourceScanExec when metadata columns (like input_file_name) are
present, so the 3 input_file_name tests no longer need to be ignored.

For ExtractPythonUDFsSuite, the issue was that the test's collect
pattern didn't match CometNativeScanExec. Fixed by adding
CometNativeScanExec to the collect and dataFilters match blocks.
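The collect issue described above can be illustrated with a toy model. This is a minimal Python sketch, not the Spark or Comet test code: the classes below are hypothetical stand-ins that only borrow the operator names from the commit message, and `collect` is an illustrative helper that mimics matching on node type.

```python
# Toy stand-ins for physical plan nodes; only the names come from the commit message.
class PlanNode:
    def __init__(self, *children):
        self.children = list(children)


class FileSourceScanExec(PlanNode):
    pass


class CometNativeScanExec(PlanNode):
    pass


class ProjectExec(PlanNode):
    pass


def collect(plan, node_types):
    """Return every node in the tree that is an instance of one of node_types."""
    found = [plan] if isinstance(plan, node_types) else []
    for child in plan.children:
        found.extend(collect(child, node_types))
    return found


# A plan in which the scan has been replaced by the Comet operator.
plan = ProjectExec(CometNativeScanExec())

# Matching only the Spark scan finds nothing, so an assertion on the scan would fail.
spark_only = collect(plan, (FileSourceScanExec,))

# Adding the Comet scan to the matched types finds the node again.
with_comet = collect(plan, (FileSourceScanExec, CometNativeScanExec))
```

In this sketch `spark_only` is empty while `with_comet` holds the single scan node, which is the shape of the mismatch the commit message describes.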

Closes apache#3312

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The previous commit accidentally removed the IgnoreComet.scala file
creation from the diff, causing 94 compilation errors when applied
to Spark 3.5.8.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CometScanExec does not populate InputFileBlockHolder (the thread-local
that Spark's FileScanRDD sets), so input_file_name(),
input_file_block_start(), and input_file_block_length() return empty
or default values when Comet replaces the scan. Detect these
expressions in the plan and fall back to Spark's FileSourceScanExec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
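
The detect-and-fall-back idea in the last commit message can be sketched as a recursive walk over a plan and its expressions. This is a hedged Python toy model, not Comet's implementation: `Expr`, `PlanNode`, `plan_uses_input_file`, and `choose_scan` are hypothetical names introduced for illustration, and only the three expression names and the two scan operator names come from the commit message.

```python
from dataclasses import dataclass, field
from typing import List

# Expression names taken from the commit message; everything else is hypothetical.
INPUT_FILE_EXPRESSIONS = {
    "input_file_name",
    "input_file_block_start",
    "input_file_block_length",
}


@dataclass
class Expr:
    name: str
    children: List["Expr"] = field(default_factory=list)


@dataclass
class PlanNode:
    name: str
    expressions: List[Expr] = field(default_factory=list)
    children: List["PlanNode"] = field(default_factory=list)


def expr_uses_input_file(expr: Expr) -> bool:
    """True if this expression or any nested child is an input-file expression."""
    if expr.name in INPUT_FILE_EXPRESSIONS:
        return True
    return any(expr_uses_input_file(child) for child in expr.children)


def plan_uses_input_file(plan: PlanNode) -> bool:
    """True if any operator in the plan tree carries an input-file expression."""
    if any(expr_uses_input_file(e) for e in plan.expressions):
        return True
    return any(plan_uses_input_file(child) for child in plan.children)


def choose_scan(plan: PlanNode) -> str:
    """Keep Spark's scan when the plan depends on per-file metadata."""
    if plan_uses_input_file(plan):
        return "FileSourceScanExec"
    return "CometScanExec"


# A projection that wraps input_file_name() inside another function call.
with_metadata = PlanNode(
    "Project",
    expressions=[Expr("upper", [Expr("input_file_name")])],
    children=[PlanNode("Scan")],
)
# A projection that only references an ordinary column.
without_metadata = PlanNode(
    "Project",
    expressions=[Expr("col_a")],
    children=[PlanNode("Scan")],
)
```

Checking nested expressions matters in this model because a call such as `input_file_name()` can be wrapped in another function, and a check limited to top-level expressions would miss it.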
@andygrove andygrove force-pushed the fix/unignore-input-file-name-tests branch from 432c277 to ab357d1 February 10, 2026 19:10
@andygrove andygrove marked this pull request as ready for review February 10, 2026 20:46
@andygrove andygrove requested a review from mbutrovich February 10, 2026 20:47
@andygrove andygrove marked this pull request as draft February 10, 2026 20:50