
Conversation

Contributor

@coderfender coderfender commented Jan 31, 2026

Which issue does this PR close?

Closes: #3209

Three kinds of tests have been identified as incompatible with the native writer:

  1. 12 tests in the SQL suite which failed or exposed corner cases with the Comet writer (Insert Overwrite / CTAS statements)
  2. SQL tests for the charvarchar.sql files
  3. Parquet metadata encoding tests (Spark writes metadata such as the Spark version, while Comet doesn't)

Rationale for this change

Now that we have a native Comet writer, this PR enables the Spark SQL tests to run with it, while ignoring the ones that fail.

What changes are included in this PR?

Diff file changes to enable the Comet native writer in the Spark tests.

How are these changes tested?

@coderfender force-pushed the enable_spark_tests_parquet_writer branch from b5cea4f to 951b394 on January 31, 2026 01:35

codecov-commenter commented Jan 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.99%. Comparing base (f09f8af) to head (6c6a142).
⚠️ Report is 918 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3351      +/-   ##
============================================
+ Coverage     56.12%   59.99%   +3.86%     
- Complexity      976     1464     +488     
============================================
  Files           119      175      +56     
  Lines         11743    16165    +4422     
  Branches       2251     2681     +430     
============================================
+ Hits           6591     9698    +3107     
- Misses         4012     5114    +1102     
- Partials       1140     1353     +213     


Contributor Author

coderfender commented Jan 31, 2026

@comphead, we have a bunch of failures, largely due to hitting edge cases with the Comet writer (CTAS and Insert Overwrite, which are likely bugs). I am thinking of two possible options here:

  1. Keep the changes minimal and ignore the failing I/O tests for now
  2. Create a separate Comet writer suite which ignores all the tests relevant to the Comet native writer

@coderfender
Contributor Author

@comphead please kick off CI whenever you get a chance.

@coderfender
Contributor Author

My diff files for Spark 3.4 and 3.5 missed disabling the Comet writer for CTAS statements (while it was set for Spark 4), causing the test failures above. However, there are now only 2 failed tests, down from 13.

@coderfender changed the title from "feat : enable_spark_tests_comet_native_writer" to "feat: enable_spark_tests_comet_native_writer" on Feb 1, 2026
@coderfender
Contributor Author

Seems like there were issues applying the diff file for Spark 4.

@coderfender
Contributor Author

@comphead all the tests passed. Please take a look whenever you get a chance.

@coderfender
Contributor Author

All tests passed, and the following tests are ignored:

  Common across all Spark versions (3.4, 3.5, 4.0):

  1. empty file should be skipped while write to file
  2. INSERT INTO TABLE - complex type but different names
  3. Insert overwrite table command should output correct schema: basic
  4. parquet timestamp conversion
  5. SPARK-29174 Support LOCAL in INSERT OVERWRITE DIRECTORY to data source
  6. SPARK-33901: ctas should should not change table's schema
  7. SPARK-37160: CREATE TABLE AS SELECT with CHAR_AS_VARCHAR
  8. SPARK-38336 INSERT INTO statements with tables with default columns: positive tests
  9. SPARK-38811 INSERT INTO on columns added with ALTER TABLE ADD COLUMNS: Positive tests
  10. SPARK-43071: INSERT INTO from queries whose final operators are not projections
  11. write path implements onTaskCommit API correctly
  12. Write Spark version into Parquet metadata

Additional tests which had to be ignored for Spark 4:

  1. ctas with union
  2. SPARK-48817: test multi inserts

  Total: 12 tests (Spark 3.4/3.5) | 14 tests (Spark 4.0)

@andygrove changed the title from "feat: enable_spark_tests_comet_native_writer" to "feat: Enable native Parquet writer in Spark SQL tests" on Feb 4, 2026

- test("SPARK-33901: ctas should should not change table's schema") {
+ test("SPARK-33901: ctas should should not change table's schema",
+ IgnoreComet("comet native writer does not support empty dir / table creation yet")) {
Member


Could you link to the specific GitHub issues in the IgnoreComet annotations?

Contributor Author


Thank you. I added the annotations and pushed a commit. However, per @comphead's review, I made this PR just copy the failing tests into CometParquetWriterSuite, and created GitHub issues for each of the failing tests in Spark 3.x and Spark 4.x.

The plan is to create a CI pipeline (one which doesn't interfere with the main Comet workflows) that runs a separate test suite with the Comet writer enabled and a listener attached, to make sure we are running the tests through the native writer.

@coderfender
Contributor Author

Fixing merge conflicts for the diff files :)

@coderfender force-pushed the enable_spark_tests_parquet_writer branch from 6c6a142 to b60ea7e on February 6, 2026 07:50
@coderfender
Contributor Author

Created GitHub issues for all the failing tests and added links in the IgnoreComet annotations.
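For reference, the resulting tagging pattern in the diff files looks roughly like the snippet below; the issue number NNNN is a placeholder, since each ignored test links to its own tracking issue:

```scala
// Sketch of the annotation pattern (issue number NNNN is a placeholder):
// the IgnoreComet tag carries a link to the tracking issue, so the reason
// for skipping is discoverable from the test itself.
test("SPARK-33901: ctas should should not change table's schema",
  IgnoreComet("https://github.com/apache/datafusion-comet/issues/NNNN: " +
    "Comet native writer does not support empty dir / table creation yet")) {
  // original test body unchanged
}
```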

@coderfender force-pushed the enable_spark_tests_parquet_writer branch from 72580df to 5064d58 on February 7, 2026 07:45
@coderfender force-pushed the enable_spark_tests_parquet_writer branch from fec7c20 to 370d98d on February 7, 2026 19:15
@mbutrovich
Contributor

Rather than ignoring tests in diffs, are we able to identify these scenarios at planning time and write fallbacks?

@coderfender
Contributor Author

@mbutrovich, thank you for the comment. These tests were essentially exposing bugs in our writer which are not caught by our current fallbacks:

  1. Empty file/dir creation
    empty file should be skipped while write to file
    SPARK-33901: ctas should not change table's schema
    SPARK-37160: CREATE TABLE AS SELECT with CHAR_AS_VARCHAR

  2. Insert overwrite bugs (5 tests)
    Insert overwrite table command should output correct schema: basic
    SPARK-29174 Support LOCAL in INSERT OVERWRITE DIRECTORY
    SPARK-38336 INSERT INTO with default columns
    SPARK-38811 INSERT INTO on ALTER TABLE ADD COLUMNS
    SPARK-43071: INSERT INTO from non-projection queries

  3. Other
    parquet timestamp conversion (timestamp96)
    INSERT INTO TABLE - complex type but different names
    Write Spark version into Parquet metadata
    write path implements onTaskCommit API correctly

I created separate issues for each failing test and annotated them in the diffs. However, per my discussion with @comphead, this PR is just the first step: copying those tests into the Comet test suite lays the foundation for a future Comet-writer-only test suite in CI.

Contributor

@comphead comphead left a comment


Thanks @coderfender, great work identifying issues for the writer.
One question: how is it planned to check that the native writer has been called?
You might consider the option to call writeWithCometNativeWriteExec, or look into how SparkListener is used inside.

@coderfender
Contributor Author

Great suggestion @comphead. I initially added a custom listener to check that the native writer is called. However, I figured it might be cleaner to limit the scope of this PR to just tagging the failing tests, and to follow up with more work that builds on this. Do you think it is a good idea to add that verification in this PR, or would you prefer the listener changes in a different one?
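For context, a rough sketch of the kind of listener I mean is below. It matches the write operator by class-name substring because the exact class name ("CometNativeWrite" here) is an assumption about the operator, not a confirmed Comet API:

```scala
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Hedged sketch: records whether any completed query's executed plan
// contained the Comet native write operator. The class-name substring
// match ("CometNativeWrite") is an assumption, not a confirmed API.
class NativeWriterCheckListener extends QueryExecutionListener {
  @volatile var sawNativeWrite = false

  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    val usedNative = qe.executedPlan.collect {
      case p if p.getClass.getSimpleName.contains("CometNativeWrite") => p
    }.nonEmpty
    if (usedNative) sawNativeWrite = true
  }

  override def onFailure(funcName: String, qe: QueryExecution, ex: Exception): Unit = ()
}

// Usage in a test suite (sketch): register the listener, run the write,
// then assert that the native writer actually executed.
//   val listener = new NativeWriterCheckListener
//   spark.listenerManager.register(listener)
//   df.write.parquet(path)
//   assert(listener.sawNativeWrite, "write did not go through the Comet native writer")
```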

@coderfender
Contributor Author

Never mind, I think I got a better idea after this PR: #3480



Development

Successfully merging this pull request may close these issues.

Run Spark tests with Comet native writer

5 participants