[spark] Support Parquet format in COPY INTO #8037

Merged: JingsongLi merged 1 commit into apache:master from JunRuiLee:copy-into-parquet-support on May 30, 2026

JunRuiLee (Contributor)

This PR adds Parquet format support for COPY INTO import and export, as part of #8005.

Changes

Import (COPY INTO table FROM path):

  • Read Parquet files with their native typed schema (no read-as-string-then-cast step, unlike CSV/JSON)
  • Column matching by name (case-insensitive), not by position
  • Extra source columns are ignored; missing columns become NULL
  • Cast validation: a non-null source value that becomes NULL after casting is detected as a type incompatibility
  • Supports an explicit column list, PATTERN, FORCE, and ON_ERROR = ABORT_STATEMENT
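The import rules above (name-based, case-insensitive matching; extra columns ignored; missing columns as NULL; a non-null value turning NULL after a cast treated as an error) can be modelled without Spark. The sketch below is a minimal, hypothetical illustration: TargetColumn, alignRow and the toInt/toStr casts are invented names, not Paimon's code, and it assumes a failed cast yields NULL as in Spark's non-ANSI cast mode.

```scala
import scala.util.Try

// Hypothetical, Spark-free model of the Parquet import rules.
// TargetColumn, alignRow, toInt and toStr are illustrative names only.
object ParquetImportSketch {

  // A target column: its name plus a cast from the raw Parquet value.
  // The cast returns None when the value cannot be represented, mimicking
  // a non-ANSI Spark cast that yields NULL on failure.
  final case class TargetColumn(name: String, cast: Any => Option[Any])

  /** Align one source row (column name -> value) to the target columns.
    * Names match case-insensitively; extra source columns are ignored and
    * missing ones become None (NULL). A non-null value that becomes NULL
    * after casting is reported as an error, as with ON_ERROR = ABORT_STATEMENT.
    */
  def alignRow(
      source: Map[String, Any],
      target: Seq[TargetColumn]): Either[String, Seq[Option[Any]]] = {
    val byLowerName: Map[String, Any] =
      source.map { case (k, v) => k.toLowerCase -> v }

    val aligned: Seq[Either[String, Option[Any]]] = target.map { col =>
      // Option(v) also turns a present-but-null source value into None.
      byLowerName.get(col.name.toLowerCase).flatMap(v => Option(v)) match {
        case None => Right(None) // missing column or null value -> NULL
        case Some(raw) =>
          col.cast(raw) match {
            case Some(v) => Right(Some(v))
            case None =>
              Left(s"Column '${col.name}': value '$raw' is incompatible with the target type")
          }
      }
    }

    // Abort on the first incompatible value; otherwise return the aligned row.
    aligned.collectFirst { case Left(err) => err } match {
      case Some(err) => Left(err)
      case None      => Right(aligned.collect { case Right(v) => v })
    }
  }

  // Example casts for an INT and a STRING target column.
  val toInt: Any => Option[Any] = {
    case i: Int    => Some(i)
    case s: String => Try(s.trim.toInt).toOption
    case _         => None
  }
  val toStr: Any => Option[Any] = v => Some(v.toString)
}
```

With target columns id (toInt), name (toStr) and score (toInt), the row Map("ID" -> 1, "Name" -> "a", "extra" -> true) aligns to Seq(Some(1), Some("a"), None), while Map("id" -> "not-a-number") returns a Left describing the incompatible value.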

Export (COPY INTO path FROM table):

  • Write Parquet files via df.write.parquet()
  • COMPRESSION option (SNAPPY, GZIP, NONE, etc.)
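For export, one plausible shape for handling the COMPRESSION option is to normalize and validate it before handing it to Spark's Parquet writer. This is a hedged sketch, assuming the value is forwarded to the writer's "compression" option; the accepted codec set and the snappy fallback (Spark's usual default Parquet codec) are assumptions, and ExportCompressionSketch.resolve is a hypothetical name.

```scala
// Hypothetical helper: validate a COPY INTO COMPRESSION value and map it to
// a codec name for Spark's Parquet writer option "compression".
// The supported set below is an assumption, not the PR's actual list.
object ExportCompressionSketch {
  private val supported =
    Set("none", "uncompressed", "snappy", "gzip", "lzo", "brotli", "lz4", "zstd")

  /** An absent option falls back to "snappy" (assumed default). */
  def resolve(option: Option[String]): Either[String, String] =
    option.map(_.trim.toLowerCase) match {
      case None                            => Right("snappy")
      case Some(codec) if supported(codec) => Right(codec)
      case Some(other) =>
        Left(s"Unsupported COMPRESSION for Parquet export: '$other'")
    }

  // With Spark on the classpath, the resolved codec would then be passed as:
  //   df.write.option("compression", codec).parquet(path)
}
```

Lower-casing before the lookup makes SNAPPY, Snappy and snappy equivalent, and rejecting unknown values up front gives a clearer error than a failure deep inside the write.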

Refactoring:

  • Extract a shared resolveDefaultColumn() helper (previously duplicated in the Parquet and text paths)
  • Unify recordHistoryAndBuildResults() to accept a countDf parameter so the Parquet and text paths share it (eliminates ~45 lines of copy-paste)
  • Add a logWarning when parsing a default value expression fails (the error was previously swallowed silently)
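The default-value change can be illustrated with a Spark-free sketch of a shared helper, assuming that a column not supplied by the source takes its declared default expression when one exists and NULL otherwise. The signature below, including the parse and logWarning function parameters, is hypothetical; the real resolveDefaultColumn() operates on Spark/Paimon types not shown in this PR description.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical shape of a shared default-column helper. The name mirrors the
// PR's resolveDefaultColumn(), but this signature is an assumption.
object DefaultColumnSketch {

  /** Value for a column absent from the source: the parsed default
    * expression if one exists, otherwise NULL (None). A parse failure is
    * reported through logWarning instead of being silently swallowed.
    */
  def resolveDefaultColumn(
      column: String,
      defaultExpr: Option[String],
      parse: String => Any,
      logWarning: String => Unit): Option[Any] =
    defaultExpr.flatMap { expr =>
      Try(parse(expr)) match {
        case Success(value) => Option(value)
        case Failure(e) =>
          logWarning(
            s"Failed to parse default value '$expr' for column '$column'; " +
              s"using NULL. Cause: ${e.getMessage}")
          None
      }
    }
}
```

Passing the logger in as a function keeps the sketch testable; in Spark code the same role would be played by the Logging trait's logWarning.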

Tests

12 new tests cover basic import, column name matching, explicit column list, export, export with compression, round-trip, extra fields ignored, missing fields becoming null, FORCE=FALSE dedup, PATTERN filtering, unsupported option error, and rows_loaded count accuracy.
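Two of the behaviours these tests exercise, PATTERN filtering and FORCE=FALSE deduplication, amount to a file-selection step before any data is read. A minimal sketch, assuming PATTERN is a regular expression matched against the whole file path and that previously loaded files come from the COPY INTO load history; selectFiles and its parameters are hypothetical names.

```scala
import scala.util.matching.Regex

// Hypothetical file-selection step for COPY INTO. selectFiles and its
// parameters are illustrative, not the PR's actual API.
object FileSelectionSketch {
  def selectFiles(
      candidates: Seq[String],
      pattern: Option[Regex],
      alreadyLoaded: Set[String],
      force: Boolean): Seq[String] = {
    // PATTERN: keep only paths the regex matches in full (assumed semantics).
    val matched = pattern.fold(candidates) { re =>
      candidates.filter(path => re.pattern.matcher(path).matches())
    }
    // FORCE = FALSE: skip files already recorded in the load history.
    // FORCE = TRUE: reload every matching file.
    if (force) matched else matched.filterNot(alreadyLoaded.contains)
  }
}
```

For example, with candidates data/a.parquet, data/b.parquet and data/c.csv, the pattern .*\.parquet, and data/a.parquet already loaded, FORCE = FALSE selects only data/b.parquet, while FORCE = TRUE selects both Parquet files.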

JunRuiLee force-pushed the copy-into-parquet-support branch from 2cd9ed6 to 3715ea4 on May 30, 2026 02:23

JingsongLi left a comment

+1

JingsongLi merged commit ca6e718 into apache:master on May 30, 2026
13 checks passed

2 participants