
[FAQ Bot] NEW: Why does Spark write multiple parquet files after repartitioning a DataFrame? #238

Open
github-actions[bot] wants to merge 1 commit into main from faq-bot/issue-237

Conversation


@github-actions github-actions bot commented Mar 7, 2026

✨ FAQ NEW

Course: data-engineering-zoomcamp
Section: module-6 (Directly explains why repartitioning leads to multiple output files when writing parquet, fitting Spark-related questions in module-6.)
Related Issue: #237

Question

Why does Spark write multiple parquet files after repartitioning a DataFrame?

Decision Rationale

The proposal explains a Spark behavior (one output file per partition when writing) not explicitly covered by existing FAQs in module-6. It adds a clear explanation and example addressing why multiple parquet files appear after repartitioning.

Placement Details

  • Section ID: module-6
  • Sort Order: 60
  • Filename Slug: spark-write-multiple-parquet-per-partition

🤖 Generated by FAQ Bot

Closes #237


Successfully merging this pull request may close these issues.

[FAQ] Why does Spark write multiple parquet files after repartitioning a dataset?
