[SPARK-54657][PS] Refactor pyspark.sql.pandas.serializers for readability/reuse by devin-petersohn · Pull Request #54406 · apache/spark

devin-petersohn · 2026-02-20T22:01:08Z

What changes were proposed in this pull request?

Refactor the large pyspark.sql.pandas.serializers module to break it up and reuse similar logic.

Why are the changes needed?

To break up the module and reuse similar logic, improving readability.

Does this PR introduce any user-facing change?

No

How was this patch tested?

CI

Was this patch authored or co-authored using generative AI tooling?

Co-authored-by: Claude Opus 4.6

…lity/reuse Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com> Co-authored-by: Devin Petersohn <devin.petersohn@snowflake.com>

gaogaotiantian · 2026-02-21T04:47:33Z

Ah. You don't have to spend your time on this part for now. @Yicong-Huang is actively working on refactoring serializers and we probably don't want conflict at this point.

Yicong-Huang · 2026-02-21T18:26:18Z

Thanks @devin-petersohn.

We are actively working on refactoring the serializers, but we decided to do it surgically and slowly to make sure we don't introduce breaking changes or regressions. The goal is to reduce the number of serializers by extracting transformers and moving data transformation logic out of the serializers. You can follow the two umbrella tickets for this work: https://issues.apache.org/jira/browse/SPARK-55388 and https://issues.apache.org/jira/browse/SPARK-55384.
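
(Editorial illustration, not code from this PR or from Spark.) The direction described in the comment above, keeping serializers responsible only for encoding and moving data transformation into separate, composable transformers, might look roughly like the sketch below. Every name here (JsonLinesSerializer, drop_nulls, uppercase_names, compose) is hypothetical and uses only the Python standard library; the real PySpark serializers work on Arrow batches and are structured differently.

```python
import json
from typing import Callable, Iterable, Iterator, List

# A "batch" is just a list of row dicts in this toy example.
Batch = List[dict]
Transformer = Callable[[Iterable[Batch]], Iterator[Batch]]


def drop_nulls(batches: Iterable[Batch]) -> Iterator[Batch]:
    """Hypothetical transformer: drop rows that contain any None value."""
    for batch in batches:
        yield [row for row in batch if all(v is not None for v in row.values())]


def uppercase_names(batches: Iterable[Batch]) -> Iterator[Batch]:
    """Hypothetical transformer: upper-case the 'name' field of every row."""
    for batch in batches:
        yield [{**row, "name": row["name"].upper()} for row in batch]


def compose(*transformers: Transformer) -> Transformer:
    """Chain transformers left to right into a single pipeline."""
    def pipeline(batches: Iterable[Batch]) -> Iterator[Batch]:
        for transform in transformers:
            batches = transform(batches)
        return iter(batches)
    return pipeline


class JsonLinesSerializer:
    """Hypothetical serializer: it only encodes and decodes batches.

    No data transformation lives here, so one serializer can be reused
    with many different transformer pipelines.
    """

    def dumps(self, batches: Iterable[Batch]) -> str:
        return "\n".join(json.dumps(batch) for batch in batches)

    def loads(self, text: str) -> Iterator[Batch]:
        for line in text.splitlines():
            if line:
                yield json.loads(line)


# Usage: the pipeline is chosen independently of the wire format.
serializer = JsonLinesSerializer()
pipeline = compose(drop_nulls, uppercase_names)
wire = serializer.dumps(pipeline([[{"name": "a", "x": 1}, {"name": None, "x": 2}]]))
decoded = list(serializer.loads(wire))
```

With this split, adding a new behavior means writing one more transformer function rather than one more serializer subclass, which is one plausible way the number of serializers could shrink.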

devin-petersohn · 2026-02-23T15:19:33Z

No problem! I should have made a comment on the JIRA ticket to check that this one wasn't already in progress. Your approach makes sense, but if any of this is useful at all please feel free to use any of the code here.

[SPARK-54657][PS] Refactor pyspark.sql.pandas.serializers for readabi…

e18a887

devin-petersohn closed this Feb 23, 2026
