[SPARK-54657][PS] Refactor pyspark.sql.pandas.serializers for readability/reuse #54406
devin-petersohn wants to merge 1 commit into apache:master
…lity/reuse Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com> Co-authored-by: Devin Petersohn <devin.petersohn@snowflake.com>
Ah. You don't have to spend your time on this part for now. @Yicong-Huang is actively working on refactoring the serializers, and we probably don't want conflicts at this point.
Thanks @devin-petersohn. We are actively working on refactoring the serializers, but decided to do it surgically and slowly to make sure we don't introduce breaking changes or regressions. The goal is to reduce the number of serializers by extracting transformers and moving data transformation logic out of the serializers. You can follow the two umbrella tickets, https://issues.apache.org/jira/browse/SPARK-55388 and https://issues.apache.org/jira/browse/SPARK-55384, for the process.
No problem! I should have made a comment on the JIRA ticket to check that this one wasn't already in progress. Your approach makes sense, but if any of this is useful at all, please feel free to use any of the code here.
What changes were proposed in this pull request?
Refactor the large pyspark.sql.pandas.serializers module to break it up and reuse similar logic.
Why are the changes needed?
To break up the module and reuse similar logic, improving readability.
Does this PR introduce any user-facing change?
No
How was this patch tested?
CI
Was this patch authored or co-authored using generative AI tooling?
Co-authored-by: Claude Opus 4.6