
[SPARK-56254][PYTHON][CONNECT] Make spark.read.xml accept DataFrame input#55332

Open
Yicong-Huang wants to merge 4 commits into apache:master from Yicong-Huang:SPARK-56254

Conversation

Contributor

@Yicong-Huang commented Apr 14, 2026

What changes were proposed in this pull request?

This PR adds DataFrame input support to spark.read.xml(), following the same pattern established by spark.read.json() (SPARK-56253) and spark.read.csv() (SPARK-56255).

Why are the changes needed?

This completes the DataFrame input support across all text-based readers (JSON, CSV, XML).

Does this PR introduce any user-facing change?

Yes. spark.read.xml() now accepts a DataFrame with a single string column as input, in addition to the existing path/list/RDD inputs.

Users can now parse XML strings stored in a DataFrame column directly:

```python
xml_df = spark.createDataFrame(
    [('<person><name>Alice</name><age>25</age></person>',),
     ('<person><name>Bob</name><age>30</age></person>',)],
    schema="value STRING",
)
spark.read.option("rowTag", "person").xml(xml_df).show()
```
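Conceptually, each value in the input column is parsed as one XML record rooted at the `rowTag` element. As a rough plain-Python sketch of that per-row parse (illustrative only, using the standard library rather than Spark's actual XML parser):

```python
import xml.etree.ElementTree as ET

def parse_person(xml_string):
    # Parse one <person> record into a dict of child tag -> text,
    # loosely mirroring how a row of xml_df above would be interpreted.
    root = ET.fromstring(xml_string)
    return {child.tag: child.text for child in root}

print(parse_person('<person><name>Alice</name><age>25</age></person>'))
# {'name': 'Alice', 'age': '25'}
```

Spark additionally infers a schema across all rows (or applies a user-provided one), which the sketch does not attempt.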

How was this patch tested?

Added 10 new tests (5 for the classic API and 5 for Spark Connect).

Was this patch authored or co-authored using generative AI tooling?

No
