Ensure subtest's context kwargs are JSON serializable #13963

nicoddemus · 2025-11-13T22:32:27Z

Convert all the values of SubtestContext.kwargs to strings using saferepr.

This complies with the requirement that the returned dict from pytest_report_to_serializable is serializable to JSON, at the cost of losing type information for objects that are natively supported by JSON.

Fixes pytest-dev/pytest-xdist#1273

Convert all the values of `SubtestContext.kwargs` to strings using `saferepr`. This complies with the requirement that the returned dict from `pytest_report_to_serializable` is serializable to JSON, at the cost of losing type information for objects that are natively supported by JSON. Fixes pytest-dev/pytest-xdist#1273

bluetech

A couple of comments for your consideration.

bluetech · 2025-11-13T22:50:10Z

src/_pytest/subtests.py

-        return dataclasses.asdict(self)
+        result = dataclasses.asdict(self)
+        # Brute-force the returned kwargs dict to be JSON serializable (pytest-dev/pytest-xdist#1273).
+        result["kwargs"] = {k: saferepr(v) for (k, v) in result["kwargs"].items()}


IMO we should strive to have the property that from_json(to_json(report)) == report. In this case it means that we should have self.kwargs itself should be serializable, not just in to_json.

IMO we should strive to have the property that from_json(to_json(report)) == report.

I agree this is desirable, but I don't see how we can accomplish this in a general manner, for any type of object.

In unittest subtests, kwargs is permitted to contain any type of object, because it is not serialized to anything and used only for reporting.

I don't see how we can support that in functions that serialize to JSON.

Perhaps I'm missing something, could you elaborate a bit?

What I'm thinking is that kwargs itself contain strings, not the objects themselves, that would of course make it serializable. It is not exposed in the API so should be fine? Unless the original objects are needed for some reason?

Well once they are serialized and come back unserialized on the other side, unfortunately they might not be useful depending on the intent. For reporting this should be fine, but if a plugin is using the serialization hooks and expecting the actual objects to be returned this would be a problem.

We can perhaps try to keep the concrete types for objects we know are safe (int, float, etc.), but for other objects I don't see another solution.

We can perhaps try to keep the concrete types for objects we know are safe (int, float, etc.)

Implemented that, take a look.

I don't think we'll be getting an arbitration :) So please go ahead with the change (I'll approve).

I might be wrong in my point of view, it would be nice to get other's opinions. 😁

I will wait a few more days.

I don't really have a clear opinion on this, I think. Some considerations:

I do agree pickle kind of gives me the ick. I feel like more of those could be relevant, and I don't think the security argument is necessarily moot. Take this with a grain of salt because I haven't worked with custom reporting much, but where are the hooks coming from vs. where is the JSON data coming from?

I feel like it's not uncommon to have the hooks from a trusted source (e.g. a custom plugin), but reading JSON data from an untrusted source? Say e.g. I build a "fake pytest result viewer" which takes a JSON report from someone and shows me a fake terminal session of how the run looked. It seems like quite a surprising responsibility if my (trusted) viewer runs arbitrary code when I don't carefully audit the (intransparent) binary data in the JSON report, no?

I thought the idea of having JSON reports is precisely that they can be used by third-party tools and not just pytest itself (and pytest-xdist), no?

I think there is a confusion here regarding what it means by "JSON-like" returned by pytest_report_to_serializable. It was originally created just as a way to customize report serialization with pytest-subtests, specially for SubTestReport from pytest-subtests.

It is meant to be the counterpart of pytest_report_from_serializable, meaning that users shouldn't even rely on what pytest_report_to_serializable returns, it is only meant to be passed along to pytest_report_from_serializable to obtain the restored report. The return value is not meant to be looked at or consumed in any way, and is subject to change.

But I needed to document what pytest_report_from_serializable should return, so I went with "JSON-like" which might have been unfortunate because it is not what I intended.

In other words, the intended usage of those hooks is:

data = pytest_report_to_serializable(sub_report) loaded = pytest_report_from_serializable(data) assert sub_report == loaded

Having said all that, it seems I failed to convince my POV on how this should work, so I will go ahead and convert the kwargs on input and use saferepr as suggested.

This will result in:

data = pytest_report_to_serializable(sub_report) loaded = pytest_report_from_serializable(data) assert sub_report != loaded # loaded.kwargs is not a dict, but a saferepr() of the original sub_report.kwargs.

This won't matter for pytest-xdist which just prints it, but might affect user code or some other plugins that expected the first property (sub_report == loaded) to hold.

Ultimately, I think this is the wrong call, because we will be effectively throwing away user data on the off-chance that we might eventually serialize it with pytest_report_to_serializable hook, but I don't seem to be able to convince folks so I defer to your judgment call on this one (I'm not a big fan of pickle either, but can't think of a generic way to save a generic dictionary).

I'm not saying you're wrong here, for what it's worth. I haven't really looked at how reports or those hooks work in detail personally, so take what I said with a grain of salt. I think it does make sense with your explanation, and perhaps the issue you mention could be solved by making it more clear in the documentation?

src/_pytest/subtests.py

The-Compiler · 2025-12-11T14:54:51Z

src/_pytest/subtests.py

    def _to_json(self) -> dict[str, Any]:
-        return dataclasses.asdict(self)
+        result = dataclasses.asdict(self)
+        # Use protocol 0 because it is human-readable and guaranteed to be not-binary.


I wonder if this is really worth it. Human-readable is quite a stretch:

>>> print(pickle.dumps({"a": "b"}, protocol=0).decode("utf-8")) (dp0 Va p1 Vb p2 s.

The Python docs also say:

Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations.

Which implies:

Version 4 would probably work with more things (and IMHO it being binary is a small price to pay, the ASCII-format doesn't seem much more useful to me at all)

This also doesn't give us a guarantee that the serialization will work, so we still need to deal with that problem, which I think isn't done here

nicoddemus force-pushed the xdist-subtests-serialization branch from a3cf834 to 7585f45 Compare November 13, 2025 22:35

psf-chronographer bot added the bot:chronographer:provided (automation) changelog entry is part of PR label Nov 13, 2025

nicoddemus added the backport 9.0.x apply to PRs at any point; backports the changes to the 9.0.x branch label Nov 13, 2025

bluetech reviewed Nov 13, 2025

View reviewed changes

nicoddemus force-pushed the xdist-subtests-serialization branch 2 times, most recently from 53f50a5 to f8da2cb Compare November 22, 2025 12:23

Best-effort to convert to JSON

fdfa22f

nicoddemus force-pushed the xdist-subtests-serialization branch from f8da2cb to fdfa22f Compare November 22, 2025 14:47

Use pickle

b5b4035

nicoddemus requested a review from bluetech November 27, 2025 19:07

jhillairet mentioned this pull request Nov 30, 2025

Temporary fix of continuous integration scikit-rf/scikit-rf#1357

Merged

bluetech approved these changes Dec 10, 2025

View reviewed changes

The-Compiler reviewed Dec 11, 2025

View reviewed changes

Uh oh!

Ensure subtest's context kwargs are JSON serializable #13963

Are you sure you want to change the base?

Ensure subtest's context kwargs are JSON serializable #13963

Conversation

nicoddemus commented Nov 13, 2025

Uh oh!

bluetech left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nicoddemus Nov 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nicoddemus Dec 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nicoddemus Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

nicoddemus Nov 21, 2025 •

edited

Loading

nicoddemus Dec 10, 2025 •

edited

Loading

nicoddemus Dec 12, 2025 •

edited

Loading