Skip to content

Conversation

@nicoddemus
Copy link
Member

Convert all the values of SubtestContext.kwargs to strings using saferepr.

This complies with the requirement that the returned dict from pytest_report_to_serializable is serializable to JSON, at the cost of losing type information for objects that are natively supported by JSON.

Fixes pytest-dev/pytest-xdist#1273

Convert all the values of `SubtestContext.kwargs` to strings using `saferepr`.

This complies with the requirement that the returned dict from `pytest_report_to_serializable` is serializable to JSON, at the cost of losing type information for objects that are natively supported by JSON.

Fixes pytest-dev/pytest-xdist#1273
@nicoddemus nicoddemus force-pushed the xdist-subtests-serialization branch from a3cf834 to 7585f45 Compare November 13, 2025 22:35
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided (automation) changelog entry is part of PR label Nov 13, 2025
@nicoddemus nicoddemus added the backport 9.0.x apply to PRs at any point; backports the changes to the 9.0.x branch label Nov 13, 2025
Copy link
Member

@bluetech bluetech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments for your consideration.

return dataclasses.asdict(self)
result = dataclasses.asdict(self)
# Brute-force the returned kwargs dict to be JSON serializable (pytest-dev/pytest-xdist#1273).
result["kwargs"] = {k: saferepr(v) for (k, v) in result["kwargs"].items()}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should strive to have the property that from_json(to_json(report)) == report. In this case it means that we should have self.kwargs itself should be serializable, not just in to_json.

Copy link
Member Author

@nicoddemus nicoddemus Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should strive to have the property that from_json(to_json(report)) == report.

I agree this is desirable, but I don't see how we can accomplish this in a general manner, for any type of object.

In unittest subtests, kwargs is permitted to contain any type of object, because it is not serialized to anything and used only for reporting.

I don't see how we can support that in functions that serialize to JSON.

Perhaps I'm missing something, could you elaborate a bit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm thinking is that kwargs itself contain strings, not the objects themselves, that would of course make it serializable. It is not exposed in the API so should be fine? Unless the original objects are needed for some reason?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well once they are serialized and come back unserialized on the other side, unfortunately they might not be useful depending on the intent. For reporting this should be fine, but if a plugin is using the serialization hooks and expecting the actual objects to be returned this would be a problem.

We can perhaps try to keep the concrete types for objects we know are safe (int, float, etc.), but for other objects I don't see another solution.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can perhaps try to keep the concrete types for objects we know are safe (int, float, etc.)

Implemented that, take a look.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we'll be getting an arbitration :) So please go ahead with the change (I'll approve).

Copy link
Member Author

@nicoddemus nicoddemus Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong in my point of view, it would be nice to get other's opinions. 😁

I will wait a few more days.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have a clear opinion on this, I think. Some considerations:

I do agree pickle kind of gives me the ick. I feel like more of those could be relevant, and I don't think the security argument is necessarily moot. Take this with a grain of salt because I haven't worked with custom reporting much, but where are the hooks coming from vs. where is the JSON data coming from?

I feel like it's not uncommon to have the hooks from a trusted source (e.g. a custom plugin), but reading JSON data from an untrusted source? Say e.g. I build a "fake pytest result viewer" which takes a JSON report from someone and shows me a fake terminal session of how the run looked. It seems like quite a surprising responsibility if my (trusted) viewer runs arbitrary code when I don't carefully audit the (intransparent) binary data in the JSON report, no?

I thought the idea of having JSON reports is precisely that they can be used by third-party tools and not just pytest itself (and pytest-xdist), no?

Copy link
Member Author

@nicoddemus nicoddemus Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a confusion here regarding what it means by "JSON-like" returned by pytest_report_to_serializable. It was originally created just as a way to customize report serialization with pytest-subtests, specially for SubTestReport from pytest-subtests.

It is meant to be the counterpart of pytest_report_from_serializable, meaning that users shouldn't even rely on what pytest_report_to_serializable returns, it is only meant to be passed along to pytest_report_from_serializable to obtain the restored report. The return value is not meant to be looked at or consumed in any way, and is subject to change.

But I needed to document what pytest_report_from_serializable should return, so I went with "JSON-like" which might have been unfortunate because it is not what I intended.

In other words, the intended usage of those hooks is:

data = pytest_report_to_serializable(sub_report)
loaded = pytest_report_from_serializable(data)
assert sub_report == loaded

Having said all that, it seems I failed to convince my POV on how this should work, so I will go ahead and convert the kwargs on input and use saferepr as suggested.

This will result in:

data = pytest_report_to_serializable(sub_report)
loaded = pytest_report_from_serializable(data)
assert sub_report != loaded  # loaded.kwargs is not a dict, but a saferepr() of the original sub_report.kwargs.

This won't matter for pytest-xdist which just prints it, but might affect user code or some other plugins that expected the first property (sub_report == loaded) to hold.

Ultimately, I think this is the wrong call, because we will be effectively throwing away user data on the off-chance that we might eventually serialize it with pytest_report_to_serializable hook, but I don't seem to be able to convince folks so I defer to your judgment call on this one (I'm not a big fan of pickle either, but can't think of a generic way to save a generic dictionary).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying you're wrong here, for what it's worth. I haven't really looked at how reports or those hooks work in detail personally, so take what I said with a grain of salt. I think it does make sense with your explanation, and perhaps the issue you mention could be solved by making it more clear in the documentation?

@nicoddemus nicoddemus force-pushed the xdist-subtests-serialization branch 2 times, most recently from 53f50a5 to f8da2cb Compare November 22, 2025 12:23
@nicoddemus nicoddemus force-pushed the xdist-subtests-serialization branch from f8da2cb to fdfa22f Compare November 22, 2025 14:47
def _to_json(self) -> dict[str, Any]:
return dataclasses.asdict(self)
result = dataclasses.asdict(self)
# Use protocol 0 because it is human-readable and guaranteed to be not-binary.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is really worth it. Human-readable is quite a stretch:

>>> print(pickle.dumps({"a": "b"}, protocol=0).decode("utf-8"))
(dp0
Va
p1
Vb
p2
s.

The Python docs also say:

Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations.

Which implies:

  • Version 4 would probably work with more things (and IMHO it being binary is a small price to pay, the ASCII-format doesn't seem much more useful to me at all)
  • This also doesn't give us a guarantee that the serialization will work, so we still need to deal with that problem, which I think isn't done here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 9.0.x apply to PRs at any point; backports the changes to the 9.0.x branch bot:chronographer:provided (automation) changelog entry is part of PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

execnet.gateway_base.DumpError: can't serialize <enum 'SortBy'> with pytest==9.0.0 and while using pytest-xdist

3 participants