-
-
Notifications
You must be signed in to change notification settings - Fork 3k
Ensure subtest's context kwargs are JSON serializable #13963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Ensure subtest's context kwargs are JSON serializable #13963
Conversation
Convert all the values of `SubtestContext.kwargs` to strings using `saferepr`. This complies with the requirement that the returned dict from `pytest_report_to_serializable` is serializable to JSON, at the cost of losing type information for objects that are natively supported by JSON. Fixes pytest-dev/pytest-xdist#1273
a3cf834 to
7585f45
Compare
bluetech
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of comments for your consideration.
src/_pytest/subtests.py
Outdated
| return dataclasses.asdict(self) | ||
| result = dataclasses.asdict(self) | ||
| # Brute-force the returned kwargs dict to be JSON serializable (pytest-dev/pytest-xdist#1273). | ||
| result["kwargs"] = {k: saferepr(v) for (k, v) in result["kwargs"].items()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we should strive to have the property that from_json(to_json(report)) == report. In this case it means that we should have self.kwargs itself should be serializable, not just in to_json.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we should strive to have the property that from_json(to_json(report)) == report.
I agree this is desirable, but I don't see how we can accomplish this in a general manner, for any type of object.
In unittest subtests, kwargs is permitted to contain any type of object, because it is not serialized to anything and used only for reporting.
I don't see how we can support that in functions that serialize to JSON.
Perhaps I'm missing something, could you elaborate a bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I'm thinking is that kwargs itself contain strings, not the objects themselves, that would of course make it serializable. It is not exposed in the API so should be fine? Unless the original objects are needed for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well once they are serialized and come back unserialized on the other side, unfortunately they might not be useful depending on the intent. For reporting this should be fine, but if a plugin is using the serialization hooks and expecting the actual objects to be returned this would be a problem.
We can perhaps try to keep the concrete types for objects we know are safe (int, float, etc.), but for other objects I don't see another solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can perhaps try to keep the concrete types for objects we know are safe (int, float, etc.)
Implemented that, take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we'll be getting an arbitration :) So please go ahead with the change (I'll approve).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be wrong in my point of view, it would be nice to get other's opinions. 😁
I will wait a few more days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really have a clear opinion on this, I think. Some considerations:
I do agree pickle kind of gives me the ick. I feel like more of those could be relevant, and I don't think the security argument is necessarily moot. Take this with a grain of salt because I haven't worked with custom reporting much, but where are the hooks coming from vs. where is the JSON data coming from?
I feel like it's not uncommon to have the hooks from a trusted source (e.g. a custom plugin), but reading JSON data from an untrusted source? Say e.g. I build a "fake pytest result viewer" which takes a JSON report from someone and shows me a fake terminal session of how the run looked. It seems like quite a surprising responsibility if my (trusted) viewer runs arbitrary code when I don't carefully audit the (intransparent) binary data in the JSON report, no?
I thought the idea of having JSON reports is precisely that they can be used by third-party tools and not just pytest itself (and pytest-xdist), no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a confusion here regarding what it means by "JSON-like" returned by pytest_report_to_serializable. It was originally created just as a way to customize report serialization with pytest-subtests, specially for SubTestReport from pytest-subtests.
It is meant to be the counterpart of pytest_report_from_serializable, meaning that users shouldn't even rely on what pytest_report_to_serializable returns, it is only meant to be passed along to pytest_report_from_serializable to obtain the restored report. The return value is not meant to be looked at or consumed in any way, and is subject to change.
But I needed to document what pytest_report_from_serializable should return, so I went with "JSON-like" which might have been unfortunate because it is not what I intended.
In other words, the intended usage of those hooks is:
data = pytest_report_to_serializable(sub_report)
loaded = pytest_report_from_serializable(data)
assert sub_report == loadedHaving said all that, it seems I failed to convince my POV on how this should work, so I will go ahead and convert the kwargs on input and use saferepr as suggested.
This will result in:
data = pytest_report_to_serializable(sub_report)
loaded = pytest_report_from_serializable(data)
assert sub_report != loaded # loaded.kwargs is not a dict, but a saferepr() of the original sub_report.kwargs.This won't matter for pytest-xdist which just prints it, but might affect user code or some other plugins that expected the first property (sub_report == loaded) to hold.
Ultimately, I think this is the wrong call, because we will be effectively throwing away user data on the off-chance that we might eventually serialize it with pytest_report_to_serializable hook, but I don't seem to be able to convince folks so I defer to your judgment call on this one (I'm not a big fan of pickle either, but can't think of a generic way to save a generic dictionary).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not saying you're wrong here, for what it's worth. I haven't really looked at how reports or those hooks work in detail personally, so take what I said with a grain of salt. I think it does make sense with your explanation, and perhaps the issue you mention could be solved by making it more clear in the documentation?
53f50a5 to
f8da2cb
Compare
f8da2cb to
fdfa22f
Compare
| def _to_json(self) -> dict[str, Any]: | ||
| return dataclasses.asdict(self) | ||
| result = dataclasses.asdict(self) | ||
| # Use protocol 0 because it is human-readable and guaranteed to be not-binary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this is really worth it. Human-readable is quite a stretch:
>>> print(pickle.dumps({"a": "b"}, protocol=0).decode("utf-8"))
(dp0
Va
p1
Vb
p2
s.The Python docs also say:
Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations.
Which implies:
- Version 4 would probably work with more things (and IMHO it being binary is a small price to pay, the ASCII-format doesn't seem much more useful to me at all)
- This also doesn't give us a guarantee that the serialization will work, so we still need to deal with that problem, which I think isn't done here
Convert all the values of
SubtestContext.kwargsto strings usingsaferepr.This complies with the requirement that the returned dict from
pytest_report_to_serializableis serializable to JSON, at the cost of losing type information for objects that are natively supported by JSON.Fixes pytest-dev/pytest-xdist#1273