refactor: Improve constructors for testplan elements #216
This doesn't change the behaviour (except that it might be slightly more careful about setting arbitrary field names). I've rather simplified the flow though. It now looks like this:

- The derived class constructor can set some placeholder values (to help tooling see their expected types).
- It then calls super().__init__
- The Element constructor checks it can find strings for the "name" and "desc" fields.
- The "tags" field gets initialised to [] (so the element will have an empty list of tags if none are specified).
- The Element constructor then sets all the requested fields, but only allows them to be strings or lists of strings. (No need to allow more complicated types: these are the only types we expect anyway).
- It finally checks that "tags" is still a list of strings.
- When we get back to the derived class constructor, we check that any other expected fields have been supplied, given the right type, and (for "stage") have a known value.
- The Testpoint class also has some fields that it doesn't expect to load from the dictionary. So we check that none were supplied, then set them appropriately.

Phew!

Signed-off-by: Rupert Swarbrick <rswarbrick@lowrisc.org>
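For readers without the diff open, the flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the helper `_is_str_list`, the `internal_fields` tuple, the error messages, and the choice of `test_results` and `not_mapped` as the fields that must not be loaded from the dictionary are all hypothetical.

```python
from typing import Any


def _is_str_list(value: Any) -> bool:
    """Return True if value is a list whose items are all strings."""
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


class Element:
    """Hypothetical base class for testplan elements (illustration only)."""

    kind = "element"

    def __init__(self, raw_dict: dict[str, Any]) -> None:
        # "name" and "desc" must be present and must be strings.
        for field_name in ("name", "desc"):
            if not isinstance(raw_dict.get(field_name), str):
                raise ValueError(f"{self.kind} is missing a string {field_name!r} field.")

        # An element with no tags specified gets an empty list.
        self.tags: list[str] = []

        # Set every requested field, allowing only strings or lists of strings.
        for field_name, value in raw_dict.items():
            if not (isinstance(value, str) or _is_str_list(value)):
                raise ValueError(f"Field {field_name!r} must be a string or a list of strings.")
            setattr(self, field_name, value)

        # "tags" may have been overwritten above, so check it again.
        if not _is_str_list(self.tags):
            raise ValueError("Field 'tags' must be a list of strings.")


class Testpoint(Element):
    kind = "testpoint"
    stages = ("N.A.", "V1", "V2", "V2S", "V3")
    # Assumed names for the fields that must not come from the dictionary.
    internal_fields = ("test_results", "not_mapped")

    def __init__(self, raw_dict: dict[str, Any]) -> None:
        # Placeholder values help tooling see the expected types.
        self.stage: str = ""
        self.tests: list[str] = []
        super().__init__(raw_dict)

        # Expected fields must be supplied with the right type.
        if "tests" not in raw_dict or not _is_str_list(self.tests):
            raise ValueError("Testpoint needs a 'tests' list of strings.")
        if self.stage not in self.stages:
            raise ValueError(f"Unknown stage {self.stage!r}; expected one of {self.stages}.")

        # Internal fields must not be supplied; set them here instead.
        supplied = [name for name in self.internal_fields if name in raw_dict]
        if supplied:
            raise ValueError(f"Fields not allowed in the dictionary: {supplied}")
        self.test_results: list[str] = []
        self.not_mapped = False


# Example input (made up for illustration).
testpoint = Testpoint(
    {"name": "smoke", "desc": "Basic smoke test", "stage": "V1", "tests": ["foo_smoke"]}
)
```

Because the base class sets fields with `setattr` before the derived class runs its own checks, the derived class has to look at `raw_dict` itself to tell "supplied" apart from "still holding the placeholder".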
Thanks @rswarbrick. In terms of error reporting, this is definitely an improvement.
As a high-level comment, this is mostly cleaning up the JSON/dict validation logic that is already there and enforcing typing better. This is great, but contemporary best practice would be to not do this ourselves, and instead rely on some Python data model. This is what we've already done for other parts of DVSim and want to end up doing for Deploy, Testplan, FlowCfg etc.
As an example (very untested, probably some errors), I'd imagine this flow would end up looking something like this:
from pydantic import BaseModel, ConfigDict, model_validator


class Element(BaseModel):
    model_config = ConfigDict(extra="ignore")

    name: str
    desc: str
    tags: list[str]
    # ... whatever other fields we might expect to exist in any `Element`.
    # If we have optional fields that we need to access, we can either add them here like e.g.
    # my_optional_field: str | None = None
    # Or we can use `extra="allow"` in our config and query `.model_extra["my_optional_field"]`


class Covergroup(Element):
    # If using `extra="allow"`...
    model_config = ConfigDict(extra="allow")

    # This validator is nice - other approaches are to completely separate the data
    # (e.g. `CovergroupData`) from the thing consuming that data (e.g. `Covergroup`) - so
    # that way we don't have to worry about attribute overlaps from magic dict merging!
    # In terms of the process of refactoring, it's also considered good practice to have
    # strict models at data boundaries as it splits the system up into well-typed
    # boundaries that can be more easily reasoned about.
    #
    # Also note that if the extra attributes are just derived from this data, then e.g. properties
    # or protected attrs inside the class might mean better encapsulation etc. But if we still
    # want to allow extra attributes (using `extra="allow"`) and disallow certain fields, a
    # validator is the way to go.
    @model_validator(mode='after')
    def check_allowed_extra_fields(self) -> 'Covergroup':
        disallowed = {"test_results", "not_mapped"}
        if self.model_extra:
            disallowed = set(self.model_extra.keys()) & disallowed
            if disallowed:
                raise ValueError(f"Covergroup fields disallowed for use: {disallowed}")
        return self


class Testpoint(Element):
    tests: list[str]
    stage: str


# Then just do e.g.
Testpoint.model_validate(raw_dict)
# or even:
Covergroup.model_validate_json(raw_json_str)
| stages = ("N.A.", "V1", "V2", "V2S", "V3") | ||
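The two checks the PR description calls out by name (an empty default for "tags" and a known value for "stage") could also be stated in the model itself. The following is a minimal sketch assuming Pydantic v2, which the `ConfigDict` and `model_validator` imports in the example above imply; the field set and the sample dictionary are illustrative and not taken from the DVSim code.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class Element(BaseModel):
    name: str
    desc: str
    # Defaults to an empty list when no tags are specified.
    tags: list[str] = Field(default_factory=list)


class Testpoint(Element):
    tests: list[str]
    # Only the known stage names are accepted.
    stage: Literal["N.A.", "V1", "V2", "V2S", "V3"]


# Example input (made up for illustration).
raw_dict = {"name": "smoke", "desc": "Basic smoke test", "tests": ["foo_smoke"], "stage": "V1"}
testpoint = Testpoint.model_validate(raw_dict)

# An unknown stage is rejected with a ValidationError.
stage_error_count = 0
try:
    Testpoint.model_validate({**raw_dict, "stage": "V9"})
except ValidationError as err:
    stage_error_count = err.error_count()
```

With this shape the list of stages lives in the type annotation, so type checkers and the runtime validation share one definition.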
| def __init__(self, raw_dict) -> None: | ||
| def __init__(self, raw_dict: dict) -> None: |
Nit (in a couple of places) - prefer dict[str, Any] to dict for typing?
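A small sketch of what the suggested annotation looks like in practice. The helper `get_str_field` is hypothetical and only exists to give the annotation some context; it is not a function from the PR.

```python
from typing import Any


def get_str_field(raw_dict: dict[str, Any], field_name: str) -> str:
    """Return raw_dict[field_name], checking that it is a string."""
    raw = raw_dict.get(field_name)
    if not isinstance(raw, str):
        raise ValueError(f"Field {field_name!r} is missing or is not a string.")
    return raw
```

Compared with a bare `dict`, `dict[str, Any]` tells a type checker that keys are strings while leaving the values unconstrained, which matches a dictionary parsed from an Hjson file.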
| # Reindent the multiline desc with 4 spaces. | ||
| desc = "\n".join([" " + line.lstrip() for line in self.desc.split("\n")]) | ||
| return f" {self.kind.capitalize()}: {self.name}\n Description:\n{desc}\n" | ||
| raw_dict is the dictionary parsed from the HJSon file. |
| raw_dict is the dictionary parsed from the HJSon file. | |
| raw_dict is the dictionary parsed from the Hjson file. |
(or HJSON).
| def __init__(self, raw_dict) -> None: | ||
| """Initialize the testplan element. | ||
| Args: | ||
| d: The dictionary being read. |
Nit: avoid single-letter variable names outside of e.g. symbols in mathematical formulae, as they tend to propagate and make the code harder to read.
| if not isinstance(raw, str): | ||
| name_comment = f" with name {elt_name}" if elt_name is not None else "" | ||
| msg = ( | ||
| f"Testplan element {name_comment}has a {field_name} field but this is not a string." |
Looks like a missing space in the f-string? Unless it's just the diff formatting?
I think the space should come after `{name_comment}` (i.e. `{name_comment} `), not before, since the name_comment string already starts with a space when elt_name is not None?
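To make the spacing concrete, here is a minimal sketch of the message with the space moved after `{name_comment}`. The wrapper function `not_a_string_msg` is hypothetical; only the two f-strings mirror the diff.

```python
from typing import Optional


def not_a_string_msg(field_name: str, elt_name: Optional[str] = None) -> str:
    # name_comment carries its own leading space, so none is needed before it.
    name_comment = f" with name {elt_name}" if elt_name is not None else ""
    return f"Testplan element{name_comment} has a {field_name} field but this is not a string."


named = not_a_string_msg("desc", "smoke")
unnamed = not_a_string_msg("desc")
```

Both the named and unnamed cases then produce single spaces throughout.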
machshev left a comment
I'd agree with @AlexJones0, the concept is good! But this should probably be a Pydantic model instead, as that gives us the full schema check with type checking. It also gives us serialisation and deserialisation using syntax that looks like little more than a dataclass.
Given we already have pydantic as a dependency and several models already, it seems like the most logical long-term solution. Though we could perhaps merge this as a short-term workaround if it meets your immediate needs?