reject unknown fields#139

Open
the-infinity wants to merge 3 commits into main from prevent-additional-properties

@the-infinity
Contributor

For strict validation (e.g. validation services), it makes sense to be able to prevent additional properties. This MR provides that feature.

Open question: it might also make sense to be able to enable "prevent additional attributes" via env or something else, to get strict validation in tests and thereby find inconsistencies. One could just take os.env for that. Do you think this is a good idea? And if yes, what should we call the env var? Just VALIDATACLASS_PREVENT_ADDITIONAL_ATTRIBUTES?

@binaryDiv
Contributor

Thanks for the PR!

First of all, terminology:

  1. In dataclasses (including validataclasses), we're usually speaking of "fields" rather than attributes or properties. Generally speaking, an attribute is an attribute in the literal sense of Python, a member variable of a class or object. A field is an attribute with a type annotation in a dataclass, for which an argument in __init__ is generated etc. In a validataclass, a field is an attribute that has a type annotation and a validator. More importantly, a field is what gets validated by the DataclassValidator. (And a property is a method that's decorated with @property.)
  2. By default, keys in an input dictionary are simply ignored if they don't have a corresponding field in the dataclass. We never get "additional attributes/fields" (this would imply they're added to and stored in the validated object and can be accessed).
  3. "Prevent" also doesn't imply an error, so "prevent additional fields" really just describes the default behaviour. We want to reject fields in the input data that don't exist in the dataclass, i.e. unknown or non-existent fields. That would also be more in line with the RejectValidator. In a way, your new option is kind of like setting a RejectValidator as the fallback validator for unknown fields.
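
To make the difference between points 2 and 3 concrete, here is a minimal stand-alone sketch in plain Python. It does not use validataclass at all: `validate_fields` and its parameters are hypothetical names that only model the two behaviours, namely silently ignoring unknown keys (the current default) versus rejecting them with an error.

```python
def validate_fields(input_data: dict, known_fields: set, reject_unknown_fields: bool = False) -> dict:
    """Hypothetical stand-in for dictionary validation; not the library's implementation."""
    unknown = sorted(set(input_data) - known_fields)
    if unknown and reject_unknown_fields:
        # "Reject": unknown keys are an error
        raise ValueError(f"Unknown fields: {unknown}")
    # Default: unknown keys are dropped and never reach the validated result
    return {key: value for key, value in input_data.items() if key in known_fields}


data = {"name": "example", "colour": "red"}

# Default behaviour: "colour" is ignored
print(validate_fields(data, {"name"}))

# Strict behaviour: "colour" causes an error
try:
    validate_fields(data, {"name"}, reject_unknown_fields=True)
except ValueError as error:
    print(error)
```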

I would strongly suggest using the wording that I already suggested in our call last week: "reject unknown fields".

Secondly, about the implementation approach: Currently it's a setting on a validataclass that gets read by the DataclassValidator. That means a given validataclass will always either allow or reject unknown fields. Instead, I would suggest leaving the validataclasses untouched and adding the setting to the DataclassValidator directly. This allows more flexibility, because you can use the same validataclass with a different validator to either allow or reject unknown fields. It also makes it easier to allow/reject based on the context or environment, because you don't need to change the validataclass for that, just the validator.

Which brings me to the third point: "One could just take os.env for that." - I'm against this for multiple reasons. First, it adds a kind of complexity and dependency on the user application (and even the OS) that I don't think belongs in this library. The library is designed to be simple and relatively agnostic of how it's used. Second, we already have a system for context-sensitive validation (context arguments). If anything, we should use that. But I also think we shouldn't build something that's highly application-dependent into the library, but rather design it in a way that enables the library user to implement it themselves in whatever way best fits their use case. And I think by moving the setting from the validataclass decorator to the DataclassValidator, we already give the user full flexibility in how to use this new feature - for example by subclassing and extending the DataclassValidator.

For example, if you have a project where unknown fields should always (or in most cases) be rejected, you can subclass the DataclassValidator and just set reject_unknown_fields=True as the default. If you want to set this depending on the context (e.g. for debugging purposes based on the app config or even for single API requests if a ?debug=true URL parameter is set or something like that), you can subclass the DataclassValidator and set the field depending on the context in whatever way works best for your application (current_app.config, os.env, or pass it as a context argument).
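
A rough sketch of the subclassing idea, under loud assumptions: `FakeDataclassValidator` is a stand-in that models only the `reject_unknown_fields` option discussed here, not the library's real class, and `APP_CONFIG` with its `STRICT_VALIDATION` key is a hypothetical application config.

```python
from typing import Optional


class FakeDataclassValidator:
    """Stand-in for the library's DataclassValidator; models only the proposed option."""

    def __init__(self, dataclass_cls: type, *, reject_unknown_fields: bool = False):
        self.dataclass_cls = dataclass_cls
        self.reject_unknown_fields = reject_unknown_fields


class StrictDataclassValidator(FakeDataclassValidator):
    """Project-wide variant: unknown fields are rejected unless explicitly allowed."""

    def __init__(self, dataclass_cls: type, *, reject_unknown_fields: bool = True):
        super().__init__(dataclass_cls, reject_unknown_fields=reject_unknown_fields)


# Hypothetical application config; could equally be current_app.config or os.environ
APP_CONFIG = {"STRICT_VALIDATION": True}


class ConfiguredDataclassValidator(FakeDataclassValidator):
    """Variant whose default comes from the application config."""

    def __init__(self, dataclass_cls: type, *, reject_unknown_fields: Optional[bool] = None):
        if reject_unknown_fields is None:
            # Nothing set explicitly: fall back to the application-level setting
            reject_unknown_fields = bool(APP_CONFIG.get("STRICT_VALIDATION", False))
        super().__init__(dataclass_cls, reject_unknown_fields=reject_unknown_fields)


class Example:
    """Placeholder for a validataclass."""


print(StrictDataclassValidator(Example).reject_unknown_fields)
print(ConfiguredDataclassValidator(Example, reject_unknown_fields=False).reject_unknown_fields)
```

The point of the design is that the decision of where the flag comes from lives in application code, so the library itself stays agnostic.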

Would you agree with all this or do you see problems / have better ideas?

@the-infinity
Contributor Author

Will rename the field accordingly. I just came from the JSON Schema world, where this makes sense.

About the DataclassValidator: it makes sense to have them both. The decorator approach makes most sense for JSON Schema translations, where additionalProperties is a property of the object definition. If you auto-translate JSON Schema into validataclasses and thereby create a library of validated objects, you don't know where it's used, so having it at the dataclass directly is important there. This can of course be overridden (or set for the first time) at the DataclassValidator. Therefore, why not fulfill both needs? Will add it to the MR.

About os.env: I like the idea of having CI testing in strict mode, and normal operations not. By setting the CI to strict mode, you can catch accidental fields in test data which were not handled by accident. Reason for this in general: I like systems which check my code, not manual work. Also, it makes a lot of sense for DATEX2 validation again: DATEX2 is multiple hundred validataclasses. You don't want to subclass all of them and rebuild the whole validataclass tree if you want to have a not so strict normal input validation, but a strict web validator which actually checks if your DATEX2 is valid. This applies to any other complex data model, too: subclassing the specific validataclass means you have to rebuild the whole tree of validators. Subclassing to add a debug attribute means that one has to replace all usages in a project with the subclassed ExtendedDataclassValidator, which is also quite a lot of change for a simple switch. I am open to other solutions there, they should just not end up being too much work on the usage side. Not important in the first step, though, but I would like to get to this somehow.

@binaryDiv
Contributor

> About the DataclassValidator: it makes sense to have them both.

So the validataclass can define the default behaviour and the DataclassValidator can override that. Yes, I think that makes sense. I thought about that too but thought it's a bit redundant, and my assumption was that in your project you would probably need that setting for every validataclass and not just for a specific subset of them. But if you have a good use case where it makes sense to have that as an inherent property of a validataclass, that makes sense.

> You don't want to subclass all of them and rebuild the whole validataclass tree

No, I think you misunderstood me. I wasn't speaking of subclassing the validataclass to add a debug flag, that really doesn't make sense. But it does make sense to define a custom DataclassValidator for your specific application that you can adjust to your needs, rather than enforcing a very specific way to enable a feature via env variable. That's also one of the main design goals of this library: Keep things simple but extendable. It's not a workaround to subclass the DataclassValidator, that's intended usage. You can also keep the name "DataclassValidator", then you don't need to change every usage but only the import lines.

What complicates things is also that you might not actually want "debug mode" to affect every DataclassValidator. Unknown fields are not always an error case, sometimes you explicitly want to validate only a subset of fields (think of validation for responses of outgoing API requests - you may be interested in only 2 fields of the entire response, why validate the entire response?). This use case can be easily solved with the subclassing approach, just use the regular DataclassValidator for those objects. If it's a built-in feature of the DataclassValidator that reads os.env or something like that, you also need to build in a way to "force ignore" unknown fields, so now you have a lot of possible combinations for "flag in validataclass", "flag in DataclassValidator that overrides flag in validataclass", "flag in env that overrides other flags", "flag that overrides the env flag" etc...

There may be ways to provide a simple and usage-agnostic way to do this, like some sort of global context/configuration of library behavior, but I think it's out of scope for this PR because there are a lot of things to consider to do that right. For now, please just leave it at 1. an option in the validataclass decorator to set the default behaviour, 2. an option in the DataclassValidator to override the default behaviour.
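
The two-level precedence described above could look roughly like this. It is a sketch only: `resolve_reject_unknown_fields` is a hypothetical helper, and the `__reject_unknown_fields__` attribute is used on the assumption that the decorator stores its option under that name.

```python
from typing import Optional


def resolve_reject_unknown_fields(dataclass_cls: type, validator_option: Optional[bool] = None) -> bool:
    """Hypothetical helper: validator option first, then dataclass default, then False."""
    if validator_option is not None:
        # 2. An explicit option on the validator overrides everything else
        return validator_option
    # 1. Otherwise use the default stored on the dataclass, if any
    return getattr(dataclass_cls, "__reject_unknown_fields__", False)


class StrictModel:
    __reject_unknown_fields__ = True  # as if set via the decorator option


class PlainModel:
    """No default set on the dataclass."""


print(resolve_reject_unknown_fields(StrictModel))         # dataclass default applies
print(resolve_reject_unknown_fields(StrictModel, False))  # validator overrides it
print(resolve_reject_unknown_fields(PlainModel))          # nothing set anywhere
```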

@the-infinity force-pushed the prevent-additional-properties branch from 14ca883 to 81e5b42 on February 18, 2026 20:02
@the-infinity changed the title from "prevent additional properties" to "reject unknown fields" on Feb 18, 2026
@the-infinity
Contributor Author

Changed the naming and added the DataclassValidator option. I postponed the env thing, as I want to play around with it a bit before jumping to conclusions.

]


class UnknownFieldsError(ValidationError):
Contributor

@binaryDiv binaryDiv Feb 26, 2026

I'm unsure about this validation error. It feels a bit inconsistent to me, because usually validation errors about a specific field in a dictionary/object are collected in a DictFieldsValidationError.

Also we already have the FieldNotAllowedError that is raised by the RejectValidator, and essentially it describes the same kind of error: A field was given in a dictionary that is not allowed and therefore rejected. And basically the reject_unknown_fields feature can be seen as having RejectValidator as the default value for unknown fields.

(Actually, you could literally implement this feature by setting default_validator=RejectValidator('Unknown field') or similar in the DictValidator that the DataclassValidator is based on.)

Do you see a specific reason why the UnknownFieldsError with a list of unknown fields makes more sense or is easier to handle than the regular DictFieldsValidationError with FieldNotAllowedError for every unknown field?
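
To compare the two options side by side, here is a sketch of the error shapes using plain dictionaries. The keys and codes are assumptions chosen for illustration and are not taken from the library or from this PR's diff; `as_single_error`, `as_field_errors` and `merge_field_errors` are hypothetical helpers.

```python
def as_single_error(unknown_fields: list) -> dict:
    """One error that carries the list of unknown fields (assumed shape)."""
    return {"code": "unknown_fields", "unknown_fields": sorted(unknown_fields)}


def as_field_errors(unknown_fields: list) -> dict:
    """One error per unknown field, keyed by field name (assumed shape)."""
    return {
        "code": "field_errors",
        "field_errors": {name: {"code": "field_not_allowed"} for name in sorted(unknown_fields)},
    }


def merge_field_errors(*errors: dict) -> dict:
    """Combine several per-field error dicts into one."""
    merged = {}
    for error in errors:
        merged.update(error["field_errors"])
    return {"code": "field_errors", "field_errors": merged}


# Hypothetical error for a known field that failed validation
other_errors = {"code": "field_errors", "field_errors": {"name": {"code": "required_field"}}}

print(as_single_error(["colour", "size"]))
print(merge_field_errors(other_errors, as_field_errors(["colour", "size"])))
```

With the per-field shape, unknown fields end up in the same structure as every other field error, so a client needs only one code path to handle them.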


def decorator(_cls: type[_T]) -> type[_T]:
    # Pop validataclass-specific options before passing kwargs to @dataclass
    reject_unknown_fields = kwargs.pop('reject_unknown_fields', False)
Contributor

I would suggest using None as the default value here, and only setting the __reject_unknown_fields__ attribute if the parameter is explicitly set to True or False. You already use getattr with False as the default value in the DataclassValidator, so the DataclassValidator doesn't need to be adjusted for this.

I think this makes it a bit more future-proof, allowing for some "global debug setting" or similar in the future, which would only be applied to dataclasses that don't set this parameter.

In other words, reject_unknown_fields=True means unknown fields are always rejected (unless overridden by the DataclassValidator), =False means unknown fields are always allowed (unless overridden by the DataclassValidator), and not setting this parameter either on the dataclass or in the validator means "use the default behaviour". Right now the default behaviour is the same as False, but in the future the default behaviour could be overridden (e.g. using env).
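
A minimal sketch of this three-state suggestion, assuming a simplified decorator: `strict_dataclass` is a hypothetical stand-in built on the standard `dataclasses` module rather than the library's decorator, and `GLOBAL_DEFAULT` models a possible future global setting that does not exist today.

```python
from dataclasses import dataclass
from typing import Optional


def strict_dataclass(cls=None, /, *, reject_unknown_fields: Optional[bool] = None, **kwargs):
    """Hypothetical decorator; only illustrates the None/True/False handling."""

    def decorator(_cls):
        new_cls = dataclass(**kwargs)(_cls)
        # Store the option only when it was set explicitly
        if reject_unknown_fields is not None:
            new_cls.__reject_unknown_fields__ = reject_unknown_fields
        return new_cls

    return decorator if cls is None else decorator(cls)


GLOBAL_DEFAULT = False  # hypothetical future global setting


def effective_setting(cls: type) -> bool:
    # Classes without an explicit option fall back to the global default
    return getattr(cls, "__reject_unknown_fields__", GLOBAL_DEFAULT)


@strict_dataclass(reject_unknown_fields=True)
class Strict:
    name: str


@strict_dataclass(reject_unknown_fields=False)
class Relaxed:
    name: str


@strict_dataclass
class Unset:
    name: str


print(effective_setting(Strict), effective_setting(Relaxed), effective_setting(Unset))
print(hasattr(Unset, "__reject_unknown_fields__"))
```

Because `Unset` never gets the attribute, a later change to the default would affect only classes that did not choose a value themselves.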
