Feature Request: Add an argument to `FailureInfo.write_parquet(only_invalid_rules: bool = False)`

When dealing with very large schemas (e.g., 50 string columns, all with min and max lengths), the failure info parquet file ends up very wide. It's hard to open and hard to scroll through horizontally.

It'd be helpful if there were a way to write only the debugging rule output columns that were responsible for at least one failing row (e.g., only the min length columns, if min length violations were the only failures).

All the core columns from the dataframe being validated would still be included. This feature request is scoped only to the additional rule columns like `id|min_length`, `id|max_length`, etc.
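To make the requested behavior concrete, here is a rough, dependency-free sketch of the column selection I have in mind. It is not how the library works internally: the helper name `columns_to_write`, the plain-dict table, and the convention that a rule column holds `True` when a row passes the rule and `False` when it fails are all assumptions made for illustration.

```python
def columns_to_write(table: dict[str, list], rule_columns: set[str]) -> list[str]:
    """Return all data columns plus only those rule columns with a failure.

    Assumption: a rule column stores True where the row passed the rule
    and False where it failed.
    """
    kept = []
    for name, values in table.items():
        if name not in rule_columns:
            # Core data column: always kept.
            kept.append(name)
        elif not all(values):
            # Rule column with at least one failing row: kept.
            kept.append(name)
    return kept


# Hypothetical failure table: two data columns and four rule columns.
failure_table = {
    "id": ["a-very-long-identifier", "ok"],
    "name": ["fine", "a-very-long-name"],
    "id|min_length": [True, True],
    "id|max_length": [False, True],
    "name|min_length": [True, True],
    "name|max_length": [True, False],
}

# Assumption for this sketch only: rule columns are the ones containing "|".
rule_cols = {col for col in failure_table if "|" in col}

selected = columns_to_write(failure_table, rule_cols)
print(selected)
```

With `only_invalid_rules=True`, the written file would contain just the `selected` columns, so the two `min_length` columns (which never failed here) would be left out.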


Feature Request: Add an argument to `FailureInfo.write_parquet(only_invalid_rules: bool = False)` #296
