Feature Request: Add an argument to FailureInfo.write_parquet(only_invalid_rules: bool = False) #296

@DeflateAwning

Description

When dealing with very large schemas (e.g., 50 string columns, all with min and max lengths), the failure info parquet file ends up very wide. It's challenging to open, and challenging to scroll through horizontally.

It'd be helpful to have a way to write only the debugging rule output columns that were responsible for at least one failing row (e.g., only the columns with min length violations).

All the core columns from the dataframe being validated would still be included - this feature request is only scoped to the additional columns like id|min_length, id|max_length, etc.
