We should write more assertions to make sure that every transformation happened as expected.
We already have some assertions that check, e.g., that after balancing the dataset is split 50/50 between the two classes. What other assertions should we add?
Suggestions:
- The final number of features matches the expected count
- There are no duplicated rows (to catch an SQL query returning the same IDs more than once)
- ... ?
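
A rough sketch of what these checks could look like, assuming the transformed dataset is a pandas DataFrame with two classes. The function name `check_dataset`, the column names `id` and `label`, and the `expected_n_features` parameter are all hypothetical and would need to be adapted to the real pipeline:

```python
import pandas as pd


def check_dataset(df, expected_n_features, id_col="id", label_col="label"):
    """Raise AssertionError if the transformed dataset violates an expectation.

    Column names are placeholders, not the project's actual schema.
    """
    # Every column other than the ID and the label is treated as a feature.
    feature_cols = [c for c in df.columns if c not in (id_col, label_col)]
    assert len(feature_cols) == expected_n_features, (
        f"expected {expected_n_features} features, got {len(feature_cols)}"
    )

    # No ID may appear more than once (e.g. after a query returns repeated rows).
    n_dup = int(df[id_col].duplicated().sum())
    assert n_dup == 0, f"{n_dup} duplicated IDs found"

    # After balancing, both classes should have the same number of rows.
    counts = df[label_col].value_counts()
    assert len(counts) == 2 and counts.iloc[0] == counts.iloc[1], (
        f"classes are not balanced: {counts.to_dict()}"
    )
```

Calling something like `check_dataset(df, expected_n_features=...)` at the end of the pipeline would make a failed transformation stop the run with a readable message instead of passing silently.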