Core: Use partition filtering for dangling DV detection in ManifestFilterManager#15671
Core: Use partition filtering for dangling DV detection in ManifestFilterManager#15671hemanthboyina wants to merge 3 commits intoapache:mainfrom
Conversation
|
I think the existing test cases should be good to validate this. |
anoopj
left a comment
There was a problem hiding this comment.
Clean implementation, thanks @hemanthboyina.
Just one optional suggestion: could we add a test that exercises the optimization? e.g., a partitioned table where data files are removed from one partition, verifying that delete manifests for non-overlapping partitions are correctly skipped. Existing tests validate correctness, but a targeted test would protect against future regressions in the filtering logic.
Otherwise LGTM.
|
added the tests too @anoopj , please review |
|
@singhpk234 @nastra @amogh-jahagirdar can you please review |
anoopj
left a comment
There was a problem hiding this comment.
Please get this reviewed by a committer.
|
can you please help in review this @amogh-jahagirdar @danielcweeks @manuzhang , thanks |
In ManifestFilterManager#removeDanglingDeletesFor receives DataFiles which have partition info but only stores their paths as strings. This means ManifestFilterManager#canContainDroppedFiles returns true for every delete manifest since there's no partition info to filter against. The existing ManifestFilterManager#delete(F File) does this correctly, it stores both the file and a PartitionSet. then uses this information in ManifestFileUtil#canContainAny to skip manifests whose partition range doesn't overlap.