Data: Add TCK tests for Metadata Columns in BaseFormatModelTests #15675
Guosmilesmile wants to merge 11 commits into apache:main
Conversation
      new String[] {FEATURE_FILTER, FEATURE_CASE_SENSITIVE, FEATURE_SPLIT},
      FileFormat.ORC,
-     new String[] {FEATURE_REUSE_CONTAINERS});
+     new String[] {FEATURE_REUSE_CONTAINERS, FEATURE_META_ROW_LINEAGE});
How hard would it be to implement this?
I think it should work. I'll give it a try in the next PR.
Hi Peter, the corresponding PR has been submitted: #15776
  @ParameterizedTest
  @FieldSource("FILE_FORMATS")
  void testReadMetadataColumnPartitionEvolutionAddColumn(FileFormat fileFormat) throws IOException {
Could we have a test with addColumnWithDefaultReadValue?
Added testReaderSchemaEvolutionNewColumnWithDefault, and found that ORC doesn't support it.
iceberg/orc/src/main/java/org/apache/iceberg/orc/ORCSchemaUtil.java
Lines 407 to 413 in 5dff6f6
Either create a PR for it so we can start working on that, or create an issue and link it here.
  @FieldSource("FILE_FORMATS")
  void testReaderSchemaEvolutionNewColumnWithDefault(FileFormat fileFormat) throws IOException {

    assumeSupports(fileFormat, FEATURE_READER_DEFAULT);
Do we have a PR which adds this to the ORC reader?
Currently none. That PR should be about supporting bloodlines in Orcs.
supporting bloodlines in Orcs
This is a funny translation 😅
Lineage.. My bad.
Noting for the record: the Flink run at https://github.com/apache/iceberg/actions/runs/24493984088/job/71584757252?pr=15675 was cancelled after 60m.
            })
            .toList();

    readAndAssertGenericRecords(fileFormat, evolvedSchema, expectedGenericRecords);
Maybe we could save on the conversion as well. What if we just do this:
readAndAssertGenericRecords(fileFormat, evolvedSchema, record -> {
  Record expected = GenericRecord.create(evolvedSchema);
  for (Types.NestedField col : writeSchema.columns()) {
    expected.setField(col.name(), record.getField(col.name()));
  }
  expected.setField("col_f", defaultStringValue);
  expected.setField("col_g", defaultIntValue);
  return expected;
});
We also need to pass genericRecords into the method, otherwise there's nowhere to apply this conversion function. Is the difference not that significant, or did I miss something?
You are right, we need to pass the genericRecords too.
The diff is not that significant here, but it is more pronounced where the conversion is simpler.
Added a method to do it now.
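The converter-overload idea discussed above can be sketched generically. This is an illustrative, self-contained sketch, not the actual Iceberg test helper: the plain `Map<String, Object>` stands in for Iceberg's `Record`, and the names `expectedRecords`, `withFilePath`, and `_file` are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ConverterOverloadSketch {

  // Per-record converter: copy the written record and add the metadata
  // column constant ("_file" is a stand-in for MetadataColumns.FILE_PATH).
  static Map<String, Object> withFilePath(Map<String, Object> written, String filePath) {
    Map<String, Object> expected = new HashMap<>(written);
    expected.put("_file", filePath);
    return expected;
  }

  // The overload pattern: instead of pre-building the expected list at every
  // call site, the helper derives it from the written records and a converter.
  static List<Map<String, Object>> expectedRecords(
      List<Map<String, Object>> writtenRecords,
      Function<Map<String, Object>, Map<String, Object>> converter) {
    List<Map<String, Object>> expected = new ArrayList<>();
    for (Map<String, Object> record : writtenRecords) {
      expected.add(converter.apply(record));
    }
    return expected;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> written = new ArrayList<>();
    Map<String, Object> record = new HashMap<>();
    record.put("id", 1);
    written.add(record);

    List<Map<String, Object>> expected =
        expectedRecords(written, r -> withFilePath(r, "s3://bucket/file.parquet"));

    System.out.println(expected.get(0).get("_file"));
  }
}
```

This keeps the conversion logic next to the assertion call, which is where the diff savings show up when the conversion is simple.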
    List<Record> expected =
        IntStream.range(0, genericRecords.size())
            .mapToObj(
                i ->
                    GenericRecord.create(projectionSchema)
                        .copy(MetadataColumns.FILE_PATH.name(), filePath))
            .toList();

    readAndAssertMetadataColumn(fileFormat, projectionSchema, idToConstant, expected);
Same pattern as readAndAssertGenericRecords.
Got it. Overloaded readAndAssertMetadataColumn to keep the same pattern.
This PR adds TCK tests for metadata column reading in BaseFormatModelTests.
Metadata Columns:
Part of #15415