
Data: Add TCK tests for Metadata Columns in BaseFormatModelTests#15675

Open
Guosmilesmile wants to merge 11 commits into apache:main from Guosmilesmile:tck_metadata

Conversation

@Guosmilesmile
Contributor

This PR adds TCK tests for metadata column reading in BaseFormatModelTests.

Metadata Columns:

  • FILE_PATH
  • SPEC_ID
  • ROW_POSITION
  • IS_DELETED
  • Lineage
    • ROW_ID
    • LAST_UPDATED_SEQUENCE_NUMBER
  • PARTITION_COLUMN
    • Transformations
    • Partition evolution (adding and removing columns)

Part of #15415

@github-actions github-actions bot added the data label Mar 18, 2026
@Guosmilesmile Guosmilesmile force-pushed the tck_metadata branch 4 times, most recently from 12a56cf to eca8c4d Compare March 20, 2026 16:46
@Guosmilesmile Guosmilesmile marked this pull request as draft March 21, 2026 15:21
@Guosmilesmile Guosmilesmile marked this pull request as ready for review March 22, 2026 12:28
Comment thread data/src/test/java/org/apache/iceberg/data/BaseFormatModelTests.java Outdated
  new String[] {FEATURE_FILTER, FEATURE_CASE_SENSITIVE, FEATURE_SPLIT},
  FileFormat.ORC,
- new String[] {FEATURE_REUSE_CONTAINERS});
+ new String[] {FEATURE_REUSE_CONTAINERS, FEATURE_META_ROW_LINEAGE});
Contributor

How hard would it be to implement this?

Contributor Author

I think it should work. I'll give it a try in the next PR.

Contributor Author

Hi Peter, the corresponding PR has been submitted. #15776

Comment thread data/src/test/java/org/apache/iceberg/data/BaseFormatModelTests.java Outdated
Comment thread data/src/test/java/org/apache/iceberg/data/BaseFormatModelTests.java Outdated
Comment thread data/src/test/java/org/apache/iceberg/data/BaseFormatModelTests.java Outdated
Comment thread data/src/test/java/org/apache/iceberg/data/BaseFormatModelTests.java Outdated

@ParameterizedTest
@FieldSource("FILE_FORMATS")
void testReadMetadataColumnPartitionEvolutionAddColumn(FileFormat fileFormat) throws IOException {
Contributor

Could we have a test with addColumnWithDefaultReadValue?

Contributor Author

Added testReaderSchemaEvolutionNewColumnWithDefault, and found that ORC doesn't support it:

if (field.initialDefault() != null) {
  throw new UnsupportedOperationException(
      String.format(
          "ORC cannot read default value for field %s (%s): %s",
          root.findColumnName(fieldId), type, field.initialDefault()));
}
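The gating this exchange implies — ORC cannot read default values, so the default-value test must be skipped for that format — can be sketched as a format-to-feature lookup. The names below (`FEATURE_READER_DEFAULT`, `supports`, the feature sets per format) are illustrative stand-ins, not the actual BaseFormatModelTests API; which formats advertise which features is an assumption for the sketch.

```java
import java.util.Map;
import java.util.Set;

// Minimal sketch of feature gating for format-specific tests: each file
// format advertises a feature set, and a test for an optional capability
// (e.g. reader-side default values, which ORC rejects with
// UnsupportedOperationException) is skipped for formats that lack it.
public class FeatureGateSketch {

  static final String FEATURE_READER_DEFAULT = "reader-default";

  // Assumed feature sets for illustration only.
  static final Map<String, Set<String>> FORMAT_FEATURES =
      Map.of(
          "parquet", Set.of(FEATURE_READER_DEFAULT),
          "avro", Set.of(FEATURE_READER_DEFAULT),
          "orc", Set.of());

  static boolean supports(String format, String feature) {
    return FORMAT_FEATURES.getOrDefault(format, Set.of()).contains(feature);
  }

  public static void main(String[] args) {
    for (String format : FORMAT_FEATURES.keySet()) {
      if (!supports(format, FEATURE_READER_DEFAULT)) {
        System.out.println(format + ": skipped (no " + FEATURE_READER_DEFAULT + ")");
      } else {
        System.out.println(format + ": runs default-value test");
      }
    }
  }
}
```

A JUnit assumption (as in the `assumeSupports` call quoted later in this thread) would play the role of the skip branch, aborting the test rather than failing it.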

Contributor

Either create PR for it and we can start working on that, or create an issue and link it here.

@FieldSource("FILE_FORMATS")
void testReaderSchemaEvolutionNewColumnWithDefault(FileFormat fileFormat) throws IOException {

assumeSupports(fileFormat, FEATURE_READER_DEFAULT);
Contributor

Do we have a PR which adds this to the ORC reader?

Contributor Author

@Guosmilesmile Guosmilesmile Apr 16, 2026

Currently none. That PR should be about supporting bloodlines in Orcs.

Contributor

supporting bloodlines in Orcs

This is a funny translation 😅

Contributor Author

Lineage.. My bad.

Comment thread data/src/test/java/org/apache/iceberg/data/BaseFormatModelTests.java Outdated
Comment thread data/src/test/java/org/apache/iceberg/data/BaseFormatModelTests.java Outdated
@Guosmilesmile
Contributor Author

})
.toList();

readAndAssertGenericRecords(fileFormat, evolvedSchema, expectedGenericRecords);
Contributor

Maybe we could save on the conversion as well. What if we just do this:

readAndAssertGenericRecords(fileFormat, evolvedSchema, record -> {
  Record expected = GenericRecord.create(evolvedSchema);
  for (Types.NestedField col : writeSchema.columns()) {
    expected.setField(col.name(), record.getField(col.name()));
  }

  expected.setField("col_f", defaultStringValue);
  expected.setField("col_g", defaultIntValue);
  return expected;
});

Contributor Author

We also need to pass genericRecords into the method; otherwise there's nowhere to call this conversion function. Is the difference not that significant, or did I miss something?

Contributor

You are right, we need to pass the genericRecords too.
The diff is not that significant here, but it is more pronounced where the conversion is simpler.
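The per-record-converter pattern being discussed — passing the written records plus a conversion function instead of precomputing the full expected list — can be sketched in plain Java. `readAndAssert` and the `Map<String, Object>` stand-in for Iceberg's `Record` are hypothetical simplifications for illustration, not the actual helper's signature.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch of the suggested overload: the helper derives each expected
// record from the corresponding written record via a caller-supplied
// converter, instead of receiving a precomputed expected list.
public class ConverterPatternSketch {

  static void readAndAssert(
      List<Map<String, Object>> written,
      List<Map<String, Object>> actual,
      Function<Map<String, Object>, Map<String, Object>> toExpected) {
    if (written.size() != actual.size()) {
      throw new AssertionError("row count mismatch");
    }
    for (int i = 0; i < written.size(); i++) {
      Map<String, Object> expected = toExpected.apply(written.get(i));
      if (!expected.equals(actual.get(i))) {
        throw new AssertionError("row " + i + ": " + expected + " != " + actual.get(i));
      }
    }
  }

  public static void main(String[] args) {
    List<Map<String, Object>> written = List.of(Map.of("col_a", 1), Map.of("col_a", 2));
    // Simulated read result after evolution added col_f with a default value.
    List<Map<String, Object>> read =
        List.of(Map.of("col_a", 1, "col_f", "dflt"), Map.of("col_a", 2, "col_f", "dflt"));

    readAndAssert(written, read, record -> {
      Map<String, Object> expected = new HashMap<>(record); // copy written columns
      expected.put("col_f", "dflt");                        // add the defaulted column
      return expected;
    });
    System.out.println("ok");
  }
}
```

The saving the reviewer describes is that each test only states the delta (here, the defaulted column) rather than rebuilding every expected record up front.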

Contributor Author

Added a method to do it now.

Comment on lines +708 to +716
List<Record> expected =
IntStream.range(0, genericRecords.size())
.mapToObj(
i ->
GenericRecord.create(projectionSchema)
.copy(MetadataColumns.FILE_PATH.name(), filePath))
.toList();

readAndAssertMetadataColumn(fileFormat, projectionSchema, idToConstant, expected);
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same pattern as readAndAssertGenericRecords

Contributor Author

Got it. Overloaded readAndAssertMetadataColumn to keep the same pattern.
