Skip to content

Conversation

@zhjwpku
Copy link
Collaborator

@zhjwpku zhjwpku commented Nov 14, 2025

including AddEntry, AddExistingEntry, AddDeleteEntry

@zhjwpku zhjwpku force-pushed the manifest_writer_with_add_entry branch 5 times, most recently from 138cee5 to 7a4c48e Compare November 16, 2025 13:15
@zhjwpku zhjwpku marked this pull request as ready for review November 16, 2025 14:25
@zhjwpku
Copy link
Collaborator Author

zhjwpku commented Nov 17, 2025

Some unclear points while porting Java's TestManifestWriterVersions:

  • Should we update the ManifestFile's sequence number when writing to ManifestList?
  • Should we update ManifestEntry's status to Delete when performing AddDeleteEntry?

I've temporarily commented out some failing assertions, but the PR seems ready for review, just pay special attention to the above two questions.

@zhjwpku zhjwpku requested a review from wgtmac November 17, 2025 02:08
/// \param entry Manifest entry to write.
/// \return Status::OK() if entry was written successfully
Status Add(const ManifestEntry& entry);
Status AddEntry(const ManifestEntry& entry);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about adding following functions to deal with different use cases:

  // Write the entry that all its fields are populated correctly.
  Status WriteEntry(const ManifestEntry& entry);

  // Write the input as a new entry to add.
  Status WriteAddedEntry(std::shared_ptr<DataFile> file,
                         std::optional<int64_t> data_sequence_number);
  Status WriteAddedEntry(const ManifestEntry& entry);

  // Write the input as an existing entry.
  Status WriteExistingEntry(std::shared_ptr<DataFile> file, int64_t file_snapshot_id,
                            int64_t data_sequence_number,
                            std::optional<int64_t> file_sequence_number);
  Status WriteExistingEntry(const ManifestEntry& entry);

  // Write the input as a deleted entry.
  Status WriteDeletedEntry(std::shared_ptr<DataFile> file, int64_t data_sequence_number,
                           std::optional<int64_t> file_sequence_number);
  Status WriteDeletedEntry(const ManifestEntry& entry);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WriteEntry can be the delegate function to accept writes from other functions. Inside each WriteXXXEntry function, we can just create a new entry and the copy should be cheap.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I tried this, please take a look again.

@zhjwpku zhjwpku force-pushed the manifest_writer_with_add_entry branch from 7a4c48e to e0d03e6 Compare November 21, 2025 14:24
@zhjwpku zhjwpku requested a review from wgtmac November 21, 2025 14:53
entry.snapshot_id = snapshot_id_;
if (entry.sequence_number.has_value() &&
entry.sequence_number.value() < TableMetadata::kInitialSequenceNumber) {
entry.sequence_number = std::nullopt;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment why sequence_number should be initialized here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic has been removed since GetSequenceNumber handle this, the reason for it been null is because for newly added data, its sequence number is inherited from manifest metadata, see [1].

[1] https://iceberg.apache.org/spec/#sequence-number-inheritance

manifest_file.added_rows_count = add_rows_count_;
manifest_file.existing_rows_count = existing_rows_count_;
manifest_file.deleted_rows_count = delete_rows_count_;
manifest_file.partitions = std::move(partition_summary_->Summaries());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify the ownership movement. Please add a comment why it is required.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been changed to:

ICEBERG_ASSIGN_OR_RAISE(auto partition_summary, partition_summary_->Summaries());
manifest_file.partitions = std::move(partition_summary);

I agree at the version of you comment, the move is not necessary, I guess meson build would warn under warning_level=2. But after changing, the move can avoid an extra copy.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After seeing Gang's comment, I change it to the following:

ICEBERG_ASSIGN_OR_RAISE(manifest_file.partitions, partition_summary_->Summaries());


/// \brief Get the relative manifest content type from name
ICEBERG_EXPORT constexpr Result<ManifestFile::Content> ManifestFileContentFromString(
ICEBERG_EXPORT constexpr Result<ManifestContent> ManifestContentFromString(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make it a static function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused here, are you worried that this function will be included in multiple translation units?
If so, constexpr implies inline, so this should be fine. We already use this pattern in several other places, correct me if I'm wrong.


Status Update(const Literal& value);

PartitionFieldSummary Finish() const;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment.


private:
/// \brief Create a PartitionSummary with the given field stats.
explicit PartitionSummary(std::vector<PartitionFieldStats> field_stats)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we hide the ctor? Please add a comment for reasoning.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason was there was a Make method, but now I remove the Make and bring the ctor to public.


Status ManifestFileAdapterV3::Append(const ManifestFile& file) {
Status ManifestFileAdapterV3::Append(ManifestFile& file) {
// TODO(zhjwpku): Should we set sequence_number/first_row_id here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please file an issue and use the tracking id instead of your id. I know other code already used personal id, but it's not helpful to follow what to do.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These TODOs were meant to attract reviewers' closer attention. I've removed them now.

Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! I think it generally looks good now. I've left some inline comments.

zhjwpku and others added 2 commits November 25, 2025 00:10
- unify partition summary collection
- fix manifest reader to inherit metadata
- fix first row id round trip
@wgtmac wgtmac changed the title feat: adopt partition summary & add AddEntry variants feat: add partition summary and write added/existing/deleted entries to manifest writer Nov 25, 2025
@wgtmac
Copy link
Member

wgtmac commented Nov 25, 2025

Thanks @zhjwpku for working on this and adding sufficient test cases! I've pushed some changes to fix first_row_id and sequence_number inheritance.

Thanks for your review, @Jinchul81! Do you have any other comment?

@wgtmac wgtmac merged commit c142ef9 into apache:main Nov 26, 2025
10 checks passed
@zhjwpku zhjwpku deleted the manifest_writer_with_add_entry branch November 30, 2025 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants