Skip to content

Conversation

@shangxinli
Copy link
Contributor

Add PendingUpdate template class that provides a builder pattern API for constructing and committing table metadata changes. This interface follows the Java PendingUpdate design with C++ idioms:

  • Apply() returns Result with uncommitted changes for validation
  • Commit() returns Status and atomically commits changes to the table
  • Template parameter T allows type-safe results (Snapshot, Schema, etc.)

The interface is designed for future implementations like AppendFiles, UpdateSchema, UpdateProperties, and other table update operations.

Add PendingUpdate<T> template class that provides a builder pattern API
for constructing and committing table metadata changes. This interface
follows the Java PendingUpdate design with C++ idioms:

- Apply() returns Result<T> with uncommitted changes for validation
- Commit() returns Status and atomically commits changes to the table
- Template parameter T allows type-safe results (Snapshot, Schema, etc.)

The interface is designed for future implementations like AppendFiles,
UpdateSchema, UpdateProperties, and other table update operations.
///
/// \return Status::OK if the commit was successful, or an error status if
/// validation failed or the commit encountered conflicts
virtual Status Commit() = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define explicit Status codes for commit failure categories (validation failed, conflicts, commit-state-unknown) so callers can distinguish the same cases Java documents ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have already added some. If not, we can extend it: https://github.com/apache/iceberg-cpp/blob/main/src/iceberg/result.h#L31. Is this what you expected?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My meaning is that the status code returned by the Commit interface is in what scenario, and if the status code is clearly named, it can follow the definition in result.h.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense

///
/// \tparam T The type of result returned by Apply()
template <typename T>
class ICEBERG_EXPORT PendingUpdate {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both PendingUpdateBase/PendingUpdateTyped<T> and PendingUpdate<T> have advantages and disadvantages for the caller. Does iceberg-java have scenarios where PendingUpdate is used directly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to use the same approach in the Transaction, we need to make PendingUpdate non-templated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if we need to collect a list of PendingUpdate, we need PendingUpdate without template parameters. We can obtain the specific implementation class through type promotion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, make sense

Deleted member functions should be public for better error messages
and to follow C++ best practices. Matches the pattern used in
TableMetadataBuilder.
Copy link

@Jinchul81 Jinchul81 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't forget to add unit tests for PendingUpdate in the follow up PRs.

PendingUpdate() = default;

// Non-copyable, movable
PendingUpdate(const PendingUpdate&) = delete;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move it to the public space.


// Non-copyable, movable
PendingUpdate(const PendingUpdate&) = delete;
PendingUpdate& operator=(const PendingUpdate&) = delete;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move it to the public space.


namespace iceberg {

/// \brief Base class for table metadata changes using builder pattern

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't prefer you to delete the context that you're using builder pattern. This is the reason why the ctors of this class are in the protected region, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@shangxinli
Copy link
Contributor Author

Please don't forget to add unit tests for PendingUpdate in the follow up PRs.

Just added

///
/// This matches the Java Iceberg pattern where BaseTransaction stores a
/// List<PendingUpdate> without type parameters.
class ICEBERG_EXPORT PendingUpdateBase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class ICEBERG_EXPORT PendingUpdateBase {
class ICEBERG_EXPORT PendingUpdate {

Should we use PendingUpdate as the base name? std::vector<std::unique_ptr<PendingUpdate>> looks nicer than std::vector<std::unique_ptr<PendingUpdateBase>> IMHO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

///
/// \tparam T The type of result returned by Apply()
template <typename T>
class ICEBERG_EXPORT PendingUpdate : public PendingUpdateBase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can rename it to BasePendingUpdate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename it ti PendingUpdateTyped

@shangxinli shangxinli force-pushed the pending_update branch 2 times, most recently from e1de414 to 4852fcb Compare November 24, 2025 15:48
Added comprehensive unit tests for the PendingUpdate interface including:
- Basic Apply() success and validation failure scenarios
- Commit() success and failure scenarios
- Base class polymorphism to verify PendingUpdateBase works correctly

These tests verify the interface compiles correctly and can be properly
implemented by subclasses.
The builder pattern context explains why constructors are protected -
subclasses should be created through their builder methods, not directly
instantiated.
Renamed:
- PendingUpdateBase -> PendingUpdate (non-template base class)
- PendingUpdate<T> -> PendingUpdateTyped<T> (template class)

This makes the API more intuitive:
std::vector<std::unique_ptr<PendingUpdate>> is cleaner than
std::vector<std::unique_ptr<PendingUpdateBase>>
Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM. I've left some minor comments. Thanks @shangxinli!

@gty404 @Jinchul81 Do you have more comments?

kDecompressError,
kInvalid, // For general invalid errors
kInvalidArgument,
kValidationFailed,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sort it alphabetically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

DEFINE_ERROR_FUNCTION(DecompressError)
DEFINE_ERROR_FUNCTION(Invalid)
DEFINE_ERROR_FUNCTION(InvalidArgument)
DEFINE_ERROR_FUNCTION(ValidationFailed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

schema_json_test.cc
table_metadata_builder_test.cc)
table_metadata_builder_test.cc
pending_update_test.cc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These files are unfortunately not sorted alphabetically. Could you help fix it as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Contributor

@HeartLinked HeartLinked left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I only have a minor question and comment.

if (should_fail_commit_) {
return CommitFailed("Mock commit failed");
}
apply_called_ = true;
Copy link
Contributor

@HeartLinked HeartLinked Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove apply_called_ = true here? I don't think we should assume that Apply has already been called by the user when Commit is invoked.I haven't seen any content in the comments that requires users to guarantee this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

- Move kValidationFailed to the end of ErrorKind enum after kUnknownError
- Reorder kInvalidSchema to come after kInvalidManifestList
- Sort DEFINE_ERROR_FUNCTION calls to match enum order
- Sort table_test source files alphabetically in CMakeLists.txt
- Fix apply_called_ tracking to be set in Apply() method instead of Commit()
@wgtmac wgtmac merged commit 2adb461 into apache:main Nov 26, 2025
10 checks passed
@shangxinli shangxinli deleted the pending_update branch November 26, 2025 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants