Skip to content

Add self-merge check to Message::MergeFrom#26742

Closed
rep0z-debug wants to merge 5 commits intoprotocolbuffers:mainfrom
rep0z-debug:fix-self-merge-check
Closed

Add self-merge check to Message::MergeFrom#26742
rep0z-debug wants to merge 5 commits intoprotocolbuffers:mainfrom
rep0z-debug:fix-self-merge-check

Conversation

@rep0z-debug
Copy link
Copy Markdown

MergeFrom lacks the self-aliasing check that CopyFrom has (message.cc:375).

In release builds, msg.MergeFrom(msg) causes heap corruption:

  • Repeated fields: backing array reallocated during iteration (read-after-free)
  • String fields: source string freed before read
  • Oneof fields: active variant destroyed before copy
  • Map fields: hash table modified during iteration

This patch:

  1. Adds if (&from == this) return; to Message::MergeFrom (matching CopyFrom)
  2. Adds if (&other == this) return; to MessageLite::CheckTypeAndMergeFrom
  3. Upgrades $DCHK$ to $CHK$ in the generated MergeImpl template

@google-cla
Copy link
Copy Markdown

google-cla bot commented Apr 6, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@ClaytonKnittel
Copy link
Copy Markdown
Contributor

We currently have debug checks for this, which I don't think we should upgrade to CHECKs. These are there to verify that the API is used correctly, and would come with a nontrivial cost to do on every MergeFrom in release builds.

As for checking this != &other and returning early, this I think actually makes less sense for MergeFrom than CopyFrom. CopyFrom is meant to copy a message into another, so if source == dest, it's a no-op. For MergeFrom, repeated/map fields need to be extended. So if source == dest, it should actually double each repeated field with a copy of itself. This would require special casing if source == dest due to concurrent modification, as you pointed out. The special casing would be an extra code path that will rarely be executed. The cost of extra code paths in generated message code is particularly expensive, since each unique message type will generate its own.

Unless we have a strong reason to support MergeFrom self, this additional cost is not worth it. Forbidding self-merge and hardening with debug checks is likely the most sensible approach.

@rep0z-debug
Copy link
Copy Markdown
Author

I understand the performance concern, but I can demonstrate that this causes memory safety issues in release builds. CopyFrom already has this check - shouldn't MergeFrom have similar protection?

A single pointer comparison seems like reasonable defensive programming.
Would you reconsider?

@ClaytonKnittel
Copy link
Copy Markdown
Contributor

My point was mainly that returning early on this == &other would actually be incorrect. If we wanted to support self-merge properly, it would be expensive code-size wise and would come with an additional branch on the happy path (i.e. the non-self-merge path).

The philosophy of a lot of Protobuf code is to verify correctness with debug checks, but don't enable those checks in release builds. Protobuf assumes your code is well-formed for release builds, which allows it to skip many otherwise redundant checks. Ideally applications should have robust integration testing with these debug checks enabled, which allows you to catch malformed code before it hits production.

If you would prefer to have debug checks enabled for release builds, you can build the protobuf library with -DDEBUG, but this will come with a runtime cost.

@rep0z-debug
Copy link
Copy Markdown
Author

I understand, but self-aliasing can happen accidentally in production (e.g., through reference passing, container operations). When it does, the result is memory corruption, not just incorrect behavior.

Other Google projects have moved toward more defensive programming for memory safety. Would there not be any middle ground here? For example, upgrading the existing DCHECK to CHECK would catch this without adding new code or branches to the happy path.

Also CopyFrom already has this check - wouldn't parity between CopyFrom and MergeFrom make sense?

@ClaytonKnittel
Copy link
Copy Markdown
Contributor

CopyFrom and MergeFrom don't do the same thing. We can't have parity between the two by returning early if this == &other, that would be incorrect. We could support self-merge, but as I explained earlier, that would be rather expensive and complex.

Code outside of Protobuf which may, even unintentionally, call self-merge is malformed code. Protobuf assumes your code is well-formed, and on hot code paths (which this is), we have decided it is not worth the performance cost to verify again that your code is indeed well-formed. Integration tests are meant to catch these problems. Upgrading to a CHECK adds a branch to this hot path.

If you have code which is prone to accidental aliasing, you could have your own runtime CHECKs before calling MergeFrom. Though I would argue fixing the code is preferable.

@rep0z-debug
Copy link
Copy Markdown
Author

I understand your position, though I respectfully disagree with the tradeoff being made here. Self-aliasing through reference passing, container operations, and callback patterns is a realistic production scenario, not just theoretical misuse. When it occurs, the result is exploitable memory corruption.

I believe a CHECK (not supporting self-merge semantics, just failing safely) is the appropriate middle ground between performance and memory safety. The performance cost of a correctly-predicted branch is negligible compared to the security and debugging costs of heap corruption in production.

That said, I understand this comes down to project philosophy and you've made the decision clear. I'll close this PR, but I stand by the assessment that this represents a memory safety issue that should be addressed.

Thank you for your time reviewing this.

@rep0z-debug rep0z-debug closed this Apr 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants