Skip to content

Fix ReDoS in RFC3966 domain validation and integer underflow in matcher#3987

Open
sharadboni wants to merge 2 commits intogoogle:masterfrom
sharadboni:fix-redos-and-underflow
Open

Fix ReDoS in RFC3966 domain validation and integer underflow in matcher#3987
sharadboni wants to merge 2 commits intogoogle:masterfrom
sharadboni:fix-redos-and-underflow

Conversation

@sharadboni
Copy link
Copy Markdown

Summary

  • Bug A (ReDoS): The rfc3966_domainlabel_ and rfc3966_toplabel_ regex patterns in phonenumberutil.cc use nested quantifiers [chars]+((\\-)*[chars])* that cause O(N^2) backtracking on adversarial inputs via ICU regex. This is reachable through the phone-context parameter in RFC3966 URIs parsed by IsPhoneContextValid(). The fix rewrites both patterns to [chars]+(-[chars]+)*, which requires at least one character after each hyphen, eliminating the quantifier nesting ambiguity.

  • Bug B (Integer underflow): In AllNumberGroupsAreExactlyPresent() in phonenumbermatcher.cc, when candidate_groups is empty, candidate_groups.size() - 1 underflows size_t (unsigned) producing a huge value. This is then cast to int and used as a vector index, causing undefined behavior. The fix adds an early return false guard when candidate_groups is empty.

Test plan

  • Verify existing C++ unit tests pass (the regex change is semantically equivalent for valid RFC3966 domain names since hyphens in DNS labels must be followed by at least one alphanumeric character)
  • Verify that parsing a phone number with a long crafted phone-context domain no longer causes quadratic regex time
  • Verify that AllNumberGroupsAreExactlyPresent returns false for empty candidate groups instead of triggering UB

Bug A: The rfc3966_domainlabel_ and rfc3966_toplabel_ regex patterns
used nested quantifiers of the form `[chars]+((\\-)*[chars])*` which
cause O(N^2) backtracking on non-matching inputs via ICU regex. This
is exploitable through the phone-context parameter in RFC3966 URIs
parsed by IsPhoneContextValid(). Rewrite patterns to
`[chars]+(-[chars]+)*` which requires at least one character after
each hyphen, eliminating quantifier ambiguity.

Bug B: In AllNumberGroupsAreExactlyPresent(), when candidate_groups is
empty, `candidate_groups.size() - 1` underflows size_t (unsigned)
producing a huge value, leading to undefined behavior when cast to int
and used as a vector index. Add an early return guard.
@sharadboni sharadboni requested a review from a team as a code owner April 15, 2026 22:51
@sharadboni
Copy link
Copy Markdown
Author

@rohininidhi Could you review this security fix? It rewrites ReDoS-vulnerable regex patterns in RFC3966 domain validation and fixes an integer underflow in AllNumberGroupsAreExactlyPresent.

The previous fix changed the domain label regex from
[chars]+((\\-)*[chars])* to [chars]+(-[chars]+)*, which broke
parsing of phone-context values like 'a--z' that contain
consecutive hyphens (valid per RFC). Use -+ instead of - to
allow one or more hyphens between alphanumeric groups, while
still preventing ReDoS since hyphen and alphanum character
classes are disjoint.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant