Skip to content

oauth2: harden token refresh parsing#11618

Open
erain wants to merge 2 commits intofluent:masterfrom
erain:yiyu/oauth2-refresh-parse
Open

oauth2: harden token refresh parsing#11618
erain wants to merge 2 commits intofluent:masterfrom
erain:yiyu/oauth2-refresh-parse

Conversation

@erain
Copy link
Contributor

@erain erain commented Mar 24, 2026

Problem

  • OAuth2 token refresh accepted partial or malformed responses and mutated the live token state while parsing.
  • That could mask refresh failures, leave Fluent Bit holding inconsistent credentials, or treat invalid expires_in values as usable.

Solution

  • Parse refresh responses into temporary values and update the OAuth2 context only after access_token, token_type, and expires_in all validate.
  • Reject malformed, negative, non-numeric, and too-short expires_in values while still accepting quoted numeric expirations and duplicate keys with last-value-wins behavior.
  • Extend the internal OAuth2 tests to cover transactional refresh updates, malformed payload rejection, duplicate keys, quoted expirations, and refresh flows with the stricter validation.

Testing

  • cmake --build build --target flb-it-oauth2 -j4 && ./build/bin/flb-it-oauth2

Summary by CodeRabbit

  • Bug Fixes

    • Stricter OAuth2 token response validation: access token, token type and expires_in must be present and valid; default token_type/expires_in fallbacks removed.
    • Improved error reporting and added bounds protection during parsing.
    • Applies a 10% safety reduction to computed token lifetimes; parsing failures no longer overwrite existing token state.
  • Tests

    • Replaced broad default-parsing test with focused tests for transactional updates, quoted expires_in, duplicate-key behavior, invalid expires_in cases, and adjusted mock server parameters.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 15abf7fb16

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@coderabbitai
Copy link

coderabbitai bot commented Mar 24, 2026

📝 Walkthrough

Walkthrough

JSON parsing for OAuth2 tokens was hardened: iterative bounds checking, transactional temporary holders for extracted fields, strict errno-checked numeric parsing for expires_in with negative/partial-rejects, a 10% expiry reduction applied on success, and rejection when required fields are missing. Tests expanded to cover these cases.

Changes

Cohort / File(s) Summary
OAuth2 Token Parsing
src/flb_oauth2.c
Hardened flb_oauth2_parse_json_response: added errno.h, added bounds guard during token iteration, use of intermediate new_* variables, robust bounded expires_in parsing with strtoull+errno+full-consumption checks, rejects negatives/oversized/malformed values, removes fallback defaults for token_type and expires_in, applies 10% reduction to expiry, updates ctx->expires_at only after validation, and logs include json_size on parse/top-level type failures.
OAuth2 Test Suite
tests/internal/oauth2.c
Replaced test_parse_defaults with multiple targeted tests covering transactional token refresh, quoted numeric expires_in, duplicate-key last-wins semantics, rejection of missing required fields and various invalid expires_in cases; added helpers for pre-populating SDS-backed fields; adjusted mock server expires_in and refresh_skew values; updated TEST_LIST.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • cosmo0920

Poem

🐰 I nibbled tokens, counted every byte,

Guarded the bounds and parsed through the night.
If numbers failed or fields were amiss,
I kept the old seed safe—no careless bliss.
Hops of approval, and one crunchy carrot. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'oauth2: harden token refresh parsing' clearly and concisely summarizes the main change: improving robustness of OAuth2 token refresh JSON parsing through stricter validation and transactional updates.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/flb_oauth2.c (1)

522-543: Consider using integer arithmetic for the 10% reduction.

The floating-point multiplication new_expires_in * 0.10 works correctly for practical OAuth2 expiry values, but integer division would be more consistent with the integer types in use and avoid any potential floating-point precision concerns.

♻️ Suggested integer arithmetic
             new_expires_in = parsed_expires_in;
-            new_expires_in -= (new_expires_in * 0.10);
+            new_expires_in -= (new_expires_in / 10);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/flb_oauth2.c` around lines 522 - 543, The code uses floating-point to
reduce new_expires_in by 10% (new_expires_in -= (new_expires_in * 0.10)); change
this to integer arithmetic using the existing unsigned integer types (e.g.,
subtract new_expires_in/10 or compute (new_expires_in * 9) / 10) to avoid FP
precision and type mixing; update the logic around
parsed_expires_in/new_expires_in in the block that handles the "expires_in" key
(see key_cmp, tmp_num, parsed_expires_in, new_expires_in) and ensure the result
remains unsigned and does not underflow.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/flb_oauth2.c`:
- Around line 522-543: The code uses floating-point to reduce new_expires_in by
10% (new_expires_in -= (new_expires_in * 0.10)); change this to integer
arithmetic using the existing unsigned integer types (e.g., subtract
new_expires_in/10 or compute (new_expires_in * 9) / 10) to avoid FP precision
and type mixing; update the logic around parsed_expires_in/new_expires_in in the
block that handles the "expires_in" key (see key_cmp, tmp_num,
parsed_expires_in, new_expires_in) and ensure the result remains unsigned and
does not underflow.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ec7fd1aa-007b-4c69-b7e0-63405394f543

📥 Commits

Reviewing files that changed from the base of the PR and between 32a3cbb and 15abf7f.

📒 Files selected for processing (2)
  • src/flb_oauth2.c
  • tests/internal/oauth2.c

erain added 2 commits March 24, 2026 23:00
Reject malformed refresh payloads so Fluent Bit never reuses stale OAuth2 credentials
after a parse failure. Use integer arithmetic for the 10% TTL reduction and tie the
expires_in floor to ctx->refresh_skew instead of a hardcoded 60. Log only payload size
on parse errors to avoid credential leakage.

Change-Id: I202807582b34f5be48bc4c919e230eeaa0d2b5ff
Signed-off-by: Yu Yi <yiyu@google.com>
Add regression coverage for transactional token updates, quoted expires_in,
duplicate key handling, and stricter expires_in validation. Set refresh_skew
on test contexts and adjust mock server values for integer arithmetic.

Signed-off-by: Yu Yi <yiyu@google.com>
@erain
Copy link
Contributor Author

erain commented Mar 25, 2026

Addressed all AI review feedback:

  1. Codex P1 — Use refresh_skew instead of hardcoded 60: Replaced new_expires_in < 60 with new_expires_in <= ctx->refresh_skew so the parse-time floor is consistent with oauth2_token_needs_refresh.

  2. Codex P2 — Stop logging full OAuth2 payloads: Error messages now log (size=%zu) instead of the raw response body to avoid credential leakage.

  3. CodeRabbit — Integer arithmetic: Changed new_expires_in -= (new_expires_in * 0.10) to new_expires_in -= (new_expires_in / 10) to avoid floating-point type mixing.

Also split into two commits to satisfy the commit-lint prefix validation, and added Signed-off-by for DCO.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/flb_oauth2.c (1)

477-544: ⚠️ Potential issue | 🟠 Major

Still accepts nested and non-string token fields.

This loop only advances to the immediate value token, then resumes scanning the rest of the token array. That means keys inside object/array values can still be revisited as if they were top-level pairs, and access_token / token_type are copied without checking that their value token is actually a string. A payload like {"error":"x","details":{"access_token":"nested","token_type":"Bearer","expires_in":3600}} or {"access_token":123,"token_type":"Bearer","expires_in":3600} will still be treated as a valid refresh. Please restrict extraction to direct children of the root object and require JSMN_STRING for the required token fields.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/flb_oauth2.c` around lines 477 - 544, The loop over jsmn tokens
incorrectly treats nested fields as top-level and accepts non-string values;
update the scan so you only consider direct children of the root object and
validate value token types: for each key token (the current t when iterating
tokens[]), ensure its matching value token is immediately after the key and that
the value token's type is JSMN_STRING for "access_token" and "token_type" (and a
primitive/number for "expires_in"); when the value token is JSMN_OBJECT or
JSMN_ARRAY, skip its entire subtree by advancing i past all tokens that belong
to that value (use t->size and recurse/count descendant tokens in tokens[] to
compute the skip), and only then resume scanning; apply these checks around the
key comparisons (key_cmp, new_access_token, new_token_type, new_expires_in) so
nested or non-string fields are ignored.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/internal/oauth2.c`:
- Around line 914-917: The test is flaky because expires_in (passed to
oauth2_mock_server_start as 65 which becomes 59 after the 10% reduction) and
refresh_skew (58 passed into create_oauth_ctx) leave only a 1s window that can
straddle time(NULL) boundaries and trigger an unexpected refresh; to fix, widen
the gap so the token is not refreshable immediately (e.g., increase the
expires_in value passed to oauth2_mock_server_start or decrease the refresh_skew
passed to create_oauth_ctx) or explicitly force the refresh timing in
test_caching_and_refresh so flb_oauth2_get_access_token cannot refresh on the
second call; adjust the arguments where oauth2_mock_server_start(&server, 65, 0)
and create_oauth_ctx(config, &server, 58) are invoked accordingly and ensure the
token_requests == 1 assertion remains valid.

---

Outside diff comments:
In `@src/flb_oauth2.c`:
- Around line 477-544: The loop over jsmn tokens incorrectly treats nested
fields as top-level and accepts non-string values; update the scan so you only
consider direct children of the root object and validate value token types: for
each key token (the current t when iterating tokens[]), ensure its matching
value token is immediately after the key and that the value token's type is
JSMN_STRING for "access_token" and "token_type" (and a primitive/number for
"expires_in"); when the value token is JSMN_OBJECT or JSMN_ARRAY, skip its
entire subtree by advancing i past all tokens that belong to that value (use
t->size and recurse/count descendant tokens in tokens[] to compute the skip),
and only then resume scanning; apply these checks around the key comparisons
(key_cmp, new_access_token, new_token_type, new_expires_in) so nested or
non-string fields are ignored.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9027142f-b9f1-4b1a-ac2b-2ffb1552871b

📥 Commits

Reviewing files that changed from the base of the PR and between 15abf7f and b418afe.

📒 Files selected for processing (2)
  • src/flb_oauth2.c
  • tests/internal/oauth2.c

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/flb_oauth2.c (1)

477-552: ⚠️ Potential issue | 🟠 Major

Malformed refresh payloads can still pass this validation.

The loop accepts access_token/token_type from any token slice and ignores error, so responses like {"error":"invalid_grant","access_token":"x","token_type":"Bearer","expires_in":3600}, {"meta":{"access_token":"x"},"token_type":"Bearer","expires_in":3600}, or {"access_token":null,"token_type":"Bearer","expires_in":3600} still commit new state. That reopens the partial/malformed acceptance this PR is trying to close. Please require top-level non-empty string values for the required fields and fail fast on any error member, and add regression cases for those shapes.

Possible tightening
+    int has_error = FLB_FALSE;
...
+        if (key_cmp(key, key_len, "error") == 0) {
+            has_error = FLB_TRUE;
+            break;
+        }
+
+        if (t->type == JSMN_OBJECT || t->type == JSMN_ARRAY) {
+            break;
+        }
+
         if (key_cmp(key, key_len, "access_token") == 0) {
+            if (t->type != JSMN_STRING || val_len == 0) {
+                break;
+            }
             ...
         }
         else if (key_cmp(key, key_len, "token_type") == 0) {
+            if (t->type != JSMN_STRING || val_len == 0) {
+                break;
+            }
             ...
         }
...
-    if (!new_access_token || !new_token_type || new_expires_in <= ctx->refresh_skew) {
+    if (has_error || !new_access_token || !new_token_type ||
+        new_expires_in <= ctx->refresh_skew) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/flb_oauth2.c` around lines 477 - 552, The parser currently accepts nested
or non-string values and ignores an "error" field; update the loop in
flb_oauth2.c (the tokens/t/t->type handling around key_cmp, new_access_token,
new_token_type, new_expires_in) to: 1) if key equals "error" (compare via
key_cmp) return/fail fast; 2) require the value token type to be the expected
primitive (for access_token and token_type require t->type == JSMN_STRING and
val_len > 0) before creating flb_sds; 3) reject non-string/null/nested values
(do not accept JSMN_NULL or JSMN_OBJECT/ARRAY) and treat them as errors; 4) for
expires_in ensure the value token is a JSMN_PRIMITIVE and still validate numeric
parsing as existing code does; add regression tests for shapes like
{"error":"...","access_token":"x"}, nested keys, and null access_token to ensure
they fail.
♻️ Duplicate comments (1)
tests/internal/oauth2.c (1)

914-917: ⚠️ Potential issue | 🟡 Minor

This is still only a one-second refresh margin.

After the 10% haircut, 65 becomes 59; with refresh_skew=58, the cached token becomes refreshable one second after issuance. If the two flb_oauth2_get_access_token() calls land on different time(NULL) ticks, server.token_requests == 1 can still fail intermittently. Please widen the gap or make the refresh step explicit.

Suggested tweak
-    ret = oauth2_mock_server_start(&server, 65, 0);
+    ret = oauth2_mock_server_start(&server, 68, 0);
...
-    ctx = create_oauth_ctx(config, &server, 58);
+    ctx = create_oauth_ctx(config, &server, 60);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/internal/oauth2.c` around lines 914 - 917, The test's timing window is
too narrow: increase the token lifetime passed to oauth2_mock_server_start
(currently 65) or reduce the refresh_skew passed to create_oauth_ctx so that
after the 10% haircut the token becomes refreshable with a multi-second margin
(e.g., change 65 to ~80+ or ensure lifetime - haircut >= refresh_skew + 2), or
alternatively make the refresh explicit by advancing time or calling the refresh
path directly before the second flb_oauth2_get_access_token() call; update the
values around oauth2_mock_server_start and create_oauth_ctx to guarantee a
stable gap and avoid intermittent failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/flb_oauth2.c`:
- Around line 477-552: The parser currently accepts nested or non-string values
and ignores an "error" field; update the loop in flb_oauth2.c (the
tokens/t/t->type handling around key_cmp, new_access_token, new_token_type,
new_expires_in) to: 1) if key equals "error" (compare via key_cmp) return/fail
fast; 2) require the value token type to be the expected primitive (for
access_token and token_type require t->type == JSMN_STRING and val_len > 0)
before creating flb_sds; 3) reject non-string/null/nested values (do not accept
JSMN_NULL or JSMN_OBJECT/ARRAY) and treat them as errors; 4) for expires_in
ensure the value token is a JSMN_PRIMITIVE and still validate numeric parsing as
existing code does; add regression tests for shapes like
{"error":"...","access_token":"x"}, nested keys, and null access_token to ensure
they fail.

---

Duplicate comments:
In `@tests/internal/oauth2.c`:
- Around line 914-917: The test's timing window is too narrow: increase the
token lifetime passed to oauth2_mock_server_start (currently 65) or reduce the
refresh_skew passed to create_oauth_ctx so that after the 10% haircut the token
becomes refreshable with a multi-second margin (e.g., change 65 to ~80+ or
ensure lifetime - haircut >= refresh_skew + 2), or alternatively make the
refresh explicit by advancing time or calling the refresh path directly before
the second flb_oauth2_get_access_token() call; update the values around
oauth2_mock_server_start and create_oauth_ctx to guarantee a stable gap and
avoid intermittent failures.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 128cc8ed-d854-4240-973e-b098d1f01a15

📥 Commits

Reviewing files that changed from the base of the PR and between b418afe and 3754123.

📒 Files selected for processing (2)
  • src/flb_oauth2.c
  • tests/internal/oauth2.c

@erain
Copy link
Contributor Author

erain commented Mar 25, 2026

CI failure analysis: run-macos-unit-tests (-DFLB_SANITIZE_MEMORY=On) is unrelated

The failing tests in this job are not related to the oauth2 changes in this PR:

  • flb-rt-out_counter — also fails on master in the same macOS SANITIZE_MEMORY job. Pre-existing flaky test.
  • flb-it-input_chunk — a networking test (input_chunk.c:892) that fails trying to connect to 127.0.0.1:1 (connection refused / TCP timeout). No relation to oauth2.

This PR only modifies src/flb_oauth2.c and tests/internal/oauth2.c. The oauth2 test suite (flb-it-oauth2) passes in this exact job run (3.39s). Both required checks (Check Commit Message, Unit tests (matrix)) pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant