fix(analytics): strip NUL bytes and slim web-analytics payload #28232
harshach wants to merge 3 commits into
PostgreSQL jsonb rejects strings containing the NUL character, which broke PUT /v1/analytics/web/events/collect whenever the UI captured page text containing NULs (e.g. Health Check error messages). Sanitize NUL characters in WebAnalyticEventResource before the jsonb insert.

While auditing the path, drop UI fields that have zero downstream consumers:
- Remove global document.body click listener; CustomEvent click data is written, kept 7 days, then deleted with no reader.
- Trim PageView payload from 9 fields to 4 (userId, sessionId, url, fullUrl); hostname/language/screenSize/referrer/pageLoadTime are never read by any processor.

DAU/MAU and Most Viewed Entities reports are unaffected since the WebAnalyticsUserActivityProcessor and WebAnalyticsEntityViewProcessor only read the kept fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Fixes a PSQLException triggered when web analytics payloads contained NUL (\u0000) bytes (PostgreSQL jsonb cannot store them) by sanitizing user-supplied string fields on the server. Additionally trims the UI analytics surface area: removes the global document.body click listener (source of the bug) and the now-unused fields from the PageView payload, keeping only what downstream Data Insights processors consume.
Changes:
- Add `removeNullCharacters`/`stripNullCharacters` helpers and apply them to `PageViewData` and `CustomEvent` inside `sanitizeWebAnalyticEventData`.
- Drop the `track`/`trackCustomEvent` plugin, the body click listener, and unused helpers (`getReferrerPath`, `getPageLoadTime`); reduce `PageView` payload to `userId`, `sessionId`, `url`, `fullUrl`.
- Add `WebAnalyticEventResourceTest` and refresh `WebAnalyticsUtils.test.ts` to match the new behavior.
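For readers who want a concrete picture of the NUL-stripping helpers listed above, here is a minimal sketch. It is not the PR's code: the class name `NulStripSketch` and the method name `stripNul` are placeholders, and the only behavior assumed is what the PR text states (null input, a no-NUL fast path, and multi-NUL stripping).

```java
final class NulStripSketch {
    private NulStripSketch() {}

    /** Returns the input without U+0000 characters; a null input stays null. */
    static String stripNul(String value) {
        // Fast path: null or NUL-free input is returned as the same instance.
        if (value == null || value.indexOf('\u0000') < 0) {
            return value;
        }
        // Slow path: copy every character except NUL into a new string.
        StringBuilder cleaned = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c != '\u0000') {
                cleaned.append(c);
            }
        }
        return cleaned.toString();
    }
}
```

With this sketch, `NulStripSketch.stripNul("Health\u0000 Check")` returns `"Health Check"`, and a string without NULs comes back unchanged without allocating.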
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/resources/analytics/WebAnalyticEventResource.java | Strip NUL bytes from PageView and CustomEvent fields before persistence. |
| openmetadata-service/src/test/java/org/openmetadata/service/resources/analytics/WebAnalyticEventResourceTest.java | New unit tests covering NUL stripping and sanitization paths. |
| openmetadata-ui/src/main/resources/ui/src/utils/WebAnalyticsUtils.ts | Slim PageView payload, remove trackCustomEvent and unused helpers. |
| openmetadata-ui/src/main/resources/ui/src/utils/WebAnalyticsUtils.test.ts | Update tests to assert minimal PageView payload and userId guard. |
| openmetadata-ui/src/main/resources/ui/src/components/AppContainer/AppContainer.tsx | Remove global click listener that triggered custom-event tracking. |
Apply stripNullCharacters to every CustomEvent right after the JSON conversion, mirroring the PageView path, so a future CustomEventTypes value cannot silently bypass sanitization. No behavior change today since CLICK is the only enum value and the else branch throws. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
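The ordering described in this commit can be sketched roughly as follows. Everything here is illustrative: `CustomEventSanitizeSketch`, its nested `CustomEvent` class, and the field names are hypothetical stand-ins, not the generated OpenMetadata types. The point is only that the NUL strip runs unconditionally, before any branch on the event type.

```java
final class CustomEventSanitizeSketch {
    private CustomEventSanitizeSketch() {}

    /** Mirrors the PR note that CLICK is currently the only value. */
    enum CustomEventType { CLICK }

    /** Hypothetical stand-in for the real CustomEvent class. */
    static final class CustomEvent {
        CustomEventType eventType;
        String fullUrl;
        String eventValue;
    }

    static String stripNul(String value) {
        return value == null ? null : value.replace("\u0000", "");
    }

    static CustomEvent sanitize(CustomEvent event) {
        // Unconditional step: runs for every event, whatever its type,
        // so a newly added enum value cannot skip it.
        event.fullUrl = stripNul(event.fullUrl);
        event.eventValue = stripNul(event.eventValue);

        if (event.eventType == CustomEventType.CLICK) {
            // Type-specific validation would go here.
            return event;
        }
        // Unknown or missing types are still rejected, as in the PR description.
        throw new IllegalArgumentException("Unsupported custom event type: " + event.eventType);
    }
}
```

Had the strip lived inside the `CLICK` branch instead, a later enum value with its own branch would need to remember to repeat it.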
Code Review ✅ Approved 2 resolved / 2 findings
Strips NUL bytes from web-analytics payloads to prevent PostgreSQL JSONB exceptions and reduces payload size by removing unused telemetry fields. The implementation correctly handles NUL stripping before HTML validation, and no issues were found.
✅ 2 resolved
✅ Bug: NUL stripping must happen before HTML check to avoid false negatives
✅ Edge Case: HTML check on eventValue may pass after NUL stripping reassembles tag
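The two resolved findings come down to the order of two steps, which a small sketch can make concrete. The regex below is a hypothetical stand-in for whatever HTML check the resource actually uses (it is not shown on this page), and the class and method names are invented for illustration.

```java
import java.util.regex.Pattern;

final class SanitizeOrderSketch {
    private SanitizeOrderSketch() {}

    // Assumed, simplified tag detector: "<", optional "/", then a letter.
    private static final Pattern TAG = Pattern.compile("<\\s*/?\\s*[a-zA-Z][^>]*>");

    static boolean looksLikeHtml(String value) {
        return value != null && TAG.matcher(value).find();
    }

    static String stripNul(String value) {
        return value == null ? null : value.replace("\u0000", "");
    }

    /** Check first, strip second: a NUL after "<" hides the tag from the check. */
    static String checkThenStrip(String value) {
        if (looksLikeHtml(value)) {
            throw new IllegalArgumentException("HTML is not allowed");
        }
        return stripNul(value); // may reassemble a tag the check never saw
    }

    /** Strip first, check second: the check sees the string that will be stored. */
    static String stripThenCheck(String value) {
        String cleaned = stripNul(value);
        if (looksLikeHtml(cleaned)) {
            throw new IllegalArgumentException("HTML is not allowed");
        }
        return cleaned;
    }
}
```

Under these assumptions, an input such as `"<\u0000script>alert(1)</\u0000script>"` slips through `checkThenStrip` and comes out as a complete `<script>` tag, while `stripThenCheck` rejects it.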
🔴 Playwright Results: 15 failure(s), 14 flaky
✅ 4123 passed · ❌ 15 failed · 🟡 14 flaky · ⏭️ 86 skipped
Genuine Failures (failed on all attempts)
Describe your changes:
Fix a `PSQLException: unsupported Unicode escape sequence` raised when `PUT /v1/analytics/web/events/collect` payloads contained the NUL character (PostgreSQL `jsonb` cannot store NUL bytes). `WebAnalyticEventResource.sanitizeWebAnalyticEventData` now strips NULs from all user-supplied string fields on both `PageViewData` and `CustomEvent` before the insert.

While auditing the path I also trimmed the UI to stop sending fields that no downstream processor reads: removed the global `document.body` click listener (CustomEvent click data has zero readers; it was the source of the NUL-byte bug and fired on every click) and reduced the PageView payload from 9 fields to 4 (`userId`, `sessionId`, `url`, `fullUrl`), dropping `hostname`, `language`, `screenSize`, `referrer`, `pageLoadTime`. The Data Insights `WebAnalyticsUserActivityProcessor` and `WebAnalyticsEntityViewProcessor` only read the kept fields, so DAU/MAU and "Most Viewed Entities" reports remain unchanged.

Type of change:
Tests:
Unit tests
- `WebAnalyticEventResourceTest` (5 cases): null input, no-NUL fast path, multi-NUL stripping, end-to-end sanitization for `PageViewData` and `CustomEvent`.
- Updated `WebAnalyticsUtils.test.ts` to assert the new minimum PageView payload and the userId guard; removed obsolete `getReferrerPath`/`trackCustomEvent` cases.

Backend integration tests
- `WebAnalyticEventResourceIT` continues to exercise the endpoint.

Playwright (UI) tests

Manual testing performed
- `mvn test -pl openmetadata-service -Dtest=WebAnalyticEventResourceTest` → 5/5 pass.
- `yarn test WebAnalyticsUtils` → 3/3 pass.
- `mvn spotless:check` and `yarn ui-checkstyle:changed` both clean.

UI screen recording / screenshots:
Not applicable — no user-visible UI changes.
Checklist:
🤖 Generated with Claude Code