-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
fix(core): Sanitize data URLs in http.client spans
#18896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
size-limit report 📦
|
node-overhead report 🧳Note: This is a synthetic benchmark with a minimal express app and does not necessarily reflect the real-world performance impact in an application.
|
|
Thanks a ton Lukas! I'd leave the first n chars of the data stream in there, knowing what type of data it is (by magic bytes, eg |
| [SEMANTIC_ATTRIBUTE_SENTRY_ORIGIN]: 'auto.http.otel.node_fetch', | ||
| 'http.url': sanitizedUrl, | ||
| [SEMANTIC_ATTRIBUTE_URL_FULL]: sanitizedUrl, | ||
| [SEMANTIC_ATTRIBUTE_SENTRY_CUSTOM_SPAN_NAME]: `${request.method || 'GET'} ${sanitizedUrl}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the additional comment from @brunohaid: Maybe we could add the content of the first few bytes as an attribute here 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up adding the first 10 bytes to the span description and data. I went for adding this directly to the URL attributes/description because there isn't really a fitting attribute defined in SemConv for data url content. So I just stuck with the "url.full" one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
Our
http.clientspan instrumentations currently treat data URLs (blobs or base64 encoded data) like regular raw URLs. While this is in general fine, the problem is that this leads to incredibly long span names and attribute values, especially because the URL is sent in up to three different attributes per span. This makes Relay reject the the sent events due to exceeding size limits.Therefore, I decided to extract the already existing stack trace URL sanitization logic for data URLs and apply it to
http.clientspans and attributes. This will lead to "lost" information in the sense of not having data URLs but I think for now we can live with that. The replacement in the future could be span attachments (if we decide this is valuable or make it opt-in).Note: This is most likely only a concern in client-side SDKs but since our fetch instrumentation is used across the board, I decided to also add this to our server-side instrumentation. I'd rather have this taken care of everywhere.
closes #17345