@vy-ton
Contributor

vy-ton commented Jan 13, 2026

This draft PR captures the proposal for all Durable Object (DO) related spans and attributes we should expose in public traces.

  • Assume tracing context propagation exists
  • Ideally docs could be generated from code for public spans/attributes

@github-actions bot added the `product:workers` label (Related to Workers product) on Jan 13, 2026
@github-actions
Contributor

This pull request requires reviews from CODEOWNERS as it changes files that match the following patterns:

| Pattern | Owners |
| --- | --- |
| `/src/content/docs/workers/observability/` | @irvinebroque, @mikenomitch, @nevikashah, @cloudflare/pcx-technical-writing |

@github-actions
Contributor

github-actions bot commented Jan 13, 2026

- `cloudflare.durable_object.response.rows_read`
- `cloudflare.durable_object.response.rows_written`
- `cloudflare.durable_object.response.bytes_written`
- `cloudflare.durable_object.response.sql_duration_ms`
Contributor

We need to understand whether this will always be 0 or not because it is a synchronous operation. If it's always 0, then I do not think we should include it.
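The concern can be sketched as follows, assuming the clock used for the measurement only advances across I/O boundaries (Workers' `Date.now()` behaves this way). `IoGatedClock` and `timeSync` are hypothetical names for illustration, not workerd APIs, and whether the tracing implementation uses such a clock is exactly the open question.

```typescript
// Hypothetical clock that only advances when I/O completes, modeling a
// runtime whose timers are frozen during synchronous execution.
class IoGatedClock {
  private nowMs = 0;

  now(): number {
    return this.nowMs;
  }

  // Simulate an I/O boundary that took `elapsedMs` of wall time.
  completeIo(elapsedMs: number): void {
    this.nowMs += elapsedMs;
  }
}

// Time an operation by reading the clock before and after it runs.
function timeSync<T>(clock: IoGatedClock, op: () => T): { result: T; durationMs: number } {
  const start = clock.now();
  const result = op();
  return { result, durationMs: clock.now() - start };
}

const clock = new IoGatedClock();

// Purely synchronous work: the clock never advances, so the duration is 0.
const syncOnly = timeSync(clock, () => {
  let sum = 0;
  for (let i = 0; i < 1000; i++) sum += i;
  return sum;
});

// Work that crosses a (simulated) I/O boundary reports a non-zero duration.
const withIo = timeSync(clock, () => {
  clock.completeIo(12);
  return "done";
});
```

Under this assumption, `syncOnly.durationMs` is 0 no matter how much work the loop does, while `withIo.durationMs` reflects the simulated I/O time.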

Contributor Author

#### `durable_object_subrequest`

- `cloudflare.durable_object.startup_duration_ms`
- `cloudflare.durable_object.constructor_invoked`
Contributor

Maybe it's also worth adding constructor_time_ms? This would depend on the time resolution that we can get for potentially synchronous operations. Some constructors might make outbound requests and this would end up being very useful.

Contributor Author

Do you want both? Or does constructor_time_ms=0 indicate that the constructor did not run?
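One way to see why both attributes might be wanted is a minimal sketch of how a consumer would interpret them. The `cloudflare.durable_object.constructor_time_ms` key is an assumption (the thread only names `constructor_time_ms`), and `describeConstructor` is a hypothetical helper.

```typescript
// Hypothetical attribute shape; the duration key is assumed, not part of the proposal.
interface ConstructorAttrs {
  "cloudflare.durable_object.constructor_invoked": boolean;
  // Present only when the constructor ran; may legitimately be 0.
  "cloudflare.durable_object.constructor_time_ms"?: number;
}

// Interpret the pair of attributes without overloading 0 as "did not run".
function describeConstructor(attrs: ConstructorAttrs): string {
  if (!attrs["cloudflare.durable_object.constructor_invoked"]) {
    return "constructor did not run";
  }
  const ms = attrs["cloudflare.durable_object.constructor_time_ms"];
  if (ms === undefined) return "constructor ran (duration not recorded)";
  return ms === 0
    ? "constructor ran (below timer resolution)"
    : `constructor ran in ${ms} ms`;
}
```

With only the duration attribute, a value of 0 could mean either "did not run" or "ran faster than the timer can resolve"; keeping the boolean separates those two cases.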

@shrima-cf

How should I read this PR? Is the goal just to list all the spans available? Will there be documentation explaining what each span means and its expected duration (similar to what we have in https://gitlab.cfdata.org/cloudflare/ew/edgeworker/-/blob/master/src/edgeworker/scheduling/jaeger-spans.c%2B%2B)?

@vy-ton
Contributor Author

vy-ton commented Jan 13, 2026

> How should I read this PR? Is the goal just to list all the spans available? Will there be documentation explaining what each span means and its expected duration (similar to what we have in https://gitlab.cfdata.org/cloudflare/ew/edgeworker/-/blob/master/src/edgeworker/scheduling/jaeger-spans.c%2B%2B)?

@shrima-cf We will certainly add explanations as part of the release. I'd love to have all spans/attributes generated from code somehow to avoid syncing issues.

Right now, read this PR as capturing all the DO-related spans and attributes we want to add.

@justin-mp
Contributor

I find it odd that we're adding timing attributes to spans because spans themselves are supposed to represent the time it took something to run.

I think things in the programming model that the programmer has control over should be spans. In particular, constructor_time_ms, sql_duration_ms, and output_gate_lock_held should really be spans (which would then have different names). Generally when you're doing performance engineering, you need to go down to the primitives, and these are the primitives we give users. I could argue that queue time and the like could also be spans. Then a user knows a request is blocked because of queuing, which might be due to load.

Additionally, things like CPU time are OK as an attribute because that's a dimension orthogonal to what you get out of a span, but having wall time as an attribute makes no sense, as that's exactly what the span's duration measures.
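The distinction drawn here can be sketched with a minimal span model. The `Span` shape, the `constructor` child span, and the `cpu_time_ms` attribute name are all hypothetical and only illustrate the argument; they are not the proposed schema.

```typescript
// Minimal span model for illustration only.
interface Span {
  name: string;
  startMs: number;
  endMs: number;
  attributes: Record<string, number>;
  children: Span[];
}

// Wall time is derived from the span's own timestamps.
function wallTimeMs(span: Span): number {
  return span.endMs - span.startMs;
}

// Hypothetical child span for a primitive the programmer controls.
const constructorSpan: Span = {
  name: "constructor",
  startMs: 1000,
  endMs: 1020,
  attributes: {},
  children: [],
};

const subrequestSpan: Span = {
  name: "durable_object_subrequest",
  startMs: 1000,
  endMs: 1045,
  // CPU time cannot be recovered from start/end, so it stays an attribute.
  attributes: { cpu_time_ms: 7 },
  children: [constructorSpan],
};
```

In this model, `wallTimeMs(constructorSpan)` plays the role that a `constructor_time_ms` attribute would otherwise play on the parent span.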

@jmorrell-cloudflare
Contributor

> I find it odd that we're adding timing attributes to spans because spans themselves are supposed to represent the time it took something to run.

> having wall time as an attribute makes no sense as that's exactly what the span's duration measures.

@justin-mp I pretty strongly disagree with those assertions. Creating many spans for every possible timing is one of the most common tracing anti-patterns IMO. It makes querying across many operations far harder than it needs to be, and complicates the waterfall visualization unnecessarily. You need to design your data for how you want to query it.

> Generally when you're doing performance engineering, you need to go down to the primitives, and these are the primitives we give users.

Attributes are also a core part of the span model, and there is no reason you can't put timing information there. You should think of spans as capturing all of the information about a specific operation. For performance engineering, profiling is usually the better tool.

I address this attributes-vs-child-spans tradeoff in my guest chapter in Observability Engineering:

Let's say in your system you've just shipped a new subsystem that prioritizes payload parsing for enterprise users, and you want to see the impact of that change on tail latencies across all of the regions where you have systems deployed. If all of those attributes are present on the wide event, then this is straightforward:

SELECT
  P99(payload_parse.duration_ms)
WHERE
  main = true AND
  service.name = "api-service"
GROUP BY
  user.type,
  cloud.region

However this may seem like we are duplicating data available in the child spans. Surely we can accomplish the same thing by wrapping the payload parsing method in its own span?
This is a valid approach, but now if we want to query that data alongside any other data that we've captured in our wide event, querying has become much more complicated, forcing the use of JOINs:

SELECT
  P99(parse_span.duration_ms)
FROM spans AS main_span
JOIN spans AS parse_span ON main_span.trace_id = parse_span.trace_id
WHERE
  main_span.main = true AND
  main_span.service.name = "api-service" AND
  parse_span.name = "payload-parse"
GROUP BY main_span.user.type, main_span.cloud.region

Mature observability tooling is capable of running these types of queries at the cost of additional processing, however we should keep in mind how we will be using this data. We want to prioritize quick, iterative exploration, often while responding to active incidents, and we want any engineer on our team to be able to easily navigate our observability tooling.
Viewed through this lens, identifying a few important timings and adding them to the wide event is well worth the slight data duplication.
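The tradeoff between the two queries above can be sketched in code over an in-memory list of spans. This is a minimal illustration, not how any particular observability backend evaluates queries; `SpanRow`, `p99FromAttribute`, and `p99FromChildSpans` are hypothetical names, and the percentile uses the nearest-rank method.

```typescript
interface SpanRow {
  traceId: string;
  spanName: string;
  main: boolean;
  durationMs: number;
  attrs: Record<string, string | number>;
}

// Nearest-rank percentile; assumes a non-empty input.
function percentile(values: number[], p: number): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(Math.ceil((p / 100) * sorted.length), 1);
  return sorted[rank - 1];
}

function groupP99(pairs: Array<[string, number]>): Record<string, number> {
  const groups: Record<string, number[]> = {};
  for (const [key, value] of pairs) {
    (groups[key] = groups[key] || []).push(value);
  }
  const out: Record<string, number> = {};
  for (const key of Object.keys(groups)) out[key] = percentile(groups[key], 99);
  return out;
}

function groupKey(s: SpanRow): string {
  return `${s.attrs["user.type"]}/${s.attrs["cloud.region"]}`;
}

// Option A: the timing is an attribute on the main span, so one pass suffices.
function p99FromAttribute(spans: SpanRow[]): Record<string, number> {
  const pairs: Array<[string, number]> = [];
  for (const s of spans) {
    const ms = s.attrs["payload_parse.duration_ms"];
    if (s.main && typeof ms === "number") pairs.push([groupKey(s), ms]);
  }
  return groupP99(pairs);
}

// Option B: the timing lives on a child span, so it must be joined back to
// the main span by trace ID to reach the grouping attributes.
function p99FromChildSpans(spans: SpanRow[]): Record<string, number> {
  const mainByTrace: Record<string, SpanRow> = {};
  for (const s of spans) if (s.main) mainByTrace[s.traceId] = s;
  const pairs: Array<[string, number]> = [];
  for (const s of spans) {
    const main = mainByTrace[s.traceId];
    if (s.spanName === "payload-parse" && main) pairs.push([groupKey(main), s.durationMs]);
  }
  return groupP99(pairs);
}

// Made-up sample data: each trace has a main span and a payload-parse child.
function trace(id: string, userType: string, region: string, parseMs: number): SpanRow[] {
  const attrs = { "user.type": userType, "cloud.region": region, "payload_parse.duration_ms": parseMs };
  return [
    { traceId: id, spanName: "request", main: true, durationMs: parseMs + 20, attrs },
    { traceId: id, spanName: "payload-parse", main: false, durationMs: parseMs, attrs: {} },
  ];
}

const sampleSpans: SpanRow[] = [
  ...trace("t1", "enterprise", "us-east", 12),
  ...trace("t2", "enterprise", "us-east", 30),
  ...trace("t3", "free", "eu-west", 8),
];
```

Both functions return the same grouped result for `sampleSpans`; the difference is that Option B needs the extra lookup pass, which corresponds to the JOIN in the second query.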

Comment on lines 460 to 462
- `cloudflare.durable_object.response.rows_read`
- `cloudflare.durable_object.response.rows_written`
- `cloudflare.durable_object.response.bytes_written`
Contributor

Are these added after the returned cursor is iterated (that is, after the operation function returns), or before the cursor is iterated, in which case rows_read would be zero?

Similarly, could we also have "bytes_read"?

Contributor Author

> Are these added after the returned cursor is iterated (that is, after the operation function returns), or before the cursor is iterated, in which case rows_read would be zero?

For usefulness, I would expect after the cursor is iterated. But we need to confirm.

Added bytes_read
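The timing question can be illustrated with a minimal sketch of a lazily evaluated cursor. `LazyCursor` and `execQuery` are hypothetical stand-ins, not the Durable Objects SQL API; the sketch only assumes that rows are counted as they are consumed.

```typescript
// Hypothetical lazy cursor: rows are only counted as they are consumed.
class LazyCursor<T> {
  rowsRead = 0;
  private index = 0;

  constructor(private readonly rows: T[]) {}

  // Returns the next row, or undefined when the cursor is exhausted.
  next(): T | undefined {
    if (this.index >= this.rows.length) return undefined;
    this.rowsRead++;
    return this.rows[this.index++];
  }

  // Drains the remaining rows into an array.
  toArray(): T[] {
    const out: T[] = [];
    for (let row = this.next(); row !== undefined; row = this.next()) out.push(row);
    return out;
  }
}

// Stand-in for a query call that returns a cursor without iterating it.
function execQuery(): LazyCursor<{ id: number }> {
  return new LazyCursor([{ id: 1 }, { id: 2 }, { id: 3 }]);
}

const cursor = execQuery();
const rowsReadAtReturn = cursor.rowsRead;        // snapshot when the call returns
const drained = cursor.toArray();
const rowsReadAfterIteration = cursor.rowsRead;  // snapshot once the cursor is drained
```

If the attribute were recorded when the call returns, it would capture `rowsReadAtReturn` (0 here) rather than `rowsReadAfterIteration` (3 here), which is why the recording point needs to be confirmed.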

@shrima-cf

shrima-cf commented Jan 16, 2026

@vy-ton In addition to spans, there are also tags that hold useful information. Alex added a couple for the input/output gate spans in cloudflare/workerd#5827.
Are you planning on adding these to the documentation as well?

@vy-ton
Contributor Author

vy-ton commented Jan 22, 2026

> @vy-ton In addition to spans, there are also tags that hold useful information. Alex added a couple for the input/output gate spans in cloudflare/workerd#5827. Are you planning on adding these to the documentation as well?

I would not expose the span attributes in cloudflare/workerd#5827 - those seem pretty internal-only to me.

- `cloudflare.durable_object.output_gate_lock_hold_ms`
- `cloudflare.durable_object.output_gate_lock_wait_ms`

#### For [handlers](/workers/observability/traces/spans-and-attributes/#handlers) invoked on a Durable Object such as RPC or `fetch()`, these attributes exist:
Contributor Author

@lambrospetrou you mentioned having primary/replica context. What other attributes would you expect here?

@vy-ton changed the title from "DRAFT: DO tracing spans" to "DRAFT: RFC for DO tracing spans" on Jan 22, 2026
Labels

`product:workers` (Related to Workers product), `size/s`

9 participants