
debug(graph): @traceable _maybe_write_thread_title so LangSmith captures errors #492

Merged: blove merged 1 commit into main from claude/langsmith-trace-title-write on May 20, 2026

Conversation

@blove blove commented May 20, 2026

Summary

Wrap _maybe_write_thread_title with @langsmith.traceable and convert all early-returns to typed dict returns ({skipped} / {wrote_title} / {error_type, error_message, sdk_url}). The error branch now surfaces in the LangSmith run's outputs field, queryable via the Runs API.
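
A minimal sketch of the pattern described above, not the actual helper from this PR. The function name `maybe_write_thread_title`, its parameters, the injected `write_title` callable, the skip reasons, and the 60-character slice are all assumptions made for illustration; only the output keys (`skipped`, `wrote_title`, `error_type`, `error_message`, `sdk_url`) come from the description. A no-op fallback decorator is included so the sketch runs even where `langsmith` is not installed.

```python
import asyncio
from typing import Awaitable, Callable, Optional

try:
    from langsmith import traceable
except ImportError:
    # Fallback no-op decorator so this sketch runs without langsmith installed.
    def traceable(*d_args, **d_kwargs):
        if len(d_args) == 1 and callable(d_args[0]) and not d_kwargs:
            return d_args[0]
        return lambda fn: fn


@traceable(name="maybe_write_thread_title")
async def maybe_write_thread_title(
    thread_id: str,
    title: str,
    write_title: Callable[[str, str], Awaitable[None]],  # hypothetical writer
    sdk_url: Optional[str],
) -> dict:
    """Return a dict on every path so the traced run's outputs describe what happened."""
    if not thread_id:
        return {"skipped": "no thread_id"}
    if not title:
        return {"skipped": "empty title"}
    try:
        await write_title(thread_id, title)
    except Exception as exc:
        # Report the failure as data instead of swallowing it silently.
        return {
            "error_type": type(exc).__name__,
            "error_message": str(exc),
            "sdk_url": sdk_url,
        }
    return {"wrote_title": title[:60]}
```

Because every branch returns a dict, a swallowed exception still leaves a record in the child run's outputs rather than disappearing into stdout.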

Why

After #491 shipped, I drove a probe against prod (thread_id=019e475a-4add-79a3-95b1-348525d79b0e). Result: run succeeded, title stayed null — so the except path fired — but LangGraph Platform's stdout isn't reachable via the LangSmith Runs API. The print(... flush=True) from #491 lands somewhere I can't query.

Wrapping the helper as a @traceable child run captures inputs/outputs/errors in a place we can query (api.smith.langchain.com/api/v1/runs/query). No platform log access needed.

Test plan

  • Graph compiles (from src.graph import graph)
  • langsmith importable (transitive dep via langgraph)
  • CI green
  • After merge + deploy: fire a probe, query LangSmith for the child run, read the error_type/error_message/sdk_url out of outputs
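
The last step of the test plan could look roughly like the sketch below. It assumes the child run is named after the decorated function (the default when `@traceable` is given no `name`), that `LANGSMITH_API_KEY` is set, and that the caller supplies the tracing project name; `fetch_helper_runs` and `summarize_outcomes` are hypothetical names, not part of this repository.

```python
from typing import Any, Iterable

# Assumed run name: @traceable uses the function name unless told otherwise.
HELPER_RUN_NAME = "_maybe_write_thread_title"


def summarize_outcomes(runs: Iterable[Any]) -> list:
    """Classify each run by the keys present in its outputs dict."""
    rows = []
    for run in runs:
        outputs = getattr(run, "outputs", None) or {}
        if "error_type" in outputs:
            rows.append({
                "status": "error",
                "error_type": outputs.get("error_type"),
                "error_message": outputs.get("error_message"),
                "sdk_url": outputs.get("sdk_url"),
            })
        elif "wrote_title" in outputs:
            rows.append({"status": "wrote", "wrote_title": outputs["wrote_title"]})
        else:
            rows.append({"status": "skipped", "detail": outputs.get("skipped")})
    return rows


def fetch_helper_runs(project_name: str, limit: int = 20):
    """Query LangSmith for recent runs of the helper (requires network access)."""
    from langsmith import Client  # imported lazily so the module loads without it

    client = Client()
    return client.list_runs(
        project_name=project_name,
        filter=f'eq(name, "{HELPER_RUN_NAME}")',
        limit=limit,
    )
```

Usage after a probe would be something like `summarize_outcomes(fetch_helper_runs("<your-project>"))`, where the project name is whatever the deployment traces to.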

Follow-up (next PR)

Once we see the actual exception, fix the root cause. Strongly suspect LANGGRAPH_API_URL defaults to http://localhost:2024 inside the runtime container where that port doesn't exist; the real fix is likely a different env var or using langgraph_sdk.RemoteGraph for the self-call.

🤖 Generated with Claude Code

…res errors

LangGraph Platform's stdout (where PR #491's print() lands) isn't
reachable via the LangSmith Runs API. After firing a probe against
prod (thread 019e475a-4add-79a3-95b1-348525d79b0e) we confirmed the
title stays null AND no actionable error surfaces in any API we can
query.

Wrap the helper with @langsmith.traceable so it becomes a child run
in the trace tree, and convert all early-returns to typed dict
returns (skipped/wrote_title/error_type/error_message). The error
branch now captures type, message, and the sdk_url that was used —
all of which show up in the run's `outputs` field when queried via
the LangSmith Runs API.

Cleaner than stdout grep, gives us the proximate cause without
needing platform log access.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@blove blove enabled auto-merge (squash) May 20, 2026 21:52
vercel Bot commented May 20, 2026

Project: threadplane | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): May 20, 2026 9:54pm

@blove blove merged commit e81ed06 into main May 20, 2026
16 checks passed
blove added a commit that referenced this pull request May 20, 2026
…tes (#493)

* fix(graph): use SDK in-process ASGI transport for thread metadata writes

Root cause of the production "all threads Untitled" bug (diagnosed via
PR #492's @traceable wrapper):

    error_type:    ConnectError
    error_message: All connection attempts failed
    sdk_url:       http://localhost:2024

The Python helper was calling get_client(url='http://localhost:2024')
when LANGGRAPH_API_URL was unset, then trying to HTTP-call back into
the runtime. In local dev this accidentally works because `langgraph
dev` listens on 2024. In prod the runtime is on a different port, so
every title write threw ConnectError and the bare except swallowed it.

Fix: pass `url=os.environ.get("LANGGRAPH_API_URL")` (no fallback). When
None, the SDK uses its in-process ASGI transport — the canonical path
for graph-to-server self-calls. Docstring excerpt:

> If `None`, the client first attempts an in-process connection via
> ASGI transport. ... This only works if the client is used from
> within the Agent server.
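
The difference between the old and new URL resolution can be sketched as follows. The function names and the `LEGACY_FALLBACK` constant are hypothetical, and the "before" variant is a reconstruction from the description above rather than the original code; `get_client(url=...)` is the `langgraph_sdk` call named in the commit message.

```python
import os
from typing import Mapping, Optional

LEGACY_FALLBACK = "http://localhost:2024"  # only reachable under `langgraph dev`


def sdk_url_before(env: Mapping[str, str]) -> str:
    """Old behaviour (reconstructed): fall back to localhost when the var is unset."""
    return env.get("LANGGRAPH_API_URL", LEGACY_FALLBACK)


def sdk_url_after(env: Mapping[str, str]) -> Optional[str]:
    """New behaviour: no fallback, so an unset var yields None."""
    return env.get("LANGGRAPH_API_URL")


def make_client():
    """Build the SDK client; url=None lets the SDK try its in-process transport."""
    from langgraph_sdk import get_client  # imported lazily; only needed in the runtime

    return get_client(url=sdk_url_after(os.environ))
```

With the variable unset, `sdk_url_before({})` yields the localhost URL that failed in prod, while `sdk_url_after({})` yields `None`.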

Applies to both:
- examples/chat/python (canonical demo, where the bug surfaced)
- cockpit/chat/threads/python (same anti-pattern, would've failed on
  prod for the same reason)

The @traceable instrumentation from #492 stays; it'll confirm the
fix on the next prod probe by surfacing `wrote_title: <slice>` in
the LangSmith run output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(c-a2ui): same in-process ASGI fix for generate_title node

PR #474 added a generate_title node to c-a2ui mirroring the
examples/chat pattern — including the same broken localhost:2024
fallback. Same fix: pass `url=None` (via unset env) so the SDK uses
its in-process ASGI transport.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>