
Per-persona multi-language support (translate-for-retrieval + reply-in-user-language)#40

Open
rajivml wants to merge 6 commits into feature/darwin from feature/multilanguage-support

Conversation


@rajivml rajivml commented May 4, 2026

Summary

Adds a per-persona toggle that makes Darwin work for non-English (Japanese / Chinese / Korean) users without changing the indexing pipeline, embedding model, or any infra. The flag is opt-in per assistant — English-only assistants pay nothing extra, and customers running primarily English traffic aren't impacted.

When the flag is on:

  1. Retrieval — a non-English query is translated to English before hitting Vespa, so the existing English-corpus index returns the right docs (resolution: persona flag → MULTILINGUAL_QUERY_EXPANSION env var → off).
  2. Answer language — after the answering LLM produces a (possibly English) reply, a small second-pass LLM call translates the answer back into the user's original language. Citation markers ([1], [2], [[1]](url)), URLs, and code blocks are preserved verbatim.
  3. Chat session naming — chat titles follow the user's language too.

Both code paths are covered:

  • Chat UI (/chat) — chat/process_message.py
  • Slack one-shot (the path the slackbot listener uses) — one_shot_answer/answer_question.py

What was implemented

Backend

  • Persona.multilingual_query_expansion: bool — new column with default false. Alembic migration a3f1d7c4e9b2_persona_multilingual_query_expansion.py.
  • CreatePersonaRequest / PersonaSnapshot carry the field through the persona admin API.
  • upsert_persona / create_update_persona accept and persist the flag.
  • PromptConfig carries the flag forward so prompt builders can decide whether to append LANGUAGE_HINT.
  • SearchPipeline reads the persona flag and (when on) sets multilingual_expansion_str="English" so retrieval translates to English before search.
  • citations_prompt.py and quotes_prompt.py source the language-hint from prompt_config, with the global env var as fallback.
  • chat_session_naming.get_renamed_conversation_name accepts use_language_hint; the rename endpoint fetches the chat-session's persona and forwards it.
  • chat/multilingual_translation.py (new module): detect_query_language (Unicode-script heuristic), language_name, translate_answer_to_language (LLM call that preserves citations, URLs, code blocks; falls back to English on failure).
  • chat/process_message.py — when persona has the flag on AND the query is non-English: buffer DanswerAnswerPiece tokens during streaming, run the translate pass after stream-end, emit translated text as a single piece, persist translated text in the DB. Other packets (citations, tool responses) flow in real time.
  • one_shot_answer/answer_question.py — same buffer-and-translate logic for the Slack path. CitationInfo packets keep flowing during the buffer, so the slackbot's existing 5-attempt citation-required retry loop works unchanged.

Frontend

  • Persona TS interface, PersonaCreationRequest / PersonaUpdateRequest, and buildPersonaAPIBody carry the field.
  • Assistant editor: a BooleanFormField checkbox in the Misc section labeled "Enable multi-language support" with subtext explicitly noting the cost (+1 LLM call per non-English query).

Drive-by fixes (rolled into this branch by request)

  • web/src/app/admin/bot/SlackBotConfigCreationForm.tsx — fix silent-submit on the Slack-bot config create form. The Yup schema unconditionally required curated_response_config.response_message, but the matching input is only rendered when the integration toggle is on. Result: with the default toggle off, the field was empty, validation failed silently, and Create did nothing. Fixed by gating the validation with .when(...) to mirror the jira_config pattern.
  • web/src/components/table/DragHandle.tsx — destructure isDragging before spreading onto the DOM <div>, silencing React's "unknown DOM attribute" warning that fired on every PersonasTable render.

What was tested

backend/scripts/test_multilanguage_e2e.py — end-to-end smoke test driving the real local stack (Postgres + Vespa + the configured GenAI provider). Five phases:

  1. Setup — create test connector + cc-pair, seed 3 English docs about a unique fictional brand ("Zorblax") into Vespa via the real indexing pipeline, create two test personas (flag on / flag off).
  2. English baseline — sanity check that retrieval and entity-in-answer work for English queries against the seeded corpus.
  3. Non-English (chat-UI path) — for ja / zh / ko, hard-asserts (a) retrieval brought back the expected English doc, (b) the answer contains the factual entity (numerals / proper nouns survive translation), (c) the final answer text is in the user's language.
  4. Control persona — same non-English queries against the flag-OFF persona; logged informationally to show the contrast.
  5. Slack one-shot path — drives get_search_answer (the entry point the slackbot listener uses) with the same hard contract as Phase 3, proving the Slack flow honors the flag end-to-end.

The test is safe to re-run: a cascading SQL teardown deletes the chat_message__search_doc, tool_call, chat_feedback, and document_retrieval_feedback FK dependents before deleting chat_message / chat_session / persona.
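The teardown ordering is the load-bearing part of re-run safety: FK dependents must go before their parents. A sketch of that ordering contract (table names are from this PR; the helper and the scope_clause parameter are illustrative stand-ins for the real script's scoped deletes):

```python
# Delete FK dependents first, then their parents, so a re-run never trips
# a foreign-key violation on leftover rows from a failed previous run.
TEARDOWN_ORDER = [
    "chat_message__search_doc",
    "tool_call",
    "chat_feedback",
    "document_retrieval_feedback",
    "chat_message",
    "chat_session",
    "persona",
]

def teardown_statements(scope_clause: str) -> list[str]:
    """Emit DELETEs in dependency order; the real script scopes each
    statement to the rows the test itself created."""
    return [f"DELETE FROM {table} WHERE {scope_clause}" for table in TEARDOWN_ORDER]
```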

Local results

Stability runs before option C (the post-translation pass) was wired in:

  • 6/6 retrieval contract: works every time
  • 0/9 non-English answers came back in the user's language — gpt-4o-mini in the gateway routinely ignored LANGUAGE_HINT when context was English-heavy

After option C (post-translation pass):

  • 27/27 hard assertions PASS (12 retrieval + 12 entity-in-answer + 3 English-baseline)
  • 9/9 non-English answers in the user's language via the chat-UI path
  • 3/3 non-English answers in the user's language via the Slack one-shot path (Phase 5)
  • Control persona (flag off) still answers in English — confirms the post-pass only fires for opted-in personas

Re-run on this branch: before merging, please run cd backend && PYTHONPATH=$(pwd) python scripts/test_multilanguage_e2e.py --yes against your local stack to verify.

Trade-offs / things to know

  • Streaming UX for non-English replies: token-by-token streaming is suspended for opted-in personas because we buffer the answer and translate it before emitting. The user sees the whole translated answer at once after one extra LLM round-trip. English replies are unaffected.
  • Cost: one extra LLM call per non-English query (translate). Only on personas with the flag on. English personas / queries pay zero overhead.
  • Slack bot citation gate is preserved: the [1]/[2] citation retry loop still works because the translation prompt instructs the LLM to keep markers verbatim, and CitationInfo packets flow in real time even when answer pieces are buffered.
  • LLM non-determinism: on occasional borderline queries (e.g. a Japanese office-printer question when the corpus holds multiple similar docs), the LLM can hedge on which doc to cite; observed in 1 of 6 stability runs. Not a wiring issue; a stronger model would help.
  • Resolution precedence: persona flag → MULTILINGUAL_QUERY_EXPANSION env var → off. Existing global-env-var deployments are untouched.

Process bounce required after merge + deploy

Per CLAUDE.md ("Modify a SQLAlchemy model → run alembic upgrade head, then bounce API server + background jobs"):

  1. alembic upgrade head from backend/ (adds the new column with server_default false).
  2. Bounce dapi (api-server) — new ORM mapping + new module imports.
  3. Bounce dbe (background jobs).
  4. Bounce dsl (Slack listener) — needed for the Slack post-translation path to take effect.

Test plan

  • Pull, run alembic, bounce all three services
  • Run python scripts/test_multilanguage_e2e.py --yes against the dev stack — expect exit 0 with all 5 phases green
  • In the chat UI: pick a multilingual-flagged assistant, ask a question in Japanese; expect a Japanese answer with [1]/[2] citations preserved
  • In a Slack channel bound to the same persona: post the same question; expect a Japanese reply
  • Sanity-check English flow on a non-flagged persona — should still stream tokens normally
  • Smoke-test the drive-by fixes: open /admin/bot/new, click Create with all fields default — should now show backend validation errors instead of doing nothing silently. PersonasTable should no longer log the isDragging warning in dev console.

🤖 Generated with Claude Code

rajivml and others added 6 commits May 4, 2026 22:49
Adds a per-persona `multilingual_query_expansion` boolean. When on:
non-English queries are translated to English for retrieval and the
answer-side prompt gets the LANGUAGE_HINT directive so the LLM is
asked to reply in the user's original language.

Resolution precedence: persona flag > MULTILINGUAL_QUERY_EXPANSION
env var > off. Existing global behavior is preserved.

Touched call sites:
  - Persona model + alembic migration (new column, default false)
  - CreatePersonaRequest / PersonaSnapshot pydantic
  - upsert_persona / create_update_persona
  - PromptConfig (carries the flag through to answer prompts)
  - process_message threads persona.flag into PromptConfig
  - SearchPipeline reads persona flag, falls back to env var
  - citations_prompt / quotes_prompt source language-hint from prompt
    config (with env-var fallback)
  - chat_session_naming accepts use_language_hint; chat_backend
    rename fetches persona and forwards the flag
  - Assistant editor: checkbox + initial value + yup validation +
    payload field

Tests:
  backend/scripts/test_multilanguage_e2e.py drives the real stack:
  seeds 3 English Vespa docs about a unique fictional brand
  ("Zorblax"), creates two personas (flag on / off), and for en/ja/
  zh/ko asserts retrieval lands the right doc and the answer
  contains the factual entity. Answer-language detection is logged
  but not asserted (the LLM's adherence to LANGUAGE_HINT varies by
  model and isn't part of the wiring contract). Re-run safe;
  cascading SQL teardown handles all FK dependents.

Locally: 3 back-to-back runs, 27/27 hard assertions pass on 2 of 3
runs, 26/27 on the third (one LLM hedge on a borderline query).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The persona's `multilingual_query_expansion` flag already added the
LANGUAGE_HINT directive to the answering prompt, but gpt-4o-mini in
practice ignores it and answers in English when context is English-
heavy (0/9 non-English answers came back in the user's language
across 6 prior runs).

This adds a deterministic second pass: when the persona has the flag
on AND the user query is detected as ja/zh/ko, we buffer the
streamed DanswerAnswerPiece tokens during generation, then make one
extra LLM call after stream end to translate the assembled English
answer into the user's language. The translated text is then
emitted as a single answer piece, and the DB-saved message is the
translated version too.

Trade-offs:
- Non-English replies lose token-by-token streaming (one extra
  round-trip; user sees a brief delay then the whole answer at once).
- One extra LLM call per non-English query — only for personas
  opted in via the flag.
- Citations preserved: the translation prompt is directive about
  keeping [1]/[2] markers, URLs, and code blocks verbatim.
- English queries are entirely unaffected (translate_target stays
  None, no buffering, normal streaming).

Files:
- backend/danswer/chat/multilingual_translation.py (new):
    detect_query_language() Unicode-script heuristic, language_name()
    code→display-name lookup, translate_answer_to_language() that
    falls back to English on any LLM failure rather than dropping
    the response.
- backend/danswer/chat/process_message.py:
    Detect query language up front; intercept DanswerAnswerPiece in
    the stream loop when in translate mode; emit translated text and
    persist it as the assistant message.
- backend/scripts/test_multilanguage_e2e.py:
    Promote the answer-language detection from informational to a
    hard assertion. Local run with this commit: 9/9 non-English
    answers came back in the user's language (vs 0/9 prior).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…one-shot path

The chat-UI flow honors the persona's multilingual_query_expansion
flag end-to-end, but the Slack-bot path (one_shot_answer.py) was
never updated:
  - PromptConfig.from_model() was called without the flag, so
    LANGUAGE_HINT was never appended to Slack-served answers.
  - There was no post-translation pass, so the answering LLM's
    English output was sent to Slack verbatim.

This commit mirrors what process_message.py does:
  1. Read chat_session.persona.multilingual_query_expansion and
     thread it into both PromptConfig.from_model() call sites.
  2. Detect non-English query language up front; if matched and the
     persona has the flag on, set translate_target.
  3. In the streaming loop, buffer DanswerAnswerPiece tokens (instead
     of yielding them) when in translate mode. CitationInfo packets
     still flow in real time so the slackbot's citation-required
     retry loop in get_search_answer keeps working — and the
     translation prompt preserves [1]/[2] markers verbatim, so the
     emitted translated text still satisfies the citation gate.
  4. After stream end, run the translate LLM pass and yield the
     translated answer as a single DanswerAnswerPiece (+ None
     terminator).
  5. Persist the translated text in the saved ChatMessage and
     re-count its tokens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drives `get_search_answer` (the entry point the slack listener uses)
through the same query set as Phase 3 and asserts the same hard
contract: retrieval hits the expected doc, the factual entity
appears in the answer, and the answer is in the user's language.

Local run with this commit: 9/9 hard PASS in Phase 5; 3/3 non-
English answers came back in the user's language via the Slack code
path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Yup schema unconditionally required curated_response_config.
response_message, but the matching text input is only rendered when
enable_curated_response_integration is true. Default is false, so on
a fresh /admin/bot/new the field was empty, validation failed
silently, the Create button did nothing, and no error rendered
because the errored field wasn't on screen.

Mirror the jira_config pattern: only require when the toggle is
enabled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dnd-kit/sortable passes a logical `isDragging` prop; spreading it
onto the underlying <div> tripped React's "unknown DOM attribute"
warning on every PersonasTable render. Destructure it before spread.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>