
Commit b65c7e7

chore refactor (AI Assistant): context offloading; better telemetry data; add Google and Anthropic model compatibility (baserow#4951)
* chore(deps): replace udspy with pydantic-ai and opentelemetry-sdk
  Replace the udspy dependency with pydantic-ai-slim (with openai, groq, anthropic, bedrock providers) and opentelemetry-sdk for structured telemetry collection.
* fix(sentry): exclude pydantic_ai from auto-enabling integrations
  sentry-sdk's pydantic_ai integration patches ToolManager._call_tool which was removed in pydantic-ai >= 1.x (now execute_tool_call), causing import-time errors.
* feat(settings): add dev log file mirroring and allow embeddings URL in tests
  - Add BASEROW_LOG_FILE support in dev settings to mirror logs (including loguru output) to a file, useful for AI-assisted debugging.
  - Allow BASEROW_EMBEDDINGS_API_URL to be overridden via env in test settings for search_user_docs eval tests.
* feat(assistant): add message_history field to AssistantChat
  Add a BinaryField to store serialized pydantic-ai message history (JSON bytes) for multi-turn conversation context, replacing the previous udspy-based conversation state.
* refactor(assistant): port to pydantic-ai agent framework
  Replace udspy with pydantic-ai as the agent framework for the AI assistant.
  Key changes:
  - Add Agent definitions with typed deps (AssistantDeps) and dynamic toolsets for runtime tool loading
  - Add deps module with AssistantDeps, ToolHelpers, and EventBus for streaming events to the UI
  - Add history module for serializing/deserializing pydantic-ai message history to the database
  - Add model_profiles for provider-specific configuration (Anthropic, OpenAI, Groq, Bedrock)
  - Add toolset module with ToolGroup base class replacing the udspy tool registry pattern
  - Add shared/ with formula_utils and sub-agent helpers
  - Add tool_types.py per tool module for pydantic-ai ToolDefinition
  - Port all tool modules (core, database, navigation, automation, search_user_docs) from udspy decorators to pydantic-ai Tool instances
  - Port assistant orchestrator, handler, and prompts
  - Remove signatures.py (replaced by pydantic-ai output types)
* refactor(assistant): update telemetry for pydantic-ai
  Rework telemetry collection to use pydantic-ai's message history format and OpenTelemetry SDK for structured span/event recording, replacing the previous udspy-based telemetry hooks.
* test(assistant): update unit tests for pydantic-ai port
  Rewrite assistant unit tests to use pydantic-ai's testing utilities (TestModel, FunctionModel) instead of udspy mocks. Add new test files for core tools, navigation tools, and search docs tools. Remove obsolete skip file.
* test(assistant): add LLM eval test suite
  Add end-to-end eval tests that run the real agent against a live LLM to verify tool selection, schema compatibility, and output quality. Includes evals for: navigation, core builders, database tables/rows, sample rows, automation workflows, search_user_docs, and cross-cutting structured output validation. Tests are marked with @pytest.mark.eval and skipped by default. Configure via EVAL_LLM_MODEL or EVAL_LLM_MODELS env vars.
* docs: add eval guide and update AI assistant installation docs
  - Add docs/development/ai-assistant-evals.md with instructions for running the eval suite, configuring models, and writing new evals.
  - Update docs/installation/ai-assistant.md to reflect pydantic-ai provider configuration replacing the previous udspy setup.
* fix(assistant): fix test patch paths, optional filter args, and eval marker
  - Fix mock patch paths from `assistant.agent` to `assistant.agents`
  - Make ListTablesFilterArg fields optional to prevent LLM validation errors
  - Surface field_errors in create_fields tool result
  - Simplify EvalToolTracker to use message history inspection
  - Register `eval` pytest marker and skip evals by default
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: move testing docs to docs/testing/ and add PR test plan
  Move ai-assistant-evals.md from docs/development/ to docs/testing/, add ai-assistant-test-plan.md with manual and automated test steps for the pydantic-ai port PR.
* refactor(assistant): extract row models to types/rows.py, rename utils to helpers
  - Move FieldDefinition, row model builders, and get_link_row_hints to new types/rows.py module with dict-of-callables dispatch replacing match/case
  - Simplify update model: fields are optional (omit = don't change), removing the __NO_CHANGE__ sentinel
  - Move get_table_rows_tools into tools.py as _build_row_tools since it builds pydantic-ai Tool objects
  - Rename utils.py to helpers.py for clarity, remove dead list_tables
  - Add docstrings with :param/:returns to all public functions, add proper type annotations throughout
* refactor(assistant): flatten field/view/filter types into single models
  - Replace per-type config classes in fields.py with a single flat FieldItemCreate model using optional type-specific fields and a model_validator for type aliases
  - Simplify view_filters.py and views.py type hierarchies similarly
  - Update table.py types and corresponding tests
* fix(assistant): improve telemetry span processor and minor tweaks
  - Replace SpanExporter with SpanProcessor for real-time span handling
  - Remap child tool spans past 'running tools' grouping span
  - Parse JSON string arguments in tool call parts
  - Add data_brief parameter to sample rows prompt
  - Disable reasoning_effort for groq models temporarily
  - Rename _fix_formula to _fix_formula_field in helpers
* refactor(assistant): use ISO strings for dates and add dateutil fallback
  Replace Date/Datetime Pydantic model objects with compact ISO 8601 strings. Add lenient parsing via dateutil.parser fallback when fromisoformat fails.
* feat(assistant): add DO/EXPLAIN agent modes with switch_mode tool
  Introduce AgentMode enum (DO/EXPLAIN) and ModeAwareToolset that filters available tools based on mode. DO mode (default) exposes all action tools except search_user_docs. EXPLAIN mode exposes only read-only tools (list_*, navigate) plus search_user_docs for answering Baserow feature questions. The switch_mode tool allows bidirectional switching.
* refactor(assistant): flatten automation node types and add $formula: convention
  Replace 13+ per-type action node classes (RouterNodeCreate, SendEmailActionCreate, etc.) with a single flat ActionNodeCreate model using @model_validator for per-type validation and dict-dispatched functions for ORM conversion and formula generation. Add a $formula: prefix convention, where values prefixed with '$formula:' are sent to the LLM formula generator and plain values become literal formulas. Also detect raw formula expressions (get(), concat(), etc.) written inline. Add trigger validation: periodic triggers now require periodic_interval (with automatic folding of flat fields), row triggers require rows_triggers_settings.
* build: bump pydantic-ai-slim to 1.66.0 and anthropic to 0.84.0
* feat(assistant): add RetryingModel for transient provider error recovery
  Wraps pydantic-ai model instances to automatically retry on transient errors (rate limits, timeouts, server errors) with exponential backoff. Handles both streaming and non-streaming calls.
* feat(assistant): add AgentMode system with ModeAwareToolset and switch_mode
  Introduce domain modes (DATABASE, APPLICATION, AUTOMATION, EXPLAIN) that control which tools are visible to the agent. ModeAwareToolset filters the combined toolset per-mode, registries generate per-mode manifests, and switch_mode lets the agent transition between domains. Each mode gets a cross-mode summary so the agent knows what other modes offer.
* refactor(assistant): integrate RetryingModel, event-based streaming, and JSON retry
  Replace direct model usage with RetryingModel for resilience. Rewrite streaming to use run_stream_events for proper text/reasoning/tool event handling. Add JSON-tool-call-as-text detection with automatic retry. Auto-detect starting mode from UI context. Update model_profiles with max_tokens settings.
* refactor(assistant): improve shared formula utils and add formula language reference
  Add RAW_FORMULA_RE for detecting raw formula expressions, needs_formula() for $formula: prefix and raw formula detection, literal_or_placeholder() for ORM value creation, and a shared formula language prompt. Improve formula generator to track remaining unresolved fields across retries.
* refactor(assistant): improve database tools with routing rules and type fixes
  Add per-module routing rules via get_routing_rules(). Extract ToolInputError to helpers. Fix field type validators, improve row model handling, and refine view filter types. Update agents and prompts for better tool guidance.
* refactor(assistant): improve automation tools with routing rules and formula handling
  Add per-module routing rules for automation. Improve node type handling with better formula context support. Refine automation agents and prompts.
* fix(assistant): improve telemetry span processor with real-time remapping
  Enhance SpanProcessor with JSON arg parsing, real-time span remapping, and improved trace output handling. Update tests to cover new behavior.
* test(assistant): update tests for mode system, retry logic, and type refactors
  Add tests for switch_mode, mode-aware manifests, JSON retry logic. Add test_assistant_automation_node_tools and test_assistant_database_field_tools. Update existing database/automation/core tests for refactored types and new tool signatures. Update eval_utils for new deps structure.
* fix: lint
* chore(frontend): ignore .claude dir in vite watcher and fix Nitro EMFILE
  - Add .claude/ to vite server watch ignore list alongside node_modules and .git to avoid unnecessary file watching in worktrees.
  - Configure Nitro devStorage to use fs-lite driver to prevent chokidar from watching the entire repo root, which causes EMFILE on macOS in large monorepos.
* fix(tests): fix test settings and seat usage test isolation with xdist
  - Ensure pytest always finds backend/pytest.ini by passing -c pytest.ini explicitly, fixing DJANGO_SETTINGS_MODULE=dev when running from root
  - Preserve existing TEST dict keys when setting MIGRATE in test settings
  - Add transaction=True to seat usage tests to prevent data leaking from TransactionTestCase tests running on the same xdist worker
* fix(assistant): fix field types, formula regex, validator guard, and consolidate evals
  Fix multiple_select returning None instead of [], link_row description typo, formula regex missing greater_than_or_equal/less_than_or_equal variants, and guard against overwriting original validators on repeated prepare_tools calls. Consolidate sample rows and navigation evals into the database tables eval file and remove the meta tool-call history test.
* fix(assistant): improve tool return types, filter aliases, and eval infrastructure
  - Return consistent dict types from all tools instead of plain strings
  - Add operator aliases for view filters so LLMs can use natural names
  - Fix boolean filter operator (is → equal)
  - Remove reasoning format from UTILITY model profiles (pollutes structured output)
  - Add ModelRetry for workflow creation and formula agent errors
  - Add EvalChecklist for soft assertions with pass/fail scoring
  - Add EVAL_RETRIES support for flake detection in eval tests
  - Suppress loguru DEBUG noise during evals
* refactor(assistant): extract table creation helpers and remove unused model profile
  - Extract _create_empty_tables and _create_table_fields from create_tables
  - Filter out duplicate primary field in field creation to avoid model mistakes
  - Remove unused gpt-oss-20b model profile
  - Always attempt sample rows regardless of field errors
* fix(assistant): strip <think> tags and unify streaming as reasoning chunks
  Models like MiniMax-M2.5 emit <think>...</think> tags inline. Handle ThinkingPart/ThinkingPartDelta events from pydantic-ai and extract inline thinking from text parts as a fallback. Stream all content as AiReasoningChunk during the agent run; the final answer is emitted as AiMessageChunk by _emit_answer.
* fix(assistant): simplify streaming and add collapsible reasoning UI
  Replace _accumulate_text/_extract_thinking with a single _get_content_delta helper that forwards text/thinking deltas. Accumulate reasoning_so_far and strip <think> tags before sending to frontend (which replaces content on each chunk). Add collapsible reasoning bubble (max 250px with fade mask and chevron toggle).
* fix(assistant): bridge legacy UDSPY_LM_* env vars to pydantic-ai config
* docs(assistant): improve ai-assistant.md and add AWS_REGION_NAME backward compat
  - Add both Bedrock auth methods (boto3 creds + bearer token)
  - Add section 6 with pydantic-ai model overview link and provider list
  - Restructure migration table: unchanged / bridged / new variables
  - Fix AWS_BEARER_TOKEN_BEDROCK incorrectly listed as removed
  - Bridge AWS_REGION_NAME to AWS_DEFAULT_REGION in settings for backward compat
* minor doc/evals fixes
* fix(assistant): strip unclosed <think> tags during streaming
  Models behind Groq emit <think> tags as text content rather than using the native thinking protocol. During streaming, the closing </think> tag may not have arrived yet, causing raw thinking content to leak to the frontend. Also strip think tags from tool thought fields and reset reasoning on tool results.
* docs(assistant): use provider:model format and refresh eval docs
  - Update all docs to use pydantic-ai provider:model format (colon separator)
  - Fix mixed-up provider descriptions in configuration.md
  - Refresh eval docs: replace assert_no_tool_errors with EvalChecklist pattern
  - Add embeddings URL for local vs Docker in ai-assistant-evals.md
  - Skip KB sync post_migrate signal during tests
  - Fix temperature type in model_profiles.py
  - Refactor justfile PYTHONPATH for test recipe
* fix(assistant): clean up navigation and improve tool types
  - Remove unused WorkspaceNavigationRequestType
  - Narrow exception catch in navigate tool to ObjectDoesNotExist
  - Guard id field in CreateRowModel.from_django_orm
  - Use id__in for batch filtering in ListTablesFilterArg
* Revert "chore(frontend): ignore .claude dir in vite watcher and fix Nitro EMFILE"
  This reverts commit 4be1be1.
* fix: ai-assistant-test-plan.md tool smoke test prompt for list_builders
* fix: Posthog env var names in docs/testing/ai-assistant-test-plan.md
* fix: wrong doc reference
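The `$formula:` convention described above splits model-proposed values into three cases: a prefixed description for the formula generator, a raw formula written inline, and a plain literal. The sketch below shows that three-way decision. `plan_value`, `ValuePlan` and `_RAW_FORMULA_SKETCH_RE` are hypothetical names, and the regex is a deliberately small assumption; the commit's actual `RAW_FORMULA_RE` and `needs_formula()` are not shown in this diff.

```python
import re
from dataclasses import dataclass

FORMULA_PREFIX = "$formula:"

# Hypothetical, deliberately small pattern: treat a value as a raw formula if
# it starts with a call to one of a few formula functions. The real
# RAW_FORMULA_RE in the commit is not shown here and is likely broader.
_RAW_FORMULA_SKETCH_RE = re.compile(
    r"^\s*(get|concat|if|upper|lower)\s*\(", re.IGNORECASE
)


@dataclass(frozen=True)
class ValuePlan:
    kind: str  # "generate", "raw" or "literal"
    value: str


def plan_value(value: str) -> ValuePlan:
    """Decide how a value proposed by the model should become a formula."""
    if value.startswith(FORMULA_PREFIX):
        # Natural-language description handed to the LLM formula generator.
        return ValuePlan("generate", value[len(FORMULA_PREFIX):].strip())
    if _RAW_FORMULA_SKETCH_RE.match(value):
        # A formula expression the model wrote inline; keep it as-is.
        return ValuePlan("raw", value)
    # Anything else is a plain value that becomes a literal formula.
    return ValuePlan("literal", value)
```

Checking the explicit prefix first means a description that happens to start with a function-like word is still routed to the generator.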
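The DO/EXPLAIN filtering rule from the `feat(assistant): add DO/EXPLAIN agent modes` item can be expressed as a pure function over tool names. This is a sketch of that rule only, assuming tools are identified by name; it is not the real `ModeAwareToolset`, which wraps pydantic-ai toolsets and was later extended to the DATABASE/APPLICATION/AUTOMATION/EXPLAIN domain modes. `visible_tools` is a hypothetical helper.

```python
from enum import Enum
from typing import List


class AgentMode(Enum):
    DO = "do"
    EXPLAIN = "explain"


# Besides list_* tools, EXPLAIN mode keeps these. switch_mode stays visible in
# both modes so the agent can switch back and forth.
_EXPLAIN_EXTRA = {"navigate", "search_user_docs", "switch_mode"}


def visible_tools(mode: AgentMode, tool_names: List[str]) -> List[str]:
    """Return the tool names exposed in ``mode`` (hypothetical helper)."""
    if mode is AgentMode.DO:
        # Every action tool except the documentation search.
        return [name for name in tool_names if name != "search_user_docs"]
    # EXPLAIN: read-only listing/navigation plus documentation search.
    return [
        name
        for name in tool_names
        if name.startswith("list_") or name in _EXPLAIN_EXTRA
    ]


tools = ["list_tables", "create_tables", "navigate", "search_user_docs", "switch_mode"]
```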
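RetryingModel is described as retrying transient provider errors with exponential backoff. The sketch below shows the backoff loop around a plain coroutine function rather than a pydantic-ai `Model`, and it does not cover streaming. `TransientError` is a stand-in for whatever rate-limit, timeout and 5xx exceptions a provider raises. The random jitter is an addition of this sketch, not something the commit message mentions.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for rate-limit (429), timeout and 5xx errors from a provider."""


async def call_with_retry(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Await call(), retrying on TransientError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles per attempt (0.5s, 1s, 2s, ...) capped at max_delay,
            # plus up to 10% random jitter so concurrent callers spread out.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")


attempts = 0


async def flaky_request() -> str:
    """Fails twice with a transient error, then succeeds."""
    global attempts
    attempts += 1
    if attempts < 3:
        raise TransientError("429 Too Many Requests")
    return "ok"


result = asyncio.run(call_with_retry(flaky_request, base_delay=0.01))
print(result, attempts)  # ok 3
```

Retrying a stream is harder than this: once chunks have reached the client, a blind retry would duplicate output, so a streaming wrapper can usually only retry safely before the first chunk is forwarded.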
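The unclosed-`<think>` problem can be handled by first removing complete blocks, then truncating at any opening tag whose closing tag has not arrived yet. Because the frontend replaces content on each chunk, the function is re-run on the whole accumulated buffer. A minimal sketch; `strip_think_tags` is a hypothetical name, not the helper used in the Baserow code.

```python
import re

# A complete <think>...</think> block; DOTALL lets "." span newlines.
_CLOSED_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think_tags(text: str) -> str:
    """Remove inline reasoning from a (possibly partial) streamed answer."""
    # 1. Drop every block whose closing tag has already arrived.
    text = _CLOSED_THINK_RE.sub("", text)
    # 2. If an opening tag is still unmatched, its block is in progress:
    #    hide everything from that tag to the end of the buffer.
    start = text.find("<think>")
    if start != -1:
        text = text[:start]
    return text


# Re-run on the accumulated buffer after every chunk.
buffer = ""
for chunk in ["Hello ", "<think>checking the", " schema</think>", "world"]:
    buffer += chunk
    print(repr(strip_think_tags(buffer)))
```

One limitation: an opening tag split across chunks (for example `"<thi"` then `"nk>"`) would be visible for one update before being hidden.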
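The AWS_REGION_NAME backward-compat bridge can be sketched as copying the legacy variable into AWS_DEFAULT_REGION. The precedence rule here (an explicitly set AWS_DEFAULT_REGION wins) is an assumption, and `bridge_legacy_region` is a hypothetical helper that takes any mutable mapping so it can be tested without touching `os.environ`.

```python
from typing import MutableMapping


def bridge_legacy_region(env: MutableMapping[str, str]) -> None:
    """Copy AWS_REGION_NAME to AWS_DEFAULT_REGION unless it is already set.

    Assumption: an explicit AWS_DEFAULT_REGION takes precedence over the
    legacy variable. Pass os.environ to apply it to the real process env.
    """
    legacy = env.get("AWS_REGION_NAME")
    if legacy and not env.get("AWS_DEFAULT_REGION"):
        env["AWS_DEFAULT_REGION"] = legacy
```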
1 parent 7ff02b8 commit b65c7e7

97 files changed

Lines changed: 15602 additions & 7113 deletions


backend/justfile

Lines changed: 4 additions & 3 deletions
@@ -81,9 +81,10 @@ uv_run := "uv run --active"
 # Repo root (parent of backend/) - clean() normalizes path (removes ..)
 repo_root := clean(justfile_directory() / "..")
 
+_set_pythonpath := 'export PYTHONPATH="' + repo_root / 'backend/src:' + repo_root / 'premium/backend/src:' + repo_root / 'enterprise/backend/src:' + repo_root / 'backend/tests:' + repo_root / 'premium/backend/tests:' + repo_root / 'enterprise/backend/tests${PYTHONPATH:+:$PYTHONPATH}"'
 # Helper to load .env.local if present and set PYTHONPATH with absolute paths
 # Include this at the start of bash recipes that need env vars
-_load_env := 'if [ -f "../.env.local" ]; then set -a; source "../.env.local"; set +a; fi; export PYTHONPATH="' + repo_root / 'backend/src:' + repo_root / 'premium/backend/src:' + repo_root / 'enterprise/backend/src:' + repo_root / 'backend/tests:' + repo_root / 'premium/backend/tests:' + repo_root / 'enterprise/backend/tests${PYTHONPATH:+:$PYTHONPATH}"'
+_load_env := 'if [ -f "../.env.local" ]; then set -a; source "../.env.local"; set +a; fi; ' + _set_pythonpath
 
 # Source directories
 backend_source_dirs := "src/ ../premium/backend/src/ ../enterprise/backend/src/"
@@ -228,14 +229,14 @@ alias f := fix
 
 # PYTHONPATH for test fixtures across all test directories
 test_pythonpath := "tests:../premium/backend/tests:../enterprise/backend/tests"
-_pytest := 'PYTHONPATH="' + test_pythonpath + ':${PYTHONPATH:-}" ' + uv_run + ' pytest'
+_pytest := 'PYTHONPATH="' + test_pythonpath + ':${PYTHONPATH:-}" ' + uv_run + ' pytest -c pytest.ini'
 
 # Run tests. Pass -n=auto to run in parallel with pytest-xdist
 [group('3 - testing')]
 test *ARGS: _check-dev
     #!/usr/bin/env bash
     set -euo pipefail
-    {{ _load_env }}
+    {{ _set_pythonpath }}
     {{ _pytest }} {{ ARGS }}
 
 # Run tests with coverage report
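In `_set_pythonpath`, the shell expansion `${PYTHONPATH:+:$PYTHONPATH}` appends `:<old value>` only when PYTHONPATH is set and non-empty, so no dangling `:` separator is produced; `${PYTHONPATH:-}` in `_pytest` likewise expands to an empty string instead of failing under `set -u`. The Python function below mirrors the first expression for illustration only; `build_pythonpath` is hypothetical and not part of the justfile.

```python
from pathlib import PurePosixPath
from typing import Optional

# The six source/test roots that _set_pythonpath exports, relative to repo_root.
_ROOTS = (
    "backend/src",
    "premium/backend/src",
    "enterprise/backend/src",
    "backend/tests",
    "premium/backend/tests",
    "enterprise/backend/tests",
)


def build_pythonpath(repo_root: str, existing: Optional[str]) -> str:
    """Mirror of the justfile PYTHONPATH expression (illustration only)."""
    value = ":".join(str(PurePosixPath(repo_root) / root) for root in _ROOTS)
    # ${PYTHONPATH:+:$PYTHONPATH}: append ":<old value>" only when PYTHONPATH
    # is set AND non-empty, so no trailing ":" is produced otherwise.
    if existing:
        value += ":" + existing
    return value
```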

backend/pyproject.toml

Lines changed: 4 additions & 3 deletions
@@ -89,8 +89,8 @@ dependencies = [
     "langchain==0.3.28",
     "langchain-openai==0.3.35",
     "openai==2.14.0",
-    "anthropic==0.77.0",
-    "mistralai==1.1.0",
+    "anthropic==0.84.0",
+    "mistralai==2.0.0",
     "icalendar==6.3.2",
     "jira2markdown==0.5",
     "openpyxl==3.1.5",
@@ -100,7 +100,8 @@ dependencies = [
     "genson==1.3.0",
     "pyotp==2.9.0",
     "qrcode==8.2",
-    "udspy==0.1.8",
+    "pydantic-ai-slim[anthropic,bedrock,google,groq,openai]==1.66.0",
+    "opentelemetry-sdk>=1.20.0",
     "netifaces==0.11.0",
     "requests-futures>=1.0.2",
 ]

backend/pytest.ini

Lines changed: 1 addition & 0 deletions
@@ -56,3 +56,4 @@ markers =
     workspace_search: All tests related to workspace search functionality
     enable_all_signals: Disables signal deferral for this test (all signals enabled)
     enable_signals: Enables specific signals for this test (accepts dotted callable paths)
+    eval: mark test as an eval test (requires LLM API key)
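Tests carrying the `eval` marker run against a live LLM, and the commit message says they are configured through EVAL_LLM_MODEL or EVAL_LLM_MODELS using pydantic-ai's `provider:model` format. The helper below sketches how such a value could be parsed. Its name and two details are assumptions: EVAL_LLM_MODELS is treated as comma-separated, and it takes precedence over EVAL_LLM_MODEL.

```python
from typing import List, Mapping, NamedTuple


class EvalModel(NamedTuple):
    provider: str  # e.g. "openai", "anthropic", "groq", "bedrock"
    model: str


def resolve_eval_models(env: Mapping[str, str]) -> List[EvalModel]:
    """Parse eval model specs from the environment (hypothetical helper).

    Assumptions: EVAL_LLM_MODELS is a comma-separated list and wins over the
    single-model EVAL_LLM_MODEL; each entry uses the provider:model format.
    """
    raw = env.get("EVAL_LLM_MODELS") or env.get("EVAL_LLM_MODEL") or ""
    models = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the FIRST colon only: some model ids contain colons too.
        provider, sep, model = item.partition(":")
        if not sep or not provider or not model:
            raise ValueError(f"expected 'provider:model', got {item!r}")
        models.append(EvalModel(provider, model))
    return models
```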

backend/src/baserow/config/settings/base.py

Lines changed: 5 additions & 5 deletions
@@ -1332,15 +1332,15 @@ def __setitem__(self, key, value):
 from sentry_sdk.integrations.django import DjangoIntegration
 from sentry_sdk.scrubber import DEFAULT_DENYLIST, EventScrubber
 
-# Exclude the langchain integration from auto-discovery: its module-level
-# imports are incompatible with Python 3.14 (langchain/pydantic type
-# evaluation crash), and the import happens before disabled_integrations
-# can take effect.
+# Exclude integrations whose module-level imports are incompatible:
+# - langchain: Python 3.14 type evaluation crash
+# - pydantic_ai: sentry-sdk patches ToolManager._call_tool which was
+# removed in pydantic-ai >= 1.x (now execute_tool_call)
 
 _sentry_integrations._AUTO_ENABLING_INTEGRATIONS[:] = [
     entry
     for entry in _sentry_integrations._AUTO_ENABLING_INTEGRATIONS
-    if "langchain" not in entry
+    if "langchain" not in entry and "pydantic_ai" not in entry
 ]
 
 SENTRY_DENYLIST = DEFAULT_DENYLIST + ["username", "email", "name"]
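The settings change filters `_AUTO_ENABLING_INTEGRATIONS` with slice assignment (`[:] = ...`), which replaces the contents of the existing list object instead of binding a new one, so any code already holding a reference to that list sees the filtered version. The sketch below demonstrates this on a local list; the entries are illustrative dotted paths, not Sentry's real list.

```python
# Illustrative stand-in for sentry_sdk.integrations._AUTO_ENABLING_INTEGRATIONS,
# which holds dotted import paths (this is NOT Sentry's real list).
auto_enabling = [
    "sentry_sdk.integrations.celery.CeleryIntegration",
    "sentry_sdk.integrations.langchain.LangchainIntegration",
    "sentry_sdk.integrations.pydantic_ai.PydanticAIIntegration",
    "sentry_sdk.integrations.redis.RedisIntegration",
]
# Code elsewhere that grabbed the list earlier keeps this same object.
held_elsewhere = auto_enabling

EXCLUDED = ("langchain", "pydantic_ai")

# Slice assignment mutates the existing list in place, so every holder of the
# reference sees the filtered contents. Rebinding with `auto_enabling = [...]`
# would only change this one name.
auto_enabling[:] = [
    entry for entry in auto_enabling if not any(name in entry for name in EXCLUDED)
]

print(held_elsewhere)
```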

backend/src/baserow/config/settings/dev.py

Lines changed: 18 additions & 0 deletions
@@ -66,6 +66,24 @@
 post_migrate.connect(setup_dev_e2e, dispatch_uid="setup_dev_e2e")
 
 
+# Mirror logs to a file when BASEROW_LOG_FILE is set (e.g. for AI access when
+# running locally). Truncated on each restart.
+BASEROW_LOG_FILE = os.getenv("BASEROW_LOG_FILE", "")
+if BASEROW_LOG_FILE:
+    LOGGING["handlers"]["file"] = {  # noqa: F405
+        "class": "logging.FileHandler",
+        "filename": BASEROW_LOG_FILE,
+        "formatter": "console",
+        "mode": "w",
+    }
+    LOGGING["root"]["handlers"].append("file")  # noqa: F405
+
+    # Also route loguru to the same file so modules using loguru (e.g.
+    # the assistant telemetry) appear alongside stdlib log output.
+    from loguru import logger as _loguru_logger
+
+    _loguru_logger.add(BASEROW_LOG_FILE, mode="a")
+
 try:
     from .local import *  # noqa: F403, F401
 except ImportError:
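The dev setting adds a `logging.FileHandler` entry to the `LOGGING` dict and attaches it to the root logger; `logging.config.dictConfig` passes the extra `filename` and `mode` keys to the handler's constructor. The standalone sketch below reproduces that stdlib pattern against a temporary file. The `LOGGING` dict here is a minimal stand-in, not Baserow's base configuration, and the loguru half is omitted because loguru is a third-party package.

```python
import logging
import logging.config
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "baserow-dev.log")

# Minimal stand-in for the LOGGING dict defined in the base settings.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"console": {"format": "%(levelname)s %(name)s %(message)s"}},
    "handlers": {"console": {"class": "logging.StreamHandler", "formatter": "console"}},
    "root": {"handlers": ["console"], "level": "INFO"},
}

# Same pattern as the dev.py change: add a file handler, attach it to root.
LOGGING["handlers"]["file"] = {
    "class": "logging.FileHandler",
    "filename": log_path,
    "formatter": "console",
    "mode": "w",  # truncate on every (re)start
}
LOGGING["root"]["handlers"].append("file")

logging.config.dictConfig(LOGGING)
logging.getLogger("baserow.demo").info("hello from the dev server")
```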

backend/src/baserow/config/settings/test.py

Lines changed: 5 additions & 5 deletions
@@ -26,13 +26,13 @@
 TEST_ENV_VARS = {}
 
 # Prefixes for vars that can be overridden via env vars (for DB/Redis configuration)
-ALLOWED_ENV_PREFIXES = ("DATABASE_",)
+ALLOWED_ENV_PREFIXES = ("DATABASE_", "BASEROW_EMBEDDINGS_API_URL")
 
 
 def getenv_for_tests(key: str, default: str = "") -> str:
     """
     Get env var for tests:
-    - DATABASE_* vars: check real env first, then TEST_ENV_FILE, then default
+    - ALLOWED_ENV_PREFIXES vars: use real env var if set, else TEST_ENV_FILE, else default
     - Other vars: only use TEST_ENV_FILE or default (never real env)
     """
 
@@ -65,9 +65,9 @@ def getenv_for_tests(key: str, default: str = "") -> str:
 BASEROW_TESTS_SETUP_DB_FIXTURE = str_to_bool(
     os.getenv("BASEROW_TESTS_SETUP_DB_FIXTURE", "on")
 )
-DATABASES["default"]["TEST"] = {
-    "MIGRATE": not BASEROW_TESTS_SETUP_DB_FIXTURE,
-}
+DATABASES["default"].setdefault("TEST", {})[
+    "MIGRATE"
+] = not BASEROW_TESTS_SETUP_DB_FIXTURE
 
 # Open a second database connection that can be used to test transactions.
 DATABASES["default-copy"] = deepcopy(DATABASES["default"])
