Commit b65c7e7
chore refactor (AI Assistant): context offloading; better telemetry data; add google and anthropic models compatibility (baserow#4951)
* chore(deps): replace udspy with pydantic-ai and opentelemetry-sdk
Replace the udspy dependency with pydantic-ai-slim (with openai, groq,
anthropic, bedrock providers) and opentelemetry-sdk for structured
telemetry collection.
* fix(sentry): exclude pydantic_ai from auto-enabling integrations
sentry-sdk's pydantic_ai integration patches ToolManager._call_tool,
which was removed in pydantic-ai >= 1.x (replaced by execute_tool_call),
causing import-time errors.
* feat(settings): add dev log file mirroring and allow embeddings URL in tests
- Add BASEROW_LOG_FILE support in dev settings to mirror logs (including
loguru output) to a file, useful for AI-assisted debugging.
- Allow BASEROW_EMBEDDINGS_API_URL to be overridden via env in test
settings for search_user_docs eval tests.
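A minimal sketch of the log-file mirroring, assuming a Django-style `LOGGING` dict built for `logging.config.dictConfig`. The function `build_logging_config` and the handler name `mirror_file` are hypothetical, and the loguru side (which needs its own file sink) is not shown.

```python
import os


def build_logging_config(env=os.environ) -> dict:
    """Return a dictConfig-style LOGGING dict, mirroring to a file if requested."""
    config = {
        "version": 1,
        "disable_existing_loggers": False,
        "handlers": {"console": {"class": "logging.StreamHandler"}},
        "root": {"handlers": ["console"], "level": "INFO"},
    }
    log_file = env.get("BASEROW_LOG_FILE")
    if log_file:
        # Hypothetical extra handler: every root record is also written to the file.
        config["handlers"]["mirror_file"] = {
            "class": "logging.FileHandler",
            "filename": log_file,
        }
        config["root"]["handlers"].append("mirror_file")
    return config
```

In a settings module this would be passed to `logging.config.dictConfig(...)`; leaving the variable unset keeps the console-only configuration.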
* feat(assistant): add message_history field to AssistantChat
Add a BinaryField to store serialized pydantic-ai message history
(JSON bytes) for multi-turn conversation context, replacing the
previous udspy-based conversation state.
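For real pydantic-ai message objects, `ModelMessagesTypeAdapter` from `pydantic_ai.messages` provides `dump_json(...)` and `validate_json(...)`. The stdlib-only sketch below shows the same round trip with plain dicts standing in for messages; `dump_history` and `load_history` are hypothetical helper names.

```python
import json
from typing import Any, Optional


def dump_history(messages: list[dict[str, Any]]) -> bytes:
    """Serialize message history to compact UTF-8 JSON bytes for a BinaryField."""
    return json.dumps(messages, separators=(",", ":")).encode("utf-8")


def load_history(data: Optional[bytes]) -> list[dict[str, Any]]:
    """Deserialize stored bytes; an empty or missing value means a fresh chat."""
    if not data:
        return []
    # Some database backends return BinaryField values as memoryview objects.
    return json.loads(bytes(data).decode("utf-8"))
```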
* refactor(assistant): port to pydantic-ai agent framework
Replace udspy with pydantic-ai as the agent framework for the AI
assistant. Key changes:
- Add Agent definitions with typed deps (AssistantDeps) and dynamic
toolsets for runtime tool loading
- Add deps module with AssistantDeps, ToolHelpers, and EventBus for
streaming events to the UI
- Add history module for serializing/deserializing pydantic-ai message
history to the database
- Add model_profiles for provider-specific configuration (Anthropic,
OpenAI, Groq, Bedrock)
- Add toolset module with ToolGroup base class replacing the udspy
tool registry pattern
- Add shared/ with formula_utils and sub-agent helpers
- Add tool_types.py per tool module for pydantic-ai ToolDefinition
- Port all tool modules (core, database, navigation, automation,
search_user_docs) from udspy decorators to pydantic-ai Tool instances
- Port assistant orchestrator, handler, and prompts
- Remove signatures.py (replaced by pydantic-ai output types)
* refactor(assistant): update telemetry for pydantic-ai
Rework telemetry collection to use pydantic-ai's message history
format and OpenTelemetry SDK for structured span/event recording,
replacing the previous udspy-based telemetry hooks.
* test(assistant): update unit tests for pydantic-ai port
Rewrite assistant unit tests to use pydantic-ai's testing utilities
(TestModel, FunctionModel) instead of udspy mocks. Add new test files
for core tools, navigation tools, and search docs tools. Remove
obsolete skip file.
* test(assistant): add LLM eval test suite
Add end-to-end eval tests that run the real agent against a live LLM
to verify tool selection, schema compatibility, and output quality.
Includes evals for: navigation, core builders, database tables/rows,
sample rows, automation workflows, search_user_docs, and cross-cutting
structured output validation.
Tests are marked with @pytest.mark.eval and skipped by default.
Configure via EVAL_LLM_MODEL or EVAL_LLM_MODELS env vars.
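A sketch of how the two variables might be resolved into a model list. It assumes EVAL_LLM_MODELS is comma-separated and takes precedence over EVAL_LLM_MODEL; both rules and the function name `resolve_eval_models` are assumptions, not taken from the eval suite.

```python
import os


def resolve_eval_models(env=os.environ) -> list[str]:
    """Return the models to run evals against (hypothetical precedence rules)."""
    many = env.get("EVAL_LLM_MODELS", "")
    models = [m.strip() for m in many.split(",") if m.strip()]
    if models:
        return models
    single = env.get("EVAL_LLM_MODEL", "").strip()
    return [single] if single else []
```

An eval test could then be parametrized with `pytest.mark.parametrize("model", resolve_eval_models())`.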
* docs: add eval guide and update AI assistant installation docs
- Add docs/development/ai-assistant-evals.md with instructions for
running the eval suite, configuring models, and writing new evals.
- Update docs/installation/ai-assistant.md to reflect pydantic-ai
provider configuration replacing the previous udspy setup.
* fix(assistant): fix test patch paths, optional filter args, and eval marker
- Fix mock patch paths from `assistant.agent` to `assistant.agents`
- Make ListTablesFilterArg fields optional to prevent LLM validation errors
- Surface field_errors in create_fields tool result
- Simplify EvalToolTracker to use message history inspection
- Register `eval` pytest marker and skip evals by default
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: move testing docs to docs/testing/ and add PR test plan
Move ai-assistant-evals.md from docs/development/ to docs/testing/,
add ai-assistant-test-plan.md with manual and automated test steps
for the pydantic-ai port PR.
* refactor(assistant): extract row models to types/rows.py, rename utils to helpers
- Move FieldDefinition, row model builders, and get_link_row_hints to
new types/rows.py module with dict-of-callables dispatch replacing
match/case
- Simplify update model: fields are optional (omit = don't change),
removing the __NO_CHANGE__ sentinel
- Move get_table_rows_tools into tools.py as _build_row_tools since it
builds pydantic-ai Tool objects
- Rename utils.py to helpers.py for clarity, remove dead list_tables
- Add docstrings with :param/:returns to all public functions, add
proper type annotations throughout
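The dict-of-callables dispatch and the "omit = don't change" update semantics can be sketched as below. The converters and field type keys are hypothetical stand-ins; the real types/rows.py builds pydantic row-model fields rather than converting raw values. With a pydantic update model, `model_dump(exclude_unset=True)` yields exactly the partial dict that `apply_update` expects.

```python
from typing import Any, Callable, Optional


def _text(value: Any) -> str:
    return "" if value is None else str(value)


def _number(value: Any) -> Optional[float]:
    return None if value is None or value == "" else float(value)


def _multiple_select(value: Any) -> list:
    return list(value or [])  # always a list, never None


# Hypothetical converters keyed by field type, replacing a match/case block.
CONVERTERS: dict[str, Callable[[Any], Any]] = {
    "text": _text,
    "number": _number,
    "multiple_select": _multiple_select,
}


def convert(field_type: str, value: Any) -> Any:
    """Look up the converter for field_type and apply it."""
    converter = CONVERTERS.get(field_type)
    if converter is None:
        raise ValueError(f"Unsupported field type: {field_type!r}")
    return converter(value)


def apply_update(row: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Keys absent from update stay unchanged, so no sentinel value is needed."""
    return {**row, **update}
```

Adding a field type becomes a one-line registry entry instead of a new `case` branch.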
* refactor(assistant): flatten field/view/filter types into single models
- Replace per-type config classes in fields.py with a single flat
FieldItemCreate model using optional type-specific fields and a
model_validator for type aliases
- Simplify view_filters.py and views.py type hierarchies similarly
- Update table.py types and corresponding tests
* fix(assistant): improve telemetry span processor and minor tweaks
- Replace SpanExporter with SpanProcessor for real-time span handling
- Remap child tool spans past 'running tools' grouping span
- Parse JSON string arguments in tool call parts
- Add data_brief parameter to sample rows prompt
- Temporarily disable reasoning_effort for Groq models
- Rename _fix_formula to _fix_formula_field in helpers
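Two of the telemetry changes above, sketched with plain dicts standing in for OpenTelemetry spans (in the SDK this logic would live in a `SpanProcessor`'s `on_start`/`on_end` hooks). The span data layout, the function names and the default `skip_name` are assumptions for illustration.

```python
import json
from typing import Any, Optional


def parse_tool_args(args: Any) -> Any:
    """Tool-call args may arrive as a JSON string or as an already-parsed dict."""
    if isinstance(args, str):
        try:
            return json.loads(args)
        except json.JSONDecodeError:
            return args  # keep malformed input visible instead of dropping it
    return args


def remap_parent(
    span_id: str,
    parents: dict[str, Optional[str]],
    names: dict[str, str],
    skip_name: str = "running tools",
) -> Optional[str]:
    """Return the span's effective parent, skipping grouping spans named skip_name."""
    parent = parents.get(span_id)
    while parent is not None and names.get(parent) == skip_name:
        parent = parents.get(parent)
    return parent
```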
* refactor(assistant): use ISO strings for dates and add dateutil fallback
Replace Date/Datetime Pydantic model objects with compact ISO 8601
strings. Add lenient parsing via dateutil.parser fallback when
fromisoformat fails.
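A minimal sketch of the lenient parsing path; `parse_iso_datetime` and `to_iso` are hypothetical names. The fallback imports the third-party python-dateutil package only when `fromisoformat` rejects the input.

```python
from datetime import date, datetime


def parse_iso_datetime(value: str) -> datetime:
    """Parse an ISO 8601 string, falling back to dateutil for looser inputs."""
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        # Lenient path; requires python-dateutil to be installed.
        from dateutil import parser

        return parser.parse(value)


def to_iso(value: date) -> str:
    """Compact ISO 8601 form exchanged with the model (works for date and datetime)."""
    return value.isoformat()
```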
* feat(assistant): add DO/EXPLAIN agent modes with switch_mode tool
Introduce AgentMode enum (DO/EXPLAIN) and ModeAwareToolset that filters
available tools based on mode. DO mode (default) exposes all action
tools except search_user_docs. EXPLAIN mode exposes only read-only
tools (list_*, navigate) plus search_user_docs for answering Baserow
feature questions. The switch_mode tool allows bidirectional switching.
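The visibility rules for the two modes can be sketched as a predicate over tool names. The enum values, the `Deps` stand-in for AssistantDeps, and the assumption that switch_mode is visible in both modes (so switching works in either direction) are illustrative, not taken from the code.

```python
from dataclasses import dataclass
from enum import Enum


class AgentMode(str, Enum):
    DO = "do"
    EXPLAIN = "explain"


def is_tool_visible(mode: AgentMode, tool_name: str) -> bool:
    """Filter applied per mode, following the rules in the commit message."""
    if tool_name == "switch_mode":
        return True  # assumed always visible so the agent can switch either way
    if mode is AgentMode.DO:
        return tool_name != "search_user_docs"
    # EXPLAIN: read-only tools plus documentation search.
    return tool_name.startswith("list_") or tool_name in ("navigate", "search_user_docs")


@dataclass
class Deps:  # hypothetical stand-in for AssistantDeps
    mode: AgentMode = AgentMode.DO


def switch_mode(deps: Deps, mode: AgentMode) -> dict:
    """Store the new mode; the toolset re-filters on the next model request."""
    deps.mode = mode
    return {"mode": mode.value}
```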
* refactor(assistant): flatten automation node types and add $formula: convention
Replace 13+ per-type action node classes (RouterNodeCreate, SendEmailActionCreate,
etc.) with a single flat ActionNodeCreate model using @model_validator for per-type
validation and dict-dispatched functions for ORM conversion and formula generation.
Add $formula: prefix convention — values prefixed with '$formula:' are sent to the
LLM formula generator, plain values become literal formulas. Also detect raw formula
expressions (get(), concat(), etc.) written inline.
Add trigger validation: periodic triggers now require periodic_interval (with
automatic folding of flat fields), row triggers require rows_triggers_settings.
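A sketch of the `$formula:` convention: prefixed values and inline raw expressions go to the formula generator, everything else stays literal. The regex covers only an illustrative subset of function names, and `formula_description` is a hypothetical helper; the real RAW_FORMULA_RE and needs_formula() may differ.

```python
import re
from typing import Any

FORMULA_PREFIX = "$formula:"
# Illustrative subset of function names; the real RAW_FORMULA_RE may differ.
RAW_FORMULA_RE = re.compile(r"\b(get|concat)\s*\(")


def needs_formula(value: Any) -> bool:
    """True if the value should go through the formula generator."""
    if not isinstance(value, str):
        return False
    return value.startswith(FORMULA_PREFIX) or bool(RAW_FORMULA_RE.search(value))


def formula_description(value: str) -> str:
    """Strip the $formula: prefix, leaving the natural-language description."""
    if value.startswith(FORMULA_PREFIX):
        return value[len(FORMULA_PREFIX):].strip()
    return value
```

The `\b` anchor keeps words such as "together" from being mistaken for a `get(` call.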
* build: bump pydantic-ai-slim to 0.1.66 and anthropic to 0.84.0
* feat(assistant): add RetryingModel for transient provider error recovery
Wraps pydantic-ai model instances to automatically retry on transient
errors (rate limits, timeouts, server errors) with exponential backoff.
Handles both streaming and non-streaming calls.
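A sketch of the non-streaming half of the retry logic: capped exponential backoff with jitter around an async call. `TransientProviderError` is a hypothetical stand-in for whatever rate-limit, timeout and 5xx errors the providers raise, and the delay defaults are assumptions.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TransientProviderError(Exception):
    """Hypothetical stand-in for rate-limit, timeout and server errors."""


async def call_with_retry(
    fn: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Await fn(), retrying transient failures with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return await fn()
        except TransientProviderError:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            # Base delay doubles per attempt (0.5s, 1s, 2s, ...) up to max_delay;
            # random jitter spreads out concurrent retries.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))
            attempt += 1
```

Streaming needs extra care that this sketch does not cover: retrying after some chunks were already forwarded would duplicate output.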
* feat(assistant): add AgentMode system with ModeAwareToolset and switch_mode
Introduce domain modes (DATABASE, APPLICATION, AUTOMATION, EXPLAIN) that
control which tools are visible to the agent. ModeAwareToolset filters
the combined toolset per-mode, registries generate per-mode manifests,
and switch_mode lets the agent transition between domains. Each mode
gets a cross-mode summary so the agent knows what other modes offer.
* refactor(assistant): integrate RetryingModel, event-based streaming, and JSON retry
Replace direct model usage with RetryingModel for resilience. Rewrite
streaming to use run_stream_events for proper text/reasoning/tool event
handling. Add JSON-tool-call-as-text detection with automatic retry.
Auto-detect starting mode from UI context. Update model_profiles with
max_tokens settings.
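One possible heuristic for "JSON tool call written as text": the final text parses as a JSON object whose name matches a known tool. The key names `name`/`tool` and the fence handling are assumptions; on a match, the agent could ask the model to retry (pydantic-ai provides `ModelRetry` for signalling retries from tools and validators).

```python
import json
from typing import Any

FENCE = "`" * 3  # a Markdown code fence


def looks_like_text_tool_call(text: str, tool_names: set[str]) -> bool:
    """Heuristic: did the model write a tool call as JSON text instead of calling it?"""
    candidate = text.strip()
    if candidate.startswith(FENCE):
        # Tolerate a fenced block, optionally tagged as json.
        candidate = candidate.strip("`")
        if candidate.startswith("json"):
            candidate = candidate[len("json"):]
    try:
        payload: Any = json.loads(candidate)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    # Assumed key names; a leaked tool call may use either spelling.
    name = payload.get("name", payload.get("tool"))
    return isinstance(name, str) and name in tool_names
```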
* refactor(assistant): improve shared formula utils and add formula language reference
Add RAW_FORMULA_RE for detecting raw formula expressions, needs_formula()
for $formula: prefix and raw formula detection, literal_or_placeholder()
for ORM value creation, and a shared formula language prompt. Improve
formula generator to track remaining unresolved fields across retries.
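Tracking unresolved fields across retries can be sketched as a shrinking set: each attempt only asks for the fields still missing. `generate_with_retries` and the `generate` callback signature are hypothetical.

```python
from typing import Callable, Optional


def generate_with_retries(
    fields: list[str],
    generate: Callable[[list[str]], dict[str, Optional[str]]],
    max_attempts: int = 3,
) -> tuple[dict[str, str], set[str]]:
    """Request formulas, re-asking only for fields that are still unresolved."""
    resolved: dict[str, str] = {}
    remaining = set(fields)
    for _ in range(max_attempts):
        if not remaining:
            break
        # Only the still-missing fields are sent on each attempt.
        for name, formula in generate(sorted(remaining)).items():
            if name in remaining and formula:
                resolved[name] = formula
        remaining.difference_update(resolved)
    return resolved, remaining
```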
* refactor(assistant): improve database tools with routing rules and type fixes
Add per-module routing rules via get_routing_rules(). Extract ToolInputError
to helpers. Fix field type validators, improve row model handling, and
refine view filter types. Update agents and prompts for better tool
guidance.
* refactor(assistant): improve automation tools with routing rules and formula handling
Add per-module routing rules for automation. Improve node type handling
with better formula context support. Refine automation agents and prompts.
* fix(assistant): improve telemetry span processor with real-time remapping
Enhance SpanProcessor with JSON arg parsing, real-time span remapping,
and improved trace output handling. Update tests to cover new behavior.
* test(assistant): update tests for mode system, retry logic, and type refactors
Add tests for switch_mode, mode-aware manifests, JSON retry logic.
Add test_assistant_automation_node_tools and test_assistant_database_field_tools.
Update existing database/automation/core tests for refactored types and
new tool signatures. Update eval_utils for new deps structure.
* fix: lint
* chore(frontend): ignore .claude dir in vite watcher and fix Nitro EMFILE
- Add .claude/ to vite server watch ignore list alongside node_modules
and .git to avoid unnecessary file watching in worktrees.
- Configure Nitro devStorage to use fs-lite driver to prevent chokidar
from watching the entire repo root, which causes EMFILE on macOS in
large monorepos.
* fix(tests): fix test settings and seat usage test isolation with xdist
- Ensure pytest always finds backend/pytest.ini by passing -c pytest.ini
explicitly, fixing DJANGO_SETTINGS_MODULE=dev when running from root
- Preserve existing TEST dict keys when setting MIGRATE in test settings
- Add transaction=True to seat usage tests to prevent data leaking from
TransactionTestCase tests running on the same xdist worker
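The TEST-dict fix amounts to updating one key instead of replacing the whole dict. A sketch, where `set_test_migrate` and applying it to every configured database are assumptions; `MIGRATE` and `NAME` are standard keys of Django's `DATABASES[...]["TEST"]` setting.

```python
from typing import Any


def set_test_migrate(databases: dict[str, Any], migrate: bool) -> None:
    """Set TEST["MIGRATE"] without discarding other TEST keys such as NAME."""
    for config in databases.values():
        # Overwriting with config["TEST"] = {"MIGRATE": migrate} would drop existing keys.
        config.setdefault("TEST", {})["MIGRATE"] = migrate
```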
* fix(assistant): fix field types, formula regex, validator guard, and consolidate evals
Fix multiple_select returning None instead of [], link_row description
typo, formula regex missing greater_than_or_equal/less_than_or_equal
variants, and guard against overwriting original validators on repeated
prepare_tools calls. Consolidate sample rows and navigation evals into
the database tables eval file and remove the meta tool-call history test.
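The validator guard keeps wrapping idempotent: the original is saved once, and every later call wraps that saved original rather than the previous wrapper. The `validator` attribute name and `wrap_validator` are hypothetical.

```python
from typing import Any, Callable


def wrap_validator(tool: Any, wrapper: Callable[[Callable], Callable]) -> None:
    """Wrap tool.validator, always starting from the first-seen original."""
    if not hasattr(tool, "_original_validator"):
        tool._original_validator = tool.validator
    tool.validator = wrapper(tool._original_validator)
```

Without the guard, each prepare-tools pass would stack another wrapper on top of the last one.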
* fix(assistant): improve tool return types, filter aliases, and eval infrastructure
- Return consistent dict types from all tools instead of plain strings
- Add operator aliases for view filters so LLMs can use natural names
- Fix boolean filter operator (is → equal)
- Remove reasoning format from UTILITY model profiles (pollutes structured output)
- Add ModelRetry for workflow creation and formula agent errors
- Add EvalChecklist for soft assertions with pass/fail scoring
- Add EVAL_RETRIES support for flake detection in eval tests
- Suppress loguru DEBUG noise during evals
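A soft-assertion checklist records every check, reports a pass ratio, and fails once at the end with all failed names. This is a sketch of the pattern only; the method names `check`, `score` and `assert_all` are hypothetical and may not match EvalChecklist.

```python
from dataclasses import dataclass, field


@dataclass
class EvalChecklist:
    """Collect named checks instead of stopping at the first failed assert."""

    results: list[tuple[str, bool]] = field(default_factory=list)

    def check(self, name: str, condition: bool) -> bool:
        self.results.append((name, bool(condition)))
        return bool(condition)

    @property
    def score(self) -> float:
        """Fraction of checks that passed (0.0 when nothing was checked)."""
        if not self.results:
            return 0.0
        return sum(ok for _, ok in self.results) / len(self.results)

    def assert_all(self) -> None:
        failed = [name for name, ok in self.results if not ok]
        assert not failed, f"{len(failed)} check(s) failed: {', '.join(failed)}"
```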
* refactor(assistant): extract table creation helpers and remove unused model profile
- Extract _create_empty_tables and _create_table_fields from create_tables
- Filter out duplicate primary field in field creation to avoid model mistakes
- Remove unused gpt-oss-20b model profile
- Always attempt sample rows regardless of field errors
* fix(assistant): strip <think> tags and unify streaming as reasoning chunks
Models like MiniMax-M2.5 emit <think>...</think> tags inline. Handle
ThinkingPart/ThinkingPartDelta events from pydantic-ai and extract
inline thinking from text parts as a fallback. Stream all content as
AiReasoningChunk during the agent run; the final answer is emitted
as AiMessageChunk by _emit_answer.
* fix(assistant): simplify streaming and add collapsible reasoning UI
Replace _accumulate_text/_extract_thinking with a single _get_content_delta
helper that forwards text/thinking deltas. Accumulate reasoning_so_far and
strip <think> tags before sending to frontend (which replaces content on
each chunk). Add collapsible reasoning bubble (max 250px with fade mask
and chevron toggle).
* fix(assistant): bridge legacy UDSPY_LM_* env vars to pydantic-ai config
* docs(assistant): improve ai-assistant.md and add AWS_REGION_NAME backward compat
- Add both Bedrock auth methods (boto3 creds + bearer token)
- Add section 6 with pydantic-ai model overview link and provider list
- Restructure migration table: unchanged / bridged / new variables
- Fix AWS_BEARER_TOKEN_BEDROCK incorrectly listed as removed
- Bridge AWS_REGION_NAME to AWS_DEFAULT_REGION in settings for backward compat
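The backward-compat bridge can be as small as copying a legacy variable when the new one is unset. `bridge_env` is hypothetical, and letting an explicitly set new variable win is an assumed precedence rule.

```python
from collections.abc import MutableMapping


def bridge_env(env: MutableMapping, old: str, new: str) -> None:
    """Copy a legacy variable to its new name; an explicitly set new name wins."""
    if old in env and new not in env:
        env[new] = env[old]
```

In settings this would be called as `bridge_env(os.environ, "AWS_REGION_NAME", "AWS_DEFAULT_REGION")`. The target names for the UDSPY_LM_* bridge are not spelled out here, so they are not shown.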
* minor doc/evals fixes
* fix(assistant): strip unclosed <think> tags during streaming
Models behind Groq emit <think> tags as text content rather than using
the native thinking protocol. During streaming, the closing </think>
tag may not have arrived yet, causing raw thinking content to leak to
the frontend. Also strip think tags from tool thought fields and reset
reasoning on tool results.
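A sketch of think-tag stripping on accumulated streamed text: closed blocks are removed, and an unclosed trailing `<think>` hides everything after it until `</think>` arrives. `strip_think` is a hypothetical name; partially received tags (for example a dangling `<thi`) are not handled.

```python
import re

_CLOSED_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
_OPEN_THINK_RE = re.compile(r"<think>.*\Z", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove <think> blocks, including a trailing block that is not closed yet."""
    text = _CLOSED_THINK_RE.sub("", text)
    # Mid-stream, an unclosed <think> means the rest is still reasoning.
    text = _OPEN_THINK_RE.sub("", text)
    return text.strip()
```

Because the frontend replaces content on each chunk, running this over the whole accumulated text (rather than per delta) is enough.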
* docs(assistant): use provider:model format and refresh eval docs
- Update all docs to use pydantic-ai provider:model format (colon separator)
- Fix mixed-up provider descriptions in configuration.md
- Refresh eval docs: replace assert_no_tool_errors with EvalChecklist pattern
- Add embeddings URL for local vs Docker in ai-assistant-evals.md
- Skip KB sync post_migrate signal during tests
- Fix temperature type in model_profiles.py
- Refactor justfile PYTHONPATH for test recipe
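pydantic-ai accepts model strings such as `"openai:gpt-4o"` directly. A hypothetical validation helper for configured values should split on the first colon only, because Bedrock model IDs can themselves contain colons (for example a `:0` version suffix).

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split "provider:model" on the first colon; reject values without both parts."""
    provider, sep, name = model.partition(":")
    if not sep or not provider or not name:
        raise ValueError(f"Expected 'provider:model', got {model!r}")
    return provider, name
```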
* fix(assistant): clean up navigation and improve tool types
- Remove unused WorkspaceNavigationRequestType
- Narrow exception catch in navigate tool to ObjectDoesNotExist
- Guard id field in CreateRowModel.from_django_orm
- Use id__in for batch filtering in ListTablesFilterArg
* Revert "chore(frontend): ignore .claude dir in vite watcher and fix Nitro EMFILE"
This reverts commit 4be1be1.
* fix: ai-assistant-test-plan.md tool smoke test prompt for list_builders
* fix: PostHog env var names in docs/testing/ai-assistant-test-plan.md
* fix: wrong doc reference
1 parent 7ff02b8, commit b65c7e7
97 files changed
Lines changed: 15602 additions & 7113 deletions
File tree
- backend
- src/baserow/config/settings
- changelog/entries/unreleased/refactor
- docs
- installation
- testing
- enterprise
- backend
- src/baserow_enterprise
- api/assistant
- assistant
- tools
- automation
- types
- core
- database
- types
- search_user_docs
- config/settings
- migrations
- tests/baserow_enterprise_tests
- assistant
- evals
- enterprise
- web-frontend/modules/baserow_enterprise
- assets/scss/components
- components/assistant