TblSqlSource.execute_query() returns lazy tbl objects that fail ellmer serialization with Spark/Databricks connections #198

@asb2111

Description

When using querychat with a Databricks (Spark SQL) connection via dbplyr::tbl(), the execute_query() method in TblSqlSource returns a lazy tbl_sql object. When ellmer tries to serialize this tool result to JSON, it fails because jsonlite::toJSON() calls nrow() on the lazy table, which returns NA for remote connections.

Error Message

Error in tool_string() at ellmer/R/provider-aws.R:
! Could not convert tool result from a <tbl_sql> object to JSON.
ℹ If you are the tool author, update the tool to convert the result to a string or JSON.
Caused by error in if (!nrow(x)) ...:
! missing value where TRUE/FALSE needed

Reproduction

library(querychat)
library(ellmer)
library(DBI)
library(odbc)
library(dbplyr)

con <- dbConnect(odbc(), dsn = "Databricks_DSN")
tbl_obj <- tbl(con, in_catalog("catalog", "schema", "table_name"))

qc <- QueryChat$new(
  tbl_obj,
  client = chat_openai()  # or any ellmer chat client
)

When the LLM invokes the update_dashboard tool, the error occurs.

Root Cause

In TblSqlSource.R, execute_query() returns a lazy tbl:

execute_query = function(query) {
  sql_query <- self$prep_query(query)
  dplyr::tbl(private$conn, dplyr::sql(sql_query))  # Returns lazy tbl
}

This works for the Shiny dashboard (which calls collect() before rendering), but fails when ellmer tries to serialize the tool result for the LLM.
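The failure mode can be demonstrated directly. A minimal sketch, assuming an open Databricks connection `con`:

```r
lazy <- dplyr::tbl(con, dplyr::sql("SELECT 1 AS x"))

nrow(lazy)
# NA: the row count of a lazy tbl_sql is unknown until the remote query runs,
# so jsonlite's internal check `if (!nrow(x))` fails with
# "missing value where TRUE/FALSE needed"

nrow(dplyr::collect(lazy))
# a real integer; the collected tibble serializes to JSON without error
```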

Note: test_query() works correctly because it delegates to DBISource which returns collected data.

Attempted Workarounds

  1. Using DBISource directly - Passing the DBI connection + table name doesn't work with Databricks three-part naming (catalog.schema.table) due to table name validation.
  2. Custom DataSource subclass - Creating a subclass that overrides execute_query() to collect results hits the same table name validation issue.
  3. Registering S3 methods for jsonlite::toJSON - The method dispatch doesn't seem to pick up custom methods for tbl_sql classes.
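For reference, attempt 2 looked roughly like the sketch below. The `TblSqlSource` inheritance and the `prep_query()`/`private$conn` internals are assumed from querychat's source (the class may not be exported), and this still hits the same table name validation:

```r
CollectingTblSource <- R6::R6Class(
  "CollectingTblSource",
  inherit = TblSqlSource,  # assumed accessible; may require querychat:::TblSqlSource
  public = list(
    execute_query = function(query) {
      sql_query <- self$prep_query(query)
      # Collect eagerly so the result is a plain tibble, not a lazy tbl_sql
      dplyr::collect(dplyr::tbl(private$conn, dplyr::sql(sql_query)))
    }
  )
)
```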

Suggested Fix

One of:

  1. Collect in execute_query() for TblSqlSource:

     execute_query = function(query) {
       sql_query <- self$prep_query(query)
       dplyr::collect(dplyr::tbl(private$conn, dplyr::sql(sql_query)))
     }
  2. Add a parameter to control whether results are collected (for users who need lazy evaluation).
  3. Handle serialization in querychat_tool_result() by collecting lazy tbls before returning.
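Option 3 could look something like the sketch below; the exact signature of querychat_tool_result() is assumed, not taken from the package:

```r
querychat_tool_result <- function(result, ...) {
  # Lazy remote tables (tbl_sql inherits tbl_lazy) cannot be serialized
  # directly: nrow() returns NA for them. Materialize first.
  if (inherits(result, "tbl_lazy")) {
    result <- dplyr::collect(result)
  }
  result  # then serialize for ellmer as before
}
```

This keeps the dashboard's lazy-evaluation path untouched and only collects at the tool-result boundary, where a concrete tibble is required anyway.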

Environment

  • querychat: (version)
  • ellmer: (version)
  • R: (version)
  • Database: Databricks (Spark SQL) via odbc
  • OS: Linux
