Skip to content

execute_sql JSON output repeats column names on every row, wastes ~50% of tokens #296

@ericseastrand-ln

Description

@ericseastrand-ln

Problem

execute_sql returns results as a JSON array of objects, repeating every column name on every row:

[
  {"event_id": "EVT-10001", "event_name": "Concert A", "venue": "Arena 1", "city": "NYC", "status": "Active"},
  {"event_id": "EVT-10002", "event_name": "Concert B", "venue": "Arena 2", "city": "Chicago", "status": "Active"},
  ...
]

Since the consumer is always an LLM (via MCP), this format is extremely wasteful. ~42% of the payload is repeated column names, and the format uses ~2x the characters compared to a markdown table.

Impact (real-world numbers)

In a real debugging session with 107 SQL queries against a Databricks warehouse:

  • 4 context compaction events triggered (at ~168K token limit each time)
  • ~64M cache_read tokens consumed across two sessions
  • Each compaction caused loss of earlier context, leading to redundant re-queries
  • Result payloads of 18K–46K characters each filled the context window rapidly

Comparison

For a 100-row × 10-column result set:

Format Size Savings vs JSON
JSON (current) 27,380 chars
Markdown table 13,860 chars 50%
TSV 11,576 chars 58%

Proposed Solution

Add an output_format parameter to execute_sql and execute_sql_multi with options:

  • "json" — current behavior (List[Dict] → JSON array of objects)
  • "markdown" — markdown table (column names in header only, ~50% smaller)

Default should be "markdown" since MCP tools are consumed by LLMs, which parse markdown tables natively. LLMs don't need machine-parseable JSON for tabular data — they need compact, readable tables.

The change is small: add a formatting step in the MCP server's tools/sql.py layer, keeping the core library (databricks-tools-core) unchanged.

Environment

  • Claude Code with Databricks MCP server
  • Model context window: ~200K tokens
  • Workload: iterative SQL exploration/debugging

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions