Problem
`execute_sql` returns results as a JSON array of objects, repeating every column name on every row:

```json
[
  {"event_id": "EVT-10001", "event_name": "Concert A", "venue": "Arena 1", "city": "NYC", "status": "Active"},
  {"event_id": "EVT-10002", "event_name": "Concert B", "venue": "Arena 2", "city": "Chicago", "status": "Active"},
  ...
]
```
Since the consumer is always an LLM (via MCP), this format is extremely wasteful. ~42% of the payload is repeated column names, and the format uses ~2x the characters compared to a markdown table.
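The overhead is easy to reproduce. A minimal sketch (sample rows modeled on the example above; exact percentages depend on key and value lengths):

```python
import json

# Generate 100 rows shaped like the example payload above.
rows = [
    {"event_id": f"EVT-{10000 + i}", "event_name": f"Concert {i}",
     "venue": f"Arena {i}", "city": "NYC", "status": "Active"}
    for i in range(100)
]

payload = json.dumps(rows)

# Every row repeats every key; only one copy per column carries information.
# +2 accounts for the quotes around each key.
key_chars = sum(len(k) + 2 for k in rows[0]) * len(rows)
overhead = key_chars / len(payload)
print(f"{overhead:.0%} of the payload is repeated column names")
```

For this sample the repeated keys come out to roughly the ~40% share cited above.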
Impact (real-world numbers)
In a real debugging session with 107 SQL queries against a Databricks warehouse:
- 4 context compaction events triggered (at ~168K token limit each time)
- ~64M cache_read tokens consumed across two sessions
- Each compaction caused loss of earlier context, leading to redundant re-queries
- Result payloads of 18K–46K characters each filled the context window rapidly
Comparison
For a 100-row × 10-column result set:
| Format | Size | Savings vs JSON |
|---|---|---|
| JSON (current) | 27,380 chars | — |
| Markdown table | 13,860 chars | 50% |
| TSV | 11,576 chars | 58% |
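The comparison can be sanity-checked with a quick sketch. Column names and values here are made up, so the absolute sizes differ from the table above, but the ordering (TSV < markdown < JSON) holds:

```python
import json

# Synthetic 100-row x 10-column result set.
cols = [f"column_{i}" for i in range(10)]
rows = [{c: f"value_{r}_{i}" for i, c in enumerate(cols)} for r in range(100)]

# JSON array of objects: keys repeated on every row.
as_json = json.dumps(rows)

# Markdown table: column names appear once, in the header row.
header = "| " + " | ".join(cols) + " |"
sep = "|" + "|".join("---" for _ in cols) + "|"
body = ["| " + " | ".join(str(row[c]) for c in cols) + " |" for row in rows]
as_markdown = "\n".join([header, sep, *body])

# TSV: header once, minimal delimiters.
as_tsv = "\n".join(
    ["\t".join(cols), *("\t".join(str(row[c]) for c in cols) for row in rows)]
)

for name, text in [("json", as_json), ("markdown", as_markdown), ("tsv", as_tsv)]:
    print(f"{name}: {len(text)} chars")
```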
Proposed Solution
Add an `output_format` parameter to `execute_sql` and `execute_sql_multi` with options:
- `"json"` — current behavior (List[Dict] → JSON array of objects)
- `"markdown"` — markdown table (column names in the header row only, ~50% smaller)

The default should be `"markdown"`, since MCP tools are consumed by LLMs, which parse markdown tables natively. LLMs don't need machine-parseable JSON for tabular data — they need compact, readable tables.
The change is small: add a formatting step in the MCP server's `tools/sql.py` layer, keeping the core library (`databricks-tools-core`) unchanged.
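A minimal sketch of what that formatting step could look like. The function name and signature are assumptions for illustration, not the actual `tools/sql.py` API:

```python
import json
from typing import Any


def format_results(rows: list[dict[str, Any]], output_format: str = "markdown") -> str:
    """Hypothetical formatting step for the MCP tool layer.

    "json" keeps the current behavior; "markdown" emits each column
    name once, in the header row.
    """
    if output_format == "json":
        return json.dumps(rows)
    if not rows:
        return "(no rows)"
    cols = list(rows[0].keys())
    # A real implementation would also escape "|" and newlines in cell values.
    lines = [
        "| " + " | ".join(cols) + " |",
        "|" + "|".join("---" for _ in cols) + "|",
    ]
    lines += ["| " + " | ".join(str(r.get(c, "")) for c in cols) + " |" for r in rows]
    return "\n".join(lines)
```

Because the conversion happens at the tool boundary, the core library keeps returning `List[Dict]` and callers opt into the compact format per request.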
Environment
- Claude Code with Databricks MCP server
- Model context window: ~200K tokens
- Workload: iterative SQL exploration/debugging