Skip to content

Add ES-compatible endpoints for Trino connector support#6168

Open
congx4 wants to merge 7 commits intomainfrom
guilload/trino-connector
Open

Add ES-compatible endpoints for Trino connector support#6168
congx4 wants to merge 7 commits intomainfrom
guilload/trino-connector

Conversation

@congx4
Copy link
Contributor

@congx4 congx4 commented Feb 24, 2026

Description

Adds the Elasticsearch-compatible API endpoints and query DSL extensions required for Trino's Elasticsearch connector to work with Quickwit.

The Trino ES connector (using the ES Java High Level REST Client 7.17) requires several endpoints and query behaviors that Quickwit did not previously support. This PR addresses those gaps.

New ES-compatible endpoints

Endpoint Purpose
GET /_nodes/http Node discovery. Returns node ID, data roles, and publish address. Required by Trino to build its node map and assign shards.
GET /{index}/_search_shards Shard topology. Returns a single shard (shard 0) since Quickwit handles parallelism internally. Used by Trino's ElasticsearchSplitManager to create splits.
GET /{index}/_mapping(s) Index schema. Translates Quickwit's doc_mapping into ES-format {"properties": {"field": {"type": "..."}}}. Supports wildcard and comma-separated index patterns. Also merges dynamically-indexed fields via root_list_fields.
GET /_aliases Alias listing for table discovery. Returns {} (Quickwit doesn't use index aliases). Required by Trino's ElasticsearchMetadata.listTables().
DELETE /_search/scroll Scroll context cleanup. No-op returning {"succeeded": true} since Quickwit manages scroll lifetime via TTL. Without this, Trino gets a 405 error.

Query DSL extensions

Change Why
Accept adjust_pure_negative in bool queries The ES Java BoolQueryBuilder always emits this internal field. Without accepting it, #[serde(deny_unknown_fields)] rejects every WHERE clause pushed down by Trino, returning 400.
Accept from/to/include_lower/include_upper in range queries The ES Java RangeQueryBuilder.doXContent() serializes using these legacy fields instead of gt/gte/lt/lte. Conversion logic maps them to the standard bounds. Without this, all range predicates (WHERE score > 50) fail with 400.
Accept search_type query parameter Trino sends search_type=query_then_fetch on every _search request. Without accepting it, the request fails with 400.

Other changes

Change Why
X-Elastic-Product: Elasticsearch response header Required by the ES Java client 7.14+. The client validates this header on every response and throws TransportException if missing.
Cluster info returns ES 7.17.0 version metadata The ES Java client checks version.number for compatibility. Returning 7.17.0 with proper lucene_version, minimum_wire_compatibility_version, etc. prevents client-side version check failures.
Relax _cat/indices s parameter validation Trino sends s=index:asc for sorted index listing. Previously rejected as unsupported.
Rename check_all_index_metadata_found to ensure_all_indexes_found Minor refactor for clarity, consistent naming.

Trino SQL queries verified working

  • SHOW SCHEMAS, SHOW TABLES, SHOW COLUMNS, DESCRIBE
  • SELECT with filtering (WHERE), aggregation (COUNT, SUM, AVG, MIN, MAX), GROUP BY, ORDER BY, DISTINCT, LIMIT
  • Range predicates (>, >=, <, <=, BETWEEN)
  • String functions (UPPER, LOWER, LENGTH, CONCAT, SUBSTR, TRIM)
  • Conditional expressions (CASE WHEN, COALESCE, NULLIF)
  • Subqueries, JOIN, UNION, INTERSECT, EXCEPT
  • EXPLAIN query plans
  • INFORMATION_SCHEMA queries

Known limitations (not addressed in this PR)

Feature Reason
_id hidden column Quickwit uses positional document addressing (split_id:segment_ord:doc_id) rather than stable per-document IDs. Returning these internal addresses would be misleading. Trino queries requesting _id will fail with "string is null".
Full-text query via table name Requires search_settings.default_search_fields to be configured in the index. Quickwit does not fall back to searching all fields when no default is specified.
getTableStatistics() for join optimization The DD Trino ES connector does not implement table statistics. Trino's cost-based optimizer cannot estimate table sizes, so joins use a fixed build/probe side assignment.

How was this PR tested?

Tested locally with a full Trino + Caddy + Quickwit stack:

  • Quickwit running on localhost:7280 with stackoverflow (15 docs, 9 fields) and users (6 docs, 5 fields) indices
  • Caddy reverse proxy on localhost:9200 rewriting /_elastic prefix
  • Trino 479 with the DD ES connector configured against localhost:9200
  • 40+ SQL query types executed via Trino CLI and verified against expected results
  • Unit tests added for adjust_pure_negative deserialization and from/to range query conversion (inclusive, exclusive, and default behaviors)
  • Existing unit tests updated for new struct fields

guilload and others added 6 commits February 24, 2026 10:58
- Accept `adjust_pure_negative` field in bool queries (emitted by ES Java
  BoolQueryBuilder, blocks all WHERE predicate pushdown without this fix)
- Accept legacy `from`/`to`/`include_lower`/`include_upper` fields in range
  queries (used by ES Java RangeQueryBuilder instead of standard gt/gte/lt/lte)

Co-authored-by: Cursor <cursoragent@cursor.com>
The _mapping(s) endpoint now resolves wildcard (`*`) and comma-separated
index patterns against the metastore, matching the behavior of real
Elasticsearch. This enables Trino's wildcard table feature (e.g.
`SELECT * FROM "stack*"`), which calls `GET /stack*/_mappings` to
discover schemas across matching indices.

Co-authored-by: Cursor <cursoragent@cursor.com>
Move the inline warp path filters for _aliases and _mappings handlers
into dedicated functions in filter.rs, consistent with all other ES
compat endpoints. Revert the --debug CLI flag added to run_tests.py
as per-step debug: true in YAML files is sufficient.

Co-authored-by: Cursor <cursoragent@cursor.com>
@congx4 congx4 changed the title Guilload/trino connector Add ES-compatible endpoints for Trino connector support Feb 25, 2026
Collapse nested if-let + if into if-let chains in range query
conversion logic. Fix line length and import ordering for rustfmt.

Co-authored-by: Cursor <cursoragent@cursor.com>
/// by default. Accepted here for compatibility (e.g. Trino's ES connector)
/// but not used by Quickwit.
#[serde(default)]
adjust_pure_negative: Option<bool>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed for all WHERE predicate pushdown.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: any way to rename to _adjust_pure_negative and have a serde alias adjust_pure_negative to convey that this is unused. Then we can add Option<serde::de::IgnoredAny> on top of that. There's an example of that in quickwit-config:

#[serde(alias = "max_num_concurrent_split_streams", default, skip_serializing)]
pub _max_num_concurrent_split_streams: Option<serde::de::IgnoredAny>,

include_lower: Option<bool>,
/// When `true` (default), `to` maps to `lte`; when `false`, to `lt`.
#[serde(default)]
include_upper: Option<bool>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These legacy fields are needed as well to unblock the query like where score > 50

mut metastore: MetastoreServiceClient,
search_service: Arc<dyn SearchService>,
) -> Result<ElasticsearchMappingsResponse, ElasticsearchError> {
let indexes_metadata = if index_id.contains('*') || index_id.contains(',') {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

detect wildcard (*) and comma-separated (,) patterns in the index to unblock Trino's wildcard table feature (e.g., SELECT * FROM "stack*") calls GET /stack*/_mappings directly.

@@ -0,0 +1,100 @@
method: [GET]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a placeholder, I will update the rest-api-test in the following pr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sg

@congx4 congx4 marked this pull request as ready for review February 25, 2026 19:10
Copy link
Member

@guilload guilload left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a few more unit tests, but LGTM.

/// by default. Accepted here for compatibility (e.g. Trino's ES connector)
/// but not used by Quickwit.
#[serde(default)]
adjust_pure_negative: Option<bool>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: any way to rename to _adjust_pure_negative and have a serde alias adjust_pure_negative to convey that this is unused. Then we can add Option<serde::de::IgnoredAny> on top of that. There's an example of that in quickwit-config:

#[serde(alias = "max_num_concurrent_split_streams", default, skip_serializing)]
pub _max_num_concurrent_split_streams: Option<serde::de::IgnoredAny>,

let bool_query: BoolQuery = serde_json::from_str(bool_query_json).unwrap();
assert_eq!(bool_query.adjust_pure_negative, Some(true));
assert_eq!(bool_query.must.len(), 1);
let _ast = bool_query.convert_to_query_ast().unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let _ast = bool_query.convert_to_query_ast().unwrap();
bool_query.convert_to_query_ast().unwrap();

));
}

fn json_number(n: u64) -> JsonLiteral {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn json_number(n: u64) -> JsonLiteral {
fn into_json_number(n: u64) -> JsonLiteral {

into_* -> takes owership and converts into
as_* -> converts into using a reference

}
if sort_by[0] != "index" && sort_by[0] != "index:asc" {
return Err(unsupported_parameter_error("s"));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a unit test for this function?

}

impl ElasticsearchMappingsResponse {
pub fn from_doc_mapping(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a more comprehensive unit test.

config.node_id.as_str(): {
"roles": ["data", "ingest"],
"http": {
"publish_address": config.rest_config.listen_addr.to_string()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to use advertise address.

.iter()
.map(|m| m.index_id().to_string())
.collect();
let list_fields_request = quickwit_proto::search::ListFieldsRequest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can worry about this later, but this is not going to scale. Let's keep this in mind.

cc @trinity-1686a @fulmicoton-dd for visibility:
Trino needs to know the schema of the data before ahead of time. All queries start with a list fields query.

@@ -0,0 +1,100 @@
method: [GET]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants