Add ES-compatible endpoints for Trino connector support #6168
- Accept `adjust_pure_negative` field in bool queries (emitted by ES Java BoolQueryBuilder, blocks all WHERE predicate pushdown without this fix)
- Accept legacy `from`/`to`/`include_lower`/`include_upper` fields in range queries (used by ES Java RangeQueryBuilder instead of standard gt/gte/lt/lte)
Co-authored-by: Cursor <cursoragent@cursor.com>

The _mapping(s) endpoint now resolves wildcard (`*`) and comma-separated index patterns against the metastore, matching the behavior of real Elasticsearch. This enables Trino's wildcard table feature (e.g. `SELECT * FROM "stack*"`), which calls `GET /stack*/_mappings` to discover schemas across matching indices.
Co-authored-by: Cursor <cursoragent@cursor.com>

Move the inline warp path filters for _aliases and _mappings handlers into dedicated functions in filter.rs, consistent with all other ES compat endpoints. Revert the --debug CLI flag added to run_tests.py as per-step debug: true in YAML files is sufficient.
Co-authored-by: Cursor <cursoragent@cursor.com>

Collapse nested if-let + if into if-let chains in range query conversion logic. Fix line length and import ordering for rustfmt.
Co-authored-by: Cursor <cursoragent@cursor.com>
    /// by default. Accepted here for compatibility (e.g. Trino's ES connector)
    /// but not used by Quickwit.
    #[serde(default)]
    adjust_pure_negative: Option<bool>,
This is needed for all WHERE predicate pushdown.
nit: any way to rename this to `_adjust_pure_negative` and have a serde alias `adjust_pure_negative` to convey that it is unused? Then we can add `Option<serde::de::IgnoredAny>` on top of that. There's an example of that in quickwit-config:
    #[serde(alias = "max_num_concurrent_split_streams", default, skip_serializing)]
    pub _max_num_concurrent_split_streams: Option<serde::de::IgnoredAny>,

    include_lower: Option<bool>,
    /// When `true` (default), `to` maps to `lte`; when `false`, to `lt`.
    #[serde(default)]
    include_upper: Option<bool>,
These legacy fields are also needed to unblock queries like `WHERE score > 50`.
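For reference, a minimal sketch of the mapping described above, assuming `include_lower` mirrors the documented `include_upper` behavior (default `true`). `LegacyRangeParams` and `legacy_to_bounds` are hypothetical names, not the types used in this PR, and values are kept as plain strings for simplicity:

```rust
use std::ops::Bound;

/// Hypothetical stand-in for the legacy range fields emitted by the
/// ES Java `RangeQueryBuilder`. Not the struct used in the PR.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct LegacyRangeParams {
    pub from: Option<String>,
    pub to: Option<String>,
    pub include_lower: Option<bool>,
    pub include_upper: Option<bool>,
}

/// Maps the legacy fields to a (lower, upper) pair of bounds.
/// `from` behaves like `gte` unless `include_lower` is `false` (then `gt`).
/// `to` behaves like `lte` unless `include_upper` is `false` (then `lt`).
pub fn legacy_to_bounds(params: &LegacyRangeParams) -> (Bound<String>, Bound<String>) {
    // Assumption: a missing `include_*` flag defaults to inclusive.
    let lower = match &params.from {
        Some(value) if params.include_lower.unwrap_or(true) => Bound::Included(value.clone()),
        Some(value) => Bound::Excluded(value.clone()),
        None => Bound::Unbounded,
    };
    let upper = match &params.to {
        Some(value) if params.include_upper.unwrap_or(true) => Bound::Included(value.clone()),
        Some(value) => Bound::Excluded(value.clone()),
        None => Bound::Unbounded,
    };
    (lower, upper)
}
```

Under these assumptions, a request carrying `from: 50` with `include_lower: false` and no `to` resolves to an exclusive lower bound and an unbounded upper bound, which corresponds to `gt: 50`.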
        mut metastore: MetastoreServiceClient,
        search_service: Arc<dyn SearchService>,
    ) -> Result<ElasticsearchMappingsResponse, ElasticsearchError> {
        let indexes_metadata = if index_id.contains('*') || index_id.contains(',') {
Detects wildcard (`*`) and comma-separated (`,`) patterns in the index ID. This unblocks Trino's wildcard table feature (e.g., `SELECT * FROM "stack*"`), which calls `GET /stack*/_mappings` directly.
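A rough sketch of the detection and matching logic, assuming a plain in-memory list of index IDs in place of the metastore lookup. The helper names `is_index_pattern`, `wildcard_match`, and `resolve_index_patterns` are illustrative only; the PR resolves patterns through the metastore instead:

```rust
/// Returns true when the index ID needs pattern resolution,
/// mirroring the `contains('*') || contains(',')` check in the diff.
pub fn is_index_pattern(index_id: &str) -> bool {
    index_id.contains('*') || index_id.contains(',')
}

/// Matches `text` against a pattern in which `*` stands for any
/// (possibly empty) sequence of characters.
pub fn wildcard_match(pattern: &str, text: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        // No wildcard at all: require an exact match.
        return pattern == text;
    }
    let first = parts[0];
    let last = parts[parts.len() - 1];
    if !text.starts_with(first) {
        return false;
    }
    let mut rest = &text[first.len()..];
    if !rest.ends_with(last) {
        return false;
    }
    rest = &rest[..rest.len() - last.len()];
    // Every middle fragment must appear, in order, in what remains.
    for part in &parts[1..parts.len() - 1] {
        match rest.find(part) {
            Some(pos) => rest = &rest[pos + part.len()..],
            None => return false,
        }
    }
    true
}

/// Resolves a comma-separated list of patterns against known index IDs.
/// The result follows the order of `known_index_ids` without duplicates.
pub fn resolve_index_patterns<'a>(index_id: &str, known_index_ids: &[&'a str]) -> Vec<&'a str> {
    let patterns: Vec<&str> = index_id
        .split(',')
        .map(str::trim)
        .filter(|pattern| !pattern.is_empty())
        .collect();
    known_index_ids
        .iter()
        .copied()
        .filter(|id| patterns.iter().any(|pattern| wildcard_match(pattern, id)))
        .collect()
}
```

With indices `stackoverflow` and `users`, the pattern `stack*` would select only `stackoverflow`, while `users,stack*` would select both.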
    @@ -0,0 +1,100 @@
    method: [GET]
This is just a placeholder; I will update the rest-api-test in a follow-up PR.
guilload left a comment
Needs a few more unit tests, but LGTM.
    let bool_query: BoolQuery = serde_json::from_str(bool_query_json).unwrap();
    assert_eq!(bool_query.adjust_pure_negative, Some(true));
    assert_eq!(bool_query.must.len(), 1);
    let _ast = bool_query.convert_to_query_ast().unwrap();
Suggested change:
    - let _ast = bool_query.convert_to_query_ast().unwrap();
    + bool_query.convert_to_query_ast().unwrap();
        ));
    }

    fn json_number(n: u64) -> JsonLiteral {
Suggested change:
    - fn json_number(n: u64) -> JsonLiteral {
    + fn into_json_number(n: u64) -> JsonLiteral {
`into_*` -> takes ownership and converts
`as_*` -> converts using a reference
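A small illustration of that naming convention, using a hypothetical `IndexId` newtype that is not part of this PR:

```rust
/// Hypothetical newtype, used only to illustrate the naming convention.
pub struct IndexId(String);

impl IndexId {
    pub fn new(value: &str) -> Self {
        IndexId(value.to_string())
    }

    /// `as_*`: borrows `self` and returns a cheap view; the value stays usable.
    pub fn as_str(&self) -> &str {
        &self.0
    }

    /// `into_*`: consumes `self` and hands back the owned inner value.
    pub fn into_string(self) -> String {
        self.0
    }
}
```

After calling `into_string`, the original `IndexId` is moved and can no longer be used, whereas `as_str` can be called any number of times.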
    }
    if sort_by[0] != "index" && sort_by[0] != "index:asc" {
        return Err(unsupported_parameter_error("s"));
    }
Can we add a unit test for this function?
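One possible shape for such a test, sketched against a simplified stand-in. `validate_sort_param` and `UnsupportedParameter` are hypothetical; the real handler returns `unsupported_parameter_error("s")`, and the handling of empty or multi-column input here is an assumption rather than what the PR does:

```rust
/// Hypothetical stand-in for the handler's error type.
#[derive(Debug, PartialEq)]
pub struct UnsupportedParameter(pub &'static str);

/// Accepts only an ascending sort on the `index` column.
pub fn validate_sort_param(sort_by: &[&str]) -> Result<(), UnsupportedParameter> {
    // Assumption: sorting on several columns is rejected.
    if sort_by.len() > 1 {
        return Err(UnsupportedParameter("s"));
    }
    // Assumption: an absent `s` parameter means no sort was requested.
    if let Some(first) = sort_by.first() {
        if *first != "index" && *first != "index:asc" {
            return Err(UnsupportedParameter("s"));
        }
    }
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn accepts_ascending_index_sort() {
        assert_eq!(validate_sort_param(&["index"]), Ok(()));
        assert_eq!(validate_sort_param(&["index:asc"]), Ok(()));
    }

    #[test]
    fn rejects_other_sorts() {
        assert_eq!(validate_sort_param(&["index:desc"]), Err(UnsupportedParameter("s")));
        assert_eq!(validate_sort_param(&["docs.count"]), Err(UnsupportedParameter("s")));
    }
}
```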
    }

    impl ElasticsearchMappingsResponse {
        pub fn from_doc_mapping(
This needs a more comprehensive unit test.
    config.node_id.as_str(): {
        "roles": ["data", "ingest"],
        "http": {
            "publish_address": config.rest_config.listen_addr.to_string()
I think we want to use the advertise address here.
        .iter()
        .map(|m| m.index_id().to_string())
        .collect();
    let list_fields_request = quickwit_proto::search::ListFieldsRequest {
We can worry about this later, but this is not going to scale. Let's keep this in mind.
cc @trinity-1686a @fulmicoton-dd for visibility:
Trino needs to know the schema of the data ahead of time. All queries start with a list fields query.
Description
Adds the Elasticsearch-compatible API endpoints and query DSL extensions required for Trino's Elasticsearch connector to work with Quickwit.
The Trino ES connector (using the ES Java High Level REST Client 7.17) requires several endpoints and query behaviors that Quickwit did not previously support. This PR addresses those gaps.
New ES-compatible endpoints
- `GET /_nodes/http`
- `GET /{index}/_search_shards`: [...] `ElasticsearchSplitManager` to create splits.
- `GET /{index}/_mapping(s)`: Converts `doc_mapping` into ES-format `{"properties": {"field": {"type": "..."}}}`. Supports wildcard and comma-separated index patterns. Also merges dynamically-indexed fields via `root_list_fields`.
- `GET /_aliases`: Returns `{}` (Quickwit doesn't use index aliases). Required by Trino's `ElasticsearchMetadata.listTables()`.
- `DELETE /_search/scroll`: Returns `{"succeeded": true}` since Quickwit manages scroll lifetime via TTL. Without this, Trino gets a 405 error.

Query DSL extensions
- `adjust_pure_negative` in `bool` queries: The ES Java `BoolQueryBuilder` always emits this internal field. Without accepting it, `#[serde(deny_unknown_fields)]` rejects every `WHERE` clause pushed down by Trino, returning 400.
- `from`/`to`/`include_lower`/`include_upper` in `range` queries: The ES Java `RangeQueryBuilder.doXContent()` serializes using these legacy fields instead of `gt`/`gte`/`lt`/`lte`. Conversion logic maps them to the standard bounds. Without this, all range predicates (`WHERE score > 50`) fail with 400.
- `search_type` query parameter: [...] `search_type=query_then_fetch` on every `_search` request. Without accepting it, the request fails with 400.

Other changes
- `X-Elastic-Product: Elasticsearch` response header: [...] `TransportException` if missing.
- [...] `version.number` for compatibility. Returning `7.17.0` with proper `lucene_version`, `minimum_wire_compatibility_version`, etc. prevents client-side version check failures.
- `_cat/indices` `s` parameter validation: [...] `s=index:asc` for sorted index listing. Previously rejected as unsupported.
- Renamed `check_all_index_metadata_found` to `ensure_all_indexes_found`.

Trino SQL queries verified working
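- `SHOW SCHEMAS`, `SHOW TABLES`, `SHOW COLUMNS`, `DESCRIBE`
- `SELECT` with filtering (`WHERE`), aggregation (`COUNT`, `SUM`, `AVG`, `MIN`, `MAX`), `GROUP BY`, `ORDER BY`, `DISTINCT`, `LIMIT`
- Range predicates (`>`, `>=`, `<`, `<=`, `BETWEEN`)
- String functions (`UPPER`, `LOWER`, `LENGTH`, `CONCAT`, `SUBSTR`, `TRIM`)
- Conditional expressions (`CASE WHEN`, `COALESCE`, `NULLIF`)
- `JOIN`, `UNION`, `INTERSECT`, `EXCEPT`
- `EXPLAIN` query plans
- `INFORMATION_SCHEMA` queries

Known limitations (not addressed in this PR)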
- `_id` hidden column: [...] (`split_id:segment_ord:doc_id`) rather than stable per-document IDs. Returning these internal addresses would be misleading. Trino queries requesting `_id` will fail with "string is null".
- [...] `search_settings.default_search_fields` to be configured in the index. Quickwit does not fall back to searching all fields when no default is specified.
- [...] `getTableStatistics()` for join optimization

How was this PR tested?
Tested locally with a full Trino + Caddy + Quickwit stack:
- [...] `localhost:7280` with `stackoverflow` (15 docs, 9 fields) and `users` (6 docs, 5 fields) indices
- [...] `localhost:9200` rewriting `/_elastic` prefix
- [...] `localhost:9200`
- [...] `adjust_pure_negative` deserialization and `from`/`to` range query conversion (inclusive, exclusive, and default behaviors)