Skip to content

feat: multi-backend fulltext search (Elastic, OpenSearch, Typesense)#10705

Open
dnplkndll wants to merge 5 commits intohcengineering:developfrom
ledoent:feat/multi-search-backend
Open

feat: multi-backend fulltext search (Elastic, OpenSearch, Typesense)#10705
dnplkndll wants to merge 5 commits intohcengineering:developfrom
ledoent:feat/multi-search-backend

Conversation

@dnplkndll
Copy link
Copy Markdown
Contributor

Summary

Add configurable fulltext search backend with three adapters and self-healing for non-persistent backends.

New packages

  • @hcengineering/opensearch — Drop-in OpenSearch 2.x replacement for ES 7.x (Apache 2.0 licensed, same wire protocol)
  • @hcengineering/typesense — Lightweight C++ search engine (~3-5x less RAM, no JVM)

Fulltext pod changes

  • FULLTEXT_BACKEND env var selects adapter at runtime: elastic (default) | opensearch | typesense
  • Auto-reindex on empty backend — non-persistent backends (e.g. Typesense with emptyDir) self-heal after pod restart. On startup, detects 0 docs → enumerates active workspaces → queues fullReindex for each. Elastic/OpenSearch skip this check (persistent storage).
  • Optional getDocCount() method on FullTextAdapter interface enables the detection.

Build fix

  • show_version.js — reads version.txt before git tags. Previously fell back to hardcoded 0.6.0 when no git tags exist (forks), causing model version mismatch that blocked the indexer entirely.

Search Comparison Results

Tested with 565 docs, 7 query types across all 3 backends:

Query ES↔OpenSearch ES↔Typesense
Prefix "set" 5/5 identical 7/7 same docs, minor rank diff
Fuzzy "projecct" (typo) 4/5 overlap 3/5 (top 2 identical)
Multi-word "deployment setup" 1/1 identical 1/1 identical
Class filter "project" 5/5 identical 5/5 same docs

OpenSearch is a perfect drop-in (identical results). Typesense finds the same relevant documents with minor ranking differences (different scoring model).

Resource Comparison (production verified)

ES 7.14 Typesense Savings
Search engine RAM 1353 Mi 84 Mi 94%
Disk 10Gi PVC emptyDir 100%
Self-healing on restart N/A Auto-reindex (~30s)

Commits

  1. feat: add OpenSearch and Typesense fulltext adapters — new @hcengineering/opensearch and @hcengineering/typesense packages implementing FullTextAdapter
  2. feat: configurable fulltext backend via FULLTEXT_BACKEND env var — backend selection in fulltext pod
  3. fix: read version.txt before git tags in show_version.js — fixes model version resolution in forks without git tags
  4. feat: tune Typesense adapter for better search parity with ES — field weights, typo tolerance, sanitizeDoc() for dynamic fields
  5. feat: auto-reindex on empty Typesense backend startupgetDocCount() on adapter + checkAndTriggerReindex() in manager

Backward Compatibility

  • Default backend remains elastic — zero changes for existing deployments
  • FullTextAdapter interface change is additive only (getDocCount is optional)
  • No Helm chart changes required (env vars are optional)

Related

Test plan

  • rush build passes
  • CI: build, formatting, test, docker-build, svelte-check, all uitests pass
  • Fulltext pod starts correctly with each backend
  • Search comparison: 7 queries across all 3 backends
  • Self-healing: Typesense pod kill → auto-reindex → 565 docs restored in ~30s
  • FULLTEXT_BACKEND=invalid exits with clear error message

🤖 Generated with Claude Code

@huly-github-staging
Copy link
Copy Markdown

Connected to Huly®: UBERF-16138

dnplkndll and others added 5 commits March 29, 2026 20:05
Add two new search backend adapters implementing FullTextAdapter:

- @hcengineering/opensearch: Drop-in replacement using OpenSearch 2.x
  client. Removes deprecated type: _doc params, uses osErr error types.
  Fixes inherited bugs: updateMany error ID matching, remove() array
  mutation.

- @hcengineering/typesense: Native Typesense adapter with proper query
  translation. Uses filter_by for workspace/class scoping, optional
  filter clauses for soft boosts (matching ES should semantics),
  read-modify-write for updateByQuery, and batch import for updateMany.
  Binary doc.data is skipped (relies on Rekoni extraction).

Both adapters follow the same factory pattern as the existing elastic
adapter and can be selected at runtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Don Kendall <kendall@donkendall.com>
Wire up the new OpenSearch and Typesense adapters in the fulltext pod.
The FULLTEXT_BACKEND env var selects the adapter at startup:
  - "elastic" (default): existing Elasticsearch 7.x
  - "opensearch": OpenSearch 2.x (API-compatible drop-in)
  - "typesense": Typesense (lightweight C++ engine)

Register both new packages in rush.json and add them as fulltext pod
dependencies. ELASTIC_INDEX_NAME now defaults to 'huly_storage_index'
instead of requiring it for all backends.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Don Kendall <kendall@donkendall.com>
The script only read version.txt when git describe succeeded, falling
back to hardcoded "0.6.0" when no tags exist (e.g. in forks). This
caused the model version to be 0.6.0 instead of the value in
version.txt, making the fulltext indexer skip all workspaces.

Now reads version.txt first, falls back to git tags, then "0.6.0".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Don Kendall <kendall@donkendall.com>
- Increase query_by_weights to 100,100,1 (matching ES boost ratios)
- Enable num_typos: 2,2,1 for broader fuzzy matching
- Add typo_tokens_threshold: 1 and drop_tokens_threshold: 1
- Use text_match_info.best_field_score for more granular scoring
- Add sanitizeDoc() to coerce dynamic *_fields to string arrays
  (fixes 5 import errors from searchIcon_fields type mismatch)

Search comparison results after tuning:
- ES↔OpenSearch: 100% identical on all queries
- ES↔Typesense: same top results, 5/5 overlap on class filter,
  3/5 on prefix/fuzzy (different ranking, same docs found)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Don Kendall <kendall@donkendall.com>
When the fulltext pod starts and the Typesense collection has 0
documents (e.g. after pod restart with emptyDir), automatically
enumerate all active workspaces and queue a fullReindex event for
each. This makes Typesense self-healing without requiring a PVC.

- Add optional getDocCount() to FullTextAdapter interface
- Implement getDocCount() in TypesenseAdapter via collections().retrieve()
- Add checkAndTriggerReindex() to WorkspaceManager.startIndexer()
- Elastic/OpenSearch backends skip the check (no getDocCount method)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Don Kendall <kendall@donkendall.com>
@dnplkndll dnplkndll force-pushed the feat/multi-search-backend branch from 5b6f894 to 2cf6709 Compare March 30, 2026 00:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant