Add Highspot connector + fix slack-bot create button silent submit #41
Merged
rajivml merged 3 commits into feature/darwin on May 6, 2026
Indexes Spots and the Items inside them via Highspot's REST API.
Authenticates with HTTP Basic (key+secret) generated from the
Highspot admin console; an optional base URL covers tenants on
non-default Highspot regions.
Per-item content extraction is tiered:
1. WebLink items -> headless-Chromium scrape via Playwright,
reusing one shared browser/context for the whole poll_source
run (mirrors connectors/web/connector.py — spawning Chromium
per item starves worker FDs/RAM and was making co-running
Slack indexing fail with IncompleteRead).
2. Items with a downloadable, supported extension
(.pdf .docx .pptx .xlsx .eml .epub .html .txt) ->
extract_file_text over the bytes from items/{id}/content.
3. Else / on any error -> title + description fallback.
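The tiering above can be sketched as a single function. This is an illustration under stated assumptions, not the connector's code: the item keys (content_type, url, content_name) and the injected scrape / download / extract callables are hypothetical, and treating an empty tier-2 result as a fallback case is an assumption as well.

```python
from typing import Callable

# Extensions the commit message lists as supported by extract_file_text.
SUPPORTED_EXTENSIONS = (
    ".pdf", ".docx", ".pptx", ".xlsx", ".eml", ".epub", ".html", ".txt",
)


def fallback_text(item: dict) -> str:
    """Tier 3: title + description."""
    title = item.get("title") or ""
    description = item.get("description") or ""
    return f"{title}\n{description}".strip()


def extract_item_text(
    item: dict,
    scrape: Callable[[str], str],          # hypothetical: URL -> page text
    download: Callable[[str], bytes],      # hypothetical: item id -> file bytes
    extract: Callable[[str, bytes], str],  # (file_name, file) order, as in this fork
) -> str:
    """Return section text from the first tier that yields non-empty content."""
    text = ""
    try:
        if item.get("content_type") == "WebLink" and item.get("url"):
            text = scrape(item["url"])  # tier 1
        else:
            name = (item.get("content_name") or "").lower()
            if name.endswith(SUPPORTED_EXTENSIONS):
                text = extract(name, download(item["id"]))  # tier 2
    except Exception:
        text = ""  # any error drops through to tier 3
    return text.strip() or fallback_text(item)
```

Passing the three callables in keeps the tier logic testable without Playwright or network access.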
Notable adaptations vs upstream Onyx:
- Drops the Slim/perm-sync interface; this fork has no
SlimConnectorWithPermSync / SlimDocument / TextSection /
OnyxFileExtensions / IndexingHeartbeatInterface in the
upstream shape.
- Uses Section instead of TextSection.
- extract_file_text arg order is (file_name, file, ...) here;
upstream is (file, file_name, ...).
- Parses ISO date_updated to datetime before assignment because
Document.doc_updated_at is typed datetime | None.
- Scroll loop bounds: max_attempts=10 (down from 20), per-scroll
networkidle timeout=5s (down from 60s) — caps single-WebLink
worst case at ~110s, vs the upstream 20-min stall.
- _YIELD_BATCH_SIZE=4 so the indexer's docs_indexed counter
ticks more frequently; API pagination still uses
INDEX_BATCH_SIZE.
Frontend:
- HighspotConfig + HighspotCredentialJson in lib/types.ts.
- HighspotIcon (placeholder Highspot.png — replace with the
real logo before merge).
- Tile in lib/sources.ts (AppConnection category).
- Admin page at /admin/connectors/highspot mirrors the
sf-account/page.tsx template; Spot selection is a live
multi-select dropdown driven by GET /manage/admin/connector/
highspot/spots/{credential_id} that calls the Highspot API
using the saved credential and renders the actual Spot list.
Selecting >=1 Spot is mandatory.
Process bounce after merge + deploy: dapi + dbe + dsl
(DocumentSource enum addition footgun per CLAUDE.md).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Yup schema unconditionally required curated_response_config.response_message, but the matching text input is only rendered when enable_curated_response_integration is true. Default is false, so on a fresh /admin/bot/new the field was empty, validation failed silently, the Create button did nothing, and no error rendered because the errored field wasn't on screen.
Mirror the jira_config pattern: only require when the toggle is enabled.
Same fix as commit 4ed8bcb on feature/multilanguage-support; applied here independently because feature/highspot was branched off feature/darwin before that PR landed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's pre-commit prettier (v3.1.0) defaults to trailingComma:"all" and flags the missing comma after the last generic param. Local npm prettier 2.8.8 defaults to "es5" and didn't catch it. Adding the comma to satisfy the canonical CI hook. Pre-existing issue surfaced by this PR's diff scope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sarath1018
approved these changes
May 6, 2026
swati354
approved these changes
May 6, 2026
Summary
Adds a new Highspot connector to Darwin and includes a small drive-by fix for the slack-bot config admin page (Create button silently doing nothing on a fresh form).
What was implemented
1. Highspot connector
Indexes Spots and the Items inside them via Highspot's REST API (https://api-su2.highspot.com/v1.0/). Auth: HTTP Basic with an API key + secret pair generated from the Highspot admin console; an optional highspot_url covers tenants on non-default Highspot regions.
For each Item, a Document is built whose section text comes from one of three tiers:
1. WebLink items → headless-Chromium scrape via Playwright; falls back to title + description if the scrape returns empty.
2. Items with a downloadable, supported extension (.pdf, .docx, .pptx, .xlsx, .eml, .epub, .html, .txt) → extract_file_text over the bytes returned by items/{id}/content.
3. Else / on any error → title + "\n" + description.
Backend:
- backend/danswer/connectors/highspot/{__init__,client,utils,connector}.py: new package.
- backend/danswer/configs/constants.py: DocumentSource.HIGHSPOT = "highspot".
- backend/danswer/connectors/factory.py: registered.
- backend/danswer/server/documents/connector.py: new GET /manage/admin/connector/highspot/spots/{credential_id} route that returns the live list of Spots visible to a saved credential (powers the multi-select on the admin page).
Frontend:
- web/src/lib/types.ts: "highspot" in ValidSources, HighspotConfig, HighspotCredentialJson.
- web/src/components/icons/icons.tsx: HighspotIcon (placeholder asset; see TODO below).
- web/src/lib/sources.ts: tile entry under SOURCE_METADATA_MAP.highspot.
- web/src/app/admin/connectors/highspot/page.tsx: full Step 1 (credentials) + Step 2 (multi-select Spots + create connector) admin page. Selecting at least one Spot is mandatory; the dropdown is populated live from the new backend route using the saved credential.
- web/public/Highspot.png: placeholder icon (see TODO).
Notable adaptations vs upstream Onyx
This fork is ~2 years behind upstream and lacks the perm-sync / slim-doc / OnyxFileExtensions / IndexingHeartbeatInterface rewrite that upstream's connector depends on. Adjustments:
- Drops the Slim/perm-sync interface; the connector implements LoadConnector + PollConnector.
- Replaces TextSection with this fork's Section.
- Replaces OnyxFileExtensions.TEXT_AND_DOCUMENT_EXTENSIONS with an inline tuple matching this fork's extract_file_text dispatch.
- extract_file_text argument order is (file_name, file, ...) here vs upstream's (file, file_name, ...).
- Document.doc_updated_at is datetime | None here (upstream is str); ISO strings are parsed before assignment.
Lifecycle / perf adaptations to coexist with other connectors
The naive upstream Highspot connector spawns a fresh Chromium process per WebLink item. We observed this starving the worker's FDs / RAM and causing co-running connectors (specifically Slack's conversations.list) to fail with IncompleteRead mid-response. Fixed:
- One shared browser/context for the whole poll_source run, mirroring connectors/web/connector.py's pattern: one playwright.start() + chromium.launch() for the entire run, context.new_page() per WebLink, page.close() after each, full teardown in a try/finally at end-of-run (or on error).
- Bounded scroll loop: WEB_CONNECTOR_MAX_SCROLL_ATTEMPTS = 10 (down from upstream's 20) and per-scroll wait_for_load_state("networkidle", timeout=5000) (down from 60000). Caps single-WebLink worst case at ~110s vs the upstream ~20-minute stall on pages where networkidle never settles.
- _YIELD_BATCH_SIZE = 4, decoupled from INDEX_BATCH_SIZE so the indexer's docs_indexed counter ticks up more often. Per-item processing in this connector is slow enough (Playwright + extract_file_text) that yielding every 16 items can mean minutes between UI counter updates.
2. Drive-by fix: slack-bot config Create button
web/src/app/admin/bot/SlackBotConfigCreationForm.tsx: gate curated_response_config.response_message validation behind the enable_curated_response_integration toggle. Without the gate, the schema unconditionally required the field, but the input is only rendered when the toggle is on (default off). On a fresh /admin/bot/new, validation therefore failed silently, the Create button did nothing, and no error was visible because the errored field wasn't on screen. Mirrors the existing jira_config .when() pattern.
(Same fix as commit 4ed8bcbd on feature/multilanguage-support; applied here independently because feature/highspot was branched off feature/darwin before that PR landed.)
What was tested
Pre-commit / quality checks (per .pre-commit-config.yaml):
- black --check on every Python file touched
- reorder_python_imports --py311-plus
- ruff (clean)
- prettier --check on every TS/TSX touched
- tsc --noEmit (clean, exit 0)
- Import check (from danswer.connectors.highspot.connector import HighspotConnector succeeds in the venv)
Manual verification path:
- connector.py's if __name__ == "__main__": block.
- /admin/connectors/highspot → enter key + secret in Step 1 → save credential.
- docs_indexed ticks up every ~few items (the smaller yield-batch effect).
What's NOT in this PR (follow-ups, if needed)
- web/public/Highspot.png: currently a placeholder copy of HubSpot.png. Swap it for the real Highspot logo before merge.
- client.get_item(item_id) is called sequentially for every item, even when the item is only going to be rejected by the time window. Parallelizing this with a ThreadPoolExecutor(5-10) could raise indexing throughput 2-5×; deferred until we see real-world Spot sizes.
- WEB_CONNECTOR_MAX_SCROLL_ATTEMPTS=10 is a count cap; a wall-clock cap (e.g. 30s/page) would handle the long-tail pages better. Also deferred.
Process bounce required after deploy
Per CLAUDE.md's footgun list: bounce dapi + dbe + dsl after merge + deploy. Adding a DocumentSource enum value triggers pydantic ValidationError: source_type on the slackbot otherwise.
Test plan
- Replace web/public/Highspot.png with the real Highspot logo
- Open /admin/connectors/highspot, enter creds, verify the Spot multi-select populates from the live API
- docs_indexed should tick up every ~few items
- /admin/bot/new Create button: should now either submit successfully or display backend validation errors (no more silent "nothing happens")
🤖 Generated with Claude Code