Add Highspot connector + fix slack-bot create button silent submit by rajivml · Pull Request #41 · UiPath/danswer

rajivml · 2026-05-06T09:39:46Z

Summary

Adds a new Highspot connector to Darwin and includes a small drive-by fix for the slack-bot config admin page (Create button silently doing nothing on a fresh form).

What was implemented

1. Highspot connector

Indexes Spots and the Items inside them via Highspot's REST API (https://api-su2.highspot.com/v1.0/). Auth: HTTP Basic with an API key + secret pair generated from the Highspot admin console; an optional highspot_url covers tenants on non-default Highspot regions.
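As a rough illustration of the auth scheme described above (not the code in client.py), the sketch below builds an HTTP Basic header from a key + secret pair and resolves a path against either the default base URL or an override. The helper names basic_auth_header and endpoint are hypothetical; only the default URL and the items/{id}/content path come from this PR.

```python
import base64
from typing import Dict, Optional
from urllib.parse import urljoin

# Default base URL quoted in the PR description.
DEFAULT_BASE_URL = "https://api-su2.highspot.com/v1.0/"


def basic_auth_header(key: str, secret: str) -> Dict[str, str]:
    """Build an HTTP Basic Authorization header from a key + secret pair."""
    token = base64.b64encode(f"{key}:{secret}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def endpoint(path: str, highspot_url: Optional[str] = None) -> str:
    """Resolve an API path against the override URL, else the default one."""
    base = highspot_url or DEFAULT_BASE_URL
    if not base.endswith("/"):
        base += "/"  # urljoin drops the last path segment without this
    return urljoin(base, path.lstrip("/"))
```

Calling endpoint("items/abc/content") with no override resolves under the default region; passing highspot_url swaps only the base.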

For each Item, a Document is built whose section text comes from one of three tiers:

WebLink items → headless-Chromium scrape of the linked URL via Playwright. Falls back to title + description if the scrape returns empty.
File items with a downloadable, supported extension (.pdf, .docx, .pptx, .xlsx, .eml, .epub, .html, .txt) → extract_file_text over the bytes returned by items/{id}/content.
Else / on any error → title + "\n" + description.
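The three tiers above can be sketched as a single dispatch function. This is a minimal sketch, not the connector's actual code: the item field names (content_type, url, content_name, can_download) and the injected scrape / extract callables are assumptions made for illustration.

```python
from typing import Callable

# Extensions listed in the PR description.
SUPPORTED_EXTENSIONS = (
    ".pdf", ".docx", ".pptx", ".xlsx", ".eml", ".epub", ".html", ".txt",
)


def fallback_text(title: str, description: str) -> str:
    """Tier 3: title and description joined by a newline."""
    return f"{title}\n{description}"


def section_text(
    item: dict,
    scrape: Callable[[str], str],    # hypothetical: returns page text for a URL
    extract: Callable[[dict], str],  # hypothetical: downloads and extracts a file
) -> str:
    title = item.get("title", "")
    description = item.get("description", "")
    try:
        # Tier 1: WebLink items are scraped; an empty scrape falls back.
        if item.get("content_type") == "WebLink":
            return scrape(item["url"]) or fallback_text(title, description)
        # Tier 2: downloadable files with a supported extension.
        name = item.get("content_name", "").lower()
        if item.get("can_download") and name.endswith(SUPPORTED_EXTENSIONS):
            return extract(item)
    except Exception:
        # Any error in tiers 1 or 2 drops through to the fallback.
        pass
    return fallback_text(title, description)
```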

Backend:

backend/danswer/connectors/highspot/{__init__,client,utils,connector}.py — new package.
backend/danswer/configs/constants.py — DocumentSource.HIGHSPOT = "highspot".
backend/danswer/connectors/factory.py — registered.
backend/danswer/server/documents/connector.py — new GET /manage/admin/connector/highspot/spots/{credential_id} route that returns the live list of Spots visible to a saved credential (powers the multi-select on the admin page).

Frontend:

web/src/lib/types.ts: "highspot" in ValidSources, HighspotConfig, HighspotCredentialJson.
web/src/components/icons/icons.tsx — HighspotIcon (placeholder asset; see TODO below).
web/src/lib/sources.ts — tile entry under SOURCE_METADATA_MAP.highspot.
web/src/app/admin/connectors/highspot/page.tsx — full Step 1 (credentials) + Step 2 (multi-select Spots + create connector) admin page. Selecting at least one Spot is mandatory; the dropdown is populated live from the new backend route using the saved credential.
web/public/Highspot.png — placeholder icon (see TODO).

Notable adaptations vs upstream Onyx

This fork is ~2 years behind upstream and lacks the perm-sync / slim-doc / OnyxFileExtensions / IndexingHeartbeatInterface rewrite that upstream's connector depends on. Adjustments:

Drops the Slim/perm-sync interface entirely; the connector is a plain LoadConnector + PollConnector.
Replaces upstream's TextSection with this fork's Section.
Replaces OnyxFileExtensions.TEXT_AND_DOCUMENT_EXTENSIONS with an inline tuple matching this fork's extract_file_text dispatch.
extract_file_text argument order is (file_name, file, ...) here vs upstream's (file, file_name, ...).
Document.doc_updated_at is datetime | None here (upstream is str); ISO strings are parsed before assignment.
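For the last adaptation, a minimal sketch of parsing an ISO string into the datetime | None shape might look like the following. The function name parse_updated_at is hypothetical, and treating timezone-naive values as UTC is an assumption, not something this PR states.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_updated_at(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 string to an aware UTC datetime, or None."""
    if not value:
        return None
    # Older Pythons reject a trailing "Z"; rewrite it as an explicit offset.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return None  # unparseable timestamps become "unknown"
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```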

Lifecycle / perf adaptations to coexist with other connectors

The naive upstream Highspot connector spawns a fresh Chromium process per WebLink item. We observed this starving the worker's FDs / RAM and causing co-running connectors (specifically Slack's conversations.list) to fail with IncompleteRead mid-response. Fixed:

Shared browser per poll_source run — mirrors connectors/web/connector.py's pattern. One playwright.start() + chromium.launch() for the entire run, context.new_page() per WebLink, page.close() after each, full teardown in a try/finally at end-of-run (or on error).
Bounded scroll loop: WEB_CONNECTOR_MAX_SCROLL_ATTEMPTS = 10 (down from upstream's 20) and per-scroll wait_for_load_state("networkidle", timeout=5000) (down from 60000). Caps single-WebLink worst case at ~110s vs the upstream ~20-minute stall on pages where networkidle never settles.
Smaller yield batch — _YIELD_BATCH_SIZE = 4 decoupled from INDEX_BATCH_SIZE so the indexer's docs_indexed counter ticks up more often. Per-item processing in this connector is slow enough (Playwright + extract_file_text) that yielding every 16 items can mean minutes between UI counter updates.
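The shared-lifecycle and small-batch ideas above combine roughly as follows. This is a sketch under stated assumptions: open_browser and process are injected callables, and FakeBrowser is a hypothetical stand-in so the example runs without Playwright; the real connector uses playwright.start() + chromium.launch() as described above.

```python
from typing import Any, Callable, Iterable, Iterator, List

YIELD_BATCH_SIZE = 4  # value quoted in the PR description


def index_in_batches(
    items: Iterable[str],
    open_browser: Callable[[], Any],
    process: Callable[[Any, str], str],
    batch_size: int = YIELD_BATCH_SIZE,
) -> Iterator[List[str]]:
    """Process items with one shared browser, yielding small batches."""
    browser = open_browser()  # launched once for the whole run
    try:
        batch: List[str] = []
        for item in items:
            batch.append(process(browser, item))
            if len(batch) >= batch_size:
                yield batch  # small batches let the caller's counter tick sooner
                batch = []
        if batch:
            yield batch  # final partial batch
    finally:
        browser.close()  # teardown at end-of-run or on error


class FakeBrowser:
    """Hypothetical stand-in so the sketch runs without Playwright."""

    launches = 0

    def __init__(self) -> None:
        FakeBrowser.launches += 1
        self.closed = False

    def close(self) -> None:
        self.closed = True
```

Because teardown sits in the finally clause of the generator, the browser is closed whether the run finishes normally or process raises partway through.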

2. Drive-by fix: slack-bot config Create button

web/src/app/admin/bot/SlackBotConfigCreationForm.tsx — gate curated_response_config.response_message validation behind the enable_curated_response_integration toggle. Without the gate, the schema unconditionally required the field but the input is only rendered when the toggle is on (default off), so on a fresh /admin/bot/new validation silently failed, the Create button did nothing, and no error was visible because the errored field wasn't on screen. Mirrors the existing jira_config .when() pattern.

(Same fix as commit 4ed8bcbd on feature/multilanguage-support; applied here independently because feature/highspot was branched off feature/darwin before that PR landed.)

What was tested

Pre-commit / quality checks (per .pre-commit-config.yaml):

✅ black --check on every Python file touched
✅ reorder_python_imports --py311-plus
✅ ruff (clean)
✅ prettier --check on every TS/TSX touched
✅ tsc --noEmit (clean, exit 0)
✅ Backend module imports cleanly (from danswer.connectors.highspot.connector import HighspotConnector succeeds in the venv)

Manual verification path:

Standalone smoke test inside connector.py's if __name__ == "__main__": block:

cd backend
PYTHONPATH=$(pwd) HIGHSPOT_KEY=… HIGHSPOT_SECRET=… HIGHSPOT_SPOT_NAMES="My Spot" \
  python danswer/connectors/highspot/connector.py

End-to-end via admin UI (verified in dev):
1. Open /admin/connectors/highspot → enter key + secret in Step 1 → save credential.
2. Step 2: dropdown populates live with the actual Spots from the API.
3. Pick one or more Spots → Create.
4. cc-pair appears, indexing kicks off, docs_indexed ticks up every ~few items (the smaller yield-batch effect).
5. Search for content from one of the indexed Items returns it as a Highspot citation.

What's NOT in this PR (follow-ups, if needed)

Replace web/public/Highspot.png — currently a placeholder copy of HubSpot.png. Swap it for the real Highspot logo before merge.
Per-spot config concurrency: client.get_item(item_id) is called sequentially for every item, even for items that are fetched only to be rejected by the time window. Parallelizing these calls with a ThreadPoolExecutor (5-10 workers) could raise indexing throughput by an estimated 2-5×; deferred until we see real-world Spot sizes.
Time-budget-based scroll loop — WEB_CONNECTOR_MAX_SCROLL_ATTEMPTS=10 is a count cap; a wall-clock cap (e.g. 30s/page) would handle the long-tail pages better. Also deferred.
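For the deferred concurrency follow-up, one possible shape is sketched below. It is not part of this PR: fetch_items is a hypothetical helper, get_item stands in for client.get_item, and the worker count is the low end of the 5-10 range mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def fetch_items(
    item_ids: Iterable[str],
    get_item: Callable[[str], dict],  # stand-in for client.get_item
    max_workers: int = 5,
) -> List[dict]:
    """Fetch item details concurrently, preserving the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order; an exception raised
        # by any call is re-raised here when its result is consumed.
        return list(pool.map(get_item, item_ids))
```

Time-window filtering could then run over the returned list, so the sequential part of the loop no longer waits on one HTTP round trip per item.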

Process bounce required after deploy

Per CLAUDE.md's footgun list:

dapi (api-server) — new ORM imports + new admin route.
dbe (background indexer + Celery worker) — new connector class registered in factory.
dsl (slack listener) — adding a DocumentSource enum value triggers pydantic ValidationError: source_type on the slackbot otherwise.
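The dsl footgun can be illustrated with the standard library alone. The sketch below uses plain Enum classes as a stand-in for pydantic validation (an assumption; the real error is the pydantic ValidationError quoted above), and the two enum classes are illustrative, not the contents of constants.py.

```python
from enum import Enum
from typing import Type


class OldDocumentSource(str, Enum):
    """Enum as loaded by a process started before the deploy."""
    SLACK = "slack"


class NewDocumentSource(str, Enum):
    """Enum after this PR adds the new member."""
    SLACK = "slack"
    HIGHSPOT = "highspot"


def can_parse(enum_cls: Type[Enum], raw: str) -> bool:
    """Return True if raw is a valid value for enum_cls."""
    try:
        enum_cls(raw)
        return True
    except ValueError:
        return False
```

A process still holding the old class rejects "highspot" until it is restarted and re-imports the enum, which is why all three processes need a bounce.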

Test plan

Replace web/public/Highspot.png with the real Highspot logo
Pull, bounce dapi + dbe + dsl
Open /admin/connectors/highspot, enter creds, verify the Spot multi-select populates from the live API
Try Create with zero Spots selected → expect inline "please select at least one Spot" error
Pick 1-2 Spots, create the connector, watch indexing — docs_indexed should tick up every ~few items
Search for content from an indexed Highspot Item — expect a citation linking back to the original
Smoke /admin/bot/new Create button — should now either submit successfully or display backend validation errors (no more silent "nothing happens")

🤖 Generated with Claude Code

Indexes Spots and the Items inside them via Highspot's REST API. Authenticates with HTTP Basic (key+secret) generated from the Highspot admin console; an optional base URL covers tenants on non-default Highspot regions.

Per-item content extraction is tiered:
1. WebLink items -> headless-Chromium scrape via Playwright, reusing one shared browser/context for the whole poll_source run (mirrors connectors/web/connector.py; spawning Chromium per item starves worker FDs/RAM and was making co-running Slack indexing fail with IncompleteRead).
2. Items with a downloadable, supported extension (.pdf .docx .pptx .xlsx .eml .epub .html .txt) -> extract_file_text over the bytes from items/{id}/content.
3. Else / on any error -> title + description fallback.

Notable adaptations vs upstream Onyx:
- Drops the Slim/perm-sync interface; this fork has no SlimConnectorWithPermSync / SlimDocument / TextSection / OnyxFileExtensions / IndexingHeartbeatInterface in the upstream shape.
- Uses Section instead of TextSection.
- extract_file_text arg order is (file_name, file, ...) here; upstream is (file, file_name, ...).
- Parses ISO date_updated to datetime before assignment because Document.doc_updated_at is typed datetime | None.
- Scroll loop bounds: max_attempts=10 (down from 20), per-scroll networkidle timeout=5s (down from 60s); caps single-WebLink worst case at ~110s, vs the upstream 20-min stall.
- _YIELD_BATCH_SIZE=4 so the indexer's docs_indexed counter ticks more frequently; API pagination still uses INDEX_BATCH_SIZE.

Frontend:
- HighspotConfig + HighspotCredentialJson in lib/types.ts.
- HighspotIcon (placeholder Highspot.png; replace with the real logo before merge).
- Tile in lib/sources.ts (AppConnection category).
- Admin page at /admin/connectors/highspot mirrors the sf-account/page.tsx template; Spot selection is a live multi-select dropdown driven by GET /manage/admin/connector/highspot/spots/{credential_id} that calls the Highspot API using the saved credential and renders the actual Spot list. Selecting >=1 Spot is mandatory.

Process bounce after merge + deploy: dapi + dbe + dsl (DocumentSource enum addition footgun per CLAUDE.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Yup schema unconditionally required curated_response_config.response_message, but the matching text input is only rendered when enable_curated_response_integration is true. Default is false, so on a fresh /admin/bot/new the field was empty, validation failed silently, the Create button did nothing, and no error rendered because the errored field wasn't on screen.

Mirror the jira_config pattern: only require when the toggle is enabled.

Same fix as commit 4ed8bcb on feature/multilanguage-support; applied here independently because feature/highspot was branched off feature/darwin before that PR landed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI's pre-commit prettier (v3.1.0) defaults to trailingComma:"all" and flags the missing comma after the last generic param. Local npm prettier 2.8.8 defaults to "es5" and didn't catch it. Adding the comma to satisfy the canonical CI hook. Pre-existing issue surfaced by this PR's diff scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rajivml and others added 3 commits May 6, 2026 15:06

Sarath1018 approved these changes May 6, 2026

swati354 approved these changes May 6, 2026

rajivml merged commit c7108dc into feature/darwin May 6, 2026
7 checks passed

rajivml merged 3 commits into feature/darwin from feature/highspot
