feat(capture): support OpenRouter as alternative vision provider for image captioning #840
CodAr-man wants to merge 1 commit into
Adds OpenRouter alongside Gemini. Provider selected by API key at runtime:
- OPENROUTER_API_KEY set: uses google/gemma-4-26b-a4b-it via OpenRouter
- GEMINI_API_KEY set: existing Gemini behavior (unchanged)
- Neither: existing DOM-only behavior (unchanged)

Purely additive: Gemini code path untouched.
Heads up — GitHub is rendering Cause: the file was saved with a UTF-8 BOM and CRLF line endings, while the version on To unblock review, could you:
Once that lands the diff will render normally and reviewers can take a real look.

One small thing for the substantive change: the rewrite dropped some of the inline comments around Gemini batching (free-tier 5 RPM vs paid-tier 2000 RPM rationale, the model-override env var note, the per-model benchmark numbers). Those are useful context for future maintainers, so they are worth preserving in the OpenRouter version too.

Separately, I'm opening a

Thanks for the contribution!
Add `* text=auto eol=lf` so text files are checked in with LF regardless of the contributor's OS.

Without this, Windows editors can save files with CRLF (and sometimes a UTF-8 BOM), which makes every line differ at the byte level on diff and trips GitHub's "Binary file not shown" heuristic. See heygen-com#840 for an example where a ~30-line change was unreviewable for this reason.

Existing LFS rules already carry `-text` and remain unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
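To make the byte-level problem described above concrete, here is a minimal sketch of a check for a UTF-8 BOM and CRLF line endings. The `inspectEncoding` helper and `EncodingReport` type are hypothetical names for illustration; they are not part of this repository or of Git.

```typescript
// Hypothetical helper (not from the repo): reports the byte-level traits
// that can make every line of a file differ in a diff.
interface EncodingReport {
  hasUtf8Bom: boolean;
  crlfCount: number;
  lfOnlyCount: number;
}

function inspectEncoding(bytes: Uint8Array): EncodingReport {
  // A UTF-8 BOM is the three-byte sequence EF BB BF at the start of the file.
  const hasUtf8Bom =
    bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf;

  let crlfCount = 0;
  let lfOnlyCount = 0;
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] !== 0x0a) continue; // only LF bytes mark a line ending
    if (i > 0 && bytes[i - 1] === 0x0d) crlfCount++; // CR immediately before LF
    else lfOnlyCount++;
  }
  return { hasUtf8Bom, crlfCount, lfOnlyCount };
}

// Sample bytes: BOM, "a", CRLF, "b", LF.
const sample = new Uint8Array([0xef, 0xbb, 0xbf, 0x61, 0x0d, 0x0a, 0x62, 0x0a]);
const report = inspectEncoding(sample);
```

A contributor could run a check like this over a changed file before pushing; a non-zero `crlfCount` or a set `hasUtf8Bom` on a file that is LF-only upstream suggests the diff will be noisy.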
Thanks for this @CodAr-man — the idea is good and still relevant (no alternative vision provider has landed since), so I've reimplemented it cleanly in #1478 and am closing this in favor of that. Why a fresh PR rather than merging this one:
#1478 keeps your design (provider chosen by which key is present, OpenRouter prioritized, Gemini path untouched) and adds: a single
…apture captioning (#1478)

* feat(cli): support OpenRouter as an alternative vision provider for capture captioning

`hyperframes capture` could only enrich asset descriptions with Gemini vision, which requires a Google API key. Add OpenRouter as an alternative so users without Google access can caption via any vision-capable model through one unified key.

Provider is selected by which key is present: OPENROUTER_API_KEY → OpenRouter (OpenAI-style /chat/completions with an image_url data URI), else GEMINI_API_KEY/GOOGLE_API_KEY → Gemini (unchanged), else DOM-only as before. OpenRouter wins if both are set. Default model is google/gemini-3.1-flash-lite (the OpenRouter analog of the Gemini path's existing 3.1-flash-lite tier), overridable via HYPERFRAMES_OPENROUTER_MODEL.

Both vision call sites, the image loop and the rasterized-SVG loop, route through a single `captionOne` dispatcher, so the new provider works for SVGs too (the original PR #840 only patched the image loop, which would have left OpenRouter-only users with crashing SVG captioning). The OpenRouter path checks res.ok and surfaces the status/body on failure.

Reimplements #840 (which was unmergeable: saved with a UTF-8 BOM + CRLF so GitHub rendered it as a binary diff, used `any`, reused the Gemini model env var, and had a hallucinated default model id).

- Adds unit tests for the OpenRouter path (happy path, graceful degradation on non-OK status, no-key skip).
- Documents OPENROUTER_API_KEY in the website-to-video guide and the CLI capture reference.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(cli): fix typecheck in OpenRouter caption test: capture request without `as`

The test cast `fetchMock.mock.calls[0]` to a tuple (TS2352: `[] | undefined` doesn't overlap `[string, RequestInit]`), which failed the Typecheck CI job. Capture the url/init inside the typed mock and assert via `new Headers()` + `typeof` narrowing instead, with no `as` assertions (which the repo bans anyway).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
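For readers unfamiliar with the request shape the commit message describes, the following is a rough sketch of an OpenAI-style `/chat/completions` call carrying an `image_url` data URI, plus the non-OK handling. It is not the code merged in #1478: `buildCaptionRequest`, `readCaption`, `CaptionRequest`, and `Env` are hypothetical names, and the endpoint URL is an assumption based on OpenRouter's public API rather than anything stated in this thread. The env var names and default model id are taken from the commit message.

```typescript
// Sketch only: names below are illustrative, not the PR's actual code.
type Env = Record<string, string | undefined>;

interface CaptionRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Default model id as stated in the commit message.
const DEFAULT_OPENROUTER_MODEL = "google/gemini-3.1-flash-lite";

function buildCaptionRequest(
  env: Env,
  imageBase64: string,
  mimeType: string,
  prompt: string,
): CaptionRequest {
  const apiKey = env["OPENROUTER_API_KEY"];
  if (!apiKey) throw new Error("OPENROUTER_API_KEY is not set");
  const model = env["HYPERFRAMES_OPENROUTER_MODEL"] ?? DEFAULT_OPENROUTER_MODEL;
  return {
    // Assumed endpoint: OpenRouter's OpenAI-compatible chat completions route.
    url: "https://openrouter.ai/api/v1/chat/completions",
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: prompt },
            // The image travels inline as a base64 data URI.
            { type: "image_url", image_url: { url: `data:${mimeType};base64,${imageBase64}` } },
          ],
        },
      ],
    }),
  };
}

// Mirrors the "checks res.ok and surfaces the status/body" behaviour.
function readCaption(ok: boolean, status: number, bodyText: string): string {
  if (!ok) throw new Error(`OpenRouter request failed (${status}): ${bodyText}`);
  const parsed = JSON.parse(bodyText);
  const content = parsed?.choices?.[0]?.message?.content;
  if (typeof content !== "string") throw new Error("No caption text in response");
  return content.trim();
}
```

In use, a caller would pass `request.url` and the remaining fields to `fetch`, then hand `res.ok`, `res.status`, and `await res.text()` to `readCaption`, catching the error to fall back to a DOM-only description. Keeping both functions pure makes them easy to unit test without mocking the network.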
What this does
Adds OpenRouter as an alternative AI provider for image captioning in
`hyperframes capture`, alongside the existing Gemini integration.
How it works
Provider is selected at runtime based on which API key is present:
- OPENROUTER_API_KEY: uses google/gemma-4-26b-a4b-it via OpenRouter
- GEMINI_API_KEY / GOOGLE_API_KEY: existing Gemini behavior

OpenRouter is prioritized if both keys are present.
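The precedence rule can be sketched as a small pure function. This is an illustration, not the PR's code: `selectProvider`, `CaptionProvider`, and `isSet` are hypothetical names, and treating an empty or whitespace-only value as "not set" is an assumption.

```typescript
type CaptionProvider = "openrouter" | "gemini" | "none";

type Env = Record<string, string | undefined>;

// Assumption: an empty or whitespace-only value counts as "not set".
function isSet(value: string | undefined): boolean {
  return value !== undefined && value.trim() !== "";
}

function selectProvider(env: Env): CaptionProvider {
  // OpenRouter is checked first, so it wins when both keys are present.
  if (isSet(env["OPENROUTER_API_KEY"])) return "openrouter";
  if (isSet(env["GEMINI_API_KEY"]) || isSet(env["GOOGLE_API_KEY"])) return "gemini";
  return "none"; // no key: DOM-only capture, no captioning call
}
```

Taking the environment as a parameter, rather than reading `process.env` inside the function, keeps the rule testable for every key combination.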
Why
Not everyone has access to a Gemini API key. OpenRouter provides access to
vision-capable models (including Gemma) with a single unified API, making
this feature accessible to more users.
Backwards compatibility
✅ Purely additive — the Gemini code path is completely untouched.
✅ Users with no API keys see identical behavior to before.
✅ Verified working end-to-end on a real site capture.