feat(capture): support OpenRouter as alternative vision provider for image captioning #840

Closed
CodAr-man wants to merge 1 commit into heygen-com:main from CodAr-man:feat/openrouter-vision-provider

@CodAr-man

What this does

Adds OpenRouter as an alternative AI provider for image captioning in hyperframes capture,
alongside the existing Gemini integration.

How it works

Provider is selected at runtime based on which API key is present:

| Environment Variable | Provider | Model |
| --- | --- | --- |
| OPENROUTER_API_KEY | OpenRouter | google/gemma-4-26b-a4b-it |
| GEMINI_API_KEY / GOOGLE_API_KEY | Google Gemini | (existing behavior) |
| Neither | No AI captioning | (existing DOM-only behavior) |

OpenRouter is prioritized if both keys are present.
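
The selection rule above can be sketched as a small pure function. This is a minimal illustration, not the PR's code: `VisionProvider` and `selectVisionProvider` are hypothetical names, and treating an empty string as "unset" is an assumption.

```typescript
// Hypothetical names for illustration; the PR's actual identifiers are not shown here.
type VisionProvider = "openrouter" | "gemini" | "none";

type Env = Record<string, string | undefined>;

function selectVisionProvider(env: Env): VisionProvider {
  // OpenRouter is checked first, so it wins when both keys are present.
  // Assumption: an empty string counts as "not set".
  if (env["OPENROUTER_API_KEY"]) return "openrouter";
  if (env["GEMINI_API_KEY"] || env["GOOGLE_API_KEY"]) return "gemini";
  // Neither key present: keep the existing DOM-only behavior.
  return "none";
}
```

Taking the environment as a parameter rather than reading a global keeps the rule easy to unit test; a caller would presumably pass its process environment.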

Why

Not everyone has access to a Gemini API key. OpenRouter provides access to
vision-capable models (including Gemma) with a single unified API, making
this feature accessible to more users.

Backwards compatibility

✅ Purely additive — the Gemini code path is completely untouched.
✅ Users with no API keys see identical behavior to before.
✅ Verified working end-to-end on a real site capture.

Adds OpenRouter alongside Gemini. Provider selected by API key at runtime:
- OPENROUTER_API_KEY set: uses google/gemma-4-26b-a4b-it via OpenRouter
- GEMINI_API_KEY set: existing Gemini behavior (unchanged)
- Neither: existing DOM-only behavior (unchanged)

Purely additive - Gemini code path untouched.
@jrusso1020
Collaborator

Heads up — GitHub is rendering packages/cli/src/capture/contentExtractor.ts as "Binary file not shown", which is why the diff appears empty (additions: 0, deletions: 0) even though the file is 15.8 KB.

Cause: the file was saved with a UTF-8 BOM and CRLF line endings, while the version on main is UTF-8 (no BOM) with LF. Every line then differs at the byte level even though the substantive change is small, and GitHub's diff renderer falls back to "binary". I diffed locally with -wB and the actual code change is ~30 lines and matches the description (adds the OpenRouter branch; Gemini path untouched) — so this is fixable PR hygiene, not a content problem.

To unblock review, could you:

  1. Re-save the file as UTF-8 without BOM and LF line endings. In VS Code: click CRLF in the bottom status bar → LF; click UTF-8 with BOM → "Save with Encoding" → UTF-8.
  2. git commit --amend && git push --force-with-lease on this branch.

Once that lands the diff will render normally and reviewers can take a real look.
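
For anyone who wants to check a file for the two byte-level problems described above before pushing, here is a rough sketch. It works on raw bytes and uses no library calls; `inspectBytes` and `normalizeBytes` are hypothetical helper names, not part of the repo.

```typescript
interface EncodingReport {
  hasBom: boolean;   // file starts with the UTF-8 BOM bytes EF BB BF
  crlfCount: number; // number of CR LF (0D 0A) pairs
}

function inspectBytes(bytes: Uint8Array): EncodingReport {
  const hasBom =
    bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf;
  let crlfCount = 0;
  for (let i = 0; i + 1 < bytes.length; i++) {
    if (bytes[i] === 0x0d && bytes[i + 1] === 0x0a) crlfCount++;
  }
  return { hasBom, crlfCount };
}

function normalizeBytes(bytes: Uint8Array): Uint8Array {
  // Skip the 3-byte BOM if present.
  const start = inspectBytes(bytes).hasBom ? 3 : 0;
  const out: number[] = [];
  for (let i = start; i < bytes.length; i++) {
    // Drop a CR only when it is immediately followed by LF; lone CRs are kept.
    if (bytes[i] === 0x0d && bytes[i + 1] === 0x0a) continue;
    out.push(bytes[i]);
  }
  return Uint8Array.from(out);
}
```

The bytes could come from any file read that returns a `Uint8Array`. A report with `hasBom: false` and `crlfCount: 0` would correspond to the UTF-8 (no BOM) + LF form that main uses.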

One small thing for the substantive change: the rewrite dropped some of the inline comments around Gemini batching (free-tier 5 RPM vs paid-tier 2000 RPM rationale, the model-override env var note, the per-model benchmark numbers). Those are useful context for future maintainers — worth preserving in the OpenRouter version too.

Separately, I'm opening a .gitattributes PR (* text=auto eol=lf) so this class of issue can't reach a diff again — that's on the repo side, not anything you need to do.

Thanks for the contribution!

lovicho pushed a commit to lovicho/hyperframes that referenced this pull request May 14, 2026
Add `* text=auto eol=lf` so text files are checked in with LF regardless
of the contributor's OS. Without this, Windows editors can save files
with CRLF (and sometimes a UTF-8 BOM), which makes every line differ at
the byte level on diff and trips GitHub's "Binary file not shown"
heuristic — see heygen-com#840 for an example where a ~30-line change was
unreviewable for this reason.

Existing LFS rules already carry `-text` and remain unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jrusso1020
Collaborator

Thanks for this @CodAr-man — the idea is good and still relevant (no alternative vision provider has landed since), so I've reimplemented it cleanly in #1478 and am closing this in favor of that.

Why a fresh PR rather than merging this one:

  • The file was saved with a UTF-8 BOM + CRLF, so GitHub renders it as a binary diff (+0/-0, and the PR shows CONFLICTING) — it can't be reviewed or merged as-is.
  • It only patches the image captioning loop. main has since added a second vision call site (rasterized-SVG captioning), and since the Gemini client is created only in the Gemini branch, an OpenRouter-only user would hit undefined.models and all SVG captioning would throw.
  • A few smaller things: let ai: any, reusing HYPERFRAMES_GEMINI_MODEL for the OpenRouter model, no res.ok check, and a default model id (google/gemma-4-26b-a4b-it) that isn't a real OpenRouter model.

#1478 keeps your design (provider chosen by which key is present, OpenRouter prioritized, Gemini path untouched) and adds: a single captionOne dispatcher used by both the image and SVG loops, a separate HYPERFRAMES_OPENROUTER_MODEL override, a verified default (google/gemini-3.1-flash-lite), res.ok error handling, unit tests, and docs. Credited you in the PR. 🙏
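
To illustrate the dispatcher idea, here is a minimal sketch of how one caption function can serve several loops. Everything except the name `captionOne` is hypothetical (`CaptionFn`, `ProviderImpls`, `Asset`, `createCaptionOne`, `captionAssets`), and skipping a failed asset is an assumption about how degradation might look, not a description of #1478.

```typescript
// One function signature that both the image loop and the SVG loop can call.
type CaptionFn = (imageBase64: string, mimeType: string) => Promise<string>;

interface ProviderImpls {
  openRouter?: CaptionFn; // present only when an OpenRouter key is configured
  gemini?: CaptionFn;     // present only when a Gemini/Google key is configured
}

interface Asset {
  id: string;
  imageBase64: string;
  mimeType: string;
}

// Pick the implementation once; OpenRouter is preferred when both exist.
function createCaptionOne(impls: ProviderImpls): CaptionFn | null {
  return impls.openRouter ?? impls.gemini ?? null;
}

// Any loop (images, rasterized SVGs) goes through the same captionOne, so no
// loop ever touches a provider-specific client that might be undefined.
async function captionAssets(
  assets: Asset[],
  captionOne: CaptionFn | null,
): Promise<Map<string, string>> {
  const captions = new Map<string, string>();
  if (captionOne === null) return captions; // no key: DOM-only behavior
  for (const asset of assets) {
    try {
      captions.set(asset.id, await captionOne(asset.imageBase64, asset.mimeType));
    } catch {
      // Assumption: a failed caption is skipped rather than aborting the capture.
    }
  }
  return captions;
}
```

Because the provider choice is made in one place, adding a third call site later would only require calling `captionAssets` (or `captionOne`) again rather than repeating the branching.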

@jrusso1020 jrusso1020 closed this Jun 16, 2026
jrusso1020 added a commit that referenced this pull request Jun 16, 2026
…apture captioning (#1478)

* feat(cli): support OpenRouter as an alternative vision provider for capture captioning

`hyperframes capture` could only enrich asset descriptions with Gemini vision,
which requires a Google API key. Add OpenRouter as an alternative so users
without Google access can caption via any vision-capable model through one
unified key.

Provider is selected by which key is present: OPENROUTER_API_KEY → OpenRouter
(OpenAI-style /chat/completions with an image_url data URI), else
GEMINI_API_KEY/GOOGLE_API_KEY → Gemini (unchanged), else DOM-only as before.
OpenRouter wins if both are set. Default model is google/gemini-3.1-flash-lite
(the OpenRouter analog of the Gemini path's existing 3.1-flash-lite tier),
overridable via HYPERFRAMES_OPENROUTER_MODEL.

Both vision call sites — the image loop and the rasterized-SVG loop — route
through a single `captionOne` dispatcher, so the new provider works for SVGs too
(the original PR #840 only patched the image loop, which would have left
OpenRouter-only users with crashing SVG captioning). The OpenRouter path checks
res.ok and surfaces the status/body on failure.
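
A rough sketch of the OpenRouter path described above, for readers unfamiliar with the OpenAI-style request shape. The helper names and the minimal `ResponseLike`/`FetchLike` types are hypothetical and exist only to keep the example self-contained; the endpoint, the `image_url` data URI, the env var name, and the default model id are the ones named in this message.

```typescript
const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";
const DEFAULT_MODEL = "google/gemini-3.1-flash-lite"; // default named in this commit message

// Hypothetical minimal types so the sketch does not depend on DOM typings.
interface RequestSpec {
  method: string;
  headers: Record<string, string>;
  body: string;
}
interface ChatCompletion {
  choices?: { message?: { content?: unknown } }[];
}
interface ResponseLike {
  ok: boolean;
  status: number;
  text(): Promise<string>;
  json(): Promise<ChatCompletion>;
}
type FetchLike = (url: string, init: RequestSpec) => Promise<ResponseLike>;

// The override env var falls back to the default when unset or empty.
function resolveModel(env: Record<string, string | undefined>): string {
  return env["HYPERFRAMES_OPENROUTER_MODEL"] || DEFAULT_MODEL;
}

// Build an OpenAI-style chat request that carries the image as a data URI.
function buildRequest(
  apiKey: string,
  model: string,
  prompt: string,
  mimeType: string,
  imageBase64: string,
): RequestSpec {
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: prompt },
            { type: "image_url", image_url: { url: `data:${mimeType};base64,${imageBase64}` } },
          ],
        },
      ],
    }),
  };
}

async function captionViaOpenRouter(fetchImpl: FetchLike, spec: RequestSpec): Promise<string> {
  const res = await fetchImpl(OPENROUTER_URL, spec);
  if (!res.ok) {
    // Surface both the status and the body so failures are diagnosable.
    throw new Error(`OpenRouter request failed (${res.status}): ${await res.text()}`);
  }
  const data = await res.json();
  const content = data.choices?.[0]?.message?.content;
  if (typeof content !== "string") {
    throw new Error("OpenRouter response contained no text content");
  }
  return content;
}
```

Passing the fetch implementation in as a parameter is a design choice of this sketch: it lets a test supply a stub without patching globals.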

Reimplements #840 (which was unmergeable: saved with a UTF-8 BOM + CRLF so
GitHub rendered it as a binary diff, used `any`, reused the Gemini model env
var, and had a hallucinated default model id).

- Adds unit tests for the OpenRouter path (happy path, graceful degradation on
  non-OK status, no-key skip).
- Documents OPENROUTER_API_KEY in the website-to-video guide and the CLI capture
  reference.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(cli): fix typecheck in OpenRouter caption test — capture request without `as`

The test cast `fetchMock.mock.calls[0]` to a tuple (TS2352: `[] | undefined`
doesn't overlap `[string, RequestInit]`), which failed the Typecheck CI job.
Capture the url/init inside the typed mock and assert via `new Headers()` +
`typeof` narrowing instead — no `as` assertions (which the repo bans anyway).
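
A sketch of the cast-free pattern described above, written without a test framework so it stays self-contained. `RequestInitLike`, `RecordedCall`, `makeRecordingFetch`, and `describeCall` are hypothetical names; the real test may look different. It assumes a runtime with a global `Headers` (for example Node 18+) and a TypeScript version that narrows `unknown` objects via `in` (4.9+).

```typescript
interface RequestInitLike {
  method?: string;
  headers?: Record<string, string>;
  body?: unknown;
}

interface RecordedCall {
  url: string;
  init: RequestInitLike;
}

// The stub is typed up front and records each call as it happens, so the test
// never needs to cast an untyped `mock.calls[0]` tuple afterwards.
function makeRecordingFetch(calls: RecordedCall[]) {
  return async (url: string, init: RequestInitLike) => {
    calls.push({ url, init });
    return { ok: true, status: 200 };
  };
}

// Read the recorded request using only `new Headers()` and `typeof` checks.
function describeCall(call: RecordedCall): { auth: string | null; model: string | null } {
  const headers = new Headers(call.init.headers); // header lookup is case-insensitive
  let model: string | null = null;
  const body = call.init.body;
  if (typeof body === "string") {
    const parsed: unknown = JSON.parse(body);
    if (typeof parsed === "object" && parsed !== null && "model" in parsed) {
      const value = parsed.model;
      if (typeof value === "string") model = value;
    }
  }
  return { auth: headers.get("authorization"), model };
}
```

In a real test the code under test would receive the stub, and the assertions would then run against `describeCall(calls[0])` after checking that a call was recorded.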

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>