feat(capture): support OpenRouter as alternative vision provider for image captioning by CodAr-man · Pull Request #840 · heygen-com/hyperframes

CodAr-man · 2026-05-14T16:54:29Z

What this does

Adds OpenRouter as an alternative AI provider for image captioning in hyperframes capture,
alongside the existing Gemini integration.

How it works

Provider is selected at runtime based on which API key is present:

Environment Variable	Provider	Model
`OPENROUTER_API_KEY`	OpenRouter	`google/gemma-4-26b-a4b-it`
`GEMINI_API_KEY` / `GOOGLE_API_KEY`	Google Gemini	(existing behavior)
Neither	No AI captioning	(existing DOM-only behavior)

OpenRouter is prioritized if both keys are present.

Why

Not everyone has access to a Gemini API key. OpenRouter provides access to
vision-capable models (including Gemma) with a single unified API, making
this feature accessible to more users.

Backwards compatibility

✅ Purely additive — the Gemini code path is completely untouched.
✅ Users with no API keys see identical behavior to before.
✅ Verified working end-to-end on a real site capture.

Adds OpenRouter alongside Gemini. Provider selected by API key at runtime: - OPENROUTER_API_KEY set: uses google/gemma-4-26b-a4b-it via OpenRouter - GEMINI_API_KEY set: existing Gemini behavior (unchanged) - Neither: existing DOM-only behavior (unchanged) Purely additive - Gemini code path untouched.

jrusso1020 · 2026-05-14T17:42:33Z

Heads up — GitHub is rendering packages/cli/src/capture/contentExtractor.ts as "Binary file not shown", which is why the diff appears empty (additions: 0, deletions: 0) even though the file is 15.8 KB.

Cause: the file was saved with a UTF-8 BOM and CRLF line endings, while the version on main is UTF-8 (no BOM) with LF. Every line then differs at the byte level even though the substantive change is small, and GitHub's diff renderer falls back to "binary". I diffed locally with -wB and the actual code change is ~30 lines and matches the description (adds the OpenRouter branch; Gemini path untouched) — so this is fixable PR hygiene, not a content problem.

To unblock review, could you:

Re-save the file as UTF-8 without BOM and LF line endings. In VS Code: click CRLF in the bottom status bar → LF; click UTF-8 with BOM → "Save with Encoding" → UTF-8.
git commit --amend && git push --force-with-lease on this branch.

Once that lands the diff will render normally and reviewers can take a real look.

One small thing for the substantive change: the rewrite dropped some of the inline comments around Gemini batching (free-tier 5 RPM vs paid-tier 2000 RPM rationale, the model-override env var note, the per-model benchmark numbers). Those are useful context for future maintainers — worth preserving in the OpenRouter version too.

Separately, I'm opening a .gitattributes PR (* text=auto eol=lf) so this class of issue can't reach a diff again — that's on the repo side, not anything you need to do.

Thanks for the contribution!

Add `* text=auto eol=lf` so text files are checked in with LF regardless of the contributor's OS. Without this, Windows editors can save files with CRLF (and sometimes a UTF-8 BOM), which makes every line differ at the byte level on diff and trips GitHub's "Binary file not shown" heuristic — see heygen-com#840 for an example where a ~30-line change was unreviewable for this reason. Existing LFS rules already carry `-text` and remain unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jrusso1020 · 2026-06-16T01:14:56Z

Thanks for this @CodAr-man — the idea is good and still relevant (no alternative vision provider has landed since), so I've reimplemented it cleanly in #1478 and am closing this in favor of that.

Why a fresh PR rather than merging this one:

The file was saved with a UTF-8 BOM + CRLF, so GitHub renders it as a binary diff (+0/-0, and the PR shows CONFLICTING) — it can't be reviewed or merged as-is.
It only patches the image captioning loop. main has since added a second vision call site (rasterized-SVG captioning), and since the Gemini client is created only in the Gemini branch, an OpenRouter-only user would hit undefined.models and all SVG captioning would throw.
A few smaller things: let ai: any, reusing HYPERFRAMES_GEMINI_MODEL for the OpenRouter model, no res.ok check, and a default model id (google/gemma-4-26b-a4b-it) that isn't a real OpenRouter model.

#1478 keeps your design (provider chosen by which key is present, OpenRouter prioritized, Gemini path untouched) and adds: a single captionOne dispatcher used by both the image and SVG loops, a separate HYPERFRAMES_OPENROUTER_MODEL override, a verified default (google/gemini-3.1-flash-lite), res.ok error handling, unit tests, and docs. Credited you in the PR. 🙏

…apture captioning (#1478) * feat(cli): support OpenRouter as an alternative vision provider for capture captioning `hyperframes capture` could only enrich asset descriptions with Gemini vision, which requires a Google API key. Add OpenRouter as an alternative so users without Google access can caption via any vision-capable model through one unified key. Provider is selected by which key is present: OPENROUTER_API_KEY → OpenRouter (OpenAI-style /chat/completions with an image_url data URI), else GEMINI_API_KEY/GOOGLE_API_KEY → Gemini (unchanged), else DOM-only as before. OpenRouter wins if both are set. Default model is google/gemini-3.1-flash-lite (the OpenRouter analog of the Gemini path's existing 3.1-flash-lite tier), overridable via HYPERFRAMES_OPENROUTER_MODEL. Both vision call sites — the image loop and the rasterized-SVG loop — route through a single `captionOne` dispatcher, so the new provider works for SVGs too (the original PR #840 only patched the image loop, which would have left OpenRouter-only users with crashing SVG captioning). The OpenRouter path checks res.ok and surfaces the status/body on failure. Reimplements #840 (which was unmergeable: saved with a UTF-8 BOM + CRLF so GitHub rendered it as a binary diff, used `any`, reused the Gemini model env var, and had a hallucinated default model id). - Adds unit tests for the OpenRouter path (happy path, graceful degradation on non-OK status, no-key skip). - Documents OPENROUTER_API_KEY in the website-to-video guide and the CLI capture reference. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(cli): fix typecheck in OpenRouter caption test — capture request without `as` The test cast `fetchMock.mock.calls[0]` to a tuple (TS2352: `[] | undefined` doesn't overlap `[string, RequestInit]`), which failed the Typecheck CI job. Capture the url/init inside the typed mock and assert via `new Headers()` + `typeof` narrowing instead — no `as` assertions (which the repo bans anyway). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jrusso1020 mentioned this pull request May 14, 2026

chore: normalize line endings to LF via .gitattributes #841

Merged

2 tasks

jrusso1020 mentioned this pull request Jun 16, 2026

feat(cli): support OpenRouter as an alternative vision provider for capture captioning #1478

Merged

4 tasks

jrusso1020 closed this Jun 16, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(capture): support OpenRouter as alternative vision provider for image captioning#840

feat(capture): support OpenRouter as alternative vision provider for image captioning#840
CodAr-man wants to merge 1 commit into
heygen-com:mainfrom
CodAr-man:feat/openrouter-vision-provider

CodAr-man commented May 14, 2026

Uh oh!

jrusso1020 commented May 14, 2026

Uh oh!

jrusso1020 commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

CodAr-man commented May 14, 2026

What this does

How it works

Why

Backwards compatibility

Uh oh!

jrusso1020 commented May 14, 2026

Uh oh!

jrusso1020 commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants