
feat: external-source data replication plugin (closes #72) #273

Open

mikhaeelatefrizk wants to merge 4 commits into outerbase:main from mikhaeelatefrizk:feat/replication-plugin


@mikhaeelatefrizk


Purpose

Closes #72.

Adds a replication plugin that pulls new/changed rows from an external source (Postgres or MySQL — e.g. a Supabase database) into StarbaseDB's internal SQLite on a schedule, giving you a locally‑queryable edge replica. It covers all four asks in the issue:

  • Interval‑based pull — each job runs on a user‑supplied cron schedule.
  • Selective replication — you choose the table per job (plus an optional column subset / target table).
  • Per‑table change tracking — a "last value" cursor is stored per job for append‑only polling.
  • User‑defined tracking column: tracking_col + tracking_type (timestamp or id) drive change detection.
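The per‑job settings above can be pictured as a typed config plus a validation pass. This is a minimal sketch, not the plugin's code: field names follow the curl example under Verify, while columns and target_table are hypothetical names for the optional column subset and target table, and validateJob is an illustrative helper.

```typescript
// Source dialects the PR says it supports.
type Dialect = "postgresql" | "mysql";

// Connection details for the external source (fields mirror the curl example).
interface SourceConfig {
  dialect: Dialect;
  host: string;
  port: number;
  user: string;
  password: string;
  database: string;
}

// One replication job. `columns` and `target_table` are HYPOTHETICAL names
// for the optional column subset / target table.
interface ReplicationJob {
  name: string;
  table_name: string;
  tracking_col: string;
  tracking_type: "timestamp" | "id";
  cron_tab: string;
  primary_key: string[];
  columns?: string[];
  target_table?: string;
  source: SourceConfig;
}

// Same allow-list the PR describes for table/column identifiers.
const IDENT = /^[A-Za-z0-9_]+$/;

// Returns a list of problems; an empty list means the job looks valid.
function validateJob(job: ReplicationJob): string[] {
  const errors: string[] = [];
  const idents = [
    job.table_name,
    job.tracking_col,
    ...job.primary_key,
    ...(job.columns ?? []),
    ...(job.target_table ? [job.target_table] : []),
  ];
  for (const id of idents) {
    if (!IDENT.test(id)) errors.push(`invalid identifier: ${id}`);
  }
  // JSON request bodies are untyped at runtime, so re-check the enum.
  if (job.tracking_type !== "timestamp" && job.tracking_type !== "id") {
    errors.push(`unknown tracking_type: ${job.tracking_type}`);
  }
  if (job.primary_key.length === 0) errors.push("primary_key must not be empty");
  // ASSUMPTION: a column subset must still carry the cursor column and the
  // primary key, or change tracking and INSERT OR REPLACE have nothing to key on.
  if (job.columns) {
    for (const c of [job.tracking_col, ...job.primary_key]) {
      if (!job.columns.includes(c)) errors.push(`columns must include ${c}`);
    }
  }
  return errors;
}
```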

How it works

A Durable Object has a single alarm, which src/do.ts hardcodes to the cron plugin. Rather than competing for that alarm, the replication plugin reuses cron: it registers a cron task per job (addEvent) and runs its sync when that task fires (onEvent). No changes to do.ts, no alarm collision.
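The cron reuse can be sketched as one shared onEvent handler that dispatches only the tasks the replicator registered. Everything here is an assumption for illustration: Scheduler is a hypothetical stand-in for the cron plugin (its real addEvent / removeEvent / onEvent signatures may differ), and the "replication:" task-name prefix is a hypothetical convention.

```typescript
// HYPOTHETICAL stand-in for the cron plugin; real signatures may differ.
interface Scheduler {
  addEvent(cronTab: string, name: string): void;
  removeEvent(name: string): void;
  onEvent(handler: (name: string) => Promise<void>): void;
}

// Hypothetical prefix that marks cron tasks owned by the replicator.
const PREFIX = "replication:";

class ReplicationScheduler {
  private readonly jobs = new Map<string, string>(); // job name -> cron_tab
  private readonly scheduler: Scheduler;
  private readonly sync: (jobName: string) => Promise<void>;

  constructor(scheduler: Scheduler, sync: (jobName: string) => Promise<void>) {
    this.scheduler = scheduler;
    this.sync = sync;
    // One handler for every cron tick on the shared alarm; skip foreign tasks.
    scheduler.onEvent(async (name) => {
      if (!name.startsWith(PREFIX)) return;
      const jobName = name.slice(PREFIX.length);
      if (this.jobs.has(jobName)) await this.sync(jobName);
    });
  }

  // Register one cron task per job (the addEvent side).
  register(jobName: string, cronTab: string): void {
    this.jobs.set(jobName, cronTab);
    this.scheduler.addEvent(cronTab, PREFIX + jobName);
  }

  // Remove the job's task so a deleted job stops firing (the removeEvent side).
  unregister(jobName: string): void {
    this.jobs.delete(jobName);
    this.scheduler.removeEvent(PREFIX + jobName);
  }
}
```

Because the replicator only ever talks to the scheduler, the Durable Object's single alarm stays owned by cron.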

On each tick, for each active job, it pulls

SELECT … WHERE tracking_col > ? ORDER BY tracking_col ASC LIMIT batch

from the source (the WHERE predicate is omitted on the first run), upserts the page into internal SQLite with INSERT OR REPLACE, advances the cursor to the page's maximum tracking value, and persists it. It keeps paginating until it receives a short page (fewer than batch rows) or reaches a per‑run page cap.
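A minimal sketch of that loop, assuming hypothetical fetchPage / upsert / saveCursor hooks in place of the real source query and SQLite write. It shows why a short page means "caught up" and why the cursor is persisted after every page rather than once at the end.

```typescript
type Row = Record<string, unknown>;

// HYPOTHETICAL I/O hooks standing in for the external query and the SQLite
// upsert; the plugin's real helpers are not shown in this description.
interface SyncIO {
  // Runs: SELECT ... [WHERE col > cursor] ORDER BY col ASC LIMIT batch
  // (cursor === null means first run, so no WHERE predicate).
  fetchPage(cursor: unknown, batch: number): Promise<Row[]>;
  upsert(rows: Row[]): Promise<void>; // INSERT OR REPLACE into internal SQLite
  saveCursor(cursor: unknown): Promise<void>;
}

async function runSync(
  io: SyncIO,
  trackingCol: string,
  startCursor: unknown,
  batch: number,
  maxPages: number,
): Promise<{ cursor: unknown; rows: number }> {
  let cursor = startCursor;
  let rows = 0;
  for (let page = 0; page < maxPages; page++) {
    const data = await io.fetchPage(cursor, batch);
    if (data.length === 0) break; // nothing new since the cursor
    await io.upsert(data);
    // Rows arrive in ascending order, so the last row holds the max value.
    cursor = data[data.length - 1][trackingCol];
    await io.saveCursor(cursor); // persist per page so a crash resumes here
    rows += data.length;
    if (data.length < batch) break; // short page: caught up with the source
  }
  return { cursor, rows };
}
```

Persisting per page means a failure mid‑run loses at most the page in flight, and because the write is INSERT OR REPLACE, re‑fetching that page on the next tick is harmless.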

Notable details

  • Uses ? placeholders (the @outerbase/sdk rewrites these per dialect); table/column identifiers are validated against ^[A-Za-z0-9_]+$ and always quoted (injection‑safe, reserved‑word‑safe).
  • The external pull bypasses internal RLS / allowlist / cache (hardened config + isRaw) so the query reaches the source unmodified.
  • Type‑aware cursor comparison (numeric for id, temporal for timestamp); typed schema inference; chunked upserts within SQLite's bound‑variable limit.
  • Per‑job error isolation with last_run_at / last_error / rows_synced observability; the source password is redacted on list responses and in stored error messages; all routes are admin‑only.
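The identifier, cursor, chunking, and redaction rules above can be illustrated with a few small helpers. These are hypothetical functions written for this description, not the plugin's exports; maxParams in particular is an assumed knob, since the bound‑variable limit depends on the SQLite build or runtime.

```typescript
type Dialect = "postgresql" | "mysql" | "sqlite";

const IDENT = /^[A-Za-z0-9_]+$/;

// Validate against the allow-list regex, then quote for the target dialect.
// Quoting also makes reserved words such as "order" or "user" safe to use.
function quoteIdent(name: string, dialect: Dialect): string {
  if (!IDENT.test(name)) throw new Error(`invalid identifier: ${name}`);
  return dialect === "mysql" ? `\`${name}\`` : `"${name}"`;
}

// Type-aware cursor comparison: numeric for "id", temporal for "timestamp".
// Plain string comparison would wrongly put "10" before "9".
function isAfter(a: unknown, b: unknown, type: "id" | "timestamp"): boolean {
  if (type === "id") return Number(a) > Number(b);
  return new Date(a as string | number).getTime() > new Date(b as string | number).getTime();
}

// Split rows so each INSERT binds at most maxParams values (ASSUMED limit),
// always keeping at least one row per statement.
function chunkRows<T>(rows: T[], columnsPerRow: number, maxParams: number): T[][] {
  const perChunk = Math.max(1, Math.floor(maxParams / columnsPerRow));
  const chunks: T[][] = [];
  for (let i = 0; i < rows.length; i += perChunk) chunks.push(rows.slice(i, i + perChunk));
  return chunks;
}

// Build one multi-row INSERT OR REPLACE for a chunk, using ? placeholders.
function buildUpsert(table: string, columns: string[], rowCount: number): string {
  const cols = columns.map((c) => quoteIdent(c, "sqlite")).join(", ");
  const tuple = `(${columns.map(() => "?").join(", ")})`;
  const values = Array.from({ length: rowCount }, () => tuple).join(", ");
  return `INSERT OR REPLACE INTO ${quoteIdent(table, "sqlite")} (${cols}) VALUES ${values}`;
}

// Mask the source password before a job is listed or an error is stored.
function redact(message: string, password: string): string {
  return password ? message.split(password).join("***") : message;
}
```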

Included alongside the plugin

  • plugins/cron: a small, backward‑compatible removeEvent(name, dataSource?) (plus an optional dataSource arg on addEvent) so schedulers can clean up their own tasks without depending on middleware ordering.
  • src/operation.ts: executeSDKQuery now closes the external driver connection in a finally block (mirroring the Hyperdrive branch). Previously it leaked one connection per external query, which a forever‑polling replicator would eventually exhaust.
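The connection fix follows the usual try/finally shape. A sketch, assuming a hypothetical ExternalConnection interface and queryOnce helper (the real @outerbase/sdk classes and the executeSDKQuery signature may differ):

```typescript
// HYPOTHETICAL driver surface; the real SDK connection classes may differ.
interface ExternalConnection {
  connect(): Promise<void>;
  raw(sql: string, params: unknown[]): Promise<unknown[]>;
  disconnect(): Promise<void>;
}

// Open, query, and always close, even when the query throws. Without the
// finally, every call would leave one connection open on the source.
async function queryOnce(
  open: () => ExternalConnection,
  sql: string,
  params: unknown[],
): Promise<unknown[]> {
  const conn = open();
  await conn.connect();
  try {
    // `return await` keeps the connection open until the query settles.
    return await conn.raw(sql, params);
  } finally {
    await conn.disconnect();
  }
}
```

The return await matters: returning the bare promise would let finally run, and close the connection, before the query has settled.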

Tasks

  • plugins/replication/index.ts — the plugin
  • plugins/replication/index.test.ts — 50 unit tests
  • plugins/replication/README.md + meta.json
  • Register in src/index.ts
  • CronPlugin.removeEvent + optional dataSource
  • executeSDKQuery connection‑leak fix
  • New suite green, prettier clean, no new type errors in touched files

Verify

```shell
pnpm install
pnpm vitest run plugins/replication      # 50 passing
```

End‑to‑end (optional) — point a job at a Supabase table:

```shell
pnpm dev
curl -X POST http://localhost:8787/replication/jobs \
  -H "Authorization: Bearer $ADMIN_TOKEN" -H "Content-Type: application/json" \
  -d '{"name":"users_sync","table_name":"users","tracking_col":"updated_at",
       "tracking_type":"timestamp","cron_tab":"*/1 * * * *","primary_key":["id"],
       "source":{"dialect":"postgresql","host":"…","port":5432,"user":"postgres",
                 "password":"…","database":"postgres"}}'
# insert rows in the source → within a minute they appear in StarbaseDB
```

See plugins/replication/README.md for the full API (create / list / run / reset / pause / delete).

Before

No mechanism to replicate an external table into the internal database; reads against external data always hit the remote source.

After

A configurable, per‑table pull replicator with cursor‑based change tracking, exposed under /replication/*.

@mikhaeelatefrizk


/claim #72




Development

Successfully merging this pull request may close these issues.

Replicate data from external source to internal source with a Plugin
