
feat(ai): add multi-provider AI support with enhanced extraction#547

Open
kelvinkipruto wants to merge 2 commits into main from ft/mutliprovider-ai-llms

Conversation

@kelvinkipruto (Contributor) commented Feb 24, 2026

Description

  • Introduce provider catalogue and registry to support multiple AI providers (Google, OpenAI, Anthropic, etc.)
  • Add provider credentials management in settings with environment variable fallbacks
  • Implement two-pass extraction with chunking and tool-based normalisation
  • Add force re-extract capability for existing documents
  • Update dependencies to the latest AI SDK versions and Zod 4.3.6
  • Deprecate legacy Google-only model configuration in favour of the new provider system

Fixes #538

Type of change

  • New feature (non-breaking change which adds functionality)

Screenshots

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation


vercel bot commented Feb 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: PromiseTracker · Status: Ready · Actions: Preview, Comment · Updated (UTC): Feb 24, 2026 7:35am


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eb9861ef8d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +523 to +527
for (const extraction of existingExtractions) {
  await payload.delete({
    collection: "ai-extractions",
    id: extraction.id,
  });
}


P1: Delay deleting existing extractions until re-extract succeeds

When forceReextract is enabled, this loop deletes all prior ai-extractions before the new AI run is validated; if any later step fails (API outage, parsing error, empty text, etc.), the catch block marks the document failed but the previous extraction data is already gone. This introduces avoidable data loss and can leave documents without any extraction even though they had valid historical results.


Comment on lines +287 to +291
if (
  APICallError.isInstance(error) ||
  NoObjectGeneratedError.isInstance(error)
) {
  continue;
}


P1: Fail the document when all chunk extractions error out

These errors are swallowed per chunk and execution continues, but there is no later check that at least one chunk actually produced candidates. If every chunk hits APICallError/NoObjectGeneratedError (for example due to transient provider issues or unsupported structured output), the task can still proceed and eventually mark the document as "Analysed by AI (No promises found)" with fullyProcessed: true, which records a false negative instead of a recoverable failure.


kelvinkipruto marked this pull request as draft February 24, 2026 07:32
kelvinkipruto self-assigned this Feb 24, 2026
kelvinkipruto marked this pull request as ready for review February 24, 2026 07:32
@kelvinkipruto (Contributor, Author) commented:

@claude review


claude bot commented Feb 24, 2026

Claude finished @kelvinkipruto's task in 2m 19s


Code Review

  • Gathering context and reading changed files
  • Reviewing provider catalog and registry
  • Reviewing extraction logic changes
  • Reviewing settings changes
  • Compiling review feedback

Overall this is a well-structured feature. The provider catalog/registry split is clean, backward compatibility with legacy config is handled gracefully, and the two-pass extraction with chunking is a meaningful improvement. Below are issues ranging from critical to minor.

P1 — Bugs / Data Loss

1. Data deleted before new extraction succeeds (src/tasks/extractPromises.ts:522-528)

When forceReextract is true, all existing extractions are deleted before the AI run begins. If any subsequent step fails (empty text, API outage, parsing error), the catch block marks the document failed — but the previous extraction data is permanently gone.

// Existing extractions deleted here...
for (const extraction of existingExtractions) {
  await payload.delete({ collection: "ai-extractions", id: extraction.id });
}

// ...but extraction can still fail after this point
const extractedCandidates = await extractPromisesFromChunks({...});

Fix: Defer deletion until after the new ai-extractions record is successfully created, or use a transaction / rollback pattern.
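The delete-after-success ordering could look like the following minimal sketch. The helper names (`reextractSafely`, `runExtraction`, `deleteExtraction`) are illustrative stand-ins for the task's actual code, not the project's API:

```typescript
// Sketch: run and persist the new extraction first; delete superseded
// records only once that has succeeded. Any failure in `runExtraction`
// propagates before old data is touched.
async function reextractSafely<T>(
  existingIds: string[],
  runExtraction: () => Promise<T>,
  deleteExtraction: (id: string) => Promise<void>,
): Promise<T> {
  // 1. Produce and persist the new extraction; throws on failure.
  const result = await runExtraction();
  // 2. Only now remove the superseded records.
  for (const id of existingIds) {
    await deleteExtraction(id);
  }
  return result;
}
```

A transaction would be stronger still, but even this ordering change removes the data-loss window.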


2. All-chunk failures produce false negatives (src/tasks/extractPromises.ts:286-294)

APICallError and NoObjectGeneratedError are caught per-chunk and swallowed with continue. If every chunk fails (e.g. due to a provider outage or unsupported structured output), candidates will be empty, and the code proceeds to mark the document as "Analysed by AI (No promises found)" with fullyProcessed: true. This records a permanent false negative instead of a recoverable failure.

} catch (error) {
  // ...logged as warn...
  if (APICallError.isInstance(error) || NoObjectGeneratedError.isInstance(error)) {
    continue; // ← silently drops chunk failure
  }
  throw error;
}

Fix: After the chunk loop, check whether candidates.length === 0 while the chunk list was non-empty; if so, throw an error rather than continuing with empty candidates. Fix this →
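A post-loop guard along these lines would turn the all-chunks-failed case into a recoverable error. The function name and parameters are illustrative, not the task's actual code:

```typescript
// Sketch: refuse to treat a run in which every chunk errored as
// "no promises found". Called after the chunk loop completes.
function assertChunksYieldedCandidates(
  chunkCount: number,
  candidates: unknown[],
  failedChunks: number,
): void {
  if (chunkCount > 0 && candidates.length === 0 && failedChunks === chunkCount) {
    // All chunks failed: surface an error so the document is marked
    // failed (retryable) instead of recording a false negative.
    throw new Error(
      `Extraction failed for all ${chunkCount} chunks; refusing to mark document as analysed`,
    );
  }
}
```

Note that zero candidates with at least one successful chunk remains a legitimate "no promises found" outcome.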


P2 — Security / Design Concerns

3. API keys stored as plaintext in database (src/globals/Settings.ts:183-190)

The apiKey field in providerCredentials is a standard type: "text" field stored in MongoDB without encryption. This means provider API keys are readable by anyone with database access, CMS admin access, or through Payload's REST/GraphQL APIs.

Consider using a password/secret field type, environment-variable-only configuration for production, or at minimum documenting this limitation prominently.
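An environment-variable-first resolution order would at least keep production keys out of the database. The function shape and the `PROVIDER_API_KEY` naming convention below are assumptions for illustration, not the project's actual settings code:

```typescript
// Sketch: prefer the provider's environment variable over any key stored
// in the Settings global, so production deployments can avoid persisting
// plaintext keys in MongoDB entirely.
function resolveApiKey(
  provider: string,
  storedKey?: string,
): string | undefined {
  // e.g. "openai" -> OPENAI_API_KEY, "azure-openai" -> AZURE_OPENAI_API_KEY
  const envName = `${provider.toUpperCase().replace(/-/g, "_")}_API_KEY`;
  return process.env[envName] ?? storedKey;
}
```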


4. Speculative model IDs in catalog (src/lib/ai/providerCatalog.ts:64-68)

The catalog includes model IDs that don't exist at the time of this PR (as of early 2026):

  • gpt-5, gpt-5-mini, gpt-5-nano — OpenAI has not released these
  • grok-4 — not available in xAI's API
  • claude-opus-4-1 — not a current Anthropic release

Including non-existent model IDs will fail silently at configuration time (they pass isProviderModelId validation since they match the format) and only fail at runtime with a cryptic provider error. Consider only including models that are currently GA, or add a comment explaining which entries are forward-looking.


5. as never cast bypasses type safety (src/lib/ai/providerRegistry.ts:297)

return createProviderRegistry(providers as never);

This completely disables TypeScript's type checking for the registry creation and hides any type incompatibilities between the provider objects and what createProviderRegistry expects. If the AI SDK's types change across a version bump, this will compile but break at runtime. The comment explaining why this cast is needed would help future maintainers.


P3 — Minor / UX Issues

6. All 14 providers instantiated per extraction (src/lib/ai/providerRegistry.ts:184-295)

buildLanguageModelRegistry eagerly instantiates every provider SDK client on every call, regardless of which provider is actually selected. For a background task this is minor overhead, but it also means credential resolution runs for all providers. Consider lazy instantiation or only building the provider that's actually needed.


7. Silent candidate truncation (src/tasks/extractPromises.ts:582-585)

When more than 220 candidates are found, the list is silently truncated (first 220 kept, rest dropped). There's a warn log, but no indication to the user or in the extraction result that promises may be missing. For documents with many distinct promises, this silently produces an incomplete extraction.
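One way to surface the truncation is to return a flag alongside the kept list and record it on the extraction result. The 220 cap is taken from the code under review; the function name is illustrative:

```typescript
// Sketch: cap the candidate list but report that (and how much) was
// dropped, so the extraction record can flag "promises may be missing".
function capCandidates<T>(
  candidates: T[],
  max = 220,
): { kept: T[]; truncated: boolean; dropped: number } {
  const kept = candidates.slice(0, max);
  return {
    kept,
    truncated: candidates.length > max,
    dropped: candidates.length - kept.length,
  };
}
```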


8. baseURL field hidden for all non-Ollama providers (src/globals/Settings.ts:202-206)

admin: {
  condition: (_, siblingData) => siblingData?.provider === "ollama",
  ...
},

OpenAI-compatible providers and Azure OpenAI deployments commonly need custom base URLs. The baseURL is stored in the schema and processed in the registry, but there's no way to configure it through the UI for providers other than Ollama. This is a hidden limitation — worth at least a comment or expanding the condition.
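Widening the condition is a one-line change; which provider ids actually need a custom base URL is an assumption here (Ollama, OpenAI-compatible gateways, Azure deployments), so treat this as a sketch:

```typescript
// Sketch: show the baseURL field for every provider that commonly needs a
// custom endpoint, not just Ollama. Provider ids are illustrative.
const BASE_URL_PROVIDERS = new Set(["ollama", "openai-compatible", "azure-openai"]);

const baseURLCondition = (
  _data: unknown,
  siblingData?: { provider?: string },
): boolean => BASE_URL_PROVIDERS.has(siblingData?.provider ?? "");
```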


9. Ollama URL normalization is fragile (src/lib/ai/providerRegistry.ts:165-173)

const sanitized = baseURL.replace(/\/+$/, "");
return sanitized.endsWith("/v1") ? sanitized : `${sanitized}/v1`;

This will incorrectly handle URLs like http://localhost:11434/api (produces http://localhost:11434/api/v1) or http://localhost:11434/v1/chat (already contains /v1 but at a non-terminal position). Using URL parsing would be more robust.
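A URL-parser-based version might look like this. The edge-case policy (dropping a trailing native-API `/api` segment, leaving an existing `/v1` path alone) is an assumption about what the registry wants, not established behaviour:

```typescript
// Sketch: normalise the Ollama base URL with the WHATWG URL parser instead
// of string suffix checks.
function normalizeOllamaBaseURL(baseURL: string): string {
  const url = new URL(baseURL);
  // Path segments, with empties from trailing slashes removed.
  const segments = url.pathname.split("/").filter(Boolean);
  // Assumption: a trailing "/api" is the native endpoint and should be
  // replaced by the OpenAI-compatible "/v1" path.
  if (segments[segments.length - 1] === "api") segments.pop();
  // Append "v1" only when the path does not already contain it.
  if (!segments.includes("v1")) segments.push("v1");
  url.pathname = "/" + segments.join("/");
  return url.toString().replace(/\/+$/, "");
}
```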


10. stepCountIs(MAX_TOOL_STEPS) stops normalization silently (src/tasks/extractPromises.ts:352)

If the model hits 40 steps without calling finalizeExtraction, normalization stops and whatever promises have been recorded are used — which could be a partial list. There's no detection that finalizeExtraction was never called vs. the model completing normally. Worth checking hasToolCall in the result to detect this.
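Detecting the missing finalize call is a simple scan over the result's steps. The `Step`/`ToolCall` shapes below are simplified stand-ins for the AI SDK's result types, so treat this as a sketch:

```typescript
// Sketch: after generation, verify that finalizeExtraction was actually
// invoked; if not, the step cap was hit and normalizedPromises is partial.
type ToolCall = { toolName: string };
type Step = { toolCalls: ToolCall[] };

function finalizeWasCalled(steps: Step[]): boolean {
  return steps.some((step) =>
    step.toolCalls.some((call) => call.toolName === "finalizeExtraction"),
  );
}
```

When this returns false, the task could retry with a higher cap or mark the document failed instead of fullyProcessed.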


Nits

  • src/lib/ai/providerCatalog.ts:35 - MODEL_ID_PATTERN uses /^[a-z0-9-]+:\S+$/i, but isProviderModelId also validates the provider against the catalog. The regex alone could match unknown-provider:some-model, which is harmless since the catalog check catches it, but could be tightened.
  • src/globals/Settings.ts:52 — The Settings export is a good opportunity to add JSDoc noting the legacy fields are intentionally hidden (not just deprecated in label).
  • No tests are included or referenced in the PR. The __extractPromisesTestUtils export suggests test infrastructure exists — shipping tests for the chunking, merging, and normalization logic would be valuable given the complexity of the two-pass approach.
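The regex nit above could be addressed by deriving the pattern from the catalog itself. The provider id list here is illustrative, not the project's actual catalog:

```typescript
// Sketch: build MODEL_ID_PATTERN from the known provider ids so the regex
// alone rejects unknown providers, instead of relying on a second check.
const KNOWN_PROVIDER_IDS = ["openai", "anthropic", "google", "ollama"];
const MODEL_ID_PATTERN = new RegExp(
  `^(?:${KNOWN_PROVIDER_IDS.join("|")}):\\S+$`,
  "i",
);
```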

Summary: The architecture is solid and the backward compatibility path is well thought out. The two P1 issues (data loss during force re-extract and false-negative on all-chunk-failure) should be addressed before merge. The security concern around plaintext API key storage and the speculative model IDs in the catalog are also worth resolving or at least explicitly acknowledging.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4b1afc646b


finalizeExtraction,
},
toolChoice: "required",
stopWhen: [hasToolCall("finalizeExtraction"), stepCountIs(MAX_TOOL_STEPS)],


P2: Guard against silent truncation when the tool step cap is hit

The normalization pass can silently return incomplete results because stepCountIs(MAX_TOOL_STEPS) stops generation after 40 steps even if finalizeExtraction was never called. On long documents (or when the model emits one recordPromise call per step), this returns a partial normalizedPromises list and the task still proceeds to mark the document as fully processed, which can undercount promises without surfacing a failure.




Development

Successfully merging this pull request may close these issues.

Promise Tracker | AI Promise extraction prompt logic fine-tuning | Split Multi-Promise Text into Unique Atomic Records
