Add MongoDB as the 11th supported database driver, enabling document database operations via the existing `Connector` interface.

- New driver at `packages/drivers/src/mongodb.ts` supporting:
  - Full MQL command set: find, aggregate, CRUD, indexes, collection mgmt
  - Cross-database queries via per-query `database` field
  - BSON type serialization (ObjectId, Decimal128, Long, UUID, Binary, Date)
  - Schema introspection via document sampling with field type inference
  - Connection string URI and host/port/user/password auth
  - `authorizedDatabases` fallback for restricted-privilege users
- Registration in driver index, config normalizer (with `mongo`/`mongodb` aliases), and connection registry
- CI: MongoDB 7.0 service in driver-e2e workflow with health checks
- 90 E2E tests including:
  - CRUD operations, aggregation pipelines, schema introspection
  - Cross-database operations, index management, collection lifecycle
  - Adversarial tests: deeply nested docs, special characters, heterogeneous collections, concurrent operations, large documents, numeric edge cases

Closes #480
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
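For context, a JSON MQL query accepted by a driver like this might be shaped as in the sketch below. This is hypothetical: the field names (`command`, `collection`, `database`, `filter`, `limit`) are assumptions based on the feature list above, not the driver's confirmed schema.

```typescript
// Hypothetical JSON MQL query shape for the MongoDB driver described above.
// Field names are illustrative assumptions, not the driver's actual schema.
interface MqlQuery {
  command: "find" | "aggregate" | "insertMany" | "distinct"
  collection: string
  database?: string // per-query cross-database override
  filter?: Record<string, unknown>
  pipeline?: Record<string, unknown>[]
  limit?: number
}

const query: MqlQuery = {
  command: "find",
  collection: "orders",
  database: "analytics", // overrides the connection's default database
  filter: { status: "shipped", total: { $gt: 100 } },
  limit: 50,
}

// The driver would receive this as a JSON string and parse/validate it.
const payload = JSON.stringify(query)
```

A per-query `database` field like this is what enables the cross-database operations covered by the E2E tests.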
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review.
⚠️ Caution: Review failed. An error occurred during the review process. Please try again later.

📝 Walkthrough

Adds a MongoDB driver and CI/test integration: new driver implementation (connect, schema introspection, JSON MQL execution), registry and normalization updates, optional dependency entry, and a comprehensive Docker-capable end-to-end test plus CI changes to run it.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App as Application
    participant Driver as MongoDB Driver
    participant MClient as MongoClient
    participant Server as MongoDB Server
    App->>Driver: connect(config)
    activate Driver
    Driver->>MClient: new MongoClient(uri, options)
    activate MClient
    MClient->>Server: TCP connect & auth
    Server-->>MClient: Connected
    MClient-->>Driver: client instance
    deactivate MClient
    Driver-->>App: Connector
    deactivate Driver
    rect rgba(200, 150, 255, 0.5)
        Note over App,Driver: Execute JSON MQL
        App->>Driver: execute(JSON query)
        activate Driver
        Driver->>Driver: parse & validate
        Driver->>MClient: call op (find/aggregate/insert/...)
        activate MClient
        MClient->>Server: run operation
        Server-->>MClient: cursor/data
        MClient-->>Driver: result
        deactivate MClient
        Driver->>Driver: serialize BSON -> rows
        Driver-->>App: ConnectorResult
        deactivate Driver
    end
    rect rgba(150, 200, 255, 0.5)
        Note over App,Driver: Introspection flow
        App->>Driver: listSchemas() / listTables() / describeTable()
        Driver->>MClient: admin/listDatabases / listCollections / find(limit=100)
        MClient->>Server: fetch metadata / sample docs
        Server-->>MClient: metadata / documents
        MClient-->>Driver: data
        Driver-->>App: schemas/tables/columns
    end
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Fixes from 6-model consensus review (Claude, GPT 5.2 Codex, Gemini 3.1 Pro, Kimi K2.5, MiniMax M2.5, GLM-5):

- Cap user-specified `limit` against `effectiveLimit` to prevent OOM from unbounded queries (flagged by Gemini)
- Add `ping` command for connection testing instead of querying `system.version`, which may not be accessible (flagged by Gemini, GLM-5)
- Add `e.code === 26` fallback for `dropCollection` error handling across MongoDB versions (flagged by GLM-5)
- Fix misleading docstring on `extractFields` that claimed dot-notation expansion (flagged by Gemini)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
🧹 Nitpick comments (2)
packages/drivers/src/normalize.ts (1)
76-85: Consider using `COMMON_ALIASES` spread for consistency.

Other drivers (Postgres, Redshift, MySQL, SQL Server, Oracle) spread `COMMON_ALIASES` to inherit the common `user` and `database` mappings. The MongoDB aliases define these inline, which duplicates the same mappings. This works correctly but is inconsistent with the established pattern.

♻️ Suggested refactor for consistency

```diff
 const MONGODB_ALIASES: AliasMap = {
-  user: ["username"],
-  database: ["dbname", "db"],
+  ...COMMON_ALIASES,
   connection_string: ["connectionString", "uri", "url"],
   auth_source: ["authSource"],
   replica_set: ["replicaSet"],
   direct_connection: ["directConnection"],
   connect_timeout: ["connectTimeoutMS"],
   server_selection_timeout: ["serverSelectionTimeoutMS"],
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/src/normalize.ts` around lines 76 - 85, MONGODB_ALIASES currently redeclares the common mappings for user and database; update it to spread COMMON_ALIASES into MONGODB_ALIASES (e.g., const MONGODB_ALIASES: AliasMap = { ...COMMON_ALIASES, /* mongo-specific keys */ }) and then include only MongoDB-specific aliases (connection_string, auth_source, replica_set, direct_connection, connect_timeout, server_selection_timeout), removing the duplicate user and database entries to match the other drivers' pattern.

packages/drivers/src/mongodb.ts (1)
328-362: Aggregate `$limit` detection could miss nested stages.

The check `lastStage && "$limit" in lastStage` only examines the last pipeline stage. If users have a `$limit` buried in the middle of the pipeline (intentionally wanting full results), the auto-appended `$limit` will still truncate. This is documented behavior (you append if the pipeline doesn't already end with one), so it's acceptable, but consider documenting that users should place their own `$limit` last if they want to control truncation precisely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/src/mongodb.ts` around lines 328 - 362, The current aggregate branch only checks the last pipeline stage (`lastStage` / `hasLimit`) for a $limit and can wrongly append a new $limit when one exists earlier; update the detection to scan the entire parsed.pipeline (e.g., replace the `hasLimit` logic with a check like `pipeline.some(stage => "$limit" in stage)`) so you only push `{ $limit: effectiveLimit + 1 }` when no stage in `parsed.pipeline` contains `$limit`; ensure you still preserve existing behavior of cloning `parsed.pipeline` into `pipeline` and use `coll.aggregate(pipeline).toArray()` and keep the rest of the logic unchanged.
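The whole-pipeline scan this prompt describes can be sketched as a small helper. Names here (`pipelineHasLimit`, `withAutoLimit`, `effectiveLimit`) are illustrative, not the driver's actual identifiers:

```typescript
type Stage = Record<string, unknown>

// True if any stage anywhere in the pipeline is a $limit, not just the last one.
function pipelineHasLimit(pipeline: Stage[]): boolean {
  return pipeline.some((stage) => "$limit" in stage)
}

// Auto-append a limit only when the user supplied none anywhere in the pipeline.
// effectiveLimit + 1 lets the caller detect truncation by fetching one extra row.
function withAutoLimit(pipeline: Stage[], effectiveLimit: number): Stage[] {
  const copy = [...pipeline]
  if (!pipelineHasLimit(copy)) copy.push({ $limit: effectiveLimit + 1 })
  return copy
}
```

With this check, a `$limit` placed mid-pipeline suppresses the auto-append instead of being silently overridden by a trailing one.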
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/drivers/package.json`:
- Around line 19-20: Update the mongodb dependency version in
packages/drivers/package.json from "mongodb": "^6.0.0" to "mongodb": "^7.0.0" so
the project can pull the latest stable 7.x releases; after changing the
dependency string, run your package manager to update the lockfile
(npm/yarn/pnpm install) and run the test suite/build to ensure no breaking API
changes affect code using the MongoClient or other mongodb-related symbols.
---
Nitpick comments:
In `@packages/drivers/src/mongodb.ts`:
- Around line 328-362: The current aggregate branch only checks the last
pipeline stage (`lastStage` / `hasLimit`) for a $limit and can wrongly append a
new $limit when one exists earlier; update the detection to scan the entire
parsed.pipeline (e.g., replace the `hasLimit` logic with a check like
`pipeline.some(stage => "$limit" in stage)`) so you only push `{ $limit:
effectiveLimit + 1 }` when no stage in `parsed.pipeline` contains `$limit`;
ensure you still preserve existing behavior of cloning `parsed.pipeline` into
`pipeline` and use `coll.aggregate(pipeline).toArray()` and keep the rest of the
logic unchanged.
In `@packages/drivers/src/normalize.ts`:
- Around line 76-85: MONGODB_ALIASES currently redeclares the common mappings
for user and database; update it to spread COMMON_ALIASES into MONGODB_ALIASES
(e.g., const MONGODB_ALIASES: AliasMap = { ...COMMON_ALIASES, /* mongo-specific
keys */ }) and then include only MongoDB-specific aliases (connection_string,
auth_source, replica_set, direct_connection, connect_timeout,
server_selection_timeout), removing the duplicate user and database entries to
match the other drivers' pattern.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: e11fc510-0761-4a43-919e-0c5ae0a1c567
⛔ Files ignored due to path filters (1)
`bun.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (7)
- .github/workflows/ci.yml
- packages/drivers/package.json
- packages/drivers/src/index.ts
- packages/drivers/src/mongodb.ts
- packages/drivers/src/normalize.ts
- packages/opencode/src/altimate/native/connections/registry.ts
- packages/opencode/test/altimate/drivers-mongodb-e2e.test.ts
```json
    "duckdb": "^1.0.0",
    "mongodb": "^6.0.0"
```
🧩 Analysis chain
🌐 Web query:
What is the latest stable version of the mongodb npm package?
💡 Result:
The latest stable version of the mongodb npm package is 7.1.1.
Citations:
- 1: https://www.npmjs.com/package/mongodb
- 2: https://www.npmjs.com/package/mongodb?activeTab=dependencies
- 3: https://www.npmjs.com/package/mongodb?activeTab=versions
Update mongodb package to latest stable version.
The mongodb ^6.0.0 constraint excludes the latest stable release (7.1.1). Update to ^7.0.0 to include security patches and bug fixes from the latest major version.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/drivers/package.json` around lines 19 - 20, Update the mongodb
dependency version in packages/drivers/package.json from "mongodb": "^6.0.0" to
"mongodb": "^7.0.0" so the project can pull the latest stable 7.x releases;
after changing the dependency string, run your package manager to update the
lockfile (npm/yarn/pnpm install) and run the test suite/build to ensure no
breaking API changes affect code using the MongoClient or other mongodb-related
symbols.
Actionable comments posted: 2
🧹 Nitpick comments (1)
packages/drivers/src/mongodb.ts (1)
191-194: Redundant logic in `serializeValue` — both branches produce identical output.

Lines 191-194 always call `JSON.stringify(val)` regardless of which branch is taken. The conditional serves no purpose.

♻️ Simplify to a single JSON.stringify call

```diff
 // Arrays and plain objects — JSON-serialize for tabular display
-if (Array.isArray(val) || typeof (val as any).toJSON !== "function") {
-  return JSON.stringify(val)
-}
 return JSON.stringify(val)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/src/mongodb.ts` around lines 191 - 194, The conditional in serializeValue is redundant because both branches call JSON.stringify(val); simplify by removing the if/else and returning JSON.stringify(val) directly from the serializeValue function (locate the serializeValue function and replace the conditional block starting with "if (Array.isArray(val) || typeof (val as any).toJSON !== \"function\")" with a single return JSON.stringify(val)).
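The simplified serialization this prompt describes can be sketched as below. This is illustrative only, not the driver's actual `serializeValue`: the real implementation also handles ObjectId, Decimal128, Long, UUID, and Binary, which are omitted here to keep the sketch dependency-free.

```typescript
// Illustrative sketch of BSON-to-cell serialization for tabular display;
// the real driver also handles ObjectId, Decimal128, Long, UUID, and Binary.
function serializeValue(val: unknown): string | number | boolean | null {
  if (val === null || val === undefined) return null
  if (typeof val === "string" || typeof val === "number" || typeof val === "boolean") return val
  if (val instanceof Date) return val.toISOString()
  // Arrays and plain objects: JSON-serialize for tabular display
  return JSON.stringify(val)
}
```

The point of the review finding is visible here: once primitives and dates are handled above, arrays and plain objects fall through to a single `JSON.stringify` with no branching needed.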
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/drivers/src/mongodb.ts`:
- Around line 382-395: The distinct branch returns raw BSON values from
coll.distinct without serialization; update the code in the "distinct" case to
pass each value through the existing serializeValue function before constructing
rows (e.g., when building limited.map((v) => [serializeValue(v)]) ), keeping the
truncated/limited logic and columns: [parsed.field] and row_count based on
limited.length unchanged so the response matches how find/aggregate return
serialized tabular values.
- Around line 336-370: The aggregate branch doesn't cap a user-specified $limit,
so a pipeline ending in { $limit: N } can return more than effectiveLimit and
risk OOM; update the pipeline before calling coll.aggregate by inspecting
parsed.pipeline's lastStage: if lastStage has "$limit", replace its value with
Math.min(userLimit, effectiveLimit + 1) (so truncated detection remains
correct), otherwise append { $limit: effectiveLimit + 1 } as currently done;
operate on the local pipeline array (the one created from parsed.pipeline) so
coll.aggregate(pipeline).toArray() uses the capped limit.
---
Nitpick comments:
In `@packages/drivers/src/mongodb.ts`:
- Around line 191-194: The conditional in serializeValue is redundant because
both branches call JSON.stringify(val); simplify by removing the if/else and
returning JSON.stringify(val) directly from the serializeValue function (locate
the serializeValue function and replace the conditional block starting with "if
(Array.isArray(val) || typeof (val as any).toJSON !== \"function\")" with a
single return JSON.stringify(val)).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: b676f8db-554c-452d-935d-599b0d84a98e
📒 Files selected for processing (2)
- packages/drivers/src/mongodb.ts
- packages/opencode/src/altimate/native/connections/registry.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/opencode/src/altimate/native/connections/registry.ts
```typescript
case "distinct": {
  if (!parsed.field) {
    throw new Error("distinct requires a 'field' string")
  }
  const values = await coll.distinct(parsed.field, parsed.filter ?? {})
  const truncated = values.length > effectiveLimit
  const limited = truncated ? values.slice(0, effectiveLimit) : values
  return {
    columns: [parsed.field],
    rows: limited.map((v: unknown) => [v]),
    row_count: limited.length,
    truncated,
  }
}
```
Distinct values are not serialized, unlike find and aggregate results.
Values returned by distinct may include BSON types (ObjectId, Decimal128, etc.), but they are returned as-is without passing through serializeValue. This is inconsistent with find and aggregate which serialize all values for tabular display.
🔧 Proposed fix to serialize distinct values
```diff
 const values = await coll.distinct(parsed.field, parsed.filter ?? {})
 const truncated = values.length > effectiveLimit
 const limited = truncated ? values.slice(0, effectiveLimit) : values
 return {
   columns: [parsed.field],
-  rows: limited.map((v: unknown) => [v]),
+  rows: limited.map((v: unknown) => [serializeValue(v)]),
   row_count: limited.length,
   truncated,
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
case "distinct": {
  if (!parsed.field) {
    throw new Error("distinct requires a 'field' string")
  }
  const values = await coll.distinct(parsed.field, parsed.filter ?? {})
  const truncated = values.length > effectiveLimit
  const limited = truncated ? values.slice(0, effectiveLimit) : values
  return {
    columns: [parsed.field],
    rows: limited.map((v: unknown) => [serializeValue(v)]),
    row_count: limited.length,
    truncated,
  }
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/drivers/src/mongodb.ts` around lines 382 - 395, The distinct branch
returns raw BSON values from coll.distinct without serialization; update the
code in the "distinct" case to pass each value through the existing
serializeValue function before constructing rows (e.g., when building
limited.map((v) => [serializeValue(v)]) ), keeping the truncated/limited logic
and columns: [parsed.field] and row_count based on limited.length unchanged so
the response matches how find/aggregate return serialized tabular values.
CodeRabbit review fixes:

- Use `...COMMON_ALIASES` spread in `MONGODB_ALIASES` for consistency with other drivers (normalize.ts)
- Scan entire aggregate pipeline for `$limit` instead of only last stage; also skip auto-limit for `$out`/`$merge` write pipelines
- Keep `mongodb` at `^6.0.0` (NOT upgrading to v7): v7 drops Node 16/18 support and has many breaking changes; v6.21.0 is actively maintained and supports MongoDB server 3.6-8.0

Additional review fixes:

- Include database name in URI when building from individual config fields for correct auth-source resolution (GLM-5)
- Add security comment about credential exposure in URI (Kimi K2.5)

Documentation updates:

- `docs/docs/drivers.md`: add MongoDB to support matrix (11 databases), install section, auth methods, auto-discovery
- `docs/docs/configure/warehouses.md`: add MongoDB configuration section with connection string and field-based examples, server compatibility
- `docs/docs/data-engineering/tools/warehouse-tools.md`: add MongoDB to Docker discovery and env var detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addressed all CodeRabbit findings in 56acab4.

Version upgrade: declined the suggested bump to `mongodb` ^7.0.0; the driver stays on `^6.0.0` because v7 drops Node 16/18 support and introduces breaking changes (see commit message).

Nitpicks addressed:

- Spread `...COMMON_ALIASES` in `MONGODB_ALIASES` for consistency with other drivers
- Scan the entire aggregate pipeline for `$limit` instead of only the last stage

Additional fixes in this commit:

- Include the database name in the URI built from individual config fields for correct auth-source resolution
- Add a security comment about credential exposure in the URI
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Cap user-specified `$limit` in aggregate pipelines against `effectiveLimit` to prevent OOM, matching the `find` command behavior
- Serialize `distinct` return values through `serializeValue()` for consistent BSON type handling across find/aggregate/distinct

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
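The capping behavior this commit describes can be sketched as follows. The names (`capPipelineLimit`, `effectiveLimit`) are illustrative; the `effectiveLimit + 1` mirrors the pattern of fetching one extra row so truncation can be detected.

```typescript
type Stage = Record<string, unknown>

// Cap a trailing user-supplied $limit at effectiveLimit + 1 so truncation
// detection (fetching one extra row) still works; append one if absent.
// Names are illustrative; the real driver's helpers may differ.
function capPipelineLimit(pipeline: Stage[], effectiveLimit: number): Stage[] {
  const copy = pipeline.map((stage) => ({ ...stage }))
  const last = copy[copy.length - 1]
  const lastLimit = last ? last["$limit"] : undefined
  if (last && typeof lastLimit === "number") {
    last["$limit"] = Math.min(lastLimit, effectiveLimit + 1)
  } else {
    copy.push({ $limit: effectiveLimit + 1 })
  }
  return copy
}
```

A user-supplied `{ $limit: 10000 }` is thus reduced to the server-side cap rather than being honored verbatim, while a small user limit passes through unchanged.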
MongoDB driver was added in PR #482 but its config alias normalization had zero unit test coverage. These tests ensure camelCase config keys (connectionString, uri, authSource, replicaSet, etc.) resolve correctly, preventing silent connection failures for users providing non-canonical config field names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
https://claude.ai/code/session_019D8HyMGHH3Pf43PddTqUw5
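The alias resolution these tests cover can be sketched generically. The alias table below mirrors names mentioned in the review thread but is an assumption about the actual contents of `normalize.ts`:

```typescript
type AliasMap = Record<string, string[]>

// Aliases shared across drivers plus MongoDB-specific ones. The entries
// mirror names mentioned in the review thread; treat them as illustrative.
const COMMON_ALIASES: AliasMap = {
  user: ["username"],
  database: ["dbname", "db"],
}

const MONGODB_ALIASES: AliasMap = {
  ...COMMON_ALIASES,
  connection_string: ["connectionString", "uri", "url"],
  auth_source: ["authSource"],
  replica_set: ["replicaSet"],
}

// Resolve a canonical config key: prefer the canonical name, then its aliases.
function resolve(config: Record<string, unknown>, canonical: string, aliases: AliasMap): unknown {
  if (canonical in config) return config[canonical]
  for (const alias of aliases[canonical] ?? []) {
    if (alias in config) return config[alias]
  }
  return undefined
}
```

Tests of this shape are what catch a user passing `uri` or `username` and silently getting an unconfigured connection.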
Cover two untested code paths in the connection registry:

1. ALTIMATE_CODE_CONN_* env var parsing (CI/Docker users)
2. MongoDB-specific detectAuthMethod branches added in #482

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
https://claude.ai/code/session_015egWH6W5HdaVuMysv9tFHq
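The `ALTIMATE_CODE_CONN_*` parsing could look like the sketch below. The assumption that each variable holds a JSON connection payload is mine; the real registry format may differ.

```typescript
// Hypothetical parser for ALTIMATE_CODE_CONN_* environment variables.
// Assumes each variable holds a JSON connection config; the real format
// in the registry may differ.
function parseConnEnv(env: Record<string, string | undefined>): Record<string, unknown>[] {
  const connections: Record<string, unknown>[] = []
  for (const [key, value] of Object.entries(env)) {
    if (!key.startsWith("ALTIMATE_CODE_CONN_") || !value) continue
    try {
      connections.push(JSON.parse(value))
    } catch {
      // Skip malformed entries rather than failing startup.
    }
  }
  return connections
}
```

Skipping malformed entries (rather than throwing) keeps one bad env var from taking down every other configured connection in CI.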
MongoDB driver (PR #482) had zero unit tests for config alias normalization (uri, url, authSource, etc.) and auth method detection in telemetry. These tests prevent silent config failures for MongoDB users using common alias field names and ensure telemetry correctly categorizes auth methods for all driver types, including the new MongoDB and file-based drivers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add 9 driver import checks to verify-install.sh: pg, snowflake-sdk, mysql2, mssql, duckdb, mongodb, bigquery, databricks, oracledb (#295)
- Install all driver packages in Dockerfile so tests don't skip
- Add MongoDB service to docker-compose.yml (PR #482 added the driver)
- Add mongodb smoke test alongside existing postgres/snowflake tests
- Add mongodb E2E step to ci-local.sh full mode
- Bump Docker healthy count from 4 to 5 (includes mongodb)
- Add driver-change PR trigger to generate.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
What does this PR do?
Adds MongoDB as the 11th supported database driver, enabling document database operations via the existing `Connector` interface. This is needed for the DataAgentBench (DAB) benchmark, where 2 of 12 datasets require MongoDB.

Changes:

- New driver at `packages/drivers/src/mongodb.ts` with full MQL command set (find, aggregate, CRUD, indexes, collection management)
- Cross-database queries via per-query `database` field override
- `authorizedDatabases` fallback for restricted-privilege users
- Registration in driver index, config normalizer (with `mongo`/`mongodb` aliases), and connection registry

Type of change
Issue for this PR
Closes #480
Closes #481
How did you verify your code works?
Checklist
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests
CI