Skip to content

harden: validate view database names in mapreduce cleanup#9215

Open
l3tchupkt wants to merge 4 commits intoapache:masterfrom
l3tchupkt:harden-view-cleanup
Open

harden: validate view database names in mapreduce cleanup#9215
l3tchupkt wants to merge 4 commits intoapache:masterfrom
l3tchupkt:harden-view-cleanup

Conversation

@l3tchupkt
Copy link
Copy Markdown

Overview

This PR adds validation for database names derived from info.db_name and used in mapreduce view cleanup operations.

In environments where database names may originate from external input (e.g., multi-tenant applications constructing database names from user identifiers), the current implementation allows potentially unsafe database names to flow through PouchDB's internal metadata system and eventually be used in destructive operations.

Changes:

  1. createView.js: Added validation before constructing depDbName from info.db_name. Invalid database names are logged with a warning and skipped.

  2. index.js (localViewCleanup): Added validation for view database names before processing them during cleanup. Invalid entries are logged with a warning and skipped.

Both validations use a regex pattern ^[^<>:"|?*]+$ that blocks dangerous characters commonly used in path traversal attacks while still allowing valid paths like ./data/mydb.

This hardening improves safety without introducing breaking changes - existing code continues to work, but potentially unsafe database names are now handled defensively.

Testing recommendations

  1. Existing functionality: Create views and run view cleanup on databases with standard names (alphanumeric, underscores, paths like ./data/mydb). Verify normal operation continues.

  2. Edge cases: Test with database names containing the blocked characters <>:"|?*. These should trigger warnings and be skipped rather than processed.

  3. Multi-tenant scenario: In a server environment, verify that database names originating from user input are handled safely during view operations.

Related Issues or Pull Requests

N/A - This is a proactive hardening improvement based on security review feedback.

Checklist

  • I am not a bot
  • This is my own work, I did not use AI, LLM's or similar technology for code or docs generation
  • Code is written and works correctly
  • Changes are covered by tests
  • Documentation changes were made in the docs folder

@janl
Copy link
Copy Markdown
Member

janl commented Apr 3, 2026

Heya, thanks for this.

I think the correct fix for this is not allowing the creation of such databases in the first place, i.e. validate during new PouchDB() at a point where something can be done about this.

Also could you clarify what the regex is designed to match?

@l3tchupkt
Copy link
Copy Markdown
Author

@janl Thanks for the feedback.

I agree that validating at new PouchDB() would be the ideal place long-term.

In this PR, I added validation at viewCleanup() as a defense-in-depth measure, since it’s the point where the value is used in a destructive operation.

Regarding the regex:

  • ^[^<>:"|?*]+$ filters out clearly unsafe characters (common across file/path handling)
  • ^[^<>:"|?*]+-mrview-[a-f0-9]{32}$ ensures only expected view DB names are processed during cleanup

Happy to adjust the scope (e.g. limit validation to cleanup only or refine the regex) based on your preference.

Best regards,
Lakshmikanthan K

@alxndrsn
Copy link
Copy Markdown
Contributor

alxndrsn commented Apr 4, 2026

Both validations use a regex pattern ^[^<>:"|?*]+$ that blocks dangerous characters commonly used in path traversal attacks while still allowing valid paths like ./data/mydb.

Is this applicable to all adapters?

@l3tchupkt
Copy link
Copy Markdown
Author

l3tchupkt commented Apr 4, 2026 via email

@alxndrsn
Copy link
Copy Markdown
Contributor

alxndrsn commented Apr 4, 2026

@l3tchupkt can you include a test that fails on master that this PR will fix?

@janl
Copy link
Copy Markdown
Member

janl commented Apr 4, 2026

Excuse the detour, but I want to put all the pieces on the board to make sure we solve this correctly:

  1. I think this is an issue worth fixing.
  2. This only really applies to non-browser environments, that currently* means the LevelDB pouchdb-adapter-leveldb adapter is affected (the -memory and -localstorage variants use in-browser and in-memory backends that are not affected. So a fix should probably live in pouchdb-adapter-leveldb.
  3. Next let’s consider what we are trying to say here: there are certain database names that should not be allowed. As usual for PouchDB, we can look at CouchDB for inspiration and CouchDB has a definition of a valid database name, it even gives a regex: ^[a-z][a-z0-9_$()+/-]*$. This regex is designed to only allow characters that are allowed as filenames on all file systems. This definition has not changed since it was first introduced in CouchDB at around (IIRC) 2007 or 2008, so I am happy to say it survived real world contact.
  4. But PouchDB-in-Node.js scenarios use the name in one additional way that differs from CouchDB: specification of database directory. By default your LevelDB databases go into ./ and you can override this with a ./data/ prefix for any name to put all databases into that subdir. This is a feature we should keep supporting, but without overloading the database name, and thus allowing file-system-traversal-relevant characters.
  5. CouchDB also allows the configuration of a database directory, but that happens in CouchDB’s configuration, not by inferring a path from the database name (there is an exception for / in db names that do create subdirs, but those never allow upwards traversal), but by having a database_dir configuration value that is set when CouchDB starts.
  6. Given all this, I suggest we do the following:
    1. inside pouchdb-adapter-leveldb during database initialisation, we only allow the regex as defined by CouchDB for valid database names.
    2. we also introduce a new option to be passed to new PouchDB(name, options) called databaseDir. The name and databaseDir are then path.join()ed before handing them to LevelDB or using them for separate cleanup operations.
    3. this is a backwards-incompatible or breaking change. We try to avoid these, but I think it could be warranted here. I could be convinced to allow a fallback with yet another option, so folks can just enable the old behaviours without a semantic port to the new system that might cause issues. I’m a bit on the fence about about it. Maybe the answer is to release the fix as opt-in in a bugfix release and then flip it to opt-out in the next major release. But please discuss.

*we’ll keep this in mind for the ongoing SQLite adapter work, cc @Realtin @AlbaHerrerias.

- Add strict whitelist validation for database names (alphanumeric, underscore, hyphen, max 100 chars)
- Prevent path traversal and injection attacks in view operations
- Apply validation at both createView and localViewCleanup
- Non-breaking: log warnings and skip invalid entries
- Add comprehensive test coverage
@l3tchupkt l3tchupkt force-pushed the harden-view-cleanup branch from 6b9d31b to 0608b33 Compare April 4, 2026 09:47
@l3tchupkt
Copy link
Copy Markdown
Author

@janl Thanks for the detailed explanation, that clarifies the scope a lot.

I see that the issue is specific to filesystem-backed adapters, so it makes sense to move the validation into pouchdb-adapter-leveldb rather than handling it in mapreduce.

I’ll refactor the fix accordingly by:

  • aligning validation with CouchDB’s database name rules
  • applying it during LevelDB adapter initialization
  • separating filesystem paths via a databaseDir option instead of overloading the name

I’ll update the PR shortly with these changes.

- Add strict validation using /^[a-z][a-z0-9_+/-]*$/
- Prevent path traversal and unsafe database identifiers
- Fix callback handling to avoid runtime crashes
- Separate database name from filesystem path via databaseDir
- Add adapter-level tests for validation coverage
@l3tchupkt
Copy link
Copy Markdown
Author

@janl @alxndrsn

Summary of Changes

This PR adds secure database name validation to the LevelDB adapter, aligned with CouchDB rules.

Security

  • Enforces regex: ^[a-z][a-z0-9_$()+/-]*$
  • Rejects path traversal patterns and unsafe characters
  • Prevents misuse of database names in filesystem operations

Implementation

  • Validation occurs during adapter initialization
  • Ensures callback is always a function
  • Adds databaseDir option for separating storage path from database name
  • Uses dbPath for LevelDB while keeping name for cache identity

Tests

  • Added adapter-level validation tests
  • Covers invalid names, traversal patterns, and valid cases
  • All tests passing

Breaking Change
Database names must now follow CouchDB conventions and invalid names are rejected at initialization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants