Skip to content

Conversation

@BYK
Copy link
Member

@BYK BYK commented Feb 3, 2026

Summary

This PR implements project root detection and a language-agnostic DSN scanner for automatic Sentry DSN discovery. The CLI now walks up from CWD to find the project root, then scans from there with proper depth limiting and priority ordering.

Problem

  1. Running sentry issue list from a subdirectory (e.g., ~/project/src/lib/) would miss .env files at ~/project/.env
  2. Language-specific detectors only matched DSNs in specific contexts like Sentry.init({ dsn: "..." }) but missed DSNs assigned to variables
  3. No depth limiting meant slow scans in large projects
  4. Complex codebase with 7 language-specific detector files

Solution

Project Root Detection:

  1. Walk UP from CWD checking each directory for markers
  2. At each level, check .env files for SENTRY_DSN (stops walk-up but doesn't short-circuit detection)
  3. Stop at VCS/CI markers (.git, .github, etc.) - definitive repo root
  4. Language markers (package.json, pyproject.toml, etc.) - closest to CWD wins
  5. Scan from project root with 2-level depth limit

Language-Agnostic Code Scanner:

  • Greps for DSN URL pattern directly using regex: https://KEY@HOST/PROJECT_ID
  • Filters out commented lines using common prefixes (//, #, --, <!--, etc.)
  • Respects .gitignore via the ignore package
  • Validates DSN hosts: SENTRY_URL set → only that host; not set → only *.sentry.io
  • Scans concurrently with p-limit for performance
  • Skips large files (>256KB) and known non-source directories

DSN Priority Order:

code > env_file > env_var

Changes

New Files:

  • src/lib/dsn/project-root.ts - Project root detection with Sentry instrumentation
  • src/lib/dsn/code-scanner.ts - Language-agnostic DSN scanner
  • test/lib/dsn/project-root.test.ts - 34 tests for project root detection
  • test/lib/dsn/code-scanner.test.ts - 29 tests for code scanner

Modified:

  • src/lib/dsn/detector.ts - Uses project root and new scanner with correct priority
  • package.json - Added ignore and p-limit dependencies

Deleted (replaced by code-scanner.ts):

  • src/lib/dsn/languages/*.ts (8 files: go, java, javascript, php, python, ruby, index, types)
  • test/lib/dsn/languages/*.test.ts (6 test files)

Key Features

Feature Implementation
Project root detection Walk up from CWD, stop at VCS/CI markers
DSN extraction Regex-based, filters commented lines
Host validation SaaS-only or self-hosted-only based on SENTRY_URL
Gitignore support Uses ignore package with built-in skip dirs
Depth limiting Max 2 levels from project root
Concurrency 50 parallel file reads via p-limit
Observability Sentry spans + metrics for scan stats

Testing

bun test test/lib/dsn/           # 89 tests pass
bun run typecheck                # Pass
bun run lint                     # Pass

Walk up from CWD to find project root before scanning for DSNs:
- Check .env files for SENTRY_DSN during walk-up (immediate return if found)
- Stop at VCS/CI markers (.git, .github, etc.) as definitive repo root
- Use language markers (package.json, pyproject.toml, etc.) as fallback
- Limit scan depth to 2 levels from project root for performance

This ensures DSN detection works regardless of which subdirectory
the user runs the CLI from within a project.
@github-actions
Copy link

github-actions bot commented Feb 3, 2026

Semver Impact of This PR

🟡 Minor (new features)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


New Features ✨

  • (dsn) Add project root detection for automatic DSN discovery by BYK in #159

Bug Fixes 🐛

  • Added nullable in substatus's zod validation by MathurAditya724 in #157

🤖 This preview updates automatically when you update the PR.

BYK added 3 commits February 3, 2026 14:22
…anner

- Add language-agnostic code-scanner.ts that greps for DSN URL patterns
- Filter commented lines using common prefixes (// # -- <!-- etc.)
- Respect .gitignore via 'ignore' package
- Scan concurrently with p-limit for performance
- Remove 8 language-specific detector files and 6 test files
- Add 28 new tests for the code scanner
Previously, finding a DSN in .env during walk-up would return immediately,
bypassing code scanning. Now we always do a full scan with correct priority:
code > env_file > env_var.

The .env DSN still stops the walk-up (determines project root) but doesn't
short-circuit detection.
- Optimize extractDsnsFromContent: extract matches first, then check line context
- Add limit parameter for early exit in first-DSN extraction
- Fix host validation: SENTRY_URL set -> only that host; not set -> only *.sentry.io
- Use path.extname() instead of manual lastIndexOf
- Use recursive readdir + incorporate ALWAYS_SKIP_DIRS into gitignore instance
- Document file size check placement (avoids extra stat() calls)
- Simplify createDetectedDsn usage with .filter(Boolean)
- Simplify deduplication with Map<string, DetectedDsn>
- Add Sentry.metrics for filesScanned/filesCollected/dsnsFound
@BYK BYK marked this pull request as ready for review February 3, 2026 15:31
- Use regex pattern for path separators to support Windows
- Fix walkUpDirectories to check starting dir at stop boundary
- Remove unnecessary export from checkEnvForDsn
- Clean up unused exports in dsn/index.ts
@BYK
Copy link
Member Author

BYK commented Feb 3, 2026

Addressed all review comments

I have addressed all the review comments in the latest commits. Here is a summary:

Code changes made:

  1. Priority order in detectAllDsns - Fixed! Code DSNs now have priority over env file DSNs found during walk-up. Updated the test to match the documented priority order (code > env_file > env_var).

  2. Stop boundary bug in walkUpDirectories - Fixed! Added explicit break when currentDir === stopBoundary BEFORE moving to parent, so we no longer traverse past the home directory.

  3. Duplicate ENV_FILES constant - Fixed! Removed from project-root.ts and now imports from env-file.ts.

  4. Windows path normalization - Fixed! Added relativePath.replaceAll("\\", "/") before calling ig.ignores() since the ignore package requires forward slashes.

Previously addressed (already in the code):

  • Inefficient iteration - Already uses single-pass matchAll() with surrounding line lookup per match
  • Limit parameter - Already has limit parameter for early exit
  • Hardcoded sentry.io - Already uses DEFAULT_SENTRY_HOST from constants and handles SENTRY_URL properly
  • path.extname - Already uses extname() from node:path
  • recursive readdir - Already uses recursive: true
  • Scanning logic order - Already correctly ordered (non-file → skip, depth check, ignore check, extension check)
  • filesScanned metric - Already emits Sentry.metrics.distribution("dsn.files_scanned", ...)

Design decisions (no change needed):

  1. Host check at scanning phase vs extraction: The current approach validates hosts in isValidDsnHost() after regex matching. Moving this earlier would require parsing every potential DSN URL upfront just to check the host, which adds overhead. The current flow is: regex match → check surrounding line for comments → parse and validate host. This is efficient because we only parse valid-looking DSNs.

  2. Map vs Set for deduplication: We use Map<string, DetectedDsn> because we need to store the full DetectedDsn object (with source, path, etc.) keyed by the raw DSN string. A Set<string> would only store the DSN strings, losing the metadata. We could use Set<DetectedDsn> but that would require custom equality logic since objects are compared by reference.

- Fix priority order in detectAllDsns: code DSNs now have priority
- Fix walkUpDirectories stop boundary: check before moving to parent
- Deduplicate ENV_FILES: import from env-file.ts instead of redefining
- Normalize Windows paths: convert backslashes for ignore package
- Update test to match correct priority order (code > env_file > env_var)
BYK added 2 commits February 3, 2026 19:45
When SENTRY_URL is set to an invalid URL, the code scanner was silently
failing to detect any DSNs because getExpectedHost() returned null.

Now it:
1. Logs a warning (once per process) about the invalid URL
2. Falls back to SaaS behavior (accepting *.sentry.io DSNs)

This is consistent with how other parts of the codebase handle invalid
SENTRY_URL values (they fall back to the default SaaS URL).
The relativePath was being normalized for ig.ignores() but the original
backslash path was pushed to files array. This caused inferPackagePath()
to fail on Windows since it splits by "/" only.

Now we normalize the path immediately after creation, ensuring:
1. The ignore package gets forward-slash paths
2. inferPackagePath() receives forward-slash paths
3. DetectedDsn.sourcePath is consistent across platforms
BYK added 2 commits February 3, 2026 20:01
The function was exported but never used in production code - only in tests.
It wasn't part of the public API (not re-exported from index.ts).
The main detection algorithm uses findProjectRoot which handles project
root detection differently, making this function redundant.
The verifyCachedDsn() function was returning null for source='env',
causing a full directory scan on every invocation when the DSN came
from the SENTRY_DSN environment variable.

Now it properly verifies by checking if the env var still contains
the same DSN value, providing the expected O(1) cache hit performance
instead of O(n) directory scanning.

This is especially important for CI/CD, Docker, and Kubernetes
environments where SENTRY_DSN env var is the primary configuration method.
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

1. Cache verification now checks for higher-priority sources:
   - When cached DSN is from env var, checks for code DSNs first
   - When cached DSN is from env file, checks for code DSNs first
   - This ensures adding a code DSN after caching correctly updates priority

2. Fixed module-level state test pollution:
   - Exported _resetInvalidSentryUrlWarning() for testing
   - Tests now reset warning state in afterEach to prevent order dependencies

Added tests for both fixes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants