@richhankins
Contributor

Summary

Simplifies the ctxc index command to accept URLs directly, eliminating the need for verbose source-specific flags.

Before

ctxc index github --owner augmentcode --repo context-connectors -i context-connectors
ctxc index gitlab --project group/project -i my-project
ctxc index website --url https://docs.example.com -i my-docs

After

ctxc index https://github.com/augmentcode/context-connectors
ctxc index https://gitlab.com/group/project
ctxc index https://docs.example.com

Changes

1. URL Parser Module (src/core/url-parser.ts)

  • New parseSourceUrl() function that auto-detects source type from URLs
  • Supports GitHub, GitLab, Bitbucket, and website URLs
  • Extracts owner/repo/project from URL paths
  • Detects branch/tag from URLs like /tree/main
  • 19 unit tests
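
The detection described above could look roughly like the following. This is a minimal sketch, not the code in src/core/url-parser.ts: the names parseSourceUrlSketch and ParsedUrlSketch, the field names, and the use of the hostname as the website index name are all assumptions made for illustration, and only the GitHub and website branches are shown.

```typescript
// Hypothetical result shape; the real ParsedUrl type may differ.
type SourceType = "github" | "gitlab" | "bitbucket" | "website";

interface ParsedUrlSketch {
  type: SourceType;
  owner?: string;
  repo?: string;
  ref?: string;
  defaultIndexName: string;
}

function parseSourceUrlSketch(input: string): ParsedUrlSketch {
  const url = new URL(input); // throws a TypeError on an invalid URL
  const hostname = url.hostname.toLowerCase();
  const pathParts = url.pathname.split("/").filter(Boolean);

  if (hostname === "github.com" && pathParts.length >= 2) {
    const owner = pathParts[0];
    const repo = pathParts[1];
    const result: ParsedUrlSketch = {
      type: "github",
      owner,
      repo,
      defaultIndexName: repo,
    };
    // /owner/repo/tree/<ref>: take the single segment after "tree".
    // Branch names containing "/" are not handled by this sketch.
    if (pathParts[2] === "tree" && pathParts[3]) {
      result.ref = pathParts[3];
    }
    return result;
  }

  // Anything unrecognized falls back to the website source.
  // Using the hostname as the index name is an assumption.
  return { type: "website", defaultIndexName: hostname };
}
```

With this sketch, https://github.com/augmentcode/context-connectors/tree/develop is classified as a GitHub source with ref develop, while https://docs.python.org/3/ falls through to the website branch.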

2. CLI URL Mode (src/bin/cmd-index.ts)

  • New url subcommand: ctxc index url <url>
  • Auto-detects source type and creates appropriate indexer
  • Defaults index name to repo name (can override with -i)
  • Supports --ref to override branch/tag
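
The override behavior in the list above amounts to "CLI flag wins, URL-derived value is the fallback". A minimal sketch of that precedence follows; resolveIndexSettings and both interfaces are hypothetical names, not part of src/bin/cmd-index.ts.

```typescript
// Hypothetical option and parse-result shapes, for illustration only.
interface UrlCommandOptions {
  index?: string; // value of -i/--index, if given
  ref?: string;   // value of --ref, if given
}

interface ParsedDefaults {
  ref?: string;             // branch/tag detected in the URL, if any
  defaultIndexName: string; // e.g. the repo name
}

function resolveIndexSettings(
  parsed: ParsedDefaults,
  opts: UrlCommandOptions
): { indexName: string; ref?: string } {
  const settings: { indexName: string; ref?: string } = {
    // -i overrides the name derived from the URL
    indexName: opts.index ?? parsed.defaultIndexName,
  };
  // --ref overrides a ref detected in the URL path
  const ref = opts.ref ?? parsed.ref;
  if (ref !== undefined) {
    settings.ref = ref;
  }
  return settings;
}
```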

3. Direct URL Syntax (src/bin/index.ts)

  • Pre-parse argument rewriting allows ctxc index <url> without the url subcommand
  • Transparently inserts url when a URL is detected
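
The rewriting step can be sketched as a pure function over the argument vector. This is an illustration under assumptions, not the code in src/bin/index.ts: rewriteIndexArgv is a hypothetical name, and treating only http:// and https:// prefixes as URLs is an assumption.

```typescript
// Insert "url" after "index" when the next argument looks like a URL.
// Returns the input unchanged in every other case.
function rewriteIndexArgv(argv: string[]): string[] {
  const indexIdx = argv.indexOf("index");
  if (indexIdx === -1) return argv;

  const next = argv[indexIdx + 1];
  if (next === undefined || !/^https?:\/\//i.test(next)) {
    return argv; // e.g. an existing subcommand such as "github"
  }

  return [
    ...argv.slice(0, indexIdx + 1),
    "url",
    ...argv.slice(indexIdx + 1),
  ];
}
```

Taking the vector as a parameter, rather than mutating process.argv in place, keeps the rewrite easy to unit test; the caller would pass the result to the argument parser.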

Backward Compatibility

✅ All existing subcommands work unchanged:

ctxc index github --owner augmentcode --repo context-connectors  # Still works

Testing

  • 182 tests pass (19 new for URL parser)
  • Manually verified URL detection and indexing

Examples

# GitHub
ctxc index https://github.com/augmentcode/context-connectors
ctxc index https://github.com/augmentcode/context-connectors/tree/develop

# GitLab (including self-hosted)
ctxc index https://gitlab.com/group/project
ctxc index https://gitlab.mycompany.com/team/project

# Bitbucket
ctxc index https://bitbucket.org/workspace/repo

# Website
ctxc index https://docs.python.org/3/

# With options
ctxc index https://github.com/owner/repo -i custom-name --ref v2.0

Pull Request opened by Augment Code with guidance from the PR author

- Parse GitHub URLs (owner/repo, tree/branch, commit/sha)
- Parse GitLab URLs (project path, subgroups, /-/tree/branch)
- Parse Bitbucket URLs (workspace/repo, src/branch, branch/name)
- Fallback to website source for unknown URLs
- Extract default index names from URLs
- Support self-hosted GitLab and Bitbucket instances
- Export parseSourceUrl and ParsedUrl from @augmentcode/context-connectors/core
- Add comprehensive unit tests (19 test cases)

Agent-Id: agent-8394bd07-7a81-41d0-ac95-1ca62623e6fb
Adds the 'ctxc index url <url>' command that auto-detects source type
(GitHub, GitLab, Bitbucket, or website) from the URL and creates
the appropriate source.

Features:
- Parses URL using parseSourceUrl() to determine source type
- Supports --ref option to override URL-detected branch/tag
- Supports -i/--index option to override default index name
- Supports --store and --store-path options
- Default index name derived from repo/project name
- Graceful error handling for invalid URLs

Agent-Id: agent-c9423996-94bb-4ab3-8311-ca0cc822da14
Adds pre-parse argument rewriting that auto-detects when a URL is passed
directly to 'ctxc index' and transparently inserts the 'url' subcommand.

Before: ctxc index url https://github.com/owner/repo
After:  ctxc index https://github.com/owner/repo

Both syntaxes now work. Existing subcommands (github, gitlab, etc.) are
unchanged and continue to work.

Agent-Id: agent-ce81a04d-72f2-4289-8eb7-c3074d7d8030
@richhankins
Contributor Author

augment review

@augment-app-staging

🤖 Augment PR Summary

Summary: This PR simplifies the indexing CLI by allowing users to pass a repository/website URL directly and having the tool auto-detect the source type.

Changes:

  • Added parseSourceUrl() in src/core/url-parser.ts to classify GitHub/GitLab/Bitbucket vs website URLs and extract source configs (including URL-derived refs).
  • Introduced ctxc index url <url> which instantiates the appropriate Source based on the parsed URL, with --ref override support.
  • Added argv rewriting so ctxc index <url> transparently maps to ctxc index url <url>.
  • Re-exported the parser + types from src/core/index.ts.
  • Added a Vitest suite covering common URL patterns across the supported providers.

Technical Notes: Self-hosted GitLab/Bitbucket URLs are supported via baseUrl, and URL-derived refs (e.g., /tree/<ref>) can be overridden by CLI flags.

@augment-app-staging augment-app-staging bot left a comment

Review completed. 3 suggestions posted.

}

const owner = pathParts[0];
const repo = pathParts[1];

URLs pasted from clone endpoints often end with .git; currently repo/projectName/defaultIndexName will include the suffix, which likely breaks GitHub/GitLab/Bitbucket API paths and yields odd default index keys. Consider stripping a trailing .git from repo/project segments when parsing.

Other Locations
  • src/core/url-parser.ts:125
  • src/core/url-parser.ts:152
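
One way to apply this suggestion is a small helper run over the repo or project segment before it is used for API paths or the default index name. A minimal sketch; stripGitSuffix is a hypothetical name, not a function in the PR.

```typescript
// Remove a single trailing ".git" from a path segment, if present.
function stripGitSuffix(segment: string): string {
  return segment.endsWith(".git") ? segment.slice(0, -".git".length) : segment;
}
```

Applied at each of the listed locations, a clone-style URL ending in context-connectors.git would produce the same repo name as the browser URL.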

}

// GitLab (gitlab.com or hostname contains "gitlab")
if (hostname === "gitlab.com" || hostname.includes("gitlab")) {

Using hostname.includes("gitlab") / hostname.includes("bitbucket") can misclassify unrelated websites whose domain happens to contain those substrings and then fail parsing instead of falling back to website. Consider a more conservative self-hosted detection strategy to reduce false positives.
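
One possible conservative strategy is to require "gitlab" as a whole DNS label rather than a substring. The sketch below shows only that one option; looksLikeGitLabHost is a hypothetical name, and the trade-off is that self-hosted instances on hosts without such a label (for example git.mycompany.com) would no longer be auto-detected and would need the explicit gitlab subcommand.

```typescript
// Match gitlab.com, or any host with "gitlab" as a full dot-separated label.
function looksLikeGitLabHost(hostname: string): boolean {
  const host = hostname.toLowerCase();
  if (host === "gitlab.com") return true;
  // "gitlab.mycompany.com" matches; "notgitlab.example.com" does not.
  return host.split(".").includes("gitlab");
}
```

The same check could be written for Bitbucket by swapping the label and the canonical host.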

program.parse();
// Auto-detect URL mode: ctxc index <url> -> ctxc index url <url>
// This allows users to skip the 'url' subcommand when providing a URL directly
const indexIdx = process.argv.indexOf("index");

The URL auto-rewrite only triggers when the URL is the first argument after index, so ctxc index -i myidx https://… won’t be rewritten and will likely error. Is that limitation intentional, or should the rewrite scan forward for the first non-option arg?
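
If scanning forward is the desired behavior, the rewrite has to know which options consume a value, so that the value is not mistaken for the first positional argument. The sketch below is one way to do that under stated assumptions: rewriteIndexArgvScanning is a hypothetical name, and the VALUE_OPTIONS set assumes that -i/--index, --ref, --store, and --store-path each take exactly one value.

```typescript
// Assumed set of options that are followed by a value.
const VALUE_OPTIONS = new Set(["-i", "--index", "--ref", "--store", "--store-path"]);

function rewriteIndexArgvScanning(argv: string[]): string[] {
  const indexIdx = argv.indexOf("index");
  if (indexIdx === -1) return argv;

  for (let i = indexIdx + 1; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--") break; // stop at the end-of-options marker
    if (VALUE_OPTIONS.has(arg)) {
      i++; // skip the option's value as well
      continue;
    }
    if (arg.startsWith("-")) continue; // boolean flag or --opt=value form

    // First positional argument: rewrite only if it looks like a URL.
    if (/^https?:\/\//i.test(arg)) {
      return [
        ...argv.slice(0, indexIdx + 1),
        "url",
        ...argv.slice(indexIdx + 1),
      ];
    }
    return argv; // an existing subcommand such as "github"
  }
  return argv;
}
```

With this version, index -i myidx followed by a URL becomes index url -i myidx followed by the same URL, which relies on the argument parser accepting options before the positional URL.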
