feat: Add vcspull import command for remote repository discovery #510
why: Enable importing repositories from GitHub, GitLab, Codeberg/Gitea/Forgejo, and AWS CodeCommit into vcspull configuration.
what:
- Add base.py with RemoteRepo dataclass, ImportOptions, ImportMode enum
- Add HTTPClient for stdlib-only HTTP requests (urllib)
- Add error hierarchy: AuthenticationError, RateLimitError, NotFoundError, etc.
- Add GitHubImporter with user/org/search modes
- Add GitLabImporter with group/search support (auth required for search)
- Add GiteaImporter supporting Codeberg, Gitea, Forgejo instances
- Add CodeCommitImporter using AWS CLI subprocess calls
- Add filter_repo() for client-side filtering by language, topics, stars

why: Allow users to import repositories from remote services directly into their vcspull configuration without manual entry.
what:
- Add create_import_subparser() for CLI argument handling
- Add import_repos() main function with full import workflow
- Support services: github, gitlab, codeberg, gitea, forgejo, codecommit
- Add service aliases (gh, gl, cb, cc, aws)
- Add filtering: --language, --topics, --min-stars, --archived, --forks
- Add output modes: human-readable, --json, --ndjson
- Add --dry-run and --yes options for confirmation control
- Require --workspace flag (no default guessing)

why: Make the import command accessible via vcspull CLI.
what:
- Import create_import_subparser, import_repos from import_repos module
- Add IMPORT_DESCRIPTION with usage examples
- Add import subparser to CLI
- Add handler for import subparser in cli() function

why: Ensure remote repository import functionality works correctly.
what:
- Add conftest.py with mock_urlopen fixture and sample API responses
- Add test_base.py with 12 filter_repo tests and RemoteRepo tests
- Add test_github.py with 9 tests for user/org/search modes
- Add test_gitlab.py with 8 tests including auth requirement for search
- Add test_gitea.py with 8 tests covering wrapped and array responses
- Total: 48 tests covering all importers and edge cases

why: Ensure CLI import functionality handles all scenarios correctly.
what:
- Add 14 tests for _get_importer with all services and aliases
- Add 4 tests for _resolve_config_file path resolution
- Add 10 parametrized tests for import_repos main function
- Add 12 edge case tests: abort, skip existing, JSON/NDJSON output, etc.
- Total: 40 tests covering argument handling, errors, and output modes

why: The new import_repos module creates a logger that must be included in the expected logger names test.
what:
- Add "vcspull.cli.import_repos" to EXPECTED_LOGGER_NAMES
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #510 +/- ##
==========================================
- Coverage 80.52% 78.84% -1.68%
==========================================
Files 16 22 +6
Lines 2192 2841 +649
Branches 454 570 +116
==========================================
+ Hits 1765 2240 +475
- Misses 277 412 +135
- Partials 150 189 +39
why: Document the new import feature for the changelog.
what:
- Add New features section for v1.51.x unreleased
- Document vcspull import command with usage examples
- List supported services, aliases, and filtering options
Pull request overview
Adds a new vcspull import CLI command and supporting “remote service importer” implementations to discover repositories from hosted services and write them into a vcspull config.
Changes:
- Introduces vcspull import command with service selection, filtering, output modes (human/json/ndjson), confirmation, and dry-run.
- Adds a new internal remotes package implementing GitHub/GitLab/Gitea(Codeberg/Forgejo)/CodeCommit importers plus shared HTTP/filtering primitives.
- Adds comprehensive unit tests for the CLI command and each importer, plus changelog and logger name coverage updates.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_log.py | Adds the new CLI module logger name to the logger discovery test. |
| tests/cli/test_import_repos.py | Adds CLI-level tests for importer selection, config resolution, output modes, dry-run/confirmation, and error handling. |
| tests/_internal/remotes/test_gitlab.py | Adds GitLab importer tests including auth-required search behavior. |
| tests/_internal/remotes/test_github.py | Adds GitHub importer tests including filtering and limit handling. |
| tests/_internal/remotes/test_gitea.py | Adds Gitea/Codeberg importer tests including search response variants. |
| tests/_internal/remotes/test_base.py | Adds tests for shared base models/utilities (RemoteRepo, ImportOptions, filter_repo). |
| tests/_internal/remotes/conftest.py | Adds shared HTTP mocking helpers and sample API payload fixtures for remotes tests. |
| tests/_internal/remotes/__init__.py | Marks the remotes tests package. |
| src/vcspull/cli/import_repos.py | Implements the vcspull import command handler and argument parsing. |
| src/vcspull/cli/__init__.py | Registers the new import subcommand and help text/examples. |
| src/vcspull/_internal/remotes/gitlab.py | Implements GitLab repository discovery (user/org/search) via GitLab REST API. |
| src/vcspull/_internal/remotes/github.py | Implements GitHub repository discovery (user/org/search) via GitHub REST API. |
| src/vcspull/_internal/remotes/gitea.py | Implements Gitea/Forgejo/Codeberg discovery via Gitea-compatible REST API. |
| src/vcspull/_internal/remotes/codecommit.py | Implements CodeCommit discovery via AWS CLI subprocess calls. |
| src/vcspull/_internal/remotes/base.py | Adds shared dataclasses, filtering logic, error hierarchy, and a small urllib-based HTTP client. |
| src/vcspull/_internal/remotes/__init__.py | Exposes the remotes package public API (__all__). |
| CHANGES | Documents the new vcspull import feature and usage examples. |
why: GitHub Advanced Security flagged substring matching as vulnerable.
A malicious URL like "https://evil.com/codeberg.org/path" would pass
the previous "codeberg.org" in url check.
what:
- Add urllib.parse import
- Use urlparse().netloc to extract hostname for exact matching
- Replace substring check with exact hostname comparison for codeberg.org
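
A minimal sketch of the difference between the two checks, using only `urllib.parse` as the commit describes. The helper name `is_codeberg` is hypothetical and not taken from the PR; the real importer may structure this differently.

```python
from urllib.parse import urlparse


def is_codeberg(url: str) -> bool:
    """Return True only when the URL's network location is exactly codeberg.org.

    Hypothetical helper for illustration; the name is not from the PR.
    """
    # netloc is the host[:port] part of the URL, so a path segment that
    # merely contains "codeberg.org" can no longer match.
    return urlparse(url).netloc.lower() == "codeberg.org"


malicious = "https://evil.com/codeberg.org/path"

print(is_codeberg("https://codeberg.org/forgejo"))  # exact host match
print(is_codeberg(malicious))                       # host is evil.com
print("codeberg.org" in malicious)                  # the old substring check
```

The last line shows why the substring test was flagged: it accepts the malicious URL, while the parsed comparison rejects it.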
why: Test used same vulnerable substring matching pattern as the code.
what:
- Replace "codeberg.org" in importer._base_url with exact URL comparison

why: Dead code - defined but never used
what:
- Remove SERVICES_REQUIRING_URL constant at line 54

why: Comma-separated topics like "python,,rust" would produce empty strings
what:
- Add if t.strip() filter to exclude empty strings after stripping
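
As an illustration of the fix, the parsing could look roughly like this. The function name `parse_topics` is an assumption made for the sketch; only the `if t.strip()` filter comes from the commit message.

```python
from __future__ import annotations


def parse_topics(raw: str) -> list[str]:
    """Split a comma-separated --topics value, dropping empty entries.

    Hypothetical helper; only the `if t.strip()` guard is from the commit.
    """
    return [t.strip() for t in raw.split(",") if t.strip()]


print(parse_topics("python,,rust"))
```

Without the guard, `"python,,rust"` would yield an empty string between the two topics, and a repository would have to carry an empty topic to match.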

why: Log messages corrupt JSON/NDJSON output when piping to other tools
what:
- Wrap "Fetching repositories..." message in human output check
- Wrap "No repositories found" message in human output check
- Wrap "Found X repositories" message in human output check
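
One way to picture the gating is sketched below. The `emit` function, its `mode` strings, and the dictionary shape are assumptions for illustration; the PR's actual handler is not shown in this conversation.

```python
from __future__ import annotations

import io
import json
from typing import TextIO


def emit(repos: list[dict], mode: str, out: TextIO) -> None:
    """Write repositories in one of three output modes (hypothetical helper)."""
    if mode == "human":
        # Status lines appear only in human mode, so they can never be
        # interleaved with machine-readable output.
        out.write(f"Found {len(repos)} repositories\n")
        for repo in repos:
            out.write(f"  {repo['name']}\n")
    elif mode == "json":
        # One JSON document containing every repository.
        json.dump(repos, out)
        out.write("\n")
    elif mode == "ndjson":
        # One JSON object per line, suitable for streaming consumers.
        for repo in repos:
            out.write(json.dumps(repo) + "\n")
    else:
        raise ValueError(f"unknown output mode: {mode}")


sample = [{"name": "a"}, {"name": "b"}]
buf = io.StringIO()
emit(sample, "ndjson", buf)
print(buf.getvalue(), end="")
```

Because every line of the NDJSON stream is a complete JSON object, a downstream tool can parse it line by line; a stray "Found X repositories" line would break that.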
…gination
why: Comparing len(items) < DEFAULT_PER_PAGE causes premature exit when
actual per_page is smaller near the limit
what:
- Store per_page value before use in _fetch_search
- Compare against actual per_page instead of DEFAULT_PER_PAGE
- Apply same fix to _paginate_repos
…gination
why: Comparing len(data) < DEFAULT_PER_PAGE causes premature exit when
actual per_page is smaller near the limit
what:
- Store per_page value before use in _fetch_search
- Compare against actual per_page instead of DEFAULT_PER_PAGE
- Apply same fix to _paginate_repos
…ination
why: Comparing len(items) < DEFAULT_PER_PAGE causes premature exit when
actual page_limit is smaller near the limit
what:
- Store page_limit value before use in _fetch_search
- Compare against actual page_limit instead of DEFAULT_PER_PAGE
- Apply same fix to _paginate_repos
Code review
No issues found. Checked for bugs and CLAUDE.md compliance. The new code follows existing patterns in the codebase for docstrings, imports, and test structure.
🤖 Generated with Claude Code
…or pagination duplicates
why: The pagination duplicate bug occurs when client-side filtering (excluding forks/archived repos) causes the per_page/limit parameter to vary between API pages, leading to offset misalignment and duplicate repositories.
what:
- Add test_pagination_duplicates.py with xfail-marked tests
- Tests verify that per_page/limit values are consistent across all pagination requests
- Cover both GitHub and Gitea importers

… pagination
why: Changing per_page between API pages causes offset misalignment, resulting in duplicate repositories when client-side filtering removes some items.
what:
- Always use DEFAULT_PER_PAGE instead of recalculating based on remaining count
- Fix both _paginate_repos and _fetch_search methods
- Update early termination checks to use DEFAULT_PER_PAGE

…ination
why: Changing limit between API pages causes offset misalignment, resulting in duplicate repositories when client-side filtering removes some items.
what:
- Always use DEFAULT_PER_PAGE instead of recalculating based on remaining count
- Fix both _paginate_repos and _fetch_search methods
- Update early termination checks to use DEFAULT_PER_PAGE

…agination tests
why: The pagination duplicate bug has been fixed in both GitHub and Gitea importers.
what:
- Remove @pytest.mark.xfail decorators from both tests
- Tests now pass and verify consistent per_page/limit across all API requests

… pagination
why: Changing per_page between API pages causes offset misalignment, resulting in duplicate repositories when client-side filtering removes some items.
what:
- Always use DEFAULT_PER_PAGE instead of recalculating based on remaining count
- Fix both _paginate_repos and _fetch_search methods
- Update early termination checks to use DEFAULT_PER_PAGE
- Add GitLab pagination test to test_pagination_duplicates.py
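
The constant-page-size approach these commits describe can be sketched as follows. With page-number pagination the server typically derives the offset from `(page - 1) * per_page`, so shrinking `per_page` on a later page moves the window back over items already seen. Everything here is illustrative: `paginate`, `fake_api`, and the value 100 for `DEFAULT_PER_PAGE` are assumptions, not the PR's code.

```python
from __future__ import annotations

from typing import Callable, Iterator

DEFAULT_PER_PAGE = 100  # assumed value; the PR only names the constant


def paginate(
    fetch_page: Callable[[int, int], list[dict]],
    keep: Callable[[dict], bool],
    limit: int,
) -> Iterator[dict]:
    """Yield up to `limit` kept items while requesting a constant page size."""
    page, count = 1, 0
    while count < limit:
        # Always request DEFAULT_PER_PAGE, never "remaining", so that
        # page N always covers the same slice on the server.
        items = fetch_page(page, DEFAULT_PER_PAGE)
        for item in items:
            if keep(item):
                yield item
                count += 1
                if count >= limit:
                    return
        if len(items) < DEFAULT_PER_PAGE:
            return  # short page: the server has no more results
        page += 1


def fake_api(page: int, per_page: int) -> list[dict]:
    """Stand-in for a page-based REST endpoint over 250 repositories."""
    start = (page - 1) * per_page
    stop = min(start + per_page, 250)
    return [{"id": i, "fork": i % 2 == 0} for i in range(start, stop)]


repos = list(paginate(fake_api, lambda r: not r["fork"], limit=120))
ids = [r["id"] for r in repos]
print(len(ids), len(set(ids)))
```

The limit is enforced on the client after filtering, which is why the request size no longer needs to track the remaining count.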

why: Users should know they can import from GitLab subgroups using slash notation (e.g., gitlab-org/ci-cd).
what:
- Add note about subgroup support to import command description
- Add example showing subgroup import: gitlab-org/ci-cd
- Update TARGET argument help to mention subgroup slash notation

…up support
why: Verify that GitLab subgroups with slash notation are correctly URL-encoded as %2F in API requests.
what:
- Add test_gitlab_subgroup_url_encoding for parent/child paths
- Add test_gitlab_deeply_nested_subgroup for a/b/c/d paths
- Both tests verify the URL contains properly encoded %2F sequences
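
For reference, the encoding these tests check for can be reproduced with the standard library. The helper name `group_projects_path` is hypothetical, and the sketch assumes the importer targets a `/groups/<encoded path>/projects` style endpoint; the PR's own request-building code is not shown here.

```python
from urllib.parse import quote


def group_projects_path(group: str) -> str:
    """Build an API path for a group or nested subgroup (hypothetical helper)."""
    # quote() treats "/" as safe by default, so safe="" is needed to turn
    # the subgroup separators into %2F.
    return f"/groups/{quote(group, safe='')}/projects"


print(group_projects_path("gitlab-org/ci-cd"))
print(group_projects_path("a/b/c/d"))
```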

why: Tests for is_authenticated=False were failing because tokens from environment variables (GITHUB_TOKEN, GITLAB_TOKEN, etc.) were being detected.
what:
- Add monkeypatch to clear token env vars in test_*_is_authenticated_without_token
- Fix test_gitlab_search_requires_auth to clear GITLAB_TOKEN/GL_TOKEN
- Add pytest import to test_gitea.py
Code review
Found 1 issue:
vcspull/src/vcspull/_internal/remotes/codecommit.py Lines 237 to 239 in 25d16e2
Compare to GitHub importer which correctly filters: vcspull/src/vcspull/_internal/remotes/github.py Lines 189 to 191 in 25d16e2
🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎.
…args
why: Running `vcspull import` without arguments showed an error instead of help, unlike `vcspull import --help`.
what:
- Make service positional arg optional with nargs="?" and default=None
- Make workspace arg optional with default=None instead of required=True
- Check for missing args in handler and print help instead of proceeding
- Add tests for help display behavior
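
A simplified sketch of this pattern with `argparse` follows. It omits the TARGET argument and all other flags, and the `run` function with its boolean return value is an assumption for illustration; the exact condition the real handler checks may differ.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="vcspull import")
    # nargs="?" makes the positional optional, so a bare invocation no
    # longer fails during parsing.
    parser.add_argument("service", nargs="?", default=None)
    parser.add_argument("-w", "--workspace", default=None)
    return parser


def run(argv: list[str]) -> bool:
    """Return True if the import would proceed, False if help was shown."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.service is None or args.workspace is None:
        # Missing input: show usage instead of raising an argparse error.
        parser.print_help()
        return False
    return True


run([])
```

Moving the "required" check out of argparse and into the handler is what lets the command print the full help text rather than a terse usage error.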

…epos
why: CodeCommit importer was yielding repos without applying filter_repo(), causing --language, --topics, --min-stars, --archived, and --forks options to be silently ignored for CodeCommit imports.
what:
- Import filter_repo from base module
- Apply filter_repo(repo, options) check before yielding, matching other importers
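
The shape of this fix, filtering before yielding, might look like the sketch below. `Repo` and `Options` are hypothetical stand-ins for `RemoteRepo` and `ImportOptions`; their field names, the case-insensitive language match, and the "include" semantics of the fork and archive flags are all assumptions, and topics are left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Repo:  # hypothetical stand-in for RemoteRepo
    name: str
    language: str | None = None
    stars: int = 0
    is_fork: bool = False
    is_archived: bool = False


@dataclass
class Options:  # hypothetical stand-in for ImportOptions
    language: str | None = None
    min_stars: int = 0
    include_forks: bool = False
    include_archived: bool = False


def filter_repo(repo: Repo, options: Options) -> bool:
    """Return True when the repo passes every client-side filter."""
    if repo.is_fork and not options.include_forks:
        return False
    if repo.is_archived and not options.include_archived:
        return False
    if options.language and (repo.language or "").lower() != options.language.lower():
        return False
    return repo.stars >= options.min_stars


def fetch_repos(raw: Iterable[Repo], options: Options) -> Iterator[Repo]:
    for repo in raw:
        # The check the CodeCommit importer was missing: filter first,
        # then yield.
        if filter_repo(repo, options):
            yield repo


repos = [
    Repo("a", "Python", 50),
    Repo("b", "Rust", 500),
    Repo("c", "Python", 5, is_fork=True),
    Repo("d", "python", 10, is_archived=True),
]
print([r.name for r in fetch_repos(repos, Options(language="python"))])
```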
The import itself seems to be working, but it imports the repositories with |
Summary
Adds a new vcspull import command to search and import repositories from remote services into vcspull configuration.
Closes #416
Features
- Service aliases: gh, gl, cb, cc, aws for convenience
- Import modes: user, org, search
- Filtering: --language, --topics, --min-stars, --archived, --forks
- Output modes: --json, --ndjson
- --dry-run preview, --yes to skip confirmation
Usage Examples
Import a user's repositories:
$ vcspull import github torvalds -w ~/repos/linux --mode user
Import an organization's repositories:
$ vcspull import github django -w ~/study/python --mode org
Search and import repositories:
$ vcspull import github "machine learning" -w ~/ml-repos --mode search --min-stars 1000
Use with self-hosted GitLab:
$ vcspull import gitlab myuser -w ~/work --url https://gitlab.company.com
Preview without writing (dry run):
$ vcspull import codeberg user -w ~/oss --dry-run
Import from AWS CodeCommit:
$ vcspull import codecommit -w ~/work/aws --region us-east-1
Architecture
- src/vcspull/_internal/remotes/: New package with service importers
  - base.py: RemoteRepo dataclass, ImportOptions, HTTPClient, error hierarchy
  - github.py, gitlab.py, gitea.py, codecommit.py: Service-specific implementations
- src/vcspull/cli/import_repos.py: CLI command handler
- urllib for HTTP, subprocess for AWS CLI
Test Plan
Automated Tests
Authentication Requirements
GITLAB_TOKEN)
CODEBERG_TOKEN/GITEA_TOKEN)
Setup for Testing
Option A: Test via uvx (no clone required)
Option B: Test from cloned branch
Manual Test Commands
Show help:
uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import --help
uv run vcspull import --help
Show help (no args is equivalent to --help):
uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import
uv run vcspull import
GitHub - user repos:
uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import github torvalds -w ~/test --mode user --dry-run --limit 10
uv run vcspull import github torvalds -w ~/test --mode user --dry-run --limit 10
GitHub - org repos:
uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import github django -w ~/test --mode org --dry-run --limit 10
uv run vcspull import github django -w ~/test --mode org --dry-run --limit 10
GitHub - search with min-stars filter:
uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import github "machine learning" -w ~/test --mode search --dry-run --limit 5 --min-stars 1000
uv run vcspull import github "machine learning" -w ~/test --mode search --dry-run --limit 5 --min-stars 1000
Codeberg - org repos:
uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import codeberg forgejo -w ~/test --mode org --dry-run --limit 10
uv run vcspull import codeberg forgejo -w ~/test --mode org --dry-run --limit 10
GitLab - org/group (requires token):
GitLab - subgroup with slash notation (requires token):
uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import gitlab gitlab-org/ci-cd -w ~/test --mode org --dry-run --limit 10
uv run vcspull import gitlab gitlab-org/ci-cd -w ~/test --mode org --dry-run --limit 10
JSON output:
uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import github torvalds -w ~/test --dry-run --limit 3 --json
uv run vcspull import github torvalds -w ~/test --dry-run --limit 3 --json
NDJSON output:
uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import github torvalds -w ~/test --dry-run --limit 3 --ndjson
uv run vcspull import github torvalds -w ~/test --dry-run --limit 3 --ndjson
Language filter:
uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import github tony -w ~/test --dry-run --limit 5 --language Python
uv run vcspull import github tony -w ~/test --dry-run --limit 5 --language Python