tony commented Feb 1, 2026

Summary

Adds a new vcspull import command to search and import repositories from remote services into vcspull configuration.

Closes #416

Features

  • Supported services: GitHub, GitLab, Codeberg, Gitea, Forgejo, AWS CodeCommit
  • Service aliases: gh, gl, cb, cc, aws for convenience
  • Import modes: user, org, search
  • Filtering: --language, --topics, --min-stars, --archived, --forks
  • Output: Human-readable (default), --json, --ndjson
  • Safety: --dry-run preview, --yes to skip confirmation

Usage Examples

Import a user's repositories:

$ vcspull import github torvalds -w ~/repos/linux --mode user

Import an organization's repositories:

$ vcspull import github django -w ~/study/python --mode org

Search and import repositories:

$ vcspull import github "machine learning" -w ~/ml-repos --mode search --min-stars 1000

Use with self-hosted GitLab:

$ vcspull import gitlab myuser -w ~/work --url https://gitlab.company.com

Preview without writing (dry run):

$ vcspull import codeberg user -w ~/oss --dry-run

Import from AWS CodeCommit:

$ vcspull import codecommit -w ~/work/aws --region us-east-1

Architecture

  • src/vcspull/_internal/remotes/: New package with service importers
    • base.py: RemoteRepo dataclass, ImportOptions, HTTPClient, error hierarchy
    • github.py, gitlab.py, gitea.py, codecommit.py: Service-specific implementations
  • src/vcspull/cli/import_repos.py: CLI command handler
  • No new dependencies: Uses stdlib urllib for HTTP, subprocess for AWS CLI
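As a rough illustration of the "stdlib only" point above, an authenticated JSON request can be built with nothing but `urllib`. This is a minimal sketch, not the PR's actual `HTTPClient`: the function names `build_request` and `fetch_json`, the `User-Agent` string, and the bearer-token header scheme are all assumptions made for the example.

```python
from __future__ import annotations

import json
import urllib.request


def build_request(url: str, token: str | None = None) -> urllib.request.Request:
    """Build a GET request asking for JSON, optionally with a bearer token.

    Hypothetical helper: the real HTTPClient in base.py may differ.
    """
    headers = {
        "Accept": "application/json",
        "User-Agent": "vcspull-import-sketch",  # assumed value
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and decode the JSON body (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from the network call makes the header logic testable without mocking `urlopen`.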

Test Plan

Automated Tests

  • 53 tests for remotes package (all importers, filtering, error handling)
  • 42 tests for CLI (argument parsing, output modes, edge cases)
  • All 839 project tests passing
  • Linting (ruff) passing
  • Type checking (mypy) passing

Authentication Requirements

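| Service        | Mode              | Auth Required                      |
|----------------|-------------------|------------------------------------|
| GitHub         | user, org, search | No (for public repos)              |
| GitLab         | user, org, search | Yes (GITLAB_TOKEN)                 |
| Codeberg/Gitea | org               | No                                 |
| Codeberg/Gitea | user, search      | Yes (CODEBERG_TOKEN/GITEA_TOKEN)   |
| CodeCommit     | -                 | Yes (AWS CLI configured)           |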
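One way the environment-variable lookup implied by this table could work is sketched below. The variable names come from this PR (`GITHUB_TOKEN`, `GITLAB_TOKEN`, `GL_TOKEN`, `CODEBERG_TOKEN`, `GITEA_TOKEN`). The mapping, the lookup order, and the function name `resolve_token` are assumptions for illustration only.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Assumed mapping of service name -> candidate env vars, checked in order.
TOKEN_ENV_VARS: dict[str, tuple[str, ...]] = {
    "github": ("GITHUB_TOKEN",),
    "gitlab": ("GITLAB_TOKEN", "GL_TOKEN"),
    "codeberg": ("CODEBERG_TOKEN",),
    "gitea": ("GITEA_TOKEN",),
}


def resolve_token(service: str, env: Mapping[str, str] | None = None) -> str | None:
    """Return the first non-empty token configured for a service, else None."""
    env = os.environ if env is None else env
    for name in TOKEN_ENV_VARS.get(service, ()):
        value = env.get(name, "").strip()
        if value:
            return value
    return None
```

Passing `env` explicitly keeps the function independent of the real process environment, which is the same concern the later test fixes in this PR address.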

Setup for Testing

Option A: Test via uvx (no clone required)

Note: --with typing_extensions is needed because package dependencies aren't fully resolved in isolated uvx environments.

Option B: Test from cloned branch

git clone --branch scraper https://github.com/vcs-python/vcspull.git vcspull-test
cd vcspull-test
uv sync
uv run pytest  # Run automated tests (839 should pass)

Manual Test Commands

Show help:

uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import --help
uv run vcspull import --help

Show help (no args is equivalent to --help):

uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import
uv run vcspull import

GitHub - user repos:

uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import github torvalds -w ~/test --mode user --dry-run --limit 10
uv run vcspull import github torvalds -w ~/test --mode user --dry-run --limit 10

GitHub - org repos:

uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import github django -w ~/test --mode org --dry-run --limit 10
uv run vcspull import github django -w ~/test --mode org --dry-run --limit 10

GitHub - search with min-stars filter:

uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import github "machine learning" -w ~/test --mode search --dry-run --limit 5 --min-stars 1000
uv run vcspull import github "machine learning" -w ~/test --mode search --dry-run --limit 5 --min-stars 1000

Codeberg - org repos:

uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import codeberg forgejo -w ~/test --mode org --dry-run --limit 10
uv run vcspull import codeberg forgejo -w ~/test --mode org --dry-run --limit 10

GitLab - org/group (requires token):

export GITLAB_TOKEN="glpat-xxxxxxxxxxxxxxxxxxxx"
uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import gitlab gitlab-org -w ~/test --mode org --dry-run --limit 10
export GITLAB_TOKEN="glpat-xxxxxxxxxxxxxxxxxxxx"
uv run vcspull import gitlab gitlab-org -w ~/test --mode org --dry-run --limit 10

GitLab - subgroup with slash notation (requires token):

uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import gitlab gitlab-org/ci-cd -w ~/test --mode org --dry-run --limit 10
uv run vcspull import gitlab gitlab-org/ci-cd -w ~/test --mode org --dry-run --limit 10

JSON output:

uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import github torvalds -w ~/test --dry-run --limit 3 --json
uv run vcspull import github torvalds -w ~/test --dry-run --limit 3 --json

NDJSON output:

uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import github torvalds -w ~/test --dry-run --limit 3 --ndjson
uv run vcspull import github torvalds -w ~/test --dry-run --limit 3 --ndjson
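The difference between the two machine-readable modes can be sketched as follows. This assumes `--json` emits a single JSON array and `--ndjson` emits one JSON object per line; the function names and the record fields are hypothetical and not taken from the PR.

```python
import json


def render_json(repos: list[dict]) -> str:
    """Serialize all repositories as one JSON array (assumed --json shape)."""
    return json.dumps(repos, indent=2)


def render_ndjson(repos: list[dict]) -> str:
    """Serialize one compact JSON object per line (newline-delimited JSON)."""
    return "\n".join(json.dumps(repo) for repo in repos)


# Hypothetical records; the real output fields may differ.
sample = [
    {"name": "linux", "clone_url": "https://github.com/torvalds/linux.git"},
    {"name": "subsurface", "clone_url": "https://github.com/torvalds/subsurface.git"},
]
```

NDJSON suits streaming consumers that process one line at a time, while the array form is easier to load in a single call.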

Language filter:

uvx --with typing_extensions --from "git+https://github.com/vcs-python/vcspull@scraper" vcspull import github tony -w ~/test --dry-run --limit 5 --language Python
uv run vcspull import github tony -w ~/test --dry-run --limit 5 --language Python

tony added 6 commits February 1, 2026 10:14
why: Enable importing repositories from GitHub, GitLab, Codeberg/Gitea/Forgejo,
and AWS CodeCommit into vcspull configuration.

what:
- Add base.py with RemoteRepo dataclass, ImportOptions, ImportMode enum
- Add HTTPClient for stdlib-only HTTP requests (urllib)
- Add error hierarchy: AuthenticationError, RateLimitError, NotFoundError, etc.
- Add GitHubImporter with user/org/search modes
- Add GitLabImporter with group/search support (auth required for search)
- Add GiteaImporter supporting Codeberg, Gitea, Forgejo instances
- Add CodeCommitImporter using AWS CLI subprocess calls
- Add filter_repo() for client-side filtering by language, topics, stars
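A client-side filter along the lines of `filter_repo()` might look like the sketch below. The field names on `RemoteRepo` and `ImportOptions` are assumptions, as is the interpretation that forks and archived repositories are excluded unless `--forks` / `--archived` is passed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteRepo:
    """Hypothetical subset of fields; the real dataclass may differ."""

    name: str
    clone_url: str
    language: str | None = None
    topics: tuple[str, ...] = ()
    stars: int = 0
    is_fork: bool = False
    is_archived: bool = False


@dataclass(frozen=True)
class ImportOptions:
    """Hypothetical filter options mirroring the CLI flags."""

    language: str | None = None
    topics: tuple[str, ...] = ()
    min_stars: int = 0
    include_forks: bool = False
    include_archived: bool = False


def filter_repo(repo: RemoteRepo, options: ImportOptions) -> bool:
    """Return True if the repository passes every configured filter."""
    if repo.is_fork and not options.include_forks:
        return False
    if repo.is_archived and not options.include_archived:
        return False
    if options.language and (repo.language or "").lower() != options.language.lower():
        return False
    if repo.stars < options.min_stars:
        return False
    # Every requested topic must be present on the repository.
    wanted = {topic.lower() for topic in options.topics}
    return wanted <= {topic.lower() for topic in repo.topics}
```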
why: Allow users to import repositories from remote services directly
into their vcspull configuration without manual entry.

what:
- Add create_import_subparser() for CLI argument handling
- Add import_repos() main function with full import workflow
- Support services: github, gitlab, codeberg, gitea, forgejo, codecommit
- Add service aliases (gh, gl, cb, cc, aws)
- Add filtering: --language, --topics, --min-stars, --archived, --forks
- Add output modes: human-readable, --json, --ndjson
- Add --dry-run and --yes options for confirmation control
- Require --workspace flag (no default guessing)

why: Make the import command accessible via vcspull CLI.

what:
- Import create_import_subparser, import_repos from import_repos module
- Add IMPORT_DESCRIPTION with usage examples
- Add import subparser to CLI
- Add handler for import subparser in cli() function

why: Ensure remote repository import functionality works correctly.

what:
- Add conftest.py with mock_urlopen fixture and sample API responses
- Add test_base.py with 12 filter_repo tests and RemoteRepo tests
- Add test_github.py with 9 tests for user/org/search modes
- Add test_gitlab.py with 8 tests including auth requirement for search
- Add test_gitea.py with 8 tests covering wrapped and array responses
- Total: 48 tests covering all importers and edge cases

why: Ensure CLI import functionality handles all scenarios correctly.

what:
- Add 14 tests for _get_importer with all services and aliases
- Add 4 tests for _resolve_config_file path resolution
- Add 10 parametrized tests for import_repos main function
- Add 12 edge case tests: abort, skip existing, JSON/NDJSON output, etc.
- Total: 40 tests covering argument handling, errors, and output modes

why: The new import_repos module creates a logger that must be included
in the expected logger names test.

what:
- Add "vcspull.cli.import_repos" to EXPECTED_LOGGER_NAMES

codecov bot commented Feb 1, 2026

Codecov Report

❌ Patch coverage is 73.18952% with 174 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.84%. Comparing base (bcfb4bb) to head (cb7ae08).

| Files with missing lines                    | Patch % | Lines                         |
|---------------------------------------------|---------|-------------------------------|
| src/vcspull/_internal/remotes/codecommit.py | 23.07%  | 69 Missing and 1 partial ⚠️   |
| src/vcspull/_internal/remotes/base.py       | 70.50%  | 38 Missing and 3 partials ⚠️  |
| src/vcspull/_internal/remotes/github.py     | 79.76%  | 6 Missing and 11 partials ⚠️  |
| src/vcspull/_internal/remotes/gitea.py      | 80.72%  | 5 Missing and 11 partials ⚠️  |
| src/vcspull/_internal/remotes/gitlab.py     | 80.00%  | 6 Missing and 10 partials ⚠️  |
| src/vcspull/cli/import_repos.py             | 91.86%  | 11 Missing and 3 partials ⚠️  |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #510      +/-   ##
==========================================
- Coverage   80.52%   78.84%   -1.68%     
==========================================
  Files          16       22       +6     
  Lines        2192     2841     +649     
  Branches      454      570     +116     
==========================================
+ Hits         1765     2240     +475     
- Misses        277      412     +135     
- Partials      150      189      +39     

why: Document the new import feature for the changelog.

what:
- Add New features section for v1.51.x unreleased
- Document vcspull import command with usage examples
- List supported services, aliases, and filtering options

Copilot AI left a comment

Pull request overview

Adds a new vcspull import CLI command and supporting “remote service importer” implementations to discover repositories from hosted services and write them into a vcspull config.

Changes:

  • Introduces vcspull import command with service selection, filtering, output modes (human/json/ndjson), confirmation, and dry-run.
  • Adds a new internal remotes package implementing GitHub/GitLab/Gitea(Codeberg/Forgejo)/CodeCommit importers plus shared HTTP/filtering primitives.
  • Adds comprehensive unit tests for the CLI command and each importer, plus changelog and logger name coverage updates.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 10 comments.

Show a summary per file
| File | Description |
|------|-------------|
| tests/test_log.py | Adds the new CLI module logger name to the logger discovery test. |
| tests/cli/test_import_repos.py | Adds CLI-level tests for importer selection, config resolution, output modes, dry-run/confirmation, and error handling. |
| tests/_internal/remotes/test_gitlab.py | Adds GitLab importer tests including auth-required search behavior. |
| tests/_internal/remotes/test_github.py | Adds GitHub importer tests including filtering and limit handling. |
| tests/_internal/remotes/test_gitea.py | Adds Gitea/Codeberg importer tests including search response variants. |
| tests/_internal/remotes/test_base.py | Adds tests for shared base models/utilities (RemoteRepo, ImportOptions, filter_repo). |
| tests/_internal/remotes/conftest.py | Adds shared HTTP mocking helpers and sample API payload fixtures for remotes tests. |
| tests/_internal/remotes/__init__.py | Marks the remotes tests package. |
| src/vcspull/cli/import_repos.py | Implements the vcspull import command handler and argument parsing. |
| src/vcspull/cli/__init__.py | Registers the new import subcommand and help text/examples. |
| src/vcspull/_internal/remotes/gitlab.py | Implements GitLab repository discovery (user/org/search) via GitLab REST API. |
| src/vcspull/_internal/remotes/github.py | Implements GitHub repository discovery (user/org/search) via GitHub REST API. |
| src/vcspull/_internal/remotes/gitea.py | Implements Gitea/Forgejo/Codeberg discovery via Gitea-compatible REST API. |
| src/vcspull/_internal/remotes/codecommit.py | Implements CodeCommit discovery via AWS CLI subprocess calls. |
| src/vcspull/_internal/remotes/base.py | Adds shared dataclasses, filtering logic, error hierarchy, and a small urllib-based HTTP client. |
| src/vcspull/_internal/remotes/__init__.py | Exposes the remotes package public API (__all__). |
| CHANGES | Documents the new vcspull import feature and usage examples. |

tony added 8 commits February 1, 2026 10:33
why: GitHub Advanced Security flagged substring matching as vulnerable.
     A malicious URL like "https://evil.com/codeberg.org/path" would pass
     the previous "codeberg.org" in url check.
what:
- Add urllib.parse import
- Use urlparse().netloc to extract hostname for exact matching
- Replace substring check with exact hostname comparison for codeberg.org
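The difference between the substring check and the exact host comparison described in this commit can be sketched as follows. The function names are hypothetical; only the use of `urlparse().netloc` and the example URL come from the commit message.

```python
from urllib.parse import urlparse


def is_codeberg_substring(url: str) -> bool:
    """Vulnerable variant: matches the text anywhere in the URL."""
    return "codeberg.org" in url


def is_codeberg(url: str) -> bool:
    """Safer variant: compare the parsed network location exactly."""
    return urlparse(url).netloc.lower() == "codeberg.org"


malicious = "https://evil.com/codeberg.org/path"
genuine = "https://codeberg.org/api/v1"
```

Note that `netloc` also includes any port or userinfo present in the URL, so a stricter check could compare `urlparse(url).hostname` instead. Whether the PR needs that is not stated here.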
why: Test used same vulnerable substring matching pattern as the code.
what:
- Replace "codeberg.org" in importer._base_url with exact URL comparison

why: Dead code - defined but never used
what:
- Remove SERVICES_REQUIRING_URL constant at line 54

why: Comma-separated topics like "python,,rust" would produce empty strings
what:
- Add if t.strip() filter to exclude empty strings after stripping
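A minimal sketch of the topic parsing this commit describes, assuming a helper named `parse_topics` (the real code may inline the comprehension instead):

```python
from __future__ import annotations


def parse_topics(raw: str | None) -> list[str]:
    """Split a comma-separated topics string, dropping empty entries."""
    if not raw:
        return []
    # The trailing `if t.strip()` is the fix: it removes the empty strings
    # produced by inputs such as "python,,rust" or a trailing comma.
    return [t.strip() for t in raw.split(",") if t.strip()]
```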
why: Log messages corrupt JSON/NDJSON output when piping to other tools
what:
- Wrap "Fetching repositories..." message in human output check
- Wrap "No repositories found" message in human output check
- Wrap "Found X repositories" message in human output check

…gination

why: Comparing len(items) < DEFAULT_PER_PAGE causes premature exit when
     actual per_page is smaller near the limit
what:
- Store per_page value before use in _fetch_search
- Compare against actual per_page instead of DEFAULT_PER_PAGE
- Apply same fix to _paginate_repos

…gination

why: Comparing len(data) < DEFAULT_PER_PAGE causes premature exit when
     actual per_page is smaller near the limit
what:
- Store per_page value before use in _fetch_search
- Compare against actual per_page instead of DEFAULT_PER_PAGE
- Apply same fix to _paginate_repos

…ination

why: Comparing len(items) < DEFAULT_PER_PAGE causes premature exit when
     actual page_limit is smaller near the limit
what:
- Store page_limit value before use in _fetch_search
- Compare against actual page_limit instead of DEFAULT_PER_PAGE
- Apply same fix to _paginate_repos

tony commented Feb 1, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

The new code follows existing patterns in the codebase for docstrings, imports, and test structure.

🤖 Generated with Claude Code

tony added 8 commits February 1, 2026 12:24
…or pagination duplicates

why: The pagination duplicate bug occurs when client-side filtering (excluding
forks/archived repos) causes the per_page/limit parameter to vary between API
pages, leading to offset misalignment and duplicate repositories.

what:
- Add test_pagination_duplicates.py with xfail-marked tests
- Tests verify that per_page/limit values are consistent across all pagination requests
- Cover both GitHub and Gitea importers

… pagination

why: Changing per_page between API pages causes offset misalignment, resulting
in duplicate repositories when client-side filtering removes some items.

what:
- Always use DEFAULT_PER_PAGE instead of recalculating based on remaining count
- Fix both _paginate_repos and _fetch_search methods
- Update early termination checks to use DEFAULT_PER_PAGE

…ination

why: Changing limit between API pages causes offset misalignment, resulting
in duplicate repositories when client-side filtering removes some items.

what:
- Always use DEFAULT_PER_PAGE instead of recalculating based on remaining count
- Fix both _paginate_repos and _fetch_search methods
- Update early termination checks to use DEFAULT_PER_PAGE

…agination tests

why: The pagination duplicate bug has been fixed in both GitHub and Gitea
importers.

what:
- Remove @pytest.mark.xfail decorators from both tests
- Tests now pass and verify consistent per_page/limit across all API requests
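The offset misalignment behind the duplicate bug can be reproduced with a small simulation. This is a sketch against a fake in-memory page API, not the importers' real code: `fetch_page`, `paginate`, and the value of `DEFAULT_PER_PAGE` are assumptions chosen for the example.

```python
from __future__ import annotations

from collections.abc import Callable

DEFAULT_PER_PAGE = 30  # assumed value for the sketch


def fetch_page(items: list[int], page: int, per_page: int) -> list[int]:
    """Fake page-numbered API: the offset depends on page AND per_page."""
    start = (page - 1) * per_page
    return items[start : start + per_page]


def paginate(
    items: list[int],
    limit: int,
    keep: Callable[[int], bool],
    per_page_for: Callable[[int], int],
) -> list[int]:
    """Collect up to `limit` items that pass the client-side filter `keep`."""
    results: list[int] = []
    page = 1
    while len(results) < limit:
        per_page = per_page_for(limit - len(results))
        batch = fetch_page(items, page, per_page)
        for item in batch:
            if keep(item):
                results.append(item)
                if len(results) >= limit:
                    break
        if len(batch) < per_page:  # short page means no more data
            break
        page += 1
    return results


def is_even(n: int) -> bool:
    return n % 2 == 0


data = list(range(100))
# Buggy strategy: shrink per_page to the remaining count on every request.
buggy = paginate(data, 10, is_even, lambda remaining: min(remaining, DEFAULT_PER_PAGE))
# Fixed strategy: keep per_page constant across all requests.
fixed = paginate(data, 10, is_even, lambda remaining: DEFAULT_PER_PAGE)
```

With the shrinking page size, page 2 is computed against a different offset grid than page 1, so items already seen are fetched again. Holding `per_page` constant keeps the offsets aligned.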
… pagination

why: Changing per_page between API pages causes offset misalignment, resulting
in duplicate repositories when client-side filtering removes some items.

what:
- Always use DEFAULT_PER_PAGE instead of recalculating based on remaining count
- Fix both _paginate_repos and _fetch_search methods
- Update early termination checks to use DEFAULT_PER_PAGE
- Add GitLab pagination test to test_pagination_duplicates.py

why: Users should know they can import from GitLab subgroups using slash
notation (e.g., gitlab-org/ci-cd).

what:
- Add note about subgroup support to import command description
- Add example showing subgroup import: gitlab-org/ci-cd
- Update TARGET argument help to mention subgroup slash notation

…up support

why: Verify that GitLab subgroups with slash notation are correctly URL-encoded
as %2F in API requests.

what:
- Add test_gitlab_subgroup_url_encoding for parent/child paths
- Add test_gitlab_deeply_nested_subgroup for a/b/c/d paths
- Both tests verify the URL contains properly encoded %2F sequences
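The encoding these tests check for can be sketched with `urllib.parse.quote`. The helper name `group_projects_url` is hypothetical. The `/api/v4/groups/:id/projects` path follows GitLab's REST API convention of accepting a URL-encoded group path as the ID, but the importer's exact URL construction is an assumption here.

```python
from urllib.parse import quote


def group_projects_url(base_url: str, group_path: str) -> str:
    """Build a GitLab group-projects URL with the path fully percent-encoded."""
    # safe="" is essential: by default quote() leaves "/" unescaped.
    encoded = quote(group_path, safe="")
    return f"{base_url.rstrip('/')}/api/v4/groups/{encoded}/projects"
```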
why: Tests for is_authenticated=False were failing because tokens from
environment variables (GITHUB_TOKEN, GITLAB_TOKEN, etc.) were being detected.

what:
- Add monkeypatch to clear token env vars in test_*_is_authenticated_without_token
- Fix test_gitlab_search_requires_auth to clear GITLAB_TOKEN/GL_TOKEN
- Add pytest import to test_gitea.py

tony commented Feb 1, 2026

Code review

Found 1 issue:

  1. Missing filter_repo() call in CodeCommit importer - All other importers (GitHub, GitLab, Gitea) call filter_repo(repo, options) before yielding repositories, but CodeCommit yields directly without filtering. This means --language, --topics, --min-stars, --archived, and --forks options are silently ignored for CodeCommit imports.

repo = self._parse_repo(repo_metadata)
yield repo
count += 1

Compare to GitHub importer which correctly filters:

repo = self._parse_repo(item)
if filter_repo(repo, options):
    yield repo

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

tony added 2 commits February 1, 2026 13:34
…args

why: Running `vcspull import` without arguments showed an error instead
of help, unlike `vcspull import --help`.
what:
- Make service positional arg optional with nargs="?" and default=None
- Make workspace arg optional with default=None instead of required=True
- Check for missing args in handler and print help instead of proceeding
- Add tests for help display behavior
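The argparse pattern described here can be sketched as below. Only `nargs="?"`, `default=None`, and the "print help instead of proceeding" behavior come from the commit message. The parser layout, the `target` positional, and the `handle` function with its boolean return value are assumptions for illustration.

```python
from __future__ import annotations

import argparse


def create_parser() -> argparse.ArgumentParser:
    """Parser where service and workspace are optional at parse time."""
    parser = argparse.ArgumentParser(prog="vcspull import")
    parser.add_argument("service", nargs="?", default=None)
    parser.add_argument("target", nargs="?", default=None)
    parser.add_argument("-w", "--workspace", default=None)
    return parser


def handle(argv: list[str]) -> bool:
    """Return True if the import should proceed, False if help was shown."""
    parser = create_parser()
    args = parser.parse_args(argv)
    if args.service is None or args.workspace is None:
        # Missing required input: show help rather than raising a usage error.
        parser.print_help()
        return False
    return True
```

Moving the "required" check from argparse into the handler is what lets a bare invocation print help instead of a usage error.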
…epos

why: CodeCommit importer was yielding repos without applying filter_repo(),
causing --language, --topics, --min-stars, --archived, and --forks options
to be silently ignored for CodeCommit imports.
what:
- Import filter_repo from base module
- Apply filter_repo(repo, options) check before yielding, matching other importers

@aschleifer

The import itself seems to be working, but it imports the repositories with git+https://gitlab.com/...git URLs. Is it possible to get an option to use SSH URLs? Otherwise I would have to manually edit the config after running the import command to make it fully usable.

Development

Successfully merging this pull request may close these issues.

Import remote repos, ghorg-like fetching behavior

3 participants