diff --git a/AGENTS.md b/AGENTS.md index 89b1c38d0..d5776277b 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -332,6 +332,24 @@ $ vcspull search django $ vcspull search "name:flask" ``` +**Prefer longform flags** — use `--workspace` not `-w`, `--file` not `-f`. + +**Split multi-flag commands** — when a command has 2+ flags/options, place each on its own `\`-continuation line, indented by 4 spaces. + +Good: + +```console +$ vcspull import gh my-org \ + --mode org \ + --workspace ~/code/ +``` + +Bad: + +```console +$ vcspull import gh my-org --mode org -w ~/code/ +``` + ## Debugging Tips When stuck in debugging loops: diff --git a/CHANGES b/CHANGES index b203566d4..1aa6f551e 100644 --- a/CHANGES +++ b/CHANGES @@ -33,6 +33,125 @@ $ uvx --from 'vcspull' --prerelease allow vcspull _Notes on upcoming releases will be added here_ +### New features + +#### New command: `vcspull import` (#510) + +Import repositories from GitHub, GitLab, Codeberg/Gitea/Forgejo, and AWS +CodeCommit directly into your vcspull configuration. + +Import a user's repositories: + +```console +$ vcspull import github torvalds \ + --workspace ~/repos/linux \ + --mode user +``` + +Import an organization's repositories: + +```console +$ vcspull import github django \ + --workspace ~/study/python \ + --mode org +``` + +Search and import repositories: + +```console +$ vcspull import github "machine learning" \ + --workspace ~/ml-repos \ + --mode search \ + --min-stars 1000 +``` + +Use with self-hosted GitLab: + +```console +$ vcspull import gitlab myuser \ + --workspace ~/work \ + --url https://gitlab.company.com +``` + +Import from AWS CodeCommit: + +```console +$ vcspull import codecommit \ + --workspace ~/work/aws \ + --region us-east-1 +``` + +Preview without writing (dry run): + +```console +$ vcspull import codeberg user \ + --workspace ~/oss \ + --dry-run +``` + +**Key features:** + +- Service aliases: `gh`, `gl`, `cb`, `cc`, `aws` +- Filtering: `--language`, `--topics`, `--min-stars`, `--archived`, `--forks` +- Output modes: human-readable (default), `--json`, `--ndjson` +- Interactive confirmation before writing; use `--yes`/`-y` to skip +- Repositories already in the config are detected and skipped +- Non-zero exit code on errors (for CI/automation) +- No new dependencies (uses stdlib `urllib` for HTTP) + +#### `vcspull import`: SSH clone URLs by default (#510) + +Clone URLs default to SSH. 
Use `--https` to get HTTPS URLs instead:
+
+SSH (default):
+
+```console
+$ vcspull import github torvalds \
+    --workspace ~/repos/linux \
+    --mode user
+```
+
+Use `--https` for HTTPS clone URLs:
+
+```console
+$ vcspull import github torvalds \
+    --workspace ~/repos/linux \
+    --mode user \
+    --https
+```
+
+#### `vcspull import`: GitLab subgroups map to workspace roots (#510)
+
+For GitLab organization/group imports, subgroup namespaces are preserved
+under the workspace root by default:
+
+```console
+$ vcspull import gitlab vcs-python-group-test \
+    --workspace ~/projects/python \
+    --mode org
+```
+
+This writes repositories into workspace sections like:
+
+- `~/projects/python/`
+- `~/projects/python/<subgroup>/`
+- `~/projects/python/<subgroup>/<nested-subgroup>/`
+
+Use `--flatten-groups` to collapse subgroup repositories into a single
+workspace root:
+
+```console
+$ vcspull import gitlab vcs-python-group-test \
+    --workspace ~/projects/python \
+    --mode org \
+    --flatten-groups
+```
+
+### Bug fixes
+
+- Config writes now use atomic temp-file-then-rename to prevent data loss
+  during interrupted writes (#510)
+
### Tests

- Fix `pytest-asyncio` deprecation warning in isolated `pytester` runs by
diff --git a/README.md b/README.md
index cb194e5df..fea8acaf2 100644
--- a/README.md
+++ b/README.md
@@ -68,7 +68,7 @@ You can test the unpublished version of vcspull before its released.

## Configuration

Add your repos to `~/.vcspull.yaml`. You can edit the file by hand or let
-`vcspull add` or `vcspull discover` create entries for you.
+`vcspull add`, `vcspull discover`, or `vcspull import` create entries for you.

```yaml
~/code/:
@@ -119,10 +119,32 @@ $ vcspull discover ~/code --recursive
```

The scan shows each repository before import unless you opt into `--yes`. Add
-`-w ~/code/` to pin the resulting workspace root or `-f` to write somewhere other
+`--workspace ~/code/` to pin the resulting workspace root or `-f/--file` to write somewhere other
than the default `~/.vcspull.yaml`. Duplicate workspace roots are merged by
default; include `--no-merge` to keep them separate while you review the log.

+### Import from remote services
+
+Pull repository lists from GitHub, GitLab, Codeberg, Gitea, Forgejo, or AWS
+CodeCommit directly into your configuration:
+
+```console
+$ vcspull import github myuser \
+    --workspace ~/code/ \
+    --mode user
+```
+
+```console
+$ vcspull import gitlab my-group \
+    --workspace ~/work/ \
+    --mode org
+```
+
+Use `--dry-run` to preview changes, `--https` for HTTPS clone URLs, and
+`--language`/`--topics`/`--min-stars` to filter results. See the
+[import documentation](https://vcspull.git-pull.com/cli/import/) for all
+supported services and options.
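+
+For example, preview a filtered import as a dry run (nothing is written;
+`myuser` is a placeholder):
+
+```console
+$ vcspull import github myuser \
+    --workspace ~/code/ \
+    --language python \
+    --dry-run
+```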
+
### Inspect configured repositories

List what vcspull already knows about without mutating anything:

@@ -164,7 +186,9 @@ After importing or editing by hand, run the formatter to tidy up keys, merge
duplicate workspace sections, and keep entries sorted:

```console
-$ vcspull fmt -f ~/.vcspull.yaml --write
+$ vcspull fmt \
+    --file ~/.vcspull.yaml \
+    --write
```

Use `vcspull fmt --all --write` to format every YAML file that vcspull can
@@ -205,7 +229,7 @@ or svn project with a git dependency:

Clone / update repos via config file:

```console
-$ vcspull sync -f external_deps.yaml '*'
+$ vcspull sync --file external_deps.yaml '*'
```

See the [Quickstart](https://vcspull.git-pull.com/quickstart.html) for
diff --git a/docs/api/cli/import.md b/docs/api/cli/import.md
new file mode 100644
index 000000000..ab9bbeef2
--- /dev/null
+++ b/docs/api/cli/import.md
@@ -0,0 +1,15 @@
+# vcspull import - `vcspull.cli.import_cmd`
+
+```{eval-rst}
+.. automodule:: vcspull.cli.import_cmd
+   :members:
+   :show-inheritance:
+   :undoc-members:
+```
+
+```{eval-rst}
+.. automodule:: vcspull.cli.import_cmd._common
+   :members:
+   :show-inheritance:
+   :undoc-members:
+```
diff --git a/docs/api/cli/index.md b/docs/api/cli/index.md
index 8ca2beb71..5093c2fe5 100644
--- a/docs/api/cli/index.md
+++ b/docs/api/cli/index.md
@@ -10,6 +10,7 @@

sync
add
+import
discover
list
search
diff --git a/docs/cli/add.md b/docs/cli/add.md
index df96dc0f8..aebe9c36d 100644
--- a/docs/cli/add.md
+++ b/docs/cli/add.md
@@ -8,8 +8,9 @@ merges duplicate workspace roots by default, and prompts before writing unless
you pass `--yes`.

```{note}
-This command replaces the manual import functionality from `vcspull import`.
-For bulk scanning of existing repositories, see {ref}`cli-discover`.
+This command replaces the old `vcspull import <name> <url>` from v1.36--v1.39.
+For bulk scanning of local repositories, see {ref}`cli-discover`.
+For bulk import from remote services (GitHub, GitLab, etc.), see {ref}`cli-import`.
```

## Command

@@ -97,7 +98,8 @@ vcspull searches for configuration files in this order:

Specify a file explicitly with `-f/--file`:

```console
-$ vcspull add ~/study/python/pytest-docker -f ~/configs/python.yaml
+$ vcspull add ~/study/python/pytest-docker \
+    --file ~/configs/python.yaml
```

## Handling duplicates

@@ -114,22 +116,27 @@ a summary of the merge. Prefer to inspect duplicates yourself? Add

2. Run `vcspull list` to verify the new entry (see {ref}`cli-list`).
3. Run `vcspull sync` to clone or update the working tree (see {ref}`cli-sync`).

-## Migration from vcspull import
+## Migration from the old vcspull import

-If you previously used `vcspull import <name> <url>`, switch to the path-first
-workflow:
+The `vcspull import <name> <url>` command from v1.36--v1.39 has been replaced
+by `vcspull add`:

```diff
- $ vcspull import flask https://github.com/pallets/flask.git -c ~/.vcspull.yaml
+ $ vcspull add ~/code/flask --url https://github.com/pallets/flask.git --file ~/.vcspull.yaml
```

Key differences:

-- `vcspull add` now derives the name from the filesystem unless you pass
-  `--name`.
+- `vcspull add` derives the name from the filesystem unless you pass `--name`.
- The parent directory becomes the workspace automatically; use `--workspace`
  to override.
- Use `--url` to record a remote when the checkout does not have one.

+```{note}
+Starting with v1.55, `vcspull import` is a *different* command that bulk-imports
+repositories from remote services (GitHub, GitLab, etc.). See {ref}`cli-import`
+for details.
+```
+
[pip vcs url]: https://pip.pypa.io/en/stable/topics/vcs-support/
diff --git a/docs/cli/discover.md b/docs/cli/discover.md
index 865c02b25..6df754d78 100644
--- a/docs/cli/discover.md
+++ b/docs/cli/discover.md
@@ -129,7 +129,9 @@ $ vcspull discover ~ --recursive --workspace-root ~/code/ --yes

Specify a custom config file with `-f/--file`:

```console
-$ vcspull discover ~/company --recursive -f ~/company/.vcspull.yaml
+$ vcspull discover ~/company \
+    --recursive \
+    --file ~/company/.vcspull.yaml
```

If the config file doesn't exist, it will be created.

@@ -195,7 +197,7 @@ Scan to specific config:

$ vcspull discover ~/company/repos \
    --recursive \
    --yes \
-    -f ~/company/.vcspull.yaml
+    --file ~/company/.vcspull.yaml
```

## After discovering repositories

@@ -229,7 +231,7 @@ If you previously used `vcspull import --scan`:

```diff
- $ vcspull import --scan ~/code --recursive -c ~/.vcspull.yaml --yes
+ $ vcspull discover ~/code --recursive --file ~/.vcspull.yaml --yes
```

Changes:

@@ -273,7 +275,7 @@ $ vcspull discover ~/projects --recursive --yes

```console
$ vcspull discover ~/company \
    --recursive \
-    -f ~/company/.vcspull.yaml \
+    --file ~/company/.vcspull.yaml \
    --workspace-root ~/work/ \
    --yes
```
diff --git a/docs/cli/fmt.md b/docs/cli/fmt.md
index b28d06161..fade66757 100644
--- a/docs/cli/fmt.md
+++ b/docs/cli/fmt.md
@@ -59,22 +59,12 @@ Run the formatter in dry-run mode first to preview the adjustments:

$ vcspull fmt --file ~/.vcspull.yaml
```

-Then add `--write` (or `-w`) to persist them back to disk:
+Then add `--write` to persist them back to disk:

```console
-$ vcspull fmt --file ~/.vcspull.yaml --write
-```
-
-Short form for preview:
-
-```console
-$ vcspull fmt -f ~/.vcspull.yaml
-```
-
-Short form to apply:
-
-```console
-$ vcspull fmt -f ~/.vcspull.yaml -w
+$ vcspull fmt \
+    --file ~/.vcspull.yaml \
+    --write
```

Use `--all` to iterate over the default search locations: the current working
diff --git a/docs/cli/import/codeberg.md b/docs/cli/import/codeberg.md
new file mode 100644
index 000000000..f0a0c4c25
--- /dev/null
+++ b/docs/cli/import/codeberg.md
@@ -0,0 +1,34 @@
+(cli-import-codeberg)=
+
+# vcspull import codeberg
+
+Import repositories from Codeberg.
+
+## Command
+
+```{eval-rst}
+.. argparse::
+   :module: vcspull.cli
+   :func: create_parser
+   :prog: vcspull
+   :path: import codeberg
+```
+
+## Authentication
+
+- **Env vars**: `CODEBERG_TOKEN` (primary), `GITEA_TOKEN` (fallback)
+- **Token type**: API token
+- **Scope**: no scopes needed for public repos; token required for private repos
+- **Create at**: <https://codeberg.org/user/settings/applications>
+
+Set the token:
+
+```console
+$ export CODEBERG_TOKEN=...
+```
+
+Then import:
+
+```console
+$ vcspull import codeberg myuser --workspace ~/code/
+```
diff --git a/docs/cli/import/codecommit.md b/docs/cli/import/codecommit.md
new file mode 100644
index 000000000..ef7285cb5
--- /dev/null
+++ b/docs/cli/import/codecommit.md
@@ -0,0 +1,50 @@
+(cli-import-codecommit)=
+
+# vcspull import codecommit
+
+Import repositories from AWS CodeCommit.
+
+## Command
+
+```{eval-rst}
+.. argparse::
+   :module: vcspull.cli
+   :func: create_parser
+   :prog: vcspull
+   :path: import codecommit
+```
+
+## Usage
+
+CodeCommit does not require a target argument. Use `--region` and `--profile`
+to select the AWS environment:
+
+```console
+$ vcspull import codecommit \
+    --workspace ~/code/ \
+    --region us-east-1 \
+    --profile work
+```
+
+## Authentication
+
+- **Auth**: AWS CLI credentials (`aws configure`) — no token env var
+- **CLI args**: `--region`, `--profile`
+- **IAM permissions required**:
+  - `codecommit:ListRepositories` (resource: `*`)
+  - `codecommit:BatchGetRepositories` (resource: repo ARNs or `*`)
+- **Dependency**: AWS CLI must be installed (`pip install awscli`)
+
+Configure your AWS credentials:
+
+```console
+$ aws configure
+```
+
+Then import:
+
+```console
+$ vcspull import codecommit \
+    --workspace ~/code/ \
+    --region us-east-1
+```
diff --git a/docs/cli/import/forgejo.md b/docs/cli/import/forgejo.md
new file mode 100644
index 000000000..bdedc8a2b
--- /dev/null
+++ b/docs/cli/import/forgejo.md
@@ -0,0 +1,37 @@
+(cli-import-forgejo)=
+
+# vcspull import forgejo
+
+Import repositories from a self-hosted Forgejo instance.
+
+## Command
+
+```{eval-rst}
+.. argparse::
+   :module: vcspull.cli
+   :func: create_parser
+   :prog: vcspull
+   :path: import forgejo
+```
+
+## Authentication
+
+- **Env vars**: `FORGEJO_TOKEN` (primary; matched when hostname contains
+  "forgejo"), `GITEA_TOKEN` (fallback)
+- **Token type**: API token
+- **Scope**: `read:repository`
+- **Create at**: `https://<instance>/user/settings/applications`
+
+Set the token:
+
+```console
+$ export FORGEJO_TOKEN=...
+```
+
+Then import:
+
+```console
+$ vcspull import forgejo myuser \
+    --workspace ~/code/ \
+    --url https://forgejo.example.com
+```
diff --git a/docs/cli/import/gitea.md b/docs/cli/import/gitea.md
new file mode 100644
index 000000000..be33aa21e
--- /dev/null
+++ b/docs/cli/import/gitea.md
@@ -0,0 +1,36 @@
+(cli-import-gitea)=
+
+# vcspull import gitea
+
+Import repositories from a self-hosted Gitea instance.
+
+## Command
+
+```{eval-rst}
+.. argparse::
+   :module: vcspull.cli
+   :func: create_parser
+   :prog: vcspull
+   :path: import gitea
+```
+
+## Authentication
+
+- **Env var**: `GITEA_TOKEN`
+- **Token type**: API token with scoped permissions
+- **Scope**: `read:repository` (minimum for listing repos)
+- **Create at**: `https://<instance>/user/settings/applications`
+
+Set the token:
+
+```console
+$ export GITEA_TOKEN=...
+```
+
+Then import:
+
+```console
+$ vcspull import gitea myuser \
+    --workspace ~/code/ \
+    --url https://git.example.com
+```
diff --git a/docs/cli/import/github.md b/docs/cli/import/github.md
new file mode 100644
index 000000000..49be1ebab
--- /dev/null
+++ b/docs/cli/import/github.md
@@ -0,0 +1,38 @@
+(cli-import-github)=
+
+# vcspull import github
+
+Import repositories from GitHub or GitHub Enterprise.
+
+## Command
+
+```{eval-rst}
+.. argparse::
+   :module: vcspull.cli
+   :func: create_parser
+   :prog: vcspull
+   :path: import github
+```
+
+## Authentication
+
+- **Env vars**: `GITHUB_TOKEN` (primary), `GH_TOKEN` (fallback)
+- **Token type**: Personal access token (classic) or fine-grained PAT
+- **Permissions**:
+  - Classic PAT: no scopes needed for public repos; `repo` scope for private
+    repos; `read:org` for org repos
+  - Fine-grained PAT: "Metadata: Read-only" for public; add "Contents:
+    Read-only" for private
+- **Create at**: <https://github.com/settings/tokens>
+
+Set the token:
+
+```console
+$ export GITHUB_TOKEN=ghp_...
+```
+
+Then import:
+
+```console
+$ vcspull import gh myuser --workspace ~/code/
+```
diff --git a/docs/cli/import/gitlab.md b/docs/cli/import/gitlab.md
new file mode 100644
index 000000000..cb23522bd
--- /dev/null
+++ b/docs/cli/import/gitlab.md
@@ -0,0 +1,49 @@
+(cli-import-gitlab)=
+
+# vcspull import gitlab
+
+Import repositories from GitLab or a self-hosted GitLab instance.
+
+## Command
+
+```{eval-rst}
+.. argparse::
+   :module: vcspull.cli
+   :func: create_parser
+   :prog: vcspull
+   :path: import gitlab
+```
+
+## Group flattening
+
+When importing a GitLab group with `--mode org`, vcspull preserves subgroup
+structure as nested workspace directories by default. Use `--flatten-groups` to
+place all repositories directly in the base workspace:

+
+```console
+$ vcspull import gl my-group \
+    --mode org \
+    --workspace ~/code/ \
+    --flatten-groups
+```
+
+## Authentication
+
+- **Env vars**: `GITLAB_TOKEN` (primary), `GL_TOKEN` (fallback)
+- **Token type**: Personal access token
+- **Scope**: `read_api` (minimum for listing projects; **required** for search
+  mode)
+- **Create at**: <https://gitlab.com/-/user_settings/personal_access_tokens>
+  (self-hosted: `https://<instance>/-/user_settings/personal_access_tokens`)
+
+Set the token:
+
+```console
+$ export GITLAB_TOKEN=glpat-...
+```
+
+Then import:
+
+```console
+$ vcspull import gl myuser --workspace ~/code/
+```
diff --git a/docs/cli/import/index.md b/docs/cli/import/index.md
new file mode 100644
index 000000000..429261ff2
--- /dev/null
+++ b/docs/cli/import/index.md
@@ -0,0 +1,254 @@
+(cli-import)=
+
+# vcspull import
+
+The `vcspull import` command bulk-imports repositories from remote hosting
+services into your vcspull configuration. It connects to the service API,
+fetches a list of repositories, and writes them to your config file in a single
+step.
+
+Supported services: **GitHub**, **GitLab**, **Codeberg**, **Gitea**,
+**Forgejo**, and **AWS CodeCommit**.
+
+## Command
+
+```{eval-rst}
+.. argparse::
+   :module: vcspull.cli
+   :func: create_parser
+   :prog: vcspull
+   :path: import
+   :nosubcommands:
+   :nodescription:
+```
+
+Choose a service subcommand for details:
+
+- {ref}`cli-import-github` — GitHub or GitHub Enterprise
+- {ref}`cli-import-gitlab` — GitLab (gitlab.com or self-hosted)
+- {ref}`cli-import-codeberg` — Codeberg
+- {ref}`cli-import-gitea` — Self-hosted Gitea instance
+- {ref}`cli-import-forgejo` — Self-hosted Forgejo instance
+- {ref}`cli-import-codecommit` — AWS CodeCommit
+
+## Basic usage
+
+Import all repositories for a GitHub user into a workspace:
+
+```vcspull-console
+$ vcspull import github myuser --workspace ~/code/
+→ Fetching repositories from GitHub...
+✓ Found 12 repositories
+  + project-a [Python]
+  + project-b [Rust] ★42
+  + dotfiles
+  ... and 9 more
+Import 12 repositories to ~/.vcspull.yaml? [y/N]: y
+✓ Added 12 repositories to ~/.vcspull.yaml
+```
+
+## Supported services
+
+| Service    | Aliases                   | Self-hosted        | Auth env var(s)                  |
+|------------|---------------------------|--------------------|----------------------------------|
+| GitHub     | `github`, `gh`            | `--url`            | `GITHUB_TOKEN` / `GH_TOKEN`      |
+| GitLab     | `gitlab`, `gl`            | `--url`            | `GITLAB_TOKEN` / `GL_TOKEN`      |
+| Codeberg   | `codeberg`, `cb`          | No                 | `CODEBERG_TOKEN` / `GITEA_TOKEN` |
+| Gitea      | `gitea`                   | `--url` (required) | `GITEA_TOKEN`                    |
+| Forgejo    | `forgejo`                 | `--url` (required) | `FORGEJO_TOKEN` / `GITEA_TOKEN`  |
+| CodeCommit | `codecommit`, `cc`, `aws` | N/A                | AWS CLI credentials              |
+
+For Gitea and Forgejo, `--url` is required because there is no default
+instance.
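+
+For example, Codeberg needs no `--url`, while Gitea and Forgejo do
+(`git.example.com` below is a placeholder hostname):
+
+```console
+$ vcspull import codeberg myuser --workspace ~/code/
+```
+
+```console
+$ vcspull import gitea myuser \
+    --workspace ~/code/ \
+    --url https://git.example.com
+```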
+ +```{toctree} +:maxdepth: 1 +:hidden: + +github +gitlab +codeberg +gitea +forgejo +codecommit +``` + +## Import modes + +### User mode (default) + +Fetch all repositories owned by a user: + +```console +$ vcspull import gh myuser --workspace ~/code/ +``` + +### Organization mode + +Fetch repositories belonging to an organization or group: + +```console +$ vcspull import gh my-org \ + --mode org \ + --workspace ~/code/ +``` + +For GitLab, subgroups are supported with slash notation: + +```console +$ vcspull import gl my-group/sub-group \ + --mode org \ + --workspace ~/code/ +``` + +### Search mode + +Search for repositories matching a query: + +```console +$ vcspull import gh django \ + --mode search \ + --workspace ~/code/ \ + --min-stars 100 +``` + +## Filtering + +Narrow results with filtering flags: + +```console +$ vcspull import gh myuser \ + --workspace ~/code/ \ + --language python +``` + +```console +$ vcspull import gh myuser \ + --workspace ~/code/ \ + --topics cli,automation +``` + +```console +$ vcspull import gh django \ + --mode search \ + --workspace ~/code/ \ + --min-stars 50 +``` + +Include archived or forked repositories (excluded by default): + +```console +$ vcspull import gh myuser \ + --workspace ~/code/ \ + --archived \ + --forks +``` + +Limit the number of repositories fetched: + +```console +$ vcspull import gh myuser \ + --workspace ~/code/ \ + --limit 50 +``` + +```{note} +Not all filters work with every service. For example, `--language` may not +return results for GitLab or CodeCommit because those APIs don't expose +language metadata. vcspull warns when a filter is unlikely to work. +``` + +## Output formats + +Human-readable output (default): + +```console +$ vcspull import gh myuser --workspace ~/code/ +``` + +JSON for automation: + +```console +$ vcspull import gh myuser \ + --workspace ~/code/ \ + --json +``` + +NDJSON for streaming: + +```console +$ vcspull import gh myuser \ + --workspace ~/code/ \ + --ndjson +``` + +## Dry runs and confirmation + +Preview what would be imported without writing to the config file: + +```console +$ vcspull import gh myuser \ + --workspace ~/code/ \ + --dry-run +``` + +Skip the confirmation prompt (useful for scripts): + +```console +$ vcspull import gh myuser \ + --workspace ~/code/ \ + --yes +``` + +## Configuration file selection + +vcspull writes to `~/.vcspull.yaml` by default. Override with `-f/--file`: + +```console +$ vcspull import gh myuser \ + --workspace ~/code/ \ + --file ~/configs/github.yaml +``` + +## Protocol selection + +SSH clone URLs are used by default. Switch to HTTPS with `--https`: + +```console +$ vcspull import gh myuser \ + --workspace ~/code/ \ + --https +``` + +## Self-hosted instances + +Point to a self-hosted GitHub Enterprise, GitLab, Gitea, or Forgejo instance +with `--url`: + +```console +$ vcspull import gitea myuser \ + --workspace ~/code/ \ + --url https://git.example.com +``` + +## Authentication + +vcspull reads API tokens from environment variables. Use `--token` to override. +Environment variables are preferred for security. See each service page for +details. 
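+
+For instance, both of the following supply a GitHub token (values are
+placeholders; `--token` overrides the environment variable):
+
+```console
+$ export GITHUB_TOKEN=ghp_...
+$ vcspull import gh myuser --workspace ~/code/
+```
+
+```console
+$ vcspull import gh myuser \
+    --workspace ~/code/ \
+    --token ghp_...
+```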
+
+| Service    | Env var(s)                       | Token type                    | Min scope / permissions                                           |
+|------------|----------------------------------|-------------------------------|-------------------------------------------------------------------|
+| GitHub     | `GITHUB_TOKEN` / `GH_TOKEN`      | PAT (classic or fine-grained) | None (public), `repo` (private)                                   |
+| GitLab     | `GITLAB_TOKEN` / `GL_TOKEN`      | PAT                           | `read_api`                                                        |
+| Codeberg   | `CODEBERG_TOKEN` / `GITEA_TOKEN` | API token                     | None (public), any token (private)                                |
+| Gitea      | `GITEA_TOKEN`                    | API token                     | `read:repository`                                                 |
+| Forgejo    | `FORGEJO_TOKEN` / `GITEA_TOKEN`  | API token                     | `read:repository`                                                 |
+| CodeCommit | AWS CLI credentials              | IAM access key                | `codecommit:ListRepositories`, `codecommit:BatchGetRepositories`  |
+
+## After importing
+
+1. Run `vcspull fmt --write` to normalize and sort the configuration (see
+   {ref}`cli-fmt`).
+2. Run `vcspull list` to verify the imported entries (see {ref}`cli-list`).
+3. Run `vcspull sync` to clone the repositories (see {ref}`cli-sync`).
diff --git a/docs/cli/index.md b/docs/cli/index.md
index 977483710..1cb579205 100644
--- a/docs/cli/index.md
+++ b/docs/cli/index.md
@@ -8,6 +8,7 @@

sync
add
+import/index
discover
list
search
@@ -36,5 +37,5 @@ completion

:nosubcommands:
subparser_name : @replace

-    See :ref:`cli-sync`, :ref:`cli-add`, :ref:`cli-discover`, :ref:`cli-list`, :ref:`cli-search`, :ref:`cli-status`, :ref:`cli-fmt`
+    See :ref:`cli-sync`, :ref:`cli-add`, :ref:`cli-import`, :ref:`cli-discover`, :ref:`cli-list`, :ref:`cli-search`, :ref:`cli-status`, :ref:`cli-fmt`
```
diff --git a/docs/cli/list.md b/docs/cli/list.md
index ad69646eb..ad80161a0 100644
--- a/docs/cli/list.md
+++ b/docs/cli/list.md
@@ -124,7 +124,7 @@ By default, vcspull searches for config files in standard locations

Specify a custom config file with `-f/--file`:

```console
-$ vcspull list -f ~/projects/.vcspull.yaml
+$ vcspull list --file ~/projects/.vcspull.yaml
```

## Workspace filtering

@@ -132,7 +132,7 @@

Filter repositories by workspace root with `-w/--workspace/--workspace-root`:

```vcspull-console
-$ vcspull list -w ~/code/
+$ vcspull list --workspace ~/code/
• flask → ~/code/flask
• requests → ~/code/requests
```
diff --git a/docs/cli/search.md b/docs/cli/search.md
index b7c186ff3..d7b0b72c9 100644
--- a/docs/cli/search.md
+++ b/docs/cli/search.md
@@ -55,7 +55,7 @@ $ vcspull search --fixed-strings 'git+https://github.com/org/repo.git'

case-insensitively unless your query includes uppercase characters.
```console -$ vcspull search -S Django +$ vcspull search --smart-case Django ``` ## Boolean matching @@ -69,7 +69,7 @@ $ vcspull search --any django flask Invert matches with `-v/--invert-match`: ```console -$ vcspull search -v --fixed-strings github +$ vcspull search --invert-match --fixed-strings github ``` ## JSON output diff --git a/docs/cli/status.md b/docs/cli/status.md index 6959d3c00..34d4d1614 100644 --- a/docs/cli/status.md +++ b/docs/cli/status.md @@ -180,7 +180,7 @@ $ vcspull status --json > workspace-status-$(date +%Y%m%d).json Specify a custom config file with `-f/--file`: ```console -$ vcspull status -f ~/projects/.vcspull.yaml +$ vcspull status --file ~/projects/.vcspull.yaml ``` ## Workspace filtering @@ -188,7 +188,7 @@ $ vcspull status -f ~/projects/.vcspull.yaml Filter repositories by workspace root (planned feature): ```console -$ vcspull status -w ~/code/ +$ vcspull status --workspace ~/code/ ``` ## Color output diff --git a/docs/cli/sync.md b/docs/cli/sync.md index d3cb26f5d..3b2327e06 100644 --- a/docs/cli/sync.md +++ b/docs/cli/sync.md @@ -88,7 +88,7 @@ Each line is a JSON object representing a sync event, ideal for: Specify a custom config file with `-f/--file`: ```console -$ vcspull sync -f ~/projects/.vcspull.yaml '*' +$ vcspull sync --file ~/projects/.vcspull.yaml '*' ``` By default, vcspull searches for config files in: @@ -101,7 +101,7 @@ By default, vcspull searches for config files in: Filter repositories by workspace root with `-w/--workspace` or `--workspace-root`: ```console -$ vcspull sync -w ~/code/ '*' +$ vcspull sync --workspace ~/code/ '*' ``` This syncs only repositories in the specified workspace root, useful for: diff --git a/docs/configuration/generation.md b/docs/configuration/generation.md index 781986754..7503c419e 100644 --- a/docs/configuration/generation.md +++ b/docs/configuration/generation.md @@ -2,112 +2,16 @@ # Config generation -As a temporary solution for `vcspull` not being able to generate {ref}`configuration` through scanning directories or fetching them via API (e.g. gitlab, github, etc), you can write scripts to generate configs in the mean time. +The `vcspull import` command can generate configuration by fetching +repositories from remote services. See {ref}`cli-import` for details. -(config-generation-gitlab)= +Supported services: GitHub, GitLab, Codeberg, Gitea, Forgejo, +AWS CodeCommit. -## Collect repos from Gitlab - -Contributed by Andreas Schleifer (a.schleifer@bigpoint.net) - -Limitation on both, no pagination support in either, so only returns the -first page of repos (as of Feb 26th this is 100). - -````{tab} Shell-script - -_Requires [jq] and [curl]._ - -```{literalinclude} ../../scripts/generate_gitlab.sh -:language: shell -``` +Example — import all repos from a GitLab group: ```console -$ env GITLAB_TOKEN=mySecretToken \ - /path/to/generate_gitlab.sh gitlab.mycompany.com desired_namespace -``` - -To be executed from the path where the repos should later be stored. It will use -the current working directory as a "prefix" for the path used in the new config file. 
- -Optional: Set config file output path as additional argument (_will overwrite_) - -```console -$ env GITLAB_TOKEN=mySecretToken \ - /path/to/generate_gitlab.sh gitlab.mycompany.com desired_namespace /path/to/config.yaml -``` - -**Demonstration** - -Assume current directory of _/home/user/workspace/_ and script at _/home/user/workspace/scripts/generate_gitlab.sh_: - -```console -$ ./scripts/generate_gitlab.sh gitlab.com vcs-python -``` - -New file _vcspull.yaml_: - -```yaml -/my/workspace/: - g: - url: "git+ssh://git@gitlab.com/vcs-python/g.git" - remotes: - origin: "ssh://git@gitlab.com/vcs-python/g.git" - libvcs: - url: "git+ssh://git@gitlab.com/vcs-python/libvcs.git" - remotes: - origin: "ssh://git@gitlab.com/vcs-python/libvcs.git" - vcspull: - url: "git+ssh://git@gitlab.com/vcs-python/vcspull.git" - remotes: - origin: "ssh://git@gitlab.com/vcs-python/vcspull.git" +$ vcspull import gitlab my-group \ + --workspace ~/code \ + --mode org ``` - -[jq]: https://stedolan.github.io/jq/ - -[curl]: https://curl.se/ - -```` - -````{tab} Python -_Requires [requests] and [pyyaml]._ - -This confirms file overwrite, if already exists. It also requires passing the protocol/schema -of the gitlab mirror, e.g. `https://gitlab.com` instead of `gitlab.com`. - -```{literalinclude} ../../scripts/generate_gitlab.py -:language: python -``` - -**Demonstration** - -Assume current directory of _/home/user/workspace/_ and script at _/home/user/workspace/scripts/generate_gitlab.sh_: - -```console -$ ./scripts/generate_gitlab.py https://gitlab.com vcs-python -``` - -```yaml -/my/workspace/vcs-python: - g: - remotes: - origin: ssh://git@gitlab.com/vcs-python/g.git - url: git+ssh://git@gitlab.com/vcs-python/g.git - libvcs: - remotes: - origin: ssh://git@gitlab.com/vcs-python/libvcs.git - url: git+ssh://git@gitlab.com/vcs-python/libvcs.git - vcspull: - remotes: - origin: ssh://git@gitlab.com/vcs-python/vcspull.git - url: git+ssh://git@gitlab.com/vcs-python/vcspull.git -``` - -[requests]: https://docs.python-requests.org/en/latest/ -[pyyaml]: https://pyyaml.org/ - -```` - -### Contribute your own - -Post yours on or create a PR to add -yours to scripts/ and be featured here diff --git a/docs/configuration/index.md b/docs/configuration/index.md index b966410b0..1fd453ff7 100644 --- a/docs/configuration/index.md +++ b/docs/configuration/index.md @@ -88,13 +88,6 @@ YAML: ```` -```{toctree} -:maxdepth: 2 -:hidden: - -generation -``` - ## Caveats (git-remote-ssh-git)= diff --git a/docs/quickstart.md b/docs/quickstart.md index 80681a3d5..f2a838093 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -102,7 +102,7 @@ via trunk (can break easily): ## Configuration ```{seealso} -{ref}`configuration` and {ref}`config-generation`. +{ref}`configuration` and {ref}`cli-import`. ``` We will check out the source code of [flask][flask] to `~/code/flask`. @@ -154,7 +154,7 @@ be any name): Use `-f/--file` to specify a config. 
```console -$ vcspull sync -f .deps.yaml --all +$ vcspull sync --file .deps.yaml --all ``` You can also use [fnmatch] to pull repositories from your config in diff --git a/scripts/generate_gitlab.py b/scripts/generate_gitlab.py deleted file mode 100755 index 0274bd26f..000000000 --- a/scripts/generate_gitlab.py +++ /dev/null @@ -1,125 +0,0 @@ -#!/usr/bin/env python -"""Example script for export gitlab organization to vcspull config file.""" - -from __future__ import annotations - -import argparse -import logging -import os -import pathlib -import sys -import typing as t - -import requests -import yaml -from libvcs.sync.git import GitRemote - -from vcspull.cli.sync import CouldNotGuessVCSFromURL, guess_vcs - -if t.TYPE_CHECKING: - from vcspull.types import RawConfig - -log = logging.getLogger(__name__) -logging.basicConfig(level=logging.INFO, format="%(message)s") - -try: - gitlab_token = os.environ["GITLAB_TOKEN"] -except KeyError: - log.info("Please provide the environment variable GITLAB_TOKEN") - sys.exit(1) - -parser = argparse.ArgumentParser( - description="Script to generate vcsconfig for all repositories \ - under the given namespace (needs Gitlab >= 10.3)", -) -parser.add_argument("gitlab_host", type=str, help="url to the gitlab instance") -parser.add_argument( - "gitlab_namespace", - type=str, - help="namespace/group in gitlab to generate vcsconfig for", -) -parser.add_argument( - "-c", - type=str, - help="path to the target config file (default: ./vcspull.yaml)", - dest="config_file_name", - required=False, - default="./vcspull.yaml", -) - -args = vars(parser.parse_args()) -gitlab_host = args["gitlab_host"] -gitlab_namespace = args["gitlab_namespace"] -config_filename = pathlib.Path(args["config_file_name"]) - -try: - if config_filename.is_file(): - result = input( - f"The target config file ({config_filename}) already exists, \ - do you want to overwrite it? 
[y/N] ", - ) - - if result != "y": - log.info( - "Aborting per user request as existing config file (%s) should not be " - "overwritten!", - config_filename, - ) - sys.exit(0) - - config_file = config_filename.open(encoding="utf-8", mode="w") -except OSError: - log.info("File %s not accessible", config_filename) - sys.exit(1) - -response = requests.get( - f"{gitlab_host}/api/v4/groups/{gitlab_namespace}/projects", - params={"include_subgroups": "true", "per_page": "100"}, - headers={"Authorization": f"Bearer {gitlab_token}"}, -) - -if response.status_code != 200: - log.info("Error: %s", response) - sys.exit(1) - -path_prefix = pathlib.Path().cwd() -config: RawConfig = {} - - -for group in response.json(): - url_to_repo = group["ssh_url_to_repo"].replace(":", "/") - namespace_path = group["namespace"]["full_path"] - reponame = group["path"] - - path = f"{path_prefix}/{namespace_path}" - - if path not in config: - config[path] = {} - - # simplified config not working - https://github.com/vcs-python/vcspull/issues/332 - # config[path][reponame] = 'git+ssh://%s' % (url_to_repo) - - vcs = guess_vcs(url_to_repo) - if vcs is None: - raise CouldNotGuessVCSFromURL(url_to_repo) - - config[path][reponame] = { - "name": reponame, - "path": path / reponame, - "url": f"git+ssh://{url_to_repo}", - "remotes": { - "origin": GitRemote( - name="origin", - fetch_url=f"ssh://{url_to_repo}", - push_url=f"ssh://{url_to_repo}", - ), - }, - "vcs": vcs, - } - -config_yaml = yaml.dump(config) - -log.info(config_yaml) - -config_file.write(config_yaml) -config_file.close() diff --git a/scripts/generate_gitlab.sh b/scripts/generate_gitlab.sh deleted file mode 100755 index 86068bd9e..000000000 --- a/scripts/generate_gitlab.sh +++ /dev/null @@ -1,38 +0,0 @@ -#!/usr/bin/env bash - -if [ -z "${GITLAB_TOKEN}" ]; then - echo 'Please provide the environment variable $GITLAB_TOKEN' - exit 1 -fi - -if [ $# -lt 2 ]; then - echo "Usage: $0 []" - exit 1 -fi - -prefix="$(pwd)" -gitlab_host="${1}" -namespace="${2}" -config_file="${3:-./vcspull.yaml}" - -current_namespace_path="" - -curl --silent --show-error --header "Authorization: Bearer ${GITLAB_TOKEN}" "https://${gitlab_host}/api/v4/groups/${namespace}/projects?include_subgroups=true&per_page=100" \ - | jq -r '.[]|.namespace.full_path + " " + .path' \ - | LC_ALL=C sort \ - | while read namespace_path reponame; do - if [ "${current_namespace_path}" != "${namespace_path}" ]; then - current_namespace_path="${namespace_path}" - - echo "${prefix}/${current_namespace_path}:" - fi - - # simplified config not working - https://github.com/vcs-python/vcspull/issues/332 - #echo " ${reponame}: 'git+ssh://git@${gitlab_host}/${current_namespace_path}/${reponame}.git'" - - echo " ${reponame}:" - echo " url: 'git+ssh://git@${gitlab_host}/${current_namespace_path}/${reponame}.git'" - echo " remotes:" - echo " origin: 'ssh://git@${gitlab_host}/${current_namespace_path}/${reponame}.git'" - done \ - | tee "${config_file}" diff --git a/src/vcspull/_internal/remotes/__init__.py b/src/vcspull/_internal/remotes/__init__.py new file mode 100644 index 000000000..293c4ec2e --- /dev/null +++ b/src/vcspull/_internal/remotes/__init__.py @@ -0,0 +1,39 @@ +"""Remote repository importing for vcspull.""" + +from __future__ import annotations + +from .base import ( + AuthenticationError, + ConfigurationError, + DependencyError, + ImportMode, + ImportOptions, + NotFoundError, + RateLimitError, + RemoteImportError, + RemoteRepo, + ServiceUnavailableError, + filter_repo, +) +from .codecommit import 
CodeCommitImporter +from .gitea import GiteaImporter +from .github import GitHubImporter +from .gitlab import GitLabImporter + +__all__ = [ + "AuthenticationError", + "CodeCommitImporter", + "ConfigurationError", + "DependencyError", + "GitHubImporter", + "GitLabImporter", + "GiteaImporter", + "ImportMode", + "ImportOptions", + "NotFoundError", + "RateLimitError", + "RemoteImportError", + "RemoteRepo", + "ServiceUnavailableError", + "filter_repo", +] diff --git a/src/vcspull/_internal/remotes/base.py b/src/vcspull/_internal/remotes/base.py new file mode 100644 index 000000000..b750444e6 --- /dev/null +++ b/src/vcspull/_internal/remotes/base.py @@ -0,0 +1,564 @@ +"""Base classes and utilities for remote repository importers.""" + +from __future__ import annotations + +import dataclasses +import enum +import json +import logging +import os +import typing as t +import urllib.error +import urllib.parse +import urllib.request + +log = logging.getLogger(__name__) + + +class ImportMode(enum.Enum): + """Import mode for remote services.""" + + USER = "user" + ORG = "org" + SEARCH = "search" + + +class RemoteImportError(Exception): + """Base exception for remote import errors.""" + + def __init__(self, message: str, service: str | None = None) -> None: + """Initialize the error. + + Parameters + ---------- + message : str + Error message + service : str | None + Name of the service that raised the error + + Examples + -------- + >>> err = RemoteImportError("connection failed", service="GitHub") + >>> str(err) + 'connection failed' + >>> err.service + 'GitHub' + """ + super().__init__(message) + self.service = service + + +class AuthenticationError(RemoteImportError): + """Raised when authentication fails or is required.""" + + +class RateLimitError(RemoteImportError): + """Raised when API rate limit is exceeded.""" + + +class NotFoundError(RemoteImportError): + """Raised when a requested resource is not found.""" + + +class ServiceUnavailableError(RemoteImportError): + """Raised when the service is unavailable.""" + + +class ConfigurationError(RemoteImportError): + """Raised when there's a configuration error.""" + + +class DependencyError(RemoteImportError): + """Raised when a required dependency is missing.""" + + +@dataclasses.dataclass(frozen=True) +class RemoteRepo: + """Represents a repository from a remote service. + + Parameters + ---------- + name : str + Repository name (filesystem-safe) + clone_url : str + HTTPS URL for cloning the repository + ssh_url : str + SSH URL for cloning the repository + html_url : str + URL for viewing the repository in a browser + description : str | None + Repository description + language : str | None + Primary programming language + topics : tuple[str, ...] + Repository topics/tags + stars : int + Star/favorite count + is_fork : bool + Whether this is a fork of another repository + is_archived : bool + Whether the repository is archived + default_branch : str + Default branch name + owner : str + Owner username or organization name + """ + + name: str + clone_url: str + ssh_url: str + html_url: str + description: str | None + language: str | None + topics: tuple[str, ...] + stars: int + is_fork: bool + is_archived: bool + default_branch: str + owner: str + + def to_vcspull_url(self, *, use_ssh: bool = True) -> str: + """Return the URL formatted for vcspull config. + + Parameters + ---------- + use_ssh : bool + When True and ``ssh_url`` is non-empty, use the SSH URL. + Falls back to ``clone_url`` when ``ssh_url`` is empty. 
+ + Returns + ------- + str + Git URL with git+ prefix for vcspull config + + Examples + -------- + >>> repo = RemoteRepo( + ... name="test", + ... clone_url="https://github.com/user/test.git", + ... ssh_url="git@github.com:user/test.git", + ... html_url="https://github.com/user/test", + ... description=None, + ... language=None, + ... topics=(), + ... stars=0, + ... is_fork=False, + ... is_archived=False, + ... default_branch="main", + ... owner="user", + ... ) + >>> repo.to_vcspull_url() + 'git+git@github.com:user/test.git' + >>> repo.to_vcspull_url(use_ssh=False) + 'git+https://github.com/user/test.git' + """ + url = self.ssh_url if use_ssh and self.ssh_url else self.clone_url + if url.startswith("git+"): + return url + return f"git+{url}" + + def to_dict(self) -> dict[str, t.Any]: + """Convert to dictionary for JSON serialization. + + Returns + ------- + dict[str, t.Any] + Dictionary representation + + Examples + -------- + >>> repo = RemoteRepo( + ... name="test", + ... clone_url="https://github.com/user/test.git", + ... ssh_url="git@github.com:user/test.git", + ... html_url="https://github.com/user/test", + ... description="A test repo", + ... language="Python", + ... topics=("cli", "tool"), + ... stars=100, + ... is_fork=False, + ... is_archived=False, + ... default_branch="main", + ... owner="user", + ... ) + >>> d = repo.to_dict() + >>> d["name"] + 'test' + >>> d["topics"] + ['cli', 'tool'] + """ + return { + "name": self.name, + "clone_url": self.clone_url, + "ssh_url": self.ssh_url, + "html_url": self.html_url, + "description": self.description, + "language": self.language, + "topics": list(self.topics), + "stars": self.stars, + "is_fork": self.is_fork, + "is_archived": self.is_archived, + "default_branch": self.default_branch, + "owner": self.owner, + } + + +@dataclasses.dataclass +class ImportOptions: + """Options for importing repositories from a remote service. + + Parameters + ---------- + mode : ImportMode + The importing mode (user, org, or search) + target : str + Target user, org, or search query + base_url : str | None + Base URL for self-hosted instances + token : str | None + API token for authentication + include_forks : bool + Whether to include forked repositories + include_archived : bool + Whether to include archived repositories + language : str | None + Filter by programming language + topics : list[str] + Filter by topics + min_stars : int + Minimum star count (for search mode) + limit : int + Maximum number of repositories to return + """ + + mode: ImportMode = ImportMode.USER + target: str = "" + base_url: str | None = None + token: str | None = None + include_forks: bool = False + include_archived: bool = False + language: str | None = None + topics: list[str] = dataclasses.field(default_factory=list) + min_stars: int = 0 + limit: int = 100 + + def __post_init__(self) -> None: + """Validate options after initialization. + + Examples + -------- + >>> opts = ImportOptions(limit=10) + >>> opts.limit + 10 + + >>> ImportOptions(limit=0) + Traceback (most recent call last): + ... + ValueError: limit must be >= 1, got 0 + """ + if self.limit < 1: + msg = f"limit must be >= 1, got {self.limit}" + raise ValueError(msg) + + +class HTTPClient: + """Simple HTTP client using urllib for making API requests.""" + + def __init__( + self, + base_url: str, + *, + token: str | None = None, + auth_header: str = "Authorization", + auth_prefix: str = "Bearer", + user_agent: str = "vcspull", + timeout: int = 30, + ) -> None: + """Initialize the HTTP client. 
+ + Parameters + ---------- + base_url : str + Base URL for API requests + token : str | None + Authentication token + auth_header : str + Header name for authentication + auth_prefix : str + Prefix for the token in the auth header + user_agent : str + User-Agent header value + timeout : int + Request timeout in seconds + + Examples + -------- + >>> client = HTTPClient("https://api.example.com/") + >>> client.base_url + 'https://api.example.com' + """ + self.base_url = base_url.rstrip("/") + self.token = token + if token and not self.base_url.startswith("https://"): + log.warning( + "Authentication token will be sent over non-HTTPS connection " + "to %s — consider using HTTPS to protect credentials", + self.base_url, + ) + self.auth_header = auth_header + self.auth_prefix = auth_prefix + self.user_agent = user_agent + self.timeout = timeout + + def _build_headers(self) -> dict[str, str]: + """Build request headers. + + Returns + ------- + dict[str, str] + Request headers + + Examples + -------- + >>> client = HTTPClient("https://api.example.com", token="tok123") + >>> headers = client._build_headers() + >>> headers["Authorization"] + 'Bearer tok123' + + >>> client = HTTPClient("https://api.example.com") + >>> "Authorization" not in client._build_headers() + True + """ + headers = { + "User-Agent": self.user_agent, + "Accept": "application/json", + } + if self.token: + if self.auth_prefix: + headers[self.auth_header] = f"{self.auth_prefix} {self.token}" + else: + headers[self.auth_header] = self.token + return headers + + def get( + self, + endpoint: str, + *, + params: dict[str, str | int] | None = None, + service_name: str = "remote", + ) -> tuple[t.Any, dict[str, str]]: + """Make a GET request to the API. + + Parameters + ---------- + endpoint : str + API endpoint (will be appended to base_url) + params : dict | None + Query parameters + service_name : str + Service name for error messages + + Returns + ------- + tuple[Any, dict[str, str]] + Parsed JSON response and response headers + + Raises + ------ + AuthenticationError + When authentication fails (401) + RateLimitError + When rate limit is exceeded (403/429) + NotFoundError + When resource is not found (404) + ServiceUnavailableError + When service is unavailable (5xx) + """ + url = f"{self.base_url}{endpoint}" + + if params: + parts = urllib.parse.urlsplit(url) + existing_qs = urllib.parse.parse_qs(parts.query) + existing_qs.update({k: [str(v)] for k, v in params.items()}) + new_query = urllib.parse.urlencode( + {k: v[0] for k, v in existing_qs.items()}, + ) + url = urllib.parse.urlunsplit( + (parts.scheme, parts.netloc, parts.path, new_query, parts.fragment), + ) + + headers = self._build_headers() + request = urllib.request.Request(url, headers=headers) + + log.debug("GET %s", url) + + try: + with urllib.request.urlopen(request, timeout=self.timeout) as response: + body = response.read().decode("utf-8") + response_headers = {k.lower(): v for k, v in response.getheaders()} + return json.loads(body), response_headers + except urllib.error.HTTPError as exc: + self._handle_http_error(exc, service_name) + except urllib.error.URLError as exc: + msg = f"Connection error: {exc.reason}" + raise ServiceUnavailableError(msg, service=service_name) from exc + except json.JSONDecodeError as exc: + msg = f"Invalid JSON response from {service_name}" + raise ServiceUnavailableError(msg, service=service_name) from exc + + # Should never reach here, but for type checker + msg = "Unexpected error" + raise ServiceUnavailableError(msg, 
service=service_name) + + def _handle_http_error( + self, + exc: urllib.error.HTTPError, + service_name: str, + ) -> t.NoReturn: + """Handle HTTP error responses. + + Parameters + ---------- + exc : urllib.error.HTTPError + The HTTP error + service_name : str + Service name for error messages + + Raises + ------ + AuthenticationError + When authentication fails (401) + RateLimitError + When rate limit is exceeded (403/429) + NotFoundError + When resource is not found (404) + ServiceUnavailableError + When service is unavailable (5xx) + """ + try: + body = exc.read().decode("utf-8") + error_data = json.loads(body) + message = str(error_data.get("message", exc)) + except (json.JSONDecodeError, UnicodeDecodeError): + message = str(exc) + + if exc.code == 401: + msg = f"Authentication failed for {service_name}: {message}" + raise AuthenticationError(msg, service=service_name) from exc + + if exc.code == 403: + if "rate limit" in message.lower(): + msg = f"Rate limit exceeded for {service_name}: {message}" + raise RateLimitError(msg, service=service_name) from exc + msg = f"Access denied for {service_name}: {message}" + raise AuthenticationError(msg, service=service_name) from exc + + if exc.code == 404: + msg = f"Resource not found on {service_name}: {message}" + raise NotFoundError(msg, service=service_name) from exc + + if exc.code == 429: + msg = f"Rate limit exceeded for {service_name}: {message}" + raise RateLimitError(msg, service=service_name) from exc + + if exc.code >= 500: + msg = f"{service_name} service unavailable: {message}" + raise ServiceUnavailableError(msg, service=service_name) from exc + + msg = f"HTTP {exc.code} from {service_name}: {message}" + raise ServiceUnavailableError(msg, service=service_name) from exc + + +def get_token_from_env(*env_vars: str) -> str | None: + """Get an API token from environment variables. + + Parameters + ---------- + *env_vars : str + Environment variable names to check in order + + Returns + ------- + str | None + The token if found, None otherwise + + Examples + -------- + >>> import os + >>> os.environ["TEST_TOKEN"] = "secret" + >>> get_token_from_env("TEST_TOKEN", "OTHER_TOKEN") + 'secret' + >>> get_token_from_env("NONEXISTENT_TOKEN") + >>> del os.environ["TEST_TOKEN"] + """ + for var in env_vars: + token = os.environ.get(var) + if token: + return token + return None + + +def filter_repo( + repo: RemoteRepo, + options: ImportOptions, +) -> bool: + """Check if a repository passes the filter criteria. + + Parameters + ---------- + repo : RemoteRepo + The repository to check + options : ImportOptions + Filter options + + Returns + ------- + bool + True if the repository passes all filters + + Examples + -------- + >>> repo = RemoteRepo( + ... name="test", + ... clone_url="https://github.com/user/test.git", + ... ssh_url="git@github.com:user/test.git", + ... html_url="https://github.com/user/test", + ... description=None, + ... language="Python", + ... topics=("cli",), + ... stars=50, + ... is_fork=False, + ... is_archived=False, + ... default_branch="main", + ... owner="user", + ... 
) + >>> options = ImportOptions(include_forks=False, include_archived=False) + >>> filter_repo(repo, options) + True + >>> options = ImportOptions(language="JavaScript") + >>> filter_repo(repo, options) + False + """ + # Check fork filter + if repo.is_fork and not options.include_forks: + return False + + # Check archived filter + if repo.is_archived and not options.include_archived: + return False + + # Check language filter + if options.language and ( + not repo.language or repo.language.lower() != options.language.lower() + ): + return False + + # Check topics filter + if options.topics: + repo_topics_lower = {topic.lower() for topic in repo.topics} + required_topics_lower = {topic.lower() for topic in options.topics} + if not required_topics_lower.issubset(repo_topics_lower): + return False + + # Check minimum stars + return not (options.min_stars > 0 and repo.stars < options.min_stars) diff --git a/src/vcspull/_internal/remotes/codecommit.py b/src/vcspull/_internal/remotes/codecommit.py new file mode 100644 index 000000000..455e4e5b3 --- /dev/null +++ b/src/vcspull/_internal/remotes/codecommit.py @@ -0,0 +1,314 @@ +"""AWS CodeCommit repository importer for vcspull.""" + +from __future__ import annotations + +import json +import logging +import subprocess +import typing as t + +from .base import ( + AuthenticationError, + ConfigurationError, + DependencyError, + ImportOptions, + RemoteRepo, + ServiceUnavailableError, + filter_repo, +) + +log = logging.getLogger(__name__) + + +class CodeCommitImporter: + """Importer for AWS CodeCommit repositories. + + Uses AWS CLI to list and fetch repository information. + Requires AWS CLI to be installed and configured. + + Examples + -------- + >>> importer = CodeCommitImporter(region="us-east-1") + >>> importer.service_name + 'CodeCommit' + """ + + service_name: str = "CodeCommit" + + def __init__( + self, + region: str | None = None, + profile: str | None = None, + ) -> None: + """Initialize the CodeCommit importer. + + Parameters + ---------- + region : str | None + AWS region. If not provided, uses AWS CLI default. + profile : str | None + AWS profile name. If not provided, uses default profile. + + Notes + ----- + Uses AWS CLI credentials (``aws configure``). No token environment + variable is used. IAM policy must include + ``codecommit:ListRepositories`` (resource ``*``) and + ``codecommit:BatchGetRepositories``. + + Requires AWS CLI: ``pip install awscli``. + """ + self._region = region + self._profile = profile + self._check_aws_cli() + + def _check_aws_cli(self) -> None: + """Check if AWS CLI is installed and accessible. + + Raises + ------ + DependencyError + When AWS CLI is not installed + """ + try: + result = subprocess.run( + ["aws", "--version"], + capture_output=True, + text=True, + check=False, + ) + if result.returncode != 0: + msg = ( + "AWS CLI not installed or not accessible. " + "Please install it with: pip install awscli" + ) + raise DependencyError(msg, service=self.service_name) + except FileNotFoundError as exc: + msg = "AWS CLI not installed. Please install it with: pip install awscli" + raise DependencyError(msg, service=self.service_name) from exc + + def _build_aws_command(self, *args: str) -> list[str]: + """Build AWS CLI command with region and profile options. 
+ + Parameters + ---------- + *args : str + AWS CLI arguments + + Returns + ------- + list[str] + Complete command list + """ + cmd = ["aws", "--output", "json"] + if self._region: + cmd.extend(["--region", self._region]) + if self._profile: + cmd.extend(["--profile", self._profile]) + cmd.extend(args) + return cmd + + def _run_aws_command(self, *args: str) -> dict[str, t.Any]: + """Run an AWS CLI command and return parsed JSON output. + + Parameters + ---------- + *args : str + AWS CLI arguments + + Returns + ------- + dict + Parsed JSON output + + Raises + ------ + AuthenticationError + When AWS credentials are missing or invalid + ConfigurationError + When region is invalid or endpoint unreachable + """ + cmd = self._build_aws_command(*args) + log.debug("Running: %s", " ".join(cmd)) + + try: + result = subprocess.run( + cmd, + capture_output=True, + text=True, + check=False, + timeout=60, + ) + except FileNotFoundError as exc: + msg = "AWS CLI not found" + raise DependencyError(msg, service=self.service_name) from exc + except subprocess.TimeoutExpired as exc: + msg = "AWS CLI command timed out" + raise ServiceUnavailableError(msg, service=self.service_name) from exc + + if result.returncode != 0: + stderr = result.stderr.lower() + if "unable to locate credentials" in stderr: + msg = ( + "AWS credentials not configured. Run 'aws configure' or " + "set AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY." + ) + raise AuthenticationError(msg, service=self.service_name) + if "could not connect to the endpoint" in stderr: + msg = ( + f"Could not connect to CodeCommit. Check your region setting. " + f"Error: {result.stderr}" + ) + raise ConfigurationError(msg, service=self.service_name) + if "invalid" in stderr and "region" in stderr: + msg = f"Invalid AWS region. Error: {result.stderr}" + raise ConfigurationError(msg, service=self.service_name) + msg = f"AWS CLI error: {result.stderr}" + raise ConfigurationError(msg, service=self.service_name) + + try: + return json.loads(result.stdout) if result.stdout.strip() else {} + except json.JSONDecodeError as exc: + msg = f"Invalid JSON from AWS CLI: {result.stdout}" + raise ConfigurationError(msg, service=self.service_name) from exc + + @property + def is_authenticated(self) -> bool: + """Check if AWS credentials are configured. + + Returns + ------- + bool + True if credentials appear to be configured + """ + try: + self._run_aws_command("sts", "get-caller-identity") + except (AuthenticationError, ConfigurationError): + return False + else: + return True + + def fetch_repos(self, options: ImportOptions) -> t.Iterator[RemoteRepo]: + """Fetch repositories from AWS CodeCommit. 
+ + Parameters + ---------- + options : ImportOptions + Import options (target is used as optional name filter) + + Yields + ------ + RemoteRepo + Repository information + + Raises + ------ + AuthenticationError + When AWS credentials are missing + ConfigurationError + When region is invalid + DependencyError + When AWS CLI is not installed + """ + # List all repositories (paginate over nextToken) + repositories: list[dict[str, t.Any]] = [] + next_token: str | None = None + while True: + cmd_args = ["codecommit", "list-repositories"] + if next_token: + cmd_args.extend(["--next-token", next_token]) + data = self._run_aws_command(*cmd_args) + repositories.extend(data.get("repositories", [])) + next_token = data.get("nextToken") + if not next_token: + break + + if not repositories: + return + + # Filter by name if target is provided + if options.target: + target_lower = options.target.lower() + repositories = [ + r + for r in repositories + if target_lower in r.get("repositoryName", "").lower() + ] + + # Batch get repository details (up to 25 at a time) + count = 0 + batch_size = 25 + + for i in range(0, len(repositories), batch_size): + if count >= options.limit: + break + + batch = repositories[i : i + batch_size] + repo_names = [r["repositoryName"] for r in batch] + + # Get detailed info for batch + details = self._run_aws_command( + "codecommit", + "batch-get-repositories", + "--repository-names", + *repo_names, + ) + + for repo_metadata in details.get("repositories", []): + if count >= options.limit: + break + + repo = self._parse_repo(repo_metadata) + if filter_repo(repo, options): + yield repo + count += 1 + + def _parse_repo(self, data: dict[str, t.Any]) -> RemoteRepo: + """Parse CodeCommit repository metadata into RemoteRepo. + + Parameters + ---------- + data : dict + CodeCommit repository metadata + + Returns + ------- + RemoteRepo + Parsed repository information + """ + repo_name = data.get("repositoryName", "") + account_id = data.get("accountId", "") + + # Build console URL + region = self._region + if not region: + # Extract region from clone URL + # (format: https://git-codecommit.{region}.amazonaws.com/...) + clone_http = data.get("cloneUrlHttp", "") + if "git-codecommit." 
in clone_http:
+                region = clone_http.split("git-codecommit.")[1].split(".")[0]
+            else:
+                region = "us-east-1"
+                log.debug(
+                    "Could not determine region, defaulting to %s for console URL",
+                    region,
+                )
+        html_url = (
+            f"https://{region}.console.aws.amazon.com/codesuite/codecommit/"
+            f"repositories/{repo_name}/browse"
+        )
+
+        return RemoteRepo(
+            name=repo_name,
+            clone_url=data.get("cloneUrlHttp", ""),
+            ssh_url=data.get("cloneUrlSsh", ""),
+            html_url=html_url,
+            description=data.get("repositoryDescription"),
+            language=None,  # CodeCommit doesn't track language
+            topics=(),  # CodeCommit doesn't have topics
+            stars=0,  # CodeCommit doesn't have stars
+            is_fork=False,  # CodeCommit doesn't have forks
+            is_archived=False,  # CodeCommit doesn't have archived state
+            default_branch=data.get("defaultBranch", "main"),
+            owner=account_id,
+        )
diff --git a/src/vcspull/_internal/remotes/gitea.py b/src/vcspull/_internal/remotes/gitea.py
new file mode 100644
index 000000000..67b1487cb
--- /dev/null
+++ b/src/vcspull/_internal/remotes/gitea.py
@@ -0,0 +1,322 @@
+"""Gitea/Forgejo/Codeberg repository importer for vcspull."""
+
+from __future__ import annotations
+
+import logging
+import typing as t
+import urllib.parse
+
+from .base import (
+    HTTPClient,
+    ImportMode,
+    ImportOptions,
+    RemoteRepo,
+    filter_repo,
+    get_token_from_env,
+)
+
+log = logging.getLogger(__name__)
+
+CODEBERG_API_URL = "https://codeberg.org"
+DEFAULT_PER_PAGE = 50  # Gitea's default is 50
+
+
+class GiteaImporter:
+    """Importer for Gitea, Forgejo, and Codeberg repositories.
+
+    Supports three modes:
+    - USER: Fetch repositories for a user
+    - ORG: Fetch repositories for an organization
+    - SEARCH: Search for repositories by query
+
+    Examples
+    --------
+    >>> importer = GiteaImporter(base_url="https://codeberg.org")
+    >>> importer.service_name
+    'Gitea'
+    """
+
+    service_name: str = "Gitea"
+
+    def __init__(
+        self,
+        token: str | None = None,
+        base_url: str | None = None,
+    ) -> None:
+        """Initialize the Gitea/Forgejo/Codeberg importer.
+
+        Parameters
+        ----------
+        token : str | None
+            API token. If not provided, will try service-specific env vars.
+        base_url : str | None
+            Base URL for the Gitea instance. Required for generic Gitea.
+            Defaults to Codeberg if not specified.
+
+        Notes
+        -----
+        Token lookup is hostname-aware:
+
+        - Codeberg (codeberg.org): ``CODEBERG_TOKEN``, falls back to
+          ``GITEA_TOKEN``
+        - Forgejo (hostname contains "forgejo"): ``FORGEJO_TOKEN``, falls back
+          to ``GITEA_TOKEN``
+        - Other Gitea instances: ``GITEA_TOKEN``
+
+        Create a scoped token with at least ``read:repository`` permission at
+        ``https://<instance>/user/settings/applications``.
+
+        Examples
+        --------
+        >>> importer = GiteaImporter(token="fake", base_url="https://codeberg.org")
+        >>> importer.service_name
+        'Gitea'
+        """
+        self._base_url = (base_url or CODEBERG_API_URL).rstrip("/")
+
+        # Determine token from environment based on service.
+        # Use proper URL parsing to extract hostname to avoid substring attacks.
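+        # e.g. urlparse("https://codeberg.org.evil.example").netloc is
+        # "codeberg.org.evil.example", so the equality check below will
+        # not match it, whereas a naive substring test on the raw URL
+        # would (the hostile URL here is hypothetical).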
+ parsed_url = urllib.parse.urlparse(self._base_url.lower()) + hostname = parsed_url.netloc + + self._token: str | None + if token: + self._token = token + elif hostname == "codeberg.org": + self._token = get_token_from_env("CODEBERG_TOKEN", "GITEA_TOKEN") + elif "forgejo" in hostname: + self._token = get_token_from_env("FORGEJO_TOKEN", "GITEA_TOKEN") + else: + self._token = get_token_from_env("GITEA_TOKEN") + + self._client = HTTPClient( + f"{self._base_url}/api/v1", + token=self._token, + auth_header="Authorization", + auth_prefix="token", # Gitea uses "token " + user_agent="vcspull", + ) + + @property + def is_authenticated(self) -> bool: + """Check if the importer has authentication configured. + + Returns + ------- + bool + True if a token is configured + + Examples + -------- + >>> GiteaImporter(token="fake", base_url="https://codeberg.org").is_authenticated + True + """ + return self._token is not None + + def fetch_repos(self, options: ImportOptions) -> t.Iterator[RemoteRepo]: + """Fetch repositories from Gitea/Forgejo/Codeberg. + + Parameters + ---------- + options : ImportOptions + Import options + + Yields + ------ + RemoteRepo + Repository information + + Raises + ------ + AuthenticationError + When authentication fails + RateLimitError + When rate limit is exceeded + NotFoundError + When user/org is not found + """ + if options.mode == ImportMode.USER: + yield from self._fetch_user(options) + elif options.mode == ImportMode.ORG: + yield from self._fetch_org(options) + elif options.mode == ImportMode.SEARCH: + yield from self._fetch_search(options) + + def _fetch_user(self, options: ImportOptions) -> t.Iterator[RemoteRepo]: + """Fetch repositories for a user. + + Parameters + ---------- + options : ImportOptions + Import options + + Yields + ------ + RemoteRepo + Repository information + """ + target = urllib.parse.quote(options.target, safe="") + endpoint = f"/users/{target}/repos" + yield from self._paginate_repos(endpoint, options) + + def _fetch_org(self, options: ImportOptions) -> t.Iterator[RemoteRepo]: + """Fetch repositories for an organization. + + Parameters + ---------- + options : ImportOptions + Import options + + Yields + ------ + RemoteRepo + Repository information + """ + target = urllib.parse.quote(options.target, safe="") + endpoint = f"/orgs/{target}/repos" + yield from self._paginate_repos(endpoint, options) + + def _fetch_search(self, options: ImportOptions) -> t.Iterator[RemoteRepo]: + """Search for repositories. + + Parameters + ---------- + options : ImportOptions + Import options + + Yields + ------ + RemoteRepo + Repository information + """ + endpoint = "/repos/search" + page = 1 + count = 0 + + while count < options.limit: + # Always use DEFAULT_PER_PAGE to maintain consistent pagination offset. + # Changing limit between pages causes offset misalignment and duplicates. + params: dict[str, str | int] = { + "q": options.target, + "limit": DEFAULT_PER_PAGE, + "page": page, + "sort": "stars", + "order": "desc", + } + + if not options.include_archived: + params["archived"] = "false" + + if not options.include_forks: + params["fork"] = "false" + + data, _headers = self._client.get( + endpoint, + params=params, + service_name=self.service_name, + ) + + # Gitea search returns {"ok": true, "data": [...]} or just [...] 
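+            # Normalize both response shapes to a plain list before iterating.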
+ items = data.get("data", []) if isinstance(data, dict) else data + + if not items: + break + + for item in items: + if count >= options.limit: + break + + repo = self._parse_repo(item) + if filter_repo(repo, options): + yield repo + count += 1 + + # Check if there are more pages + if len(items) < DEFAULT_PER_PAGE: + break + + page += 1 + + def _paginate_repos( + self, + endpoint: str, + options: ImportOptions, + ) -> t.Iterator[RemoteRepo]: + """Paginate through repository listing endpoints. + + Parameters + ---------- + endpoint : str + API endpoint + options : ImportOptions + Import options + + Yields + ------ + RemoteRepo + Repository information + """ + page = 1 + count = 0 + + while count < options.limit: + # Always use DEFAULT_PER_PAGE to maintain consistent pagination offset. + # Changing limit between pages causes offset misalignment and duplicates. + params: dict[str, str | int] = { + "limit": DEFAULT_PER_PAGE, + "page": page, + } + + data, _headers = self._client.get( + endpoint, + params=params, + service_name=self.service_name, + ) + + if not data: + break + + for item in data: + if count >= options.limit: + break + + repo = self._parse_repo(item) + if filter_repo(repo, options): + yield repo + count += 1 + + # Check if there are more pages + if len(data) < DEFAULT_PER_PAGE: + break + + page += 1 + + def _parse_repo(self, data: dict[str, t.Any]) -> RemoteRepo: + """Parse Gitea API response into RemoteRepo. + + Parameters + ---------- + data : dict + Gitea API repository data + + Returns + ------- + RemoteRepo + Parsed repository information + """ + owner_data = data.get("owner") or {} + + return RemoteRepo( + name=data.get("name", ""), + clone_url=data.get("clone_url", ""), + ssh_url=data.get("ssh_url", ""), + html_url=data.get("html_url", ""), + description=data.get("description"), + language=data.get("language"), + topics=tuple(data.get("topics") or []), + stars=data.get("stars_count", 0), # Note: Gitea uses stars_count + is_fork=data.get("fork", False), + is_archived=data.get("archived", False), + default_branch=data.get("default_branch", "main"), + owner=owner_data.get("login", owner_data.get("username", "")), + ) diff --git a/src/vcspull/_internal/remotes/github.py b/src/vcspull/_internal/remotes/github.py new file mode 100644 index 000000000..74f529d06 --- /dev/null +++ b/src/vcspull/_internal/remotes/github.py @@ -0,0 +1,358 @@ +"""GitHub repository importer for vcspull.""" + +from __future__ import annotations + +import logging +import typing as t +import urllib.parse + +from .base import ( + HTTPClient, + ImportMode, + ImportOptions, + RemoteRepo, + filter_repo, + get_token_from_env, +) + +log = logging.getLogger(__name__) + +GITHUB_API_URL = "https://api.github.com" +DEFAULT_PER_PAGE = 100 +# GitHub search API limits results to 1000; exceeding this causes HTTP 422. +SEARCH_MAX_RESULTS = 1000 + + +class GitHubImporter: + """Importer for GitHub repositories. + + Supports three modes: + - USER: Fetch repositories for a user + - ORG: Fetch repositories for an organization + - SEARCH: Search for repositories by query + + Examples + -------- + >>> importer = GitHubImporter() + >>> importer.service_name + 'GitHub' + """ + + service_name: str = "GitHub" + + def __init__( + self, + token: str | None = None, + base_url: str | None = None, + ) -> None: + """Initialize the GitHub importer. + + Parameters + ---------- + token : str | None + GitHub API token. If not provided, will try GITHUB_TOKEN env var. + base_url : str | None + Base URL for GitHub Enterprise. 
Defaults to api.github.com. + + Notes + ----- + Authentication is optional for public repositories. For private + repositories or higher rate limits, set ``GITHUB_TOKEN`` or ``GH_TOKEN``. + + Classic PAT: no scopes needed for public repos; ``repo`` scope for + private. Fine-grained PAT: "Metadata: Read-only" for public; add + "Contents: Read-only" for private repos. + + Create a token at https://github.com/settings/tokens. + + Examples + -------- + >>> importer = GitHubImporter(token="fake") + >>> importer.service_name + 'GitHub' + """ + self._token = token or get_token_from_env("GITHUB_TOKEN", "GH_TOKEN") + self._base_url = (base_url or GITHUB_API_URL).rstrip("/") + + # GitHub Enterprise needs /api/v3; public api.github.com does not + api_url = self._base_url + if base_url and "/api/" not in self._base_url: + api_url = f"{self._base_url}/api/v3" + + self._client = HTTPClient( + api_url, + token=self._token, + auth_header="Authorization", + auth_prefix="Bearer", + user_agent="vcspull", + ) + + @property + def is_authenticated(self) -> bool: + """Check if the importer has authentication configured. + + Returns + ------- + bool + True if a token is configured + + Examples + -------- + >>> GitHubImporter(token="fake").is_authenticated + True + """ + return self._token is not None + + def fetch_repos(self, options: ImportOptions) -> t.Iterator[RemoteRepo]: + """Fetch repositories from GitHub. + + Parameters + ---------- + options : ImportOptions + Import options + + Yields + ------ + RemoteRepo + Repository information + + Raises + ------ + AuthenticationError + When authentication fails + RateLimitError + When rate limit is exceeded + NotFoundError + When user/org is not found + """ + if options.mode == ImportMode.USER: + yield from self._fetch_user(options) + elif options.mode == ImportMode.ORG: + yield from self._fetch_org(options) + elif options.mode == ImportMode.SEARCH: + yield from self._fetch_search(options) + + def _fetch_user(self, options: ImportOptions) -> t.Iterator[RemoteRepo]: + """Fetch repositories for a user. + + Parameters + ---------- + options : ImportOptions + Import options + + Yields + ------ + RemoteRepo + Repository information + """ + target = urllib.parse.quote(options.target, safe="") + endpoint = f"/users/{target}/repos" + yield from self._paginate_repos(endpoint, options) + + def _fetch_org(self, options: ImportOptions) -> t.Iterator[RemoteRepo]: + """Fetch repositories for an organization. + + Parameters + ---------- + options : ImportOptions + Import options + + Yields + ------ + RemoteRepo + Repository information + """ + target = urllib.parse.quote(options.target, safe="") + endpoint = f"/orgs/{target}/repos" + yield from self._paginate_repos(endpoint, options) + + def _fetch_search(self, options: ImportOptions) -> t.Iterator[RemoteRepo]: + """Search for repositories. + + Parameters + ---------- + options : ImportOptions + Import options + + Yields + ------ + RemoteRepo + Repository information + """ + query_parts = [options.target] + + if options.language: + query_parts.append(f"language:{options.language}") + + if options.min_stars > 0: + query_parts.append(f"stars:>={options.min_stars}") + + query = " ".join(query_parts) + endpoint = "/search/repositories" + page = 1 + count = 0 + + while count < options.limit: + # Always use DEFAULT_PER_PAGE to maintain consistent pagination offset. + # Changing per_page between pages causes offset misalignment and duplicates. 
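+            # e.g. with per_page=100: page 1 yields results 1-100, page 2
+            # yields 101-200, and so on up to the 1000-result search cap
+            # enforced below.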
+ params: dict[str, str | int] = { + "q": query, + "per_page": DEFAULT_PER_PAGE, + "page": page, + "sort": "stars", + "order": "desc", + } + + data, headers = self._client.get( + endpoint, + params=params, + service_name=self.service_name, + ) + + self._log_rate_limit(headers) + + total_count = data.get("total_count", 0) + if page == 1 and total_count > 1000: + log.warning( + "GitHub search returned %d total results but API limits " + "to 1000; consider narrowing your query", + total_count, + ) + + items = data.get("items", []) + if not items: + break + + for item in items: + if count >= options.limit: + break + + repo = self._parse_repo(item) + if filter_repo(repo, options): + yield repo + count += 1 + + # Check if there are more pages + if len(items) < DEFAULT_PER_PAGE: + break + + # GitHub search API caps at 1000 results + if page * DEFAULT_PER_PAGE >= SEARCH_MAX_RESULTS: + break + + page += 1 + + def _paginate_repos( + self, + endpoint: str, + options: ImportOptions, + ) -> t.Iterator[RemoteRepo]: + """Paginate through repository listing endpoints. + + Parameters + ---------- + endpoint : str + API endpoint + options : ImportOptions + Import options + + Yields + ------ + RemoteRepo + Repository information + """ + page = 1 + count = 0 + + while count < options.limit: + # Always use DEFAULT_PER_PAGE to maintain consistent pagination offset. + # Changing per_page between pages causes offset misalignment and duplicates. + params: dict[str, str | int] = { + "per_page": DEFAULT_PER_PAGE, + "page": page, + "sort": "updated", + "direction": "desc", + } + + data, headers = self._client.get( + endpoint, + params=params, + service_name=self.service_name, + ) + + self._log_rate_limit(headers) + + if not data: + break + + for item in data: + if count >= options.limit: + break + + repo = self._parse_repo(item) + if filter_repo(repo, options): + yield repo + count += 1 + + # Check if there are more pages + if len(data) < DEFAULT_PER_PAGE: + break + + page += 1 + + def _parse_repo(self, data: dict[str, t.Any]) -> RemoteRepo: + """Parse GitHub API response into RemoteRepo. + + Parameters + ---------- + data : dict + GitHub API repository data + + Returns + ------- + RemoteRepo + Parsed repository information + """ + return RemoteRepo( + name=data.get("name", ""), + clone_url=data.get("clone_url", ""), + ssh_url=data.get("ssh_url", ""), + html_url=data.get("html_url", ""), + description=data.get("description"), + language=data.get("language"), + topics=tuple(data.get("topics") or []), + stars=data.get("stargazers_count", 0), + is_fork=data.get("fork", False), + is_archived=data.get("archived", False), + default_branch=data.get("default_branch", "main"), + owner=(data.get("owner") or {}).get("login", ""), + ) + + def _log_rate_limit(self, headers: dict[str, str]) -> None: + """Log rate limit information from response headers. 
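+
+        Warns when fewer than 10 requests remain in the current window;
+        otherwise logs at debug level.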
+ + Parameters + ---------- + headers : dict[str, str] + Response headers + """ + remaining = headers.get("x-ratelimit-remaining") + limit = headers.get("x-ratelimit-limit") + + if remaining is not None and limit is not None: + try: + remaining_int = int(remaining) + except (ValueError, TypeError): + return + if remaining_int < 10: + log.warning( + "GitHub API rate limit low: %s/%s remaining", + remaining, + limit, + ) + else: + log.debug( + "GitHub API rate limit: %s/%s remaining", + remaining, + limit, + ) diff --git a/src/vcspull/_internal/remotes/gitlab.py b/src/vcspull/_internal/remotes/gitlab.py new file mode 100644 index 000000000..799d23561 --- /dev/null +++ b/src/vcspull/_internal/remotes/gitlab.py @@ -0,0 +1,338 @@ +"""GitLab repository importer for vcspull.""" + +from __future__ import annotations + +import logging +import typing as t +import urllib.parse + +from .base import ( + AuthenticationError, + HTTPClient, + ImportMode, + ImportOptions, + RemoteRepo, + filter_repo, + get_token_from_env, +) + +log = logging.getLogger(__name__) + +GITLAB_API_URL = "https://gitlab.com" +DEFAULT_PER_PAGE = 100 + + +class GitLabImporter: + """Importer for GitLab repositories. + + Supports three modes: + - USER: Fetch repositories for a user + - ORG: Fetch repositories for a group (organization) + - SEARCH: Search for repositories (requires authentication) + + Examples + -------- + >>> importer = GitLabImporter() + >>> importer.service_name + 'GitLab' + """ + + service_name: str = "GitLab" + + def __init__( + self, + token: str | None = None, + base_url: str | None = None, + ) -> None: + """Initialize the GitLab importer. + + Parameters + ---------- + token : str | None + GitLab API token. If not provided, will try GITLAB_TOKEN env var. + base_url : str | None + Base URL for self-hosted GitLab instances. Defaults to gitlab.com. + + Notes + ----- + Set ``GITLAB_TOKEN`` or ``GL_TOKEN`` for authentication. A token with + the ``read_api`` scope is the minimum for listing projects. Search mode + **requires** authentication. + + Create a token at + https://gitlab.com/-/user_settings/personal_access_tokens. + + Examples + -------- + >>> importer = GitLabImporter(token="fake") + >>> importer.service_name + 'GitLab' + """ + self._token = token or get_token_from_env("GITLAB_TOKEN", "GL_TOKEN") + self._base_url = (base_url or GITLAB_API_URL).rstrip("/") + self._client = HTTPClient( + f"{self._base_url}/api/v4", + token=self._token, + auth_header="PRIVATE-TOKEN", + auth_prefix="", # GitLab uses token directly without prefix + user_agent="vcspull", + ) + + @property + def is_authenticated(self) -> bool: + """Check if the importer has authentication configured. + + Returns + ------- + bool + True if a token is configured + + Examples + -------- + >>> GitLabImporter(token="fake").is_authenticated + True + """ + return self._token is not None + + def fetch_repos(self, options: ImportOptions) -> t.Iterator[RemoteRepo]: + """Fetch repositories from GitLab. 
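+
+        Dispatches to the user, group, or search fetcher based on
+        ``options.mode``.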
+ + Parameters + ---------- + options : ImportOptions + Import options + + Yields + ------ + RemoteRepo + Repository information + + Raises + ------ + AuthenticationError + When authentication fails or is required for search + RateLimitError + When rate limit is exceeded + NotFoundError + When user/group is not found + """ + if options.mode == ImportMode.USER: + yield from self._fetch_user(options) + elif options.mode == ImportMode.ORG: + yield from self._fetch_group(options) + elif options.mode == ImportMode.SEARCH: + yield from self._fetch_search(options) + + def _fetch_user(self, options: ImportOptions) -> t.Iterator[RemoteRepo]: + """Fetch repositories for a user. + + Parameters + ---------- + options : ImportOptions + Import options + + Yields + ------ + RemoteRepo + Repository information + """ + target = urllib.parse.quote(options.target, safe="") + endpoint = f"/users/{target}/projects" + yield from self._paginate_repos(endpoint, options) + + def _fetch_group(self, options: ImportOptions) -> t.Iterator[RemoteRepo]: + """Fetch repositories for a group (organization). + + Parameters + ---------- + options : ImportOptions + Import options + + Yields + ------ + RemoteRepo + Repository information + """ + # URL-encode the group name (handles slashes in subgroups, etc.) + target = urllib.parse.quote(options.target, safe="") + endpoint = f"/groups/{target}/projects" + yield from self._paginate_repos(endpoint, options, include_subgroups=True) + + def _fetch_search(self, options: ImportOptions) -> t.Iterator[RemoteRepo]: + """Search for repositories. + + Note: GitLab search API requires authentication. + + Parameters + ---------- + options : ImportOptions + Import options + + Yields + ------ + RemoteRepo + Repository information + + Raises + ------ + AuthenticationError + When not authenticated (GitLab search requires auth) + """ + if not self.is_authenticated: + msg = ( + "GitLab search API requires authentication. Please provide " + "a token via --token or GITLAB_TOKEN environment variable." + ) + raise AuthenticationError(msg, service=self.service_name) + + endpoint = "/search" + page = 1 + count = 0 + + while count < options.limit: + # Always use DEFAULT_PER_PAGE to maintain consistent pagination offset. + # Changing per_page between pages causes offset misalignment and duplicates. + params: dict[str, str | int] = { + "scope": "projects", + "search": options.target, + "per_page": DEFAULT_PER_PAGE, + "page": page, + } + + if not options.include_archived: + params["archived"] = "false" + + data, _headers = self._client.get( + endpoint, + params=params, + service_name=self.service_name, + ) + + if not data: + break + + for item in data: + if count >= options.limit: + break + + repo = self._parse_repo(item) + if filter_repo(repo, options): + yield repo + count += 1 + + # Check if there are more pages + if len(data) < DEFAULT_PER_PAGE: + break + + page += 1 + + def _paginate_repos( + self, + endpoint: str, + options: ImportOptions, + *, + include_subgroups: bool = False, + ) -> t.Iterator[RemoteRepo]: + """Paginate through project listing endpoints. + + Parameters + ---------- + endpoint : str + API endpoint + options : ImportOptions + Import options + include_subgroups : bool + Whether to include projects from subgroups + + Yields + ------ + RemoteRepo + Repository information + """ + page = 1 + count = 0 + + while count < options.limit: + # Always use DEFAULT_PER_PAGE to maintain consistent pagination offset. + # Changing per_page between pages causes offset misalignment and duplicates. 
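+            # For group imports this becomes, e.g.,
+            #   GET /groups/<group>/projects?include_subgroups=true
+            # which lists projects from every nested subgroup as well.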
+ params: dict[str, str | int] = { + "per_page": DEFAULT_PER_PAGE, + "page": page, + "order_by": "last_activity_at", + "sort": "desc", + } + + if include_subgroups: + params["include_subgroups"] = "true" + + if not options.include_archived: + params["archived"] = "false" + # When include_archived=True, omit the param to get all projects + + data, _headers = self._client.get( + endpoint, + params=params, + service_name=self.service_name, + ) + + if not data: + break + + for item in data: + if count >= options.limit: + break + + repo = self._parse_repo(item) + if filter_repo(repo, options): + yield repo + count += 1 + + # Check if there are more pages + if len(data) < DEFAULT_PER_PAGE: + break + + page += 1 + + def _parse_repo(self, data: dict[str, t.Any]) -> RemoteRepo: + """Parse GitLab API response into RemoteRepo. + + Parameters + ---------- + data : dict + GitLab API project data + + Returns + ------- + RemoteRepo + Parsed repository information + """ + # Use 'path' instead of 'name' for filesystem-safe name + name = data.get("path", data.get("name", "")) + + # Prefer the full namespace path for subgroup-aware import behavior. + namespace = data.get("namespace") or {} + owner = namespace.get("full_path") + if not owner: + path_with_namespace = data.get("path_with_namespace") + if isinstance(path_with_namespace, str) and "/" in path_with_namespace: + owner = path_with_namespace.rsplit("/", 1)[0] + else: + owner = namespace.get("path", namespace.get("name", "")) + + # Check if it's a fork + is_fork = data.get("forked_from_project") is not None + + return RemoteRepo( + name=name, + clone_url=data.get("http_url_to_repo", ""), + ssh_url=data.get("ssh_url_to_repo", ""), + html_url=data.get("web_url", ""), + description=data.get("description"), + language=None, # GitLab doesn't return language in list endpoints + topics=tuple(data.get("topics") or data.get("tag_list") or []), + stars=data.get("star_count", 0), + is_fork=is_fork, + is_archived=data.get("archived", False), + default_branch=data.get("default_branch", "main"), + owner=owner, + ) diff --git a/src/vcspull/cli/__init__.py b/src/vcspull/cli/__init__.py index 4a73e8f36..7fb589173 100644 --- a/src/vcspull/cli/__init__.py +++ b/src/vcspull/cli/__init__.py @@ -7,7 +7,6 @@ import pathlib import textwrap import typing as t -from typing import overload from libvcs.__about__ import __version__ as libvcs_version @@ -18,6 +17,7 @@ from .add import add_repo, create_add_subparser, handle_add_command from .discover import create_discover_subparser, discover_repos from .fmt import create_fmt_subparser, format_config_file +from .import_cmd import create_import_subparser from .list import create_list_subparser, list_repos from .search import create_search_subparser, search_repos from .status import create_status_subparser, status_repos @@ -105,6 +105,15 @@ def build_description( "vcspull fmt --all", ], ), + ( + "import", + [ + "vcspull import github torvalds -w ~/repos/linux --mode user", + "vcspull import github django -w ~/study/python --mode org", + "vcspull import gitlab gitlab-org/ci-cd -w ~/work --mode org", + "vcspull import codeberg user -w ~/oss --json", + ], + ), ), ) @@ -234,14 +243,41 @@ def build_description( ), ) +IMPORT_DESCRIPTION = build_description( + """ + Import repositories from remote services. + + Fetches repository lists from a remote hosting service and adds them to + the vcspull configuration. 
Choose a service subcommand for details: -@overload + github (gh) GitHub or GitHub Enterprise + gitlab (gl) GitLab (gitlab.com or self-hosted) + codeberg (cb) Codeberg + gitea Self-hosted Gitea instance + forgejo Self-hosted Forgejo instance + codecommit (cc) AWS CodeCommit + """, + ( + ( + None, + [ + "vcspull import github torvalds -w ~/repos/linux", + "vcspull import gh django -w ~/study/python --mode org", + "vcspull import gitlab mygroup -w ~/work --mode org", + "vcspull import codecommit -w ~/work/aws --region us-east-1", + ], + ), + ), +) + + +@t.overload def create_parser( return_subparsers: t.Literal[True], ) -> tuple[argparse.ArgumentParser, t.Any]: ... -@overload +@t.overload def create_parser(return_subparsers: t.Literal[False]) -> argparse.ArgumentParser: ... @@ -333,6 +369,15 @@ def create_parser( ) create_fmt_subparser(fmt_parser) + # Import command + import_parser = subparsers.add_parser( + "import", + help="import repositories from remote services", + formatter_class=VcspullHelpFormatter, + description=IMPORT_DESCRIPTION, + ) + create_import_subparser(import_parser) + if return_subparsers: # Return all parsers needed by cli() function return parser, ( @@ -343,6 +388,7 @@ def create_parser( add_parser, discover_parser, fmt_parser, + import_parser, ) return parser @@ -358,6 +404,7 @@ def cli(_args: list[str] | None = None) -> None: add_parser, discover_parser, _fmt_parser, + _import_parser, ) = subparsers args = parser.parse_args(_args) @@ -453,3 +500,11 @@ def cli(_args: list[str] | None = None) -> None: args.all, merge_roots=args.merge_roots, ) + elif args.subparser_name == "import": + handler = getattr(args, "import_handler", None) + if handler is None: + _import_parser.print_help() + return + result = handler(args) + if result: + raise SystemExit(result) diff --git a/src/vcspull/cli/_formatter.py b/src/vcspull/cli/_formatter.py index 8cbf01828..4f96fc31c 100644 --- a/src/vcspull/cli/_formatter.py +++ b/src/vcspull/cli/_formatter.py @@ -19,6 +19,16 @@ "--max-concurrent", "--name", "--url", + "--region", + "--profile", + "--token", + "-m", + "--mode", + "-l", + "--language", + "--topics", + "--min-stars", + "--limit", } OPTIONS_FLAG_ONLY = { @@ -62,6 +72,10 @@ "--sequential", "--no-merge", "--verbose", + "--flatten-groups", + "--archived", + "--forks", + "--https", } diff --git a/src/vcspull/cli/_output.py b/src/vcspull/cli/_output.py index 11f257e3f..c2add1d2d 100644 --- a/src/vcspull/cli/_output.py +++ b/src/vcspull/cli/_output.py @@ -48,7 +48,25 @@ class PlanEntry: diagnostics: list[str] = field(default_factory=list) def to_payload(self) -> dict[str, t.Any]: - """Convert the plan entry into a serialisable payload.""" + """Convert the plan entry into a serialisable payload. + + Examples + -------- + >>> entry = PlanEntry( + ... name="myrepo", + ... path="/home/user/repos/myrepo", + ... workspace_root="/home/user/repos", + ... action=PlanAction.CLONE, + ... url="git+https://github.com/user/myrepo.git", + ... ) + >>> payload = entry.to_payload() + >>> payload["name"] + 'myrepo' + >>> payload["action"] + 'clone' + >>> payload["format_version"] + '1' + """ payload: dict[str, t.Any] = { "format_version": "1", "type": "operation", @@ -94,11 +112,28 @@ class PlanSummary: duration_ms: int | None = None def total(self) -> int: - """Return the total number of repositories accounted for.""" + """Return the total number of repositories accounted for. 
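+
+        Blocked and errored repositories count toward the total.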
+
+        Examples
+        --------
+        >>> summary = PlanSummary(clone=2, update=3, unchanged=1)
+        >>> summary.total()
+        6
+        """
         return self.clone + self.update + self.unchanged + self.blocked + self.errors
 
     def to_payload(self) -> dict[str, t.Any]:
-        """Convert the summary to a serialisable payload."""
+        """Convert the summary to a serialisable payload.
+
+        Examples
+        --------
+        >>> summary = PlanSummary(clone=1, update=2)
+        >>> payload = summary.to_payload()
+        >>> payload["type"]
+        'summary'
+        >>> payload["total"]
+        3
+        """
         payload: dict[str, t.Any] = {
             "format_version": "1",
             "type": "summary",
@@ -143,6 +178,12 @@ def __init__(self, mode: OutputMode = OutputMode.HUMAN) -> None:
         ----------
         mode : OutputMode
             The output mode to use (human, json, ndjson)
+
+        Examples
+        --------
+        >>> formatter = OutputFormatter(OutputMode.JSON)
+        >>> formatter.mode
+        <OutputMode.JSON: 'json'>
         """
         self.mode = mode
         self._json_buffer: list[dict[str, t.Any]] = []
@@ -184,7 +225,7 @@ def emit_text(self, text: str) -> None:
 
     def finalize(self) -> None:
         """Finalize output (flush JSON buffer if needed)."""
-        if self.mode == OutputMode.JSON and self._json_buffer:
+        if self.mode == OutputMode.JSON:
             sys.stdout.write(json.dumps(self._json_buffer, indent=2) + "\n")
             sys.stdout.flush()
             self._json_buffer.clear()
@@ -204,6 +245,15 @@ def get_output_mode(json_flag: bool, ndjson_flag: bool) -> OutputMode:
     -------
     OutputMode
         The determined output mode (NDJSON takes precedence over JSON)
+
+    Examples
+    --------
+    >>> get_output_mode(json_flag=False, ndjson_flag=False)
+    <OutputMode.HUMAN: 'human'>
+    >>> get_output_mode(json_flag=True, ndjson_flag=False)
+    <OutputMode.JSON: 'json'>
+    >>> get_output_mode(json_flag=True, ndjson_flag=True)
+    <OutputMode.NDJSON: 'ndjson'>
     """
     if ndjson_flag:
         return OutputMode.NDJSON
diff --git a/src/vcspull/cli/import_cmd/__init__.py b/src/vcspull/cli/import_cmd/__init__.py
new file mode 100644
index 000000000..d3c1dd9bf
--- /dev/null
+++ b/src/vcspull/cli/import_cmd/__init__.py
@@ -0,0 +1,38 @@
+"""``vcspull import`` subcommand package.
+
+Each supported service (GitHub, GitLab, Codeberg, Gitea, Forgejo,
+CodeCommit) is registered as a proper argparse subcommand so that
+``vcspull import <service> --help`` shows only the flags relevant to
+that service.
+"""
+
+from __future__ import annotations
+
+import argparse
+
+from .codeberg import create_codeberg_subparser
+from .codecommit import create_codecommit_subparser
+from .forgejo import create_forgejo_subparser
+from .gitea import create_gitea_subparser
+from .github import create_github_subparser
+from .gitlab import create_gitlab_subparser
+
+__all__ = ["create_import_subparser"]
+
+
+def create_import_subparser(parser: argparse.ArgumentParser) -> None:
+    """Wire per-service subparsers into the ``vcspull import`` parser.
+
+    Parameters
+    ----------
+    parser : argparse.ArgumentParser
+        The ``import`` parser to attach service subcommands to.
+    """
+    service_subparsers = parser.add_subparsers(dest="import_service")
+
+    create_github_subparser(service_subparsers)
+    create_gitlab_subparser(service_subparsers)
+    create_codeberg_subparser(service_subparsers)
+    create_gitea_subparser(service_subparsers)
+    create_forgejo_subparser(service_subparsers)
+    create_codecommit_subparser(service_subparsers)
diff --git a/src/vcspull/cli/import_cmd/_common.py b/src/vcspull/cli/import_cmd/_common.py
new file mode 100644
index 000000000..1ae4820e7
--- /dev/null
+++ b/src/vcspull/cli/import_cmd/_common.py
@@ -0,0 +1,643 @@
+"""Shared infrastructure for the ``vcspull import`` subcommand tree.
+ +Provides parent argparse parsers (for flag composition via ``parents=[]``) +and the ``_run_import()`` function that all per-service handlers delegate to. +""" + +from __future__ import annotations + +import argparse +import logging +import pathlib +import sys +import typing as t + +from vcspull._internal.config_reader import ConfigReader +from vcspull._internal.private_path import PrivatePath +from vcspull._internal.remotes import ( + AuthenticationError, + ConfigurationError, + ImportMode, + ImportOptions, + NotFoundError, + RateLimitError, + RemoteImportError, + RemoteRepo, + ServiceUnavailableError, +) +from vcspull.config import ( + find_home_config_files, + save_config_json, + save_config_yaml, + workspace_root_label, +) +from vcspull.exc import MultipleConfigWarning + +from .._colors import Colors, get_color_mode +from .._output import OutputFormatter, get_output_mode + +log = logging.getLogger(__name__) + + +class Importer(t.Protocol): + """Structural type for any remote service importer.""" + + service_name: str + + def fetch_repos(self, options: ImportOptions) -> t.Iterator[RemoteRepo]: + """Yield repositories matching *options*.""" + ... + + +# --------------------------------------------------------------------------- +# Parent parser factories +# --------------------------------------------------------------------------- + + +def _create_shared_parent() -> argparse.ArgumentParser: + """Create parent parser with workspace, filtering, and output flags. + + Returns + ------- + argparse.ArgumentParser + Parent parser (``add_help=False``) carrying flags shared by all + import service subcommands. + """ + parent = argparse.ArgumentParser(add_help=False) + parent.add_argument( + "-w", + "--workspace", + dest="workspace", + metavar="DIR", + default=None, + help="Workspace root directory (REQUIRED)", + ) + + # Filtering options + filter_group = parent.add_argument_group("filtering") + filter_group.add_argument( + "-l", + "--language", + dest="language", + metavar="LANG", + help="Filter by programming language", + ) + filter_group.add_argument( + "--topics", + dest="topics", + metavar="TOPICS", + help="Filter by topics (comma-separated)", + ) + filter_group.add_argument( + "--min-stars", + dest="min_stars", + type=int, + default=0, + metavar="N", + help="Minimum stars (for search mode)", + ) + filter_group.add_argument( + "--archived", + dest="include_archived", + action="store_true", + help="Include archived repositories", + ) + filter_group.add_argument( + "--forks", + dest="include_forks", + action="store_true", + help="Include forked repositories", + ) + filter_group.add_argument( + "--limit", + dest="limit", + type=int, + default=100, + metavar="N", + help="Maximum repositories to fetch (default: 100)", + ) + + # Output options + output_group = parent.add_argument_group("output") + output_group.add_argument( + "-f", + "--file", + dest="config", + metavar="FILE", + help="Config file to write to (default: ~/.vcspull.yaml)", + ) + output_group.add_argument( + "--dry-run", + "-n", + action="store_true", + help="Preview without writing to config file", + ) + output_group.add_argument( + "--yes", + "-y", + action="store_true", + help="Skip confirmation prompt", + ) + output_group.add_argument( + "--json", + action="store_true", + dest="output_json", + help="Output as JSON", + ) + output_group.add_argument( + "--ndjson", + action="store_true", + dest="output_ndjson", + help="Output as NDJSON (one JSON per line)", + ) + output_group.add_argument( + "--https", + action="store_true", + 
dest="use_https", + help="Use HTTPS clone URLs instead of SSH (default: SSH)", + ) + output_group.add_argument( + "--color", + choices=["auto", "always", "never"], + default="auto", + help="When to use colors (default: auto)", + ) + return parent + + +def _create_token_parent() -> argparse.ArgumentParser: + """Create parent parser with the ``--token`` flag. + + Returns + ------- + argparse.ArgumentParser + Parent parser carrying ``--token``. + """ + parent = argparse.ArgumentParser(add_help=False) + parent.add_argument( + "--token", + dest="token", + metavar="TOKEN", + help="API token (overrides env var; prefer env var for security)", + ) + return parent + + +def _create_mode_parent() -> argparse.ArgumentParser: + """Create parent parser with the ``-m/--mode`` flag. + + Returns + ------- + argparse.ArgumentParser + Parent parser carrying ``-m/--mode``. + """ + parent = argparse.ArgumentParser(add_help=False) + parent.add_argument( + "-m", + "--mode", + dest="mode", + choices=["user", "org", "search"], + default="user", + help="Import mode: user (default), org, or search", + ) + return parent + + +def _create_target_parent() -> argparse.ArgumentParser: + """Create parent parser with the required ``target`` positional. + + Returns + ------- + argparse.ArgumentParser + Parent parser carrying the ``target`` positional argument. + """ + parent = argparse.ArgumentParser(add_help=False) + parent.add_argument( + "target", + metavar="TARGET", + help=( + "User, org name, or search query. " + "For GitLab, supports subgroups with slash notation (e.g., parent/child)." + ), + ) + return parent + + +# --------------------------------------------------------------------------- +# Config resolution +# --------------------------------------------------------------------------- + + +def _resolve_config_file(config_path_str: str | None) -> pathlib.Path: + """Resolve the config file path. + + Parameters + ---------- + config_path_str : str | None + Config file path from user, or None for default + + Returns + ------- + pathlib.Path + Resolved config file path + """ + if config_path_str: + path = pathlib.Path(config_path_str).expanduser().resolve() + if path.suffix.lower() not in {".yaml", ".yml", ".json"}: + msg = f"Unsupported config file type: {path.suffix}" + raise ValueError(msg) + return path + + home_configs = find_home_config_files(filetype=["yaml", "json"]) + if home_configs: + return home_configs[0] + + return pathlib.Path.home() / ".vcspull.yaml" + + +# --------------------------------------------------------------------------- +# Core import logic +# --------------------------------------------------------------------------- + + +def _run_import( + importer: Importer, + *, + service_name: str, + target: str, + workspace: str, + mode: str, + language: str | None, + topics: str | None, + min_stars: int, + include_archived: bool, + include_forks: bool, + limit: int, + config_path_str: str | None, + dry_run: bool, + yes: bool, + output_json: bool, + output_ndjson: bool, + color: str, + use_https: bool = False, + flatten_groups: bool = False, +) -> int: + """Run the import workflow for a single service. + + This is the core fetch / preview / confirm / write logic shared by every + per-service handler. The caller is responsible for constructing the + *importer* instance; this function only orchestrates the import flow. 
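+
+    A sketch of the call shape, mirroring ``handle_github`` (the elided
+    keyword arguments are all required)::
+
+        importer = GitHubImporter(token=None, base_url=None)
+        exit_code = _run_import(
+            importer,
+            service_name="github",
+            target="torvalds",
+            workspace="~/repos/linux",
+            mode="user",
+            ...,  # language, topics, limit, output flags, etc.
+        )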
+ + Parameters + ---------- + importer : Importer + Already-constructed importer instance (any object satisfying + the :class:`Importer` protocol) + service_name : str + Canonical service name (e.g. ``"github"``, ``"gitlab"``, ``"codecommit"``) + target : str + User, org, or search query + workspace : str + Workspace root directory + mode : str + Import mode (user, org, search) + language : str | None + Language filter + topics : str | None + Topics filter (comma-separated) + min_stars : int + Minimum stars filter + include_archived : bool + Include archived repositories + include_forks : bool + Include forked repositories + limit : int + Maximum repositories to fetch + config_path_str : str | None + Config file path + dry_run : bool + Preview without writing + yes : bool + Skip confirmation + output_json : bool + Output as JSON + output_ndjson : bool + Output as NDJSON + color : str + Color mode + use_https : bool + Use HTTPS clone URLs instead of SSH (default: False, i.e., SSH) + flatten_groups : bool + For GitLab org imports, flatten subgroup paths into base workspace + + Returns + ------- + int + 0 on success, 1 on error + """ + output_mode = get_output_mode(output_json, output_ndjson) + formatter = OutputFormatter(output_mode) + colors = Colors(get_color_mode(color)) + + # Build import options + import_mode = ImportMode(mode) + topic_list = ( + [topic.strip() for topic in topics.split(",") if topic.strip()] + if topics + else [] + ) + + try: + options = ImportOptions( + mode=import_mode, + target=target, + include_forks=include_forks, + include_archived=include_archived, + language=language, + topics=topic_list, + min_stars=min_stars, + limit=limit, + ) + except ValueError as exc_: + log.error("%s %s", colors.error("✗"), exc_) # noqa: TRY400 + return 1 + + # Warn if --language is used with services that don't return language info + if options.language and service_name in ("gitlab", "codecommit"): + log.warning( + "%s %s does not return language metadata; " + "--language filter may exclude all results", + colors.warning("!"), + importer.service_name, + ) + if options.topics and service_name == "codecommit": + log.warning( + "%s %s does not support topic filtering; " + "--topics filter may exclude all results", + colors.warning("!"), + importer.service_name, + ) + if options.min_stars > 0 and service_name == "codecommit": + log.warning( + "%s %s does not track star counts; " + "--min-stars filter may exclude all results", + colors.warning("!"), + importer.service_name, + ) + + # Resolve workspace path + workspace_path = pathlib.Path(workspace).expanduser().resolve() + cwd = pathlib.Path.cwd() + home = pathlib.Path.home() + + # Resolve config file + try: + config_file_path = _resolve_config_file(config_path_str) + except (ValueError, MultipleConfigWarning) as exc_: + log.error("%s %s", colors.error("✗"), exc_) # noqa: TRY400 + return 1 + display_config_path = str(PrivatePath(config_file_path)) + + # Fetch repositories + if output_mode.value == "human": + log.info( + "%s Fetching repositories from %s...", + colors.info("→"), + colors.highlight(importer.service_name), + ) + + repos: list[RemoteRepo] = [] + try: + for repo in importer.fetch_repos(options): + repos.append(repo) + + # Emit for JSON/NDJSON output + formatter.emit(repo.to_dict()) + + # Log progress for human output + if output_mode.value == "human" and len(repos) % 10 == 0: + log.info( + "%s Fetched %s repositories...", + colors.muted("•"), + colors.info(str(len(repos))), + ) + + except AuthenticationError as exc: + log.error( 
# noqa: TRY400 + "%s Authentication error: %s", colors.error("✗"), exc + ) + formatter.finalize() + return 1 + except RateLimitError as exc: + log.error( # noqa: TRY400 + "%s Rate limit exceeded: %s", colors.error("✗"), exc + ) + formatter.finalize() + return 1 + except NotFoundError as exc: + log.error("%s Not found: %s", colors.error("✗"), exc) # noqa: TRY400 + formatter.finalize() + return 1 + except ServiceUnavailableError as exc: + log.error( # noqa: TRY400 + "%s Service unavailable: %s", colors.error("✗"), exc + ) + formatter.finalize() + return 1 + except ConfigurationError as exc: + log.error( # noqa: TRY400 + "%s Configuration error: %s", colors.error("✗"), exc + ) + formatter.finalize() + return 1 + except RemoteImportError as exc: + log.error("%s Error: %s", colors.error("✗"), exc) # noqa: TRY400 + formatter.finalize() + return 1 + + if not repos: + if output_mode.value == "human": + log.info( + "%s No repositories found matching criteria.", + colors.warning("!"), + ) + formatter.finalize() + return 0 + + if output_mode.value == "human": + log.info( + "\n%s Found %s repositories", + colors.success("✓"), + colors.info(str(len(repos))), + ) + + # Show preview in human mode + if output_mode.value == "human": + for repo in repos[:10]: # Show first 10 + stars_str = f" ★{repo.stars}" if repo.stars > 0 else "" + lang_str = f" [{repo.language}]" if repo.language else "" + log.info( + " %s %s%s%s", + colors.success("+"), + colors.info(repo.name), + colors.muted(lang_str), + colors.muted(stars_str), + ) + if len(repos) > 10: + log.info( + " %s and %s more", + colors.muted("..."), + colors.info(str(len(repos) - 10)), + ) + + formatter.finalize() + + # Handle dry-run + if dry_run: + log.info( + "\n%s Dry run complete. Would write to %s", + colors.warning("→"), + colors.muted(display_config_path), + ) + return 0 + + # Confirm with user + if not yes and output_mode.value == "human": + if not sys.stdin.isatty(): + log.info( + "%s Non-interactive mode: use --yes to skip confirmation.", + colors.error("✗"), + ) + return 1 + try: + confirm = input( + f"\n{colors.info('Import')} {len(repos)} repositories to " + f"{display_config_path}? 
[y/N]: ", + ).lower() + except EOFError: + confirm = "" + if confirm not in {"y", "yes"}: + log.info("%s Aborted by user.", colors.error("✗")) + return 0 + + # Load existing config or create new + raw_config: dict[str, t.Any] + if config_file_path.exists(): + try: + raw_config = ConfigReader._from_file(config_file_path) or {} + except Exception: + log.exception("Error loading config file") + return 1 + + if not isinstance(raw_config, dict): + log.error( + "%s Config file is not a valid mapping: %s", + colors.error("✗"), + display_config_path, + ) + return 1 + else: + raw_config = {} + + # Add repositories to config + checked_labels: set[str] = set() + error_labels: set[str] = set() + added_count = 0 + skipped_count = 0 + + for repo in repos: + # Determine workspace for this repo + repo_workspace_path = workspace_path + + preserve_group_structure = ( + service_name == "gitlab" + and options.mode == ImportMode.ORG + and not flatten_groups + ) + if preserve_group_structure and repo.owner.startswith(options.target): + # Check if it is a subdirectory + if repo.owner == options.target: + subpath = "" + elif repo.owner.startswith(options.target + "/"): + subpath = repo.owner[len(options.target) + 1 :] + else: + subpath = "" + + if subpath: + candidate = (workspace_path / subpath).resolve() + if not candidate.is_relative_to(workspace_path.resolve()): + log.warning( + "%s Ignoring subgroup path that escapes workspace: %s", + colors.warning("⚠"), + repo.owner, + ) + subpath = "" + else: + repo_workspace_path = workspace_path / subpath + + repo_workspace_label = workspace_root_label( + repo_workspace_path, cwd=cwd, home=home + ) + + if repo_workspace_label not in checked_labels: + if repo_workspace_label in raw_config and not isinstance( + raw_config[repo_workspace_label], dict + ): + log.error( + "%s Workspace section '%s' is not a mapping in config", + colors.error("✗"), + repo_workspace_label, + ) + error_labels.add(repo_workspace_label) + checked_labels.add(repo_workspace_label) + + if repo_workspace_label in raw_config and not isinstance( + raw_config[repo_workspace_label], dict + ): + continue + + if repo_workspace_label not in raw_config: + raw_config[repo_workspace_label] = {} + + if repo.name in raw_config[repo_workspace_label]: + skipped_count += 1 + continue + + raw_config[repo_workspace_label][repo.name] = { + "repo": repo.to_vcspull_url(use_ssh=not use_https), + } + added_count += 1 + + if error_labels: + return 1 + + if added_count == 0: + log.info( + "%s All repositories already exist in config. 
Nothing to add.", + colors.success("✓"), + ) + return 0 + + # Save config + try: + if config_file_path.suffix.lower() == ".json": + save_config_json(config_file_path, raw_config) + else: + save_config_yaml(config_file_path, raw_config) + log.info( + "%s Added %s repositories to %s", + colors.success("✓"), + colors.info(str(added_count)), + colors.muted(display_config_path), + ) + if skipped_count > 0: + log.info( + "%s Skipped %s existing repositories", + colors.warning("!"), + colors.info(str(skipped_count)), + ) + except OSError: + log.exception("Error saving config to %s", display_config_path) + return 1 + + return 0 diff --git a/src/vcspull/cli/import_cmd/codeberg.py b/src/vcspull/cli/import_cmd/codeberg.py new file mode 100644 index 000000000..82f12e99b --- /dev/null +++ b/src/vcspull/cli/import_cmd/codeberg.py @@ -0,0 +1,84 @@ +"""``vcspull import codeberg`` subcommand.""" + +from __future__ import annotations + +import argparse + +from vcspull._internal.remotes import GiteaImporter + +from .._formatter import VcspullHelpFormatter +from ._common import ( + _create_mode_parent, + _create_shared_parent, + _create_target_parent, + _create_token_parent, + _run_import, +) + + +def create_codeberg_subparser( + subparsers: argparse._SubParsersAction[argparse.ArgumentParser], +) -> None: + """Register the ``codeberg`` (alias ``cb``) service subcommand. + + Parameters + ---------- + subparsers : argparse._SubParsersAction + The subparsers action from the ``import`` parser. + """ + subparsers.add_parser( + "codeberg", + aliases=["cb"], + help="import from Codeberg", + parents=[ + _create_shared_parent(), + _create_token_parent(), + _create_mode_parent(), + _create_target_parent(), + ], + formatter_class=VcspullHelpFormatter, + description="Import repositories from Codeberg.", + ).set_defaults(import_handler=handle_codeberg) + + +def handle_codeberg(args: argparse.Namespace) -> int: + """Handle ``vcspull import codeberg``. + + Parameters + ---------- + args : argparse.Namespace + Parsed CLI arguments. + + Returns + ------- + int + Exit code (0 = success). 
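+
+    Raises
+    ------
+    SystemExit
+        If ``-w/--workspace`` is not provided.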
+ """ + if args.workspace is None: + msg = "-w/--workspace is required" + raise SystemExit(msg) + + importer = GiteaImporter( + token=getattr(args, "token", None), + base_url="https://codeberg.org", + ) + return _run_import( + importer, + service_name="codeberg", + target=args.target, + workspace=args.workspace, + mode=args.mode, + language=getattr(args, "language", None), + topics=getattr(args, "topics", None), + min_stars=getattr(args, "min_stars", 0), + include_archived=getattr(args, "include_archived", False), + include_forks=getattr(args, "include_forks", False), + limit=getattr(args, "limit", 100), + config_path_str=getattr(args, "config", None), + dry_run=getattr(args, "dry_run", False), + yes=getattr(args, "yes", False), + output_json=getattr(args, "output_json", False), + output_ndjson=getattr(args, "output_ndjson", False), + color=getattr(args, "color", "auto"), + use_https=getattr(args, "use_https", False), + ) diff --git a/src/vcspull/cli/import_cmd/codecommit.py b/src/vcspull/cli/import_cmd/codecommit.py new file mode 100644 index 000000000..2a5efdf9b --- /dev/null +++ b/src/vcspull/cli/import_cmd/codecommit.py @@ -0,0 +1,104 @@ +"""``vcspull import codecommit`` subcommand.""" + +from __future__ import annotations + +import argparse +import logging + +from vcspull._internal.remotes import CodeCommitImporter, DependencyError + +from .._colors import Colors, get_color_mode +from .._formatter import VcspullHelpFormatter +from ._common import _create_shared_parent, _run_import + +log = logging.getLogger(__name__) + + +def create_codecommit_subparser( + subparsers: argparse._SubParsersAction[argparse.ArgumentParser], +) -> None: + """Register the ``codecommit`` (aliases ``cc``, ``aws``) service subcommand. + + Parameters + ---------- + subparsers : argparse._SubParsersAction + The subparsers action from the ``import`` parser. + """ + parser = subparsers.add_parser( + "codecommit", + aliases=["cc", "aws"], + help="import from AWS CodeCommit", + parents=[_create_shared_parent()], + formatter_class=VcspullHelpFormatter, + description="Import repositories from AWS CodeCommit.", + ) + parser.add_argument( + "target", + metavar="TARGET", + nargs="?", + default="", + help="Optional substring filter for repository names", + ) + parser.add_argument( + "--region", + dest="region", + metavar="REGION", + help="AWS region for CodeCommit", + ) + parser.add_argument( + "--profile", + dest="profile", + metavar="PROFILE", + help="AWS profile for CodeCommit", + ) + parser.set_defaults(import_handler=handle_codecommit) + + +def handle_codecommit(args: argparse.Namespace) -> int: + """Handle ``vcspull import codecommit``. + + Parameters + ---------- + args : argparse.Namespace + Parsed CLI arguments. + + Returns + ------- + int + Exit code (0 = success). 
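+
+    Raises
+    ------
+    SystemExit
+        If ``-w/--workspace`` is not provided.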
+ """ + if args.workspace is None: + msg = "-w/--workspace is required" + raise SystemExit(msg) + + colors = Colors(get_color_mode(getattr(args, "color", "auto"))) + + try: + importer = CodeCommitImporter( + region=getattr(args, "region", None), + profile=getattr(args, "profile", None), + ) + except DependencyError as exc: + log.error("%s %s", colors.error("\u2717"), exc) # noqa: TRY400 + return 1 + + return _run_import( + importer, + service_name="codecommit", + target=getattr(args, "target", "") or "", + workspace=args.workspace, + mode="user", + language=getattr(args, "language", None), + topics=getattr(args, "topics", None), + min_stars=getattr(args, "min_stars", 0), + include_archived=getattr(args, "include_archived", False), + include_forks=getattr(args, "include_forks", False), + limit=getattr(args, "limit", 100), + config_path_str=getattr(args, "config", None), + dry_run=getattr(args, "dry_run", False), + yes=getattr(args, "yes", False), + output_json=getattr(args, "output_json", False), + output_ndjson=getattr(args, "output_ndjson", False), + color=getattr(args, "color", "auto"), + use_https=getattr(args, "use_https", False), + ) diff --git a/src/vcspull/cli/import_cmd/forgejo.py b/src/vcspull/cli/import_cmd/forgejo.py new file mode 100644 index 000000000..b0d3f23b1 --- /dev/null +++ b/src/vcspull/cli/import_cmd/forgejo.py @@ -0,0 +1,91 @@ +"""``vcspull import forgejo`` subcommand.""" + +from __future__ import annotations + +import argparse + +from vcspull._internal.remotes import GiteaImporter + +from .._formatter import VcspullHelpFormatter +from ._common import ( + _create_mode_parent, + _create_shared_parent, + _create_target_parent, + _create_token_parent, + _run_import, +) + + +def create_forgejo_subparser( + subparsers: argparse._SubParsersAction[argparse.ArgumentParser], +) -> None: + """Register the ``forgejo`` service subcommand. + + Parameters + ---------- + subparsers : argparse._SubParsersAction + The subparsers action from the ``import`` parser. + """ + parser = subparsers.add_parser( + "forgejo", + help="import from a Forgejo instance", + parents=[ + _create_shared_parent(), + _create_token_parent(), + _create_mode_parent(), + _create_target_parent(), + ], + formatter_class=VcspullHelpFormatter, + description="Import repositories from a Forgejo instance.", + ) + parser.add_argument( + "--url", + dest="base_url", + metavar="URL", + required=True, + help="Base URL of the Forgejo instance (required)", + ) + parser.set_defaults(import_handler=handle_forgejo) + + +def handle_forgejo(args: argparse.Namespace) -> int: + """Handle ``vcspull import forgejo``. + + Parameters + ---------- + args : argparse.Namespace + Parsed CLI arguments. + + Returns + ------- + int + Exit code (0 = success). 
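+
+    Raises
+    ------
+    SystemExit
+        If ``-w/--workspace`` is not provided.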
+ """ + if args.workspace is None: + msg = "-w/--workspace is required" + raise SystemExit(msg) + + importer = GiteaImporter( + token=getattr(args, "token", None), + base_url=args.base_url, + ) + return _run_import( + importer, + service_name="forgejo", + target=args.target, + workspace=args.workspace, + mode=args.mode, + language=getattr(args, "language", None), + topics=getattr(args, "topics", None), + min_stars=getattr(args, "min_stars", 0), + include_archived=getattr(args, "include_archived", False), + include_forks=getattr(args, "include_forks", False), + limit=getattr(args, "limit", 100), + config_path_str=getattr(args, "config", None), + dry_run=getattr(args, "dry_run", False), + yes=getattr(args, "yes", False), + output_json=getattr(args, "output_json", False), + output_ndjson=getattr(args, "output_ndjson", False), + color=getattr(args, "color", "auto"), + use_https=getattr(args, "use_https", False), + ) diff --git a/src/vcspull/cli/import_cmd/gitea.py b/src/vcspull/cli/import_cmd/gitea.py new file mode 100644 index 000000000..492a471db --- /dev/null +++ b/src/vcspull/cli/import_cmd/gitea.py @@ -0,0 +1,91 @@ +"""``vcspull import gitea`` subcommand.""" + +from __future__ import annotations + +import argparse + +from vcspull._internal.remotes import GiteaImporter + +from .._formatter import VcspullHelpFormatter +from ._common import ( + _create_mode_parent, + _create_shared_parent, + _create_target_parent, + _create_token_parent, + _run_import, +) + + +def create_gitea_subparser( + subparsers: argparse._SubParsersAction[argparse.ArgumentParser], +) -> None: + """Register the ``gitea`` service subcommand. + + Parameters + ---------- + subparsers : argparse._SubParsersAction + The subparsers action from the ``import`` parser. + """ + parser = subparsers.add_parser( + "gitea", + help="import from a Gitea instance", + parents=[ + _create_shared_parent(), + _create_token_parent(), + _create_mode_parent(), + _create_target_parent(), + ], + formatter_class=VcspullHelpFormatter, + description="Import repositories from a Gitea instance.", + ) + parser.add_argument( + "--url", + dest="base_url", + metavar="URL", + required=True, + help="Base URL of the Gitea instance (required)", + ) + parser.set_defaults(import_handler=handle_gitea) + + +def handle_gitea(args: argparse.Namespace) -> int: + """Handle ``vcspull import gitea``. + + Parameters + ---------- + args : argparse.Namespace + Parsed CLI arguments. + + Returns + ------- + int + Exit code (0 = success). 
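+
+    Raises
+    ------
+    SystemExit
+        If ``-w/--workspace`` is not provided.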
+ """ + if args.workspace is None: + msg = "-w/--workspace is required" + raise SystemExit(msg) + + importer = GiteaImporter( + token=getattr(args, "token", None), + base_url=args.base_url, + ) + return _run_import( + importer, + service_name="gitea", + target=args.target, + workspace=args.workspace, + mode=args.mode, + language=getattr(args, "language", None), + topics=getattr(args, "topics", None), + min_stars=getattr(args, "min_stars", 0), + include_archived=getattr(args, "include_archived", False), + include_forks=getattr(args, "include_forks", False), + limit=getattr(args, "limit", 100), + config_path_str=getattr(args, "config", None), + dry_run=getattr(args, "dry_run", False), + yes=getattr(args, "yes", False), + output_json=getattr(args, "output_json", False), + output_ndjson=getattr(args, "output_ndjson", False), + color=getattr(args, "color", "auto"), + use_https=getattr(args, "use_https", False), + ) diff --git a/src/vcspull/cli/import_cmd/github.py b/src/vcspull/cli/import_cmd/github.py new file mode 100644 index 000000000..bbf87e61b --- /dev/null +++ b/src/vcspull/cli/import_cmd/github.py @@ -0,0 +1,93 @@ +"""``vcspull import github`` subcommand.""" + +from __future__ import annotations + +import argparse + +from vcspull._internal.remotes import GitHubImporter + +from .._formatter import VcspullHelpFormatter +from ._common import ( + _create_mode_parent, + _create_shared_parent, + _create_target_parent, + _create_token_parent, + _run_import, +) + + +def create_github_subparser( + subparsers: argparse._SubParsersAction[argparse.ArgumentParser], +) -> None: + """Register the ``github`` (alias ``gh``) service subcommand. + + Parameters + ---------- + subparsers : argparse._SubParsersAction + The subparsers action from the ``import`` parser. + """ + parser = subparsers.add_parser( + "github", + aliases=["gh"], + help="import from GitHub", + parents=[ + _create_shared_parent(), + _create_token_parent(), + _create_mode_parent(), + _create_target_parent(), + ], + formatter_class=VcspullHelpFormatter, + description=( + "Import repositories from GitHub (github.com or GitHub Enterprise)." + ), + ) + parser.add_argument( + "--url", + dest="base_url", + metavar="URL", + help="Base URL for GitHub Enterprise (optional)", + ) + parser.set_defaults(import_handler=handle_github) + + +def handle_github(args: argparse.Namespace) -> int: + """Handle ``vcspull import github``. + + Parameters + ---------- + args : argparse.Namespace + Parsed CLI arguments. + + Returns + ------- + int + Exit code (0 = success). 
+ """ + if args.workspace is None: + msg = "-w/--workspace is required" + raise SystemExit(msg) + + importer = GitHubImporter( + token=getattr(args, "token", None), + base_url=getattr(args, "base_url", None), + ) + return _run_import( + importer, + service_name="github", + target=args.target, + workspace=args.workspace, + mode=args.mode, + language=getattr(args, "language", None), + topics=getattr(args, "topics", None), + min_stars=getattr(args, "min_stars", 0), + include_archived=getattr(args, "include_archived", False), + include_forks=getattr(args, "include_forks", False), + limit=getattr(args, "limit", 100), + config_path_str=getattr(args, "config", None), + dry_run=getattr(args, "dry_run", False), + yes=getattr(args, "yes", False), + output_json=getattr(args, "output_json", False), + output_ndjson=getattr(args, "output_ndjson", False), + color=getattr(args, "color", "auto"), + use_https=getattr(args, "use_https", False), + ) diff --git a/src/vcspull/cli/import_cmd/gitlab.py b/src/vcspull/cli/import_cmd/gitlab.py new file mode 100644 index 000000000..086d040d7 --- /dev/null +++ b/src/vcspull/cli/import_cmd/gitlab.py @@ -0,0 +1,101 @@ +"""``vcspull import gitlab`` subcommand.""" + +from __future__ import annotations + +import argparse + +from vcspull._internal.remotes import GitLabImporter + +from .._formatter import VcspullHelpFormatter +from ._common import ( + _create_mode_parent, + _create_shared_parent, + _create_target_parent, + _create_token_parent, + _run_import, +) + + +def create_gitlab_subparser( + subparsers: argparse._SubParsersAction[argparse.ArgumentParser], +) -> None: + """Register the ``gitlab`` (alias ``gl``) service subcommand. + + Parameters + ---------- + subparsers : argparse._SubParsersAction + The subparsers action from the ``import`` parser. + """ + parser = subparsers.add_parser( + "gitlab", + aliases=["gl"], + help="import from GitLab", + parents=[ + _create_shared_parent(), + _create_token_parent(), + _create_mode_parent(), + _create_target_parent(), + ], + formatter_class=VcspullHelpFormatter, + description="Import repositories from GitLab (gitlab.com or self-hosted).", + ) + parser.add_argument( + "--url", + dest="base_url", + metavar="URL", + help="Base URL for self-hosted GitLab (optional)", + ) + parser.add_argument( + "--flatten-groups", + action="store_true", + dest="flatten_groups", + help=( + "For ``--mode org``, flatten subgroup repositories into the base " + "workspace instead of preserving subgroup paths" + ), + ) + parser.set_defaults(import_handler=handle_gitlab) + + +def handle_gitlab(args: argparse.Namespace) -> int: + """Handle ``vcspull import gitlab``. + + Parameters + ---------- + args : argparse.Namespace + Parsed CLI arguments. + + Returns + ------- + int + Exit code (0 = success). 
+ """ + if args.workspace is None: + msg = "-w/--workspace is required" + raise SystemExit(msg) + + importer = GitLabImporter( + token=getattr(args, "token", None), + base_url=getattr(args, "base_url", None), + ) + return _run_import( + importer, + service_name="gitlab", + target=args.target, + workspace=args.workspace, + mode=args.mode, + language=getattr(args, "language", None), + topics=getattr(args, "topics", None), + min_stars=getattr(args, "min_stars", 0), + include_archived=getattr(args, "include_archived", False), + include_forks=getattr(args, "include_forks", False), + limit=getattr(args, "limit", 100), + config_path_str=getattr(args, "config", None), + dry_run=getattr(args, "dry_run", False), + yes=getattr(args, "yes", False), + output_json=getattr(args, "output_json", False), + output_ndjson=getattr(args, "output_ndjson", False), + color=getattr(args, "color", "auto"), + use_https=getattr(args, "use_https", False), + flatten_groups=getattr(args, "flatten_groups", False), + ) diff --git a/src/vcspull/config.py b/src/vcspull/config.py index 803f4eb3b..8a94f32ec 100644 --- a/src/vcspull/config.py +++ b/src/vcspull/config.py @@ -2,11 +2,13 @@ from __future__ import annotations +import contextlib import copy import fnmatch import logging import os import pathlib +import tempfile import typing as t from collections.abc import Callable @@ -152,10 +154,13 @@ def find_home_config_files( filetype = ["json", "yaml"] configs: list[pathlib.Path] = [] + check_yaml = "yaml" in filetype + check_json = "json" in filetype + yaml_config = pathlib.Path("~/.vcspull.yaml").expanduser() - has_yaml_config = yaml_config.exists() + has_yaml_config = check_yaml and yaml_config.exists() json_config = pathlib.Path("~/.vcspull.json").expanduser() - has_json_config = json_config.exists() + has_json_config = check_json and json_config.exists() if not has_yaml_config and not has_json_config: log.debug( @@ -460,6 +465,38 @@ def is_config_file( return any(filename.endswith(e) for e in extensions) +def _atomic_write(target: pathlib.Path, content: str) -> None: + """Write content to a file atomically via temp-file-then-rename. + + Parameters + ---------- + target : pathlib.Path + Destination file path + content : str + Content to write + """ + original_mode: int | None = None + if target.exists(): + original_mode = target.stat().st_mode + + fd, tmp_path = tempfile.mkstemp( + dir=target.parent, + prefix=f".{target.name}.", + suffix=".tmp", + ) + try: + with os.fdopen(fd, "w", encoding="utf-8") as f: + f.write(content) + if original_mode is not None: + pathlib.Path(tmp_path).chmod(original_mode) + pathlib.Path(tmp_path).replace(target) + except BaseException: + # Clean up the temp file on any failure + with contextlib.suppress(OSError): + pathlib.Path(tmp_path).unlink() + raise + + def save_config_yaml(config_file_path: pathlib.Path, data: dict[t.Any, t.Any]) -> None: """Save configuration data to a YAML file. @@ -475,7 +512,25 @@ def save_config_yaml(config_file_path: pathlib.Path, data: dict[t.Any, t.Any]) - content=data, indent=2, ) - config_file_path.write_text(yaml_content, encoding="utf-8") + _atomic_write(config_file_path, yaml_content) + + +def save_config_json(config_file_path: pathlib.Path, data: dict[t.Any, t.Any]) -> None: + """Save configuration data to a JSON file. 
+ + Parameters + ---------- + config_file_path : pathlib.Path + Path to the configuration file to write + data : dict + Configuration data to save + """ + json_content = ConfigReader._dump( + fmt="json", + content=data, + indent=2, + ) + _atomic_write(config_file_path, json_content) def save_config_yaml_with_items( @@ -498,7 +553,7 @@ def save_config_yaml_with_items( if yaml_content: yaml_content += "\n" - config_file_path.write_text(yaml_content, encoding="utf-8") + _atomic_write(config_file_path, yaml_content) def merge_duplicate_workspace_root_entries( diff --git a/tests/_internal/remotes/__init__.py b/tests/_internal/remotes/__init__.py new file mode 100644 index 000000000..933ac6f57 --- /dev/null +++ b/tests/_internal/remotes/__init__.py @@ -0,0 +1 @@ +"""Tests for vcspull._internal.remotes package.""" diff --git a/tests/_internal/remotes/conftest.py b/tests/_internal/remotes/conftest.py new file mode 100644 index 000000000..8a902740e --- /dev/null +++ b/tests/_internal/remotes/conftest.py @@ -0,0 +1,245 @@ +"""Shared fixtures for remotes tests.""" + +from __future__ import annotations + +import json +import typing as t +import urllib.error + +import pytest + + +class MockHTTPResponse: + """Mock HTTP response for testing.""" + + def __init__( + self, + body: bytes, + headers: dict[str, str] | None = None, + status: int = 200, + ) -> None: + """Initialize mock response.""" + self._body = body + self._headers = headers or {} + self.status = status + self.code = status + + def read(self) -> bytes: + """Return response body.""" + return self._body + + def getheaders(self) -> list[tuple[str, str]]: + """Return response headers as list of tuples.""" + return list(self._headers.items()) + + def __enter__(self) -> MockHTTPResponse: + """Context manager entry.""" + return self + + def __exit__(self, *args: t.Any) -> None: + """Context manager exit.""" + pass + + +@pytest.fixture +def mock_urlopen(monkeypatch: pytest.MonkeyPatch) -> t.Callable[..., None]: + """Create factory fixture to mock urllib.request.urlopen responses. + + Parameters + ---------- + monkeypatch : pytest.MonkeyPatch + Pytest monkeypatch fixture + + Returns + ------- + Callable + Function to set up mock responses + """ + + def _mock( + responses: list[tuple[bytes, dict[str, str], int]] | None = None, + error: urllib.error.HTTPError | None = None, + ) -> None: + """Set up mock responses. 
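+
+        Responses are served in order and cycle once the list is
+        exhausted; with no responses configured, an empty JSON array is
+        returned. When ``error`` is set, every call raises it instead of
+        returning a response.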
+ + Parameters + ---------- + responses : list[tuple[bytes, dict[str, str], int]] | None + List of (body, headers, status) tuples for sequential responses + error : urllib.error.HTTPError | None + Error to raise instead of returning response + """ + call_count = 0 + responses = responses or [] + + def urlopen_side_effect( + request: t.Any, + timeout: int | None = None, + ) -> MockHTTPResponse: + nonlocal call_count + if error: + raise error + if not responses: + return MockHTTPResponse(b"[]", {}, 200) + body, headers, status = responses[call_count % len(responses)] + call_count += 1 + return MockHTTPResponse(body, headers, status) + + # Mock urlopen: return pre-configured responses to avoid real HTTP requests + monkeypatch.setattr("urllib.request.urlopen", urlopen_side_effect) + + return _mock + + +@pytest.fixture +def github_user_repos_response() -> bytes: + """Return standard GitHub user repos API response.""" + return json.dumps( + [ + { + "name": "repo1", + "clone_url": "https://github.com/testuser/repo1.git", + "ssh_url": "git@github.com:testuser/repo1.git", + "html_url": "https://github.com/testuser/repo1", + "description": "Test repo 1", + "language": "Python", + "topics": ["cli", "tool"], + "stargazers_count": 100, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "testuser"}, + }, + { + "name": "repo2", + "clone_url": "https://github.com/testuser/repo2.git", + "ssh_url": "git@github.com:testuser/repo2.git", + "html_url": "https://github.com/testuser/repo2", + "description": "Test repo 2", + "language": "JavaScript", + "topics": [], + "stargazers_count": 50, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "testuser"}, + }, + ] + ).encode() + + +@pytest.fixture +def github_forked_repo_response() -> bytes: + """GitHub repo that is a fork.""" + return json.dumps( + [ + { + "name": "forked-repo", + "clone_url": "https://github.com/testuser/forked-repo.git", + "ssh_url": "git@github.com:testuser/forked-repo.git", + "html_url": "https://github.com/testuser/forked-repo", + "description": "A forked repo", + "language": "Python", + "topics": [], + "stargazers_count": 10, + "fork": True, + "archived": False, + "default_branch": "main", + "owner": {"login": "testuser"}, + } + ] + ).encode() + + +@pytest.fixture +def github_archived_repo_response() -> bytes: + """GitHub repo that is archived.""" + return json.dumps( + [ + { + "name": "archived-repo", + "clone_url": "https://github.com/testuser/archived-repo.git", + "ssh_url": "git@github.com:testuser/archived-repo.git", + "html_url": "https://github.com/testuser/archived-repo", + "description": "An archived repo", + "language": "Python", + "topics": [], + "stargazers_count": 5, + "fork": False, + "archived": True, + "default_branch": "main", + "owner": {"login": "testuser"}, + } + ] + ).encode() + + +@pytest.fixture +def gitlab_user_projects_response() -> bytes: + """Return standard GitLab user projects API response.""" + return json.dumps( + [ + { + "path": "project1", + "name": "Project 1", + "http_url_to_repo": "https://gitlab.com/testuser/project1.git", + "ssh_url_to_repo": "git@gitlab.com:testuser/project1.git", + "web_url": "https://gitlab.com/testuser/project1", + "description": "Test project 1", + "topics": ["python"], + "star_count": 20, + "archived": False, + "default_branch": "main", + "namespace": {"path": "testuser", "full_path": "testuser"}, + }, + ] + ).encode() + + +@pytest.fixture +def gitea_user_repos_response() -> bytes: + """Return standard Gitea 
user repos API response.""" + return json.dumps( + [ + { + "name": "repo1", + "clone_url": "https://codeberg.org/testuser/repo1.git", + "ssh_url": "git@codeberg.org:testuser/repo1.git", + "html_url": "https://codeberg.org/testuser/repo1", + "description": "Test repo 1", + "language": "Python", + "topics": [], + "stars_count": 15, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "testuser"}, + }, + ] + ).encode() + + +@pytest.fixture +def gitea_search_response() -> bytes: + """Gitea search API response with wrapped data.""" + return json.dumps( + { + "ok": True, + "data": [ + { + "name": "search-result", + "clone_url": "https://codeberg.org/user/search-result.git", + "ssh_url": "git@codeberg.org:user/search-result.git", + "html_url": "https://codeberg.org/user/search-result", + "description": "Found by search", + "language": "Go", + "topics": ["search"], + "stars_count": 30, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "user"}, + }, + ], + } + ).encode() diff --git a/tests/_internal/remotes/test_base.py b/tests/_internal/remotes/test_base.py new file mode 100644 index 000000000..bb6dc25b0 --- /dev/null +++ b/tests/_internal/remotes/test_base.py @@ -0,0 +1,606 @@ +"""Tests for vcspull._internal.remotes.base module.""" + +from __future__ import annotations + +import typing as t + +import pytest + +from vcspull._internal.remotes.base import ( + ImportMode, + ImportOptions, + RemoteRepo, + filter_repo, +) + + +class FilterRepoFixture(t.NamedTuple): + """Fixture for filter_repo test cases.""" + + test_id: str + repo_kwargs: dict[str, t.Any] + options_kwargs: dict[str, t.Any] + expected: bool + + +FILTER_REPO_FIXTURES: list[FilterRepoFixture] = [ + FilterRepoFixture( + test_id="passes-all-defaults", + repo_kwargs={ + "name": "test", + "clone_url": "https://github.com/user/test.git", + "ssh_url": "git@github.com:user/test.git", + "html_url": "https://github.com/user/test", + "description": None, + "language": "Python", + "topics": (), + "stars": 50, + "is_fork": False, + "is_archived": False, + "default_branch": "main", + "owner": "user", + }, + options_kwargs={}, + expected=True, + ), + FilterRepoFixture( + test_id="excludes-fork-by-default", + repo_kwargs={ + "name": "fork", + "clone_url": "https://github.com/user/fork.git", + "ssh_url": "git@github.com:user/fork.git", + "html_url": "https://github.com/user/fork", + "description": None, + "language": "Python", + "topics": (), + "stars": 10, + "is_fork": True, + "is_archived": False, + "default_branch": "main", + "owner": "user", + }, + options_kwargs={"include_forks": False}, + expected=False, + ), + FilterRepoFixture( + test_id="includes-fork-when-enabled", + repo_kwargs={ + "name": "fork", + "clone_url": "https://github.com/user/fork.git", + "ssh_url": "git@github.com:user/fork.git", + "html_url": "https://github.com/user/fork", + "description": None, + "language": "Python", + "topics": (), + "stars": 10, + "is_fork": True, + "is_archived": False, + "default_branch": "main", + "owner": "user", + }, + options_kwargs={"include_forks": True}, + expected=True, + ), + FilterRepoFixture( + test_id="excludes-archived-by-default", + repo_kwargs={ + "name": "archived", + "clone_url": "https://github.com/user/archived.git", + "ssh_url": "git@github.com:user/archived.git", + "html_url": "https://github.com/user/archived", + "description": None, + "language": "Python", + "topics": (), + "stars": 5, + "is_fork": False, + "is_archived": True, + "default_branch": "main", + 
"owner": "user", + }, + options_kwargs={"include_archived": False}, + expected=False, + ), + FilterRepoFixture( + test_id="includes-archived-when-enabled", + repo_kwargs={ + "name": "archived", + "clone_url": "https://github.com/user/archived.git", + "ssh_url": "git@github.com:user/archived.git", + "html_url": "https://github.com/user/archived", + "description": None, + "language": "Python", + "topics": (), + "stars": 5, + "is_fork": False, + "is_archived": True, + "default_branch": "main", + "owner": "user", + }, + options_kwargs={"include_archived": True}, + expected=True, + ), + FilterRepoFixture( + test_id="filters-by-language-match", + repo_kwargs={ + "name": "python-repo", + "clone_url": "https://github.com/user/python-repo.git", + "ssh_url": "git@github.com:user/python-repo.git", + "html_url": "https://github.com/user/python-repo", + "description": None, + "language": "Python", + "topics": (), + "stars": 50, + "is_fork": False, + "is_archived": False, + "default_branch": "main", + "owner": "user", + }, + options_kwargs={"language": "Python"}, + expected=True, + ), + FilterRepoFixture( + test_id="filters-by-language-mismatch", + repo_kwargs={ + "name": "python-repo", + "clone_url": "https://github.com/user/python-repo.git", + "ssh_url": "git@github.com:user/python-repo.git", + "html_url": "https://github.com/user/python-repo", + "description": None, + "language": "Python", + "topics": (), + "stars": 50, + "is_fork": False, + "is_archived": False, + "default_branch": "main", + "owner": "user", + }, + options_kwargs={"language": "JavaScript"}, + expected=False, + ), + FilterRepoFixture( + test_id="filters-by-language-case-insensitive", + repo_kwargs={ + "name": "python-repo", + "clone_url": "https://github.com/user/python-repo.git", + "ssh_url": "git@github.com:user/python-repo.git", + "html_url": "https://github.com/user/python-repo", + "description": None, + "language": "Python", + "topics": (), + "stars": 50, + "is_fork": False, + "is_archived": False, + "default_branch": "main", + "owner": "user", + }, + options_kwargs={"language": "python"}, + expected=True, + ), + FilterRepoFixture( + test_id="filters-by-min-stars-pass", + repo_kwargs={ + "name": "popular", + "clone_url": "https://github.com/user/popular.git", + "ssh_url": "git@github.com:user/popular.git", + "html_url": "https://github.com/user/popular", + "description": None, + "language": "Python", + "topics": (), + "stars": 100, + "is_fork": False, + "is_archived": False, + "default_branch": "main", + "owner": "user", + }, + options_kwargs={"min_stars": 50}, + expected=True, + ), + FilterRepoFixture( + test_id="filters-by-min-stars-fail", + repo_kwargs={ + "name": "unpopular", + "clone_url": "https://github.com/user/unpopular.git", + "ssh_url": "git@github.com:user/unpopular.git", + "html_url": "https://github.com/user/unpopular", + "description": None, + "language": "Python", + "topics": (), + "stars": 10, + "is_fork": False, + "is_archived": False, + "default_branch": "main", + "owner": "user", + }, + options_kwargs={"min_stars": 50}, + expected=False, + ), + FilterRepoFixture( + test_id="filters-by-topics-match", + repo_kwargs={ + "name": "cli-tool", + "clone_url": "https://github.com/user/cli-tool.git", + "ssh_url": "git@github.com:user/cli-tool.git", + "html_url": "https://github.com/user/cli-tool", + "description": None, + "language": "Python", + "topics": ("cli", "tool", "python"), + "stars": 50, + "is_fork": False, + "is_archived": False, + "default_branch": "main", + "owner": "user", + }, + options_kwargs={"topics": 
["cli", "python"]}, + expected=True, + ), + FilterRepoFixture( + test_id="filters-by-topics-mismatch", + repo_kwargs={ + "name": "web-app", + "clone_url": "https://github.com/user/web-app.git", + "ssh_url": "git@github.com:user/web-app.git", + "html_url": "https://github.com/user/web-app", + "description": None, + "language": "Python", + "topics": ("web", "django"), + "stars": 50, + "is_fork": False, + "is_archived": False, + "default_branch": "main", + "owner": "user", + }, + options_kwargs={"topics": ["cli"]}, + expected=False, + ), +] + + +@pytest.mark.parametrize( + list(FilterRepoFixture._fields), + FILTER_REPO_FIXTURES, + ids=[f.test_id for f in FILTER_REPO_FIXTURES], +) +def test_filter_repo( + test_id: str, + repo_kwargs: dict[str, t.Any], + options_kwargs: dict[str, t.Any], + expected: bool, +) -> None: + """Test filter_repo with various filter combinations.""" + repo = RemoteRepo(**repo_kwargs) + options = ImportOptions(**options_kwargs) + assert filter_repo(repo, options) == expected + + +def test_remote_repo_to_vcspull_url_defaults_to_ssh() -> None: + """Test RemoteRepo.to_vcspull_url defaults to SSH URL.""" + repo = RemoteRepo( + name="test", + clone_url="https://github.com/user/test.git", + ssh_url="git@github.com:user/test.git", + html_url="https://github.com/user/test", + description=None, + language=None, + topics=(), + stars=0, + is_fork=False, + is_archived=False, + default_branch="main", + owner="user", + ) + assert repo.to_vcspull_url() == "git+git@github.com:user/test.git" + + +def test_remote_repo_to_vcspull_url_https() -> None: + """Test RemoteRepo.to_vcspull_url with use_ssh=False returns HTTPS.""" + repo = RemoteRepo( + name="test", + clone_url="https://github.com/user/test.git", + ssh_url="git@github.com:user/test.git", + html_url="https://github.com/user/test", + description=None, + language=None, + topics=(), + stars=0, + is_fork=False, + is_archived=False, + default_branch="main", + owner="user", + ) + assert repo.to_vcspull_url(use_ssh=False) == ( + "git+https://github.com/user/test.git" + ) + + +def test_remote_repo_to_vcspull_url_fallback_no_ssh() -> None: + """Test RemoteRepo.to_vcspull_url falls back to clone_url when ssh_url empty.""" + repo = RemoteRepo( + name="test", + clone_url="https://github.com/user/test.git", + ssh_url="", + html_url="https://github.com/user/test", + description=None, + language=None, + topics=(), + stars=0, + is_fork=False, + is_archived=False, + default_branch="main", + owner="user", + ) + assert repo.to_vcspull_url() == "git+https://github.com/user/test.git" + + +def test_remote_repo_to_vcspull_url_already_prefixed() -> None: + """Test RemoteRepo.to_vcspull_url doesn't double-prefix.""" + repo = RemoteRepo( + name="test", + clone_url="git+https://github.com/user/test.git", + ssh_url="", + html_url="https://github.com/user/test", + description=None, + language=None, + topics=(), + stars=0, + is_fork=False, + is_archived=False, + default_branch="main", + owner="user", + ) + assert repo.to_vcspull_url(use_ssh=False) == ( + "git+https://github.com/user/test.git" + ) + + +def test_remote_repo_to_dict() -> None: + """Test RemoteRepo.to_dict serialization.""" + repo = RemoteRepo( + name="test", + clone_url="https://github.com/user/test.git", + ssh_url="git@github.com:user/test.git", + html_url="https://github.com/user/test", + description="A test repo", + language="Python", + topics=("cli", "tool"), + stars=100, + is_fork=False, + is_archived=False, + default_branch="main", + owner="user", + ) + d = repo.to_dict() + assert d["name"] 
== "test" + assert d["clone_url"] == "https://github.com/user/test.git" + assert d["ssh_url"] == "git@github.com:user/test.git" + assert d["language"] == "Python" + assert d["topics"] == ["cli", "tool"] + assert d["stars"] == 100 + assert d["is_fork"] is False + + +def test_import_options_defaults() -> None: + """Test ImportOptions default values.""" + options = ImportOptions() + assert options.mode == ImportMode.USER + assert options.target == "" + assert options.base_url is None + assert options.token is None + assert options.include_forks is False + assert options.include_archived is False + assert options.language is None + assert options.topics == [] + assert options.min_stars == 0 + assert options.limit == 100 + + +def test_import_mode_values() -> None: + """Test ImportMode enum values.""" + assert ImportMode.USER.value == "user" + assert ImportMode.ORG.value == "org" + assert ImportMode.SEARCH.value == "search" + + +class InvalidLimitFixture(t.NamedTuple): + """Fixture for invalid ImportOptions.limit test cases.""" + + test_id: str + limit: int + + +INVALID_LIMIT_FIXTURES: list[InvalidLimitFixture] = [ + InvalidLimitFixture(test_id="zero-limit", limit=0), + InvalidLimitFixture(test_id="negative-limit", limit=-1), + InvalidLimitFixture(test_id="large-negative-limit", limit=-100), +] + + +@pytest.mark.parametrize( + list(InvalidLimitFixture._fields), + INVALID_LIMIT_FIXTURES, + ids=[f.test_id for f in INVALID_LIMIT_FIXTURES], +) +def test_import_options_rejects_invalid_limit( + test_id: str, + limit: int, +) -> None: + """Test ImportOptions raises ValueError for limit < 1.""" + with pytest.raises(ValueError, match="limit must be >= 1"): + ImportOptions(limit=limit) + + +def test_import_options_accepts_valid_limit() -> None: + """Test ImportOptions accepts limit >= 1.""" + options = ImportOptions(limit=1) + assert options.limit == 1 + options = ImportOptions(limit=500) + assert options.limit == 500 + + +class HandleHttpErrorFixture(t.NamedTuple): + """Fixture for HTTPClient._handle_http_error test cases.""" + + test_id: str + status_code: int + response_body: str + expected_error_type: str + expected_message_contains: str + + +HANDLE_HTTP_ERROR_FIXTURES: list[HandleHttpErrorFixture] = [ + HandleHttpErrorFixture( + test_id="string-message-401", + status_code=401, + response_body='{"message": "Bad credentials"}', + expected_error_type="AuthenticationError", + expected_message_contains="Bad credentials", + ), + HandleHttpErrorFixture( + test_id="dict-message-403", + status_code=403, + response_body='{"message": {"error": "forbidden"}}', + expected_error_type="AuthenticationError", + expected_message_contains="forbidden", + ), + HandleHttpErrorFixture( + test_id="int-message-404", + status_code=404, + response_body='{"message": 42}', + expected_error_type="NotFoundError", + expected_message_contains="42", + ), + HandleHttpErrorFixture( + test_id="rate-limit-string-403", + status_code=403, + response_body='{"message": "API rate limit exceeded"}', + expected_error_type="RateLimitError", + expected_message_contains="rate limit", + ), + HandleHttpErrorFixture( + test_id="invalid-json-body-500", + status_code=500, + response_body="Server Error", + expected_error_type="ServiceUnavailableError", + expected_message_contains="service unavailable", + ), +] + + +@pytest.mark.parametrize( + list(HandleHttpErrorFixture._fields), + HANDLE_HTTP_ERROR_FIXTURES, + ids=[f.test_id for f in HANDLE_HTTP_ERROR_FIXTURES], +) +def test_handle_http_error( + test_id: str, + status_code: int, + response_body: str, + 
expected_error_type: str, + expected_message_contains: str, +) -> None: + """Test HTTPClient._handle_http_error with various response bodies.""" + import io + import urllib.error + + from vcspull._internal.remotes.base import ( + AuthenticationError, + HTTPClient, + NotFoundError, + RateLimitError, + ServiceUnavailableError, + ) + + error_classes = { + "AuthenticationError": AuthenticationError, + "RateLimitError": RateLimitError, + "NotFoundError": NotFoundError, + "ServiceUnavailableError": ServiceUnavailableError, + } + + client = HTTPClient("https://api.example.com") + exc = urllib.error.HTTPError( + url="https://api.example.com/test", + code=status_code, + msg="Error", + hdrs=None, # type: ignore[arg-type] + fp=io.BytesIO(response_body.encode()), + ) + + with pytest.raises(error_classes[expected_error_type]) as exc_info: + client._handle_http_error(exc, "TestService") + + assert expected_message_contains.lower() in str(exc_info.value).lower() + + +def test_http_client_get_merges_query_params( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test HTTPClient.get properly merges params into URLs with existing query strings. + + Naive f"{url}?{params}" would produce a double-? URL when the endpoint + already contains query parameters. The implementation should use + urllib.parse to merge them correctly. + """ + import json + import urllib.request + + from tests._internal.remotes.conftest import MockHTTPResponse + from vcspull._internal.remotes.base import HTTPClient + + captured_urls: list[str] = [] + + def mock_urlopen( + request: urllib.request.Request, + **kwargs: t.Any, + ) -> MockHTTPResponse: + captured_urls.append(request.full_url) + return MockHTTPResponse(json.dumps({"ok": True}).encode(), {}, 200) + + # Mock urlopen: capture the request URL to verify query param merging + monkeypatch.setattr("urllib.request.urlopen", mock_urlopen) + + client = HTTPClient("https://api.example.com") + + # Endpoint already has a query string; additional params should merge + client.get( + "/search?q=test", + params={"page": 1, "per_page": 10}, + service_name="TestService", + ) + + assert len(captured_urls) == 1 + url = captured_urls[0] + assert "??" 
not in url, f"Double question mark in URL: {url}" + assert "q=test" in url + assert "page=1" in url + assert "per_page=10" in url + + +def test_http_client_warns_on_non_https_with_token( + caplog: pytest.LogCaptureFixture, +) -> None: + """Test HTTPClient logs a warning when token is sent over non-HTTPS.""" + import logging + + from vcspull._internal.remotes.base import HTTPClient + + caplog.set_level(logging.WARNING) + + HTTPClient("http://insecure.example.com", token="secret-token") + + assert "non-HTTPS" in caplog.text + assert "insecure.example.com" in caplog.text + + +def test_http_client_no_warning_on_https_with_token( + caplog: pytest.LogCaptureFixture, +) -> None: + """Test HTTPClient does not warn when token is sent over HTTPS.""" + import logging + + from vcspull._internal.remotes.base import HTTPClient + + caplog.set_level(logging.WARNING) + + HTTPClient("https://secure.example.com", token="secret-token") + + assert "non-HTTPS" not in caplog.text diff --git a/tests/_internal/remotes/test_codecommit.py b/tests/_internal/remotes/test_codecommit.py new file mode 100644 index 000000000..28624fcda --- /dev/null +++ b/tests/_internal/remotes/test_codecommit.py @@ -0,0 +1,679 @@ +"""Tests for vcspull._internal.remotes.codecommit module.""" + +from __future__ import annotations + +import json +import subprocess +import typing as t + +import pytest + +from vcspull._internal.remotes.base import ImportOptions +from vcspull._internal.remotes.codecommit import CodeCommitImporter + + +def _aws_ok( + stdout: str = "", + stderr: str = "", +) -> subprocess.CompletedProcess[str]: + """Create a successful subprocess result.""" + return subprocess.CompletedProcess( + args=["aws"], + returncode=0, + stdout=stdout, + stderr=stderr, + ) + + +def _aws_err( + stderr: str = "", + returncode: int = 1, +) -> subprocess.CompletedProcess[str]: + """Create a failed subprocess result.""" + return subprocess.CompletedProcess( + args=["aws"], + returncode=returncode, + stdout="", + stderr=stderr, + ) + + +def _make_cc_repo( + name: str, + *, + region: str = "us-east-1", + account_id: str = "123456789012", + default_branch: str = "main", + description: str | None = None, +) -> dict[str, t.Any]: + """Create a CodeCommit repository metadata dict.""" + return { + "repositoryName": name, + "cloneUrlHttp": ( + f"https://git-codecommit.{region}.amazonaws.com/v1/repos/{name}" + ), + "cloneUrlSsh": (f"ssh://git-codecommit.{region}.amazonaws.com/v1/repos/{name}"), + "accountId": account_id, + "defaultBranch": default_branch, + "repositoryDescription": description, + } + + +# --------------------------------------------------------------------------- +# _check_aws_cli +# --------------------------------------------------------------------------- + + +def test_check_aws_cli_file_not_found(monkeypatch: pytest.MonkeyPatch) -> None: + """Test _check_aws_cli raises DependencyError when aws binary missing.""" + from vcspull._internal.remotes.base import DependencyError + + def mock_run(cmd: list[str], **kwargs: t.Any) -> subprocess.CompletedProcess[str]: + msg = "aws" + raise FileNotFoundError(msg) + + # Mock subprocess.run: simulate aws binary not found (FileNotFoundError) + monkeypatch.setattr("subprocess.run", mock_run) + + with pytest.raises(DependencyError, match="not installed"): + CodeCommitImporter() + + +def test_check_aws_cli_nonzero_returncode(monkeypatch: pytest.MonkeyPatch) -> None: + """Test _check_aws_cli raises DependencyError for non-zero returncode.""" + from vcspull._internal.remotes.base import 
DependencyError + + # Mock subprocess.run: simulate aws CLI returning non-zero exit code + monkeypatch.setattr("subprocess.run", lambda cmd, **kw: _aws_err()) + + with pytest.raises(DependencyError, match="not installed"): + CodeCommitImporter() + + +# --------------------------------------------------------------------------- +# _build_aws_command +# --------------------------------------------------------------------------- + + +def test_build_aws_command_no_flags(monkeypatch: pytest.MonkeyPatch) -> None: + """Test _build_aws_command with no region/profile.""" + # Mock subprocess.run: allow CodeCommitImporter construction (aws --version check) + monkeypatch.setattr("subprocess.run", lambda cmd, **kw: _aws_ok("aws-cli/2.x")) + + importer = CodeCommitImporter() + result = importer._build_aws_command("codecommit", "list-repositories") + assert result == ["aws", "--output", "json", "codecommit", "list-repositories"] + + +def test_build_aws_command_with_region(monkeypatch: pytest.MonkeyPatch) -> None: + """Test _build_aws_command appends --region.""" + # Mock subprocess.run: allow CodeCommitImporter construction (aws --version check) + monkeypatch.setattr("subprocess.run", lambda cmd, **kw: _aws_ok("aws-cli/2.x")) + + importer = CodeCommitImporter(region="eu-west-1") + result = importer._build_aws_command("codecommit", "list-repositories") + assert result == [ + "aws", + "--output", + "json", + "--region", + "eu-west-1", + "codecommit", + "list-repositories", + ] + + +def test_build_aws_command_with_profile(monkeypatch: pytest.MonkeyPatch) -> None: + """Test _build_aws_command appends --profile.""" + # Mock subprocess.run: allow CodeCommitImporter construction (aws --version check) + monkeypatch.setattr("subprocess.run", lambda cmd, **kw: _aws_ok("aws-cli/2.x")) + + importer = CodeCommitImporter(profile="myprofile") + result = importer._build_aws_command("codecommit", "list-repositories") + assert result == [ + "aws", + "--output", + "json", + "--profile", + "myprofile", + "codecommit", + "list-repositories", + ] + + +def test_build_aws_command_with_region_and_profile( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test _build_aws_command with both region and profile.""" + # Mock subprocess.run: allow CodeCommitImporter construction (aws --version check) + monkeypatch.setattr("subprocess.run", lambda cmd, **kw: _aws_ok("aws-cli/2.x")) + + importer = CodeCommitImporter(region="ap-south-1", profile="prod") + result = importer._build_aws_command("sts", "get-caller-identity") + assert result == [ + "aws", + "--output", + "json", + "--region", + "ap-south-1", + "--profile", + "prod", + "sts", + "get-caller-identity", + ] + + +# --------------------------------------------------------------------------- +# _run_aws_command — error handling +# --------------------------------------------------------------------------- + + +class RunAwsErrorFixture(t.NamedTuple): + """Fixture for _run_aws_command error test cases.""" + + test_id: str + stderr: str + expected_error_type: str + expected_match: str + + +RUN_AWS_ERROR_FIXTURES: list[RunAwsErrorFixture] = [ + RunAwsErrorFixture( + test_id="credential-error", + stderr="Unable to locate credentials", + expected_error_type="AuthenticationError", + expected_match="credentials not configured", + ), + RunAwsErrorFixture( + test_id="endpoint-connection-error", + stderr="Could not connect to the endpoint URL", + expected_error_type="ConfigurationError", + expected_match="Could not connect", + ), + RunAwsErrorFixture( + test_id="invalid-region-error", + 
stderr="Invalid region: foobar-1", + expected_error_type="ConfigurationError", + expected_match="Invalid AWS region", + ), + RunAwsErrorFixture( + test_id="generic-aws-error", + stderr="Something unexpected happened", + expected_error_type="ConfigurationError", + expected_match="AWS CLI error", + ), +] + + +@pytest.mark.parametrize( + list(RunAwsErrorFixture._fields), + RUN_AWS_ERROR_FIXTURES, + ids=[f.test_id for f in RUN_AWS_ERROR_FIXTURES], +) +def test_run_aws_command_errors( + test_id: str, + stderr: str, + expected_error_type: str, + expected_match: str, + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test _run_aws_command handles various AWS CLI errors.""" + from vcspull._internal.remotes import base + + call_count = 0 + + def mock_run(cmd: list[str], **kwargs: t.Any) -> subprocess.CompletedProcess[str]: + nonlocal call_count + call_count += 1 + # First call is _check_aws_cli — succeed + if call_count == 1: + return _aws_ok("aws-cli/2.x") + # Subsequent calls fail with the test error + return _aws_err(stderr=stderr) + + # Mock subprocess.run: first call passes aws --version, subsequent calls fail + # with the specific AWS CLI error under test + monkeypatch.setattr("subprocess.run", mock_run) + importer = CodeCommitImporter() + + error_class = getattr(base, expected_error_type) + with pytest.raises(error_class, match=expected_match): + importer._run_aws_command("codecommit", "list-repositories") + + +def test_run_aws_command_json_parse_error(monkeypatch: pytest.MonkeyPatch) -> None: + """Test _run_aws_command raises ConfigurationError for invalid JSON.""" + from vcspull._internal.remotes.base import ConfigurationError + + call_count = 0 + + def mock_run(cmd: list[str], **kwargs: t.Any) -> subprocess.CompletedProcess[str]: + nonlocal call_count + call_count += 1 + if call_count == 1: + return _aws_ok("aws-cli/2.x") + return _aws_ok(stdout="not valid json {{{") + + # Mock subprocess.run: first call passes aws --version, second returns invalid JSON + monkeypatch.setattr("subprocess.run", mock_run) + importer = CodeCommitImporter() + + with pytest.raises(ConfigurationError, match="Invalid JSON"): + importer._run_aws_command("codecommit", "list-repositories") + + +def test_run_aws_command_file_not_found(monkeypatch: pytest.MonkeyPatch) -> None: + """Test _run_aws_command raises DependencyError when aws disappears mid-session.""" + from vcspull._internal.remotes.base import DependencyError + + call_count = 0 + + def mock_run(cmd: list[str], **kwargs: t.Any) -> subprocess.CompletedProcess[str]: + nonlocal call_count + call_count += 1 + if call_count == 1: + return _aws_ok("aws-cli/2.x") + msg = "aws" + raise FileNotFoundError(msg) + + # Mock subprocess.run: first call passes aws --version, second raises + # FileNotFoundError to simulate aws binary disappearing mid-session + monkeypatch.setattr("subprocess.run", mock_run) + importer = CodeCommitImporter() + + with pytest.raises(DependencyError, match="not found"): + importer._run_aws_command("codecommit", "list-repositories") + + +# --------------------------------------------------------------------------- +# fetch_repos +# --------------------------------------------------------------------------- + + +def test_fetch_repos_basic(monkeypatch: pytest.MonkeyPatch) -> None: + """Test fetch_repos returns repos from list + batch-get pipeline.""" + repos_data = [_make_cc_repo("my-repo"), _make_cc_repo("other-repo")] + + call_count = 0 + + def mock_run(cmd: list[str], **kwargs: t.Any) -> subprocess.CompletedProcess[str]: + nonlocal call_count 
+ call_count += 1 + if call_count == 1: + # _check_aws_cli + return _aws_ok("aws-cli/2.x") + if "list-repositories" in cmd: + return _aws_ok( + json.dumps( + { + "repositories": [ + {"repositoryName": "my-repo"}, + {"repositoryName": "other-repo"}, + ] + } + ) + ) + if "batch-get-repositories" in cmd: + return _aws_ok(json.dumps({"repositories": repos_data})) + return _aws_err(stderr="unknown command") + + # Mock subprocess.run: simulate aws --version, list-repositories, and + # batch-get-repositories responses to test the full fetch pipeline + monkeypatch.setattr("subprocess.run", mock_run) + importer = CodeCommitImporter() + options = ImportOptions() + repos = list(importer.fetch_repos(options)) + + assert len(repos) == 2 + assert repos[0].name == "my-repo" + assert repos[1].name == "other-repo" + + +def test_fetch_repos_empty(monkeypatch: pytest.MonkeyPatch) -> None: + """Test fetch_repos returns nothing when no repositories exist.""" + call_count = 0 + + def mock_run(cmd: list[str], **kwargs: t.Any) -> subprocess.CompletedProcess[str]: + nonlocal call_count + call_count += 1 + if call_count == 1: + return _aws_ok("aws-cli/2.x") + if "list-repositories" in cmd: + return _aws_ok(json.dumps({"repositories": []})) + return _aws_err(stderr="unknown command") + + # Mock subprocess.run: simulate aws --version and empty list-repositories response + monkeypatch.setattr("subprocess.run", mock_run) + importer = CodeCommitImporter() + options = ImportOptions() + repos = list(importer.fetch_repos(options)) + + assert len(repos) == 0 + + +def test_fetch_repos_name_filter(monkeypatch: pytest.MonkeyPatch) -> None: + """Test fetch_repos filters by target name.""" + repos_data = [_make_cc_repo("django-app")] + + call_count = 0 + + def mock_run(cmd: list[str], **kwargs: t.Any) -> subprocess.CompletedProcess[str]: + nonlocal call_count + call_count += 1 + if call_count == 1: + return _aws_ok("aws-cli/2.x") + if "list-repositories" in cmd: + return _aws_ok( + json.dumps( + { + "repositories": [ + {"repositoryName": "django-app"}, + {"repositoryName": "flask-app"}, + {"repositoryName": "react-app"}, + ] + } + ) + ) + if "batch-get-repositories" in cmd: + # Only django-app should be requested + assert "django-app" in cmd + return _aws_ok(json.dumps({"repositories": repos_data})) + return _aws_err(stderr="unknown command") + + # Mock subprocess.run: simulate aws --version, list-repositories with + # multiple repos, and batch-get for only the name-filtered subset + monkeypatch.setattr("subprocess.run", mock_run) + importer = CodeCommitImporter() + options = ImportOptions(target="django") + repos = list(importer.fetch_repos(options)) + + assert len(repos) == 1 + assert repos[0].name == "django-app" + + +def test_fetch_repos_limit(monkeypatch: pytest.MonkeyPatch) -> None: + """Test fetch_repos respects limit option.""" + repos_data = [_make_cc_repo(f"repo{i}") for i in range(5)] + + call_count = 0 + + def mock_run(cmd: list[str], **kwargs: t.Any) -> subprocess.CompletedProcess[str]: + nonlocal call_count + call_count += 1 + if call_count == 1: + return _aws_ok("aws-cli/2.x") + if "list-repositories" in cmd: + return _aws_ok( + json.dumps( + {"repositories": [{"repositoryName": f"repo{i}"} for i in range(5)]} + ) + ) + if "batch-get-repositories" in cmd: + return _aws_ok(json.dumps({"repositories": repos_data})) + return _aws_err(stderr="unknown command") + + # Mock subprocess.run: simulate full pipeline to verify limit is respected + monkeypatch.setattr("subprocess.run", mock_run) + importer = 
CodeCommitImporter() + options = ImportOptions(limit=2) + repos = list(importer.fetch_repos(options)) + + assert len(repos) == 2 + + +def test_fetch_repos_batch_processing(monkeypatch: pytest.MonkeyPatch) -> None: + """Test fetch_repos batches in groups of 25.""" + # Create 30 repos — should result in 2 batch-get calls (25 + 5) + batch_get_calls: list[list[str]] = [] + + call_count = 0 + + def mock_run(cmd: list[str], **kwargs: t.Any) -> subprocess.CompletedProcess[str]: + nonlocal call_count + call_count += 1 + if call_count == 1: + return _aws_ok("aws-cli/2.x") + if "list-repositories" in cmd: + return _aws_ok( + json.dumps( + { + "repositories": [ + {"repositoryName": f"repo{i}"} for i in range(30) + ] + } + ) + ) + if "batch-get-repositories" in cmd: + # Extract repo names from command (after --repository-names) + names_idx = cmd.index("--repository-names") + 1 + repo_names = cmd[names_idx:] + batch_get_calls.append(repo_names) + repos = [_make_cc_repo(name) for name in repo_names] + return _aws_ok(json.dumps({"repositories": repos})) + return _aws_err(stderr="unknown command") + + # Mock subprocess.run: simulate 30 repos to verify batch-get splits at 25 + monkeypatch.setattr("subprocess.run", mock_run) + importer = CodeCommitImporter() + options = ImportOptions(limit=100) + repos = list(importer.fetch_repos(options)) + + assert len(repos) == 30 + assert len(batch_get_calls) == 2 + assert len(batch_get_calls[0]) == 25 + assert len(batch_get_calls[1]) == 5 + + +def test_fetch_repos_pagination(monkeypatch: pytest.MonkeyPatch) -> None: + """Test fetch_repos handles nextToken pagination across list-repositories calls.""" + call_count = 0 + list_calls: list[list[str]] = [] + + def mock_run(cmd: list[str], **kwargs: t.Any) -> subprocess.CompletedProcess[str]: + nonlocal call_count + call_count += 1 + if call_count == 1: + return _aws_ok("aws-cli/2.x") + if "list-repositories" in cmd: + list_calls.append(cmd) + if "--next-token" not in cmd: + # First page: return 2 repos + nextToken + return _aws_ok( + json.dumps( + { + "repositories": [ + {"repositoryName": "page1-repo1"}, + {"repositoryName": "page1-repo2"}, + ], + "nextToken": "token-page2", + } + ) + ) + # Second page: return 1 repo, no nextToken + return _aws_ok( + json.dumps( + { + "repositories": [ + {"repositoryName": "page2-repo1"}, + ], + } + ) + ) + if "batch-get-repositories" in cmd: + names_idx = cmd.index("--repository-names") + 1 + repo_names = cmd[names_idx:] + repos = [_make_cc_repo(name) for name in repo_names] + return _aws_ok(json.dumps({"repositories": repos})) + return _aws_err(stderr="unknown command") + + # Mock subprocess.run: simulate paginated list-repositories with nextToken + # to verify the importer follows pagination tokens across pages + monkeypatch.setattr("subprocess.run", mock_run) + importer = CodeCommitImporter() + options = ImportOptions() + repos = list(importer.fetch_repos(options)) + + # Should have consumed both pages + assert len(repos) == 3 + assert {r.name for r in repos} == {"page1-repo1", "page1-repo2", "page2-repo1"} + # Should have made 2 list-repositories calls + assert len(list_calls) == 2 + assert "--next-token" not in list_calls[0] + assert "--next-token" in list_calls[1] + + +# --------------------------------------------------------------------------- +# _parse_repo — region extraction +# --------------------------------------------------------------------------- + + +def test_parse_repo_region_from_clone_url(monkeypatch: pytest.MonkeyPatch) -> None: + """Test _parse_repo extracts region 
from clone URL when not set.""" + # Mock subprocess.run: allow CodeCommitImporter construction (aws --version check) + monkeypatch.setattr("subprocess.run", lambda cmd, **kw: _aws_ok("aws-cli/2.x")) + + # No region set — should extract from clone URL + importer = CodeCommitImporter(region=None) + data = _make_cc_repo("my-repo", region="us-west-2") + repo = importer._parse_repo(data) + + assert "us-west-2" in repo.html_url + assert "us-east-1" not in repo.html_url + + +def test_parse_repo_region_explicit(monkeypatch: pytest.MonkeyPatch) -> None: + """Test _parse_repo uses explicit region when set.""" + # Mock subprocess.run: allow CodeCommitImporter construction (aws --version check) + monkeypatch.setattr("subprocess.run", lambda cmd, **kw: _aws_ok("aws-cli/2.x")) + + importer = CodeCommitImporter(region="eu-central-1") + data = _make_cc_repo("my-repo", region="us-west-2") + repo = importer._parse_repo(data) + + # Explicit region takes precedence over clone URL + assert "eu-central-1" in repo.html_url + + +def test_parse_repo_fallback_region(monkeypatch: pytest.MonkeyPatch) -> None: + """Test _parse_repo falls back to us-east-1 when no region info available.""" + # Mock subprocess.run: allow CodeCommitImporter construction (aws --version check) + monkeypatch.setattr("subprocess.run", lambda cmd, **kw: _aws_ok("aws-cli/2.x")) + + importer = CodeCommitImporter(region=None) + # Data without a recognizable clone URL + data = { + "repositoryName": "my-repo", + "cloneUrlHttp": "", + "cloneUrlSsh": "", + "accountId": "123456789012", + } + repo = importer._parse_repo(data) + + assert "us-east-1" in repo.html_url + + +def test_parse_repo_fields(monkeypatch: pytest.MonkeyPatch) -> None: + """Test _parse_repo maps all fields correctly.""" + # Mock subprocess.run: allow CodeCommitImporter construction (aws --version check) + monkeypatch.setattr("subprocess.run", lambda cmd, **kw: _aws_ok("aws-cli/2.x")) + + importer = CodeCommitImporter(region="us-east-1") + data = _make_cc_repo( + "test-repo", + region="us-east-1", + account_id="999888777666", + default_branch="develop", + description="A test repository", + ) + repo = importer._parse_repo(data) + + assert repo.name == "test-repo" + assert "git-codecommit.us-east-1" in repo.clone_url + assert "git-codecommit.us-east-1" in repo.ssh_url + assert repo.description == "A test repository" + assert repo.language is None + assert repo.topics == () + assert repo.stars == 0 + assert repo.is_fork is False + assert repo.is_archived is False + assert repo.default_branch == "develop" + assert repo.owner == "999888777666" + + +# --------------------------------------------------------------------------- +# is_authenticated +# --------------------------------------------------------------------------- + + +def test_is_authenticated_success(monkeypatch: pytest.MonkeyPatch) -> None: + """Test is_authenticated returns True when sts get-caller-identity succeeds.""" + call_count = 0 + + def mock_run(cmd: list[str], **kwargs: t.Any) -> subprocess.CompletedProcess[str]: + nonlocal call_count + call_count += 1 + if call_count == 1: + return _aws_ok("aws-cli/2.x") + # sts get-caller-identity succeeds + return _aws_ok( + json.dumps( + {"UserId": "AIDA...", "Account": "123456789012", "Arn": "arn:..."} + ) + ) + + # Mock subprocess.run: first call passes aws --version, second returns + # successful sts get-caller-identity to confirm credentials are valid + monkeypatch.setattr("subprocess.run", mock_run) + importer = CodeCommitImporter() + + assert importer.is_authenticated is 
True + + +def test_is_authenticated_failure(monkeypatch: pytest.MonkeyPatch) -> None: + """Test is_authenticated returns False when credentials are missing.""" + call_count = 0 + + def mock_run(cmd: list[str], **kwargs: t.Any) -> subprocess.CompletedProcess[str]: + nonlocal call_count + call_count += 1 + if call_count == 1: + return _aws_ok("aws-cli/2.x") + # sts get-caller-identity fails with credential error + return _aws_err(stderr="Unable to locate credentials") + + # Mock subprocess.run: first call passes aws --version, second fails + # sts get-caller-identity with credential error to simulate missing credentials + monkeypatch.setattr("subprocess.run", mock_run) + importer = CodeCommitImporter() + + assert importer.is_authenticated is False + + +def test_codecommit_timeout_raises_service_unavailable( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test _run_aws_command raises ServiceUnavailableError on timeout. + + If the AWS CLI hangs (broken credential provider, network issue), + subprocess.run should time out and the error should propagate as + ServiceUnavailableError rather than blocking indefinitely. + """ + from vcspull._internal.remotes.base import ServiceUnavailableError + + call_count = 0 + + def mock_run(*args: t.Any, **kwargs: t.Any) -> subprocess.CompletedProcess[str]: + nonlocal call_count + call_count += 1 + if call_count == 1: + # _check_aws_cli: aws --version succeeds + return _aws_ok("aws-cli/2.x") + # Mock subprocess.run: second call (actual command) raises + # TimeoutExpired to simulate a hung AWS CLI process + raise subprocess.TimeoutExpired(cmd="aws", timeout=60) + + monkeypatch.setattr("subprocess.run", mock_run) + importer = CodeCommitImporter() + + with pytest.raises(ServiceUnavailableError, match="timed out"): + importer._run_aws_command("codecommit", "list-repositories") diff --git a/tests/_internal/remotes/test_gitea.py b/tests/_internal/remotes/test_gitea.py new file mode 100644 index 000000000..8ad9b6f76 --- /dev/null +++ b/tests/_internal/remotes/test_gitea.py @@ -0,0 +1,261 @@ +"""Tests for vcspull._internal.remotes.gitea module.""" + +from __future__ import annotations + +import json +import typing as t + +import pytest + +from vcspull._internal.remotes.base import ImportMode, ImportOptions +from vcspull._internal.remotes.gitea import GiteaImporter + + +def test_gitea_fetch_user( + mock_urlopen: t.Callable[..., None], + gitea_user_repos_response: bytes, +) -> None: + """Test Gitea user repository fetching.""" + mock_urlopen([(gitea_user_repos_response, {}, 200)]) + importer = GiteaImporter(base_url="https://codeberg.org") + options = ImportOptions(mode=ImportMode.USER, target="testuser") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].name == "repo1" + assert repos[0].owner == "testuser" + assert repos[0].stars == 15 + + +def test_gitea_fetch_org( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test Gitea org repository fetching.""" + response_json = [ + { + "name": "org-repo", + "clone_url": "https://codeberg.org/testorg/org-repo.git", + "ssh_url": "git@codeberg.org:testorg/org-repo.git", + "html_url": "https://codeberg.org/testorg/org-repo", + "description": "Org repo", + "language": "Go", + "topics": [], + "stars_count": 100, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "testorg"}, + } + ] + mock_urlopen([(json.dumps(response_json).encode(), {}, 200)]) + importer = GiteaImporter(base_url="https://codeberg.org") + options = ImportOptions(mode=ImportMode.ORG, 
target="testorg") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].name == "org-repo" + + +def test_gitea_search_with_wrapped_response( + mock_urlopen: t.Callable[..., None], + gitea_search_response: bytes, +) -> None: + """Test Gitea search handles wrapped response format.""" + mock_urlopen([(gitea_search_response, {}, 200)]) + importer = GiteaImporter(base_url="https://codeberg.org") + options = ImportOptions(mode=ImportMode.SEARCH, target="test") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].name == "search-result" + + +def test_gitea_search_with_array_response( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test Gitea search handles plain array response format.""" + # Some Gitea instances return plain array instead of {"ok": true, "data": [...]} + response_json = [ + { + "name": "plain-result", + "clone_url": "https://gitea.example.com/user/plain-result.git", + "ssh_url": "git@gitea.example.com:user/plain-result.git", + "html_url": "https://gitea.example.com/user/plain-result", + "description": "Plain array result", + "language": "Python", + "topics": [], + "stars_count": 20, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "user"}, + } + ] + mock_urlopen([(json.dumps(response_json).encode(), {}, 200)]) + importer = GiteaImporter(base_url="https://gitea.example.com") + options = ImportOptions(mode=ImportMode.SEARCH, target="test") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].name == "plain-result" + + +def test_gitea_importer_defaults_to_codeberg() -> None: + """Test GiteaImporter defaults to Codeberg URL.""" + importer = GiteaImporter() + assert importer._base_url == "https://codeberg.org" + + +def test_gitea_importer_service_name() -> None: + """Test service_name property.""" + importer = GiteaImporter() + assert importer.service_name == "Gitea" + + +def test_gitea_importer_is_authenticated_without_token( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test is_authenticated returns False without token.""" + # Clear environment variables that could provide a token + monkeypatch.delenv("CODEBERG_TOKEN", raising=False) + monkeypatch.delenv("GITEA_TOKEN", raising=False) + monkeypatch.delenv("FORGEJO_TOKEN", raising=False) + importer = GiteaImporter(token=None) + assert importer.is_authenticated is False + + +def test_gitea_importer_is_authenticated_with_token() -> None: + """Test is_authenticated returns True with token.""" + importer = GiteaImporter(token="test-token") + assert importer.is_authenticated is True + + +def test_gitea_uses_stars_count_field( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test Gitea correctly reads stars_count (not stargazers_count).""" + response_json = [ + { + "name": "starred-repo", + "clone_url": "https://codeberg.org/user/starred-repo.git", + "ssh_url": "git@codeberg.org:user/starred-repo.git", + "html_url": "https://codeberg.org/user/starred-repo", + "description": "Popular repo", + "language": "Rust", + "topics": [], + "stars_count": 500, # Gitea uses stars_count + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "user"}, + } + ] + mock_urlopen([(json.dumps(response_json).encode(), {}, 200)]) + importer = GiteaImporter(base_url="https://codeberg.org") + options = ImportOptions(mode=ImportMode.USER, target="user") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].stars == 500 + + +def 
test_gitea_handles_null_topics( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test Gitea handles null topics in API response. + + Gitea API can return "topics": null instead of an empty array. + dict.get("topics", []) returns None when the key exists with null value, + causing tuple(None) to crash with TypeError. + """ + response_json = [ + { + "name": "null-topics-repo", + "clone_url": "https://codeberg.org/user/null-topics-repo.git", + "ssh_url": "git@codeberg.org:user/null-topics-repo.git", + "html_url": "https://codeberg.org/user/null-topics-repo", + "description": "Repo with null topics", + "language": "Python", + "topics": None, + "stars_count": 10, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "user"}, + } + ] + mock_urlopen([(json.dumps(response_json).encode(), {}, 200)]) + importer = GiteaImporter(base_url="https://codeberg.org") + options = ImportOptions(mode=ImportMode.USER, target="user") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].topics == () + + +def test_gitea_filters_by_language( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test Gitea language filter works.""" + response_json = [ + { + "name": "go-repo", + "clone_url": "https://codeberg.org/user/go-repo.git", + "ssh_url": "git@codeberg.org:user/go-repo.git", + "html_url": "https://codeberg.org/user/go-repo", + "description": "Go repo", + "language": "Go", + "topics": [], + "stars_count": 50, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "user"}, + }, + { + "name": "rust-repo", + "clone_url": "https://codeberg.org/user/rust-repo.git", + "ssh_url": "git@codeberg.org:user/rust-repo.git", + "html_url": "https://codeberg.org/user/rust-repo", + "description": "Rust repo", + "language": "Rust", + "topics": [], + "stars_count": 30, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "user"}, + }, + ] + mock_urlopen([(json.dumps(response_json).encode(), {}, 200)]) + importer = GiteaImporter(base_url="https://codeberg.org") + options = ImportOptions(mode=ImportMode.USER, target="user", language="Rust") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].name == "rust-repo" + + +def test_gitea_parse_repo_null_owner( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test Gitea _parse_repo handles null owner without crashing. + + Gitea/Forgejo APIs may return ``"owner": null`` for system repositories. + The importer must not raise AttributeError when this happens. 
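+
+    A minimal illustration of the failure mode, using hypothetical data::
+
+        raw = {"topics": None}
+        raw.get("topics", [])  # -> None; the key exists, default unused
+        tuple(raw.get("topics", []))  # TypeError: 'NoneType' not iterable
+        tuple(raw.get("topics") or ())  # -> (), a safe guard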
+ """ + response_json = [ + { + "name": "sys-repo", + "clone_url": "https://codeberg.org/sys-repo.git", + "ssh_url": "git@codeberg.org:sys-repo.git", + "html_url": "https://codeberg.org/sys-repo", + "description": "System repo", + "language": "Go", + "topics": [], + "stars_count": 0, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": None, + } + ] + mock_urlopen([(json.dumps(response_json).encode(), {}, 200)]) + importer = GiteaImporter(base_url="https://codeberg.org") + options = ImportOptions(mode=ImportMode.USER, target="testuser") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].owner == "" diff --git a/tests/_internal/remotes/test_github.py b/tests/_internal/remotes/test_github.py new file mode 100644 index 000000000..3bb022b8a --- /dev/null +++ b/tests/_internal/remotes/test_github.py @@ -0,0 +1,655 @@ +"""Tests for vcspull._internal.remotes.github module.""" + +from __future__ import annotations + +import json +import typing as t + +import pytest + +from vcspull._internal.remotes.base import ImportMode, ImportOptions +from vcspull._internal.remotes.github import GitHubImporter + + +class GitHubUserFixture(t.NamedTuple): + """Fixture for GitHub user import test cases.""" + + test_id: str + response_json: list[dict[str, t.Any]] + options_kwargs: dict[str, t.Any] + expected_count: int + expected_names: list[str] + + +GITHUB_USER_FIXTURES: list[GitHubUserFixture] = [ + GitHubUserFixture( + test_id="single-repo-user", + response_json=[ + { + "name": "repo1", + "clone_url": "https://github.com/testuser/repo1.git", + "ssh_url": "git@github.com:testuser/repo1.git", + "html_url": "https://github.com/testuser/repo1", + "description": "Test repo", + "language": "Python", + "topics": [], + "stargazers_count": 10, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "testuser"}, + } + ], + options_kwargs={"mode": ImportMode.USER, "target": "testuser"}, + expected_count=1, + expected_names=["repo1"], + ), + GitHubUserFixture( + test_id="multiple-repos-forks-excluded", + response_json=[ + { + "name": "original", + "clone_url": "https://github.com/testuser/original.git", + "ssh_url": "git@github.com:testuser/original.git", + "html_url": "https://github.com/testuser/original", + "description": "Original repo", + "language": "Python", + "topics": [], + "stargazers_count": 100, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "testuser"}, + }, + { + "name": "forked", + "clone_url": "https://github.com/testuser/forked.git", + "ssh_url": "git@github.com:testuser/forked.git", + "html_url": "https://github.com/testuser/forked", + "description": "Forked repo", + "language": "Python", + "topics": [], + "stargazers_count": 5, + "fork": True, + "archived": False, + "default_branch": "main", + "owner": {"login": "testuser"}, + }, + ], + options_kwargs={ + "mode": ImportMode.USER, + "target": "testuser", + "include_forks": False, + }, + expected_count=1, + expected_names=["original"], + ), + GitHubUserFixture( + test_id="multiple-repos-forks-included", + response_json=[ + { + "name": "original", + "clone_url": "https://github.com/testuser/original.git", + "ssh_url": "git@github.com:testuser/original.git", + "html_url": "https://github.com/testuser/original", + "description": "Original repo", + "language": "Python", + "topics": [], + "stargazers_count": 100, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "testuser"}, + }, + { + "name": 
"forked", + "clone_url": "https://github.com/testuser/forked.git", + "ssh_url": "git@github.com:testuser/forked.git", + "html_url": "https://github.com/testuser/forked", + "description": "Forked repo", + "language": "Python", + "topics": [], + "stargazers_count": 5, + "fork": True, + "archived": False, + "default_branch": "main", + "owner": {"login": "testuser"}, + }, + ], + options_kwargs={ + "mode": ImportMode.USER, + "target": "testuser", + "include_forks": True, + }, + expected_count=2, + expected_names=["original", "forked"], + ), + GitHubUserFixture( + test_id="archived-excluded-by-default", + response_json=[ + { + "name": "active", + "clone_url": "https://github.com/testuser/active.git", + "ssh_url": "git@github.com:testuser/active.git", + "html_url": "https://github.com/testuser/active", + "description": "Active repo", + "language": "Python", + "topics": [], + "stargazers_count": 50, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "testuser"}, + }, + { + "name": "archived", + "clone_url": "https://github.com/testuser/archived.git", + "ssh_url": "git@github.com:testuser/archived.git", + "html_url": "https://github.com/testuser/archived", + "description": "Archived repo", + "language": "Python", + "topics": [], + "stargazers_count": 10, + "fork": False, + "archived": True, + "default_branch": "main", + "owner": {"login": "testuser"}, + }, + ], + options_kwargs={ + "mode": ImportMode.USER, + "target": "testuser", + "include_archived": False, + }, + expected_count=1, + expected_names=["active"], + ), + GitHubUserFixture( + test_id="language-filter-applied", + response_json=[ + { + "name": "python-repo", + "clone_url": "https://github.com/testuser/python-repo.git", + "ssh_url": "git@github.com:testuser/python-repo.git", + "html_url": "https://github.com/testuser/python-repo", + "description": "Python repo", + "language": "Python", + "topics": [], + "stargazers_count": 50, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "testuser"}, + }, + { + "name": "js-repo", + "clone_url": "https://github.com/testuser/js-repo.git", + "ssh_url": "git@github.com:testuser/js-repo.git", + "html_url": "https://github.com/testuser/js-repo", + "description": "JavaScript repo", + "language": "JavaScript", + "topics": [], + "stargazers_count": 30, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "testuser"}, + }, + ], + options_kwargs={ + "mode": ImportMode.USER, + "target": "testuser", + "language": "Python", + }, + expected_count=1, + expected_names=["python-repo"], + ), + GitHubUserFixture( + test_id="empty-response-returns-empty-list", + response_json=[], + options_kwargs={"mode": ImportMode.USER, "target": "emptyuser"}, + expected_count=0, + expected_names=[], + ), +] + + +@pytest.mark.parametrize( + list(GitHubUserFixture._fields), + GITHUB_USER_FIXTURES, + ids=[f.test_id for f in GITHUB_USER_FIXTURES], +) +def test_github_fetch_user( + test_id: str, + response_json: list[dict[str, t.Any]], + options_kwargs: dict[str, t.Any], + expected_count: int, + expected_names: list[str], + mock_urlopen: t.Callable[..., None], +) -> None: + """Test GitHub user repository fetching with various scenarios.""" + mock_urlopen( + [ + ( + json.dumps(response_json).encode(), + {"x-ratelimit-remaining": "100", "x-ratelimit-limit": "60"}, + 200, + ) + ] + ) + importer = GitHubImporter() + options = ImportOptions(**options_kwargs) + repos = list(importer.fetch_repos(options)) + assert len(repos) == 
expected_count + assert [r.name for r in repos] == expected_names + + +def test_github_fetch_org( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test GitHub org repository fetching.""" + response_json = [ + { + "name": "org-repo", + "clone_url": "https://github.com/testorg/org-repo.git", + "ssh_url": "git@github.com:testorg/org-repo.git", + "html_url": "https://github.com/testorg/org-repo", + "description": "Org repo", + "language": "Python", + "topics": [], + "stargazers_count": 200, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "testorg"}, + } + ] + mock_urlopen( + [ + ( + json.dumps(response_json).encode(), + {"x-ratelimit-remaining": "100", "x-ratelimit-limit": "60"}, + 200, + ) + ] + ) + importer = GitHubImporter() + options = ImportOptions(mode=ImportMode.ORG, target="testorg") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].name == "org-repo" + assert repos[0].owner == "testorg" + + +def test_github_fetch_search( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test GitHub search repository fetching.""" + search_response = { + "total_count": 1, + "items": [ + { + "name": "search-result", + "clone_url": "https://github.com/user/search-result.git", + "ssh_url": "git@github.com:user/search-result.git", + "html_url": "https://github.com/user/search-result", + "description": "Found by search", + "language": "Python", + "topics": ["machine-learning"], + "stargazers_count": 1000, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "user"}, + } + ], + } + mock_urlopen( + [ + ( + json.dumps(search_response).encode(), + {"x-ratelimit-remaining": "100", "x-ratelimit-limit": "60"}, + 200, + ) + ] + ) + importer = GitHubImporter() + options = ImportOptions(mode=ImportMode.SEARCH, target="machine learning") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].name == "search-result" + assert repos[0].stars == 1000 + + +def test_github_importer_is_authenticated_without_token( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test is_authenticated returns False without token.""" + # Clear environment variables that could provide a token + monkeypatch.delenv("GITHUB_TOKEN", raising=False) + monkeypatch.delenv("GH_TOKEN", raising=False) + importer = GitHubImporter(token=None) + assert importer.is_authenticated is False + + +def test_github_importer_is_authenticated_with_token() -> None: + """Test is_authenticated returns True with token.""" + importer = GitHubImporter(token="test-token") + assert importer.is_authenticated is True + + +def test_github_importer_service_name() -> None: + """Test service_name property.""" + importer = GitHubImporter() + assert importer.service_name == "GitHub" + + +def test_github_enterprise_url_normalized() -> None: + """Test that GitHub Enterprise URLs get /api/v3 appended.""" + importer = GitHubImporter(token="fake", base_url="https://ghe.example.com") + assert importer._client.base_url == "https://ghe.example.com/api/v3" + + +def test_github_enterprise_url_already_has_api() -> None: + """Test that GHE URLs with /api/v3 are not double-suffixed.""" + importer = GitHubImporter(token="fake", base_url="https://ghe.example.com/api/v3") + assert importer._client.base_url == "https://ghe.example.com/api/v3" + + +def test_github_public_url_not_modified() -> None: + """Test that default api.github.com URL is not modified.""" + importer = GitHubImporter(token="fake") + assert importer._client.base_url == 
"https://api.github.com" + + +def test_github_handles_null_topics( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test GitHub handles null topics in API response. + + GitHub API can return "topics": null instead of an empty array. + dict.get("topics", []) returns None when the key exists with null value, + causing tuple(None) to crash with TypeError. + """ + response_json = [ + { + "name": "null-topics-repo", + "clone_url": "https://github.com/user/null-topics-repo.git", + "ssh_url": "git@github.com:user/null-topics-repo.git", + "html_url": "https://github.com/user/null-topics-repo", + "description": "Repo with null topics", + "language": "Python", + "topics": None, + "stargazers_count": 10, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "user"}, + } + ] + mock_urlopen( + [ + ( + json.dumps(response_json).encode(), + {"x-ratelimit-remaining": "100", "x-ratelimit-limit": "60"}, + 200, + ) + ] + ) + importer = GitHubImporter() + options = ImportOptions(mode=ImportMode.USER, target="user") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].topics == () + + +def test_github_limit_respected( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test that limit option is respected.""" + # Create response with 5 repos + response_json = [ + { + "name": f"repo{i}", + "clone_url": f"https://github.com/user/repo{i}.git", + "ssh_url": f"git@github.com:user/repo{i}.git", + "html_url": f"https://github.com/user/repo{i}", + "description": f"Repo {i}", + "language": "Python", + "topics": [], + "stargazers_count": 10, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "user"}, + } + for i in range(5) + ] + mock_urlopen( + [ + ( + json.dumps(response_json).encode(), + {"x-ratelimit-remaining": "100", "x-ratelimit-limit": "60"}, + 200, + ) + ] + ) + importer = GitHubImporter() + options = ImportOptions(mode=ImportMode.USER, target="user", limit=3) + repos = list(importer.fetch_repos(options)) + assert len(repos) == 3 + + +class LogRateLimitFixture(t.NamedTuple): + """Fixture for _log_rate_limit test cases.""" + + test_id: str + headers: dict[str, str] + expected_log_level: str | None + expected_message_fragment: str | None + + +LOG_RATE_LIMIT_FIXTURES: list[LogRateLimitFixture] = [ + LogRateLimitFixture( + test_id="valid-headers-low-remaining", + headers={"x-ratelimit-remaining": "5", "x-ratelimit-limit": "60"}, + expected_log_level="warning", + expected_message_fragment="rate limit low", + ), + LogRateLimitFixture( + test_id="valid-headers-sufficient-remaining", + headers={"x-ratelimit-remaining": "50", "x-ratelimit-limit": "60"}, + expected_log_level="debug", + expected_message_fragment="rate limit", + ), + LogRateLimitFixture( + test_id="non-numeric-remaining-header", + headers={"x-ratelimit-remaining": "unlimited", "x-ratelimit-limit": "60"}, + expected_log_level=None, + expected_message_fragment=None, + ), + LogRateLimitFixture( + test_id="missing-remaining-header", + headers={"x-ratelimit-limit": "60"}, + expected_log_level=None, + expected_message_fragment=None, + ), + LogRateLimitFixture( + test_id="missing-both-headers", + headers={}, + expected_log_level=None, + expected_message_fragment=None, + ), +] + + +@pytest.mark.parametrize( + list(LogRateLimitFixture._fields), + LOG_RATE_LIMIT_FIXTURES, + ids=[f.test_id for f in LOG_RATE_LIMIT_FIXTURES], +) +def test_log_rate_limit( + test_id: str, + headers: dict[str, str], + expected_log_level: str | None, + 
expected_message_fragment: str | None, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _log_rate_limit handles various header scenarios.""" + import logging + + caplog.set_level(logging.DEBUG) + importer = GitHubImporter() + # Should not raise on any input + importer._log_rate_limit(headers) + + if expected_message_fragment is not None: + assert expected_message_fragment in caplog.text.lower() + else: + # No rate limit message should appear + assert "rate limit" not in caplog.text.lower() + + +def test_github_parse_repo_missing_keys( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test GitHub _parse_repo handles incomplete API responses gracefully. + + GitHub API responses may lack keys like 'name', 'clone_url', or 'html_url' + in edge cases (partial responses, API changes). Using .get() with defaults + prevents KeyError crashes. + """ + response_json = [ + { + # Missing: name, clone_url, html_url, ssh_url + "description": "Incomplete repo data", + "language": "Python", + "topics": ["test"], + "stargazers_count": 5, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "user"}, + } + ] + mock_urlopen( + [ + ( + json.dumps(response_json).encode(), + {"x-ratelimit-remaining": "100", "x-ratelimit-limit": "60"}, + 200, + ) + ] + ) + importer = GitHubImporter() + options = ImportOptions(mode=ImportMode.USER, target="user") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].name == "" + assert repos[0].clone_url == "" + assert repos[0].html_url == "" + assert repos[0].ssh_url == "" + + +def test_github_parse_repo_null_owner( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test GitHub _parse_repo handles null owner without crashing. + + JSON APIs may return ``"owner": null`` for deleted/suspended accounts. + The importer must not raise AttributeError when this happens. + """ + response_json = [ + { + "name": "repo", + "clone_url": "https://github.com/ghost/repo.git", + "ssh_url": "git@github.com:ghost/repo.git", + "html_url": "https://github.com/ghost/repo", + "description": "Orphaned repo", + "language": "Python", + "topics": [], + "stargazers_count": 1, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": None, + } + ] + mock_urlopen( + [ + ( + json.dumps(response_json).encode(), + {"x-ratelimit-remaining": "100", "x-ratelimit-limit": "60"}, + 200, + ) + ] + ) + importer = GitHubImporter() + options = ImportOptions(mode=ImportMode.USER, target="ghost") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].owner == "" + + +def test_github_search_caps_at_1000_results( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test GitHub search stops paginating at 1000 results. + + GitHub's search API returns HTTP 422 beyond offset 1000. + The importer must stop before requesting page 11 (with per_page=100). 
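+
+    Arithmetic: with per_page=100, pages 1 through 10 cover results 1-1000;
+    page 11 would request results 1001-1100, which GitHub rejects with 422.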
+ """ + from tests._internal.remotes.conftest import MockHTTPResponse + + call_count = 0 + + def make_search_page() -> dict[str, t.Any]: + """Create a full page of 100 search results.""" + return { + "total_count": 5000, + "items": [ + { + "name": f"repo-{i}", + "clone_url": f"https://github.com/user/repo-{i}.git", + "ssh_url": f"git@github.com:user/repo-{i}.git", + "html_url": f"https://github.com/user/repo-{i}", + "description": f"Repo {i}", + "language": "Python", + "topics": [], + "stargazers_count": 100, + "fork": False, + "archived": False, + "default_branch": "main", + "owner": {"login": "user"}, + } + for i in range(100) + ], + } + + def urlopen_side_effect( + request: t.Any, + timeout: int | None = None, + ) -> MockHTTPResponse: + nonlocal call_count + call_count += 1 + page_data = make_search_page() + return MockHTTPResponse( + json.dumps(page_data).encode(), + {"x-ratelimit-remaining": "100", "x-ratelimit-limit": "60"}, + 200, + ) + + # Mock urlopen: track how many API requests are made + monkeypatch.setattr("urllib.request.urlopen", urlopen_side_effect) + + importer = GitHubImporter() + options = ImportOptions( + mode=ImportMode.SEARCH, + target="test", + limit=5000, + ) + repos = list(importer.fetch_repos(options)) + + # Should have fetched at most 10 pages (1000 results) + assert call_count <= 10, f"Expected at most 10 API calls, got {call_count}" + assert len(repos) <= 1000 diff --git a/tests/_internal/remotes/test_gitlab.py b/tests/_internal/remotes/test_gitlab.py new file mode 100644 index 000000000..3a65e0bc0 --- /dev/null +++ b/tests/_internal/remotes/test_gitlab.py @@ -0,0 +1,597 @@ +"""Tests for vcspull._internal.remotes.gitlab module.""" + +from __future__ import annotations + +import json +import typing as t +import urllib.request + +import pytest + +from tests._internal.remotes.conftest import MockHTTPResponse +from vcspull._internal.remotes.base import ImportMode, ImportOptions +from vcspull._internal.remotes.gitlab import GitLabImporter + + +def test_gitlab_fetch_user( + mock_urlopen: t.Callable[..., None], + gitlab_user_projects_response: bytes, +) -> None: + """Test GitLab user project fetching.""" + mock_urlopen([(gitlab_user_projects_response, {}, 200)]) + importer = GitLabImporter() + options = ImportOptions(mode=ImportMode.USER, target="testuser") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].name == "project1" + assert repos[0].owner == "testuser" + + +def test_gitlab_fetch_group( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test GitLab group (org) project fetching.""" + response_json = [ + { + "path": "group-project", + "name": "Group Project", + "http_url_to_repo": "https://gitlab.com/testgroup/group-project.git", + "ssh_url_to_repo": "git@gitlab.com:testgroup/group-project.git", + "web_url": "https://gitlab.com/testgroup/group-project", + "description": "Group project", + "topics": [], + "star_count": 50, + "archived": False, + "default_branch": "main", + "namespace": {"path": "testgroup"}, + } + ] + mock_urlopen([(json.dumps(response_json).encode(), {}, 200)]) + importer = GitLabImporter() + options = ImportOptions(mode=ImportMode.ORG, target="testgroup") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].name == "group-project" + + +def test_gitlab_owner_uses_namespace_full_path( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test GitLab owner preserves full namespace path when available.""" + response_json = [ + { + "path": "group-project", + "name": 
"Group Project", + "path_with_namespace": ( + "vcs-python-group-test/vcs-python-subgroup-test/group-project" + ), + "http_url_to_repo": ( + "https://gitlab.com/vcs-python-group-test/" + "vcs-python-subgroup-test/group-project.git" + ), + "ssh_url_to_repo": ( + "git@gitlab.com:vcs-python-group-test/" + "vcs-python-subgroup-test/group-project.git" + ), + "web_url": ( + "https://gitlab.com/vcs-python-group-test/" + "vcs-python-subgroup-test/group-project" + ), + "description": "Group project", + "topics": [], + "star_count": 50, + "archived": False, + "default_branch": "main", + "namespace": { + "path": "vcs-python-subgroup-test", + "full_path": "vcs-python-group-test/vcs-python-subgroup-test", + }, + } + ] + mock_urlopen([(json.dumps(response_json).encode(), {}, 200)]) + importer = GitLabImporter() + options = ImportOptions(mode=ImportMode.ORG, target="vcs-python-group-test") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].owner == "vcs-python-group-test/vcs-python-subgroup-test" + + +def test_gitlab_owner_falls_back_to_path_with_namespace( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test owner derivation uses path_with_namespace when full_path is missing.""" + response_json = [ + { + "path": "group-project", + "name": "Group Project", + "path_with_namespace": ( + "vcs-python-group-test/vcs-python-subgroup-test/group-project" + ), + "http_url_to_repo": ( + "https://gitlab.com/vcs-python-group-test/" + "vcs-python-subgroup-test/group-project.git" + ), + "ssh_url_to_repo": ( + "git@gitlab.com:vcs-python-group-test/" + "vcs-python-subgroup-test/group-project.git" + ), + "web_url": ( + "https://gitlab.com/vcs-python-group-test/" + "vcs-python-subgroup-test/group-project" + ), + "description": "Group project", + "topics": [], + "star_count": 50, + "archived": False, + "default_branch": "main", + "namespace": { + "path": "vcs-python-subgroup-test", + }, + } + ] + mock_urlopen([(json.dumps(response_json).encode(), {}, 200)]) + importer = GitLabImporter() + options = ImportOptions(mode=ImportMode.ORG, target="vcs-python-group-test") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].owner == "vcs-python-group-test/vcs-python-subgroup-test" + + +def test_gitlab_search_requires_auth( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test GitLab search raises error without authentication.""" + from vcspull._internal.remotes.base import AuthenticationError + + # Clear environment variables that could provide a token + monkeypatch.delenv("GITLAB_TOKEN", raising=False) + monkeypatch.delenv("GL_TOKEN", raising=False) + importer = GitLabImporter(token=None) + options = ImportOptions(mode=ImportMode.SEARCH, target="test") + with pytest.raises(AuthenticationError, match="requires authentication"): + list(importer.fetch_repos(options)) + + +def test_gitlab_search_with_auth( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test GitLab search works with authentication.""" + search_response = [ + { + "path": "search-result", + "name": "Search Result", + "http_url_to_repo": "https://gitlab.com/user/search-result.git", + "ssh_url_to_repo": "git@gitlab.com:user/search-result.git", + "web_url": "https://gitlab.com/user/search-result", + "description": "Found", + "topics": [], + "star_count": 100, + "archived": False, + "default_branch": "main", + "namespace": {"path": "user"}, + } + ] + mock_urlopen([(json.dumps(search_response).encode(), {}, 200)]) + importer = GitLabImporter(token="test-token") + options = 
ImportOptions(mode=ImportMode.SEARCH, target="test") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].name == "search-result" + + +def test_gitlab_importer_is_authenticated_without_token( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test is_authenticated returns False without token.""" + # Clear environment variables that could provide a token + monkeypatch.delenv("GITLAB_TOKEN", raising=False) + monkeypatch.delenv("GL_TOKEN", raising=False) + importer = GitLabImporter(token=None) + assert importer.is_authenticated is False + + +def test_gitlab_importer_is_authenticated_with_token() -> None: + """Test is_authenticated returns True with token.""" + importer = GitLabImporter(token="test-token") + assert importer.is_authenticated is True + + +def test_gitlab_importer_service_name() -> None: + """Test service_name property.""" + importer = GitLabImporter() + assert importer.service_name == "GitLab" + + +def test_gitlab_handles_forked_project( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test GitLab correctly identifies forked projects.""" + response_json = [ + { + "path": "forked-project", + "name": "Forked Project", + "http_url_to_repo": "https://gitlab.com/user/forked-project.git", + "ssh_url_to_repo": "git@gitlab.com:user/forked-project.git", + "web_url": "https://gitlab.com/user/forked-project", + "description": "A fork", + "topics": [], + "star_count": 5, + "archived": False, + "default_branch": "main", + "namespace": {"path": "user"}, + "forked_from_project": {"id": 123}, + } + ] + mock_urlopen([(json.dumps(response_json).encode(), {}, 200)]) + importer = GitLabImporter() + options = ImportOptions(mode=ImportMode.USER, target="user", include_forks=False) + repos = list(importer.fetch_repos(options)) + # Fork should be filtered out + assert len(repos) == 0 + + +def test_gitlab_uses_path_not_name( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test GitLab uses 'path' for filesystem-safe names, not 'name'.""" + response_json = [ + { + "path": "my-project", + "name": "My Project With Spaces", # This should NOT be used + "http_url_to_repo": "https://gitlab.com/user/my-project.git", + "ssh_url_to_repo": "git@gitlab.com:user/my-project.git", + "web_url": "https://gitlab.com/user/my-project", + "description": "Project with spaces in name", + "topics": [], + "star_count": 10, + "archived": False, + "default_branch": "main", + "namespace": {"path": "user"}, + } + ] + mock_urlopen([(json.dumps(response_json).encode(), {}, 200)]) + importer = GitLabImporter() + options = ImportOptions(mode=ImportMode.USER, target="user") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].name == "my-project" # Uses 'path', not 'name' + + +def test_gitlab_subgroup_url_encoding( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test that GitLab subgroups are URL-encoded correctly. + + Subgroups use slash notation (e.g., parent/child) which must be + URL-encoded as %2F in API requests. 
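+
+    Illustrative encoding via the stdlib (not necessarily the importer's
+    exact call)::
+
+        import urllib.parse
+        urllib.parse.quote("parent/child", safe="")  # -> 'parent%2Fchild'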
+ """ + captured_urls: list[str] = [] + + response_json = [ + { + "path": "subgroup-project", + "name": "Subgroup Project", + "http_url_to_repo": "https://gitlab.com/parent/child/subgroup-project.git", + "ssh_url_to_repo": "git@gitlab.com:parent/child/subgroup-project.git", + "web_url": "https://gitlab.com/parent/child/subgroup-project", + "description": "Project in subgroup", + "topics": [], + "star_count": 10, + "archived": False, + "default_branch": "main", + "namespace": {"path": "child", "full_path": "parent/child"}, + } + ] + + def urlopen_capture( + request: urllib.request.Request, + timeout: int | None = None, + ) -> MockHTTPResponse: + captured_urls.append(request.full_url) + return MockHTTPResponse(json.dumps(response_json).encode()) + + # Mock urlopen: capture request URLs to verify subgroup path encoding + monkeypatch.setattr("urllib.request.urlopen", urlopen_capture) + + importer = GitLabImporter() + options = ImportOptions(mode=ImportMode.ORG, target="parent/child") + repos = list(importer.fetch_repos(options)) + + # Verify the URL was encoded correctly + assert len(captured_urls) == 1 + assert "parent%2Fchild" in captured_urls[0], ( + f"Expected URL-encoded subgroup path 'parent%2Fchild', got: {captured_urls[0]}" + ) + assert "/groups/parent%2Fchild/projects" in captured_urls[0] + + # Verify repos were returned + assert len(repos) == 1 + assert repos[0].name == "subgroup-project" + assert repos[0].owner == "parent/child" + + +def test_gitlab_deeply_nested_subgroup( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test that deeply nested subgroups (multiple slashes) work correctly.""" + captured_urls: list[str] = [] + + response_json = [ + { + "path": "deep-project", + "name": "Deep Project", + "http_url_to_repo": "https://gitlab.com/a/b/c/d/deep-project.git", + "ssh_url_to_repo": "git@gitlab.com:a/b/c/d/deep-project.git", + "web_url": "https://gitlab.com/a/b/c/d/deep-project", + "description": "Deeply nested project", + "topics": [], + "star_count": 5, + "archived": False, + "default_branch": "main", + "namespace": {"path": "d", "full_path": "a/b/c/d"}, + } + ] + + def urlopen_capture( + request: urllib.request.Request, + timeout: int | None = None, + ) -> MockHTTPResponse: + captured_urls.append(request.full_url) + return MockHTTPResponse(json.dumps(response_json).encode()) + + # Mock urlopen: capture request URLs to verify deep nesting path encoding + monkeypatch.setattr("urllib.request.urlopen", urlopen_capture) + + importer = GitLabImporter() + # Test with 4 levels of nesting: a/b/c/d + options = ImportOptions(mode=ImportMode.ORG, target="a/b/c/d") + repos = list(importer.fetch_repos(options)) + + # Verify URL encoding - each slash should become %2F + assert len(captured_urls) == 1 + assert "a%2Fb%2Fc%2Fd" in captured_urls[0], ( + f"Expected URL-encoded path 'a%2Fb%2Fc%2Fd', got: {captured_urls[0]}" + ) + + assert len(repos) == 1 + assert repos[0].name == "deep-project" + assert repos[0].owner == "a/b/c/d" + + +def test_gitlab_handles_null_topics( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test GitLab handles null topics in API response. + + GitLab API can return "topics": null instead of an empty array. + dict.get("topics", []) returns None when the key exists with null value, + causing tuple(None) to crash with TypeError. 
+ """ + response_json = [ + { + "path": "null-topics-project", + "name": "Null Topics Project", + "http_url_to_repo": "https://gitlab.com/user/null-topics-project.git", + "ssh_url_to_repo": "git@gitlab.com:user/null-topics-project.git", + "web_url": "https://gitlab.com/user/null-topics-project", + "description": "Project with null topics", + "topics": None, + "star_count": 10, + "archived": False, + "default_branch": "main", + "namespace": {"path": "user"}, + } + ] + mock_urlopen([(json.dumps(response_json).encode(), {}, 200)]) + importer = GitLabImporter() + options = ImportOptions(mode=ImportMode.USER, target="user") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].topics == () + + +def test_gitlab_archived_param_omitted_when_including( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test that archived param is omitted when include_archived=True. + + GitLab API: archived=true returns ONLY archived projects. + Omitting the param returns all projects (archived + non-archived). + """ + captured_urls: list[str] = [] + + response_json = [ + { + "path": "project1", + "name": "Project 1", + "http_url_to_repo": "https://gitlab.com/user/project1.git", + "ssh_url_to_repo": "git@gitlab.com:user/project1.git", + "web_url": "https://gitlab.com/user/project1", + "description": "Active project", + "topics": [], + "star_count": 10, + "archived": False, + "default_branch": "main", + "namespace": {"path": "user"}, + } + ] + + def urlopen_capture( + request: urllib.request.Request, + timeout: int | None = None, + ) -> MockHTTPResponse: + captured_urls.append(request.full_url) + return MockHTTPResponse(json.dumps(response_json).encode()) + + # Mock urlopen: capture request URLs to verify archived param is omitted + monkeypatch.setattr("urllib.request.urlopen", urlopen_capture) + + importer = GitLabImporter() + options = ImportOptions(mode=ImportMode.USER, target="user", include_archived=True) + list(importer.fetch_repos(options)) + + assert len(captured_urls) == 1 + # archived param should NOT be in the URL when include_archived=True + assert "archived=" not in captured_urls[0], ( + f"Expected no 'archived' param in URL, got: {captured_urls[0]}" + ) + + +def test_gitlab_archived_param_false_when_excluding( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test that archived=false is set when include_archived=False.""" + captured_urls: list[str] = [] + + response_json = [ + { + "path": "project1", + "name": "Project 1", + "http_url_to_repo": "https://gitlab.com/user/project1.git", + "ssh_url_to_repo": "git@gitlab.com:user/project1.git", + "web_url": "https://gitlab.com/user/project1", + "description": "Active project", + "topics": [], + "star_count": 10, + "archived": False, + "default_branch": "main", + "namespace": {"path": "user"}, + } + ] + + def urlopen_capture( + request: urllib.request.Request, + timeout: int | None = None, + ) -> MockHTTPResponse: + captured_urls.append(request.full_url) + return MockHTTPResponse(json.dumps(response_json).encode()) + + # Mock urlopen: capture request URLs to verify archived=false is included + monkeypatch.setattr("urllib.request.urlopen", urlopen_capture) + + importer = GitLabImporter() + options = ImportOptions(mode=ImportMode.USER, target="user", include_archived=False) + list(importer.fetch_repos(options)) + + assert len(captured_urls) == 1 + assert "archived=false" in captured_urls[0], ( + f"Expected 'archived=false' in URL, got: {captured_urls[0]}" + ) + + +def test_gitlab_search_archived_param_false_when_excluding( 
+ monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test that _fetch_search includes archived=false when excluding archived.""" + captured_urls: list[str] = [] + + search_response = [ + { + "path": "search-result", + "name": "Search Result", + "http_url_to_repo": "https://gitlab.com/user/search-result.git", + "ssh_url_to_repo": "git@gitlab.com:user/search-result.git", + "web_url": "https://gitlab.com/user/search-result", + "description": "Found", + "topics": [], + "star_count": 100, + "archived": False, + "default_branch": "main", + "namespace": {"path": "user"}, + } + ] + + def urlopen_capture( + request: urllib.request.Request, + timeout: int | None = None, + ) -> MockHTTPResponse: + captured_urls.append(request.full_url) + return MockHTTPResponse(json.dumps(search_response).encode()) + + # Mock urlopen: capture request URLs to verify search archived=false param + monkeypatch.setattr("urllib.request.urlopen", urlopen_capture) + + importer = GitLabImporter(token="test-token") + options = ImportOptions( + mode=ImportMode.SEARCH, target="test", include_archived=False + ) + list(importer.fetch_repos(options)) + + assert len(captured_urls) == 1 + assert "archived=false" in captured_urls[0], ( + f"Expected 'archived=false' in search URL, got: {captured_urls[0]}" + ) + + +def test_gitlab_search_archived_param_omitted_when_including( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test that _fetch_search omits archived param when including archived.""" + captured_urls: list[str] = [] + + search_response = [ + { + "path": "search-result", + "name": "Search Result", + "http_url_to_repo": "https://gitlab.com/user/search-result.git", + "ssh_url_to_repo": "git@gitlab.com:user/search-result.git", + "web_url": "https://gitlab.com/user/search-result", + "description": "Found", + "topics": [], + "star_count": 100, + "archived": False, + "default_branch": "main", + "namespace": {"path": "user"}, + } + ] + + def urlopen_capture( + request: urllib.request.Request, + timeout: int | None = None, + ) -> MockHTTPResponse: + captured_urls.append(request.full_url) + return MockHTTPResponse(json.dumps(search_response).encode()) + + # Mock urlopen: capture request URLs to verify archived param is omitted in search + monkeypatch.setattr("urllib.request.urlopen", urlopen_capture) + + importer = GitLabImporter(token="test-token") + options = ImportOptions( + mode=ImportMode.SEARCH, target="test", include_archived=True + ) + list(importer.fetch_repos(options)) + + assert len(captured_urls) == 1 + assert "archived=" not in captured_urls[0], ( + f"Expected no 'archived' param in search URL, got: {captured_urls[0]}" + ) + + +def test_gitlab_parse_repo_null_namespace( + mock_urlopen: t.Callable[..., None], +) -> None: + """Test GitLab _parse_repo handles null namespace without crashing. + + Self-hosted GitLab instances may return ``"namespace": null`` for + system-level projects. The importer must not raise AttributeError. 
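+
+    Illustrative failure and guard, using hypothetical data::
+
+        project = {"namespace": None}
+        project.get("namespace", {}).get("path")  # AttributeError
+        (project.get("namespace") or {}).get("path", "")  # -> "" (safe)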
+ """ + response_json = [ + { + "path": "my-project", + "name": "my-project", + "http_url_to_repo": "https://gitlab.example.com/my-project.git", + "ssh_url_to_repo": "git@gitlab.example.com:my-project.git", + "web_url": "https://gitlab.example.com/my-project", + "description": "Orphaned project", + "star_count": 0, + "namespace": None, + "path_with_namespace": "my-project", + } + ] + mock_urlopen([(json.dumps(response_json).encode(), {}, 200)]) + importer = GitLabImporter() + options = ImportOptions(mode=ImportMode.USER, target="testuser") + repos = list(importer.fetch_repos(options)) + assert len(repos) == 1 + assert repos[0].name == "my-project" + assert repos[0].owner == "" diff --git a/tests/_internal/remotes/test_pagination_duplicates.py b/tests/_internal/remotes/test_pagination_duplicates.py new file mode 100644 index 000000000..01164cbea --- /dev/null +++ b/tests/_internal/remotes/test_pagination_duplicates.py @@ -0,0 +1,329 @@ +"""Regression tests for pagination duplicate bug. + +The pagination duplicate bug occurs when client-side filtering (excluding forks/archived +repos) causes the per_page/limit parameter to vary between API pages. This causes offset +misalignment because: + +1. Page 1: per_page=10, returns items 0-9 +2. Client-side filtering removes some items, count becomes less than per_page +3. Page 2: per_page=5 (recalculated), API interprets as items 5-9 instead of 10-14 +4. Result: Items 5-9 appear twice (duplicates) + +The fix is to always use a consistent per_page value across all pagination requests. +""" + +from __future__ import annotations + +import json +import typing as t +import urllib.parse +import urllib.request + +import pytest + +from tests._internal.remotes.conftest import MockHTTPResponse +from vcspull._internal.remotes.base import ImportMode, ImportOptions +from vcspull._internal.remotes.gitea import ( + DEFAULT_PER_PAGE as GITEA_DEFAULT_PER_PAGE, + GiteaImporter, +) +from vcspull._internal.remotes.github import ( + DEFAULT_PER_PAGE as GITHUB_DEFAULT_PER_PAGE, + GitHubImporter, +) +from vcspull._internal.remotes.gitlab import ( + DEFAULT_PER_PAGE as GITLAB_DEFAULT_PER_PAGE, + GitLabImporter, +) + + +def _make_github_repo( + name: str, + *, + fork: bool = False, + archived: bool = False, +) -> dict[str, t.Any]: + """Create a GitHub API repo response object.""" + return { + "name": name, + "clone_url": f"https://github.com/testuser/{name}.git", + "ssh_url": f"git@github.com:testuser/{name}.git", + "html_url": f"https://github.com/testuser/{name}", + "description": f"Repo {name}", + "language": "Python", + "topics": [], + "stargazers_count": 10, + "fork": fork, + "archived": archived, + "default_branch": "main", + "owner": {"login": "testuser"}, + } + + +def _make_gitea_repo( + name: str, + *, + fork: bool = False, + archived: bool = False, +) -> dict[str, t.Any]: + """Create a Gitea API repo response object.""" + return { + "name": name, + "clone_url": f"https://codeberg.org/testuser/{name}.git", + "ssh_url": f"git@codeberg.org:testuser/{name}.git", + "html_url": f"https://codeberg.org/testuser/{name}", + "description": f"Repo {name}", + "language": "Python", + "topics": [], + "stars_count": 10, + "fork": fork, + "archived": archived, + "default_branch": "main", + "owner": {"login": "testuser"}, + } + + +def _make_gitlab_repo( + name: str, + *, + fork: bool = False, + archived: bool = False, +) -> dict[str, t.Any]: + """Create a GitLab API project response object.""" + return { + "path": name, + "name": name, + "http_url_to_repo": 
f"https://gitlab.com/testuser/{name}.git", + "ssh_url_to_repo": f"git@gitlab.com:testuser/{name}.git", + "web_url": f"https://gitlab.com/testuser/{name}", + "description": f"Project {name}", + "topics": [], + "star_count": 10, + "forked_from_project": {"id": 123} if fork else None, + "archived": archived, + "default_branch": "main", + "namespace": {"path": "testuser"}, + } + + +def test_github_pagination_consistent_per_page( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test that GitHub pagination uses consistent per_page across all requests. + + When client-side filtering removes items, the per_page parameter should NOT + be recalculated based on remaining count - it should stay constant to maintain + proper pagination offsets. + """ + captured_requests: list[urllib.request.Request] = [] + + # Create page 1 with exactly DEFAULT_PER_PAGE items to force pagination. + # Half regular repos, half forks - forks will be filtered out client-side. + page1_repos = [ + _make_github_repo(f"repo{i}") for i in range(GITHUB_DEFAULT_PER_PAGE // 2) + ] + page1_repos.extend( + _make_github_repo(f"fork{i}", fork=True) + for i in range(GITHUB_DEFAULT_PER_PAGE // 2) + ) + + # Page 2 has more repos + page2_repos = [ + _make_github_repo(f"repo{GITHUB_DEFAULT_PER_PAGE // 2 + i}") for i in range(10) + ] + + responses = [ + ( + json.dumps(page1_repos).encode(), + {"x-ratelimit-remaining": "100", "x-ratelimit-limit": "60"}, + 200, + ), + ( + json.dumps(page2_repos).encode(), + {"x-ratelimit-remaining": "99", "x-ratelimit-limit": "60"}, + 200, + ), + ] + call_count = 0 + + def urlopen_capture( + request: urllib.request.Request, + timeout: int | None = None, + ) -> MockHTTPResponse: + nonlocal call_count + captured_requests.append(request) + body, headers, status = responses[call_count % len(responses)] + call_count += 1 + return MockHTTPResponse(body, headers, status) + + # Mock urlopen: capture paginated requests to verify consistent per_page + monkeypatch.setattr("urllib.request.urlopen", urlopen_capture) + + importer = GitHubImporter() + # Request more repos than page 1 provides after filtering (50 regular repos) + # This forces pagination to continue to page 2 + options = ImportOptions( + mode=ImportMode.USER, + target="testuser", + limit=60, # More than 50 regular repos in page 1 + include_forks=False, # Filter out forks client-side + ) + list(importer.fetch_repos(options)) + + # Extract per_page values from all requests + per_page_values = [] + for req in captured_requests: + parsed = urllib.parse.urlparse(req.full_url) + params = urllib.parse.parse_qs(parsed.query) + if "per_page" in params: + per_page_values.append(int(params["per_page"][0])) + + # All per_page values should be identical (consistent pagination) + assert len(per_page_values) >= 2, "Expected at least 2 API requests" + assert all(v == GITHUB_DEFAULT_PER_PAGE for v in per_page_values), ( + f"Expected all per_page values to be {GITHUB_DEFAULT_PER_PAGE}, " + f"got: {per_page_values}" + ) + + +def test_gitea_pagination_consistent_limit( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test that Gitea pagination uses consistent limit across all requests. + + When client-side filtering removes items, the limit parameter should NOT + be recalculated based on remaining count - it should stay constant to maintain + proper pagination offsets. + """ + captured_requests: list[urllib.request.Request] = [] + + # Create page 1 with exactly DEFAULT_PER_PAGE items to force pagination. 
+ # Half regular repos, half forks - forks will be filtered out client-side. + page1_repos = [ + _make_gitea_repo(f"repo{i}") for i in range(GITEA_DEFAULT_PER_PAGE // 2) + ] + page1_repos.extend( + _make_gitea_repo(f"fork{i}", fork=True) + for i in range(GITEA_DEFAULT_PER_PAGE // 2) + ) + + # Page 2 has more repos + page2_repos = [ + _make_gitea_repo(f"repo{GITEA_DEFAULT_PER_PAGE // 2 + i}") for i in range(10) + ] + + responses: list[tuple[bytes, dict[str, str], int]] = [ + (json.dumps(page1_repos).encode(), {}, 200), + (json.dumps(page2_repos).encode(), {}, 200), + ] + call_count = 0 + + def urlopen_capture( + request: urllib.request.Request, + timeout: int | None = None, + ) -> MockHTTPResponse: + nonlocal call_count + captured_requests.append(request) + body, headers, status = responses[call_count % len(responses)] + call_count += 1 + return MockHTTPResponse(body, headers, status) + + # Mock urlopen: capture paginated requests to verify consistent limit param + monkeypatch.setattr("urllib.request.urlopen", urlopen_capture) + + importer = GiteaImporter(base_url="https://codeberg.org") + # Request more repos than page 1 provides after filtering (25 regular repos) + # This forces pagination to continue to page 2 + options = ImportOptions( + mode=ImportMode.USER, + target="testuser", + limit=35, # More than 25 regular repos in page 1 + include_forks=False, # Filter out forks client-side + ) + list(importer.fetch_repos(options)) + + # Extract limit values from all requests + limit_values = [] + for req in captured_requests: + parsed = urllib.parse.urlparse(req.full_url) + params = urllib.parse.parse_qs(parsed.query) + if "limit" in params: + limit_values.append(int(params["limit"][0])) + + # All limit values should be identical (consistent pagination) + assert len(limit_values) >= 2, "Expected at least 2 API requests" + assert all(v == GITEA_DEFAULT_PER_PAGE for v in limit_values), ( + f"Expected all limit values to be {GITEA_DEFAULT_PER_PAGE}, got: {limit_values}" + ) + + +def test_gitlab_pagination_consistent_per_page( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test that GitLab pagination uses consistent per_page across all requests. + + When client-side filtering removes items, the per_page parameter should NOT + be recalculated based on remaining count - it should stay constant to maintain + proper pagination offsets. + """ + captured_requests: list[urllib.request.Request] = [] + + # Create page 1 with exactly DEFAULT_PER_PAGE items to force pagination. + # Half regular repos, half forks - forks will be filtered out client-side. 
+ page1_repos = [ + _make_gitlab_repo(f"repo{i}") for i in range(GITLAB_DEFAULT_PER_PAGE // 2) + ] + page1_repos.extend( + _make_gitlab_repo(f"fork{i}", fork=True) + for i in range(GITLAB_DEFAULT_PER_PAGE // 2) + ) + + # Page 2 has more repos + page2_repos = [ + _make_gitlab_repo(f"repo{GITLAB_DEFAULT_PER_PAGE // 2 + i}") for i in range(10) + ] + + responses: list[tuple[bytes, dict[str, str], int]] = [ + (json.dumps(page1_repos).encode(), {}, 200), + (json.dumps(page2_repos).encode(), {}, 200), + ] + call_count = 0 + + def urlopen_capture( + request: urllib.request.Request, + timeout: int | None = None, + ) -> MockHTTPResponse: + nonlocal call_count + captured_requests.append(request) + body, headers, status = responses[call_count % len(responses)] + call_count += 1 + return MockHTTPResponse(body, headers, status) + + # Mock urlopen: capture paginated requests to verify consistent per_page + monkeypatch.setattr("urllib.request.urlopen", urlopen_capture) + + importer = GitLabImporter() + # Request more repos than page 1 provides after filtering (50 regular repos) + # This forces pagination to continue to page 2 + options = ImportOptions( + mode=ImportMode.ORG, + target="testgroup", + limit=60, # More than 50 regular repos in page 1 + include_forks=False, # Filter out forks client-side + ) + list(importer.fetch_repos(options)) + + # Extract per_page values from all requests + per_page_values = [] + for req in captured_requests: + parsed = urllib.parse.urlparse(req.full_url) + params = urllib.parse.parse_qs(parsed.query) + if "per_page" in params: + per_page_values.append(int(params["per_page"][0])) + + # All per_page values should be identical (consistent pagination) + assert len(per_page_values) >= 2, "Expected at least 2 API requests" + assert all(v == GITLAB_DEFAULT_PER_PAGE for v in per_page_values), ( + f"Expected all per_page values to be {GITLAB_DEFAULT_PER_PAGE}, " + f"got: {per_page_values}" + ) diff --git a/tests/cli/test_import_repos.py b/tests/cli/test_import_repos.py new file mode 100644 index 000000000..553d5a667 --- /dev/null +++ b/tests/cli/test_import_repos.py @@ -0,0 +1,1881 @@ +"""Tests for vcspull import command.""" + +from __future__ import annotations + +import json +import logging +import pathlib +import sys +import typing as t + +import pytest + +from vcspull._internal.remotes import ( + AuthenticationError, + ConfigurationError, + ImportOptions, + NotFoundError, + RateLimitError, + RemoteRepo, + ServiceUnavailableError, +) +from vcspull.cli.import_cmd._common import ( + _resolve_config_file, + _run_import, +) +from vcspull.config import save_config_yaml, workspace_root_label + +# Get the actual _common module for monkeypatching +import_common_mod = sys.modules["vcspull.cli.import_cmd._common"] + +if t.TYPE_CHECKING: + from _pytest.monkeypatch import MonkeyPatch + + +def _make_repo( + name: str, + owner: str = "testuser", + stars: int = 10, + language: str = "Python", +) -> RemoteRepo: + """Create a RemoteRepo for testing.""" + return RemoteRepo( + name=name, + clone_url=f"https://github.com/{owner}/{name}.git", + ssh_url=f"git@github.com:{owner}/{name}.git", + html_url=f"https://github.com/{owner}/{name}", + description=f"Test repo {name}", + language=language, + topics=(), + stars=stars, + is_fork=False, + is_archived=False, + default_branch="main", + owner=owner, + ) + + +class MockImporter: + """Reusable mock importer for tests.""" + + def __init__( + self, + *, + service_name: str = "MockService", + repos: list[RemoteRepo] | None = None, + error: Exception | None 
= None, + ) -> None: + self.service_name = service_name + self._repos = repos or [] + self._error = error + + def fetch_repos( + self, + options: ImportOptions, + ) -> t.Iterator[RemoteRepo]: + """Yield mock repos or raise a mock error.""" + if self._error: + raise self._error + yield from self._repos + + +class ResolveConfigFixture(t.NamedTuple): + """Fixture for _resolve_config_file test cases.""" + + test_id: str + config_path_str: str | None + home_configs: list[str] + expected_suffix: str + + +RESOLVE_CONFIG_FIXTURES: list[ResolveConfigFixture] = [ + ResolveConfigFixture( + test_id="explicit-path-used", + config_path_str="/custom/config.yaml", + home_configs=[], + expected_suffix="config.yaml", + ), + ResolveConfigFixture( + test_id="tilde-expanded", + config_path_str="~/myconfig.yaml", + home_configs=[], + expected_suffix="myconfig.yaml", + ), + ResolveConfigFixture( + test_id="home-config-found", + config_path_str=None, + home_configs=["existing.yaml"], + expected_suffix="existing.yaml", + ), + ResolveConfigFixture( + test_id="default-when-no-home-config", + config_path_str=None, + home_configs=[], + expected_suffix=".vcspull.yaml", + ), + ResolveConfigFixture( + test_id="yml-extension-accepted", + config_path_str="/custom/config.yml", + home_configs=[], + expected_suffix="config.yml", + ), + ResolveConfigFixture( + test_id="json-extension-accepted", + config_path_str="/custom/config.json", + home_configs=[], + expected_suffix="config.json", + ), +] + + +@pytest.mark.parametrize( + list(ResolveConfigFixture._fields), + RESOLVE_CONFIG_FIXTURES, + ids=[f.test_id for f in RESOLVE_CONFIG_FIXTURES], +) +def test_resolve_config_file( + test_id: str, + config_path_str: str | None, + home_configs: list[str], + expected_suffix: str, + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, +) -> None: + """Test _resolve_config_file handles various config scenarios.""" + monkeypatch.setenv("HOME", str(tmp_path)) + + # Create home config files if needed + full_paths = [] + for cfg in home_configs: + cfg_path = tmp_path / cfg + cfg_path.touch() + full_paths.append(cfg_path) + + # Mock find_home_config_files: return pre-created config file paths + # instead of scanning the real home directory + monkeypatch.setattr( + import_common_mod, + "find_home_config_files", + lambda filetype=None: full_paths, + ) + + result = _resolve_config_file(config_path_str) + assert result.name == expected_suffix + + +class ImportReposFixture(t.NamedTuple): + """Fixture for _run_import test cases.""" + + test_id: str + service_name: str + target: str + mode: str + dry_run: bool + yes: bool + output_json: bool + mock_repos: list[RemoteRepo] + mock_error: Exception | None + expected_log_contains: list[str] + expected_config_repos: int + + +IMPORT_REPOS_FIXTURES: list[ImportReposFixture] = [ + ImportReposFixture( + test_id="basic-github-user-dry-run", + service_name="github", + target="testuser", + mode="user", + dry_run=True, + yes=True, + output_json=False, + mock_repos=[_make_repo("repo1"), _make_repo("repo2")], + mock_error=None, + expected_log_contains=["Found 2 repositories", "Dry run complete"], + expected_config_repos=0, + ), + ImportReposFixture( + test_id="github-user-writes-config", + service_name="github", + target="testuser", + mode="user", + dry_run=False, + yes=True, + output_json=False, + mock_repos=[_make_repo("repo1")], + mock_error=None, + expected_log_contains=["Added 1 repositories"], + expected_config_repos=1, + ), + ImportReposFixture( + test_id="no-repos-found", + service_name="github", + 
target="emptyuser", + mode="user", + dry_run=False, + yes=True, + output_json=False, + mock_repos=[], + mock_error=None, + expected_log_contains=["No repositories found"], + expected_config_repos=0, + ), + ImportReposFixture( + test_id="authentication-error", + service_name="github", + target="testuser", + mode="user", + dry_run=False, + yes=True, + output_json=False, + mock_repos=[], + mock_error=AuthenticationError("Bad credentials"), + expected_log_contains=["Authentication error"], + expected_config_repos=0, + ), + ImportReposFixture( + test_id="rate-limit-error", + service_name="github", + target="testuser", + mode="user", + dry_run=False, + yes=True, + output_json=False, + mock_repos=[], + mock_error=RateLimitError("Rate limit exceeded"), + expected_log_contains=["Rate limit exceeded"], + expected_config_repos=0, + ), + ImportReposFixture( + test_id="not-found-error", + service_name="github", + target="nosuchuser", + mode="user", + dry_run=False, + yes=True, + output_json=False, + mock_repos=[], + mock_error=NotFoundError("User not found"), + expected_log_contains=["Not found"], + expected_config_repos=0, + ), + ImportReposFixture( + test_id="service-unavailable-error", + service_name="github", + target="testuser", + mode="user", + dry_run=False, + yes=True, + output_json=False, + mock_repos=[], + mock_error=ServiceUnavailableError("Server error"), + expected_log_contains=["Service unavailable"], + expected_config_repos=0, + ), + ImportReposFixture( + test_id="configuration-error", + service_name="codecommit", + target="", + mode="user", + dry_run=False, + yes=True, + output_json=False, + mock_repos=[], + mock_error=ConfigurationError("Invalid region"), + expected_log_contains=["Configuration error"], + expected_config_repos=0, + ), + ImportReposFixture( + test_id="gitlab-org-mode", + service_name="gitlab", + target="testgroup", + mode="org", + dry_run=True, + yes=True, + output_json=False, + mock_repos=[_make_repo("group-project")], + mock_error=None, + expected_log_contains=["Found 1 repositories"], + expected_config_repos=0, + ), + ImportReposFixture( + test_id="codeberg-search-mode", + service_name="codeberg", + target="python cli", + mode="search", + dry_run=True, + yes=True, + output_json=False, + mock_repos=[_make_repo("cli-tool", stars=100)], + mock_error=None, + expected_log_contains=["Found 1 repositories"], + expected_config_repos=0, + ), +] + + +@pytest.mark.parametrize( + list(ImportReposFixture._fields), + IMPORT_REPOS_FIXTURES, + ids=[f.test_id for f in IMPORT_REPOS_FIXTURES], +) +def test_import_repos( + test_id: str, + service_name: str, + target: str, + mode: str, + dry_run: bool, + yes: bool, + output_json: bool, + mock_repos: list[RemoteRepo], + mock_error: Exception | None, + expected_log_contains: list[str], + expected_config_repos: int, + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import with various scenarios.""" + caplog.set_level(logging.INFO) + + monkeypatch.setenv("HOME", str(tmp_path)) + monkeypatch.chdir(tmp_path) + + workspace = tmp_path / "repos" + workspace.mkdir() + config_file = tmp_path / ".vcspull.yaml" + + importer = MockImporter(repos=mock_repos, error=mock_error) + + _run_import( + importer, + service_name=service_name, + target=target, + workspace=str(workspace), + mode=mode, + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(config_file), + dry_run=dry_run, + yes=yes, + 
output_json=output_json, + output_ndjson=False, + color="never", + ) + + for expected_text in expected_log_contains: + assert expected_text in caplog.text, ( + f"Expected '{expected_text}' in log, got: {caplog.text}" + ) + + if expected_config_repos > 0: + assert config_file.exists() + import yaml + + with config_file.open() as f: + config = yaml.safe_load(f) + assert config is not None + total_repos = sum(len(repos) for repos in config.values()) + assert total_repos == expected_config_repos + + +def test_import_repos_user_abort( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import aborts when user declines confirmation.""" + caplog.set_level(logging.INFO) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + config_file = tmp_path / ".vcspull.yaml" + + # Mock builtins.input: simulate user typing "n" to decline confirmation + monkeypatch.setattr("builtins.input", lambda _: "n") + # Mock sys.stdin: fake TTY so the confirmation prompt is shown + monkeypatch.setattr( + "sys.stdin", type("FakeTTY", (), {"isatty": lambda self: True})() + ) + + importer = MockImporter(repos=[_make_repo("repo1")]) + + _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(config_file), + dry_run=False, + yes=False, # Require confirmation + output_json=False, + output_ndjson=False, + color="never", + ) + + assert "Aborted by user" in caplog.text + assert not config_file.exists() + + +def test_import_repos_eoferror_aborts( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import aborts gracefully on EOFError from input().""" + caplog.set_level(logging.INFO) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + config_file = tmp_path / ".vcspull.yaml" + + # Mock input() to raise EOFError (e.g., piped stdin) + def raise_eof(_: str) -> str: + raise EOFError + + # Mock builtins.input: simulate EOFError from piped/closed stdin + monkeypatch.setattr("builtins.input", raise_eof) + # Mock sys.stdin: fake TTY so the confirmation prompt path is exercised + monkeypatch.setattr( + "sys.stdin", type("FakeTTY", (), {"isatty": lambda self: True})() + ) + + importer = MockImporter(repos=[_make_repo("repo1")]) + + _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(config_file), + dry_run=False, + yes=False, + output_json=False, + output_ndjson=False, + color="never", + ) + + assert "Aborted by user" in caplog.text + assert not config_file.exists() + + +def test_import_repos_non_tty_aborts( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import aborts when stdin is not a TTY.""" + caplog.set_level(logging.INFO) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + config_file = tmp_path / ".vcspull.yaml" + + # Mock sys.stdin: fake non-TTY to test non-interactive abort path + monkeypatch.setattr( + "sys.stdin", type("FakeNonTTY", (), {"isatty": lambda self: False})() + ) + + importer = 
MockImporter(repos=[_make_repo("repo1")]) + + result = _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(config_file), + dry_run=False, + yes=False, + output_json=False, + output_ndjson=False, + color="never", + ) + + assert result == 1, "Non-interactive abort must return non-zero exit code" + assert "Non-interactive mode" in caplog.text + assert not config_file.exists() + + +def test_import_repos_skips_existing( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import skips repositories already in config.""" + import yaml + + caplog.set_level(logging.INFO) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + config_file = tmp_path / ".vcspull.yaml" + + # Create existing config with repo1 + existing_config = { + "~/repos/": { + "repo1": {"repo": "git+https://github.com/testuser/repo1.git"}, + } + } + save_config_yaml(config_file, existing_config) + + importer = MockImporter(repos=[_make_repo("repo1"), _make_repo("repo2")]) + + _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(config_file), + dry_run=False, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + ) + + assert "Added 1 repositories" in caplog.text + assert "Skipped 1 existing" in caplog.text + + with config_file.open() as f: + final_config = yaml.safe_load(f) + + assert "repo1" in final_config["~/repos/"] + assert "repo2" in final_config["~/repos/"] + + +def test_import_repos_all_existing( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import handles all repos already existing.""" + caplog.set_level(logging.INFO) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + config_file = tmp_path / ".vcspull.yaml" + + # Create existing config with repo1 + existing_config = { + "~/repos/": { + "repo1": {"repo": "git+https://github.com/testuser/repo1.git"}, + } + } + save_config_yaml(config_file, existing_config) + + importer = MockImporter(repos=[_make_repo("repo1")]) + + _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(config_file), + dry_run=False, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + ) + + assert "All repositories already exist" in caplog.text + + +def test_import_repos_json_output( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + capsys: pytest.CaptureFixture[str], +) -> None: + """Test _run_import JSON output format.""" + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + + importer = MockImporter(repos=[_make_repo("repo1", stars=50)]) + + _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(tmp_path / "config.yaml"), + 
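+        # Hedged sketch (assumption, not the confirmed implementation): the
+        # skip-existing behaviour verified by the two tests above amounts to
+        # a per-section membership check before writing, roughly:
+        #     section = config.setdefault(workspace_label, {})
+        #     if repo.name in section:
+        #         skipped += 1
+        #     else:
+        #         section[repo.name] = {"repo": clone_url}
+        #         added += 1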
dry_run=True, + yes=True, + output_json=True, + output_ndjson=False, + color="never", + ) + + captured = capsys.readouterr() + output = json.loads(captured.out) + + assert isinstance(output, list) + assert len(output) == 1 + assert output[0]["name"] == "repo1" + assert output[0]["stars"] == 50 + + +def test_import_repos_ndjson_output( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + capsys: pytest.CaptureFixture[str], +) -> None: + """Test _run_import NDJSON output format.""" + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + + importer = MockImporter(repos=[_make_repo("repo1"), _make_repo("repo2")]) + + _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(tmp_path / "config.yaml"), + dry_run=True, + yes=True, + output_json=False, + output_ndjson=True, + color="never", + ) + + captured = capsys.readouterr() + lines = captured.out.strip().split("\n") + + assert len(lines) == 2 + assert json.loads(lines[0])["name"] == "repo1" + assert json.loads(lines[1])["name"] == "repo2" + + +def test_import_repos_topics_filter( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import passes topics filter correctly.""" + caplog.set_level(logging.INFO) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + + received_options: list[ImportOptions] = [] + + class CapturingImporter: + service_name = "MockService" + + def fetch_repos( + self, + options: ImportOptions, + ) -> t.Iterator[RemoteRepo]: + received_options.append(options) + return iter([]) + + _run_import( + CapturingImporter(), + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language="Python", + topics="cli,tool,python", + min_stars=50, + include_archived=True, + include_forks=True, + limit=200, + config_path_str=str(tmp_path / "config.yaml"), + dry_run=True, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + ) + + assert len(received_options) == 1 + opts = received_options[0] + assert opts.language == "Python" + assert opts.topics == ["cli", "tool", "python"] + assert opts.min_stars == 50 + assert opts.include_archived is True + assert opts.include_forks is True + assert opts.limit == 200 + + +def test_import_repos_codecommit_no_target_required( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import allows empty target for codecommit.""" + caplog.set_level(logging.INFO) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + + importer = MockImporter( + service_name="CodeCommit", + repos=[_make_repo("aws-repo")], + ) + + _run_import( + importer, + service_name="codecommit", + target="", # Empty target is OK for CodeCommit + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(tmp_path / "config.yaml"), + dry_run=True, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + ) + + # Should succeed and find repos + assert "Found 1 repositories" in caplog.text + # Should NOT have target required error + assert "TARGET is required" not in caplog.text + + +def 
test_import_repos_many_repos_truncates_preview( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import shows '...and X more' when many repos.""" + caplog.set_level(logging.INFO) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + + # Create 15 repos + many_repos = [_make_repo(f"repo{i}") for i in range(15)] + + importer = MockImporter(repos=many_repos) + + _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(tmp_path / "config.yaml"), + dry_run=True, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + ) + + assert "Found 15 repositories" in caplog.text + assert "and 5 more" in caplog.text + + +def test_import_repos_config_load_error( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import handles config load errors.""" + caplog.set_level(logging.ERROR) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + + # Create an invalid YAML file + config_file = tmp_path / ".vcspull.yaml" + config_file.write_text("invalid: yaml: content: [", encoding="utf-8") + + importer = MockImporter(repos=[_make_repo("repo1")]) + + _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(config_file), + dry_run=False, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + ) + + assert "Error loading config" in caplog.text + + +def test_import_no_args_shows_help(capsys: pytest.CaptureFixture[str]) -> None: + """Test that 'vcspull import' without args shows help.""" + from vcspull.cli import cli + + cli(["import"]) + + captured = capsys.readouterr() + assert "usage: vcspull import" in captured.out + assert "Import repositories from remote services" in captured.out + + +def test_import_repos_defaults_to_ssh_urls( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import writes SSH URLs to config by default.""" + import yaml + + caplog.set_level(logging.INFO) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + config_file = tmp_path / ".vcspull.yaml" + + importer = MockImporter(repos=[_make_repo("myrepo")]) + + _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(config_file), + dry_run=False, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + ) + + assert config_file.exists() + with config_file.open() as f: + config = yaml.safe_load(f) + + repo_url = config["~/repos/"]["myrepo"]["repo"] + assert repo_url == "git+git@github.com:testuser/myrepo.git" + + +def test_import_repos_https_flag( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import writes HTTPS URLs when use_https=True.""" + import yaml + + caplog.set_level(logging.INFO) + + monkeypatch.setenv("HOME", str(tmp_path)) + 
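+    # Note, derived from the assertions in this test and the one above:
+    # clone URLs are expected in one of two vcspull-style shapes:
+    #     SSH (default): git+git@github.com:testuser/myrepo.git
+    #     HTTPS:         git+https://github.com/testuser/myrepo.git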
workspace = tmp_path / "repos" + workspace.mkdir() + config_file = tmp_path / ".vcspull.yaml" + + importer = MockImporter(repos=[_make_repo("myrepo")]) + + _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(config_file), + dry_run=False, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + use_https=True, + ) + + assert config_file.exists() + with config_file.open() as f: + config = yaml.safe_load(f) + + repo_url = config["~/repos/"]["myrepo"]["repo"] + assert repo_url == "git+https://github.com/testuser/myrepo.git" + + +def test_import_https_flag_via_cli() -> None: + """Test that --https flag is recognized by the CLI parser.""" + from vcspull.cli import create_parser + + parser = create_parser(return_subparsers=False) + args = parser.parse_args( + ["import", "github", "testuser", "-w", "/tmp/repos", "--https"] + ) + assert args.use_https is True + + +def test_import_ssh_default_via_cli() -> None: + """Test that SSH is the default (no --https flag).""" + from vcspull.cli import create_parser + + parser = create_parser(return_subparsers=False) + args = parser.parse_args(["import", "github", "testuser", "-w", "/tmp/repos"]) + assert args.use_https is False + + +def test_import_flatten_groups_flag_via_cli() -> None: + """Test that --flatten-groups flag is recognized by the GitLab subparser.""" + from vcspull.cli import create_parser + + parser = create_parser(return_subparsers=False) + args = parser.parse_args( + ["import", "gitlab", "group/subgroup", "-w", "/tmp/repos", "--flatten-groups"] + ) + assert args.flatten_groups is True + + +def test_import_repos_rejects_unsupported_config_type( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import rejects unsupported config file types.""" + caplog.set_level(logging.ERROR) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + + importer = MockImporter(repos=[_make_repo("repo1")]) + + _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(tmp_path / "config.toml"), + dry_run=False, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + ) + + assert "Unsupported config file type" in caplog.text + + +def test_import_repos_catches_multiple_config_warning( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import logs error instead of crashing on MultipleConfigWarning.""" + from vcspull.exc import MultipleConfigWarning + + caplog.set_level(logging.ERROR) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + + importer = MockImporter(repos=[_make_repo("repo1")]) + + # Mock _resolve_config_file: raise MultipleConfigWarning to test error handling + def raise_multiple_config(_: str | None) -> pathlib.Path: + raise MultipleConfigWarning(MultipleConfigWarning.message) + + monkeypatch.setattr( + import_common_mod, + "_resolve_config_file", + raise_multiple_config, + ) + + _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + 
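+        # Hedged sketch (assumed handling, inferred from the mock above;
+        # `log` is a hypothetical module logger): a MultipleConfigWarning
+        # raised by _resolve_config_file should be logged, not propagated:
+        #     try:
+        #         config_path = _resolve_config_file(config_path_str)
+        #     except MultipleConfigWarning as e:
+        #         log.error("Multiple configs found: %s", e)
+        #         return 1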
min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=None, + dry_run=False, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + ) + + assert "Multiple configs" in caplog.text + + +def test_import_repos_invalid_limit( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import logs error for invalid limit (e.g. 0).""" + caplog.set_level(logging.ERROR) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + + importer = MockImporter(repos=[_make_repo("repo1")]) + + _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=0, + config_path_str=str(tmp_path / "config.yaml"), + dry_run=False, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + ) + + assert "limit must be >= 1" in caplog.text + + +def test_import_repos_returns_nonzero_on_error( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import returns non-zero exit code on error.""" + caplog.set_level(logging.ERROR) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + + importer = MockImporter(error=AuthenticationError("Bad credentials")) + + result = _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(tmp_path / "config.yaml"), + dry_run=False, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + ) + + assert result != 0 + + +def test_import_repos_returns_zero_on_success( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import returns 0 on success.""" + caplog.set_level(logging.INFO) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + + importer = MockImporter(repos=[_make_repo("repo1")]) + + result = _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(tmp_path / "config.yaml"), + dry_run=False, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + ) + + assert result == 0 + + +def test_import_repos_json_config_write( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import writes valid JSON when config path has .json extension.""" + caplog.set_level(logging.INFO) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + config_file = tmp_path / ".vcspull.json" + + importer = MockImporter(repos=[_make_repo("repo1")]) + + result = _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(config_file), + dry_run=False, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + ) + + assert result == 0 + assert config_file.exists() + 
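+    # Hedged sketch (assumed dispatch, not confirmed): writes are presumably
+    # routed on the config file suffix, roughly:
+    #     if config_path.suffix == ".json":
+    #         save_config_json(config_path, config)
+    #     else:  # .yaml / .yml
+    #         save_config_yaml(config_path, config)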
loaded = json.loads(config_file.read_text(encoding="utf-8")) + assert isinstance(loaded, dict) + total_repos = sum(len(repos) for repos in loaded.values()) + assert total_repos == 1 + + +def test_import_repos_rejects_non_dict_config( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import rejects config that is a YAML list instead of dict.""" + caplog.set_level(logging.ERROR) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + config_file = tmp_path / ".vcspull.yaml" + # Write a YAML list instead of a mapping + config_file.write_text("- item1\n- item2\n", encoding="utf-8") + + importer = MockImporter(repos=[_make_repo("repo1")]) + + _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(config_file), + dry_run=False, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + ) + + assert "not a valid mapping" in caplog.text + + +def test_import_repos_non_mapping_workspace_returns_error( + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test _run_import returns non-zero when a workspace section is not a mapping.""" + caplog.set_level(logging.ERROR) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + config_file = tmp_path / ".vcspull.yaml" + # Workspace section is a string, not a mapping + label = workspace_root_label(workspace, cwd=pathlib.Path.cwd(), home=tmp_path) + config_file.write_text(f"{label}: invalid_string\n", encoding="utf-8") + + importer = MockImporter(repos=[_make_repo("repo1")]) + + result = _run_import( + importer, + service_name="github", + target="testuser", + workspace=str(workspace), + mode="user", + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(config_file), + dry_run=False, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + ) + + assert result == 1 + assert "not a mapping in config" in caplog.text + + +class NestedGroupImportFixture(t.NamedTuple): + """Fixture for nested-group workspace persistence cases.""" + + test_id: str + target: str + mode: str + flatten_groups: bool + workspace_relpath: str + mock_repos: list[RemoteRepo] + expected_sections: dict[str, tuple[str, ...]] + + +NESTED_GROUP_IMPORT_FIXTURES: list[NestedGroupImportFixture] = [ + NestedGroupImportFixture( + test_id="comment-example-relative-subpaths", + target="a/b", + mode="org", + flatten_groups=False, + workspace_relpath="repos", + mock_repos=[ + _make_repo("h", owner="a/b"), + _make_repo("d", owner="a/b/c"), + _make_repo("e", owner="a/b/c"), + _make_repo("g", owner="a/b/f"), + ], + expected_sections={ + "": ("h",), + "c": ("d", "e"), + "f": ("g",), + }, + ), + NestedGroupImportFixture( + test_id="deep-nesting-under-target", + target="a/b", + mode="org", + flatten_groups=False, + workspace_relpath="repos", + mock_repos=[ + _make_repo("r1", owner="a/b/c/d"), + _make_repo("r2", owner="a/b/c/d/e"), + ], + expected_sections={ + "c/d": ("r1",), + "c/d/e": ("r2",), + }, + ), + NestedGroupImportFixture( + test_id="non-org-mode-no-subpathing", + target="a/b", + mode="user", + flatten_groups=False, + workspace_relpath="repos", + mock_repos=[ + _make_repo("h", owner="a/b"), + 
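+            # Hedged sketch (assumed mapping, inferred from these fixtures):
+            # in org mode, owner "a/b/c" with target "a/b" lands in
+            # subsection "c"; owners outside the target or containing ".."
+            # fall back to the base workspace, roughly:
+            #     rel = PurePosixPath(owner).relative_to(target)
+            #     if ".." in rel.parts:
+            #         rel = PurePosixPath(".")  # traversal -> base section
+            # (a ValueError from relative_to likewise falls back to base)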
_make_repo("d", owner="a/b/c"), + _make_repo("g", owner="a/b/f"), + ], + expected_sections={ + "": ("h", "d", "g"), + }, + ), + NestedGroupImportFixture( + test_id="owner-outside-target-fallback-base", + target="a/b", + mode="org", + flatten_groups=False, + workspace_relpath="repos", + mock_repos=[ + _make_repo("inside", owner="a/b/c"), + _make_repo("outside", owner="z/y"), + ], + expected_sections={ + "c": ("inside",), + "": ("outside",), + }, + ), + NestedGroupImportFixture( + test_id="traversal-in-owner-flattened-to-base", + target="a/b", + mode="org", + flatten_groups=False, + workspace_relpath="repos", + mock_repos=[ + _make_repo("evil", owner="a/b/../../escape"), + _make_repo("safe", owner="a/b/c"), + ], + expected_sections={ + "": ("evil",), + "c": ("safe",), + }, + ), + NestedGroupImportFixture( + test_id="flatten-groups-flag-uses-single-workspace", + target="a/b", + mode="org", + flatten_groups=True, + workspace_relpath="repos", + mock_repos=[ + _make_repo("h", owner="a/b"), + _make_repo("d", owner="a/b/c"), + _make_repo("g", owner="a/b/f"), + ], + expected_sections={ + "": ("h", "d", "g"), + }, + ), + NestedGroupImportFixture( + test_id="workspace-subdirectory-root-is-supported", + target="a/b", + mode="org", + flatten_groups=False, + workspace_relpath="projects/python", + mock_repos=[ + _make_repo("h", owner="a/b"), + _make_repo("d", owner="a/b/c"), + ], + expected_sections={ + "": ("h",), + "c": ("d",), + }, + ), +] + + +@pytest.mark.parametrize( + list(NestedGroupImportFixture._fields), + NESTED_GROUP_IMPORT_FIXTURES, + ids=[fixture.test_id for fixture in NESTED_GROUP_IMPORT_FIXTURES], +) +def test_import_nested_groups( + test_id: str, + target: str, + mode: str, + flatten_groups: bool, + workspace_relpath: str, + mock_repos: list[RemoteRepo], + expected_sections: dict[str, tuple[str, ...]], + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test that nested groups are preserved in config.""" + import yaml + + del test_id + caplog.set_level(logging.INFO) + monkeypatch.setenv("HOME", str(tmp_path)) + + workspace = tmp_path / workspace_relpath + workspace.mkdir(parents=True) + config_file = tmp_path / ".vcspull.yaml" + + importer = MockImporter(service_name="GitLab", repos=mock_repos) + + _run_import( + importer, + service_name="gitlab", + target=target, + workspace=str(workspace), + mode=mode, + language=None, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(config_file), + dry_run=False, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + flatten_groups=flatten_groups, + ) + + assert config_file.exists() + with config_file.open() as f: + config = yaml.safe_load(f) + + cwd = pathlib.Path.cwd() + home = pathlib.Path.home() + expected_labels: dict[str, tuple[str, ...]] = {} + for subpath, repo_names in expected_sections.items(): + expected_path = workspace if not subpath else workspace / subpath + label = workspace_root_label(expected_path, cwd=cwd, home=home) + expected_labels[label] = repo_names + + assert set(config.keys()) == set(expected_labels.keys()) + for label, expected_repo_names in expected_labels.items(): + assert isinstance(config[label], dict) + assert set(config[label].keys()) == set(expected_repo_names) + + +class LanguageWarningFixture(t.NamedTuple): + """Fixture for --language warning test cases.""" + + test_id: str + service_name: str + language: str | None + expect_warning: bool + + +LANGUAGE_WARNING_FIXTURES: 
list[LanguageWarningFixture] = [ + LanguageWarningFixture( + test_id="gitlab-with-language-warns", + service_name="gitlab", + language="Python", + expect_warning=True, + ), + LanguageWarningFixture( + test_id="codecommit-with-language-warns", + service_name="codecommit", + language="Python", + expect_warning=True, + ), + LanguageWarningFixture( + test_id="github-with-language-no-warning", + service_name="github", + language="Python", + expect_warning=False, + ), + LanguageWarningFixture( + test_id="gitlab-without-language-no-warning", + service_name="gitlab", + language=None, + expect_warning=False, + ), +] + + +@pytest.mark.parametrize( + list(LanguageWarningFixture._fields), + LANGUAGE_WARNING_FIXTURES, + ids=[f.test_id for f in LANGUAGE_WARNING_FIXTURES], +) +def test_import_repos_language_warning( + test_id: str, + service_name: str, + language: str | None, + expect_warning: bool, + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test that --language warns for services without language metadata.""" + caplog.set_level(logging.WARNING) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + + display_name = {"gitlab": "GitLab", "codecommit": "CodeCommit"}.get( + service_name, "GitHub" + ) + importer = MockImporter(service_name=display_name) + + _run_import( + importer, + service_name=service_name, + target="testuser" if service_name != "codecommit" else "", + workspace=str(workspace), + mode="user", + language=language, + topics=None, + min_stars=0, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(tmp_path / "config.yaml"), + dry_run=True, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + ) + + if expect_warning: + assert "does not return language metadata" in caplog.text + else: + assert "does not return language metadata" not in caplog.text + + +class UnsupportedFilterFixture(t.NamedTuple): + """Fixture for unsupported CodeCommit filter warning test cases.""" + + test_id: str + service_name: str + topics: str | None + min_stars: int + expect_topics_warning: bool + expect_stars_warning: bool + + +UNSUPPORTED_FILTER_FIXTURES: list[UnsupportedFilterFixture] = [ + UnsupportedFilterFixture( + test_id="codecommit-with-topics-warns", + service_name="codecommit", + topics="python,cli", + min_stars=0, + expect_topics_warning=True, + expect_stars_warning=False, + ), + UnsupportedFilterFixture( + test_id="codecommit-with-min-stars-warns", + service_name="codecommit", + topics=None, + min_stars=10, + expect_topics_warning=False, + expect_stars_warning=True, + ), + UnsupportedFilterFixture( + test_id="codecommit-with-both-warns", + service_name="codecommit", + topics="python", + min_stars=5, + expect_topics_warning=True, + expect_stars_warning=True, + ), + UnsupportedFilterFixture( + test_id="github-with-topics-no-warning", + service_name="github", + topics="python,cli", + min_stars=10, + expect_topics_warning=False, + expect_stars_warning=False, + ), +] + + +@pytest.mark.parametrize( + list(UnsupportedFilterFixture._fields), + UNSUPPORTED_FILTER_FIXTURES, + ids=[f.test_id for f in UNSUPPORTED_FILTER_FIXTURES], +) +def test_import_repos_unsupported_filter_warning( + test_id: str, + service_name: str, + topics: str | None, + min_stars: int, + expect_topics_warning: bool, + expect_stars_warning: bool, + tmp_path: pathlib.Path, + monkeypatch: MonkeyPatch, + caplog: pytest.LogCaptureFixture, +) -> None: + """Test that --topics/--min-stars warn 
for CodeCommit.""" + caplog.set_level(logging.WARNING) + + monkeypatch.setenv("HOME", str(tmp_path)) + workspace = tmp_path / "repos" + workspace.mkdir() + + display_name = "CodeCommit" if service_name == "codecommit" else "GitHub" + importer = MockImporter(service_name=display_name) + + _run_import( + importer, + service_name=service_name, + target="testuser" if service_name != "codecommit" else "", + workspace=str(workspace), + mode="user", + language=None, + topics=topics, + min_stars=min_stars, + include_archived=False, + include_forks=False, + limit=100, + config_path_str=str(tmp_path / "config.yaml"), + dry_run=True, + yes=True, + output_json=False, + output_ndjson=False, + color="never", + ) + + if expect_topics_warning: + assert "does not support topic filtering" in caplog.text + else: + assert "does not support topic filtering" not in caplog.text + + if expect_stars_warning: + assert "does not track star counts" in caplog.text + else: + assert "does not track star counts" not in caplog.text + + +# ── New tests for per-service subparser architecture ── + + +def test_alias_parsing_gh() -> None: + """Test that 'import gh' resolves the same as 'import github'.""" + from vcspull.cli import create_parser + + parser = create_parser(return_subparsers=False) + args = parser.parse_args(["import", "gh", "myuser", "-w", "/tmp/repos"]) + assert args.import_service in ("github", "gh") + assert hasattr(args, "import_handler") + + +def test_alias_parsing_gl() -> None: + """Test that 'import gl' resolves the same as 'import gitlab'.""" + from vcspull.cli import create_parser + + parser = create_parser(return_subparsers=False) + args = parser.parse_args(["import", "gl", "myuser", "-w", "/tmp/repos"]) + assert args.import_service in ("gitlab", "gl") + assert hasattr(args, "import_handler") + + +def test_alias_parsing_cb() -> None: + """Test that 'import cb' resolves the same as 'import codeberg'.""" + from vcspull.cli import create_parser + + parser = create_parser(return_subparsers=False) + args = parser.parse_args(["import", "cb", "myuser", "-w", "/tmp/repos"]) + assert args.import_service in ("codeberg", "cb") + assert hasattr(args, "import_handler") + + +def test_alias_parsing_cc() -> None: + """Test that 'import cc' resolves the same as 'import codecommit'.""" + from vcspull.cli import create_parser + + parser = create_parser(return_subparsers=False) + args = parser.parse_args(["import", "cc", "-w", "/tmp/repos"]) + assert args.import_service in ("codecommit", "cc") + assert hasattr(args, "import_handler") + + +def test_alias_parsing_aws() -> None: + """Test that 'import aws' resolves the same as 'import codecommit'.""" + from vcspull.cli import create_parser + + parser = create_parser(return_subparsers=False) + args = parser.parse_args(["import", "aws", "-w", "/tmp/repos"]) + assert args.import_service in ("codecommit", "aws") + assert hasattr(args, "import_handler") + + +def test_flatten_groups_only_on_gitlab() -> None: + """Test that --flatten-groups is only available on the gitlab subparser.""" + from vcspull.cli import create_parser + + parser = create_parser(return_subparsers=False) + + # Should work for gitlab + args = parser.parse_args( + ["import", "gitlab", "mygroup", "-w", "/tmp/repos", "--flatten-groups"] + ) + assert args.flatten_groups is True + + # Should fail for github + with pytest.raises(SystemExit): + parser.parse_args( + ["import", "github", "myuser", "-w", "/tmp/repos", "--flatten-groups"] + ) + + +def test_region_only_on_codecommit() -> None: + """Test that --region is 
only available on the codecommit subparser.""" + from vcspull.cli import create_parser + + parser = create_parser(return_subparsers=False) + + # Should work for codecommit + args = parser.parse_args( + ["import", "codecommit", "-w", "/tmp/repos", "--region", "us-east-1"] + ) + assert args.region == "us-east-1" + + # Should fail for github + with pytest.raises(SystemExit): + parser.parse_args( + ["import", "github", "myuser", "-w", "/tmp/repos", "--region", "us-east-1"] + ) + + +def test_url_required_for_gitea() -> None: + """Test that --url is required for the gitea subparser.""" + from vcspull.cli import create_parser + + parser = create_parser(return_subparsers=False) + + # Should fail without --url + with pytest.raises(SystemExit): + parser.parse_args(["import", "gitea", "myuser", "-w", "/tmp/repos"]) + + # Should work with --url + args = parser.parse_args( + [ + "import", + "gitea", + "myuser", + "-w", + "/tmp/repos", + "--url", + "https://git.example.com", + ] + ) + assert args.base_url == "https://git.example.com" + + +def test_url_required_for_forgejo() -> None: + """Test that --url is required for the forgejo subparser.""" + from vcspull.cli import create_parser + + parser = create_parser(return_subparsers=False) + + # Should fail without --url + with pytest.raises(SystemExit): + parser.parse_args(["import", "forgejo", "myuser", "-w", "/tmp/repos"]) + + # Should work with --url + args = parser.parse_args( + [ + "import", + "forgejo", + "myuser", + "-w", + "/tmp/repos", + "--url", + "https://forgejo.example.com", + ] + ) + assert args.base_url == "https://forgejo.example.com" + + +def test_codecommit_target_is_optional() -> None: + """Test that target is optional for the codecommit subparser.""" + from vcspull.cli import create_parser + + parser = create_parser(return_subparsers=False) + + # Should work without target + args = parser.parse_args(["import", "codecommit", "-w", "/tmp/repos"]) + assert args.target == "" + + # Should work with target + args = parser.parse_args(["import", "codecommit", "myprefix", "-w", "/tmp/repos"]) + assert args.target == "myprefix" diff --git a/tests/cli/test_plan_output_helpers.py b/tests/cli/test_plan_output_helpers.py index facbc89fc..c4f691a1e 100644 --- a/tests/cli/test_plan_output_helpers.py +++ b/tests/cli/test_plan_output_helpers.py @@ -159,6 +159,17 @@ def test_plan_summary_to_payload( assert "duration_ms" not in payload +def test_output_formatter_json_mode_empty_buffer_emits_empty_array() -> None: + """OutputFormatter should emit an empty JSON array when buffer has no items.""" + formatter = OutputFormatter(mode=OutputMode.JSON) + captured = io.StringIO() + with redirect_stdout(captured): + formatter.finalize() + + output = json.loads(captured.getvalue()) + assert output == [] + + def test_output_formatter_json_mode_finalises_buffer() -> None: """OutputFormatter should flush buffered JSON payloads on finalize.""" entry = PlanEntry( diff --git a/tests/test_config_file.py b/tests/test_config_file.py index ed59ca3f9..6a30b1fa7 100644 --- a/tests/test_config_file.py +++ b/tests/test_config_file.py @@ -210,6 +210,40 @@ def test_multiple_config_files_raises_exception(tmp_path: pathlib.Path) -> None: config.find_home_config_files() +def test_find_home_config_files_filetype_yaml_only(tmp_path: pathlib.Path) -> None: + """When filetype=['yaml'], only .yaml is returned even if .json exists.""" + (tmp_path / ".vcspull.yaml").touch() + (tmp_path / ".vcspull.json").touch() + with EnvironmentVarGuard() as env: + env.set("HOME", str(tmp_path)) + # Should 
NOT raise MultipleConfigWarning because json is filtered out + results = config.find_home_config_files(filetype=["yaml"]) + assert len(results) == 1 + assert results[0].suffix == ".yaml" + + +def test_find_home_config_files_filetype_json_only(tmp_path: pathlib.Path) -> None: + """When filetype=['json'], only .json is returned even if .yaml exists.""" + (tmp_path / ".vcspull.yaml").touch() + (tmp_path / ".vcspull.json").touch() + with EnvironmentVarGuard() as env: + env.set("HOME", str(tmp_path)) + results = config.find_home_config_files(filetype=["json"]) + assert len(results) == 1 + assert results[0].suffix == ".json" + + +def test_find_home_config_files_both_types_still_raises( + tmp_path: pathlib.Path, +) -> None: + """Default filetype still raises MultipleConfigWarning when both exist.""" + (tmp_path / ".vcspull.yaml").touch() + (tmp_path / ".vcspull.json").touch() + with EnvironmentVarGuard() as env, pytest.raises(exc.MultipleConfigWarning): + env.set("HOME", str(tmp_path)) + config.find_home_config_files() + + def test_in_dir( config_path: pathlib.Path, yaml_config: pathlib.Path, diff --git a/tests/test_config_writer.py b/tests/test_config_writer.py index fe4f547b4..fbc7c7fb5 100644 --- a/tests/test_config_writer.py +++ b/tests/test_config_writer.py @@ -1,4 +1,4 @@ -"""Tests for duplicate-preserving config writer utilities.""" +"""Tests for config writer utilities.""" from __future__ import annotations @@ -7,7 +7,11 @@ import pytest -from vcspull.config import save_config_yaml_with_items +from vcspull.config import ( + save_config_json, + save_config_yaml, + save_config_yaml_with_items, +) if t.TYPE_CHECKING: import pathlib @@ -54,3 +58,116 @@ def test_save_config_yaml_with_items_preserves_duplicate_sections( yaml_text = config_path.read_text(encoding="utf-8") assert yaml_text == expected_yaml + + +def test_save_config_yaml_atomic_write( + tmp_path: pathlib.Path, +) -> None: + """Test that save_config_yaml uses atomic write (no temp files left).""" + config_path = tmp_path / ".vcspull.yaml" + data = {"~/code/": {"myrepo": {"repo": "git+https://example.com/repo.git"}}} + + save_config_yaml(config_path, data) + + # File should exist with correct content + assert config_path.exists() + content = config_path.read_text(encoding="utf-8") + assert "myrepo" in content + + # No temp files should be left in the directory + tmp_files = [f for f in tmp_path.iterdir() if f.name.startswith(".")] + assert tmp_files == [config_path] + + +def test_save_config_yaml_atomic_preserves_permissions( + tmp_path: pathlib.Path, +) -> None: + """Test that save_config_yaml preserves original file permissions.""" + config_path = tmp_path / ".vcspull.yaml" + config_path.write_text("~/code/: {}\n", encoding="utf-8") + config_path.chmod(0o644) + + data = {"~/code/": {"myrepo": {"repo": "git+https://example.com/repo.git"}}} + save_config_yaml(config_path, data) + + assert config_path.stat().st_mode & 0o777 == 0o644 + + +def test_save_config_yaml_atomic_preserves_existing_on_error( + tmp_path: pathlib.Path, + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Test that existing config is preserved if atomic write fails.""" + config_path = tmp_path / ".vcspull.yaml" + original_content = ( + "~/code/:\n existing: {repo: git+https://example.com/repo.git}\n" + ) + config_path.write_text(original_content, encoding="utf-8") + + # Mock Path.replace to simulate a failure after temp file is written + disk_error_msg = "Simulated disk error" + + import pathlib as _pathlib + + def failing_replace(self: _pathlib.Path, target: 
t.Any) -> _pathlib.Path: + raise OSError(disk_error_msg) + + monkeypatch.setattr(_pathlib.Path, "replace", failing_replace) + + data = {"~/new/": {"newrepo": {"repo": "git+https://example.com/new.git"}}} + with pytest.raises(OSError, match="Simulated disk error"): + save_config_yaml(config_path, data) + + # Original file should be untouched + assert config_path.read_text(encoding="utf-8") == original_content + + # No temp files should remain + tmp_files = [ + f for f in tmp_path.iterdir() if f.name.startswith(".") and f != config_path + ] + assert tmp_files == [] + + +def test_save_config_json_write_and_readback( + tmp_path: pathlib.Path, +) -> None: + """Test that save_config_json writes valid JSON that round-trips.""" + import json + + config_path = tmp_path / ".vcspull.json" + data = {"~/code/": {"myrepo": {"repo": "git+https://example.com/repo.git"}}} + + save_config_json(config_path, data) + + assert config_path.exists() + loaded = json.loads(config_path.read_text(encoding="utf-8")) + assert loaded == data + + +def test_save_config_json_atomic_write( + tmp_path: pathlib.Path, +) -> None: + """Test that save_config_json uses atomic write (no temp files left).""" + config_path = tmp_path / ".vcspull.json" + data = {"~/code/": {"myrepo": {"repo": "git+https://example.com/repo.git"}}} + + save_config_json(config_path, data) + + assert config_path.exists() + # No stray temp files should be left in the directory + tmp_files = [f for f in tmp_path.iterdir() if f.name.startswith(".")] + assert tmp_files == [config_path] + + +def test_save_config_json_atomic_preserves_permissions( + tmp_path: pathlib.Path, +) -> None: + """Test that save_config_json preserves original file permissions.""" + config_path = tmp_path / ".vcspull.json" + config_path.write_text("{}", encoding="utf-8") + config_path.chmod(0o644) + + data = {"~/code/": {"myrepo": {"repo": "git+https://example.com/repo.git"}}} + save_config_json(config_path, data) + + assert config_path.stat().st_mode & 0o777 == 0o644 diff --git a/tests/test_log.py b/tests/test_log.py index 1ba779bc0..2ccf26ce6 100644 --- a/tests/test_log.py +++ b/tests/test_log.py @@ -432,6 +432,14 @@ def test_get_cli_logger_names_includes_base() -> None: "vcspull.cli.add", "vcspull.cli.discover", "vcspull.cli.fmt", + "vcspull.cli.import_cmd", + "vcspull.cli.import_cmd._common", + "vcspull.cli.import_cmd.codeberg", + "vcspull.cli.import_cmd.codecommit", + "vcspull.cli.import_cmd.forgejo", + "vcspull.cli.import_cmd.gitea", + "vcspull.cli.import_cmd.github", + "vcspull.cli.import_cmd.gitlab", "vcspull.cli.list", "vcspull.cli.search", "vcspull.cli.status",
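+    # Hedged note (illustrative): registering each import_cmd submodule here
+    # assumes hierarchical logger names, so one service's verbosity can be
+    # tuned independently, e.g.:
+    #     logging.getLogger("vcspull.cli.import_cmd.github").setLevel(
+    #         logging.DEBUG
+    #     )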