57 changes: 57 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,57 @@
<!--
SPDX-FileCopyrightText: Contributors to PyPSA-Eur <https://github.com/pypsa/pypsa-eur>
SPDX-License-Identifier: CC-BY-4.0
-->

# Changelog

## Unreleased — **Breaking change**

This release accepts all HTTP(S) URLs and therefore conflicts with `snakemake-storage-plugin-http`.
**You must uninstall `snakemake-storage-plugin-http`** before upgrading, otherwise Snakemake will
raise *"Multiple suitable storage providers found"* for any HTTP(S) URL.
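
A pre-upgrade check along these lines could catch the conflict before Snakemake does. This is only a sketch: the helper names `installed_version` and `check_no_conflict` are hypothetical and not part of this plugin; only the standard-library `importlib.metadata` calls are real.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Package named in the breaking-change note above.
CONFLICTING_PACKAGE = "snakemake-storage-plugin-http"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_no_conflict(package: str = CONFLICTING_PACKAGE) -> None:
    """Raise RuntimeError if the conflicting storage plugin is still installed."""
    found = installed_version(package)
    if found is not None:
        raise RuntimeError(
            f"{package} {found} is installed; uninstall it before upgrading."
        )
```

Calling `check_no_conflict()` at the start of a CI job would fail fast with a clear message instead of the later "Multiple suitable storage providers found" error.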

### Added

- Generic HTTP(S) fallback: any `http://` or `https://` URL is now accepted, with size and
mtime read from `Content-Length` and `Last-Modified` response headers. Servers that do not
support `HEAD` requests are handled gracefully (size and mtime default to 0). No checksum
is available for generic URLs.
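
The header handling described above can be sketched as follows. This is an illustration under assumptions, not the plugin's actual code: the function name `size_and_mtime` is hypothetical, and it takes a plain mapping of response headers rather than any particular HTTP client's response object.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Tuple


def size_and_mtime(headers: Mapping[str, str]) -> Tuple[int, float]:
    """Derive (size, mtime) from response headers, defaulting each to 0."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {key.lower(): value for key, value in headers.items()}

    try:
        size = int(lowered.get("content-length", "0"))
    except ValueError:
        size = 0  # malformed Content-Length

    mtime = 0.0
    last_modified = lowered.get("last-modified")
    if last_modified:
        try:
            # Last-Modified uses the RFC 5322 style date format.
            mtime = parsedate_to_datetime(last_modified).timestamp()
        except (TypeError, ValueError):
            mtime = 0.0  # unparseable date
    return size, mtime
```

If a server rejects `HEAD`, passing an empty mapping yields `(0, 0.0)`, which matches the fallback described in the entry above.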

### Removed

- Dependency on `snakemake-storage-plugin-http` — this plugin now handles all HTTP(S) URLs
directly, with no monkey-patching required.

## v0.4.0 — Google Cloud Storage support

### Added

- Support for `storage.googleapis.com` URLs with checksum verification via the GCS JSON API
(`md5Hash` field) and mtime from GCS object metadata.
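
For reference, the GCS JSON API reports `md5Hash` as the base64 encoding of the raw MD5 digest, not as a hex string. A comparison could look like the sketch below; the function name `gcs_md5_matches` is hypothetical, and how the plugin itself performs the check is not shown in this diff.

```python
import base64
import hashlib


def gcs_md5_matches(data: bytes, md5_hash_b64: str) -> bool:
    """Compare local bytes against a base64-encoded MD5 digest (GCS `md5Hash`)."""
    digest = hashlib.md5(data).digest()  # raw 16-byte digest, not hex
    return base64.b64encode(digest).decode("ascii") == md5_hash_b64
```

For large downloads, the same digest can be built incrementally by calling `update()` on the `hashlib.md5()` object for each chunk.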

## v0.3.0 — data.pypsa.org support

### Added

- Support for `data.pypsa.org` URLs with checksum verification via `manifest.yaml` files
discovered by searching up the directory tree.
- Redirect support: manifest entries can specify a `redirect` field to point to another path.
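
One way to read "searching up the directory tree" is to try a `manifest.yaml` next to the file, then in each parent directory up to the server root. The sketch below only generates the candidate URLs in that order. The function name `manifest_candidates` and the example path are hypothetical, and the plugin's real lookup and `redirect` handling may differ.

```python
import posixpath
from typing import Iterator
from urllib.parse import urlsplit, urlunsplit


def manifest_candidates(url: str, name: str = "manifest.yaml") -> Iterator[str]:
    """Yield manifest URLs from the file's directory up to the server root."""
    parts = urlsplit(url)
    directory = posixpath.dirname(parts.path)
    while True:
        manifest_path = posixpath.join(directory, name)
        yield urlunsplit((parts.scheme, parts.netloc, manifest_path, "", ""))
        if directory in ("", "/"):
            break  # reached the root, nothing further to search
        directory = posixpath.dirname(directory)


# Hypothetical path, for illustration only.
for candidate in manifest_candidates("https://data.pypsa.org/a/b/file.csv"):
    print(candidate)
```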

## v0.2.0 — Dynamic versioning and zstd support

### Added

- Dynamic versioning via `setuptools-scm`.
- `zstandard` dependency for decompressing Cloudflare-compressed responses.

## v0.1.0 — Initial release

### Added

- Snakemake storage plugin for Zenodo URLs (`zenodo.org`, `sandbox.zenodo.org`) with:
  - Local filesystem caching via `Cache` class
  - Checksum verification from Zenodo API
  - Adaptive rate limiting using `X-RateLimit-*` headers with exponential backoff retry
  - Concurrent download limiting via semaphore
  - Progress bars with `tqdm-loggable`
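
The rate-limiting item above combines two ideas: wait for the quota to reset when the server says it is exhausted, and otherwise back off exponentially between retries. A rough sketch follows. It assumes that `X-RateLimit-Reset` carries a Unix timestamp in seconds, and the function names and default values are hypothetical rather than taken from the plugin.

```python
from typing import Mapping


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff: base * 2**attempt seconds, capped at `cap`."""
    return min(cap, base * (2 ** attempt))


def wait_before_retry(headers: Mapping[str, str], attempt: int, now: float) -> float:
    """Return how many seconds to wait before the next request."""
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")  # assumed: Unix epoch seconds
    if remaining is not None and reset is not None:
        try:
            if int(remaining) <= 0:
                # Quota exhausted: wait until the advertised reset time.
                return max(0.0, float(reset) - now)
        except ValueError:
            pass  # malformed headers, fall back to plain backoff
    return backoff_delay(attempt)
```

Passing `now` explicitly (for example `time.time()`) keeps the function deterministic and easy to test.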
37 changes: 23 additions & 14 deletions README.md
@@ -11,6 +11,7 @@ A Snakemake storage plugin for downloading files via HTTP with local caching, ch
- **zenodo.org** - Zenodo data repository (checksum from API)
- **data.pypsa.org** - PyPSA data repository (checksum from manifest.yaml)
- **storage.googleapis.com** - Google Cloud Storage (checksum from GCS JSON API)
- **any http(s) URL** - Generic fallback with size/mtime from HTTP headers

## Features

@@ -19,7 +20,7 @@ A Snakemake storage plugin for downloading files via HTTP with local caching, ch
- **Rate limit handling**: Automatically respects Zenodo's rate limits using `X-RateLimit-*` headers with exponential backoff retry
- **Concurrent download control**: Limits simultaneous downloads to prevent overwhelming servers
- **Progress bars**: Shows download progress with tqdm
-- **Immutable URLs**: Returns mtime=0 for Zenodo and data.pypsa.org (persistent URLs); uses actual mtime for GCS
+- **Immutable URLs**: Returns mtime=0 for Zenodo and data.pypsa.org (persistent URLs); uses actual mtime for GCS and generic HTTP
- **Environment variable support**: Configure via environment variables for CI/CD workflows

## Installation
@@ -67,7 +68,7 @@ If you don't explicitly configure it, the plugin will use default settings autom

## Usage

-Use Zenodo, data.pypsa.org, or Google Cloud Storage URLs directly in your rules. Snakemake automatically detects supported URLs and routes them to this plugin:
+Use any HTTP(S) URL directly in your rules. Snakemake automatically routes all HTTP(S) URLs to this plugin:

```python
rule download_zenodo:
@@ -93,6 +94,14 @@ rule download_gcs:
        "resources/cba_projects.zip"
    shell:
        "cp {input} {output}"

rule download_generic:
    input:
        storage("https://example.com/data/dataset.csv"),
    output:
        "resources/dataset.csv"
    shell:
        "cp {input} {output}"
```

Or if you configured a tagged storage entity:
@@ -116,7 +125,7 @@ The plugin will:
- Progress bar showing download status
- Automatic rate limit handling with exponential backoff retry
- Concurrent download limiting
-- Checksum verification (from Zenodo API, data.pypsa.org manifest, or GCS metadata)
+- Checksum verification where available (Zenodo API, data.pypsa.org manifest, GCS metadata)
4. Store in cache for future use (if caching is enabled)

### Example: CI/CD Configuration
@@ -148,19 +157,19 @@ The plugin automatically:

## URL Handling

-- Handles URLs from `zenodo.org`, `sandbox.zenodo.org`, `data.pypsa.org`, and `storage.googleapis.com`
-- Other HTTP(S) URLs are handled by the standard `snakemake-storage-plugin-http`
-- Both plugins can coexist in the same workflow
-
-### Plugin Priority
+This plugin accepts **all HTTP(S) URLs** and replaces `snakemake-storage-plugin-http`. It provides
+enhanced support for specific sources:

-When using `storage()` without specifying a plugin name, Snakemake checks all installed plugins:
-- **Cached HTTP plugin**: Only accepts zenodo.org, data.pypsa.org, and storage.googleapis.com URLs
-- **HTTP plugin**: Accepts all HTTP/HTTPS URLs (including zenodo.org)
+| Source | Checksum | mtime | Immutable |
+|---|---|---|---|
+| `zenodo.org`, `sandbox.zenodo.org` | ✓ (from API) | — | ✓ |
+| `data.pypsa.org` | ✓ (from manifest.yaml) | — | ✓ |
+| `storage.googleapis.com` | ✓ (from GCS API) | ✓ | — |
+| any other HTTP(S) | — | ✓ (Last-Modified) | — |

-If both plugins are installed, supported URLs would be ambiguous - both plugins accept them.
-Typically snakemake would raise an error: **"Multiple suitable storage providers found"** if you try to use `storage()` without specifying which plugin to use, ie. one needs to explicitly call the Cached HTTP provider using `storage.cached_http(url)` instead of `storage(url)`,
-but we monkey-patch the http plugin to refuse zenodo.org, data.pypsa.org, and storage.googleapis.com URLs.
+Generic HTTP URLs are treated as mutable: size and mtime are read from `Content-Length` and
+`Last-Modified` response headers. Servers that do not support `HEAD` requests are handled
+gracefully (size and mtime default to 0).

## License

5 changes: 2 additions & 3 deletions pyproject.toml
@@ -17,10 +17,9 @@ dependencies = [
     "httpx ~= 0.27",
     "platformdirs ~= 4.0",
     "reretry ~= 0.11",
-    "snakemake-interface-common ~= 1.14",
+    "snakemake-interface-common >=1.14,<2.0",
     "snakemake-interface-storage-plugins >=4.2,<5.0",
-    "snakemake-storage-plugin-http ~= 0.3",
-    "tqdm-loggable ~= 0.2",
+    "tqdm-loggable ~= 0.3",
     "typing-extensions ~= 4.15",
     "zstandard ~=0.25.0",
 ]