From 7186cc29deaa4be9f065416afeb573ae47570141 Mon Sep 17 00:00:00 2001 From: schneiva Date: Wed, 3 Dec 2025 14:47:29 +0100 Subject: [PATCH 1/3] docs(guides): add crypt4GH_proTES and SPE deployment tutorial Signed-off-by: schneiva --- docs/guides/guide-admin/crypt4gh_to_protes.md | 464 ++++++++++++++++++ .../guide-admin/sensitive_data_analysis.md | 114 +++++ mkdocs.yml | 2 + 3 files changed, 580 insertions(+) create mode 100644 docs/guides/guide-admin/crypt4gh_to_protes.md create mode 100644 docs/guides/guide-admin/sensitive_data_analysis.md diff --git a/docs/guides/guide-admin/crypt4gh_to_protes.md b/docs/guides/guide-admin/crypt4gh_to_protes.md new file mode 100644 index 0000000..e54f3a3 --- /dev/null +++ b/docs/guides/guide-admin/crypt4gh_to_protes.md @@ -0,0 +1,464 @@ +# Setting up Crypt4GH encryption/decryption in Funnel + +This guide explains how to configure and deploy an environment that enables encryption and decryption of sensitive data files using TES/[Funnel](https://github.com/ohsu-comp-bio/funnel) with [proTES](https://github.com/elixir-cloud-aai/proTES) as a stable and scalable [GA4GH TES](https://github.com/ga4gh/task-execution-schemas) gateway. + +## Overview + +[Crypt4GH](https://crypt4gh.readthedocs.io/) is a standard for encrypting sensitive genomic data. This setup demonstrates: + +- Generating cryptographic key pairs for data exchange between parties (sender and recipient) +- Encrypting files using the sender's private key and recipient's public key +- Automatically decrypting `.c4gh` encrypted files during task execution using [protes-middleware-crypt4gh](https://github.com/elixir-cloud-aai/protes-middleware-crypt4gh) +- Securely processing sensitive data in containerized environments + +**Security Note:** Private keys should be stored in secure locations and used only for decryption. Consider using signed URLs for transferring private keys to the TES instance. + +## Workflow + +The complete workflow consists of three main tasks: + +1. 
**Key Generation**: Generate Crypt4GH key pairs for the sender and recipient parties (optional). +2. **File Encryption**: Encrypt sensitive data using the generated keys. +3. **File Decryption**: Decrypt and process encrypted files in a secure environment. + +All keys are generated inside containers and exported to configured storage via TES outputs. The encrypted files (with `.c4gh` extension) are automatically decrypted during task execution using the proTES middleware. + +## Prerequisites + +Before starting, ensure you have: + +- **Three VMs**: + - Funnel server VM + - Funnel worker VM + - ProTES deployment VM +- [Docker](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository) installed on all VMs +- Network connectivity between all VMs +- Sufficient storage space for encrypted/decrypted files + +## Installation and Configuration + +### Step 1: Prepare Your VMs + +The setup requires three distinct components: + +- **Funnel Server**: Manages the database for storing task and scheduler data, and configures the compute backend +- **Funnel Worker**: Executes requested tasks and handles logging +- **ProTES Gateway**: Distributes tasks and provides middleware for automatic decryption + +#### Install Dependencies + +Run the following commands on both the Funnel server and worker VMs: + +```bash +sudo apt update +sudo apt install -y make golang-go protobuf-compiler + +# Install Go protocol buffer plugins +go install google.golang.org/protobuf/cmd/protoc-gen-go@latest +go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest +export PATH=$PATH:$(go env GOPATH)/bin + +# Clone and build Funnel +git clone https://github.com/ohsu-comp-bio/funnel.git +cd funnel +make +``` + +### Step 2: Configure Funnel Server + +Create a configuration file named `server-config.yaml` in the cloned Funnel directory on your **server VM**: + +```yaml +Server: + HostName: 0.0.0.0 + HTTPPort: "8000" + RPCPort: "9090" + +Database: boltdb + +BoltDB: + Path: 
./funnel-work-dir/funnel.db
+
+Compute: "manual"
+
+Scheduler:
+  ScheduleRate: 1s
+  ScheduleChunk: 10
+
+LocalStorage:
+  AllowedDirs:
+    - /tmp/funnel-storage
+```
+
+**Configuration Details:**
+
+- `HostName: 0.0.0.0`: Binds to all network interfaces
+- `HTTPPort: "8000"`: HTTP API port
+- `RPCPort: "9090"`: RPC communication port
+- `Database: boltdb`: Uses embedded BoltDB for task storage
+- `Compute: "manual"`: Manual node management mode
+- `LocalStorage.AllowedDirs`: Directories accessible for file I/O operations
+
+### Step 3: Configure Funnel Worker
+
+Create a configuration file named `worker-config.yaml` in the cloned Funnel directory on your **worker VM**:
+
+```yaml
+Server:
+  HostName: XXX # Replace with your Funnel server IP
+  RPCPort: "9090"
+
+RPCClient:
+  ServerAddress: XXX:9090 # Replace with your Funnel server IP
+
+Worker:
+  WorkDir: "/tmp/funnel-work"
+
+Node:
+  ID: "worker-node-1"
+  Resources:
+    Cpus: 4
+    RamGb: 7.0
+    DiskGb: 18.0
+  UpdateRate: 5s
+
+LocalStorage:
+  AllowedDirs:
+    - /tmp/funnel-storage
+```
+
+**Important:** Replace `XXX` with the actual IP address of your Funnel server VM (use the internal address if both VMs are on the same network).
+
+**Configuration Details:**
+
+- `ServerAddress`: Points to your Funnel server's RPC endpoint
+- `Node.ID`: Unique identifier for this worker node
+- `Node.Resources`: Define available CPU, RAM, and disk resources
+- `UpdateRate`: How frequently the worker reports its status
+
+### Step 4: Start Funnel Services
+
+Start the services on their respective VMs:
+
+**On the server VM:**
+
+```bash
+cd funnel
+funnel server run --config server-config.yaml &
+```
+
+**On the worker VM:**
+
+```bash
+cd funnel
+funnel node run --config worker-config.yaml &
+```
+
+Verify that both services are running by checking the logs or accessing the Funnel server API at `http://<funnel-server-ip>:8000`.
+
+### Step 5: Configure ProTES
+
+ProTES acts as a gateway and provides middleware for automatic Crypt4GH decryption. 
Follow the [proTES](https://github.com/elixir-cloud-aai/proTES) installation guide to deploy proTES on your third VM. + +Once installed, configure the Crypt4GH middleware by editing the `pro_tes/config.yaml` file: + +```yaml +middlewares: + - - "pro_tes.plugins.middlewares.crypt4gh_decrypt.CryptMiddleware" + - "pro_tes.plugins.middlewares.task_distribution.random.TaskDistributionRandom" +``` + +**Middleware Configuration:** + +- `CryptMiddleware`: Automatically detects and decrypts `.c4gh` files during task execution + +For detailed middleware installation, refer to the [protes-middleware-crypt4gh](https://github.com/elixir-cloud-aai/protes-middleware-crypt4gh). + +## Usage Examples + +The following examples demonstrate the complete encryption/decryption workflow using three sequential tasks. + +### Task 1: Generate Crypt4GH Key Pairs + +This task generates cryptographic key pairs for both the sender and recipient parties. This step is independent of the following steps and may have happened a while ago. Your private keys may already be in a secure place. If you have crypt4gh keys, feel free to skip this step. 
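If you are reusing existing keys and skipping Task 1, a quick local sanity check can catch malformed key files before they reach the TES tasks. The sketch below is illustrative only (the `check_public_key` helper is not part of any tool mentioned in this guide); it assumes the standard armored format that `crypt4gh-keygen` writes for public keys:

```python
import base64


def check_public_key(pem_text: str) -> bytes:
    """Validate an armored Crypt4GH public key; return the raw 32-byte key."""
    lines = [ln.strip() for ln in pem_text.strip().splitlines()]
    if (len(lines) < 3
            or lines[0] != "-----BEGIN CRYPT4GH PUBLIC KEY-----"
            or lines[-1] != "-----END CRYPT4GH PUBLIC KEY-----"):
        raise ValueError("not an armored Crypt4GH public key")
    raw = base64.b64decode("".join(lines[1:-1]))
    if len(raw) != 32:  # X25519 public keys are exactly 32 bytes
        raise ValueError(f"expected 32 key bytes, got {len(raw)}")
    return raw


if __name__ == "__main__":
    # Synthetic demonstration key (NOT a real key).
    demo = (
        "-----BEGIN CRYPT4GH PUBLIC KEY-----\n"
        + base64.b64encode(b"\x01" * 32).decode()
        + "\n-----END CRYPT4GH PUBLIC KEY-----\n"
    )
    print(len(check_public_key(demo)))  # 32
```

Private (secret) keys use a different armored header and, when generated with `--nocrypt`, contain an unencrypted key, so apply strict file permissions to them regardless of any such check.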
+ +Create a file named `task1_keygen.json`: + +```json +{ + "name": "Generate crypt4gh key pairs", + "description": "Generate sender and recipient key pairs locally in container", + "inputs": [], + "outputs": [ + { + "name": "sender_sk", + "description": "Sender secret key", + "url": "file:///tmp/funnel-storage/keys/sender/sender.sec", + "path": "/outputs/keys/sender/sender.sec", + "type": "FILE" + }, + { + "name": "sender_pk", + "description": "Sender public key", + "url": "file:///tmp/funnel-storage/keys/sender/sender.pub", + "path": "/outputs/keys/sender/sender.pub", + "type": "FILE" + }, + { + "name": "recipient_sk", + "description": "Recipient secret key", + "url": "file:///tmp/funnel-storage/keys/recipient/recipient.sec", + "path": "/outputs/keys/recipient/recipient.sec", + "type": "FILE" + }, + { + "name": "recipient_pk", + "description": "Recipient public key", + "url": "file:///tmp/funnel-storage/keys/recipient/recipient.pub", + "path": "/outputs/keys/recipient/recipient.pub", + "type": "FILE" + }, + { + "name": "recipient_pk_copy", + "description": "Copy of recipient public key", + "url": "file:///tmp/funnel-storage/keys/sender/recipient.pub", + "path": "/outputs/keys/sender/recipient.pub", + "type": "FILE" + } + ], + "executors": [ + { + "image": "quay.io/grbot/crypt4gh-tutorial", + "command": [ + "/bin/bash", + "-c", + "crypt4gh-keygen --sk /outputs/keys/sender/sender.sec --pk /outputs/keys/sender/sender.pub -f --nocrypt && crypt4gh-keygen --sk /outputs/keys/recipient/recipient.sec --pk /outputs/keys/recipient/recipient.pub -f --nocrypt && cp /outputs/keys/recipient/recipient.pub /outputs/keys/sender/recipient.pub" + ], + "workdir": "/tmp" + } + ], + "resources": { + "cpu_cores": 1, + "ram_gb": 2, + "disk_gb": 5 + } +} +``` + +**Key Details:** + +- Generates two key pairs: one for the sender and one for the recipient +- Keys are generated without encryption (`--nocrypt`) for demonstration purposes +- The recipient's public key is copied to the sender's 
directory for use in encryption +- All keys are exported to local storage via TES outputs + +### Task 2: Encrypt a File + +This task downloads a file, encrypts it using Crypt4GH, and stores both the encrypted file and metadata. Create a file named `task2_encrypt_file.json`: + +```json +{ + "name": "Encrypt file with crypt4gh", + "description": "Download a file, record its size, and encrypt it locally using sender and recipient keys", + "inputs": [ + { + "name": "sender_sk", + "description": "Sender secret key", + "url": "file:///tmp/funnel-storage/keys/sender/sender.sec", + "path": "/inputs/keys/sender/sender.sec", + "type": "FILE" + }, + { + "name": "recipient_pk", + "description": "Recipient public key", + "url": "file:///tmp/funnel-storage/keys/recipient/recipient.pub", + "path": "/inputs/keys/recipient/recipient.pub", + "type": "FILE" + } + ], + "outputs": [ + { + "name": "encrypted_file", + "description": "Encrypted file", + "url": "file:///tmp/funnel-storage/encrypted/united_kingdom_logo_size.txt.c4gh", + "path": "/outputs/encrypted/united_kingdom_logo_size.txt.c4gh", + "type": "FILE" + }, + { + "name": "size_file", + "description": "Text file containing original file size", + "url": "file:///tmp/funnel-storage/raw/united_kingdom_logo_size.txt", + "path": "/outputs/raw/united_kingdom_logo_size.txt", + "type": "FILE" + } + ], + "executors": [ + { + "image": "quay.io/grbot/crypt4gh-tutorial", + "command": [ + "/bin/bash", + "-c", + "curl -L -o /tmp/file.png http://britishfamily.co.uk/wp-content/uploads/2015/02/MADE_IN_BRITAIN_web_300x300.png && stat -c %s /tmp/file.png > /outputs/raw/united_kingdom_logo_size.txt && crypt4gh encrypt --sk /inputs/keys/sender/sender.sec --recipient_pk /inputs/keys/recipient/recipient.pub < /outputs/raw/united_kingdom_logo_size.txt > /outputs/encrypted/united_kingdom_logo_size.txt.c4gh" + ], + "workdir": "/tmp" + } + ], + "resources": { + "cpu_cores": 1, + "ram_gb": 2, + "disk_gb": 10 + } +} +``` + +**Key Details:** + +- Takes the 
sender's private key and recipient's public key as inputs +- Downloads a sample file from a URL +- Records the original file size for verification +- Encrypts the file using Crypt4GH, producing a `.c4gh` encrypted file +- Stores both the encrypted file and size metadata + +### Task 3: Decrypt and Process File + +This task decrypts the encrypted file using the recipient's private key and processes it. + +**Note:** The different paths indicate isolated storage paths that do not necessarily see each other. For example, distinct S3 buckets. + +Create a file named `task3_decrypt_and_write_size.json`: + +```json +{ + "name": "Decrypt crypt4gh file", + "description": "Decrypt an encrypted file using recipient key locally", + "volumes": ["/outputs/test"], + "inputs": [ + { + "name": "encrypted_file", + "description": "Encrypted input file", + "url": "file:///tmp/funnel-storage/encrypted/united_kingdom_logo_size.txt.c4gh", + "path": "/inputs/encrypted/united_kingdom_logo_size.txt.c4gh", + "type": "FILE" + }, + { + "name": "recipient_sk", + "description": "Recipient secret key", + "url": "file:///tmp/funnel-storage/keys/recipient/recipient.sec", + "path": "/inputs/keys/recipient/recipient.sec", + "type": "FILE" + } + ], + "outputs": [ + { + "name": "decrypted_file", + "description": "Decrypted size text file", + "url": "file:///tmp/funnel-storage/decrypted/united_kingdom_logo_md5sum.txt", + "path": "/outputs/decrypted/united_kingdom_logo_md5sum.txt", + "type": "FILE" + } + ], + "executors": [ + { + "image": "quay.io/grbot/crypt4gh-tutorial", + "command": [ + "/bin/sh", + "-c", + "mkdir -p /outputs/decrypted && /bin/md5sum /outputs/decrypted/united_kingdom_logo_size.txt > /outputs/decrypted/united_kingdom_logo_md5sum.txt" + ], + "workdir": "/tmp" + } + ], + "resources": { + "cpu_cores": 1, + "ram_gb": 2, + "disk_gb": 5 + } +} +``` + +**Key Details:** + +- Takes the encrypted `.c4gh` file and recipient's private key as inputs +- The proTES middleware automatically decrypts the 
file during task execution
+- Computes an MD5 checksum of the decrypted data for verification
+- Stores the checksum in the output directory
+
+## Submitting Tasks
+
+Once your environment is configured, submit tasks to proTES using the following commands:
+
+```bash
+# Submit Task 1: Generate keys
+curl -X POST http://localhost:8080/ga4gh/tes/v1/tasks \
+  -H "Content-Type: application/json" \
+  -d @task1_keygen.json
+
+# Submit Task 2: Encrypt file (wait for Task 1 to complete)
+curl -X POST http://localhost:8080/ga4gh/tes/v1/tasks \
+  -H "Content-Type: application/json" \
+  -d @task2_encrypt_file.json
+
+# Submit Task 3: Decrypt file (wait for Task 2 to complete)
+curl -X POST http://localhost:8080/ga4gh/tes/v1/tasks \
+  -H "Content-Type: application/json" \
+  -d @task3_decrypt_and_write_size.json
+```
+
+**Important:** Replace `localhost:8080` with your proTES server address if it's running on a different machine.
+
+Each task submission returns a task ID that you can use to monitor progress:
+
+```bash
+curl http://localhost:8080/ga4gh/tes/v1/tasks/<task-id>
+```
+
+## How It Works
+
+1. **Task Submission**: Tasks are submitted to proTES via the GA4GH TES API
+2. **Task Distribution**: ProTES distributes tasks to available Funnel TES endpoints
+3. **Automatic Decryption**: The Crypt4GH middleware automatically detects `.c4gh` files and injects a decryption step
+4. **Container Execution**: Funnel worker nodes execute tasks in isolated containers
+5. **Result Storage**: All results are stored in the configured local storage directory (`/tmp/funnel-storage`)
+6. **Data Persistence**: Task and scheduler metadata is stored in the Funnel BoltDB database
+
+## Troubleshooting
+
+### Common Issues
+
+**Tasks not executing:**
+- Verify that the Funnel server and worker are running
+- Check network connectivity between the VMs
+
+**Decryption failures:**
+- Verify that the Crypt4GH middleware is properly configured in proTES; inspect the proTES `docker logs` during task submission
+- Ensure the `.c4gh` file extension is present on encrypted files
+
+### Checking Logs
+
+View Funnel server logs:
+```bash
+ps aux | grep funnel
+# Find the process and check its output
+```
+
+View task details:
+```bash
+curl http://localhost:8000/v1/tasks/<task-id>
+```
+
+## Additional Resources
+
+- [Funnel Documentation](https://ohsu-comp-bio.github.io/funnel/)
+- [proTES Documentation](https://github.com/elixir-cloud-aai/proTES)
+- [Crypt4GH Specification](https://crypt4gh.readthedocs.io/)
+- [GA4GH TES API](https://github.com/ga4gh/task-execution-schemas)
+- [Crypt4GH Middleware](https://github.com/elixir-cloud-aai/protes-middleware-crypt4gh)
+
+## Security Best Practices
+
+- **Never commit private keys** to version control
+- Use **encrypted storage** for private keys in production
+- Implement **access controls** on storage directories
+- Use **signed URLs** or secure key management systems for key distribution
+- Enable **TLS/SSL** for all API endpoints in production
diff --git a/docs/guides/guide-admin/sensitive_data_analysis.md b/docs/guides/guide-admin/sensitive_data_analysis.md
new file mode 100644
index 0000000..9f3a273
--- /dev/null
+++ b/docs/guides/guide-admin/sensitive_data_analysis.md
@@ -0,0 +1,114 @@
+# Analysis of Sensitive Data in Secure Processing Environments (SPE)
+
+This tutorial presents the implementation of an SPE in the de.NBI Cloud (ELIXIR-DE) using ELIXIR and open-source services.
+
+The aim of this tutorial is to describe how to deploy and configure a Secure Processing Environment (SPE) for analyzing large volumes of sensitive data generated by biomedical and clinical research. Easy and secure access to such environments accelerates research and enables participation by researchers with limited resources.
+
+Users of an SPE can run workflows on sensitive data without ever gaining access to the actual data. The data is processed securely, and the user can only access the results of the workflows.
+
## Overview
+The setup has three central components:
+
+- Secure Execution Backend
+- External Storage (S3) for result deposition
+- User Authentication (LS Login)
+
+The **execution backend** consists of two independent systems. The execution of workflows is managed by [WESkit](https://gitlab.com/one-touch-pipeline/weskit). It provides a REST interface to submit workflow runs and monitor progress. The actual execution of the workflow scripts takes place in a Slurm cluster. All sensitive data is stored and processed within the cluster. This tutorial assumes a Slurm cluster hosted in the [de.NBI Cloud](https://www.denbi.de/cloud).
+
+The **results** are stored in MinIO/S3-compatible storage that can be accessed by authorized users. [Life Science Login](https://lifescience-ri.eu/ls-login.html) is used to authenticate users to a registered service that allows them to request workflow executions and inspect results in external storage. Users can therefore read only the non-sensitive information produced by a workflow run; sensitive data is never accessible.
+
+## Setup
+
+### Required infrastructure
+- 1 VM for WESkit deployment
+- 1-2 VMs for a Slurm cluster (more, depending on the workload)
+- 1 VM for S3 storage
+
+### Authorization
+Data processing is permitted only for authorized users. LS-Login can be used to [register a service/client](https://docs.google.com/document/d/17pNXM_psYOP5rWF302ObAJACsfYnEWhjvxAHzcjvfIE/edit?tab=t.0#heading=h.suudoy1bqtvm). The provided client credentials can be used by your service to obtain an access token. Potential users need to request authorization to use the service.
+
+### Execution
+WESkit allows execution of [Snakemake](https://snakemake.readthedocs.io/en/v7.32.3/) and [Nextflow](https://www.nextflow.io/docs/latest/) workflows by sending a request to the compute infrastructure (Cloud/Cluster). Find details in the [WESkit docs](https://gitlab.com/one-touch-pipeline/weskit/documentation).
+
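For orientation, a run submission to WESkit's GA4GH WES interface can be sketched as follows. This is a hedged illustration, not WESkit's documented client code: the base URL is a placeholder, and the engine identifiers (`"SMK"` for Snakemake, `"NFL"` for Nextflow) are assumptions that should be checked against your WESkit deployment's documentation:

```python
"""Sketch: submitting a workflow run to a GA4GH WES endpoint such as WESkit.

Assumptions (verify against your deployment): base URL, engine identifier
strings, and that a valid OAuth2 access token is required once OIDC is enabled.
"""
import json
import urllib.parse
import urllib.request

WES_BASE = "https://weskit.example.org/ga4gh/wes/v1"  # placeholder URL


def build_run_request(workflow_url: str, params: dict) -> dict:
    """Assemble the form fields for the WES POST /runs endpoint."""
    return {
        "workflow_url": workflow_url,
        "workflow_type": "SMK",              # assumed Snakemake identifier
        "workflow_type_version": "7.32.3",   # matches the Snakemake docs linked above
        "workflow_params": json.dumps(params),
    }


def submit_run(fields: dict, token: str) -> str:
    """POST the run request; return the run_id from the WES response."""
    req = urllib.request.Request(
        f"{WES_BASE}/runs",
        data=urllib.parse.urlencode(fields).encode(),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["run_id"]


if __name__ == "__main__":
    # Build (but do not send) a request for a hypothetical workflow.
    fields = build_run_request("Snakefile", {"sample": "patient_01"})
    print(sorted(fields))
```

The returned `run_id` can then be polled via `GET /runs/<run_id>/status`, mirroring how researchers monitor progress through the WESkit GUI described below.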
A Slurm cluster can be deployed with little effort using [BiBiGrid](https://cloud.denbi.de/wiki/Tutorials/BiBiGrid/), a framework for creating and managing cloud clusters. BiBiGrid uses Ansible to configure cloud images and set up an on-demand Slurm cluster. Alternatively, use any other Slurm deployment.
+
+Access to the SPE must be restricted to comply with national regulations and data protection laws. Collaborators and foreign researchers need to obtain permission from the Identity Provider to use the SPE. This permission allows them to authenticate at the Identity Provider site and request workflow execution via WESkit on the Slurm cluster.
+
+### Results
+Finally, results are stored in storage that is mounted into the cluster and exposed through an interface that is only accessible via LS-Login. Sensitive data is neither managed by WESkit nor accessible in the result storage.
+
+## Step 1: WESkit
+
+The SPE uses WESkit to execute workflows on the sensitive data. WESkit must therefore be installed on a machine that is reachable from the internet and has outbound internet access. This machine could be hosted by an institute compute center or by a cloud provider.
+
+The deployment of WESkit involves the following steps:
+
+1. **Install WESkit:** Simple deployment [using Docker](https://gitlab.com/one-touch-pipeline/weskit/documentation/-/blob/master/admin/README.md).
+2. **Set up compute environment:** WESkit must be configured according to the [compute environment](https://gitlab.com/one-touch-pipeline/weskit/documentation/-/blob/master/admin/executor.md).
+3. **Provide workflows:** In this scenario, a data controller has to validate and provide every workflow on the compute environment. Only then are they available to the researchers. WESkit provides instructions for [workflow installation](https://gitlab.com/one-touch-pipeline/weskit/documentation/-/blob/master/admin/workflow-installation.md). Workflows are Snakemake or Nextflow scripts, along with all dependencies and additional data.
+4. **Configure workflow engine:** Define workflow [engine parameters](https://gitlab.com/one-touch-pipeline/weskit/documentation/-/blob/master/admin/executor.md).
+5. **Provide data:** The workflows are executed on sensitive data within the compute environment. Therefore, the data should be available in the file system of the compute environment (e.g. Slurm).
+6. **Publish web service:** We assume that the service will be available online. This requires configuration on the provider side.
+
+## Step 2: MinIO
+
+The SPE uses MinIO/S3 to provide researchers access to non-sensitive results data. Depending on the environment, there are several options for how to [deploy MinIO](https://github.com/minio/minio?tab=readme-ov-file). To configure OpenID, please refer to the [MinIO OIDC Documentation](https://min.io/docs/minio/linux/operations/external-iam/configure-openid-external-identity-management.html).
+
+In this scenario, we create a bucket "results" in MinIO and grant all authorized users read access to the results data.
+
+Note: MinIO as a storage provider has removed its open-source license; it might therefore be advisable to switch to a different storage solution. Refer to [legacy binary releases](https://github.com/minio/minio?tab=readme-ov-file#legacy-binary-releases) for the last open-source release.
+
+### Results crawler
+
+To make the non-sensitive results available, a crawler continuously checks for new results and copies them to MinIO. This can be implemented as a shell script running as a cron job.
+
+A simple example script is given below:
+
+```bash
+mc config host add local http://localhost:9000 USERNAME PASSWORD
+
+BASE_DIR=/minio_data/data
+
+process_directory() {
+    local dir="$1"
+    local bucketname
+    bucketname=$(basename "$dir")
+    # Upload each run directory only once: an upload_token file marks finished uploads.
+    if [[ ! -f "$dir/upload_token" ]]; then
+        if [ -f "$dir/plots/quals.svg" ]; then
+            mc mb "local/results/$bucketname"
+            mc cp "$dir/results.csv" "local/results/$bucketname"
+        fi
+        touch "$dir/upload_token"
+    fi
+}
+
+for dir in "$BASE_DIR"/*/*/; do
+    for logsdir in "$dir".weskit/*/; do
+        if [ -d "$logsdir" ]; then
+            if [ -f "$logsdir/log.json" ]; then
+                process_directory "$dir"
+            fi
+        fi
+    done
+done
+```
+
+This script regularly checks the WESkit results folder. WESkit logs information about a workflow execution in the file `log.json` once the workflow execution has finished. The script checks whether the `log.json` file exists and, if so, uploads the result file `results.csv` to the S3 bucket. Uploaded run directories are tagged with an `upload_token` file to prevent redundant uploads.
+
+## Step 3: User Interface
+
+To offer a user interface for the SPE, the simplest way is to use a [customized version](https://gitlab.com/one-touch-pipeline/weskit/gui/-/tree/spe4hd_demo) of the WESkit GUI. It offers a lightweight web application that allows researchers to run and monitor workflows. The WESkit GUI repository can be used as a blueprint to create a customized website.
+
+## Step 4: Authentication and Authorization
+
+Authentication and authorization are implemented using OIDC. This setup uses the [LS-Login infrastructure](https://lifescience-ri.eu/ls-login/) for OIDC integration. The LS-Login documentation contains a [guide](https://lifescience-ri.eu/ls-login/documentation/service-provider-documentation/service-provider-documentation.html) on how to register a new service.
+
+In this tutorial, we assume a single LS-Login service for all the deployed tools (WESkit, MinIO, WebApp). This requires that the addresses of all three tools are registered as valid OIDC redirect URLs for that service.
+
+### LS-Login in MinIO
+
+LS-Login can be activated in MinIO either through the OIDC configuration in the MinIO console or by setting environment variables, as described in the MinIO [OIDC Documentation](https://min.io/docs/minio/linux/operations/external-iam/configure-openid-external-identity-management.html). There are detailed instructions in the [ELIXIR-Cloud-AAI documentation](https://elixir-cloud-aai.github.io/guides/guide-admin/services_to_ls_aai/) for using MinIO with LS-Login.
+
+### LS-Login in WESkit
+
+WESkit can be configured for OIDC. After enabling OIDC, WESkit requires OAuth2 tokens for each request. Please refer to the [WESkit documentation](https://gitlab.com/one-touch-pipeline/weskit/documentation/-/blob/master/admin/login.md) for configuration instructions.
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index 7a3f8cf..b6b403f 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -90,6 +90,8 @@ nav:
   - Administrators:
     - "guides/guide-admin/index.md"
     - "LS Login configuration": "guides/guide-admin/services_to_ls_aai.md"
+    - "Sensitive Data analysis in Secure Processing Environments (SPE)": "guides/guide-admin/sensitive_data_analysis.md"
+    - "Crypt4GH Middleware with proTES and Funnel": "guides/guide-admin/crypt4gh_to_protes.md"
   - Contributors:
     - "guides/guide-contributor/index.md"
     - "Workflow": "guides/guide-contributor/workflow.md"

From 040017d3179a8a7609b18b3e13056dbf Mon Sep 17 00:00:00 2001
From: Valentin Schneider-Lunitz
Date: Mon, 12 Jan 2026 12:41:55 +0000
Subject: [PATCH 2/3] docs(guides): extend Crypt4GH_proTES tutorial with use case example

---
 docs/guides/guide-admin/crypt4gh_to_protes.md | 23 +++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/docs/guides/guide-admin/crypt4gh_to_protes.md b/docs/guides/guide-admin/crypt4gh_to_protes.md
index e54f3a3..2cff90e 100644
--- a/docs/guides/guide-admin/crypt4gh_to_protes.md
+++ b/docs/guides/guide-admin/crypt4gh_to_protes.md
@@ -2,6 +2,23 @@
 
 This guide explains how to configure and deploy an environment that enables encryption and decryption of sensitive data files using TES/[Funnel](https://github.com/ohsu-comp-bio/funnel) with [proTES](https://github.com/elixir-cloud-aai/proTES) as a stable and scalable [GA4GH TES](https://github.com/ga4gh/task-execution-schemas) gateway.
 
+## Use Case
+
+Imagine you are a researcher who needs to analyse sensitive data in a cloud environment. You need to ensure:
+
+- **Your data is encrypted during transfer**: Your files are encrypted for transfer. Raw sensitive data remains located at your storage.
+- **Only authorized researcher can decrypt the data**: Data can only be decrypted with specific private keys. Data theft is useless without specific keys. 
+- **Automatic decryption**: Your setup does automatic decryption given `.c4gh` files and the correct private key. +- **Secure collaboration**: Data exchange between collaborators is not restricted, as long as the correct key is available. + +This tutorial presents a solution where: + +1. A data provider encrypts sensitive data using Crypt4GH before uploading them to storage. +2. Encrypted data is sent to a `Task Execution Service (TES)` instance via `proTES` and a `proTES middleware` for processing. +3. A researcher (recipient) can process these files in a secure containerized environment where automatic decryption happens using the `proTES middleware`. + +This approach allows collaborative research where sensitive data can be processed in cloud environments while maintaining strict access controls and encryption throughout the data lifecycle. + ## Overview [Crypt4GH](https://crypt4gh.readthedocs.io/) is a standard for encrypting sensitive genomic data. This setup demonstrates: @@ -13,9 +30,11 @@ This guide explains how to configure and deploy an environment that enables encr **Security Note:** Private keys should be stored in secure locations and used only for decryption. Consider using signed URLs for transferring private keys to the TES instance. -## Workflow +**Goal of this tutorial:** You'll have a setup where you can submit encrypted files via task inputs, and they will be automatically decrypted and processed, ensuring that sensitive data remains protected. + +## Setup -The complete workflow consists of three main tasks: +The complete setup consists of three main tasks: 1. **Key Generation**: Generate Crypt4GH key pairs for the sender and recipient parties (optional). 2. **File Encryption**: Encrypt sensitive data using the generated keys. 
From 95e25c0bdffc6451c4b681fa57a500cf9db469e9 Mon Sep 17 00:00:00 2001 From: Valentin Schneider-Lunitz Date: Tue, 13 Jan 2026 14:36:25 +0000 Subject: [PATCH 3/3] docs(guides): improve Crypt4GH + proTES tutorial with a detailed use case --- docs/guides/guide-admin/crypt4gh_to_protes.md | 135 ++++++++---------- 1 file changed, 62 insertions(+), 73 deletions(-) diff --git a/docs/guides/guide-admin/crypt4gh_to_protes.md b/docs/guides/guide-admin/crypt4gh_to_protes.md index 2cff90e..ad12d5b 100644 --- a/docs/guides/guide-admin/crypt4gh_to_protes.md +++ b/docs/guides/guide-admin/crypt4gh_to_protes.md @@ -1,46 +1,44 @@ # Setting up Crypt4GH encryption/decryption in Funnel -This guide explains how to configure and deploy an environment that enables encryption and decryption of sensitive data files using TES/[Funnel](https://github.com/ohsu-comp-bio/funnel) with [proTES](https://github.com/elixir-cloud-aai/proTES) as a stable and scalable [GA4GH TES](https://github.com/ga4gh/task-execution-schemas) gateway. +This guide explains how to configure and deploy an environment that enables collaborative research on sensitive genomic data. Data holders can securely provide encrypted data for analysis while researchers process it through TES/[Funnel](https://github.com/ohsu-comp-bio/funnel) and [proTES](https://github.com/elixir-cloud-aai/proTES), where automatic decryption occurs within secure containers without granting researchers direct access to the sensitive data. This setup leverages [GA4GH TES](https://github.com/ga4gh/task-execution-schemas) standards for scalable and secure task execution. ## Use Case -Imagine you are a researcher who needs to analyse sensitive data in a cloud environment. You need to ensure: +A data holder needs to provide sensitive genomic data for analysis to researchers in a cloud environment. 
The data must remain encrypted during storage and transfer, with decryption occurring only within a secure computational environment (container), without granting direct data access to the researcher. -- **Your data is encrypted during transfer**: Your files are encrypted for transfer. Raw sensitive data remains located at your storage. -- **Only authorized researcher can decrypt the data**: Data can only be decrypted with specific private keys. Data theft is useless without specific keys. -- **Automatic decryption**: Your setup does automatic decryption given `.c4gh` files and the correct private key. -- **Secure collaboration**: Data exchange between collaborators is not restricted, as long as the correct key is available. +1. The data holder encrypts sensitive data using Crypt4GH and stores them at a secure storage (e.g. S3 buckets). +2. The researcher submits a GA4GH TES task to `proTES` for analysis of the encrypted data. +3. The installed `proTES middleware` automatically detects the encrypted data and decrypts them using Crypt4GH keys that are managed by `proTES`. +4. The researcher's task command is executed on the decrypted data. +5. The analysis results are stored at a dedicated storage accessible to the researcher -This tutorial presents a solution where: +`Note` all computational steps are done in a secure containerized environment. -1. A data provider encrypts sensitive data using Crypt4GH before uploading them to storage. -2. Encrypted data is sent to a `Task Execution Service (TES)` instance via `proTES` and a `proTES middleware` for processing. -3. A researcher (recipient) can process these files in a secure containerized environment where automatic decryption happens using the `proTES middleware`. 
+This approach enables collaborative research in which sensitive data can be processed in cloud environments without granting the researcher direct data access, relying instead on a combination of `Crypt4GH` and `proTES` for data encryption, decryption, and analysis.
+Additionally, the researcher can repeat the analysis with adjusted parameters at any time without further action by the data holder.

-This approach allows collaborative research where sensitive data can be processed in cloud environments while maintaining strict access controls and encryption throughout the data lifecycle.

## Overview

[Crypt4GH](https://crypt4gh.readthedocs.io/) is a standard for encrypting sensitive genomic data. This setup demonstrates:

-- Generating cryptographic key pairs for data exchange between parties (sender and recipient)
-- Encrypting files using the sender's private key and recipient's public key
+- Generating cryptographic key pairs for data exchange between parties (data holder and researcher)
+- Encrypting files using the data holder's private key and researcher's public key
- Automatically decrypting `.c4gh` encrypted files during task execution using [protes-middleware-crypt4gh](https://github.com/elixir-cloud-aai/protes-middleware-crypt4gh)
- Securely processing sensitive data in containerized environments

-**Security Note:** Private keys should be stored in secure locations and used only for decryption. Consider using signed URLs for transferring private keys to the TES instance.
+**Security Note:** Private keys should be stored in secure locations and used only for encryption/decryption. Consider using signed URLs for transferring private keys to the TES instance.

-**Goal of this tutorial:** You'll have a setup where you can submit encrypted files via task inputs, and they will be automatically decrypted and processed, ensuring that sensitive data remains protected.
+**Goal of this tutorial:** You'll have a setup that encrypts sensitive data, stores it in secure storage, automatically detects encrypted inputs, and decrypts and processes them, ensuring that sensitive data remains protected.

## Setup

The complete setup consists of three main tasks:

-1. **Key Generation**: Generate Crypt4GH key pairs for the sender and recipient parties (optional).
-2. **File Encryption**: Encrypt sensitive data using the generated keys.
-3. **File Decryption**: Decrypt and process encrypted files in a secure environment.
+1. **Key Generation**: Generate Crypt4GH key pairs for the data holder and researcher parties (optional).
+2. **File Encryption**: Encrypt sensitive data using the Crypt4GH keys.
+3. **File Decryption**: Automatically detect encrypted data, then decrypt and process it in a secure computing environment.

-All keys are generated inside containers and exported to configured storage via TES outputs. The encrypted files (with `.c4gh` extension) are automatically decrypted during task execution using the proTES middleware.

## Prerequisites

@@ -52,7 +50,7 @@ Before starting, ensure you have:
- ProTES deployment VM
- [Docker](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository) installed on all VMs
- Network connectivity between all VMs
-- Sufficient storage space for encrypted/decrypted files
+- Sufficient storage space for encrypted/decrypted files and results

## Installation and Configuration

@@ -200,49 +198,49 @@ The following examples demonstrate the complete encryption/decryption workflow u

### Task 1: Generate Crypt4GH Key Pairs

-This task generates cryptographic key pairs for both the sender and recipient parties. This step is independent of the following steps and may have happened a while ago. Your private keys may already be in a secure place. If you have crypt4gh keys, feel free to skip this step.
+This task generates cryptographic key pairs for both the data holder and the researcher. This step is independent of the following steps and may already have been completed. Your private keys may already be stored in a secure place. If you already have Crypt4GH keys, feel free to skip this step.

Create a file named `task1_keygen.json`:

```json
{
  "name": "Generate crypt4gh key pairs",
-  "description": "Generate sender and recipient key pairs locally in container",
+  "description": "Generate data holder and researcher key pairs locally in container",
  "inputs": [],
  "outputs": [
    {
-      "name": "sender_sk",
-      "description": "Sender secret key",
-      "url": "file:///tmp/funnel-storage/keys/sender/sender.sec",
-      "path": "/outputs/keys/sender/sender.sec",
+      "name": "data_holder_sk",
+      "description": "Data holder secret key",
+      "url": "file:///tmp/funnel-storage/keys/data_holder/data_holder.sec",
+      "path": "/outputs/keys/data_holder/data_holder.sec",
      "type": "FILE"
    },
    {
-      "name": "sender_pk",
-      "description": "Sender public key",
-      "url": "file:///tmp/funnel-storage/keys/sender/sender.pub",
-      "path": "/outputs/keys/sender/sender.pub",
+      "name": "data_holder_pk",
+      "description": "Data holder public key",
+      "url": "file:///tmp/funnel-storage/keys/data_holder/data_holder.pub",
+      "path": "/outputs/keys/data_holder/data_holder.pub",
      "type": "FILE"
    },
    {
-      "name": "recipient_sk",
-      "description": "Recipient secret key",
-      "url": "file:///tmp/funnel-storage/keys/recipient/recipient.sec",
-      "path": "/outputs/keys/recipient/recipient.sec",
+      "name": "researcher_sk",
+      "description": "Researcher secret key",
+      "url": "file:///tmp/funnel-storage/keys/researcher/researcher.sec",
+      "path": "/outputs/keys/researcher/researcher.sec",
      "type": "FILE"
    },
    {
-      "name": "recipient_pk",
-      "description": "Recipient public key",
-      "url": "file:///tmp/funnel-storage/keys/recipient/recipient.pub",
-      "path": "/outputs/keys/recipient/recipient.pub",
+      "name": "researcher_pk",
+      "description": "Researcher public key",
+      "url": 
"file:///tmp/funnel-storage/keys/researcher/researcher.pub", + "path": "/outputs/keys/researcher/researcher.pub", "type": "FILE" }, { - "name": "recipient_pk_copy", - "description": "Copy of recipient public key", - "url": "file:///tmp/funnel-storage/keys/sender/recipient.pub", - "path": "/outputs/keys/sender/recipient.pub", + "name": "researcher_pk_copy", + "description": "Copy of researcher public key", + "url": "file:///tmp/funnel-storage/keys/data_holder/researcher.pub", + "path": "/outputs/keys/data_holder/researcher.pub", "type": "FILE" } ], @@ -252,7 +250,7 @@ Create a file named `task1_keygen.json`: "command": [ "/bin/bash", "-c", - "crypt4gh-keygen --sk /outputs/keys/sender/sender.sec --pk /outputs/keys/sender/sender.pub -f --nocrypt && crypt4gh-keygen --sk /outputs/keys/recipient/recipient.sec --pk /outputs/keys/recipient/recipient.pub -f --nocrypt && cp /outputs/keys/recipient/recipient.pub /outputs/keys/sender/recipient.pub" + "crypt4gh-keygen --sk /outputs/keys/data_holder/data_holder.sec --pk /outputs/keys/data_holder/data_holder.pub -f --nocrypt && crypt4gh-keygen --sk /outputs/keys/researcher/researcher.sec --pk /outputs/keys/researcher/researcher.pub -f --nocrypt && cp /outputs/keys/researcher/researcher.pub /outputs/keys/data_holder/researcher.pub" ], "workdir": "/tmp" } @@ -267,32 +265,32 @@ Create a file named `task1_keygen.json`: **Key Details:** -- Generates two key pairs: one for the sender and one for the recipient +- Generates two key pairs: one for the data holder and one for the researcher - Keys are generated without encryption (`--nocrypt`) for demonstration purposes -- The recipient's public key is copied to the sender's directory for use in encryption +- The researcher's public key is copied to the data holder's directory for use in encryption - All keys are exported to local storage via TES outputs ### Task 2: Encrypt a File -This task downloads a file, encrypts it using Crypt4GH, and stores both the encrypted file and metadata. 
Create a file named `task2_encrypt_file.json`:
+This task retrieves a file, encrypts it using Crypt4GH, and stores both the encrypted file and metadata. Create a file named `task2_encrypt_file.json`:

```json
{
  "name": "Encrypt file with crypt4gh",
-  "description": "Download a file, record its size, and encrypt it locally using sender and recipient keys",
+  "description": "Retrieve a file, record its size, and encrypt it using data holder and researcher keys",
  "inputs": [
    {
-      "name": "sender_sk",
-      "description": "Sender secret key",
-      "url": "file:///tmp/funnel-storage/keys/sender/sender.sec",
-      "path": "/inputs/keys/sender/sender.sec",
+      "name": "data_holder_sk",
+      "description": "Data holder secret key",
+      "url": "file:///tmp/funnel-storage/keys/data_holder/data_holder.sec",
+      "path": "/inputs/keys/data_holder/data_holder.sec",
      "type": "FILE"
    },
    {
-      "name": "recipient_pk",
-      "description": "Recipient public key",
-      "url": "file:///tmp/funnel-storage/keys/recipient/recipient.pub",
-      "path": "/inputs/keys/recipient/recipient.pub",
+      "name": "researcher_pk",
+      "description": "Researcher public key",
+      "url": "file:///tmp/funnel-storage/keys/researcher/researcher.pub",
+      "path": "/inputs/keys/researcher/researcher.pub",
      "type": "FILE"
    }
  ],
@@ -318,7 +316,7 @@ This task downloads a file, encrypts it using Crypt4GH, and stores both the encr
  "command": [
    "/bin/bash",
    "-c",
-    "curl -L -o /tmp/file.png http://britishfamily.co.uk/wp-content/uploads/2015/02/MADE_IN_BRITAIN_web_300x300.png && stat -c %s /tmp/file.png > /outputs/raw/united_kingdom_logo_size.txt && crypt4gh encrypt --sk /inputs/keys/sender/sender.sec --recipient_pk /inputs/keys/recipient/recipient.pub < /outputs/raw/united_kingdom_logo_size.txt > /outputs/encrypted/united_kingdom_logo_size.txt.c4gh"
+    "curl -L -o /tmp/file.png http://britishfamily.co.uk/wp-content/uploads/2015/02/MADE_IN_BRITAIN_web_300x300.png && stat -c %s /tmp/file.png > /outputs/raw/united_kingdom_logo_size.txt && crypt4gh encrypt --sk 
/inputs/keys/data_holder/data_holder.sec --recipient_pk /inputs/keys/researcher/researcher.pub < /outputs/raw/united_kingdom_logo_size.txt > /outputs/encrypted/united_kingdom_logo_size.txt.c4gh"
  ],
  "workdir": "/tmp"
}
@@ -333,7 +331,7 @@ This task downloads a file, encrypts it using Crypt4GH, and stores both the encr

**Key Details:**

-- Takes the sender's private key and recipient's public key as inputs
+- Takes the data holder's private key and researcher's public key as inputs
- Downloads a sample file from a URL
- Records the original file size for verification
- Encrypts the file using Crypt4GH, producing a `.c4gh` encrypted file
@@ -341,7 +339,7 @@ This task downloads a file, encrypts it using Crypt4GH, and stores both the encr

### Task 3: Decrypt and Process File

-This task decrypts the encrypted file using the recipient's private key and processes it.
+This task decrypts the encrypted file using the researcher's private key and processes it.

**Note:** The different paths represent isolated storage locations that are not necessarily accessible to one another, for example, distinct S3 buckets.
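The automatic decryption performed by the middleware can be pictured as a small transformation of the submitted TES task: scan its inputs for `.c4gh` paths and prepend a decryption executor. The Python sketch below is purely illustrative — the function, container image name, and key path are hypothetical assumptions, not the actual protes-middleware-crypt4gh implementation:

```python
# Illustrative sketch only -- not the real protes-middleware-crypt4gh code.
# Idea: before the task reaches Funnel, scan its inputs for `.c4gh` files
# and prepend an executor that decrypts them in place.

def inject_decryption(task: dict, sk_path: str = "/keys/researcher.sec") -> dict:
    """Return a copy of `task` with a decryption step prepended for
    every input whose path ends in `.c4gh` (hypothetical helper)."""
    encrypted = [i for i in task.get("inputs", []) if i["path"].endswith(".c4gh")]
    if not encrypted:
        return task  # nothing to decrypt; task passes through unchanged

    # One `crypt4gh decrypt` call per encrypted input; the plaintext is
    # written next to the ciphertext with the `.c4gh` suffix stripped.
    commands = [
        f"crypt4gh decrypt --sk {sk_path} < {i['path']} > {i['path'][:-5]}"
        for i in encrypted
    ]
    decryptor = {
        "image": "example/crypt4gh:latest",  # hypothetical image name
        "command": ["/bin/bash", "-c", " && ".join(commands)],
    }
    return {**task, "executors": [decryptor] + task["executors"]}

# The researcher's original task only references the plaintext path:
task = {
    "inputs": [{"path": "/inputs/data.txt.c4gh"}],
    "executors": [{"image": "alpine", "command": ["md5sum", "/inputs/data.txt"]}],
}
patched = inject_decryption(task)
print(len(patched["executors"]))  # -> 2 (decryption step + researcher's command)
```

In the real deployment this rewriting happens transparently inside proTES, so the researcher's task JSON (as in Task 3 below) never has to mention the decryption step.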
@@ -350,7 +348,7 @@ Create a file named `task3_decrypt_and_write_size.json`:
```json
{
  "name": "Decrypt crypt4gh file",
-  "description": "Decrypt an encrypted file using recipient key locally",
+  "description": "Decrypt an encrypted file using the researcher key locally",
  "volumes": ["/outputs/test"],
  "inputs": [
    {
@@ -361,10 +359,10 @@ Create a file named `task3_decrypt_and_write_size.json`:
      "type": "FILE"
    },
    {
-      "name": "recipient_sk",
-      "description": "Recipient secret key",
-      "url": "file:///tmp/funnel-storage/keys/recipient/recipient.sec",
-      "path": "/inputs/keys/recipient/recipient.sec",
+      "name": "researcher_sk",
+      "description": "Researcher secret key",
+      "url": "file:///tmp/funnel-storage/keys/researcher/researcher.sec",
+      "path": "/inputs/keys/researcher/researcher.sec",
      "type": "FILE"
    }
  ],
@@ -398,7 +396,7 @@ Create a file named `task3_decrypt_and_write_size.json`:

**Key Details:**

-- Takes the encrypted `.c4gh` file and recipient's private key as inputs
+- Takes the encrypted `.c4gh` file and researcher's private key as inputs
- The proTES middleware automatically decrypts the file during task execution
- Computes an MD5 checksum of the decrypted data for verification
- Stores the checksum in the output directory
@@ -432,15 +430,6 @@ Each task submission returns a task ID that you can use to monitor progress:
curl http://localhost:8080/ga4gh/tes/v1/tasks/
```

-## How It Works
-
-1. **Task Submission**: Tasks are submitted to proTES via the GA4GH TES API
-2. **Task Distribution**: ProTES distributes tasks to available Funnel TES endpoints
-3. **Automatic Decryption**: The Crypt4GH middleware automatically detects `.c4gh` files and injects a decryption step
-4. **Container Execution**: Funnel worker nodes execute tasks in isolated containers
-5. **Result Storage**: All results are stored in the configured local storage directory (`/tmp/funnel-storage`)
-6. 
**Data Persistence**: Task and scheduler metadata is stored in the Funnel BoltDB database - ## Troubleshooting ### Common Issues