diff --git a/AWS/Submodule_2_annotation_only.ipynb b/AWS/Submodule_2_annotation_only.ipynb
index 603b3bc..4dc3893 100644
--- a/AWS/Submodule_2_annotation_only.ipynb
+++ b/AWS/Submodule_2_annotation_only.ipynb
@@ -174,36 +174,82 @@
   "id": "64228197",
   "metadata": {},
   "source": [
-    "### **Step 2:** AWS Batch Setup\n",
+    "## Get Started\n",
+    "### **Step 2:** Setting up AWS Batch\n",
    "\n",
-    "AWS Batch will create the needed permissions, roles and resources to run Nextflow in a serverless manner. You can set up AWS Batch manually or deploy it **automatically** with a stack template. The Launch Stack button below will take you to the cloud formation create stack webpage with the template with required resources already linked. \n",
+    "AWS Batch manages the provisioning of compute environments (EC2, Fargate), container orchestration, job queues, IAM roles, and permissions. We can deploy a full environment either:\n",
+    "- Automatically, using a preconfigured AWS CloudFormation stack (**recommended**)\n",
+    "- Manually, by setting up the roles, queues, and buckets yourself\n",
+    "\n",
+    "The **Launch Stack** button below takes you to the CloudFormation Create Stack page with the required template already linked. \n",
    "\n",
-    "If you prefer to skip manual deployment and deploy automatically in the cloud, click the Launch Stack button below. For a walkthrough of the screens during automatic deployment please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToLaunchAWSBatch.md). The deployment should take ~5 min and then the resources will be ready for use. \n",
+    "If you prefer to skip manual deployment and deploy automatically in the cloud, click the **Launch Stack** button below. For a walkthrough of the screens during automatic deployment, please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToLaunchAWSBatch.md). The deployment should take ~5 minutes, after which the resources will be ready for use. \n",
    "\n",
-    "[![Launch Stack](../images/LaunchStack.jpg)](https://console.aws.amazon.com/cloudformation/home?#/stacks/new?stackName=aws-batch-nigms&templateURL=https://nigms-sandbox.s3.us-east-1.amazonaws.com/cf-templates/AWSBatch_template.yaml)\n",
+    "[![Launch Stack](../images/LaunchStack.jpg)](https://console.aws.amazon.com/cloudformation/home?#/stacks/new?stackName=aws-batch-nigms&templateURL=https://nigms-sandbox.s3.us-east-1.amazonaws.com/cf-templates/AWSBatch_template.yaml )\n",
    "\n",
+    "### **Step 3:** Install dependencies, update paths, and create a new S3 bucket to store input and output files\n",
    "\n",
-    "Before beginning this tutorial, if you do not have required roles, policies, permissions or compute environment and would like to **manually** set those up please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/AWS-Batch-Setup.md) to set that up."
+    "After setting up an AWS CloudFormation stack, we need to let the Nextflow workflow know where those resources are by providing the configuration:\n",
+    "\n",
+    "<div class=\"alert alert-block alert-warning\">\n",
+    "<b>Important - Customize Required</b><br>\n",
+    "After successful creation of your stack, you must attach a new role to SageMaker so that it can submit Batch jobs. Please follow these steps to change your SageMaker role (a quick check to confirm the new role is active is sketched below):\n",
+    "<pre>\n",
+    "  1. Navigate to your SageMaker AI notebook dashboard (where you initially created and launched your VM)\n",
+    "  2. Locate your instance and click the <b>Stop</b> button\n",
+    "  3. Once the instance is stopped:\n",
+    "    • Click <b>Edit</b>\n",
+    "    • Scroll to the \"Permissions and encryption\" section\n",
+    "    • Click the IAM role dropdown\n",
+    "    • Select the new role created during stack formation (named something like aws-batch-nigms-SageMakerExecutionRole)\n",
+    "  4. Click <b>Update notebook instance</b> to save your changes\n",
+    "  5. After the update completes:\n",
+    "    • Click <b>Start</b> to relaunch your instance\n",
+    "    • Reconnect to your instance\n",
+    "    • Resume your work from this point\n",
+    "</pre>\n",
+    "\n",
+    "<b>Warning:</b> Make sure to replace the stack name with the name of the stack you just created: STACK_NAME = \"your-stack-name-here\"\n",
+    "</div>"
" ] }, { - "cell_type": "markdown", - "id": "4506a617", + "cell_type": "code", + "execution_count": null, + "id": "e6d78aa5", + "metadata": {}, + "outputs": [], + "source": [ + "# define a stack name variable\n", + "STACK_NAME = \"aws-batch-nigms-test1\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fc344828", "metadata": {}, + "outputs": [], "source": [ - "#### Change the parameters as desired in `aws` profile inside `../denovotrascript/nextflow.config` file:\n", - " - Name of your **AWS Batch Job Queue**\n", - " - AWS region \n", - " - Nextflow work directory\n", - " - Nextflow output directory" + "import boto3\n", + "# Get account ID and region \n", + "account_id = boto3.client('sts').get_caller_identity().get('Account')\n", + "region = boto3.session.Session().region_name" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6c908d53", + "metadata": {}, + "outputs": [], + "source": [ + "# Set variable names \n", + "# These variables should come from the Intro AWS Batch tutorial (or leave as-is if using the launch stack button)\n", + "BUCKET_NAME = f\"{STACK_NAME}-batch-bucket-{account_id}\"\n", + "AWS_QUEUE = f\"{STACK_NAME}-JobQueue\"\n", + "INPUT_FOLDER = 'nigms-sandbox/nosi-inbremaine-storage/'\n", + "AWS_REGION = region" ] }, { "cell_type": "markdown", - "id": "abdb13bb", + "id": "596667bd", "metadata": {}, "source": [ - "### **Step 3:** Install Nextflow" + "#### Install dependencies\n", + "Installs Nextflow and Java, which are required to execute the pipeline. In environments like SageMaker, Java is usually pre-installed. But if you're running outside SageMaker (e.g., EC2 or local), you’ll need to manually install it." ] }, { @@ -213,20 +259,45 @@ "metadata": {}, "outputs": [], "source": [ - "%%capture\n", - "! mamba create -n nextflow -c bioconda nextflow -y\n", - "! mamba install -n nextflow ipykernel -y" + "# Install Nextflow\n", + "! mamba install -y -c conda-forge -c bioconda nextflow --quiet" ] }, { "cell_type": "markdown", - "id": "096b76d5", + "id": "9e08a0d5", "metadata": {}, "source": [ - "
\n", - " \n", - " Alert: Remember to change your kernel to conda_nextflow to run nextflow.\n", - "
" + "
\n", + "Install Java and Nextflow if needed in other systems\n", + "If using other system other than AWS SageMaker Notebook, you might need to install java and nextflow using the code below:\n", + "
# Install java
\n",
+    "    sudo apt update\n",
+    "    sudo apt-get install default-jdk -y\n",
+    "    java -version\n",
+    "    
\n", + " # Install Nextflow
\n",
+    "    curl https://get.nextflow.io | bash\n",
+    "    chmod +x nextflow\n",
+    "    ./nextflow self-update\n",
+    "    ./nextflow plugin update\n",
+    "    
\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c46757a3", + "metadata": {}, + "outputs": [], + "source": [ + "# replace batch bucket name in nextflow configuration file\n", + "! sed -i \"s/aws-batch-nigms-batch-bucket-/$BUCKET_NAME/g\" /home/ec2-user/SageMaker/Transcriptome-Assembly-Refinement-and-Applications/denovotranscript/nextflow.config\n", + "# replace job queue name in configuration file \n", + "! sed -i \"s/aws-batch-nigms-JobQueue/$AWS_QUEUE/g\" /home/ec2-user/SageMaker/Transcriptome-Assembly-Refinement-and-Applications/denovotranscipt/nextflow.config\n", + "# replace the region placeholder with the region you are in \n", + "! sed -i \"s/aws-region/$AWS_REGION/g\" /home/ec2-user/SageMaker/Transcriptome-Assembly-Refinement-and-Applications/denovotranscipt/nextflow.config" ] }, { @@ -234,7 +305,7 @@ "id": "de3d1b9b", "metadata": {}, "source": [ - "### **Step 4:** Run `denovotranscript`" + "### **Step 4:** Enable AWS Batch for the nextflow script `denovotranscript`" ] }, { @@ -242,6 +313,11 @@ "id": "8e1541b9-abb6-47c0-aa49-5c1720680376", "metadata": {}, "source": [ + "Run the pipeline in a cloud-native, serverless manner using AWS Batch. AWS Batch offloads the burden of provisioning and managing compute resources. When you execute this command:\n", + "- Nextflow uploads tasks to AWS Batch. \n", + "- AWS Batch pulls the necessary containers.\n", + "- Each process/task in the pipeline runs as an isolated job in the cloud.\n", + "\n", "Now we can run `denovotranscript` using the option `annotation_only` run-mode which assumes that the transcriptome has been generated, and will only run the various steps for annotation of the transcripts.\n", "\n", ">This run should take about **5 minutes**" @@ -254,8 +330,12 @@ "metadata": {}, "outputs": [], "source": [ - "! nextflow run ../denovotranscript/main.nf --input ../denovotranscript/test_samplesheet_aws.csv -profile aws \\\n", - "--run_mode annotation_only --transcript_fasta s3://nigms-sandbox/nosi-inbremaine-storage/resources/trans/Oncorhynchus_mykiss_GGBN01.1.fa" + "! nextflow run /home/ec2-user/SageMaker/Transcriptome-Assembly-Refinement-and-Applications/denovotranscript/main.nf \\\n", + " --input /home/ec2-user/SageMaker/Transcriptome-Assembly-Refinement-and-Applications/denovotranscript/test_samplesheet_aws.csv \\\n", + " -profile docker,awsbatch \\\n", + " -c /home/ec2-user/SageMaker/Transcriptome-Assembly-Refinement-and-Applications/denovotranscript/nextflow.config \\\n", + " --run_mode annotation_only \\\n", + " --transcript_fasta s3://nigms-sandbox/nosi-inbremaine-storage/resources/trans/Oncorhynchus_mykiss_GGBN01.1.fa --awsqueue $AWS_QUEUE --awsregion $AWS_REGION" ] }, { @@ -263,7 +343,16 @@ "id": "8a0f8dfb-366d-4e0f-af4e-d96f6ee97d34", "metadata": {}, "source": [ - "The output will be arranged in a directory structure in your Amazon S3 bucket. We will download it into our local directory:" + "The output will be arranged in a directory structure in your Amazon S3 bucket. We will download it into our local directory:\n", + "
\n", + "
\n", + " Important \n", + "
\n", + "

\n", + "\n", + " Update \\ to your local annotation only folder.
\n", + "

\n", + "
\n" ] }, { @@ -274,7 +363,7 @@ "outputs": [], "source": [ "! mkdir -p \n", - "! aws s3 cp --recursive s3://// ./" + "! aws s3 cp --recursive s3://$BUCKET_NAME/nextflow_output/ ./" ] }, { @@ -287,15 +376,6 @@ "! ls -l ./" ] }, - { - "cell_type": "markdown", - "id": "1b3ac17d", - "metadata": {}, - "source": [ - "----\n", - "# Andrea, please update this part" - ] - }, { "cell_type": "markdown", "id": "337b1049", @@ -314,14 +394,6 @@ "! cat ./onlyAnnRun/output/RUN_INFO.txt" ] }, - { - "cell_type": "markdown", - "id": "df312985", - "metadata": {}, - "source": [ - "---" - ] - }, { "cell_type": "markdown", "id": "4187a790-276c-4bf2-8ce8-2f7985e8c662", @@ -597,7 +669,38 @@ "source": [ "## Conclusion\n", "\n", - "This notebook provided a comprehensive hands-on experience in transcriptome annotation using the `denovoscript` pipeline in annotation-only mode, leveraging AWS Batch for serverless execution and Docker containers for BUSCO analysis. Through a guided workflow, users learned to set up AWS Batch, execute `denovoscript` to annotate a rainbow trout transcriptome, assess transcriptome completeness with BUSCO, and critically interpret the results from BUSCO, GO, and TransDecoder analyses. Furthermore, the notebook emphasized the importance of understanding data provenance and culminated in an independent BUSCO analysis exercise, challenging users to apply their newfound skills to different transcriptomes and critically evaluate the outcomes, thus solidifying their understanding of transcriptome assembly and annotation principles." + "This notebook provided a comprehensive hands-on experience in transcriptome annotation using the `denovoscript` pipeline in annotation-only mode, leveraging AWS Batch for serverless execution and Docker containers for BUSCO analysis. Through a guided workflow, users learned to set up AWS Batch, execute `denovoscript` to annotate a rainbow trout transcriptome, assess transcriptome completeness with BUSCO, and critically interpret the results from BUSCO, GO, and TransDecoder analyses. Furthermore, the notebook emphasized the importance of understanding data provenance and culminated in an independent BUSCO analysis exercise, challenging users to apply their newfound skills to different transcriptomes and critically evaluate the outcomes, thus solidifying their understanding of transcriptome assembly and annotation principles.\n", + "\n", + "\n", + "### Why Use AWS Batch?\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
BenefitExplanation
ScalabilityProcess large MeRIP-seq datasets with multiple jobs in parallel
ReproducibilityEnsures the exact same Docker containers and config are used every time
Ease of ManagementNo need to manually manage EC2 instances or storage mounts
Integration with S3Input/output seamlessly handled via S3 buckets
\n", + "\n", + "Running on AWS Batch is ideal when your dataset grows beyond what your local notebook or server can handleor when you want reproducible, cloud-native workflows that are easier to scale, share, and manage." ] }, { @@ -605,7 +708,31 @@ "id": "5bc80021", "metadata": {}, "source": [ - "## Clean Up\n", + "## Clean Up the AWS Environment\n", + "\n", + "Once you've successfully run your analysis and downloaded the results, it's a good idea to clean up unused resources to avoid unnecessary charges.\n", + "\n", + "#### Recommended Cleanup Steps:\n", + "\n", + "- **Delete Output Files from S3 (Optional)** \n", + " If you've downloaded your results locally and no longer need them stored in the cloud.\n", + "- **Delete the S3 Bucket (Optional)** \n", + " To remove the entire bucket (only do this if you're sure!)\n", + "- **Shut Down AWS Batch Resources (Optional but Recommended):** \n", + " If you used a CloudFormation stack to set up AWS Batch, you can delete all associated resources in one step (⚠️ Note: Deleting the stack will also remove IAM roles and compute environments created by the template.):\n", + " + Go to the AWS CloudFormation Console\n", + " + Select your stack (e.g., aws-batch-nigms-test1)\n", + " + Click Delete\n", + " + Wait for all resources (compute environments, roles, queues) to be removed\n", + " \n", + "
\n", + "
\n", + " Tips\n", + "
\n", + "

\n", + "It’s always good practice to periodically review your EC2 instances, ECR containers, S3 storage, and CloudWatch logs to ensure no stray resources are incurring charges.\n", + "

\n", + "
\n", "\n", "Remember to proceed to the next notebook [`Submodule_04_gls_assembly.ipynb`](Submodule_04_gls_assembly.ipynb) or shut down your instance if you are finished." ] diff --git a/AWS/Submodule_3_basic_assembly.ipynb b/AWS/Submodule_3_basic_assembly.ipynb index e228149..6605cdc 100644 --- a/AWS/Submodule_3_basic_assembly.ipynb +++ b/AWS/Submodule_3_basic_assembly.ipynb @@ -128,41 +128,63 @@ "! aws s3 ls s3://nigms-sandbox/nosi-inbremaine-storage/resources/seq2/" ] }, + { + "cell_type": "markdown", + "id": "5e8dd0c5", + "metadata": {}, + "source": [ + "***If you have not set up AWS Batch please proceed to Step 2, otherwise proceed to Step 3.***" + ] + }, { "cell_type": "markdown", "id": "7a87b0d2", "metadata": {}, "source": [ - "### **Step 2:** AWS Batch Setup\n", + "### **Step 2:** Setting up AWS Batch \n", "\n", - "AWS Batch will create the needed permissions, roles and resources to run Nextflow in a serverless manner. You can set up AWS Batch manually or deploy it **automatically** with a stack template. The Launch Stack button below will take you to the cloud formation create stack webpage with the template with required resources already linked. \n", + "AWS Batch manages the provisioning of compute environments (EC2, Fargate), container orchestration, job queues, IAM roles, and permissions. We can deploy a full environment either:\n", + "- Automatically using a preconfigured AWS CloudFormation stack (**recommended**)\n", + "- Manually by setting up roles, queues, and buckets\n", + "The Launch Stack button below will take you to the cloud formation create stack webpage with the template with required resources already linked. \n", "\n", - "If you prefer to skip manual deployment and deploy automatically in the cloud, click the Launch Stack button below. For a walkthrough of the screens during automatic deployment please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToLaunchAWSBatch.md). The deployment should take ~5 min and then the resources will be ready for use. \n", + "If you prefer to skip manual deployment and deploy automatically in the cloud, click the **Launch Stack** button below. For a walkthrough of the screens during automatic deployment please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToLaunchAWSBatch.md). The deployment should take ~5 min and then the resources will be ready for use. \n", "\n", - "[![Launch Stack](../images/LaunchStack.jpg)](https://console.aws.amazon.com/cloudformation/home?#/stacks/new?stackName=aws-batch-nigms&templateURL=https://nigms-sandbox.s3.us-east-1.amazonaws.com/cf-templates/AWSBatch_template.yaml)\n", - "\n", - "\n", - "Before beginning this tutorial, if you do not have required roles, policies, permissions or compute environment and would like to **manually** set those up please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/AWS-Batch-Setup.md) to set that up." 
+ "[![Launch Stack](../images/LaunchStack.jpg)](https://console.aws.amazon.com/cloudformation/home?#/stacks/new?stackName=aws-batch-nigms&templateURL=https://nigms-sandbox.s3.us-east-1.amazonaws.com/cf-templates/AWSBatch_template.yaml )" ] }, { "cell_type": "markdown", - "id": "413ac931", + "id": "1f55c633", "metadata": {}, "source": [ - "#### Change the parameters as desired in `aws` profile inside `../denovotrascript/nextflow.config` file:\n", - " - Name of your **AWS Batch Job Queue**\n", - " - AWS region \n", - " - Nextflow work directory\n", - " - Nextflow output directory" + "### **Step 3:** Install dependencies, update paths and create a new S3 Bucket to store input and output files\n", + "\n", + "After setting up an AWS CloudFormation stack, we need to let the nextflow workflow to know where are those resrouces by providing the configuration:\n", + "
\n", + " \n", + "

\n", + "After successfull creation of your stack you must attatch a new role to SageMaker to be able to submit batch jobs. Please following the the following steps to change your SageMaker role:
\n", + "

  1. Navigate to your SageMaker AI notebook dashboard (where you initially created and launched your VM)
  2. Locate your instance and click the Stop button
  3. Once the instance is stopped:
    • Click Edit
    • Scroll to the \"Permissions and encryption\" section
    • Click the IAM role dropdown
    • Select the new role created during stack formation (named something like aws-batch-nigms-SageMakerExecutionRole)
  4. \n", + "
  5. Click Update notebook instance to save your changes
  6. \n", + "
  7. After the update completes:
    • Click Start to relaunch your instance
    • Reconnect to your instance
    • Resume your work from this point
\n", + "\n", + "Warning: Make sure to replace the stack name to the stack that you just created. STACK_NAME = \"your-stack-name-here\"\n", + "

\n", + "
" ] }, { - "cell_type": "markdown", - "id": "1f55c633", + "cell_type": "code", + "execution_count": null, + "id": "0da9939e", "metadata": {}, + "outputs": [], "source": [ - "### **Step 3:** Install Nextflow" + "# define a stack name variable\n", + "STACK_NAME = \"aws-batch-nigms-test1\"" ] }, { @@ -172,55 +194,115 @@ "metadata": {}, "outputs": [], "source": [ - "%%capture\n", - "! mamba create -n nextflow -c bioconda nextflow -y\n", - "! mamba install -n nextflow ipykernel -y" + "import boto3\n", + "# Get account ID and region \n", + "account_id = boto3.client('sts').get_caller_identity().get('Account')\n", + "region = boto3.session.Session().region_name" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b52c37c5", + "metadata": {}, + "outputs": [], + "source": [ + "# Set variable names \n", + "# These variables should come from the Intro AWS Batch tutorial (or leave as-is if using the launch stack button)\n", + "BUCKET_NAME = f\"{STACK_NAME}-batch-bucket-{account_id}\"\n", + "AWS_QUEUE = f\"{STACK_NAME}-JobQueue\"\n", + "INPUT_FOLDER = 'nigms-sandbox/nosi-inbremaine-storage/'\n", + "AWS_REGION = region" ] }, { "cell_type": "markdown", - "id": "bcb1fe5e", + "id": "8fce8e92", "metadata": {}, "source": [ - "
\n", - " \n", - " Alert: Remember to change your kernel to conda_nextflow to run nextflow.\n", - "
" + "#### Install dependencies\n", + "Installs Nextflow and Java, which are required to execute the pipeline. In environments like SageMaker, Java is usually pre-installed. But if you're running outside SageMaker (e.g., EC2 or local), you’ll need to manually install it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b7625b33", + "metadata": {}, + "outputs": [], + "source": [ + "# Install Nextflow\n", + "! mamba install -y -c conda-forge -c bioconda nextflow --quiet" ] }, { "cell_type": "markdown", - "id": "72d1a3b8", + "id": "80b91ef0", "metadata": {}, "source": [ - "### **Step 4:** Run `denovotranscript`" + "
\n", + "Install Java and Nextflow if needed in other systems\n", + "If using other system other than AWS SageMaker Notebook, you might need to install java and nextflow using the code below:\n", + "
# Install java
\n",
+    "    sudo apt update\n",
+    "    sudo apt-get install default-jdk -y\n",
+    "    java -version\n",
+    "    
\n", + " # Install Nextflow
\n",
+    "    curl https://get.nextflow.io | bash\n",
+    "    chmod +x nextflow\n",
+    "    ./nextflow self-update\n",
+    "    ./nextflow plugin update\n",
+    "    
\n", + "
" ] }, { "cell_type": "code", "execution_count": null, - "id": "ee5985e3-93df-4779-afe1-4464e13bf619", + "id": "61f28ac4", "metadata": {}, "outputs": [], "source": [ - "! nextflow run main.nf --input test_samplesheet.csv -profile aws --run_mode full" + "# replace batch bucket name in nextflow configuration file\n", + "! sed -i \"s/aws-batch-nigms-batch-bucket-/$BUCKET_NAME/g\" ../denovotranscipt/nextflow.config\n", + "# replace job queue name in configuration file \n", + "! sed -i \"s/aws-batch-nigms-JobQueue/$AWS_QUEUE/g\" ../denovotranscipt/nextflow.config\n", + "# replace the region placeholder with the region you are in \n", + "! sed -i \"s/aws-region/$AWS_REGION/g\" ../denovotranscipt/nextflow.config" ] }, { "cell_type": "markdown", - "id": "0117d994-0502-4a58-b07a-861d254f11e2", + "id": "72d1a3b8", "metadata": {}, "source": [ - "The beauty and power of using a defined workflow in a management system (such as Nextflow) are that we not only get a defined set of steps that are carried out in the proper order, but we also get a well-structured and concise directory structure that holds all pertinent output." + "### **Step 4:** Enable AWS Batch for the nextflow script `denovotranscript`" ] }, { "cell_type": "markdown", - "id": "5ad70acb", + "id": "92fb30de", "metadata": {}, "source": [ - "---\n", - "# Andrea, please update the rest for result" + "Run the pipeline in a cloud-native, serverless manner using AWS Batch. AWS Batch offloads the burden of provisioning and managing compute resources. When you execute this command:\n", + "- Nextflow uploads tasks to AWS Batch. \n", + "- AWS Batch pulls the necessary containers.\n", + "- Each process/task in the pipeline runs as an isolated job in the cloud.\n", + "\n", + "The beauty and power of using a defined workflow in a management system (such as Nextflow) are that we not only get a defined set of steps that are carried out in the proper order, but we also get a well-structured and concise directory structure that holds all pertinent output.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ee5985e3-93df-4779-afe1-4464e13bf619", + "metadata": {}, + "outputs": [], + "source": [ + "! nextflow run /home/ec2-user/SageMaker/Transcriptome-Assembly-Refinement-and-Applications/denovotranscript/main.nf \\\n", + " --input /home/ec2-user/SageMaker/Transcriptome-Assembly-Refinement-and-Applications/denovotranscript/test_samplesheet_aws.csv \\\n", + " -profile awsbatch --run_mode full --awsqueue $AWS_QUEUE --awsregion $AWS_REGION " ] }, { @@ -238,7 +320,7 @@ "metadata": {}, "outputs": [], "source": [ - "! aws s3 ls s3:////" + "! aws s3 ls s3://$BUCKET_NAME/nextflow_output/" ] }, { @@ -386,7 +468,35 @@ "id": "909f6112", "metadata": {}, "source": [ - "## Conclusion" + "## Conclusion: Why Use AWS Batch?\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
+    "<table>\n",
+    "  <tr>\n",
+    "    <th>Benefit</th>\n",
+    "    <th>Explanation</th>\n",
+    "  </tr>\n",
+    "  <tr>\n",
+    "    <td>Scalability</td>\n",
+    "    <td>Process large RNA-seq datasets with multiple jobs in parallel</td>\n",
+    "  </tr>\n",
+    "  <tr>\n",
+    "    <td>Reproducibility</td>\n",
+    "    <td>Ensures the exact same Docker containers and config are used every time</td>\n",
+    "  </tr>\n",
+    "  <tr>\n",
+    "    <td>Ease of Management</td>\n",
+    "    <td>No need to manually manage EC2 instances or storage mounts</td>\n",
+    "  </tr>\n",
+    "  <tr>\n",
+    "    <td>Integration with S3</td>\n",
+    "    <td>Input/output seamlessly handled via S3 buckets</td>\n",
+    "  </tr>\n",
+    "</table>\n",
+    "\n",
+    "Running on AWS Batch is ideal when your dataset grows beyond what your local notebook or server can handle, or when you want reproducible, cloud-native workflows that are easier to scale, share, and manage."
   ]
  },
@@ -394,9 +504,31 @@
   "id": "b68484f3",
   "metadata": {},
   "source": [
-    "## Clean Up\n",
-    "\n",
-    "Shut down your instance if you are finished."
+    "## Clean Up the AWS Environment\n",
+    "\n",
+    "Once you've successfully run your analysis and downloaded the results, it's a good idea to clean up unused resources to avoid unnecessary charges.\n",
+    "\n",
+    "#### Recommended Cleanup Steps (first confirm no Batch jobs are still active; see the sketch after this list):\n",
+    "\n",
+    "- **Delete Output Files from S3 (Optional)** \n",
+    "  If you've downloaded your results locally and no longer need them stored in the cloud.\n",
+    "- **Delete the S3 Bucket (Optional)** \n",
+    "  To remove the entire bucket (only do this if you're sure!)\n",
+    "- **Shut Down AWS Batch Resources (Optional but Recommended):** \n",
+    "  If you used a CloudFormation stack to set up AWS Batch, you can delete all associated resources in one step (⚠️ Note: Deleting the stack will also remove IAM roles and compute environments created by the template.):\n",
+    "  + Go to the AWS CloudFormation Console\n",
+    "  + Select your stack (e.g., aws-batch-nigms-test1)\n",
+    "  + Click Delete\n",
+    "  + Wait for all resources (compute environments, roles, queues) to be removed\n",
+    "\n",
\n", + "
\n", + " Tips\n", + "
\n", + "

\n", + "It’s always good practice to periodically review your EC2 instances, ECR containers, S3 storage, and CloudWatch logs to ensure no stray resources are incurring charges.\n", + "

\n", + "
" ] } ], diff --git a/denovotranscript/nextflow.config b/denovotranscript/nextflow.config index 3f7f208..939ed74 100644 --- a/denovotranscript/nextflow.config +++ b/denovotranscript/nextflow.config @@ -24,6 +24,14 @@ params { qc_only = false skip_assembly = false + // AWS parameters + awsqueue = 'aws-batch-nigms-JobQueue' + awsregion = 'aws-region' + awsworkdir = 's3://aws-batch-nigms-batch-bucket-/nextflow_env/' + outdir = 's3://aws-batch-nigms-batch-bucket-/nextflow_output/' + awscli_path = '/home/ec2-user/miniconda/bin/aws' + aws_execrole = 'ExecutionRole' + // Trimming options adapter_fasta = null save_trimmed_fail = false @@ -74,7 +82,7 @@ params { igenomes_ignore = false // Boilerplate options - outdir = null + //outdir = null publish_dir_mode = 'copy' email = null email_on_fail = null @@ -104,18 +112,19 @@ includeConfig 'conf/base.config' profiles { - aws { + awsbatch { process { executor = 'awsbatch' - queue = 'aws-batch-nigms-JobQueue' // Name of your Job queue + queue = params.awsqueue // Name of your Job queue + container = 'quay.io/nf-core/ubuntu:22.04' } - fusion.enabled = true - wave.enabled = true - aws.region = 'us-east-1' // YOUR AWS REGION - - workDir = 's3:////' // Path of your working directory - params.outdir = 's3:////' // Path of your output directory - + workDir = params.awsworkdir // Path of your working directory + outdir = params.outdir // Path of your output directory + fusion.enabled = false + wave.enabled = false + // Give path to where aws is installed + aws.batch.cliPath = params.awscli_path + aws.region = params.awsregion // YOUR AWS REGION } gbatch { @@ -126,7 +135,7 @@ profiles { process.machineType = 'n2-highmem-48' workDir = 'gs:////' // Path of your working directory - params.outdir = 'gs:////' // Path of your output directory + outdir = 'gs:////' // Path of your output directory } debug { diff --git a/denovotranscript/workflows/denovotranscript.nf b/denovotranscript/workflows/denovotranscript.nf index 97b8eae..75af48f 100644 --- a/denovotranscript/workflows/denovotranscript.nf +++ b/denovotranscript/workflows/denovotranscript.nf @@ -75,324 +75,330 @@ include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfco workflow DENOVOTRANSCRIPT { take: - ch_samplesheet // channel: samplesheet read in from --input - main: +ch_samplesheet // channel: samplesheet read in from --input +main: - ch_versions = Channel.empty() - ch_multiqc_files = Channel.empty() - - if (params.run_mode == 'full' || params.run_mode == 'assembly_only') { +// Initialize channels +ch_filtered_reads = Channel.empty() +ch_transcripts = Channel.empty() +ch_versions = Channel.empty() +ch_multiqc_files = Channel.empty() + +if (params.run_mode == 'full' || params.run_mode == 'assembly_only' || (params.run_mode == 'annotation_only' && params.input)) { +// +// MODULE: FASTQ_TRIM_FASTP_FASTQC +// + +FASTQ_TRIM_FASTP_FASTQC ( + ch_samplesheet, + params.adapter_fasta ?: [], + params.save_trimmed_fail, + params.save_merged, + params.skip_fastp, + params.skip_fastqc +) +ch_multiqc_files = ch_multiqc_files.mix(FASTQ_TRIM_FASTP_FASTQC.out.fastqc_raw_zip.collect{it[1]}) +ch_multiqc_files = ch_multiqc_files.mix(FASTQ_TRIM_FASTP_FASTQC.out.fastqc_trim_zip.collect{it[1]}) +ch_versions = ch_versions.mix(FASTQ_TRIM_FASTP_FASTQC.out.versions) +ch_filtered_reads = FASTQ_TRIM_FASTP_FASTQC.out.reads + +// Handle rRNA removal if needed +if (params.remove_ribo_rna) { + ch_sortmerna_fastas = Channel.from(ch_ribo_db.readLines()).map { row -> file(row, checkIfExists: true) }.collect() // - // MODULE: 
FASTQ_TRIM_FASTP_FASTQC + // MODULE: SORTMERNA // + SORTMERNA ( + ch_filtered_reads, + ch_sortmerna_fastas + ) + ch_filtered_reads = SORTMERNA.out.reads + ch_multiqc_files = ch_multiqc_files.mix(SORTMERNA.out.log.collect{it[1]}.ifEmpty([])) + ch_versions = ch_versions.mix(SORTMERNA.out.versions) - FASTQ_TRIM_FASTP_FASTQC ( - ch_samplesheet, - params.adapter_fasta ?: [], - params.save_trimmed_fail, - params.save_merged, - params.skip_fastp, - params.skip_fastqc + // + // MODULE: FASTQC + // + FASTQC_FINAL ( + SORTMERNA.out.reads ) - ch_multiqc_files = ch_multiqc_files.mix(FASTQ_TRIM_FASTP_FASTQC.out.fastqc_raw_zip.collect{it[1]}) - ch_multiqc_files = ch_multiqc_files.mix(FASTQ_TRIM_FASTP_FASTQC.out.fastqc_trim_zip.collect{it[1]}) - ch_versions = ch_versions.mix(FASTQ_TRIM_FASTP_FASTQC.out.versions) - ch_filtered_reads = FASTQ_TRIM_FASTP_FASTQC.out.reads + ch_multiqc_files = ch_multiqc_files.mix(FASTQC_FINAL.out.zip.collect{it[1]}) + ch_versions = ch_versions.mix(FASTQC_FINAL.out.versions) +} +} - if (params.remove_ribo_rna) { - ch_sortmerna_fastas = Channel.from(ch_ribo_db.readLines()).map { row -> file(row, checkIfExists: true) }.collect() - // - // MODULE: SORTMERNA - // - SORTMERNA ( - ch_filtered_reads, - ch_sortmerna_fastas - ) - ch_filtered_reads = SORTMERNA.out.reads - ch_multiqc_files = ch_multiqc_files.mix(SORTMERNA.out.log.collect{it[1]}.ifEmpty([])) - ch_versions = ch_versions.mix(SORTMERNA.out.versions) +// Assembly section (only for full and assembly_only) +if (!params.qc_only) { + + if (!params.skip_assembly || params.run_mode == 'full' || params.run_mode == 'assembly_only') { + + // All methods use pooled reads + ch_pool = ch_filtered_reads.collect { meta, fastq -> fastq }.map { [[id:'pooled_reads', single_end:false], it] } // - // MODULE: FASTQC + // MODULE: CAT_FASTQ // - FASTQC_FINAL ( - SORTMERNA.out.reads + CAT_FASTQ ( + ch_pool ) - ch_multiqc_files = ch_multiqc_files.mix(FASTQC_FINAL.out.zip.collect{it[1]}) - ch_versions = ch_versions.mix(FASTQC_FINAL.out.versions) - } - } - - if (!params.qc_only) { + ch_versions = ch_versions.mix(CAT_FASTQ.out.versions) - if (!params.skip_assembly) { + ch_assemblies = Channel.empty() - // All methods use pooled reads - ch_pool = ch_filtered_reads.collect { meta, fastq -> fastq }.map { [[id:'pooled_reads', single_end:false], it] } + def assemblers = params.assemblers.tokenize(',') + if (assemblers.contains('trinity')) { // - // MODULE: CAT_FASTQ + // MODULE: TRINITY // - CAT_FASTQ ( - ch_pool + TRINITY ( + CAT_FASTQ.out.reads ) - ch_versions = ch_versions.mix(CAT_FASTQ.out.versions) - - ch_assemblies = Channel.empty() - - def assemblers = params.assemblers.tokenize(',') - - if (assemblers.contains('trinity')) { - // - // MODULE: TRINITY - // - TRINITY ( - CAT_FASTQ.out.reads - ) - ch_versions = ch_versions.mix(TRINITY.out.versions) - ch_assemblies = ch_assemblies.mix(TRINITY.out.transcript_fasta) - } - - if (assemblers.contains('trinity_no_norm')) { - // - // MODULE: TRINITY_NO_NORM - // - TRINITY_NO_NORM ( - CAT_FASTQ.out.reads - ) - ch_versions = ch_versions.mix(TRINITY_NO_NORM.out.versions) - ch_assemblies = ch_assemblies.mix(TRINITY_NO_NORM.out.transcript_fasta) - } - - if (assemblers.contains('rnaspades')) { - CAT_FASTQ.out.reads.map { meta, illumina -> - [ meta, illumina, [], [] ] }.set { ch_spades } - - // - // MODULE: SPADES - // - SPADES ( - ch_spades, - [], - [] - ) - ch_versions = ch_versions.mix(SPADES.out.versions) - ch_assemblies = ch_assemblies.mix(SPADES.out.transcripts) - - if (params.soft_filtered_transcripts) { - 
ch_assemblies = ch_assemblies.mix(SPADES.out.soft_filtered_transcripts) - } - - if (params.hard_filtered_transcripts) { - ch_assemblies = ch_assemblies.mix(SPADES.out.hard_filtered_transcripts) - } - } - - ch_assemblies = ch_assemblies - .collect { meta, fasta -> fasta } - .map { [[id:'all_assembled', single_end:false], it ] } - - // - // MODULE: CAT_CAT - // - CAT_CAT ( - ch_assemblies - ) - ch_versions = ch_versions.mix(CAT_CAT.out.versions) + ch_versions = ch_versions.mix(TRINITY.out.versions) + ch_assemblies = ch_assemblies.mix(TRINITY.out.transcript_fasta) + } + if (assemblers.contains('trinity_no_norm')) { // - // MODULE: EVIGENE_TR2AACDS + // MODULE: TRINITY_NO_NORM // - EVIGENE_TR2AACDS ( - CAT_CAT.out.file_out + TRINITY_NO_NORM ( + CAT_FASTQ.out.reads ) - ch_versions = ch_versions.mix(EVIGENE_TR2AACDS.out.versions) + ch_versions = ch_versions.mix(TRINITY_NO_NORM.out.versions) + ch_assemblies = ch_assemblies.mix(TRINITY_NO_NORM.out.transcript_fasta) + } - ch_transcripts = EVIGENE_TR2AACDS.out.okayset.map { meta, dir -> - def mrna_file = dir.listFiles().find { it.name.endsWith('.okay.mrna') } - if (!mrna_file) throw new Exception("No .okay.mrna file found in ${dir}") - return [ meta, mrna_file ] - } + if (assemblers.contains('rnaspades')) { + CAT_FASTQ.out.reads.map { meta, illumina -> + [ meta, illumina, [], [] ] }.set { ch_spades } - ch_pubids = EVIGENE_TR2AACDS.out.okayset.map { meta, dir -> - def pubids_file = dir.listFiles().find { it.name.endsWith('.pubids') } - if (!pubids_file) throw new Exception("No .pubids file found in ${dir}") - return [ meta, pubids_file ] - } // - // MODULE: TX2GENE + // MODULE: SPADES // - TX2GENE ( - ch_pubids, + SPADES ( + ch_spades, + [], [] ) - ch_versions = ch_versions.mix(TX2GENE.out.versions) - - // MODULE: BUSCO - // - BUSCO_BUSCO ( - ch_transcripts, - params.busco_mode, - params.busco_lineage, - params.busco_lineages_path ?: [], - params.busco_config ?: [], - ) - ch_multiqc_files = ch_multiqc_files.mix(BUSCO_BUSCO.out.short_summaries_txt.collect{it[1]}) - ch_versions = ch_versions.mix(BUSCO_BUSCO.out.versions) + ch_versions = ch_versions.mix(SPADES.out.versions) + ch_assemblies = ch_assemblies.mix(SPADES.out.transcripts) - // - // MODULE: RNAQUAST - // - RNAQUAST ( - ch_transcripts - ) - ch_versions = ch_versions.mix(RNAQUAST.out.versions) - - // only run if profile is not conda or mamba - if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() == 0) { - // - // MODULE: TRANSRATE - // - TRANSRATE ( - ch_transcripts, - CAT_FASTQ.out.reads, - params.transrate_reference ?: [] - ) - ch_versions = ch_versions.mix(TRANSRATE.out.versions) - } - } - - - if (params.run_mode == 'full' || params.run_mode == 'annotation_only') { - if (params.run_mode == 'annotation_only') { - if (!params.transcript_fasta) { - error "For annotation_only mode, you must provide --transcript_fasta" + if (params.soft_filtered_transcripts) { + ch_assemblies = ch_assemblies.mix(SPADES.out.soft_filtered_transcripts) + } + + if (params.hard_filtered_transcripts) { + ch_assemblies = ch_assemblies.mix(SPADES.out.hard_filtered_transcripts) } - ch_transcripts = Channel.fromPath(params.transcript_fasta) - .map { file -> [ [id: file.simpleName], file ] } } + ch_assemblies = ch_assemblies + .collect { meta, fasta -> fasta } + .map { [[id:'all_assembled', single_end:false], it ] } + // - // MODULE: TRANSDECODER_LONGORF + // MODULE: CAT_CAT // - TRANSDECODER_LONGORF ( - ch_transcripts + CAT_CAT ( + ch_assemblies ) - ch_versions = 
ch_versions.mix(TRANSDECODER_LONGORF.out.versions) - + ch_versions = ch_versions.mix(CAT_CAT.out.versions) + // - // MODULE: TRANSDECODER_PREDICT + // MODULE: EVIGENE_TR2AACDS // - TRANSDECODER_PREDICT ( - ch_transcripts, - TRANSDECODER_LONGORF.out.folder + EVIGENE_TR2AACDS ( + CAT_CAT.out.file_out ) - ch_versions = ch_versions.mix(TRANSDECODER_PREDICT.out.versions) - // Rename the TransDecoder output for TRANSRATE - ch_transrate_input = TRANSDECODER_PREDICT.out.cds.map { meta, cds -> - def new_meta = meta + [id: "${meta.id}.transdecoder"] - [new_meta, cds] - } + ch_versions = ch_versions.mix(EVIGENE_TR2AACDS.out.versions) - // - // MODULE: TRINOTATE - // - if (params.trinotate_db) { - TRINOTATE ( - ch_transcripts.join(TRANSDECODER.out.pep), - params.trinotate_db - ) - ch_versions = ch_versions.mix(TRINOTATE.out.versions) - } - } - - if (params.run_mode == 'full' || params.run_mode == 'assembly_only') { - // Continue with Quantification steps // - if (params.skip_assembly) { - ch_transcripts_fa = params.transcript_fasta - } else { - ch_transcripts_fa = ch_transcripts.collect { meta, fasta -> fasta } + ch_transcripts = EVIGENE_TR2AACDS.out.okayset.map { meta, dir -> + def mrna_file = dir.listFiles().find { it.name.endsWith('.okay.mrna') } + if (!mrna_file) throw new Exception("No .okay.mrna file found in ${dir}") + return [ meta, mrna_file ] } + ch_pubids = EVIGENE_TR2AACDS.out.okayset.map { meta, dir -> + def pubids_file = dir.listFiles().find { it.name.endsWith('.pubids') } + if (!pubids_file) throw new Exception("No .pubids file found in ${dir}") + return [ meta, pubids_file ] + } + // + // MODULE: TX2GENE // - // MODULE: SALMON_INDEX + TX2GENE ( + ch_pubids, + [] + ) + ch_versions = ch_versions.mix(TX2GENE.out.versions) + + // MODULE: BUSCO // - SALMON_INDEX ( - ch_transcripts_fa + BUSCO_BUSCO ( + ch_transcripts, + params.busco_mode, + params.busco_lineage, + params.busco_lineages_path ?: [], + params.busco_config ?: [], ) - ch_versions = ch_versions.mix(SALMON_INDEX.out.versions) + ch_multiqc_files = ch_multiqc_files.mix(BUSCO_BUSCO.out.short_summaries_txt.collect{it[1]}) + ch_versions = ch_versions.mix(BUSCO_BUSCO.out.versions) // - // MODULE: SALMON_QUANT + // MODULE: RNAQUAST // - SALMON_QUANT ( - ch_filtered_reads, - SALMON_INDEX.out.index, - ch_transcripts_fa, - params.lib_type + RNAQUAST ( + ch_transcripts ) - ch_multiqc_files = ch_multiqc_files.mix(SALMON_QUANT.out.results.collect{it[1]}) - ch_versions = ch_versions.mix(SALMON_QUANT.out.versions) - + ch_versions = ch_versions.mix(RNAQUAST.out.versions) + // only run if profile is not conda or mamba + if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() == 0) { + // + // MODULE: TRANSRATE + // + TRANSRATE ( + ch_transcripts, + CAT_FASTQ.out.reads, + params.transrate_reference ?: [] + ) + ch_versions = ch_versions.mix(TRANSRATE.out.versions) + } + } + + +if (params.run_mode == 'full' || params.run_mode == 'annotation_only') { + if (params.run_mode == 'annotation_only') { + if (!params.transcript_fasta) { + error "For annotation_only mode, you must provide --transcript_fasta" } - + ch_transcripts = Channel.fromPath(params.transcript_fasta) + .map { file -> [ [id: file.simpleName], file ] } } // - // Collate and save software versions + // MODULE: TRANSDECODER_LONGORF // - softwareVersionsToYAML(ch_versions) - .collectFile( - storeDir: "${params.outdir}/pipeline_info", - name: 'nf_core_' + 'denovotranscript_software_' + 'mqc_' + 'versions.yml', - sort: true, - newLine: true - ).set { ch_collated_versions } - + 
TRANSDECODER_LONGORF ( + ch_transcripts + ) + ch_versions = ch_versions.mix(TRANSDECODER_LONGORF.out.versions) + + // + // MODULE: TRANSDECODER_PREDICT + // + TRANSDECODER_PREDICT ( + ch_transcripts, + TRANSDECODER_LONGORF.out.folder + ) + ch_versions = ch_versions.mix(TRANSDECODER_PREDICT.out.versions) + // Rename the TransDecoder output for TRANSRATE + ch_transrate_input = TRANSDECODER_PREDICT.out.cds.map { meta, cds -> + def new_meta = meta + [id: "${meta.id}.transdecoder"] + [new_meta, cds] + } // - // MODULE: MultiQC + // MODULE: TRINOTATE // - ch_multiqc_config = Channel.fromPath( - "$projectDir/assets/multiqc_config.yml", checkIfExists: true) - ch_multiqc_custom_config = params.multiqc_config ? - Channel.fromPath(params.multiqc_config, checkIfExists: true) : - Channel.empty() - ch_multiqc_logo = params.multiqc_logo ? - Channel.fromPath(params.multiqc_logo, checkIfExists: true) : - Channel.empty() - - summary_params = paramsSummaryMap( - workflow, parameters_schema: "nextflow_schema.json") - ch_workflow_summary = Channel.value(paramsSummaryMultiqc(summary_params)) - ch_multiqc_files = ch_multiqc_files.mix( - ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')) - ch_multiqc_custom_methods_description = params.multiqc_methods_description ? - file(params.multiqc_methods_description, checkIfExists: true) : - file("$projectDir/assets/methods_description_template.yml", checkIfExists: true) - ch_methods_description = Channel.value( - methodsDescriptionText(ch_multiqc_custom_methods_description)) - - ch_multiqc_files = ch_multiqc_files.mix(ch_collated_versions) - ch_multiqc_files = ch_multiqc_files.mix( - ch_methods_description.collectFile( - name: 'methods_description_mqc.yaml', - sort: true + if (params.trinotate_db) { + TRINOTATE ( + ch_transcripts.join(TRANSDECODER.out.pep), + params.trinotate_db ) + ch_versions = ch_versions.mix(TRINOTATE.out.versions) + } +} + +// Quantification section (only if we have both transcripts and reads) +if ((params.run_mode == 'full' || params.run_mode == 'assembly_only') || (params.run_mode == 'annotation_only' && params.input)) { + // Continue with Quantification steps // + if (params.skip_assembly) { + ch_transcripts_fa = params.transcript_fasta + } else { + ch_transcripts_fa = ch_transcripts.collect { meta, fasta -> fasta } + } + + // + // MODULE: SALMON_INDEX + // + SALMON_INDEX ( + ch_transcripts_fa ) + ch_versions = ch_versions.mix(SALMON_INDEX.out.versions) - MULTIQC ( - ch_multiqc_files.collect(), - ch_multiqc_config.toList(), - ch_multiqc_custom_config.toList(), - ch_multiqc_logo.toList(), - [], - [] + // + // MODULE: SALMON_QUANT + // + SALMON_QUANT ( + ch_filtered_reads, + SALMON_INDEX.out.index, + ch_transcripts_fa, + params.lib_type ) + ch_multiqc_files = ch_multiqc_files.mix(SALMON_QUANT.out.results.collect{it[1]}) + ch_versions = ch_versions.mix(SALMON_QUANT.out.versions) - emit:multiqc_report = MULTIQC.out.report.toList() // channel: /path/to/multiqc_report.html - versions = ch_versions // channel: [ path(versions.yml) ] + + } + +} + +// +// Collate and save software versions +// +softwareVersionsToYAML(ch_versions) + .collectFile( + storeDir: "${params.outdir}/pipeline_info", + name: 'nf_core_' + 'denovotranscript_software_' + 'mqc_' + 'versions.yml', + sort: true, + newLine: true + ).set { ch_collated_versions } + + +// +// MODULE: MultiQC +// +ch_multiqc_config = Channel.fromPath( + "$projectDir/assets/multiqc_config.yml", checkIfExists: true) +ch_multiqc_custom_config = params.multiqc_config ? 
+ Channel.fromPath(params.multiqc_config, checkIfExists: true) : + Channel.empty() +ch_multiqc_logo = params.multiqc_logo ? + Channel.fromPath(params.multiqc_logo, checkIfExists: true) : + Channel.empty() + +summary_params = paramsSummaryMap( + workflow, parameters_schema: "nextflow_schema.json") +ch_workflow_summary = Channel.value(paramsSummaryMultiqc(summary_params)) +ch_multiqc_files = ch_multiqc_files.mix( + ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')) +ch_multiqc_custom_methods_description = params.multiqc_methods_description ? + file(params.multiqc_methods_description, checkIfExists: true) : + file("$projectDir/assets/methods_description_template.yml", checkIfExists: true) +ch_methods_description = Channel.value( + methodsDescriptionText(ch_multiqc_custom_methods_description)) + +ch_multiqc_files = ch_multiqc_files.mix(ch_collated_versions) +ch_multiqc_files = ch_multiqc_files.mix( + ch_methods_description.collectFile( + name: 'methods_description_mqc.yaml', + sort: true + ) +) + +MULTIQC ( + ch_multiqc_files.collect(), + ch_multiqc_config.toList(), + ch_multiqc_custom_config.toList(), + ch_multiqc_logo.toList(), + [], + [] +) + +emit:multiqc_report = MULTIQC.out.report.toList() // channel: /path/to/multiqc_report.html +versions = ch_versions // channel: [ path(versions.yml) ] }