
Conversation

@ypriverol ypriverol (Member) commented Dec 20, 2025

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the bigbio/quantms branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Summary by CodeRabbit

  • New Features

    • MS2Features rescoring & fine‑tuning framework; Onsite PTM localization added; DIA (.dia) input support; MultiQC module; ThermoRawFileParser support; TMT18plex default correction matrix; additional test profiles.
  • Bug Fixes

    • Fixed contrast construction in statistical summaries and small output/formatting refinements.
  • Chores

    • Nextflow → 25.04.0, Python → 3.14; CI/actions and container image updates; removed legacy dev/workspace configs; added .gitignore entries.
  • Documentation

    • README/usage updated with supported file formats and examples; changelog updated; schema now expects an "assay name" field.


daichengxin and others added 30 commits August 14, 2025 18:12
fixing of bug in msstats_tmt.R (Issue: Error in the msstats_tmt.R function parse_contrasts #577)
Pass correct flag to ProteinQuantifier
fixed rescoring range bugs
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@coderabbitai coderabbitai bot (Contributor) left a comment
Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
subworkflows/local/psm_rescoring/main.nf (1)

59-69: Critical: Undefined variable reference in add_file_prefix.

Line 67 references file_name within file(file_name), but file_name is the variable being assigned in this statement and hasn't been defined yet. This should reference the function parameter file_path instead.

🔎 Proposed fix
 def add_file_prefix(file_path) {
     position = file(file_path).name.lastIndexOf('_sage_perc.idXML')
     if (position == -1) {
         position = file(file_path).name.lastIndexOf('_comet_perc.idXML')
         if (position == -1) {
             position = file(file_path).name.lastIndexOf('_msgf_perc.idXML')
         }
     }
-    file_name = file(file_name).name.take(position)
+    file_name = file(file_path).name.take(position)
     return [file_name, file_path]
 }
♻️ Duplicate comments (1)
modules/local/utils/msrescore_features/main.nf (1)

47-59: Tolerance unit logic inconsistency remains unaddressed.

Line 53 checks params.ms2features_tolerance_unit == 'ppm' but line 46 loaded ms2_tolerance_unit from meta['fragmentmasstoleranceunit']. This creates an inconsistency: the code should check the meta-derived value at line 53 for consistency with the rest of the tolerance logic.

🔎 Proposed fix
     if (using_ms2pip) {
         // ms2pip only supports Da unit
         ms2_tolerance_unit = 'Da'
         ms2_tolerance = params.ms2features_tolerance
         if (meta['fragmentmasstoleranceunit'].toLowerCase().endsWith('da')) {
             ms2_tolerance = meta['fragmentmasstolerance']
-        } else if (params.ms2features_tolerance_unit == 'ppm') {
+        } else if (meta['fragmentmasstoleranceunit'].toLowerCase() == 'ppm') {
             log.info "Warning: MS2pip only supports Da unit. Using default from config!"
             ms2_tolerance = 0.05
         } else {
             log.info "Warning: MS2pip only supports Da unit. Using default from config!"
         }
     }

Based on past review comments.

🧹 Nitpick comments (2)
subworkflows/local/peptide_database_search/main.nf (2)

94-124: Consider parameterizing the random seed for reproducibility control.

Lines 99, 107, and 115 use a hardcoded seed value of 2025 for randomSample(). While this provides reproducibility, consider making this a parameter (e.g., params.random_seed or params.fine_tuning_seed) to give users control over reproducibility or the ability to run multiple experiments with different seeds.
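The suggestion above could be sketched as follows; the parameter name `params.fine_tuning_seed` and the channel contents are illustrative assumptions, not the pipeline's actual identifiers:

```nextflow
// Hypothetical sketch: expose the sampling seed as a pipeline parameter.
// Defaulting to 2025 preserves the current behaviour.
params.fine_tuning_seed = 2025

workflow {
    Channel.of(1..100)
        .flatten()
        // seed is now user-controllable instead of hardcoded
        .randomSample(10, params.fine_tuning_seed)
        .view()
}
```

With this pattern, reruns stay reproducible by default, while `--fine_tuning_seed` lets users vary the sample across experiments.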


242-252: Refactor to avoid redundant file() calls.

Line 247 calls file(filestr) twice, which is inefficient. Store the file object in a variable to avoid redundant operations.

🔎 Proposed fix
 def get_sample_map(LinkedHashMap row) {
     def sample_map = [:]
 
     filestr               = row.Spectra_Filepath
-    file_name             = file(filestr).name.take(file(filestr).name.lastIndexOf('.'))
+    def fileObj           = file(filestr)
+    file_name             = fileObj.name.take(fileObj.name.lastIndexOf('.'))
     sample                = row.Sample
 
     return [file_name, sample]
 
 }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 929ca6a and 4659730.

📒 Files selected for processing (7)
  • modules/local/utils/msrescore_features/main.nf
  • modules/local/utils/msrescore_fine_tuning/main.nf
  • modules/local/utils/spectrum_features/main.nf
  • subworkflows/local/dda_id/main.nf
  • subworkflows/local/id/main.nf
  • subworkflows/local/peptide_database_search/main.nf
  • subworkflows/local/psm_rescoring/main.nf
🚧 Files skipped from review as they are similar to previous changes (1)
  • subworkflows/local/id/main.nf
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: Docker Tests (25.04.0, test_localize)
  • GitHub Check: Docker Tests (25.04.0, test_dda_id_ms2pip)
  • GitHub Check: Docker Tests (latest-everything, test_dda_id_ms2pip)
  • GitHub Check: Docker Tests (latest-everything, test_tmt_corr)
  • GitHub Check: Docker Tests (latest-everything, test_dia)
  • GitHub Check: Docker Tests (latest-everything, test_localize)
  • GitHub Check: Docker Tests (latest-everything, test_dda_id_fine_tuning)
  • GitHub Check: Docker Tests (latest-everything, test_lfq)
  • GitHub Check: Docker Tests (25.04.0, test_dia)
  • GitHub Check: Docker Tests (25.04.0, test_tmt_corr)
  • GitHub Check: Docker Tests (25.04.0, test_lfq)
  • GitHub Check: Docker Tests (25.04.0, test_dda_id_fine_tuning)
  • GitHub Check: Docker Tests (25.04.0, test_tmt)
  • GitHub Check: Run pipeline with test data (25.04.0, test_lfq, docker)
  • GitHub Check: Run pipeline with test data (25.04.0, test_dda_id_fine_tuning, docker)
  • GitHub Check: Run pipeline with test data (25.04.0, test_dia, docker)
  • GitHub Check: Run pipeline with test data (25.04.0, test_dda_id_ms2pip, docker)
  • GitHub Check: Run pipeline with test data (25.04.0, test_localize, docker)
  • GitHub Check: Run pipeline with test data (25.04.0, test_dda_id_alphapeptdeep, docker)
  • GitHub Check: Run pipeline with test data (25.04.0, test_tmt, docker)
🔇 Additional comments (15)
modules/local/utils/spectrum_features/main.nf (3)

5-7: The container version 0.0.13 is documented in CHANGELOG.md for release 1.7.0 (PR #629: "Update quantms-rescoring to 0.0.13 to support transfer learning") and is used consistently across the spectrum_features and psm_clean modules. The version is valid and appropriate for this release.


10-10: No action needed — upstream processes correctly provide the search_engine parameter.

The SPECTRUM_FEATURES process receives search_engine via the .combine() operation at line 165. The MSRESCORE_FEATURES process (which produces ch_id_files_feats) already outputs search_engine as part of its tuple, so the parameter flows through correctly. While search_engine is not consumed within the SPECTRUM_FEATURES script section, passing it through to the output is a valid pattern for propagating metadata through the workflow pipeline.


13-13: The downstream consumer already correctly handles the current tuple structure, making this not a breaking change.

The output tuple at line 13 (tuple val(meta), path("${id_file.baseName}_snr.idXML"), val(search_engine)) is properly destructured by the downstream process at line 19 of subworkflows/local/peptide_database_search/main.nf ({ meta, file_name, engine_name ->). Both the conditional code paths—when ms2features_snr is enabled (using SPECTRUM_FEATURES.out.id_files_snr) and when disabled (using MSRESCORE_FEATURES.out.idxml)—emit the same 3-element tuple structure with search_engine, ensuring compatibility throughout the workflow.

However, the meta.yml file does not document the search_engine value in the output tuple, which should be updated for accuracy.

Likely an incorrect or invalid review comment.

modules/local/utils/msrescore_features/main.nf (3)

5-7: LGTM: Container versions updated consistently.

Both Singularity and Docker containers are correctly updated to version 0.0.14.


25-34: Good defensive handling of parameter edge cases.

The logic properly handles cases where --ms2features_model_dir might be an empty string, null, boolean true, or whitespace. This prevents common configuration issues.
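A minimal Groovy sketch of this kind of guard (variable names are assumptions; the module's actual code may differ):

```groovy
// Treat null, boolean true (a bare --ms2features_model_dir flag), empty
// strings, and whitespace-only strings all as "no model directory supplied".
def raw = params.ms2features_model_dir
def model_dir = (raw && !(raw instanceof Boolean) && raw.toString().trim())
    ? raw.toString().trim()
    : null
```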


85-89: Good addition of debug support.

The ms2features_debug parameter provides useful debugging capabilities by enabling DEBUG log level when needed.

modules/local/utils/msrescore_fine_tuning/main.nf (3)

5-7: LGTM: Container versions consistent.

Container versions match the updated versions in msrescore_features (0.0.14).


38-42: LGTM: force_transfer_learning variable correctly defined and used.

The variable naming is now consistent between definition (lines 38-42) and usage (line 62).


29-30: No action needed; the code is correct as written.

The meta[0] access is intentional. After groupTuple(by: 3) in the workflow (line 101, 109, 117 in peptide_database_search/main.nf), the meta variable becomes a list of metadata objects from grouped samples: [meta1, meta2, ...]. Accessing meta[0]['fragmentmasstolerance'] and meta[0]['fragmentmasstoleranceunit'] retrieves the tolerance values from the first grouped sample. Since samples are grouped by these same tolerance properties, all grouped meta objects will have identical values, making this approach correct.
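A toy Nextflow illustration of why `meta` becomes a list after grouping (the channel contents are invented for the example):

```nextflow
workflow {
    Channel.of(
            [[id: 'a', fragmentmasstolerance: 0.02], 'a.idXML', 'a.mzML', 'sage'],
            [[id: 'b', fragmentmasstolerance: 0.02], 'b.idXML', 'b.mzML', 'sage'])
        .groupTuple(by: 3)
        // after grouping, element 0 is a list of meta maps; meta[0] reads
        // the first grouped sample, which shares the grouping key's values
        .view { meta, ids, mzmls, engine -> meta[0]['fragmentmasstolerance'] }
}
```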

subworkflows/local/peptide_database_search/main.nf (2)

6-11: LGTM: New imports aligned with MS2 feature functionality.

The new module imports (PSM_CLEAN, MSRESCORE_FINE_TUNING, MSRESCORE_FEATURES, GET_SAMPLE, SPECTRUM_FEATURES, ID_MERGER) are correctly added to support the enhanced MS2 rescoring workflow.


126-143: Complex channel transformations are logically sound.

The channel transformations for msgf, sage, and comet feature inputs correctly reorder elements to match the expected input structure for MSRESCORE_FEATURES, moving the engine name to the last position.

subworkflows/local/dda_id/main.nf (3)

46-61: Conditional logic handles ms2features_range appropriately.

The code correctly distinguishes between "independent_run" (which skips ID_RIPPER) and other values (which use ID_RIPPER to split by runs). Both "by_sample" and "by_project" follow the same path, which appears intentional since both need per-run splitting for CONSENSUSID.


71-74: Version mixing is correctly scoped within conditional.

Line 73 mixes CONSENSUSID.out.versions within the conditional block (line 71) that ensures CONSENSUSID is only called when multiple search engines are used. This is correct and avoids referencing undefined outputs.


30-36: No issue found.

PEPTIDE_DATABASE_SEARCH always emits a versions channel because search_engines has a default value ('comet' in nextflow.config) and the workflow ensures at least one search engine is enabled. Since the comet, msgf, or sage checks will always match at least one enabled engine, the ch_versions channel is guaranteed to receive at least one version mix before emission. The removal of ifEmpty(null) is safe and does not introduce channel errors.

subworkflows/local/psm_rescoring/main.nf (1)

22-51: Conditional logic properly handles three ms2features_range modes.

The three branches correctly handle "independent_run", "by_sample", and "by_project" modes with appropriate PERCOLATOR and ID_RIPPER operations for each.

ypriverol and others added 3 commits January 4, 2026 17:50
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
using the bigbio onsite module
@ypriverol ypriverol requested a review from Copilot January 4, 2026 23:20
@coderabbitai coderabbitai bot (Contributor) left a comment
Actionable comments posted: 3

🧹 Nitpick comments (1)
modules/bigbio/onsite/tests/main.nf.test (1)

1-112: Good test coverage for the multi-algorithm support.

The test suite properly validates all three algorithms (ascore, phosphors, lucxor) with consistent assertions for:

  • Process success
  • Version snapshots
  • Output file counts
  • Correct filename suffixes

Consider adding a negative test case for an invalid algorithm value to verify the error handling path in main.nf (line 113).
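Such a negative test might look like the sketch below (nf-test syntax; the parameter name and input wiring are assumed to mirror the existing positive cases):

```groovy
test("onsite - should fail on invalid algorithm") {
    when {
        params {
            // hypothetical parameter name; adjust to the module's actual config
            onsite_algorithm = 'not_a_real_algorithm'
        }
        process {
            """
            // input wiring copied from the existing positive tests (assumption)
            input[0] = [ [ id:'test' ], file(mzml_fixture), file(idxml_fixture) ]
            """
        }
    }
    then {
        // the error() branch in main.nf should abort the process
        assert process.failed
    }
}
```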

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4659730 and 6130dc6.

📒 Files selected for processing (8)
  • conf/dev.config
  • modules.json
  • modules/bigbio/onsite/environment.yml
  • modules/bigbio/onsite/main.nf
  • modules/bigbio/onsite/meta.yml
  • modules/bigbio/onsite/tests/main.nf.test
  • modules/bigbio/onsite/tests/nextflow.config
  • subworkflows/local/phospho_scoring/main.nf
🚧 Files skipped from review as they are similar to previous changes (2)
  • subworkflows/local/phospho_scoring/main.nf
  • conf/dev.config
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: enryH
Repo: bigbio/quantms PR: 574
File: .github/actions/get-shards/action.yml:31-41
Timestamp: 2025-08-14T09:53:36.777Z
Learning: The .github/actions/get-shards/ and .github/actions/nf-test/ actions in quantms are part of the nf-core template (introduced in template version 3.3.2) for nf-test framework integration, not custom pipeline code.
📚 Learning: 2025-08-14T09:53:36.777Z
Learnt from: enryH
Repo: bigbio/quantms PR: 574
File: .github/actions/get-shards/action.yml:31-41
Timestamp: 2025-08-14T09:53:36.777Z
Learning: The .github/actions/get-shards/ and .github/actions/nf-test/ actions in quantms are part of the nf-core template (introduced in template version 3.3.2) for nf-test framework integration, not custom pipeline code.

Applied to files:

  • modules.json
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: Docker Tests (latest-everything, test_dda_id_fine_tuning)
  • GitHub Check: Docker Tests (latest-everything, test_dda_id_ms2pip)
  • GitHub Check: Docker Tests (latest-everything, test_localize)
  • GitHub Check: Docker Tests (25.04.0, test_dda_id_ms2pip)
  • GitHub Check: Docker Tests (latest-everything, test_tmt)
  • GitHub Check: Docker Tests (latest-everything, test_lfq)
  • GitHub Check: Docker Tests (25.04.0, test_dda_id_alphapeptdeep)
  • GitHub Check: Docker Tests (latest-everything, test_dia)
  • GitHub Check: Docker Tests (25.04.0, test_lfq)
  • GitHub Check: Docker Tests (25.04.0, test_localize)
  • GitHub Check: Docker Tests (25.04.0, test_dia)
  • GitHub Check: Docker Tests (25.04.0, test_dda_id_fine_tuning)
  • GitHub Check: Docker Tests (25.04.0, test_tmt)
  • GitHub Check: Run pipeline with test data (25.04.0, test_dda_id_fine_tuning, docker)
  • GitHub Check: Run pipeline with test data (25.04.0, test_tmt, docker)
  • GitHub Check: Run pipeline with test data (25.04.0, test_lfq, docker)
  • GitHub Check: Run pipeline with test data (25.04.0, test_dda_id_alphapeptdeep, docker)
  • GitHub Check: Run pipeline with test data (25.04.0, test_dda_id_ms2pip, docker)
  • GitHub Check: Run pipeline with test data (25.04.0, test_localize, docker)
  • GitHub Check: Run pipeline with test data (25.04.0, test_dia, docker)
🔇 Additional comments (9)
modules.json (4)

13-17: MultiQC module entry is valid and properly configured.

The git SHA 8deffd6a402233ebf905c66dbf804bd4c2637946 is confirmed on the nf-core/modules repository, corresponding to commit "fix: multiqc report publishing (#9437)" dated 2025-11-23. The module entry correctly points to a legitimate multiqc-related change that aligns with the PR's release 1.7.0 objectives.


10-12: Git SHA e753770db613ce014b3c4bc94f6cba443427b726 is valid on nf-core/modules.

The commit exists on the nf-core/modules repository and is a legitimate coordination commit (pre-commit rules enforcement across the repository).


39-55: The bigbio/nf-modules SHAs have been verified as valid commits on the main branch and are production-ready.

Both modules are confirmed:

  • onsite (e4f16df2acb2c084a5043840212d0fb52f69fbbc): Valid commit on main, author Wei Zhongchun
  • thermorawfileparser (a1a4a11ff508b2b5c23c9fb21c51c3327b748d4d): Valid commit on main, author Chengxin Dai

No concerns with these SHAs or module configurations.


22-36: modules.json is a newly created file; no inconsistency exists between the subworkflow SHAs.

The review comment incorrectly assumes that utils_nextflow_pipeline was "unchanged" while the other two were "updated." In fact, modules.json is being added for the first time in this commit, so all three subworkflows—including utils_nextflow_pipeline, utils_nfcore_pipeline, and utils_nfschema_plugin—are initial entries in the configuration, not a selective mix of updates. There is no prior state to compare against, and therefore no inconsistency in the SHA versions.

Likely an incorrect or invalid review comment.

modules/bigbio/onsite/environment.yml (1)

1-7: LGTM!

The environment file is properly structured with the correct channel order. The pinned version 0.0.2 matches the container version specified in main.nf, ensuring consistency across execution environments.

modules/bigbio/onsite/tests/nextflow.config (1)

1-3: LGTM!

Standard nf-core test configuration pattern for dynamic output directory resolution based on process name.

modules/bigbio/onsite/main.nf (3)

83-86: Shell quoting in default values may cause issues.

The embedded single quotes in the default values for target_mods, neutral_losses, and decoy_losses may cause shell interpretation problems when the command is executed, particularly with the spaces and special characters.

Verify these defaults work correctly during integration testing, especially the values containing spaces like 'sty -H3PO4 -97.97690'.
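The underlying risk is ordinary shell word splitting: if a default such as `'sty -H3PO4 -97.97690'` is later expanded unquoted, it arrives as three arguments instead of one. A minimal, generic shell illustration (not the actual onsite invocation):

```shell
neutral_losses='sty -H3PO4 -97.97690'

# Helper that reports how many arguments it received
count_args() { echo "$#"; }

count_args $neutral_losses    # unquoted expansion splits on spaces: 3 arguments
count_args "$neutral_losses"  # quoted expansion stays intact: 1 argument
```

Quoting the expansion at the call site (or passing flags as an array-like list) avoids the splitting regardless of what the default value contains.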


1-9: LGTM on container and process setup.

Container versioning is consistent with environment.yml (both use pyonsite:0.0.2). The tag and labels are appropriately defined for resource allocation and process identification.


122-122: The grep -P concern may not apply to this specific container.

While the review correctly notes that BusyBox grep lacks PCRE support (-P flag), BioContainers images are conda-based rather than minimal Alpine/BusyBox images. The pyonsite container (quay.io/biocontainers/pyonsite:0.0.2--pyhdfd78af_0) is built from Bioconda recipes, which automatically include GNU tools like grep with PCRE support. Additionally, the code includes a || echo "unknown" fallback that handles version extraction failures gracefully, and integration tests show no version-related failures.

If portability to minimal containers is a requirement, the sed alternative could be used, though it adds complexity for an issue unlikely to occur with the current container setup.
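For reference, a portable version-extraction sketch that avoids `grep -P` (the version string below is invented for the demo; the real tool's output may differ):

```shell
# BusyBox grep lacks -P (PCRE), but sed -E with POSIX character classes works
# in both GNU and BusyBox environments.
version_line='pyonsite, version 0.0.2'

echo "$version_line" | sed -E 's/.*([0-9]+\.[0-9]+\.[0-9]+).*/\1/'
```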

Comment on lines +40 to +53
    if (algorithm == 'ascore') {
        // AScore: uses -in, -id, -out, --fragment-mass-unit
        fragment_unit = params.onsite_fragment_unit ?: 'Da'
        algorithm_cmd = """
        onsite ascore \\
            -in ${mzml_file} \\
            -id ${id_file} \\
            -out ${id_file.baseName}_ascore.idXML \\
            --fragment-mass-tolerance ${fragment_tolerance} \\
            --fragment-mass-unit ${fragment_unit} \\
            ${add_decoys} \\
            ${compute_all_scores} \\
            ${debug}
        """
@coderabbitai coderabbitai bot (Contributor) commented Jan 4, 2026
⚠️ Potential issue | 🟠 Major

Empty optional flags may leave trailing backslashes.

When $add_decoys, $compute_all_scores, or $debug are empty strings (e.g., when their corresponding params are false), the command will have lines with only backslashes, potentially causing shell errors.

A common pattern is to consolidate optional flags or filter empty values.

🔎 Proposed fix - use conditional flag inclusion
     if (algorithm == 'ascore') {
         // AScore: uses -in, -id, -out, --fragment-mass-unit
         fragment_unit = params.onsite_fragment_unit ?: 'Da'
+        def optional_flags = [add_decoys, compute_all_scores, debug].findAll { it }.join(' \\\n            ')
         algorithm_cmd = """
         onsite ascore \\
             -in ${mzml_file} \\
             -id ${id_file} \\
             -out ${id_file.baseName}_ascore.idXML \\
             --fragment-mass-tolerance ${fragment_tolerance} \\
-            --fragment-mass-unit ${fragment_unit} \\
-            ${add_decoys} \\
-            ${compute_all_scores} \\
-            ${debug}
+            --fragment-mass-unit ${fragment_unit}${optional_flags ? ' \\\\\n            ' + optional_flags : ''}
         """
     }

Apply similar pattern to phosphors and lucxor blocks.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In modules/bigbio/onsite/main.nf around lines 40 to 53, the heredoc builds the
onsite ascore command by interpolating optional flag variables directly, which
leaves lines containing only backslashes when those variables are empty;
instead, construct the command by assembling an ordered list/array of mandatory
and optional arguments, filter out empty/null entries, and then join them with "
\\\n            " (or equivalent) so no blank backslash-only lines remain; apply
the same conditional list-and-join pattern to the phosphors and lucxor blocks to
avoid trailing backslashes from empty optional flags.

@ypriverol (Member Author):
@weizhongchun Can you take this comment?

@coderabbitai bot (Contributor):
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +116 to +118
    """
    ${algorithm_cmd} \\
        2>&1 | tee ${id_file.baseName}_${algorithm}.log
@coderabbitai coderabbitai bot (Contributor) commented Jan 4, 2026

⚠️ Potential issue | 🟠 Major

Trailing backslash may cause command parsing issues.

The algorithm_cmd blocks already end with trailing backslashes for line continuation, and line 117 adds another \\ after ${algorithm_cmd}. This could result in a malformed command if any optional flags (like $debug) are empty strings, leaving dangling backslashes.

Consider trimming trailing whitespace/backslashes or restructuring the command construction.

🔎 Proposed fix
     """
-    ${algorithm_cmd} \\
-        2>&1 | tee ${id_file.baseName}_${algorithm}.log
+    ${algorithm_cmd.trim()} 2>&1 | tee ${id_file.baseName}_${algorithm}.log

     cat <<-END_VERSIONS > versions.yml
🤖 Prompt for AI Agents
In modules/bigbio/onsite/main.nf around lines 116-118, the line adds an extra
trailing backslash after ${algorithm_cmd} which, combined with algorithm_cmd
already ending in backslashes or empty optional flags, can leave a dangling
continuation and break the shell command; fix by removing the additional literal
backslash and ensure you either trim any trailing backslashes/whitespace from
algorithm_cmd before concatenation or build the full command using conditional
joins (only append line-continuation when a non-empty fragment follows) so the
final command ends cleanly before piping to tee.

@ypriverol (Member Author):
@weizhongchun Can you take this comment?

@coderabbitai bot (Contributor):
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +23 to +53
input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', mzml_id:'sample1' ]
  - mzml_file:
      type: file
      description: Input spectrum file in mzML format
      pattern: "*.mzML"
  - id_file:
      type: file
      description: Protein/peptide identifications file in idXML format
      pattern: "*.idXML"
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', mzml_id:'sample1' ]
  - ptm_in_id_onsite:
      type: file
      description: Protein/peptide identifications file with PTM localization scores
      pattern: "*_{ascore,phosphors,lucxor}.idXML"
  - log:
      type: file
      description: Log file from onsite execution
      pattern: "*.log"
  - versions:
      type: file
      description: File containing software versions
      pattern: "versions.yml"
Copy link
Contributor

@coderabbitai coderabbitai bot Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Input/output sections are malformed.

The meta.yml structure has issues:

  1. meta is defined twice (lines 24-28 and 37-41)
  2. Output fields (ptm_in_id_onsite, log, versions) are incorrectly placed under input: instead of a separate output: section

Per nf-core module conventions, inputs and outputs should be separate sections.

🔎 Proposed fix
 input:
   - meta:
       type: map
       description: |
         Groovy Map containing sample information
         e.g. [ id:'test', mzml_id:'sample1' ]
   - mzml_file:
       type: file
       description: Input spectrum file in mzML format
       pattern: "*.mzML"
   - id_file:
       type: file
       description: Protein/peptide identifications file in idXML format
       pattern: "*.idXML"
-  - meta:
-      type: map
-      description: |
-        Groovy Map containing sample information
-        e.g. [ id:'test', mzml_id:'sample1' ]
-  - ptm_in_id_onsite:
+output:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', mzml_id:'sample1' ]
+  - ptm_in_id_onsite:
       type: file
       description: Protein/peptide identifications file with PTM localization scores
       pattern: "*_{ascore,phosphors,lucxor}.idXML"
   - log:
       type: file
       description: Log file from onsite execution
       pattern: "*.log"
   - versions:
       type: file
       description: File containing software versions
       pattern: "versions.yml"
🤖 Prompt for AI Agents
In modules/bigbio/onsite/meta.yml around lines 23-53, the input/output section
is malformed: `meta` is defined twice and several fields (ptm_in_id_onsite, log,
versions) that are outputs are placed under `input`. Remove the duplicate `meta`
entry (merge any differing descriptions into a single `meta` input), keep only
unique input entries (mzml_file, id_file, meta), and create a new top-level
`output:` section containing ptm_in_id_onsite, log, and versions with their
existing type/description/patterns; ensure keys and patterns follow nf-core
conventions and that indentation and YAML ordering are valid.

@ypriverol (Member Author):
@weizhongchun Can you refine this, this needs to be done in the nf-modules

@coderabbitai bot (Contributor):
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!


Code Implementation 🛠️

Implementation: Fix the nf-core module meta.yml structure by removing the duplicated meta input and moving output artifacts (ptm_in_id_onsite, log, versions) into a separate top-level output: section while preserving existing descriptions, types, and patterns.


See review comment here

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
subworkflows/local/peptide_database_search/main.nf (1)

89-92: Previous issue resolved.

The exit(1, ...) has been correctly replaced with error(...) as recommended in the past review. The error message is now more informative.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6130dc6 and 60f5ee6.

📒 Files selected for processing (1)
  • subworkflows/local/peptide_database_search/main.nf
🔇 Additional comments (6)
subworkflows/local/peptide_database_search/main.nf (6)

6-11: LGTM!

New module imports are properly structured and follow Nextflow DSL2 conventions.


13-20: LGTM!

Workflow declaration with the new ch_expdesign input and channel initialization is correct.


147-161: LGTM!

The non-fine-tuning path correctly combines channels with required inputs and processes them through MSRESCORE_FEATURES.


172-181: LGTM!

The branching logic correctly separates results by search engine and maps to the expected output structure.


237-240: Emit block is correctly structured.

The emit section properly exposes workflow outputs, assuming the ch_id_files_out undefined variable issue is addressed.


197-204: Combine operation should include validation or use a safer join approach to prevent silent data loss.

The combine(..., by: 0) operations (lines 198, 201, 204) perform an inner join on mzml_id. While both ch_id_files_*_feats and sample_map_idv derive from the same ch_expdesign source file using identical filename extraction logic, there is no explicit validation that all files are preserved through the sample mapping. If the openms2sample tool filters entries or if unexpected discrepancies occur in file naming, data could be silently dropped. Consider adding error handling or using a join pattern that fails on missing keys to ensure data integrity.

Contributor

Copilot AI left a comment

Pull request overview

This is a major release (1.7.0) introducing significant changes to the quantms pipeline's rescoring framework and PTM localization capabilities.

Key changes:

  • Replaces the ms2rescore parameter naming scheme with a new ms2features framework supporting multiple prediction models (AlphaPeptDeep, MS2PIP)
  • Introduces Onsite module to replace Luciphor for PTM localization, providing multiple algorithm options (AScore, PhosphoRS, LucXor)
  • Adds fine-tuning capability for AlphaPeptDeep MS2 prediction models

Reviewed changes

Copilot reviewed 100 out of 103 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
workflows/quantms.nf Major refactoring: removed .ifEmpty(null) patterns, improved code formatting, restructured multiqc input handling
workflows/tmt.nf, lfq.nf, dia.nf Removed .ifEmpty(null) patterns for cleaner version channel handling
nextflow.config Renamed ms2rescore parameters to ms2features_*, updated Nextflow requirement to 25.04.0, version bump to 1.7.0
subworkflows/local/psm_rescoring Simplified rescoring logic by moving feature generation upstream to peptide_database_search
subworkflows/local/peptide_database_search Added MS2Features generation, fine-tuning, and sample/project-level merging logic
subworkflows/local/phospho_scoring Replaced LUCIPHOR with ONSITE module for PTM localization
modules/bigbio/onsite New module supporting three PTM localization algorithms (ascore, phosphors, lucxor)
modules/bigbio/thermorawfileparser Migrated from local to bigbio nf-modules repository with improved stub mode
bin/msstats_tmt.R Fixed critical bug in contrast matrix generation for control conditions, code formatting improvements
nextflow_schema.json Comprehensive parameter updates for ms2features framework and onsite configuration


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
subworkflows/local/dda_id/main.nf (1)

71-80: The CONSENSUSID.out.versions access is safe here.

The previous concern about CONSENSUSID.out.versions being undefined when only one search engine is used is not applicable because line 73 is inside the conditional block that only executes when size() > 1. When a single search engine is used, this block is skipped entirely and ch_psmfdrcontrol is assigned directly from ch_consensus_input.

subworkflows/local/peptide_database_search/main.nf (2)

83-88: Fallback path doesn't validate directory existence.

Line 87 uses file("pretrained_models") without checkIfExists: true, unlike line 84. If the pretrained_models directory doesn't exist, this will fail at runtime without a clear error message.

🔎 Proposed fix
             if (params.ms2features_model_dir && params.ms2features_model_dir != true) {
                 ms2_model_dir = Channel.from(file(params.ms2features_model_dir, checkIfExists: true))
             } else {
-                ms2_model_dir = Channel.from(file("pretrained_models"))
+                ms2_model_dir = Channel.from(file("pretrained_models", checkIfExists: true))
             }

246-253: Missing def keywords for local variables in get_sample_map.

Variables filestr, file_name, and sample (lines 248-250) are declared without def, making them script-scoped rather than function-local. This could cause unexpected behavior in concurrent contexts within Nextflow.

🔎 Proposed fix
 // Function to get sample map
 def get_sample_map(LinkedHashMap row) {
-
-    filestr               = row.Spectra_Filepath
-    file_name             = file(filestr).name.take(file(filestr).name.lastIndexOf('.'))
-    sample                = row.Sample
-
+    def filestr   = row.Spectra_Filepath
+    def file_name = file(filestr).name.take(file(filestr).name.lastIndexOf('.'))
+    def sample    = row.Sample
     return [file_name, sample]
-
 }
🧹 Nitpick comments (5)
subworkflows/local/dda_id/main.nf (2)

85-91: Inconsistent use of ifEmpty(null) for PHOSPHO_SCORING.

Line 87 uses .ifEmpty(null) for PHOSPHO_SCORING.out.versions, while other version channels in this file (lines 35, 48, 52, 60, 73, 83, 95) now use direct access without the fallback. Consider making this consistent, or document why the fallback is needed here specifically.


112-122: Consider making the add_file_prefix function more robust.

The function assumes specific file suffixes (_sage_perc.idXML, _comet_perc.idXML, _msgf_perc.idXML). If none match, position remains -1, and file_name = file(file_path).name.take(-1) will return an empty string, producing an empty prefix key.

🔎 Proposed fix
 def add_file_prefix(file_path) {
     position = file(file_path).name.lastIndexOf('_sage_perc.idXML')
     if (position == -1) {
         position = file(file_path).name.lastIndexOf('_comet_perc.idXML')
         if (position == -1) {
             position = file(file_path).name.lastIndexOf('_msgf_perc.idXML')
+            if (position == -1) {
+                // Fallback: use filename without extension
+                position = file(file_path).name.lastIndexOf('.')
+            }
         }
     }
     file_name = file(file_path).name.take(position)
     return [file_name, file_path]
 }
subworkflows/local/peptide_database_search/main.nf (3)

96-102: Hardcoded random seed reduces variability across runs.

The randomSample(params.fine_tuning_sample_run, 2025) uses a fixed seed 2025. While this ensures reproducibility, it means every run will select the same samples. Consider making the seed configurable via a parameter if variability across runs is desired.
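One possible shape for a configurable seed. The parameter name `fine_tuning_seed` is hypothetical (not currently in `nextflow.config`); `randomSample(n, seed)` is the standard Nextflow operator signature:

```nextflow
// nextflow.config — hypothetical parameter, defaulting to the current fixed seed
params.fine_tuning_seed = 2025

// subworkflow — pass the seed through instead of hardcoding it
ch_candidates
    .randomSample(params.fine_tuning_sample_run, params.fine_tuning_seed)
    .set { ch_fine_tuning_samples }
```

This preserves today's reproducible default while letting users vary the selection across runs by overriding the parameter.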


148-162: Channel variable reassignment may cause confusion.

Lines 149-157 reassign ch_id_msgf, ch_id_comet, and ch_id_sage to new channel structures, overwriting the original channel references. This could lead to maintenance confusion. Consider using distinct variable names.

🔎 Proposed fix - use distinct names
-            ch_id_msgf.combine(ch_mzmls_search, by: 0)
+            ch_id_msgf_combined = ch_id_msgf.combine(ch_mzmls_search, by: 0)
                 .combine(ms2_model_dir)
-                .combine(Channel.value("msgf")).set{ ch_id_msgf }
-            ch_id_comet.combine(ch_mzmls_search, by: 0)
+                .combine(Channel.value("msgf"))
+            ch_id_comet_combined = ch_id_comet.combine(ch_mzmls_search, by: 0)
                 .combine(ms2_model_dir)
-                .combine(Channel.value("comet")).set{ ch_id_comet }
-            ch_id_sage.combine(ch_mzmls_search, by: 0)
+                .combine(Channel.value("comet"))
+            ch_id_sage_combined = ch_id_sage.combine(ch_mzmls_search, by: 0)
                 .combine(ms2_model_dir)
-                .combine(Channel.value("sage")).set{ ch_id_sage }
+                .combine(Channel.value("sage"))

-            MSRESCORE_FEATURES(ch_id_msgf.mix(ch_id_comet).mix(ch_id_sage))
+            MSRESCORE_FEATURES(ch_id_msgf_combined.mix(ch_id_comet_combined).mix(ch_id_sage_combined))

207-213: groupTuple(by: 2) relies on positional index.

The grouping assumes the third element (index 2) is always the sample identifier. If the tuple structure changes upstream (lines 198-205), this will silently break. Consider using a named key or adding a comment documenting the expected tuple structure.
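A minimal sketch of making the grouping key explicit rather than positional. This assumes the tuples at this point carry `[mzml_id, idxml, sample]` (channel names here are illustrative); re-keying puts the sample identifier first so the `groupTuple` key is self-documenting:

```nextflow
// Re-key so the sample identifier leads the tuple, then group on it explicitly.
ch_combined
    .map { mzml_id, idxml, sample -> [sample, mzml_id, idxml] }
    .groupTuple(by: 0)   // position 0 is now documented as the sample key
    .set { ch_grouped_by_sample }
```

If upstream tuple shapes change, the `map` closure fails visibly at the re-keying step instead of silently grouping on the wrong element.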

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 60f5ee6 and 2bd3823.

📒 Files selected for processing (3)
  • modules/local/utils/msrescore_fine_tuning/main.nf
  • subworkflows/local/dda_id/main.nf
  • subworkflows/local/peptide_database_search/main.nf
🚧 Files skipped from review as they are similar to previous changes (1)
  • modules/local/utils/msrescore_fine_tuning/main.nf
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: Docker Tests (25.04.0, test_localize)
  • GitHub Check: Docker Tests (25.04.0, test_tmt_corr)
  • GitHub Check: Docker Tests (latest-everything, test_localize)
  • GitHub Check: Docker Tests (latest-everything, test_dda_id_fine_tuning)
  • GitHub Check: Docker Tests (latest-everything, test_lfq)
  • GitHub Check: Docker Tests (25.04.0, test_lfq)
  • GitHub Check: Docker Tests (latest-everything, test_dia)
  • GitHub Check: Docker Tests (latest-everything, test_dda_id_alphapeptdeep)
  • GitHub Check: Docker Tests (latest-everything, test_dda_id_ms2pip)
  • GitHub Check: Docker Tests (latest-everything, test_tmt_corr)
  • GitHub Check: Docker Tests (25.04.0, test_dda_id_alphapeptdeep)
  • GitHub Check: Docker Tests (25.04.0, test_dda_id_fine_tuning)
  • GitHub Check: Docker Tests (25.04.0, test_tmt)
  • GitHub Check: Run pipeline with test data (25.04.0, test_tmt, docker)
  • GitHub Check: Run pipeline with test data (25.04.0, test_dda_id_ms2pip, docker)
  • GitHub Check: Run pipeline with test data (25.04.0, test_localize, docker)
  • GitHub Check: Run pipeline with test data (25.04.0, test_lfq, docker)
  • GitHub Check: Run pipeline with test data (25.04.0, test_dda_id_alphapeptdeep, docker)
  • GitHub Check: Run pipeline with test data (25.04.0, test_dia, docker)
🔇 Additional comments (5)
subworkflows/local/dda_id/main.nf (2)

8-8: LGTM!

The new PHOSPHO_SCORING subworkflow import aligns with the conditional phospho scoring logic added at lines 85-91.


30-36: LGTM!

The addition of ch_expdesign to PEPTIDE_DATABASE_SEARCH and the channel rename from ch_id_files_idx to ch_id_files_feats align with the broader refactoring for ms2features support.

subworkflows/local/peptide_database_search/main.nf (3)

6-11: LGTM!

The new module imports are properly structured and align with the ms2features functionality being added.


90-93: LGTM - error() function is now used correctly.

The previous issue about using exit() instead of error() has been addressed.


231-238: LGTM - ch_id_files_out is now properly defined in all branches.

The previous issue about ch_id_files_out being undefined when skip_rescoring == true and psm_clean != true has been addressed with the else branch at line 236-238.


9 participants