Release 1.7.0 - Major changes with quantms-rescoring #625
base: master
Conversation
Increase dev version
Fix bug in msstats_tmt.R (Issue #577: Error in the msstats_tmt.R function parse_contrasts)
Pass correct flag to ProteinQuantifier
fixed rescoring range bugs
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
fixed quantms-rescoring
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
subworkflows/local/psm_rescoring/main.nf (1)
59-69: Critical: Undefined variable reference in `add_file_prefix`. Line 67 references `file_name` within `file(file_name)`, but `file_name` is the variable being assigned in this statement and hasn't been defined yet. This should reference the function parameter `file_path` instead.

🔎 Proposed fix
```diff
 def add_file_prefix(file_path) {
     position = file(file_path).name.lastIndexOf('_sage_perc.idXML')
     if (position == -1) {
         position = file(file_path).name.lastIndexOf('_comet_perc.idXML')
         if (position == -1) {
             position = file(file_path).name.lastIndexOf('_msgf_perc.idXML')
         }
     }
-    file_name = file(file_name).name.take(position)
+    file_name = file(file_path).name.take(position)
     return [file_name, file_path]
 }
```
♻️ Duplicate comments (1)
modules/local/utils/msrescore_features/main.nf (1)
47-59: Tolerance unit logic inconsistency remains unaddressed. Line 53 checks `params.ms2features_tolerance_unit == 'ppm'` but line 46 loaded `ms2_tolerance_unit` from `meta['fragmentmasstoleranceunit']`. This creates an inconsistency: the code should check the meta-derived value at line 53 for consistency with the rest of the tolerance logic.

🔎 Proposed fix
```diff
 if (using_ms2pip) {
     // ms2pip only supports Da unit
     ms2_tolerance_unit = 'Da'
     ms2_tolerance = params.ms2features_tolerance
     if (meta['fragmentmasstoleranceunit'].toLowerCase().endsWith('da')) {
         ms2_tolerance = meta['fragmentmasstolerance']
-    } else if (params.ms2features_tolerance_unit == 'ppm') {
+    } else if (meta['fragmentmasstoleranceunit'].toLowerCase() == 'ppm') {
         log.info "Warning: MS2pip only supports Da unit. Using default from config!"
         ms2_tolerance = 0.05
     } else {
         log.info "Warning: MS2pip only supports Da unit. Using default from config!"
     }
 }
```

Based on past review comments.
🧹 Nitpick comments (2)
subworkflows/local/peptide_database_search/main.nf (2)
94-124: Consider parameterizing the random seed for reproducibility control. Lines 99, 107, and 115 use a hardcoded seed value of `2025` for `randomSample()`. While this provides reproducibility, consider making this a parameter (e.g., `params.random_seed` or `params.fine_tuning_seed`) to give users control over reproducibility or the ability to run multiple experiments with different seeds.
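The suggested parameterization could look like the following Nextflow sketch. The parameter name `params.random_seed`, its default, and the channel name are illustrative assumptions, not part of the pipeline:

```nextflow
// Hypothetical: expose the seed as a pipeline parameter, keeping the
// current hardcoded value as the default (e.g. in nextflow.config):
//   params.random_seed = 2025

// Each randomSample() call then reuses the parameter instead of a literal,
// so one config change affects all three sampling sites consistently.
ch_psms_for_fine_tuning
    .randomSample(params.fine_tuning_sample_size, params.random_seed)
```

This keeps default runs bit-identical to the current behavior while letting users vary the seed across experiments.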
242-252: Refactor to avoid redundant `file()` calls. Line 247 calls `file(filestr)` twice, which is inefficient. Store the file object in a variable to avoid redundant operations.

🔎 Proposed fix
```diff
 def get_sample_map(LinkedHashMap row) {
     def sample_map = [:]
     filestr = row.Spectra_Filepath
-    file_name = file(filestr).name.take(file(filestr).name.lastIndexOf('.'))
+    def fileObj = file(filestr)
+    file_name = fileObj.name.take(fileObj.name.lastIndexOf('.'))
     sample = row.Sample
     return [file_name, sample]
 }
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- modules/local/utils/msrescore_features/main.nf
- modules/local/utils/msrescore_fine_tuning/main.nf
- modules/local/utils/spectrum_features/main.nf
- subworkflows/local/dda_id/main.nf
- subworkflows/local/id/main.nf
- subworkflows/local/peptide_database_search/main.nf
- subworkflows/local/psm_rescoring/main.nf
🚧 Files skipped from review as they are similar to previous changes (1)
- subworkflows/local/id/main.nf
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: Docker Tests (25.04.0, test_localize)
- GitHub Check: Docker Tests (25.04.0, test_dda_id_ms2pip)
- GitHub Check: Docker Tests (latest-everything, test_dda_id_ms2pip)
- GitHub Check: Docker Tests (latest-everything, test_tmt_corr)
- GitHub Check: Docker Tests (latest-everything, test_dia)
- GitHub Check: Docker Tests (latest-everything, test_localize)
- GitHub Check: Docker Tests (latest-everything, test_dda_id_fine_tuning)
- GitHub Check: Docker Tests (latest-everything, test_lfq)
- GitHub Check: Docker Tests (25.04.0, test_dia)
- GitHub Check: Docker Tests (25.04.0, test_tmt_corr)
- GitHub Check: Docker Tests (25.04.0, test_lfq)
- GitHub Check: Docker Tests (25.04.0, test_dda_id_fine_tuning)
- GitHub Check: Docker Tests (25.04.0, test_tmt)
- GitHub Check: Run pipeline with test data (25.04.0, test_lfq, docker)
- GitHub Check: Run pipeline with test data (25.04.0, test_dda_id_fine_tuning, docker)
- GitHub Check: Run pipeline with test data (25.04.0, test_dia, docker)
- GitHub Check: Run pipeline with test data (25.04.0, test_dda_id_ms2pip, docker)
- GitHub Check: Run pipeline with test data (25.04.0, test_localize, docker)
- GitHub Check: Run pipeline with test data (25.04.0, test_dda_id_alphapeptdeep, docker)
- GitHub Check: Run pipeline with test data (25.04.0, test_tmt, docker)
🔇 Additional comments (15)
modules/local/utils/spectrum_features/main.nf (3)
5-7: The container version 0.0.13 is documented in CHANGELOG.md for release 1.7.0 (PR #629: "Update quantms-rescoring to 0.0.13 to support transfer learning") and is used consistently across the spectrum_features and psm_clean modules. The version is valid and appropriate for this release.
10-10: No action needed: upstream processes correctly provide the search_engine parameter. The `SPECTRUM_FEATURES` process receives `search_engine` via the `.combine()` operation at line 165. The `MSRESCORE_FEATURES` process (which produces `ch_id_files_feats`) already outputs `search_engine` as part of its tuple, so the parameter flows through correctly. While `search_engine` is not consumed within the `SPECTRUM_FEATURES` script section, passing it through to the output is a valid pattern for propagating metadata through the workflow pipeline.
13-13: The downstream consumer already correctly handles the current tuple structure, making this not a breaking change. The output tuple at line 13 (`tuple val(meta), path("${id_file.baseName}_snr.idXML"), val(search_engine)`) is properly destructured by the downstream process at line 19 of subworkflows/local/peptide_database_search/main.nf (`{ meta, file_name, engine_name ->`). Both conditional code paths, when `ms2features_snr` is enabled (using `SPECTRUM_FEATURES.out.id_files_snr`) and when disabled (using `MSRESCORE_FEATURES.out.idxml`), emit the same 3-element tuple structure with `search_engine`, ensuring compatibility throughout the workflow.

However, the meta.yml file does not document the `search_engine` value in the output tuple, which should be updated for accuracy.

Likely an incorrect or invalid review comment.
modules/local/utils/msrescore_features/main.nf (3)
5-7: LGTM: Container versions updated consistently. Both Singularity and Docker containers are correctly updated to version 0.0.14.
25-34: Good defensive handling of parameter edge cases. The logic properly handles cases where `--ms2features_model_dir` might be an empty string, null, boolean true, or whitespace. This prevents common configuration issues.
85-89: Good addition of debug support. The `ms2features_debug` parameter provides useful debugging capabilities by enabling DEBUG log level when needed.

modules/local/utils/msrescore_fine_tuning/main.nf (3)
5-7: LGTM: Container versions consistent. Container versions match the updated versions in msrescore_features (0.0.14).
38-42: LGTM: force_transfer_learning variable correctly defined and used. The variable naming is now consistent between definition (lines 38-42) and usage (line 62).
29-30: No action needed; the code is correct as written. The `meta[0]` access is intentional. After `groupTuple(by: 3)` in the workflow (lines 101, 109, and 117 in peptide_database_search/main.nf), the `meta` variable becomes a list of metadata objects from grouped samples: `[meta1, meta2, ...]`. Accessing `meta[0]['fragmentmasstolerance']` and `meta[0]['fragmentmasstoleranceunit']` retrieves the tolerance values from the first grouped sample. Since samples are grouped by these same tolerance properties, all grouped meta objects will have identical values, making this approach correct.

subworkflows/local/peptide_database_search/main.nf (2)
6-11: LGTM: New imports aligned with MS2 feature functionality. The new module imports (PSM_CLEAN, MSRESCORE_FINE_TUNING, MSRESCORE_FEATURES, GET_SAMPLE, SPECTRUM_FEATURES, ID_MERGER) are correctly added to support the enhanced MS2 rescoring workflow.
126-143: Complex channel transformations are logically sound. The channel transformations for msgf, sage, and comet feature inputs correctly reorder elements to match the expected input structure for MSRESCORE_FEATURES, moving the engine name to the last position.
subworkflows/local/dda_id/main.nf (3)
46-61: Conditional logic handles ms2features_range appropriately. The code correctly distinguishes between `"independent_run"` (which skips ID_RIPPER) and other values (which use ID_RIPPER to split by runs). Both `"by_sample"` and `"by_project"` follow the same path, which appears intentional since both need per-run splitting for CONSENSUSID.
71-74: Version mixing is correctly scoped within conditional. Line 73 mixes `CONSENSUSID.out.versions` within the conditional block (line 71) that ensures CONSENSUSID is only called when multiple search engines are used. This is correct and avoids referencing undefined outputs.
30-36: No issue found. PEPTIDE_DATABASE_SEARCH always emits a versions channel because `search_engines` has a default value (`'comet'` in nextflow.config) and the workflow ensures at least one search engine is enabled. Since the `comet`, `msgf`, or `sage` checks will always match at least one enabled engine, the `ch_versions` channel is guaranteed to receive at least one version mix before emission. The removal of `ifEmpty(null)` is safe and does not introduce channel errors.

subworkflows/local/psm_rescoring/main.nf (1)
22-51: Conditional logic properly handles three ms2features_range modes. The three branches correctly handle `"independent_run"`, `"by_sample"`, and `"by_project"` modes with appropriate PERCOLATOR and ID_RIPPER operations for each.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
using the bigbio onsite module
Actionable comments posted: 3
🧹 Nitpick comments (1)
modules/bigbio/onsite/tests/main.nf.test (1)
1-112: Good test coverage for the multi-algorithm support.The test suite properly validates all three algorithms (ascore, phosphors, lucxor) with consistent assertions for:
- Process success
- Version snapshots
- Output file counts
- Correct filename suffixes
Consider adding a negative test case for an invalid algorithm value to verify the error handling path in `main.nf` (line 113).
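The suggested negative case could be sketched in nf-test roughly as follows. The test name, the `onsite_algorithm` parameter name, and the input wiring are assumptions for illustration; adapt them to how the module actually receives its algorithm and test files:

```groovy
// Hypothetical nf-test case: an unsupported algorithm value should make the process fail
test("onsite - invalid algorithm should fail") {
    when {
        params {
            onsite_algorithm = 'not_a_real_algorithm'   // assumed parameter name
        }
        process {
            """
            input[0] = [ [ id:'test' ], file(params.test_mzml), file(params.test_idxml) ]
            """
        }
    }
    then {
        // the error-handling branch in main.nf should abort the process
        assert process.failed
    }
}
```

A test like this guards the error path against regressions, e.g. a refactor that silently falls through to a default algorithm.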
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- conf/dev.config
- modules.json
- modules/bigbio/onsite/environment.yml
- modules/bigbio/onsite/main.nf
- modules/bigbio/onsite/meta.yml
- modules/bigbio/onsite/tests/main.nf.test
- modules/bigbio/onsite/tests/nextflow.config
- subworkflows/local/phospho_scoring/main.nf
🚧 Files skipped from review as they are similar to previous changes (2)
- subworkflows/local/phospho_scoring/main.nf
- conf/dev.config
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: enryH
Repo: bigbio/quantms PR: 574
File: .github/actions/get-shards/action.yml:31-41
Timestamp: 2025-08-14T09:53:36.777Z
Learning: The .github/actions/get-shards/ and .github/actions/nf-test/ actions in quantms are part of the nf-core template (introduced in template version 3.3.2) for nf-test framework integration, not custom pipeline code.
📚 Learning: 2025-08-14T09:53:36.777Z
Learnt from: enryH
Repo: bigbio/quantms PR: 574
File: .github/actions/get-shards/action.yml:31-41
Timestamp: 2025-08-14T09:53:36.777Z
Learning: The .github/actions/get-shards/ and .github/actions/nf-test/ actions in quantms are part of the nf-core template (introduced in template version 3.3.2) for nf-test framework integration, not custom pipeline code.
Applied to files:
modules.json
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: Docker Tests (latest-everything, test_dda_id_fine_tuning)
- GitHub Check: Docker Tests (latest-everything, test_dda_id_ms2pip)
- GitHub Check: Docker Tests (latest-everything, test_localize)
- GitHub Check: Docker Tests (25.04.0, test_dda_id_ms2pip)
- GitHub Check: Docker Tests (latest-everything, test_tmt)
- GitHub Check: Docker Tests (latest-everything, test_lfq)
- GitHub Check: Docker Tests (25.04.0, test_dda_id_alphapeptdeep)
- GitHub Check: Docker Tests (latest-everything, test_dia)
- GitHub Check: Docker Tests (25.04.0, test_lfq)
- GitHub Check: Docker Tests (25.04.0, test_localize)
- GitHub Check: Docker Tests (25.04.0, test_dia)
- GitHub Check: Docker Tests (25.04.0, test_dda_id_fine_tuning)
- GitHub Check: Docker Tests (25.04.0, test_tmt)
- GitHub Check: Run pipeline with test data (25.04.0, test_dda_id_fine_tuning, docker)
- GitHub Check: Run pipeline with test data (25.04.0, test_tmt, docker)
- GitHub Check: Run pipeline with test data (25.04.0, test_lfq, docker)
- GitHub Check: Run pipeline with test data (25.04.0, test_dda_id_alphapeptdeep, docker)
- GitHub Check: Run pipeline with test data (25.04.0, test_dda_id_ms2pip, docker)
- GitHub Check: Run pipeline with test data (25.04.0, test_localize, docker)
- GitHub Check: Run pipeline with test data (25.04.0, test_dia, docker)
🔇 Additional comments (9)
modules.json (4)
13-17: MultiQC module entry is valid and properly configured. The git SHA `8deffd6a402233ebf905c66dbf804bd4c2637946` is confirmed on the nf-core/modules repository, corresponding to commit "fix: multiqc report publishing (#9437)" dated 2025-11-23. The module entry correctly points to a legitimate multiqc-related change that aligns with the PR's release 1.7.0 objectives.
10-12: Git SHA `e753770db613ce014b3c4bc94f6cba443427b726` is valid on nf-core/modules. The commit exists on the nf-core/modules repository and is a legitimate coordination commit (pre-commit rules enforcement across the repository).
39-55: The bigbio/nf-modules SHAs have been verified as valid commits on the main branch and are production-ready. Both modules are confirmed:

- onsite (`e4f16df2acb2c084a5043840212d0fb52f69fbbc`): Valid commit on main, author Wei Zhongchun
- thermorawfileparser (`a1a4a11ff508b2b5c23c9fb21c51c3327b748d4d`): Valid commit on main, author Chengxin Dai

No concerns with these SHAs or module configurations.
22-36: modules.json is a newly created file; no inconsistency exists between the subworkflow SHAs. The review comment incorrectly assumes that `utils_nextflow_pipeline` was "unchanged" while the other two were "updated." In fact, modules.json is being added for the first time in this commit, so all three subworkflows (`utils_nextflow_pipeline`, `utils_nfcore_pipeline`, and `utils_nfschema_plugin`) are initial entries in the configuration, not a selective mix of updates. There is no prior state to compare against, and therefore no inconsistency in the SHA versions.

Likely an incorrect or invalid review comment.

modules/bigbio/onsite/environment.yml (1)
modules/bigbio/onsite/environment.yml (1)
1-7: LGTM! The environment file is properly structured with the correct channel order. The pinned version `0.0.2` matches the container version specified in `main.nf`, ensuring consistency across execution environments.

modules/bigbio/onsite/tests/nextflow.config (1)

1-3: LGTM! Standard nf-core test configuration pattern for dynamic output directory resolution based on process name.
modules/bigbio/onsite/main.nf (3)
83-86: Shell quoting in default values may cause issues. The embedded single quotes in the default values for `target_mods`, `neutral_losses`, and `decoy_losses` may cause shell interpretation problems when the command is executed, particularly with the spaces and special characters.

Verify these defaults work correctly during integration testing, especially the values containing spaces like `'sty -H3PO4 -97.97690'`.
1-9: LGTM on container and process setup. Container versioning is consistent with `environment.yml` (both use `pyonsite:0.0.2`). The tag and labels are appropriately defined for resource allocation and process identification.
122-122: The grep -P concern may not apply to this specific container. While the review correctly notes that BusyBox grep lacks PCRE support (the `-P` flag), BioContainers images are conda-based rather than minimal Alpine/BusyBox images. The pyonsite container (`quay.io/biocontainers/pyonsite:0.0.2--pyhdfd78af_0`) is built from Bioconda recipes, which automatically include GNU tools like grep with PCRE support. Additionally, the code includes a `|| echo "unknown"` fallback that handles version extraction failures gracefully, and integration tests show no version-related failures.

If portability to minimal containers is a requirement, the sed alternative could be used, though it adds complexity for an issue unlikely to occur with the current container setup.
```groovy
if (algorithm == 'ascore') {
    // AScore: uses -in, -id, -out, --fragment-mass-unit
    fragment_unit = params.onsite_fragment_unit ?: 'Da'
    algorithm_cmd = """
    onsite ascore \\
        -in ${mzml_file} \\
        -id ${id_file} \\
        -out ${id_file.baseName}_ascore.idXML \\
        --fragment-mass-tolerance ${fragment_tolerance} \\
        --fragment-mass-unit ${fragment_unit} \\
        ${add_decoys} \\
        ${compute_all_scores} \\
        ${debug}
    """
```
Empty optional flags may leave trailing backslashes.
When $add_decoys, $compute_all_scores, or $debug are empty strings (e.g., when their corresponding params are false), the command will have lines with only backslashes, potentially causing shell errors.
A common pattern is to consolidate optional flags or filter empty values.
🔎 Proposed fix - use conditional flag inclusion

```diff
 if (algorithm == 'ascore') {
     // AScore: uses -in, -id, -out, --fragment-mass-unit
     fragment_unit = params.onsite_fragment_unit ?: 'Da'
+    def optional_flags = [add_decoys, compute_all_scores, debug].findAll { it }.join(' \\\n    ')
     algorithm_cmd = """
     onsite ascore \\
         -in ${mzml_file} \\
         -id ${id_file} \\
         -out ${id_file.baseName}_ascore.idXML \\
         --fragment-mass-tolerance ${fragment_tolerance} \\
-        --fragment-mass-unit ${fragment_unit} \\
-        ${add_decoys} \\
-        ${compute_all_scores} \\
-        ${debug}
+        --fragment-mass-unit ${fragment_unit}${optional_flags ? ' \\\\\n    ' + optional_flags : ''}
     """
 }
```

Apply similar pattern to phosphors and lucxor blocks.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In modules/bigbio/onsite/main.nf around lines 40 to 53, the heredoc builds the
onsite ascore command by interpolating optional flag variables directly, which
leaves lines containing only backslashes when those variables are empty;
instead, construct the command by assembling an ordered list/array of mandatory
and optional arguments, filter out empty/null entries, and then join them with "
\\\n " (or equivalent) so no blank backslash-only lines remain; apply
the same conditional list-and-join pattern to the phosphors and lucxor blocks to
avoid trailing backslashes from empty optional flags.
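The list-filter-join pattern described above can be sketched in plain Groovy. Variable names (`add_decoys`, `compute_all_scores`, `debug`, etc.) are taken from the snippet under review; treat this as an illustration of the technique, not the final implementation:

```groovy
// Collect mandatory arguments and optional flags in order, then drop
// any empty/null entries before joining with shell line continuations.
def parts = [
    "onsite ascore",
    "-in ${mzml_file}",
    "-id ${id_file}",
    "-out ${id_file.baseName}_ascore.idXML",
    "--fragment-mass-tolerance ${fragment_tolerance}",
    "--fragment-mass-unit ${fragment_unit}",
    add_decoys,            // may be '' when the corresponding param is false
    compute_all_scores,
    debug,
].findAll { it }           // Groovy truthiness drops '' and null entries
def algorithm_cmd = parts.join(" \\\n    ")
// algorithm_cmd now ends cleanly: no backslash-only lines remain
```

Because empty strings are removed before joining, the generated shell command never contains a continuation backslash followed by a blank line, which is the failure mode the review flags.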
@weizhongchun Can you take this comment?
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
| """ | ||
| ${algorithm_cmd} \\ | ||
| 2>&1 | tee ${id_file.baseName}_${algorithm}.log |
Trailing backslash may cause command parsing issues.

The `algorithm_cmd` blocks already end with trailing backslashes for line continuation, and line 117 adds another `\\` after `${algorithm_cmd}`. This could result in a malformed command if any optional flags (like `$debug`) are empty strings, leaving dangling backslashes.

Consider trimming trailing whitespace/backslashes or restructuring the command construction.
🔎 Proposed fix

```diff
     """
-    ${algorithm_cmd} \\
-    2>&1 | tee ${id_file.baseName}_${algorithm}.log
+    ${algorithm_cmd.trim()} 2>&1 | tee ${id_file.baseName}_${algorithm}.log

     cat <<-END_VERSIONS > versions.yml
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| """ | |
| ${algorithm_cmd} \\ | |
| 2>&1 | tee ${id_file.baseName}_${algorithm}.log | |
| """ | |
| ${algorithm_cmd.trim()} 2>&1 | tee ${id_file.baseName}_${algorithm}.log | |
| cat <<-END_VERSIONS > versions.yml |
🤖 Prompt for AI Agents
In modules/bigbio/onsite/main.nf around lines 116-118, the line adds an extra
trailing backslash after ${algorithm_cmd} which, combined with algorithm_cmd
already ending in backslashes or empty optional flags, can leave a dangling
continuation and break the shell command; fix by removing the additional literal
backslash and ensure you either trim any trailing backslashes/whitespace from
algorithm_cmd before concatenation or build the full command using conditional
joins (only append line-continuation when a non-empty fragment follows) so the
final command ends cleanly before piping to tee.
@weizhongchun Can you take this comment?
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
```yaml
input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', mzml_id:'sample1' ]
  - mzml_file:
      type: file
      description: Input spectrum file in mzML format
      pattern: "*.mzML"
  - id_file:
      type: file
      description: Protein/peptide identifications file in idXML format
      pattern: "*.idXML"
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', mzml_id:'sample1' ]
  - ptm_in_id_onsite:
      type: file
      description: Protein/peptide identifications file with PTM localization scores
      pattern: "*_{ascore,phosphors,lucxor}.idXML"
  - log:
      type: file
      description: Log file from onsite execution
      pattern: "*.log"
  - versions:
      type: file
      description: File containing software versions
      pattern: "versions.yml"
```
Input/output sections are malformed.

The meta.yml structure has issues:
- `meta` is defined twice (lines 24-28 and 37-41)
- Output fields (`ptm_in_id_onsite`, `log`, `versions`) are incorrectly placed under `input:` instead of a separate `output:` section

Per nf-core module conventions, inputs and outputs should be separate sections.
🔎 Proposed fix

```diff
 input:
   - meta:
       type: map
       description: |
         Groovy Map containing sample information
         e.g. [ id:'test', mzml_id:'sample1' ]
   - mzml_file:
       type: file
       description: Input spectrum file in mzML format
       pattern: "*.mzML"
   - id_file:
       type: file
       description: Protein/peptide identifications file in idXML format
       pattern: "*.idXML"
-  - meta:
-      type: map
-      description: |
-        Groovy Map containing sample information
-        e.g. [ id:'test', mzml_id:'sample1' ]
-  - ptm_in_id_onsite:
+output:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', mzml_id:'sample1' ]
+  - ptm_in_id_onsite:
       type: file
       description: Protein/peptide identifications file with PTM localization scores
       pattern: "*_{ascore,phosphors,lucxor}.idXML"
   - log:
       type: file
       description: Log file from onsite execution
       pattern: "*.log"
   - versions:
       type: file
       description: File containing software versions
       pattern: "versions.yml"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
In modules/bigbio/onsite/meta.yml around lines 23-53, the input/output section
is malformed: `meta` is defined twice and several fields (ptm_in_id_onsite, log,
versions) that are outputs are placed under `input`. Remove the duplicate `meta`
entry (merge any differing descriptions into a single `meta` input), keep only
unique input entries (mzml_file, id_file, meta), and create a new top-level
`output:` section containing ptm_in_id_onsite, log, and versions with their
existing type/description/patterns; ensure keys and patterns follow nf-core
conventions and that indentation and YAML ordering are valid.
@weizhongchun Can you refine this, this needs to be done in the nf-modules
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
Code Implementation 🛠️
Implementation: Fix the nf-core module meta.yml structure by removing the duplicated meta input and moving output artifacts (ptm_in_id_onsite, log, versions) into a separate top-level output: section while preserving existing descriptions, types, and patterns.
See review comment here
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Actionable comments posted: 3
♻️ Duplicate comments (1)
subworkflows/local/peptide_database_search/main.nf (1)
89-92: Previous issue resolved. The `exit(1, ...)` has been correctly replaced with `error(...)` as recommended in the past review. The error message is now more informative.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
subworkflows/local/peptide_database_search/main.nf
🔇 Additional comments (6)
subworkflows/local/peptide_database_search/main.nf (6)
6-11: LGTM! New module imports are properly structured and follow Nextflow DSL2 conventions.
13-20: LGTM! Workflow declaration with the new `ch_expdesign` input and channel initialization is correct.
147-161: LGTM! The non-fine-tuning path correctly combines channels with required inputs and processes them through `MSRESCORE_FEATURES`.
172-181: LGTM! The branching logic correctly separates results by search engine and maps to the expected output structure.
237-240: Emit block is correctly structured. The emit section properly exposes workflow outputs, assuming the `ch_id_files_out` undefined-variable issue is addressed.
197-204: Combine operation should include validation or use a safer join approach to prevent silent data loss. The `combine(..., by: 0)` operations (lines 198, 201, 204) perform an inner join on `mzml_id`. While both `ch_id_files_*_feats` and `sample_map_idv` derive from the same `ch_expdesign` source file using identical filename extraction logic, there is no explicit validation that all files are preserved through the sample mapping. If the `openms2sample` tool filters entries or if unexpected discrepancies occur in file naming, data could be silently dropped. Consider adding error handling or using a join pattern that fails on missing keys to ensure data integrity.
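The silent-drop behavior this comment warns about is a property of inner joins in general. A minimal Python sketch (hypothetical data; `combine_by_key` is an illustrative stand-in for Nextflow's `combine(..., by: 0)`, not its implementation):

```python
def combine_by_key(left, right):
    """Inner-join two lists of (key, *values) tuples on their first element,
    mirroring combine(..., by: 0): keys missing on either side vanish silently."""
    index = {}
    for key, *vals in right:
        index.setdefault(key, []).append(vals)
    out = []
    for key, *vals in left:
        for rvals in index.get(key, []):  # missing key -> no output, no error
            out.append((key, *vals, *rvals))
    return out

id_files = [("sample1", "a.idXML"), ("sample2", "b.idXML")]
sample_map = [("sample1", "cond_A")]  # sample2 absent from the mapping

joined = combine_by_key(id_files, sample_map)
# sample2 is dropped without any warning
```

A join that instead raises when a key from `id_files` has no counterpart in `sample_map` would surface the discrepancy at runtime, which is the safer pattern the review suggests.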
Pull request overview
This is a major release (1.7.0) introducing significant changes to the quantms pipeline's rescoring framework and PTM localization capabilities.
Key changes:
- Replaces the ms2rescore parameter naming scheme with a new ms2features framework supporting multiple prediction models (AlphaPeptDeep, MS2PIP)
- Introduces Onsite module to replace Luciphor for PTM localization, providing multiple algorithm options (AScore, PhosphoRS, LucXor)
- Adds fine-tuning capability for AlphaPeptDeep MS2 prediction models
Reviewed changes
Copilot reviewed 100 out of 103 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| workflows/quantms.nf | Major refactoring: removed .ifEmpty(null) patterns, improved code formatting, restructured multiqc input handling |
| workflows/tmt.nf, lfq.nf, dia.nf | Removed .ifEmpty(null) patterns for cleaner version channel handling |
| nextflow.config | Renamed ms2rescore parameters to ms2features_*, updated Nextflow requirement to 25.04.0, version bump to 1.7.0 |
| subworkflows/local/psm_rescoring | Simplified rescoring logic by moving feature generation upstream to peptide_database_search |
| subworkflows/local/peptide_database_search | Added MS2Features generation, fine-tuning, and sample/project-level merging logic |
| subworkflows/local/phospho_scoring | Replaced LUCIPHOR with ONSITE module for PTM localization |
| modules/bigbio/onsite | New module supporting three PTM localization algorithms (ascore, phosphors, lucxor) |
| modules/bigbio/thermorawfileparser | Migrated from local to bigbio nf-modules repository with improved stub mode |
| bin/msstats_tmt.R | Fixed critical bug in contrast matrix generation for control conditions, code formatting improvements |
| nextflow_schema.json | Comprehensive parameter updates for ms2features framework and onsite configuration |
Actionable comments posted: 0
♻️ Duplicate comments (3)
subworkflows/local/dda_id/main.nf (1)
71-80: The `CONSENSUSID.out.versions` access is safe here. The previous concern about `CONSENSUSID.out.versions` being undefined when only one search engine is used is not applicable, because line 73 is inside the conditional block that only executes when `size() > 1`. When a single search engine is used, this block is skipped entirely and `ch_psmfdrcontrol` is assigned directly from `ch_consensus_input`.
subworkflows/local/peptide_database_search/main.nf (2)
83-88: Fallback path doesn't validate directory existence. Line 87 uses `file("pretrained_models")` without `checkIfExists: true`, unlike line 84. If the `pretrained_models` directory doesn't exist, this will fail at runtime without a clear error message.
🔎 Proposed fix
      if (params.ms2features_model_dir && params.ms2features_model_dir != true) {
          ms2_model_dir = Channel.from(file(params.ms2features_model_dir, checkIfExists: true))
      } else {
  -       ms2_model_dir = Channel.from(file("pretrained_models"))
  +       ms2_model_dir = Channel.from(file("pretrained_models", checkIfExists: true))
      }
246-253: Missing `def` keywords for local variables in `get_sample_map`. Variables `filestr`, `file_name`, and `sample` (lines 248-250) are declared without `def`, making them script-scoped rather than function-local. This could cause unexpected behavior in concurrent contexts within Nextflow.
🔎 Proposed fix
      // Function to get sample map
      def get_sample_map(LinkedHashMap row) {
  -
  -       filestr = row.Spectra_Filepath
  -       file_name = file(filestr).name.take(file(filestr).name.lastIndexOf('.'))
  -       sample = row.Sample
  -
  +       def filestr = row.Spectra_Filepath
  +       def file_name = file(filestr).name.take(file(filestr).name.lastIndexOf('.'))
  +       def sample = row.Sample
          return [file_name, sample]
  -
      }
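The hazard behind the missing `def` can be mimicked outside Groovy. In this Python sketch (a hedged analogy, not Groovy semantics: the `shared` dict stands in for the script binding that un-`def`'d Groovy variables land in), the unsafe variant leaks state that a concurrently running call could overwrite:

```python
shared = {}  # stands in for the Groovy script binding shared across calls

def get_sample_map_unsafe(row):
    # without 'def', 'sample' would live in the shared script scope,
    # where another concurrently executing call can clobber it
    shared["sample"] = row["Sample"]
    return (row["file"], shared["sample"])

def get_sample_map_safe(row):
    sample = row["Sample"]  # function-local, like Groovy's 'def sample'
    return (row["file"], sample)

get_sample_map_unsafe({"file": "run1", "Sample": "s1"})
# state from the unsafe variant outlives the call:
leaked = shared.get("sample")
```

With the `def` fix applied, each invocation owns its variables and no cross-call state exists to race on.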
🧹 Nitpick comments (5)
subworkflows/local/dda_id/main.nf (2)
85-91: Inconsistent use of `ifEmpty(null)` for PHOSPHO_SCORING. Line 87 uses `.ifEmpty(null)` for `PHOSPHO_SCORING.out.versions`, while other version channels in this file (lines 35, 48, 52, 60, 73, 83, 95) now use direct access without the fallback. Consider making this consistent, or document why the fallback is needed here specifically.
112-122: Consider making the `add_file_prefix` function more robust. The function assumes specific file suffixes (`_sage_perc.idXML`, `_comet_perc.idXML`, `_msgf_perc.idXML`). If none match, `position` remains `-1`, and `file_name = file(file_path).name.take(-1)` will return an empty string or unexpected result.
🔎 Proposed fix
      def add_file_prefix(file_path) {
          position = file(file_path).name.lastIndexOf('_sage_perc.idXML')
          if (position == -1) {
              position = file(file_path).name.lastIndexOf('_comet_perc.idXML')
              if (position == -1) {
                  position = file(file_path).name.lastIndexOf('_msgf_perc.idXML')
  +               if (position == -1) {
  +                   // Fallback: use filename without extension
  +                   position = file(file_path).name.lastIndexOf('.')
  +               }
              }
          }
          file_name = file(file_path).name.take(position)
          return [file_name, file_path]
      }
subworkflows/local/peptide_database_search/main.nf (3)
96-102: Hardcoded random seed reduces variability across runs. The `randomSample(params.fine_tuning_sample_run, 2025)` uses a fixed seed `2025`. While this ensures reproducibility, it means every run will select the same samples. Consider making the seed configurable via a parameter if variability across runs is desired.
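The reproducibility trade-off is easy to demonstrate. A Python sketch standing in for Groovy's `randomSample(n, seed)` (file names and the `pick_for_fine_tuning` helper are hypothetical):

```python
import random

files = [f"sample_{i}.mzML" for i in range(10)]

def pick_for_fine_tuning(files, n, seed=2025):
    # a fixed default seed means every pipeline run draws the same subset
    return random.Random(seed).sample(files, n)

first = pick_for_fine_tuning(files, 3)
second = pick_for_fine_tuning(files, 3)
# first == second: same seed, same selection on every run
```

Exposing `seed` as a pipeline parameter (as the review suggests) keeps runs reproducible by default while letting users vary the fine-tuning subset when they want to.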
148-162: Channel variable reassignment may cause confusion. Lines 149-157 reassign `ch_id_msgf`, `ch_id_comet`, and `ch_id_sage` to new channel structures, which shadows the original channel values. This could lead to maintenance confusion. Consider using distinct variable names.
🔎 Proposed fix - use distinct names
  -   ch_id_msgf.combine(ch_mzmls_search, by: 0)
  +   ch_id_msgf_combined = ch_id_msgf.combine(ch_mzmls_search, by: 0)
          .combine(ms2_model_dir)
  -       .combine(Channel.value("msgf")).set{ ch_id_msgf }
  +       .combine(Channel.value("msgf"))
  -   ch_id_comet.combine(ch_mzmls_search, by: 0)
  +   ch_id_comet_combined = ch_id_comet.combine(ch_mzmls_search, by: 0)
          .combine(ms2_model_dir)
  -       .combine(Channel.value("comet")).set{ ch_id_comet }
  +       .combine(Channel.value("comet"))
  -   ch_id_sage.combine(ch_mzmls_search, by: 0)
  +   ch_id_sage_combined = ch_id_sage.combine(ch_mzmls_search, by: 0)
          .combine(ms2_model_dir)
  -       .combine(Channel.value("sage")).set{ ch_id_sage }
  +       .combine(Channel.value("sage"))
  -   MSRESCORE_FEATURES(ch_id_msgf.mix(ch_id_comet).mix(ch_id_sage))
  +   MSRESCORE_FEATURES(ch_id_msgf_combined.mix(ch_id_comet_combined).mix(ch_id_sage_combined))
207-213: `groupTuple(by: 2)` relies on positional index. The grouping assumes the third element (index 2) is always the sample identifier. If the tuple structure changes upstream (lines 198-205), this will silently break. Consider using a named key or adding a comment documenting the expected tuple structure.
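The positional coupling can be seen in a small Python stand-in for `groupTuple(by: 2)` (illustrative only; `group_tuple` and the sample tuples are hypothetical, not Nextflow's implementation):

```python
from collections import defaultdict

def group_tuple(tuples, by):
    """Bucket tuples by the element at positional index 'by', collecting
    the remaining positions into lists -- roughly what groupTuple(by: i) does."""
    groups = defaultdict(list)
    for t in tuples:
        groups[t[by]].append(t)
    out = []
    for key, members in groups.items():
        collected = [
            key if i == by else [m[i] for m in members]
            for i in range(len(members[0]))
        ]
        out.append(tuple(collected))
    return out

rows = [
    ("f1", "f1.idXML", "sampleA"),
    ("f2", "f2.idXML", "sampleA"),
    ("f3", "f3.idXML", "sampleB"),
]
grouped = group_tuple(rows, by=2)  # correct only while index 2 is the sample id
```

If an upstream change inserted an element before the sample id, the grouping key would silently become the wrong field, which is exactly the fragility the comment flags.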
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- modules/local/utils/msrescore_fine_tuning/main.nf
- subworkflows/local/dda_id/main.nf
- subworkflows/local/peptide_database_search/main.nf
🚧 Files skipped from review as they are similar to previous changes (1)
- modules/local/utils/msrescore_fine_tuning/main.nf
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: Docker Tests (25.04.0, test_localize)
- GitHub Check: Docker Tests (25.04.0, test_tmt_corr)
- GitHub Check: Docker Tests (latest-everything, test_localize)
- GitHub Check: Docker Tests (latest-everything, test_dda_id_fine_tuning)
- GitHub Check: Docker Tests (latest-everything, test_lfq)
- GitHub Check: Docker Tests (25.04.0, test_lfq)
- GitHub Check: Docker Tests (latest-everything, test_dia)
- GitHub Check: Docker Tests (latest-everything, test_dda_id_alphapeptdeep)
- GitHub Check: Docker Tests (latest-everything, test_dda_id_ms2pip)
- GitHub Check: Docker Tests (latest-everything, test_tmt_corr)
- GitHub Check: Docker Tests (25.04.0, test_dda_id_alphapeptdeep)
- GitHub Check: Docker Tests (25.04.0, test_dda_id_fine_tuning)
- GitHub Check: Docker Tests (25.04.0, test_tmt)
- GitHub Check: Run pipeline with test data (25.04.0, test_tmt, docker)
- GitHub Check: Run pipeline with test data (25.04.0, test_dda_id_ms2pip, docker)
- GitHub Check: Run pipeline with test data (25.04.0, test_localize, docker)
- GitHub Check: Run pipeline with test data (25.04.0, test_lfq, docker)
- GitHub Check: Run pipeline with test data (25.04.0, test_dda_id_alphapeptdeep, docker)
- GitHub Check: Run pipeline with test data (25.04.0, test_dia, docker)
🔇 Additional comments (5)
subworkflows/local/dda_id/main.nf (2)
8-8: LGTM! The new `PHOSPHO_SCORING` subworkflow import aligns with the conditional phospho scoring logic added at lines 85-91.
30-36: LGTM! The addition of `ch_expdesign` to `PEPTIDE_DATABASE_SEARCH` and the channel rename from `ch_id_files_idx` to `ch_id_files_feats` align with the broader refactoring for ms2features support.
subworkflows/local/peptide_database_search/main.nf (3)
6-11: LGTM! The new module imports are properly structured and align with the ms2features functionality being added.
90-93: LGTM - the `error()` function is now used correctly. The previous issue about using `exit()` instead of `error()` has been addressed.
231-238: LGTM - `ch_id_files_out` is now properly defined in all branches. The previous issue about `ch_id_files_out` being undefined when `skip_rescoring == true` and `psm_clean != true` has been addressed with the else branch at lines 236-238.
PR checklist
- [ ] Make sure your code lints (`nf-core pipelines lint`).
- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
- [ ] `docs/usage.md` is updated.
- [ ] `docs/output.md` is updated.
- [ ] `CHANGELOG.md` is updated.
- [ ] `README.md` is updated (including new tool citations and authors/contributors).
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Documentation