Copilot AI commented Dec 3, 2025

Plan: Add validation for duplicate (Fraction Group, Fraction, Label) combinations

  • 1. Create Python validation script for duplicate combination checking
  • 2. Integrate validation directly into SDRF_PARSING process
  • 3. Integrate validation directly into PREPROCESS_EXPDESIGN process
  • 4. Remove separate EXPDESIGN_VALIDATOR module
  • 5. Test the validation with sample data
  • 6. Address code review feedback
  • 7. Add consistent logging across both processes
  • 8. Prepare integration with quantms-utils library (removed integration doc per feedback)

Summary

Successfully integrated experimental design validation:

  • Integrated validation: added validation calls directly into the SDRF_PARSING and PREPROCESS_EXPDESIGN processes
  • PREPROCESS_EXPDESIGN: calls the quantmsutilsc validateexpdesign command (requires a new quantms-utils version with validation)
  • SDRF_PARSING: uses the bin/validate_expdesign.py script (for sdrf-pipelines container compatibility)
  • Reference implementation: bin/validate_expdesign.py provides validation logic that can be adapted for quantms-utils

For quantms-utils integration:
The validation logic in bin/validate_expdesign.py needs to be added to quantms-utils as:

  • Command: quantmsutilsc validateexpdesign --expdesign <file>
  • Function location: quantmsutils/sdrf/expdesign_validator.py
  • Validates for duplicate (Fraction_Group, Fraction, Label) combinations
  • Exit code 0 on success, 1 on failure with detailed error messages

The validation provides immediate feedback on experimental design errors, preventing wasted compute time from late-stage failures.
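
A minimal sketch of the duplicate check described above, not the actual bin/validate_expdesign.py from this PR. It assumes the design file is tab-separated, that the file table comes first with the columns Fraction_Group, Fraction, Spectra_Filepath, Label and Sample, and that a blank line ends that table. The function name find_duplicate_combinations and the example file name are hypothetical.

```python
import csv
from collections import defaultdict

KEY_COLUMNS = ("Fraction_Group", "Fraction", "Label")


def find_duplicate_combinations(design_text):
    """Map each repeated (Fraction_Group, Fraction, Label) key to its rows.

    Each row is reported as (row_number, spectra_file, sample), where
    row_number counts lines of the file table and the header is row 1.
    """
    # Assumption: the file table ends at the first blank line after it starts.
    file_section = []
    for line in design_text.splitlines():
        if not line.strip():
            if file_section:
                break
            continue
        file_section.append(line)

    seen = defaultdict(list)
    reader = csv.DictReader(file_section, delimiter="\t")
    for row_number, row in enumerate(reader, start=2):
        key = tuple(row[column] for column in KEY_COLUMNS)
        seen[key].append((row_number, row["Spectra_Filepath"], row.get("Sample", "")))

    # Keep only the keys that occur more than once.
    return {key: rows for key, rows in seen.items() if len(rows) > 1}


# Hypothetical design: Label 8 is used twice for the same file and fraction.
example = (
    "Fraction_Group\tFraction\tSpectra_Filepath\tLabel\tSample\n"
    "1\t1\tF1.mzML\t7\t7\n"
    "1\t1\tF1.mzML\t8\t8\n"
    "1\t1\tF1.mzML\t8\t9\n"
    "\n"
    "Sample\tMSstats_Condition\n"
)
for key, rows in find_duplicate_combinations(example).items():
    print(f"Duplicate {key}: rows {rows}")
```

Returning the offending rows together with their samples, rather than a plain pass/fail flag, is what lets the error message name the repeated row and sample combination.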

Original prompt

This section details the original issue you should resolve

<issue_title>Error: Unexpected internal error ((Fraction Group, Fraction, Label) combination can only appear once)</issue_title>
<issue_description>### Description of the bug

Despite having unique combinations of Fraction_Group, Fraction and Label, the pipeline ends at the PROTEIN_QUANTIFIER process with the error ((Fraction Group, Fraction, Label) combination can only appear once).

Attached are the SDRF.tsv, the openms_design.tsv and a screenshot of the run.

Image

PXD009920.sdrf.tsv

PXD009920.sdrf_openms_design.tsv

Command used and terminal output

-[bigbio/quantms] Pipeline completed with errors-
ERROR ~ Error executing process > 'BIGBIO_QUANTMS:QUANTMS:TMT:PROTEIN_QUANT:MSSTATS_CONVERTER (PXD009920.sdrf_openms_design.tsv)'

Caused by:
  Process `BIGBIO_QUANTMS:QUANTMS:TMT:PROTEIN_QUANT:MSSTATS_CONVERTER (PXD009920.sdrf_openms_design.tsv)` terminated with an error exit status (8)


Command executed:

  MSstatsConverter \
      -in ID_mapper_merge_epi_filter_resconf.consensusXML \
      -in_design PXD009920.sdrf_openms_design.tsv \
      -method ISO \
      -out PXD009920.sdrf_openms_design_msstats_in.csv \
      -debug 0 \
      2>&1 | tee MSstatsConverter.log
  
  cat <<-END_VERSIONS > versions.yml
  "BIGBIO_QUANTMS:QUANTMS:TMT:PROTEIN_QUANT:MSSTATS_CONVERTER":
      MSstatsConverter: $(MSstatsConverter 2>&1 | grep -E '^Version(.*)' | sed 's/Version: //g' | cut -d ' ' -f 1)
  END_VERSIONS

Command exit status:
  8

Command output:
  Error: Unexpected internal error ((Fraction Group, Fraction, Label) combination can only appear once)

Command error:
  Error: Unexpected internal error ((Fraction Group, Fraction, Label) combination can only appear once)

Work dir:
  /Data/nayan/Shortcut_Data/pysradb_downloads/IPX0004838000/PXD009920/work/c7/f38450e1a602b8265ab8dbf6d4070f

Container:
  ghcr.io/bigbio/openms-tools-thirdparty:2025.04.14

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

 -- Check '.nextflow.log' file for details

Relevant files

No response

System information

No response</issue_description>

<agent_instructions>It is really common in LFQ/TMT experiments when executing the proteomicsLFQ or ProteinQuantifier/MSstatsConverter step that we find the following message:

Unexpected internal error ((Fraction Group, Fraction, Label) combination can only appear once)

This happens because the conversion from SDRF to the OpenMS experimental design (*.sdrf_openms_design.tsv) produces an output file that contains duplicates of these combinations. I think we can add a validation step to the conversion step so that the pipeline fails before it starts. This is important because if we change the SDRF after running, we need to run/cache everything again. It is also really frustrating to find an experimental design error only after everything else has run. Please also make the validation output report which row/sample combination is repeated. </agent_instructions>

Comments on the Issue (you are @copilot in this section)

@jpfeuffer Hi!

The error is here:

1 1 Margolis_Mouse_Neuronal_TMT_TP_F1.mzML 8 8
1 1 Margolis_Mouse_Neuronal_TMT_TP_F1.mzML 8 9
@daichengxin Hi @nayanvs, there is a duplicate label value for this file in your SDRF. The label must be unique for each data file. Please change `TMT130N` to `TMT130C`. It appears we are currently adding this uniqueness validation logic, but that version has not been released yet. https://github.com/bigbio/sdrf-pipelines/blob/7b7dd367ba7ffe105356e9ba0c1f29348d3169e1/sdrf_pipelines/sdrf/validators.py#L76

Image</comment_new>
<comment_new>@daichengxin
you could use this script to check duplicate rows @nayanvs :

```python
import pandas as pd

# Load the SDRF and list every row whose (data file, label) pair occurs more than once
sdrf = pd.read_csv("PXD009920.sdrf.tsv", sep="\t")
dups = sdrf[sdrf.duplicated(subset=["comment[data file]", "comment[label]"], keep=False)]
print(dups)
```</body></comment_new>
<comment_new><author>@jpfeuffer</author><body>
The last files look fine to me. Did you make sure there are no leftovers of the old file cached somewhere? I am not sure how this behaves during a Nextflow resume.

Please raise an issue or even better create a PR at the sdrf reposi...

</details>

- Fixes bigbio/quantms#609


@coderabbitai
Contributor

coderabbitai bot commented Dec 3, 2025

Important

Review skipped

Bot user detected.

Copilot AI and others added 2 commits December 3, 2025 10:56
Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
@ypriverol
Member

@copilot why not extend only the following process: preprocess_expdesign?

Copilot AI changed the title from "[WIP] Fix unexpected internal error in PROTEIN_QUANTIFIER process" to "Add early validation for duplicate experimental design combinations" Dec 3, 2025
Copilot AI requested a review from ypriverol December 3, 2025 11:03
…sses

Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
Author

Copilot AI commented Dec 3, 2025

@copilot why not extend only the following process: preprocess_expdesign?

Refactored to integrate validation directly into both SDRF_PARSING and PREPROCESS_EXPDESIGN processes instead of using a separate validator module. The validation script runs as part of the experimental design generation/preprocessing, reducing process overhead while maintaining the same functionality. Commit: f314164

Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
@ypriverol
Member

@copilot should we add the Python code to the quantms-utils library, in this package: https://github.com/bigbio/quantms-utils/tree/main/quantmsutils/sdrf

Author

Copilot AI commented Dec 3, 2025

@copilot should we add the Python code to the quantms-utils library, in this package: https://github.com/bigbio/quantms-utils/tree/main/quantmsutils/sdrf

That's a good suggestion for future refactoring. The current implementation in bin/ works well and solves the immediate issue. Moving it to quantms-utils would require:

  1. Adding the validation function to quantms-utils
  2. Releasing a new quantms-utils version
  3. Updating the quantms dependency

This could be done in a follow-up PR to keep this fix focused and deployable. The validation logic is already working and will be immediately available when this PR merges.

@ypriverol
Member

@copilot do the steps that you are suggesting. I will release, etc

Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
Author

Copilot AI commented Dec 3, 2025

@copilot do the steps that you are suggesting. I will release, etc

Updated PREPROCESS_EXPDESIGN to call quantmsutilsc validateexpdesign command. Created QUANTMS_UTILS_INTEGRATION.md with detailed requirements for adding the validation to quantms-utils:

Required in quantms-utils:

  1. Add validation function in quantmsutils/sdrf/expdesign_validator.py
  2. Add CLI command quantmsutilsc validateexpdesign --expdesign <file>
  3. Command should validate for duplicate (Fraction_Group, Fraction, Label) combinations
  4. Exit code 0 on success, 1 on failure with detailed error messages

The reference implementation is in bin/validate_expdesign.py which can be adapted. SDRF_PARSING still uses the bin/ script (sdrf-pipelines container). Commit: 1fb6c65

@ypriverol
Member

@copilot can you just remove QUANTMS_UTILS_INTEGRATION.md and open the PR to the quantms-utils repo yourself?

Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
Author

Copilot AI commented Dec 3, 2025

@copilot can you just remove QUANTMS_UTILS_INTEGRATION.md and open the PR to the quantms-utils repo yourself?

Removed QUANTMS_UTILS_INTEGRATION.md (commit 6cedf2f).

I cannot create PRs to the quantms-utils repository from this environment as I only have access to the quantms repository. The validation logic is in bin/validate_expdesign.py which can be adapted for quantms-utils. The key function is validate_expdesign(expdesign_file) which checks for duplicate (Fraction_Group, Fraction, Label) combinations and provides detailed error messages with row numbers.

To integrate into quantms-utils, the logic should be added as a new command quantmsutilsc validateexpdesign --expdesign <file> that exits with code 0 on success and 1 on failure.
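
The command described above could be wired up roughly like this. It is a sketch built on argparse from the standard library; how quantms-utils actually registers its commands is not shown in this thread, so the function names count_keys and main and the error wording are assumptions. Only the --expdesign option and the 0/1 exit codes come from the description above.

```python
import argparse
import sys
from collections import Counter


def count_keys(path):
    """Count (Fraction_Group, Fraction, Label) keys in the file table of a design file."""
    counts = Counter()
    header = None
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                if header is not None:
                    break  # assumption: a blank line ends the file table
                continue
            fields = line.rstrip("\r\n").split("\t")
            if header is None:
                header = fields  # first non-blank line is the column header
                continue
            row = dict(zip(header, fields))
            counts[(row["Fraction_Group"], row["Fraction"], row["Label"])] += 1
    return counts


def main(argv=None):
    """Return 0 when every key is unique and 1 when any key is repeated."""
    parser = argparse.ArgumentParser(
        description="Check an experimental design for duplicate combinations"
    )
    parser.add_argument("--expdesign", required=True, help="path to the design TSV")
    args = parser.parse_args(argv)

    duplicates = {key: n for key, n in count_keys(args.expdesign).items() if n > 1}
    for key, n in sorted(duplicates.items()):
        print(
            f"ERROR: (Fraction_Group, Fraction, Label) = {key} appears {n} times",
            file=sys.stderr,
        )
    return 1 if duplicates else 0
```

Calling `sys.exit(main())` under an `if __name__ == "__main__":` guard turns the return value into the process exit status, so a non-zero status should stop the calling Nextflow process before any quantification step runs.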

Collaborator

@jpfeuffer jpfeuffer left a comment

Nice. I think we also have more restrictions like consecutive numbers or not reusing numbers across different groups. We should double-check to add those checks here.
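
One possible reading of the "consecutive numbers" restriction, sketched as a standalone helper. It assumes the IDs in a column (for example Fraction_Group) should cover 1..n without gaps; that interpretation and the name non_consecutive_ids are assumptions, and the rule about reusing numbers across groups is not attempted here.

```python
def non_consecutive_ids(values):
    """Return the IDs missing from 1..max(values); an empty list means no gaps."""
    ids = {int(value) for value in values}
    if not ids:
        return []
    return [i for i in range(1, max(ids) + 1) if i not in ids]


# Fraction groups 1, 2 and 4 are present, so group 3 is reported as missing.
print(non_consecutive_ids(["1", "2", "4", "4"]))
```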
