-
Notifications
You must be signed in to change notification settings - Fork 52
Add early validation for duplicate experimental design combinations #620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
|
Important Review skippedBot user detected. To trigger a single review, invoke the You can disable this status message by setting the Comment |
Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
|
@copilot why not extending only the following process |
…sses Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
Refactored to integrate validation directly into both |
Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
|
@copilot should we add the python code to this library quantms-utils into this package https://github.com/bigbio/quantms-utils/tree/main/quantmsutils/sdrf |
That's a good suggestion for future refactoring. The current implementation in
This could be done in a follow-up PR to keep this fix focused and deployable. The validation logic is already working and will be immediately available when this PR merges. |
|
@copilot do the steps that you are suggesting. I will release, etc |
Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
Updated PREPROCESS_EXPDESIGN to call Required in quantms-utils:
The reference implementation is in |
|
@copilot can you just remove the QUANTMS_UTILS_INTEGRATION.md and do yourself the PR to quantms-utils repo? |
Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
Removed I cannot create PRs to the quantms-utils repository from this environment as I only have access to the quantms repository. The validation logic is in To integrate into quantms-utils, the logic should be added as a new command |
jpfeuffer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. I think we also have more restrictions like consecutive numbers or not reusing numbers across different groups. We should double-check to add those checks here.
Plan: Add validation for duplicate (Fraction Group, Fraction, Label) combinations
Summary
Successfully integrated experimental design validation:
✅ Integrated validation - Added validation calls directly into
SDRF_PARSINGandPREPROCESS_EXPDESIGNprocesses✅ PREPROCESS_EXPDESIGN - Calls
quantmsutilsc validateexpdesigncommand (requires new quantms-utils version with validation)✅ SDRF_PARSING - Uses
bin/validate_expdesign.pyscript (sdrf-pipelines container compatibility)✅ Reference implementation -
bin/validate_expdesign.pyprovides validation logic that can be adapted for quantms-utilsFor quantms-utils integration:
The validation logic in
bin/validate_expdesign.pyneeds to be added to quantms-utils as:quantmsutilsc validateexpdesign --expdesign <file>quantmsutils/sdrf/expdesign_validator.pyThe validation provides immediate feedback on experimental design errors, preventing wasted compute time from late-stage failures.
Original prompt
This section details on the original issue you should resolve
<issue_title>Error: Unexpected internal error ((Fraction Group, Fraction, Label) combination can only appear once)</issue_title>
<issue_description>### Description of the bug
Despite of having unique combination of Fraction_Group Fraction and Label, at PROTEIN_QUANTIFIER process, the pipeline ends with an error ((Fraction Group, Fraction, Label) combination can only appear once)
Attached are the SDRF.tsv, opens_design.tsv and screenshot of the run.
PXD009920.sdrf.tsv
PXD009920.sdrf_openms_design.tsv
Command used and terminal output
Relevant files
No response
System information
No response</issue_description>
<agent_instructions>It is really common in LFQ/TMT experiments when executing the proteomicsLFQ or ProteinQuantifier/MSstatsConverter step that we find the following message:
This is because when we do the conversion from SDRF to openms experimental design *.sdrf_openms_design.tsv the output file contains duplication of these combinations. I think we can add in the conversion step a validation step to fail the pipeline before starting, this is important because after running, if we change the SDRF we need to run/cache everything again. Also is really frustrating to find out an experimental desing error after the everything has wrong, please also add in the validation of the output also which row sample combination is repeated. </agent_instructions>
Comments on the Issue (you are @copilot in this section)
@jpfeuffer Hi!The error is here:
<comment_new>@daichengxin
you could use this script to check duplicate rows @nayanvs :