
Commit 59013a4

Merge changes from discvr-20.3 (#42)
* Mark intermediate BAMs for deletion in pipeline
* Migrate DepthOfCoverage to GATK4
* Report depth by amplicon
* Pass -L to DepthOfCoverage in all cases
* Retain TSV output for DepthOfCoverage
* Delete output BAM if it exists for lofreq indelqual
* Use annotated VCF as output
* Don't use thousands separator in table output
* Also report median depth over amplicons
* After failure, ensure we delete the working copy of the GenomicsDB workspace and remake.
* Add additional feature attributes to index
* Support MINIMUM_DISTANCE for MarkDuplicatesWithMate
* Fault tolerance for ApplyBQSR
* Ensure wait indicator hidden
* Add check to prevent STAR submission without GTF unless specifically chosen
* Bugfix GenomicsDBImport
* Allow GenomicsDBWorkspace as input
* Create FileType for GenomicsDB
* Bugfix STAR validation
* Add --genomicsdb-use-vcf-codec workaround to GenotypeGVCFs
* ChangeReadsetStatus broken from earlier refactor
* Switch svn -> github
* Upgrade to dijit 1.6.3 based on dependabot recommendation
* Capture mean coverage as metric for lofreq
* Favor DataRegion getChecked over getSelected
1 parent 1fde264 commit 59013a4


16 files changed: +159 -100 lines


SequenceAnalysis/resources/views/siteAdmin.html

Lines changed: 3 additions & 3 deletions
@@ -21,7 +21,7 @@
 html: '<h3>Installation of Sequence Tools</h3>'
 },{
 html: 'DISCVR-Seq performs multiple functions, including management of data and analysis pipelines. The latter requires many external tools, such as sequence aligners.<br><br>' +
-'Tools can be installed manually; however, we have created a bash script to facilitate installation of the core tools. You can <b><a target="_blank" href="https://cpas:cpas@svn.mgt.labkey.host/stedi/trunk/externalModules/labModules/SequenceAnalysis/pipeline_code/sequence_tools_install.sh">download this script from subversion here</a></b>, using the username/password: cpas/cpas. It can be executed using a command like:<br><br>' +
+'Tools can be installed manually; however, we have created a bash script to facilitate installation of the core tools. You can <b><a target="_blank" href="https://raw.githubusercontent.com/BimberLab/DiscvrLabKeyModules/discvr-20.3/SequenceAnalysis/pipeline_code/sequence_tools_install.sh">download this script from github here</a></b>. It can be executed using a command like:<br><br>' +
 'bash sequence_tools_install.sh -d /usr/local/labkey/ -u labkey | tee sequence_tools_install.log<br><br>' +
 'The command above will install the various tools into /usr/local/labkey/bin, using the user \'labkey\'.<br><br>'
 },{
@@ -32,8 +32,8 @@
 '<ul><li><a href="https://www.labkey.org/Documentation/wiki-page.view?name=installConfigureEnterprisePipeline">Install/Configure Enterprise Pipeline</a></li>' +
 '<li><a href="https://www.labkey.org/Documentation/wiki-page.view?name=configureRemoteServer">Configure Remote Server</a></li></ul>' +
 'Beyond the core configuration, you will need to perform several steps for DISCVR-Seq:' +
-'<ul><li>By default, all of the DISCVR-Seq\'s tasks will run locally (on the location \'webserver\'). This is so the module works out of the box on a given server. Changing this is done using a file comparable to the ms2Config.xml described in the \'Configure Remote Server\' link above. You will need to create the file \'sequenceanalysisConfig.xml\' in the /configs folder where your LK server is installed (i.e. /usr/local/labkey/configs/). <b><a href="https://cpas:cpas@svn.mgt.labkey.host/stedi/trunk/externalModules/labModules/SequenceAnalysis/tools/pipeline_config/sequenceanalysisConfig.xml">Click here to view an example config file with comments</a></b>. (use the username/password: cpas/cpas). You will need to do this on both the webserver and remote server.</li>' +
-'<li>You will also need to configure the pipelineConfig.xml file on the remote server. <b><a href="https://cpas:cpas@svn.mgt.labkey.host/stedi/trunk/externalModules/labModules/SequenceAnalysis/tools/pipeline_config/pipelineConfig_remote.xml">Click here to view an example with comments.</a></b> Please note that you will need to configure the host name of the remote server to match the name you use in sequenceanalysisConfig.xml</li></ul>'
+'<ul><li>By default, all of the DISCVR-Seq\'s tasks will run locally (on the location \'webserver\'). This is so the module works out of the box on a given server. Changing this is done using a file comparable to the ms2Config.xml described in the \'Configure Remote Server\' link above. You will need to create the file \'sequenceanalysisConfig.xml\' in the /configs folder where your LK server is installed (i.e. /usr/local/labkey/configs/). <b><a href="https://raw.githubusercontent.com/BimberLab/DiscvrLabKeyModules/discvr-20.3/SequenceAnalysis/tools/pipeline_config/sequenceanalysisConfig.xml">Click here to view an example config file with comments</a></b>. You will need to do this on both the webserver and remote server.</li>' +
+'<li>You will also need to configure the pipelineConfig.xml file on the remote server. <b><a href="https://raw.githubusercontent.com/BimberLab/DiscvrLabKeyModules/discvr-20.3/SequenceAnalysis/tools/pipeline_config/pipelineConfig_remote.xml">Click here to view an example with comments.</a></b> Please note that you will need to configure the host name of the remote server to match the name you use in sequenceanalysisConfig.xml</li></ul>'
 },{
 html: '<h3>Sequence Pipeline Validation</h3>'
 },{
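The links above now point at raw GitHub URLs pinned to the `discvr-20.3` branch instead of the credentialed SVN server. A minimal sketch of how such a URL is assembled, assuming the public `https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}` layout; `rawGithubUrl` is a hypothetical helper, not part of the module:

```javascript
// Hypothetical helper: build a raw.githubusercontent.com URL for one file.
// Assumes the layout https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}.
function rawGithubUrl(owner, repo, ref, filePath) {
    // Encode each segment separately so the '/' separators are preserved.
    const segments = [owner, repo, ...ref.split('/'), ...filePath.split('/')];
    return 'https://raw.githubusercontent.com/' + segments.map(encodeURIComponent).join('/');
}

// Reproduces the install-script link used in siteAdmin.html.
console.log(rawGithubUrl('BimberLab', 'DiscvrLabKeyModules', 'discvr-20.3',
    'SequenceAnalysis/pipeline_code/sequence_tools_install.sh'));
```

Pinning the ref to a release branch keeps the downloaded script in step with the 20.3 release, and unlike the old SVN links, no username/password is embedded in the URL.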

SequenceAnalysis/resources/web/SequenceAnalysis/panel/AnalysisSectionPanel.js

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ Ext4.define('SequenceAnalysis.panel.AnalysisSectionPanel', {
 var val = i.additionalExtConfig[prop];
 if (Ext4.isString(val) && val.match(/^js:/)){
 val = val.replace(/^js:/, '');
-val = eval(val);
+val = eval("false || " + val);

 i.additionalExtConfig[prop] = val;
 }
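Config values prefixed with `js:` are evaluated as JavaScript, and the new STAR `getErrors` option in this commit is exactly such a string (`js:function(){...}`). Evaluated on its own, an anonymous `function(){...}` is parsed as a function declaration without a name and throws a `SyntaxError`, while an object literal like `{a: 1}` is parsed as a block statement. Prefixing `false || ` puts the source in expression position so both come back as values. A minimal standalone sketch of that behavior; `resolveJsValue` is a hypothetical name, not the panel's code:

```javascript
// Hypothetical standalone version of the "js:" handling in AnalysisSectionPanel.
function resolveJsValue(val) {
    if (typeof val === 'string' && /^js:/.test(val)) {
        const expr = val.replace(/^js:/, '');
        // "false || " forces expression context, so an anonymous function or an
        // object literal is returned as a value instead of being parsed as a
        // declaration or a block statement.
        return eval('false || ' + expr);
    }
    return val; // values without the "js:" prefix pass through untouched
}

const getErrors = resolveJsValue('js:function(){ return []; }');
console.log(typeof getErrors);              // "function"
console.log(resolveJsValue('js:{a: 1}').a); // 1
```

One consequence of this design is that `js:` values must now be single expressions; statements such as `var x = 1` would no longer parse after the `false || ` prefix.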

SequenceAnalysis/resources/web/SequenceAnalysis/window/AddFileSetsWindow.js

Lines changed: 11 additions & 19 deletions
@@ -8,26 +8,18 @@ Ext4.define('SequenceAnalysis.window.AddFileSetsWindow', {

 statics: {
 buttonHandlerForOutputFiles: function(dataRegionName){
-var dr = LABKEY.DataRegions[dataRegionName];
-Ext4.Msg.wait('Loading...');
-dr.getSelected({
-scope: this,
-success: function(results, response) {
-if (!results || !results.selected || !results.selected.length) {
-Ext4.Msg.alert('Error', 'No rows selected');
-return;
-}
+var checked = LABKEY.DataRegions[dataRegionName].getChecked();
+if (!checked.length) {
+Ext4.Msg.alert('Error', 'No rows selected');
+return;
+}

-var checked = LABKEY.DataRegions[dataRegionName].getChecked();
-Ext4.create('SequenceAnalysis.window.AddFileSetsWindow', {
-targetTable: 'outputfiles',
-targetField: 'outputFileId',
-dataRegionName: dataRegionName,
-pks: checked
-}).show();
-},
-failure: LDK.Utils.getErrorCallback()
-});
+Ext4.create('SequenceAnalysis.window.AddFileSetsWindow', {
+targetTable: 'outputfiles',
+targetField: 'outputFileId',
+dataRegionName: dataRegionName,
+pks: checked
+}).show();
 }
 },
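This handler (and the one in `VisualizeDataWindow.js`) now reads checked rows synchronously with `getChecked()` and returns early when nothing is checked. The removed code also called `Ext4.Msg.wait('Loading...')` before the asynchronous `getSelected()` call and never called `Ext4.Msg.hide()` on success; the new code shows no wait box at all. As far as the LabKey client API documents it, `getChecked()` reports rows checked on the current page, whereas `getSelected()` asks the server for the full selection. The sketch below models only the guard, using a hypothetical stub in place of `LABKEY.DataRegion`:

```javascript
// Hypothetical stub standing in for LABKEY.DataRegions[name]; only getChecked() is modeled.
function makeStubRegion(checkedKeys) {
    return { getChecked: () => checkedKeys.slice() };
}

// Same guard as the new handlers: synchronous read, early return when nothing is checked.
function checkedOrAlert(region, alertFn) {
    const checked = region.getChecked();
    if (!checked.length) {
        alertFn('Error', 'No rows selected');
        return null;
    }
    return checked;
}

const alerts = [];
const record = (title, msg) => alerts.push(title + ': ' + msg);
console.log(checkedOrAlert(makeStubRegion(['101', '102']), record)); // the two keys
console.log(checkedOrAlert(makeStubRegion([]), record));             // null
console.log(alerts.length);                                          // 1
```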

SequenceAnalysis/resources/web/SequenceAnalysis/window/ChangeReadsetStatusWindow.js

Lines changed: 3 additions & 3 deletions
@@ -1,4 +1,4 @@
-Ext4.define('SequenceBasedGenotyping.window.ChangeReadsetStatusWindow', {
+Ext4.define('SequenceAnalysis.window.ChangeReadsetStatusWindow', {
 extend: 'Ext.window.Window',

 statics: {
@@ -12,7 +12,7 @@ Ext4.define('SequenceBasedGenotyping.window.ChangeReadsetStatusWindow', {
 return;
 }

-Ext4.create('SequenceBasedGenotyping.window.ChangeReadsetStatusWindow', {
+Ext4.create('SequenceAnalysis.window.ChangeReadsetStatusWindow', {
 targetQuery: 'sequence_analyses',
 targetColumns: 'readset/rowid,readset/container',
 readsetField: 'readset/rowid',
@@ -32,7 +32,7 @@ Ext4.define('SequenceBasedGenotyping.window.ChangeReadsetStatusWindow', {
 return;
 }

-Ext4.create('SequenceBasedGenotyping.window.ChangeReadsetStatusWindow', {
+Ext4.create('SequenceAnalysis.window.ChangeReadsetStatusWindow', {
 targetQuery: 'sequence_readsets',
 targetColumns: 'rowid,container',
 readsetField: 'rowid',

SequenceAnalysis/resources/web/SequenceAnalysis/window/OutputHandlerWindow.js

Lines changed: 1 addition & 1 deletion
@@ -161,7 +161,7 @@ Ext4.define('SequenceAnalysis.window.OutputHandlerWindow', {
 var val = i.additionalExtConfig[prop];
 if (Ext4.isString(val) && val.match(/^js:/)){
 val = val.replace(/^js:/, '');
-val = eval(val);
+val = eval("false || " + val);

 i.additionalExtConfig[prop] = val;
 }

SequenceAnalysis/resources/web/SequenceAnalysis/window/VisualizeDataWindow.js

Lines changed: 25 additions & 33 deletions
@@ -2,40 +2,32 @@ Ext4.define('SequenceAnalysis.window.VisualizeDataWindow', {
 extend: 'Ext.window.Window',
 statics: {
 buttonHandler: function(dataRegionName){
-var dataRegion = LABKEY.DataRegions[dataRegionName];
-dataRegion.getSelected({
-scope: this,
-success: function (results, response) {
-if (!results || !results.selected || !results.selected.length) {
-Ext4.Msg.alert('Error', 'No rows selected');
-return;
-}
-
-var checked = LABKEY.DataRegions[dataRegionName].getChecked();
-
-Ext4.Msg.wait('Loading...');
-LABKEY.Ajax.request({
-method: 'POST',
-url: LABKEY.ActionURL.buildURL('sequenceanalysis', 'getAvailableHandlers', null),
-params: {
-handlerType: 'OutputFile',
-outputFileIds: checked
-},
-scope: this,
-failure: LDK.Utils.getErrorCallback(),
-success: LABKEY.Utils.getCallbackWrapper(function (results) {
-Ext4.Msg.hide();
-
-Ext4.create('SequenceAnalysis.window.VisualizeDataWindow', {
-dataRegionName: dataRegionName,
-handlers: results.handlers,
-partialHandlers: results.partialHandlers,
-outputFileIds: checked
-}).show();
-}, this)
-});
+var checked = LABKEY.DataRegions[dataRegionName].getChecked();
+if (!checked.length) {
+Ext4.Msg.alert('Error', 'No rows selected');
+return;
+}
+
+Ext4.Msg.wait('Loading...');
+LABKEY.Ajax.request({
+method: 'POST',
+url: LABKEY.ActionURL.buildURL('sequenceanalysis', 'getAvailableHandlers', null),
+params: {
+handlerType: 'OutputFile',
+outputFileIds: checked
 },
-failure: LDK.Utils.getErrorCallback()
+scope: this,
+failure: LDK.Utils.getErrorCallback(),
+success: LABKEY.Utils.getCallbackWrapper(function (results) {
+Ext4.Msg.hide();
+
+Ext4.create('SequenceAnalysis.window.VisualizeDataWindow', {
+dataRegionName: dataRegionName,
+handlers: results.handlers,
+partialHandlers: results.partialHandlers,
+outputFileIds: checked
+}).show();
+}, this)
 });
 }
 },

SequenceAnalysis/src/org/labkey/sequenceanalysis/pipeline/ProcessVariantsHandler.java

Lines changed: 27 additions & 9 deletions
@@ -40,7 +40,9 @@
 import org.labkey.api.view.ActionURL;
 import org.labkey.api.writer.PrintWriters;
 import org.labkey.sequenceanalysis.SequenceAnalysisModule;
+import org.labkey.sequenceanalysis.run.util.AbstractGenomicsDBImportHandler;
 import org.labkey.sequenceanalysis.run.util.CombineVariantsWrapper;
+import org.labkey.sequenceanalysis.util.SequenceUtil;

 import java.io.File;
 import java.io.IOException;
@@ -275,15 +277,7 @@ public static void initVariantProcessing(PipelineJob job, SequenceAnalysisJobSup
 for (SequenceOutputFile so : inputFiles)
 {
 job.getLogger().info("reading file: " + so.getFile().getName());
-try (FeatureReader reader = AbstractFeatureReader.getFeatureReader(so.getFile().getPath(), new VCFCodec(), false))
-{
-VCFHeader header = (VCFHeader)reader.getHeader();
-sampleNames.addAll(header.getSampleNamesInOrder());
-}
-catch (IOException e)
-{
-throw new PipelineJobException(e);
-}
+sampleNames.addAll(getSamples(so.getFile()));
 }

 job.getLogger().info("total samples: " + sampleNames.size());
@@ -850,4 +844,28 @@ public void serializeTest() throws Exception
 f.delete();
 }
 }
+
+private static Collection<String> getSamples(File input) throws PipelineJobException
+{
+if (SequenceUtil.FILETYPE.vcf.getFileType().isType(input))
+{
+try (FeatureReader reader = AbstractFeatureReader.getFeatureReader(input.getPath(), new VCFCodec(), false))
+{
+VCFHeader header = (VCFHeader) reader.getHeader();
+return header.getSampleNamesInOrder();
+}
+catch (IOException e)
+{
+throw new PipelineJobException(e);
+}
+}
+else if (AbstractGenomicsDBImportHandler.TILE_DB_FILETYPE.isType(input))
+{
+return AbstractGenomicsDBImportHandler.getSamplesForWorkspace(input.getParentFile());
+}
+else
+{
+throw new PipelineJobException("Unknown file type: " + input.getPath());
+}
+}
 }
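`getSamples()` now dispatches on the input's file type: a VCF is read through its header, while a GenomicsDB workspace is delegated to `AbstractGenomicsDBImportHandler.getSamplesForWorkspace()` on the file's parent directory. The commit does not show how that method reads a workspace, so the sketch below assumes it: it treats a file named `__tiledb_workspace.tdb` as the workspace marker and reads sample names from a `callset.json` shaped like `{"callsets": [{"sample_name": ..., "row_idx": ...}]}`. The VCF branch uses the standard `#CHROM` header line, where sample columns follow the nine fixed columns. `readText` is a hypothetical caller-supplied function that returns file contents, including any decompression needed for `.vcf.gz`.

```javascript
const path = require('path');

// VCF: columns 0-8 of the #CHROM line are fixed (#CHROM ... FORMAT); samples start at index 9.
function samplesFromVcfHeader(headerText) {
    const chromLine = headerText.split(/\r?\n/).find(l => l.startsWith('#CHROM'));
    if (!chromLine) throw new Error('No #CHROM header line');
    return chromLine.split('\t').slice(9);
}

// ASSUMPTION: callset.json layout; samples are ordered by row_idx.
function samplesFromCallsetJson(jsonText) {
    const callsets = JSON.parse(jsonText).callsets || [];
    return callsets.slice().sort((a, b) => a.row_idx - b.row_idx).map(c => c.sample_name);
}

// Type-based dispatch, mirroring the structure of the new getSamples(File).
function getSamples(file, readText) {
    if (/\.vcf(\.gz)?$/.test(file)) {
        return samplesFromVcfHeader(readText(file));
    } else if (path.basename(file) === '__tiledb_workspace.tdb') { // ASSUMPTION: marker name
        // Like input.getParentFile(): workspace metadata lives in the containing directory.
        return samplesFromCallsetJson(readText(path.join(path.dirname(file), 'callset.json')));
    }
    throw new Error('Unknown file type: ' + file);
}
```

Injecting `readText` keeps the dispatch logic testable without touching the filesystem; for an uncompressed VCF, `f => require('fs').readFileSync(f, 'utf8')` is enough.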

SequenceAnalysis/src/org/labkey/sequenceanalysis/run/alignment/StarWrapper.java

Lines changed: 8 additions & 0 deletions
@@ -213,6 +213,10 @@ public AlignmentOutput performAlignment(Readset rs, File inputFastq1, @Nullable
 args.add("--quantMode");
 args.add("GeneCounts");
 }
+else
+{
+getPipelineCtx().getLogger().info("No GTF was provided, so gene counts will not be created");
+}

 addThreadArgs(args);
 getWrapper().execute(args);
@@ -432,7 +436,11 @@ public Provider()
 put("extensions", Arrays.asList("gtf", "gff"));
 put("width", 400);
 put("allowBlank", true);
+put("getErrors", "js:function(){var errors = [];var val = this.getValue();var countField = this.up('panel').down('field[name=\"alignment.STAR.generateCounts\"]');if (!val && countField && countField.getValue()) { errors.push('Must select a GTF when Generate Counts is selected') } return errors;}");
 }}, null),
+ToolParameterDescriptor.create("generateCounts", "Generate Counts", "There are rare cases when STAR is used without a GTF, so this field is not required. If that field is left blank, counts are not generated. Checking this field prevents job submission unless a GTF is provided. Uncheck to allow running without a GTF.", "checkbox", new JSONObject(){{
+put("checked", true);
+}}, false),
 ToolParameterDescriptor.create(LONG_READS, "Reads >500bp", "If the reads are expected to exceed 500bp (per pair), this will use STARlong instead of STAR.", "checkbox", new JSONObject(){{
 put("checked", false);
 }}, false),
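The new `getErrors` string on the GTF field encodes one rule: a blank GTF is an error only when the `Generate Counts` checkbox exists and is checked. On the server side, the new `else` branch logs that no gene counts will be created when no GTF was provided. A pure-function restatement of the client-side rule makes the cases easy to check; `gtfErrors` and its parameters are hypothetical names:

```javascript
// Hypothetical pure-function form of the GTF field's getErrors() check.
// countFieldValue is undefined when the "Generate Counts" field is absent from the form.
function gtfErrors(gtfValue, countFieldValue) {
    const errors = [];
    if (!gtfValue && countFieldValue) {
        errors.push('Must select a GTF when Generate Counts is selected');
    }
    return errors;
}

console.log(gtfErrors('', true).length);          // 1: counts requested without a GTF
console.log(gtfErrors('', false).length);         // 0: user opted out of counts
console.log(gtfErrors('', undefined).length);     // 0: no count field on the form
console.log(gtfErrors('genes.gtf', true).length); // 0
```

Defaulting the checkbox to checked means the common case (STAR with a GTF) is protected, while the rare GTF-free run remains possible by unchecking it.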

SequenceAnalysis/src/org/labkey/sequenceanalysis/run/analysis/LofreqAnalysis.java

Lines changed: 9 additions & 0 deletions
@@ -158,6 +158,7 @@ public Output performAnalysisPerSampleRemote(Readset rs, File inputBam, Referenc
 int minCoverage = getProvider().getParameterByName("minCoverage").extractValue(getPipelineCtx().getJob(), getProvider(), getStepIdx(), Integer.class);
 int positionsSkipped = 0;
 int gapIntervals = 0;
+double avgDepth;

 File mask = new File(outputDir, "mask.bed");
 Map<String, Integer> gatkDepth = new HashMap<>();
@@ -166,6 +167,8 @@ public Output performAnalysisPerSampleRemote(Readset rs, File inputBam, Referenc
 String[] line;

 Interval intervalOfCurrentGap = null;
+double totalDepth = 0;
+double totalPositions = 0;

 int i = 0;
 while ((line = reader.readNext()) != null)
@@ -180,6 +183,9 @@ public Output performAnalysisPerSampleRemote(Readset rs, File inputBam, Referenc
 int depth = Integer.parseInt(line[1]);
 gatkDepth.put(line[0], depth);

+totalPositions++;
+totalDepth += depth;
+
 if (depth < minCoverage)
 {
 positionsSkipped++;
@@ -224,6 +230,8 @@ public Output performAnalysisPerSampleRemote(Readset rs, File inputBam, Referenc
 writer.writeNext(new String[]{intervalOfCurrentGap.getContig(), String.valueOf(intervalOfCurrentGap.getStart()-1), String.valueOf(intervalOfCurrentGap.getEnd())});
 gapIntervals++;
 }
+
+avgDepth = totalDepth / totalPositions;
 }
 catch (IOException e)
 {
@@ -548,6 +556,7 @@ public Output performAnalysisPerSampleRemote(Readset rs, File inputBam, Referenc
 writer.writeNext(new String[]{"LoFreq Analysis", "VariantGT50", String.valueOf(totalGT50)});
 writer.writeNext(new String[]{"LoFreq Analysis", "IndelsGTThreshold", String.valueOf(totalIndelGTThreshold)});
 writer.writeNext(new String[]{"LoFreq Analysis", "TotalConsensusVariantsInPBS", String.valueOf(totalConsensusInPBS)});
+writer.writeNext(new String[]{"LoFreq Analysis", "MeanCoverage", String.valueOf(avgDepth)});
 }
 catch (IOException e)
 {
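`MeanCoverage` is the arithmetic mean of the per-position depths read from the DepthOfCoverage table (`line[1]`), accumulated in the same loop that builds the mask. Because both accumulators are `double`, an empty table would give `0.0 / 0.0`, i.e. `NaN`, which `String.valueOf` writes as `"NaN"`. A minimal sketch of the same accumulation, assuming rows arrive as `[locus, depth]` string pairs; unlike the Java, it returns `null` for empty input:

```javascript
// Mean per-position depth, mirroring totalDepth / totalPositions in LofreqAnalysis.
// Each row is [locus, depth] as strings, e.g. ['chr1:100', '37'].
function meanCoverage(rows) {
    let totalDepth = 0;
    let totalPositions = 0;
    for (const [, depth] of rows) {
        totalDepth += parseInt(depth, 10);
        totalPositions++;
    }
    // The Java divides doubles and would report NaN here; this sketch returns null instead.
    return totalPositions === 0 ? null : totalDepth / totalPositions;
}

console.log(meanCoverage([['chr1:1', '10'], ['chr1:2', '20'], ['chr1:3', '45']])); // 25
```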

SequenceAnalysis/src/org/labkey/sequenceanalysis/run/bampostprocessing/BaseQualityScoreRecalibrator.java

Lines changed: 28 additions & 17 deletions
@@ -15,6 +15,7 @@
 import org.labkey.api.sequenceanalysis.run.AbstractGatk4Wrapper;
 import org.labkey.api.util.FileUtil;
 import org.labkey.api.writer.PrintWriters;
+import org.labkey.sequenceanalysis.util.SequenceUtil;

 import java.io.File;
 import java.io.IOException;
@@ -70,29 +71,39 @@ public File execute(File bam, File fasta, File output, @Nullable File knownVaria
 argsRecal.add(recalTable.getPath());
 execute(argsRecal);

-List<String> argsApply = new ArrayList<>(getBaseArgs());
-argsApply.add("ApplyBQSR");
-argsApply.add("-I");
-argsApply.add(bam.getPath());
-argsApply.add("-R");
-argsApply.add(fasta.getPath());
-argsApply.add("--bqsr-recal-file");
-argsApply.add(recalTable.getPath());
-argsApply.add("-O");
-argsApply.add(output.getPath());
-execute(argsApply);
+// If there is not recal possible, the output has 132 lines.
+long lineCount = SequenceUtil.getLineCount(recalTable);
+if (lineCount > 132)
+{
+List<String> argsApply = new ArrayList<>(getBaseArgs());
+argsApply.add("ApplyBQSR");
+argsApply.add("-I");
+argsApply.add(bam.getPath());
+argsApply.add("-R");
+argsApply.add(fasta.getPath());
+argsApply.add("--bqsr-recal-file");
+argsApply.add(recalTable.getPath());
+argsApply.add("-O");
+argsApply.add(output.getPath());
+execute(argsApply);
+
+if (!output.exists())
+{
+throw new PipelineJobException("Expected output not created: " + output.getPath());
+}
+}
+else
+{
+getLogger().info("No recalibration was possible, skipping ApplyBQSR");
+output = bam;
+}

 if (deleteKnownVariantFile)
 {
 knownVariants.delete();
 new File(knownVariants.getPath() + ".idx").delete();
 }

-if (!output.exists())
-{
-throw new PipelineJobException("Expected output not created: " + output.getPath());
-}
-
 return output;
 }

@@ -138,7 +149,7 @@ public Output processBam(Readset rs, File inputBam, ReferenceGenome referenceGen
 }
 }

-getWrapper().execute(inputBam, referenceGenome.getWorkingFastaFile(), outputBam, knownVariants);
+outputBam = getWrapper().execute(inputBam, referenceGenome.getWorkingFastaFile(), outputBam, knownVariants);

 output.setBAM(outputBam);
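The ApplyBQSR change relies on an empirical observation recorded in the commit's own comment: when no recalibration is possible, the recalibration table has 132 lines. Only a longer table triggers ApplyBQSR; otherwise the input BAM is returned unchanged, which is why `processBam()` now assigns the wrapper's return value to `outputBam`. A sketch of that decision; how `SequenceUtil.getLineCount()` counts lines is not shown, so `countLines` here is an assumption (a trailing newline does not add an extra line):

```javascript
// Threshold taken from the commit's comment; treat it as empirical, not a GATK guarantee.
const NO_RECAL_LINE_COUNT = 132;

// ASSUMED line-counting convention: a final newline does not add an empty line.
function countLines(text) {
    if (text.length === 0) return 0;
    const parts = text.split('\n');
    return text.endsWith('\n') ? parts.length - 1 : parts.length;
}

// Returns whether ApplyBQSR must run and which BAM the caller should use next.
function planApplyBqsr(recalTableText, inputBam, outputBam) {
    if (countLines(recalTableText) > NO_RECAL_LINE_COUNT) {
        return { runApplyBqsr: true, bam: outputBam };
    }
    // Nothing to recalibrate: pass the input BAM through, as the new else-branch does.
    return { runApplyBqsr: false, bam: inputBam };
}

console.log(planApplyBqsr('row\n'.repeat(132), 'in.bam', 'out.bam').bam); // in.bam
```

Returning the BAM to use, rather than relying on the requested output path, keeps downstream steps pointed at a file that actually exists when ApplyBQSR is skipped.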
