-
Notifications
You must be signed in to change notification settings - Fork 20
Register workflow files in nextstrain-pathogen.yaml
#462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
e701c24
5821639
38ecf54
1e4a963
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -228,30 +228,54 @@ def run(opts): | |
| # Resolve pathogen and workflow names to a local workflow directory. | ||
| pathogen = PathogenVersion(opts.pathogen) | ||
|
|
||
| if opts.workflow not in pathogen.registered_workflows(): | ||
| if opts.workflow not in pathogen.compatible_workflows("nextstrain run"): | ||
| print(f"The {opts.workflow!r} workflow is not registered as a compatible workflow, but trying to run anyways.") | ||
|
|
||
| workflow_directory = pathogen.workflow_path(opts.workflow) | ||
| workflow_files = pathogen.workflow_files(opts.workflow) | ||
| workflow_snakefile = workflow_files["snakefile"] | ||
|
|
||
| if not workflow_directory.is_dir() or not (workflow_directory / "Snakefile").is_file(): | ||
| if not workflow_snakefile.is_file(): | ||
| raise UserError(f""" | ||
| No {opts.workflow!r} workflow for pathogen {opts.pathogen!r} found {f"in {str(workflow_directory)!r}" if DEBUGGING else "locally"}. | ||
| No {opts.workflow!r} workflow for pathogen {opts.pathogen!r} found {f"(Snakefile {workflow_snakefile!r} does not exist)" if DEBUGGING else "locally"}. | ||
|
|
||
| Maybe you need to update to a newer version of the pathogen? | ||
|
|
||
| Hint: to update the pathogen, run `nextstrain update {shquote(pathogen.name)}`. | ||
| """) | ||
|
|
||
| if workflow_configfile := workflow_files["configfile"]: | ||
| assert workflow_configfile.is_file(), \ | ||
| f"Workflow's registered config file {workflow_configfile!r} does not exist." | ||
|
|
||
| # The pathogen volume is the pathogen directory (i.e. repo). | ||
| # The workflow volume is the workflow directory within the pathogen directory. | ||
| # The build volume is the user's analysis directory and will be the working directory. | ||
| pathogen_volume, workflow_volume = build.pathogen_volumes(workflow_directory, name = "pathogen") | ||
| pathogen_volume, _ = build.pathogen_volumes(pathogen.path, name = "pathogen") | ||
| build_volume = NamedVolume("build", opts.analysis_directory) | ||
|
|
||
| # for containerized runtimes (e.g. Docker, Singularity, and AWS Batch) | ||
| opts.volumes.append(pathogen_volume) | ||
| opts.volumes.append(build_volume) | ||
|
|
||
| # Resolve paths for workflow files | ||
| resolved_pathogen = ( | ||
| docker.mount_point(pathogen_volume) | ||
| if opts.__runner__ in {docker, singularity, aws_batch} else | ||
| pathogen_volume.src.resolve(strict = True) | ||
| ) | ||
| resolved_snakefile = resolved_pathogen / workflow_snakefile.relative_to(pathogen.path) | ||
| resolved_configfile = None | ||
| if workflow_configfile: | ||
| resolved_configfile = resolved_pathogen / workflow_configfile.relative_to(pathogen.path) | ||
|
|
||
| resolved_overlay = None | ||
| if (opts.analysis_directory / "config.yaml").is_file(): | ||
| resolved_build = ( | ||
| docker.mount_point(build_volume) | ||
| if opts.__runner__ in {docker, singularity, aws_batch} else | ||
| build_volume.src.resolve(strict = True) | ||
| ) | ||
| resolved_overlay = resolved_build / "config.yaml" | ||
|
|
||
| print(f"Running the {opts.workflow!r} workflow for pathogen {pathogen}") | ||
|
|
||
| # Set up Snakemake invocation. | ||
|
|
@@ -276,11 +300,15 @@ def run(opts): | |
|
|
||
| # Workdir will be the analysis volume (/nextstrain/build in a | ||
| # containerized runtime), so explicitly point to the Snakefile. | ||
| "--snakefile=%s/Snakefile" % ( | ||
| docker.mount_point(workflow_volume) | ||
| if opts.__runner__ in {docker, singularity, aws_batch} else | ||
| workflow_volume.src.resolve(strict = True)), | ||
| "--snakefile=%s" % (resolved_snakefile), | ||
|
|
||
| *(["--configfile=%s" % (resolved_configfile)] | ||
| if resolved_configfile else []), | ||
|
Comment on lines
+305
to
+306
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OH, just realized this overrides the custom config in the analysis directory...will think on this...
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah I remember it took me a bit of digging figure out snakemake's config handling. I was alluding to this here, but a much more thorough comment is this one from the original avian-flu PR. It all makes sense once you figure it out, but it's confusing before that because
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Put another way, I think or As a thought experiment, I think the runner (via but I'm sure there's a nicer way.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah thanks for the pointers @jameshadfield! I was reading through the Snakemake config docs again and noticed that there is also an API method for overwriting configs ( diff --git a/phylogenetic/Snakefile b/phylogenetic/Snakefile
index 57a9da9474..399e6eef9f 100644
--- a/phylogenetic/Snakefile
+++ b/phylogenetic/Snakefile
@@ -16,7 +16,9 @@
# Use custom configuration from analysis directory (i.e. working dir), if any.
if os.path.exists("config.yaml"):
- configfile: "config.yaml"
+ with open("config.yaml", "r", encoding = "utf-8") as f:
+ custom_config = yaml.safe_load(f)
+ snakemake.utils.update_config(config, custom_config)Unfortunately, this means the overlay also overrides any
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Tried this out locally and it works as expected if it's defined before Not sure how I feel about that special case for
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, that seems... fine. I think! It's not And this should work nicely with custom-named config overlays as you can add Would you follow this up with the removal of lines such as these in measles?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Ah, I meant implemented specifically for
As of right now,
I think so? Unless we expect the
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
We have two different scenarios:
Outside of
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, probably best to write out how all of the different methods of running the workflow will look... For (1) - a pathogen that does have a Snakemake code to load default config and automatically use the analysis config overlay such as measles, the commands are pretty simple: The ---
compatibility:
nextstrain run:
ingest: ~
phylogenetic: ~
nextclade: ~For (2) - a pathogen without a default config, the automatic use of analysis config overlay does not work as expected. There will be no Snakemake code to define The ---
compatibility:
nextstrain run:
ingest: ~
phylogenetic/all-clades:
snakefile: phylogenetic/Snakefile
configfile: phylogenetic/defaults/mpxv/config.yaml
phylogenetic/clade-I:
snakefile: phylogenetic/Snakefile
configfile: phylogenetic/defaults/clade-i/config.yaml
phylogenetic/clade-IIb:
snakefile: phylogenetic/Snakefile
configfile: phylogenetic/defaults/hmpxv1/config.yaml
phylogenetic/lineage-B.1:
snakefile: phylogenetic/Snakefile
configfile: phylogenetic/defaults/hmpxv1_big/config.yamlIf we remove the default base config and automatic use of analysis config overlays, then the commands for measles would be more verbose and consistent with mpox. However, after writing this out, I'm not sure that consistency is worth it, especially since the
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks! Can you add to that comment what the |
||
|
|
||
| # Ensure the overlay config in the user's analysis directory | ||
| # overrides any default config file provided above. | ||
| *(["--configfile=%s" % (resolved_overlay)] | ||
| if resolved_overlay else []), | ||
victorlin marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| # Pass thru appropriate resource options. | ||
| # | ||
| # Snakemake requires the --cores option as of 5.11, so provide a | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(non-blocking) There's no documentation for the
nextstrain-pathogen.yamlschema yet, but it might be good to add soon.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sure, I was going to add the schema next. I'm thinking I'll stick with
jsonschemathat we already use for Augur.