
Feature Request: +setGT plugin ability to set GT in a subset of samples in the VCF file #2467

@jcm6t

Use Case:

  1. A multi-sample VCF file from prior WGS.
  2. New targeted sequencing data generated for a subset of the samples and determined to be higher quality (e.g. higher coverage) than the prior genotyping,
    e.g. a high-coverage exome or targeted gene sequencing panel.
  3. We want to replace selected genotypes/samples in the original VCF file with genotypes from the second. This would be particularly useful for setting ./. in the original to called genotypes. If there are conflicting genotypes, provide an option either to leave those unchanged or to let file 2 overwrite the file 1 GTs.
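
To make the requested behaviour in step 3 concrete, here is a minimal Python sketch of the per-genotype update rule. It is not existing +setGT code. The function names and the `overwrite_conflicts` flag are hypothetical, and it assumes that only fully missing genotypes (`.`, `./.`, `.|.`) count as missing and that phasing is ignored when deciding whether two calls conflict.

```python
# Hypothetical sketch of the update rule; not part of bcftools.
MISSING = {".", "./.", ".|."}  # assumption: only fully missing GTs count as missing


def normalize(gt):
    """Return the allele indices as a sorted tuple, ignoring phasing."""
    return tuple(sorted(gt.replace("|", "/").split("/")))


def update_gt(old, new, overwrite_conflicts=False):
    """Decide which genotype to keep for one sample at one site."""
    if new in MISSING:
        return old  # the update file has no call, keep the original
    if old in MISSING:
        return new  # fill a missing original genotype
    if normalize(old) == normalize(new):
        return old  # same alleles, nothing to change
    # Conflicting calls: keep the original unless overwriting is requested.
    return new if overwrite_conflicts else old
```

With this rule, `update_gt("./.", "0/1")` fills the missing call, while a conflicting pair such as `"0/0"` and `"0/1"` is left unchanged unless `overwrite_conflicts=True` is passed.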

Easier Enhancement:

  • Modify the existing +setGT to accept -s / -S sample list options. If one is set, apply GT updates only to the samples in the list. This requires processing VCF2 externally to subset it to just the GTs you want to update.
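
As a rough illustration of what restricting updates to a sample list would mean at the VCF text level, the sketch below finds the columns of the listed samples from the `#CHROM` header line and rewrites GT only in those columns. The helper names are hypothetical and this is not how the plugin is implemented. It assumes tab-separated VCF lines with nine fixed columns before the samples and a FORMAT field that contains GT.

```python
# Hypothetical helpers operating on raw VCF text lines; not part of bcftools.

def target_columns(header_line, samples):
    """Return the column indices of the listed samples in a '#CHROM' header line."""
    fields = header_line.rstrip("\n").split("\t")
    # Sample columns start after the 9 fixed VCF columns (CHROM through FORMAT).
    return [i for i, name in enumerate(fields) if i >= 9 and name in samples]


def set_gt(record_line, columns, new_gt="./."):
    """Replace the GT subfield in the selected sample columns only."""
    fields = record_line.rstrip("\n").split("\t")
    # Position of GT within FORMAT; raises ValueError if FORMAT has no GT.
    gt_index = fields[8].split(":").index("GT")
    for i in columns:
        parts = fields[i].split(":")
        parts[gt_index] = new_gt
        fields[i] = ":".join(parts)
    return "\t".join(fields)
```

For example, with samples S1, S2 and S3 in the header and the list {S1, S3}, only the first and third sample columns are rewritten and the other subfields (such as DP) are left as they were.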

More Flexible but probably more complex enhancement:

  • Extend +setGT to accept a second vcf.gz file as input alongside the original file. If a second vcf.gz is given, overwrite existing GTs with the GTs in the update file.
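
The two-file behaviour could look roughly like the Python sketch below, which works on in-memory tables instead of real VCF files. The `apply_updates` name and the `{site: {sample: GT}}` layout are assumptions made for illustration, with a site keyed by (CHROM, POS, REF, ALT). It also assumes that sites or samples absent from the original are ignored and that missing calls in the update file never overwrite anything.

```python
# Hypothetical sketch of a two-file GT update; not part of bcftools.
MISSING = {".", "./.", ".|."}


def apply_updates(original, updates):
    """Return a copy of `original` with GTs overwritten from `updates`.

    Both arguments map a site key, e.g. (chrom, pos, ref, alt),
    to a {sample_name: GT} dictionary.
    """
    merged = {site: dict(gts) for site, gts in original.items()}
    for site, new_gts in updates.items():
        if site not in merged:
            continue  # assumption: never add sites missing from the original
        for sample, gt in new_gts.items():
            if sample in merged[site] and gt not in MISSING:
                merged[site][sample] = gt  # the update file wins
    return merged
```

Only samples present in the update file change, so the remaining samples of the original multi-sample file keep their prior WGS genotypes.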

The current workaround is to convert to PLINK2 format and do the update there with a merge. This works well if you intend to use PLINK/PLINK2 and can convert to that format, but it feels like there should be a way to do this at the earlier VCF stage, so that the VCF can still be used with other programs and analyses.
