
CLASSICO: SNP classification

Mathias Witte Paz edited this page Sep 21, 2022 · 3 revisions

The goal of Evidente is to link phylogenetic data and genome-wide SNP data through an interactive visualization. To this end, Evidente assesses for each SNP whether it is clade-specific, i.e., whether the SNP appears only in the samples at the descendant leaf nodes of a clade. To calculate clade specificity, Evidente includes a module called CLASSICO (CLAde Specific Snp IdentifiCatOr), which computes the distribution of the SNPs from the input files with respect to the least common ancestors (LCAs) that define the clades of the phylogenetic tree.

The algorithm first distributes the SNPs among the reconstructed clades as follows: the identified SNPs are propagated from the leaves towards the root. An internal node receives a SNP from its children only if the SNP is present with the same allele in all of its descendants. SNP-1 in the following figure is one example.

image
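The propagation rule for a single internal node can be sketched as a set intersection. This is a minimal illustration, not CLASSICO's actual code: it assumes that a SNP is identified by a `(position, allele)` pair, and the function name `snps_received_by_parent` is hypothetical.

```python
from typing import List, Set, Tuple

# Assumption: a SNP is identified by its genome position and its allele.
Snp = Tuple[int, str]


def snps_received_by_parent(child_snps: List[Set[Snp]]) -> Set[Snp]:
    """Return the SNPs that every child carries with the same allele."""
    if not child_snps:
        return set()
    # A SNP moves up only if it is in the SNP set of every child.
    return set.intersection(*child_snps)


left_child = {(100, "A"), (250, "T")}
right_child = {(100, "A"), (250, "G")}
print(snps_received_by_parent([left_child, right_child]))  # {(100, 'A')}
```

Position 250 does not propagate here because the two children carry different alleles at it.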

This process is repeated in a post-order traversal of the nodes up to the root of the tree. Note that some SNPs might not propagate at all and remain allocated to a single leaf of the phylogenetic tree. If a SNP is allocated to only one node, meaning it is clade-specific, it is labelled as supporting the tree structure (e.g. SNP-1); otherwise it is labelled as non-supporting (SNP-2). Unresolved bases (N) do not undergo this process but are directly labelled as non-supporting SNPs, since they could stand for any base, which prevents a clear classification.
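The whole procedure described above could look roughly like the following sketch. It is an illustration under stated assumptions rather than the CLASSICO implementation: the tree is given as a hypothetical `children` mapping from node name to child names, each leaf has a set of `(position, allele)` pairs, a SNP that moves up to a parent is removed from its children, and the name `classify_snps` is invented for this example.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Snp = Tuple[int, str]  # assumption: (genome position, allele)


def classify_snps(
    children: Dict[str, List[str]],
    leaf_snps: Dict[str, Set[Snp]],
    root: str,
) -> Tuple[List[Snp], List[Snp]]:
    """Return (supporting, non_supporting) SNP lists for the given tree."""
    allocated: Dict[str, Set[Snp]] = {}

    def visit(node: str) -> None:
        kids = children.get(node, [])
        if not kids:
            # Leaf: unresolved bases (N) are excluded from propagation.
            allocated[node] = {s for s in leaf_snps.get(node, set()) if s[1] != "N"}
            return
        for kid in kids:  # post-order: children before their parent
            visit(kid)
        # SNPs carried by all children with the same allele move up.
        shared = set.intersection(*(allocated[kid] for kid in kids))
        for kid in kids:
            allocated[kid] -= shared
        allocated[node] = shared

    visit(root)

    # Count the number of nodes each SNP ended up allocated to.
    node_count = Counter(snp for snps in allocated.values() for snp in snps)
    supporting = sorted(snp for snp, n in node_count.items() if n == 1)
    # N bases are labelled non-supporting directly.
    unresolved = {s for snps in leaf_snps.values() for s in snps if s[1] == "N"}
    non_supporting = sorted({snp for snp, n in node_count.items() if n > 1} | unresolved)
    return supporting, non_supporting


tree = {"root": ["A", "B"], "A": ["s1", "s2"], "B": ["s3", "s4"]}
snps = {
    "s1": {(100, "A"), (200, "T")},
    "s2": {(100, "A")},
    "s3": {(200, "T"), (300, "N")},
    "s4": set(),
}
supporting, non_supporting = classify_snps(tree, snps, "root")
print(supporting)      # [(100, 'A')]
print(non_supporting)  # [(200, 'T'), (300, 'N')]
```

In this toy tree, `(100, 'A')` is shared by both leaves of clade `A` and ends up at that one node, so it is supporting. `(200, 'T')` stays at two leaves in different clades, and `(300, 'N')` is unresolved, so both are non-supporting. The recursive traversal keeps the sketch short; a very deep tree would need an iterative traversal instead.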

CLASSICO produces two lists of SNPs, one for the supporting and one for the non-supporting SNPs, which are used for the visualizations in Evidente.
