
Joshua-Jandrell/Genome_Visualiser


Genome Visualiser

Quick Build and Run

To build the genome visualizer app use the command:

bash build.sh

Note

This command will automatically install all required Python libraries to .venv/ and a bcftools build to src/assets/bin/.

After the build is completed the app executable can be run using:

dist/app/Genome\ Visualizer.exe 

To run un-packaged Python scripts activate .venv and run the command:

python src/app.py

Tip

If you have run build.sh the .venv/ directory will have been created and can be activated using:

source .venv/bin/activate # On Linux/MacOS/Unix

or

source .venv/Scripts/activate # On Windows

If build.sh has not been run, follow the setup instructions to create a new venv.

Features

Version 2.0 of the Genome Visualiser Application supports basic functionality for visualising .vcf, .vcf.gz and .bcf files. Version 2.0 is best suited for Usability Testing, allowing developers to understand which features of the visualiser need to be further developed and improved in future App releases.

Version 2.0 Features:

  • User specified data subset management
  • Input validation for acceptable VCF file formats
  • Supports .bcf file visualisation, provided bcftools has installed successfully
  • Desktop installation with PyInstaller
  • Scripts to automatically build bcftools

Visualization:

  • Reference and alternate allele sequences (nucleotide-encoded)
  • Variant-Zygosity colourmap
  • Variant-frequency count and density histograms
  • Variant probability heatmap
  • Comparison of variant proportions between case and control samples
  • Case and control samples colourmap
  • Exploratory view of allele position as a heatmap
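As an illustration of what a zygosity colourmap needs as input, the sketch below maps VCF genotype (GT) strings to small integers that a colourmap could then colour. The integer codes and the function name `zygosity` are assumptions made for this example, not the encoding used by the app.

```python
import re


def zygosity(genotype: str) -> int:
    """Map a VCF GT string to a hypothetical zygosity code.

    0 = homozygous reference, 1 = heterozygous,
    2 = homozygous alternate, -1 = missing call.
    """
    alleles = re.split(r"[/|]", genotype)  # handles unphased "/" and phased "|"
    if any(allele == "." for allele in alleles):
        return -1
    if all(allele == "0" for allele in alleles):
        return 0
    if len(set(alleles)) > 1:
        return 1
    return 2
```

Applying the function to every sample of every variant, for example `[[zygosity(gt) for gt in row] for row in genotype_rows]`, gives a variants-by-samples matrix of codes.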

Dataset subset selection using any combination of the following:

  • Inputting the range of genome positions to be visualised
  • Inputting a range of min and max sample quality values
  • Selecting a .csv, .txt or .tsv file to tag samples as either a case or a control sample
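The position and quality filters above can be pictured as a simple pass over VCF data lines. This is a hedged sketch only: it assumes the quality values refer to the QUAL column of each record, it drops records whose QUAL is missing, and the function name `filter_records` is hypothetical. The app itself relies on bcftools for searching and indexing.

```python
def filter_records(lines, min_pos, max_pos, min_qual, max_qual):
    """Keep VCF data lines whose POS and QUAL fall inside the given ranges."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        pos = int(fields[1])   # column 2 of a VCF record is POS
        qual = fields[5]       # column 6 of a VCF record is QUAL
        if not (min_pos <= pos <= max_pos):
            continue
        if qual == "." or not (min_qual <= float(qual) <= max_qual):
            continue  # assumption: records with missing QUAL are dropped
        kept.append(line)
    return kept
```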

Version 2.0 no longer supports:

  • Sorting - by default the variants are sorted by position (lowest to highest).

Requirements

The app requires the following to be set up before it can be configured:

Tip

If using Windows MSYS, a Windows-native (non-POSIX) version of GNU make may be installed as the mingw32-make command. This can be used on Windows systems if the regular make command does not work.

Important

On some Unix/Unix-like systems (Linux, MacOS) Python v3+ must be accessed using python3 instead of python. In this case, please use python3 instead of python for all commands listed below to ensure that scripts are executed with Python v3.

Setup

The application must be run in an active virtual environment (venv) with the required libraries installed. A build of bcftools must also be configured.

Make the Virtual Environment

Create a venv in the .venv/ directory, activate it, and then install all required Python libraries using one of the following sets of bash commands (which are operating-system dependent):

Build for Linux/MacOS and other Unix-like systems:

python3 -m venv .venv                 # Create venv in .venv/ directory.
source .venv/bin/activate          # Activate venv.
pip install -r requirements.txt    # Install requirements.

Build for Windows:

python -m venv .venv                 # Create venv in .venv/ directory.
source .venv/Scripts/activate     # Activate venv.
pip install -r requirements.txt   # Install requirements.
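The only difference between the two command sets is the interpreter name and the sub-directory holding the activate script. A small sketch of that rule is shown below; the function name `activate_script` is hypothetical and is not part of this repository.

```python
import os
from pathlib import Path


def activate_script(venv_dir: str = ".venv", os_name: str = os.name) -> Path:
    """Return the path of the venv activate script for the given OS.

    os.name is "nt" on Windows, where venv uses Scripts/ instead of bin/.
    """
    scripts_dir = "Scripts" if os_name == "nt" else "bin"
    return Path(venv_dir) / scripts_dir / "activate"
```

Calling `activate_script()` with no arguments resolves the path for the operating system the script is running on.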

Install Bcftools

A build of bcftools is required to search and index files. This can either be a local build or a pre-existing install that is available on the PATH environment variable. If you intend to build and package the app as an executable, a local build is strongly recommended.

Local build

A local build of bcftools can be automatically compiled and installed in the src/assets/bin/ directory with:

python build_bcftools.py          # Replace python with python3 if required.

Tip

If the above method fails, the bcftools source code can be downloaded here and manually compiled. The executable must be copied to src/assets/bin/ after compilation.

Existing build

To use a pre-existing install of bcftools set local = false in config.toml.

Warning

This will only work if bcftools is accessible through the PATH environment variable. To check if it is, use the command:

bcftools --version
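The same check can be made from Python before launching the app. This is a minimal sketch using the standard library's `shutil.which`; the helper name `find_bcftools` is hypothetical and is not a function from this repository.

```python
import shutil


def find_bcftools(executable: str = "bcftools"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_bcftools()
    if path is None:
        print("bcftools was not found on PATH; consider using a local build.")
    else:
        print(f"bcftools found at {path}")
```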

Caution

If you wish to build and distribute the app as an executable, a local build must be used.

Running the App

To run the app use the command:

python src/app.py         # Replace python with python3 if required.

Example Data for Visualisation

If you do not have .vcf.gz files to test the app's visualisations with, you can download some publicly available data. The download places 3 .vcf.gz files in the Data directory along with 6 case/control files: 3 that were used to obtain the results of the Benchmarking/ tests and 3 randomly generated files for experimentation. To download this public data use the following command:

bash get_data.sh

Cases & Controls file format

Case files can be used to specify which samples are cases (samples from the sub-population or individuals being researched for the presence of a variant or physical trait) and which are controls (samples that do not have the specific variant or trait being researched). It can be useful to split and sort samples into cases and controls for comparative visualisation.

Case files can be .txt, .tsv, or .csv files which define if a sample is a case or a control.

General Cases/Controls file format

A cases/control file should contain two columns: one lists all samples in the .vcf dataset, and the other lists Boolean values (True or False), where True indicates that the corresponding sample is a case. For example, a case file for a dataset with 4 samples (s_1, s_2, s_3, and s_4) looks as follows in tsv format:

s_1        False

s_2        True

s_3        False

s_4        True

In the above example file, s_2 and s_4 are cases while s_1 and s_3 are controls.
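To make the two-column layout concrete, here is a minimal sketch of how such a file could be read into a sample-to-flag mapping. It is not the app's own parser: the function name `read_case_table` is hypothetical, and a tab delimiter is assumed (pass `delimiter=","` for .csv files).

```python
import csv
import io


def read_case_table(text: str, delimiter: str = "\t") -> dict:
    """Parse 'sample<delimiter>True|False' rows into {sample: is_case}."""
    flags = {}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        sample, flag = row[0].strip(), row[1].strip()
        flags[sample] = flag.lower() == "true"
    return flags
```

For the four-sample example, the result maps s_2 and s_4 to True and s_1 and s_3 to False.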

Alternative Cases/Controls file format

Sometimes it is easier to list only the case samples. This can be done by providing a single column which lists only the case samples:

s_2

s_3

The above case file shows that s_2 and s_3 are cases while all other samples are controls.
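The single-column layout can be expanded into the same kind of sample-to-flag mapping once the full sample list of the dataset is known. The sketch below is illustrative only; `flags_from_case_list` is a hypothetical name, and the sample list is assumed to come from the dataset header.

```python
def flags_from_case_list(case_text: str, all_samples: list) -> dict:
    """Mark every sample named in case_text as a case; all others are controls."""
    cases = {line.strip() for line in case_text.splitlines() if line.strip()}
    return {sample: sample in cases for sample in all_samples}
```

With the case file above and samples s_1 to s_4, s_2 and s_3 map to True and the remaining samples map to False.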

Packaging the App

To package the app as an executable use the command:

pyinstaller app.spec 

App Optimisation Tests

The Benchmarking/ folder contains scripts used for tests. For more details regarding testing, see the testing README.

Running Unit Tests

To download all required testing files and run unit tests use the command:

bash run_tests.sh

If the above data has already been downloaded, you can instead use the command:

python -m unittest discover test/

About

Code for a final-year dynamic genome visualizer project.
