Python Bindings for lib_tsalign

Python bindings for the template switch aligner. Aligns two DNA sequences while detecting template switches — short-range translocations where a query region is copied from (or aligns to) a different location, possibly on the reverse complement strand.
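To make the reverse-complement case concrete, here is a toy sketch that does not depend on the library. The helper `reverse_complement` is illustrative and not part of tsalign; it only shows what kind of sequence a reverse template switch would copy from.

```python
# Illustrative helper, not part of tsalign.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


reference = "AAACCGTTT"
# Toy query: its middle region is the reverse complement of reference[3:6]
# instead of a forward-strand copy.
query = reference[:3] + reverse_complement(reference[3:6]) + reference[6:]
print(query)
```

Aligning such a pair with template switches enabled is the situation the aligner is designed to detect; how a real alignment reports it depends on the cost configuration.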

Installation

pip install tsalign

Quick start

import tsalign

result = tsalign.align("ACGTACGT", "ACGACGT")
print(result.cigar())   # compact alignment string
print(result.stats())   # cost, duration, node counts, …

Aligner options

Create an Aligner once and reuse it for many sequences:

aligner = tsalign.Aligner(
    min_length_strategy="preprocess_lookahead",  # default: "lookahead"
    chaining_strategy="lower_bound",             # default: "none"
    total_length_strategy="maximise",            # default: "maximise"
    no_ts=False,                                 # set True for plain gap-affine
)

result = aligner.align("ACGTACGT", "ACGACGT")

Custom cost configuration

Costs are specified in .tsa format. Use sample_tsa_config/config.tsa as a starting point and consult the main repository README for a description of each parameter.

aligner = tsalign.Aligner(costs_file="sample_tsa_config/config.tsa")
result = aligner.align("ACGTACGT", "ACGACGT")

You can also pass the contents of a .tsa file directly as a string:

with open("my_costs.tsa") as f:
    cost_str = f.read()
aligner = tsalign.Aligner(costs=cost_str)

Restricting the alignment range

Use AlignmentRange to align only a window of the input sequences:

from tsalign import Aligner, AlignmentRange

aligner = Aligner()
result = aligner.align(
    "NNNACGTACGTNNN",
    "ACGACGT",
    range=AlignmentRange(reference_start=3, reference_end=11),
)
print(result.cigar())

Individual start/limit keyword arguments are also accepted when range is not provided:

result = aligner.align(
    "NNNACGTACGTNNN",
    "ACGACGT",
    reference_start=3,
    reference_limit=11,
)
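As a rough mental model, the window in these examples corresponds to a Python slice of the reference. This sketch assumes the start index is inclusive and the end/limit index is exclusive, which is not stated above; the values 3 and 11 are at least consistent with that reading, since the slice covers exactly the non-N part of the sequence.

```python
reference = "NNNACGTACGTNNN"

# Assumption: start is inclusive and the end/limit is exclusive,
# as in Python slicing. Check the library documentation to confirm.
start, limit = 3, 11
window = reference[start:limit]
print(window)
```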

Working with alignment operations

result.alignments() returns a typed list of (count, op) pairs:

from tsalign import align, TemplateSwitchEntranceOp, TemplateSwitchExitOp

# Example inputs; substitute your own sequences.
reference = "ACGTACGT"
query = "ACGACGT"

result = align(reference, query)
for count, op in result.alignments():
    if isinstance(op, TemplateSwitchEntranceOp):
        print(f"Template switch: {op.direction}, descendant={op.descendant}, offset={op.first_offset}")
    elif isinstance(op, TemplateSwitchExitOp):
        print(f"Exit, anti-descendant gap: {op.anti_descendant_gap}")
    else:
        # SimpleAlignmentOp — a basic edit in the primary or secondary track
        print(f"{count}x {op.kind}")
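When only a summary is needed, the (count, op) pairs can be tallied by operation type. The sketch below runs without tsalign installed: `FakeMatch` and `FakeEntrance` are hypothetical stand-ins for the real op classes, and `tally_ops` is an illustrative helper, not part of the library. It assumes each count is an integer.

```python
from collections import Counter


def tally_ops(pairs):
    """Sum the counts of (count, op) pairs, grouped by op class name."""
    totals = Counter()
    for count, op in pairs:
        totals[type(op).__name__] += count
    return totals


# Hypothetical stand-ins so the sketch runs without tsalign.
class FakeMatch:
    pass


class FakeEntrance:
    pass


pairs = [(3, FakeMatch()), (1, FakeEntrance()), (4, FakeMatch())]
print(tally_ops(pairs))
```

With a real result, passing `result.alignments()` to `tally_ops` would group the counts under tsalign's own class names.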

Visualisation

result = tsalign.align(reference, query)
result.viz_template_switches()   # prints ASCII art to stdout

Limiting search resources

result = aligner.align(
    reference,
    query,
    cost_limit=100,       # return None if cost would exceed this
    memory_limit=500_000, # return None if memory exceeds this number of bytes
)
if result is None:
    print("No alignment found within limits")
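Because a too-tight limit yields None rather than an alignment, one possible pattern is to retry with progressively larger cost limits. The helper below is a sketch, not part of tsalign: it accepts any callable that takes a `cost_limit` keyword, and uses a hypothetical stub in place of a real aligner so that it runs standalone.

```python
def align_with_escalating_limit(align_fn, reference, query, limits):
    """Try each cost limit in order; return (result, limit) for the first success."""
    for limit in limits:
        result = align_fn(reference, query, cost_limit=limit)
        if result is not None:
            return result, limit
    # Every limit was too tight.
    return None, None


# Hypothetical stub that pretends the optimal alignment costs 30.
def fake_align(reference, query, cost_limit):
    return "ok" if cost_limit >= 30 else None


print(align_with_escalating_limit(fake_align, "ACGT", "ACGT", [10, 20, 40]))
```

With the real library, `aligner.align` could be passed as `align_fn`. Starting with a small limit keeps easy cases cheap, at the price of repeated work on hard ones.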

Accepted sequence types

Any object whose str() representation is a valid DNA string (ACGTN) is accepted — including Bio.Seq:

from Bio.Seq import Seq
result = tsalign.align(Seq("ACGTACGT"), Seq("ACGACGT"))
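To check inputs before aligning, the str()-based rule above can be sketched as a small validator. `is_valid_dna` and `Record` are hypothetical names, not part of tsalign, and the sketch assumes only uppercase ACGTN is valid; whether the library accepts lowercase, or how it reports invalid input, is not covered here.

```python
VALID_BASES = frozenset("ACGTN")


def is_valid_dna(seq) -> bool:
    """Check whether str(seq) consists only of the characters A, C, G, T, N.

    Note: an empty string passes this check.
    """
    return set(str(seq)) <= VALID_BASES


class Record:
    """Hypothetical wrapper type whose str() is its sequence."""

    def __init__(self, sequence):
        self.sequence = sequence

    def __str__(self):
        return self.sequence


print(is_valid_dna(Record("ACGTN")), is_valid_dna("ACGU"))
```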