
Prep extra for release: cleanup, security fixes, dep bumps #66

Merged
TimoLassmann merged 29 commits into main from extra on May 16, 2026
Conversation

@TimoLassmann
Owner

Just a test

TimoLassmann and others added 29 commits March 9, 2026 09:18
…er, and benchmark infrastructure

Core library changes:
- Replace dense float[len_a * len_b] consistency bonus matrix with sparse
  per-row structure (K slots per row). Fixes integer overflow crash for
  large DNA families (profiles > 46k columns) and reduces memory from
  10 GB to 3.2 MB for 50k x 50k case. Bit-exact identical results for
  all existing cases.
- Add ensemble alignment with POAR consensus merging, configurable
  number of runs, and Hirschberg midpoint perturbation for diversity
- Add alignment-guided UPGMA tree rebuild (realign) within each
  ensemble run
- Add sequence weight rebalancing for profile merging
- Add variable scoring matrix (VSM) support
- Add anchor consistency bonus for progressive alignment guidance
- Add detailed alignment comparison (recall/precision/F1/TC) with
  BAliBASE XML core block mask support
- Expose new parameters through C API, CLI, and Python bindings
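The numbers in the first bullet can be checked with a little arithmetic. The sketch below assumes 4-byte floats and a 32-bit signed index for the dense matrix. For the sparse layout it assumes K = 8 slots per row at 8 bytes per slot, which is not stated in this commit; those values are chosen only because they reproduce the quoted 3.2 MB. `SparseBonusRows` and its eviction rule are hypothetical illustrations, not the library's data structure.

```python
INT_MAX = 2**31 - 1

def dense_cells(len_a, len_b):
    """Number of cells in a dense len_a x len_b bonus matrix."""
    return len_a * len_b

# A 32-bit signed cell count overflows just past 46k x 46k.
overflow_free = dense_cells(46_340, 46_340) <= INT_MAX
overflows = dense_cells(46_341, 46_341) > INT_MAX

dense_bytes = dense_cells(50_000, 50_000) * 4   # 4-byte float per cell
sparse_bytes = 50_000 * 8 * 8                   # assumed: K=8 slots/row, 8 bytes/slot

class SparseBonusRows:
    """Hypothetical per-row store keeping at most k (column -> bonus) slots."""

    def __init__(self, n_rows, k):
        self.k = k
        self.rows = [dict() for _ in range(n_rows)]

    def add(self, row, col, bonus):
        slots = self.rows[row]
        if col in slots:
            slots[col] += bonus
        elif len(slots) < self.k:
            slots[col] = bonus
        else:
            # Assumed policy: evict the weakest slot if the new bonus is larger.
            weakest = min(slots, key=slots.get)
            if slots[weakest] < bonus:
                del slots[weakest]
                slots[col] = bonus

    def get(self, row, col):
        # Columns without a slot contribute no bonus.
        return self.rows[row].get(col, 0.0)
```

Here `dense_bytes` comes to 10,000,000,000 (10 GB) and `sparse_bytes` to 3,200,000 (3.2 MB), matching the figures quoted for the 50k x 50k case.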

Benchmark infrastructure:
- Add NSGA-III multi-objective optimizer with pymoo mixed variables
  (Choice/Integer/Real) for proper categorical parameter exploration
- Add benchmark datasets: BAliBASE, BRAliBASE, MDSA DNA, BaliFam100
- Add Pareto front visualization with Dash app
- Validate downloaded tarballs to detect corrupt/HTML error pages
- Add fallback URLs for BAliBASE downloads

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Simplify Python API to mode-based presets (fast/default/accurate) as the
only public interface. Remove all legacy backward-compatibility code and
deprecated parameters (ensemble, refine, vsm_amax, etc. from align*()).

Add per-run support for vsm_amax and refine in the NSGA-III optimizer via
optional array parameters in ensemble_custom_file_to_file(). The C binding
also accepts per-run seq_weights, realign, consistency_anchors, and
consistency_weight. This lets the optimizer discover diverse ensemble
configurations where each run uses different scoring and refinement.

- kalign_run_config: 14-field per-run struct, single C entry point
- Public API: mode + optional gap penalty overrides only
- Optimizer: ensemble_custom_file_to_file() with per-run arrays
- Search space: 41 dimensions (7 per-run × 5 slots + 6 shared)
- Old v1 checkpoints can still be resumed (auto-expanded to per-run)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d presets for protein/RNA/DNA

Major features:
- POAR-based consistency merge for ensemble alignment: extracts pairwise
  residue consistency scores from the POAR table (already built during
  ensemble) and uses them as bonus weights in a final progressive alignment.
  Controlled by kalign_ensemble_config.consistency_merge (0=POAR consensus,
  1=consistency re-alignment). Tested by NSGA-III optimizer but POAR
  consensus consistently outperformed it on BAliBASE.

- Kimura two-parameter nucleotide substitution matrices (1PAM, 20PAM,
  200PAM) with kappa=2 transition/transversion ratio. Gives the optimizer
  matrix diversity for DNA/RNA, analogous to PFASUM43/60/CB66 for protein.
  K200 strongly preferred by optimizer for both RNA and DNA.

- Farthest-first anchor selection for guide tree distance computation.
  Replaces length-stratified sampling with BPM-based diversity selection.
  Performance equivalent on BAliBASE but more principled.

- 12 NSGA-III optimized presets: 4 protein (BAliBASE gen 41), 4 RNA
  (BRAliBASE gen 88), 4 DNA (MDSA gen 100). Each biotype has fast,
  default, recall, and accurate modes. Nucleotide optimization run
  (combined BRAliBASE + MDSA) in progress to produce unified presets.
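For readers unfamiliar with the Kimura two-parameter model behind the K1/K20/K200 matrices, here is a minimal sketch of the substitution probabilities. It assumes kappa is the ratio of the transition rate to the rate of each transversion type, and that 1 PAM means 0.01 expected substitutions per site. How the library scales or rounds its actual scores is not stated in this commit, so `log_odds` is only one plausible convention.

```python
import math

def k2p_probabilities(pam, kappa=2.0):
    """Return (p_same, p_transition, p_transversion) at `pam` PAM units.

    Assumptions: kappa = alpha / beta, and 1 PAM = 0.01 expected
    substitutions per site, so d = (alpha + 2 * beta) * t = pam / 100.
    p_transversion is the probability of EACH of the two transversions.
    """
    d = pam / 100.0
    beta_t = d / (kappa + 2.0)
    alpha_t = kappa * beta_t
    e1 = math.exp(-4.0 * beta_t)
    e2 = math.exp(-2.0 * (alpha_t + beta_t))
    p_same = 0.25 + 0.25 * e1 + 0.5 * e2
    p_ts = 0.25 + 0.25 * e1 - 0.5 * e2
    p_tv = 0.25 - 0.25 * e1
    return p_same, p_ts, p_tv

def log_odds(p, background=0.25):
    """Log-odds score against a uniform nucleotide background (assumed)."""
    return math.log2(p / background)
```

At every distance the four probabilities of a row (`p_same + p_ts + 2 * p_tv`) sum to 1, and a transition stays more likely than either single transversion, which is the property that distinguishes these matrices from a flat match/mismatch scheme.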

Protein presets (BAliBASE, 218 cases):
  fast:     R=0.815 P=0.674 F1=0.732  7s   (single P60)
  default:  R=0.786 P=0.722 F1=0.747  40s  (single P60, inline refine)
  recall:   R=0.841 P=0.728 F1=0.776  370s (ens5 CB66/P60/P43, realign=1)
  accurate: R=0.787 P=0.845 F1=0.807  502s (ens5 P43/CB66, realign=1)

RNA presets (BRAliBASE, 599 cases — all beat MUSCLE F1=0.825):
  fast:     R=0.832 P=0.825 F1=0.828  5s   (single K200)
  default:  R=0.804 P=0.869 F1=0.835  8s   (ens3 K200/K20)
  recall:   R=0.833 P=0.826 F1=0.829  6s   (single K200)
  accurate: R=0.811 P=0.863 F1=0.836  26s  (ens3 K200/K20, realign=2)

DNA presets (MDSA, 325 cases):
  fast:     R=0.741 P=0.788 F1=0.764  18s  (ens3 K200/K20, realign=1)
  default:  R=0.737 P=0.816 F1=0.775  35s  (ens3 K200/K20, realign=1)
  recall:   R=0.760 P=0.770 F1=0.765  65s  (ens5 K200, realign=1)
  accurate: R=0.737 P=0.816 F1=0.775  35s  (=default, optimizer converged)

Optimizer changes:
- Objectives changed from (F-beta, TC, time) to (recall, precision, time)
  for richer Pareto fronts
- Combined "nucleotide" dataset (BRAliBASE + MDSA) for unified optimization
- Checkpoint backfill for consistency_merge and per-run vsm_amax/refine
- Dashboard shows recall/precision columns, consistency merge status
- Pareto front seed merging from separate RNA + DNA checkpoints

Other changes:
- kalign_ensemble_config gains consistency_merge and consistency_merge_weight
- msa_struct gains poar_consistency void* for non-owning POAR reference
- aln_run.c dispatches poar_consistency before anchor_consistency
- pick_anchor.h exposes pick_anchor_n() for configurable anchor count
- Python __init__.py adds "recall" to _PRESET_MODES
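Farthest-first anchor selection, as introduced in this commit, can be sketched as a greedy traversal. The version below is a generic illustration, not the code behind `pick_anchor_n()`: it starts from the first item (an arbitrary choice) and uses whatever distance function the caller supplies, where the commit uses a BPM-based distance.

```python
def farthest_first(items, n_anchors, dist):
    """Greedy farthest-first traversal.

    Each new anchor is the item whose distance to its closest
    already-chosen anchor is largest. Returns indices into `items`.
    """
    if not items or n_anchors <= 0:
        return []
    anchors = [0]                       # arbitrary starting anchor
    chosen = {0}
    nearest = [dist(items[0], x) for x in items]
    while len(anchors) < min(n_anchors, len(items)):
        # Never re-pick an anchor, even when all remaining distances are 0.
        nxt = max(
            (i for i in range(len(items)) if i not in chosen),
            key=nearest.__getitem__,
        )
        anchors.append(nxt)
        chosen.add(nxt)
        for i, x in enumerate(items):
            nearest[i] = min(nearest[i], dist(items[nxt], x))
    return anchors

def hamming(a, b):
    """Stand-in distance for the example (equal-length strings)."""
    return sum(x != y for x, y in zip(a, b))
```

With `["AAAA", "AAAT", "TTTT", "AATT"]` and three anchors, the traversal picks `AAAA`, then `TTTT` (farthest from it), then `AATT` (farthest from both).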

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…et_nucleotide()

Replaces preset_rna() and preset_dna() with a single preset_nucleotide()
optimized on combined BRAliBASE (599 RNA) + MDSA (325 DNA) dataset.

Nucleotide presets (combined BRAliBASE + MDSA, gen 100):
  fast:     R=0.792 P=0.788 F1=0.790   5s  (single K200, realign=1)
  default:  R=0.773 P=0.842 F1=0.806  26s  (ens3 K20/K200, realign=1)
  recall:   R=0.800 P=0.796 F1=0.798  17s  (single K200, realign=1, refine=C)
  accurate: R=0.760 P=0.867 F1=0.810 100s  (ens5 K200/K1, realign=1, ms=3)

Dispatch is now: protein vs nucleotide (no RNA/DNA distinction).
Kalign auto-detects sequence type; user only picks mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Self-contained threadpool library in lib/src/threadpool/ with:
- Lock-free Chase-Lev deques (per-worker LIFO) + global ext queue
- Three parallelism patterns: parallel_for, fork-join groups, recursive tasks
- Event-count sleeping, per-worker group recycling, work-stealing
- 17 unit tests, 10 stress tests (TSan-clean), OpenMP comparison benchmarks
- Standalone CMake build (also builds as part of kalign)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace OpenMP with a built-in Chase-Lev work-stealing thread pool as the
default parallelization backend. Falls back to serial if pthreads is
unavailable. OpenMP remains available via -DUSE_OPENMP=ON.

Threadpool is 1.5-2.5x faster than OpenMP across protein and nucleotide
benchmarks on both ARM (M3) and x86-64 (Threadripper) hardware.

Key changes:
- Threadpool ON / OpenMP OFF by default in CMakeLists.txt
- All parallel regions (distance matrix, k-means, Hirschberg, anchor
  selection, pairwise distances) wired for both backends
- tp_parallel_for_chunked() with configurable minimum chunk size
- Compile-time parallelization thresholds in one place (CMakeLists.txt):
  ALN_SERIAL_THRESHOLD=500, KMEANS_UPGMA_THRESHOLD=50,
  DIST_MIN_SEQS=50, PFOR_MIN_CHUNK=10
- macOS Python wheels use threadpool (no libomp.dylib dependency)
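The idea of a chunked parallel-for with a minimum chunk size can be sketched as follows. This is an illustration built on Python's standard `ThreadPoolExecutor`, not the signature or scheduling of `tp_parallel_for_chunked()`; the names `make_chunks` and `parallel_for_chunked` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def make_chunks(n, n_workers, min_chunk):
    """Split range(n) into contiguous [start, end) chunks.

    No more chunks than workers, and no chunk smaller than min_chunk
    unless n itself is smaller than min_chunk.
    """
    if n <= 0:
        return []
    n_chunks = max(1, min(n_workers, n // max(1, min_chunk)))
    base, extra = divmod(n, n_chunks)
    chunks, start = [], 0
    for c in range(n_chunks):
        end = start + base + (1 if c < extra else 0)
        chunks.append((start, end))
        start = end
    return chunks

def parallel_for_chunked(n, body, n_workers=4, min_chunk=10):
    """Call body(i) for every i in range(n), one chunk per task."""
    chunks = make_chunks(n, n_workers, min_chunk)
    if len(chunks) <= 1:
        for i in range(n):              # too little work: stay serial
            body(i)
        return

    def run(chunk):
        for i in range(*chunk):
            body(i)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(run, chunks))     # list() waits and re-raises errors
```

The minimum chunk size plays the same role as a serial threshold: below it, the cost of dispatching a task outweighs the work it carries, so the loop runs on the calling thread.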

CLI unified with Python API — both use NSGA-III optimized mode presets
via kalign_get_mode_preset() + kalign_align_full():
  kalign --mode fast|default|recall|accurate
Gap penalty overrides (--gpo/--gpe/--tgpe) work with all modes.
Removed dead CLI options (--ensemble, --refine, --consistency, etc.)
that are now managed by mode presets.

All three entry points (C CLI, Python API, kalign-py CLI) expose
identical options and produce identical results.

Bug fixes:
- bisectingKmeans: base-case guard for num_samples <= 1
- bisectingKmeans: out-of-bounds seed_idx in split2 k-means loop
- threadpool.c: missing #include <stdint.h> for uintptr_t

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Major refactor of the C library internals for better threading:

API consolidation:
- Replace 7 redundant entry points (kalign_run, kalign_run_seeded,
  kalign_run_dist_scale, kalign_run_realign, kalign_post_realign,
  kalign_ensemble, kalign_ensemble_custom) with one internal
  kalign_single_run() + kalign_align_full() public entry point
- One threadpool created per kalign_align_full() call, shared across
  all work (ensemble runs, tree traversal, anchor consistency, etc.)

Parallelized components (all via threadpool fork-join or parallel-for):
- Inline refine tree traversal (create_msa_tree_inline_refine)
- Hirschberg fwd/bwd within inline refine edges
- Concurrent ensemble runs (5 runs share the global pool)
- Anchor consistency build (N×K pairwise DPs)
- POAR extraction (per-pair, disjoint writes)
- POAR scoring (per-row accumulation + sequential reduction)
- Consensus candidate enumeration (two-pass count+fill)
- Residue confidence computation (per-sequence)

Realign tree improvement:
- Replace O(N³) UPGMA on N×N distance matrix with O(N·K·log N)
  bisecting k-means on N×K anchor distances from aligned sequences
- Add pair_dist_fn callback to bisecting_kmeans for pluggable leaf
  cluster distance computation (BPM for initial tree, identity for
  realign)
- No N×N matrix allocated at any point in the realign path
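A rough sketch of building a tree by recursive bisection on N×K vectors, without ever forming an N×N matrix, is shown below. It is deliberately simplified: seeds are chosen deterministically, leaf clusters of two are joined directly instead of going through a `pair_dist_fn` callback, and the function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two K-dimensional vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def bisect_tree(ids, vecs, n_iter=10):
    """Return a nested-tuple tree over `ids` built by recursive 2-means."""
    if not ids:
        return None
    if len(ids) == 1:                   # base case: a single sample is a leaf
        return ids[0]
    if len(ids) == 2:
        return (ids[0], ids[1])
    # Seed with the first point and the point farthest from it.
    c0 = vecs[ids[0]]
    c1 = vecs[max(ids, key=lambda i: sq_dist(vecs[i], c0))]
    left, right = [], []
    for _ in range(n_iter):
        left = [i for i in ids if sq_dist(vecs[i], c0) <= sq_dist(vecs[i], c1)]
        in_left = set(left)
        right = [i for i in ids if i not in in_left]
        if not left or not right:
            break
        c0 = mean([vecs[i] for i in left])
        c1 = mean([vecs[i] for i in right])
    if not left or not right:           # degenerate split: fall back to halving
        half = len(ids) // 2
        left, right = ids[:half], ids[half:]
    return (bisect_tree(left, vecs, n_iter), bisect_tree(right, vecs, n_iter))
```

Each level touches every sample a constant number of times against two centroids of length K, which is where the N·K factor in the stated complexity comes from.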

Quality: fast and default output is byte-identical to the previous
version. On BAliBASE (218 cases, XML core block scoring), F1 drops by
0.001 for recall and by 0.004 for accurate. Output is deterministic
across thread counts.

Speedup at 8 threads (DSSim 1000 sequences):
  fast 2.1x, default 2.4x, recall 1.3x, accurate 1.4x

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Expose per-column ensemble confidence scores to users. Low-confidence
columns can be masked (lowercase residues) or removed (replaced with
gaps), useful for phylogenetics and structure prediction pipelines
where uncertain alignment regions should be excluded.
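The two masking styles can be illustrated on plain strings. This is a sketch of the behaviour described above, not the implementation of `kalign_mask_by_confidence` or `kalign.mask_alignment`; the function name, argument order, and the "below threshold" comparison are assumptions.

```python
def mask_by_confidence(rows, confidence, threshold, style="lowercase"):
    """Mask alignment columns whose confidence is below `threshold`.

    rows:       aligned sequences, all of the same length
    confidence: one score in [0, 1] per alignment column
    style:      "lowercase" keeps residues but lowercases them,
                "remove" replaces them with gap characters
    """
    if style not in ("lowercase", "remove"):
        raise ValueError("style must be 'lowercase' or 'remove'")
    if any(len(row) != len(confidence) for row in rows):
        raise ValueError("every row needs one character per column")
    masked = []
    for row in rows:
        chars = []
        for ch, conf in zip(row, confidence):
            if conf >= threshold:
                chars.append(ch)            # confident column: unchanged
            elif style == "lowercase":
                chars.append(ch.lower())    # '-' is unaffected by lower()
            else:
                chars.append("-")
        masked.append("".join(chars))
    return masked
```

For example, with rows `["ACGT", "A-GT"]`, confidences `[0.9, 0.2, 0.95, 0.4]` and a threshold of 0.5, the lowercase style yields `["AcGt", "A-Gt"]` and the remove style yields `["A-G-", "A-G-"]`.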

C library:
- kalign_mask_by_confidence(msa, threshold, style) in msa_op.c
- kalign_write_confidence(msa, path) for raw score output
- Styles: KALIGN_MASK_LOWERCASE (default), KALIGN_MASK_REMOVE

CLI:
- --confidence-threshold FLOAT (0-1, requires ensemble mode)
- --confidence-style (lowercase/remove)
- --confidence-output FILE

Python API:
- kalign.mask_alignment(result, threshold, style) → AlignedSequences
- kalign.filter_alignment(result, threshold) → AlignedSequences
- kalign.write_confidence(path, result)

Gracefully warns and skips when confidence is unavailable (non-ensemble
modes). 11 Python tests covering all paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New feature: align new sequences against an existing alignment without
re-aligning the existing sequences. Builds a consensus profile from
the existing alignment, aligns each new sequence via seq-to-profile
Hirschberg DP, and inserts gaps to match the column structure.

Strict mode (default): no new columns are introduced. Insertions in
new sequences relative to the existing alignment are dropped.
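Strict mode can be pictured as projecting the new sequence onto the existing column structure. The sketch below assumes the seq-to-profile alignment has already produced, for each residue, either the column it aligned to or `None` for an insertion; that input format and the name `project_strict` are hypothetical.

```python
def project_strict(seq, columns, aln_len):
    """Place `seq` into an existing alignment of `aln_len` columns.

    columns[i] is the alignment column of residue i, or None if the
    residue is an insertion relative to the existing alignment.
    Insertions are dropped, so no new columns are ever created.
    """
    if len(columns) != len(seq):
        raise ValueError("need one column entry per residue")
    row = ["-"] * aln_len               # unmatched columns stay as gaps
    last = -1
    for ch, col in zip(seq, columns):
        if col is None:
            continue                    # strict mode: drop the insertion
        if not last < col < aln_len:
            raise ValueError("columns must be increasing and in range")
        row[col] = ch
        last = col
    return "".join(row)
```

Projecting `"ACGTA"` with columns `[0, 1, None, 3, 4]` into six columns gives `"AC-TA-"`: the inserted `G` is dropped and the existing alignment keeps its length.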

C library:
- kalign_add_sequences(existing, new_seqs, n_threads) in aln_add.c
- kalign_read_sequences() allows single-sequence input (for --add)
- Consensus profile built from column frequencies + substitution scores

CLI:
- kalign --add new.fa --existing aligned.fa -o combined.fa
- Both --add and --existing required together
- Existing sequences preserved byte-identical in output

Python API:
- kalign.add_to_alignment(existing, new_seqs, output, format, n_threads)
- _core.add_to_alignment_file() pybind11 binding

Tests: 6 Python tests (basic add, existing unchanged, residue preservation,
alignment length, file-not-found, larger dataset with 44 sequences).
15/15 C tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mark runner

- Add missing #include <mm_malloc.h> with HAVE_AVX2 guards in
  aln_apair_dist.c and aln_wrap.c (fixes undefined symbol on GCC 10)
- Update build.zig to zig 0.15 API (addLibrary/createModule), add aln_add.c
- Rewrite benchmark runner: replace --refine with --mode (fast/default/recall/accurate),
  simplify work dispatch, add per-category SP/Prec/F1/TC summary tables
- Containerfile: clone from git instead of COPY, no external binary dependencies
- Containerfile.downstream: same git-based approach

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, defensive buffer init

align_from_file_mode in _core.cpp wrote alignments to a fixed path
/tmp/kalign_output.fa and read them back. Parallel benchmark subprocesses
contending for this file produced corrupted output (wrong sequence counts,
wrong names, embedded null bytes). Replaced with direct reads from the
msa struct.

finalise_alignment now asserts that every sequence's len + sum(gaps)
matches the aln_len computed from sequence 0, converting silent
gap-invariant violations into clear errors. Also memsets the linear
buffer to '-' so any unwritten positions stay as gap characters rather
than uninitialized memory on glibc.
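The gap invariant and the defensive pre-fill can be sketched together. The layout assumed here, a gaps array with one entry per residue plus one trailing entry, is inferred from the `gaps[src->len]` fix mentioned in this commit and is not confirmed by it; `make_linear` is a hypothetical name.

```python
def make_linear(seq, gaps, aln_len):
    """Expand (seq, gaps) into a gapped row of exactly `aln_len` characters.

    Assumed layout: gaps[i] gap characters precede residue i, and
    gaps[len(seq)] gap characters follow the last residue.
    """
    if len(gaps) != len(seq) + 1:
        raise ValueError("gaps must have len(seq) + 1 entries")
    if len(seq) + sum(gaps) != aln_len:
        # The invariant: residues plus gaps must fill the alignment exactly.
        raise ValueError("len + sum(gaps) does not match aln_len")
    row = ["-"] * aln_len               # pre-fill: unwritten cells are gaps
    pos = 0
    for i, ch in enumerate(seq):
        pos += gaps[i]
        row[pos] = ch
        pos += 1
    return "".join(row)
```

`make_linear("ACG", [1, 0, 2, 1], 7)` returns `"-AC--G-"`, while a gaps array that sums to the wrong total raises instead of silently producing a short or overlong row.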

Same defensive init applied to kalign_msa_to_arr and the strict-mode
gapped buffer in aln_add, where partial population is plausible.

msa_seq_cpy fix: copy gaps[src->len] instead of gaps[src->alloc_len].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refreshes uv.lock via `uv lock --upgrade`. All flagged transitive deps
(cryptography, flask, werkzeug, pillow, pygments, pytest, urllib3,
requests, mako, black) now resolve to versions past their CVE fixes.
pip-audit reports no known vulnerabilities. Test suite unaffected.

The threadpool is now the default parallelism backend (USE_OPENMP=OFF,
USE_THREADPOOL=ON), so the OpenMP env var has no effect. Removed from
both CIBW_ENVIRONMENT in wheels.yml and the [tool.cibuildwheel.environment]
block in pyproject.toml.

The C library, CLI, and Python all expose four NSGA-III-derived presets:
fast, default, recall, accurate. The 'precise' name was a deprecated
alias for 'accurate' in the Python layer only; drop it now that 'extra'
is heading to main.

- python-kalign/__init__.py: remove MODE_PRECISE constant, the
  precise->accurate aliasing branch with DeprecationWarning, and the
  __all__ entry. Update ValueError text to list all four modes.
- lib/include/kalign/kalign.h: add 'recall' to the kalign_get_mode_preset
  docstring (was listing only fast/default/accurate).
- README-python.md: rewrite Modes table to show all four modes with the
  correct CLI syntax (--mode <name>, not --fast/--precise which never
  existed); update Quick-Start and Ensemble examples accordingly.
- tests: replace test_precise_mode cases with recall/accurate equivalents;
  drop MODE_PRECISE from constants assertion and expected exports.

Prepares the repo for public release by dropping material that belongs
to the manuscript pipeline rather than the kalign library itself.

Optimizers (moved to ~/Work/Documents/Manuscripts/2026_kalign_35/
scripts/optimizers before deletion; preserved there for reproducibility
of the NSGA-III preset numbers shipped in 3.5):
  - benchmarks/optimize_params.py
  - benchmarks/optimize_unified.py
  - benchmarks/optimize_ensemble.py
  - benchmarks/optimize_parallel.py
  - benchmarks/PRD_unified_optimizer.md
  - benchmarks/PRD_ensemble_optimizer.md

Analysis / visualisation scripts (paper repo has its own equivalents):
  - benchmarks/analysis.py
  - benchmarks/app.py
  - benchmarks/view_pareto.py
  - benchmarks/mumsa_plots.py, mumsa_precision.py
  - benchmarks/combined_improvements.py
  - benchmarks/full_comparison.py
  - benchmarks/external_balibase.py
  - benchmarks/make_summary_figure.py
  - benchmarks/eval_checkpoint_configs.py
  - benchmarks/vsm_ensemble_experiment.py
  - benchmarks/bench_quality_timing.py
  - benchmarks/run_balibase_comparison.py

Other paper-side material:
  - benchmarks/PRD_kalign_align_full.md  (superseded by docs/PRD-parameter-cleanup.md)
  - docs/PRD-benchmark-repo-update.md    (hard-coded paper-repo paths)
  - Containerfile.downstream             (explicitly labelled paper container)

Kept:
  - benchmarks/runner.py, datasets.py, scoring.py (invoked by CI workflow benchmark.yml)
  - benchmarks/downstream/* (covered by tests/python/test_downstream_integration.py)
  - Containerfile, Containerfile.memcheck
  - docs/PRD-{msa-consistency,confidence-masking-and-add-sequences}.md
  - PRD_sparse_consistency.md

Verified: benchmark package still imports; pytest tests/python/ passes
(170 passed, 1 pre-existing test_module_exports failure unrelated to
this cleanup).

Removes six dead items identified with high confidence during the pre-release audit.
No callers anywhere in the source tree; not compiled, not exposed
via the public API or bindings.

- lib/src/coretralign.{c,h}: pthread-based scheduler superseded by
  lib/src/threadpool/; uncompiled (commented out in lib/CMakeLists.txt).
- lib/src/mod_tldevel.h: 10-line wrapper header with zero includes.
- lib/src/bpm.c bitShiftRight256ymm(): AVX2 helper, never called.
  Carried a stale "FIXME: not sure if this is correct!!!" comment.
- lib/src/bisectingKmeans.c split(): replaced by split2() (parallel
  variant); never called.
- python-kalign/io.py: unused 'import os'.
- python-kalign/utils.py: unused 'Tuple' import.

Verified: ctest 15/15 pass; pytest tests/python/ 170 pass (1 pre-existing
test_module_exports failure unchanged). ~547 lines removed in total.

- PRD_sparse_consistency.md, docs/PRD-msa-consistency.md: replace
  references to `optimize_unified.py` with a note pointing at the
  manuscript repository's scripts/optimizers/ copy. The optimizer
  itself was moved out of kalign in commit dd01498.
- docs/PRD-parameter-cleanup.md: add to the repo. Adds a brief status
  note acknowledging that a fourth mode (`recall`) was added during
  implementation; the rest of the architecture description is current.

Build now compiles warning-free.

Warning fixes:
- lib/src/euclidean_dist.c: drop unused local 'd2' in the UTEST_EDIST
  main() block.
- lib/src/msa_io.c: drop unused local 'line_len' (set on every iteration
  of the line-scan loop but never read).
- lib/src/msa_io.c: change 'size_t nread' to 'ssize_t nread' so the
  getline() return-value check against -1 is signed-correct.
- tests/kalign_lib_testCXX.cpp: cast string-literal initialisers to
  char* so the array-of-char* initialisation no longer warns under
  -Wwritable-strings.

CI cleanup:
- .github/workflows/{cmake,python,benchmark}.yml: drop apt 'libomp-dev'
  and brew 'libomp' installs. Threadpool is the default parallelism
  backend (USE_OPENMP=OFF, USE_THREADPOOL=ON) so libomp is no longer
  needed for these jobs. The explanatory comment in wheels.yml remains.

The four PRDs and the parameter-cleanup integration guide were internal
planning documents describing work that's now complete. The current
state of the API is documented in the public C header, the READMEs, the
CLI --help, the Python docstrings, and the ChangeLog. The design
rationale, where it's still relevant, lives in git history.

Removed:
- PRD_sparse_consistency.md
- docs/PRD-msa-consistency.md
- docs/PRD-confidence-masking-and-add-sequences.md
- docs/PRD-parameter-cleanup.md
- docs/parameter-cleanup-integration.md

The docs/ directory is now empty and dropped from the tree; verified no
README, code, or CI workflow references any of these files.

Two related fixes in the binary POAR loader (lib/src/poar.c). Both
guard against malformed POAR files supplied via --load-poar.

1. Pair-count overflow on numseq >= 65536.
   The expression `numseq * (numseq - 1) / 2` was evaluated in uint32_t
   and then cast to int, producing wrap to a negative value for
   numseq >= 65536 and outright uint32 overflow for numseq >= 65537.
   Now computed in uint64_t and rejected if it exceeds INT_MAX.

2. Per-pair n_entries unbounded by file.
   The 32-bit per-pair entry count was cast to int (could wrap
   negative) and multiplied by sizeof(struct poar_entry) without
   overflow check. Now capped at INT_MAX / sizeof(struct poar_entry).
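Both overflow cases can be reproduced by simulating 32-bit arithmetic. The sketch assumes the uint32 product was reinterpreted as a signed int before halving; the commit says only that the expression was evaluated in uint32_t and then cast to int. The helper names and the 8-byte entry size used in the usage note are hypothetical.

```python
INT_MAX = 2**31 - 1
U32 = 2**32

def as_int32(u):
    """Reinterpret a uint32 value as a two's-complement int32."""
    return u - U32 if u >= 2**31 else u

def product_u32(numseq):
    """numseq * (numseq - 1) as a uint32_t multiplication would yield it."""
    return (numseq * (numseq - 1)) % U32

def checked_pair_count(numseq):
    """Pair count in wide arithmetic, rejected if it exceeds INT_MAX."""
    pairs = numseq * (numseq - 1) // 2  # Python ints do not overflow
    if pairs > INT_MAX:
        raise ValueError("pair count does not fit in a 32-bit int")
    return pairs

def checked_alloc_bytes(n_entries, entry_size):
    """Byte size of a per-pair entry array, capped so it cannot overflow."""
    if n_entries < 0 or n_entries > INT_MAX // entry_size:
        raise ValueError("n_entries exceeds INT_MAX / entry_size")
    return n_entries * entry_size
```

At numseq = 65536 the 32-bit product is 4294901760, which reads as -65536 once treated as signed; at numseq = 65537 the product itself wraps to 65536. With an assumed 8-byte entry, `checked_alloc_bytes` accepts at most 268,435,455 entries.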

Verified: ctest 15/15 pass; pytest tests/python/ 170 pass (one
pre-existing test_module_exports failure unchanged).

The hard-coded expected_exports set was stale relative to __all__ in
python-kalign/__init__.py — four symbols added in earlier commits
weren't reflected here:
  - add_to_alignment    (from 53abded, --add mode)
  - filter_alignment    (from 1002f0a, confidence masking)
  - mask_alignment      (from 1002f0a)
  - write_confidence    (from 1002f0a)

Full pytest tests/python/ now passes clean: 171 pass, 0 fail.

The earlier warning-cleanup commit (fb0d5c4) removed `float d2;` from
euclidean_dist.c's UTEST_EDIST main(), trusting an Apple Clang
-Wunused-variable warning. The warning was correct *for the Apple
Silicon build*, where edist_utest is compiled with -DNOHAVE_AVX2 and
the `#ifdef HAVE_AVX2` block containing the only use of d2 is
preprocessed out. On Linux GCC builds with -DHAVE_AVX2, the block IS
compiled and `edist_256(..., &d2)` references the now-missing variable
— breaking cmake.yml, benchmark.yml, codeql, and wheels builds.

Restore the declaration but guard it with the same `#ifdef HAVE_AVX2`
as its only consumer, so neither codepath warns.

The Benchmark workflow has been silently broken for months because
the BAliBASE download endpoint (http://www.lbgi.fr/balibase/...) is
gone, and the Wayback Machine archive returns HTML instead of the
tarball. Both failure modes are caught and surfaced by datasets.py,
but the result is that the workflow has been failing on every push
and the github-action-benchmark gh-pages history is stale.

Serious benchmark tracking lives in the manuscript repository
(~/Work/Documents/Manuscripts/2026_kalign_35/) which runs the full
BAliBASE / BaliFam100 / MDSA suite via Snakemake. Removing the broken
CI job is cleaner than carrying it indefinitely.

If CI-side perf-regression detection is wanted later, the small
3-case BAliBASE subset in tests/data/ (BB11001, BB12006, BB30014)
is the right starting point for a smoke benchmark.

The DSSIM stress test was failing under ASAN on Linux. Root cause:
the test built a kalign_run_config via kalign_run_config_defaults()
(which returns a protein-oriented config with matrix = PFASUM43) and
fed DNA sequences to it. The library correctly refused with
"Detected DNA sequences but a protein matrix was selected" and
returned FAIL. The test ignored the FAIL return and proceeded into
kalign_msa_compare with two un-finalized MSAs, where in turn
sort_msa_for_comparison ran with alnlen falling back to seq[0]->len
and read past the original (tight-packed) seq->seq buffer that
dssim_get_fasta had allocated, triggering the heap-buffer-overflow.

Production (CLI + Python bindings) is unaffected because both go
through kalign_get_mode_preset(...), which picks a biotype-appropriate
matrix automatically. This is a test-side bug that exposed two latent
issues in the library's error handling.

Fix #1 (root cause): tests/dssim_test.c sets cfg.matrix =
KALIGN_MATRIX_AUTO so the library resolves the matrix per biotype,
and wraps kalign_align_full / kalign_msa_compare in RUN() so any
future alignment failure aborts the test instead of being silently
ignored.

Fix #2 (defence-in-depth): lib/src/msa_cmp.c — wrap the
finalise_alignment(r/t) calls inside kalign_msa_compare,
kalign_msa_compare_detailed, and kalign_msa_compare_with_mask in
RUN() so a finalise failure propagates as a clean error instead of
silently passing a half-finalized MSA into the comparator.

Fix #3 (latent library bug): lib/src/msa_op.c — make
finalise_alignment atomic w.r.t. msa->sequences. Previously the
per-sequence loop replaced seq->seq pointers one at a time; if
make_linear_sequence failed on seq[k] the loop bailed leaving
seq[0..k-1] swapped to the new buffer and seq[k..numseq-1] still
pointing at the original (smaller) buffer — a structurally
inconsistent state that any subsequent code reading the MSA would
trip over. Now two-pass: build and validate every linear_seq first,
then swap pointers only after the whole batch is verified. On error
the MSA's buffers are untouched.
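The two-pass pattern from Fix #3 can be sketched as "stage everything, then commit". This is a generic illustration on a list of strings, not the code in msa_op.c; `finalise_all` and the `pad` builder are hypothetical.

```python
def pad(row, aln_len):
    """Example builder: right-pad with gaps, fail if the row is too long."""
    if len(row) > aln_len:
        raise ValueError("row longer than alignment")
    return row + "-" * (aln_len - len(row))

def finalise_all(msa, aln_len, build=pad):
    """Replace every row of `msa` with its linear form, or change nothing."""
    staged = []
    # Pass 1: build and validate every new row without touching `msa`.
    for row in msa:
        linear = build(row, aln_len)    # may raise; msa is still intact
        if len(linear) != aln_len:
            raise ValueError("built row has the wrong length")
        staged.append(linear)
    # Pass 2: commit only after the whole batch has been verified.
    msa[:] = staged
```

If any row fails in the first pass, the exception propagates before the second pass runs, so callers never observe a mix of old and new buffers.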

Fix #4 (API ergonomics): lib/include/kalign/kalign.h — header
comment on kalign_run_config_defaults explaining the protein-by-default
behaviour and pointing callers at KALIGN_MATRIX_AUTO for biotype
auto-selection.

Verified: 15/15 ctest pass on macOS clang and on Linux GCC under ASAN
inside the memcheck container; 171/171 pytest tests/python/ pass.

Packages the local pre-push checks into one command so the
working tree can be self-verified before pushing.

Phases (~5 min total):
  1. zig build      Cross-compile sanity across aarch64-macos,
                    aarch64-linux, x86_64-linux-{gnu,musl}.
                    Catches GCC-vs-Clang divergence.
  2. cmake + ctest  Native macOS Release build + 15 ctests.
  3. podman ASAN    Ubuntu container with kalign built under ASAN
                    and the full ctest suite. Catches Linux glibc
                    behaviour that Apple's malloc hides.
  4. pytest         Python bindings + ecosystem integration (~171
                    tests).

Each phase is independent — no short-circuit, so a single failing
phase doesn't prevent the others from running and reporting.
Skips gracefully if a tool isn't installed (e.g. podman) or its
environment isn't ready (machine not running).
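The phase policy of this script (run every phase, skip what cannot run, fail only on real failures) can be modelled in a few lines. The sketch below is an illustration in Python, not a transcription of tests/check-local.sh; `run_phases` and the tuple format are hypothetical.

```python
import shutil

def run_phases(phases):
    """Run every phase and return (exit_code, results).

    Each phase is (name, required_tool, action). `action()` returns True
    on success. A phase whose tool is not on PATH is skipped, not failed,
    and a failing phase never stops later phases from running.
    """
    results = {}
    for name, tool, action in phases:
        if tool is not None and shutil.which(tool) is None:
            results[name] = "skipped"
            continue
        try:
            ok = bool(action())
        except Exception:
            ok = False                  # a crashing phase counts as failed
        results[name] = "passed" if ok else "failed"
    # Exit 0 only if every phase that actually ran has passed.
    exit_code = 1 if "failed" in results.values() else 0
    return exit_code, results
```

Collecting results in a dictionary instead of returning on the first failure is what lets one run report every broken phase at once.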

Usage:
  tests/check-local.sh           # run all four (~5 min)
  tests/check-local.sh --quick   # skip Linux ASAN (~30s)
  tests/check-local.sh --help

Exits 0 only if every non-skipped phase passes.

Brings the file into compliance with the version of black used by
the python.yml CI lint job. Purely mechanical reformatting (line
wrapping of multi-argument calls); no behaviour change.

Verified: pytest tests/python/ still passes 171/171.

Bumps the version to 3.5.2 in all four locations so the build artefacts agree:
  pyproject.toml (PyPI metadata)
  CMakeLists.txt (KALIGN_LIBRARY_VERSION_PATCH)
  build.zig (zig cross-compile package version)
  ChangeLog (release notes summary)

Release theme: this is the first version that ships the four-mode
preset system as the stable public interface (fast / default / recall
/ accurate), with the Chase-Lev threadpool replacing OpenMP as the
default parallelism backend and macOS wheels no longer linking
libomp.dylib — closing the conda-forge / numpy OpenMP runtime
conflict reported on the issue tracker.

Verified: kalign --version reports 3.5.2; kalign.__version__ reports
3.5.2; pre-push checklist (tests/check-local.sh --quick) green
prior to this commit.
@TimoLassmann TimoLassmann merged commit 5bd7602 into main May 16, 2026
35 checks passed