Multi pass decoding #256
Open
oscarhiggott wants to merge 4 commits into
Reorganise the Python package layout:
- Rename pybind module from tesseract_decoder to _core
- Move _tesseract_py_util to tesseract_decoder.utils with relative imports
- Add tesseract_decoder/__init__.py with top-level re-exports
- Add sinter_decoders.py with MultiPassSinterDecoder wrapper
- Add setup.py for pip-installable builds via Bazel
- Update stub_test.py for new API surface
- Update CMakeLists.txt and BUILD for new module name
Prepare TesseractDecoder for multi-pass decoding support:
- Add update_internal_costs() for incremental resynchronisation of internal cost structures (error_costs, d2e sort order) after external modification of error likelihoods
- Add early return in decode_to_errors for empty syndromes
- Add TesseractDebugger friend class for test access to internals
- Reserve error_costs capacity before initial fill
- Fix int/size_t mismatch in flip_detectors_and_block_errors
- Update and simplify tesseract tests
Add foundational libraries for multi-pass decoding:
- bern_utils: Bernoulli probability utilities (log-likelihood conversion, probability clamping)
- tanner_graph: Union-Find-based connected component analysis of the detector-error Tanner graph
- error_correlations: Correlation extraction pipeline computing marginal, joint, and conditional error probabilities from first-pass decoding results
- dem_decomposition: DEM decomposition by detector class, error splitting across components, observable assignment, and DEM merging for multi-component decoding
Add the multi-pass Tesseract decoder, which decomposes a detector error model into independent components by detector class and decodes each component separately across multiple passes. Between passes, first-pass decoding correlations are used to reweight error probabilities in subsequent components, improving accuracy.

Key components:
- MultiPassTesseractDecoder: core decoder with static and causal scheduling across detector classes
- FastTwoPassTesseractDecoder: optimised two-pass specialisation
- multi_pass_sinter_compat.pybind.h: pybind11 bindings exposing MultiPassSinterDecoder and MultiPassSinterCompiledDecoder
- Python integration tests for multi-pass bindings
- Theory and architecture documentation

Performance: 10-100x wall-clock speedup over single-pass Tesseract by decomposing the DEM into smaller independent components.
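The `tanner_graph` library is described as a Union-Find-based connected component analysis of the detector-error Tanner graph. As a minimal illustration of that technique (a Python sketch, not the library's C++ API), each error is given as a list of detector indices, and all detectors flipped by a common error are merged into one component. The helper name `detector_components` is hypothetical.

```python
from typing import Dict, List, Sequence


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            # Path halving: point x at its grandparent while walking up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def detector_components(num_detectors: int, errors: Sequence[Sequence[int]]) -> List[List[int]]:
    """Hypothetical helper: group detectors connected through shared errors.

    In the Tanner graph an error node links every detector it flips, so
    unioning all detectors of each error yields the connected components.
    """
    uf = UnionFind(num_detectors)
    for dets in errors:
        for d in dets[1:]:
            uf.union(dets[0], d)
    groups: Dict[int, List[int]] = {}
    for d in range(num_detectors):
        groups.setdefault(uf.find(d), []).append(d)
    return sorted(groups.values())
```

Detectors that no error touches end up as singleton components.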
Multi-pass decoding for Tesseract
Summary
This PR introduces a multi-pass decoding framework for the Tesseract decoder. The key idea is to exploit correlations between different error classes (e.g., X-type and Z-type errors in a CSS code, see https://arxiv.org/abs/1401.6975 and https://arxiv.org/abs/1310.0863) by decomposing the decoding problem into independent components, decoding them in sequence across multiple passes, and using the results of earlier passes to reweight error probabilities in later passes.
This approach is inspired by correlated decoding strategies (e.g., two-pass PyMatching), but is implemented as a general-purpose $N$-pass framework that can be used with decoders other than matching and on circuits that are not necessarily matchable.
Motivation
In CSS codes, some physical errors (e.g., $Y$ errors) can affect detectors associated with different error bases. A single-pass decoder treats all detectors monolithically, which can scale poorly and fails to exploit the structure of correlated noise.
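Multi-pass decoding addresses both issues by decoding smaller per-class components separately and by using each pass's predictions to reweight error probabilities in the next pass.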
Mathematical framework
DEM decomposition
Given a DEM error instruction with probability $p$ that flips detectors from multiple classes (e.g., $D_X \wedge D_Z$), the decomposition replaces it with independent component errors. The observable assignment is resolved by enumerating all consistent assignments across components via symmetric difference constraints:

$$O_1 \oplus O_2 \oplus \cdots \oplus O_K = O,$$

where $O_k$ is the observable assignment for component $k$, $O$ is the set of observables flipped by the original error, and $\oplus$ denotes symmetric difference.
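A minimal Python sketch of the decomposition step, under stated assumptions: an error is a set of detector indices plus a set of observable indices $O$, a hypothetical `classify(detector)` function returns each detector's class, and per-component observable sets are restricted to subsets of $O$. Choosing $O_1, \ldots, O_{K-1}$ freely and setting $O_K$ to the symmetric difference of $O$ with the others enumerates exactly the assignments whose overall symmetric difference equals $O$. How `dem_decomposition` picks among these candidates is not shown here.

```python
from itertools import chain, combinations, product
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple


def split_by_class(detectors: Iterable[int], classify: Callable[[int], int]) -> Dict[int, FrozenSet[int]]:
    """Hypothetical helper: split one error's detectors into per-class components."""
    parts: Dict[int, set] = {}
    for d in detectors:
        parts.setdefault(classify(d), set()).add(d)
    return {c: frozenset(ds) for c, ds in parts.items()}


def subsets(items: FrozenSet[int]) -> List[FrozenSet[int]]:
    """All subsets of a small set, including the empty set."""
    s = sorted(items)
    return [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]


def consistent_assignments(observables: FrozenSet[int], num_components: int) -> List[Tuple[FrozenSet[int], ...]]:
    """All (O_1, ..., O_K) with each O_k a subset of O and O_1 ^ ... ^ O_K == O."""
    if num_components < 1:
        raise ValueError("need at least one component")
    results = []
    for head in product(subsets(observables), repeat=num_components - 1):
        last = frozenset(observables)
        for o in head:
            last = last ^ o  # symmetric difference
        results.append(tuple(head) + (last,))
    return results
```

The number of candidates grows as $2^{|O|(K-1)}$, which stays small because a single DEM error typically flips only a few observables.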
Correlation extraction
Before decomposition, the raw DEM is analyzed to extract cross-component correlations. For each pair of hyperedges $(h_A, h_B)$ that co-occur in a decomposed error instruction with probability $p$, the extraction pipeline accumulates marginal, joint, and conditional error probabilities.
These conditional probabilities form the reweighting rules: if error $A$ in component $C_A$ is predicted in pass $k$, the likelihood cost of error $B$ in component $C_B$ is updated using $P(B \mid A)$ for pass $k+1$.
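The exact extraction formulas are not reproduced here, so what follows is only a first-order sketch under an assumed model: each decomposed instruction is an independent Bernoulli event with probability $p$ that fires a tuple of hyperedges. Marginals and pairwise joints combine independent contributions by parity (two sources with probabilities $a$ and $b$ give $a(1-b) + b(1-a)$), and $P(B \mid A)$ is the joint divided by the marginal of $A$. The function name and input layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Hashable, Sequence, Tuple

Hyperedge = Hashable


def xor_combine(a: float, b: float) -> float:
    """Probability that exactly one of two independent events (probs a, b) occurs."""
    return a * (1 - b) + b * (1 - a)


def extract_correlations(
    instructions: Sequence[Tuple[Sequence[Hyperedge], float]],
) -> Dict[Tuple[Hyperedge, Hyperedge], float]:
    """Sketch: P(B | A) for every ordered pair of co-occurring hyperedges.

    Assumes the hyperedges within one instruction are distinct.
    """
    marginal: Dict[Hyperedge, float] = defaultdict(float)
    joint: Dict[Tuple[Hyperedge, Hyperedge], float] = defaultdict(float)
    for hyperedges, p in instructions:
        for h in hyperedges:
            # Parity-combine every instruction that flips this hyperedge.
            marginal[h] = xor_combine(marginal[h], p)
        for a, b in combinations(hyperedges, 2):
            joint[(a, b)] = xor_combine(joint[(a, b)], p)
            joint[(b, a)] = xor_combine(joint[(b, a)], p)
    return {
        (a, b): pab / marginal[a]
        for (a, b), pab in joint.items()
        if marginal[a] > 0
    }
```

Before being turned into a cost, each $P(B \mid A)$ would still be capped at $0.499$ as described under reweighting.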
Pass scheduling
Two scheduling strategies are implemented:
Static scheduling: All components are decoded in every pass. Simple but potentially redundant.
Causal scheduling: Pass sets $S_1, S_2, \ldots, S_N$ are determined by back-propagation from the final pass.
This ensures that only components whose predictions can influence the final logical outcome are decoded, reducing unnecessary computation.
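One plausible reading of the back-propagation, sketched in Python under explicit assumptions: a hypothetical map `influences[x]` lists the components whose costs are reweighted by predictions from component `x`, and the final pass set $S_N$ is given. Working backwards, $S_{k-1}$ keeps only components that reweight at least one member of $S_k$. The exact rule in `MultiPassTesseractDecoder` may differ (for example, it might also retain components across passes).

```python
from typing import Dict, List, Set


def causal_schedule(
    influences: Dict[int, Set[int]], final_pass: Set[int], num_passes: int
) -> List[Set[int]]:
    """Return [S_1, ..., S_N] by walking backwards from S_N (illustrative only)."""
    if num_passes < 1:
        raise ValueError("num_passes must be >= 1")
    schedule: List[Set[int]] = [set(final_pass)]
    for _ in range(num_passes - 1):
        later = schedule[0]
        # A component is worth decoding in pass k-1 only if its predictions
        # reweight at least one component decoded in pass k.
        earlier = {x for x, targets in influences.items() if targets & later}
        schedule.insert(0, earlier)
    return schedule
```

With two CSS components that reweight each other (`influences = {0: {1}, 1: {0}}`) and only component 0 in the final pass, the first of two passes decodes only component 1.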
Reweighting and cost update
When an error with internal index $i$ is predicted in component $C_A$ during a non-final pass, all associated reweighting rules are applied. For each rule targeting error $B$ in component $C_B$:

$$p_B \leftarrow \min\bigl(P(B \mid A),\ 0.499\bigr),$$

where the conditional probability $P(B \mid A)$ is capped at $0.499$ to prevent divergent (negative) costs. This is implemented via `Error::set_with_probability`, which converts the probability to the log-likelihood-ratio cost used internally by Tesseract:

$$c_B = \ln\frac{1 - p_B}{p_B}.$$

In-place internal cost resynchronization
An important design goal is that reweighting does not require reloading or reconstructing the DEM. The Tesseract decoder pre-computes several internal data structures at construction time from the DEM, and the reweighting step modifies error costs in place on the already-constructed decoder, then incrementally resynchronizes only the affected structures. This is what makes multi-pass decoding efficient — each component decoder is constructed once and reused across all shots and passes.
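To make the in-place update concrete, here is a simplified Python model, not Tesseract's C++ code. It mirrors the structure names used in this description (`likelihood_cost`, `min_cost = likelihood_cost / |detectors(i)|`, per-detector `d2e` lists sorted by `min_cost`, and `original_costs`); the class `ToyCosts` and its `reweight`/`reset` methods are hypothetical. Probabilities are converted to costs assuming the usual log-likelihood-ratio convention $\ln\frac{1-p}{p}$, after capping at $0.499$ so every reweighted cost stays positive.

```python
import math
from typing import Iterable, List, Sequence


def cost_from_probability(p: float, cap: float = 0.499) -> float:
    """Log-likelihood-ratio cost ln((1-p)/p), with p capped so the cost stays > 0."""
    p = min(p, cap)
    if p <= 0.0:
        raise ValueError("probability must be positive")
    return math.log((1.0 - p) / p)


class ToyCosts:
    """Hypothetical stand-in for the cost-dependent parts of a decoder."""

    def __init__(self, error_detectors: Sequence[Sequence[int]], probs: Sequence[float], num_detectors: int):
        self.error_detectors = [list(ds) for ds in error_detectors]
        self.likelihood_cost = [cost_from_probability(p) for p in probs]
        self.original_costs = list(self.likelihood_cost)
        self.min_cost = [0.0] * len(probs)
        # d2e[d]: indices of the errors that flip detector d (topology only).
        self.d2e: List[List[int]] = [[] for _ in range(num_detectors)]
        for e, ds in enumerate(self.error_detectors):
            for d in ds:
                self.d2e[d].append(e)
        self.update_internal_costs(range(len(probs)))

    def update_internal_costs(self, modified: Iterable[int]) -> None:
        """Recompute min_cost for modified errors, then re-sort only the touched d2e lists."""
        touched = set()
        for e in modified:
            # max(1, ...) guards against an error that flips no detectors.
            self.min_cost[e] = self.likelihood_cost[e] / max(1, len(self.error_detectors[e]))
            touched.update(self.error_detectors[e])
        for d in touched:
            self.d2e[d].sort(key=lambda e: self.min_cost[e])

    def reweight(self, e: int, conditional_p: float) -> None:
        """Overwrite one error's cost in place; call update_internal_costs afterwards."""
        self.likelihood_cost[e] = cost_from_probability(conditional_p)

    def reset(self, modified: Sequence[int]) -> None:
        """Restore original costs, then resynchronise the same indices."""
        for e in modified:
            self.likelihood_cost[e] = self.original_costs[e]
        self.update_internal_costs(modified)
```

The point of the model is the division of work: a reweighting that touches a few errors re-sorts only the `d2e` lists of the detectors those errors flip, while the topology-only `d2e` membership is built once.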
Tesseract's internal data structures
The Tesseract decoder maintains three key data structures that depend on error costs:
- `errors[i].likelihood_cost`: the log-likelihood-ratio cost of each error.
- `error_costs[i]`: a cached `ErrorCost` struct storing both `likelihood_cost` and `min_cost = likelihood_cost / |detectors(i)|`. The `min_cost` field is a normalized cost-per-detector used for early termination in `get_detcost`.
- `d2e[d]`: for each detector $d$, the list of errors that flip $d$, sorted by `min_cost`. This sorted order is critical to the decoder's performance: in `get_detcost`, the decoder iterates through `d2e[d]` to find the minimum-cost unblocked error touching detector $d$.

Other structures (`eneighbors`, `edets`, the DEM itself) depend only on the graph topology, not on costs, and are therefore unaffected by reweighting.

The `update_internal_costs` method

When a reweighting rule modifies `errors[j].likelihood_cost` on a target component's decoder, the cached `error_costs[j]` and the sort order of every `d2e[d]` list containing $j$ become stale. The new `update_internal_costs` method resynchronizes these incrementally.

The key properties of this approach:

- Error costs are updated in place in the `errors` vector and the cached cost structures.
- Only the `error_costs` entries and the `d2e` lists for detectors actually touched by modified errors are recomputed. For a reweighting step that modifies a few errors, the work is limited to re-sorting each affected `d2e` list, far cheaper than reconstructing the decoder from scratch.
- After each shot, the modified costs are restored from their saved original values (`original_costs`), and `update_internal_costs` is called again with the same modified indices to restore the sort order. This ensures the decoder is in a clean state for the next shot without any reconstruction.

Performance
The primary motivation for multi-pass Tesseract is speed: by decomposing the DEM into smaller independent components, each component decoder operates on a much smaller graph, resulting in a >100× wall-clock speedup compared to running full (single-pass) Tesseract on the monolithic DEM. The goal is to achieve this speedup with only a modest accuracy penalty.
Accuracy validation status
Accuracy validation is not yet complete. For $d = 3$ and $d = 5$ surface codes, two-pass Tesseract matches two-pass PyMatching (the expected baseline for codes where two-pass PyMatching is applicable). However, at $d = 7$ there is a small accuracy regression. This bug will be fixed in a PR stacked on top of this one.
Architecture
New files
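| File | Description |
| --- | --- |
| `src/multi_pass_tesseract_decoder.{h,cc}` | `MultiPassTesseractDecoder` class with static/causal scheduling |
| `src/error_correlations.{h,cc}` | Correlation extraction pipeline computing marginal, joint, and conditional error probabilities |
| `src/dem_decomposition.{h,cc}` | DEM decomposition by detector class, error splitting across components, observable assignment, and DEM merging |
| `src/tanner_graph.{h,cc}` | Union-Find-based connected component analysis of the detector-error Tanner graph |
| `src/bern_utils.{h,cc}` | Bernoulli probability utilities (log-likelihood conversion, probability clamping) |
| `src/multi_pass_sinter_compat.pybind.h` | pybind11 bindings exposing `MultiPassSinterDecoder` and `MultiPassSinterCompiledDecoder` |
| `src/py/tesseract_decoder/sinter_decoders.py` | `sinter.Decoder` wrapper for multi-pass decoding |

Modifications to existing files / Restructuring

| File | Description |
| --- | --- |
| `setup.py` | Added for pip-installable builds via Bazel |
| Python package layout | Moved `src/py/_tesseract_py_util` to `tesseract_decoder.utils`, introducing `__init__.py` for cleaner top-level re-exports and modular importing |
| `src/tesseract.{h,cc}` | Added `update_internal_costs()` for incremental cost resynchronization after reweighting; added early return in `decode_to_errors` for empty syndromes; added `TesseractDebugger` friend class |
| `src/tesseract.pybind.cc` | Renamed the pybind module from `tesseract_decoder` to `_core` and registered multi-pass pybind11 bindings |
| `src/BUILD` | Updated for the new module name |
| `CMakeLists.txt` | Updated for the new module name |

Python API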
The multi-pass decoder is exposed as a `sinter.Decoder` via `MultiPassSinterDecoder`.

The `detector_classifier` callable receives `(detector_index, coordinates, tag_string)` and returns an integer class ID. Detectors with the same class ID are grouped into the same component.

Test coverage
- `src/multi_pass_tesseract_decoder.test.cc`: Tests for two-pass correlation benefit, disjoint decoding, causal schedule construction, surface code partitioning, causal scheduling with surface codes, and perfect reset verification.
- `src/dem_decomposition.test.cc`: Tests for DEM decomposition, symmetric difference, observable assignment, DEM splitting and merging.
- `src/tanner_graph.test.cc`: Tests for Tanner graph connected component analysis.
- `src/error_correlations.test.cc`: Tests for the correlation extraction pipeline.
- `src/py/multi_pass_bindings_test.py`: Python integration tests for `MultiPassSinterDecoder`.
- `src/py/stub_test.py`: Updated stub tests for the new public API surface.
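As an illustration of the `detector_classifier` contract from the Python API description, a classifier for a CSS code might look like the sketch below. Only the `(detector_index, coordinates, tag_string) -> int` shape comes from this PR. The conventions are hypothetical assumptions: detectors are tagged `X` or `Z`, with a fallback that reads the last coordinate as a basis flag. Adapt both to how your circuit labels its detectors.

```python
from typing import Optional, Sequence

X_CLASS = 0
Z_CLASS = 1


def css_detector_classifier(detector_index: int, coordinates: Sequence[float], tag_string: Optional[str]) -> int:
    """Map a detector to a component class ID (hypothetical CSS convention)."""
    tag = (tag_string or "").strip().upper()
    if tag == "X":
        return X_CLASS
    if tag == "Z":
        return Z_CLASS
    # Assumed fallback: the last coordinate is 0 for X-type and 1 for Z-type detectors.
    if coordinates:
        return Z_CLASS if int(coordinates[-1]) == 1 else X_CLASS
    raise ValueError(f"cannot classify detector {detector_index}")
```

Such a function would be handed to `MultiPassSinterDecoder` as its detector classifier; the wrapper's exact constructor signature is not reproduced here.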