
[PyTorch Debug] Support tensor dump #2645

Open
pggPL wants to merge 28 commits into NVIDIA:main from
pggPL:inpsect_tensor_dump_support

Conversation


pggPL (Collaborator) commented Feb 3, 2026

Description

This PR introduces a new debug feature focused on offline analysis of tensors.
The motivation is to make it easier to inspect and analyze intermediate tensors outside of runtime, especially during quantization debugging.

The new `DumpTensors` feature allows saving:

  • high-precision tensors (before quantization),
  • quantized tensors (after quantization).
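As a hedged illustration of the offline-analysis workflow this feature targets, dumps can be read back with `torch.load`. The filename and dict keys below are assumptions based on the layout described later in this PR (`{log_dir}/tensor_dumps/rank_{rank}/`, with optional `high_precision` and `quantized` entries); the stand-in dump is created locally so the snippet is self-contained:

```python
import os
import tempfile

import torch

with tempfile.TemporaryDirectory() as d:
    # Hypothetical filename; real dumps use sanitized layer/tensor names.
    path = os.path.join(d, "decoder_0_activation_iter_5.pt")
    # Stand-in for a real dump: a dict with an optional "high_precision"
    # entry (real dumps may also contain a "quantized" entry).
    torch.save({"high_precision": torch.randn(4, 8)}, path)

    # weights_only=False is needed for real dumps because the "quantized"
    # entry is a QuantizedTensor subclass that round-trips via pickle.
    dump = torch.load(path, weights_only=False)
    print(sorted(dump.keys()))                   # ['high_precision']
    print(tuple(dump["high_precision"].shape))   # (4, 8)
```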

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Added new debug feature: `transformer_engine.debug.features.dump_tensors.DumpTensors`.
  • Added support for dumping high-precision and quantized tensors via `inspect_tensor`.
  • Added/updated tests in `tests/pytorch/debug/test_log.py` for DumpTensors sanity flow.
  • Updated debug documentation/API listing to include `DumpTensors` in `docs/debug/3_api_features.rst`.
  • Fixed robustness issues found in review:
    • logger re-initialization across debug sessions,
    • dump test validation timing (before temp directory cleanup).

Checklist

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

pggPL added 2 commits February 3, 2026 08:54
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL pggPL changed the title from [Debug] Support tensor dump to [PyTorch Debug] Support tensor dump Feb 3, 2026
pre-commit-ci bot and others added 6 commits February 3, 2026 10:45
Signed-off-by: root <pgadzinski@nvidia.com>
Signed-off-by: root <pgadzinski@nvidia.com>
Signed-off-by: root <pgadzinski@nvidia.com>
@pggPL pggPL marked this pull request as ready for review March 5, 2026 10:44

greptile-apps bot commented Mar 5, 2026

Greptile Summary

This PR introduces DumpTensors, a new debug feature that saves intermediate tensors (both high-precision pre-quantization and quantized forms) to .pt files for offline analysis. The feature integrates with the existing nvdlfw_inspect debug framework, follows the established TEConfigAPIMapper + @api_method pattern used by other TE debug features, and organises saved files per-rank under {log_dir}/tensor_dumps/rank_{rank}/.

Key changes:

  • New transformer_engine/debug/features/dump_tensors.py: implements TensorLogger (singleton, rank-aware directory management) and DumpTensors (config-driven inspect_tensor handler). Several bugs caught in earlier review rounds have been resolved: the missing filepath variable, the assert replaced with raise ValueError, .detach() before saving, the empty-dump log message, and the high-precision-skipped log message are all addressed.
  • Tests (tests/pytorch/debug/test_log.py): adds test_dump_tensors_sanity covering directory creation, exact filename format, dict-key presence, QuantizedTensor type round-trip, and tensor shape. Previously flagged issues (redundant local imports, missing filename/type assertions, weights_only=False comment, validation before temp-dir teardown) are all fixed.
  • Docs: DumpTensors added to docs/debug/3_api_features.rst.
  • Minor: import transformer_engine_torch as tex reordering in log_fp8_tensor_stats.py (no functional impact).
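A minimal sketch of the rank-aware singleton pattern the summary describes (class and method names mirror the review's wording, but this is an assumption about the shape of the code, not the actual implementation):

```python
import os
import tempfile


class TensorLogger:
    """Sketch of a process-wide singleton that lazily creates a per-rank
    dump directory of the form {log_dir}/tensor_dumps/rank_{rank}/."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.root_dir = None
        return cls._instance

    def ensure_initialized(self, log_dir, rank=0):
        # Resolve the rank at call time rather than caching it at
        # construction, so a stale rank is never reused.
        if self.root_dir is None:
            self.root_dir = os.path.join(log_dir, "tensor_dumps", f"rank_{rank}")
            os.makedirs(self.root_dir, exist_ok=True)
        return self.root_dir


with tempfile.TemporaryDirectory() as d:
    logger = TensorLogger()
    print(TensorLogger() is logger)                         # True: singleton
    print(logger.ensure_initialized(d).endswith("rank_0"))  # True
```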

Confidence Score: 4/5

  • Safe to merge after addressing the minor type-annotation issue; no logic bugs remain after multiple review rounds.
  • The critical bugs identified in earlier rounds (undefined filepath, missing .detach(), wrong exception type, silent empty-dump, missing log messages) are all fixed. The test covers the happy path end-to-end including type round-trip. One minor outstanding point is the inaccurate Dict[str, torch.Tensor] annotation, which is a style/tooling concern rather than a runtime issue since QuantizedTensor is a torch.Tensor subclass.
  • transformer_engine/debug/features/dump_tensors.py — dump_dict type annotation (line 239) should be Dict[str, Union[torch.Tensor, QuantizedTensor]].

Important Files Changed

Filename Overview
transformer_engine/debug/features/dump_tensors.py New DumpTensors debug feature: saves high-precision and quantized tensors to .pt files per rank. Core bugs from earlier rounds (missing filepath, the AssertionError → ValueError change, missing .detach(), empty-dump logging) are all resolved in the current revision. One minor remaining issue: dump_dict is annotated as Dict[str, torch.Tensor] but can hold QuantizedTensor values.
tests/pytorch/debug/test_log.py Adds test_dump_tensors_sanity: verifies dump directory creation, filename format, dict structure, QuantizedTensor round-trip, and high-precision shape. Previously discussed issues (redundant local imports, missing filename assertion, missing isinstance check, missing weights_only=False comment, validation-outside-tempdir) are all fixed in this revision.
transformer_engine/debug/features/log_fp8_tensor_stats.py Minor import reorder: import transformer_engine_torch as tex moved from before the from nvdlfw_inspect imports to after them, breaking the conventional third-party import grouping. No functional impact.
docs/debug/3_api_features.rst Adds DumpTensors to the API listing. Straightforward one-line addition; no issues.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant debug_api
    participant DumpTensors
    participant TensorLogger

    Caller->>debug_api: inspect_tensor(layer_name, tensor_name, iteration, tensor, rowwise_quantized_tensor, ...)
    debug_api->>DumpTensors: inspect_tensor(config, ...)
    DumpTensors->>DumpTensors: validate rowwise/columnwise identity
    DumpTensors->>DumpTensors: resolve quantized_tensor (rowwise preferred)
    DumpTensors->>TensorLogger: ensure_initialized(root_log_dir)
    TensorLogger-->>DumpTensors: (root_dir ready)
    DumpTensors->>DumpTensors: build dump_dict {high_precision?, quantized?}
    alt dump_dict non-empty
        DumpTensors->>TensorLogger: save_tensor(dump_dict, layer_name, tensor_name, iteration)
        TensorLogger->>TensorLogger: sanitize names, build filepath
        TensorLogger->>disk: torch.save(dump_dict, filepath)
        TensorLogger-->>DumpTensors: done
        DumpTensors->>debug_api: log_message("Dumped ...")
    else dump_dict empty
        DumpTensors->>debug_api: log_message("No tensors available ...")
    end
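The "build dump_dict" and empty/non-empty branching steps in the diagram can be sketched as follows (the function name and argument names are assumptions based on the review summary, not the PR's actual code):

```python
def build_dump_dict(high_precision=None, quantized=None):
    """Include only the tensors that were actually provided; an empty
    result takes the "No tensors available" log branch instead of a
    file write."""
    dump = {}
    if high_precision is not None:
        dump["high_precision"] = high_precision
    if quantized is not None:
        dump["quantized"] = quantized
    return dump


print(build_dump_dict(high_precision="hp"))  # {'high_precision': 'hp'}
print(build_dump_dict())                     # {} -> empty-dump log branch
```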

Last reviewed commit: 677ad51

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
pggPL and others added 4 commits March 5, 2026 10:57
Signed-off-by: root <pgadzinski@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
pggPL and others added 3 commits March 5, 2026 13:13
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
Signed-off-by: root <pgadzinski@nvidia.com>
pggPL and others added 2 commits March 5, 2026 14:03
Signed-off-by: root <pgadzinski@nvidia.com>

pggPL commented Mar 5, 2026

/te-ci pytorch

pggPL and others added 2 commits March 10, 2026 11:51
Drop the dump_quantized_internals config option, the _get_quantized_internals
method, and all helper functions for extracting scales/raw data from
Float8Tensor, Float8BlockwiseQTensor, MXFP8Tensor, and NVFP4Tensor.

Remove corresponding tests: test_dump_tensors_nvfp4_unpacked_codes and
NVFP4_DUMP_TENSORS_CONFIG, and scale/data assertions from test_dump_tensors_sanity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
- Add dot ('.') to _sanitize_name to handle common PyTorch dotted layer
  names like 'encoder.layer.0.attention'
- Add docstring note about pickle dependency for the 'quantized' key
- Add comment explaining weights_only=False in test
- Remove redundant local RecipeState import in test_nvfp4_numeric

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
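The dotted-name handling described in the commit above can be sketched as below; the regex and function name are assumptions for illustration, and the real _sanitize_name may differ:

```python
import re


def sanitize_name(name):
    # Replace every character outside [0-9A-Za-z_] (including '.', which
    # appears in dotted PyTorch module paths like 'encoder.layer.0') with
    # an underscore so the name is safe as a filename component.
    return re.sub(r"[^0-9A-Za-z_]", "_", name)


print(sanitize_name("encoder.layer.0.attention"))  # encoder_layer_0_attention
```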
pggPL and others added 2 commits March 10, 2026 12:08
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Avoids relying on stale self.rank when ensure_initialized is called
before initialize() has set the rank. Consistent with how nvdlfw_inspect
logger resolves rank.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
pggPL and others added 2 commits March 10, 2026 12:38
Detach both high_precision and quantized tensors before saving to avoid
serializing the autograd graph. For QuantizedTensor this is a zero-copy
view (make_like), so no extra GPU allocation.

Add filename format assertion to test_dump_tensors_sanity to catch
regressions in _sanitize_name or the naming convention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
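The rationale for detaching before torch.save can be seen in a small standalone sketch: a tensor produced inside autograd carries grad history, and detach() drops it via a storage-sharing view, so no extra allocation is incurred.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2  # y participates in autograd

print(y.requires_grad)                        # True
print(y.detach().requires_grad)               # False: history dropped
print(y.detach().data_ptr() == y.data_ptr())  # True: shared storage
```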
pggPL and others added 2 commits March 10, 2026 13:33
Log a message when no tensors are available to dump so the user
has an explicit signal that no file was written.

Assert that the quantized key round-trips as a QuantizedTensor
to catch regressions in detach() or serialisation path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>

pggPL commented Mar 10, 2026

/te-ci pytorch
