
[PyTorch Debug] Support tensor dump #2645

Open
pggPL wants to merge 28 commits into NVIDIA:main from
pggPL:inpsect_tensor_dump_support

Conversation


pggPL (Collaborator) commented Feb 3, 2026

Description

This PR introduces a new debug feature focused on offline analysis of tensors.
The motivation is to make it easier to inspect and analyze intermediate tensors outside of runtime, especially during quantization debugging.

The new `DumpTensors` feature allows saving:

  • high-precision tensors (before quantization),
  • quantized tensors (after quantization).
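As a hedged illustration of the offline-analysis workflow this feature targets, dumps can be read back with `torch.load`. The filename and dict keys below are assumptions based on the layout described later in this PR (`{log_dir}/tensor_dumps/rank_{rank}/`, with optional `high_precision` and `quantized` entries); the stand-in dump is created locally so the snippet is self-contained:

```python
import os
import tempfile

import torch

with tempfile.TemporaryDirectory() as d:
    # Hypothetical filename; real dumps use sanitized layer/tensor names.
    path = os.path.join(d, "decoder_0_activation_iter_5.pt")
    # Stand-in for a real dump: a dict with an optional "high_precision"
    # entry (real dumps may also contain a "quantized" entry).
    torch.save({"high_precision": torch.randn(4, 8)}, path)

    # weights_only=False is needed for real dumps because the "quantized"
    # entry is a QuantizedTensor subclass that round-trips via pickle.
    dump = torch.load(path, weights_only=False)
    print(sorted(dump.keys()))                   # ['high_precision']
    print(tuple(dump["high_precision"].shape))   # (4, 8)
```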

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Added new debug feature: `transformer_engine.debug.features.dump_tensors.DumpTensors`.
  • Added support for dumping high-precision and quantized tensors via `inspect_tensor`.
  • Added/updated tests in `tests/pytorch/debug/test_log.py` for DumpTensors sanity flow.
  • Updated debug documentation/API listing to include `DumpTensors` in `docs/debug/3_api_features.rst`.
  • Fixed robustness issues found in review:
    • logger re-initialization across debug sessions,
    • dump test validation timing (before temp directory cleanup).

Checklist

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

pggPL added 2 commits February 3, 2026 08:54
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL pggPL changed the title from [Debug] Support tensor dump to [PyTorch Debug] Support tensor dump Feb 3, 2026
pre-commit-ci bot and others added 6 commits February 3, 2026 10:45
Signed-off-by: root <pgadzinski@nvidia.com>
Signed-off-by: root <pgadzinski@nvidia.com>
Signed-off-by: root <pgadzinski@nvidia.com>
@pggPL pggPL marked this pull request as ready for review March 5, 2026 10:44

greptile-apps bot commented Mar 5, 2026

Greptile Summary

This PR introduces DumpTensors, a new debug feature that saves intermediate tensors (both high-precision pre-quantization and quantized forms) to .pt files for offline analysis. The feature integrates with the existing nvdlfw_inspect debug framework, follows the established TEConfigAPIMapper + @api_method pattern used by other TE debug features, and organises saved files per-rank under {log_dir}/tensor_dumps/rank_{rank}/.

Key changes:

  • New transformer_engine/debug/features/dump_tensors.py: implements TensorLogger (singleton, rank-aware directory management) and DumpTensors (config-driven inspect_tensor handler). Several bugs caught in earlier review rounds have been resolved: the missing filepath variable, the assert replaced with raise ValueError, .detach() before saving, the empty-dump log message, and the high-precision-skipped log message are all addressed.
  • Tests (tests/pytorch/debug/test_log.py): adds test_dump_tensors_sanity covering directory creation, exact filename format, dict-key presence, QuantizedTensor type round-trip, and tensor shape. Previously flagged issues (redundant local imports, missing filename/type assertions, weights_only=False comment, validation before temp-dir teardown) are all fixed.
  • Docs: DumpTensors added to docs/debug/3_api_features.rst.
  • Minor: import transformer_engine_torch as tex reordering in log_fp8_tensor_stats.py (no functional impact).
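A minimal sketch of the rank-aware singleton pattern the summary describes (class and method names mirror the review's wording, but this is an assumption about the shape of the code, not the actual implementation):

```python
import os
import tempfile


class TensorLogger:
    """Sketch of a process-wide singleton that lazily creates a per-rank
    dump directory of the form {log_dir}/tensor_dumps/rank_{rank}/."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.root_dir = None
        return cls._instance

    def ensure_initialized(self, log_dir, rank=0):
        # Resolve the rank at call time rather than caching it at
        # construction, so a stale rank is never reused.
        if self.root_dir is None:
            self.root_dir = os.path.join(log_dir, "tensor_dumps", f"rank_{rank}")
            os.makedirs(self.root_dir, exist_ok=True)
        return self.root_dir


with tempfile.TemporaryDirectory() as d:
    logger = TensorLogger()
    print(TensorLogger() is logger)                         # True: singleton
    print(logger.ensure_initialized(d).endswith("rank_0"))  # True
```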

Confidence Score: 4/5

  • Safe to merge after addressing the minor type-annotation issue; no logic bugs remain after multiple review rounds.
  • The critical bugs identified in earlier rounds (undefined filepath, missing .detach(), wrong exception type, silent empty-dump, missing log messages) are all fixed. The test covers the happy path end-to-end including type round-trip. One minor outstanding point is the inaccurate Dict[str, torch.Tensor] annotation, which is a style/tooling concern rather than a runtime issue since QuantizedTensor is a torch.Tensor subclass.
  • transformer_engine/debug/features/dump_tensors.py — dump_dict type annotation (line 239) should be Dict[str, Union[torch.Tensor, QuantizedTensor]].

Important Files Changed

Filename Overview
transformer_engine/debug/features/dump_tensors.py New DumpTensors debug feature: saves high-precision and quantized tensors to .pt files per rank. Core bugs from earlier rounds (missing filepath, the AssertionError → ValueError change, missing .detach(), empty-dump logging) are all resolved in the current revision. One minor remaining issue: dump_dict is annotated as Dict[str, torch.Tensor] but can hold QuantizedTensor values.
tests/pytorch/debug/test_log.py Adds test_dump_tensors_sanity: verifies dump directory creation, filename format, dict structure, QuantizedTensor round-trip, and high-precision shape. Previously discussed issues (redundant local imports, missing filename assertion, missing isinstance check, missing weights_only=False comment, validation-outside-tempdir) are all fixed in this revision.
transformer_engine/debug/features/log_fp8_tensor_stats.py Minor import reorder: import transformer_engine_torch as tex moved from before the from nvdlfw_inspect imports to after them, breaking the conventional third-party import grouping. No functional impact.
docs/debug/3_api_features.rst Adds DumpTensors to the API listing. Straightforward one-line addition; no issues.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant debug_api
    participant DumpTensors
    participant TensorLogger

    Caller->>debug_api: inspect_tensor(layer_name, tensor_name, iteration, tensor, rowwise_quantized_tensor, ...)
    debug_api->>DumpTensors: inspect_tensor(config, ...)
    DumpTensors->>DumpTensors: validate rowwise/columnwise identity
    DumpTensors->>DumpTensors: resolve quantized_tensor (rowwise preferred)
    DumpTensors->>TensorLogger: ensure_initialized(root_log_dir)
    TensorLogger-->>DumpTensors: (root_dir ready)
    DumpTensors->>DumpTensors: build dump_dict {high_precision?, quantized?}
    alt dump_dict non-empty
        DumpTensors->>TensorLogger: save_tensor(dump_dict, layer_name, tensor_name, iteration)
        TensorLogger->>TensorLogger: sanitize names, build filepath
        TensorLogger->>disk: torch.save(dump_dict, filepath)
        TensorLogger-->>DumpTensors: done
        DumpTensors->>debug_api: log_message("Dumped ...")
    else dump_dict empty
        DumpTensors->>debug_api: log_message("No tensors available ...")
    end
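The "build dump_dict" and empty/non-empty branching steps in the diagram can be sketched as follows (the function name and argument names are assumptions based on the review summary, not the PR's actual code):

```python
def build_dump_dict(high_precision=None, quantized=None):
    """Include only the tensors that were actually provided; an empty
    result takes the "No tensors available" log branch instead of a
    file write."""
    dump = {}
    if high_precision is not None:
        dump["high_precision"] = high_precision
    if quantized is not None:
        dump["quantized"] = quantized
    return dump


print(build_dump_dict(high_precision="hp"))  # {'high_precision': 'hp'}
print(build_dump_dict())                     # {} -> empty-dump log branch
```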

Last reviewed commit: 677ad51

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
pggPL and others added 4 commits March 5, 2026 10:57
Signed-off-by: root <pgadzinski@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
pggPL and others added 3 commits March 5, 2026 13:13
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
Signed-off-by: root <pgadzinski@nvidia.com>
pggPL and others added 2 commits March 5, 2026 14:03
Signed-off-by: root <pgadzinski@nvidia.com>

pggPL commented Mar 5, 2026

/te-ci pytorch

pggPL and others added 2 commits March 10, 2026 11:51
Drop the dump_quantized_internals config option, the _get_quantized_internals
method, and all helper functions for extracting scales/raw data from
Float8Tensor, Float8BlockwiseQTensor, MXFP8Tensor, and NVFP4Tensor.

Remove corresponding tests: test_dump_tensors_nvfp4_unpacked_codes and
NVFP4_DUMP_TENSORS_CONFIG, and scale/data assertions from test_dump_tensors_sanity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
- Add dot ('.') to _sanitize_name to handle common PyTorch dotted layer
  names like 'encoder.layer.0.attention'
- Add docstring note about pickle dependency for the 'quantized' key
- Add comment explaining weights_only=False in test
- Remove redundant local RecipeState import in test_nvfp4_numeric

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
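The dotted-name handling described in the commit above can be sketched as below; the regex and function name are assumptions for illustration, and the real _sanitize_name may differ:

```python
import re


def sanitize_name(name):
    # Replace every character outside [0-9A-Za-z_] (including '.', which
    # appears in dotted PyTorch module paths like 'encoder.layer.0') with
    # an underscore so the name is safe as a filename component.
    return re.sub(r"[^0-9A-Za-z_]", "_", name)


print(sanitize_name("encoder.layer.0.attention"))  # encoder_layer_0_attention
```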
pggPL and others added 2 commits March 10, 2026 12:08
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Avoids relying on stale self.rank when ensure_initialized is called
before initialize() has set the rank. Consistent with how nvdlfw_inspect
logger resolves rank.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
pggPL and others added 2 commits March 10, 2026 12:38
Detach both high_precision and quantized tensors before saving to avoid
serializing the autograd graph. For QuantizedTensor this is a zero-copy
view (make_like), so no extra GPU allocation.

Add filename format assertion to test_dump_tensors_sanity to catch
regressions in _sanitize_name or the naming convention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
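The rationale for detaching before torch.save can be seen in a small standalone sketch: a tensor produced inside autograd carries grad history, and detach() drops it via a storage-sharing view, so no extra allocation is incurred.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2  # y participates in autograd

print(y.requires_grad)                        # True
print(y.detach().requires_grad)               # False: history dropped
print(y.detach().data_ptr() == y.data_ptr())  # True: shared storage
```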
pggPL and others added 2 commits March 10, 2026 13:33
Log a message when no tensors are available to dump so the user
has an explicit signal that no file was written.

Assert that the quantized key round-trips as a QuantizedTensor
to catch regressions in detach() or serialisation path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>

pggPL commented Mar 10, 2026

/te-ci pytorch
