feat: Add Inductor backend config templates #688
Open
JewelRoam wants to merge 1 commit into PaddlePaddle:develop from
Conversation
## Overview

This PR introduces a flexible configuration system for the PyTorch Inductor backend, with 8 predefined config templates, a CUDA Graphs compatibility fix, and comprehensive unit tests (28 tests total).

## Changes

- Inductor backend with 8 config templates (triton, cpp_wrapper, cutlass, aten, cudagraphs, max_autotune, freezing, tma)
- CUDA Graphs output buffer overwrite fix in test_compiler.py
- 28 unit tests in test/inductor_backend_test.py

## Testing

- All config keys verified against PyTorch 2.7.1 source code
- All templates tested with actual model compilation
- Unit tests pass: 28/28 OK
- TMA config gracefully falls back on non-TMA GPUs (A100)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thanks for your contribution!
Overview
This PR introduces a flexible configuration system for the PyTorch Inductor
backend, allowing users to select predefined config templates that set groups of
`torch._inductor.config` overrides. This extends PyTorch's official "mode"
concept while maintaining full compatibility with the existing
`test_compiler.py` framework.

Motivation
Previously, the `InductorBackend` accepted only basic config parameters through
individual `inductor_config` dictionary entries. Users could not easily enable
common combinations of Inductor options.
This PR addresses these limitations by introducing config templates: predefined,
well-tested combinations of `torch._inductor.config` options that users can
select by name.
Changes Summary
1. Inductor Backend Configuration Templates
File: `graph_net_bench/torch/backend/inductor_backend.py`

New Features

- `_INDUCTOR_CONFIG_TEMPLATES` dictionary with 8 predefined templates
- `_TEMPLATE_TO_COMPILE_MODE` mapping for templates that imply compile modes
- `_set_nested_attr()` utility function for setting nested config attributes

Supported Templates
| Template | Compile Mode |
| --- | --- |
| `triton` | default |
| `cpp_wrapper` | default |
| `cutlass` | default |
| `aten` | default |
| `cudagraphs` | reduce-overhead |
| `max_autotune` | max-autotune |
| `freezing` | default |
| `tma` | default |

TMA Graceful Fallback
The TMA template has built-in graceful fallback behavior: on GPUs without TMA
support (such as the A100), it falls back cleanly instead of failing. This
ensures the template works universally while still leveraging TMA benefits when
available.
Enhanced Configuration Interface
Configuration Priority (highest to lowest)
1. `inductor_config` - Explicit user-specified overrides
2. `freezing` - Top-level convenience flag
3. `template` - Predefined template defaults

2. CUDA Graphs Compatibility Fix
File: `graph_net_bench/torch/test_compiler.py`

Problem

When CUDA Graphs is enabled (via `triton.cudagraphs` or
`mode="reduce-overhead"`), compiled model outputs live in static buffers owned
by the captured graph, so a later replay overwrites results that the test
harness still holds references to.

Solution
Clone model outputs immediately after model invocation:
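The fix can be sketched as a small helper (a hypothetical shape; the PR's actual code in `test_compiler.py` is not shown here) that copies each output out of the graph-owned buffer:

```python
def snapshot_outputs(outputs):
    """Copy model outputs out of CUDA Graph static buffers.

    Handles a single tensor or a tuple/list of tensors; each element is
    assumed to provide detach()/clone(), as torch.Tensor does. Without the
    clone, the next graph replay would overwrite the buffers in place.
    """
    if isinstance(outputs, (tuple, list)):
        return type(outputs)(snapshot_outputs(o) for o in outputs)
    return outputs.detach().clone()
```

Usage would look like `outputs = snapshot_outputs(compiled_model(*inputs))`, immediately after model invocation.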
Impact

- `test_compiler.py`: outputs are now cloned before comparison
- `eval_backend_perf.py` and `eval_backend_diff.py`: unaffected (they use
  `torch.save`/`torch.load`, which creates independent copies)

3. Comprehensive Test Suite
File: `test/inductor_backend_test.py` (new file, 323 lines)

Test Structure
Test Coverage

All template config keys are validated against `torch._inductor.config`.

Validation Results
Usage Examples
Basic Template Usage
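A minimal way to produce the base64 value that `--config` appears to expect (assuming the flag takes a base64-encoded JSON dict, as the template table below suggests):

```python
import base64
import json

def encode_config(config: dict) -> str:
    """Base64-encode a JSON config dict for the --config flag (assumed format)."""
    return base64.b64encode(json.dumps(config).encode("utf-8")).decode("ascii")

# Selecting the triton template:
print(encode_config({"template": "triton"}))  # eyJ0ZW1wbGF0ZSI6ICJ0cml0b24ifQ==
```

The output matches the `triton` entry in the template table below.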
Template with Custom Mode
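A hypothetical example of pairing a template with an explicit compile-mode override; the top-level `"mode"` key name is an assumption here (the mode values themselves come from the `torch.compile` API referenced below):

```python
import base64
import json

# Assumed interface: override the template's implied compile mode.
config = {"template": "max_autotune", "mode": "max-autotune-no-cudagraphs"}
encoded = base64.b64encode(json.dumps(config).encode("utf-8")).decode("ascii")
print(encoded)
```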
Combined Configuration
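A sketch of a combined configuration mixing all three priority levels described under Configuration Priority; the `template`, `freezing`, and `inductor_config` keys are named in this PR, but the exact JSON shape is assumed:

```python
import base64
import json

config = {
    "template": "triton",             # predefined template defaults (lowest priority)
    "freezing": True,                 # top-level convenience flag
    "inductor_config": {              # explicit overrides (highest priority)
        "max_autotune": True,
    },
}
encoded = base64.b64encode(json.dumps(config).encode("utf-8")).decode("ascii")
print(encoded)
```

Explicit `inductor_config` entries win over the convenience flag, which in turn wins over template defaults.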
All Available Templates
| Template | `--config` (base64) |
| --- | --- |
| `triton` | `eyJ0ZW1wbGF0ZSI6ICJ0cml0b24ifQ==` |
| `cpp_wrapper` | `eyJ0ZW1wbGF0ZSI6ICJjcHBfd3JhcHBlciJ9` |
| `cutlass` | `eyJ0ZW1wbGF0ZSI6ICJjdXRsYXNzIn0=` |
| `aten` | `eyJ0ZW1wbGF0ZSI6ICJhdGVuIn0=` |
| `cudagraphs` | `eyJ0ZW1wbGF0ZSI6ICJjdWRhZ3JhcGhzIn0=` |
| `max_autotune` | `eyJ0ZW1wbGF0ZSI6ICJtYXhfYXV0b3R1bmUifQ==` |
| `freezing` | `eyJ0ZW1wbGF0ZSI6ICJmcmVlemluZyJ9` |
| `tma` | `eyJ0ZW1wbGF0ZSI6ICJ0bWEifQ==` |

Testing
Manual Testing Results
All 8 templates have been manually tested with actual model compilation:
- `triton`
- `cpp_wrapper`
- `cutlass`
- `aten`
- `cudagraphs`
- `max_autotune`
- `freezing`
- `tma`

Automated Testing
```
$ python -m unittest test.inductor_backend_test
Ran 28 tests in 0.002s

OK
```

Breaking Changes
None. This is a purely additive feature with no changes to existing behavior
when the `template` parameter is not specified.

Migration Guide
For users wanting to use the new template system:
1. Pass `--config` with a base64-encoded template
2. Add `inductor_config` for any custom settings

No code changes are required: this is fully backward compatible.
Documentation References
All configuration keys have been verified against PyTorch 2.7.1 source code:
Config File: https://github.com/pytorch/pytorch/blob/main/torch/_inductor/config.py
- `cpp_wrapper` - Line 620
- `max_autotune` - Line 575
- `max_autotune_gemm` - Line 621
- `epilogue_fusion` - Line 527
- `coordinate_descent_tuning` - Line 614
- `freezing` - Line 429
- `triton.cudagraphs` - Line 1660
- `autotune_fallback_to_aten` - Line 696
- `triton.enable_persistent_tma_matmul` - Line 1916

Compile API: https://pytorch.org/docs/stable/generated/torch.compile.html
- `mode="default"` - Default mode
- `mode="reduce-overhead"` - Uses CUDA Graphs
- `mode="max-autotune"` - Enables comprehensive autotuning
- `mode="max-autotune-no-cudagraphs"` - Autotuning without CUDA Graphs

Hardware Requirements
Performance Considerations
Checklist
Related Issues