Original Article: "MLIR: Using Tablegen for Passes" by Jeremy Kun
Windows Adaptation: Focus on TableGen integration with CMake instead of Bazel.
This tutorial uses emojis to help you navigate:
- 📖 Reading sections - Conceptual explanations and background
- 🔬 Examples - Code samples and detailed examination
- 🔍 Deep dives - Feature exploration and sage advice
- 👉 Action sections - Commands to run and tasks to complete
Here's an uncomfortable truth about software development: boilerplate exists because we're not smart enough to remember all the details.
When Jeremy Kun wrote about TableGen, he opened with this disarming confession: he's not clever enough to correctly implement all the required methods for MLIR passes from memory. He forgets which virtual methods need overriding. He misspells function names. He gets parameter types wrong.
This isn't weakness; it's honesty about how humans actually work.
And it leads to a crucial realization: if you're going to forget things anyway, you might as well have a tool generate the boilerplate for you. That tool, in MLIR, is TableGen.
But here's where things get interesting (and where I initially stumbled): TableGen isn't the magical abstraction layer it first appears to be. It's something both simpler and more powerful: a transparent code generator that trades "hiding complexity" for "making repetition mechanical".
- Why boilerplate matters and how TableGen addresses it
- The mental model shift from "abstraction" to "transparent generation"
- Writing .td files to define passes declaratively
- Integrating TableGen with CMake builds (Windows-native approach)
- Understanding generated code (and why reading it is essential)
- The philosophy of white-box code generation
- Real-world tradeoffs and when TableGen helps vs. frustrates
Let me show you the problem before we discuss the solution.
// A minimal pass implementation - look at all this ceremony!
struct MyPass : public PassWrapper<MyPass, OperationPass<>> {
// Command-line name
StringRef getArgument() const override {
return "my-pass";
}
// Help text
StringRef getDescription() const override {
return "Does something useful";
}
// Which dialects does this pass need loaded?
void getDependentDialects(DialectRegistry &registry) const override {
registry.insert<arith::ArithDialect>();
registry.insert<func::FuncDialect>();
}
// The actual transformation
void runOnOperation() override {
// Your logic here - maybe 10 lines
}
};
// Manual registration ceremony
void registerMyPass() {
PassRegistration<MyPass>();
}
Notice the pattern? Out of ~20 lines, maybe 10 contain actual transformation logic. The rest is infrastructure, critical infrastructure that must be exactly right, but also mechanical and repetitive.
Now imagine you're building a real compiler with 50 passes:
- 50 getArgument() implementations (each must be unique)
- 50 getDescription() implementations
- 50 dependency lists
- 50 registration functions
- Thousands of lines of boilerplate
And every time you add a new pass, you copy-paste-modify this ceremony, hoping you didn't introduce a subtle bug (like forgetting to register a dependent dialect, which fails at runtime with cryptic errors).
This is where TableGen enters.
TableGen is a domain-specific language and code generator used throughout LLVM/MLIR. Instead of writing repetitive C++ boilerplate, you write a declarative .td file and let mlir-tblgen generate the code.
But let me be clear about something crucial, something that took me time to understand:
❌ Wrong: "TableGen is an abstraction layer I don't need to understand"
✅ Right: "TableGen is transparent code generation. I should read the generated .h.inc files to understand what it creates."
This distinction matters enormously.
Black-box abstraction (what I initially thought TableGen was):
- You write high-level specs
- Magic happens
- Usable code appears
- You never look at the generated output
- Errors are explained in terms of your input
- Example: A good DSL, a well-designed code generator
White-box code generation (what TableGen actually is):
- You write declarative definitions
- Predictable generation happens (you can trace it)
- Generated code is meant to be read
- You often inspect the output to understand behavior
- Errors may come from generated code, requiring you to understand the generation pattern
- Example: TableGen, C preprocessor macros
When you treat TableGen as a black box, you get frustrated. The error messages reference generated code you've never seen. The contracts between your code and the base classes are implicit. You're left guessing what methods you need to implement.
When you treat TableGen as a white-box generator, you gain clarity. You read the .h.inc file once, understand the pattern, and know exactly what's being generated. The tool becomes predictable rather than magical.
This tutorial will show you both: the declarative .td syntax (what you write) and the generated C++ code (what you need to understand).
For each pass definition in a .td file, TableGen generates:
- Factory functions - std::unique_ptr<Pass> createMyPass()
- Base class with CRTP - Pre-implemented getArgument(), getDescription(), getDependentDialects()
- Registration helpers - Functions to register passes with the pass manager
- Option/statistic members - If you declare them in the .td file
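The generated code uses the Curiously Recurring Template Pattern (CRTP): the base class takes your pass class as a template parameter, so it knows the concrete derived type at compile time (for example, when it needs to clone the pass). Your runOnOperation() is still an ordinary virtual override.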
Let's see the same pass implemented manually versus with TableGen.
// Manual pass implementation - notice the repetitive structure
struct AffineFullUnroll : public PassWrapper<AffineFullUnroll, OperationPass<>> {
StringRef getArgument() const override { return "affine-full-unroll"; }
StringRef getDescription() const override { return "Fully unroll affine loops"; }
void getDependentDialects(DialectRegistry &registry) const override {
registry.insert<affine::AffineDialect>();
}
void runOnOperation() override {
// Actual pass logic here
}
};
// Manual registration
void registerAffineFullUnrollPass() {
PassRegistration<AffineFullUnroll>();
}
With TableGen, the same pass splits into two parts: a declarative specification and a minimal implementation.
File: lib/Transform/Affine/Passes.td
include "mlir/Pass/PassBase.td"
def AffineFullUnroll : Pass<"affine-full-unroll"> {
let summary = "Fully unroll all affine loops";
let description = [{
This pass completely unrolls all affine.for loops by generating
a copy of the loop body for each iteration.
}];
let dependentDialects = ["mlir::affine::AffineDialect"];
}
File: lib/Transform/Affine/AffineFullUnroll.cpp
#include "mlir/Dialect/Affine/IR/AffineOps.h"
#include "mlir/Pass/Pass.h"
namespace mlir {
namespace tutorial {
#define GEN_PASS_DEF_AFFINEFULLUNROLL
#include "Transform/Affine/Passes.h.inc"
struct AffineFullUnroll : impl::AffineFullUnrollBase<AffineFullUnroll> {
using AffineFullUnrollBase::AffineFullUnrollBase;
void runOnOperation() override {
// Only the transformation logic - boilerplate is generated
getOperation()->walk([&](affine::AffineForOp op) {
// Unroll logic here
});
}
};
} // namespace tutorial
} // namespace mlir
The metadata moved from C++ to the .td file:
- Pass name: now in Pass<"affine-full-unroll">
- Description: now in let summary and let description
- Dependencies: now in let dependentDialects
The C++ implementation only contains the unique logic: runOnOperation(). Everything mechanical is generated.
Trade-off: You now have two files to maintain instead of one. But for a codebase with dozens of passes, this declarative approach scales better and reduces errors.
// Import base definitions
include "mlir/Pass/PassBase.td"
def MyPass : Pass<"my-pass-name"> {
let summary = "Short one-line description";
let description = [{
Longer description with details.
Can span multiple lines.
}];
}
def MyPass : Pass<"my-pass"> {
let summary = "...";
let dependentDialects = [
"mlir::arith::ArithDialect",
"mlir::func::FuncDialect"
];
}
This ensures dialects are loaded before the pass runs.
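To see why the ordering matters, here is a minimal, self-contained sketch of the idea in plain C++. MockDialectRegistry, MockPass, MyPass, and runPipeline are hypothetical stand-ins, not MLIR APIs. The real pass manager does considerably more, but the principle is the same: ask every pass for its dependencies first, load them, and only then run the passes.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for mlir::DialectRegistry: just a set of names.
struct MockDialectRegistry {
  std::set<std::string> names;
  void insert(const std::string &name) { names.insert(name); }
};

// Hypothetical stand-in for mlir::Pass.
struct MockPass {
  virtual ~MockPass() = default;
  virtual void getDependentDialects(MockDialectRegistry &registry) const = 0;
  virtual void runOnOperation(const std::set<std::string> &loaded) = 0;
};

// What a generated base would contribute for a dependentDialects list
// naming the arith and func dialects.
struct MyPass : MockPass {
  bool sawArith = false;
  void getDependentDialects(MockDialectRegistry &registry) const override {
    registry.insert("arith");
    registry.insert("func");
  }
  void runOnOperation(const std::set<std::string> &loaded) override {
    // Creating arith ops is only safe if the dialect is already loaded.
    sawArith = loaded.count("arith") > 0;
  }
};

// Collect every pass's dependencies, "load" them, then run the passes.
inline std::set<std::string>
runPipeline(const std::vector<MockPass *> &passes) {
  MockDialectRegistry registry;
  for (const MockPass *pass : passes)
    pass->getDependentDialects(registry);
  const std::set<std::string> loaded = registry.names; // stands in for loading
  for (MockPass *pass : passes)
    pass->runOnOperation(loaded);
  return loaded;
}
```

Calling runPipeline({&pass}) on a MyPass instance leaves pass.sawArith set, because the dependency was collected before the pass body ran.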
// Pass runs only on func::FuncOp
def MyFuncPass : Pass<"my-pass", "func::FuncOp"> {
let summary = "...";
}
// No anchor op given: the pass is op-agnostic and can be scheduled on any operation (default)
def MyModulePass : Pass<"my-pass"> {
let summary = "...";
}
def MyPass : Pass<"my-pass"> {
let summary = "...";
let options = [
Option<"threshold", "threshold", "int64_t", /*default=*/"10",
"Threshold value for optimization">,
ListOption<"targetOps", "target-ops", "std::string",
"List of operation names to target">
];
}
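To make the mapping from an option string to pass members concrete, here is a small sketch that parses a string shaped like `threshold=20 target-ops=arith.addi,arith.muli` into a plain struct. MyPassOptions and parseMyPassOptions are hypothetical names for illustration; this is not MLIR's option parser, only an approximation of the format it accepts.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder mirroring the two options declared in the .td file.
struct MyPassOptions {
  std::int64_t threshold = 10;        // same default as in the declaration
  std::vector<std::string> targetOps; // empty unless given on the command line
};

// Parse space-separated key=value pairs; list values are comma-separated.
// Unknown keys are silently skipped in this sketch (a real parser would
// more likely reject them).
inline MyPassOptions parseMyPassOptions(const std::string &text) {
  MyPassOptions options;
  std::istringstream tokens(text);
  std::string token;
  while (tokens >> token) {
    const std::size_t eq = token.find('=');
    if (eq == std::string::npos)
      continue;
    const std::string key = token.substr(0, eq);
    const std::string value = token.substr(eq + 1);
    if (key == "threshold") {
      options.threshold = std::stoll(value);
    } else if (key == "target-ops") {
      std::istringstream items(value);
      std::string item;
      while (std::getline(items, item, ','))
        if (!item.empty())
          options.targetOps.push_back(item);
    }
  }
  return options;
}
```

In a TableGen-defined pass you never write this parsing yourself: the generated base class declares the option members, and your runOnOperation() simply reads them.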
Usage:
Working directory: Repository root (such as D:\repos\mlir-tutorial\)
.\build\bin\tutorial-opt.exe input.mlir --my-pass="threshold=20 target-ops=arith.addi,arith.muli"
def MyPass : Pass<"my-pass"> {
let summary = "...";
let statistics = [
Statistic<"numOpsReplaced", "num-ops-replaced",
"Number of operations replaced">
];
}
In C++:
void runOnOperation() override {
// ...
++numOpsReplaced; // Auto-generated statistic
}
lib/Transform/Affine/Passes.td:
include "mlir/Pass/PassBase.td"
def AffineFullUnroll : Pass<"affine-full-unroll"> {
let summary = "Fully unroll all affine loops";
let dependentDialects = ["mlir::affine::AffineDialect"];
}
def AffineFullUnrollPatternRewrite : Pass<"affine-full-unroll-pattern-rewrite"> {
let summary = "Fully unroll affine loops using pattern rewriting";
let dependentDialects = ["mlir::affine::AffineDialect"];
}
lib/Transform/Affine/CMakeLists.txt:
# Generate header from TableGen
set(LLVM_TARGET_DEFINITIONS Passes.td)
mlir_tablegen(Passes.h.inc -gen-pass-decls -name Affine)
mlir_tablegen(Passes.capi.h.inc -gen-pass-capi-header --prefix Affine)
mlir_tablegen(Passes.capi.cpp.inc -gen-pass-capi-impl --prefix Affine)
add_public_tablegen_target(MLIRAffinePassIncGen)
# Build the library
add_mlir_library(MLIRAffineFullUnrollPasses
AffineFullUnroll.cpp
AffineFullUnrollPatternRewrite.cpp
ADDITIONAL_HEADER_DIRS
${PROJECT_SOURCE_DIR}/lib/Transform/Affine
DEPENDS
MLIRAffinePassIncGen
LINK_LIBS PUBLIC
MLIRAffineDialect
MLIRPass
MLIRTransforms
)
Working directory: Repository root (such as D:\repos\mlir-tutorial\)
cd build
ninja MLIRAffinePassIncGen # Generate headers
ninja MLIRAffineFullUnrollPasses # Build library
build/lib/Transform/Affine/Passes.h.inc:
This file contains three guarded sections.
#define GEN_PASS_DECL_AFFINEFULLUNROLL
#include "Passes.h.inc"
Generates:
std::unique_ptr<::mlir::Pass> createAffineFullUnroll();
#define GEN_PASS_DEF_AFFINEFULLUNROLL
#include "Passes.h.inc"
Generates:
namespace impl {
template <typename DerivedT>
class AffineFullUnrollBase : public ::mlir::OperationPass<> {
public:
using Base = AffineFullUnrollBase;
::llvm::StringRef getArgument() const override { return "affine-full-unroll"; }
::llvm::StringRef getDescription() const override { return "Fully unroll all affine loops"; }
void getDependentDialects(::mlir::DialectRegistry &registry) const override {
registry.insert<::mlir::affine::AffineDialect>();
}
protected:
AffineFullUnrollBase() = default;
};
} // namespace impl
#define GEN_PASS_REGISTRATION
#include "Passes.h.inc"
Generates:
inline void registerAffinePasses() {
registerAffineFullUnroll();
registerAffineFullUnrollPatternRewrite();
}
lib/Transform/Affine/AffineFullUnroll.cpp:
#include "mlir/Dialect/Affine/IR/AffineOps.h"
#include "mlir/Pass/Pass.h"
namespace mlir {
namespace tutorial {
// Include factory function declaration
#define GEN_PASS_DECL_AFFINEFULLUNROLL
// Include base class definition
#define GEN_PASS_DEF_AFFINEFULLUNROLL
#include "Transform/Affine/Passes.h.inc"
// Implement pass by inheriting from generated base
struct AffineFullUnroll : impl::AffineFullUnrollBase<AffineFullUnroll> {
using AffineFullUnrollBase::AffineFullUnrollBase;
void runOnOperation() override {
// Your transformation logic
}
};
} // namespace tutorial
} // namespace mlir
lib/Transform/Affine/Passes.h:
#ifndef LIB_TRANSFORM_AFFINE_PASSES_H
#define LIB_TRANSFORM_AFFINE_PASSES_H
#include "mlir/Pass/Pass.h"
namespace mlir {
namespace tutorial {
// Declare factory functions
#define GEN_PASS_DECL
#include "Transform/Affine/Passes.h.inc"
// Declare registration function
#define GEN_PASS_REGISTRATION
#include "Transform/Affine/Passes.h.inc"
} // namespace tutorial
} // namespace mlir
#endif
tools/tutorial-opt.cpp:
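#include "Transform/Affine/Passes.h"
#include "Transform/Arith/Passes.h"
#include "mlir/IR/DialectRegistry.h"
#include "mlir/InitAllDialects.h"
#include "mlir/Tools/mlir-opt/MlirOptMain.h"
int main(int argc, char **argv) {
// MlirOptMain needs a registry of the dialects the tool can parse
// (registerAllDialects assumes the tool links the upstream dialect libraries)
mlir::DialectRegistry registry;
mlir::registerAllDialects(registry);
// Register all tutorial passes
mlir::tutorial::registerAffinePasses();
mlir::tutorial::registerArithPasses();
return mlir::asMainReturnCode(
mlir::MlirOptMain(argc, argv, "Tutorial Pass Driver\n", registry));
}
Passes.td: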
def MyOptimization : Pass<"my-opt"> {
let summary = "My optimization pass";
let options = [
Option<"aggressiveness", "aggr", "int", /*default=*/"1",
"Optimization aggressiveness (0-3)">,
Option<"enableDebug", "debug", "bool", /*default=*/"false",
"Enable debug output">
];
}
In C++:
struct MyOptimization : impl::MyOptimizationBase<MyOptimization> {
using MyOptimizationBase::MyOptimizationBase;
void runOnOperation() override {
if (enableDebug) { // Access option
llvm::errs() << "Aggressiveness: " << aggressiveness << "\n";
}
// ...
}
};
Usage:
Working directory: Repository root (such as D:\repos\mlir-tutorial\)
.\build\bin\tutorial-opt.exe input.mlir --my-opt="aggr=3 debug=true"
Passes.td:
def CSE : Pass<"my-cse"> {
let summary = "Common subexpression elimination";
let statistics = [
Statistic<"numCSE", "num-cse",
"Number of operations eliminated">,
Statistic<"numFolded", "num-folded",
"Number of operations folded">
];
}
In C++:
void runOnOperation() override {
getOperation()->walk([&](Operation *op) {
if (/* can eliminate */) {
++numCSE;
op->erase();
}
});
}
Output:
===-------------------------------------------------------------------------===
... Statistics Collected ...
===-------------------------------------------------------------------------===
1 my-cse - Number of operations eliminated
5 my-cse - Number of operations folded
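Conceptually, each statistic is a named counter that the pass increments and the driver prints afterwards. The following is a minimal sketch of that idea using hypothetical stand-in types (MockStatistic, MockCSE); it is not MLIR's Pass::Statistic class, and the report format is made up for the example.

```cpp
#include <cstdint>
#include <initializer_list>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pass statistic: a named counter.
struct MockStatistic {
  std::string name;
  std::string description;
  std::uint64_t value;
  MockStatistic(std::string n, std::string d)
      : name(std::move(n)), description(std::move(d)), value(0) {}
  MockStatistic &operator++() {
    ++value;
    return *this;
  }
};

// Hypothetical pass holding the two counters from the CSE definition.
struct MockCSE {
  MockStatistic numCSE{"num-cse", "Number of operations eliminated"};
  MockStatistic numFolded{"num-folded", "Number of operations folded"};

  // Toy "IR": duplicates are eliminated, constants are folded.
  void runOnOps(const std::vector<std::string> &ops) {
    for (const std::string &op : ops) {
      if (op == "duplicate")
        ++numCSE;
      else if (op == "constant")
        ++numFolded;
    }
  }

  // One line per counter: value, name, description.
  std::string report() const {
    std::ostringstream out;
    for (const MockStatistic *stat : {&numCSE, &numFolded})
      out << stat->value << " " << stat->name << " - " << stat->description
          << "\n";
    return out.str();
  }
};
```

The point of the sketch is the division of labor: the declaration fixes the counter's name and description once, and the pass body only ever writes `++counter`.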
TableGen uses a clever pattern with C preprocessor macros to give you control over what gets generated. Understanding this pattern is essential for effective use.
When mlir-tblgen processes your .td file, it creates a single .h.inc file with three guarded sections:
// Section 1: Factory function declarations
#ifdef GEN_PASS_DECL_AFFINEFULLUNROLL
std::unique_ptr<::mlir::Pass> createAffineFullUnroll();
#undef GEN_PASS_DECL_AFFINEFULLUNROLL
#endif
// Section 2: Base class definition
#ifdef GEN_PASS_DEF_AFFINEFULLUNROLL
namespace impl {
template <typename DerivedT>
class AffineFullUnrollBase : public ::mlir::OperationPass<> {
// Generated methods here
};
}
#undef GEN_PASS_DEF_AFFINEFULLUNROLL
#endif
// Section 3: Registration functions
#ifdef GEN_PASS_REGISTRATION
inline void registerAffinePasses() {
// Registration code here
}
#undef GEN_PASS_REGISTRATION
#endif
This approach lets you include the same .h.inc file multiple times in different contexts:
- In your header (Passes.h): Include with GEN_PASS_DECL to get factory declarations
- In your implementation (AffineFullUnroll.cpp): Include with GEN_PASS_DEF_AFFINEFULLUNROLL to get the base class
- In your registration (InitAllPasses.cpp): Include with GEN_PASS_REGISTRATION to get registration helpers
Each #define before #include selectively enables one section.
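The guard mechanism can be tried without MLIR at all. The sketch below inlines what a generated file might contain (instead of using a real #include) and uses hypothetical names throughout: MockPass, Demo, createDemo, and the DEMO_REGISTRATION_EMITTED marker are all invented for illustration. Unlike real generated code, the factory here is written by hand to keep the example short.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for mlir::Pass.
struct MockPass {
  virtual ~MockPass() = default;
  virtual std::string getArgument() const = 0;
};

// The consumer opts in to two of the three sections before the "include".
#define GEN_PASS_DECL_DEMO
#define GEN_PASS_DEF_DEMO

// ---- begin: simulated contents of a generated Passes.h.inc ----
#ifdef GEN_PASS_DECL_DEMO
std::unique_ptr<MockPass> createDemo();
#undef GEN_PASS_DECL_DEMO
#endif

#ifdef GEN_PASS_DEF_DEMO
namespace impl {
template <typename DerivedT>
class DemoBase : public MockPass {
public:
  std::string getArgument() const override { return "demo"; }
};
} // namespace impl
#undef GEN_PASS_DEF_DEMO
#endif

#ifdef GEN_PASS_REGISTRATION
#define DEMO_REGISTRATION_EMITTED
inline void registerDemoPasses() {}
#undef GEN_PASS_REGISTRATION
#endif
// ---- end: simulated Passes.h.inc ----

// Hand-written part: derive from the generated base, define the factory.
struct Demo : impl::DemoBase<Demo> {};
std::unique_ptr<MockPass> createDemo() {
  return std::unique_ptr<MockPass>(new Demo());
}

// GEN_PASS_REGISTRATION was never defined, so that section was skipped.
#ifdef DEMO_REGISTRATION_EMITTED
constexpr bool registrationEmitted = true;
#else
constexpr bool registrationEmitted = false;
#endif
```

Because each section undefines its own guard, a later include of the same file starts from a clean slate, and a translation unit only pays for the sections it asked for.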
The generated base class uses CRTP (Curiously Recurring Template Pattern):
template <typename DerivedT>
class AffineFullUnrollBase : public ::mlir::OperationPass<> {
// ...
};
This gives the base class compile-time knowledge of your derived class. When you write:
struct AffineFullUnroll : impl::AffineFullUnrollBase<AffineFullUnroll> {
// Your implementation
};
The base class sees AffineFullUnroll as DerivedT, so generated helpers (for example, cloning the pass or resolving its type ID) can name the concrete class directly. Note that runOnOperation() is still a virtual override; CRTP here is about type knowledge, not about removing virtual dispatch from the pass body.
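A minimal sketch of how a base class can put DerivedT to work, again with hypothetical stand-ins (MockPass, UnrollBase, Unroll) rather than MLIR types. Here the base uses DerivedT to copy-construct the concrete pass, which it could not do if it only knew the abstract interface. In this sketch runOnOperation is an ordinary virtual call.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for mlir::Pass.
struct MockPass {
  virtual ~MockPass() = default;
  virtual std::string getArgument() const = 0;
  virtual std::unique_ptr<MockPass> clonePass() const = 0;
  virtual int runOnOperation(int input) = 0;
};

// Shape of a generated base: DerivedT is the user's pass class.
template <typename DerivedT>
class UnrollBase : public MockPass {
public:
  std::string getArgument() const override { return "affine-full-unroll"; }

  // The base can copy the derived object because it knows DerivedT.
  std::unique_ptr<MockPass> clonePass() const override {
    return std::unique_ptr<MockPass>(
        new DerivedT(*static_cast<const DerivedT *>(this)));
  }
};

// The user's pass: only state and transformation logic.
struct Unroll : UnrollBase<Unroll> {
  int factor = 4;
  int runOnOperation(int input) override { return input * factor; }
};
```

Cloning an Unroll whose factor was changed to 3 yields a new Unroll with the same factor, even though the clone code lives entirely in the base class.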
On Windows with CMake, TableGen integration is more straightforward than on Linux with Bazel. Let's see how it works.
TableGen can auto-generate markdown documentation:
# In CMakeLists.txt
mlir_tablegen(Passes.md -gen-pass-doc)
Creates:
### `--affine-full-unroll` (AffineFullUnroll)
Fully unroll all affine loops
This pass completely unrolls all affine.for loops by generating
a copy of the loop body for each iteration.
# Check what was generated
cat .\build\lib\Transform\Affine\Passes.h.inc
Common error:
error: couldn't locate file 'mlir/Pass/PassBase.td'
Solution: Ensure CMakeLists.txt includes:
include_directories(${MLIR_INCLUDE_DIRS})
If you get undefined references, check:
- Did you #define GEN_PASS_DEF_YOURPASS?
- Did you inherit from impl::YourPassBase<YourPass>?
- Did you implement runOnOperation()?
TableGen is a tool with clear strengths and limitations. Understanding both helps you use it effectively.
Strengths:
- Consistent metadata across many passes - When you have 20+ passes, declaring them consistently in .td files prevents typos and ensures uniform documentation
- Pass options and statistics - The declarative syntax for command-line options is cleaner than manually declaring Pass::Option<> members
- Generated documentation - TableGen can generate markdown docs automatically from your pass descriptions
- Refactoring - When the pass base class interface changes (rare, but happens), you update the TableGen backend once rather than manually updating dozens of passes
Limitations:
- Learning curve - You need to learn both TableGen syntax and understand the generated code patterns
- Debugging challenges - Error messages may reference generated code, requiring you to trace back to your .td definitions
- Limited expressiveness - You can only generate what the TableGen backends support; custom boilerplate may still require manual code
- Build complexity - Adding another code generation step means more build dependencies and longer builds (though this is minimal on Windows with Ninja)
TableGen is not a silver bullet. It trades one form of complexity (repetitive boilerplate) for another (code generation machinery). For small projects with 2-3 passes, manual implementation might be simpler. For larger projects with dozens of passes, TableGen pays off by ensuring consistency and reducing maintenance burden.
The key insight: TableGen doesn't eliminate complexity; it centralizes and mechanizes it.
These practices come from real experience building MLIR-based compilers.
When you first use TableGen for a pass, immediately look at the generated .h.inc file:
cat .\build\lib\Transform\Affine\Passes.h.inc | less
This demystifies what's happening and helps you debug compilation errors. After seeing the pattern once, you'll understand how all TableGen passes work.
Don't try to generate complex logic. Use .td files for:
- ✅ Pass metadata (name, description, dependencies)
- ✅ Command-line options and statistics
- ✅ Standard lifecycle methods
- ❌ Transformation logic (write in C++)
- ❌ Complex initialization or cleanup code
Group related passes in the same .td file:
// lib/Transform/Affine/Passes.td
include "mlir/Pass/PassBase.td"
def AffineFullUnroll : Pass<"affine-full-unroll"> { /* ... */ }
def AffineLoopFusion : Pass<"affine-loop-fusion"> { /* ... */ }
def AffineVectorize : Pass<"affine-vectorize"> { /* ... */ }
This keeps related passes together and generates one .h.inc file with all their definitions.
Establish a consistent pattern for your namespace and generated code:
namespace mlir {
namespace tutorial {
#define GEN_PASS_DEF_MYPASS
#include "Transform/Affine/Passes.h.inc"
struct MyPass : impl::MyPassBase<MyPass> {
using MyPassBase::MyPassBase;
void runOnOperation() override { /* ... */ }
};
} // namespace tutorial
} // namespace mlir
This pattern appears throughout MLIR and makes code predictable.
When creating a new pass:
- Start with a minimal .td definition (name and summary only)
- Add CMake integration and generate the .h.inc file
- Implement the minimal C++ pass structure
- Verify it compiles and registers correctly
- Add options, statistics, and dependencies as needed
This incremental approach catches errors early when they're easiest to fix.
Understanding why TableGen uses its particular approach helps you use it more effectively. The design reflects specific choices about what problems to solve and what complexity to accept.
TableGen originated in LLVM's backend code generation, where a single compiler needed to support dozens of target architectures (x86, ARM, RISC-V, etc.). Each target had:
- Hundreds of instruction definitions
- Different register classes
- Complex instruction selection patterns
- Target-specific calling conventions
Manual C++ for all this would be unmaintainable. But a fully generic runtime system would be too slow for production compilers.
TableGen's solution: generate specialized C++ at build time from declarative specifications. This gives you:
- Maintainability of declarative specs
- Performance of specialized C++ code
- Compile-time error checking
At its core, TableGen is a database of records with fields. The .td syntax lets you define records:
def MyPass : Pass<"my-pass"> {
let summary = "My transformation";
let dependentDialects = ["mlir::arith::ArithDialect"];
}
This creates a record named MyPass of type Pass, with fields summary and dependentDialects. The mlir-tblgen tool then queries this database and generates C++ based on what it finds.
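The "database plus backend" idea can be sketched in a few lines of ordinary C++. PassRecord, emitBaseClass, and emitAll are hypothetical names invented for this illustration; the real mlir-tblgen backend is far more elaborate, and the emitted text below only approximates the shape of its output.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical model of one record of class Pass.
struct PassRecord {
  std::string defName;                        // e.g. "MyPass"
  std::string argument;                       // e.g. "my-pass"
  std::string summary;                        // e.g. "My transformation"
  std::vector<std::string> dependentDialects; // fully qualified class names
};

// Hypothetical "backend": read the record's fields and print C++ text.
inline std::string emitBaseClass(const PassRecord &record) {
  std::ostringstream os;
  os << "template <typename DerivedT>\n"
     << "class " << record.defName
     << "Base : public ::mlir::OperationPass<> {\n"
     << "public:\n"
     << "  ::llvm::StringRef getArgument() const override { return \""
     << record.argument << "\"; }\n"
     << "  ::llvm::StringRef getDescription() const override { return \""
     << record.summary << "\"; }\n"
     << "  void getDependentDialects(::mlir::DialectRegistry &registry) "
        "const override {\n";
  for (const std::string &dialect : record.dependentDialects)
    os << "    registry.insert<" << dialect << ">();\n";
  os << "  }\n};\n";
  return os.str();
}

// Query the whole "database" and emit code for every record in it.
inline std::string emitAll(const std::vector<PassRecord> &records) {
  std::string out;
  for (const PassRecord &record : records)
    out += emitBaseClass(record);
  return out;
}
```

Seen this way, adding a field to a record is just adding data; what that field means is decided entirely by the backend that reads it.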
You might wonder: why not define passes directly in C++ using some registration macro system?
// Hypothetical alternative
REGISTER_PASS("my-pass", MyPass)
.withSummary("My transformation")
.withDialect<arith::ArithDialect>();
This would avoid code generation entirely. But it has downsides:
- Limited metaprogramming - C++ macros and templates can only do so much
- No cross-file analysis - Can't generate global registries spanning multiple files easily
- Runtime overhead - Registration happens at program startup rather than compile time
- No documentation generation - Can't extract structured information to create docs
TableGen's separate compilation step enables capabilities that pure C++ approaches struggle with.
The three-section pattern (_DECL, _DEF, _REGISTRATION) solves a specific problem: you need the same information in different forms in different places.
Consider pass registration. Your header file needs forward declarations:
std::unique_ptr<Pass> createMyPass();
Your implementation needs the base class definition:
class MyPassBase : public OperationPass<> { /* ... */ };
Your initialization code needs the registration function:
void registerMyPass() { /* ... */ }
Rather than generate three separate files (increasing build complexity), TableGen generates one file with three guarded sections. Each consumer includes the same file but activates different sections.
This pattern appears throughout LLVM/MLIR's TableGen usage. Once you recognize it, you'll see it everywhere.
Many code generators aim for complete abstraction, so you never see the generated code. TableGen takes the opposite approach: the generated code is meant to be read.
This choice has trade-offs:
Benefits:
- Easier debugging (read the generated code to understand errors)
- Educational (you learn MLIR patterns by reading generated examples)
- Transparent (no "magic" to worry about)
- Flexible (you can mix generated and hand-written code)
Costs:
- Steeper learning curve (must understand both .td syntax and generated patterns)
- More exposed to implementation changes (generated code format may evolve)
- Less "automated" feeling (you're more aware of the machinery)
The MLIR team chose transparency over opacity. This tutorial follows that philosophy, hence the emphasis on reading generated .h.inc files.
Problem: You write .td definitions, get compilation errors, and have no idea why.
Solution: Read the generated .h.inc file. Understand the pattern once, and all TableGen passes become clear.
Problem: You include Passes.h.inc but get compilation errors, such as impl::MyPassBase being undeclared.
Solution: Remember the preprocessor pattern:
#define GEN_PASS_DEF_MYPASS
#include "Passes.h.inc"
The #define must come before the #include. Without it, the guarded section that contains the base class is skipped entirely.
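Problem: The base class or guard macro you reference in C++ does not match what was generated (for example, you wrote MypassBase but the generated class is MyPassBase).
Solution: As far as I can tell from the generated output, mlir-tblgen builds identifiers directly from the def name as written: def MyPass yields impl::MyPassBase and createMyPass(), and the guard macro is the def name uppercased (GEN_PASS_DEF_MYPASS). It does not convert names to CamelCase for you, so write your def names in CamelCase and copy the exact spelling from the .h.inc file.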
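As a quick reference, the naming used by the examples in this tutorial can be written down as a small helper. This is a sketch under the assumption that identifiers are built directly from the def name, which matches the AffineFullUnroll example (impl::AffineFullUnrollBase, createAffineFullUnroll, GEN_PASS_DEF_AFFINEFULLUNROLL). GeneratedNames and namesFor are hypothetical helpers, not part of mlir-tblgen; always confirm against your own .h.inc file.

```cpp
#include <cctype>
#include <string>

// Hypothetical summary of the identifiers derived from one def name.
struct GeneratedNames {
  std::string baseClass; // impl::<Def>Base
  std::string factory;   // create<Def>
  std::string declGuard; // GEN_PASS_DECL_<DEF>
  std::string defGuard;  // GEN_PASS_DEF_<DEF>
};

// Build the names for a def such as "AffineFullUnroll".
inline GeneratedNames namesFor(const std::string &defName) {
  std::string upper = defName;
  for (char &c : upper)
    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  return {"impl::" + defName + "Base", "create" + defName,
          "GEN_PASS_DECL_" + upper, "GEN_PASS_DEF_" + upper};
}
```

A helper like this is only useful as a mental checklist: when a compile error mentions an unknown base class, compare the name you typed with the name this rule predicts.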
Problem: CMake tries to compile your pass before the .h.inc file is generated.
Solution: Add the TableGen target to your library's dependencies:
add_mlir_library(MyPasses
MyPass.cpp
DEPENDS
MLIRMyPassIncGen # This ensures generation happens first
)
This tutorial covered TableGen from both philosophical and practical perspectives. Here are the essential points:
✅ TableGen is white-box code generation, not abstraction: you should understand the generated code
✅ Boilerplate exists because humans forget details; TableGen mechanizes what we'd otherwise copy-paste-modify
✅ .td files declare pass metadata: name, description, options, statistics, dependencies
✅ Generated code uses CRTP so the base class knows your concrete pass type at compile time
✅ Three preprocessor guards control what gets included: _DECL, _DEF, _REGISTRATION
✅ CMake integration is straightforward on Windows: mlir_tablegen() handles code generation
✅ TableGen trades complexity types: less boilerplate, but more code generation machinery
✅ Always read generated .h.inc files when learning; this demystifies the process
When you first encounter TableGen, the experience can be disorienting. You're learning MLIR (already complex), and now there's this additional layer of code generation with its own syntax and conventions.
This is normal. Here's what the learning curve typically looks like:
You copy-paste .td examples, get mysterious compilation errors, and wonder why anyone thought code generation was a good idea. The error messages mention files that don't exist in your source tree.
What helps: Stop and read one generated .h.inc file completely. Just one. Pick a simple pass and trace through: what you wrote in the .td, what got generated, how your C++ code uses it.
You start recognizing the three-section pattern. You understand that GEN_PASS_DEF_MYPASS needs to be defined before including the .h.inc file. You can write a simple pass without consulting documentation.
What helps: Implement 2-3 passes from scratch. Not copy-pasting, but actually typing out the .td definition, CMake integration, and C++ implementation. Muscle memory matters.
TableGen becomes mundane. You know the syntax for options and statistics. You understand when to use TableGen and when manual C++ is simpler. You can debug TableGen-related compilation errors efficiently.
What helps: Read MLIR's built-in .td files (in C:\msys64\clang64\include\mlir\). See how the professionals structure complex pass definitions.
You encounter a scenario where you need to add a new field to 20 pass definitions. In the manual world, this would mean editing 20 C++ files carefully. With TableGen, you add one field to the base class generator and update 20 .td files mechanically.
This is when TableGen's value becomes visceral. It's not about eliminating complexity; it's about making certain kinds of changes safe and systematic.
This tutorial emphasizes Windows development with CMake and MSYS2. A few Windows-specific notes about TableGen:
CMake on Windows handles TableGen paths correctly, but be aware:
- Generated .h.inc files appear in build\lib\Transform\... (Windows path separators)
- Include paths in CMake use forward slashes even on Windows (CMake normalizes them)
- When viewing generated files, use PowerShell or a Unix-aware tool like less
TableGen generation is fast (milliseconds), but on Windows with many small files, file system overhead can accumulate. Use Ninja (not Visual Studio MSBuild) for parallel builds:
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release ..
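ninja -j8 # Parallel build with 8 jobs
VSCode on Windows can index generated .h.inc files if you configure the C++ extension correctly (CMake writes the referenced compile_commands.json when you configure with -DCMAKE_EXPORT_COMPILE_COMMANDS=ON):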
{
"C_Cpp.default.compileCommands": "${workspaceFolder}/build/compile_commands.json"
}
This lets IntelliSense understand the generated code, making development more pleasant.
TableGen represents a specific philosophy in compiler engineering: make repetitive tasks mechanical rather than manual. It doesn't eliminate complexity, but it centralizes it in a way that scales.
The key to using TableGen effectively is understanding it as white-box code generation. Read the generated code. Understand the patterns. Treat the .td files as structured input to a predictable generator, not as magic incantations.
Once you internalize this mental model, TableGen transforms from a mysterious obstacle into a practical tool. You'll start using it not because you have to, but because it makes certain tasks genuinely easier.
The next tutorial moves beyond passes to defining custom dialects, where TableGen becomes even more valuable.
- Tutorial 05: Defining a New Dialect - Create custom MLIR dialects using TableGen
- Explore MLIR's built-in pass definitions - Read C:\msys64\clang64\include\mlir\Dialect\*\Passes.td
- Experiment with pass options - Add command-line arguments to your passes
- Read PassBase.td - Understanding the base definitions helps decode generated code
- TableGen Language Reference: llvm.org/docs/TableGen/
- MLIR Pass Infrastructure: mlir.llvm.org/docs/PassManagement/
- PassBase.td Source: C:\msys64\clang64\include\mlir\Pass\PassBase.td
- Original Tutorial: jeremykun.com
- LLVM TableGen Backend Development: For those interested in how TableGen itself works