| 1 | +# lower (BuildHIR) |
| 2 | + |
| 3 | +## File |
| 4 | +`src/HIR/BuildHIR.ts` |
| 5 | + |
| 6 | +## Purpose |
| 7 | +Converts a Babel AST function node into a High-level Intermediate Representation (HIR), which represents code as a control-flow graph (CFG) with basic blocks, instructions, and terminals. This is the first major transformation pass in the React Compiler pipeline, enabling precise expression-level memoization analysis. |
| 8 | + |
| 9 | +## Input Invariants |
| 10 | +- Input must be a valid Babel `NodePath<t.Function>` (FunctionDeclaration, FunctionExpression, or ArrowFunctionExpression) |
| 11 | +- The function must be a component or hook (determined by the environment) |
| 12 | +- Babel scope analysis must be available for binding resolution |
| 13 | +- An `Environment` instance must be provided with compiler configuration |
| 14 | +- Optional `bindings` map for nested function lowering (recursive calls) |
| 15 | +- Optional `capturedRefs` map for context variables captured from outer scope |
| 16 | + |
| 17 | +## Output Guarantees |
| 18 | +- Returns `Result<HIRFunction, CompilerError>` - either a successfully lowered function or compilation errors |
| 19 | +- The HIR function contains: |
| 20 | + - A complete CFG with basic blocks (`body.blocks: Map<BlockId, BasicBlock>`) |
| 21 | + - Each block has an array of instructions and exactly one terminal |
| 22 | + - All control flow is explicit (if/else, loops, switch, logical operators, ternary) |
| 23 | + - Parameters are converted to `Place` or `SpreadPattern` |
| 24 | + - Context captures are tracked in `context` array |
| 25 | + - Function metadata (id, async, generator, directives) |
| 26 | +- All identifiers get unique `IdentifierId` values |
| 27 | +- Instructions are created with a placeholder ID of 0; sequential IDs are assigned later |
| 28 | +- Effects are null (populated by later inference passes) |
| 29 | + |
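The structural guarantees above can be illustrated with a small checker. This is a minimal sketch, not the compiler's code: the `HIR`, `BasicBlock`, and `Terminal` types below are simplified, hypothetical stand-ins for the real ones, and `validate` is an illustrative helper rather than an existing pass.

```typescript
// Simplified, hypothetical stand-ins for the compiler's HIR types.
type BlockId = number;

type Terminal =
  | { kind: "goto"; block: BlockId }
  | { kind: "if"; consequent: BlockId; alternate: BlockId; fallthrough: BlockId }
  | { kind: "return" };

interface Instruction {
  id: number; // 0 is the placeholder used before IDs are assigned
  lvalue: string;
  value: string;
}

// "Exactly one terminal" is enforced by the type: `terminal` is a single required field.
interface BasicBlock {
  id: BlockId;
  instructions: Instruction[];
  terminal: Terminal;
}

interface HIR {
  entry: BlockId;
  blocks: Map<BlockId, BasicBlock>;
}

// Every block a terminal can transfer control to.
function successors(terminal: Terminal): BlockId[] {
  switch (terminal.kind) {
    case "goto":
      return [terminal.block];
    case "if":
      return [terminal.consequent, terminal.alternate];
    case "return":
      return [];
  }
}

// Returns a list of problems; an empty list means every edge points at a real block.
function validate(hir: HIR): string[] {
  const problems: string[] = [];
  if (!hir.blocks.has(hir.entry)) {
    problems.push(`entry bb${hir.entry} is missing`);
  }
  for (const block of hir.blocks.values()) {
    for (const target of successors(block.terminal)) {
      if (!hir.blocks.has(target)) {
        problems.push(`bb${block.id} targets missing bb${target}`);
      }
    }
  }
  return problems;
}

// CFG shape for a function of the form `if (x) { return ...; } return ...;`.
const example: HIR = {
  entry: 0,
  blocks: new Map<BlockId, BasicBlock>([
    [0, {
      id: 0,
      instructions: [{ id: 0, lvalue: "$6", value: "LoadLocal x$0" }],
      terminal: { kind: "if", consequent: 2, alternate: 1, fallthrough: 1 },
    }],
    [2, { id: 2, instructions: [], terminal: { kind: "return" } }],
    [1, { id: 1, instructions: [], terminal: { kind: "return" } }],
  ]),
};
```

Calling `validate(example)` yields an empty list, while a block whose terminal names a block ID absent from the map is reported.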
| 30 | +## Algorithm |
| 31 | +The lowering algorithm uses a recursive descent pattern with a `HIRBuilder` helper class: |
| 32 | + |
| 33 | +1. **Initialization**: Create an `HIRBuilder` with environment and optional bindings. Process captured context variables. |
| 34 | + |
| 35 | +2. **Parameter Processing**: For each function parameter: |
| 36 | + - Simple identifiers: resolve binding and create Place |
| 37 | + - Patterns (object/array): create temporary Place, then emit destructuring assignments |
| 38 | + - Rest elements: wrap in SpreadPattern |
| 39 | + - Unsupported: emit Todo error |
| 40 | + |
| 41 | +3. **Body Processing**: |
| 42 | + - Arrow functions with an expression body: lower the body expression to a temporary, then emit an implicit return |
| 43 | + - Block statements: recursively lower each statement |
| 44 | + |
| 45 | +4. **Statement Lowering** (`lowerStatement`): Handle each statement type: |
| 46 | + - **Control flow**: Create separate basic blocks for branches; loops connect back to their conditional blocks |
| 47 | + - **Variable declarations**: Create `DeclareLocal`/`DeclareContext` or `StoreLocal`/`StoreContext` instructions |
| 48 | + - **Expressions**: Lower to temporary and discard result |
| 49 | + - **Hoisting**: Detect forward references and emit `DeclareContext` for hoisted identifiers |
| 50 | + |
| 51 | +5. **Expression Lowering** (`lowerExpression`): Convert expressions to `InstructionValue`: |
| 52 | + - **Identifiers**: Create `LoadLocal`, `LoadContext`, or `LoadGlobal` based on binding |
| 53 | + - **Literals**: Create `Primitive` values |
| 54 | + - **Operators**: Create `BinaryExpression`, `UnaryExpression` etc. |
| 55 | + - **Calls**: Distinguish `CallExpression` vs `MethodCall` (member expression callee) |
| 56 | + - **Control flow expressions**: Create separate value blocks for branches (ternary, logical, optional chaining) |
| 57 | + - **JSX**: Lower to `JsxExpression` with lowered tag, props, and children |
| 58 | + |
| 59 | +6. **Block Management**: The builder maintains: |
| 60 | + - A current work-in-progress block accumulating instructions |
| 61 | + - Completed blocks map |
| 62 | + - Scope stack for break/continue resolution |
| 63 | + - Exception handler stack for try/catch |
| 64 | + |
| 65 | +7. **Termination**: Add an implicit void return at the end of the function if there is no explicit return |
| 66 | + |
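The steps above can be sketched on a toy statement language. Everything here is an assumption made for illustration: `Stmt`, `ToyBuilder`, and `lower` are hypothetical, expressions are kept as opaque strings instead of being lowered recursively, and the builder mirrors only the `reserve`/`push`/`terminateWithContinuation` idea rather than the real `HIRBuilder` API. Unlike the real pass, the sketch always allocates a distinct alternate block and does not prune unreachable blocks.

```typescript
type Stmt =
  | { kind: "expr"; expr: string }
  | { kind: "return"; expr: string }
  | { kind: "if"; test: string; consequent: Stmt[]; alternate: Stmt[] };

type Terminal =
  | { kind: "goto"; block: number }
  | { kind: "if"; test: string; consequent: number; alternate: number; fallthrough: number }
  | { kind: "return"; value: string | null };

interface Block {
  id: number;
  instructions: string[];
  terminal: Terminal;
}

class ToyBuilder {
  private nextBlock = 0;
  private nextTemp = 0;
  private current: { id: number; instructions: string[] };
  readonly completed = new Map<number, Block>();

  constructor() {
    this.current = { id: this.reserve(), instructions: [] };
  }

  // Allocate a block ID without emitting anything into it yet.
  reserve(): number {
    return this.nextBlock++;
  }

  // Append an instruction to the work-in-progress block; returns its temporary.
  push(value: string): string {
    const temp = `$${this.nextTemp++}`;
    this.current.instructions.push(`${temp} = ${value}`);
    return temp;
  }

  // Close the current block with `terminal` and keep emitting into block `next`.
  terminateWithContinuation(terminal: Terminal, next: number): void {
    this.completed.set(this.current.id, { ...this.current, terminal });
    this.current = { id: next, instructions: [] };
  }
}

function lowerStatements(builder: ToyBuilder, stmts: Stmt[]): void {
  for (const stmt of stmts) {
    switch (stmt.kind) {
      case "expr":
        builder.push(stmt.expr); // lowered to a temporary, result discarded
        break;
      case "return": {
        const value = builder.push(stmt.expr);
        // Anything after a return lands in a fresh (unreachable) block.
        builder.terminateWithContinuation({ kind: "return", value }, builder.reserve());
        break;
      }
      case "if": {
        const test = builder.push(stmt.test);
        const consequent = builder.reserve();
        const alternate = builder.reserve();
        const fallthrough = builder.reserve();
        builder.terminateWithContinuation(
          { kind: "if", test, consequent, alternate, fallthrough },
          consequent,
        );
        lowerStatements(builder, stmt.consequent);
        builder.terminateWithContinuation({ kind: "goto", block: fallthrough }, alternate);
        lowerStatements(builder, stmt.alternate);
        builder.terminateWithContinuation({ kind: "goto", block: fallthrough }, fallthrough);
        break;
      }
    }
  }
}

function lower(body: Stmt[]): Map<number, Block> {
  const builder = new ToyBuilder();
  lowerStatements(builder, body);
  // Termination step: close the last open block with an implicit void return.
  builder.terminateWithContinuation({ kind: "return", value: null }, builder.reserve());
  return builder.completed;
}

const cfg = lower([
  { kind: "if", test: "x", consequent: [{ kind: "return", expr: "foo(false, y)" }], alternate: [] },
  { kind: "return", expr: "[y * 10]" },
]);
```

In `cfg`, block 0 ends in the `if` terminal, block 1 (the consequent) ends in a `return`, and block 3 (the fallthrough) holds the final `return`. The remaining blocks are either an empty alternate or unreachable continuations.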
| 67 | +## Key Data Structures |
| 68 | + |
| 69 | +### HIRBuilder (from HIRBuilder.ts) |
| 70 | +- `#current: WipBlock` - Work-in-progress block being populated |
| 71 | +- `#completed: Map<BlockId, BasicBlock>` - Finished blocks |
| 72 | +- `#scopes: Array<Scope>` - Stack for break/continue target resolution (LoopScope, LabelScope, SwitchScope) |
| 73 | +- `#exceptionHandlerStack: Array<BlockId>` - Stack of catch handlers for try/catch |
| 74 | +- `#bindings: Bindings` - Map of variable names to their identifiers |
| 75 | +- `#context: Map<t.Identifier, SourceLocation>` - Captured context variables |
| 76 | +- Methods: `push()`, `reserve()`, `enter()`, `terminate()`, `terminateWithContinuation()` |
| 77 | + |
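To make the role of the scope stack concrete, here is a minimal sketch of how `break` and `continue` targets could be resolved against a stack of loop, switch, and label scopes. The `Scope` shape, the `ScopeStack` class, and its method names are hypothetical simplifications, not the builder's actual interface.

```typescript
type Scope =
  | { kind: "loop"; label: string | null; continueBlock: number; breakBlock: number }
  | { kind: "switch"; label: string | null; breakBlock: number }
  | { kind: "label"; label: string; breakBlock: number };

class ScopeStack {
  private scopes: Scope[] = [];

  // Run `fn` with `scope` pushed; the scope is popped even if `fn` throws.
  enter<T>(scope: Scope, fn: () => T): T {
    this.scopes.push(scope);
    try {
      return fn();
    } finally {
      this.scopes.pop();
    }
  }

  // An unlabeled `break` targets the innermost loop or switch;
  // a labeled `break` targets the innermost scope carrying that label.
  lookupBreak(label: string | null): number {
    for (const scope of [...this.scopes].reverse()) {
      const matches = label === null ? scope.kind !== "label" : scope.label === label;
      if (matches) {
        return scope.breakBlock;
      }
    }
    throw new Error("break has no enclosing target");
  }

  // `continue` may only target a loop, labeled or not.
  lookupContinue(label: string | null): number {
    for (const scope of [...this.scopes].reverse()) {
      if (scope.kind === "loop") {
        if (label === null || scope.label === label) {
          return scope.continueBlock;
        }
      } else if (label !== null && scope.label === label) {
        throw new Error("continue must target a loop");
      }
    }
    throw new Error("continue has no enclosing loop");
  }
}

// `outer: for (...) { switch (...) { ... } }` with block IDs chosen arbitrarily.
const scopes = new ScopeStack();
const targets = scopes.enter(
  { kind: "loop", label: "outer", continueBlock: 1, breakBlock: 2 },
  () =>
    scopes.enter({ kind: "switch", label: null, breakBlock: 3 }, () => ({
      plainBreak: scopes.lookupBreak(null), // 3: the switch
      labeledBreak: scopes.lookupBreak("outer"), // 2: the labeled loop
      plainContinue: scopes.lookupContinue(null), // 1: skips the switch
    })),
);
```

Keeping the scopes on a stack means a nested construct only has to push its own targets and pop them on exit, which is why the lookup walks from the innermost scope outward.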
| 78 | +### Core HIR Types |
| 79 | +- **BasicBlock**: Contains `instructions: Array<Instruction>`, `terminal: Terminal`, `preds: Set<BlockId>`, `phis: Set<Phi>`, `kind: BlockKind` |
| 80 | +- **Instruction**: Contains `id`, `lvalue` (Place), `value` (InstructionValue), `effects` (null initially), `loc` |
| 81 | +- **Terminal**: Block terminator - `if`, `branch`, `goto`, `return`, `throw`, `for`, `while`, `switch`, `ternary`, `logical`, etc. |
| 82 | +- **Place**: Reference to a value - `{kind: 'Identifier', identifier, effect, reactive, loc}` |
| 83 | +- **InstructionValue**: The operation - `LoadLocal`, `StoreLocal`, `CallExpression`, `BinaryExpression`, `FunctionExpression`, etc. |
| 84 | + |
| 85 | +### Block Kinds |
| 86 | +- `block` - Regular sequential block |
| 87 | +- `loop` - Loop header/test block |
| 88 | +- `value` - Block that produces a value (ternary/logical branches) |
| 89 | +- `sequence` - Sequence expression block |
| 90 | +- `catch` - Exception handler block |
| 91 | + |
| 92 | +## Edge Cases |
| 93 | + |
| 94 | +1. **Hoisting**: Forward references to `let`/`const`/`function` declarations emit `DeclareContext` before the reference, enabling correct temporal dead zone handling |
| 95 | + |
| 96 | +2. **Context Variables**: Variables captured by nested functions use `LoadContext`/`StoreContext` instead of `LoadLocal`/`StoreLocal` |
| 97 | + |
| 98 | +3. **For-of/For-in Loops**: Synthesize iterator instructions (`GetIterator`, `IteratorNext`, `NextPropertyOf`) |
| 99 | + |
| 100 | +4. **Optional Chaining**: Creates nested `OptionalTerminal` structures with short-circuit branches |
| 101 | + |
| 102 | +5. **Logical Expressions**: Create branching structures in which the left side is stored to a temporary and the right side is evaluated only if needed |
| 103 | + |
| 104 | +6. **Try/Catch**: Adds `MaybeThrowTerminal` after each instruction in try block, modeling potential control flow to handler |
| 105 | + |
| 106 | +7. **JSX in fbt**: Tracks `fbtDepth` counter to handle whitespace differently in fbt/fbs tags |
| 107 | + |
| 108 | +8. **Unsupported Syntax**: `var` declarations, `with` statements, inline `class` declarations, and `eval` each emit an appropriate error |
| 109 | + |
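The choice between local, context, and global loads (edge case 2) can be sketched as a lookup over binding information. This is an illustrative simplification: `FunctionScope`, `loadKind`, and `storeKind` are hypothetical names, the sets are assumed to be precomputed, and variables captured from an enclosing function are not modeled. The real pass resolves bindings through Babel's scope analysis.

```typescript
// Hypothetical, precomputed binding info for one function being lowered.
interface FunctionScope {
  declared: Set<string>; // variables declared inside this function
  capturedByNested: Set<string>; // subset of `declared` referenced by a nested function
}

type LoadKind = "LoadLocal" | "LoadContext" | "LoadGlobal";
type StoreKind = "StoreLocal" | "StoreContext";

// Reads: names not declared in the function fall back to a global load.
function loadKind(scope: FunctionScope, name: string): LoadKind {
  if (!scope.declared.has(name)) {
    return "LoadGlobal";
  }
  return scope.capturedByNested.has(name) ? "LoadContext" : "LoadLocal";
}

// Writes: this toy model only covers variables declared in the function.
function storeKind(scope: FunctionScope, name: string): StoreKind {
  if (!scope.declared.has(name)) {
    throw new Error(`${name} is not a local in this toy model`);
  }
  return scope.capturedByNested.has(name) ? "StoreContext" : "StoreLocal";
}

// function Component(x) { let count = 0; const inc = () => { count++; }; ... }
const componentScope: FunctionScope = {
  declared: new Set(["x", "count", "inc"]),
  capturedByNested: new Set(["count"]),
};
```

With this scope, `x` is read with `LoadLocal`, `count` with `LoadContext`, and an undeclared name such as `foo` with `LoadGlobal`.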
| 110 | +## TODOs |
| 111 | +- `returnTypeAnnotation: null, // TODO: extract the actual return type node if present` |
| 112 | +- `TODO(gsn): In the future, we could only pass in the context identifiers that are actually used by this function and its nested functions` |
| 113 | +- Multiple `// TODO remove type cast` in destructuring pattern handling |
| 114 | +- `// TODO: should JSX namespaced names be handled here as well?` |
| 115 | + |
| 116 | +## Example |
| 117 | +Input JavaScript: |
| 118 | +```javascript |
| 119 | +export default function foo(x, y) { |
| 120 | + if (x) { |
| 121 | + return foo(false, y); |
| 122 | + } |
| 123 | + return [y * 10]; |
| 124 | +} |
| 125 | +``` |
| 126 | + |
| 127 | +Output HIR (simplified): |
| 128 | +``` |
| 129 | +foo(<unknown> x$0, <unknown> y$1): <unknown> $12 |
| 130 | +bb0 (block): |
| 131 | + [1] <unknown> $6 = LoadLocal <unknown> x$0 |
| 132 | + [2] If (<unknown> $6) then:bb2 else:bb1 fallthrough=bb1 |
| 133 | + |
| 134 | +bb2 (block): |
| 135 | + predecessor blocks: bb0 |
| 136 | + [3] <unknown> $2 = LoadGlobal(module) foo |
| 137 | + [4] <unknown> $3 = false |
| 138 | + [5] <unknown> $4 = LoadLocal <unknown> y$1 |
| 139 | + [6] <unknown> $5 = Call <unknown> $2(<unknown> $3, <unknown> $4) |
| 140 | + [7] Return Explicit <unknown> $5 |
| 141 | + |
| 142 | +bb1 (block): |
| 143 | + predecessor blocks: bb0 |
| 144 | + [8] <unknown> $7 = LoadLocal <unknown> y$1 |
| 145 | + [9] <unknown> $8 = 10 |
| 146 | + [10] <unknown> $9 = Binary <unknown> $7 * <unknown> $8 |
| 147 | + [11] <unknown> $10 = Array [<unknown> $9] |
| 148 | + [12] Return Explicit <unknown> $10 |
| 149 | +``` |
| 150 | + |
| 151 | +Key observations: |
| 152 | +- The function has 3 basic blocks: entry (bb0), consequent (bb2), alternate/fallthrough (bb1) |
| 153 | +- The if statement creates an `IfTerminal` at the end of bb0 |
| 154 | +- Each branch ends with its own `ReturnTerminal` |
| 155 | +- All values are stored in temporaries (`$N`) or named identifiers (`x$0`, `y$1`) |
| 156 | +- Instructions and terminals share one sequential ID numbering (`[1]` to `[12]`) that continues across blocks |
| 157 | +- Types and effects are `<unknown>` at this stage (populated by later passes) |