Status: Closed
Labels: enhancement (New feature or request)
Summary
Add floating-point math intrinsic words that lower to MLIR math dialect operations (and ultimately to NVVM/libdevice intrinsics on GPU).
Words to implement
Unary operations
| Word | Stack effect | MLIR op | Description |
|---|---|---|---|
| `FEXP` | `( f -- f )` | `math.exp` | Exponential (e^x) |
| `FSQRT` | `( f -- f )` | `math.sqrt` | Square root |
| `FLOG` | `( f -- f )` | `math.log` | Natural logarithm |
| `FABS` | `( f -- f )` | `math.absf` | Absolute value |
| `FNEG` | `( f -- f )` | `arith.negf` | Negation |
Binary operations
| Word | Stack effect | MLIR op | Description |
|---|---|---|---|
| `FMAX` | `( f f -- f )` | `arith.maximumf` | Maximum of two floats |
| `FMIN` | `( f f -- f )` | `arith.minimumf` | Minimum of two floats |
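`arith.maximumf`/`arith.minimumf` use IEEE 754 maximum/minimum semantics, which propagate NaN (unlike, say, Python's built-in `max`, which returns whichever operand compares greater). A minimal Python sketch of the intended `FMAX` behavior, for illustration only:

```python
import math

def fmax(a: float, b: float) -> float:
    # IEEE 754 maximum semantics (as in arith.maximumf):
    # if either operand is NaN, the result is NaN.
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a > b else b
```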
Motivation
- `FEXP` + `FMAX`: Required for softmax (`exp(x - max)`) — blocks flash attention, transformer kernels, and any probability-based computation.
- `FSQRT`: Required for score scaling (`1/sqrt(d_k)`) in attention, and for normalization (LayerNorm, RMSNorm).
- `FLOG`: Log-space softmax, cross-entropy loss.
- `FABS`: Numerical stability checks, absolute error computation.
- `FNEG`: Cleaner than `0.0 SWAP F-`; needed for initializing accumulators to `-inf` patterns.
- `FMAX`/`FMIN`: Online reductions (running max/min across tiles), clamping values.
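The `exp(x - max)` trick behind the `FEXP` + `FMAX` requirement can be shown with a small numerically stable softmax sketch (plain Python, purely illustrative — a kernel written against these words would follow the same shape):

```python
import math

def softmax(xs):
    # Subtract the running max (FMAX) before exponentiating (FEXP)
    # so exp() never overflows, even for large logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]
```

Without the max subtraction, `math.exp(1000.0)` overflows; with it, the largest exponent argument is always 0.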
Implementation notes
- All follow the same pattern as existing float ops (`F+`, `F-`, etc.): bitcast i64↔f64 around the math op.
- Unary ops: pop one value, bitcast to f64, apply the math op, bitcast back, push the result.
- Binary ops (`FMAX`/`FMIN`): pop two values, bitcast both to f64, apply the op, bitcast the result back, push.
- MLIR's `math` dialect lowers to LLVM intrinsics, which NVVM maps to libdevice calls (e.g., `__nv_exp`, `__nv_sqrt`).
- `FNEG` uses `arith.negf` rather than the `math` dialect.
- `FMAX`/`FMIN` use `arith.maximumf`/`arith.minimumf` (IEEE 754 semantics: propagate NaN).
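The bitcast i64↔f64 pattern for a unary op can be emulated in plain Python with `struct` (illustrative only — the real lowering emits `arith.bitcast` around the `math` op on i64 stack cells):

```python
import math
import struct

def f64_to_i64(f: float) -> int:
    # Bitcast f64 -> i64: reinterpret the bits, no numeric conversion.
    return struct.unpack("<q", struct.pack("<d", f))[0]

def i64_to_f64(i: int) -> float:
    # Bitcast i64 -> f64: the inverse reinterpretation.
    return struct.unpack("<d", struct.pack("<q", i))[0]

def fsqrt(cell: int) -> int:
    # The described unary lowering: bitcast the popped i64 cell to f64,
    # apply the math op (here sqrt), bitcast back, push the result.
    return f64_to_i64(math.sqrt(i64_to_f64(cell)))
```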
Files to modify
- `include/warpforth/Dialect/Forth/ForthOps.td` — Define new ops
- `lib/Translation/ForthToMLIR/ForthToMLIR.cpp` — Parse words
- `lib/Conversion/ForthToMemRef/ForthToMemRef.cpp` — Add conversion patterns
- `test/Translation/Forth/` — Parser tests
- `test/Conversion/ForthToMemRef/` — Conversion tests
- `test/Pipeline/` — End-to-end pipeline tests
Priority
High — blocks flash attention and most non-trivial GPU compute kernels.
Related
- #10 — Warp-level primitives: shuffle and reductions (needed together for performant reductions)
- #11 — Tensor core / MMA intrinsics (complementary GPU capability)