Float math intrinsics: FEXP, FSQRT, FLOG, FABS, FNEG, FMAX, FMIN #42

@tetsuo-cpp

Summary

Add floating-point math intrinsic words that lower to MLIR math and arith dialect operations (and ultimately to NVVM/libdevice intrinsics on GPU).

Words to implement

Unary operations

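| Word | Stack effect | MLIR op | Description |
|-------|--------------|------------|---------------------|
| FEXP | ( f -- f ) | math.exp | Exponential (e^x) |
| FSQRT | ( f -- f ) | math.sqrt | Square root |
| FLOG | ( f -- f ) | math.log | Natural logarithm |
| FABS | ( f -- f ) | math.absf | Absolute value |
| FNEG | ( f -- f ) | arith.negf | Negation |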

Binary operations

| Word | Stack effect | MLIR op | Description |
|------|--------------|----------------|-----------------------|
| FMAX | ( f f -- f ) | arith.maximumf | Maximum of two floats |
| FMIN | ( f f -- f ) | arith.minimumf | Minimum of two floats |

Motivation

  • FEXP + FMAX: Required for softmax (exp(x - max)). Without them, flash attention, transformer kernels, and any probability-based computation are blocked.
  • FSQRT: Required for score scaling (1/sqrt(d_k)) in attention, and for normalization (LayerNorm, RMSNorm).
  • FLOG: Log-space softmax, cross-entropy loss.
  • FABS: Numerical stability checks, absolute error computation.
  • FNEG: Cleaner than 0.0 SWAP F-; also needed for patterns that initialize accumulators to -inf.
  • FMAX/FMIN: Online reductions (running max/min across tiles), clamping values.
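
To make the softmax dependency concrete, here is a minimal Python sketch of the numerically stable form exp(x - max) mentioned in the first bullet. It is a reference model only, not WarpForth code: stable_softmax is a hypothetical helper, and Python's max and math.exp stand in for FMAX and FEXP.

```python
import math

def stable_softmax(xs):
    """Softmax computed as exp(x - max) / sum(exp(x - max))."""
    # Running maximum: the role FMAX plays in an online reduction.
    m = -math.inf
    for x in xs:
        m = max(m, x)
    # Subtracting the maximum keeps every exponent <= 0,
    # so the exponential (FEXP's role) cannot overflow.
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = stable_softmax([1000.0, 1000.0])
```

Without the max subtraction, exp(1000.0) exceeds the f64 range, which is why the two words are needed together.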

Implementation notes

  • All follow the same pattern as existing float ops (F+, F-, etc.): bitcast i64↔f64 around the math op.
  • Unary ops: pop one value, bitcast to f64, apply math op, bitcast back, push result.
  • Binary ops (FMAX/FMIN): pop two values, bitcast both to f64, apply op, bitcast result back, push.
  • MLIR's math dialect lowers to LLVM intrinsics, which NVVM maps to libdevice calls (e.g., __nv_exp, __nv_sqrt).
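  • FNEG uses arith.negf rather than a math dialect op.
  • FMAX/FMIN use arith.maximumf/arith.minimumf, which propagate NaN: if either operand is NaN, the result is NaN (unlike arith.maxnumf/arith.minnumf, which return the non-NaN operand).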
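
The pop/bitcast/apply/bitcast/push pattern in the notes above can be modeled outside the compiler. The following is a minimal Python reference sketch, not the actual C++ conversion patterns. It assumes the data stack holds 64-bit integer cells and uses struct to reinterpret bits as f64. All helper names (f64_to_cell, cell_to_f64, unary, binary, fmax_nan, fmin_nan, run) are hypothetical and exist only for illustration.

```python
import math
import struct

def f64_to_cell(x):
    """Reinterpret an f64 as a signed 64-bit cell (models bitcast f64 -> i64)."""
    return struct.unpack("<q", struct.pack("<d", x))[0]

def cell_to_f64(cell):
    """Reinterpret a signed 64-bit cell as an f64 (models bitcast i64 -> f64)."""
    return struct.unpack("<d", struct.pack("<q", cell))[0]

def unary(stack, op):
    """( f -- f ): pop one cell, bitcast, apply op, bitcast back, push."""
    stack.append(f64_to_cell(op(cell_to_f64(stack.pop()))))

def binary(stack, op):
    """( f f -- f ): pop two cells, bitcast both, apply op, bitcast back, push."""
    b = cell_to_f64(stack.pop())  # top of stack
    a = cell_to_f64(stack.pop())
    stack.append(f64_to_cell(op(a, b)))

def fmax_nan(a, b):
    """NaN-propagating maximum; treats -0.0 as less than +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == b:  # also covers +0.0 versus -0.0
        return a if math.copysign(1.0, a) > 0 else b
    return a if a > b else b

def fmin_nan(a, b):
    """NaN-propagating minimum; treats -0.0 as less than +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == b:
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b

# Caveat: Python's math functions raise on domain or range errors
# (e.g. math.sqrt(-1.0)), whereas the MLIR ops produce NaN or inf.
WORDS = {
    "FEXP": lambda s: unary(s, math.exp),
    "FSQRT": lambda s: unary(s, math.sqrt),
    "FLOG": lambda s: unary(s, math.log),
    "FABS": lambda s: unary(s, math.fabs),
    "FNEG": lambda s: unary(s, lambda x: -x),
    "FMAX": lambda s: binary(s, fmax_nan),
    "FMIN": lambda s: binary(s, fmin_nan),
}

def run(words, inputs):
    """Push the float inputs as cells, execute the words, return floats."""
    stack = [f64_to_cell(x) for x in inputs]
    for w in words:
        WORDS[w](stack)
    return [cell_to_f64(c) for c in stack]
```

For example, run(["FMAX"], [1.0, 2.0]) leaves 2.0 on the stack, while run(["FMAX"], [math.nan, 2.0]) leaves NaN. A model like this could serve as an oracle for expected values in the pipeline tests.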

Files to modify

  1. include/warpforth/Dialect/Forth/ForthOps.td — Define new ops
  2. lib/Translation/ForthToMLIR/ForthToMLIR.cpp — Parse words
  3. lib/Conversion/ForthToMemRef/ForthToMemRef.cpp — Add conversion patterns
  4. test/Translation/Forth/ — Parser tests
  5. test/Conversion/ForthToMemRef/ — Conversion tests
  6. test/Pipeline/ — End-to-end pipeline tests

Priority

High — blocks flash attention and most non-trivial GPU compute kernels.

Related
