Summary
Add atomic memory operation words for safe concurrent updates from multiple threads.
Words to implement
Integer atomics
| Word | Stack effect | MLIR op | Description |
|---|---|---|---|
| `ATOMIC+` | `( n addr -- )` | `memref.atomic_rmw addi` | Atomic add (i64) |
| `ATOMIC-MAX` | `( n addr -- )` | `memref.atomic_rmw maxs` | Atomic signed max (i64) |
| `ATOMIC-MIN` | `( n addr -- )` | `memref.atomic_rmw mins` | Atomic signed min (i64) |
| `ATOMIC-AND` | `( n addr -- )` | `memref.atomic_rmw andi` | Atomic bitwise AND |
| `ATOMIC-OR` | `( n addr -- )` | `memref.atomic_rmw ori` | Atomic bitwise OR |
| `ATOMIC-XOR` | `( n addr -- )` | `memref.atomic_rmw xori` | Atomic bitwise XOR |
| `ATOMIC-XCHG` | `( n addr -- old )` | `memref.atomic_rmw assign` | Atomic exchange, returns old value |
| `ATOMIC-CAS` | `( expected new addr -- old )` | `memref.generic_atomic_rmw` | Compare-and-swap, returns old value |
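To make the stack effects concrete, here is a host-side C++ sketch of the intended semantics for a few of the integer words. It is a model only, not the MLIR lowering: `Cell`, the function names, and the choice of relaxed memory ordering are all assumptions made for illustration, and the issue itself does not specify an ordering.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one i64 memory cell addressed by `addr`.
using Cell = std::atomic<std::int64_t>;

// ATOMIC+ ( n addr -- ): add n to the cell and discard the old value.
inline void atomic_add(std::int64_t n, Cell &cell) {
  cell.fetch_add(n, std::memory_order_relaxed);
}

// ATOMIC-MAX ( n addr -- ): signed max, written as a CAS loop because
// std::atomic has no fetch_max before C++26.
inline void atomic_max(std::int64_t n, Cell &cell) {
  std::int64_t cur = cell.load(std::memory_order_relaxed);
  while (cur < n &&
         !cell.compare_exchange_weak(cur, n, std::memory_order_relaxed)) {
    // On failure `cur` holds the freshly observed value; retry while n is larger.
  }
}

// ATOMIC-XCHG ( n addr -- old ): store n and return the previous value.
inline std::int64_t atomic_xchg(std::int64_t n, Cell &cell) {
  return cell.exchange(n, std::memory_order_relaxed);
}

// ATOMIC-CAS ( expected new addr -- old ): store `desired` only if the cell
// equals `expected`; always return the value that was observed.
inline std::int64_t atomic_cas(std::int64_t expected, std::int64_t desired,
                               Cell &cell) {
  // On failure compare_exchange_strong overwrites `expected` with the current
  // value; on success `expected` already equals the old value.
  cell.compare_exchange_strong(expected, desired, std::memory_order_relaxed);
  return expected;
}

// Usage sketch: a single-threaded walk through the stack effects.
inline void demo_integer_atomics() {
  Cell cell{5};
  atomic_add(3, cell);                   // cell: 5 -> 8
  atomic_max(20, cell);                  // cell: 8 -> 20
  atomic_max(1, cell);                   // no change, 20 is already larger
  assert(atomic_xchg(7, cell) == 20);    // returns old value, cell is now 7
  assert(atomic_cas(7, 9, cell) == 7);   // match: swap succeeds, cell is now 9
  assert(atomic_cas(7, 11, cell) == 9);  // mismatch: cell stays 9
  assert(cell.load() == 9);
}
```

Returning the observed value from `atomic_cas` in both the success and failure case matches the `( expected new addr -- old )` stack effect: the caller compares `old` against `expected` to learn whether the swap happened.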
Float atomics
| Word | Stack effect | MLIR op | Description |
|---|---|---|---|
| `ATOMIC-F+` | `( f addr -- )` | `memref.atomic_rmw addf` | Atomic float add |
| `ATOMIC-FMAX` | `( f addr -- )` | `memref.atomic_rmw maximumf` | Atomic float max |
| `ATOMIC-FMIN` | `( f addr -- )` | `memref.atomic_rmw minimumf` | Atomic float min |
Motivation
- Multi-block reductions: When a reduction spans more than one thread block, the output must be accumulated atomically (e.g., `ATOMIC-F+` for partial sums, `ATOMIC-FMAX` for global max).
- Histogram / scatter patterns: Common GPU patterns where multiple threads update the same output location.
- Lock-free data structures: `ATOMIC-CAS` enables lock-free algorithms.
- Flash attention: Multi-block flash attention variants need atomic output accumulation.
Implementation notes
- Integer atomics: straightforward mapping to `memref.atomic_rmw` with the appropriate `arith::AtomicRMWKind`.
- Float atomics: values are i64 bit patterns on the stack, so bitcast to f64 before the atomic op. The address computation follows the same pattern as `!`/`F!`.
- `ATOMIC-CAS` is more complex: it needs `memref.generic_atomic_rmw` with a comparison body, or a direct lowering to an LLVM `cmpxchg`.
- NVVM has native support for all of these via PTX `atom.*` instructions.
- Consider starting with just `ATOMIC+` and `ATOMIC-F+` as the minimum viable set.
Files to modify
- `include/warpforth/Dialect/Forth/ForthOps.td`: Define new ops
- `lib/Translation/ForthToMLIR/ForthToMLIR.cpp`: Parse words
- `lib/Conversion/ForthToMemRef/ForthToMemRef.cpp`: Add conversion patterns
- `test/Translation/Forth/`: Parser tests
- `test/Conversion/ForthToMemRef/`: Conversion tests
Priority
Medium — needed for multi-block reductions and scatter patterns. Not required for single-block kernels.
Related
- #42, Float math intrinsics: FEXP, FSQRT, FLOG, FABS, FNEG, FMAX, FMIN (FMAX/FMIN needed alongside ATOMIC-FMAX/ATOMIC-FMIN)
- #10, Warp-level primitives: shuffle and reductions (warp reductions reduce the need for atomics)