Atomic memory operations #43

@tetsuo-cpp

Summary

Add atomic memory operation words for safe concurrent updates from multiple threads.

Words to implement

Integer atomics

| Word | Stack effect | MLIR op | Description |
|---|---|---|---|
| ATOMIC+ | ( n addr -- ) | memref.atomic_rmw addi | Atomic add (i64) |
| ATOMIC-MAX | ( n addr -- ) | memref.atomic_rmw maxs | Atomic signed max (i64) |
| ATOMIC-MIN | ( n addr -- ) | memref.atomic_rmw mins | Atomic signed min (i64) |
| ATOMIC-AND | ( n addr -- ) | memref.atomic_rmw andi | Atomic bitwise AND |
| ATOMIC-OR | ( n addr -- ) | memref.atomic_rmw ori | Atomic bitwise OR |
| ATOMIC-XOR | ( n addr -- ) | memref.atomic_rmw xori | Atomic bitwise XOR |
| ATOMIC-XCHG | ( n addr -- old ) | memref.atomic_rmw assign | Atomic exchange, returns old value |
| ATOMIC-CAS | ( expected new addr -- old ) | memref.generic_atomic_rmw | Compare-and-swap, returns old value |
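
To make the stack effects above concrete, here is a minimal host-side model in C++. It is only a sketch: the `Machine` struct and the `word_*` functions are hypothetical names invented for illustration, addresses are modeled as plain cell indices, and `std::atomic` stands in for the memref atomics. It is not WarpForth code and says nothing about how the real lowering is structured.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: an i64 data stack plus a small array of atomic i64 cells.
struct Machine {
    std::vector<std::int64_t> stack;
    std::vector<std::atomic<std::int64_t>> mem;

    explicit Machine(std::size_t cells) : mem(cells) {
        for (auto& c : mem) c.store(0);  // zero the cells explicitly
    }
    std::int64_t pop() {
        std::int64_t v = stack.back();
        stack.pop_back();
        return v;
    }
    void push(std::int64_t v) { stack.push_back(v); }
    std::atomic<std::int64_t>& cell(std::int64_t addr) {
        return mem[static_cast<std::size_t>(addr)];
    }
};

// ATOMIC+ ( n addr -- ): add n to the cell, leave nothing on the stack.
void word_atomic_add(Machine& m) {
    std::int64_t addr = m.pop();
    std::int64_t n = m.pop();
    m.cell(addr).fetch_add(n);
}

// ATOMIC-MAX ( n addr -- ): signed max, written as a CAS loop because
// std::atomic has no fetch_max before C++26.
void word_atomic_max(Machine& m) {
    std::int64_t addr = m.pop();
    std::int64_t n = m.pop();
    std::int64_t cur = m.cell(addr).load();
    while (cur < n && !m.cell(addr).compare_exchange_weak(cur, n)) {
        // a failed CAS refreshes cur; retry only while n is still larger
    }
}

// ATOMIC-XCHG ( n addr -- old ): store n, push the previous value.
void word_atomic_xchg(Machine& m) {
    std::int64_t addr = m.pop();
    std::int64_t n = m.pop();
    m.push(m.cell(addr).exchange(n));
}

// ATOMIC-CAS ( expected new addr -- old ): store new only if the cell holds
// expected; always push the value that was observed in the cell.
void word_atomic_cas(Machine& m) {
    std::int64_t addr = m.pop();
    std::int64_t desired = m.pop();
    std::int64_t expected = m.pop();
    // On failure, compare_exchange_strong writes the observed value into
    // expected; on success expected already equals the old value.
    m.cell(addr).compare_exchange_strong(expected, desired);
    m.push(expected);
}
```

In this model, `5 2 ATOMIC+` corresponds to pushing 5 and 2 and calling `word_atomic_add`, after which cell 2 has grown by 5 and the stack is empty. The caller of ATOMIC-CAS can tell whether the swap happened by comparing the returned old value with the expected value it passed in.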

Float atomics

| Word | Stack effect | MLIR op | Description |
|---|---|---|---|
| ATOMIC-F+ | ( f addr -- ) | memref.atomic_rmw addf | Atomic float add |
| ATOMIC-FMAX | ( f addr -- ) | memref.atomic_rmw maximumf | Atomic float max |
| ATOMIC-FMIN | ( f addr -- ) | memref.atomic_rmw minimumf | Atomic float min |

Motivation

  • Multi-block reductions: When a reduction spans more than one thread block, the output must be accumulated atomically (e.g., ATOMIC-F+ for partial sums, ATOMIC-FMAX for global max).
  • Histogram / scatter patterns: Common GPU patterns where multiple threads update the same output location.
  • Lock-free data structures: ATOMIC-CAS enables lock-free algorithms.
  • Flash attention: Multi-block flash attention variants need atomic output accumulation.

Implementation notes

  • Integer atomics: straightforward mapping to memref.atomic_rmw with the appropriate arith::AtomicRMWKind.
  • Float atomics: values are i64 bit patterns on the stack, so bitcast to f64 before the atomic op. The address computation follows the same pattern as ! / F!.
  • ATOMIC-CAS is more complex: needs memref.generic_atomic_rmw with a comparison body, or lower directly to an LLVM cmpxchg.
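  • NVVM has native support for most of these via PTX atom.* instructions (integer add, min, max, and, or, xor, exch, cas, and floating-point add). Atomic f64 max/min appears to have no single atom.* instruction, so ATOMIC-FMAX and ATOMIC-FMIN may have to lower to a CAS loop.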
  • Consider starting with just ATOMIC+ and ATOMIC-F+ as the minimum viable set.
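
The float bitcast and the "comparison body" idea in the notes above can be sketched on the host in C++. This is an illustration under assumptions, not the proposed lowering: all function names here (`bits_to_f64`, `f64_to_bits`, `generic_rmw`, `atomic_fadd`, `atomic_fmax`, `atomic_cas`) are hypothetical, the memory cell is modeled as a `std::atomic<std::int64_t>` holding the raw bit pattern, and NaN handling for the max case is deliberately ignored. The `generic_rmw` helper plays a role loosely analogous to memref.generic_atomic_rmw: the caller supplies a body that computes the new value from the old one, and the helper retries until the update lands.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Bitcast between an i64 stack cell and f64 (reinterprets the bits; this is
// not a numeric conversion).
inline double bits_to_f64(std::int64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
inline std::int64_t f64_to_bits(double d) {
    std::int64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Generic read-modify-write: body maps the old bit pattern to the new one.
// Returns the old bit pattern that the successful update replaced.
template <typename Body>
std::int64_t generic_rmw(std::atomic<std::int64_t>& cell, Body body) {
    std::int64_t old_bits = cell.load();
    while (!cell.compare_exchange_weak(old_bits, body(old_bits))) {
        // a failed CAS refreshes old_bits; body is re-evaluated on retry
    }
    return old_bits;
}

// Model of ATOMIC-F+: bitcast both operands to f64, add, bitcast back.
inline void atomic_fadd(std::atomic<std::int64_t>& cell, std::int64_t f_bits) {
    generic_rmw(cell, [f_bits](std::int64_t old_bits) {
        return f64_to_bits(bits_to_f64(old_bits) + bits_to_f64(f_bits));
    });
}

// Model of ATOMIC-FMAX (NaN propagation is not modeled in this sketch).
inline void atomic_fmax(std::atomic<std::int64_t>& cell, std::int64_t f_bits) {
    generic_rmw(cell, [f_bits](std::int64_t old_bits) {
        double a = bits_to_f64(old_bits);
        double b = bits_to_f64(f_bits);
        return f64_to_bits(a > b ? a : b);
    });
}

// Model of ATOMIC-CAS with a comparison body: keep the old value unless it
// equals expected. Returns the old value either way.
inline std::int64_t atomic_cas(std::atomic<std::int64_t>& cell,
                               std::int64_t expected, std::int64_t desired) {
    return generic_rmw(cell, [=](std::int64_t old_bits) {
        return old_bits == expected ? desired : old_bits;
    });
}
```

Expressing all three through one retry loop shows why a generic read-modify-write is the fallback whenever the target lacks a dedicated instruction; where a native atomic exists, a direct mapping would avoid the retry loop.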

Files to modify

  1. include/warpforth/Dialect/Forth/ForthOps.td — Define new ops
  2. lib/Translation/ForthToMLIR/ForthToMLIR.cpp — Parse words
  3. lib/Conversion/ForthToMemRef/ForthToMemRef.cpp — Add conversion patterns
  4. test/Translation/Forth/ — Parser tests
  5. test/Conversion/ForthToMemRef/ — Conversion tests

Priority

Medium — needed for multi-block reductions and scatter patterns. Not required for single-block kernels.
