Skip to content

Comments

feat: add shared memory support via \! shared header directive#34

Merged
tetsuo-cpp merged 2 commits intocanonfrom
shared-memory
Feb 19, 2026
Merged

feat: add shared memory support via \! shared header directive#34
tetsuo-cpp merged 2 commits intocanonfrom
shared-memory

Conversation

@tetsuo-cpp
Copy link
Owner

@tetsuo-cpp tetsuo-cpp commented Feb 19, 2026

Closes #4

Summary

  • Add \! shared <name> i64[<N>] header directive for declaring GPU shared (workgroup) memory
  • ForthToMLIR emits tagged memref.alloca at kernel entry; using the shared name pushes its base address onto the stack
  • ForthToGPU converts tagged allocas to gpu.func workgroup attributions (memref<Nxi64, #gpu.address_space<workgroup>>)

Test plan

  • Translation test: shared alloca emission with forth.shared_name attr and pointer push sequence
  • Translation error test: shared name rejected inside word definitions
  • Translation error test: param/shared cross-namespace name collision
  • ForthToGPU conversion test: alloca replaced by workgroup attribution
  • End-to-end pipeline test: full pipeline produces gpu.binary, intermediate check for workgroup attribution
  • All 72 tests pass

Copy link
Owner Author

@tetsuo-cpp tetsuo-cpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Clean, well-structured PR. The design of using memref.alloca + marker attribute as IR, then converting to workgroup attributions in ForthToGPU, is a nice approach that avoids a new dialect op.

Issues

1. Stale TODOForthToMLIR.h:29

/// TODO: Not yet consumed — scaffolding for shared memory support.
struct SharedDecl {

SharedDecl is now fully consumed — this TODO should be removed.

2. Comment says "three transformations" but lists twoForthToGPU.cpp:161-163

// block, with three transformations:
// - func.return → gpu.return
// - shared memref.alloca → gpu.func workgroup attribution

Should say "two transformations" (or add the implicit third: everything else gets cloned).

3. Scalar shared declarations silently accepted
The shared parsing path (\! shared X i64, no array suffix) is accepted since it shares code with \! param. A scalar shared variable is unusual — consider rejecting it or documenting it. CLAUDE.md currently only documents the array form.

4. No test for param/shared cross-namespace name collision
The duplicate detection correctly catches e.g. \! param A i64[4] followed by \! shared A i64[8], but no test exercises this path. Only param-param collision is tested (header-duplicate-param-error.forth).

What looks good

  • Clean separation: ForthToMLIR handles parsing/emission, ForthToGPU handles address space conversion
  • IRMapping approach in ForthToGPU elegantly maps alloca → workgroup BlockArgument so all downstream uses automatically see the workgroup-typed memref
  • Good error message for shared refs inside word definitions
  • Solid test coverage across translation, conversion, and pipeline layers

@tetsuo-cpp tetsuo-cpp merged commit fcfbdcd into canon Feb 19, 2026
1 check passed
@tetsuo-cpp tetsuo-cpp deleted the shared-memory branch February 19, 2026 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Shared memory support for GPU scratchpad

1 participant