feat: add shared memory support via \! shared header directive #34
tetsuo-cpp merged 2 commits into canon
Conversation
tetsuo-cpp left a comment
Code Review
Clean, well-structured PR. The design of using memref.alloca + marker attribute as IR, then converting to workgroup attributions in ForthToGPU, is a nice approach that avoids a new dialect op.
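To make the flow concrete, here is a minimal before/after sketch, assuming a single-block kernel named `@kernel` and an 8-element shared buffer `X`; only the `forth.shared_name` marker, the two rewrites, and the workgroup attribution type come from the PR, everything else is illustrative:

```mlir
// ForthToMLIR output (sketch): the shared buffer is an ordinary alloca at
// kernel entry, tagged with the forth.shared_name marker attribute.
func.func @kernel() {
  %X = memref.alloca() {forth.shared_name = "X"} : memref<8xi64>
  // ... kernel body uses %X like any other memref ...
  func.return
}

// After ForthToGPU (sketch): the tagged alloca becomes a workgroup
// attribution, func.return becomes gpu.return, and the rest of the body
// is cloned unchanged.
gpu.module @kernels {
  gpu.func @kernel() workgroup(%X : memref<8xi64, #gpu.address_space<workgroup>>) kernel {
    // ... same body, now using the workgroup-typed %X ...
    gpu.return
  }
}
```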
Issues
1. Stale TODO — ForthToMLIR.h:29
    /// TODO: Not yet consumed — scaffolding for shared memory support.
    struct SharedDecl {

SharedDecl is now fully consumed — this TODO should be removed.
2. Comment says "three transformations" but lists two — ForthToGPU.cpp:161-163
    // block, with three transformations:
    // - func.return → gpu.return
    // - shared memref.alloca → gpu.func workgroup attribution

Should say "two transformations" (or add the implicit third: everything else gets cloned).
3. Scalar shared declarations silently accepted
The shared parsing path accepts `\! shared X i64` (no array suffix) because it shares code with `\! param`. A scalar shared variable is unusual; consider rejecting it or documenting it. CLAUDE.md currently only documents the array form.
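For reference, the two forms side by side (a sketch; the identifier names are illustrative, the directive grammar is taken from the PR description):

```forth
\ Array form (documented in CLAUDE.md): declares an 8-element workgroup buffer.
\! shared buf i64[8]

\ Scalar form: currently accepted because it reuses the \! param parsing path,
\ but undocumented and of questionable use.
\! shared flag i64
```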
4. No test for param/shared cross-namespace name collision
The duplicate detection correctly catches e.g. `\! param A i64[4]` followed by `\! shared A i64[8]`, but no test exercises this path. Only param-param collision is tested (header-duplicate-param-error.forth).
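A minimal regression test could look like the following (the file name and expected-error wording are placeholders modeled on header-duplicate-param-error.forth, not existing artifacts):

```forth
\ header-duplicate-param-shared-error.forth (hypothetical file name)
\ Expect: duplicate header declaration error for A.
\! param A i64[4]
\! shared A i64[8]
```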
What looks good
- Clean separation: ForthToMLIR handles parsing/emission, ForthToGPU handles address space conversion
- `IRMapping` approach in ForthToGPU elegantly maps alloca → workgroup BlockArgument so all downstream uses automatically see the workgroup-typed memref (see the sketch after this list)
- Good error message for shared refs inside word definitions
- Solid test coverage across translation, conversion, and pipeline layers
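A rough illustration of that `IRMapping` flow, not the PR's actual code: the function name, variable names, and the assumption that workgroup attributions were created in declaration order are all mine.

```cpp
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Dialect/GPU/IR/GPUDialect.h"
#include "mlir/Dialect/MemRef/IR/MemRef.h"
#include "mlir/IR/Builders.h"
#include "mlir/IR/IRMapping.h"
#include "llvm/ADT/STLExtras.h"

using namespace mlir;

// Sketch: clone a func.func body into a pre-built gpu.func, mapping each
// alloca tagged with "forth.shared_name" to the matching workgroup
// attribution so every downstream use sees the workgroup-typed memref.
static void cloneBodyIntoGpuFunc(func::FuncOp srcFunc, gpu::GPUFuncOp gpuFunc) {
  IRMapping mapping;
  Block &srcEntry = srcFunc.getBody().front();

  // gpu.func entry-block arguments start with the function arguments;
  // llvm::zip stops at the shorter range, so attributions are skipped here.
  for (auto [srcArg, dstArg] :
       llvm::zip(srcEntry.getArguments(), gpuFunc.getArguments()))
    mapping.map(srcArg, dstArg);

  // Map each tagged alloca result to its workgroup attribution (assumes the
  // attributions were added in the same order as the shared declarations).
  unsigned attrIdx = 0;
  for (auto alloca : srcEntry.getOps<memref::AllocaOp>())
    if (alloca->hasAttr("forth.shared_name"))
      mapping.map(alloca.getResult(),
                  gpuFunc.getWorkgroupAttributions()[attrIdx++]);

  OpBuilder builder = OpBuilder::atBlockEnd(&gpuFunc.getBody().front());
  for (Operation &op : srcEntry) {
    if (isa<memref::AllocaOp>(op) && op.hasAttr("forth.shared_name"))
      continue; // replaced by the workgroup attribution above
    if (isa<func::ReturnOp>(op)) {
      builder.create<gpu::ReturnOp>(op.getLoc()); // func.return → gpu.return
      continue;
    }
    builder.clone(op, mapping); // everything else is cloned as-is
  }
}
```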
Closes #4
Summary
- `\! shared <name> i64[<N>]` header directive for declaring GPU shared (workgroup) memory
- Emitted as a `memref.alloca` at kernel entry; using the shared name pushes its base address onto the stack
- ForthToGPU converts these allocas to `gpu.func` workgroup attributions (`memref<Nxi64, #gpu.address_space<workgroup>>`)
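For example (illustrative only; the identifier name and everything outside the header directive are not from the PR):

```forth
\! shared scratch i64[64]
\ Later in the kernel body, writing scratch pushes the base address of
\ the 64-element workgroup buffer onto the stack.
```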
Test plan
- Translation test checks the `forth.shared_name` attr and the pointer push sequence
- Pipeline test lowers to `gpu.binary`, with an intermediate check for the workgroup attribution