Skip to content

Deeploy-GAP9 Platform#143

Open
runwangdl wants to merge 48 commits intopulp-platform:develfrom
runwangdl:gap9-operators-github
Open

Deeploy-GAP9 Platform#143
runwangdl wants to merge 48 commits intopulp-platform:develfrom
runwangdl:gap9-operators-github

Conversation

@runwangdl
Copy link
Contributor

@runwangdl runwangdl commented Dec 17, 2025

Summary

This PR adds complete GAP9 platform support to Deeploy, including platform integration, DMA support, tiling capabilities, CI/CD workflows, and comprehensive testing infrastructure. This represents 20 commits specifically focused on GAP9 development.

Added

GAP9 Platform Support

  • Initial GAP9 platform integration with full deployer, bindings, and platform configuration (Deeploy/Targets/GAP9/)
  • GAP9 DMA support with L3 DMA and Mchan DMA implementations
  • GAP9-specific memory allocation and free templates
  • GAP9 tiling support for L3 memory
  • GAP9 CI/CD workflows (.github/workflows/_runner-gap9.yml, .github/workflows/ci-platform-gap9.yml, .github/workflows/ci-platform-gap9-tiled.yml)
  • Link to PULP-NN, PULP kernels, and Math libraries for GAP9
  • GAP9 SDK configuration with cluster stack macros
  • GAP9 GVSoC simulation support

Changed

  • Minimally modified PULP kernel syntax to fix GAP9 compiler issues. Changes are minimal and maintain compatibility with PULP kernels with GAP9 GCC toolchain-specific requirements
    - Transpose operator: Fixed GCC segmentation fault caused by template syntax (commit 9ca4595)
    - LayerNorm operator: Resolved epsilon ABI compatibility issue (commit 6b5c2e5)

Known Limitations

  • L3-L2 Async DMA - Currently synchronous; async blocked by Siracusa inheritance
  • NE16 Accelerator - Not yet integrated
  • AutoTiler DW/PW - GAP9 SDK AutoTiler kernels not integrated
  • GAP9 Float Math - Limited coverage (affects RMSNorm, etc.)

Platform Capabilities

✅ Multi-core (1-8) | ✅ L1/L2/L3 memory | ✅ Multi-channel DMA
✅ GVSoC simulation | ✅ Tiling | ✅ PULP-NN integration

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR reviewed and approved.
  3. All checks are passing.
  4. The CHANGELOG.md file has been updated.
  5. If the docker was modified, change back its link after review.

Xeratec and others added 24 commits November 13, 2025 22:33
- Enable pulling from private GitLab repo
- Improve caching for pip, apt and cargo
- Fix cMake version
- Remove problematic pip installation in favor of apt package
- Add ZSH an Oh My ZSH
- Add package dependencies for GAP9 SDK
- Remove unused files from the container
- Fix banshee package problems
… issueFix duplicate template generation due to PULP inheritance issue
@Xeratec Xeratec added the Feature Addition of new features label Dec 24, 2025
@Xeratec Xeratec added this to Deeploy Dec 24, 2025
@Xeratec Xeratec added this to the Release 0.3.0 milestone Dec 24, 2025
@Xeratec Xeratec moved this to In progress in Deeploy Dec 24, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@TargetLibraries/GAP9/prebuilt/include/pulp_nn_utils.h`:
- Around line 1616-1633: The tail loop uses the variable left as the loop guard
but erroneously decrements length inside the loop, causing an infinite loop; in
the loop in pulp_nn_utils (the block that extracts nibbles with bitextu from
*pIn and *pCom, computes inA0/inA1/inB0/inB1, and writes back with bitins)
replace the decrement of length (length--) with a decrement of left (left--) so
the while(left > 0u) loop terminates correctly while leaving pIn, pCom, and
length semantics intact.
🧹 Nitpick comments (1)
TargetLibraries/GAP9/prebuilt/include/pulp_nn_utils.h (1)

857-865: Existing TODO indicates potential correctness issue for non-multiple-of-8 channel sizes.

The TODO comment warns that this code path may be incorrect when the number of channels is not a multiple of 8. The leftover loop decrements lfover -= 8 which could cause issues or skip elements if lfover is between 1-7.

Would you like me to open an issue to track fixing this limitation, or is this an acceptable constraint documented elsewhere?

- Add GAP9 tiled L3 memory mode tests (singlebuffer/doublebuffer)
- Fix duplicate NUM_CORES definition (global → target-specific)
- Add prebuilt library NUM_CORES validation with auto-fallback
- Fix hex file count reporting (CMAKE_MATCH_COUNT → list LENGTH)
- Reduce dangerous warning suppressions in deeploygap9
- Standardize math.h include in BatchNorm_fp32.c
- Update GVSOC_INSTALL_DIR error messages (env var → CMake var)
- Add stack size configuration comments in sdk.config
- Remove dead code in deeployRunner_tiled_gap9.py
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@cmake/gap9/gap9_gvsoc.cmake`:
- Around line 18-21: The guard currently uses if(NOT DEFINED GAP9_SDK_HOME)
which won't catch an empty environment value because set(GAP9_SDK_HOME
$ENV{GAP_SDK_HOME}) always creates the CMake variable; change the check to test
for emptiness or the environment directly: either replace the guard with if(NOT
GAP9_SDK_HOME) after the set, or check the environment before setting with
if(NOT DEFINED ENV{GAP_SDK_HOME}) and then set(GAP9_SDK_HOME
$ENV{GAP_SDK_HOME}); update the error message path check around GAP9_SDK_HOME
accordingly so an empty or missing SDK path triggers the FATAL_ERROR.

In `@cmake/simulation.cmake`:
- Around line 84-95: The add_custom_target for gvsoc_${name} currently uses a
COMMAND that includes shell-specific constructs (wildcard *.bin and "|| true")
which are not handled because add_custom_target does not run a shell; replace
the copy COMMAND so it runs via an explicit shell at build time (e.g., invoke sh
-c "...") or implement a build-time CMake script invoked by a COMMAND that
checks for and copies any *.bin files, ensuring the wildcard expansion and the
"|| true" fallback happen at build execution rather than configure time; update
the COMMAND line in the add_custom_target definition (the one referencing
${CMAKE_COMMAND} -E copy_if_different ${CMAKE_BINARY_DIR}/*.bin
${GVSOC_WORKDIR}/ || true) to use the shell wrapper or the build-time script so
the copy is evaluated correctly.

In `@Deeploy/Targets/GAP9/Platform.py`:
- Around line 275-306: The untiledOps lists in GAP9Platform and
MemoryGAP9PlatformWrapper currently contain the lowercase string "add", which
never matches ONNX node.op values like "Add"; update the untiledOps definitions
(in both GAP9Platform.untiledOps and MemoryGAP9PlatformWrapper.untiledOps) to
use the correct ONNX name "Add" (or implement a case-insensitive check in
getTargetMemoryLevel by normalizing node.op) so the getTargetMemoryLevel methods
properly detect Add nodes and return ctxt.lookup(tensorName)._memoryLevel.

In `@DeeployTest/Platforms/GAP9/sdk.config`:
- Around line 5-6: The config currently enables two boards
(CONFIG_BOARD_GAP9MOD_V1_0_B and CONFIG_BOARD_GAP9EVK_V1_3) which conflicts with
the build target hardcoded as --target=gap9.evk in gap9_gvsoc.cmake; update
sdk.config to select only the EVK board to match the CMake target (remove or set
to n the CONFIG_BOARD_GAP9MOD_V1_0_B entry) or alternatively update
gap9_gvsoc.cmake to accept the MOD board if intended—ensure only one of
CONFIG_BOARD_GAP9MOD_V1_0_B or CONFIG_BOARD_GAP9EVK_V1_3 remains enabled to
remove ambiguity.
🧹 Nitpick comments (3)
CMakeLists.txt (2)

36-40: Guard GAP_SDK_HOME before include to avoid opaque configure failures.
If the env var is missing, include() fails with a confusing path error. A direct guard makes the failure explicit.

🔧 Suggested patch
 elseif(platform STREQUAL GAP9)
   message(STATUS "Building for platform 'GAP9'")
   set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})
   set(ENV{KCONFIG_CONFIG} DeeployTest/Platforms/GAP9/sdk.config)
-  include($ENV{GAP_SDK_HOME}/utils/cmake/setup.cmake)
+  if(NOT DEFINED ENV{GAP_SDK_HOME})
+    message(FATAL_ERROR "GAP_SDK_HOME is not set")
+  endif()
+  include($ENV{GAP_SDK_HOME}/utils/cmake/setup.cmake)

225-227: Avoid undefined TESTNAME in GAP9 project setup.
If TESTNAME isn’t provided, project(${TESTNAME} ...) can fail at configure time. Consider a safe default or explicit error.

🔧 Suggested patch
 if(platform STREQUAL GAP9)
-  project(${TESTNAME} LANGUAGES C ASM)
+  if(NOT DEFINED TESTNAME)
+    set(TESTNAME deeploy)
+  endif()
+  project(${TESTNAME} LANGUAGES C ASM)
Deeploy/Targets/GAP9/Platform.py (1)

264-286: Avoid mutable defaults and instantiated objects in __init__ parameters.
Using a list literal with GAP9ClusterEngine(...) shares state across instances.

🔧 Suggested patch
 class GAP9Platform(DeploymentPlatform):
 
     def __init__(self,
-                 engines = [GAP9ClusterEngine("GAP9Cluster")],
+                 engines = None,
                  variableBuffer = GAP9VariableBuffer,
                  constantBuffer = GAP9ConstantBuffer,
                  structBuffer = GAP9StructBuffer,
                  transientBuffer = GAP9TransientBuffer) -> None:
+        if engines is None:
+            engines = [GAP9ClusterEngine("GAP9Cluster")]
         super().__init__(engines, variableBuffer, constantBuffer, structBuffer, transientBuffer)
@@
 class MemoryGAP9Platform(MemoryPlatform):
 
     def __init__(self,
                  memoryHierarchy: MemoryHierarchy,
                  defaultTargetMemoryLevel: MemoryLevel,
-                 engines = [GAP9ClusterEngine("GAP9Cluster")],
+                 engines = None,
                  variableBuffer = GAP9VariableBuffer,
                  constantBuffer = GAP9ConstantBuffer,
                  structBuffer = GAP9StructBuffer,
                  transientBuffer = GAP9TransientBuffer) -> None:
+        if engines is None:
+            engines = [GAP9ClusterEngine("GAP9Cluster")]
         super().__init__(memoryHierarchy, defaultTargetMemoryLevel, engines, variableBuffer, constantBuffer,
                          structBuffer, transientBuffer)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/_runner-gap9.yml:
- Around line 38-42: The workflow step named "Cache ccache" currently uses
actions/cache/restore@v4 to only restore /app/.ccache (key ccache-ci) but never
saves updated entries; update this by either replacing uses:
actions/cache/restore@v4 with uses: actions/cache@v4 so restore+save are handled
automatically, or add a matching post step that uses actions/cache/save@v4 with
the same path (/app/.ccache) and key (ccache-ci) so that ccache entries produced
during runs (CCACHE_DIR=/app/.ccache) are persisted across runs.

The prebuilt pulp-nn library headers already have Apache 2.0 licenses
but lack SPDX identifiers. Since these are external prebuilt binaries
(similar to third_party code), they should be excluded from SPDX
license validation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@runwangdl runwangdl requested a review from Xeratec February 4, 2026 23:50
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @.github/workflows/infra-generate-ccache-gap9.yml:
- Around line 16-17: The comment stating the cron schedule is CET is incorrect
because GitHub Actions uses UTC; update the comment above the cron entry that
currently reads the schedule as "0 2 * * *" so it accurately states the timezone
(UTC) or converts the time to CET if you want a 02:00 CET run (e.g., note that
"0 2 * * *" runs at 02:00 UTC / 03:00 CET). Keep the cron line "- cron: \"0 2 *
* *\"" as-is if you intend UTC and change the explanatory comment to mention
UTC, or adjust the cron expression and comment together if you want actual CET
scheduling.
- Around line 50-54: The cache step using actions/cache@v4 currently uses a
static key "ccache-gap9" which prevents updates; update the step that defines
path: /app/.ccache and key: ccache-gap9 to generate a unique, versioned key
(e.g., include github.run_id or a hashFiles() call) and add a restore-keys entry
so older caches can be used as fallbacks; specifically modify the "Clean and
Upload CCache" step to compute a dynamic key (for example by combining a prefix
like "ccache-gap9-" with hashFiles('**/ccache-config') or github.run_id) and add
restore-keys: ccache-gap9- to allow partial restores.

In `@Deeploy/Targets/GAP9/Platform.py`:
- Around line 202-218: The _bufferRepresentation output in GAP9TransientBuffer
uses self._type while GAP9VariableBuffer uses self._instance, causing
inconsistent type/property serialization; inspect the classes
GAP9TransientBuffer and GAP9VariableBuffer and decide on the canonical attribute
name (either _type or _instance), then update
GAP9TransientBuffer._bufferRepresentation (or GAP9VariableBuffer) to use the
chosen attribute consistently and add a fallback (e.g., check hasattr and
default to None) to avoid AttributeError when the attribute is absent; ensure
the updated attribute name matches how instances are initialized elsewhere in
the class constructors.
- Around line 294-306: MemoryGAP9PlatformWrapper defines untiledOps = [] which
is inconsistent with MemoryGAP9Platform (which sets untiledOps = ["add"]) and
renders the conditional in getTargetMemoryLevel dead; update
MemoryGAP9PlatformWrapper.untiledOps to match the platform (set untiledOps to
["add"]) following the pattern used in MemoryPULPPlatformWrapper so the check in
getTargetMemoryLevel(node, tensorName, ctxt) behaves as intended.
🧹 Nitpick comments (4)
cmake/simulation.cmake (1)

72-78: Minor stylistic inconsistency with gvsoc_flags_add_files_to_hyperflash.

The hyperflash version (lines 63-70) uses list(TRANSFORM) while this uses a foreach loop. Both are correct, but aligning the implementation style would improve maintainability.

♻️ Optional: align with hyperflash version using list(TRANSFORM)
 function(gvsoc_flags_add_files_to_flash out_var files_var)
-	set(flags)
-	foreach(file ${${files_var}})
-		list(APPEND flags "--flash-property=${file}@flash:readfs_flash:files")
-	endforeach()
+	set(flags ${${files_var}})
+	list(TRANSFORM flags PREPEND "--flash-property=")
+	list(TRANSFORM flags APPEND "@flash:readfs_flash:files")
 	set(${out_var} ${flags} PARENT_SCOPE)
 endfunction()
Deeploy/Targets/GAP9/Platform.py (2)

221-237: LGTM!

Properly extends parent class buffer representation. The same getattr simplification could be applied here for consistency:

memoryLevel = getattr(self, "_memoryLevel", None)

262-270: Avoid mutable default argument with function call.

The default engines = [GAP9ClusterEngine("GAP9Cluster")] is evaluated once at function definition time. While unlikely to cause issues in practice, this pattern can lead to subtle bugs if the list is modified.

♻️ Suggested fix
 class GAP9Platform(DeploymentPlatform):

     def __init__(self,
-                 engines = [GAP9ClusterEngine("GAP9Cluster")],
+                 engines = None,
                  variableBuffer = GAP9VariableBuffer,
                  constantBuffer = GAP9ConstantBuffer,
                  structBuffer = GAP9StructBuffer,
                  transientBuffer = GAP9TransientBuffer) -> None:
+        if engines is None:
+            engines = [GAP9ClusterEngine("GAP9Cluster")]
         super().__init__(engines, variableBuffer, constantBuffer, structBuffer, transientBuffer)
.github/workflows/infra-generate-ccache-gap9.yml (1)

32-41: Avoid swallowing setup/build failures.

|| true hides failures in environment setup and installation, which can lead to generating or uploading an incomplete cache with no signal. If the script is optional, log and skip explicitly; otherwise fail fast.

🛠️ Proposed fix (fail fast or explicit guard)
       - name: Build Deeploy
         shell: bash
         run: |
           source /app/install/gap9-sdk/.gap9-venv/bin/activate
-          source /app/install/gap9-sdk/configs/gap9_evk_audio.sh || true
-          pip install -e . || true
+          if [ ! -f /app/install/gap9-sdk/configs/gap9_evk_audio.sh ]; then
+            echo "Missing gap9_evk_audio.sh" >&2
+            exit 1
+          fi
+          source /app/install/gap9-sdk/configs/gap9_evk_audio.sh
+          pip install -e .
           deactivate

       - name: Generate CCache for GAP9
         run: |
           source /app/install/gap9-sdk/.gap9-venv/bin/activate
-          source /app/install/gap9-sdk/configs/gap9_evk_audio.sh || true
+          source /app/install/gap9-sdk/configs/gap9_evk_audio.sh

@runwangdl runwangdl force-pushed the gap9-operators-github branch from f4face7 to 67e7e5b Compare February 4, 2026 23:55
- Remove TargetLibraries/GAP9/third_party from git tracking
  (this is a CMake-generated symlink that should not be in the repo)
- Add GAP9/third_party to .gitignore to prevent future tracking
- Revert pulp-nn-mixed submodule to original commit (a9b4aaf5)
  (submodule was unintentionally updated to b69ec23e)
coderabbitai[bot]

This comment was marked as off-topic.

@Xeratec Xeratec modified the milestones: Release 0.3.0, Release 0.2.2 Feb 5, 2026
Copy link
Member

@Xeratec Xeratec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the adjustments. I think we will converge in the next round. I only have a few minor comments and would still prefer to move the pulp-nn libraries from TargetLibraries/PULPOpen/third_party to TargetLibraries/third_party.

As discussed offline, I will create another PR once this is merged to properly update the Dockerflow, including support for running on external hardware.

with:
path: /app/.ccache
key: ccache-ci
key: ccache-gap9
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why use a separate cache?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
TargetLibraries/PULPOpen/CMakeLists.txt (1)

42-47: ⚠️ Potential issue | 🟠 Major

Guard pulp-nn-mixed inclusion to prevent duplicate target errors.

Both TargetLibraries/PULPOpen/CMakeLists.txt and TargetLibraries/GAP9/CMakeLists.txt call add_subdirectory(../third_party/pulp-nn-mixed). Configuring both platforms in the same build causes CMake to fail with "source directory already added" error. Wrap the subdirectory call with a TARGET guard:

-add_subdirectory(../third_party/pulp-nn-mixed)
+if(NOT TARGET pulp-nn-mixed)
+  add_subdirectory(../third_party/pulp-nn-mixed)
+endif()
 target_include_directories(pulp-nn-mixed PUBLIC ${PULP_SDK_INCLUDES})
 target_compile_options(pulp-nn-mixed PUBLIC ${PULP_SDK_COMPILE_FLAGS})
🤖 Fix all issues with AI agents
In @.github/workflows/ci-platform-gap9-tiled.yml:
- Around line 27-30: The conditional passed to the select-env step (the value of
input docker_image_deeploy in the select-env block) lacks explicit grouping for
the || and && operators; update the expression to add parentheses so the
repo-equals check is grouped with the && fallback (e.g. wrap (github.repository
== 'pulp-platform/Deeploy' && 'ghcr.io/pulp-platform/deeploy-gap9:latest')
before the ||) so forks don't yield the literal "false" string and the fallback
behaves correctly; modify the docker_image_deeploy expression in the select-env
step accordingly.

@runwangdl runwangdl force-pushed the gap9-operators-github branch from 8c1f6ed to a0e65a6 Compare February 5, 2026 10:47
Specify binary directory when adding third_party submodules
since they are not subdirectories of PULPOpen/GAP9.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@TargetLibraries/GAP9/CMakeLists.txt`:
- Around line 39-47: NUM_CORES is used without being defined which leads to
surprising behavior; add an explicit guard that checks DEFINED NUM_CORES before
the USE_PREBUILT_PULPNN check and error out if it is unset. Specifically, before
the block that uses NUM_CORES (the if(USE_PREBUILT_PULPNN AND NOT NUM_CORES
EQUAL 8) check), add a check like if(NOT DEFINED NUM_CORES) and call
message(FATAL_ERROR "...") to fail fast and tell the user to supply NUM_CORES;
keep the rest of the logic for option(USE_PREBUILT_PULPNN ...) and the 8-core
validation unchanged.
🧹 Nitpick comments (2)
TargetLibraries/GAP9/CMakeLists.txt (2)

5-12: Add CONFIGURE_DEPENDS to keep globbed sources in sync.
Without it, adding/removing files won’t trigger a CMake reconfigure.

♻️ Proposed change
-file(GLOB_RECURSE SOURCES
-  "src/**"
-)
+file(GLOB_RECURSE SOURCES CONFIGURE_DEPENDS
+  "${CMAKE_CURRENT_LIST_DIR}/src/**"
+)
@@
-file(GLOB_RECURSE PULPOPEN_SOURCES "../PULPOpen/src/**")
+file(GLOB_RECURSE PULPOPEN_SOURCES CONFIGURE_DEPENDS
+  "${CMAKE_CURRENT_LIST_DIR}/../PULPOpen/src/**"
+)

21-23: Prefer target_compile_definitions for NUM_CORES.
This keeps usage requirements in the right property instead of COMPILE_OPTIONS.

♻️ Proposed change
-target_compile_options(deeploygap9 PUBLIC
-    -DNUM_CORES=${NUM_CORES}
-  )
+target_compile_definitions(deeploygap9 PUBLIC NUM_CORES=${NUM_CORES})

@runwangdl runwangdl requested a review from Xeratec February 5, 2026 13:55
Copy link
Member

@Xeratec Xeratec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, please update the Changelog, and if you don't mind, answer the question in the unresolved threads. Once done, ping me, and we can likely merge it.

I think it is only these two comments:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Feature Addition of new features

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

2 participants