Tutorial 02 — Your first custom module

What you'll learn: Author, sign, publish, and assign a custom NodeModule — the full module supply chain from blank git repo to a Module showing up on a NodeInstance.

Time: ~45 min (most of which is waiting for CI)

Builds on: Tutorial 01 — needs the catalog seeded and the platform running. The VM from Tutorial 01 doesn't need to be live for this tutorial; we'll provision a fresh one in step 9.

Sets you up for: Tutorial 03 — Docker runtime (uses a real module — docker-engine — to provision Docker; the workflow you learn here is how you'd author your own runtime variant).

What you're building

flowchart LR
    Op[Operator]
    subgraph Repo["Gitea repo (your module source)"]
        Mf[manifest.yaml]
        Rfs[rootfs/]
        Cf[Containerfile]
        Wf[.gitea/workflows/build.yaml]
    end
    subgraph CI["Gitea Actions"]
        Build[Stage 1<br/>Containerfile build]
        Compose[Stage 2<br/>composefs encode]
        Sign[cosign sign<br/>keyless via Fulcio]
    end
    Reg[(OCI registry<br/>registry.example.com)]
    subgraph Plat["Powernode platform"]
        Ingest[ModuleOciIngestService]
        Ver[NodeModuleVersion]
        Tpl[NodeTemplate]
    end
    VM[NodeInstance]

    Op --> Repo
    Repo -- "git push tag v0.1.0" --> CI
    Build --> Compose --> Sign
    Sign -- "oras push + cosign attest" --> Reg
    CI -- "webhook back" --> Ingest
    Ingest -- "creates" --> Ver
    Op -- "system_assign_module_to_template" --> Tpl
    Tpl -- "next reconcile" --> VM

By the end you'll have published my-redis as a versioned, signed, lifecycle-tracked module assignable to any NodeTemplate.

Concept refresher

A NodeModule is a versioned, signed unit of filesystem + package state. Each module ships:

  • manifest.yaml — authoring-time hints (name, license, packages, file globs)
  • rootfs/ — files copied verbatim into the module's filesystem layer
  • Containerfile — Stage 1 of CI; installs packages + copies rootfs

The platform composes a NodeInstance's root filesystem from priority-ordered module composefs layers (see the overlay-union diagram in Tutorial 01). Each layer is content-addressed, fs-verity-hashed, and cosign-signed, so tampering is detected at file-open time.
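
The priority-ordered union described above can be pictured with a small sketch. It is only an illustration of the idea, not the platform's implementation: the `(priority, files)` layer shape, the digests, and the rule that a larger number means higher priority are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical layer shape: (priority, {path: content_digest}).
# Assumption: a larger priority number wins when two layers ship the same path.
Layer = Tuple[int, Dict[str, str]]


def compose_rootfs(layers: List[Layer]) -> Dict[str, str]:
    """Union the layers so the highest-priority layer wins for each path."""
    merged: Dict[str, str] = {}
    # Apply the lowest priority first; later (higher-priority) layers overwrite.
    for _priority, files in sorted(layers, key=lambda layer: layer[0]):
        merged.update(files)
    return merged


base = (10, {"/etc/hostname": "sha256:aaa", "/etc/redis/redis.conf": "sha256:old"})
redis = (50, {"/etc/redis/redis.conf": "sha256:new"})

rootfs = compose_rootfs([redis, base])  # input order does not matter
print(rootfs["/etc/redis/redis.conf"])
```

The sketch ignores masks and protected paths, which the manifest in Step 2 introduces.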

Authority note: the four glob-spec fields (mask, file_spec, package_spec, dependency_spec) on the System::NodeModule platform record are authoritative for builds. The repo's manifest.yaml only seeds those fields on first import; subsequent edits happen via the platform UI / API.

Prerequisites

| Requirement | How |
| --- | --- |
| Tutorial 01 completed | Catalog seeded |
| Gitea account with permission to create repos | E.g. registry.example.com/<account>/modules/... |
| docker + oras + cosign CLIs | apt install docker.io; curl -L <release>/oras_*.tar.gz \| tar xz; curl -L <release>/cosign-linux-amd64 -o ~/.local/bin/cosign && chmod +x ~/.local/bin/cosign |
| Gitea Actions runner | labeled ubuntu-24.04 (self-hosted; multi-arch needs ubuntu-24.04-arm too) |
| A NodeTemplate to assign your module to | Use base or hardened from Tutorial 01's catalog |

Step 1 — Clone the canonical template

git clone git@registry.example.com:powernode/templates/module-repo.git my-redis-module
cd my-redis-module
rm -rf .git
git init
git remote add origin git@registry.example.com:<account>/modules/my-redis-module.git

Expected outcome: clean working tree with Containerfile, manifest.yaml, rootfs/.gitkeep, and .gitea/workflows/build.yaml. The template is the canonical layout — never deviate from it; the platform's ingest path expects exactly this shape.

Step 2 — Edit manifest.yaml

schema_version: 1

# Identity — flat top-level keys (no `identity:` wrapper).
name: my-redis
display_name: "Redis 7.4"
description: "Redis 7.4 with TLS + persistence"
license: "BSD-3-Clause"

# Packages installed in the Containerfile builder stage via mmdebstrap.
package_spec:
  - redis-server
  - redis-tools

# Paths this module OWNS in the artifact (rsync-include, flat glob list).
file_spec:
  - "/etc/redis/**"
  - "/var/lib/redis/.gitkeep"

# Paths to EXCLUDE from this module's blob (rsync-style mask, applied locally).
mask:
  - "/etc/redis/sentinel.conf"

# Paths I own that no neighbor module may ship — folded into every
# neighbor's effective_mask in both priority directions, so this module's
# /etc/redis/redis.conf cannot be silently overridden by a higher-priority
# module's overlay.
protected_spec:
  - "/etc/redis/redis.conf"

# Module dependencies — resolved transitively by DependencyResolutionService.
# `requires` form is "<owner>/<module>@<version-constraint>".
dependencies:
  requires:
    - "powernode/powernode-base-ruby@^1.0"
    - "powernode/security-hardening@^1.0"
  provides: []

# Init lifecycle. Strings here populate NodeModule.init_start/stop/restart;
# powernode-agent runs them as subprocesses (never eval'd).
init:
  start: "systemctl start redis-server"
  stop:  "systemctl stop redis-server"
  restart: "systemctl restart redis-server"

reboot_required: false

# Optional build hints (pin Ubuntu base + apt snapshot for reproducibility).
build:
  ubuntu_digest: null    # falls back to Containerfile's UBUNTU_DIGEST default
  apt_snapshot: null     # falls back to Containerfile's APT_SNAPSHOT default

Expected outcome: YAML validates against MODULE_MANIFEST_COMPLETE_SCHEMA.md and modules/.schema/module-manifest.schema.json.
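
Before pushing, it can help to sanity-check the `requires` entries against the "<owner>/<module>@<version-constraint>" form quoted in the manifest comments. The sketch below is a hypothetical local helper, not part of the platform or of DependencyResolutionService, and the character rules in the regular expression are an assumption.

```python
import re
from typing import NamedTuple


class Requirement(NamedTuple):
    owner: str
    module: str
    constraint: str


# Assumption: owner and module contain no "/", "@", or whitespace.
_REQUIRES_RE = re.compile(
    r"^(?P<owner>[^/@\s]+)/(?P<module>[^/@\s]+)@(?P<constraint>\S+)$"
)


def parse_requires(entry: str) -> Requirement:
    """Split one `requires` string into owner, module, and version constraint."""
    match = _REQUIRES_RE.match(entry)
    if match is None:
        raise ValueError(f"malformed requires entry: {entry!r}")
    return Requirement(match["owner"], match["module"], match["constraint"])


req = parse_requires("powernode/powernode-base-ruby@^1.0")
print(req.owner, req.module, req.constraint)
```

An entry without the `@<version-constraint>` part raises `ValueError`, which is cheaper to catch locally than in CI.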

Important: category, variety, cosign_identity_regexp, and cosign_issuer_regexp are NOT manifest fields. They live on the platform-side NodeModule DB row and are set at registration time (Step 5 below, currently via the UI form). The cosign identity regex must exactly match the path your Gitea Actions workflow runs from; a mismatch causes ingestion to reject the artifact after the build.
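
A quick local test of the two regex fields can catch a mismatch before a build is wasted. The sketch below assumes the platform requires the whole string to match (the source says "exactly" but does not state the anchoring), and the subject and issuer strings are made-up examples: the real OIDC subject format emitted by your Gitea Actions setup may differ.

```python
import re


def identity_matches(identity_regexp: str, issuer_regexp: str,
                     subject: str, issuer: str) -> bool:
    """Return True only if both patterns match their whole input string."""
    return (
        re.fullmatch(identity_regexp, subject) is not None
        and re.fullmatch(issuer_regexp, issuer) is not None
    )


# Hypothetical values for illustration only ("acme" stands in for <account>).
IDENTITY_RE = (
    r"https://registry\.example\.com/acme/modules/my-redis-module/"
    r"\.gitea/workflows/build\.yaml@refs/tags/v.+"
)
ISSUER_RE = r"https://registry\.example\.com"

good_subject = (
    "https://registry.example.com/acme/modules/my-redis-module/"
    ".gitea/workflows/build.yaml@refs/tags/v0.1.0"
)
fork_subject = good_subject.replace("/acme/", "/someone-else/")

print(identity_matches(IDENTITY_RE, ISSUER_RE, good_subject, "https://registry.example.com"))
print(identity_matches(IDENTITY_RE, ISSUER_RE, fork_subject, "https://registry.example.com"))
```

Escaping the dots matters: an unescaped `.` would also accept look-alike hostnames.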

Step 3 — Add the rootfs tree

mkdir -p rootfs/etc/redis rootfs/var/lib/redis
touch rootfs/var/lib/redis/.gitkeep

Write rootfs/etc/redis/redis.conf:

bind 0.0.0.0 ::
port 6379
protected-mode yes
tls-port 6380
tls-cert-file /etc/redis/tls/server.crt
tls-key-file  /etc/redis/tls/server.key
tls-ca-cert-file /etc/redis/tls/ca.crt
appendonly yes
dir /var/lib/redis

Expected outcome: Files under rootfs/ will be copied verbatim into the module's composefs layer at build time. .gitkeep is the convention for empty directories — composefs needs the dir to exist in the artifact, even empty.
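
To preview which paths the file_spec and mask from Step 2 would keep, a rough approximation is enough. The sketch below uses Python's `fnmatch`, whose `*` also crosses `/`; that is close to, but not the same as, the rsync-style semantics the manifest comments mention, so treat it as an assumption-laden preview rather than the builder's actual matching.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def owned_paths(paths: Iterable[str], file_spec: List[str], mask: List[str]) -> List[str]:
    """Keep paths matched by file_spec unless they are also matched by mask."""

    def matches_any(path: str, patterns: List[str]) -> bool:
        return any(fnmatchcase(path, pattern) for pattern in patterns)

    return [p for p in paths if matches_any(p, file_spec) and not matches_any(p, mask)]


file_spec = ["/etc/redis/**", "/var/lib/redis/.gitkeep"]
mask = ["/etc/redis/sentinel.conf"]
candidates = [
    "/etc/redis/redis.conf",
    "/etc/redis/sentinel.conf",   # excluded by mask
    "/var/lib/redis/.gitkeep",
    "/etc/hosts",                 # not in file_spec
]

kept = owned_paths(candidates, file_spec, mask)
print(kept)
```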

Step 4 — Validate the manifest locally

# Until system_validate_module_manifest ships (currently in MCP gap backlog):
docker run --rm -v "$PWD:/work:ro" ghcr.io/powernode/module-builder:latest --dry-run

Expected outcome: dry-run reports any schema violations, glob-spec conflicts, or missing required fields without building artifacts. Fix warnings here; the same problems cost far more time once they surface as a failed CI run.

Step 5 — Register the module on the platform

Use the operator UI today. Navigate to /app/system/modules/new in your platform UI. The form collects:

  • name — must match the name in your manifest.yaml
  • display_name, description
  • category_id — pick from the seeded NodeModuleCategory list (system-base, network-overlay, container-runtimes, security-hardening, userland)
  • variety: one of subscription / config / instance
  • gitea_repo_full_name — e.g. <account>/modules/my-redis-module
  • cosign_identity_regexp + cosign_issuer_regexp — set on the DB row (NOT in manifest)

On save, the UI returns a webhook_secret. Copy it immediately — it's displayed once and used to HMAC-sign the build-completion webhook back from Gitea Actions to the platform.

In Gitea: open repo → Settings → Actions → Secrets → add POWERNODE_WEBHOOK_SECRET with the value from above.
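
For readers unfamiliar with HMAC-signed webhooks, the general pattern looks like the sketch below. The hash algorithm (SHA-256), the hex encoding, and the payload shape are assumptions; the source only states that the secret is used to HMAC-sign the build-completion webhook.

```python
import hashlib
import hmac


def sign_body(secret: str, body: bytes) -> str:
    """Sender side: compute a hex HMAC-SHA256 over the raw request body."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_body(secret: str, body: bytes, received_signature: str) -> bool:
    """Receiver side: recompute the HMAC and compare in constant time."""
    expected = sign_body(secret, body)
    return hmac.compare_digest(expected, received_signature)


# Hypothetical payload and secret, for illustration only.
payload = b'{"module": "my-redis", "tag": "v0.1.0"}'
signature = sign_body("example-webhook-secret", payload)

print(verify_body("example-webhook-secret", payload, signature))
print(verify_body("a-different-secret", payload, signature))
```

This is why a stale or mistyped POWERNODE_WEBHOOK_SECRET makes the webhook fail authentication even though the build itself succeeded.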

Note on MCP: system_create_module_from_package does exist as an MCP action but materialises a module from an existing PackageRepository (signature: repository_id, package_name, architectures, recommends_selected, category_id, dispatch_build). It is not a Gitea-repo-to-NodeModule registration shortcut. The Gitea-driven flow uses the UI form today; a dedicated MCP wrapper for it may land later.

Step 6 — Push to Gitea

git add manifest.yaml Containerfile rootfs/ .gitea/
git commit -m "feat: my-redis module v0.1.0"
git tag v0.1.0
git push origin HEAD:develop --tags   # the fresh `git init` in Step 1 may not have created a local branch named develop

Expected outcome: tag push triggers the workflow in .gitea/workflows/build.yaml. Watch via MCP:

platform.list_gitea_workflow_runs({
  owner: "<account>",
  repo: "modules/my-redis-module"
})
// → { runs: [{ id, status: "in_progress", ... }] }
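
If you script the wait instead of re-running the call by hand, a polling loop with a timeout is the usual shape. In this sketch `fetch_status` is a placeholder for whatever wraps list_gitea_workflow_runs in your tooling, and the terminal status names other than "success" are assumptions.

```python
import time
from typing import Callable

# Assumption: these are the terminal statuses; only "in_progress" and
# "success" appear in the tutorial itself.
TERMINAL_STATUSES = ("success", "failure", "cancelled")


def wait_for_run(fetch_status: Callable[[], str],
                 interval_s: float = 15.0,
                 timeout_s: float = 900.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the run reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout_s} s")
        sleep(interval_s)


# Usage with a fake status source and a no-op sleep, so the example runs instantly.
fake_statuses = iter(["queued", "in_progress", "in_progress", "success"])
final = wait_for_run(lambda: next(fake_statuses), sleep=lambda _seconds: None)
print(final)
```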

Step 7 — Wait for CI

The workflow runs:

  1. Stage 1 (Containerfile build) — pulls the Ubuntu 24.04 base at the pinned digest, installs package_spec, copies rootfs/ to /work/
  2. Stage 2 (composefs encode) — converts /work/ to a content-addressed composefs blob set with fs-verity root hash
  3. syft + grype — generates SBOM + VEX; the SBOM is ingested by the platform's CVE pipeline
  4. cosign keyless sign — Sigstore Fulcio issues an ephemeral cert bound to the Gitea Actions OIDC token; signs the OCI manifest
  5. oras push — pushes to registry.example.com/<account>/modules/my-redis-module:v0.1.0
  6. Webhook — POSTs to platform's /api/v1/system/webhooks/gitea/module with HMAC signed by POWERNODE_WEBHOOK_SECRET

Expected outcome: ~5–8 min runtime. Workflow shows success. The platform's ModuleOciIngestService polls the registry and creates a NodeModuleVersion row with promotion_state: built.
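
"Content-addressed" in Stage 2 means a blob's name is derived from its bytes. The sketch below shows that idea with a plain SHA-256 store; it is not the composefs or fs-verity on-disk format, and the `BlobStore` class is hypothetical.

```python
import hashlib
from typing import Dict


def blob_digest(data: bytes) -> str:
    """Name a blob by the SHA-256 of its content."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


class BlobStore:
    """Toy content-addressed store: identical bytes are stored once."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = blob_digest(data)
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # Re-hash on read so a modified blob is detected rather than served.
        if blob_digest(data) != digest:
            raise ValueError("blob content does not match its digest")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


store = BlobStore()
first = store.put(b"port 6379\n")
second = store.put(b"port 6379\n")   # same bytes, same digest, stored once
other = store.put(b"port 6380\n")
print(first == second, len(store))
```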

Step 8 — Verify ingestion

platform.system_list_module_versions({ module_name: "my-redis" })
// → { versions: [{
//      id: "v-redis-0.1.0",
//      version_string: "0.1.0",
//      promotion_state: "built",
//      composefs_digest: "sha256:abc...",
//      fsverity_root_hash: "sha256:def...",
//      cosign_verified: true,
//      ...
//    }] }

Expected outcome: the version row exists, signature verified, promotion_state is built. Promote through staging → blessed → live as you verify the module behaves correctly. The column is promotion_state (not lifecycle_state); valid states are built, staging, blessed, live, retired:

platform.system_promote_module_version({ id: "v-redis-0.1.0", to: "staging" })
// Test on a non-prod NodeInstance
platform.system_promote_module_version({ id: "v-redis-0.1.0", to: "blessed" })
// Operator review passed; module is recommendable
platform.system_promote_module_version({ id: "v-redis-0.1.0", to: "live" })
// Now eligible for fleet-wide rollout

Promotion to live often requires approval (require_approval); check the module_promote_to_live intervention policy before calling it. Demoting or rolling back uses the same MCP action with to: "retired". There is no archived state; any reference to one comes from outdated documentation.
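
The promotion path can be summarised as a small transition table. The state names come from this tutorial; the rules that promotion moves one step at a time and that retired is terminal are assumptions of this sketch, not confirmed platform behaviour.

```python
from typing import Dict, Set

# Assumed transitions: forward one step at a time, or retire from any live state.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "built": {"staging", "retired"},
    "staging": {"blessed", "retired"},
    "blessed": {"live", "retired"},
    "live": {"retired"},
    "retired": set(),
}


def can_promote(current: str, target: str) -> bool:
    """Return True if moving from `current` to `target` is allowed in this sketch."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown promotion_state: {current!r}")
    return target in ALLOWED_TRANSITIONS[current]


print(can_promote("built", "staging"))
print(can_promote("built", "live"))
print(can_promote("live", "retired"))
```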

Step 9 — Assign to a Template + provision

platform.system_assign_module_to_template({
  template_id: "<base-or-hardened-template-id>",
  module_name: "my-redis"
})

// Provision a fresh instance from that template
platform.system_create_node({ hostname: "redis-test-1", node_template_id: "<template-id>", ... })
platform.system_provision_instance({ node_id: ... })
// Wait ~3-5 min for KVM boot + module reconcile

Expected outcome: instance reaches status: running and the module appears in running_module_digests.

Verification

platform.system_get_instance({ id: "<instance-id>" })
// → { instance: {
//      running_module_digests: { "my-redis": "sha256:abc...", "system-base": "...", ... },
//      ...
//    }}

platform.system_drift_report({ instance_id: "<id>" })
// → { drift: false }
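
Conceptually, a drift check compares the digests assigned to an instance with the ones it reports as running. The sketch below shows that comparison on plain dictionaries; how system_drift_report actually computes its result is not described in this tutorial, so the function and its output shape are hypothetical.

```python
from typing import Dict, Optional


def find_drift(assigned: Dict[str, str],
               running: Dict[str, str]) -> Dict[str, Dict[str, Optional[str]]]:
    """List every module whose assigned digest differs from the running one."""
    drift: Dict[str, Dict[str, Optional[str]]] = {}
    for name in sorted(set(assigned) | set(running)):
        want, have = assigned.get(name), running.get(name)
        if want != have:
            # None means the module is missing on that side.
            drift[name] = {"assigned": want, "running": have}
    return drift


assigned = {"my-redis": "sha256:abc", "system-base": "sha256:111"}
in_sync = {"my-redis": "sha256:abc", "system-base": "sha256:111"}
stale = {"system-base": "sha256:111"}   # my-redis has not reconciled yet

print(find_drift(assigned, in_sync))
print(find_drift(assigned, stale))
```

An empty result corresponds to drift: false; a missing my-redis entry matches the troubleshooting case where the module never appears in running_module_digests.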

If you can reach the instance over SDWAN (set up during Tutorial 01):

ssh ops@<instance-host-address>
systemctl status redis-server.service
# → active (running)
redis-cli ping
# → PONG

Cleanup

Leaves the catalog seeded with my-redis for future reference, but removes the test instance:

platform.system_terminate_instance({ id: "<instance-id>" })

// Unassign so the next instance from this template doesn't get the test module
platform.system_unassign_module_from_template({
  template_id: "<template-id>",
  module_name: "my-redis"
})

// (Optional) delete the module if you don't want it visible in the catalog (there is no archive state)
platform.system_delete_module({ name: "my-redis" })   // cascade-deletes versions

Troubleshooting

Workflow fails at cosign step with unable to fetch token from OIDC issuer — Gitea Actions OIDC isn't configured. In Gitea: Admin Panel → Settings → enable Actions OIDC; check .gitea/workflows/build.yaml has id-token: write permissions.

Workflow succeeds but no NodeModuleVersion row appears — webhook failed to authenticate. Two common causes:

  • POWERNODE_WEBHOOK_SECRET doesn't match NodeModule.webhook_secret (regenerate via Settings → Actions → Secrets and re-paste)
  • Platform's webhook controller IP-banned the Gitea runner (check journalctl -u powernode-backend@default | grep gitea_module)

cosign_verified: false in version row — identity / issuer regex mismatch. Edit the module record's regex fields to match the Gitea Actions OIDC subject exactly. Re-trigger workflow.

Instance reconciles but module doesn't appear in running_module_digests — agent's heartbeat is reporting a different list than what's assigned. Two sub-cases:

  • Module dependency missing on the platform side (your dependency_spec references a module that has not been promoted to staging or later)
  • Agent failed to verify fs-verity root hash on download (check agent logs via serial console; look for "fsverity verification failed")

PoolEmptyError on provision — you're using an InstancePool template but the pool's empty. Either wait for replenishment (~5 min) or create a fresh non-pool NodeInstance.

What's next