feat: Helm/YAML semantic extraction — named templates, includes, chart dependencies #338

@ilyabrykau-orca

Summary

In v0.6.1, YAML files (including Helm chart templates) are indexed by flat key extraction: one Variable node per YAML leaf key, with no semantic edges and no Helm-specific understanding. The result is a node flood with zero graph traversal value.

Scale

| Repo | .yaml/.yml files | Primary content |
|------|------------------|-----------------|
| 1 | 3,147 | App config, K8s manifests |
| 2 | 429 | K8s, Helm values, CI |
| 3 | 301 | K8s, CI |
| 4 | 161 | K8s, Helm values |
| 5 | 56 | Helm chart templates (source of truth) |

YAML files in these repos are currently excluded from indexing (helm-charts is the exception) because flat extraction produces noise without signal.

Current behavior — helm-charts repo (56 files, kept indexed)

MATCH (n) RETURN DISTINCT labels(n) AS type, count(n) AS cnt ORDER BY cnt DESC
// → Variable (278), File (54), Module (54), Folder (23)

MATCH ()-[r]->() RETURN DISTINCT type(r), count(r) ORDER BY count(r) DESC
// → DEFINES (332), CONTAINS_FILE (44), CONTAINS_FOLDER (20), FILE_CHANGES_WITH (16)
// → ZERO CALLS, IMPORTS, REFERENCES, or any semantic edges

MATCH (n) WHERE n.file_path ENDS WITH '.yaml'
RETURN n.name, n.type, n.language, n.content LIMIT 3
// → name="appVersion", type="", language="", content=""
// → name="version",    type="", language="", content=""
// → name="name",       type="", language="", content=""

values.yaml produces 86 duplicate nodes (one per leaf key), and values.schema.json produces 124 nodes. The type, language, and content fields are all empty. A DEFINES edge only means "this file has a key named X", which is not useful for traversal.

Expected behavior — Helm templates

Named template definitions → Function nodes

_helpers.tpl:

{{- define "chart.fullname" -}}

Function {name: "chart.fullname", language: "helm", file_path: "templates/_helpers.tpl"}
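
To make the mapping concrete, here is a rough sketch of how define blocks could be turned into Function nodes. It is only an illustration, not how CBM works internally: `FunctionNode` and `extract_named_templates` are hypothetical names, and the regex only handles double-quoted template names.

```python
import re
from dataclasses import dataclass

# Matches {{ define "name" }}, with or without the whitespace-trim markers ({{- / -}}).
DEFINE_RE = re.compile(r'\{\{-?\s*define\s+"([^"]+)"\s*-?\}\}')


@dataclass(frozen=True)
class FunctionNode:  # hypothetical node shape, mirroring the fields proposed above
    name: str
    language: str
    file_path: str


def extract_named_templates(source: str, file_path: str) -> list[FunctionNode]:
    """Return one Function node per named template defined in `source`."""
    return [
        FunctionNode(name=m.group(1), language="helm", file_path=file_path)
        for m in DEFINE_RE.finditer(source)
    ]


helpers = '''
{{- define "chart.fullname" -}}
{{ .Release.Name }}-{{ .Chart.Name }}
{{- end }}
{{- define "chart.labels" -}}
app: {{ include "chart.fullname" . }}
{{- end }}
'''

nodes = extract_named_templates(helpers, "templates/_helpers.tpl")
for node in nodes:
    print(node)
```

A real extractor would probably want a proper template parser, but a regex pass like this is likely enough to replace the Variable flood for _helpers.tpl files.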

include / template calls → CALLS edges

{{ include "chart.fullname" . }}

CALLS edge: calling template → chart.fullname Function node
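
One possible way to attribute each call to its caller is sketched below. It assumes the caller is the enclosing define block, falling back to the file path for calls made at the top level of a template file. `extract_calls` is a hypothetical helper, and the block tracking is deliberately simplistic (it ignores `}}` inside string literals, for example).

```python
import re

# One match per template action: {{ ... }}, with optional trim markers.
ACTION_RE = re.compile(r'\{\{-?\s*(.*?)\s*-?\}\}', re.DOTALL)
# include "name" / template "name" calls inside an action.
CALL_RE = re.compile(r'\b(?:include|template)\s+"([^"]+)"')
DEFINE_NAME_RE = re.compile(r'define\s+"([^"]+)"')
# Actions that open a block closed by {{ end }}.
OPENERS = {"if", "range", "with", "define", "block"}


def extract_calls(source: str, file_path: str) -> list[tuple[str, str]]:
    """Return (caller, callee) pairs for every include/template call."""
    edges = []
    stack = []  # open blocks: template name for define, None for other blocks
    for match in ACTION_RE.finditer(source):
        body = match.group(1)
        if not body or body.startswith("/*"):
            continue  # empty action or template comment
        keyword = body.split(None, 1)[0]
        if keyword == "define":
            name = DEFINE_NAME_RE.match(body)
            stack.append(name.group(1) if name else None)
        elif keyword in OPENERS:
            stack.append(None)
        elif keyword == "end" and stack:
            stack.pop()
        # Innermost enclosing define, or the file itself at top level.
        caller = next((n for n in reversed(stack) if n), file_path)
        for callee in CALL_RE.findall(body):
            edges.append((caller, callee))
    return edges


template_src = '''
{{- define "chart.labels" -}}
{{- if .Values.extraLabels }}
app: {{ include "chart.fullname" . }}
{{- end }}
{{- end }}
metadata:
  name: {{ include "chart.fullname" . }}
  labels: {{ include "chart.labels" . | nindent 4 }}
'''

edges = extract_calls(template_src, "templates/deployment.yaml")
for caller, callee in edges:
    print(f"{caller} -CALLS-> {callee}")
```

Resolving the callee name to a Function node would then be a lookup against the define names collected across the chart.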

Chart.yaml dependencies → DEPENDS_ON edges

dependencies:
  - name: postgresql
    repository: https://charts.bitnami.com/bitnami

DEPENDS_ON edge: Chart node → dependency Chart node
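
As a sketch of the edge construction, the function below works on Chart.yaml content that has already been parsed into a dict (for example by a YAML library), so it stays independent of any particular parser. `dependency_edges`, the chart name `my-app`, and its version are made up for illustration.

```python
def dependency_edges(chart: dict) -> list[tuple[str, str, dict]]:
    """Build (source, target, properties) DEPENDS_ON edges from parsed Chart.yaml content."""
    source = chart.get("name", "")
    edges = []
    # `dependencies` may be missing or explicitly null in Chart.yaml.
    for dep in chart.get("dependencies") or []:
        # Carry over the optional dependency fields that are present.
        props = {
            key: dep[key]
            for key in ("version", "repository", "alias", "condition")
            if key in dep
        }
        edges.append((source, dep["name"], props))
    return edges


# Parsed Chart.yaml; name and version are hypothetical.
chart = {
    "apiVersion": "v2",
    "name": "my-app",
    "version": "1.2.3",
    "dependencies": [
        {"name": "postgresql", "repository": "https://charts.bitnami.com/bitnami"},
    ],
}

edges = dependency_edges(chart)
for source, target, props in edges:
    print(f"{source} -DEPENDS_ON-> {target} {props}")
```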

values.yaml → structured Variable nodes

One Variable node per top-level key, with an inferred type, instead of one node per leaf.
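
A minimal sketch of what that could look like, again working on already-parsed values.yaml content. The type names (`"map"`, `"list"`, and so on) and the helper names are my own assumptions, not an existing CBM schema, and the sample values are invented.

```python
def infer_type(value) -> str:
    """Map a parsed YAML value to a coarse type name."""
    if value is None:
        return "null"
    # bool must be checked before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "list"
    if isinstance(value, dict):
        return "map"
    return "unknown"


def top_level_variables(values: dict, file_path: str) -> list[dict]:
    """One Variable node per top-level key; nested keys stay inside `default`."""
    return [
        {
            "name": key,
            "type": infer_type(val),
            "default": val,
            "language": "yaml",
            "file_path": file_path,
        }
        for key, val in values.items()
    ]


# Parsed values.yaml; contents are hypothetical.
values = {
    "replicaCount": 2,
    "image": {"repository": "nginx", "tag": "1.25"},
    "ingress": {"enabled": False},
    "nameOverride": "",
}

variables = top_level_variables(values, "values.yaml")
for var in variables:
    print(var["name"], var["type"])
```

With this shape, the four-key example yields four nodes rather than one per leaf.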

Expected behavior — generic YAML (K8s manifests, CI config)

At minimum:

  • One node per file (not per key)
  • language: "yaml" populated
  • kind + apiVersion fields for K8s resources (e.g. kind: Deployment)
  • REFERENCES edges where one manifest references another by name (e.g. Service → Deployment selector)
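
For the Service → Deployment case, a rough sketch of the matching logic is below. It assumes manifests are already parsed into dicts and that a Service references a Deployment when every selector entry appears in the Deployment's pod template labels (`spec.template.metadata.labels`). Namespaces are ignored, and the function and resource names are hypothetical.

```python
def resource_node(manifest: dict, file_path: str) -> dict:
    """One Resource node per manifest, with the fields proposed above."""
    meta = manifest.get("metadata") or {}
    return {
        "kind": manifest.get("kind", ""),
        "api_version": manifest.get("apiVersion", ""),
        "name": meta.get("name", ""),
        "language": "yaml",
        "file_path": file_path,
    }


def pod_labels(deployment: dict) -> dict:
    """Labels stamped on the pods a Deployment creates."""
    spec = deployment.get("spec") or {}
    template = spec.get("template") or {}
    meta = template.get("metadata") or {}
    return meta.get("labels") or {}


def service_references(manifests: list[dict]) -> list[tuple[str, str]]:
    """Return (service name, deployment name) pairs where the selector matches."""
    deployments = [m for m in manifests if m.get("kind") == "Deployment"]
    edges = []
    for svc in manifests:
        if svc.get("kind") != "Service":
            continue
        selector = (svc.get("spec") or {}).get("selector") or {}
        if not selector:
            continue  # selector-less Services target nothing by label
        for dep in deployments:
            labels = pod_labels(dep)
            if all(labels.get(k) == v for k, v in selector.items()):
                edges.append((svc["metadata"]["name"], dep["metadata"]["name"]))
    return edges


manifests = [
    {"apiVersion": "v1", "kind": "Service", "metadata": {"name": "web-svc"},
     "spec": {"selector": {"app": "web"}}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"},
     "spec": {"template": {"metadata": {"labels": {"app": "web", "tier": "frontend"}}}}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "worker"},
     "spec": {"template": {"metadata": {"labels": {"app": "worker"}}}}},
]

edges = service_references(manifests)
print(edges)
```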

If full semantic extraction is not feasible for generic YAML, a single File node per .yaml file (no key explosion) is strongly preferred over the current Variable flood — it produces less noise and the same traversal value (zero).

Proposed node types

| Helm construct | CBM node type | Key fields |
|----------------|---------------|------------|
| {{- define "name" -}} | Function | name, language: "helm" |
| Chart.yaml | Module | name, version, app_version |
| values.yaml top-level key | Variable | name, type, default |
| K8s manifest | Resource | kind, api_version, name |

Proposed edges

| Edge | Trigger |
|------|---------|
| CALLS | {{ include "name" }} → named template |
| DEPENDS_ON | Chart.yaml dependency → dependency chart |
| REFERENCES | K8s Service selector → Deployment labels |

Related

Companion to #337 (HCL/Terraform semantic extraction). Together these two cover the full IaC surface of cloud-native repos where application code, Terraform infra, and Helm deployment config coexist.

Environment

  • CBM version: 0.6.1
  • Platform: macOS arm64
  • Primary use case: cloud-native monorepos (Python/Go app + Terraform + Helm)
