feat: HCL/Terraform semantic extraction — wire tree-sitter visitor to graph builder #337

@ilyabrykau-orca

Summary

Terraform (.tf, .tfvars) and HCL (.hcl) files produce no semantic graph nodes in v0.6.1. The tree-sitter HCL grammar is compiled into the binary and the language is registered — the AST→graph visitor is the missing piece.

Scale

Tested across 3 production repos:

Repo     .tf files
1        702
2        437
3        43
Total    1,182

All of these files are currently excluded from indexing via .cbmignore because they produce only noise. This is a significant dead zone: infrastructure-as-code is a first-class part of these codebases.

Current behavior

After indexing a repo with .tf files:

MATCH (n) WHERE n.file_path ENDS WITH '.tf'
RETURN n.name, n.type, n.language, n.content LIMIT 5
name="backend", type="", language="", content=""
name="locals",  type="", language="", content=""
name="main.tf", type="", language="", content=""
  • Node labels: "" (unlabeled)
  • type, language, content: all empty
  • Edges from .tf nodes: zero
  • Pattern: one node per filename + one per top-level HCL block label — pure tokenization

Evidence that the grammar is compiled in

strings codebase-memory-mcp | grep -E 'hcl|terraform|_resource_block'
# → .hcl
# → terraform
# → _resource_block
# → _resource_block_repeat1

Language map registered: {"hcl", CBM_LANG_HCL}, {"terraform", CBM_LANG_HCL}

extra_extensions = {"tf":"hcl","tfvars":"hcl"} is accepted by the config but changes nothing in the output, which points to the visitor being missing, not the grammar.

Expected behavior

MATCH (n:Resource) WHERE n.file_path ENDS WITH '.tf'
RETURN n.name, n.resource_type, n.language
// → {name: "my_bucket", resource_type: "aws_s3_bucket", language: "hcl"}

MATCH (n:Variable {language: "hcl"}) RETURN n.name, n.type, n.default

MATCH (a)-[:CALLS]->(b)
WHERE a.file_path ENDS WITH '.tf' AND a.type = "module"
RETURN a.name, b.name

Proposed node types

HCL construct                        CBM node type    Key fields
resource "aws_s3_bucket" "x" {}      Resource         resource_type, name
variable "name" {}                   Variable         name, type, default
output "name" {}                     Output           name, value
module "name" { source = "..." }     Module           name, source
provider "aws" {}                    Provider         name, version
data "type" "name" {}                DataSource       resource_type, name
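
To make the table above concrete, here is a minimal sketch of the block-to-node mapping. It is written in Python for brevity and is not meant to mirror the project's actual code. The names BLOCK_SPECS, ATTRIBUTE_FIELDS, Node and block_to_node are hypothetical, and the sketch assumes the visitor has already pulled the block keyword, its string labels, and its top-level attributes out of the tree-sitter AST.

```python
from dataclasses import dataclass, field

# Hypothetical table: block keyword -> (node label, fields taken from the block's labels).
BLOCK_SPECS = {
    "resource": ("Resource", ("resource_type", "name")),
    "data": ("DataSource", ("resource_type", "name")),
    "variable": ("Variable", ("name",)),
    "output": ("Output", ("name",)),
    "module": ("Module", ("name",)),
    "provider": ("Provider", ("name",)),
}

# Hypothetical table: node label -> fields copied from the block body, if present.
ATTRIBUTE_FIELDS = {
    "Variable": ("type", "default"),
    "Output": ("value",),
    "Module": ("source",),
    "Provider": ("version",),
}


@dataclass
class Node:
    label: str
    language: str = "hcl"
    fields: dict = field(default_factory=dict)


def block_to_node(keyword, labels, attributes=None):
    """Return a Node for a top-level HCL block, or None if the block is not in the table."""
    spec = BLOCK_SPECS.get(keyword)
    if spec is None or len(labels) != len(spec[1]):
        return None  # e.g. terraform {} or locals {}, which the table does not cover
    node_label, label_fields = spec
    fields = dict(zip(label_fields, labels))
    for key in ATTRIBUTE_FIELDS.get(node_label, ()):
        if attributes and key in attributes:
            fields[key] = attributes[key]
    return Node(label=node_label, fields=fields)


bucket = block_to_node("resource", ["aws_s3_bucket", "my_bucket"])
vpc = block_to_node("module", ["vpc"], {"source": "./modules/vpc", "cidr": "10.0.0.0/16"})
```

In this sketch, bucket carries resource_type and name, matching the expected Resource query result, while vpc keeps only name and source and drops the unrelated cidr attribute.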

Proposed edges

Edge          Trigger
CALLS         module block → module source path
DEPENDS_ON    depends_on list → target Resource nodes
REFERENCES    resource attribute referencing <resource>.<name>.<attr>
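
The three edge triggers above could be derived roughly as follows. This is a hedged Python sketch, not the project's implementation: the function names and the (source, edge, target) tuple shape are hypothetical, and the regex scan over expression text stands in for what would more robustly be a walk over the tree-sitter expression nodes (a plain-text scan can, for instance, match dotted text inside string literals). Data source references (data.<type>.<name>.<attr>) are deliberately skipped here because the edge table only names resources.

```python
import re

# First segments that do not name a managed resource type in a Terraform reference.
NON_RESOURCE_PREFIXES = {
    "var", "local", "module", "data", "each", "count", "path", "self", "terraform",
}

# Matches <resource>.<name>.<attr>; a simplification of real HCL traversal syntax.
REF_PATTERN = re.compile(
    r"\b([a-z][a-z0-9_]*)\.([A-Za-z_][A-Za-z0-9_-]*)\.([A-Za-z_][A-Za-z0-9_-]*)"
)


def references_edges(source, expression):
    """REFERENCES edges for each <resource>.<name>.<attr> found in an attribute expression."""
    edges = []
    for rtype, rname, _attr in REF_PATTERN.findall(expression):
        if rtype in NON_RESOURCE_PREFIXES:
            continue
        edge = (source, "REFERENCES", f"{rtype}.{rname}")
        if edge not in edges:  # keep one edge per target
            edges.append(edge)
    return edges


def depends_on_edges(source, depends_on):
    """DEPENDS_ON edges for each address listed in a depends_on attribute."""
    return [(source, "DEPENDS_ON", target) for target in depends_on]


def calls_edge(module_name, source_path):
    """CALLS edge from a module block to its source path."""
    return (f"module.{module_name}", "CALLS", source_path)


refs = references_edges(
    "aws_s3_bucket_policy.logs",
    'resource = "${aws_s3_bucket.logs.arn}/*" prefix = var.settings.prefix',
)
deps = depends_on_edges("aws_instance.web", ["aws_s3_bucket.logs", "aws_iam_role.web"])
call = calls_edge("vpc", "./modules/vpc")
```

Resolving each target string to an actual graph node (and deciding what to do with unresolved targets such as remote module sources) is left out of the sketch.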

Also needed: built-in .tf extension registration

Only .hcl is in the built-in extension map. .tf and .tfvars should also be registered by default and mapped to CBM_LANG_HCL, without requiring extra_extensions config.
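
As an illustration of the requested default, the lookup could behave like the Python sketch below. BUILTIN_EXTENSIONS and language_for are hypothetical names, and the assumption that extra_extensions keys omit the leading dot is taken from the config example earlier in this issue; the real extension map lives in the compiled binary and may be shaped differently.

```python
from pathlib import PurePosixPath

# Hypothetical built-in map with .tf and .tfvars registered next to .hcl.
BUILTIN_EXTENSIONS = {".hcl": "hcl", ".tf": "hcl", ".tfvars": "hcl"}


def language_for(path, extra_extensions=None):
    """Return the language id for a path, or None if the extension is unknown."""
    ext = PurePosixPath(path).suffix.lower()
    if extra_extensions:
        # User config keys are assumed to omit the dot, e.g. {"tf": "hcl"}.
        override = extra_extensions.get(ext.lstrip("."))
        if override:
            return override
    return BUILTIN_EXTENSIONS.get(ext)


main_tf = language_for("infra/main.tf")
auto_vars = language_for("env/prod.auto.tfvars")
custom = language_for("stack.tpl", {"tpl": "hcl"})
```

With the built-in entries in place, main.tf and prod.auto.tfvars resolve to hcl with no user config, and extra_extensions remains available for non-standard extensions.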

Environment

  • CBM version: 0.6.1
  • Platform: macOS arm64
  • Repos: Go, Python/Django monorepos with Terraform infra alongside application code
