This guide walks through every file you need to touch when adding support for a new programming language.
Codegraph uses a dual-engine design:
| Engine | Technology | Availability |
|---|---|---|
| WASM | web-tree-sitter + pre-built .wasm grammars |
Always available (baseline) |
| Native | napi-rs + Rust tree-sitter crates |
Optional; 5-10x faster; auto-fallback to WASM |
Both engines produce the same ExtractorOutput structure, so graph building and
queries are engine-agnostic. When adding a new language you implement the
extraction logic twice — once in TypeScript (WASM) and once in Rust
(native) — and a parity test guarantees they agree.
LANGUAGE_REGISTRY in src/domain/parser.ts is the single source of truth
for all supported languages. Each entry declares:
{
id: '<lang>', // LanguageId string
extensions: ['.<ext>'], // File extensions (auto-derives EXTENSIONS)
grammarFile: 'tree-sitter-<lang>.wasm', // WASM grammar filename
extractor: extract<Lang>Symbols, // Extraction function reference
required: false, // true = crash if missing; false = skip gracefully
}Adding a language to the WASM engine requires one registry entry plus an extractor function. Everything else — extension routing, parser loading, dispatch — is automatic.
SUPPORTED_EXTENSIONS(re-exported asEXTENSIONSinshared/constants.ts) is derived from the registry. You never edit it manually.createParsers()iterates the registry and builds aMap<id, Parser>.getParser()uses an extension→registry lookup map (_extToLang).wasmExtractSymbols()callsentry.extractor(tree, filePath)— no ternary chains.parseFilesAuto()inparser.tshandles all dispatch — no per-language routing needed.
Every language extractor must return ExtractorOutput (defined in src/types.ts):
interface ExtractorOutput {
definitions: Definition[]; // functions, methods, classes, interfaces, types
calls: Call[]; // function / method invocations
imports: Import[]; // module / file imports
classes: ClassRelation[]; // extends / implements relationships
exports: Export[]; // named exports (mainly JS/TS)
typeMap: Map<string, TypeMapEntry>; // symbol type annotations
_tree?: TreeSitterTree; // retained for CFG / dataflow analysis
_langId?: LanguageId; // language identifier
_lineCount?: number; // line count for metrics
// (dataflow, astNodes, _typeMapBackfilled are populated post-extraction — do not set)
}| Structure | Fields | Notes |
|---|---|---|
Definition |
name, kind, line, endLine?, children?, visibility?, decorators? |
kind ∈ symbol kinds (see below). Methods: ClassName.methodName. children for sub-declarations (params, properties). visibility: 'public' | 'private' | 'protected' |
Call |
name, line, receiver?, dynamic? |
receiver for method calls (e.g. obj in obj.method()) |
Import |
source, names[], line, typeOnly?, reexport?, wildcardReexport?, dynamicImport?, <lang><Keyword>? |
Set a language flag (see note below) |
ClassRelation |
name, extends?, implements?, line |
|
Export |
name, kind, line |
|
TypeMapEntry |
type, confidence |
Confidence 0-1 (typically 0.9 for native) |
Language import flags use the language's idiomatic keyword, not a fixed
suffix. Examples: goImport, pythonImport, rustUse, csharpUsing,
rubyRequire, phpUse. Choose whichever name matches your language's import
statement (e.g. swiftImport, kotlinImport, zigImport).
Symbol kinds: function, method, class, interface, type, struct,
enum, trait, record, module, parameter, property, constant
(defined in src/shared/kinds.ts). Use the language's native kind (e.g. Go
structs → struct, Rust traits → trait, Ruby modules → module).
Methods inside a class use the ClassName.methodName naming convention.
Use the placeholder <lang> for your language name (e.g. c, swift,
kotlin) and <ext> for its file extensions.
Then install:
npm installAdd an entry to the grammars array:
{ name: 'tree-sitter-<lang>', pkg: 'tree-sitter-<lang>', sub: null },If the grammar ships sub-grammars (like
tree-sitter-typescriptshipstypescriptandtsx), setsubto the subdirectory name.
Build the WASM binary:
npm run build:wasmThis generates grammars/tree-sitter-<lang>.wasm (gitignored — built from
devDeps on npm install).
Two things to do on the TypeScript side:
Every language extractor lives in its own file under src/extractors/ (e.g.
go.ts, python.ts, rust.ts). Create src/extractors/<lang>.ts and
re-export it from src/extractors/index.ts. Then:
- Add
extract<Lang>Symbolsto the re-export block at the top ofsrc/domain/parser.ts(export { ... } from '../extractors/index.js') so the extractor is available fromparser.tsfor backward compatibility. - Add
extract<Lang>Symbolsto the import block directly below (import { ... } from '../extractors/index.js') so it is in scope withinparser.tsitself. (Aexport { X } fromre-export does not makeXavailable in the current file — both blocks are required.) - Reference the extractor function in the
LANGUAGE_REGISTRYarray insrc/domain/parser.ts(see Step 3c).
Write a recursive AST walker that matches tree-sitter node types for your
language. Copy the pattern from an existing extractor like extractGoSymbols in
src/extractors/go.ts or extractRustSymbols in src/extractors/rust.ts:
import type {
ExtractorOutput,
TreeSitterNode,
TreeSitterTree,
} from '../types.js';
import { /* helpers you need, e.g. findChild, nodeEndLine */ } from './helpers.js';
/**
* Extract symbols from <Lang> files.
*/
export function extract<Lang>Symbols(tree: TreeSitterTree, _filePath: string): ExtractorOutput {
const ctx: ExtractorOutput = {
definitions: [],
calls: [],
imports: [],
classes: [],
exports: [],
typeMap: new Map(),
};
function walk(node: TreeSitterNode): void {
switch (node.type) {
// ── Definitions ──
case '<function_node_type>': {
const nameNode = node.childForFieldName('name');
if (nameNode) {
ctx.definitions.push({
name: nameNode.text,
kind: 'function',
line: node.startPosition.row + 1,
endLine: node.endPosition.row + 1,
});
}
break;
}
// ── Classes / Structs ──
case '<class_node_type>': {
// ...
break;
}
// ── Imports ──
case '<import_node_type>': {
// ...
ctx.imports.push({
source: '...',
names: [...],
line: node.startPosition.row + 1,
<lang><Keyword>: true, // e.g. goImport, rustUse, rubyRequire
});
break;
}
// ── Calls ──
case 'call_expression': {
const fn = node.childForFieldName('function');
if (fn && fn.type === 'identifier') {
ctx.calls.push({ name: fn.text, line: node.startPosition.row + 1 });
}
break;
}
}
for (let i = 0; i < node.childCount; i++) {
const child = node.child(i);
if (child) walk(child);
}
}
walk(tree.rootNode);
return ctx;
}Tip: Use the tree-sitter playground
to explore AST node types for your language. Paste sample code and inspect the
tree to find the right node.type strings.
Visibility helpers are available in src/extractors/helpers.ts:
goVisibility(name)— uppercase → public (Go convention)rustVisibility(node)— extract fromvisibility_modifierchildpythonVisibility(name)—__name→ private,_name→ protectedextractModifierVisibility(node, modifierTypes?)— general modifier extraction (Java, C#, PHP).modifierTypesis an optionalSet<string>of node type names; defaults cover the most common cases
LanguageRegistryEntry.id is typed as LanguageId — a closed string union in
src/types.ts. Add your language to it before referencing it in the registry:
export type LanguageId =
| 'javascript' | 'typescript' | 'tsx'
| 'python' | 'go' | 'rust'
| 'java' | 'csharp' | 'ruby'
| 'php' | 'hcl'
| '<lang>'; // ← add your language hereWithout this, TypeScript will reject your LANGUAGE_REGISTRY entry with
Type '"<lang>"' is not assignable to type 'LanguageId'.
Add your language to the LANGUAGE_REGISTRY array in src/domain/parser.ts:
{
id: '<lang>',
extensions: ['.<ext>'],
grammarFile: 'tree-sitter-<lang>.wasm',
extractor: extract<Lang>Symbols,
required: false,
},Set required: false so codegraph still works when the WASM grammar isn't
available (e.g. in CI without npm install). Only JS/TS/TSX are required: true.
That's it for the WASM engine. The registry automatically:
- Adds
.<ext>toSUPPORTED_EXTENSIONS(andEXTENSIONSinshared/constants.ts) - Registers the parser in
createParsers() - Routes
getParser()calls via the extension map - Dispatches to your extractor in
wasmExtractSymbols() - Handles
parseFilesAuto()dispatch inparser.ts
You do not need to edit shared/constants.ts or domain/graph/builder.ts.
If your language's imports use a language-specific flag (e.g. pythonImport,
rustUse), add the camelCase mapping in patchNativeResult():
if (i.<lang><Keyword> === undefined) i.<lang><Keyword> = i.<lang>_<keyword>;[dependencies]
tree-sitter-<lang> = "0.x"Four changes in this file:
// 1. Add enum variant
pub enum LanguageKind {
// ... existing ...
<Lang>,
}
// 2. Map extensions in from_extension()
impl LanguageKind {
pub fn from_extension(file_path: &str) -> Option<Self> {
match ext {
// ... existing ...
"<ext>" => Some(Self::<Lang>),
_ => None,
}
}
// 3. Return the tree-sitter Language
pub fn tree_sitter_language(&self) -> Language {
match self {
// ... existing ...
Self::<Lang> => tree_sitter_<lang>::LANGUAGE.into(),
}
}
// 4. Return the language ID string (used by dataflow/CFG rules)
pub fn lang_id_str(&self) -> &'static str {
match self {
// ... existing ...
Self::<Lang> => "<lang>",
}
}
}Create a new file following the pattern in go.rs or rust_lang.rs:
use tree_sitter::{Node, Tree};
use crate::types::*;
use super::helpers::*;
use super::SymbolExtractor;
pub struct <Lang>Extractor;
impl SymbolExtractor for <Lang>Extractor {
fn extract(&self, tree: &Tree, source: &[u8], file_path: &str) -> FileSymbols {
let mut symbols = FileSymbols::new(file_path.to_string());
walk_node(&tree.root_node(), source, &mut symbols);
symbols
}
}
fn walk_node(node: &Node, source: &[u8], symbols: &mut FileSymbols) {
match node.kind() {
"<function_node_type>" => {
if let Some(name_node) = node.child_by_field_name("name") {
symbols.definitions.push(Definition {
name: node_text(&name_node, source).to_string(),
kind: "function".to_string(),
line: start_line(node),
end_line: Some(end_line(node)),
decorators: None,
});
}
}
// ... match other AST node types ...
_ => {}
}
for i in 0..node.child_count() {
if let Some(child) = node.child(i) {
walk_node(&child, source, symbols);
}
}
}Available helpers (from helpers.rs):
| Function | Purpose |
|---|---|
node_text(&node, source) |
Get node text as &str |
find_child(&node, "kind") |
First child of a given type |
find_parent_of_type(&node, "kind") |
Walk up to find parent |
find_parent_of_types(&node, &["a","b"]) |
Walk up, match any type |
named_child_text(&node, "field", source) |
Shorthand for field text |
start_line(&node) / end_line(&node) |
1-based line numbers |
// 1. Declare module
pub mod <lang>;
// 2. Add dispatch arm in extract_symbols_with_opts()
// (extract_symbols() simply delegates to this function — do NOT modify it)
pub fn extract_symbols_with_opts(..., include_ast_nodes: bool) -> FileSymbols {
match lang {
// ... existing ...
LanguageKind::<Lang> => <lang>::<Lang>Extractor.extract_with_opts(tree, source, file_path, include_ast_nodes),
}
}If your imports need a language-specific flag, add it to the Import struct:
pub <lang>_<keyword>: Option<bool>, // e.g. go_import, rust_use, ruby_requireAnd update Import::new() to default it to None.
Follow the pattern from tests/parsers/go.test.js:
import { describe, it, expect, beforeAll } from 'vitest';
import { createParsers } from '../../src/domain/parser.js';
import { extract<Lang>Symbols } from '../../src/extractors/<lang>.js';
describe('<Lang> parser', () => {
let parsers;
beforeAll(async () => {
parsers = await createParsers();
});
function parse<Lang>(code) {
const parser = parsers.get('<lang>');
if (!parser) throw new Error('<Lang> parser not available');
const tree = parser.parse(code);
return extract<Lang>Symbols(tree, 'test.<ext>');
}
it('extracts function definitions', () => {
const symbols = parse<Lang>(`<sample code>`);
expect(symbols.definitions).toContainEqual(
expect.objectContaining({ name: 'myFunc', kind: 'function' })
);
});
// Test: classes/structs, methods, imports, calls, type definitions, etc.
});Note:
parsersis aMap— useparsers.get('<lang>'), notparsers.<lang>Parser. Test imports use.jsextension for vitest resolution of TypeScript sources.
Recommended test cases:
- Function definitions (regular, with parameters)
- Class/struct/enum definitions
- Method definitions (associated with a type)
- Import/include directives
- Function calls (direct and method calls)
- Type definitions / aliases
- Visibility extraction (if applicable)
- Forward declarations (if applicable)
Add test snippets to tests/engines/parity.test.js to verify the native and
WASM extractors produce identical output for your language.
# 1. Build WASM grammar
npm run build:wasm
# 2. Run your parser tests
npx vitest run tests/parsers/<lang>.test.js
# 3. Run the full test suite
npm test
# 4. Build native and test parity
cd crates/codegraph-core && cargo build
npx vitest run tests/engines/parity.test.js
# 5. Test on a real project
codegraph build /path/to/a/<lang>/project
codegraph map
codegraph query someFunction| # | File | Engine | Action |
|---|---|---|---|
| 1 | package.json |
WASM | Add tree-sitter-<lang> devDependency |
| 2 | scripts/build-wasm.js |
WASM | Add grammar entry to array |
| 3 | src/extractors/<lang>.ts + src/domain/parser.ts |
WASM | Create extractor in src/extractors/, re-export via index.ts, add to parser.ts re-export block and import block, add LANGUAGE_REGISTRY entry |
| 4 | src/types.ts |
Both | Add '<lang>' to the LanguageId union; add language-specific flag to Import if needed |
| 5 | src/domain/parser.ts |
WASM | Update patchNativeResult (if language flag needed) |
| 6 | crates/codegraph-core/Cargo.toml |
Native | Add tree-sitter crate |
| 7 | crates/.../parser_registry.rs |
Native | Register enum + extension + grammar + lang_id_str |
| 8 | crates/.../extractors/<lang>.rs |
Native | Implement SymbolExtractor trait |
| 9 | crates/.../extractors/mod.rs |
Native | Declare module + dispatch arm in extract_symbols_with_opts() |
| 10 | crates/.../types.rs |
Native | Add language flag to Import (if needed) |
| 11 | tests/parsers/<lang>.test.js |
WASM | Parser extraction tests |
| 12 | tests/engines/parity.test.js |
Both | Cross-engine validation snippets |
Files you do NOT need to touch:
src/shared/constants.ts—EXTENSIONSis derived from the registry automaticallysrc/shared/kinds.ts— symbol kinds are universal across languagessrc/domain/graph/builder.ts— build pipeline usesparseFilesAuto()fromparser.ts, no manual routing