This document is the single source of truth for the GitLab Knowledge Graph (Orbit) project.
The GitLab Knowledge Graph (GKG), product name Orbit, is a backend service that builds a property graph from GitLab instance data (SDLC metadata + code structure) and exposes it through a JSON-based Cypher-like DSL compiled to ClickHouse SQL. It provides a unified context API for AI systems (via MCP) and human users, and queryable APIs for data products.
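To make the "JSON DSL compiled to ClickHouse SQL" idea concrete, here is a minimal sketch of such a compiler. It is not the GKG query engine: the query shape (`match`, `where`, `return`, `limit`), the `node_<label>` table naming, and the function names are all assumptions made for illustration. The only ClickHouse-specific feature it relies on is the `{name:Type}` query-parameter placeholder syntax, which keeps user values out of the SQL string.

```python
import json

# Hypothetical query document; the real GKG DSL schema is not shown here.
QUERY = json.loads("""
{
  "match": {"label": "MergeRequest", "alias": "mr"},
  "where": [{"property": "state", "op": "=", "value": "merged"}],
  "return": ["iid", "title"],
  "limit": 10
}
""")

ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">="}


def quote_ident(name: str) -> str:
    """Accept only simple identifiers, then backtick-quote them."""
    if not name.replace("_", "").isalnum():
        raise ValueError(f"invalid identifier: {name!r}")
    return f"`{name}`"


def compile_query(query: dict) -> tuple[str, dict]:
    """Return (sql, params); values travel as parameters, never inline."""
    alias = quote_ident(query["match"]["alias"])
    # Assumed naming convention: one node table per label.
    table = quote_ident("node_" + query["match"]["label"].lower())
    columns = ", ".join(f"{alias}.{quote_ident(c)}" for c in query["return"])
    sql = f"SELECT {columns} FROM {table} AS {alias}"

    params: dict = {}
    clauses = []
    for i, cond in enumerate(query.get("where", [])):
        if cond["op"] not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {cond['op']!r}")
        key = f"p{i}"
        params[key] = cond["value"]
        column = f"{alias}.{quote_ident(cond['property'])}"
        # All values are typed as String in this sketch.
        clauses.append(f"{column} {cond['op']} {{{key}:String}}")

    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    if "limit" in query:
        sql += f" LIMIT {int(query['limit'])}"
    return sql, params


sql, params = compile_query(QUERY)
print(sql)
print(params)
```

Validating identifiers and operators against an allow-list, and passing values as parameters, is one common way to keep a user-facing DSL from becoming a SQL injection vector. Whether GKG does it this way is not stated in this document.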
GA Target: GitLab.com end of April 2026 | Dedicated/Self-Managed Q2 FY27
Deployment: Cloud native only (Kubernetes/Helm). No Omnibus packaging for the initial iteration.
Program Landing Page: internal handbook (source)
The architecture is documented in the design documents and implemented in the knowledge-graph repository.
```mermaid
flowchart LR
    GitLab[GitLab Core] -- CDC replication --> DIP[Data Insights Platform]
    GKG -- Git RPC --> GitLab
    DIP -- datalake --> CH[(ClickHouse)]
    CH <-- graph tables --> GKG[Knowledge Graph · Orbit]
    GitLab -. gRPC / AuthZ .-> GKG
    style GitLab fill:#333,color:#fff,stroke:#333
    style DIP fill:#6E49CB,color:#fff,stroke:#6E49CB
    style CH fill:#FFCC00,color:#000,stroke:#FFCC00
    style GKG fill:#FC6D26,color:#fff,stroke:#FC6D26
```
- GitLab Core -- PostgreSQL (OLTP), Gitaly (Git storage), and Rails (application server). The source of all SDLC and code data. Handles authentication and authorization for graph queries.
- Data Insights Platform -- Siphon (CDC) streams PostgreSQL logical replication events through NATS JetStream into ClickHouse.
- ClickHouse -- Columnar database serving two logical databases on one instance: the datalake (raw CDC rows from Siphon) and the graph database (indexed property graph tables).
- Knowledge Graph (Orbit) -- Rust service that transforms datalake rows into a property graph, parses code fetched from Gitaly, and serves graph queries over gRPC. A single binary runs in one of four modes: indexer, webserver, scheduler, or health-check.
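The "datalake rows into a property graph" step can be pictured with a small sketch. Everything specific here is hypothetical: the row columns (`version`, `deleted`), the `FOREIGN_KEYS` mapping standing in for an ontology definition, and the edge names are illustrative only and are not taken from the GKG codebase. The sketch keeps the newest version of each row, drops deleted rows, emits one node per surviving row, and emits one edge per foreign-key column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str
    id: int
    properties: tuple  # sorted (key, value) pairs


@dataclass(frozen=True)
class Edge:
    source: tuple  # (label, id)
    kind: str
    target: tuple  # (label, id)


# Hypothetical CDC rows for a merge_requests table in the datalake.
ROWS = [
    {"id": 7, "project_id": 1, "author_id": 42, "title": "Add parser",
     "version": 1, "deleted": False},
    {"id": 7, "project_id": 1, "author_id": 42, "title": "Add Rust parser",
     "version": 2, "deleted": False},
    {"id": 8, "project_id": 1, "author_id": 43, "title": "Old MR",
     "version": 3, "deleted": True},
]

# Hypothetical ontology fragment: foreign-key column -> (edge kind, target label).
FOREIGN_KEYS = {
    "project_id": ("IN_PROJECT", "Project"),
    "author_id": ("AUTHORED_BY", "User"),
}
BOOKKEEPING = {"id", "version", "deleted"}


def latest_rows(rows):
    """Keep the highest-version row per id, then drop rows marked deleted."""
    latest = {}
    for row in rows:
        current = latest.get(row["id"])
        if current is None or row["version"] > current["version"]:
            latest[row["id"]] = row
    return [row for row in latest.values() if not row["deleted"]]


def to_graph(rows, label):
    """Turn relational rows into (nodes, edges) of a property graph."""
    nodes, edges = [], []
    for row in latest_rows(rows):
        props = tuple(sorted(
            (k, v) for k, v in row.items()
            if k not in FOREIGN_KEYS and k not in BOOKKEEPING
        ))
        nodes.append(Node(label, row["id"], props))
        for column, (kind, target_label) in FOREIGN_KEYS.items():
            if row.get(column) is not None:
                edges.append(
                    Edge((label, row["id"]), kind, (target_label, row[column]))
                )
    return nodes, edges


nodes, edges = to_graph(ROWS, "MergeRequest")
print(nodes)
print(edges)
```

In this example the two versions of merge request 7 collapse into a single node carrying the newer title, and merge request 8 produces nothing because its latest row is marked deleted.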
| Resource | Location |
|---|---|
| Design documents | docs/design-documents/ |
| Crate source | crates/ |
| Ontology definitions | config/ontology/ |
| Dev documentation | docs/ |
Note: gitlab-org/rust/knowledge-graph is the old repository for the local client-side knowledge graph and will be archived. The code graph was taken from that repository and moved into orbit/knowledge-graph.
GitLab Knowledge Graph as a Service - GA (#19744)
Blocks: L4: Introduce GitLab Orbit (#773)
| Workstream | Epic | Lead(s) | Description |
|---|---|---|---|
| Product | #20884 | Meg Corren, Angelo Rivera | GTM strategy, pricing, legal review, design |
| Core Development | #20357 | Angelo Rivera, J-G Doyon, M. Usachenko, Bohdan | Indexing, query engine, web server, Rails integration, ontology |
| Security | #20248 | Gus Gray, Angelo Rivera | AuthZ model, threat modeling, AppSec review, penetration testing |
| Infra / Delivery / PREP | #36 | Stephanie Jackson, Jason Plum | Production readiness (PREP MR !64), Siphon deployment (epic #16), observability, self-managed strategy |
| Architecture & Discovery | #20885 | Angelo Rivera, GKG team | DB selection, design doc, executive brief, POC demo |
| Epic | Namespace | Relationship |
|---|---|---|
| #1804 | gl-infra | Infrastructure Support for KG |
| #407 | gl-security | DataSec Support to Orbit |
| #86 | operating-model | DE&M Data Product GKG |
| #79 | operating-model | Monetization - Usage-Based Billing |
| #915 | gitlab-dedicated | GKG on Dedicated (confidential) |
| #17514 | gitlab-org | First Iteration (closed, predecessor) |
Filtered by knowledge graph label:
| Repository | Purpose |
|---|---|
| gitlab-org/orbit/knowledge-graph | Main GKG service -- 19 Rust crates covering parsing, indexing, query compilation, serving, testing, and infrastructure. Single gkg-server binary runs in 4 modes (webserver, indexer, scheduler, health-check). |
| gitlab-org/orbit/build-images | CI builder images (Rust toolchain, pre-compiled tools, sccache) used by the knowledge-graph pipeline |
| gitlab-org/orbit/gkg-helm-charts | Official production Helm chart for GKG (v1.0.0, application chart, uses common-ci-tasks patterns) |
| gitlab-org/orbit/documentation/orbit-artifacts | Offsite transcripts and session notes (Feb 3-5, 2026) |
| Repository | Purpose |
|---|---|
| gitlab-org/analytics-section/siphon | CDC pipeline (Go): PostgreSQL logical replication -> NATS -> ClickHouse. Helm chart at helm/siphon/ (v0.0.1, standalone). |
| gitlab-org/analytics-section/platform-insights/siphon-helm-charts | Production Siphon Helm chart (v1.0.1), deployed via gitlab-helmfiles on ops.gitlab.net |
| Repository | Purpose |
|---|---|
| gitlab-org/gitlab | Rails integration: AuthZ redaction exchange, feature flags, MCP endpoint (/api/v4/mcp-orbit) |
| gitlab-org/gitaly | Git RPC service -- code indexer fetches repository archives via GetArchive gRPC |
| gitlab-org/gitlab-zoekt-indexer | Zoekt code search indexer (historical context: early KG integration MRs in CNG attempted embedding KG via Zoekt FFI) |
These repositories on ops.gitlab.net manage the Kubernetes infrastructure and deployment configs for the GitLab production and staging environments. GKG/Siphon staging infrastructure is configured here.
| Repository | Purpose |
|---|---|
| gitlab-com/gl-infra/config-mgmt | Terraform modules for GKE clusters, Vault integration, Private Service Connect (PSC) networking for Patroni connectivity, and CI runner signing (KMS HSM via OIDC). Contains 8+ MRs for Siphon PSC setup (Jan-Feb 2026). |
| gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles | Monorepo for Helm release configuration across all GitLab.com services (except the GitLab Helm Chart itself). Managed by Helmfile. Contains Siphon (releases/siphon/), NATS (releases/nats/), and DIP (releases/data-insights-platform/) release configs. Push-mirrored to ops.gitlab.net for CI/CD deployments and gitlab.com outage resilience. |
| gitlab-com/gl-infra/k8s-workloads/gitlab-com | Main gitlab.com Kubernetes workloads (no GKG content currently) |
| gitlab-com/gl-infra/chef-repo | Chef node configuration (no GKG/Siphon content found) |
| Location | Purpose |
|---|---|
| Readiness (current) | New official PREP readiness process. GKG assessment MR !64. |
| GKG design documents | Architectural design documents for GKG |
| Data Insights Platform design doc | DIP design document (Siphon's parent platform) |
| Internal program page | R&D PMO program landing page (source) |
| orbit-artifacts | Offsite transcripts and summary (Feb 3-5, 2026): architecture, indexing, query engine, infra, DIP, deployment, billing |
| Readiness reviews (old) | Legacy readiness repo. Siphon review MR !231 (open, 78 comments), NATS review MR !240 (merged). |
| In-repo dev/sandbox docs | INFRASTRUCTURE.md and RUNBOOK.md -- GCP sandbox environment details and operational runbook (dev/sandbox only) |
| Chart | Repository | Purpose |
|---|---|---|
| GKG (official) | gitlab-org/orbit/gkg-helm-charts | Production Helm chart for GKG (v1.0.0, application type). Uses common-ci-tasks patterns. |
| GKG (dev) | helm-dev/gkg/ | Development/sandbox chart. Deploys full stack: GKG server (4 modes) + Siphon producer/consumer + NATS subchart + GitLab Runner. Values files for local, sandbox. |
| Observability (dev) | helm-dev/observability/ | kube-prometheus-stack + Loki + Alloy + Grafana dashboards (ETL engine, GKG overview, NATS JetStream) |
| Siphon (standalone) | siphon/helm/siphon/ | Minimal standalone chart (v0.0.1). Superseded by the GKG dev chart for GKG deployments. |
| Siphon (production) | siphon-helm-charts | v1.0.1, deployed via gitlab-helmfiles on ops.gitlab.net |
| Resource | Details |
|---|---|
| GCP Project | gl-knowledgegraph-prj-f2eec59d |
| GKE Cluster | knowledge-graph-test (us-central1) |
| ClickHouse VM | vm-clickhouse (n4-standard-16) |
| GitLab VM | vm-gitlab-omnibus (n4-standard-8, includes Gitaly + PostgreSQL) |
| Domain | gitlab.gkg.dev |
| Secrets | GCP Secret Manager -> External Secrets Operator |
See docs/dev/INFRASTRUCTURE.md for full details.
Staging is deployed to the analytics-eventsdot-stg environment. All configs live in gitlab-helmfiles:
| Resource | Config |
|---|---|
| Staging environment | bases/environments/analytics-eventsdot-stg.yaml |
| Siphon helmfile | releases/siphon/helmfile.yaml.gotmpl |
| Siphon staging values | releases/siphon/analytics-eventsdot-stg.yaml.gotmpl |
| Siphon staging secrets | releases/siphon/values-secrets/analytics-eventsdot-stg.yaml.gotmpl |
| NATS staging values | releases/nats/analytics-eventsdot-stg.yaml.gotmpl |
| NATS base + network policy | releases/nats/values.yaml.gotmpl |
| NATS production values | releases/nats/analytics-eventsdot-prod.yaml.gotmpl |
| Vault secrets | ESO pulls PostgreSQL credentials from {cluster}/siphon/postgresql |
| PSC | Primary + replica connections to gstg Patroni via Backend Service + ILB + PSC (managed in config-mgmt) |
Siphon is the CDC pipeline that feeds GKG. Its staging deployment is tracked across several issues and epics:
| Reference | Purpose |
|---|---|
| epic #16 | Producer-only Siphon deployment to staging |
| siphon#174 | Patroni connectivity (PSC networking, firewall rules) |
| siphon#175 | Staging validation test plan |
| #586 | DBRE counterpart request for staging support |
| #28386 | Production engineering SRE/DBRE support (DRI: Alex Hanselka) |
| readiness#120 | Siphon readiness review issue |
| readiness !231 | Siphon readiness review MR |
| readiness !240 | NATS readiness review MR (merged) |
All Terraform lives in config-mgmt on ops.gitlab.net and is managed via Atlantis. No dedicated GKG Terraform project exists -- the sandbox is managed via Helm charts and the GCP console.
| Environment | Path | Manages |
|---|---|---|
| siphon-staging | environments/siphon-staging/ | Dedicated GKE cluster (data-stg-siphon-5b03df32, us-east1), VPC, Cloud NAT, Vault |
| analytics-eventsdot-stg | environments/analytics-eventsdot-stg/ | GKE cluster with siphon-pool (tainted), PSC to Patroni primary/replica, ArgoCD, logging |
| analytics-eventsdot-prod | environments/analytics-eventsdot-prod/ | Production GKE cluster, Vault, ArgoCD (no siphon pool yet) |
| ci-runners-signing | environments/ci-runners-signing/ | KMS HSM code signing for knowledge-graph binaries via OIDC (project 69095239) |
| vault-production | environments/vault-production/ | K8s auth roles for Siphon across all clusters, secrets policies for analytics_siphon group |
| gstg | environments/gstg/clickhouse-cloud.tf | ClickHouse Cloud staging: PSC, firewall rules, private DNS |
| gprd | environments/gprd/clickhouse-cloud.tf | ClickHouse Cloud production: PSC, firewall rules, private DNS |
| Module | Source | Purpose |
|---|---|---|
| DIP infra | data-insights-platform-infra | Base GKE/VPC Terraform module used by both eventsdot environments |
| GKE | ops.gitlab.net/gitlab-com/gke/google v16.13.0 | GKE cluster provisioning (mirror) |
| GCP OIDC | ops.gitlab.net/gitlab-com/gcp-oidc/google v3.4.0 | OIDC federation between GitLab CI and GCP (used for CI signing) |
| ArgoCD bootstrap | gitlab.com/gitlab-com/gke-argocd-bootstrap/google v1.5.0 | ArgoCD setup on GKE clusters |
| Image | Registry |
|---|---|
| GKG Server | gitlab-org/orbit/knowledge-graph/gkg |
| Siphon | gitlab-org/analytics-section/siphon |
| Rust Builder | gitlab-org/orbit/build-images/rust-builder |
| CNG Gitaly (CI only) | gitlab-org/build/cng/gitaly |
Billing infrastructure for GKG is not yet implemented. It will leverage the existing consumption-based billing system (CustomersDot, Snowplow, ClickHouse) used by AI Gateway and other services.
Architecture and implementation details are TODO -- to be filled out as this workstream progresses.
| Reference | Purpose |
|---|---|
| #79 | Monetization - Usage-Based Billing epic |
| Offsite billing session | Day 2 Session 4: credit-based model, blocking vs non-blocking, SOX, namespace billing, storage considerations |
| Usage billing system doc | GitLab Usage Billing System design document |
| Pricing multipliers | SKU definitions and credit multipliers in CustomersDot |
| Usage billing runbook | CDot billing system architecture and operational runbook |
| SOX ITGC controls | SOX audit requirements for billing code paths |
| Credits dashboard | End-user credits dashboard documentation |
Billing contact: Jerome Ng (@jeromezng), usage billing system architect.
TODO -- this section will cover production runbooks, alerting, and observability metrics.
| Runbook | Location |
|---|---|
| Dev/sandbox runbook | docs/dev/RUNBOOK.md |
| Production runbook | TODO |
| Component | Status |
|---|---|
| Grafana dashboards (dev) | Deployed via helm-dev/observability/ (ETL engine, GKG overview, NATS JetStream) |
| Production Grafana dashboards | TODO |
| Alerting rules | TODO |
| SLIs / SLOs | TODO -- to be defined as part of PREP |
| Metrics | TODO -- gRPC latency, indexing throughput, query latency, redaction exchange timing, ClickHouse query performance |
| Person | Role |
|---|---|
| Nitin Singhal (@nitinsinghal74) | ELT Lead |
| Angelo Rivera (@michaelangeloio) | GKG Lead |
| Meg Corren (@mcorren) | Product Manager |
| Jean-Gabriel Doyon (@jgdoyon1) | SDLC Indexing, Code Indexing, Schema Management |
| Michael Usachenko (@michaelusa) | Graph Query Engine / Compiler |
| Bohdan Parkhomchuk (@bohdanpk) | CI/CD, Deployment, Helm Charts |
| Stephanie Jackson (@stejacks-gitlab) | Infrastructure / SRE, PREP |
| Lyle Kozloff (@lyle) | TPM |
| Adam Hegyi (@ahegyi) | Siphon / DIP Architecture |
| Ankit Bhatnagar (@ankitbhatnagar) | NATS, DIP |
| Arun Sori (@arun.sori) | Siphon Connectivity DRI |
| Alex Hanselka (@ahanselka) | Production Engineering DRI |
| Gus Gray (@ggray-gitlab) | Security, AuthZ Design |
| Jason Plum (@WarheadsSE) | Delivery, SM/Dedicated |
| Brian Greene (@bgreene1) | Ontology Standards |
| Dennis Tang (@dennis) | Analytics Stage, ClickHouse Operations |
| Mark Unthank (@munthank) | Product Designer |