diff --git a/.githubification/README.md b/.githubification/README.md new file mode 100644 index 0000000000..66f06f6724 --- /dev/null +++ b/.githubification/README.md @@ -0,0 +1,476 @@ +# Githubification Assessment: Serving NeMo Agent Toolkit from GitHub Workflows + +This document provides a detailed assessment of the possibilities—and limitations—of +migrating the main functionality of the NVIDIA NeMo Agent Toolkit (NAT) to run entirely +as a **GitHub-infrastructured application**, where GitHub Actions workflows serve as +the primary compute and orchestration layer. + +--- + +## 1. Overview of Current Architecture + +NeMo Agent Toolkit is an enterprise-grade Python platform for building, instrumenting, +evaluating, and optimizing AI agents across multiple frameworks. Its runtime involves: + +| Component | Technology | +|-----------|-----------| +| **Core Runtime** | Python 3.11–3.13, FastAPI + Uvicorn | +| **CLI** | `nat` command (workflow execution, configuration, evaluation) | +| **LLM Providers** | NVIDIA NIM, OpenAI, Azure OpenAI, HuggingFace, Ollama, LiteLLM | +| **Vector Databases** | Milvus, Pinecone, Weaviate, ChromaDB | +| **Data Stores** | Redis, MySQL, PostgreSQL, S3/MinIO, DuckDB | +| **Observability** | OpenTelemetry, Arize Phoenix, Weights & Biases Weave | +| **Protocols** | Model Context Protocol (MCP), Agent-to-Agent (A2A) | +| **Front-Ends** | FastAPI REST API, Console CLI, built-in Chat UI | +| **Auth** | OAuth 2.0, API keys, JWT, PKCE | +| **Packaging** | 30+ modular sub-packages, Docker container, PyPI wheels | + +The existing CI already uses GitHub Actions (`pr.yaml` → `ci_pipe.yml`) for linting, +testing across Python 3.11/3.12/3.13 on amd64/arm64, documentation builds, and wheel +packaging. A parallel GitLab CI pipeline adds integration tests with a full service +stack (Redis, MySQL, Milvus, MinIO, Phoenix, Langfuse, Piston, etc.). + +--- + +## 2. 
Functions That Map Well to GitHub Workflows + +### 2.1 CI/CD Pipeline (Already Implemented) + +**Feasibility: ✅ Fully feasible — already in place.** + +The existing `pr.yaml` and `ci_pipe.yml` workflows already demonstrate that GitHub +Actions can handle: + +- **Code quality checks** — linting via Ruff, formatting via YAPF, pre-commit hooks. +- **Unit tests** — pytest across a 3×2 matrix (3 Python versions × 2 architectures). +- **Documentation builds** — Sphinx-based doc generation and artifact upload. +- **Wheel packaging** — building and uploading distribution wheels for all 30+ packages. +- **Coverage reporting** — Codecov integration. + +### 2.2 Scheduled Evaluation Runs + +**Feasibility: ✅ Highly feasible.** + +NAT's evaluation system (`nvidia_nat_eval`) runs offline benchmarks against agent +workflows using configurable evaluators. These are batch jobs that: + +- Accept a workflow YAML configuration and a test dataset. +- Execute the agent, collect outputs, and score them. +- Produce JSON/XML evaluation reports. + +GitHub Actions `schedule` triggers (cron) could run nightly or weekly evaluation +sweeps. Results would be stored as workflow artifacts or committed back to the +repository as versioned reports. + +```yaml +on: + schedule: + - cron: '0 3 * * 1' # Weekly Monday 3 AM +jobs: + evaluate: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - run: pip install "nvidia-nat[eval]" + - run: nat evaluate --config_file eval_config.yml + - uses: actions/upload-artifact@v4 + with: + name: eval-results + path: results/ +``` + +### 2.3 Prompt and Hyper-Parameter Optimization + +**Feasibility: ✅ Feasible with caveats (API keys, runtime limits).** + +The optimizer (`nat optimize`) performs iterative prompt tuning and hyper-parameter +search using Optuna. Each iteration calls an LLM API and evaluates the result. This +is compute-light but latency-bound (waiting on API responses). 
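As a hedged sketch of what a manually triggered optimization run could look like — the `nvidia-nat[opt]` extra, the exact `nat optimize` flags, and the `.optuna/` output path are illustrative assumptions, not verified toolkit APIs:

```yaml
name: prompt-optimization
on:
  workflow_dispatch:        # manual trigger from the Actions tab
jobs:
  optimize:
    runs-on: ubuntu-latest
    env:
      # LLM credentials come from GitHub Secrets, never from the repo
      NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
    steps:
      - uses: actions/checkout@v4
      - run: pip install "nvidia-nat[opt]"             # extra name is an assumption
      - run: nat optimize --config_file optimize_config.yml
      - uses: actions/upload-artifact@v4               # persist optimization state between runs
        with:
          name: optuna-study
          path: .optuna/
```

Downloading the previous run's artifact at the start of the job would let successive runs resume the same Optuna study rather than starting the search from scratch.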
GitHub Actions supports up to **6 hours per job** on hosted runners (self-hosted
runners allow considerably longer jobs), which is sufficient for most optimization
runs. Matrix strategies could parallelize the search space across multiple jobs.

**Requirements:**
- LLM API keys stored as GitHub Secrets.
- Optimization state persisted across runs via artifacts or external storage.

### 2.4 Documentation Publishing

**Feasibility: ✅ Fully feasible.**

Documentation is already built in the CI pipeline. Adding a deployment step to
GitHub Pages is straightforward:

```yaml
- uses: actions/deploy-pages@v4
```

This would serve the full NAT documentation at
`https://<org>.github.io/<repo>/`.

### 2.5 Release Automation and Wheel Publishing

**Feasibility: ✅ Fully feasible.**

Wheels are already built in the CI pipeline. A release workflow triggered on Git tags
could:

1. Build all 30+ package wheels.
2. Publish them to PyPI via `twine` or the `pypa/gh-action-pypi-publish` action.
3. Create a GitHub Release with changelogs and attached artifacts.
4. Build and push Docker images to GHCR (GitHub Container Registry).

### 2.6 Security Scanning and Dependency Auditing

**Feasibility: ✅ Fully feasible.**

GitHub-native tools integrate directly:

- **Dependabot** for automated dependency updates across all 30+ packages.
- **CodeQL** for static analysis of the Python codebase.
- **Secret scanning** for accidental credential leaks.
- **`pip-audit`** run as a workflow step.

### 2.7 Agent Workflow Smoke Tests (Headless)

**Feasibility: ✅ Feasible for API-backed agents.**

Simple workflows that call external LLM APIs (e.g., the "Hello World" Wikipedia
example) can run end-to-end in a GitHub Actions runner:

```bash
nat run --config_file workflow.yml --input "List five subspecies of Aardvarks"
```

This only requires an API key (stored as a secret) and network access to the LLM
provider. No GPU or local model inference is needed.

---

## 3. 
Functions That Are Partially Feasible

### 3.1 Integration Testing with Service Dependencies

**Feasibility: ⚠️ Partially feasible — requires service containers.**

The GitLab CI configuration reveals that full integration testing depends on 12+
external services running simultaneously:

| Service | Purpose |
|---------|---------|
| Redis | Caching, session memory |
| MySQL | Relational data storage |
| PostgreSQL | Langfuse backend |
| MinIO (S3) | Object storage |
| Milvus | Vector database |
| etcd | Milvus coordination |
| Arize Phoenix | LLM observability |
| ClickHouse | Analytics database |
| Langfuse (server + worker) | LLM tracing platform |
| OpenSearch | Search/analytics engine |
| Piston | Code execution sandbox |
| OAuth2 Server | Authentication testing |

**GitHub Actions can run service containers** via the `services:` key in job
definitions. However:

- **Resource limits:** GitHub-hosted runners support service containers, but
  orchestrating 12+ services simultaneously may exhaust the memory of standard
  runners (16 GB RAM for `ubuntu-latest`).
- **Custom images:** Some services (Piston, OAuth2 server) use custom registry images
  (`$CI_REGISTRY_IMAGE/...`) that would need to be rebuilt and pushed to GHCR.
- **Startup ordering:** Complex dependency chains (e.g., Milvus → etcd + MinIO,
  Langfuse → PostgreSQL + ClickHouse + Redis) require careful health-check scripting.

**Mitigation strategies:**
- Use **larger runners** (`ubuntu-latest-16-cores` with 64 GB RAM) for integration tests.
- Split the integration test suite by service dependency into separate jobs.
- Fall back to **self-hosted runners** if the full stack exceeds even larger-runner capacity.

### 3.2 MCP and A2A Server Hosting

**Feasibility: ⚠️ Partially feasible — for testing only, not production hosting.**

NAT can serve tools and agents as MCP (Model Context Protocol) servers and A2A
(Agent-to-Agent) protocol endpoints. 
These are long-running FastAPI servers. + +GitHub Actions can start these servers within a job for **integration testing** +purposes (background process + test client in the same job). However, Actions +workflows are not suitable for **production hosting** of persistent servers because: + +- Jobs have a maximum runtime of 6 hours. +- There is no inbound network routing to workflow runners. +- Runners are ephemeral and stateless between runs. + +**For production MCP/A2A serving**, the recommendation is to use GitHub-adjacent +infrastructure (e.g., Azure Container Apps triggered by GitHub deployments). + +### 3.3 Fine-Tuning Orchestration + +**Feasibility: ⚠️ Partially feasible — orchestration only, not GPU compute.** + +NAT supports fine-tuning LLMs via: +- **DPO with NeMo Customizer** — calls the NVIDIA NeMo Customizer API. +- **GRPO with OpenPipe ART** — calls the OpenPipe API. + +The **orchestration** (preparing datasets, launching fine-tuning jobs, monitoring +progress, running post-training evaluations) can run in GitHub Actions. The actual +GPU-intensive training happens on remote infrastructure (NVIDIA cloud, OpenPipe). + +A workflow could: +1. Prepare training data from evaluation results. +2. Submit the fine-tuning job to the NeMo Customizer API. +3. Poll for completion (with `workflow_dispatch` for manual re-triggers). +4. Run evaluation against the newly fine-tuned model. +5. Open a PR with updated model configuration if results improve. + +### 3.4 Data Flywheel Automation + +**Feasibility: ⚠️ Partially feasible.** + +The data flywheel package (`nvidia_nat_data_flywheel`) collects runtime traces, +identifies failure patterns, and generates training data. In a GitHub-infrastructured +model: + +- **Scheduled workflows** could pull traces from an Elasticsearch/OpenSearch instance. +- **Processing and analysis** would run in the workflow job. +- **Output** (curated training datasets) would be stored as artifacts or pushed to S3. 
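A hedged sketch of that scheduled pipeline — every endpoint, secret name, helper script, and package extra below is an illustrative assumption, not a NAT API:

```yaml
name: data-flywheel
on:
  schedule:
    - cron: '0 4 * * *'     # nightly at 4 AM UTC
jobs:
  curate:
    runs-on: ubuntu-latest
    env:
      # Externally hosted observability backend; values held in GitHub Secrets
      OPENSEARCH_URL: ${{ secrets.OPENSEARCH_URL }}
      OPENSEARCH_TOKEN: ${{ secrets.OPENSEARCH_TOKEN }}
    steps:
      - uses: actions/checkout@v4
      - run: pip install "nvidia-nat[data-flywheel]"      # extra name is an assumption
      - run: python scripts/pull_traces.py --since 24h    # hypothetical helper script
      - run: python scripts/build_dataset.py --out datasets/
      - uses: actions/upload-artifact@v4
        with:
          name: curated-training-data
          path: datasets/
```

The same final step could instead push the curated datasets to S3 if they need to outlive GitHub's artifact retention window.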
The limitation is that the flywheel depends on a live observability backend to read
from, which must be hosted externally.

---

## 4. Functions That Are Not Feasible

### 4.1 Production Agent Serving

**Feasibility: ❌ Not feasible.**

NAT's core value proposition is running AI agents in production via FastAPI servers
that handle user requests in real time. This requires:

- **Persistent, low-latency HTTP endpoints** — GitHub Actions runners cannot serve
  inbound traffic and are ephemeral.
- **Stateful sessions** — agent conversations require session persistence across
  requests; runners are destroyed after each job.
- **Horizontal scaling** — production workloads need load balancing and auto-scaling,
  which Actions does not provide.
- **GPU access** — local model inference (Ollama, HuggingFace, Dynamo) requires GPU
  hardware not available on standard GitHub runners.

**Alternative:** Use GitHub Actions for **deployment automation** (build container →
push to registry → deploy to Kubernetes/Cloud Run/ECS), not for hosting the agent
itself.

### 4.2 Real-Time Observability and Monitoring

**Feasibility: ❌ Not feasible.**

NAT's observability stack (OpenTelemetry, Phoenix, Weave) requires continuously
running collectors and dashboards. These are long-lived services that:

- Ingest streaming telemetry data from running agents.
- Provide real-time dashboards and alerting.
- Store historical traces for analysis.

GitHub Actions is a batch job runner and cannot host persistent monitoring
infrastructure.

### 4.3 Built-In Chat UI

**Feasibility: ❌ Not feasible for interactive use.**

NAT provides a built-in chat interface served by FastAPI. This requires a running web
server accessible to users. GitHub Actions cannot serve this because:

- No inbound HTTP routing to runners.
- Jobs are time-limited and non-interactive from a user's perspective. 
+ +**Alternative:** Deploy the UI as a static site (if it's a SPA) to GitHub Pages with +an API backend on external infrastructure, or use GitHub Codespaces for a development +preview. + +### 4.4 Local/On-Premise Model Inference + +**Feasibility: ❌ Not feasible.** + +Running LLMs locally via Ollama, HuggingFace Transformers, or NVIDIA Dynamo requires: + +- GPU hardware (CUDA-capable). +- Large model weights (multi-GB downloads). +- Persistent model caches. + +GitHub-hosted runners do not provide GPU access. Even with self-hosted GPU runners, +the ephemeral nature of workflow jobs makes model caching inefficient. + +--- + +## 5. Proposed GitHub-Infrastructured Architecture + +Below is a vision for maximizing the use of GitHub infrastructure while acknowledging +its boundaries. + +### Tier 1: Fully on GitHub Actions + +| Function | Trigger | Implementation | +|----------|---------|----------------| +| CI/CD (lint, test, build) | `push`, `pull_request` | Already implemented via `pr.yaml`/`ci_pipe.yml` | +| Nightly evaluations | `schedule` (cron) | New workflow calling `nat evaluate` | +| Prompt optimization | `workflow_dispatch` | Manual trigger with input parameters | +| Documentation publishing | `push` to `main` | Build docs → deploy to GitHub Pages | +| Release & PyPI publishing | Tag push | Build wheels → publish to PyPI + GHCR | +| Security scanning | `push`, `schedule` | Dependabot, CodeQL, pip-audit | +| Agent smoke tests | `pull_request` | Run headless agent workflows with API keys | + +### Tier 2: GitHub Actions as Orchestrator + +| Function | Trigger | Implementation | +|----------|---------|----------------| +| Fine-tuning orchestration | `workflow_dispatch` | Submit jobs to NeMo Customizer/OpenPipe APIs | +| Data flywheel processing | `schedule` | Pull traces → process → store datasets | +| Integration testing | `push` | Service containers on larger runners | +| Container builds | `push` to `main` | Build Docker image → push to GHCR | + +### Tier 3: 
External Infrastructure (Deployed by GitHub Actions) + +| Function | Infrastructure | Deployment Trigger | +|----------|---------------|-------------------| +| Agent serving (FastAPI) | Azure Container Apps / AWS ECS / GKE | GitHub Actions CD workflow | +| MCP/A2A servers | Kubernetes | GitHub Actions CD workflow | +| Observability stack | Managed services (Datadog, Grafana Cloud) | GitHub Actions IaC (Terraform) | +| Chat UI | Cloud hosting or GitHub Pages (static) | GitHub Actions CD workflow | +| Redis/MySQL/PostgreSQL | Managed cloud databases | IaC via GitHub Actions | + +### Architecture Diagram + +``` +┌─────────────────────────────────────────────────────────┐ +│ GitHub Platform │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ GitHub │ │ GitHub │ │ GitHub │ │ +│ │ Actions │ │ Pages │ │ Packages │ │ +│ │ │ │ │ │ (GHCR) │ │ +│ │ • CI/CD │ │ • Docs │ │ • Docker │ │ +│ │ • Eval │ │ • Reports │ │ images │ │ +│ │ • Optimize │ │ │ │ • Wheels │ │ +│ │ • Deploy │ │ │ │ │ │ +│ └──────┬───────┘ └──────────────┘ └──────────────┘ │ +│ │ │ +│ ┌──────┴───────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ Secrets & │ │ Dependabot │ │ CodeQL │ │ +│ │ Variables │ │ │ │ Scanning │ │ +│ │ (API keys) │ │ │ │ │ │ +│ └──────────────┘ └──────────────┘ └──────────────┘ │ +└─────────────────────────┬───────────────────────────────┘ + │ Deploys to / calls + ▼ +┌─────────────────────────────────────────────────────────┐ +│ External Infrastructure │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ Cloud │ │ LLM APIs │ │ Managed │ │ +│ │ Compute │ │ │ │ Databases │ │ +│ │ │ │ • NVIDIA NIM│ │ │ │ +│ │ • Agent │ │ • OpenAI │ │ • Redis │ │ +│ │ servers │ │ • Azure │ │ • PostgreSQL│ │ +│ │ • MCP/A2A │ │ • Bedrock │ │ • Milvus │ │ +│ │ • Chat UI │ │ │ │ │ │ +│ └──────────────┘ └──────────────┘ └──────────────┘ │ +└─────────────────────────────────────────────────────────┘ +``` + +--- + +## 6. 
GitHub Actions Resource Constraints

Understanding GitHub's runner limits is essential for planning:

| Resource | GitHub-Hosted (Standard) | GitHub-Hosted (Larger) | Self-Hosted |
|----------|------------------------|----------------------|-------------|
| **vCPUs** | 4 | Up to 64 | Custom |
| **RAM** | 16 GB | Up to 256 GB | Custom |
| **Storage** | 14 GB SSD | Up to 2 TB | Custom |
| **Job timeout** | 6 hours | 6 hours | Configurable |
| **Workflow timeout** | 35 days | 35 days | Configurable |
| **Concurrent jobs** | 20 (free) / 500 (enterprise) | Same | Unlimited |
| **GPU** | Not available | Not available | Custom |
| **Network ingress** | Not routable | Not routable | Custom |

**Key implications for NAT:**
- The 12+ service integration test stack needs more memory than the 16 GB on
  standard runners → use larger runners.
- No GPU means no local model inference → must use API-based LLM providers.
- No inbound routing means no production serving → use external compute for agents.
- The 6-hour job limit is sufficient for evaluations and optimization runs.

---

## 7. Cost Estimation

For a typical month with active development:

| Activity | Runner Type | Minutes/Month | Est. Cost |
|----------|-----------|---------------|-----------|
| CI per PR (lint+test+docs+wheels) | Standard (Linux) | ~3,000 | ~$24 |
| Nightly evaluations | Standard | ~1,500 | ~$12 |
| Weekly optimization runs | Standard | ~600 | ~$5 |
| Integration tests (larger runner) | 16-core Linux | ~500 | ~$32 |
| Container builds | Standard | ~200 | ~$2 |
| **Total GitHub Actions** | | | **~$75/month** |

*Note: LLM API costs for evaluations and smoke tests are separate and depend on usage.*

---

## 8. Migration Recommendations

### Phase 1: Consolidate CI on GitHub Actions (Low Effort)

The GitHub Actions CI is already in place. The remaining work is:

1. 
**Port integration tests from GitLab CI** — Recreate the service container stack + in a GitHub Actions workflow using the `services:` key and larger runners. +2. **Add Dependabot configuration** — Enable automated dependency updates for all + 30+ packages. +3. **Add CodeQL scanning** — Enable static analysis for the Python codebase. + +### Phase 2: Add Automation Workflows (Medium Effort) + +4. **Nightly evaluation workflow** — Scheduled runs of the evaluation system with + results stored as artifacts. +5. **Release automation** — Tag-triggered workflow that builds wheels, publishes to + PyPI, builds Docker images, and creates GitHub Releases. +6. **Documentation deployment** — Publish docs to GitHub Pages on merge to `main`. + +### Phase 3: Orchestration via GitHub Actions (Higher Effort) + +7. **Fine-tuning orchestration** — Workflow that prepares data and submits training + jobs to external APIs, then runs evaluation on the resulting model. +8. **Data flywheel automation** — Scheduled workflow that processes observability + traces and produces training datasets. +9. **Deployment workflows** — CD pipelines that deploy agent servers, MCP endpoints, + and the chat UI to cloud infrastructure, triggered by GitHub releases. + +--- + +## 9. Conclusion + +**GitHub Actions can serve approximately 60–70% of NAT's operational needs**, covering +CI/CD, batch evaluation, optimization orchestration, release management, security +scanning, and documentation publishing. These functions align well with GitHub's +event-driven, batch-job execution model. + +**The remaining 30–40%—production agent serving, real-time observability, interactive +UI hosting, and GPU-based inference—require persistent, routable, and often +GPU-equipped infrastructure** that is fundamentally outside GitHub Actions' design. +However, GitHub Actions excels as the **orchestration and deployment layer** for +these external services, managing the full lifecycle from code change to production +deployment. 
+ +The recommended approach is a **hybrid model**: maximize GitHub's native capabilities +for all batch and event-driven workloads while using GitHub Actions as the deployment +control plane for external runtime infrastructure. This provides a unified developer +experience centered on GitHub while leveraging the right infrastructure for each +workload type.