feat(physical-ai/cosmos3): add Cosmos3 data flywheel reference architecture #76

Open
ntnadkarni wants to merge 1 commit into main from nnadkarni/cosmos3-data-flywheel

Conversation

ntnadkarni (Contributor) commented May 22, 2026

Summary

End-to-end CKS reference architecture for NVIDIA Cosmos3: synthetic data generation → SFT → DCP-to-HF export → Ray Serve → marimo notebook.

Lives at physical-ai/cosmos3/, packaged as a Helm chart. Every pipeline step is a single values.yaml entry; enable the ones you want and run `helm template . | kubectl apply -f -`. See physical-ai/cosmos3/README.md for the walkthrough and UPSTREAM_BUGS.md for the nine upstream patches applied.

Apply pattern

export NS=<your-namespace>
kubectl -n "$NS" create secret generic hf-token --from-literal=HF_TOKEN="<token>"
helm template . | kubectl -n "$NS" apply -f -                              # prereqs
helm template . --set steps.prefetch.enabled=true | kubectl -n "$NS" apply -f -
helm template . --set steps.sft.enabled=true     | kubectl -n "$NS" apply -f -
helm template . --set rayServe.enabled=true --set marimo.enabled=true \
  | kubectl -n "$NS" apply -f -
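
The last stage above can also be driven from an overrides file instead of repeated `--set` flags. The following is a minimal sketch, assuming the chart reads the same keys the flags set (`rayServe.enabled`, `marimo.enabled`); the file name `serve-overrides.yaml` is a hypothetical choice, not something the chart requires.

```shell
# Sketch only: keys mirror the --set flags in the apply pattern;
# serve-overrides.yaml is a hypothetical file name.
cat > serve-overrides.yaml <<'EOF'
rayServe:
  enabled: true
marimo:
  enabled: true
EOF

# Render with the overrides and apply, but only when helm, kubectl,
# the chart, and a target namespace are all available.
if command -v helm >/dev/null 2>&1 && command -v kubectl >/dev/null 2>&1 \
   && [ -f Chart.yaml ] && [ -n "${NS:-}" ]; then
  helm template . -f serve-overrides.yaml | kubectl -n "$NS" apply -f -
fi
```

Keeping one small overrides file per stage makes each apply reviewable in version control, at the cost of one more file per stage than the inline flags.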

🤖 Generated with Claude Code

ntnadkarni marked this pull request as draft May 22, 2026 19:01
ntnadkarni force-pushed the nnadkarni/cosmos3-data-flywheel branch 2 times, most recently from 3caafad to aa7f9d6 May 22, 2026 19:05
ntnadkarni changed the title from "feat(cosmos3-data-flywheel): add Cosmos3 data flywheel reference architecture" to "feat(physical-ai/cosmos3): add Cosmos3 data flywheel reference architecture" May 22, 2026
ntnadkarni force-pushed the nnadkarni/cosmos3-data-flywheel branch 2 times, most recently from 408443f to 7f77d34 May 22, 2026 21:11
…ecture

End-to-end CKS reference for NVIDIA Cosmos3: synthetic robotics-manipulation
data generation -> SFT -> DCP-to-HF export -> Ray Serve -> marimo.

Packaged as a small Helm chart (Chart.yaml, values.yaml, 4 templates) so
every pipeline step is a single values.yaml entry. The default values.yaml
renders to a no-op; enable steps individually with
`--set steps.<name>.enabled=true` or via an overrides file. This replaces
what would otherwise be ~23 nearly-identical Job manifests with one
templated definition plus per-step bash.
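
One way to see what each toggle adds before applying anything is to count the rendered resources by kind. This is a rough sketch, not part of the chart: `count_kinds` is a hypothetical helper, and it only counts top-level `kind:` lines in the rendered YAML.

```shell
# Summarize rendered manifests by resource kind (reads YAML on stdin).
# Only top-level "kind:" lines are counted; indented ones are ignored.
count_kinds() {
  awk '/^kind:/ { n[$2]++ } END { for (k in n) print k, n[k] }' | sort
}

# Compare the default render with one step enabled, when helm and the
# chart are present in the current directory.
if command -v helm >/dev/null 2>&1 && [ -f Chart.yaml ]; then
  echo "default:"
  helm template . | count_kinds
  echo "sft enabled:"
  helm template . --set steps.sft.enabled=true | count_kinds
fi
```

If the two summaries differ only by the resources the step is expected to create, the toggle is behaving as a single-entry switch.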

Validated single-node on 8x RTX Pro 6000 Blackwell with VAST CSI. Includes
the nine upstream workarounds the CKS path required (UPSTREAM_BUGS.md) as
candidate PRs back to NVIDIA. A multi-node H100 SXM variant via cw-mpijob
is in development on a separate branch.

Lives under physical-ai/ so future Cosmos / robotics / world-foundation-model
reference architectures can sit alongside.

Signed-off-by: Nisha Nadkarni <nnadkarni@coreweave.com>
ntnadkarni force-pushed the nnadkarni/cosmos3-data-flywheel branch from 7f77d34 to b4a951b May 26, 2026 21:32
ntnadkarni marked this pull request as ready for review May 26, 2026 21:38
