feat(devx): local dev setup for control plane and full end-to-end flow (MLI-6681)#823
Open
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a one-command local development workflow for the model engine control plane so developers can iterate on gateway/service-builder code without building prod images or touching live infra.

- docker-compose.local.yml: spins up Postgres 15 + Redis 7
- service_configs/service_config_local.yaml: HMI config for local services
- Makefile: dev-up / dev-migrate / dev-server / dev-down / test targets
- LOCAL=true env var now activates fake queue/docker implementations (parallel to existing CIRCLECI=true path) and skips GIT_TAG requirement
- README: new "Control Plane Local Setup" section with full walkthrough

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…G_PATH

- service_config_local.yaml: switch from cache_redis_aws_url to cache_redis_onprem_url so the Redis URL is resolved before the cloud_provider assertion fires; this fixes the startup failure for non-AWS configs
- Makefile: pin ML_INFRA_SERVICES_CONFIG_PATH to default.yaml so local dev is not affected by a developer's ambient infra config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
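The ordering issue described in this commit can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `LocalServiceConfig`, `resolve_cache_redis_url`, and the exact order of checks are assumptions made for the example; only the two config keys and the existence of a cloud_provider assertion come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalServiceConfig:
    """Hypothetical stand-in for the service config; only the Redis keys matter here."""

    cache_redis_aws_url: Optional[str] = None
    cache_redis_onprem_url: Optional[str] = None


def resolve_cache_redis_url(config: LocalServiceConfig, cloud_provider: str) -> str:
    # Assumed order: an explicit on-prem URL is returned before any provider check runs.
    if config.cache_redis_onprem_url:
        return config.cache_redis_onprem_url
    if config.cache_redis_aws_url:
        # Assumed guard: the AWS-specific key is rejected for any non-AWS provider.
        if cloud_provider != "aws":
            raise AssertionError("cache_redis_aws_url requires cloud_provider=aws")
        return config.cache_redis_aws_url
    raise ValueError("no Redis URL configured")
```

Under these assumptions, a local config that sets `cache_redis_onprem_url` never reaches the provider check, while one that sets `cache_redis_aws_url` fails at startup whenever the ambient infra config is not AWS.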
- README: add ML_INFRA_SERVICES_CONFIG_PATH to the manual env-var snippet so developers with non-AWS ambient configs don't accidentally hit the cloud_provider assertion
- docker-compose.local.yml: mount a named volume for Postgres so the database survives dev-down/dev-up cycles

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces the manual until-loops in dev-up with `docker compose up --wait`, which blocks until healthchecks pass and exits non-zero if they fail, eliminating the infinite spin on container crash.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
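For anyone driving the same step from a script instead of the Makefile, a minimal Python sketch of the behavior is below. The helper names `compose_up_command` and `dev_up` are hypothetical; the compose file name is taken from this PR, and `--wait` is the flag named in the commit.

```python
import subprocess
from typing import List


def compose_up_command(compose_file: str = "docker-compose.local.yml") -> List[str]:
    # --wait blocks until services are running/healthy and implies detached mode.
    return ["docker", "compose", "-f", compose_file, "up", "--wait"]


def dev_up(compose_file: str = "docker-compose.local.yml") -> None:
    # check=True raises CalledProcessError on a non-zero exit, so a failed
    # healthcheck surfaces as an error instead of an endless polling loop.
    subprocess.run(compose_up_command(compose_file), check=True)
```

The point of the design is that the wait is bounded by the containers' own healthchecks rather than by a hand-written loop that has no exit condition when a container crashes.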
Author (Collaborator): @greptile review
Author (Collaborator): /greptile
Extends the local dev setup so the complete control plane → Service Builder → k8s inference pod flow can be tested locally without cloud credentials.

Changes:
- local-full.yaml: new onprem infra config pointing to localhost Redis/kind
- dependencies.py: LOCAL=true + cloud_provider=onprem falls through to real Redis queue delegate instead of the fake (enabling full k8s flow)
- service_builder/celery.py: fix onprem to use redis backend not s3
- env_vars.py: default GIT_TAG to "local" when LOCAL=true so k8s templates reference the correct model-engine:local image loaded into kind
- Makefile: kind-up/kind-down/kind-image targets + dev-server-full, dev-service-builder, dev-k8s-cacher targets using FULL_LOCAL_ENV
- README: full end-to-end setup section with step-by-step instructions, example endpoint creation, and flow table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
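The two environment-driven decisions in this commit (the GIT_TAG default and the fake-versus-real queue delegate) can be sketched as below. This is an illustration under assumptions: the function names are hypothetical, the `== "true"` comparison is a guess at how the flags are read, and the real logic in env_vars.py and dependencies.py may be structured differently.

```python
from typing import Mapping


def resolve_git_tag(env: Mapping[str, str]) -> str:
    # An explicit GIT_TAG always wins.
    tag = env.get("GIT_TAG")
    if tag:
        return tag
    # Assumed rule from the commit message: LOCAL=true defaults the tag to "local",
    # matching the model-engine:local image loaded into kind.
    if env.get("LOCAL") == "true":
        return "local"
    raise ValueError("GIT_TAG must be set unless LOCAL=true")


def use_fake_queue_delegate(env: Mapping[str, str], cloud_provider: str) -> bool:
    # Assumed: the CI path keeps using the fake implementations unconditionally.
    if env.get("CIRCLECI") == "true":
        return True
    # LOCAL=true uses the fake unless the infra config says onprem, in which
    # case the real Redis-backed delegate is used for the full k8s flow.
    return env.get("LOCAL") == "true" and cloud_provider != "onprem"
```

With these rules, `make dev-server` style runs (LOCAL=true with a non-onprem config) stay on the fakes, while the full end-to-end targets (LOCAL=true with the onprem local-full.yaml) reach the real delegate.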
The gateway's module-level backend_protocol had the same aws/gcp/azure mapping as service_builder/celery.py. Without this fix, the Service Builder writes task results to Redis but the Gateway looks in S3, leaving endpoints stuck in PENDING under the kind-based full local flow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
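A minimal sketch of the mapping being fixed is below. The function name `select_backend_protocol` and the specific pairings for azure and gcp are assumptions for illustration; the only facts taken from this PR are that onprem previously fell through to `s3` and now selects `redis`.

```python
def select_backend_protocol(cloud_provider: str) -> str:
    # Assumed pairing for Azure; not confirmed by this PR.
    if cloud_provider == "azure":
        return "abs"
    # Per this PR, onprem joins the providers that use the redis result backend.
    # Grouping gcp here is an assumption.
    if cloud_provider in ("gcp", "onprem"):
        return "redis"
    # Everything else (including aws) keeps the s3 result backend.
    return "s3"
```

Whatever the real mapping looks like, the Gateway and the Service Builder need to evaluate the same rule: if one side resolves onprem to `redis` and the other to `s3`, results are written to one store and read from another.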
Author (Collaborator): @greptile review
Author (Collaborator): /greptile
The exporter package was imported unconditionally under the OTEL_AVAILABLE flag which only checked the base SDK, not the exporter. Include it in the try block so OTEL_AVAILABLE stays False when the exporter is absent, fixing the ImportError that caused run_unit_tests_server to fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
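The guarded-import pattern described here looks roughly like the sketch below. The specific modules (the OTLP gRPC span exporter) and the helper `build_tracer_provider` are assumptions chosen for illustration; the commit does not say which exporter package is involved.

```python
try:
    # Both the base SDK and the exporter sit inside the same try block, so a
    # missing exporter package leaves OTEL_AVAILABLE False instead of raising.
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

    OTEL_AVAILABLE = True
except ImportError:
    OTEL_AVAILABLE = False


def build_tracer_provider():
    # Hypothetical helper: callers get None when tracing dependencies are absent.
    if not OTEL_AVAILABLE:
        return None
    provider = TracerProvider()
    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
    return provider
```

Keeping every optional import under one flag means downstream code only has to check `OTEL_AVAILABLE` once, and test environments without the exporter installed import the module cleanly.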
…trol-plane-local-devx
…chema gateway

- Reformat correlation.py and celery.py to satisfy black
- Move noqa comment to the from...import( line so ruff F401 is suppressed correctly
- Pass schema_generator=GenerateJsonSchema() (new required kwarg) to get_definitions() and get_openapi_path() in live_model_endpoints_schema_gateway, creating a fresh instance per route since pydantic rejects reuse

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…oes not have this param

The param was added to fix a local test failure (FastAPI 0.110.0 requires it) but FastAPI 0.135.1 (pinned in requirements.txt, used by CI) does not accept it, causing mypy call-arg errors. Revert to the original signature.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…xample

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Adds a complete local development workflow for model-engine so developers can iterate on both control plane code and the full endpoint lifecycle without cloud credentials or prod images.

Control-plane-only mode (`make dev-server`):
- `LOCAL=true` activates fake queue/docker/k8s implementations (mirrors `CIRCLECI=true`)

Full end-to-end mode (`make dev-server-full` + `make dev-service-builder` + `make dev-k8s-cacher`):
- `make kind-up` + `make kind-image` creates a local kind cluster and loads `model-engine:local` into it
- […] (`model-engine:local`) used as the inference container; no GPU required

Code fixes included:
- `service_builder/celery.py` + `celery_task_queue_gateway.py`: `onprem` cloud provider now uses `redis` Celery backend instead of `s3`. Without this, the Service Builder writes results to Redis but the Gateway looks in S3, leaving endpoints stuck in PENDING
- `dependencies.py`: `LOCAL=true` + `cloud_provider=onprem` falls through to real `OnPremQueueEndpointResourceDelegate` instead of the fake
- `env_vars.py`: `GIT_TAG` defaults to `"local"` when `LOCAL=true` so k8s templates reference the correct `model-engine:local` image

New files:
- `docker-compose.local.yml`: Postgres 15 + Redis 7 with healthchecks and persistent volume
- `service_configs/service_config_local.yaml`: HMI config for local services
- `model_engine_server/core/configs/local-full.yaml`: onprem infra config for kind
- `Makefile`: all dev targets in one place

Test plan
- `make dev-up && make dev-migrate && make dev-server`: gateway starts, `GET /v1/model-endpoints` returns 200
- `make kind-up && make kind-image`: kind cluster created, `model-engine:local` loaded
- `make dev-server-full` + `make dev-service-builder` + `make dev-k8s-cacher`: all three processes start cleanly
- […] `kubectl --context kind-llm-engine get pods -n model-engine` and endpoint transitions to READY
- `make test`

Closes MLI-6681
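The first test-plan item can also be checked from Python. The sketch below is hedged: the helper names are hypothetical, the base URL and port in the usage note are placeholders that depend on how `make dev-server` binds the gateway, and any authentication the API requires is omitted.

```python
import urllib.error
import urllib.request


def model_endpoints_url(base_url: str) -> str:
    # Route taken from the test plan: GET /v1/model-endpoints.
    return base_url.rstrip("/") + "/v1/model-endpoints"


def gateway_is_up(base_url: str, timeout: float = 5.0) -> bool:
    # Returns True only on HTTP 200; connection errors and HTTP error
    # statuses (raised by urllib as URLError subclasses) count as "not up".
    try:
        with urllib.request.urlopen(model_endpoints_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

For example, `gateway_is_up("http://localhost:5000")` would report whether the local gateway answers, assuming it listens on port 5000 and accepts unauthenticated requests; adjust both to match the actual setup.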
🤖 Generated with Claude Code
Greptile Summary
- `make dev-server` (control-plane-only with fake k8s/queue/docker) and `make dev-server-full` + `make dev-service-builder` + `make dev-k8s-cacher` (full end-to-end via kind), backed by Postgres 15 + Redis 7 via docker-compose.
- `onprem` Celery backend mismatch: both `celery_task_queue_gateway.py` and `service_builder/celery.py` now select `redis` as `backend_protocol` for `onprem` (previously `s3`), consistent with `dependencies.py` which already routed onprem to Redis task queues.
- `LOCAL=true` now defaults `GIT_TAG` to `"local"` (avoiding the missing-tag error), routes the gateway to the real `OnPremQueueEndpointResourceDelegate` when `cloud_provider=onprem`, and uses Redis task-queue gateways instead of SQS.

Confidence Score: 5/5
Safe to merge — all previously flagged P1s are addressed and no new issues found.
All three prior P1 findings (cache_redis_aws_url, ML_INFRA_SERVICES_CONFIG_PATH not pinned, celery_task_queue_gateway.py backend mismatch) are resolved in the current diff. The backend_protocol fix is applied symmetrically in both the Gateway and Service Builder. local-full.yaml sets celery_broker_type_redis: true, keeping broker and backend consistent for the onprem/kind path. No new logic errors identified.
No files require special attention.
Important Files Changed
- […] `onprem` to the redis backend_protocol condition, fixing the Gateway/Service Builder result-backend mismatch for onprem/local deployments.
- […] `onprem` to the redis backend_protocol tuple so Service Builder and Gateway agree on where Celery results are stored.

Reviews (9): Last reviewed commit: "docs(devx): replace curl endpoint exampl..."