Draft
2 changes: 1 addition & 1 deletion AGENTS.md
@@ -112,7 +112,7 @@ For the prose architecture sketch see [`docs/development.md`](docs/development.m

## Cloudscale SDK usage

- Do not `import "github.com/cloudscale-ch/cloudscale-go-sdk/v8"` outside
- Do not `import "github.com/cloudscale-ch/cloudscale-go-sdk/v9"` outside
`internal/cloudscale/`. Controllers and webhooks talk to the SDK through
the service interfaces on `cloudscale.Client`
(`internal/cloudscale/client.go:32`).
3 changes: 2 additions & 1 deletion Dockerfile
@@ -2,6 +2,7 @@
FROM golang:1.26 AS builder
ARG TARGETOS
ARG TARGETARCH
ARG VERSION=dev

WORKDIR /workspace
# Copy the Go Modules manifests
@@ -19,7 +20,7 @@ COPY . .
# was called. For example, if we call make docker-build in a local env running on Apple Silicon (M1),
# the docker BUILDPLATFORM arg will be linux/arm64, whereas on an x86 Mac it will be linux/amd64. Therefore,
# by leaving it empty we can ensure that the container and binary shipped on it will have the same platform.
RUN CGO_ENABLED=0 GOOS=${TARGETOS:-linux} GOARCH=${TARGETARCH} go build -a -o manager cmd/main.go
RUN CGO_ENABLED=0 GOOS=${TARGETOS:-linux} GOARCH=${TARGETARCH} go build -ldflags "-X main.version=${VERSION}" -a -o manager cmd/main.go

# Use distroless as minimal base image to package the manager binary
# Refer to https://github.com/GoogleContainerTools/distroless for more details
9 changes: 5 additions & 4 deletions Makefile
@@ -3,6 +3,7 @@ TAG ?= dev
IMG ?= quay.io/cloudscalech/capcs-staging:$(TAG)
# YEAR defines the year value used for substituting the YEAR placeholder in the boilerplate header.
YEAR ?= $(shell date +%Y)
LDFLAGS ?= -X main.version=$(TAG)

# E2E image configuration
E2E_TAG ?= e2e-$(shell git rev-parse --short HEAD)
@@ -287,18 +288,18 @@ test-e2e-conformance-fast: $(GINKGO) generate-e2e-templates generate-e2e-config

.PHONY: build
build: manifests generate fmt vet ## Build manager binary.
go build -o bin/manager cmd/main.go
go build -ldflags '$(LDFLAGS)' -o bin/manager cmd/main.go

.PHONY: run
run: manifests generate fmt vet ## Run a controller from your host.
go run ./cmd/main.go
go run -ldflags '$(LDFLAGS)' ./cmd/main.go

# If you wish to build the manager image targeting other platforms you can use the --platform flag.
# (i.e. docker build --platform linux/arm64). However, you must enable docker buildKit for it.
# More info: https://docs.docker.com/develop/develop-images/build_enhancements/
.PHONY: docker-build
docker-build: ## Build docker image with the manager.
$(CONTAINER_TOOL) build --platform linux/amd64 -t ${IMG} .
$(CONTAINER_TOOL) build --platform linux/amd64 --build-arg VERSION=$(TAG) -t ${IMG} .

.PHONY: docker-push
docker-push: ## Push docker image with the manager.
@@ -321,7 +322,7 @@ docker-buildx: ## Build and push docker image for the manager for cross-platform
sed -e '1 s/\(^FROM\)/FROM --platform=\$$\{BUILDPLATFORM\}/; t' -e ' 1,// s//FROM --platform=\$$\{BUILDPLATFORM\}/' Dockerfile > Dockerfile.cross
- $(CONTAINER_TOOL) buildx create --name cluster-api-provider-cloudscale-builder
$(CONTAINER_TOOL) buildx use cluster-api-provider-cloudscale-builder
- $(CONTAINER_TOOL) buildx build --push --platform=$(PLATFORMS) --tag ${IMG} -f Dockerfile.cross .
- $(CONTAINER_TOOL) buildx build --push --platform=$(PLATFORMS) --build-arg VERSION=$(TAG) --tag ${IMG} -f Dockerfile.cross .
- $(CONTAINER_TOOL) buildx rm cluster-api-provider-cloudscale-builder
rm Dockerfile.cross

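Review note on the `LDFLAGS` and `--build-arg VERSION` changes: the linker's `-X importpath.name=value` flag only overrides package-level string variables, which is why `cmd/main.go` declares `version = "dev"` as a plain `var`. The following is a minimal sketch of that mechanism, not the PR's code; the `userAgent` helper is hypothetical and only illustrates one place a stamped value might end up.

```go
package main

import "fmt"

// version is a package-level string variable so the linker can overwrite it.
// Building with
//
//	go build -ldflags "-X main.version=v1.2.3"
//
// replaces the default; a plain `go build` keeps "dev".
var version = "dev"

// userAgent is a hypothetical helper (not part of the PR) that shows
// one possible consumer of the stamped version string.
func userAgent(name, v string) string {
	return fmt.Sprintf("%s/%s", name, v)
}

func main() {
	fmt.Println(userAgent("capcs", version))
}
```

Because `TAG` defaults to `dev` in the Makefile, local builds and untagged images report the same default as an unstamped binary.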
1 change: 1 addition & 0 deletions README.md
@@ -75,6 +75,7 @@ variables and the other template flavors.
|-------------------------------------|----------------------------------------------------------------------------------------------------------------|
| New to Cluster API, or new to CAPCS | [Getting Started](docs/getting-started.md) |
| Looking up a CRD field | `kubectl explain cloudscalecluster.spec` (or the generated CRDs under [`config/crd/bases/`](config/crd/bases)) |
| Setting up monitoring or tracing | [Observability](docs/observability.md) |
| Hitting an error | [Troubleshooting](docs/troubleshooting.md) |
| Contributing to CAPCS | [Development](docs/development.md), [CONTRIBUTING.md](CONTRIBUTING.md) |
| Cutting a release | [Releasing](docs/releasing.md), [Testing releases](docs/testing-releases.md) |
104 changes: 62 additions & 42 deletions cmd/main.go
@@ -25,33 +25,37 @@ import (
"os"
"time"

"golang.org/x/sync/errgroup"

// Import all Kubernetes client auth plugins (e.g. Azure, GCP, OIDC, etc.)
// to ensure that exec-entrypoint and run can make use of them.
_ "k8s.io/client-go/plugin/pkg/client/auth"

"github.com/cloudscale-ch/cloudscale-go-sdk/v9/instrumentation"
"go.opentelemetry.io/otel"
"golang.org/x/sync/errgroup"
"k8s.io/apimachinery/pkg/runtime"
utilruntime "k8s.io/apimachinery/pkg/util/runtime"
clientgoscheme "k8s.io/client-go/kubernetes/scheme"
clusterv1 "sigs.k8s.io/cluster-api/api/core/v1beta2"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/healthz"
"sigs.k8s.io/controller-runtime/pkg/log/zap"
ctrlmetrics "sigs.k8s.io/controller-runtime/pkg/metrics"
"sigs.k8s.io/controller-runtime/pkg/metrics/filters"
metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
"sigs.k8s.io/controller-runtime/pkg/webhook"

infrastructurev1beta2 "github.com/cloudscale-ch/cluster-api-provider-cloudscale/api/v1beta2"
"github.com/cloudscale-ch/cluster-api-provider-cloudscale/internal/cloudscale"
"github.com/cloudscale-ch/cluster-api-provider-cloudscale/internal/controller"
"github.com/cloudscale-ch/cluster-api-provider-cloudscale/internal/observability"
webhookv1beta2 "github.com/cloudscale-ch/cluster-api-provider-cloudscale/internal/webhook/v1beta2"
// +kubebuilder:scaffold:imports
)

var (
scheme = runtime.NewScheme()
setupLog = ctrl.Log.WithName("setup")
version = "dev"
)

func init() {
@@ -62,8 +66,14 @@ func init() {
// +kubebuilder:scaffold:scheme
}

// nolint:gocyclo
func main() {
if err := run(); err != nil {
fmt.Fprintf(os.Stderr, "%v\n", err)
os.Exit(1)
}
}

func run() error {
var metricsAddr string
var metricsCertPath, metricsCertName, metricsCertKey string
var webhookCertPath, webhookCertName, webhookCertKey string
@@ -75,6 +85,9 @@ func main() {
var machineConcurrency int
var watchFilter string
var tlsOpts []func(*tls.Config)
var enableTracing bool
var tracingSampleRate float64
var profilerAddress string
flag.StringVar(&metricsAddr, "metrics-bind-address", "0", "The address the metrics endpoint binds to. "+
"Use :8443 for HTTPS or :8080 for HTTP, or leave as 0 to disable the metrics service.")
flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
@@ -99,6 +112,11 @@ func main() {
flag.StringVar(&watchFilter, "watch-filter", "",
fmt.Sprintf("Label value that the controller watches to reconcile cluster-api objects. Label key is always %s. "+
"If unspecified, the controller watches for all cluster-api objects.", clusterv1.WatchLabel))
flag.BoolVar(&enableTracing, "enable-tracing", false, "Enable OpenTelemetry tracing")
flag.Float64Var(&tracingSampleRate, "tracing-sample-rate", 0.1,
"Trace sampling rate, between 0.0 and 1.0 (1.0 = always sample)")
flag.StringVar(&profilerAddress, "profiler-address", "",
"Bind address to expose the pprof profiler (e.g. localhost:6060)")
opts := zap.Options{
Development: true,
}
@@ -108,14 +126,10 @@ func main() {
ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))

if clusterConcurrency < 1 || clusterConcurrency > 4 {
setupLog.Error(
fmt.Errorf("--cluster-concurrency must be between 1 and 4, got %d", clusterConcurrency), "invalid flag")
os.Exit(1)
return fmt.Errorf("invalid flag: --cluster-concurrency must be between 1 and 4, got %d", clusterConcurrency)
}
if machineConcurrency < 1 || machineConcurrency > 10 {
setupLog.Error(
fmt.Errorf("--machine-concurrency must be between 1 and 10, got %d", machineConcurrency), "invalid flag")
os.Exit(1)
return fmt.Errorf("invalid flag: --machine-concurrency must be between 1 and 10, got %d", machineConcurrency)
}

// if the enable-http2 flag is false (the default), http/2 should be disabled
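Review note on the `main` to `run() error` refactor: returning errors instead of calling `os.Exit` inside the function body means deferred cleanup (such as the tracing `shutdown` later in this file) gets a chance to run, since `os.Exit` skips deferred calls. A minimal sketch of the pattern follows; `validateRange` and the `cleanup` parameter are hypothetical stand-ins, not names from the PR.

```go
package main

import (
	"fmt"
	"os"
)

// validateRange is a hypothetical helper mirroring the concurrency checks:
// it returns an error instead of logging and exiting.
func validateRange(flagName string, value, lo, hi int) error {
	if value < lo || value > hi {
		return fmt.Errorf("invalid flag: --%s must be between %d and %d, got %d", flagName, lo, hi, value)
	}
	return nil
}

// run holds the program logic and reports failure through its return value,
// so the deferred cleanup executes on every exit path.
func run(clusterConcurrency int, cleanup func()) error {
	defer cleanup()
	if err := validateRange("cluster-concurrency", clusterConcurrency, 1, 4); err != nil {
		return err
	}
	return nil
}

func main() {
	// os.Exit is called only here, after run has returned and its defers have run.
	if err := run(2, func() { fmt.Println("cleanup ran") }); err != nil {
		fmt.Fprintf(os.Stderr, "%v\n", err)
		os.Exit(1)
	}
}
```

One trade-off worth noting: errors now reach stderr as plain text via `fmt.Fprintf` rather than through the structured `setupLog` logger.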
@@ -192,24 +206,37 @@ func main() {
HealthProbeBindAddress: probeAddr,
LeaderElection: enableLeaderElection,
LeaderElectionID: "cloudscale.infrastructure.cluster.x-k8s.io",
PprofBindAddress: profilerAddress,
// LeaderElectionReleaseOnCancel: true,
})
if err != nil {
setupLog.Error(err, "Failed to start manager")
os.Exit(1)
return fmt.Errorf("failed to create manager: %w", err)
}

ctx := ctrl.SetupSignalHandler()

// Create a shared HTTP transport for all cloudscale API clients.
// This enables connection pooling and HTTP/2 multiplexing across reconciles.
transport := cloudscale.NewTransport()
if enableTracing {
shutdown, err := observability.InitTracing(ctx, setupLog, "capcs", version, tracingSampleRate)
if err != nil {
return fmt.Errorf("failed to initialize tracing: %w", err)
}
defer shutdown()
}

// Wrap the transport with SDK instrumentation so all cloudscale API calls
// emit Prometheus metrics and OpenTelemetry spans.
//
// The wrapped transport is shared for all cloudscale API clients to enable connection pooling and HTTP/2 multiplexing
// across reconciles.
instrumentedTransport := instrumentation.InstrumentedTransport(cloudscale.NewTransport(), instrumentation.Options{
PrometheusRegistry: ctrlmetrics.Registry,
Tracer: otel.Tracer("cloudscale-go-sdk"),
})

// Fetch region information for controllers and webhooks
regionInfo, flavorInfo, err := fetchAPIInfo(transport)
regionInfo, flavorInfo, err := fetchAPIInfo(instrumentedTransport, version)
if err != nil {
setupLog.Error(err, "unable to fetch API information")
os.Exit(1)
return fmt.Errorf("failed to fetch API info: %w", err)
}
setupLog.Info("fetched region information", "regions", regionInfo.GetAllRegions())
setupLog.Info("fetched flavor information", "flavors", len(flavorInfo.GetAllFlavors()))
@@ -218,72 +245,65 @@ func main() {
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
WatchFilter: watchFilter,
Transport: transport,
Transport: instrumentedTransport,
Version: version,
MaxConcurrentReconciles: clusterConcurrency,
}).SetupWithManager(ctx, mgr); err != nil {
setupLog.Error(err, "Failed to create controller", "controller", "CloudscaleCluster")
os.Exit(1)
return fmt.Errorf("failed to create controller CloudscaleCluster: %w", err)
}
if err := (&controller.CloudscaleMachineReconciler{
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
WatchFilter: watchFilter,
Transport: transport,
Transport: instrumentedTransport,
Version: version,
MaxConcurrentReconciles: machineConcurrency,
}).SetupWithManager(ctx, mgr); err != nil {
setupLog.Error(err, "Failed to create controller", "controller", "CloudscaleMachine")
os.Exit(1)
return fmt.Errorf("failed to create controller CloudscaleMachine: %w", err)
}
if err := (&controller.CloudscaleMachineTemplateReconciler{
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
FlavorInfo: flavorInfo,
}).SetupWithManager(mgr); err != nil {
setupLog.Error(err, "Failed to create controller", "controller", "CloudscaleMachineTemplate")
os.Exit(1)
return fmt.Errorf("failed to create controller CloudscaleMachineTemplate: %w", err)
}

webhooksEnabled := os.Getenv("ENABLE_WEBHOOKS") != "false"

if webhooksEnabled {
if err := webhookv1beta2.SetupCloudscaleClusterWebhookWithManager(mgr, regionInfo); err != nil {
setupLog.Error(err, "Failed to create webhook", "webhook", "CloudscaleCluster")
os.Exit(1)
return fmt.Errorf("failed to set up validation webhook CloudscaleCluster: %w", err)
}
if err := webhookv1beta2.SetupCloudscaleMachineWebhookWithManager(mgr, flavorInfo); err != nil {
setupLog.Error(err, "Failed to create webhook", "webhook", "CloudscaleMachine")
os.Exit(1)
return fmt.Errorf("failed to set up validation webhook CloudscaleMachine: %w", err)
}
if err := webhookv1beta2.SetupCloudscaleMachineTemplateWebhookWithManager(mgr, flavorInfo); err != nil {
setupLog.Error(err, "Failed to create webhook", "webhook", "CloudscaleMachineTemplate")
os.Exit(1)
return fmt.Errorf("failed to set up validation webhook CloudscaleMachineTemplate: %w", err)
}
if err := webhookv1beta2.SetupCloudscaleClusterTemplateWebhookWithManager(mgr, regionInfo); err != nil {
setupLog.Error(err, "Failed to create webhook", "webhook", "CloudscaleClusterTemplate")
os.Exit(1)
return fmt.Errorf("failed to set up validation webhook CloudscaleClusterTemplate: %w", err)
}
}
// +kubebuilder:scaffold:builder

if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
setupLog.Error(err, "Failed to set up health check")
os.Exit(1)
return fmt.Errorf("failed to set up health check: %w", err)
}
if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil {
setupLog.Error(err, "Failed to set up ready check")
os.Exit(1)
return fmt.Errorf("failed to set up ready check: %w", err)
}

setupLog.Info("Starting manager")
setupLog.Info("Starting manager", "version", version)
if err := mgr.Start(ctx); err != nil {
setupLog.Error(err, "Failed to run manager")
os.Exit(1)
return fmt.Errorf("failed to run manager: %w", err)
}
return nil
}

// fetchAPIInfo fetches region and flavor information from cloudscale.ch API.
// Requires CLOUDSCALE_API_TOKEN environment variable.
func fetchAPIInfo(transport *http.Transport) (*cloudscale.RegionInfo, *cloudscale.FlavorInfo, error) {
func fetchAPIInfo(transport http.RoundTripper, version string) (*cloudscale.RegionInfo, *cloudscale.FlavorInfo, error) {
token := os.Getenv("CLOUDSCALE_API_TOKEN")
if token == "" {
return nil, nil, fmt.Errorf("CLOUDSCALE_API_TOKEN environment variable is required")
@@ -292,7 +312,7 @@ func fetchAPIInfo(transport *http.Transport) (*cloudscale.RegionInfo, *cloudscal
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

client := cloudscale.NewClient(token, transport)
client := cloudscale.NewClient(token, version, transport)

var regionInfo *cloudscale.RegionInfo
var flavorInfo *cloudscale.FlavorInfo
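Review note on the `*http.Transport` to `http.RoundTripper` signature change: once the transport is wrapped, it is no longer a concrete `*http.Transport`, so consumers have to accept the interface. The sketch below illustrates the wrapping idea with a request counter. It is not the SDK's `instrumentation.InstrumentedTransport`; `countingTransport`, `stubTransport`, and `fetch` are hypothetical names, and the stub avoids any real network traffic.

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// countingTransport is a hypothetical wrapper that counts requests and
// delegates to the next RoundTripper in the chain.
type countingTransport struct {
	next  http.RoundTripper
	count atomic.Int64
}

func (c *countingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	c.count.Add(1)
	return c.next.RoundTrip(req)
}

// stubTransport returns a canned 200 response so the example needs no network.
type stubTransport struct{}

func (stubTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody, Request: req}, nil
}

// fetch accepts any http.RoundTripper, so plain and wrapped transports both work.
func fetch(rt http.RoundTripper, url string) (int, error) {
	client := &http.Client{Transport: rt}
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	wrapped := &countingTransport{next: stubTransport{}}
	status, err := fetch(wrapped, "http://example.invalid/")
	fmt.Println(status, err, wrapped.count.Load())
}
```

Sharing one wrapped transport across all clients, as the PR does, keeps the connection pool shared while every request still passes through the wrapper.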
18 changes: 18 additions & 0 deletions docs/development.md
@@ -119,10 +119,28 @@ template_dirs:
- ./test/infrastructure/docker/templates
cloudscale:
- path/to/local/clone/cluster-api-provider-cloudscale/templates
# Optional: uncomment to deploy the observability stack.
#deploy_observability:
# - grafana
# - kube-state-metrics
# - loki
# - metrics-server
# - prometheus
# - alloy
# - parca
# - tempo
```

Then `tilt up` from the cluster-api checkout.

The `deploy_observability` block is processed by the cluster-api Tiltfile and
brings up Prometheus, Grafana, Tempo, and friends in the management cluster;
see [Cluster API's Tilt documentation](https://cluster-api.sigs.k8s.io/developer/core/tilt)
for what each component does and how to reach the resulting UIs. CAPCS's
`ServiceMonitor` is auto-discovered once the prometheus kustomization is
enabled. For production metrics and tracing setup, see
[Observability](observability.md).

## Tests

| Layer | Location | What it covers |