| 1 | +# Observability Stack for Java Operator SDK |
| 2 | + |
| 3 | +This directory contains scripts and configuration for setting up a complete observability stack on minikube. |
| 4 | + |
| 5 | +## Quick Start |
| 6 | + |
| 7 | +```bash |
| 8 | +./install-observability.sh |
| 9 | +``` |
| 10 | + |
| 11 | +This script installs: |
| 12 | +- **OpenTelemetry Operator** - Deploys and manages the OpenTelemetry Collector, which receives metrics and traces |
| 13 | +- **Prometheus** - For metrics storage and querying |
| 14 | +- **Grafana** - For visualization and dashboards |
| 15 | +- **cert-manager** - Required for OpenTelemetry Operator webhooks |
| 16 | + |
| 17 | +## Prerequisites |
| 18 | + |
| 19 | +- A running minikube cluster, with kubectl configured to use it |
| 20 | +- Helm 3.x installed |
| 21 | + |
| 22 | +## Components Installed |
| 23 | + |
| 24 | +### OpenTelemetry Collector |
| 25 | +- Receives metrics and traces via OTLP (gRPC and HTTP) |
| 26 | +- Exports metrics in Prometheus format |
| 27 | +- Configured with memory limiter and batch processing |
| 28 | + |
| 29 | +**Endpoints:** |
| 30 | +- OTLP gRPC: `otel-collector-collector.observability.svc.cluster.local:4317` |
| 31 | +- OTLP HTTP: `otel-collector-collector.observability.svc.cluster.local:4318` |
| 32 | +- Prometheus metrics: `http://otel-collector-prometheus.observability.svc.cluster.local:8889/metrics` |
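The endpoints above follow the usual in-cluster Service DNS pattern `<service>.<namespace>.svc.cluster.local`. As a minimal sketch, assuming the default `cluster.local` cluster domain, the helper below rebuilds them from their parts. The class `CollectorEndpoints` and its method names are hypothetical and not part of JOSDK or OpenTelemetry.

```java
/** Hypothetical helper that assembles in-cluster addresses for the collector Services. */
final class CollectorEndpoints {

  // Assumption: the cluster uses the default "cluster.local" DNS domain.
  private static final String CLUSTER_DOMAIN = "svc.cluster.local";

  /** Builds "<service>.<namespace>.svc.cluster.local". */
  static String serviceHost(String service, String namespace) {
    return service + "." + namespace + "." + CLUSTER_DOMAIN;
  }

  /** Builds "<host>:<port>" for a Service port. */
  static String hostPort(String service, String namespace, int port) {
    return serviceHost(service, namespace) + ":" + port;
  }

  public static void main(String[] args) {
    // Service names, namespace, and ports are the ones listed in this README.
    System.out.println("OTLP gRPC: " + hostPort("otel-collector-collector", "observability", 4317));
    System.out.println("OTLP HTTP: " + hostPort("otel-collector-collector", "observability", 4318));
    System.out.println("Prometheus: http://"
        + hostPort("otel-collector-prometheus", "observability", 8889) + "/metrics");
  }
}
```

These names resolve only from inside the cluster; from a workstation, use `kubectl port-forward` as described in the sections that follow.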
| 33 | + |
| 34 | +### Prometheus |
| 35 | +- Scrapes metrics from OpenTelemetry Collector |
| 36 | +- Supports ServiceMonitor and PodMonitor CRDs |
| 37 | +- Configured to discover all metrics automatically |
| 38 | + |
| 39 | +**Access:** |
| 40 | +```bash |
| 41 | +kubectl port-forward -n observability svc/kube-prometheus-stack-prometheus 9090:9090 |
| 42 | +``` |
| 43 | +Open http://localhost:9090 |
| 44 | + |
| 45 | +### Grafana |
| 46 | +- Pre-configured with Prometheus as a data source |
| 47 | +- Includes Kubernetes monitoring dashboards |
| 48 | + |
| 49 | +**Access:** |
| 50 | +```bash |
| 51 | +kubectl port-forward -n observability svc/kube-prometheus-stack-grafana 3000:80 |
| 52 | +``` |
| 53 | +Open http://localhost:3000 |
| 54 | +- **Username:** admin |
| 55 | +- **Password:** admin |
| 56 | + |
| 57 | +## Integrating with Your Operator |
| 58 | + |
| 59 | +### 1. Add OpenTelemetry Dependency |
| 60 | + |
| 61 | +Add to your `pom.xml`: |
| 62 | + |
| 63 | +```xml |
| 64 | +<dependency> |
| 65 | + <groupId>io.javaoperatorsdk</groupId> |
| 66 | + <artifactId>operator-framework-opentelemetry-support</artifactId> |
| 67 | + <version>${josdk.version}</version> |
| 68 | +</dependency> |
| 69 | +``` |
| 70 | + |
| 71 | +### 2. Configure OpenTelemetry in Your Operator |
| 72 | + |
| 73 | +In your operator code: |
| 74 | + |
| 75 | +```java |
| 76 | +import io.javaoperatorsdk.operator.monitoring.opentelemetry.OpenTelemetryMetrics; |
| 77 | +import io.opentelemetry.api.OpenTelemetry; |
| 78 | +import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk; |
| 79 | + |
| 80 | +// Initialize OpenTelemetry |
| 81 | +OpenTelemetry openTelemetry = AutoConfiguredOpenTelemetrySdk.initialize() |
| 82 | + .getOpenTelemetrySdk(); |
| 83 | + |
| 84 | +// Create JOSDK metrics instance |
| 85 | +Metrics metrics = OpenTelemetryMetrics.builder(openTelemetry) |
| 86 | + .build(); |
| 87 | + |
| 88 | +// Configure operator with metrics |
| 89 | +Operator operator = new Operator(client, o -> o.withMetrics(metrics)); |
| 90 | +``` |
| 91 | + |
| 92 | +### 3. Set Environment Variables |
| 93 | + |
| 94 | +In your operator deployment YAML: |
| 95 | + |
| 96 | +```yaml |
| 97 | +env: |
| 98 | + - name: OTEL_SERVICE_NAME |
| 99 | + value: "your-operator-name" |
| 100 | + - name: OTEL_EXPORTER_OTLP_ENDPOINT |
| 101 | + value: "http://otel-collector-collector.observability.svc.cluster.local:4318" |
| 102 | + - name: OTEL_METRICS_EXPORTER |
| 103 | + value: "otlp" |
| 104 | + - name: OTEL_TRACES_EXPORTER |
| 105 | + value: "otlp" |
| 106 | + - name: OTEL_EXPORTER_OTLP_PROTOCOL |
| 107 | + value: "http/protobuf" |
| 108 | +``` |
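A mismatch between `OTEL_EXPORTER_OTLP_PROTOCOL` and the port in `OTEL_EXPORTER_OTLP_ENDPOINT` is an easy mistake to make with these variables. The sketch below is one possible startup check, assuming the port layout used in this README (4317 for gRPC, 4318 for HTTP). The class `OtelEnvCheck` is hypothetical and not part of JOSDK or the OpenTelemetry SDK.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sanity check for the OTEL_* variables shown above. */
final class OtelEnvCheck {

  // Assumption: the collector listens on the ports listed in this README.
  private static final Map<String, Integer> EXPECTED_PORTS =
      Map.of("grpc", 4317, "http/protobuf", 4318);

  /** Returns human-readable problems; an empty list means the settings look consistent. */
  static List<String> validate(Map<String, String> env) {
    List<String> problems = new ArrayList<>();
    if (isBlank(env.get("OTEL_SERVICE_NAME"))) {
      problems.add("OTEL_SERVICE_NAME is not set");
    }
    String endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT");
    String protocol = env.get("OTEL_EXPORTER_OTLP_PROTOCOL");
    if (isBlank(endpoint)) {
      problems.add("OTEL_EXPORTER_OTLP_ENDPOINT is not set");
      return problems;
    }
    if (isBlank(protocol)) {
      problems.add("OTEL_EXPORTER_OTLP_PROTOCOL is not set");
      return problems;
    }
    Integer expectedPort = EXPECTED_PORTS.get(protocol);
    if (expectedPort == null) {
      problems.add("Unrecognized protocol: " + protocol);
      return problems;
    }
    int actualPort;
    try {
      actualPort = URI.create(endpoint).getPort(); // -1 when the URI has no explicit port
    } catch (IllegalArgumentException e) {
      problems.add("Endpoint is not a valid URI: " + endpoint);
      return problems;
    }
    if (actualPort != expectedPort) {
      problems.add("Protocol " + protocol + " expects port " + expectedPort
          + " but the endpoint uses " + actualPort);
    }
    return problems;
  }

  private static boolean isBlank(String value) {
    return value == null || value.trim().isEmpty();
  }

  public static void main(String[] args) {
    // Checks the environment of the current process and prints any findings.
    validate(System.getenv()).forEach(System.out::println);
  }
}
```

Calling `OtelEnvCheck.validate(System.getenv())` before constructing the `Operator` would surface such a mismatch at startup instead of as silently missing metrics.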
| 109 | + |
| 110 | +## Available JOSDK Metrics |
| 111 | + |
| 112 | +The following metrics are exported by JOSDK: |
| 113 | + |
| 114 | +| Metric | Type | Description | |
| 115 | +|--------|------|-------------| |
| 116 | +| `operator_sdk_reconciliations_started_total` | Counter | Total number of reconciliations started | |
| 117 | +| `operator_sdk_reconciliations_success_total` | Counter | Total number of successful reconciliations | |
| 118 | +| `operator_sdk_reconciliations_failed_total` | Counter | Total number of failed reconciliations | |
| 119 | +| `operator_sdk_reconciliations_queue_size` | Gauge | Current reconciliation queue size | |
| 120 | +| `operator_sdk_events_received_total` | Counter | Total number of Kubernetes events received | |
| 121 | +| `operator_sdk_controllers_execution_reconcile_seconds` | Timer | Time taken for reconciliations | |
| 122 | +| `operator_sdk_controllers_execution_cleanup_seconds` | Timer | Time taken for cleanup operations | |
| 123 | + |
| 124 | +## Creating Grafana Dashboards |
| 125 | + |
| 126 | +### Example PromQL Queries |
| 127 | + |
| 128 | +**Reconciliation Rate:** |
| 129 | +```promql |
| 130 | +sum(rate(operator_sdk_reconciliations_started_total[5m])) by (controller) |
| 131 | +``` |
| 132 | + |
| 133 | +**Success Rate:** |
| 134 | +```promql |
| 135 | +sum(rate(operator_sdk_reconciliations_success_total[5m])) / |
| 136 | +sum(rate(operator_sdk_reconciliations_started_total[5m])) |
| 137 | +``` |
| 138 | + |
| 139 | +**Error Rate:** |
| 140 | +```promql |
| 141 | +sum(rate(operator_sdk_reconciliations_failed_total[5m])) by (controller, exception) |
| 142 | +``` |
| 143 | + |
| 144 | +**Queue Size:** |
| 145 | +```promql |
| 146 | +operator_sdk_reconciliations_queue_size |
| 147 | +``` |
| 148 | + |
| 149 | +**Average Reconciliation Duration:** |
| 150 | +```promql |
| 151 | +rate(operator_sdk_controllers_execution_reconcile_seconds_sum[5m]) / |
| 152 | +rate(operator_sdk_controllers_execution_reconcile_seconds_count[5m]) |
| 153 | +``` |
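To make the arithmetic behind the success-rate and average-duration queries concrete, here is a minimal worked sketch on two hypothetical scrapes taken 300 seconds apart. It is a simplification: unlike PromQL `rate()`, it does not handle counter resets or extrapolation. The class `ReconcileStats` and all sample values are made up for illustration.

```java
/** Simplified stand-in for the PromQL expressions above, computed on two counter samples. */
final class ReconcileStats {

  /** Per-second increase of a counter between two samples. */
  static double perSecondRate(double earlier, double later, double windowSeconds) {
    if (windowSeconds <= 0) {
      throw new IllegalArgumentException("windowSeconds must be positive");
    }
    return (later - earlier) / windowSeconds;
  }

  /** rate(success) / rate(started); NaN when nothing was started in the window. */
  static double successRatio(double successRate, double startedRate) {
    return successRate / startedRate;
  }

  /** rate(seconds_sum) / rate(seconds_count): mean duration of the reconciliations in the window. */
  static double averageDurationSeconds(double sumRate, double countRate) {
    return sumRate / countRate;
  }

  public static void main(String[] args) {
    double window = 300; // seconds, matching the [5m] range in the queries

    // Hypothetical counter values at the start and end of the window.
    double started = perSecondRate(100, 160, window);
    double success = perSecondRate(95, 152, window);
    double durationSum = perSecondRate(12.0, 18.0, window);
    double durationCount = perSecondRate(100, 160, window);

    System.out.printf("started/s=%.3f success ratio=%.3f avg duration=%.3fs%n",
        started, successRatio(success, started),
        averageDurationSeconds(durationSum, durationCount));
  }
}
```

With these numbers, 60 reconciliations started and 57 succeeded in the window, so the ratio is about 0.95, and 6 seconds of total duration spread over 60 reconciliations gives roughly 0.1 s each.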
| 154 | + |
| 155 | +### Sample Dashboard Configuration |
| 156 | + |
| 157 | +1. Open Grafana (http://localhost:3000) |
| 158 | +2. Go to "Dashboards" → "New Dashboard" |
| 159 | +3. Add panels with the PromQL queries above |
| 160 | +4. Configure visualization types: |
| 161 | + - Time series for rates and durations |
| 162 | + - Gauge for queue size |
| 163 | + - Stat for current values |
| 164 | + |
| 165 | +## Troubleshooting |
| 166 | + |
| 167 | +### Check Pod Status |
| 168 | +```bash |
| 169 | +kubectl get pods -n observability |
| 170 | +``` |
| 171 | + |
| 172 | +### Check OpenTelemetry Collector Logs |
| 173 | +```bash |
| 174 | +kubectl logs -n observability -l app.kubernetes.io/name=otel-collector-collector -f |
| 175 | +``` |
| 176 | + |
| 177 | +### Check Prometheus Targets |
| 178 | +```bash |
| 179 | +kubectl port-forward -n observability svc/kube-prometheus-stack-prometheus 9090:9090 |
| 180 | +``` |
| 181 | +Then open http://localhost:9090/targets |
| 182 | + |
| 183 | +### Verify Metrics are Being Collected |
| 184 | +```bash |
| 185 | +# Check whether the OpenTelemetry Collector is exposing JOSDK metrics |
| 186 | +kubectl port-forward -n observability svc/otel-collector-prometheus 8889:8889 |
| 187 | +curl http://localhost:8889/metrics | grep operator_sdk |
| 188 | +``` |
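The same check can be scripted from Java, for example in an integration test. The sketch below fetches the Prometheus text exposition and keeps only sample lines whose metric name starts with a given prefix. Unlike `grep`, it skips `# HELP` and `# TYPE` comment lines. The class `MetricsFilter` and the sample data are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that mimics `curl ... | grep operator_sdk` for sample lines. */
final class MetricsFilter {

  // Made-up exposition text, used when no URL is passed on the command line.
  private static final String SAMPLE =
      "# TYPE operator_sdk_reconciliations_started_total counter\n"
          + "operator_sdk_reconciliations_started_total{controller=\"demo\"} 3\n"
          + "jvm_threads_live 12\n";

  /** Returns the non-comment lines whose metric name starts with the prefix. */
  static List<String> samplesWithPrefix(String exposition, String prefix) {
    return exposition.lines()
        .map(String::strip)
        .filter(line -> !line.isEmpty() && !line.startsWith("#"))
        .filter(line -> line.startsWith(prefix))
        .collect(Collectors.toList());
  }

  /** Downloads the exposition text, e.g. from http://localhost:8889/metrics. */
  static String fetch(String url) throws IOException, InterruptedException {
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
    return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
  }

  public static void main(String[] args) throws Exception {
    // Pass the port-forwarded URL as the first argument to query a live collector.
    String text = args.length > 0 ? fetch(args[0]) : SAMPLE;
    samplesWithPrefix(text, "operator_sdk").forEach(System.out::println);
  }
}
```

An empty result against a live collector suggests that the operator is not exporting yet or that the environment variables from step 3 point at the wrong endpoint.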
| 189 | + |
| 190 | +### Test OTLP Endpoint |
| 191 | +```bash |
| 192 | +# Port forward the OTLP HTTP endpoint |
| 193 | +kubectl port-forward -n observability svc/otel-collector-collector 4318:4318 |
| 194 | + |
| 195 | +# Send an empty OTLP JSON payload (requires curl) |
| 196 | +# This only tests connectivity; no metric is recorded |
| 197 | +curl -X POST http://localhost:4318/v1/metrics \ |
| 198 | + -H "Content-Type: application/json" \ |
| 199 | + -d '{"resourceMetrics":[]}' |
| 200 | +``` |
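For a connectivity check from JVM code, the `curl` call above can be expressed with the JDK's built-in HTTP client. This is a sketch only: the class `OtlpProbe` is hypothetical, and it assumes the port-forward from the previous command is active when `--send` is used.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical connectivity probe for the collector's OTLP HTTP receiver. */
final class OtlpProbe {

  /** Builds a POST to <baseUrl>/v1/metrics with an empty OTLP JSON payload. */
  static HttpRequest emptyMetricsRequest(String baseUrl) {
    // Drop a trailing slash so the path does not end up as "//v1/metrics".
    String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    return HttpRequest.newBuilder(URI.create(base + "/v1/metrics"))
        .timeout(Duration.ofSeconds(5))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString("{\"resourceMetrics\":[]}"))
        .build();
  }

  public static void main(String[] args) throws Exception {
    HttpRequest request = emptyMetricsRequest("http://localhost:4318");
    System.out.println(request.method() + " " + request.uri());

    // Only touch the network when explicitly asked to.
    if (args.length > 0 && args[0].equals("--send")) {
      HttpResponse<String> response =
          HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
      System.out.println("HTTP status: " + response.statusCode());
    }
  }
}
```

Any HTTP response shows that the receiver is reachable; a connection error points at the port-forward or the collector Service instead.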
| 201 | + |
| 202 | +## Uninstalling |
| 203 | + |
| 204 | +To remove all components: |
| 205 | + |
| 206 | +```bash |
| 207 | +# Delete OpenTelemetry resources |
| 208 | +kubectl delete -n observability OpenTelemetryCollector otel-collector |
| 209 | + |
| 210 | +# Uninstall Helm releases |
| 211 | +helm uninstall -n observability kube-prometheus-stack |
| 212 | +helm uninstall -n observability opentelemetry-operator |
| 213 | +helm uninstall -n cert-manager cert-manager |
| 214 | + |
| 215 | +# Delete namespaces |
| 216 | +kubectl delete namespace observability cert-manager |
| 217 | +``` |
| 218 | + |
| 219 | +## References |
| 220 | + |
| 221 | +- [JOSDK Observability Documentation](https://javaoperatorsdk.io/docs/documentation/observability/) |
| 222 | +- [OpenTelemetry Java Documentation](https://opentelemetry.io/docs/instrumentation/java/) |
| 223 | +- [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator) |
| 224 | +- [Grafana Documentation](https://grafana.com/docs/) |
| 225 | +- [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) |