Commit 1daf33c

wip
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
1 parent 0fc6620 commit 1daf33c

5 files changed

+557
-1
lines changed

grafana/README.md

Lines changed: 225 additions & 0 deletions

# Observability Stack for Java Operator SDK

This directory contains scripts and configuration for setting up a complete observability stack on minikube.

## Quick Start

```bash
./install-observability.sh
```

This script installs:
- **OpenTelemetry Operator** - Manages the OpenTelemetry Collector, which receives metrics and traces
- **Prometheus** - For metrics storage and querying
- **Grafana** - For visualization and dashboards
- **cert-manager** - Required for the OpenTelemetry Operator webhooks

## Prerequisites

- A running minikube cluster, with kubectl configured to use it
- Helm 3.x installed

## Components Installed

### OpenTelemetry Collector
- Receives metrics and traces via OTLP (gRPC and HTTP)
- Exposes the received metrics in Prometheus format for scraping
- Configured with memory limiter and batch processing

**Endpoints:**
- OTLP gRPC: `otel-collector-collector.observability.svc.cluster.local:4317`
- OTLP HTTP: `otel-collector-collector.observability.svc.cluster.local:4318`
- Prometheus metrics: `http://otel-collector-prometheus.observability.svc.cluster.local:8889/metrics`

### Prometheus
- Scrapes metrics from the OpenTelemetry Collector
- Supports ServiceMonitor and PodMonitor CRDs
- Configured to discover all metrics automatically

**Access:**
```bash
kubectl port-forward -n observability svc/kube-prometheus-stack-prometheus 9090:9090
```
Open http://localhost:9090

### Grafana
- Pre-configured with Prometheus as a data source
- Includes Kubernetes monitoring dashboards

**Access:**
```bash
kubectl port-forward -n observability svc/kube-prometheus-stack-grafana 3000:80
```
Open http://localhost:3000
- **Username:** admin
- **Password:** admin
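
## Integrating with Your Operator

### 1. Add OpenTelemetry Dependency

Add to your `pom.xml`:

```xml
<dependency>
    <groupId>io.javaoperatorsdk</groupId>
    <artifactId>operator-framework-opentelemetry-support</artifactId>
    <version>${josdk.version}</version>
</dependency>
```

`AutoConfiguredOpenTelemetrySdk`, used in step 2, is provided by `io.opentelemetry:opentelemetry-sdk-extension-autoconfigure`, and the OTLP exporters by `io.opentelemetry:opentelemetry-exporter-otlp`. Add both if they are not already pulled in transitively.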

### 2. Configure OpenTelemetry in Your Operator

In your operator code:

```java
import io.javaoperatorsdk.operator.Operator;
import io.javaoperatorsdk.operator.api.monitoring.Metrics;
import io.javaoperatorsdk.operator.monitoring.opentelemetry.OpenTelemetryMetrics;
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk;

// Initialize OpenTelemetry from the OTEL_* environment variables (see step 3)
OpenTelemetry openTelemetry = AutoConfiguredOpenTelemetrySdk.initialize()
    .getOpenTelemetrySdk();

// Create JOSDK metrics instance
Metrics metrics = OpenTelemetryMetrics.builder(openTelemetry)
    .build();

// Configure the operator with metrics; it creates its own Kubernetes client by default
Operator operator = new Operator(o -> o.withMetrics(metrics));
```

### 3. Set Environment Variables

In your operator deployment YAML:

```yaml
env:
  - name: OTEL_SERVICE_NAME
    value: "your-operator-name"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector-collector.observability.svc.cluster.local:4318"
  - name: OTEL_METRICS_EXPORTER
    value: "otlp"
  - name: OTEL_TRACES_EXPORTER
    value: "otlp"
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "http/protobuf"
```
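
With `OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf`, the OpenTelemetry exporter specification has the SDK append a per-signal path (`v1/metrics`, `v1/traces`) to `OTEL_EXPORTER_OTLP_ENDPOINT`. The base endpoint should therefore stay at `...:4318` without a `/v1/...` suffix. The class below is a hypothetical helper (`OtlpEndpoints` is not part of any SDK) that mimics that rule, so you can check which URLs the operator will post to. It is a sketch of the spec rule, not the SDK's own code.

```java
import java.net.URI;

/** Hypothetical helper: derives per-signal OTLP/HTTP URLs from a base endpoint. */
public class OtlpEndpoints {

    /** Appends "v1/<signal>" to the base endpoint without producing a double slash. */
    static URI signalUrl(String baseEndpoint, String signal) {
        String base = baseEndpoint.endsWith("/") ? baseEndpoint : baseEndpoint + "/";
        return URI.create(base + "v1/" + signal);
    }

    public static void main(String[] args) {
        // Fall back to the in-cluster collector address from this README when the variable is unset.
        String endpoint = System.getenv().getOrDefault(
            "OTEL_EXPORTER_OTLP_ENDPOINT",
            "http://otel-collector-collector.observability.svc.cluster.local:4318");
        System.out.println("metrics -> " + signalUrl(endpoint, "metrics"));
        System.out.println("traces  -> " + signalUrl(endpoint, "traces"));
    }
}
```

Signal-specific variables such as `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` are used as-is by the SDK, with no path appended. Set them instead if your collector listens on non-default paths.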

## Available JOSDK Metrics

The following metrics are exported by JOSDK:

| Metric | Type | Description |
|--------|------|-------------|
| `operator_sdk_reconciliations_started_total` | Counter | Total number of reconciliations started |
| `operator_sdk_reconciliations_success_total` | Counter | Total number of successful reconciliations |
| `operator_sdk_reconciliations_failed_total` | Counter | Total number of failed reconciliations |
| `operator_sdk_reconciliations_queue_size` | Gauge | Current reconciliation queue size |
| `operator_sdk_events_received_total` | Counter | Total number of Kubernetes events received |
| `operator_sdk_controllers_execution_reconcile_seconds` | Timer | Time taken for reconciliations |
| `operator_sdk_controllers_execution_cleanup_seconds` | Timer | Time taken for cleanup operations |
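
The Prometheus names in this table follow the usual OpenTelemetry-to-Prometheus translation. Dots become underscores, a unit suffix such as `_seconds` is appended, and monotonic counters gain `_total`. Histogram-style timers are then exposed as `_bucket`, `_sum` and `_count` series, which the average-duration query relies on. The sketch below is a simplified illustration of that convention, not the collector exporter's actual code. The source names passed in (`operator.sdk.reconciliations.started`, `operator.sdk.controllers.execution.reconcile`) are assumptions modeled on JOSDK's Micrometer naming. The exact output can also depend on how the collector's Prometheus exporter is configured.

```java
/** Simplified, hypothetical sketch of OpenTelemetry-to-Prometheus metric naming. */
public class PromNames {

    static String toPrometheusName(String otelName, String unitSuffix, boolean monotonicCounter) {
        // Characters outside [a-zA-Z0-9_:] (such as dots) become underscores.
        String name = otelName.replaceAll("[^a-zA-Z0-9_:]", "_");
        // Append the unit suffix (e.g. "seconds") unless the name already ends with it.
        if (!unitSuffix.isEmpty() && !name.endsWith("_" + unitSuffix)) {
            name = name + "_" + unitSuffix;
        }
        // Monotonic counters get a "_total" suffix.
        if (monotonicCounter && !name.endsWith("_total")) {
            name = name + "_total";
        }
        return name;
    }

    public static void main(String[] args) {
        // Assumed source names, modeled on JOSDK's Micrometer metric names.
        System.out.println(toPrometheusName("operator.sdk.reconciliations.started", "", true));
        System.out.println(toPrometheusName("operator.sdk.controllers.execution.reconcile", "seconds", false));
    }
}
```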

## Creating Grafana Dashboards

### Example PromQL Queries

**Reconciliation Rate:**
```promql
sum(rate(operator_sdk_reconciliations_started_total[5m])) by (controller)
```

**Success Rate:**
```promql
sum(rate(operator_sdk_reconciliations_success_total[5m])) /
sum(rate(operator_sdk_reconciliations_started_total[5m]))
```

**Error Rate:**
```promql
sum(rate(operator_sdk_reconciliations_failed_total[5m])) by (controller, exception)
```

**Queue Size:**
```promql
operator_sdk_reconciliations_queue_size
```

**Average Reconciliation Duration:**
```promql
rate(operator_sdk_controllers_execution_reconcile_seconds_sum[5m]) /
rate(operator_sdk_controllers_execution_reconcile_seconds_count[5m])
```
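
The success-rate and average-duration queries both divide one `rate()` by another over the same `[5m]` window. Because the window length cancels, the result is the ratio of the two counters' increases. The sketch below reproduces that arithmetic in plain Java with made-up sample values. It deliberately ignores counter resets and the edge extrapolation that PromQL's `rate()` performs, so treat it as an illustration rather than a reimplementation. `RateRatios` is a hypothetical name.

```java
/** Hypothetical sketch of the arithmetic behind "ratio of rates" PromQL queries. */
public class RateRatios {

    /** Per-second increase of a counter between two samples (no reset handling, no extrapolation). */
    static double rate(double first, double last, double windowSeconds) {
        return (last - first) / windowSeconds;
    }

    /** Ratio of two rates over the same window; returns NaN if the denominator did not increase. */
    static double ratioOfRates(double numFirst, double numLast,
                               double denFirst, double denLast,
                               double windowSeconds) {
        double denominator = rate(denFirst, denLast, windowSeconds);
        if (denominator == 0) {
            return Double.NaN;
        }
        return rate(numFirst, numLast, windowSeconds) / denominator;
    }

    public static void main(String[] args) {
        double window = 5 * 60; // the [5m] range selector, in seconds
        // Made-up samples: started went 1200 -> 1320, success went 1000 -> 1114.
        double successRatio = ratioOfRates(1_000, 1_114, 1_200, 1_320, window);
        // Made-up timer samples: _sum went 50.0 s -> 80.0 s while _count went 1200 -> 1320.
        double avgSeconds = ratioOfRates(50.0, 80.0, 1_200, 1_320, window);
        System.out.printf("success ratio: %.2f%n", successRatio);
        System.out.printf("avg reconcile: %.2f s%n", avgSeconds);
    }
}
```

With these made-up samples it reports a 0.95 success ratio (114 of 120) and a 0.25 s average (30 s over 120 reconciliations).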

### Sample Dashboard Configuration

1. Open Grafana (http://localhost:3000)
2. Go to "Dashboards" → "New Dashboard"
3. Add panels with the PromQL queries above
4. Configure visualization types:
   - Time series for rates and durations
   - Gauge for queue size
   - Stat for current values
164+
165+
## Troubleshooting
166+
167+
### Check Pod Status
168+
```bash
169+
kubectl get pods -n observability
170+
```
171+
172+
### Check OpenTelemetry Collector Logs
173+
```bash
174+
kubectl logs -n observability -l app.kubernetes.io/name=otel-collector -f
175+
```
176+
177+
### Check Prometheus Targets
178+
```bash
179+
kubectl port-forward -n observability svc/kube-prometheus-stack-prometheus 9090:9090
180+
```
181+
Then open http://localhost:9090/targets
182+
183+
### Verify Metrics are Being Collected
184+
```bash
185+
# Check if OpenTelemetry is receiving metrics
186+
kubectl port-forward -n observability svc/otel-collector-prometheus 8889:8889
187+
curl http://localhost:8889/metrics | grep operator_sdk
188+
```
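
The same check can be done from Java, for example in an integration test. The sketch below assumes the port-forward from the block above is running on `localhost:8889`. It uses only the JDK's `java.net.http.HttpClient` (Java 11+). Unlike `grep`, it skips the `# HELP` and `# TYPE` comment lines and keeps only sample lines. `MetricFilter` is a hypothetical name.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: fetch the collector's Prometheus output and keep JOSDK samples. */
public class MetricFilter {

    /** Returns sample lines whose metric name starts with the prefix; comment lines are dropped. */
    static List<String> samplesWithPrefix(String exposition, String prefix) {
        return exposition.lines()
            .filter(line -> !line.startsWith("#"))
            .filter(line -> line.startsWith(prefix))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        // Assumes `kubectl port-forward ... svc/otel-collector-prometheus 8889:8889` is active.
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8889/metrics"))
            .GET()
            .build();
        String body = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString())
            .body();
        List<String> samples = samplesWithPrefix(body, "operator_sdk");
        if (samples.isEmpty()) {
            System.out.println("No operator_sdk samples yet; check the operator's OTEL_* settings.");
        } else {
            samples.forEach(System.out::println);
        }
    }
}
```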

### Test OTLP Endpoint
```bash
# Port-forward the OTLP HTTP endpoint in the background
kubectl port-forward -n observability svc/otel-collector-collector 4318:4318 &
sleep 2

# Send an empty OTLP/JSON metrics payload to test connectivity only;
# an HTTP 200 response means the collector's OTLP HTTP receiver is reachable
curl -X POST http://localhost:4318/v1/metrics \
  -H "Content-Type: application/json" \
  -d '{"resourceMetrics":[]}'
```
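
If curl is not at hand, the same connectivity check can be sketched with the JDK's built-in `java.net.http.HttpClient` (Java 11+). `OtlpPing` is a hypothetical name, and the class assumes the port-forward from the block above is running. Like the curl command, it sends an empty `resourceMetrics` array, so it tests reachability only and records no data.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical sketch: POST an empty OTLP/JSON metrics payload to an OTLP/HTTP receiver. */
public class OtlpPing {

    /** Builds the same request as the curl example: JSON body with an empty resourceMetrics array. */
    static HttpRequest emptyMetricsRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/metrics"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString("{\"resourceMetrics\":[]}"))
            .build();
    }

    public static void main(String[] args) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient().send(
            emptyMetricsRequest("http://localhost:4318"),
            HttpResponse.BodyHandlers.ofString());
        // A 200 status means the collector's OTLP/HTTP receiver accepted the request.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```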

## Uninstalling

To remove all components:

```bash
# Delete OpenTelemetry resources
kubectl delete -n observability OpenTelemetryCollector otel-collector

# Uninstall Helm releases
helm uninstall -n observability kube-prometheus-stack
helm uninstall -n observability opentelemetry-operator
helm uninstall -n cert-manager cert-manager

# Delete namespaces
kubectl delete namespace observability cert-manager
```

## References

- [JOSDK Observability Documentation](https://javaoperatorsdk.io/docs/documentation/observability/)
- [OpenTelemetry Java Documentation](https://opentelemetry.io/docs/instrumentation/java/)
- [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator)
- [Grafana Documentation](https://grafana.com/docs/)
- [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/)
