diff --git a/_topic_maps/_topic_map.yml b/_topic_maps/_topic_map.yml index 300e3f38e291..2ee38f383a29 100644 --- a/_topic_maps/_topic_map.yml +++ b/_topic_maps/_topic_map.yml @@ -3285,12 +3285,16 @@ Topics: File: understanding-network-observability-operator - Name: Configuring the Network Observability Operator File: configuring-operator + - Name: Network observability per-tenant model + File: network-observability-per-tenant-model - Name: Network Policy File: network-observability-network-policy + - Name: Network observability DNS resolution analysis + File: network-observability-dns-resolution-analysis - Name: Observing the network traffic File: observing-network-traffic - - Name: Network observability alerts - File: network-observability-alerts + - Name: Network observability health rules + File: network-observability-health-rules - Name: Using metrics with dashboards and alerts File: metrics-alerts-dashboards - Name: Monitoring the Network Observability Operator diff --git a/modules/network-observability-alerts-about.adoc b/modules/network-observability-alerts-about.adoc deleted file mode 100644 index ea0103d66a00..000000000000 --- a/modules/network-observability-alerts-about.adoc +++ /dev/null @@ -1,60 +0,0 @@ -// Module included in the following assemblies: -// -// * network_observability/network-observability-alerts.adoc - -:_mod-docs-content-type: CONCEPT -[id="network-observability-alerts-about_{context}"] -= About network observability alerts - -[role="_abstract"] -Network observability includes predefined alerts. Use these alerts to gain insight into the health and performance of your {product-title} applications and infrastructure. - -The predefined alerts provide a quick health indication of your cluster's network in the *Network Health* dashboard. You can also customize alerts using Prometheus Query Language (PromQL) queries. - -By default, network observability creates alerts that are contextual to the features you enable. 
- -For example, packet drop-related alerts are created only if the `PacketDrop` agent feature is enabled in the `FlowCollector` custom resource (CR). Alerts are built on metrics, and you might see configuration warnings if enabled alerts are missing their required metrics. - -You can configure these metrics in the `spec.processor.metrics.includeList` object of the `FlowCollector` CR. - -[id="network-observability-default-alert-templates_{context}"] -== List of default alert templates - -These alert templates are installed by default: - -`PacketDropsByDevice`:: Triggers on high percentage of packet drops from devices (`/proc/net/dev`). -`PacketDropsByKernel`:: Triggers on high percentage of packet drops by the kernel; it requires the `PacketDrop` agent feature. -`IPsecErrors`:: Triggers when IPsec encryption errors are detected by network observability; it requires the `IPSec` agent feature. -`NetpolDenied`:: Triggers when traffic denied by network policies is detected by network observability; it requires the `NetworkEvents` agent feature. -`LatencyHighTrend`:: Triggers when an increase of TCP latency is detected by network observability; it requires the `FlowRTT` agent feature. -`DNSErrors`:: Triggers when DNS errors are detected by network observability; it requires the `DNSTracking` agent feature. -//* `ExternalEgressHighTrend`: TODO. -//* `ExternalIngressHighTrend`: TODO. - -These are operational alerts that relate to the self-health of network observability: - -`NetObservNoFlows`:: Triggers when no flows are being observed for a certain period. -`NetObservLokiError`:: Triggers when flows are being dropped due to Loki errors. - -You can configure, extend, or disable alerts for network observability. 
You can view the resulting `PrometheusRule` resource in the default `netobserv` namespace by running the following command: - -[source,terminal] ----- -$ oc get prometheusrules -n netobserv -oyaml ----- - -[id="network-health-dashboard_{context}"] -== Network Health dashboard - -When alerts are enabled in the Network Observability Operator, two things happen: - -* New alerts appear in *Observe* → *Alerting* → *Alerting rules* tab in the {product-title} web console. -* A new *Network Health* dashboard appears in {product-title} web console → *Observe*. - -The *Network Health* dashboard provides a summary of triggered alerts and pending alerts, distinguishing between critical, warning, and minor issues. Alerts for rule violations are displayed in the following tabs: - -* *Global*: Shows alerts that are global to the cluster. -* *Nodes*: Shows alerts for rule violations per node. -* *Namespaces*: Shows alerts for rule violations per namespace. - -Click on a resource card to see more information. Next to each alert, a three dot menu appears. From this menu, you can navigate to *Network Traffic* → *Traffic flows* to see more detailed information for the selected resource. \ No newline at end of file diff --git a/modules/network-observability-architecture.adoc b/modules/network-observability-architecture.adoc index d32dd426b655..456892a73c6b 100644 --- a/modules/network-observability-architecture.adoc +++ b/modules/network-observability-architecture.adoc @@ -17,6 +17,20 @@ If you do not use Loki, you can generate metrics with Prometheus. Those metrics image::network-observability-architecture.png[Network Observability eBPF export architecture] -If you are using the Kafka option, the eBPF agent sends the network flow data to Kafka, and the `flowlogs-pipeline` reads from the Kafka topic before sending to Loki, as shown in the following diagram. +There are three deployment model options for the Network Observability Operator. 
+[NOTE]
+====
+The Network Observability Operator does not manage Loki or other data stores. You must install Loki separately by using the {loki-op}. If you use Kafka, you must install it separately by using the Kafka Operator.
+====
+
+Service deployment model::
+When the `spec.deploymentModel` field in the `FlowCollector` resource is set to `Service`, agents are deployed per node as daemon sets, and the `flowlogs-pipeline` component runs as a standard deployment behind a service. You can scale the `flowlogs-pipeline` component by using the `spec.processor.consumerReplicas` field.
+
+Direct deployment model::
+When the `spec.deploymentModel` field is set to `Direct`, agents and the `flowlogs-pipeline` component are both deployed per node as daemon sets. This model is suitable for technology assessments and small clusters. However, it is less memory-efficient in large clusters because each instance of `flowlogs-pipeline` caches the same cluster information.
+
+Kafka deployment model (optional)::
+If you use the Kafka option, the eBPF agent sends the network flow data to Kafka, and the `flowlogs-pipeline` component reads from the Kafka topic before sending data to Loki, as shown in the following diagram. You can scale the `flowlogs-pipeline` component by using the `spec.processor.consumerReplicas` field.
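The model selection itself is a single field in the `FlowCollector` resource. The following minimal sketch is based on the field names described above (`spec.deploymentModel`, `spec.processor.consumerReplicas`); verify it against the full `FlowCollector` API specification before use:

```yaml
apiVersion: flows.netobserv.io/v1beta1
kind: FlowCollector
metadata:
  name: cluster
spec:
  deploymentModel: Kafka        # Service (default), Direct, or Kafka
  processor:
    consumerReplicas: 5         # scales flowlogs-pipeline; ignored with Direct
```

With `Direct`, the `consumerReplicas` setting is ignored because `flowlogs-pipeline` runs as a daemon set with one pod per node.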
++ image::network-observability-arch-kafka-FLP.png[Network Observability using Kafka] \ No newline at end of file diff --git a/modules/network-observability-configuring-predefined-alerts.adoc b/modules/network-observability-configuring-predefined-alerts.adoc deleted file mode 100644 index b11e222ca1a1..000000000000 --- a/modules/network-observability-configuring-predefined-alerts.adoc +++ /dev/null @@ -1,44 +0,0 @@ -// Module included in the following assemblies: -// -// network_observability/network-observability-alerts.adoc - -:_mod-docs-content-type: CONCEPT -[id="network-observability-configuring-predefined-alerts_{context}"] -= Configuring predefined alerts - -[role="_abstract"] -Alerts in the Network Observability Operator are defined using alert templates and variants in the `spec.processor.metrics.alerts` object of the `FlowCollector` custom resource (CR). You can customize the default templates and variants for flexible, fine-grained alerting. - -After you enable alerts, the *Network Health* dashboard appears in the *Observe* section of the {product-title} web console. - -For each template, you can define a list of variants, each with their own thresholds and grouping configurations. For more information, see the "List of default alert templates". - -Here is an example: - -[source,yaml,subs="attributes,verbatim"] ----- -apiVersion: flows.netobserv.io/v1beta1 -kind: FlowCollector -metadata: - name: flow-collector -spec: - processor: - metrics: - alerts: - - template: PacketDropsByKernel - variants: - # triggered when the whole cluster traffic (no grouping) reaches 10% of drops - - thresholds: - critical: "10" - # triggered when per-node traffic reaches 5% of drops, with gradual severity - - thresholds: - critical: "15" - warning: "10" - info: "5" - groupBy: Node ----- - -[NOTE] -==== -Customizing an alert replaces the default configuration for that template. If you want to keep the default configurations, you must manually replicate them. 
-==== \ No newline at end of file diff --git a/modules/network-observability-creating-custom-alert-rules.adoc b/modules/network-observability-custom-health-rule-configuration.adoc similarity index 88% rename from modules/network-observability-creating-custom-alert-rules.adoc rename to modules/network-observability-custom-health-rule-configuration.adoc index 3786b2a7c688..0a49482eb9ba 100644 --- a/modules/network-observability-creating-custom-alert-rules.adoc +++ b/modules/network-observability-custom-health-rule-configuration.adoc @@ -3,8 +3,8 @@ // * network_observability/network-observability-alerts.adoc :_mod-docs-content-type: PROCEDURE -[id="network-observability-creating-custom-alert-rules_{context}"] -= Creating custom alert rules +[id="network-observability-custom-health-rule-configuration_{context}"] += Custom health rule configuration [role="_abstract"] Use the Prometheus Query Language (`PromQL`) to define a custom `AlertingRule` resource to trigger alerts based on specific network metrics (e.g., traffic surges). @@ -12,7 +12,7 @@ Use the Prometheus Query Language (`PromQL`) to define a custom `AlertingRule` r .Prerequisites * Familiarity with `PromQL`. -* You have installed {product-title} 4.14 or later. +* You have installed {product-title} 4.16 or later. * You have access to the cluster as a user with the `cluster-admin` role. * You have installed the Network Observability Operator. 
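A custom `AlertingRule` of the kind this module describes might look like the following sketch. The metric name, thresholds, and group layout are illustrative assumptions, not values from this changeset; substitute a metric that exists on your cluster and tune the expression:

```yaml
apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: netobserv-custom-rules
  namespace: openshift-monitoring
spec:
  groups:
  - name: NetObservCustom
    rules:
    - alert: NetObservIngressSurge
      annotations:
        message: Unusually high ingress traffic observed by network observability.
      # Assumed metric name; check the netobserv_* metrics available on your cluster.
      expr: sum(rate(netobserv_workload_ingress_bytes_total[5m])) > 10000000
      for: 10m
      labels:
        severity: warning
```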
diff --git a/modules/network-observability-disable-predefined-rules.adoc b/modules/network-observability-disable-predefined-rules.adoc
new file mode 100644
index 000000000000..b52cd285381c
--- /dev/null
+++ b/modules/network-observability-disable-predefined-rules.adoc
@@ -0,0 +1,12 @@
+// Module included in the following assemblies:
+//
+// * network_observability/network-observability-alerts.adoc
+
+:_mod-docs-content-type: REFERENCE
+[id="network-observability-disable-predefined-rules_{context}"]
+= Disable predefined rules
+
+[role="_abstract"]
+You can disable rule templates in the `spec.processor.metrics.disableAlerts` field of the `FlowCollector` custom resource (CR). This field accepts a list of rule template names. For the list of rule template names, see "List of default rules".
+
+If a template is disabled and overridden in the `spec.processor.metrics.healthRules` field, the disable setting takes precedence and the alert rule is not created.
\ No newline at end of file
diff --git a/modules/network-observability-disabling-predefined-alerts.adoc b/modules/network-observability-disabling-predefined-alerts.adoc
deleted file mode 100644
index 3d84aca736cd..000000000000
--- a/modules/network-observability-disabling-predefined-alerts.adoc
+++ /dev/null
@@ -1,12 +0,0 @@
-// Module included in the following assemblies:
-//
-// * network_observability/network-observability-alerts.adoc
-
-:_mod-docs-content-type: REFERENCE
-[id="network-observability-disabling-predefined-alerts_{context}"]
-= Disabling predefined alerts
-
-[role="_abstract"]
-Alert templates can be disabled in the `spec.processor.metrics.disableAlerts` field of the `FlowCollector` custom resource (CR). This setting accepts a list of alert template names. For a list of alert template names, see: "List of default alerts".
-
-If a template is disabled and overridden in the `spec.processor.metrics.alerts` field, the disable setting takes precedence and the alert rule is not created.
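As a minimal sketch (template names taken from the documented default list; the API version is assumed to match the other `FlowCollector` examples in this changeset), disabling two templates looks like this:

```yaml
apiVersion: flows.netobserv.io/v1beta1
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    metrics:
      disableAlerts:
      - PacketDropsByDevice
      - LatencyHighTrend
```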
\ No newline at end of file diff --git a/modules/network-observability-dns-resolution-analysis-configure.adoc b/modules/network-observability-dns-resolution-analysis-configure.adoc new file mode 100644 index 000000000000..2bf82a316391 --- /dev/null +++ b/modules/network-observability-dns-resolution-analysis-configure.adoc @@ -0,0 +1,54 @@ +// Module included in the following assemblies: +// +// * network_observability/network-observability-dns-decoding.adoc + +:_mod-docs-content-type: PROCEDURE +[id="network-observability-dns-resolution-analysis-configure_{context}"] += Configure DNS domain tracking for network observability + +[role="_abstract"] +Enable DNS tracking in the Network Observability Operator to monitor DNS query names, response codes, and latency for network flows within the cluster. + +.Prerequisites + +* The Network Observability Operator is installed. +* You have `cluster-admin` privileges. +* You are familiar with the `FlowCollector` custom resource. + +.Procedure + +. Edit the `FlowCollector` resource by running the following command: ++ +[source,terminal] +---- +$ oc edit flowcollector cluster +---- + +. Configure the eBPF agent to enable the DNS tracking feature: ++ +[source,yaml] +---- +apiVersion: flows.netobserv.io/v1alpha1 +kind: FlowCollector +metadata: + name: cluster +spec: + agent: + type: eBPF + ebpf: + features: + - DNSTracking +---- ++ +where: + +`spec.agent.type.ebpf.features`:: Specifies the list of features to enable for the eBPF agent. To enable DNS tracking, add `DNSTracking` to this list. + +. Save and exit the editor. + +.Verification +. In the {product-title} web console, navigate to *Observe* -> *Network Traffic*. +. In the *Traffic Flows* view, click the *Manage columns* icon. +. Ensure that the *DNS Query Name*, *DNS Response Code*, and *DNS Latency* columns are selected. +. Filter the results by setting *Port* to `53`. +. Confirm that the flow table columns are populated with domain names and DNS metadata. 
\ No newline at end of file diff --git a/modules/network-observability-dns-resolution-analysis-reference.adoc b/modules/network-observability-dns-resolution-analysis-reference.adoc new file mode 100644 index 000000000000..fb38f3800dc9 --- /dev/null +++ b/modules/network-observability-dns-resolution-analysis-reference.adoc @@ -0,0 +1,48 @@ +// Module included in the following assemblies: +// +// * network_observability/network-observability-dns-decoding.adoc + +:_mod-docs-content-type: REFERENCE +[id="network-observability-dns-resolution-analysis-reference_{context}"] += DNS flow enrichment and analysis reference + +[role="_abstract"] +Identify metadata added to network flows, leverage DNS data for network optimization, and understand the performance and storage impacts on the cluster. + +The following table describes the metadata fields added to network flows when DNS tracking is enabled. + +[NOTE] +==== +Query names might be missing or truncated because of compression pointers or cache limitations. +==== + +.DNS flow metadata +[cols="1,2,1",options="header"] +|=== +|Field |Description |Example +|`dns_query_name` |The Fully Qualified Domain Name (FQDN) being queried. |`example.com` +|`dns_response_code` |The status code returned by the DNS server. |`NoError`, `NXDomain` +|`dns_id` |The transaction ID used to match queries with responses. |`45213` +|=== + +[id="leverage-dns-data-optimization_{context}"] +== Leverage DNS data for network optimization +Use the captured DNS metadata for the following operational outcomes: + +* Audit external dependencies: Ensure workloads are not reaching out to unauthorized external APIs or high-risk domains. +* Performance tuning: Monitor `DNS Latency` to identify if `CoreDNS` pods require additional scaling or if upstream DNS providers are lagging. 
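The `dns_id` field described in the table above is what lets a consumer of the flow data correlate a query with its response. The following Python sketch illustrates the idea on simplified records; the `kind` field is an illustrative stand-in for distinguishing query and response flows, not an actual flow field:

```python
def match_dns_pairs(flows):
    """Pair DNS query flows with their responses by transaction ID."""
    # Index query-side records by their dns_id transaction ID.
    queries = {f["dns_id"]: f for f in flows if f["kind"] == "query"}
    pairs = []
    for f in flows:
        # A response with a known dns_id completes a query/response pair;
        # unmatched responses are skipped rather than guessed at.
        if f["kind"] == "response" and f["dns_id"] in queries:
            pairs.append((queries[f["dns_id"]], f))
    return pairs
```

Responses whose query was never captured (for example, sampled out) simply produce no pair, which mirrors how missing or truncated query names can appear in the enriched flows.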
+ +[id="identify-misconfiguration-errors_{context}"] +== Identify misconfiguration errors +A high frequency of `NXDOMAIN` responses typically indicates service discovery errors in application code or stale environment variables. + +`NXDOMAIN` errors can be frequent in Kubernetes because of DNS searches on services and pods. While these results do not necessarily indicate a misconfiguration or broken URL, they can negatively impact performance. + +When `NXDOMAIN` errors are returned despite an apparently valid Service or Pod host name, such as `my-svc.my-namespace.svc`, the resolver is likely configured to query DNS for different suffixes. You can optimize this by adding a trailing dot to fully qualified domain names to tell the resolver that the name is unambiguous. + +For example, instead of `https://my-svc.my-namespace.svc`, use `https://my-svc.my-namespace.svc.cluster.local.` with a trailing dot. + +[id="loki-storage-considerations_{context}"] +== Loki storage considerations + +DNS tracking increases the number of labels and the amount of metadata per flow. Ensure that the Loki storage is sized to accommodate the increased log volume. 
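The trailing-dot advice above follows from how resolvers expand unqualified names. The following Python sketch is a simplified model of glibc-style `ndots`/search-list behavior (real resolvers add further heuristics), showing why a short name produces several speculative lookups that return `NXDOMAIN`:

```python
def candidate_queries(name: str, search_domains: list[str], ndots: int = 5) -> list[str]:
    """Return the DNS names a resolver would try, in order."""
    # A trailing dot marks the name as fully qualified: it is queried
    # as-is, with no search-domain expansion and no extra NXDOMAINs.
    if name.endswith("."):
        return [name.rstrip(".")]
    # Names with fewer dots than ndots are expanded with each search
    # domain first; each miss typically surfaces as an NXDOMAIN flow.
    if name.count(".") < ndots:
        return [f"{name}.{d}" for d in search_domains] + [name]
    return [name] + [f"{name}.{d}" for d in search_domains]
```

With the default Kubernetes pod configuration (`ndots:5` and three cluster search domains), a name such as `my-svc.my-namespace.svc` triggers the expanded lookups first, and only one of them resolves; the fully qualified form with a trailing dot skips the expansion entirely.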
\ No newline at end of file
diff --git a/modules/network-observability-dns-resolution-analysis-strategic-benefits.adoc b/modules/network-observability-dns-resolution-analysis-strategic-benefits.adoc
new file mode 100644
index 000000000000..aa5f9c1283bf
--- /dev/null
+++ b/modules/network-observability-dns-resolution-analysis-strategic-benefits.adoc
@@ -0,0 +1,23 @@
+// Module included in the following assemblies:
+//
+// * network_observability/network-observability-dns-decoding.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="network-observability-dns-resolution-analysis-strategic-benefits_{context}"]
+= Strategic benefits of DNS resolution analysis
+
+[role="_abstract"]
+Use DNS resolution analysis to differentiate between network transport failures and service discovery issues by enriching eBPF flow records with domain names and status codes.
+
+Standard flow logs only show that traffic occurred on port 53. DNS resolution analysis allows you to complete the following tasks:
+
+* Reduce mean time to identify (MTTI): Distinguish immediately between a network routing failure and a DNS resolution failure, such as an `NXDOMAIN` error.
+* Measure internal service latency: Track the time it takes for CoreDNS to respond to specific internal lookups (e.g., `my-service.namespace.svc.cluster.local`).
+* Audit external dependencies: Identify which external APIs or third-party domains your workloads are communicating with, without requiring sidecars or manual packet captures.
+* Improve security posture: Detect potential data exfiltration or Command and Control (C2) activity by auditing the Fully Qualified Domain Names (FQDNs) queried by internal workloads.
+
+[id="dns-flow-enrichment_{context}"]
+== DNS flow enrichment
+When this feature is active, the eBPF agent enriches the flow records. This metadata allows you to group and filter traffic by the intent of the connection (the domain) rather than just the source IP.
+ +Enhanced DNS decoding allows the eBPF agent to inspect UDP and TCP DNS traffic on port 53 along with the query names for the DNS request. \ No newline at end of file diff --git a/modules/network-observability-flowcollector-api-specifications.adoc b/modules/network-observability-flowcollector-api-specifications.adoc index 0d628a84eaf8..b3f2a6d3c7fb 100644 --- a/modules/network-observability-flowcollector-api-specifications.adoc +++ b/modules/network-observability-flowcollector-api-specifications.adoc @@ -94,11 +94,15 @@ Type:: | `string` | `deploymentModel` defines the desired type of deployment for flow processing. Possible values are: + -- `Direct` (default) to make the flow processor listen directly from the agents. Only recommended on small clusters, below 15 nodes. + +- `Service` (default) to make the flow processor listen as a Kubernetes Service, backed by a scalable Deployment. + - `Kafka` to make flows sent to a Kafka pipeline before consumption by the processor. + -Kafka can provide better scalability, resiliency, and high availability (for more details, see https://www.redhat.com/en/topics/integration/what-is-apache-kafka). +- `Direct` to make the flow processor listen directly from the agents using the host network, backed by a DaemonSet. Only recommended on small clusters, below 15 nodes. + + +Kafka can provide better scalability, resiliency, and high availability (for more details, see https://www.redhat.com/en/topics/integration/what-is-apache-kafka). + + +`Direct` is not recommended on large clusters as it is less memory efficient. | `exporters` | `array` @@ -185,13 +189,13 @@ override the default Linux capabilities from there. | `cacheActiveTimeout` | `string` -| `cacheActiveTimeout` is the max period during which the reporter aggregates flows before sending. +| `cacheActiveTimeout` is the period during which the agent aggregates flows before sending. 
Increasing `cacheMaxFlows` and `cacheActiveTimeout` can decrease the network traffic overhead and the CPU load, however you can expect higher memory consumption and an increased latency in the flow collection. | `cacheMaxFlows` | `integer` -| `cacheMaxFlows` is the max number of flows in an aggregate; when reached, the reporter sends the flows. +| `cacheMaxFlows` is the maximum number of flows in an aggregate; when reached, the reporter sends the flows. Increasing `cacheMaxFlows` and `cacheActiveTimeout` can decrease the network traffic overhead and the CPU load, however you can expect higher memory consumption and an increased latency in the flow collection. @@ -802,7 +806,8 @@ such as `GOGC` and `GOMAXPROCS` environment variables. Set these values at your | `autoscaler` | `object` -| `autoscaler` spec of a horizontal pod autoscaler to set up for the plugin Deployment. Refer to HorizontalPodAutoscaler documentation (autoscaling/v2). +| `autoscaler` [deprecated (*)] spec of a horizontal pod autoscaler to set up for the plugin Deployment. +Deprecation notice: managed autoscaler will be removed in a future version. You might configure instead an autoscaler of your choice, and set `spec.consolePlugin.unmanagedReplicas` to `true`. Refer to HorizontalPodAutoscaler documentation (autoscaling/v2). | `enable` | `boolean` @@ -810,19 +815,20 @@ such as `GOGC` and `GOMAXPROCS` environment variables. Set these values at your | `imagePullPolicy` | `string` -| `imagePullPolicy` is the Kubernetes pull policy for the image defined above +| `imagePullPolicy` is the Kubernetes pull policy for the image defined above. | `logLevel` | `string` -| `logLevel` for the console plugin backend +| `logLevel` for the console plugin backend. | `portNaming` | `object` -| `portNaming` defines the configuration of the port-to-service name translation +| `portNaming` defines the configuration of the port-to-service name translation. 
| `quickFilters` | `array` -| `quickFilters` configures quick filter presets for the Console plugin +| `quickFilters` configures quick filter presets for the Console plugin. +Filters for external traffic assume the subnet labels are configured to distinguish internal and external traffic (see `spec.processor.subnetLabels`). | `replicas` | `integer` @@ -831,7 +837,17 @@ such as `GOGC` and `GOMAXPROCS` environment variables. Set these values at your | `resources` | `object` | `resources`, in terms of compute resources, required by this container. -For more information, see https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ +For more information, see https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/. + +| `standalone` +| `boolean` +| Deploy as a standalone console, instead of a plugin of the {product-title} Console. +This is not recommended when using with {product-title}, as it doesn't provide an integrated experience. +[Unsupported (*)]. + +| `unmanagedReplicas` +| `boolean` +| If `unmanagedReplicas` is `true`, the operator will not reconcile `replicas`. This is useful when using a pod autoscaler. |=== == .spec.consolePlugin.advanced @@ -950,7 +966,8 @@ Type:: Description:: + -- -`autoscaler` spec of a horizontal pod autoscaler to set up for the plugin Deployment. Refer to HorizontalPodAutoscaler documentation (autoscaling/v2). +`autoscaler` [deprecated (*)] spec of a horizontal pod autoscaler to set up for the plugin Deployment. +Deprecation notice: managed autoscaler will be removed in a future version. You might configure instead an autoscaler of your choice, and set `spec.consolePlugin.unmanagedReplicas` to `true`. Refer to HorizontalPodAutoscaler documentation (autoscaling/v2). -- Type:: @@ -963,7 +980,7 @@ Type:: Description:: + -- -`portNaming` defines the configuration of the port-to-service name translation +`portNaming` defines the configuration of the port-to-service name translation. 
-- Type:: @@ -990,7 +1007,8 @@ for example, `portNames: {"3100": "loki"}`. Description:: + -- -`quickFilters` configures quick filter presets for the Console plugin +`quickFilters` configures quick filter presets for the Console plugin. +Filters for external traffic assume the subnet labels are configured to distinguish internal and external traffic (see `spec.processor.subnetLabels`). -- Type:: @@ -1038,7 +1056,7 @@ Description:: + -- `resources`, in terms of compute resources, required by this container. -For more information, see https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ +For more information, see https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/. -- Type:: @@ -1124,6 +1142,7 @@ Type:: `object` Required:: + - `enterpriseID` - `targetHost` - `targetPort` @@ -1133,6 +1152,12 @@ Required:: |=== | Property | Type | Description +| `enterpriseID` +| `integer` +| EnterpriseID, or Private Enterprise Number (PEN). To date, Network Observability does not own an assigned number, +so it is left open for configuration. The PEN is needed to collect non standard data, such as Kubernetes names, +RTT, etc. + | `targetHost` | `string` | Address of the IPFIX external receiver. @@ -2521,6 +2546,12 @@ Type:: |=== | Property | Type | Description +| `installDemoLoki` +| `boolean` +| Set `installDemoLoki` to `true` to automatically create Loki deployment, service and storage. +This is useful for development and demo purposes. Do not use it in production. +[Unsupported (*)]. + | `tenantID` | `string` | `tenantID` is the Loki `X-Scope-OrgID` header that identifies the tenant for each request. @@ -2698,7 +2729,7 @@ Type:: | `addZone` | `boolean` -| `addZone` allows availability zone awareness by labelling flows with their source and destination zones. +| `addZone` allows availability zone awareness by labeling flows with their source and destination zones. 
This feature requires the "topology.kubernetes.io/zone" label to be set on nodes. | `advanced` @@ -2711,6 +2742,11 @@ such as `GOGC` and `GOMAXPROCS` environment variables. Set these values at your | `string` | `clusterName` is the name of the cluster to appear in the flows data. This is useful in a multi-cluster context. When using {product-title}, leave empty to make it automatically determined. +| `consumerReplicas` +| `integer` +| `consumerReplicas` defines the number of replicas (pods) to start for `flowlogs-pipeline`, default is 3. +This setting is ignored when `spec.deploymentModel` is `Direct` or when `spec.processor.unmanagedReplicas` is `true`. + | `deduper` | `object` | `deduper` allows you to sample or drop flows identified as duplicates, in order to save on resource usage. @@ -2727,8 +2763,9 @@ but with a lesser improvement in performance. | `kafkaConsumerAutoscaler` | `object` -| `kafkaConsumerAutoscaler` is the spec of a horizontal pod autoscaler to set up for `flowlogs-pipeline-transformer`, which consumes Kafka messages. -This setting is ignored when Kafka is disabled. Refer to HorizontalPodAutoscaler documentation (autoscaling/v2). +| `kafkaConsumerAutoscaler` [deprecated (*)] is the spec of a horizontal pod autoscaler to set up for `flowlogs-pipeline-transformer`, which consumes Kafka messages. +This setting is ignored when Kafka is disabled. +Deprecation notice: managed autoscaler will be removed in a future version. You might configure instead an autoscaler of your choice, and set `spec.processor.unmanagedReplicas` to `true`. Refer to HorizontalPodAutoscaler documentation (autoscaling/v2). | `kafkaConsumerBatchSize` | `integer` @@ -2740,8 +2777,9 @@ This setting is ignored when Kafka is disabled. Refer to HorizontalPodAutoscaler | `kafkaConsumerReplicas` | `integer` -| `kafkaConsumerReplicas` defines the number of replicas (pods) to start for `flowlogs-pipeline-transformer`, which consumes Kafka messages. 
+| `kafkaConsumerReplicas` [deprecated (*)] defines the number of replicas (pods) to start for `flowlogs-pipeline-transformer`, which consumes Kafka messages. This setting is ignored when Kafka is disabled. +Deprecation notice: use `spec.processor.consumerReplicas` instead. | `logLevel` | `string` @@ -2773,11 +2811,19 @@ This setting is ignored when Kafka is disabled. | `resources` are the compute resources required by this container. For more information, see https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ +| `slicesConfig` +| `object` +| Global configuration managing FlowCollectorSlices custom resources. + | `subnetLabels` | `object` -| `subnetLabels` allows to define custom labels on subnets and IPs or to enable automatic labelling of recognized subnets in {product-title}, which is used to identify cluster external traffic. +| `subnetLabels` allows to define custom labels on subnets and IPs or to enable automatic labeling of recognized subnets in {product-title}, which is used to identify cluster external traffic. When a subnet matches the source or destination IP of a flow, a corresponding field is added: `SrcSubnetLabel` or `DstSubnetLabel`. +| `unmanagedReplicas` +| `boolean` +| If `unmanagedReplicas` is `true`, the operator will not reconcile `consumerReplicas`. This is useful when using a pod autoscaler. + |=== == .spec.processor.advanced Description:: @@ -3043,8 +3089,9 @@ Type:: Description:: + -- -`kafkaConsumerAutoscaler` is the spec of a horizontal pod autoscaler to set up for `flowlogs-pipeline-transformer`, which consumes Kafka messages. -This setting is ignored when Kafka is disabled. Refer to HorizontalPodAutoscaler documentation (autoscaling/v2). +`kafkaConsumerAutoscaler` [deprecated (*)] is the spec of a horizontal pod autoscaler to set up for `flowlogs-pipeline-transformer`, which consumes Kafka messages. +This setting is ignored when Kafka is disabled. 
+Deprecation notice: managed autoscaler will be removed in a future version. You might configure instead an autoscaler of your choice, and set `spec.processor.unmanagedReplicas` to `true`. Refer to HorizontalPodAutoscaler documentation (autoscaling/v2). -- Type:: @@ -3070,18 +3117,18 @@ Type:: |=== | Property | Type | Description -| `alerts` -| `array` -| `alerts` is a list of alerts to be created for Prometheus AlertManager, organized by templates and variants [Unsupported (*)]. -This is currently an experimental feature behind a feature gate. To enable, edit `spec.processor.advanced.env` by adding `EXPERIMENTAL_ALERTS_HEALTH` set to `true`. -More information on alerts: https://github.com/netobserv/network-observability-operator/blob/main/docs/Alerts.md - | `disableAlerts` | `array (string)` | `disableAlerts` is a list of alert groups that should be disabled from the default set of alerts. Possible values are: `NetObservNoFlows`, `NetObservLokiError`, `PacketDropsByKernel`, `PacketDropsByDevice`, `IPsecErrors`, `NetpolDenied`, -`LatencyHighTrend`, `DNSErrors`, `ExternalEgressHighTrend`, `ExternalIngressHighTrend`, `CrossAZ`. -More information on alerts: https://github.com/netobserv/network-observability-operator/blob/main/docs/Alerts.md +`LatencyHighTrend`, `DNSErrors`, `DNSNxDomain`, `ExternalEgressHighTrend`, `ExternalIngressHighTrend`, `Ingress5xxErrors`, `IngressHTTPLatencyTrend`. +More information on alerts: https://github.com/netobserv/network-observability-operator/blob/main/docs/HealthRules.md + +| `healthRules` +| `array` +| `healthRules` is a list of health rules to be created for Prometheus, organized by templates and variants. +Each health rule can be configured to generate either alerts or recording rules based on the mode field. 
+More information on health rules: https://github.com/netobserv/network-observability-operator/blob/main/docs/HealthRules.md | `includeList` | `array (string)` @@ -3101,13 +3148,13 @@ More information, with full list of available metrics: https://github.com/netobs | Metrics server endpoint configuration for Prometheus scraper |=== -== .spec.processor.metrics.alerts +== .spec.processor.metrics.healthRules Description:: + -- -`alerts` is a list of alerts to be created for Prometheus AlertManager, organized by templates and variants [Unsupported (*)]. -This is currently an experimental feature behind a feature gate. To enable, edit `spec.processor.advanced.env` by adding `EXPERIMENTAL_ALERTS_HEALTH` set to `true`. -More information on alerts: https://github.com/netobserv/network-observability-operator/blob/main/docs/Alerts.md +`healthRules` is a list of health rules to be created for Prometheus, organized by templates and variants. +Each health rule can be configured to generate either alerts or recording rules based on the mode field. +More information on health rules: https://github.com/netobserv/network-observability-operator/blob/main/docs/HealthRules.md -- Type:: @@ -3116,7 +3163,7 @@ Type:: -== .spec.processor.metrics.alerts[] +== .spec.processor.metrics.healthRules[] Description:: + -- @@ -3136,19 +3183,28 @@ Required:: |=== | Property | Type | Description +| `mode` +| `string` +| Mode defines whether this health rule should be generated as an alert or a recording rule. +Possible values are: `Alert` (default), `Recording`. +Recording rules violations are visible in the Network Health dashboard without generating any Prometheus alert. +This provides an alternative way of getting Health information for SRE and cluster admins who might find +many new alerts burdensome. + | `template` | `string` -| Alert template name. +| Health rule template name. 
Possible values are: `PacketDropsByKernel`, `PacketDropsByDevice`, `IPsecErrors`, `NetpolDenied`,
-`LatencyHighTrend`, `DNSErrors`, `ExternalEgressHighTrend`, `ExternalIngressHighTrend`, `CrossAZ`.
-More information on alerts: https://github.com/netobserv/network-observability-operator/blob/main/docs/Alerts.md
+`LatencyHighTrend`, `DNSErrors`, `DNSNxDomain`, `ExternalEgressHighTrend`, `ExternalIngressHighTrend`, `Ingress5xxErrors`, `IngressHTTPLatencyTrend`.
+Note: `NetObservNoFlows` and `NetObservLokiError` are alert-only and cannot be used as health rules.
+More information on health rules: https://github.com/netobserv/network-observability-operator/blob/main/docs/HealthRules.md

| `variants`
| `array`
| A list of variants for this template

|===

-== .spec.processor.metrics.alerts[].variants
+== .spec.processor.metrics.healthRules[].variants
Description::
+
--
@@ -3161,7 +3217,7 @@ Type::

-== .spec.processor.metrics.alerts[].variants[]
+== .spec.processor.metrics.healthRules[].variants[]
Description::
+
--
@@ -3190,26 +3246,34 @@ Required::
It is provided as an absolute rate (bytes per second or packets per second, depending on the context).
When provided, it must be parsable as a float.

+| `mode`
+| `string`
+| Mode overrides the health rule mode for this specific variant.
+If not specified, it inherits the mode from the parent health rule.
+Possible values are: `Alert`, `Recording`.
+
| `thresholds`
| `object`
-| Thresholds of the alert per severity.
+| Thresholds of the health rule per severity.
They are expressed as a percentage of errors above which the alert is triggered. They must be parsable as floats.
+Required for both alert and recording modes.

| `trendDuration`
| `string`
-| For trending alerts, the duration interval for baseline comparison. For example, "2h" means comparing against a 2-hours average. Defaults to 2h.
+| For trending health rules, the duration interval for baseline comparison. For example, "2h" means comparing against a 2-hour average. Defaults to 2h.

| `trendOffset`
| `string`
-| For trending alerts, the time offset for baseline comparison. For example, "1d" means comparing against yesterday. Defaults to 1d.
+| For trending health rules, the time offset for baseline comparison. For example, "1d" means comparing against yesterday. Defaults to 1d.

|===
-== .spec.processor.metrics.alerts[].variants[].thresholds
+== .spec.processor.metrics.healthRules[].variants[].thresholds
Description::
+
--
-Thresholds of the alert per severity.
+Thresholds of the health rule per severity.
They are expressed as a percentage of errors above which the alert is triggered. They must be parsable as floats.
+Required for both alert and recording modes.
--

Type::
@@ -3406,12 +3470,51 @@ If Requests is omitted for a container, it defaults to Limits if that is explici
otherwise to an implementation-defined value. Requests cannot exceed Limits.
More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

+|===
+== .spec.processor.slicesConfig
+Description::
++
+--
+Global configuration managing FlowCollectorSlices custom resources.
+--
+
+Type::
+  `object`
+
+Required::
+  - `enable`
+
+
+
+[cols="1,1,1",options="header"]
+|===
+| Property | Type | Description
+
+| `collectionMode`
+| `string`
+| `collectionMode` determines how FlowCollectorSlice custom resources impact the flow collection process: +
+
+- When set to `AlwaysCollect`, all flows are collected regardless of the presence of FlowCollectorSlice. +
+
+- When set to `AllowList`, only the flows related to namespaces where a FlowCollectorSlice resource is present, or configured via the global `namespacesAllowList`, are collected. +
+
+
+| `enable`
+| `boolean`
+| `enable` determines if the FlowCollectorSlice feature is enabled. If not, all resources of kind FlowCollectorSlice are simply ignored.
+ +| `namespacesAllowList` +| `array (string)` +| `namespacesAllowList` is a list of namespaces for which flows are always collected, regardless of the presence of FlowCollectorSlice in those namespaces. +An entry enclosed by slashes, such as `/openshift-.*/`, is matched as a regular expression. +This setting is ignored if `collectionMode` is different from `AllowList`. + |=== == .spec.processor.subnetLabels Description:: + -- -`subnetLabels` allows to define custom labels on subnets and IPs or to enable automatic labelling of recognized subnets in {product-title}, which is used to identify cluster external traffic. +`subnetLabels` allows to define custom labels on subnets and IPs or to enable automatic labeling of recognized subnets in {product-title}, which is used to identify cluster external traffic. When a subnet matches the source or destination IP of a flow, a corresponding field is added: `SrcSubnetLabel` or `DstSubnetLabel`. -- @@ -3427,8 +3530,13 @@ Type:: | `customLabels` | `array` -| `customLabels` allows to customize subnets and IPs labelling, such as to identify cluster-external workloads or web services. -If you enable `openShiftAutoDetect`, `customLabels` can override the detected subnets in case they overlap. +| `customLabels` allows you to customize subnets and IPs labeling, such as to identify cluster external workloads or web services. +External subnets must be labeled with the prefix `EXT:`, or not labeled at all, in order to work with default quick filters and some metrics examples provided. + + +If `openShiftAutoDetect` is disabled or you are not using {product-title}, it is recommended to manually configure labels for the cluster subnets, to distinguish internal traffic from external traffic. + + +If `openShiftAutoDetect` is enabled, `customLabels` overrides the detected subnets when they overlap. 
+ + | `openShiftAutoDetect` | `boolean` @@ -3441,8 +3549,13 @@ external traffic: flows that are not labeled for those subnets are external to t Description:: + -- -`customLabels` allows to customize subnets and IPs labelling, such as to identify cluster-external workloads or web services. -If you enable `openShiftAutoDetect`, `customLabels` can override the detected subnets in case they overlap. +`customLabels` allows you to customize subnets and IPs labeling, such as to identify cluster external workloads or web services. +External subnets must be labeled with the prefix `EXT:`, or not labeled at all, in order to work with default quick filters and some metrics examples provided. + + +If `openShiftAutoDetect` is disabled or you are not using {product-title}, it is recommended to manually configure labels for the cluster subnets, to distinguish internal traffic from external traffic. + + +If `openShiftAutoDetect` is enabled, `customLabels` overrides the detected subnets when they overlap. + + -- Type:: @@ -3478,6 +3591,8 @@ Required:: | `name` | `string` | Label name, used to flag matching flows. +External subnets must be labeled with the prefix `EXT:`, or not labeled at all, in order to work with default quick filters and some metrics examples provided. + + |=== == .spec.prometheus @@ -3539,9 +3654,9 @@ If they are both disabled, the Console plugin is not deployed. | `string` | `mode` must be set according to the type of Prometheus installation that stores Network Observability metrics: + -- Use `Auto` to try configuring automatically. In {product-title}, it uses the Thanos querier from {product-title} Cluster Monitoring + +- Use `Auto` to try configuring automatically. In {product-title}, it uses the Thanos querier from {product-title} Cluster Monitoring. + -- Use `Manual` for a manual setup + +- Use `Manual` for a manual setup. 
+ | `timeout` @@ -3567,6 +3682,12 @@ Type:: |=== | Property | Type | Description +| `alertManager` +| `object` +| AlertManager configuration. This is used in the console to query silenced alerts, for displaying health information. +When used in {product-title} it can be left empty to use the Console API instead. +[Unsupported (*)]. + | `forwardUserToken` | `boolean` | Set `true` to forward logged in user token in queries to Prometheus @@ -3579,6 +3700,147 @@ Type:: | `string` | `url` is the address of an existing Prometheus service to use for querying metrics. +|=== +== .spec.prometheus.querier.manual.alertManager +Description:: ++ +-- +AlertManager configuration. This is used in the console to query silenced alerts, for displaying health information. +When used in {product-title} it can be left empty to use the Console API instead. +[Unsupported (*)]. +-- + +Type:: + `object` + + + + +[cols="1,1,1",options="header"] +|=== +| Property | Type | Description + +| `tls` +| `object` +| TLS client configuration for Prometheus AlertManager URL. + +| `url` +| `string` +| `url` is the address of an existing Prometheus AlertManager service to use for querying alerts. + +|=== +== .spec.prometheus.querier.manual.alertManager.tls +Description:: ++ +-- +TLS client configuration for Prometheus AlertManager URL. +-- + +Type:: + `object` + + + + +[cols="1,1,1",options="header"] +|=== +| Property | Type | Description + +| `caCert` +| `object` +| `caCert` defines the reference of the certificate for the Certificate Authority. + +| `enable` +| `boolean` +| Enable TLS + +| `insecureSkipVerify` +| `boolean` +| `insecureSkipVerify` allows skipping client-side verification of the server certificate. +If set to `true`, the `caCert` field is ignored. + +| `userCert` +| `object` +| `userCert` defines the user certificate reference and is used for mTLS. When you use one-way TLS, you can ignore this property. 
+ +|=== +== .spec.prometheus.querier.manual.alertManager.tls.caCert +Description:: ++ +-- +`caCert` defines the reference of the certificate for the Certificate Authority. +-- + +Type:: + `object` + + + + +[cols="1,1,1",options="header"] +|=== +| Property | Type | Description + +| `certFile` +| `string` +| `certFile` defines the path to the certificate file name within the config map or secret. + +| `certKey` +| `string` +| `certKey` defines the path to the certificate private key file name within the config map or secret. Omit when the key is not necessary. + +| `name` +| `string` +| Name of the config map or secret containing certificates. + +| `namespace` +| `string` +| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. +If the namespace is different, the config map or the secret is copied so that it can be mounted as required. + +| `type` +| `string` +| Type for the certificate reference: `configmap` or `secret`. + +|=== +== .spec.prometheus.querier.manual.alertManager.tls.userCert +Description:: ++ +-- +`userCert` defines the user certificate reference and is used for mTLS. When you use one-way TLS, you can ignore this property. +-- + +Type:: + `object` + + + + +[cols="1,1,1",options="header"] +|=== +| Property | Type | Description + +| `certFile` +| `string` +| `certFile` defines the path to the certificate file name within the config map or secret. + +| `certKey` +| `string` +| `certKey` defines the path to the certificate private key file name within the config map or secret. Omit when the key is not necessary. + +| `name` +| `string` +| Name of the config map or secret containing certificates. + +| `namespace` +| `string` +| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. 
+If the namespace is different, the config map or the secret is copied so that it can be mounted as required. + +| `type` +| `string` +| Type for the certificate reference: `configmap` or `secret`. + |=== == .spec.prometheus.querier.manual.tls Description:: @@ -3692,4 +3954,4 @@ If the namespace is different, the config map or the secret is copied so that it | `string` | Type for the certificate reference: `configmap` or `secret`. -|=== +|=== \ No newline at end of file diff --git a/modules/network-observability-flowmetric-api-specifications.adoc b/modules/network-observability-flowmetric-api-specifications.adoc index d3ca752965c2..f0bd87d6227c 100644 --- a/modules/network-observability-flowmetric-api-specifications.adoc +++ b/modules/network-observability-flowmetric-api-specifications.adoc @@ -110,6 +110,10 @@ Refer to the documentation for the list of available fields: https://docs.redhat | `flatten` is a list of array-type fields that must be flattened, such as Interfaces or NetworkEvents. Flattened fields generate one metric per item in that field. For instance, when flattening `Interfaces` on a bytes counter, a flow having Interfaces [br-ex, ens5] increases one counter for `br-ex` and another for `ens5`. +| `help` +| `string` +| Help text of the metric, as it appears in Prometheus. + | `labels` | `array (string)` | `labels` is a list of fields that should be used as Prometheus labels, also known as dimensions (for example: `SrcK8S_Namespace`). 
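+
+For illustration, a `FlowMetric` resource using the new `help` field together with `labels` and `flatten` might look like the following sketch. The metric name, value field, and label choices are illustrative assumptions, not prescribed values:
+
+[source,yaml]
+----
+apiVersion: flows.netobserv.io/v1alpha1
+kind: FlowMetric
+metadata:
+  name: flowmetric-interfaces-bytes # illustrative name
+  namespace: netobserv
+spec:
+  metricName: interfaces_bytes_total
+  type: Counter
+  valueField: Bytes
+  help: Total bytes observed per network interface # shown as the metric help text in Prometheus
+  labels: [SrcK8S_Namespace, Interfaces] # Prometheus labels (dimensions)
+  flatten: [Interfaces] # one counter per interface listed in the flow
+----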
diff --git a/modules/network-observability-flows-format.adoc b/modules/network-observability-flows-format.adoc index 1698d1852b70..6150157f8805 100644 --- a/modules/network-observability-flows-format.adoc +++ b/modules/network-observability-flows-format.adoc @@ -57,6 +57,13 @@ The "Cardinality" column gives information about the implied metric cardinality | no | avoid | dns.latency +| `DnsName` +| string +| DNS queried name +| `dns_name` +| no +| careful +| n/a | `Dscp` | number | Differentiated Services Code Point (DSCP) value diff --git a/modules/network-observability-health-rule-structure-customization.adoc b/modules/network-observability-health-rule-structure-customization.adoc new file mode 100644 index 000000000000..1983a9595130 --- /dev/null +++ b/modules/network-observability-health-rule-structure-customization.adoc @@ -0,0 +1,51 @@ +// Module included in the following assemblies: +// +// network_observability/network-observability-alerts.adoc + +:_mod-docs-content-type: CONCEPT +[id="network-observability-health-rule-structure-customization_{context}"] += Network observability health rule structure and customization + +[role="_abstract"] +Health rules in the Network Observability Operator are defined using rule templates and variants in the `spec.processor.metrics.healthRules` object of the `FlowCollector` custom resource (CR). You can customize the default templates and variants for flexible, fine-grained alerting. + +For each template, you can define a list of variants, each with their own thresholds and grouping configurations. For more information, see the "List of default alert templates". 
+
+Here is an example:
+
+[source,yaml]
+----
+apiVersion: flows.netobserv.io/v1beta1
+kind: FlowCollector
+metadata:
+  name: cluster
+spec:
+  processor:
+    metrics:
+      healthRules:
+      - template: PacketDropsByKernel
+        mode: Alert # or Recording
+        variants:
+        # triggered when the whole cluster traffic (no grouping) reaches 10% of drops
+        - thresholds:
+            critical: "10"
+        # triggered when per-node traffic reaches 5% of drops, with gradual severity
+        - thresholds:
+            critical: "15"
+            warning: "10"
+            info: "5"
+          groupBy: Node
+----
+
+where:
+
+`spec.processor.metrics.healthRules.template`:: Specifies the name of the predefined rule template.
+`spec.processor.metrics.healthRules.mode`:: Specifies whether the rule functions as an `Alert` or a `Recording` rule. This setting can be defined per variant or for the whole template.
+`spec.processor.metrics.healthRules.variants.thresholds`:: Specifies the numerical values that trigger the rule. You can define multiple severity levels, such as `critical`, `warning`, or `info`, within a single variant.
+Cluster-wide variant:: Specifies a variant defined without a `groupBy` setting. In the provided example, this variant triggers when the total cluster traffic reaches 10% of drops.
+`spec.processor.metrics.healthRules.variants.groupBy`:: Specifies the dimension used to aggregate the metric. In the provided example, the alert is evaluated independently for each *Node*.
+
+[NOTE]
+====
+Customizing a rule replaces the default configuration for that template. If you want to keep the default configurations, you must manually replicate them.
+====
\ No newline at end of file
diff --git a/modules/network-observability-health-rules-and-performance.adoc b/modules/network-observability-health-rules-and-performance.adoc
new file mode 100644
index 000000000000..4ae25543b631
--- /dev/null
+++ b/modules/network-observability-health-rules-and-performance.adoc
@@ -0,0 +1,22 @@
+// Module included in the following assemblies:
+//
+// * network_observability/network-observability-health-rules.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="network-observability-health-rules-and-performance_{context}"]
+= Network observability rules for health and performance
+
+[role="_abstract"]
+Network observability includes a system for managing Prometheus-based rules. Use these rules to monitor the health and performance of {product-title} applications and infrastructure.
+
+The Network Observability Operator converts these rules into a `PrometheusRule` resource. It supports the following rule types:
+
+* Alerting rules: Rules managed by the Prometheus `AlertManager` that provide notifications of network anomalies or infrastructure failures.
+* Recording rules: Rules that pre-compute complex Prometheus Query Language (PromQL) expressions into new time series to improve dashboard performance and visualization.
+ +View the `PrometheusRule` resource in the `netobserv` namespace by running the following command: + +[source,terminal] +---- +$ oc get prometheusrules -n netobserv -o yaml +---- \ No newline at end of file diff --git a/modules/network-observability-health-rules-monitoring-and-alerting.adoc b/modules/network-observability-health-rules-monitoring-and-alerting.adoc new file mode 100644 index 000000000000..d3eaa8488fed --- /dev/null +++ b/modules/network-observability-health-rules-monitoring-and-alerting.adoc @@ -0,0 +1,50 @@ +// Module included in the following assemblies: +// +// * network_observability/network-observability-health-rules.adoc + +:_mod-docs-content-type: CONCEPT +[id="network-observability-health-rules-monitoring-and-alerting_{context}"] += Network health monitoring and alerting rules + +[role="_abstract"] +The Network Observability Operator includes a rule-based system to detect network anomalies and infrastructure failures. By converting configurations into alerting rules, the Operator enables automated monitoring and troubleshooting through the {product-title} web console. + +[id="network-observability-health-outcomes_{context}"] +== Monitoring outcomes +The Network Observability Operator surfaces network status in the following areas: + +*Alerting* UI:: Specific alerts appear in *Observe* → *Alerting*, where notifications are managed through the Prometheus `AlertManager`. +*Network Health* dashboard:: A specialized dashboard in *Observe* → *Network Health* provides a high-level summary of cluster network status. + +The *Network Health* dashboard categorizes violations into tabs to isolate the scope of an issue: + +* *Global*: Aggregate health of the entire cluster. +* *Nodes*: Violations specific to infrastructure nodes. +* *Namespaces*: Violations specific to individual namespaces. +* *Workloads*: Violations specific to resources, such as `Deployments` or `DaemonSets`. 
+ +[id="network-observability-default-rules_{context}"] +== Predefined health rules +The Network Observability Operator provides default rules for common networking scenarios. These rules are active only if the corresponding feature is enabled in the `FlowCollector` custom resource (CR). + +The following list contains a subset of available default rules: + +`PacketDropsByDevice`:: Triggers on a high percentage of packet drops from network devices. It is based on standard node-exporter metrics and does not require the `PacketDrop` agent feature. +`PacketDropsByKernel`:: Triggers on a high percentage of packet drops by the kernel. Requires the `PacketDrop` agent feature. +`IPsecErrors`:: Triggers when IPsec encryption errors are detected. Requires the `IPSec` agent feature. +`NetpolDenied`:: Triggers when traffic denied by network policies is detected. Requires the `NetworkEvents` agent feature. +`LatencyHighTrend`:: Triggers when a significant increase in TCP latency is detected. Requires the `FlowRTT` agent feature. +`DNSErrors`:: Triggers when DNS errors are detected. Requires the `DNSTracking` agent feature. + +Operational alerts for the Network Observability Operator: + +`NetObservNoFlows`:: Triggers when the pipeline is active but no flows are observed. +`NetObservLokiError`:: Triggers when flows are dropped because of Loki errors. + +For a complete list of rules and runbooks, see the link:https://github.com/openshift/runbooks/tree/master/alerts/network-observability-operator[Network Observability Operator runbooks]. + +[id="network-observability-rule-dependencies_{context}"] +== Rule dependencies and feature requirements +The Network Observability Operator creates rules based on the features enabled in the `FlowCollector` custom resource (CR). + +For example, packet drop-related rules are created only if the `PacketDrop` agent feature is enabled. Rules are built on metrics; if the required metrics are missing, configuration warnings might appear. 
Configure metrics in the `spec.processor.metrics.includeList` object of the `FlowCollector` resource. \ No newline at end of file diff --git a/modules/network-observability-alerts-about-promql-expression.adoc b/modules/network-observability-health-rules-promql-expressions-metadata.adoc similarity index 80% rename from modules/network-observability-alerts-about-promql-expression.adoc rename to modules/network-observability-health-rules-promql-expressions-metadata.adoc index 198760c614b8..c9a597d5b38d 100644 --- a/modules/network-observability-alerts-about-promql-expression.adoc +++ b/modules/network-observability-health-rules-promql-expressions-metadata.adoc @@ -3,13 +3,13 @@ // * network_observability/network-observability-alerts.adoc :_mod-docs-content-type: REFERENCE -[id="network-observability-alerts-about-promql-expression_{context}"] -= About the PromQL expression for alerts +[id="network-observability-health-rules-promql-expressions-metadata_{context}"] += PromQL expressions and metadata for health rules [role="_abstract"] Learn about the base query for Prometheus Query Language (`PromQL`), and how to customize it so you can configure network observability alerts for your specific needs. -The alerting API in the network observability `FlowCollector` custom resource (`CR`) is mapped to the Prometheus Operator API, generating a `PrometheusRule`. You can see the `PrometheusRule` in the default `netobserv` namespace by running the following command: +The health rule API in the network observability `FlowCollector` custom resource (`CR`) is mapped to the Prometheus Operator API, generating a `PrometheusRule`. 
You can see the `PrometheusRule` in the default `netobserv` namespace by running the following command:

[source,terminal]
----
@@ -64,7 +64,7 @@ Together, the complete expression for the `PrometheusRule` looks like the follow

The Network Observability Operator uses components from other {product-title} features, such as the monitoring stack, to enhance visibility into network traffic. For more information, see: "Monitoring stack architecture".

-Some metadata must be configured for the alert definitions. This metadata is used by Prometheus and the `Alertmanager` service from the monitoring stack, or by the *Network Health* dashboard.
+Some metadata must be configured for the rule definitions. This metadata is used by Prometheus and the `Alertmanager` service from the monitoring stack, or by the *Network Health* dashboard.

The following example shows an `AlertingRule` resource with the configured metadata:

@@ -128,6 +128,14 @@ The `netobserv_io_network_health` annotation is a JSON string consisting of the
| List of strings
| One or more labels that hold node names. When provided, the alert appears under the *Nodes* tab.

+| `workloadLabels`
+| List of strings
+| One or more labels that hold owner or workload names. When provided together with `kindLabels`, the alert appears under the *Owners* tab.
+
+| `kindLabels`
+| List of strings
+| One or more labels that hold owner or workload kinds. When provided together with `workloadLabels`, the alert appears under the *Owners* tab.
+
| `threshold`
| String
| The alert threshold, expected to match the threshold defined in the `PromQL` expression.
@@ -144,9 +152,25 @@ The `netobserv_io_network_health` annotation is a JSON string consisting of the
| List of objects
| A list of links to display contextually with the alert. Each link requires a `name` (display name) and `url`.

-| `trafficLinkFilter`
+| `trafficLink`
| String
-| An additional filter to inject into the URL for the *Network Traffic* page.
+| Information used to build the link to the *Network Traffic* page. Some filters are set automatically, such as the `node` or `namespace` filter.
+|===
+
+.`trafficLink` fields
+[cols="1,3",options="header"]
+|===
+| Field
+| Description
+
+| `extraFilter`
+| Additional filter to inject (for example, a DNS response code for DNS-related alerts).
+
+| `backAndForth`
+| Whether the filter should include return traffic (`true` or `false`).
+
+| `filterDestination`
+| Whether the filter should target the destination of the traffic instead of the source (`true` or `false`).
|===

The `namespaceLabels` and `nodeLabels` are mutually exclusive. If neither is provided, the alert appears under the *Global* tab.
\ No newline at end of file
diff --git a/modules/network-observability-health-rules-recording-rules-performance-optimization.adoc b/modules/network-observability-health-rules-recording-rules-performance-optimization.adoc
new file mode 100644
index 000000000000..53899058db58
--- /dev/null
+++ b/modules/network-observability-health-rules-recording-rules-performance-optimization.adoc
@@ -0,0 +1,47 @@
+// Module included in the following assemblies:
+//
+// * network_observability/network-observability-health-rules.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="network-observability-health-rules-recording-rules-performance-optimization_{context}"]
+= Performance optimization with recording rules
+
+[role="_abstract"]
+For large-scale clusters, recording rules optimize how Prometheus handles network data. Recording rules improve dashboard responsiveness and reduce the computational overhead of complex queries.
+
+[id="network-observability-recording-rules-benefits_{context}"]
+== Optimization benefits
+Recording rules pre-compute complex Prometheus Query Language (PromQL) expressions and save the results as new time series. Unlike alerting rules, recording rules do not monitor thresholds.
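+
+As a sketch, you can switch a default template to recording mode in the `FlowCollector` custom resource, so that violations appear in the *Network Health* dashboard without firing Prometheus alerts. The threshold value here is illustrative:
+
+[source,yaml]
+----
+apiVersion: flows.netobserv.io/v1beta1
+kind: FlowCollector
+metadata:
+  name: cluster
+spec:
+  processor:
+    metrics:
+      healthRules:
+      - template: PacketDropsByKernel
+        mode: Recording # record violations without alerting
+        variants:
+        - thresholds: # thresholds are required in recording mode too
+            warning: "10"
+----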
+
+Using recording rules provides the following advantages:
+
+Improved performance:: Pre-computing Prometheus queries allows dashboards to load faster by avoiding on-demand calculations for long-term trends.
+Resource efficiency:: Calculating data at fixed intervals reduces CPU load on the Prometheus server compared to recalculating data on every dashboard refresh.
+Simplified queries:: Using short metric names, such as `cluster:network_traffic:rate_5m`, simplifies complex aggregate calculations in custom dashboards.
+
+[id="network-observability-alert-vs-recording-comparison_{context}"]
+== Comparison of rule modes
+The following table compares rule modes based on the expected outcome:
+
+[cols="1,2,2",options="header"]
+|===
+| Description
+| Alerting rules
+| Recording rules
+
+| Goal
+| Issue notifications.
+| Save a history of high-level metrics.
+
+| Data result
+| Generates an alerting state.
+| Creates a persistent metric.
+
+| Visibility
+| *Alerting* UI and *Network Health* view.
+| *Metrics Explorer* and *Network Health* view.
+
+| Notifications
+| Triggers `AlertManager` notifications.
+| Does not trigger notifications.
+|===
\ No newline at end of file
diff --git a/modules/network-observability-operator-release-notes-1-11-advisory.adoc b/modules/network-observability-operator-release-notes-1-11-advisory.adoc
new file mode 100644
index 000000000000..9a27de1c52fc
--- /dev/null
+++ b/modules/network-observability-operator-release-notes-1-11-advisory.adoc
@@ -0,0 +1,11 @@
+// Module included in the following assemblies:
+// * network_observability/network-observability-release-notes.adoc
+
+:_mod-docs-content-type: REFERENCE
+[id="network-observability-operator-release-notes_{context}"]
+= Network Observability Operator 1.11 advisory
+
+[role="_abstract"]
+You can review the advisory for the Network Observability Operator 1.11 release.
+ +* link:https://access.redhat.com/errata/RHSA-2026:2900[RHSA-2026:2900 Network Observability Operator 1.11] \ No newline at end of file diff --git a/modules/network-observability-operator-release-notes-1-11-fixed-issues.adoc b/modules/network-observability-operator-release-notes-1-11-fixed-issues.adoc new file mode 100644 index 000000000000..ad4f18705eb3 --- /dev/null +++ b/modules/network-observability-operator-release-notes-1-11-fixed-issues.adoc @@ -0,0 +1,80 @@ +// Module included in the following assemblies: +// * network_observability/network-observability-release-notes.adoc + +:_mod-docs-content-type: REFERENCE +[id="network-observability-operator-release-notes-1-11-fixed-issues_{context}"] += Network Observability Operator 1.11 fixed issues + +[role="_abstract"] +The Network Observability Operator 1.11 release contains several fixed issues that improve performance and the user experience. + +Missing dates in charts:: +Before this update, the chart tooltip date was not displayed as intended, due to a breaking change in a dependency. As a consequence, users experienced missing date information in the {product-title} web console plugin's *Overview* tab chart, affecting data context. ++ +With this release, the chart tooltip date display is restored. ++ +link:https://issues.redhat.com/browse/NETOBSERV-2518[NETOBSERV-2518] + +Warning message for Direct mode not refreshed after upscaling:: +Before this update, cluster information was not refreshed after scaling, causing a warning message to persist in large clusters, not updating with changes. ++ +With this release, cluster information is now refreshed when it changes, resulting in the warning message for large clusters in `Direct` mode updating with changes in cluster size, improving user visibility. 
++ +link:https://issues.redhat.com/browse/NETOBSERV-2494[NETOBSERV-2494] + +Unenriched OVN IPs:: +Before this update, some IPs declared by OVN-Kubernetes were not enriched, so IPs such as `100.64.0.x` did not appear in the `Machines` network. As a consequence, users had an incomplete view of their network traffic sources. ++ +With this release, the missing OVN-Kubernetes IPs are enriched. As a result, IPs declared by OVN-Kubernetes appear in the `Machines` network, improving the visibility of network traffic sources. ++ +link:https://issues.redhat.com/browse/NETOBSERV-2484[NETOBSERV-2484] + +Improved operator API discovery reliability:: +Before this update, a race condition during Network Observability Operator startup could cause API discovery to fail silently. As a consequence, the operator could fail to recognize the {product-title} cluster, leading to missing mandatory `ClusterRoleBinding` resources and preventing components from functioning correctly. ++ +With this release, the Network Observability Operator continues to check for API availability over time and reconciliation is blocked if discovery fails. As a result, the operator correctly identifies the environment and ensures all required roles are created. ++ +link:https://issues.redhat.com/browse/NETOBSERV-2574[NETOBSERV-2574] + +Added missing translation fields to IPFIX exports:: +Before this update, some network flow fields were missing translations during the IPFIX export process. As a result, exported IPFIX data was incomplete or difficult to interpret in external collectors. ++ +With this release, the missing translation fields (xlat) have been added to the `flowlogs-pipeline` IPFIX exporter. IPFIX exports now provide a complete set of translated fields for consistent network observability.
++ +link:https://issues.redhat.com/browse/NETOBSERV-2553[NETOBSERV-2553] + +Fixed FlowMetric form creation link and defaults:: +Before this update, the link to create a `FlowMetric` custom resource incorrectly directed users to a YAML editor instead of the intended form view. Additionally, the editor was pre-filled with incorrect default values. ++ +With this release, the link correctly leads to the `FlowMetric` resource creation form with the expected default settings. As a result, users can now easily create `FlowMetric` resources through the user interface. ++ +link:https://issues.redhat.com/browse/NETOBSERV-2520[NETOBSERV-2520] + +Virtual machine resource type icon in Topology view:: +Before this update, virtual machine (VM) owner types incorrectly displayed a generic question mark (?) icon in the *Topology* view. ++ +With this release, the user interface now includes a specific icon for VM resources. As a result, users can more easily identify and distinguish VM traffic within the network topology. ++ +link:https://issues.redhat.com/browse/NETOBSERV-2487[NETOBSERV-2487] + +DNS optimization and updated DNS alerts:: +Before this update, many DNS `NXDOMAIN` errors were returned due to ambiguous URLs being used in network observability. ++ +With this release, these URLs have been disambiguated, resulting in a more optimal use of DNS. ++ +link:https://issues.redhat.com/browse/NETOBSERV-2485[NETOBSERV-2485] + + + + +//// +Follow this format: + +MetricName and Remap fields are validated:: ++ +Before this update, users could create a `FlowMetric` custom resource (CR) with an invalid metric name. Although the `FlowMetric` CR was successfully created, the underlying metric would fail silently without providing any error feedback to the user. ++ +With this release, the `FlowMetric`, `metricName`, and `remap` fields are now validated before creation, so users are immediately notified if they enter an invalid name.
++ +link:https://issues.redhat.com/browse/NETOBSERV-2348[NETOBSERV-2348] +//// diff --git a/modules/network-observability-operator-release-notes-1-11-known-issues.adoc b/modules/network-observability-operator-release-notes-1-11-known-issues.adoc new file mode 100644 index 000000000000..d179e34a91d5 --- /dev/null +++ b/modules/network-observability-operator-release-notes-1-11-known-issues.adoc @@ -0,0 +1,24 @@ +// Module included in the following assemblies: +// * network_observability/network-observability-release-notes.adoc + +:_mod-docs-content-type: REFERENCE +[id="network-observability-operator-release-notes-1-11-known-issues_{context}"] += Network Observability Operator 1.11 known issues + +[role="_abstract"] +The following known issues affect the Network Observability Operator 1.11 release. + +Health rules do not trigger when the sampling rate increases because of `lowVolumeThreshold`:: +Network observability alerts might not trigger when an elevated sampling rate causes the volume to fall below the `lowVolumeThreshold` filter. This results in fewer alerts being evaluated or displayed. ++ +To work around this problem, adjust the `lowVolumeThreshold` value to align with the sampling rate to ensure consistent alert evaluation. ++ +link:https://issues.redhat.com/browse/NETOBSERV-2613[NETOBSERV-2613] + + +DNS metrics unavailable when Loki is disabled:: +When the `DNSTracking` feature is enabled in an installation without Loki, the required metrics for DNS graphs are unavailable. As a consequence, you cannot view DNS latency and response codes in the dashboard. ++ +To work around this problem, you must either disable the `DNSTracking` option or enable Loki in the `FlowCollector` resource by setting `spec.loki.enable` to `true`.
++ +link:https://issues.redhat.com/browse/NETOBSERV-2621[NETOBSERV-2621] \ No newline at end of file diff --git a/modules/network-observability-operator-release-notes-1-11-new-features-enhancements.adoc b/modules/network-observability-operator-release-notes-1-11-new-features-enhancements.adoc new file mode 100644 index 000000000000..8251a1ab4f08 --- /dev/null +++ b/modules/network-observability-operator-release-notes-1-11-new-features-enhancements.adoc @@ -0,0 +1,82 @@ +// Module included in the following assemblies: +// * network_observability/network-observability-release-notes.adoc + +:_mod-docs-content-type: REFERENCE +[id="network-observability-operator-release-notes-1-11-new-features-enhancements_{context}"] += Network Observability Operator 1.11 new features and enhancements + +[role="_abstract"] +Learn about the new features and enhancements in the Network Observability Operator 1.11 release, including hierarchical governance with the `FlowCollectorSlice` resource, a new Service deployment model, and the general availability of health rules. + +Per-tenant hierarchical governance with the FlowCollectorSlice resource:: +This release introduces the `FlowCollectorSlice` API to support hierarchical governance, allowing project administrators to independently manage sampling and subnet labeling for their specific namespaces. ++ +This feature was implemented to reduce global processing overhead and provide tenant autonomy in large-scale environments where individual teams require self-service visibility without cluster-wide configuration changes. As a result, organizations can selectively collect traffic and delegate data enrichment tasks to the project level while maintaining centralized cluster control. + +New Service deployment model for the `FlowCollector` resource:: +This release introduces a new `Service` deployment model in the `FlowCollector` custom resource. This model provides an intermediate option between the `Direct` and `Kafka` models. 
In the `Service` model, the eBPF agent is deployed as a daemon set, and the `flowlogs-pipeline` component is deployed as a scalable service. ++ +This model offers improved performance in large clusters by reducing cache duplication across component instances. + +Health rules are generally available:: +The health alerts feature, introduced in previous versions as a Technology Preview feature, is fully supported as health rules in the Network Observability Operator 1.11 release. ++ +[IMPORTANT] +==== +Network Observability health rules are available on {product-title} 4.16 and later. +==== ++ +This eBPF-based system correlates network metrics with infrastructure metadata to provide proactive notifications and automated insights into cluster health, such as traffic surges or latency trends. As a result, you can use the *Network Health* dashboard in the {product-title} web console to manage categorized alerts, customize thresholds, and create recording rules for improved visualization performance. + +Enhanced network traffic visualization and filtering:: +This release introduces enhanced visualization and filtering tools in the {product-title} web console. + +* Inline filter editing: You can now edit filter chips directly within the filter input field. This enhancement provides a more efficient method for modifying long filter values that were previously truncated, eliminating the need to manually copy and paste values. This update adopts an inline editing convention consistent with the Saved filters feature. +* External traffic quick filters: New quick filters allow you to actively monitor external ingress and egress traffic. This enhancement streamlines network management, enabling you to quickly identify and address issues related to external network communication. +* Intuitive resource iconography: The {product-title} console now uses specific icons for Kubernetes kinds, groups, and filters.
These icons provide a more intuitive and visually consistent experience, making it easier to navigate the network topology and identify applied filters at a glance. + +DNS resolution analysis:: +This release includes eBPF-based DNS tracking to enrich network flow records with domain names. ++ +This feature was implemented to reduce the mean time to identify (MTTI) by allowing administrators to immediately distinguish between network routing failures and service discovery issues, such as `NXDOMAIN` errors. + +Integration with Gateway API:: +This release introduces automatic integration between the Network Observability Operator and the Gateway API when a `GatewayClass` resource is created. This feature provides high-level traffic attribution for cluster ingress and egress traffic without requiring manual configuration of the `FlowCollector` resource. ++ +[IMPORTANT] +==== +Integration with Gateway API is available on {product-title} 4.19 and later. +==== ++ +You can verify the automated mapping of network flows to Gateway API resources in the *Observe* -> *Network Traffic* view of the {product-title} web console. The *Owner* column displays the Gateway name, providing a direct link to the associated Gateway resource page. + +Improved data resilience in the Overview and Topology views:: +With this release, functional data remains visible in the *Overview* and *Topology* views even if some background queries fail. This enhancement ensures that the scope and group drop-down menus in the Topology view remain accessible during partial service disruptions. ++ +Additionally, the *Overview* page now displays active error messages to assist with troubleshooting, providing better visibility into system health without interrupting the monitoring workflow. + +Improved categorization of unknown network flows:: +With this release, network flows from unknown sources are categorized into four distinct groups: external, unknown service, unknown node, and unknown pod. 
++ +This enhancement uses subnet labels to separate unknown IP subnets, providing a clearer network topology. This improved visibility helps to identify potential security threats and allows for a more targeted analysis of unknown elements within the cluster. + +Improved performance for new Network Observability installations:: +The default performance of the Network Observability Operator is improved for new installations. The default value for `cacheActiveTimeout` is increased from 5 to 15 seconds, and the `cacheMaxFlows` value is increased from 100,000 to 120,000 to accommodate higher flow volumes. ++ +[IMPORTANT] +==== +These new default values apply only to new installations; existing installations retain their current configurations. +==== ++ +These changes reduce CPU load by up to 40%. + +Improved LokiStack status monitoring and reporting:: +With this release, the Network Observability Operator monitors the status of the `LokiStack` resource and reports errors or configuration issues. The Network Observability Operator verifies `LokiStack` conditions, including pending or failed pods and specific warning conditions. ++ +This enhancement provides more actionable information in the `FlowCollector` status, allowing for more effective troubleshooting of the `LokiStack` component within network observability. + +Visual indicators for Loki indexed fields in the filter menu:: +With this release, the filter menu in the {product-title} web console displays visual indicators for the fields that are indexed in Loki. ++ +This enhancement improves query performance by indicating which fields are indexed for faster data retrieval. Using indexed fields when filtering data reduces the time required to browse and analyze network flows within the console.
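For reference, the cache settings described in the performance item above map to `FlowCollector` fields, as in this minimal sketch. The field paths are assumed to follow the existing `spec.agent.ebpf` layout, and the values shown are the new defaults:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      # New default: flows are aggregated for up to 15s before being sent (was 5s)
      cacheActiveTimeout: 15s
      # New default: up to 120,000 flows kept in the agent cache (was 100,000)
      cacheMaxFlows: 120000
```

Existing installations keep their configured values; set these fields explicitly only if you want to adopt the new defaults.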
+ diff --git a/modules/network-observability-operator-release-notes-1-4-0-new-features-and-enhancements.adoc b/modules/network-observability-operator-release-notes-1-4-0-new-features-and-enhancements.adoc index e1a39a78d401..1ebc2d091331 100644 --- a/modules/network-observability-operator-release-notes-1-4-0-new-features-and-enhancements.adoc +++ b/modules/network-observability-operator-release-notes-1-4-0-new-features-and-enhancements.adoc @@ -37,7 +37,7 @@ For more information, see: For more information, see: * xref:../../../observability/network_observability/configuring-operator.adoc#network-observability-flowcollector-view_network_observability[Flow Collector sample resource] -* xref:../../../observability/network_observability/flowcollector-api.adoc#network-observability-flowcollector-api-specifications_network_observability[Flow Collector API Reference] +* xref:../../../observability/network_observability/flowcollector-api.adoc#network-observability-flowcollector-api-specifications_network_observability[FlowCollector API reference] [id="network-observability-without-loki-1.4_{context}"] diff --git a/modules/network-observability-per-tenant-flowcollector-slice-api-reference.adoc b/modules/network-observability-per-tenant-flowcollector-slice-api-reference.adoc new file mode 100644 index 000000000000..cbc48277b5b1 --- /dev/null +++ b/modules/network-observability-per-tenant-flowcollector-slice-api-reference.adoc @@ -0,0 +1,138 @@ + +//Module included in the following assemblies: +// +// network_observability/network-observability-per-tenant-model.adoc + +// Automatically generated by 'openshift-apidocs-gen'. Do not edit. +:_mod-docs-content-type: REFERENCE +[id="flowcollectorslice-flows-netobserv-io-v1alpha1_{context}"] += FlowCollectorSlice [flows.netobserv.io/v1alpha1] + + + +Description:: ++ +-- +FlowCollectorSlice is the API allowing to decentralize some of the FlowCollector configuration per namespace tenant. 
+-- + +Type:: + `object` + + + + +[cols="1,1,1",options="header"] +|=== +| Property | Type | Description + +| `apiVersion` +| `string` +| APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and might reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + +| `kind` +| `string` +| Kind is a string value representing the REST resource this object represents. Servers might infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + +| `metadata` +| `object` +| Standard object's metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata + +| `spec` +| `object` +| FlowCollectorSliceSpec defines the desired state of FlowCollectorSlice + +|=== +== .metadata +Description:: ++ +-- +Standard object's metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata +-- + +Type:: + `object` + + + + +== .spec +Description:: ++ +-- +FlowCollectorSliceSpec defines the desired state of FlowCollectorSlice +-- + +Type:: + `object` + + + + +[cols="1,1,1",options="header"] +|=== +| Property | Type | Description + +| `sampling` +| `integer` +| `sampling` is an optional sampling interval to apply to this slice. For example, a value of `50` means that 1 matching flow in 50 is sampled. + +| `subnetLabels` +| `array` +| `subnetLabels` allows you to customize subnets and IPs labeling, such as to identify cluster external workloads or web services. +External subnets must be labeled with the prefix `EXT:`, or not labeled at all, in order to work with default quick filters and some metrics examples provided. 
+ + +Beware that the subnet labels configured in FlowCollectorSlice are not limited to the flows of the related namespace: any flow +in the whole cluster can be labeled using this configuration. However, subnet labels defined in the cluster-scoped FlowCollector take +precedence in case of conflicting rules. + +|=== +== .spec.subnetLabels +Description:: ++ +-- +`subnetLabels` allows you to customize subnets and IPs labeling, such as to identify cluster external workloads or web services. +External subnets must be labeled with the prefix `EXT:`, or not labeled at all, in order to work with default quick filters and some metrics examples provided. + + +Beware that the subnet labels configured in FlowCollectorSlice are not limited to the flows of the related namespace: any flow +in the whole cluster can be labeled using this configuration. However, subnet labels defined in the cluster-scoped FlowCollector take +precedence in case of conflicting rules. +-- + +Type:: + `array` + + + + +== .spec.subnetLabels[] +Description:: ++ +-- +SubnetLabel allows to label subnets and IPs, such as to identify cluster-external workloads or web services. +-- + +Type:: + `object` + +Required:: + - `cidrs` + - `name` + + + +[cols="1,1,1",options="header"] +|=== +| Property | Type | Description + +| `cidrs` +| `array (string)` +| List of CIDRs, such as `["1.2.3.4/32"]`. + +| `name` +| `string` +| Label name, used to flag matching flows. +External subnets must be labeled with the prefix `EXT:`, or not labeled at all, in order to work with default quick filters and some metrics examples provided. 
+ + + +|=== \ No newline at end of file diff --git a/modules/network-observability-per-tenant-flowcollector-slice-configure-project-administrator.adoc b/modules/network-observability-per-tenant-flowcollector-slice-configure-project-administrator.adoc new file mode 100644 index 000000000000..2518c507839f --- /dev/null +++ b/modules/network-observability-per-tenant-flowcollector-slice-configure-project-administrator.adoc @@ -0,0 +1,46 @@ +//Module included in the following assemblies: +// +// network_observability/network-observability-per-tenant-model.adoc + +:_mod-docs-content-type: PROCEDURE +[id="network-observability-per-tenant-flowcollector-slice-configure-project-administrator_{context}"] += Configure the FlowCollectorSlice as a project administrator + +[role="_abstract"] +Project administrators can manage flow collection and data enrichment within their own namespaces by configuring a `FlowCollectorSlice` custom resource for decentralized network traffic analysis. + +.Prerequisites + +* The Network Observability Operator is installed. +* You have `project-admin` permissions for the namespace. + +.Procedure + +. Create a YAML file named `flowCollectorSlice.yaml`: ++ +[source,yaml] +---- +apiVersion: flows.netobserv.io/v1alpha1 +kind: FlowCollectorSlice +metadata: + name: flowcollectorslice-sample + namespace: my-app +spec: + sampling: 1 + subnetLabels: + - name: EXT:Database + cidrs: + - 192.168.50.0/24 +---- + +. Apply the configuration by running the following command: ++ +[source,terminal] +---- +$ oc apply -f flowCollectorSlice.yaml +---- + +.Verification + +. In the {product-title} console, navigate to *Observe* -> *Network Traffic*. +. Ensure that flows to the `192.168.50.0/24` subnet are observed with the `EXT:Database` label.
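You can also check the slice from the CLI. This is a sketch: the resource name and namespace match the example above, and the exact `status` fields reported by the Operator may differ:

```shell
# List FlowCollectorSlice resources in the tenant namespace
oc get flowcollectorslice -n my-app

# Inspect the status subresource to confirm the slice was validated and applied
oc get flowcollectorslice flowcollectorslice-sample -n my-app -o jsonpath='{.status}'
```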
\ No newline at end of file diff --git a/modules/network-observability-per-tenant-flowcollector-slice-disable.adoc b/modules/network-observability-per-tenant-flowcollector-slice-disable.adoc new file mode 100644 index 000000000000..c9ab00dfdce8 --- /dev/null +++ b/modules/network-observability-per-tenant-flowcollector-slice-disable.adoc @@ -0,0 +1,39 @@ +//Module included in the following assemblies: +// +// network_observability/network-observability-per-tenant-model.adoc + +:_mod-docs-content-type: PROCEDURE +[id="network-observability-per-tenant-flowcollector-slice-disable_{context}"] += Disable the Network Observability Operator FlowCollectorSlice + +[role="_abstract"] +Disable slice-based filtering in the Network Observability Operator to resume global flow collection while preserving existing `FlowCollectorSlice` resources. + +.Procedure + +. Edit the `FlowCollector` resource by running the following command: ++ +[source,terminal] +---- +$ oc edit flowcollector cluster +---- + +. Set the `spec.processor.slicesConfig.collectionMode` field to `AlwaysCollect`: ++ +[source,yaml] +---- +apiVersion: flows.netobserv.io/v1beta2 +kind: FlowCollector +metadata: + name: cluster +spec: + processor: + slicesConfig: + enable: true + collectionMode: AlwaysCollect + ... +---- + +. Save the changes. ++ +Flow collection resumes for all traffic, and existing `FlowCollectorSlice` resources remain available for future use. 
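As a non-interactive alternative to `oc edit`, the same change can be applied with a merge patch. This is a sketch that uses the field shown in the procedure above:

```shell
# Switch collectionMode back to AlwaysCollect without opening an editor
oc patch flowcollector cluster --type=merge \
  -p '{"spec":{"processor":{"slicesConfig":{"collectionMode":"AlwaysCollect"}}}}'
```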
\ No newline at end of file diff --git a/modules/network-observability-per-tenant-flowcollector-slice-enable.adoc b/modules/network-observability-per-tenant-flowcollector-slice-enable.adoc new file mode 100644 index 000000000000..8cdf3c1250ea --- /dev/null +++ b/modules/network-observability-per-tenant-flowcollector-slice-enable.adoc @@ -0,0 +1,59 @@ +//Module included in the following assemblies: +// +// network_observability/network-observability-per-tenant-model.adoc + +:_mod-docs-content-type: PROCEDURE +[id="network-observability-per-tenant-flowcollector-slice-enable_{context}"] += Enable the Network Observability Operator FlowCollectorSlice + +[role="_abstract"] +Enabling the `FlowCollectorSlice` feature in the `FlowCollector` resource allows cluster administrators to delegate flow collection and data enrichment management to specific namespaces. + +Before project administrators can manage their own settings, a cluster administrator must configure the `FlowCollector` custom resource so that the Operator watches for `FlowCollectorSlice` custom resources. + +.Prerequisites + +* The Network Observability Operator is installed. +* A `FlowCollector` custom resource exists in the cluster. +* You have `cluster-admin` privileges. + +.Procedure +. Edit the `FlowCollector` custom resource by running the following command: ++ +[source,terminal] +---- +$ oc edit flowcollector cluster +---- + +. Configure the `spec.processor.slicesConfig` field to define which namespaces are permitted to use slices: ++ +[source,yaml] +---- +apiVersion: flows.netobserv.io/v1beta2 +kind: FlowCollector +metadata: + name: cluster +spec: + processor: + slicesConfig: + enable: true + collectionMode: AllowList + namespacesAllowList: + - /openshift-.*|netobserv.*/ +---- ++ +where: + +`spec.processor.slicesConfig.enable`:: Specifies whether the `FlowCollectorSlice` feature is enabled. If it is disabled, all resources of kind `FlowCollectorSlice` are ignored.
+`spec.processor.slicesConfig.collectionMode`:: Specifies how `FlowCollectorSlice` custom resources impact the flow collection process. When set to `AlwaysCollect`, all flows are collected regardless of the presence of `FlowCollectorSlice`. When set to `AllowList`, only the flows related to namespaces where a `FlowCollectorSlice` resource is present, or configured by using the global `namespacesAllowList`, are collected. +`spec.processor.slicesConfig.namespacesAllowList`:: Specifies a list of namespaces for which flows are always collected, regardless of the presence of `FlowCollectorSlice` in those namespaces. ++ +[NOTE] +==== +The `namespacesAllowList` field supports regular expressions, such as `/openshift-.*/` to capture multiple namespaces, or strict equality, such as `netobserv`, to match a specific namespace. +==== + +. Save the changes and exit the editor. + +.Verification +* Verify that only network flows from the `netobserv` namespace and namespaces starting with `openshift-` are displayed in the *Network Traffic* page of the web console. \ No newline at end of file diff --git a/modules/network-observability-per-tenant-flowcollector-slice-granular-flow-collection.adoc b/modules/network-observability-per-tenant-flowcollector-slice-granular-flow-collection.adoc new file mode 100644 index 000000000000..b91c155da32b --- /dev/null +++ b/modules/network-observability-per-tenant-flowcollector-slice-granular-flow-collection.adoc @@ -0,0 +1,59 @@ +//Module included in the following assemblies: +// +// network_observability/network-observability-per-tenant-model.adoc + +:_mod-docs-content-type: CONCEPT +[id="network-observability-per-tenant-flowcollector-slice-granular-flow-collection_{context}"] += FlowCollectorSlice resource for granular flow collection + +[role="_abstract"] +`FlowCollectorSlice` is a custom resource definition (CRD) that enables granular, multi-tenant network flow collection.
By defining logical slices based on namespaces or subnets, you can selectively collect traffic and apply custom sampling to specific workloads rather than the entire cluster. + +It complements the existing `FlowCollector` custom resource by enabling granular, selective, and multi-tenant-aware flow collection, instead of a single global configuration that applies uniformly to all traffic. + +When slice-based collection is enabled, only traffic that matches at least one `FlowCollectorSlice` is collected, allowing administrators to precisely control which network flows are observed. + +[id="benefits-of-flowcollector-slice_{context}"] +== Benefits of FlowCollectorSlice + +By default, network flow collection applies uniformly to all traffic in the cluster. This can result in excessive data volume and limited flexibility. + +Using `FlowCollectorSlice` provides the following benefits: + +* Enables selective flow collection for specific namespaces or workloads. +* Supports multi-tenant and environment-based observability. +* Reduces storage and processing costs by filtering irrelevant traffic. +* Preserves backward compatibility through opt-in configuration. + +[id="relationship-flowcollector-flowcollector-slice_{context}"] +== Relationship between FlowCollector and FlowCollectorSlice + +While the `FlowCollector` resource defines global flow collection behavior for the cluster, the `FlowCollectorSlice` resource defines which traffic is eligible for collection when slice-based filtering is enabled. + +The `FlowCollector.spec.slicesConfig` field controls how slice definitions are applied. + +[id="collection-modes_{context}"] +== Collection modes + +Slice behavior is governed by the `FlowCollector.spec.slicesConfig.collectionMode` field. Set the field to one of the following collection modes: + +AlwaysCollect:: +* Collects network flows from all cluster namespaces. +* Applies the subnet and sampling configurations defined in `FlowCollectorSlice` resources. 
+* Ignores the namespace selection logic in `FlowCollectorSlice` resources. +* Maintains the default collection behavior for backward compatibility. + +AllowList:: +* Collects only traffic that matches at least one `FlowCollectorSlice` resource. +* An optional namespace allow list includes selected namespaces in the collection. + +[id="flowcollector-slice-status_{context}"] +== FlowCollectorSlice status + +Each `FlowCollectorSlice` resource exposes a `status` subresource that reports: + +* Validation results. +* Reconciliation state. +* Whether the slice is successfully applied. + +This status allows administrators to verify that slice definitions are active and functioning as expected. \ No newline at end of file diff --git a/modules/network-observability-per-tenant-hierarchical-governance-and-tenant-autonomy.adoc b/modules/network-observability-per-tenant-hierarchical-governance-and-tenant-autonomy.adoc new file mode 100644 index 000000000000..741d025fa53a --- /dev/null +++ b/modules/network-observability-per-tenant-hierarchical-governance-and-tenant-autonomy.adoc @@ -0,0 +1,17 @@ +//Module included in the following assemblies: +// +// network_observability/network-observability-per-tenant-model.adoc + +:_mod-docs-content-type: CONCEPT +[id="network-observability-per-tenant-hierarchical-governance-and-tenant-autonomy_{context}"] += Per-tenant hierarchical governance and tenant autonomy + +[role="_abstract"] +Cluster administrators can maintain global governance while allowing project administrators to manage network traffic observability within their specific namespaces. + +The Network Observability Operator uses a hierarchical configuration model to support multitenancy. This architecture is beneficial for large-scale deployments and {hcp} environments where individual teams require self-service visibility without cluster administrator intervention. 
+ +The hierarchical model consists of the following components: + +Global governance:: The cluster administrator manages the global `FlowCollector` resource. This resource defines the observability infrastructure and determines if per-tenant configuration is permitted. +Tenant autonomy:: The project administrator manages the `FlowCollectorSlice` resource. This namespace-scoped custom resource (CR) allows teams to define specific observability settings for their workloads. \ No newline at end of file diff --git a/modules/network-observability-resources-table.adoc b/modules/network-observability-resources-table.adoc index 7f072faf8f5f..692ae0a1ddc7 100644 --- a/modules/network-observability-resources-table.adoc +++ b/modules/network-observability-resources-table.adoc @@ -6,32 +6,64 @@ = Resource considerations [role="_abstract"] -Review the resource considerations table, which provides baseline examples for configuration settings, such as eBPF memory limits and LokiStack size, tailored to various cluster workload sizes. +The Network Observability Operator configuration can be adjusted based on the cluster workload size. Use the following baseline examples to determine the appropriate resource limits and configuration settings for the environment. -The following table outlines examples of resource considerations for clusters with certain workload sizes. - -[IMPORTANT] -==== The examples outlined in the table demonstrate scenarios that are tailored to specific workloads. Consider each example only as a baseline from which adjustments can be made to accommodate your workload needs. 
-==== -.Resource recommendations -[options="header"] -|=== -| | Extra small (10 nodes) | Small (25 nodes) | Large (250 nodes) ^[2]^ -| *Worker Node vCPU and memory* | 4 vCPUs\| 16GiB mem ^[1]^ | 16 vCPUs\| 64GiB mem ^[1]^ |16 vCPUs\| 64GiB Mem ^[1]^ -| *LokiStack size* | `1x.extra-small` | `1x.small` | `1x.medium` -| *Network Observability controller memory limit* | 400Mi (default) | 400Mi (default) | 400Mi (default) -| *eBPF sampling interval* | 50 (default) | 50 (default) | 50 (default) -| *eBPF memory limit* | 800Mi (default) | 800Mi (default) | 1600Mi -| *cacheMaxSize* | 50,000 | 100,000 (default) | 100,000 (default) -| *FLP memory limit* | 800Mi (default) | 800Mi (default) | 800Mi (default) -| *FLP Kafka partitions* | – | 48 | 48 -| *Kafka consumer replicas* | – | 6 | 18 -| *Kafka brokers* | – | 3 (default) | 3 (default) +The test beds used for these recommendations are: + +* Extra small: 10-node cluster, 4 vCPUs and 16 GiB memory per worker, `LokiStack` size `1x.extra-small`, tested on AWS M6i instances. +* Small: 25-node cluster, 16 vCPUs and 64 GiB memory per worker, `LokiStack` size `1x.small`, tested on AWS M6i instances. +* Large: 250-node cluster, 16 vCPUs and 64 GiB memory per worker, `LokiStack` size `1x.medium`, tested on AWS M6i instances. In addition to the worker and controller nodes, three infrastructure nodes (size `M6i.12xlarge`) and one workload node (size `M6i.8xlarge`) were tested. + +[id="network-observability-resource-recommendations-table_{context}"] +.Resource recommendations for cluster sizes +[cols="2h,1,1,1",options="header"] |=== -[.small] --- -1. Tested with AWS M6i instances. -2. In addition to this worker and its controller, 3 infra nodes (size `M6i.12xlarge`) and 1 workload node (size `M6i.8xlarge`) were tested. 
--- \ No newline at end of file +| Criterion | Extra small (10 nodes) | Small (25 nodes) | Large (250 nodes) + +| Operator memory limit: `Subscription` `spec.config.resources` +| `400Mi` (default) +| `400Mi` (default) +| `400Mi` (default) + +| eBPF agent sampling interval: `FlowCollector` `spec.agent.ebpf.sampling` +| `50` (default) +| `50` (default) +| `50` (default) + +| eBPF agent memory limit: `FlowCollector` `spec.agent.ebpf.resources` +| `800Mi` (default) +| `800Mi` (default) +| `1600Mi` + +| eBPF agent cache size: `FlowCollector` `spec.agent.ebpf.cacheMaxSize` +| `50,000` +| `120,000` (default) +| `120,000` (default) + +| Processor memory limit: `FlowCollector` `spec.processor.resources` +| `800Mi` (default) +| `800Mi` (default) +| `800Mi` (default) + +| Processor replicas: `FlowCollector` `spec.processor.consumerReplicas` +| `3` (default) +| `6` +| `18` + +| Deployment model: `FlowCollector` `spec.deploymentModel` +| `Service` (default) +| `Kafka` +| `Kafka` + +| Kafka partitions: Kafka installation +| N/A +| `48` +| `48` + +| Kafka brokers: Kafka installation +| N/A +| `3` (default) +| `3` (default) +|=== \ No newline at end of file diff --git a/observability/network_observability/network-observability-alerts.adoc b/observability/network_observability/network-observability-alerts.adoc deleted file mode 100644 index 6939fd0abf30..000000000000 --- a/observability/network_observability/network-observability-alerts.adoc +++ /dev/null @@ -1,38 +0,0 @@ - -:_mod-docs-content-type: ASSEMBLY -[id="network-observability-alerts_{context}"] -= Network observability alerts -:context: network-observability-alerts -:toc: -include::_attributes/common-attributes.adoc[] - -toc::[] - -[role="_abstract"] -The Network Observability Operator provides alerts using built-in metrics and the {product-title} monitoring stack to quickly indicate your cluster's network health. 
- -:FeatureName: Network observability alerts -include::snippets/technology-preview.adoc[] - -include::modules/network-observability-alerts-about.adoc[leveloffset=+1] - -include::modules/network-observability-enabling-alerts.adoc[leveloffset=+1] - -include::modules/network-observability-configuring-predefined-alerts.adoc[leveloffset=+2] - -include::modules/network-observability-alerts-about-promql-expression.adoc[leveloffset=+2] - -include::modules/network-observability-creating-custom-alert-rules.adoc[leveloffset=+2] - -include::modules/network-observability-disabling-predefined-alerts.adoc[leveloffset=+2] - - -[role="_additional-resources"] -.Additional resources -* xref:../../observability/network_observability/network-observability-alerts.adoc#network-observability-default-alert-templates_network-observability-alerts[List of default alerts] -* xref:../../observability/network_observability/metrics-alerts-dashboards.adoc#network-observability-viewing-dashboards_metrics-dashboards-alerts[Viewing network observability metrics dashboards] -* xref:../../observability/network_observability/metrics-alerts-dashboards.adoc#network-observability-netobserv-dashboard-high-traffic-alert_metrics-dashboards-alerts[Creating alerts] -* link:https://docs.redhat.com/en/documentation/monitoring_stack_for_red_hat_openshift/4.20/html/about_monitoring/monitoring-stack-architecture[Monitoring stack architecture] - -//file name for Creating alerts: network-observability-includelist-example.adoc -//URL for Creating alerts: https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/network_observability/metrics-dashboards-alerts#network-observability-netobserv-dashboard-high-traffic-alert_metrics-dashboards-alerts diff --git a/observability/network_observability/network-observability-dns-resolution-analysis.adoc b/observability/network_observability/network-observability-dns-resolution-analysis.adoc new file mode 100644 index 000000000000..369886d2adad --- /dev/null 
+++ b/observability/network_observability/network-observability-dns-resolution-analysis.adoc @@ -0,0 +1,22 @@ +:_mod-docs-content-type: ASSEMBLY +[id="network-observability-dns-resolution-analysis_{context}"] += Network observability DNS resolution analysis +:context: network-observability-dns-decoding +:toc: +include::_attributes/common-attributes.adoc[] + +toc::[] + +[role="_abstract"] +Learn how DNS resolution analysis uses eBPF-based decoding to identify service discovery issues, and follow the steps to enable DNS tracking in the `FlowCollector` resource to enrich network flow records with domain names. + +include::modules/network-observability-dns-resolution-analysis-strategic-benefits.adoc[leveloffset=+1] + +include::modules/network-observability-dns-resolution-analysis-configure.adoc[leveloffset=+1] + +include::modules/network-observability-dns-resolution-analysis-reference.adoc[leveloffset=+1] + +[role="_additional-resources"] +.Additional resources +* xref:../../observability/network_observability/json-flows-format-reference.adoc#network-observability-flows-format_json_reference[Network flows format reference] +* link:https://github.com/openshift/runbooks/tree/master/alerts/network-observability-operator[Network Observability Operator runbooks] \ No newline at end of file diff --git a/observability/network_observability/network-observability-health-rules.adoc b/observability/network_observability/network-observability-health-rules.adoc new file mode 100644 index 000000000000..c29d28eec8ed --- /dev/null +++ b/observability/network_observability/network-observability-health-rules.adoc @@ -0,0 +1,43 @@ + +:_mod-docs-content-type: ASSEMBLY +[id="network-observability-health-rules_{context}"] += Network observability health rules +:context: network-observability-health-rules +:toc: +include::_attributes/common-attributes.adoc[] + +toc::[] + +[role="_abstract"] +The Network Observability Operator provides alerts by using built-in metrics and the {product-title}
monitoring stack to report cluster network health. + +[IMPORTANT] +==== +Network observability health alerts require {product-title} 4.16 or later. +==== + +include::modules/network-observability-health-rules-and-performance.adoc[leveloffset=+1] + +include::modules/network-observability-health-rules-monitoring-and-alerting.adoc[leveloffset=+2] + +include::modules/network-observability-health-rules-recording-rules-performance-optimization.adoc[leveloffset=+1] + +include::modules/network-observability-health-rule-structure-customization.adoc[leveloffset=+1] + +include::modules/network-observability-health-rules-promql-expressions-metadata.adoc[leveloffset=+2] + +include::modules/network-observability-custom-health-rule-configuration.adoc[leveloffset=+2] + +include::modules/network-observability-disable-predefined-rules.adoc[leveloffset=+1] + + +[role="_additional-resources"] +.Additional resources +* xref:../../observability/network_observability/network-observability-health-rules.adoc#network-observability-default-rules_network-observability-health-rules[List of default rules] +* xref:../../observability/network_observability/metrics-alerts-dashboards.adoc#network-observability-viewing-dashboards_metrics-dashboards-alerts[Viewing network observability metrics dashboards] +* xref:../../observability/network_observability/metrics-alerts-dashboards.adoc#network-observability-netobserv-dashboard-high-traffic-alert_metrics-dashboards-alerts[Creating alerts] +* link:https://docs.redhat.com/en/documentation/monitoring_stack_for_red_hat_openshift/4.21/html/about_monitoring/monitoring-stack-architecture[Monitoring stack architecture] +* link:https://github.com/openshift/runbooks/tree/master/alerts/network-observability-operator[Network Observability Operator runbooks] + +//file name for Creating alerts: network-observability-includelist-example.adoc +//URL for Creating alerts: 
https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/network_observability/metrics-dashboards-alerts#network-observability-netobserv-dashboard-high-traffic-alert_metrics-dashboards-alerts diff --git a/observability/network_observability/network-observability-operator-release-notes.adoc b/observability/network_observability/network-observability-operator-release-notes.adoc index a884e1261076..7d0f3ec9dd0d 100644 --- a/observability/network_observability/network-observability-operator-release-notes.adoc +++ b/observability/network_observability/network-observability-operator-release-notes.adoc @@ -13,20 +13,12 @@ These release notes track the development of the Network Observability Operator For an overview of the Network Observability Operator, see xref:../../observability/network_observability/network-observability-overview.adoc#network-observability-overview[About network observability]. -include::modules/network-observability-operator-release-notes-1-10-1.adoc[leveloffset=+1] +include::modules/network-observability-operator-release-notes-1-11-advisory.adoc[leveloffset=+1] -include::modules/network-observability-operator-release-notes-1-10-1-cves.adoc[leveloffset=+1] +include::modules/network-observability-operator-release-notes-1-11-new-features-enhancements.adoc[leveloffset=+1] -include::modules/network-observability-operator-release-notes-1-10-1-fixed-issues.adoc[leveloffset=+1] +include::modules/network-observability-operator-release-notes-1-11-known-issues.adoc[leveloffset=+1] -include::modules/network-observability-operator-release-notes-1-10-advisory.adoc[leveloffset=+1] +include::modules/network-observability-operator-release-notes-1-11-fixed-issues.adoc[leveloffset=+1] -include::modules/network-observability-operator-release-notes-1-10-new-features-enhancements.adoc[leveloffset=+1] - -include::modules/network-observability-operator-release-notes-1-10-technology-preview-features.adoc[leveloffset=+1] - 
-include::modules/network-observability-operator-release-notes-1-10-removed-features.adoc[leveloffset=+1] - -include::modules/network-observability-operator-release-notes-1-10-known-issues.adoc[leveloffset=+1] - -include::modules/network-observability-operator-release-notes-1-10-fixed-issues.adoc[leveloffset=+1] \ No newline at end of file +//Purposely missing [role="_abstract"] as it is used to ensure Vale is running correctly. \ No newline at end of file diff --git a/observability/network_observability/network-observability-per-tenant-model.adoc b/observability/network_observability/network-observability-per-tenant-model.adoc new file mode 100644 index 000000000000..0214024cc177 --- /dev/null +++ b/observability/network_observability/network-observability-per-tenant-model.adoc @@ -0,0 +1,42 @@ +:_mod-docs-content-type: ASSEMBLY +[id="network-observability-per-tenant-model_{context}"] += Network observability per-tenant model +:context: network-observability-per-tenant-configuration +include::_attributes/common-attributes.adoc[] + +toc::[] + +[role="_abstract"] +Use the `FlowCollectorSlice` resource to delegate network traffic analysis management to project administrators while maintaining global cluster governance.
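To make the split between the two resources concrete, the following sketch shows a global `FlowCollector` alongside a namespace-scoped `FlowCollectorSlice`. Only the resource kinds and their scopes come from this document; the `apiVersion` values, names, and `spec` contents are illustrative assumptions, so consult the FlowCollector API reference for the actual schema.

[source,yaml]
----
# Cluster-scoped resource, managed by the cluster administrator.
# The apiVersion shown here is an assumption, not the published API.
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec: {} # cluster-wide collection settings go here
---
# Namespace-scoped resource, managed by a project administrator.
# Names and fields below are hypothetical placeholders.
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollectorSlice
metadata:
  name: example-slice
  namespace: example-project # settings apply only to this project
spec: {} # per-tenant observability settings go here
----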
+ +include::modules/network-observability-per-tenant-hierarchical-governance-and-tenant-autonomy.adoc[leveloffset=+1] + +include::modules/network-observability-per-tenant-flowcollector-slice-granular-flow-collection.adoc[leveloffset=+1] + +include::modules/network-observability-per-tenant-flowcollector-slice-enable.adoc[leveloffset=+1] + +include::modules/network-observability-per-tenant-flowcollector-slice-disable.adoc[leveloffset=+2] + +include::modules/network-observability-per-tenant-flowcollector-slice-configure-project-administrator.adoc[leveloffset=+1] + +include::modules/network-observability-per-tenant-flowcollector-slice-api-reference.adoc[leveloffset=+1] + +[id="additional-resources-per-tenant-configuration_{context}"] +[role="_additional-resources"] +== Additional resources + +* xref:../../observability/network_observability/flowcollector-api.adoc#network-observability-flowcollector-api-specifications_network_observability[FlowCollector API reference] + +//// +* drafty draft to get words on the page and an overall sense of new feature. +* may not stay own assembly; may need to be slotted somewhere. but it also might make sense to keep it as its own assembly so as not to make the observing-network-traffic.adoc assembly even larger. +* worth considering collecting FlowCollector info and putting it all together, except for the API since that is auto-generated. +* should enable, configure, disable be leveloffset=+2?
+* fix titles and URLs once content is structured/organized +//// + + + + + + diff --git a/observability/network_observability/observing-network-traffic.adoc b/observability/network_observability/observing-network-traffic.adoc index 549199aa4abf..cb99e20f91b5 100644 --- a/observability/network_observability/observing-network-traffic.adoc +++ b/observability/network_observability/observing-network-traffic.adoc @@ -25,6 +25,8 @@ include::modules/network-observability-pktdrop-overview.adoc[leveloffset=+2] include::modules/network-observability-dns-overview.adoc[leveloffset=+2] +//02-12-2026: adding comment here for JTBD for DNS. May make sense to revisit dns-overview for DNS resolution analysis. Might all be a JTBD. modules/network-observability-dns-tracking.adoc part of DNS resolution analysis now. Addressing this is outside scope of no-1.11 release but came up so making a note. This comment will be removed with JTBD. + [role="_additional-resources"] .Additional resources * xref:../../observability/network_observability/observing-network-traffic.adoc#network-observability-dns-tracking_nw-observe-network-traffic[Working with DNS tracking] @@ -80,6 +82,7 @@ include::modules/network-observability-working-with-conversations.adoc[leveloffs include::modules/network-observability-packet-drops.adoc[leveloffset=+2] include::modules/network-observability-dns-tracking.adoc[leveloffset=+2] +//02-12-2026 adding comment as dns-tracking must be addressed as part of JTBD post no-1.11 release. Commenting out include breaks a number of xrefs, and addressing the IA to fix that is outside the scope of no-1.11. 
include::modules/network-observability-RTT.adoc[leveloffset=+2] diff --git a/observability/network_observability/release_notes_archive/network-observability-operator-release-notes-archive.adoc b/observability/network_observability/release_notes_archive/network-observability-operator-release-notes-archive.adoc index 4b45856cd00d..c944abaade9a 100644 --- a/observability/network_observability/release_notes_archive/network-observability-operator-release-notes-archive.adoc +++ b/observability/network_observability/release_notes_archive/network-observability-operator-release-notes-archive.adoc @@ -13,6 +13,24 @@ These release notes track past developments of the Network Observability Operato The Network Observability Operator enables administrators to observe and analyze network traffic flows for {product-title} clusters. +include::modules/network-observability-operator-release-notes-1-10-1.adoc[leveloffset=+1] + +include::modules/network-observability-operator-release-notes-1-10-1-cves.adoc[leveloffset=+1] + +include::modules/network-observability-operator-release-notes-1-10-1-fixed-issues.adoc[leveloffset=+1] + +include::modules/network-observability-operator-release-notes-1-10-advisory.adoc[leveloffset=+1] + +include::modules/network-observability-operator-release-notes-1-10-new-features-enhancements.adoc[leveloffset=+1] + +include::modules/network-observability-operator-release-notes-1-10-technology-preview-features.adoc[leveloffset=+1] + +include::modules/network-observability-operator-release-notes-1-10-removed-features.adoc[leveloffset=+1] + +include::modules/network-observability-operator-release-notes-1-10-known-issues.adoc[leveloffset=+1] + +include::modules/network-observability-operator-release-notes-1-10-fixed-issues.adoc[leveloffset=+1] + include::modules/network-observability-operator-release-notes-1-9-3-advisory.adoc[leveloffset=+1] include::modules/network-observability-operator-release-notes-1-9-2-advisory.adoc[leveloffset=+1]
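The large-cluster column of the resource recommendations table earlier in this change maps onto `FlowCollector` fields as in the following sketch. The field paths and values come from that table; the `apiVersion` and overall YAML shape are assumptions, so verify them against the FlowCollector API reference before applying.

[source,yaml]
----
apiVersion: flows.netobserv.io/v1beta2 # assumed version
kind: FlowCollector
metadata:
  name: cluster
spec:
  deploymentModel: Kafka # Service (default) is sufficient for the extra-small profile
  agent:
    ebpf:
      sampling: 50 # default
      cacheMaxSize: 120000 # default
      resources:
        limits:
          memory: 1600Mi # 800Mi (default) suffices for the smaller profiles
  processor:
    consumerReplicas: 18 # 6 for small; 3 (default) for extra small
    resources:
      limits:
        memory: 800Mi # default
----

Kafka partitions and broker counts are configured in the Kafka installation itself, not in the `FlowCollector` resource, as the table notes.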