8 changes: 6 additions & 2 deletions _topic_maps/_topic_map.yml
@@ -3260,12 +3260,16 @@ Topics:
File: understanding-network-observability-operator
- Name: Configuring the Network Observability Operator
File: configuring-operator
- Name: Network observability per-tenant model
File: network-observability-per-tenant-model
- Name: Network Policy
File: network-observability-network-policy
- Name: Network observability DNS resolution analysis
File: network-observability-dns-resolution-analysis
- Name: Observing the network traffic
File: observing-network-traffic
- Name: Network observability alerts
File: network-observability-alerts
- Name: Network observability health rules
File: network-observability-health-rules
- Name: Using metrics with dashboards and alerts
File: metrics-alerts-dashboards
- Name: Monitoring the Network Observability Operator
60 changes: 0 additions & 60 deletions modules/network-observability-alerts-about.adoc

This file was deleted.

16 changes: 15 additions & 1 deletion modules/network-observability-architecture.adoc
@@ -17,6 +17,20 @@ If you do not use Loki, you can generate metrics with Prometheus. Those metrics

image::network-observability-architecture.png[Network Observability eBPF export architecture]

If you are using the Kafka option, the eBPF agent sends the network flow data to Kafka, and the `flowlogs-pipeline` reads from the Kafka topic before sending to Loki, as shown in the following diagram.
There are three deployment model options for the Network Observability Operator.

[NOTE]
====
The Network Observability Operator does not manage Loki or other data stores. You must install Loki separately by using the {loki-op}. If you use Kafka, you must install it separately by using the Kafka Operator.
====

Service deployment model::
When the `spec.deploymentModel` field in the `FlowCollector` resource is set to `Service`, agents are deployed per node as daemon sets. The `flowlogs-pipeline` is a standard deployment with a service. You can scale the `flowlogs-pipeline` component by using the `spec.processor.consumerReplicas` field.

Direct deployment model::
When the `spec.deploymentModel` field is set to `Direct`, agents and the `flowlogs-pipeline` are both deployed per node as daemon sets. This model is suitable for technology assessments and small clusters. However, it is less memory-efficient in large clusters because each instance of `flowlogs-pipeline` caches the same cluster information.

Kafka deployment model (optional)::
If you use the Kafka option, the eBPF agent sends the network flow data to Kafka. The `flowlogs-pipeline` component reads from the Kafka topic before sending data to Loki, as shown in the following diagram. You can scale the `flowlogs-pipeline` component by using the `spec.processor.consumerReplicas` field.
+
image::network-observability-arch-kafka-FLP.png[Network Observability using Kafka]
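
The deployment model and the number of `flowlogs-pipeline` replicas are both set in the `FlowCollector` resource. The following fragment is a minimal sketch of the `Service` model described above. It assumes the `flows.netobserv.io/v1beta2` API version, and the replica count of `3` is only an example value, not a sizing recommendation.

[source,yaml]
----
apiVersion: flows.netobserv.io/v1beta2   # assumed API version
kind: FlowCollector
metadata:
  name: cluster
spec:
  # Deployment model: Service, Direct, or Kafka
  deploymentModel: Service
  processor:
    # Number of flowlogs-pipeline replicas (example value)
    consumerReplicas: 3
----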
44 changes: 0 additions & 44 deletions modules/network-observability-configuring-predefined-alerts.adoc

This file was deleted.

@@ -3,16 +3,16 @@
// * network_observability/network-observability-alerts.adoc

:_mod-docs-content-type: PROCEDURE
[id="network-observability-creating-custom-alert-rules_{context}"]
= Creating custom alert rules
[id="network-observability-custom-health-rule-configuration_{context}"]
= Custom health rule configuration

[role="_abstract"]
Use the Prometheus Query Language (`PromQL`) to define a custom `AlertingRule` resource that triggers alerts based on specific network metrics, such as traffic surges.

.Prerequisites

* Familiarity with `PromQL`.
* You have installed {product-title} 4.14 or later.
* You have installed {product-title} 4.16 or later.
* You have access to the cluster as a user with the `cluster-admin` role.
* You have installed the Network Observability Operator.
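
The following `AlertingRule` is a minimal sketch of a traffic-surge alert. It assumes that the `netobserv_workload_ingress_bytes_total` metric and its `DstK8S_Namespace` label are being collected. The resource name, group name, alert name, and the 100 MB/s threshold are hypothetical examples that you should adapt to your own baseline traffic.

[source,yaml]
----
apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: netobserv-ingress-surge      # hypothetical name
  namespace: openshift-monitoring
spec:
  groups:
  - name: netobserv-custom           # hypothetical group name
    rules:
    - alert: NamespaceIngressSurge   # hypothetical alert name
      # Per-namespace ingress rate over 5 minutes; fires above 100 MB/s (example threshold)
      expr: sum by (DstK8S_Namespace) (rate(netobserv_workload_ingress_bytes_total[5m])) > 100000000
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Ingress traffic to namespace {{ $labels.DstK8S_Namespace }} exceeds 100 MB/s."
----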
12 changes: 12 additions & 0 deletions modules/network-observability-disable-predefined-rules.adoc
@@ -0,0 +1,12 @@
// Module included in the following assemblies:
//
// * network_observability/network-observability-alerts.adoc

:_mod-docs-content-type: REFERENCE
[id="network-observability-disable-predefined-rules_{context}"]
= Disable predefined rules

[role="_abstract"]
You can disable rule templates by using the `spec.processor.metrics.disableAlerts` field of the `FlowCollector` custom resource (CR). This field accepts a list of rule template names. For the available template names, see "List of default rules".

If a template is listed in `disableAlerts` and is also overridden in the `spec.processor.metrics.healthRules` field, the disable setting takes precedence and the rule is not created.
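
The following fragment is a minimal sketch of disabling a single rule template. It assumes the `flows.netobserv.io/v1beta2` API version. `PacketDropsByDevice` is an example name only, so confirm it against "List of default rules" before you use it.

[source,yaml]
----
apiVersion: flows.netobserv.io/v1beta2   # assumed API version
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    metrics:
      # Rule templates to disable (example name; check the list of default rules)
      disableAlerts:
      - PacketDropsByDevice
----

If the same template were also configured in `spec.processor.metrics.healthRules`, no rule would be created for it, because the disable setting takes precedence.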
12 changes: 0 additions & 12 deletions modules/network-observability-disabling-predefined-alerts.adoc

This file was deleted.

@@ -0,0 +1,54 @@
// Module included in the following assemblies:
//
// * network_observability/network-observability-dns-decoding.adoc

:_mod-docs-content-type: PROCEDURE
[id="network-observability-dns-resolution-analysis-configure_{context}"]
= Configure DNS domain tracking for network observability

[role="_abstract"]
Enable DNS tracking in the Network Observability Operator to monitor DNS query names, response codes, and latency for network flows within the cluster.

.Prerequisites

* The Network Observability Operator is installed.
* You have `cluster-admin` privileges.
* You are familiar with the `FlowCollector` custom resource.

.Procedure

. Edit the `FlowCollector` resource by running the following command:
+
[source,terminal]
----
$ oc edit flowcollector cluster
----

. Configure the eBPF agent to enable the DNS tracking feature:
+
[source,yaml]
----
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
name: cluster
spec:
agent:
type: eBPF
ebpf:
features:
- DNSTracking
----
+
where:

`spec.agent.ebpf.features`:: Specifies the list of features to enable for the eBPF agent. To enable DNS tracking, add `DNSTracking` to this list.

. Save and exit the editor.
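
Because `features` is a list, add `DNSTracking` next to any features that are already enabled instead of replacing the list. The following fragment is a minimal sketch that assumes `FlowRTT` was enabled before the edit.

[source,yaml]
----
spec:
  agent:
    type: eBPF
    ebpf:
      features:
      - FlowRTT       # assumed to be already enabled; keep existing entries
      - DNSTracking   # added to enable DNS tracking
----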

.Verification
. In the {product-title} web console, navigate to *Observe* -> *Network Traffic*.
. In the *Traffic Flows* view, click the *Manage columns* icon.
. Ensure that the *DNS Query Name*, *DNS Response Code*, and *DNS Latency* columns are selected.
. Filter the results by setting *Port* to `53`.
. Confirm that the flow table columns are populated with domain names and DNS metadata.
@@ -0,0 +1,48 @@
// Module included in the following assemblies:
//
// * network_observability/network-observability-dns-decoding.adoc

:_mod-docs-content-type: REFERENCE
[id="network-observability-dns-resolution-analysis-reference_{context}"]
= DNS flow enrichment and analysis reference

[role="_abstract"]
Identify metadata added to network flows, leverage DNS data for network optimization, and understand the performance and storage impacts on the cluster.

The following table describes the metadata fields added to network flows when DNS tracking is enabled.

[NOTE]
====
Query names might be missing or truncated because of compression pointers or cache limitations.
====

.DNS flow metadata
[cols="1,2,1",options="header"]
|===
|Field |Description |Example
|`dns_query_name` |The Fully Qualified Domain Name (FQDN) being queried. |`example.com`
|`dns_response_code` |The status code returned by the DNS server. |`NoError`, `NXDomain`
|`dns_id` |The transaction ID used to match queries with responses. |`45213`
|===

[id="leverage-dns-data-optimization_{context}"]
== Leverage DNS data for network optimization
Use the captured DNS metadata for the following operational outcomes:

* Audit external dependencies: Ensure that workloads do not connect to unauthorized external APIs or high-risk domains.
* Performance tuning: Monitor `DNS Latency` to determine whether `CoreDNS` pods require additional scaling or whether upstream DNS providers are slow to respond.

[id="identify-misconfiguration-errors_{context}"]
== Identify misconfiguration errors
A high frequency of `NXDOMAIN` responses typically indicates service discovery errors in application code or stale environment variables.

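`NXDOMAIN` errors can be frequent in Kubernetes even when nothing is misconfigured. Pods resolve short service and pod names by appending each domain in their DNS search list in turn until one resolves, and every suffix that does not match returns an `NXDOMAIN` response. While these results do not necessarily indicate a misconfiguration or broken URL, they add extra DNS round trips and can negatively impact performance.

When `NXDOMAIN` errors are returned for an apparently valid Service or Pod host name, such as `my-svc.my-namespace.svc`, the resolver is likely trying the name with different search-domain suffixes before querying it as written. To avoid these extra lookups, add a trailing dot to the fully qualified domain name. The trailing dot tells the resolver that the name is absolute and must not be expanded with search suffixes.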

For example, instead of `https://my-svc.my-namespace.svc`, use `https://my-svc.my-namespace.svc.cluster.local.` with a trailing dot.
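
The following `Deployment` is a minimal sketch of passing a trailing-dot FQDN to an application through an environment variable. The `my-client` workload, the image, and the `BACKEND_URL` variable are hypothetical. The example also assumes the default `cluster.local` cluster domain and an application whose HTTP client passes the host name to the resolver unchanged.

[source,yaml]
----
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-client                  # hypothetical client workload
  namespace: my-namespace
spec:
  selector:
    matchLabels:
      app: my-client
  template:
    metadata:
      labels:
        app: my-client
    spec:
      containers:
      - name: app
        image: registry.example.com/my-client:latest   # hypothetical image
        env:
        # The trailing dot marks the name as absolute, so the resolver
        # skips the search-domain suffixes that produce NXDOMAIN responses.
        - name: BACKEND_URL        # hypothetical variable read by the application
          value: "https://my-svc.my-namespace.svc.cluster.local."
----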

[id="loki-storage-considerations_{context}"]
== Loki storage considerations

DNS tracking increases the number of labels and the amount of metadata per flow. Ensure that the Loki storage is sized to accommodate the increased log volume.
@@ -0,0 +1,23 @@
// Module included in the following assemblies:
//
// * network_observability/network-observability-dns-decoding.adoc

:_mod-docs-content-type: CONCEPT
[id="network-observability-dns-resolution-analysis-strategic-benefits_{context}"]
= Strategic benefits of DNS resolution analysis

[role="_abstract"]
Use DNS resolution analysis to differentiate between network transport failures and service discovery issues by enriching eBPF flow records with domain names and status codes.

Standard flow logs only show that traffic occurred on port 53. DNS resolution analysis allows you to complete the following tasks:

* Reduce mean time to identify (MTTI): Immediately distinguish between a network routing failure and a DNS resolution failure, such as an `NXDOMAIN` error.
* Measure internal service latency: Track the time it takes for CoreDNS to respond to specific internal lookups, such as `my-service.namespace.svc.cluster.local`.
* Audit external dependencies: Identify which external APIs or third-party domains your workloads communicate with, without requiring sidecars or manual packet captures.
* Improve security posture: Detect potential data exfiltration or command and control (C2) activity by auditing the fully qualified domain names (FQDNs) queried by internal workloads.

[id="dns-flow-enrichment_{context}"]
== DNS flow enrichment
When DNS tracking is enabled, the eBPF agent enriches each flow record with DNS metadata, such as the query name and response code. This metadata allows you to group and filter traffic by the intent of the connection, that is, the domain, rather than only by the source IP address.

Enhanced DNS decoding allows the eBPF agent to inspect UDP and TCP DNS traffic on port 53 and to extract the query name from each DNS request.