fix(encapsulation): route Cilium IPIP traffic through VxLAN overlay #6

Draft
kvaps wants to merge 3 commits into main from fix/cilium-ipip-overlay

@kvaps kvaps commented Feb 14, 2026

Summary

  • Rewrite Cilium encapsulator to create IPIP tunnels instead of using cilium_host interface directly, preventing routing loops when --compatibility=cilium with --encapsulation=always or --encapsulation=crosssubnet
  • Each node autodiscovers its cilium_host IP and advertises it via kilo.squat.ai/cilium-internal-ip annotation so other nodes can route IPIP outer packets through Cilium's VxLAN overlay
  • Add LocalIP() method to Encapsulator interface for overlay IP autodiscovery

Test plan

  • Deploy with --compatibility=cilium --encapsulation=always and verify IPIP routes use Cilium internal IPs as gateways
  • Verify cross-node pod traffic flows through VxLAN without routing loops
  • Test with --compatibility=cilium --encapsulation=crosssubnet
  • Verify non-Cilium encapsulators (IPIP, Flannel, Noop) are unaffected

Rewrite Cilium encapsulator to create IPIP tunnels instead of using
cilium_host interface directly. Each node autodiscovers its cilium_host
IP and advertises it via kilo.squat.ai/cilium-internal-ip annotation,
allowing other nodes to route IPIP outer packets through Cilium's VxLAN
overlay and preventing routing loops.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

coderabbitai bot commented Feb 14, 2026

Important

Review skipped

Draft detected.


@gemini-code-assist

Summary of Changes

Hello @kvaps, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the Cilium encapsulation mechanism to resolve routing loop issues when IPIP tunnels are used in conjunction with Cilium's networking. By introducing a dedicated IPIP tunnel for Cilium traffic and leveraging Cilium's VxLAN overlay for routing, it ensures seamless cross-node pod communication. The changes also enable nodes to advertise their internal Cilium IP, allowing for more intelligent gateway selection and robust network topology management.

Highlights

  • Cilium Encapsulation Rework: The Cilium encapsulator has been rewritten to create IPIP tunnels, preventing routing loops when using Cilium with always or crosssubnet encapsulation strategies. This ensures IPIP outer packets are routed through Cilium's VxLAN overlay.
  • Cilium Internal IP Discovery and Advertisement: Each node now automatically discovers its cilium_host interface IP and advertises it via a new Kubernetes annotation (kilo.squat.ai/cilium-internal-ip). This allows other nodes to correctly route IPIP traffic through the Cilium VxLAN overlay.
  • Encapsulator Interface Enhancement: A new LocalIP() method has been added to the Encapsulator interface to facilitate overlay IP autodiscovery. The Gw method signature was also updated across all encapsulators to accept the ciliumIP.
  • IPIP Tunnel Management: The cilium encapsulator now includes logic for creating, initializing, setting addresses, and cleaning up IPIP tunnel interfaces, along with generating necessary iptables rules for IPIP traffic.
  • Node Metadata Update: The mesh.Node structure and Kubernetes backend now support storing and retrieving the CiliumInternalIP for each node, which is then used in topology and route calculations.

Changelog
  • pkg/encapsulation/cilium.go
    • Refactored the cilium struct, removing netlink subscription and sync.Mutex for interface management.
    • Implemented CleanUp to remove created IPIP devices.
    • Updated Gw method to prioritize ciliumIP for gateway selection.
    • Added LocalIP method to return the cilium_host interface IP.
    • Implemented Init to create and bring up a new IPIP tunnel interface.
    • Added Rules method to generate iptables rules for IPIP traffic acceptance and dropping.
    • Implemented Set method to assign an IP address to the IPIP tunnel interface.
  • pkg/encapsulation/encapsulation.go
    • Added LocalIP() method to the Encapsulator interface.
    • Modified the Gw method signature to include a ciliumIP parameter.
  • pkg/encapsulation/flannel.go
    • Added a no-op LocalIP() method.
    • Modified the Gw method signature to align with the updated Encapsulator interface.
  • pkg/encapsulation/ipip.go
    • Added a no-op LocalIP() method.
    • Modified the Gw method signature to align with the updated Encapsulator interface.
  • pkg/encapsulation/noop.go
    • Added a no-op LocalIP() method.
    • Modified the Gw method signature to align with the updated Encapsulator interface.
  • pkg/k8s/backend.go
    • Defined a new constant ciliumInternalIPAnnotationKey for Kubernetes annotations.
    • Updated the Set method to store the CiliumInternalIP in node annotations.
    • Modified translateNode to parse and retrieve the CiliumInternalIP from node annotations.
  • pkg/mesh/backend.go
    • Added a CiliumInternalIP field of type net.IP to the Node struct.
  • pkg/mesh/mesh.go
    • Updated the handleLocal method to populate the CiliumInternalIP field of the local node using enc.LocalIP().
    • Modified nodesAreEqual to include comparison of CiliumInternalIP for node equality checks.
  • pkg/mesh/routes.go
    • Updated calls to the enc.Gw method in Routes to pass the ciliumInternalIPs for correct gateway determination in route calculations.
  • pkg/mesh/topology.go
    • Added ciliumInternalIPs slice to the segment struct to store Cilium internal IPs of peers.
    • Updated NewTopology to collect and populate the ciliumInternalIPs for each segment.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the Cilium encapsulator to use IPIP tunnels routed through Cilium's VxLAN overlay, which is a significant improvement for compatibility. It introduces a mechanism for nodes to discover and advertise their Cilium internal IP via Kubernetes annotations. The changes are well-distributed across the codebase and appear to correctly implement the intended logic. I have identified one critical issue that could lead to a panic and one medium-severity issue related to error handling that should be addressed.

return a.Key.String() == b.Key.String() &&
	ipNetsEqual(a.WireGuardIP, b.WireGuardIP) &&
	ipNetsEqual(a.InternalIP, b.InternalIP) &&
	a.CiliumInternalIP.Equal(b.CiliumInternalIP) &&

critical

Calling .Equal() on a nil net.IP slice will cause a panic. The CiliumInternalIP field can be nil, for example, if a node does not have the Cilium internal IP annotation. To prevent a panic, you should use a nil-safe comparison method, such as bytes.Equal.

Suggested change
- a.CiliumInternalIP.Equal(b.CiliumInternalIP) &&
+ bytes.Equal(a.CiliumInternalIP, b.CiliumInternalIP) &&

Comment on lines 40 to 42
if err := iproute.DeleteAddresses(c.iface); err != nil {
	return nil
}

medium

The error handling in this function is inconsistent. If iproute.DeleteAddresses fails, the error is swallowed, and iproute.RemoveInterface is not called. However, if iproute.RemoveInterface fails, the error is returned. Swallowing the error from DeleteAddresses can hide underlying issues during cleanup. It's better to return the error to the caller for logging and diagnostics.

Suggested change
- if err := iproute.DeleteAddresses(c.iface); err != nil {
- 	return nil
- }
+ if err := iproute.DeleteAddresses(c.iface); err != nil {
+ 	return err
+ }

kvaps and others added 2 commits February 14, 2026 11:50
Align constant block formatting for gofmt, add ciliumInternalIPs
to expected topology test segments, use bytes.Equal for nil-safe
CiliumInternalIP comparison, and return error from CleanUp.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

staticcheck SA1021 requires net.IP.Equal for IP comparison.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>