OVS internal nic used as a lxc phys device often fails (with cookbook to reproduce it easily !) #367

@bo911

Hello,

For years I have used openvswitch as the base technology for virtualization (qemu, lxc, ...).
I love being able to set up internal NICs with various attributes such as VLAN, bonding, ...
With newer openvswitch versions a serious problem has come up with this approach: the newer the openvswitch version, the more often (statistically) things fail when a predefined OVS internal NIC (that is, one set up after boot) is used as a so-called lxc phys network device.
If I use real physical NICs (e.g. enp3s0 + enp4s0) instead of the OVS internal NICs, everything works as expected.

To reproduce the problem easily, do the following:

Boot the lubuntu-noble-24.04.2 ISO image with network access (beta versions of Ubuntu questing behave the same) and log in.

Install openvswitch + lxc

apt update && apt upgrade -y
apt install lxc1 lxcfs openvswitch-switch

Create two bridges (lxc-br0, lxc-br1), each with one internal device (lxc-lan0, lxc-lan1)

ovs-vsctl add-br lxc-br0
ovs-vsctl add-br lxc-br1
ovs-vsctl add-port lxc-br0 lxc-lan0 -- set interface lxc-lan0 type=internal
ovs-vsctl add-port lxc-br1 lxc-lan1 -- set interface lxc-lan1 type=internal
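
Before continuing, it may be worth confirming that both ports really are internal interfaces. The following is only a sketch: it assumes that `ovs-vsctl get interface <name> type` prints the type as a single word, and both the `check_internal` helper and the `OVS_VSCTL` override variable are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper: succeeds only if OVS reports the interface type as "internal".
# OVS_VSCTL is an illustrative override for the command name, not an OVS feature.
check_internal() {
    iface=$1
    # Strip optional double quotes in case the value is printed as a quoted string.
    iftype=$("${OVS_VSCTL:-ovs-vsctl}" get interface "$iface" type | tr -d '"')
    if [ "$iftype" = "internal" ]; then
        echo "$iface: internal"
    else
        echo "$iface: unexpected type '$iftype'"
        return 1
    fi
}
```

Usage would be something like `check_internal lxc-lan0 && check_internal lxc-lan1`, run as root on the test host.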

Create a new container under /lxc/noble using the download template

mkdir /lxc
cd /lxc
lxc-create -n noble -P /lxc -t download
cd /var/lib/lxc
ln -s /lxc/noble noble

Edit /lxc/noble/config and replace the network setup as follows

lxc.net.0.type = phys
lxc.net.0.link   = lxc-lan0
lxc.net.0.flags = up

lxc.net.1.type = phys
lxc.net.1.link   = lxc-lan1
lxc.net.1.flags = up

Then start the container with lxc-start noble to trigger the error, for example:

rm -f /tmp/lxc_trace; lxc-start noble -l trace -o /tmp/lxc_trace

=> What happens is that lxc renames lxc-lan0 and lxc-lan1, moves the renamed interfaces into the lxc network namespace, and then informs the client (running inside that namespace) about the new devices.
The client is often unable to find the renamed interfaces!
The error is statistical: when starting the container several times, it sometimes just works.
If you "grep network /tmp/lxc_trace" you can easily see the problem.
Since it works with real Ethernet cards, I suspect a timing / detection problem while searching for the renamed card in the lxc network namespace.
With older releases (2.X) the problem rarely happens.
In real life I use a clustered hierarchical layout (physical server, QEMU VMs forming an HA cluster, lxc as HA resource), but the test shown above is a simpler way to reproduce the problem.
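
Because the failure is statistical, a small loop can give a rough failure rate. This is only a sketch under several assumptions: the container is named noble as above, lxc-start returns a non-zero exit status when the start fails, and `count_start_failures`, `LXC_START` and `LXC_STOP` are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper: start the container N times and report how often the start fails.
# Assumptions: lxc-start exits non-zero on a failed start; LXC_START / LXC_STOP are
# illustrative overrides for the command names, not LXC features.
count_start_failures() {
    runs=$1
    name=${2:-noble}
    log=${3:-/tmp/lxc_trace}
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        rm -f "$log"
        if "${LXC_START:-lxc-start}" -n "$name" -l trace -o "$log"; then
            # Started fine: kill the container so the next iteration starts fresh.
            "${LXC_STOP:-lxc-stop}" -n "$name" -k
        else
            failures=$((failures + 1))
            # Keep the trace of each failed run for later inspection.
            [ -f "$log" ] && cp "$log" "$log.fail.$i"
        fi
        i=$((i + 1))
    done
    echo "$failures/$runs starts failed"
}
```

For example, `count_start_failures 20` would attempt twenty starts and leave one `/tmp/lxc_trace.fail.N` file per failed run, which can then be searched with `grep network`. The sketch does not check that lxc-lan0 and lxc-lan1 are back in the host namespace between runs, so a short pause or an `ip link` check between iterations may be needed.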
