Description
On systems with an NDR fabric, InfiniBand does not work properly when the fabric is managed by opensm from linux-rdma. IB traffic such as MPI and IPoIB (with packets larger than ~400 bytes) fails. With opensmd from MLNX-OFED, or with the inbox drivers and mlnxsm from mlnx_ib_mgmt, the NDR fabric is fully functional for both MPI and IPoIB.
opensm.log contains many errors. This is opensm with its default configuration on RHEL 9.
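Since the log is large, a per-code tally of the errors may make it easier to compare runs. The following is only a rough sketch: it assumes error lines carry a marker of the form `ERR` followed by a four-digit hex code and a colon, and the function name `count_error_codes` is made up for illustration.

```python
import re
from collections import Counter

# Assumption: opensm error lines contain a marker such as "ERR 3113:"
# (four hex digits followed by a colon). Adjust the pattern if the log differs.
ERR_PATTERN = re.compile(r"\bERR ([0-9A-Fa-f]{4}):")


def count_error_codes(lines):
    """Return a Counter mapping each error code to the number of lines reporting it."""
    counts = Counter()
    for line in lines:
        match = ERR_PATTERN.search(line)
        if match:
            # Normalize case so "3e05" and "3E05" are counted together.
            counts[match.group(1).upper()] += 1
    return counts
```

It could be used as `count_error_codes(open("/var/log/opensm.log")).most_common(10)`, where the log path is also an assumption and may differ on a given install.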
To check whether the configuration requirements have changed, I tried the opensm.conf from mlnxsm. Unfortunately, that did not produce a working fabric either. It seems that opensm from linux-rdma is missing something needed for Quantum2 switches.
Example basic test failing with IPoIB:
[root@test1 ~]# ping -s 500 -c 1 test2-ib
PING test2-ib (10.3.1.231) 500(528) bytes of data.
From test2-ib (10.3.1.254) icmp_seq=1 Destination Host Unreachable
--- test2-ib ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
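To pin down the ~400 byte threshold more precisely, the ping test can be repeated over a range of payload sizes. The sketch below is one possible way to do that and is not something run for this report: it binary-searches for the largest payload that still gets a reply, on the assumption that every size above the threshold fails. The function names, the 2000 byte upper bound, and the 2 second timeout are illustrative choices; the `-s`, `-c`, and `-W` options are those of iputils ping.

```python
import subprocess


def ping_succeeds(host, payload, runner=subprocess.run):
    """Send one ICMP echo with the given payload size; True if a reply arrived."""
    # -s: payload bytes, -c 1: one packet, -W 2: wait up to 2 s for the reply.
    result = runner(
        ["ping", "-s", str(payload), "-c", "1", "-W", "2", host],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def largest_working_payload(host, low=0, high=2000, probe=ping_succeeds):
    """Binary-search the largest payload in [low, high] that gets a reply.

    Assumes replies stop for every size above some threshold.
    Returns None if even the smallest payload fails.
    """
    if not probe(host, low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(host, mid):
            low = mid       # mid works, the threshold is at or above it
        else:
            high = mid - 1  # mid fails, the threshold is below it
    return low
```

Calling `largest_working_payload("test2-ib")` from test1 would then report the largest payload that still passes, which could be compared between opensm and the other SMs.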
Example basic test failing with ibv_rc_pingpong:
[root@test3 ~]# ibv_rc_pingpong test1
local address: LID 0x0013, QPN 0x000063, PSN 0x566748, GID ::
remote address: LID 0x0001, QPN 0x000048, PSN 0xdf4fa4, GID ::
Failed status transport retry counter exceeded (12) for wr_id 2
parse WC failed 1
Both of the above tests pass when using either of the commercial SMs.