== sanity-lnet test 207: Check health and resends for multi-rail remote errors ========================================================== 10:53:42 (1713279222)
Cleaning up LNet
LNET unconfigure error 22: (null)
unloading modules on: 'oleg329-server'
oleg329-server: oleg329-server.virtnet: executing unload_modules_local
oleg329-server: LNET unconfigure error 22: (null)
modules unloaded.
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
../libcfs/libcfs/libcfs options: 'cpu_npartitions=2'
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl lnet configure --all
oleg329-server: Writer error: failed to resolve Netlink family id
oleg329-server: opening /dev/lnet failed: No such file or directory
oleg329-server: hint: the kernel modules may not be loaded
oleg329-server: IOC_LIBCFS_GET_NI error 2: No such file or directory
pdsh@oleg329-client: oleg329-server: ssh exited with exit code 1
oleg329-server: oleg329-server.virtnet: executing load_lnet config_on_load=1
oleg329-server: Loading modules from /home/green/git/lustre-release/lustre
oleg329-server: detected 4 online CPUs by sysfs
oleg329-server: Force libcfs to create 2 CPU partitions
oleg329-server: /home/green/git/lustre-release/lustre/../lnet/utils/lnetctl lnet configure --all
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl discover 192.168.203.129@tcp
discover:
    - primary nid: 192.168.203.129@tcp
      Multi-Rail: true
      peer_ni:
        - nid: 192.168.203.129@tcp
oleg329-server: oleg329-server.virtnet: executing lnet_if_list
oleg329-server: oleg329-server.virtnet: executing /home/green/git/lustre-release/lustre/../lnet/utils/lnetctl lnet configure
oleg329-server: oleg329-server.virtnet: executing /home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net add --net tcp1 --if eth0
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl lnet configure
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net add --net tcp1 --if eth0
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 192.168.203.29@tcp
          status: up
          interfaces:
              0: eth0
    - net type: tcp1
      local NI(s):
        - nid: 192.168.203.29@tcp1
          status: up
          interfaces:
              0: eth0
    - primary nid: 192.168.203.129@tcp
        - nid: 192.168.203.129@tcp
          health stats:
              health value: 1000
        - nid: 192.168.203.129@tcp1
          health stats:
              health value: 1000
debug=+net
Simulate remote_dropped
Added drop rule 192.168.203.29@tcp->192.168.203.129@tcp (1/1)
Added drop rule 192.168.203.29@tcp->192.168.203.129@tcp1 (1/1)
Added drop rule 192.168.203.29@tcp->192.168.203.29@tcp (1/1)
Added drop rule 192.168.203.29@tcp->192.168.203.29@tcp1 (1/1)
Added drop rule 192.168.203.29@tcp1->192.168.203.129@tcp (1/1)
Added drop rule 192.168.203.29@tcp1->192.168.203.129@tcp1 (1/1)
Added drop rule 192.168.203.29@tcp1->192.168.203.29@tcp (1/1)
Added drop rule 192.168.203.29@tcp1->192.168.203.29@tcp1 (1/1)
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl ping 192.168.203.129@tcp
manage:
    - ping:
          errno: -5
          descr: ! 'failed to ping 192.168.203.129@tcp: Input/output error'
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl peer set --health 1000 --all
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net set --health 1000 --all
---
set:
    - net:
          errno: 3
          descr: ! "Object not found"
...
Removed 8 drop rules
Check that 2 resends took place
Check that local NI health is unchanged
Check that remote NI health has been changed
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl peer set --health 1000 --all
Simulate remote_error
Added drop rule 192.168.203.29@tcp->192.168.203.129@tcp (1/1)
Added drop rule 192.168.203.29@tcp->192.168.203.129@tcp1 (1/1)
Added drop rule 192.168.203.29@tcp->192.168.203.29@tcp (1/1)
Added drop rule 192.168.203.29@tcp->192.168.203.29@tcp1 (1/1)
Added drop rule 192.168.203.29@tcp1->192.168.203.129@tcp (1/1)
Added drop rule 192.168.203.29@tcp1->192.168.203.129@tcp1 (1/1)
Added drop rule 192.168.203.29@tcp1->192.168.203.29@tcp (1/1)
Added drop rule 192.168.203.29@tcp1->192.168.203.29@tcp1 (1/1)
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl ping 192.168.203.129@tcp
manage:
    - ping:
          errno: -5
          descr: ! 'failed to ping 192.168.203.129@tcp: Input/output error'
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl peer set --health 1000 --all
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net set --health 1000 --all
---
set:
    - net:
          errno: 3
          descr: ! "Object not found"
...
Removed 8 drop rules
Check that no resends took place
Check that local NI health is unchanged
Check that remote NI health has been changed
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl peer set --health 1000 --all
Simulate remote_timeout
Added drop rule 192.168.203.29@tcp->192.168.203.129@tcp (1/1)
Added drop rule 192.168.203.29@tcp->192.168.203.129@tcp1 (1/1)
Added drop rule 192.168.203.29@tcp->192.168.203.29@tcp (1/1)
Added drop rule 192.168.203.29@tcp->192.168.203.29@tcp1 (1/1)
Added drop rule 192.168.203.29@tcp1->192.168.203.129@tcp (1/1)
Added drop rule 192.168.203.29@tcp1->192.168.203.129@tcp1 (1/1)
Added drop rule 192.168.203.29@tcp1->192.168.203.29@tcp (1/1)
Added drop rule 192.168.203.29@tcp1->192.168.203.29@tcp1 (1/1)
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl ping 192.168.203.129@tcp
manage:
    - ping:
          errno: -5
          descr: ! 'failed to ping 192.168.203.129@tcp: Input/output error'
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl peer set --health 1000 --all
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net set --health 1000 --all
---
set:
    - net:
          errno: 3
          descr: ! "Object not found"
...
Removed 8 drop rules
Check that no resends took place
Check that local NI health is unchanged
Check that remote NI health has been changed
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl peer set --health 1000 --all
oleg329-server: oleg329-server.virtnet: executing /home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net del --net tcp1 --if eth0
Writer error: failed to resolve Netlink family id
unloading modules on: 'oleg329-server'
oleg329-server: oleg329-server.virtnet: executing unload_modules_local
modules unloaded.
oleg329-server: oleg329-server.virtnet: executing unload_modules_local
oleg329-server: LNET unconfigure error 22: (null)
pdsh@oleg329-client: oleg329-client: ssh exited with exit code 2
pdsh@oleg329-client: oleg329-server: ssh exited with exit code 2
pdsh@oleg329-client: oleg329-client: ssh exited with exit code 2
pdsh@oleg329-client: oleg329-server: ssh exited with exit code 2