== sanity-lnet test 205: Check health and resends for multi-rail local failures ========================================================== 03:26:37 (1713425197)
Cleaning up LNet
LNET unconfigure error 22: (null)
unloading modules on: 'oleg115-server'
oleg115-server: oleg115-server.virtnet: executing unload_modules_local
oleg115-server: LNET unconfigure error 22: (null)
modules unloaded.
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
../libcfs/libcfs/libcfs options: 'cpu_npartitions=2'
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl lnet configure --all
oleg115-server: Writer error: failed to resolve Netlink family id
oleg115-server: opening /dev/lnet failed: No such file or directory
oleg115-server: hint: the kernel modules may not be loaded
oleg115-server: IOC_LIBCFS_GET_NI error 2: No such file or directory
pdsh@oleg115-client: oleg115-server: ssh exited with exit code 1
oleg115-server: oleg115-server.virtnet: executing load_lnet config_on_load=1
oleg115-server: Loading modules from /home/green/git/lustre-release/lustre
oleg115-server: detected 4 online CPUs by sysfs
oleg115-server: Force libcfs to create 2 CPU partitions
oleg115-server: /home/green/git/lustre-release/lustre/../lnet/utils/lnetctl lnet configure --all
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl discover 192.168.201.115@tcp
discover:
    - primary nid: 192.168.201.115@tcp
      Multi-Rail: true
      peer_ni:
        - nid: 192.168.201.115@tcp
oleg115-server: oleg115-server.virtnet: executing lnet_if_list
oleg115-server: oleg115-server.virtnet: executing /home/green/git/lustre-release/lustre/../lnet/utils/lnetctl lnet configure
oleg115-server: oleg115-server.virtnet: executing /home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net add --net tcp1 --if eth0
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl lnet configure
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net add --net tcp1 --if eth0
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 192.168.201.15@tcp
          status: up
          interfaces:
              0: eth0
    - net type: tcp1
      local NI(s):
        - nid: 192.168.201.15@tcp1
          status: up
          interfaces:
              0: eth0
    - primary nid: 192.168.201.115@tcp
        - nid: 192.168.201.115@tcp
          health stats:
              health value: 1000
        - nid: 192.168.201.115@tcp1
          health stats:
              health value: 1000
debug=+net
Simulate local_interrupt
Added drop rule 192.168.201.15@tcp->192.168.201.115@tcp (1/1)
Added drop rule 192.168.201.15@tcp->192.168.201.115@tcp1 (1/1)
Added drop rule 192.168.201.15@tcp->192.168.201.15@tcp (1/1)
Added drop rule 192.168.201.15@tcp->192.168.201.15@tcp1 (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.115@tcp (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.115@tcp1 (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.15@tcp (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.15@tcp1 (1/1)
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl ping 192.168.201.115@tcp
manage:
    - ping:
          errno: -5
          descr: ! 'failed to ping 192.168.201.115@tcp: Input/output error'
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl peer set --health 1000 --all
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net set --health 1000 --all
---
set:
    - net:
          errno: 3
          descr: ! "Object not found"
...
Removed 8 drop rules
Check that 2 resends took place
Check that local NI health has been changed
Simulate local_dropped
Added drop rule 192.168.201.15@tcp->192.168.201.115@tcp (1/1)
Added drop rule 192.168.201.15@tcp->192.168.201.115@tcp1 (1/1)
Added drop rule 192.168.201.15@tcp->192.168.201.15@tcp (1/1)
Added drop rule 192.168.201.15@tcp->192.168.201.15@tcp1 (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.115@tcp (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.115@tcp1 (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.15@tcp (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.15@tcp1 (1/1)
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl ping 192.168.201.115@tcp
manage:
    - ping:
          errno: -5
          descr: ! 'failed to ping 192.168.201.115@tcp: Input/output error'
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl peer set --health 1000 --all
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net set --health 1000 --all
---
set:
    - net:
          errno: 3
          descr: ! "Object not found"
...
Removed 8 drop rules
Check that 2 resends took place
Check that local NI health has been changed
Simulate local_aborted
Added drop rule 192.168.201.15@tcp->192.168.201.115@tcp (1/1)
Added drop rule 192.168.201.15@tcp->192.168.201.115@tcp1 (1/1)
Added drop rule 192.168.201.15@tcp->192.168.201.15@tcp (1/1)
Added drop rule 192.168.201.15@tcp->192.168.201.15@tcp1 (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.115@tcp (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.115@tcp1 (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.15@tcp (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.15@tcp1 (1/1)
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl ping 192.168.201.115@tcp
manage:
    - ping:
          errno: -5
          descr: ! 'failed to ping 192.168.201.115@tcp: Input/output error'
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl peer set --health 1000 --all
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net set --health 1000 --all
---
set:
    - net:
          errno: 3
          descr: ! "Object not found"
...
Removed 8 drop rules
Check that 2 resends took place
Check that local NI health has been changed
Simulate local_no_route
Added drop rule 192.168.201.15@tcp->192.168.201.115@tcp (1/1)
Added drop rule 192.168.201.15@tcp->192.168.201.115@tcp1 (1/1)
Added drop rule 192.168.201.15@tcp->192.168.201.15@tcp (1/1)
Added drop rule 192.168.201.15@tcp->192.168.201.15@tcp1 (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.115@tcp (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.115@tcp1 (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.15@tcp (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.15@tcp1 (1/1)
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl ping 192.168.201.115@tcp
manage:
    - ping:
          errno: -5
          descr: ! 'failed to ping 192.168.201.115@tcp: Input/output error'
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl peer set --health 1000 --all
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net set --health 1000 --all
---
set:
    - net:
          errno: 3
          descr: ! "Object not found"
...
Removed 8 drop rules
Check that 2 resends took place
Check that local NI health has been changed
Simulate local_timeout
Added drop rule 192.168.201.15@tcp->192.168.201.115@tcp (1/1)
Added drop rule 192.168.201.15@tcp->192.168.201.115@tcp1 (1/1)
Added drop rule 192.168.201.15@tcp->192.168.201.15@tcp (1/1)
Added drop rule 192.168.201.15@tcp->192.168.201.15@tcp1 (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.115@tcp (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.115@tcp1 (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.15@tcp (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.15@tcp1 (1/1)
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl ping 192.168.201.115@tcp
manage:
    - ping:
          errno: -5
          descr: ! 'failed to ping 192.168.201.115@tcp: Input/output error'
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl peer set --health 1000 --all
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net set --health 1000 --all
---
set:
    - net:
          errno: 3
          descr: ! "Object not found"
...
Removed 8 drop rules
Check that 2 resends took place
Check that local NI health has been changed
Simulate local_error
Added drop rule 192.168.201.15@tcp->192.168.201.115@tcp (1/1)
Added drop rule 192.168.201.15@tcp->192.168.201.115@tcp1 (1/1)
Added drop rule 192.168.201.15@tcp->192.168.201.15@tcp (1/1)
Added drop rule 192.168.201.15@tcp->192.168.201.15@tcp1 (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.115@tcp (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.115@tcp1 (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.15@tcp (1/1)
Added drop rule 192.168.201.15@tcp1->192.168.201.15@tcp1 (1/1)
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl ping 192.168.201.115@tcp
manage:
    - ping:
          errno: -5
          descr: ! 'failed to ping 192.168.201.115@tcp: Input/output error'
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl peer set --health 1000 --all
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net set --health 1000 --all
---
set:
    - net:
          errno: 3
          descr: ! "Object not found"
...
Removed 8 drop rules
Check that no resends took place
Check that local NI health has been changed
oleg115-server: oleg115-server.virtnet: executing /home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net del --net tcp1 --if eth0
Writer error: failed to resolve Netlink family id
unloading modules on: 'oleg115-server'
oleg115-server: oleg115-server.virtnet: executing unload_modules_local
modules unloaded.
oleg115-server: oleg115-server.virtnet: executing unload_modules_local
oleg115-server: LNET unconfigure error 22: (null)
pdsh@oleg115-client: oleg115-client: ssh exited with exit code 2
pdsh@oleg115-client: oleg115-server: ssh exited with exit code 2
pdsh@oleg115-client: oleg115-client: ssh exited with exit code 2
pdsh@oleg115-client: oleg115-server: ssh exited with exit code 2