== sanity-lnet test 205: Check health and resends for multi-rail local failures ========================================================== 20:19:42 (1713485982)
Cleaning up LNet
LNET unconfigure error 22: (null)
unloading modules on: 'oleg249-server'
oleg249-server: oleg249-server.virtnet: executing unload_modules_local
oleg249-server: LNET unconfigure error 22: (null)
modules unloaded.
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
../libcfs/libcfs/libcfs options: 'cpu_npartitions=2'
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl lnet configure --all
oleg249-server: Writer error: failed to resolve Netlink family id
oleg249-server: opening /dev/lnet failed: No such file or directory
oleg249-server: hint: the kernel modules may not be loaded
oleg249-server: IOC_LIBCFS_GET_NI error 2: No such file or directory
pdsh@oleg249-client: oleg249-server: ssh exited with exit code 1
oleg249-server: oleg249-server.virtnet: executing load_lnet config_on_load=1
oleg249-server: Loading modules from /home/green/git/lustre-release/lustre
oleg249-server: detected 4 online CPUs by sysfs
oleg249-server: Force libcfs to create 2 CPU partitions
oleg249-server: /home/green/git/lustre-release/lustre/../lnet/utils/lnetctl lnet configure --all
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl discover 192.168.202.149@tcp
discover:
    - primary nid: 192.168.202.149@tcp
      Multi-Rail: true
      peer_ni:
        - nid: 192.168.202.149@tcp
oleg249-server: oleg249-server.virtnet: executing lnet_if_list
oleg249-server: oleg249-server.virtnet: executing /home/green/git/lustre-release/lustre/../lnet/utils/lnetctl lnet configure
oleg249-server: oleg249-server.virtnet: executing /home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net add --net tcp1 --if eth0
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl lnet configure
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net add --net tcp1 --if eth0
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 192.168.202.49@tcp
          status: up
          interfaces:
              0: eth0
    - net type: tcp1
      local NI(s):
        - nid: 192.168.202.49@tcp1
          status: up
          interfaces:
              0: eth0
    - primary nid: 192.168.202.149@tcp
        - nid: 192.168.202.149@tcp
          health stats:
              health value: 1000
        - nid: 192.168.202.149@tcp1
          health stats:
              health value: 1000
debug=+net
Simulate local_interrupt
Added drop rule 192.168.202.49@tcp->192.168.202.149@tcp (1/1)
Added drop rule 192.168.202.49@tcp->192.168.202.149@tcp1 (1/1)
Added drop rule 192.168.202.49@tcp->192.168.202.49@tcp (1/1)
Added drop rule 192.168.202.49@tcp->192.168.202.49@tcp1 (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.149@tcp (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.149@tcp1 (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.49@tcp (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.49@tcp1 (1/1)
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl ping 192.168.202.149@tcp
manage:
    - ping:
          errno: -5
          descr: ! 'failed to ping 192.168.202.149@tcp: Input/output error'
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl peer set --health 1000 --all
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net set --health 1000 --all
---
set:
    - net:
          errno: 3
          descr: ! "Object not found"
...
Removed 8 drop rules
Check that 2 resends took place
Check that local NI health has been changed
Simulate local_dropped
Added drop rule 192.168.202.49@tcp->192.168.202.149@tcp (1/1)
Added drop rule 192.168.202.49@tcp->192.168.202.149@tcp1 (1/1)
Added drop rule 192.168.202.49@tcp->192.168.202.49@tcp (1/1)
Added drop rule 192.168.202.49@tcp->192.168.202.49@tcp1 (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.149@tcp (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.149@tcp1 (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.49@tcp (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.49@tcp1 (1/1)
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl ping 192.168.202.149@tcp
manage:
    - ping:
          errno: -5
          descr: ! 'failed to ping 192.168.202.149@tcp: Input/output error'
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl peer set --health 1000 --all
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net set --health 1000 --all
---
set:
    - net:
          errno: 3
          descr: ! "Object not found"
...
Removed 8 drop rules
Check that 2 resends took place
Check that local NI health has been changed
Simulate local_aborted
Added drop rule 192.168.202.49@tcp->192.168.202.149@tcp (1/1)
Added drop rule 192.168.202.49@tcp->192.168.202.149@tcp1 (1/1)
Added drop rule 192.168.202.49@tcp->192.168.202.49@tcp (1/1)
Added drop rule 192.168.202.49@tcp->192.168.202.49@tcp1 (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.149@tcp (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.149@tcp1 (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.49@tcp (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.49@tcp1 (1/1)
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl ping 192.168.202.149@tcp
manage:
    - ping:
          errno: -5
          descr: ! 'failed to ping 192.168.202.149@tcp: Input/output error'
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl peer set --health 1000 --all
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net set --health 1000 --all
---
set:
    - net:
          errno: 3
          descr: ! "Object not found"
...
Removed 8 drop rules
Check that 2 resends took place
Check that local NI health has been changed
Simulate local_no_route
Added drop rule 192.168.202.49@tcp->192.168.202.149@tcp (1/1)
Added drop rule 192.168.202.49@tcp->192.168.202.149@tcp1 (1/1)
Added drop rule 192.168.202.49@tcp->192.168.202.49@tcp (1/1)
Added drop rule 192.168.202.49@tcp->192.168.202.49@tcp1 (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.149@tcp (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.149@tcp1 (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.49@tcp (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.49@tcp1 (1/1)
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl ping 192.168.202.149@tcp
manage:
    - ping:
          errno: -5
          descr: ! 'failed to ping 192.168.202.149@tcp: Input/output error'
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl peer set --health 1000 --all
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net set --health 1000 --all
---
set:
    - net:
          errno: 3
          descr: ! "Object not found"
...
Removed 8 drop rules
Check that 2 resends took place
Check that local NI health has been changed
Simulate local_timeout
Added drop rule 192.168.202.49@tcp->192.168.202.149@tcp (1/1)
Added drop rule 192.168.202.49@tcp->192.168.202.149@tcp1 (1/1)
Added drop rule 192.168.202.49@tcp->192.168.202.49@tcp (1/1)
Added drop rule 192.168.202.49@tcp->192.168.202.49@tcp1 (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.149@tcp (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.149@tcp1 (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.49@tcp (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.49@tcp1 (1/1)
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl ping 192.168.202.149@tcp
manage:
    - ping:
          errno: -5
          descr: ! 'failed to ping 192.168.202.149@tcp: Input/output error'
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl peer set --health 1000 --all
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net set --health 1000 --all
---
set:
    - net:
          errno: 3
          descr: ! "Object not found"
...
Removed 8 drop rules
Check that 2 resends took place
Check that local NI health has been changed
Simulate local_error
Added drop rule 192.168.202.49@tcp->192.168.202.149@tcp (1/1)
Added drop rule 192.168.202.49@tcp->192.168.202.149@tcp1 (1/1)
Added drop rule 192.168.202.49@tcp->192.168.202.49@tcp (1/1)
Added drop rule 192.168.202.49@tcp->192.168.202.49@tcp1 (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.149@tcp (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.149@tcp1 (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.49@tcp (1/1)
Added drop rule 192.168.202.49@tcp1->192.168.202.49@tcp1 (1/1)
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl ping 192.168.202.149@tcp
manage:
    - ping:
          errno: -5
          descr: ! 'failed to ping 192.168.202.149@tcp: Input/output error'
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl peer set --health 1000 --all
/home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net set --health 1000 --all
---
set:
    - net:
          errno: 3
          descr: ! "Object not found"
...
Removed 8 drop rules
Check that no resends took place
Check that local NI health has been changed
oleg249-server: oleg249-server.virtnet: executing /home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net del --net tcp1 --if eth0
Writer error: failed to resolve Netlink family id
unloading modules on: 'oleg249-server'
oleg249-server: oleg249-server.virtnet: executing unload_modules_local
modules unloaded.
oleg249-server: oleg249-server.virtnet: executing unload_modules_local
oleg249-server: LNET unconfigure error 22: (null)
pdsh@oleg249-client: oleg249-client: ssh exited with exit code 2
pdsh@oleg249-client: oleg249-server: ssh exited with exit code 2
pdsh@oleg249-client: oleg249-client: ssh exited with exit code 2
pdsh@oleg249-client: oleg249-server: ssh exited with exit code 2