-----============= acceptance-small: recovery-small ============----- Fri Apr 19 08:49:57 EDT 2024 excepting tests: 136 === recovery-small: start setup 08:49:59 (1713530999) === oleg326-client.virtnet: executing check_config_client /mnt/lustre oleg326-client.virtnet: Checking config lustre mounted on /mnt/lustre Checking servers environments Checking clients oleg326-client.virtnet environments Using TIMEOUT=20 osc.lustre-OST0000-osc-ffff8800b5d4e000.idle_timeout=debug osc.lustre-OST0001-osc-ffff8800b5d4e000.idle_timeout=debug disable quota as required oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all osd-ldiskfs.track_declares_assert=1 === recovery-small: finish setup 08:50:05 (1713531005) === osp.lustre-OST0000-osc-MDT0000.prealloc_force_new_seq=1 osp.lustre-OST0001-osc-MDT0000.prealloc_force_new_seq=1 Creating to objid 33 on ost lustre-OST0000... Creating to objid 33 on ost lustre-OST0001... total: 33 open/close in 0.20 seconds: 164.55 ops/second total: 33 open/close in 0.21 seconds: 159.21 ops/second osp.lustre-OST0000-osc-MDT0000.prealloc_force_new_seq=0 osp.lustre-OST0001-osc-MDT0000.prealloc_force_new_seq=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 1: create, chmod, stat: drop req, drop rep ========================================================== 08:50:08 (1713531008) fail_val=0 fail_loc=0x123 fail_loc=0 fail_loc=0x119 fail_loc=0 fail_val=0 fail_loc=0x123 fail_loc=0 fail_loc=0x119 fail_loc=0 fail_val=0 fail_loc=0x123 fail_loc=0 fail_loc=0x122 fail_loc=0 PASS 1 (100s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 4: open: drop req, drop rep ======= 08:51:50 (1713531110) fail_val=0 fail_loc=0x123 fail_loc=0 fail_loc=0x122 fail_loc=0 PASS 4 (34s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 5: rename: drop req, drop rep ===== 08:52:26 (1713531146) fail_val=0 fail_loc=0x123 fail_loc=0 fail_loc=0x119 fail_loc=0 PASS 5 (34s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 6: link, unlink: drop req, drop rep ========================================================== 08:53:02 (1713531182) fail_val=0 fail_loc=0x123 fail_loc=0 fail_loc=0x119 fail_loc=0 fail_val=0 fail_loc=0x123 fail_loc=0 fail_loc=0x119 fail_loc=0 PASS 6 (67s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 8: touch: drop rep (bug 1423) ===== 08:54:11 (1713531251) fail_loc=0x119 fail_loc=0 PASS 8 (18s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 9: pause bulk on OST (bug 1420) === 08:54:30 (1713531270) timeout is 0/ fail_val=0 fail_loc=0x80000214 fail_loc=0 PASS 9 (7s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 10a: finish request on server after client eviction (bug 1521) ========================================================== 08:54:39 (1713531279) ldlm.namespaces.MGC192.168.203.126@tcp.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0000-mdc-ffff8800b5d4e000.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0001-mdc-ffff8800b5d4e000.early_lock_cancel=0 ldlm.namespaces.lustre-OST0000-osc-ffff8800b5d4e000.early_lock_cancel=0 ldlm.namespaces.lustre-OST0001-osc-ffff8800b5d4e000.early_lock_cancel=0 fail_loc=0x305 fail_loc=0 fail_val=0 ldlm.namespaces.MGC192.168.203.126@tcp.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0000-mdc-ffff8800b5d4e000.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0001-mdc-ffff8800b5d4e000.early_lock_cancel=1 ldlm.namespaces.lustre-OST0000-osc-ffff8800b5d4e000.early_lock_cancel=1 ldlm.namespaces.lustre-OST0001-osc-ffff8800b5d4e000.early_lock_cancel=1 /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Input/output error Connected clients: oleg326-client.virtnet /mnt/lustre has perms 0777 OK PASS 10a (113s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 10b: re-send BL AST =============== 08:56:34 (1713531394) ldlm.namespaces.MGC192.168.203.126@tcp.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0000-mdc-ffff8800b5d4e000.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0001-mdc-ffff8800b5d4e000.early_lock_cancel=0 ldlm.namespaces.lustre-OST0000-osc-ffff8800b5d4e000.early_lock_cancel=0 ldlm.namespaces.lustre-OST0001-osc-ffff8800b5d4e000.early_lock_cancel=0 fail_loc=0x80000305 fail_loc=0 fail_val=0 ldlm.namespaces.MGC192.168.203.126@tcp.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0000-mdc-ffff8800b5d4e000.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0001-mdc-ffff8800b5d4e000.early_lock_cancel=1 ldlm.namespaces.lustre-OST0000-osc-ffff8800b5d4e000.early_lock_cancel=1 ldlm.namespaces.lustre-OST0001-osc-ffff8800b5d4e000.early_lock_cancel=1 Connected clients: oleg326-client.virtnet oleg326-client.virtnet /mnt/lustre has perms 0777 OK PASS 10b (17s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 10c: re-send BL AST vs reconnect race (LU-5569) ========================================================== 08:56:53 (1713531413) ldlm.namespaces.MGC192.168.203.126@tcp.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0000-mdc-ffff8800b5d4e000.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0001-mdc-ffff8800b5d4e000.early_lock_cancel=0 ldlm.namespaces.lustre-OST0000-osc-ffff8800b5d4e000.early_lock_cancel=0 ldlm.namespaces.lustre-OST0001-osc-ffff8800b5d4e000.early_lock_cancel=0 fail_loc=0x80000305 mdc.lustre-MDT0001-mdc-ffff8800b5d4e000.import=connection=192.168.203.126@tcp fail_loc=0 fail_val=0 Connected clients: oleg326-client.virtnet oleg326-client.virtnet ldlm.namespaces.MGC192.168.203.126@tcp.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0000-mdc-ffff8800b5d4e000.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0001-mdc-ffff8800b5d4e000.early_lock_cancel=1 ldlm.namespaces.lustre-OST0000-osc-ffff8800b5d4e000.early_lock_cancel=1 ldlm.namespaces.lustre-OST0001-osc-ffff8800b5d4e000.early_lock_cancel=1 /mnt/lustre/d10c.recovery-small has perms 0777 OK PASS 10c (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 10d: test failed blocking ast ===== 08:56:57 (1713531417) 7+0 records in 7+0 records out 7 bytes (7 B) copied, 0.000140656 s, 49.8 kB/s Stopping client oleg326-client.virtnet /mnt/lustre (opts:) Starting client: oleg326-client.virtnet: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre Starting client: oleg326-client.virtnet: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre2 1+0 records in 1+0 records out 5 bytes (5 B) copied, 0.00227622 s, 2.2 kB/s fail_err=71 ldlm.namespaces.MGC192.168.203.126@tcp.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0000-mdc-ffff88012b4f8800.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0000-mdc-ffff88012b4fe000.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0001-mdc-ffff88012b4f8800.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0001-mdc-ffff88012b4fe000.early_lock_cancel=0 ldlm.namespaces.lustre-OST0000-osc-ffff88012b4f8800.early_lock_cancel=0 ldlm.namespaces.lustre-OST0000-osc-ffff88012b4fe000.early_lock_cancel=0 ldlm.namespaces.lustre-OST0001-osc-ffff88012b4f8800.early_lock_cancel=0 ldlm.namespaces.lustre-OST0001-osc-ffff88012b4fe000.early_lock_cancel=0 fail_loc=0x305 fail_loc=0 fail_val=0 ldlm.namespaces.MGC192.168.203.126@tcp.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0000-mdc-ffff88012b4f8800.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0000-mdc-ffff88012b4fe000.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0001-mdc-ffff88012b4f8800.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0001-mdc-ffff88012b4fe000.early_lock_cancel=1 ldlm.namespaces.lustre-OST0000-osc-ffff88012b4f8800.early_lock_cancel=1 ldlm.namespaces.lustre-OST0000-osc-ffff88012b4fe000.early_lock_cancel=1 ldlm.namespaces.lustre-OST0001-osc-ffff88012b4f8800.early_lock_cancel=1 ldlm.namespaces.lustre-OST0001-osc-ffff88012b4fe000.early_lock_cancel=1 Connected clients: oleg326-client.virtnet oleg326-client.virtnet fail_err=0 192.168.203.126@tcp:/lustre /mnt/lustre2 lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg326-client.virtnet /mnt/lustre2 (opts:) PASS 10d (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 10e: re-send BL AST vs reconnect race 2 ========================================================== 08:57:02 (1713531422) SKIP: recovery-small test_10e need two clients SKIP 10e (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 11: wake up a thread waiting for completion after eviction (b=2460) ========================================================== 08:57:04 (1713531424) ldlm.namespaces.MGC192.168.203.126@tcp.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0000-mdc-ffff88012b4f8800.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0001-mdc-ffff88012b4f8800.early_lock_cancel=0 ldlm.namespaces.lustre-OST0000-osc-ffff88012b4f8800.early_lock_cancel=0 ldlm.namespaces.lustre-OST0001-osc-ffff88012b4f8800.early_lock_cancel=0 fail_loc=0x80000305 fail_loc=0 fail_val=0 ldlm.namespaces.MGC192.168.203.126@tcp.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0000-mdc-ffff88012b4f8800.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0001-mdc-ffff88012b4f8800.early_lock_cancel=1 ldlm.namespaces.lustre-OST0000-osc-ffff88012b4f8800.early_lock_cancel=1 ldlm.namespaces.lustre-OST0001-osc-ffff88012b4f8800.early_lock_cancel=1 PASS 11 (19s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 12: recover from timed out resend in ptlrpcd (b=2494) ========================================================== 08:57:24 (1713531444) fail_loc=0x115 multiop /mnt/lustre/f12.recovery-small vOS_c TMPPIPE=/tmp/multiop_open_wait_pipe.7360 waiting for multiop 19350 clearing fail_loc on mds1 fail_loc=0 PASS 12 (41s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 13: mdc_readpage restart test (bug 1138) ========================================================== 08:58:07 (1713531487) fail_loc=0x80000104 newentry fail_loc=0 PASS 13 (18s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 14: mdc_readpage resend test (bug 1138) ========================================================== 08:58:26 (1713531506) fail_loc=0x80000106 newentry fail_loc=0 PASS 14 (2s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 15: failed open (-ENOMEM) ========= 08:58:29 (1713531509) fail_loc=0x80000128 touch: cannot touch '/mnt/lustre/f15.recovery-small': Cannot allocate memory PASS 15 (2s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 16: timeout bulk put, don't evict client (2732) ========================================================== 08:58:32 (1713531512) fail_loc=0x80000504 fail_loc=0 PASS 16 (38s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 17a: timeout bulk get, don't evict client (2732) ========================================================== 08:59:12 (1713531552) at_max=20 fail_loc=0x80000503 fail_loc=0 Filesystem 1K-blocks Used Available Use% Mounted on 192.168.203.126@tcp:/lustre 7666232 3456 7210584 1% /mnt/lustre at_max=600 PASS 17a (43s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 17b: timeout bulk get, dont evict client (3582) ========================================================== 08:59:57 (1713531597) SKIP: recovery-small test_17b Needs multiple clients SKIP 17b (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 18a: manual ost invalidate clears page cache immediately ========================================================== 08:59:59 (1713531599) PASS 18a (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 18b: eviction and reconnect clears page cache (2766) ========================================================== 09:00:02 (1713531602) PASS 18b (24s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 18c: Dropped connect reply after eviction handing (14755) ========================================================== 09:00:27 (1713531627) fail_loc=0x80000225 PASS 18c (15s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 19a: test expired_lock_main on mds (2867) ========================================================== 09:00:44 (1713531644) Starting client: oleg326-client.virtnet: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre2 fail_loc=0x304 fail_loc=0 192.168.203.126@tcp:/lustre /mnt/lustre2 lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg326-client.virtnet /mnt/lustre2 (opts:) Connected clients: oleg326-client.virtnet oleg326-client.virtnet PASS 19a (102s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 19b: test expired_lock_main on ost (2867) ========================================================== 09:02:28 (1713531748) Starting client: oleg326-client.virtnet: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre2 fail_loc=0x304 fail_loc=0 192.168.203.126@tcp:/lustre /mnt/lustre2 lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg326-client.virtnet /mnt/lustre2 (opts:) uname: write error: Input/output error Connected clients: oleg326-client.virtnet PASS 19b (102s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 19c: check reconnect and lock resend do not trigger expired_lock_main ========================================================== 09:04:12 (1713531852) Starting client: oleg326-client.virtnet: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre2 ldlm.namespaces.MGC192.168.203.126@tcp.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0000-mdc-ffff88012b4f8800.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0000-mdc-ffff88012b4f9800.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0001-mdc-ffff88012b4f8800.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0001-mdc-ffff88012b4f9800.early_lock_cancel=0 ldlm.namespaces.lustre-OST0000-osc-ffff88012b4f8800.early_lock_cancel=0 ldlm.namespaces.lustre-OST0000-osc-ffff88012b4f9800.early_lock_cancel=0 ldlm.namespaces.lustre-OST0001-osc-ffff88012b4f8800.early_lock_cancel=0 ldlm.namespaces.lustre-OST0001-osc-ffff88012b4f9800.early_lock_cancel=0 File: '/mnt/lustre/f19c.recovery-small' Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 162129603815538691 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-19 09:04:12.000000000 -0400 Modify: 2024-04-19 09:04:12.000000000 -0400 Change: 2024-04-19 09:04:12.000000000 -0400 Birth: - fail_loc=0x80000516 ldlm.namespaces.MGC192.168.203.126@tcp.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0000-mdc-ffff88012b4f8800.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0000-mdc-ffff88012b4f9800.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0001-mdc-ffff88012b4f8800.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0001-mdc-ffff88012b4f9800.early_lock_cancel=1 ldlm.namespaces.lustre-OST0000-osc-ffff88012b4f8800.early_lock_cancel=1 ldlm.namespaces.lustre-OST0000-osc-ffff88012b4f9800.early_lock_cancel=1 ldlm.namespaces.lustre-OST0001-osc-ffff88012b4f8800.early_lock_cancel=1 ldlm.namespaces.lustre-OST0001-osc-ffff88012b4f9800.early_lock_cancel=1 192.168.203.126@tcp:/lustre /mnt/lustre2 lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg326-client.virtnet /mnt/lustre2 (opts:) PASS 19c (9s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 20a: ldlm_handle_enqueue error (should return error) ========================================================== 09:04:22 (1713531862) multiop /mnt/lustre/d20a.recovery-small/f20a.recovery-small vO_wc TMPPIPE=/tmp/multiop_open_wait_pipe.7360 fail_loc=0x80000308 write: Cannot allocate memory PASS 20a (2s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 20b: ldlm_handle_enqueue error (should return error) ========================================================== 09:04:25 (1713531865) fail_loc=0x80000308 dd: failed to open '/mnt/lustre/d20b.recovery-small/f20b.recovery-small': Cannot allocate memory PASS 20b (2s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 21a: drop close request while close and open are both in flight ========================================================== 09:04:29 (1713531869) multiop /mnt/lustre/d21a.recovery-small-1/f vO_c TMPPIPE=/tmp/multiop_open_wait_pipe.7360 fail_loc=0x80000129 fail_loc=0 fail_loc=0x80000115 fail_loc=0 /mnt/lustre/d21a.recovery-small-1/f has type file OK /mnt/lustre/d21a.recovery-small-2/f has type file OK PASS 21a (19s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 21b: drop open request while close and open are both in flight ========================================================== 09:04:50 (1713531890) multiop /mnt/lustre/d21b.recovery-small-1/f vO_c TMPPIPE=/tmp/multiop_open_wait_pipe.7360 fail_loc=0x80000107 fail_loc=0 /mnt/lustre/d21b.recovery-small-1/f has type file OK /mnt/lustre/d21b.recovery-small-2/f has type file OK PASS 21b (142s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 21c: drop both request while close and open are both in flight ========================================================== 09:07:14 (1713532034) multiop /mnt/lustre/d21c.recovery-small-1/f vO_c TMPPIPE=/tmp/multiop_open_wait_pipe.7360 fail_loc=0x80000107 fail_loc=0 fail_loc=0x80000115 fail_loc=0 /mnt/lustre/d21c.recovery-small-1/f has type file OK /mnt/lustre/d21c.recovery-small-2/f has type file OK PASS 21c (21s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 21d: drop close reply while close and open are both in flight ========================================================== 09:07:37 (1713532057) multiop /mnt/lustre/d21d.recovery-small-1/f vO_c TMPPIPE=/tmp/multiop_open_wait_pipe.7360 fail_loc=0x80000129 fail_loc=0 fail_loc=0x80000122 fail_loc=0 /mnt/lustre/d21d.recovery-small-1/f has type file OK /mnt/lustre/d21d.recovery-small-2/f has type file OK PASS 21d (19s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 21e: drop open reply while close and open are both in flight ========================================================== 09:07:58 (1713532078) multiop /mnt/lustre/d21e.recovery-small-1/f vO_c TMPPIPE=/tmp/multiop_open_wait_pipe.7360 fail_loc=0x80000119 fail_loc=0 /mnt/lustre/d21e.recovery-small-1/f has type file OK /mnt/lustre/d21e.recovery-small-2/f has type file OK PASS 21e (141s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 21f: drop both reply while close and open are both in flight ========================================================== 09:10:21 (1713532221) multiop /mnt/lustre/d21f.recovery-small-1/f vO_c TMPPIPE=/tmp/multiop_open_wait_pipe.7360 fail_loc=0x80000119 fail_loc=0 fail_loc=0x80000122 fail_loc=0 /mnt/lustre/d21f.recovery-small-1/f has type file OK /mnt/lustre/d21f.recovery-small-2/f has type file OK PASS 21f (21s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 21g: drop open reply and close request while close and open are both in flight ========================================================== 09:10:44 (1713532244) multiop /mnt/lustre/d21g.recovery-small-1/f vO_c TMPPIPE=/tmp/multiop_open_wait_pipe.7360 fail_loc=0x80000119 fail_loc=0 fail_loc=0x80000115 fail_loc=0 /mnt/lustre/d21g.recovery-small-1/f has type file OK /mnt/lustre/d21g.recovery-small-2/f has type file OK PASS 21g (19s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 21h: drop open request and close reply while close and open are both in flight ========================================================== 09:11:05 (1713532265) multiop /mnt/lustre/d21h.recovery-small-1/f vO_c TMPPIPE=/tmp/multiop_open_wait_pipe.7360 fail_loc=0x80000107 fail_loc=0 fail_loc=0x80000122 fail_loc=0 /mnt/lustre/d21h.recovery-small-1/f has type file OK /mnt/lustre/d21h.recovery-small-2/f has type file OK PASS 21h (20s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 22: drop close request and do mknod ========================================================== 09:11:26 (1713532286) fail_loc=0x80000115 fail_loc=0 PASS 22 (19s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 23: client hang when close a file after mds crash ========================================================== 09:11:47 (1713532307) multiop /mnt/lustre/f23.recovery-small vO_c TMPPIPE=/tmp/multiop_open_wait_pipe.7360 fail_val=0 fail_loc=0x123 fail_loc=0 Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 09:11:53 (1713532313) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 09:12:07 (1713532327) targets are mounted 09:12:07 (1713532327) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec PASS 23 (27s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 24a: fsync error (should return error) ========================================================== 09:12:16 (1713532336) multiop /mnt/lustre/d24a.recovery-small/f24a.recovery-small vOwy_wyc TMPPIPE=/tmp/multiop_open_wait_pipe.7360 fsync: Input/output error fail_loc=0x0 Connected clients: oleg326-client.virtnet oleg326-client.virtnet PASS 24a (2s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 24b: test dirty page discard due to client eviction ========================================================== 09:12:20 (1713532340) multiop /mnt/lustre/d24b.recovery-small/f24b.recovery-small-1 vOw8192_yc TMPPIPE=/tmp/multiop_open_wait_pipe.7360 multiop /mnt/lustre/d24b.recovery-small/f24b.recovery-small-2 vOw8192_c TMPPIPE=/tmp/multiop_open_wait_pipe.7360 fsync: Input/output error close: Input/output error fail_loc=0x0 Connected clients: oleg326-client.virtnet oleg326-client.virtnet [ 1435.302613] Lustre: 1872:0:(llite_lib.c:4078:ll_dirty_page_discard_warn()) lustre: dirty page discard: 192.168.203.126@tcp:/lustre/fid: [0x240000403:0xb:0x0]// may get corrupted (rc -108) PASS 24b (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 26a: evict dead exports =========== 09:12:23 (1713532343) SKIP: recovery-small test_26a msg and ost1 are at the same node SKIP 26a (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 26b: evict dead exports =========== 09:12:26 (1713532346) SKIP: recovery-small test_26b msg and ost1 are at the same node SKIP 26b (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 27: fail LOV while using OSC's ==== 09:12:29 (1713532349) Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 09:12:31 (1713532351) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 09:12:45 (1713532365) targets are mounted 09:12:45 (1713532365) facet_failover done fail_loc=0x80000407 waiting for fail_loc Waiting 90s for '-2147482617' Waiting 80s for '-2147482617' Waiting 70s for '-2147482617' Waiting 60s for '-2147482617' Waiting 50s for '-2147482617' Waiting 40s for '-2147482617' Waiting 30s for '-2147482617' Waiting 20s for '-2147482617' Waiting 10s for '-2147482617' Waiting 0s for '-2147482617' Update not seen after 90s: want '-2147482617' got '2147484679' Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 09:14:17 (1713532457) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 09:14:31 (1713532471) targets are mounted 09:14:31 (1713532471) facet_failover done PASS 27 (127s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 28: handle error adding new clients (bug 6086) ========================================================== 09:14:38 (1713532478) ldlm.namespaces.MGC192.168.203.126@tcp.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0000-mdc-ffff88012b4f8800.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0001-mdc-ffff88012b4f8800.early_lock_cancel=0 ldlm.namespaces.lustre-OST0000-osc-ffff88012b4f8800.early_lock_cancel=0 ldlm.namespaces.lustre-OST0001-osc-ffff88012b4f8800.early_lock_cancel=0 fail_loc=0x80000305 fail_loc=0 fail_val=0 ldlm.namespaces.MGC192.168.203.126@tcp.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0000-mdc-ffff88012b4f8800.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0001-mdc-ffff88012b4f8800.early_lock_cancel=1 ldlm.namespaces.lustre-OST0000-osc-ffff88012b4f8800.early_lock_cancel=1 ldlm.namespaces.lustre-OST0001-osc-ffff88012b4f8800.early_lock_cancel=1 fail_loc=0x8000012f Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 09:14:57 (1713532497) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 09:15:10 (1713532510) targets are mounted 09:15:10 (1713532510) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec PASS 28 (39s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 29a: error adding new clients doesn't cause LBUG (bug 22273) ========================================================== 09:15:19 (1713532519) fail_loc=0x80000711 Stopping /mnt/lustre-mds1 (opts:) on oleg326-server Failover mds1 to oleg326-server Starting mds1: -o localrecov -o abort_recovery /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 oleg326-server: oleg326-server.virtnet: executing wait_import_state FULL os[cp].lustre-OST0000-osc-MDT0000.ost_server_uuid 50 oleg326-server: os[cp].lustre-OST0000-osc-MDT0000.ost_server_uuid in FULL state after 0 sec PASS 29a (25s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 29b: error adding new clients doesn't cause LBUG (bug 22273) ========================================================== 09:15:46 (1713532546) fail_loc=0x80000711 Stopping /mnt/lustre-ost1 (opts:) on oleg326-server Failover ost1 to oleg326-server Starting ost1: -o localrecov -o abort_recovery /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-OST0000 pdsh@oleg326-client: oleg326-client: ssh exited with exit code 5 first stat failed: 5 PASS 29b (20s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 50: failover MDS under load ======= 09:16:08 (1713532568) writemany pid 16968 Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 09:16:20 (1713532580) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 09:16:33 (1713532593) targets are mounted 09:16:33 (1713532593) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 09:17:40 (1713532660) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 09:17:54 (1713532674) targets are mounted 09:17:54 (1713532674) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 09:19:01 (1713532741) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 09:19:14 (1713532754) targets are mounted 09:19:14 (1713532754) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec writemany returned 0 PASS 50 (214s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 51: failover MDS during recovery == 09:19:44 (1713532784) fail_loc=0x00001310 Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 09:19:47 (1713532787) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 09:20:00 (1713532800) targets are mounted 09:20:00 (1713532800) facet_failover done will failover at 1 5 10 20 25 30 test_51: failover in 1 sec Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 09:20:03 (1713532803) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 09:20:17 (1713532817) targets are mounted 09:20:17 (1713532817) facet_failover done test_51: failover in 5 sec Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 09:20:24 (1713532824) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 09:20:38 (1713532838) targets are mounted 09:20:38 (1713532838) facet_failover done test_51: failover in 10 sec Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 09:20:50 (1713532850) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 09:21:04 (1713532864) targets are mounted 09:21:04 (1713532864) facet_failover done test_51: failover in 20 sec Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 09:21:26 (1713532886) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 09:21:40 (1713532900) targets are mounted 09:21:40 (1713532900) facet_failover done test_51: failover in 25 sec Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 09:22:07 (1713532927) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 09:22:21 (1713532941) targets are mounted 09:22:21 (1713532941) facet_failover done test_51: failover in 30 sec Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 09:22:53 (1713532973) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 09:23:07 (1713532987) targets are mounted 09:23:07 (1713532987) facet_failover done writemany returned 0 PASS 51 (224s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 52: failover OST under load ======= 09:23:30 (1713533010) writemany pid 23872 Failing ost1 on oleg326-server Stopping /mnt/lustre-ost1 (opts:) on oleg326-server 09:23:42 (1713533022) shut down Failover ost1 to oleg326-server mount facets: ost1 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-OST0000 09:23:55 (1713533035) targets are mounted 09:23:55 (1713533035) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid in FULL state after 0 sec writemany succeeded writemany pid 24945 Failing ost1 on oleg326-server Stopping /mnt/lustre-ost1 (opts:) on oleg326-server 09:29:13 (1713533353) shut down Failover ost1 to oleg326-server mount facets: ost1 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-OST0000 09:29:27 (1713533367) targets are mounted 09:29:27 (1713533367) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid in FULL state after 0 sec writemany succeeded writemany pid 26021 Failing ost1 on oleg326-server Stopping /mnt/lustre-ost1 (opts:) on oleg326-server 09:34:44 (1713533684) shut down Failover ost1 to oleg326-server mount facets: ost1 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-OST0000 09:34:57 (1713533697) targets are mounted 09:34:57 (1713533697) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid in FULL state after 0 sec writemany succeeded Connected clients: oleg326-client.virtnet oleg326-client.virtnet PASS 52 (965s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 53a: touch: drop rep ============== 09:39:37 (1713533977) fail_loc=0x157 Succeed in opening file "/mnt/lustre/f53a.recovery-small"(flags=O_RDWR, mode=755) fail_loc=0 PASS 53a (19s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 53b: touch: drop rep ============== 09:39:58 (1713533998) fail_loc=0x157 Succeed in opening file "/mnt/lustre/f53b.recovery-small"(flags=O_RDWR, mode=755) fail_loc=0 PASS 53b (19s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 53c: touch: drop rep ============== 09:40:19 (1713534019) fail_loc=0x157 Succeed in opening file "/mnt/lustre/f53c.recovery-small"(flags=O_RDWR, mode=755) fail_loc=0 PASS 53c (19s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 54: back in time ================== 09:40:39 (1713534039) Starting client: oleg326-client.virtnet: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre2 cat: /mnt/lustre2/f54.recovery-small.missing: No such file or directory Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 09:40:51 (1713534051) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 09:41:05 (1713534065) targets are mounted 09:41:05 (1713534065) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec PASS 54 (33s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 55: ost_brw_read/write drops timed-out read/write request ========================================================== 09:41:14 (1713534074) step1: testing ...... 4+0 records in 4+0 records out 134217728 bytes (134 MB) copied, 2.71539 s, 49.4 MB/s (dd_pid=32015, time=3)successful fail_loc=0x0000021d step2: testing ...... (dd_pid=32031, time=65)successful fail_loc=0 step3: testing ...... 4+0 records in 4+0 records out 134217728 bytes (134 MB) copied, 81.8571 s, 1.6 MB/s (dd_pid=32031, time=17)successful osc.lustre-OST0000-osc-ffff88012b4f8800.max_dirty_mb=467 osc.lustre-OST0001-osc-ffff88012b4f8800.max_dirty_mb=467 PASS 55 (90s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 56: do not fail on getattr resend ========================================================== 09:42:46 (1713534166) fail_loc=0x80000136 File: '/mnt/lustre/f56.recovery-small' Size: 0 Blocks: 0 IO Block: 4194304 regular empty file Device: 2c54f966h/743766374d Inode: 144115205339659129 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-19 09:42:47.000000000 -0400 Modify: 2024-04-19 09:42:47.000000000 -0400 Change: 2024-04-19 09:42:47.000000000 -0400 Birth: - fail_loc=0 PASS 56 (42s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 57: read procfs entries causes kernel crash ========================================================== 09:43:30 (1713534210) fail_loc=0x80000B00 Stopping client oleg326-client.virtnet /mnt/lustre (opts:) fail_loc=0x80000B00 Stopping /mnt/lustre-mds1 (opts:) on oleg326-server Failover mds1 to oleg326-server Starting mds1: -o localrecov -o abort_recovery /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 pdsh@oleg326-client: oleg326-client: ssh exited with exit code 95 pdsh@oleg326-client: oleg326-client: ssh exited with exit code 95 /home/green/git/lustre-release/lustre/tests/recovery-small.sh: line 1409: kill: (1274) - No such process fail_loc=0 Starting client: oleg326-client.virtnet: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre Filesystem 1K-blocks Used Available Use% Mounted on 192.168.203.126@tcp:/lustre 7666232 4576 7209464 1% /mnt/lustre PASS 57 (15s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 58: Eviction in the middle of open RPC reply processing ========================================================== 09:43:46 (1713534226) -rw-r--r-- 1 root root 0 Apr 19 09:43 /mnt/lustre/f58.recovery-small fail_loc=0x80000801 fail_loc=0 ldlm.namespaces.MGC192.168.203.126@tcp.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0000-mdc-ffff8800b5d4e800.early_lock_cancel=0 ldlm.namespaces.lustre-MDT0001-mdc-ffff8800b5d4e800.early_lock_cancel=0 ldlm.namespaces.lustre-OST0000-osc-ffff8800b5d4e800.early_lock_cancel=0 ldlm.namespaces.lustre-OST0001-osc-ffff8800b5d4e800.early_lock_cancel=0 fail_loc=0x80000305 fail_loc=0 fail_val=0 ldlm.namespaces.MGC192.168.203.126@tcp.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0000-mdc-ffff8800b5d4e800.early_lock_cancel=1 ldlm.namespaces.lustre-MDT0001-mdc-ffff8800b5d4e800.early_lock_cancel=1 ldlm.namespaces.lustre-OST0000-osc-ffff8800b5d4e800.early_lock_cancel=1 ldlm.namespaces.lustre-OST0001-osc-ffff8800b5d4e800.early_lock_cancel=1 Filesystem 1K-blocks Used Available Use% Mounted on 192.168.203.126@tcp:/lustre 7666232 4576 7209464 1% /mnt/lustre PASS 58 (19s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 59: Read cancel race on client eviction ========================================================== 09:44:07 (1713534247) Starting client: oleg326-client.virtnet: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre2 fail_loc=0x311 fail_loc=0 Stopping client oleg326-client.virtnet /mnt/lustre2 (opts:-f) PASS 59 (12s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 60: Add Changelog entries during MDS failover ========================================================== 09:44:21 (1713534261) striped dir -i0 -c2 -H crush2 /mnt/lustre/d60.recovery-small mdd.lustre-MDT0000.changelog_mask=+hsm mdd.lustre-MDT0001.changelog_mask=+hsm Registered 2 changelog users: 'cl1 cl1' - open/close 3795 (time 1713534274.37 total 10.00 last 379.48) total: 5000 open/close in 14.24 seconds: 351.01 ops/second Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 09:44:45 (1713534285) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 09:44:59 (1713534299) targets are mounted 09:44:59 (1713534299) facet_failover done - unlinked 0 (time 1713534284 ; total 0 ; last 0) total: 5000 unlinks in 33 seconds: 151.515152 unlinks/second 5000 unlinks in changelog lustre-MDT0000: clear the changelog for cl1 of all records lustre-MDT0000: Deregistered changelog user #1 lustre-MDT0001: clear the changelog for cl1 of all records lustre-MDT0001: Deregistered changelog user #1 lustre-MDT0001: changelog user 'cl1' not found lustre-MDT0000: changelog user 'cl1' not found PASS 60 (61s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 61: Verify to not reuse orphan objects - bug 17025 ========================================================== 09:45:24 (1713534324) UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 1414116 3112 1284576 1% /mnt/lustre[MDT:0] lustre-MDT0001_UUID 1414116 4884 1282804 1% /mnt/lustre[MDT:1] lustre-OST0000_UUID 3833116 1940 3605080 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3833116 2640 3604380 1% /mnt/lustre[OST:1] filesystem_summary: 7666232 4580 7209460 1% /mnt/lustre total: 10 open/close in 0.05 seconds: 198.00 ops/second Stopping /mnt/lustre-mds1 (opts:) on oleg326-server Failover mds1 to oleg326-server Starting mds1: -o localrecov -o abort_recovery /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 pdsh@oleg326-client: oleg326-client: ssh exited with exit code 5 first stat failed: 5 /home/green/git/lustre-release/lustre/tests/recovery-small.sh: line 1519: [: too many arguments PASS 61 (19s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 65: lock enqueue for destroyed export ========================================================== 09:45:45 (1713534345) Starting client: oleg326-client.virtnet: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre2 fail_loc=0x31e write: Input/output error fail_loc=0 192.168.203.126@tcp:/lustre /mnt/lustre2 lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg326-client.virtnet /mnt/lustre2 (opts:) PASS 65 (19s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 66: lock enqueue re-send vs client eviction ========================================================== 09:46:06 (1713534366) fail_loc=0x80000157 fail_loc=0x80000136 mdc.lustre-MDT0000-mdc-ffff8800b5d4e800.import=connection=192.168.203.126@tcp /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Input/output error stat: cannot stat '/mnt/lustre/f66.recovery-small': Input/output error Connected clients: oleg326-client.virtnet fail_loc=0 PASS 66 (6s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 67: connect vs import invalidate race ========================================================== 09:46:14 (1713534374) fail_loc=0x80000531 /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Input/output error mdc.lustre-MDT0000-mdc-ffff8800b5d4e800.import=connection=192.168.203.126@tcp error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Cannot send after transport endpoint shutdown error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Cannot send after transport endpoint shutdown error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Cannot send after transport endpoint shutdown error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Cannot send after transport endpoint shutdown error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Cannot send after transport endpoint shutdown error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Cannot send after transport endpoint shutdown error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Cannot send after transport endpoint shutdown error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Cannot send after transport endpoint shutdown error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Cannot send after transport endpoint shutdown error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown Connected clients: oleg326-client.virtnet oleg326-client.virtnet PASS 67 (17s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 100: IR: Make sure normal recovery still works w/o IR ========================================================== 09:46:33 (1713534393) mgs.MGS.ir_timeout Failing ost1 on oleg326-server Stopping /mnt/lustre-ost1 (opts:) on oleg326-server 09:46:36 (1713534396) shut down Failover ost1 to oleg326-server mount facets: ost1 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-OST0000 09:46:50 (1713534410) targets are mounted 09:46:50 (1713534410) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid in FULL state after 0 sec PASS 100 (25s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 101a: IR: Make sure IR works w/o normal recovery ========================================================== 09:46:59 (1713534419) mgs.MGS.ir_timeout Failing ost1 on oleg326-server Stopping /mnt/lustre-ost1 (opts:) on oleg326-server 09:47:02 (1713534422) shut down Failover ost1 to oleg326-server mount facets: ost1 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-OST0000 09:47:16 (1713534436) targets are mounted 09:47:16 (1713534436) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid in FULL state after 0 sec PASS 101a (23s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 101b: IR: Make sure IR works w/o normal recovery and proceed EAGAIN ========================================================== 09:47:24 (1713534444) mgs.MGS.ir_timeout fail_loc=0x247 Failing ost1 on oleg326-server Stopping /mnt/lustre-ost1 (opts:) on oleg326-server 09:47:27 (1713534447) shut down Failover ost1 to oleg326-server mount facets: ost1 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-OST0000 09:48:07 (1713534487) targets are mounted 09:48:07 (1713534487) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid in FULL state after 0 sec PASS 101b (48s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 102: IR: New client gets updated nidtbl after MGS restart ========================================================== 09:48:14 (1713534494) mgs.MGS.ir_timeout Failing ost1 on oleg326-server Stopping /mnt/lustre-ost1 (opts:) on oleg326-server 09:48:17 (1713534497) shut down Failover ost1 to oleg326-server mount facets: ost1 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-OST0000 09:48:31 (1713534511) targets are mounted 09:48:31 (1713534511) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid in FULL state after 0 sec Stopping clients: oleg326-client.virtnet /mnt/lustre (opts:) Stopping client oleg326-client.virtnet /mnt/lustre opts: Stopping /mnt/lustre-mds1 (opts:) on oleg326-server Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 Failing ost1 on oleg326-server Stopping /mnt/lustre-ost1 (opts:) on oleg326-server 09:48:43 (1713534523) shut down Failover ost1 to oleg326-server mount facets: ost1 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-OST0000 09:48:57 (1713534537) targets are mounted 09:48:57 (1713534537) facet_failover done pdsh@oleg326-client: oleg326-client: ssh exited with exit code 95 oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid pdsh@oleg326-client: oleg326-client: ssh exited with exit code 95 Starting client oleg326-client.virtnet: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre Started clients oleg326-client.virtnet: 192.168.203.126@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project) PASS 102 (48s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 103: IR: MDS can start w/o MGS and get updated nidtbl later ========================================================== 09:49:04 (1713534544) mgs.MGS.ir_timeout SKIP: recovery-small test_103 needs separate mgs and mds SKIP 103 (2s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 104: IR: ost can disable IR voluntarily ========================================================== 09:49:08 (1713534548) mgs.MGS.ir_timeout Stopping /mnt/lustre-ost1 (opts:) on oleg326-server Starting ost1: -o localrecov -onoir /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-OST0000 PASS 104 (11s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 105: IR: NON IR clients support === 09:49:21 (1713534561) SKIP: recovery-small test_105 Needs multiple clients SKIP 105 (2s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 106: lightweight connection support ========================================================== 09:49:25 (1713534565) fail_loc=0x805 Starting client: oleg326-client.virtnet: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre2 fail_loc=0 UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 1414116 2512 1285176 1% /mnt/lustre[MDT:0] lustre-MDT0001_UUID 1414116 4884 1282804 1% /mnt/lustre[MDT:1] lustre-OST0000_UUID 3833116 1944 3605076 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3833116 2640 3604380 1% /mnt/lustre[OST:1] filesystem_summary: 7666232 4584 7209456 1% /mnt/lustre debug=ha Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 09:49:30 (1713534570) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 09:49:46 (1713534586) targets are mounted 09:49:46 (1713534586) facet_failover done Debug log: 128 lines, 128 kept, 0 dropped, 0 bad. /mnt/lustre/f106.recovery-small has type file OK 192.168.203.126@tcp:/lustre /mnt/lustre2 lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg326-client.virtnet /mnt/lustre2 (opts:) debug=trace inode super iotrace malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec lfsck hsm snapshot layout debug=trace inode super iotrace malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec lfsck hsm snapshot layout PASS 106 (26s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 107: drop reint reply, then restart MDT ========================================================== 09:49:53 (1713534593) fail_loc=0x119 fail_loc=0 Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 09:49:56 (1713534596) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 09:50:10 (1713534610) targets are mounted 09:50:10 (1713534610) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec PASS 107 (25s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 108: client eviction don't crash == 09:50:20 (1713534620) dd: error writing '/mnt/lustre/d108.recovery-small/f108.recovery-small': Input/output error 32+0 records in 31+0 records out 33480704 bytes (33 MB) copied, 5.53691 s, 6.0 MB/s PASS 108 (9s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 110a: create remote directory: drop client req ========================================================== 09:50:31 (1713534631) fail_val=0 fail_loc=0x123 fail_loc=0 PASS 110a (64s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 110b: create remote directory: drop Master rep ========================================================== 09:51:37 (1713534697) fail_loc=0x119 fail_loc=0 PASS 110b (63s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 110c: create remote directory: drop update rep on slave MDT ========================================================== 09:52:42 (1713534762) fail_loc=0x1701 fail_loc=0 PASS 110c (19s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 110d: remove remote directory: drop client req ========================================================== 09:53:03 (1713534783) fail_val=0 fail_loc=0x123 fail_loc=0 PASS 110d (63s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 110e: remove remote directory: drop master rep ========================================================== 09:54:09 (1713534849) fail_loc=0x119 fail_loc=0 PASS 110e (63s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 110f: remove remote directory: drop slave rep ========================================================== 09:55:14 (1713534914) fail_loc=0x1701 fail_loc=0 PASS 110f (19s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 110g: drop reply during migration ========================================================== 09:55:35 (1713534935) fail_loc=0x119 fail_loc=0 PASS 110g (63s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 110h: drop update reply during cross-MDT file rename ========================================================== 09:56:40 (1713535000) 0+1 records in 0+1 records out 159 bytes (159 B) copied, 0.0104925 s, 15.2 kB/s fail_loc=0x1701 fail_loc=0 Can't lstat /mnt/lustre/d110h.recovery-small/source_dir/src_file: No such file or directory PASS 110h (19s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 110i: drop update reply during cross-MDT dir rename ========================================================== 09:57:01 (1713535021) fail_loc=0x1701 fail_loc=0 Can't lstat /mnt/lustre/d110i.recovery-small/source_dir/src_dir: No such file or directory /mnt/lustre/d110i.recovery-small/target_dir/tgt_dir has type dir OK /mnt/lustre/d110i.recovery-small/target_dir/tgt_dir/a has type file OK PASS 110i (19s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 110j: drop update reply during cross-MDT ln ========================================================== 09:57:22 (1713535042) fail_loc=0x1701 fail_loc=0 /mnt/lustre/d110j.recovery-small/remote_dir/remote_file has type file OK PASS 110j (19s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 110k: FID_QUERY failed during recovery ========================================================== 09:57:43 (1713535063) Stopping /mnt/lustre-mds2 (opts:) on oleg326-server fail_loc=0x80001103 Starting mds2: -o localrecov -o abort_recovery /dev/mapper/mds2_flakey /mnt/lustre-mds2 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0001 fail_loc=0 Stopping /mnt/lustre-mds2 (opts:) on oleg326-server Starting mds2: -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0001 Starting client: oleg326-client.virtnet: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre PASS 110k (24s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 110m: update resent vs original RPC race ========================================================== 09:58:09 (1713535089) fail_loc=0x80000525 conn_uuid=192.168.203.126@tcp osp.lustre-MDT0000-osp-MDT0001.import=connection=192.168.203.126@tcp PASS 110m (12s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 111: mdd setup fail should not cause umount oops ========================================================== 09:58:23 (1713535103) fail_loc=0x151 Stopping /mnt/lustre-mds1 (opts:) on oleg326-server Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: mount.lustre: mount /dev/mapper/mds1_flakey at /mnt/lustre-mds1 failed: Input/output error oleg326-server: Is the MGS running? pdsh@oleg326-client: oleg326-server: ssh exited with exit code 5 Start of /dev/mapper/mds1_flakey on mds1 failed 5 fail_loc=0 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 PASS 111 (11s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 112a: bulk resend while orignal request is in progress ========================================================== 09:58:36 (1713535116) timeout is 20/20 fail_val=20 fail_loc=0x80000214 fail_loc=0 Filesystem 1K-blocks Used Available Use% Mounted on 192.168.203.126@tcp:/lustre 7666232 4588 7209452 1% /mnt/lustre PASS 112a (24s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 115a: read: late REQ MDunlink and no bulk ========================================================== 09:59:02 (1713535142) Filesystem 1K-blocks Used Available Use% Mounted on 192.168.203.126@tcp:/lustre 7666232 4688 7209228 1% /mnt/lustre 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00983615 s, 416 kB/s fail_loc=0x8000051b fail_val=3 fail_val=0 fail_loc=0x8000051a 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 4.01021 s, 1.0 kB/s PASS 115a (5s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 115b: write: late REQ MDunlink and no bulk ========================================================== 09:59:09 (1713535149) Filesystem 1K-blocks Used Available Use% Mounted on 192.168.203.126@tcp:/lustre 7666232 4592 7209420 1% /mnt/lustre fail_loc=0x8000051b fail_val=4 Filesystem 1K-blocks Used Available Use% Mounted on 192.168.203.126@tcp:/lustre 7666232 4592 7209420 1% /mnt/lustre fail_val=2 fail_loc=0x80000215 dd: error writing '/mnt/lustre/f115b.recovery-small': No space left on device 1+0 records in 0+0 records out 0 bytes (0 B) copied, 4.01382 s, 0.0 kB/s PASS 115b (5s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 115c: read: late Reply MDunlink and no bulk ========================================================== 09:59:16 (1713535156) Filesystem 1K-blocks Used Available Use% Mounted on 192.168.203.126@tcp:/lustre 7666232 4592 7209448 1% /mnt/lustre 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00471751 s, 868 kB/s fail_loc=0x8000050f fail_val=3 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00594943 s, 688 kB/s fail_val=0 fail_loc=0x8000051a PASS 115c (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 115d: write: late Reply MDunlink and no bulk ========================================================== 09:59:21 (1713535161) Filesystem 1K-blocks Used Available Use% Mounted on 192.168.203.126@tcp:/lustre 7666232 4596 7209416 1% /mnt/lustre fail_loc=0x8000050f fail_val=4 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.0148851 s, 275 kB/s Filesystem 1K-blocks Used Available Use% Mounted on 192.168.203.126@tcp:/lustre 7666232 4596 7209416 1% /mnt/lustre fail_val=0 fail_loc=0x80000215 PASS 115d (2s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 115e: read: late Bulk MDunlink and no reply ========================================================== 09:59:25 (1713535165) Filesystem 1K-blocks Used Available Use% Mounted on 192.168.203.126@tcp:/lustre 7666232 4596 7209416 1% /mnt/lustre 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00422186 s, 970 kB/s fail_loc=0x80000510 fail_val=3 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00423719 s, 967 kB/s fail_val=0 fail_loc=0x80000211 PASS 115e (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 115f: read: late REQ MDunlink and no reply ========================================================== 09:59:30 (1713535170) Filesystem 1K-blocks Used Available Use% Mounted on 192.168.203.126@tcp:/lustre 7666232 4600 7209412 1% /mnt/lustre 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00484222 s, 846 kB/s fail_loc=0x8000051b fail_val=3 fail_val=0 fail_loc=0x80000211 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 4.00735 s, 1.0 kB/s PASS 115f (5s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 115g: read: late REQ MDunlink and Reply MDunlink ========================================================== 09:59:37 (1713535177) Filesystem 1K-blocks Used Available Use% Mounted on 192.168.203.126@tcp:/lustre 7666232 4608 7209432 1% /mnt/lustre 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00380869 s, 1.1 MB/s fail_loc=0x8000051c fail_val=3 fail_val=0 fail_loc=0 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 59.0115 s, 0.1 kB/s PASS 115g (60s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 120: flock race: completion vs. evict ========================================================== 10:00:39 (1713535239) ** FLOCK REPLY vs. EVICTION race, lock set, CLEANUP cp first fail_loc=0x80000320 FLOCKS_TEST 5: SET write flock ** Evicting and re-connecting client ** /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Input/output error Connected clients: oleg326-client.virtnet fcntl cmd 7 failed: Input/output error exiting with rc = 5 fail_loc=0 ** FLOCK REPLY vs. EVICTION race, lock get, CLEANUP cp first ** Taking conflict ** FLOCKS_TEST 5: SET read flock fail_loc=0x80000320 FLOCKS_TEST 5: GET write flock ** Evicting and re-connecting client ** /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Input/output error Connected clients: oleg326-client.virtnet fail_loc=0 ** FLOCK REPLY vs. EVICTION race, lock unlock, CLEANUP cp first fail_loc=0x80000320 FLOCKS_TEST 5: UNLOCK write flock ** Evicting and re-connecting client ** /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Input/output error Connected clients: oleg326-client.virtnet fail_loc=0 ** FLOCK REPLY vs. EVICTION race, lock set, REPLY cp first fail_loc=0x80000321 FLOCKS_TEST 5: SET write flock ** Evicting and re-connecting client ** /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Input/output error error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Cannot send after transport endpoint shutdown error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown fcntl cmd 7 failed: Input/output error exiting with rc = 5 /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Cannot send after transport endpoint shutdown error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Cannot send after transport endpoint shutdown error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown Connected clients: oleg326-client.virtnet oleg326-client.virtnet fail_loc=0 ** FLOCK REPLY vs. EVICTION race, lock get, REPLY cp first ** Taking conflict ** FLOCKS_TEST 5: SET read flock fail_loc=0x80000321 FLOCKS_TEST 5: GET write flock ** Evicting and re-connecting client ** /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Input/output error Connected clients: oleg326-client.virtnet fail_loc=0 ** FLOCK REPLY vs. EVICTION race, lock unlock, REPLY cp first fail_loc=0x80000321 FLOCKS_TEST 5: UNLOCK write flock ** Evicting and re-connecting client ** /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Input/output error error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown Connected clients: oleg326-client.virtnet fail_loc=0 ** FLOCK REPLY vs. EVICTION race, lock set DEADLOCK, CLEANUP cp first fail_loc=0x80000322 FLOCKS_TEST 5: SET write flock ** Evicting and re-connecting client ** /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Input/output error Connected clients: oleg326-client.virtnet fcntl cmd 7 failed: Input/output error exiting with rc = 5 fail_loc=0 ** FLOCK REPLY vs. EVICTION race, lock set DEADLOCK, REPLY cp first fail_loc=0x80000323 FLOCKS_TEST 5: SET write flock ** Evicting and re-connecting client ** /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Input/output error error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Cannot send after transport endpoint shutdown error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown fcntl cmd 7 failed: Resource deadlock avoided exiting with rc = 35 /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Cannot send after transport endpoint shutdown error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Cannot send after transport endpoint shutdown error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown Connected clients: oleg326-client.virtnet oleg326-client.virtnet fail_loc=0 FLOCKS_TEST 5: SET write flock fail_loc=0x320 /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Input/output error error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Cannot send after transport endpoint shutdown error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Cannot send after transport endpoint shutdown error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Cannot send after transport endpoint shutdown error: invalid path '/mnt/lustre': Cannot send after transport endpoint shutdown bash: /mnt/lustre/recon: Cannot send after transport endpoint shutdown cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown rm: cannot remove '/mnt/lustre/recon': Cannot send after transport endpoint shutdown Connected clients: oleg326-client.virtnet oleg326-client.virtnet fail_loc=0 PASS 120 (62s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 113: ldlm enqueue dropped reply should not cause deadlocks ========================================================== 10:01:43 (1713535303) fail_loc=0x80000157 File: '/mnt/lustre/f113.recovery-small' Size: 0 Blocks: 0 IO Block: 4194304 regular empty file Device: 2c54f966h/743766374d Inode: 144116010713088002 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-19 10:01:43.000000000 -0400 Modify: 2024-04-19 10:01:43.000000000 -0400 Change: 2024-04-19 10:01:43.000000000 -0400 Birth: - fail_loc=0 fail_loc=0x8000031f Connected clients: oleg326-client.virtnet oleg326-client.virtnet PASS 113 (67s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 130a: enqueue resend on not existing file ========================================================== 10:02:52 (1713535372) striped dir -i0 -c1 -H crush /mnt/lustre/d130a.recovery-small fail_val=0 fail_loc=0x80000160 fail_val=0 fail_loc=0x80000157 stat: cannot stat '/mnt/lustre/d130a.recovery-small': No such file or directory PASS 130a (63s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 130b: enqueue resend on a stale inode ========================================================== 10:03:57 (1713535437) striped dir -i0 -c1 -H all_char /mnt/lustre/d130b.recovery-small fail_val=0 fail_loc=0x80000160 fail_val=0 fail_loc=0x80000157 fail_val=0 fail_loc=0x80000217 stat: cannot stat '/mnt/lustre/d130b.recovery-small': No such file or directory PASS 130b (63s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 130c: layout intent resend on a stale inode ========================================================== 10:05:02 (1713535502) fail_val=0 fail_loc=0x80000160 fail_val=0 fail_loc=0x80000157 fail_val=0 fail_loc=0x80000217 PASS 130c (26s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 132: long punch =================== 10:05:30 (1713535530) Starting client: oleg326-client.virtnet: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre2 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.0181525 s, 226 kB/s fail_val=120 fail_loc=0x80000236 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 119.016 s, 0.0 kB/s 192.168.203.126@tcp:/lustre /mnt/lustre2 lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg326-client.virtnet /mnt/lustre2 (opts:) PASS 132 (123s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 131: IO vs evict results to IO under staled lock ========================================================== 10:07:35 (1713535655) 1+0 records in 1+0 records out 512 bytes (512 B) copied, 0.00504068 s, 102 kB/s fail_loc=0x80000414 fail_val=4 fail_loc=0x80000414 fail_val=0 fail_loc=0x8000031e uname: write error: Input/output error uname: write error: Input/output error cat: /mnt/lustre/recon: Cannot send after transport endpoint shutdown Connected clients: ls: cannot access /mnt/lustre/recon: Cannot send after transport endpoint shutdown dd: writing to '/mnt/lustre/f131.recovery-small': Cannot send after transport endpoint shutdown 1+0 records in 0+0 records out 0 bytes (0 B) copied, 4.02708 s, 0.0 kB/s PASS 131 (7s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 133: don't fail on flock resend === 10:07:44 (1713535664) bl lock multiop /mnt/lustre/f133.recovery-small vO_jc TMPPIPE=/tmp/multiop_open_wait_pipe.7360 fail_loc=0x80000157 waiting for multiop 15136 bl flock unlocked PASS 133 (42s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 134: race between failover and search for reply data free slot ========================================================== 10:08:28 (1713535708) SKIP: recovery-small test_134 Need 2+ clients, have 1 SKIP 134 (2s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 135: DOM: open/create resend to return size ========================================================== 10:08:32 (1713535712) fail_loc=0x80000157 Succeed in opening file "/mnt/lustre/d135.recovery-small/f135.recovery-small"(flags=O_RDWR, mode=755) PASS 135 (25s) debug_raw_pointers=0 debug_raw_pointers=0 SKIP: recovery-small test_136 skipping excluded test 136 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 137: late resend must be skipped if already applied ========================================================== 10:08:59 (1713535739) Filesystem 1K-blocks Used Available Use% Mounted on 192.168.203.126@tcp:/lustre 7666232 4616 7209424 1% /mnt/lustre 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.0172893 s, 237 kB/s 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00744551 s, 550 kB/s fail_loc=0x80000525 PASS 137 (23s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 138: Umount MDT during recovery === 10:09:25 (1713535765) Stopping clients: oleg326-client.virtnet /mnt/lustre (opts:) Stopping client oleg326-client.virtnet /mnt/lustre opts: fail_loc=0x724 fail_val=5 Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 10:09:37 (1713535777) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 10:09:51 (1713535791) targets are mounted 10:09:51 (1713535791) facet_failover done Stopping /mnt/lustre-mds1 (opts:) on oleg326-server fail_loc=0 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 Starting client oleg326-client.virtnet: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre Started clients oleg326-client.virtnet: 192.168.203.126@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project) PASS 138 (106s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 139: corrupted catid won't cause crash ========================================================== 10:11:12 (1713535872) Stopping /mnt/lustre-mds1 (opts:) on oleg326-server fail_val=0x68 fail_loc=0x80002106 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 PASS 139 (8s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 140a: local mount is flagged properly ========================================================== 10:11:22 (1713535882) mdt.lustre-MDT0000.local_recovery=0 mdt.lustre-MDT0001.local_recovery=0 oleg326-server Starting client: oleg326-server: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre2 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all 1 clients with recovery disabled Stopping client oleg326-server /mnt/lustre2 (opts:) pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 mdt.lustre-MDT0000.local_recovery=1 mdt.lustre-MDT0001.local_recovery=1 oleg326-server Starting client: oleg326-server: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre2 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all 0 clients with recovery disabled Stopping client oleg326-server /mnt/lustre2 (opts:) pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 mdt.lustre-MDT0000.local_recovery=1 mdt.lustre-MDT0001.local_recovery=1 PASS 140a (10s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 140b: local mount is excluded from recovery ========================================================== 10:11:34 (1713535894) mdt.lustre-MDT0000.local_recovery=0 mdt.lustre-MDT0001.local_recovery=0 oleg326-server Starting client: oleg326-server: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre2 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 1414116 2660 1285028 1% /mnt/lustre[MDT:0] lustre-MDT0001_UUID 1414116 2240 1285448 1% /mnt/lustre[MDT:1] lustre-OST0000_UUID 3833116 1960 3605060 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3833116 2664 3604356 1% /mnt/lustre[OST:1] filesystem_summary: 7666232 4624 7209416 1% /mnt/lustre Stopping client oleg326-server /mnt/lustre2 (opts:) pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 10:11:42 (1713535902) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 10:11:58 (1713535918) targets are mounted 10:11:58 (1713535918) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec mdt.lustre-MDT0000.local_recovery=1 mdt.lustre-MDT0001.local_recovery=1 PASS 140b (32s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 141: do not lose locks on MGS restart ========================================================== 10:12:08 (1713535928) SKIP: recovery-small test_141 cannot run in local mode or from build tree SKIP 141 (2s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 142: orphan name stub can be cleaned up in startup ========================================================== 10:12:12 (1713535932) fail_loc=0x165 Stopping /mnt/lustre-mds1 (opts:) on oleg326-server Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 PASS 142 (9s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 143: orphan cleanup thread shouldn't be blocked even delete failed ========================================================== 10:12:23 (1713535943) Stopping /mnt/lustre-mds1 (opts:) on oleg326-server Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 affected facets: mds1 oleg326-server: oleg326-server.virtnet: executing _wait_recovery_complete *.lustre-MDT0000.recovery_status 1475 oleg326-server: *.lustre-MDT0000.recovery_status status: WAITING_FOR_CLIENTS oleg326-server: Waiting 1470 secs for *.lustre-MDT0000.recovery_status recovery done. status: WAITING_FOR_CLIENTS oleg326-server: *.lustre-MDT0000.recovery_status status: COMPLETE PASS 143 (16s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 144a: MDT failover should stop precreation threads ========================================================== 10:12:42 (1713535962) striped dir -i0 -c1 -H crush /mnt/lustre/d144a.recovery-small timeout=300 Failing ost1 on oleg326-server Stopping /mnt/lustre-ost1 (opts:) on oleg326-server 10:12:45 (1713535965) shut down Failover ost1 to oleg326-server mount facets: ost1 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-OST0000 10:12:59 (1713535979) targets are mounted 10:12:59 (1713535979) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid in FULL state after 0 sec Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 10:14:05 (1713536045) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 10:14:19 (1713536059) targets are mounted 10:14:19 (1713536059) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 10:14:26 (1713536066) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 10:14:40 (1713536080) targets are mounted 10:14:40 (1713536080) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec timeout=20 PASS 144a (137s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 144b: orphan cleanup shouldn't be blocked for no objects+failover situation ========================================================== 10:15:01 (1713536101) striped dir -i0 -c1 -H crush /mnt/lustre/d144b.recovery-small timeout=300 Failing ost1 on oleg326-server Stopping /mnt/lustre-ost1 (opts:) on oleg326-server 10:15:05 (1713536105) shut down Failover ost1 to oleg326-server mount facets: ost1 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-OST0000 10:15:21 (1713536121) targets are mounted 10:15:21 (1713536121) facet_failover done oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid in FULL state after 0 sec touch: cannot touch '/mnt/lustre/d144b.recovery-small/28': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/29': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/30': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/31': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/32': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/33': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/34': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/35': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/36': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/37': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/38': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/42': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/41': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/39': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/44': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/40': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/43': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/45': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/46': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/47': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/48': File too large touch: cannot touch '/mnt/lustre/d144b.recovery-small/49': File too large rc 0 timeout=20 PASS 144b (92s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 144c: reconnection during orphan cleanup shouldn't lose LAST_ID synchronization ========================================================== 10:16:35 (1713536195) - open/close 2055 (time 1713536206.78 total 10.00 last 205.41) - open/close 4099 (time 1713536216.78 total 20.01 last 204.38) - open/close 5980 (time 1713536226.78 total 30.01 last 188.06) - open/close 7731 (time 1713536236.78 total 40.01 last 175.07) total: 9000 open/close in 47.58 seconds: 189.17 ops/second Stopping /mnt/lustre-mds1 (opts:) on oleg326-server fail_loc=0x0000254 fail_val=5 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 affected facets: mds1 oleg326-server: oleg326-server.virtnet: executing _wait_recovery_complete *.lustre-MDT0000.recovery_status 1475 oleg326-server: *.lustre-MDT0000.recovery_status status: WAITING_FOR_CLIENTS oleg326-server: Waiting 1470 secs for *.lustre-MDT0000.recovery_status recovery done. status: WAITING_FOR_CLIENTS oleg326-server: *.lustre-MDT0000.recovery_status status: COMPLETE fail_loc=0 fail_val=0 PASS 144c (77s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 145: connect mdtlovs and process update logs after recovery expire ========================================================== 10:17:55 (1713536275) SKIP: recovery-small test_145 needs >= 3 MDTs SKIP 145 (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 146: test eviction is counted properly ========================================================== 10:17:58 (1713536278) /home/green/git/lustre-release/lustre/tests/test-framework.sh: line 3963: /mnt/lustre/recon: Input/output error Connected clients: oleg326-client.virtnet PASS 146 (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 147: Check client reconnect ======= 10:18:03 (1713536283) timeout=200 fail_loc=0x00000225 timeout=200 osc.lustre-OST0000-osc-ffff88013731c000.state= current_state: CONNECTING state_history: - [ 1713536148, CONNECTING ] - [ 1713536148, IDLE ] - [ 1713536279, CONNECTING ] - [ 1713536279, FULL ] - [ 1713536285, DISCONN ] - [ 1713536285, CONNECTING ] - [ 1713536300, DISCONN ] - [ 1713536300, CONNECTING ] - [ 1713536320, DISCONN ] - [ 1713536320, CONNECTING ] - [ 1713536345, DISCONN ] - [ 1713536345, CONNECTING ] - [ 1713536375, DISCONN ] - [ 1713536375, CONNECTING ] - [ 1713536410, DISCONN ] - [ 1713536410, CONNECTING ] 6 fail_loc=0 timeout=20 timeout=20 Connected clients: oleg326-client.virtnet oleg326-client.virtnet PASS 147 (167s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 148: data corruption through resend ========================================================== 10:20:52 (1713536452) at_max=0 at_max=0 oleg326-server: error: get_param: param_path 'obdfilter/lustre-OST0000/writethrough_cache_enable': No such file or directory oleg326-server: error: set_param: param_path 'obdfilter/lustre-OST0000/writethrough_cache_enable': No such file or directory oleg326-server: error: set_param: setting 'obdfilter/lustre-OST0000/writethrough_cache_enable'='0': No such file or directory pdsh@oleg326-client: oleg326-server: ssh exited with exit code 2 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.0124293 s, 330 kB/s fail_loc=0x80000227 fail_val=27 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 27.1299 s, 0.2 kB/s 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00973813 s, 421 kB/s at_max=600 at_max=600 oleg326-server: error: set_param: setting : Invalid argument pdsh@oleg326-client: oleg326-server: ssh exited with exit code 22 PASS 148 (34s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 149: skip orphan removal at umount ========================================================== 10:21:29 (1713536489) striped dir -i0 -c2 -H crush2 /mnt/lustre/d149.recovery-small Stopping /mnt/lustre-mds2 (opts:-f) on oleg326-server Stopping /mnt/lustre-mds1 (opts:-f) on oleg326-server Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 Starting mds2: -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0001 PASS 149 (29s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 150: statfs when MDT0 offline with lazystatfs option ========================================================== 10:22:00 (1713536520) llite.lustre-ffff88013731c000.lazystatfs=1 llite.lustre-ffff88013731c000.statahead_max=0 Stopping /mnt/lustre-mds1 (opts:) on oleg326-server Filesystem 1K-blocks Used Available Use% Mounted on 192.168.203.126@tcp:/lustre 7666232 5936 7208104 1% /mnt/lustre Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 affected facets: mds1 oleg326-server: oleg326-server.virtnet: executing _wait_recovery_complete *.lustre-MDT0000.recovery_status 1475 oleg326-server: *.lustre-MDT0000.recovery_status status: WAITING_FOR_CLIENTS oleg326-server: Waiting 1470 secs for *.lustre-MDT0000.recovery_status recovery done. status: WAITING_FOR_CLIENTS oleg326-server: *.lustre-MDT0000.recovery_status status: COMPLETE llite.lustre-ffff88013731c000.statahead_max=128 llite.lustre-ffff88013731c000.lazystatfs=1 PASS 150 (15s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 152: QoS object allocation could be awakened in case of OST failover ========================================================== 10:22:17 (1713536537) SKIP: recovery-small test_152 MDS Linux kernel does not support killable semaphore SKIP 152 (2s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 153: evict vs reconnect race ====== 10:22:21 (1713536541) fail_loc=0x174 Stopping /mnt/lustre-mds1 (opts:) on oleg326-server fail_loc=0 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 affected facets: mds1 oleg326-server: oleg326-server.virtnet: executing _wait_recovery_complete *.lustre-MDT0000.recovery_status 1475 oleg326-server: *.lustre-MDT0000.recovery_status status: RECOVERING oleg326-server: Waiting 1470 secs for *.lustre-MDT0000.recovery_status recovery done. status: RECOVERING oleg326-server: *.lustre-MDT0000.recovery_status status: COMPLETE PASS 153 (38s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 154a: corruption update llog can be skipped ========================================================== 10:23:01 (1713536581) Stopping /mnt/lustre-mds2 (opts:) on oleg326-server replace file /mnt/lustre-mds2/update_log_dir/[0x240000409:0x1:0x0] oleg326-server: 100+0 records in oleg326-server: 100+0 records out oleg326-server: 51200 bytes (51 kB) copied, 0.00562963 s, 9.1 MB/s Starting mds2: -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0001 Stopping /mnt/lustre-mds1 (opts:) on oleg326-server Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 waiting mds1 recovery.... affected facets: mds1 oleg326-server: oleg326-server.virtnet: executing _wait_recovery_complete *.lustre-MDT0000.recovery_status 20 oleg326-server: *.lustre-MDT0000.recovery_status status: WAITING oleg326-server: Waiting 15 secs for *.lustre-MDT0000.recovery_status recovery done. status: WAITING oleg326-server: *.lustre-MDT0000.recovery_status status: COMPLETE status: COMPLETE PASS 154a (22s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 154b: restore update llog after failed recovery ========================================================== 10:23:25 (1713536605) Stopping /mnt/lustre-mds1 (opts:) on oleg326-server fail_loc=0x0724 fail_val=5 Starting mds1: -o localrecov -o abort_recov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 waiting mds1 recovery.... affected facets: mds1 oleg326-server: oleg326-server.virtnet: executing _wait_recovery_complete *.lustre-MDT0000.recovery_status 30 oleg326-server: *.lustre-MDT0000.recovery_status status: COMPLETE status: COMPLETE Starting client: oleg326-client.virtnet: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre2 192.168.203.126@tcp:/lustre /mnt/lustre2 lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg326-client.virtnet /mnt/lustre2 (opts:) Stopping client oleg326-client.virtnet /mnt/lustre (opts:) Starting client: oleg326-client.virtnet: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre fail_loc=0 PASS 154b (17s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 155: failover after client remount ========================================================== 10:23:44 (1713536624) Stopping client oleg326-client.virtnet /mnt/lustre (opts:) Starting client: oleg326-client.virtnet: -o user_xattr,flock oleg326-server@tcp:/lustre /mnt/lustre Replay barrier on lustre-MDT0000 Failing mds1 on oleg326-server Stopping /mnt/lustre-mds1 (opts:) on oleg326-server 10:23:49 (1713536629) shut down Failover mds1 to oleg326-server mount facets: mds1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-MDT0000 10:24:05 (1713536645) targets are mounted 10:24:05 (1713536645) facet_failover done PASS 155 (28s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 156: tot_granted miscount after client eviction ========================================================== 10:24:15 (1713536655) UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 1414116 3708 1283980 1% /mnt/lustre[MDT:0] lustre-MDT0001_UUID 1414116 2224 1285464 1% /mnt/lustre[MDT:1] lustre-OST0000_UUID 3833116 2624 3604396 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3833116 3312 3603708 1% /mnt/lustre[OST:1] filesystem_summary: 7666232 5936 7208104 1% /mnt/lustre UUID 1K-blocks Used Available Use% Mounted on lustre-MDT0000_UUID 1414116 3708 1283980 1% /mnt/lustre[MDT:0] lustre-MDT0001_UUID 1414116 2224 1285464 1% /mnt/lustre[MDT:1] lustre-OST0000_UUID 3833116 2624 3604396 1% /mnt/lustre[OST:0] lustre-OST0001_UUID 3833116 3312 3603708 1% /mnt/lustre[OST:1] filesystem_summary: 7666232 5936 7208104 1% /mnt/lustre fail_loc=0x80000536 fail_val=45 Failing ost1 on oleg326-server Stopping /mnt/lustre-ost1 (opts:) on oleg326-server 10:24:21 (1713536661) shut down Failover ost1 to oleg326-server mount facets: ost1 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg326-server: oleg326-server.virtnet: executing set_default_debug -1 all pdsh@oleg326-client: oleg326-server: ssh exited with exit code 1 Started lustre-OST0000 10:24:37 (1713536677) targets are mounted 10:24:37 (1713536677) facet_failover done pdsh@oleg326-client: oleg326-client: ssh exited with exit code 5 oleg326-client.virtnet: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid in FULL state after 0 sec PASS 156 (70s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == recovery-small test 157: eviction during mmaped i/o === 10:25:27 (1713536727) 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00473205 s, 866 kB/s fail_loc=0x80001432 fail_val=3 /home/green/git/lustre-release/lustre/tests/recovery-small.sh: line 3618: 12375 Bus error $MULTIOP $DIR/$tfile soO_RDWR:MRUc PASS 157 (5s) debug_raw_pointers=0 debug_raw_pointers=0 == recovery-small test complete, duration 5736 sec ======= 10:25:33 (1713536733) === recovery-small: start cleanup 10:25:34 (1713536734) === === recovery-small: finish cleanup 10:26:53 (1713536813) ===