-----============= acceptance-small: sanity-lfsck ============----- Wed Apr 17 04:56:00 EDT 2024
excepting tests: 23b
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
libkmod: kmod_module_get_holders: could not open '/sys/module/acpi_cpufreq/holders': No such file or directory
libkmod: kmod_module_get_holders: could not open '/sys/module/acpi_cpufreq/holders': No such file or directory
libkmod: kmod_module_get_holders: could not open '/sys/module/pcc_cpufreq/holders': No such file or directory
libkmod: kmod_module_get_holders: could not open '/sys/module/pcc_cpufreq/holders': No such file or directory
loading modules on: 'oleg204-server'
oleg204-server: oleg204-server.virtnet: executing load_modules_local
oleg204-server: Loading modules from /home/green/git/lustre-release/lustre
oleg204-server: detected 4 online CPUs by sysfs
oleg204-server: Force libcfs to create 2 CPU partitions
client=34553367 MDS=34553367 OSS=34553367
Stopping clients: oleg204-client.virtnet /mnt/lustre (opts:)
Stopping client oleg204-client.virtnet /mnt/lustre opts:
Stopping clients: oleg204-client.virtnet /mnt/lustre2 (opts:)
Stopping /mnt/lustre-mds1 (opts:-f) on oleg204-server
Stopping /mnt/lustre-ost1 (opts:-f) on oleg204-server
Stopping /mnt/lustre-ost2 (opts:-f) on oleg204-server
unloading modules on: 'oleg204-server'
oleg204-server: oleg204-server.virtnet: executing unload_modules_local
modules unloaded.
=== sanity-lfsck: start setup 04:56:46 (1713344206) ===
Stopping clients: oleg204-client.virtnet /mnt/lustre (opts:-f)
Stopping clients: oleg204-client.virtnet /mnt/lustre2 (opts:-f)
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 2
oleg204-server: oleg204-server.virtnet: executing set_hostid
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
../libcfs/libcfs/libcfs options: 'cpu_npartitions=2'
ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1'
quota/lquota options: 'hash_lqs_cur_bits=3'
loading modules on: 'oleg204-server'
oleg204-server: oleg204-server.virtnet: executing load_modules_local
oleg204-server: Loading modules from /home/green/git/lustre-release/lustre
oleg204-server: detected 4 online CPUs by sysfs
oleg204-server: Force libcfs to create 2 CPU partitions
oleg204-server: ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1'
oleg204-server: quota/lquota options: 'hash_lqs_cur_bits=3'
Formatting mgs, mds, osts
Format mds1: lustre-mdt1/mdt1
Format ost1: lustre-ost1/ost1
Format ost2: lustre-ost2/ost2
Checking servers environments
Checking clients oleg204-client.virtnet environments
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
libkmod: kmod_module_get_holders: could not open '/sys/module/pcc_cpufreq/holders': No such file or directory
loading modules on: 'oleg204-server'
oleg204-server: oleg204-server.virtnet: executing load_modules_local
oleg204-server: Loading modules from /home/green/git/lustre-release/lustre
oleg204-server: detected 4 online CPUs by sysfs
oleg204-server: Force libcfs to create 2 CPU partitions
Setup mgs, mdt, osts
Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
Commit the device label on lustre-mdt1/mdt1
Started lustre-MDT0000
Starting ost1: -o localrecov lustre-ost1/ost1 /mnt/lustre-ost1
seq.cli-lustre-OST0000-super.width=65536
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
Commit the device label on lustre-ost1/ost1
Started lustre-OST0000
Starting ost2: -o localrecov lustre-ost2/ost2 /mnt/lustre-ost2
seq.cli-lustre-OST0001-super.width=65536
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
Commit the device label on lustre-ost2/ost2
Started lustre-OST0001
Starting client: oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre
Starting client oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre
Started clients oleg204-client.virtnet: 192.168.202.104@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project)
Using TIMEOUT=20
osc.lustre-OST0000-osc-ffff8800b5c1b800.idle_timeout=debug
osc.lustre-OST0001-osc-ffff8800b5c1b800.idle_timeout=debug
setting jobstats to procname_uid
Setting lustre.sys.jobid_var from disable to procname_uid
Waiting 90s for 'procname_uid'
Updated after 2s: want 'procname_uid' got 'procname_uid'
disable quota as required
=== sanity-lfsck: finish setup 04:57:47 (1713344267) ===
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 0: Control LFSCK manually =========== 04:57:49 (1713344269)
preparing... 3 * 3 files will be created Wed Apr 17 04:57:49 EDT 2024.
total: 3 mkdir in 0.01 seconds: 446.42 ops/second
total: 3 create in 0.01 seconds: 504.39 ops/second
total: 3 mkdir in 0.00 seconds: 645.84 ops/second
prepared Wed Apr 17 04:57:50 EDT 2024.
fail_val=3
fail_loc=0x1600
Started LFSCK on the device lustre-MDT0000: scrub namespace
name: lfsck_namespace
magic: 0xa06249ff
version: 2
status: scanning-phase1
flags:
param:
last_completed_time: N/A
time_since_last_completed: N/A
latest_start_time: 1713344271
time_since_latest_start: 1 seconds
last_checkpoint_time: N/A
time_since_last_checkpoint: N/A
latest_start_position: 114, N/A, N/A
last_checkpoint_position: N/A, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 0
checked_phase2: 0
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 0
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 0
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
linkea_overflow_cleared: 0
agent_entries_repaired: 0
success_count: 0
run_time_phase1: 1 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 0 items/sec
average_speed_phase2: N/A
average_speed_total: 0 items/sec
real_time_speed_phase1: 0 items/sec
real_time_speed_phase2: N/A
current_position: 113, N/A, N/A
Stopped LFSCK on the device lustre-MDT0000.
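The block above is the namespace LFSCK status/statistics proc file, dumped while the scan was still in scanning-phase1. Outside the test harness roughly the same sequence can be driven by hand; a minimal sketch, assuming a single MDT named lustre-MDT0000 and that lctl is run on the MDS:

    # start a namespace scan on the MDT
    lctl lfsck_start -M lustre-MDT0000 -t namespace
    # dump the same status/statistics block shown above
    lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
    # stop the scan before it completes
    lctl lfsck_stop -M lustre-MDT0000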
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0
fail_val=0
Started LFSCK on the device lustre-MDT0000: scrub namespace
stopall, should NOT crash LU-3649
Stopping clients: oleg204-client.virtnet /mnt/lustre (opts:)
Stopping client oleg204-client.virtnet /mnt/lustre opts:
Stopping clients: oleg204-client.virtnet /mnt/lustre2 (opts:)
Stopping /mnt/lustre-mds1 (opts:-f) on oleg204-server
Stopping /mnt/lustre-ost1 (opts:-f) on oleg204-server
Stopping /mnt/lustre-ost2 (opts:-f) on oleg204-server
PASS 0 (20s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 1a: LFSCK can find out and repair crashed FID-in-dirent ========================================================== 04:58:11 (1713344291)
Checking servers environments
Checking clients oleg204-client.virtnet environments
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
loading modules on: 'oleg204-server'
oleg204-server: oleg204-server.virtnet: executing load_modules_local
oleg204-server: Loading modules from /home/green/git/lustre-release/lustre
oleg204-server: detected 4 online CPUs by sysfs
oleg204-server: Force libcfs to create 2 CPU partitions
oleg204-server: libkmod: kmod_module_get_holders: could not open '/sys/module/pcc_cpufreq/holders': No such file or directory
Setup mgs, mdt, osts
Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
Started lustre-MDT0000
Starting ost1: -o localrecov lustre-ost1/ost1 /mnt/lustre-ost1
seq.cli-lustre-OST0000-super.width=65536
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
Started lustre-OST0000
Starting ost2: -o localrecov lustre-ost2/ost2 /mnt/lustre-ost2
seq.cli-lustre-OST0001-super.width=65536
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
Started lustre-OST0001
Starting client: oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre
Starting client oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre
Started clients oleg204-client.virtnet: 192.168.202.104@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project)
Using TIMEOUT=20
osc.lustre-OST0000-osc-ffff8800b6027000.idle_timeout=debug
osc.lustre-OST0001-osc-ffff8800b6027000.idle_timeout=debug
disable quota as required
preparing... 1 * 1 files will be created Wed Apr 17 04:58:38 EDT 2024.
total: 1 mkdir in 0.00 seconds: 457.44 ops/second
total: 1 create in 0.00 seconds: 492.12 ops/second
total: 1 mkdir in 0.00 seconds: 563.07 ops/second
prepared Wed Apr 17 04:58:39 EDT 2024.
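Each repair scenario below is produced by arming an OBD failure-injection point on the server before the corrupting operation runs; the bare fail_loc=... lines in the log are the echoes of those set_param calls. A minimal sketch of the pattern, assuming the standard test-framework helper do_facet and using the 0x1501 value that appears next in this log:

    # arm a failure-injection point on the MDS
    do_facet mds1 "lctl set_param fail_loc=0x1501"
    # ... perform the operation whose on-disk result should be corrupted ...
    # disarm the failure point again
    do_facet mds1 "lctl set_param fail_loc=0"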
fail_loc=0x1501
fail_loc=0
192.168.202.104@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0
Stopping client oleg204-client.virtnet /mnt/lustre (opts:)
Started LFSCK on the device lustre-MDT0000: scrub namespace
Starting client: oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre
fail_loc=0x1505
fail_loc=0
PASS 1a (31s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 1b: LFSCK can find out and repair the missing FID-in-LMA ========================================================== 04:58:44 (1713344324)
SKIP: sanity-lfsck test_1b OI Scrub not implemented for ZFS
SKIP 1b (1s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 1c: LFSCK can find out and repair lost FID-in-dirent ========================================================== 04:58:46 (1713344326)
preparing... 1 * 1 files will be created Wed Apr 17 04:58:47 EDT 2024.
total: 1 mkdir in 0.00 seconds: 307.68 ops/second
total: 1 create in 0.00 seconds: 388.94 ops/second
total: 1 mkdir in 0.00 seconds: 351.25 ops/second
prepared Wed Apr 17 04:58:49 EDT 2024.
fail_loc=0x1504
fail_loc=0
192.168.202.104@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0
Stopping client oleg204-client.virtnet /mnt/lustre (opts:)
Started LFSCK on the device lustre-MDT0000: scrub namespace
Starting client: oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre
fail_loc=0x1505
fail_loc=0
PASS 1c (6s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 2a: LFSCK can find out and repair crashed linkEA entry ========================================================== 04:58:54 (1713344334)
preparing... 1 * 1 files will be created Wed Apr 17 04:58:55 EDT 2024.
total: 1 mkdir in 0.00 seconds: 343.99 ops/second
total: 1 create in 0.00 seconds: 450.27 ops/second
total: 1 mkdir in 0.00 seconds: 329.38 ops/second
prepared Wed Apr 17 04:58:56 EDT 2024.
fail_loc=0x1603
fail_loc=0
192.168.202.104@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0
Stopping client oleg204-client.virtnet /mnt/lustre (opts:)
Started LFSCK on the device lustre-MDT0000: scrub namespace
Starting client: oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre
PASS 2a (5s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 2b: LFSCK can find out and remove invalid linkEA entry ========================================================== 04:59:00 (1713344340)
preparing... 1 * 1 files will be created Wed Apr 17 04:59:01 EDT 2024.
total: 1 mkdir in 0.00 seconds: 364.85 ops/second
total: 1 create in 0.00 seconds: 343.29 ops/second
total: 1 mkdir in 0.00 seconds: 398.24 ops/second
prepared Wed Apr 17 04:59:02 EDT 2024.
fail_loc=0x1604
fail_loc=0
192.168.202.104@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0
Stopping client oleg204-client.virtnet /mnt/lustre (opts:)
Started LFSCK on the device lustre-MDT0000: scrub namespace
Starting client: oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre
PASS 2b (4s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 2c: LFSCK can find out and remove repeated linkEA entry ========================================================== 04:59:06 (1713344346)
preparing... 1 * 1 files will be created Wed Apr 17 04:59:07 EDT 2024.
total: 1 mkdir in 0.01 seconds: 129.94 ops/second
total: 1 create in 0.00 seconds: 319.59 ops/second
total: 1 mkdir in 0.00 seconds: 220.40 ops/second
prepared Wed Apr 17 04:59:09 EDT 2024.
fail_loc=0x1605
fail_loc=0
192.168.202.104@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0
Stopping client oleg204-client.virtnet /mnt/lustre (opts:)
Started LFSCK on the device lustre-MDT0000: scrub namespace
Starting client: oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre
PASS 2c (9s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 2d: LFSCK can recover the missing linkEA entry ========================================================== 04:59:18 (1713344358)
preparing... 1 * 1 files will be created Wed Apr 17 04:59:20 EDT 2024.
total: 1 mkdir in 0.01 seconds: 186.64 ops/second
total: 1 create in 0.01 seconds: 135.96 ops/second
total: 1 mkdir in 0.00 seconds: 243.78 ops/second
prepared Wed Apr 17 04:59:23 EDT 2024.
fail_loc=0x161d
fail_loc=0
192.168.202.104@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0
Stopping client oleg204-client.virtnet /mnt/lustre (opts:)
Started LFSCK on the device lustre-MDT0000: scrub namespace
Starting client: oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre
PASS 2d (12s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 2e: namespace LFSCK can verify remote object linkEA ========================================================== 04:59:32 (1713344372)
SKIP: sanity-lfsck test_2e needs >= 2 MDTs
SKIP 2e (0s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 3: LFSCK can verify multiple-linked objects ========================================================== 04:59:34 (1713344374)
preparing... 4 * 4 files will be created Wed Apr 17 04:59:35 EDT 2024.
total: 4 mkdir in 0.01 seconds: 494.80 ops/second
total: 4 create in 0.01 seconds: 582.16 ops/second
total: 4 mkdir in 0.01 seconds: 477.67 ops/second
prepared Wed Apr 17 04:59:36 EDT 2024.
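Tests 2a-2d above repair damaged or missing linkEA entries, which is the metadata lfs fid2path relies on to map a FID back to a pathname. One way to spot-check such a repair from a client is a path2fid/fid2path round trip; a minimal sketch, with a hypothetical path and assuming the /mnt/lustre mount point used in this run:

    # resolve the file's FID, then ask the MDT to map it back to a path via its linkEA
    fid=$(lfs path2fid /mnt/lustre/d2a.sanity-lfsck/dummy)   # hypothetical file name
    lfs fid2path /mnt/lustre "$fid"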
fail_loc=0x1603
fail_loc=0x1604
fail_loc=0
Started LFSCK on the device lustre-MDT0000: scrub namespace
PASS 3 (6s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 4: FID-in-dirent can be rebuilt after MDT file-level backup/restore ========================================================== 04:59:41 (1713344381)
SKIP: sanity-lfsck test_4 OI Scrub not implemented for ZFS
SKIP 4 (1s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 5: LFSCK can handle IGIF object upgrading ========================================================== 04:59:44 (1713344384)
SKIP: sanity-lfsck test_5 OI Scrub not implemented for ZFS
SKIP 5 (1s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 6a: LFSCK resumes from last checkpoint (1) ========================================================== 04:59:47 (1713344387)
preparing... 5 * 5 files will be created Wed Apr 17 04:59:48 EDT 2024.
total: 5 mkdir in 0.01 seconds: 721.39 ops/second
total: 5 create in 0.01 seconds: 710.85 ops/second
total: 5 mkdir in 0.01 seconds: 570.65 ops/second
prepared Wed Apr 17 04:59:49 EDT 2024.
fail_val=1
fail_loc=0x1600
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0x80001608
Waiting 32s for 'failed'
Updated after 2s: want 'failed' got 'failed'
fail_val=1
fail_loc=0x1600
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0
fail_val=0
PASS 6a (12s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 6b: LFSCK resumes from last checkpoint (2) ========================================================== 05:00:01 (1713344401)
preparing... 5 * 5 files will be created Wed Apr 17 05:00:03 EDT 2024.
total: 5 mkdir in 0.02 seconds: 224.81 ops/second
total: 5 create in 0.02 seconds: 272.64 ops/second
total: 5 mkdir in 0.02 seconds: 275.24 ops/second
prepared Wed Apr 17 05:00:05 EDT 2024.
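The "Waiting 32s for 'failed'" / "Updated after 2s" lines in test 6a come from the harness polling the LFSCK status until it reaches the expected state (here 'failed', forced by the 0x80001608 fail point). The same wait can be scripted directly against the proc file; a minimal sketch, assuming lctl runs on the MDS:

    # poll the namespace LFSCK status until it leaves scanning-phase1
    while true; do
        st=$(lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
             awk '/^status:/ { print $2 }')
        echo "status: $st"
        [ "$st" != "scanning-phase1" ] && break
        sleep 1
    done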
fail_val=1
fail_loc=0x1601
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0x80001609
Waiting 32s for 'failed'
fail_val=1
fail_loc=0x1601
Started LFSCK on the device lustre-MDT0000: scrub namespace
Additional debug for 6b
name: lfsck_namespace
magic: 0xa06249ff
version: 2
status: scanning-phase1
flags:
param:
last_completed_time: 1713344397
time_since_last_completed: 19 seconds
latest_start_time: 1713344415
time_since_latest_start: 1 seconds
last_checkpoint_time: 1713344413
time_since_last_checkpoint: 3 seconds
latest_start_position: 1803, [0x200000402:0x1:0x0], 0x0
last_checkpoint_position: 1511, [0x200000402:0x1:0x0], 0x0
first_failure_position: N/A, N/A, N/A
checked_phase1: 4
checked_phase2: 0
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 2
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 0
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
linkea_overflow_cleared: 0
agent_entries_repaired: 0
success_count: 10
run_time_phase1: 7 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 0 items/sec
average_speed_phase2: N/A
average_speed_total: 0 items/sec
real_time_speed_phase1: 0 items/sec
real_time_speed_phase2: N/A
current_position: 1802, [0x200000402:0x1:0x0], 0x0
fail_loc=0
fail_val=0
PASS 6b (16s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 7a: non-stopped LFSCK should auto restarts after MDS remount (1) ========================================================== 05:00:19 (1713344419)
preparing... 5 * 5 files will be created Wed Apr 17 05:00:20 EDT 2024.
total: 5 mkdir in 0.01 seconds: 579.31 ops/second
total: 5 create in 0.01 seconds: 380.29 ops/second
total: 5 mkdir in 0.01 seconds: 544.84 ops/second
prepared Wed Apr 17 05:00:22 EDT 2024.
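Tests 7a and 7b stop and restart the MDT while an LFSCK is still in progress and expect the scan to resume on its own. Outside the harness, the equivalent restart on this ZFS-backed MDT is just an unmount/remount of the target; a minimal sketch, assuming the device and mount point used in this run:

    # on the MDS: restart the MDT while the LFSCK is running
    umount /mnt/lustre-mds1
    mount -t lustre -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1
    # the scan should come back by itself; check its status field
    lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace | grep ^status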
192.168.202.104@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0
Stopping client oleg204-client.virtnet /mnt/lustre (opts:)
fail_val=1
fail_loc=0x1601
Started LFSCK on the device lustre-MDT0000: scrub namespace
stop mds1
fail_loc=0
fail_val=0
start mds1
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
PASS 7a (13s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 7b: non-stopped LFSCK should auto restarts after MDS remount (2) ========================================================== 05:00:34 (1713344434)
Checking servers environments
Checking clients oleg204-client.virtnet environments
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
loading modules on: 'oleg204-server'
oleg204-server: oleg204-server.virtnet: executing load_modules_local
oleg204-server: Loading modules from /home/green/git/lustre-release/lustre
oleg204-server: detected 4 online CPUs by sysfs
oleg204-server: Force libcfs to create 2 CPU partitions
oleg204-server: libkmod: kmod_module_get_holders: could not open '/sys/module/intel_rapl/holders': No such file or directory
Setup mgs, mdt, osts
Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1
oleg204-server: mount.lustre: according to /etc/mtab lustre-mdt1/mdt1 is already mounted on /mnt/lustre-mds1
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 17
Start of lustre-mdt1/mdt1 on mds1 failed 17
Starting ost1: -o localrecov lustre-ost1/ost1 /mnt/lustre-ost1
oleg204-server: mount.lustre: according to /etc/mtab lustre-ost1/ost1 is already mounted on /mnt/lustre-ost1
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 17
seq.cli-lustre-OST0000-super.width=65536
Start of lustre-ost1/ost1 on ost1 failed 17
Starting ost2: -o localrecov lustre-ost2/ost2 /mnt/lustre-ost2
oleg204-server: mount.lustre: according to /etc/mtab lustre-ost2/ost2 is already mounted on /mnt/lustre-ost2
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 17
seq.cli-lustre-OST0001-super.width=65536
Start of lustre-ost2/ost2 on ost2 failed 17
Starting client: oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre
Starting client oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre
Started clients oleg204-client.virtnet: 192.168.202.104@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project)
Using TIMEOUT=20
osc.lustre-OST0000-osc-ffff8800b5cc6000.idle_timeout=debug
osc.lustre-OST0001-osc-ffff8800b5cc6000.idle_timeout=debug
disable quota as required
preparing... 2 * 2 files will be created Wed Apr 17 05:00:50 EDT 2024.
total: 2 mkdir in 0.00 seconds: 457.37 ops/second
total: 2 create in 0.00 seconds: 426.99 ops/second
total: 2 mkdir in 0.00 seconds: 501.62 ops/second
prepared Wed Apr 17 05:00:51 EDT 2024.
fail_loc=0x1604
fail_val=1
fail_loc=0x1602
Started LFSCK on the device lustre-MDT0000: scrub namespace
192.168.202.104@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0
Stopping client oleg204-client.virtnet /mnt/lustre (opts:)
stop mds1
fail_loc=0
fail_val=0
start mds1
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
PASS 7b (25s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 8: LFSCK state machine ============== 05:01:01 (1713344461)
formatall
oleg204-server: oleg204-server.virtnet: executing set_hostid
oleg204-server: oleg204-server.virtnet: executing load_modules_local
setupall
oleg204-server: oleg204-server.virtnet: executing load_modules_local
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
Using TIMEOUT=20
preparing... 20 * 20 files will be created Wed Apr 17 05:02:01 EDT 2024.
total: 20 mkdir in 0.03 seconds: 768.79 ops/second
total: 20 create in 0.02 seconds: 912.24 ops/second
total: 20 mkdir in 0.04 seconds: 563.11 ops/second
prepared Wed Apr 17 05:02:02 EDT 2024.
fail_loc=0x1603
fail_loc=0x1604
192.168.202.104@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0
Stopping client oleg204-client.virtnet /mnt/lustre (opts:)
fail_val=2
fail_loc=0x1601
Started LFSCK on the device lustre-MDT0000: scrub namespace
Stopped LFSCK on the device lustre-MDT0000.
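Test 8 walks the LFSCK state machine by combining fail points with MDT restarts and checking the state after each step. The state the assertions look at is the status: field of the proc file; lctl lfsck_query can also summarize it, a minimal sketch (the lfsck_query options are assumed from its usage, not shown in this log):

    # current state of the namespace component on this MDT
    lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace | awk '/^status:/ { print $2 }'
    # or ask the LFSCK engine directly
    lctl lfsck_query -M lustre-MDT0000 -t namespace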
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0x80001609
Waiting 32s for 'failed'
Updated after 3s: want 'failed' got 'failed'
fail_loc=0x1600
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0x160a
stop mds1
fail_loc=0x160b
start mds1
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
fail_loc=0x1601
Started LFSCK on the device lustre-MDT0000: scrub namespace
stop mds1
fail_loc=0x160b
start mds1
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
stop mds1
start mds1 without resume LFSCK
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
fail_val=2
fail_loc=0x1602
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0
fail_val=0
PASS 8 (92s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 9a: LFSCK speed control (1) ========= 05:02:35 (1713344555)
Checking servers environments
Checking clients oleg204-client.virtnet environments
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
loading modules on: 'oleg204-server'
oleg204-server: oleg204-server.virtnet: executing load_modules_local
oleg204-server: Loading modules from /home/green/git/lustre-release/lustre
oleg204-server: detected 4 online CPUs by sysfs
oleg204-server: Force libcfs to create 2 CPU partitions
Setup mgs, mdt, osts
Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1
oleg204-server: mount.lustre: according to /etc/mtab lustre-mdt1/mdt1 is already mounted on /mnt/lustre-mds1
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 17
Start of lustre-mdt1/mdt1 on mds1 failed 17
Starting ost1: -o localrecov lustre-ost1/ost1 /mnt/lustre-ost1
oleg204-server: mount.lustre: according to /etc/mtab lustre-ost1/ost1 is already mounted on /mnt/lustre-ost1
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 17
seq.cli-lustre-OST0000-super.width=65536
Start of lustre-ost1/ost1 on ost1 failed 17
Starting ost2: -o localrecov lustre-ost2/ost2 /mnt/lustre-ost2
oleg204-server: mount.lustre: according to /etc/mtab lustre-ost2/ost2 is already mounted on /mnt/lustre-ost2
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 17
seq.cli-lustre-OST0001-super.width=65536
Start of lustre-ost2/ost2 on ost2 failed 17
Starting client: oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre
Starting client oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre
Started clients oleg204-client.virtnet: 192.168.202.104@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project)
Using TIMEOUT=20
osc.lustre-OST0000-osc-ffff8800b6f5e800.idle_timeout=debug
osc.lustre-OST0001-osc-ffff8800b6f5e800.idle_timeout=debug
disable quota as required
- open/close 3708 (time 1713344583.56 total 10.00 last 370.72)
total: 5000 open/close in 14.07 seconds: 355.25 ops/second
Started LFSCK on the device lustre-MDT0000: scrub layout
PASS 9a (56s)
debug_raw_pointers=0
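Tests 9a/9b exercise the speed limit that throttles how many objects per second the scan may process. The limit can be given when the scan is started and, for a running scan, adjusted through a tunable; a minimal sketch (the mdd.*.lfsck_speed_limit parameter name is an assumption here, only the -s start option is visible in this suite):

    # start a layout scan limited to roughly 100 objects/sec
    lctl lfsck_start -M lustre-MDT0000 -t layout -s 100
    # lift the limit on the running scan (0 = no limit)
    lctl set_param mdd.lustre-MDT0000.lfsck_speed_limit=0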
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 9b: LFSCK speed control (2) ========= 05:03:32 (1713344612)
preparing... 0 * 0 files will be created Wed Apr 17 05:03:41 EDT 2024.
prepared Wed Apr 17 05:03:42 EDT 2024.
Preparing another 50 * 50 files (with error) at Wed Apr 17 05:03:42 EDT 2024.
fail_loc=0x1604
total: 50 mkdir in 0.13 seconds: 387.14 ops/second
total: 50 create in 0.07 seconds: 721.41 ops/second
fail_loc=0x160c
Started LFSCK on the device lustre-MDT0000: scrub namespace
Waiting 10s for 'stopped'
fail_loc=0
Prepared at Wed Apr 17 05:03:48 EDT 2024.
Started LFSCK on the device lustre-MDT0000: scrub namespace
PASS 9b (39s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 10: System is available during LFSCK scanning ========================================================== 05:04:12 (1713344652)
SKIP: sanity-lfsck test_10 lookup(..)/linkea on ZFS issue
SKIP 10 (1s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 11a: LFSCK can rebuild lost last_id ========================================================== 05:04:15 (1713344655)
total: 64 open/close in 0.17 seconds: 385.13 ops/second
stopall
remove LAST_ID on ost1: idx=0
removed '/mnt/lustre-ost1/O/0/LAST_ID'
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
fail_val=3
fail_loc=0x160e
trigger LFSCK for layout on ost1 to rebuild the LAST_ID(s)
Started LFSCK on the device lustre-OST0000: scrub layout
fail_val=0
fail_loc=0
Waiting 32s for 'completed'
Updated after 2s: want 'completed' got 'completed'
the LAST_ID(s) should have been rebuilt
PASS 11a (88s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 11b: LFSCK can rebuild crashed last_id ========================================================== 05:05:44 (1713344744)
Checking servers environments
Checking clients oleg204-client.virtnet environments
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
loading modules on: 'oleg204-server'
oleg204-server: oleg204-server.virtnet: executing load_modules_local
oleg204-server: Loading modules from /home/green/git/lustre-release/lustre
oleg204-server: detected 4 online CPUs by sysfs
oleg204-server: Force libcfs to create 2 CPU partitions
Setup mgs, mdt, osts
Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
Started lustre-MDT0000
Starting ost1: -o localrecov lustre-ost1/ost1 /mnt/lustre-ost1
oleg204-server: mount.lustre: according to /etc/mtab lustre-ost1/ost1 is already mounted on /mnt/lustre-ost1
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 17
seq.cli-lustre-OST0000-super.width=65536
Start of lustre-ost1/ost1 on ost1 failed 17
Starting ost2: -o localrecov lustre-ost2/ost2 /mnt/lustre-ost2
seq.cli-lustre-OST0001-super.width=65536
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
Started lustre-OST0001
Starting client: oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre
Starting client oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre
Started clients oleg204-client.virtnet: 192.168.202.104@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project)
Using TIMEOUT=20
osc.lustre-OST0000-osc-ffff8800b5cc6800.idle_timeout=debug
osc.lustre-OST0001-osc-ffff8800b5cc6800.idle_timeout=debug
disable quota as required
set fail_loc=0x160d to skip the updating LAST_ID on-disk
fail_loc=0x160d
total: 64 open/close in 0.16 seconds: 390.89 ops/second
192.168.202.104@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0
Stopping client oleg204-client.virtnet /mnt/lustre (opts:)
Stopping /mnt/lustre-ost1 (opts:) on oleg204-server
fail_loc=0x215
Starting ost1: -o localrecov lustre-ost1/ost1 /mnt/lustre-ost1
seq.cli-lustre-OST0000-super.width=65536
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
Started lustre-OST0000
the on-disk LAST_ID should be smaller than the expected one
trigger LFSCK for layout on ost1 to rebuild the on-disk LAST_ID
Started LFSCK on the device lustre-OST0000: scrub layout
Stopping /mnt/lustre-ost1 (opts:) on oleg204-server
Starting ost1: -o localrecov lustre-ost1/ost1 /mnt/lustre-ost1
seq.cli-lustre-OST0000-super.width=65536
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
Started lustre-OST0000
the on-disk LAST_ID should have been rebuilt
fail_loc=0
Stopping clients: oleg204-client.virtnet /mnt/lustre (opts:)
Stopping clients: oleg204-client.virtnet /mnt/lustre2 (opts:)
Stopping /mnt/lustre-mds1 (opts:-f) on oleg204-server
Stopping /mnt/lustre-ost1 (opts:-f) on oleg204-server
Stopping /mnt/lustre-ost2 (opts:-f) on oleg204-server
PASS 11b (55s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 12a: single command to trigger LFSCK on all devices ========================================================== 05:06:41 (1713344801)
SKIP: sanity-lfsck test_12a needs >= 2 MDTs
SKIP 12a (1s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 12b: auto detect Lustre device ====== 05:06:43 (1713344803)
Checking servers environments
Checking clients oleg204-client.virtnet environments
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
libkmod: kmod_module_get_holders: could not open '/sys/module/acpi_cpufreq/holders': No such file or directory
loading modules on: 'oleg204-server'
oleg204-server: oleg204-server.virtnet: executing load_modules_local
oleg204-server: Loading modules from /home/green/git/lustre-release/lustre
oleg204-server: detected 4 online CPUs by sysfs
oleg204-server: Force libcfs to create 2 CPU partitions
Setup mgs, mdt, osts
Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
Started lustre-MDT0000
Starting ost1: -o localrecov lustre-ost1/ost1 /mnt/lustre-ost1
seq.cli-lustre-OST0000-super.width=65536
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
Started lustre-OST0000
Starting ost2: -o localrecov lustre-ost2/ost2 /mnt/lustre-ost2
seq.cli-lustre-OST0001-super.width=65536
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
Started lustre-OST0001
Starting client: oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre
Starting client oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre
Started clients oleg204-client.virtnet: 192.168.202.104@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project)
Using TIMEOUT=20
osc.lustre-OST0000-osc-ffff8800aa46b000.idle_timeout=debug
osc.lustre-OST0001-osc-ffff8800aa46b000.idle_timeout=debug
disable quota as required
Start LFSCK without '-M' specified.
Started LFSCK on the device lustre-MDT0000: scrub layout namespace
PASS 12b (28s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 13: LFSCK can repair crashed lmm_oi ========================================================== 05:07:12 (1713344832)
#####
The lmm_oi in layout EA should be consistent with the MDT-object FID;
otherwise, the LFSCK should re-generate the lmm_oi from the MDT-object FID.
#####
Inject failure stub to simulate bad lmm_oi
fail_loc=0x160f
total: 1 open/close in 0.00 seconds: 203.67 ops/second
fail_loc=0
Trigger layout LFSCK to find out the bad lmm_oi and fix them
Started LFSCK on the device lustre-MDT0000: scrub layout
PASS 13 (3s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 14a: LFSCK can repair MDT-object with dangling LOV EA reference (1) ========================================================== 05:07:16 (1713344836)
#####
The OST-object referenced by the MDT-object should be there;
otherwise, the LFSCK should re-create the missing OST-object.
without '--delay-create-ostobj' option.
#####
Inject failure stub to simulate dangling referenced MDT-object
fail_loc=0x1610
total: 47 open/close in 0.13 seconds: 360.60 ops/second
touch: setting times of '/mnt/lustre/d14a.sanity-lfsck/guard0': No such file or directory
touch: setting times of '/mnt/lustre/d14a.sanity-lfsck/guard1': No such file or directory
fail_loc=0
debug=-1
debug_mb=150
debug=-1
debug_mb=150
total: 30 open/close in 0.18 seconds: 171.26 ops/second
'ls' should fail because of dangling referenced MDT-object
Trigger layout LFSCK to find out dangling reference
Started LFSCK on the device lustre-MDT0000: scrub layout
'stat' should fail because of not repair dangling by default
Trigger layout LFSCK to repair dangling reference
Started LFSCK on the device lustre-MDT0000: scrub layout
'stat' should succeed after layout LFSCK repairing
debug_mb=21
debug_mb=21
stopall to cleanup object cache
setupall
oleg204-server: oleg204-server.virtnet: executing load_modules_local
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
Using TIMEOUT=20
PASS 14a (40s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 14b: LFSCK can repair MDT-object with dangling LOV EA reference (2) ========================================================== 05:07:58 (1713344878)
#####
The OST-object referenced by the MDT-object should be there;
otherwise, the LFSCK should re-create the missing OST-object.
with '--delay-create-ostobj' option.
#####
Inject failure stub to simulate dangling referenced MDT-object
fail_loc=0x1610
total: 63 open/close in 0.18 seconds: 345.31 ops/second
touch: setting times of '/mnt/lustre/d14b.sanity-lfsck/guard': No such file or directory
fail_loc=0
debug=-1
debug_mb=150
debug=-1
debug_mb=150
total: 32 open/close in 0.19 seconds: 167.20 ops/second
'ls' should fail because of dangling referenced MDT-object
Trigger layout LFSCK to find out dangling reference
Started LFSCK on the device lustre-MDT0000: scrub layout
'stat' should fail because of not repair dangling by default
Trigger layout LFSCK to repair dangling reference
Started LFSCK on the device lustre-MDT0000: scrub layout
'stat' should succeed after layout LFSCK repairing
debug_mb=21
debug_mb=21
stopall to cleanup object cache
setupall
libkmod: kmod_module_get_holders: could not open '/sys/module/acpi_cpufreq/holders': No such file or directory
oleg204-server: oleg204-server.virtnet: executing load_modules_local
oleg204-server: libkmod: kmod_module_get_holders: could not open '/sys/module/intel_rapl/holders': No such file or directory
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1
Using TIMEOUT=20
PASS 14b (41s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 15a: LFSCK can repair unmatched MDT-object/OST-object pairs (1) ========================================================== 05:08:41 (1713344921)
#####
If the OST-object referenced by the MDT-object back points to
a non-existent MDT-object, then the LFSCK should repair the
OST-object to back point to the right MDT-object.
#####
Inject failure stub to make the OST-object back point to a non-existent MDT-object.
fail_loc=0x1611
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0106796 s, 98.2 MB/s
257+0 records in
257+0 records out
1052672 bytes (1.1 MB) copied, 0.0631113 s, 16.7 MB/s
fail_loc=0
Trigger layout LFSCK to find out unmatched pairs and fix them
Started LFSCK on the device lustre-MDT0000: scrub layout
PASS 15a (4s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 15b: LFSCK can repair unmatched MDT-object/OST-object pairs (2) ========================================================== 05:08:46 (1713344926)
#####
If the OST-object referenced by the MDT-object back points to
another MDT-object that doesn't recognize the OST-object, then
the LFSCK should repair it to back point to the right MDT-object (the first one).
#####
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0103428 s, 101 MB/s
Inject failure stub to make the OST-object back point to another MDT-object
fail_loc=0x1612
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00993612 s, 106 MB/s
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.029236 s, 71.7 MB/s
fail_loc=0
Trigger layout LFSCK to find out unmatched pairs and fix them
Started LFSCK on the device lustre-MDT0000: scrub layout
PASS 15b (3s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 15c: LFSCK can repair unmatched MDT-object/OST-object pairs (3) ========================================================== 05:08:51 (1713344931)
SKIP: sanity-lfsck test_15c needs >= 2 MDTs
SKIP 15c (1s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 15d: LFSCK don't crash upon dir migration failure ========================================================== 05:08:53 (1713344933)
SKIP: sanity-lfsck test_15d needs >= 2 MDTs
SKIP 15d (1s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 16: LFSCK can repair inconsistent MDT-object/OST-object owner ========================================================== 05:08:56 (1713344936)
#####
If the OST-object's owner information does not match the owner
information stored in the MDT-object, then the LFSCK trusts the
MDT-object and updates the OST-object's owner information.
#####
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00893464 s, 117 MB/s
running as uid/gid/euid/egid 500/500/500/500, groups: [createmany] [-o] [/mnt/lustre/d16.sanity-lfsck/d1/o] [100]
total: 100 open/close in 0.27 seconds: 375.69 ops/second
Inject failure stub to skip OST-object owner changing
fail_loc=0x1613
fail_loc=0
Trigger layout LFSCK to find out inconsistent OST-object owner and fix them
Started LFSCK on the device lustre-MDT0000: scrub layout
PASS 16 (3s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 17: LFSCK can repair multiple references ========================================================== 05:09:01 (1713344941)
#####
If more than one MDT-object references the same OST-object,
and the OST-object only recognizes one MDT-object, then the
LFSCK should create new OST-objects for such non-recognized MDT-objects.
#####
Inject failure stub to make two MDT-objects reference the OST-object
fail_val=0
fail_loc=0x1614
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0102198 s, 103 MB/s
total: 1 open/close in 0.01 seconds: 97.92 ops/second
fail_loc=0
fail_val=0
/mnt/lustre/d17.sanity-lfsck/f0 and /mnt/lustre/d17.sanity-lfsck/guard use the same OST-objects
/mnt/lustre/d17.sanity-lfsck/f1 and /mnt/lustre/d17.sanity-lfsck/guard use the same OST-objects
Trigger layout LFSCK to find out multiple referenced MDT-objects and fix them
Started LFSCK on the device lustre-MDT0000: scrub layout
/mnt/lustre/d17.sanity-lfsck/f0 and /mnt/lustre/d17.sanity-lfsck/guard should use diff OST-objects
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.0146755 s, 143 MB/s
/mnt/lustre/d17.sanity-lfsck/f1 and /mnt/lustre/d17.sanity-lfsck/guard should use diff OST-objects
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.0178968 s, 117 MB/s
PASS 17 (3s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 18a: Find out orphan OST-object and repair it (1) ========================================================== 05:09:06 (1713344946)
#####
The target MDT-object is there, but related stripe information
is lost or partly lost. The LFSCK should regenerate the missing
layout EA entries.
#####
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.0191965 s, 109 MB/s
[0x200003ab1:0x7b:0x0]
/mnt/lustre/d18a.sanity-lfsck/a1/f1
lmm_stripe_count:  1
lmm_stripe_size:   4194304
lmm_pattern:       raid0
lmm_layout_gen:    0
lmm_stripe_offset: 0
        obdidx   objid   objid   group
             0    3217   0xc91   0x240000400
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.0366902 s, 57.2 MB/s
[0x200003ab1:0x7c:0x0]
/mnt/lustre/d18a.sanity-lfsck/f3
  lcm_layout_gen:    3
  lcm_mirror_count:  1
  lcm_entry_count:   2
    lcme_id:             1
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   1048576
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x240000400:0xc92:0x0] }
    lcme_id:             2
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 1048576
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 1
      lmm_objects:
      - 0: { l_ost_idx: 1, l_fid: [0x280000400:0xae6:0x0] }
Inject failure, to make the MDT-object lose its layout EA
fail_loc=0x1615
fail_loc=0
The file size should be incorrect since layout EA is lost
Trigger layout LFSCK on all devices to find out orphan OST-object
Started LFSCK on the device lustre-MDT0000: scrub layout
[0x200003ab1:0x7b:0x0]
/mnt/lustre/d18a.sanity-lfsck/a1/f1
lmm_stripe_count:  1
lmm_stripe_size:   4194304
lmm_pattern:       raid0
lmm_layout_gen:    1
lmm_stripe_offset: 0
        obdidx   objid   objid   group
             0    3217   0xc91   0x240000400
[0x200003ab1:0x7c:0x0]
/mnt/lustre/d18a.sanity-lfsck/f3
  lcm_layout_gen:    1
  lcm_mirror_count:  1
  lcm_entry_count:   2
    lcme_id:             1
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   1048576
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    1
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x240000400:0xc92:0x0] }
    lcme_id:             2
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 1048576
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    1
      lmm_stripe_offset: 1
      lmm_objects:
      - 0: { l_ost_idx: 1, l_fid: [0x280000400:0xae6:0x0] }
The file size should be correct after layout LFSCK scanning
PASS 18a (5s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 18b: Find out orphan OST-object and repair it (2) ========================================================== 05:09:13 (1713344953)
#####
The target MDT-object is lost. The LFSCK should re-create the
MDT-object under .lustre/lost+found/MDTxxxx. The admin can then
move it back to the normal namespace manually.
#####
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.018932 s, 111 MB/s
[0x200003ab1:0x80:0x0]
/mnt/lustre/d18b.sanity-lfsck/a1/f1
lmm_stripe_count:  1
lmm_stripe_size:   4194304
lmm_pattern:       raid0
lmm_layout_gen:    0
lmm_stripe_offset: 0
        obdidx   objid   objid   group
             0    3219   0xc93   0x240000400
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.0301772 s, 69.5 MB/s
[0x200003ab1:0x81:0x0]
/mnt/lustre/d18b.sanity-lfsck/f3
  lcm_layout_gen:    3
  lcm_mirror_count:  1
  lcm_entry_count:   2
    lcme_id:             1
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   1048576
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x240000400:0xc94:0x0] }
    lcme_id:             2
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 1048576
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 1
      lmm_objects:
      - 0: { l_ost_idx: 1, l_fid: [0x280000400:0xae7:0x0] }
Inject failure, to simulate the case of missing the MDT-object
fail_loc=0x1616
fail_loc=0
Trigger layout LFSCK --dryrun to find out orphan OST-object
Started LFSCK on the device lustre-MDT0000: scrub layout
Trigger layout LFSCK on all devices to find out orphan OST-object
Started LFSCK on the device lustre-MDT0000: scrub layout
Move the files from ./lustre/lost+found/MDTxxxx to namespace
[0x200003ab1:0x80:0x0]
/mnt/lustre/d18b.sanity-lfsck/a1/f1
lmm_stripe_count:  1
lmm_stripe_size:   4194304
lmm_pattern:       raid0
lmm_layout_gen:    1
lmm_stripe_offset: 0
        obdidx   objid   objid   group
             0    3219   0xc93   0x240000400
[0x200003ab1:0x81:0x0]
/mnt/lustre/d18b.sanity-lfsck/f3
  lcm_layout_gen:    1
  lcm_mirror_count:  1
  lcm_entry_count:   2
    lcme_id:             1
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   1048576
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    1
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x240000400:0xc94:0x0] }
    lcme_id:             2
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 1048576
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    1
      lmm_stripe_offset: 1
      lmm_objects:
      - 0: { l_ost_idx: 1, l_fid: [0x280000400:0xae7:0x0] }
The file size should be correct after layout LFSCK scanning
PASS 18b (6s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 18c: Find out orphan OST-object and repair it (3) ========================================================== 05:09:21 (1713344961)
#####
The target MDT-object is lost, and the OST-object FID is missing.
The LFSCK should re-create the MDT-object with new FID under the
directory .lustre/lost+found/MDTxxxx.
#####
Inject failure, to simulate the case of missing parent FID
fail_loc=0x1617
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.0189063 s, 111 MB/s
/mnt/lustre/d18c.sanity-lfsck/a1/f1
lmm_stripe_count:  1
lmm_stripe_size:   4194304
lmm_pattern:       raid0
lmm_layout_gen:    0
lmm_stripe_offset: 0
        obdidx   objid   objid   group
             0    3221   0xc95   0x240000400
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.0290582 s, 72.2 MB/s
/mnt/lustre/d18c.sanity-lfsck/f3
  lcm_layout_gen:    3
  lcm_mirror_count:  1
  lcm_entry_count:   2
    lcme_id:             1
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   1048576
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x240000400:0xc96:0x0] }
    lcme_id:             2
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 1048576
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 1
      lmm_objects:
      - 0: { l_ost_idx: 1, l_fid: [0x280000400:0xae8:0x0] }
fail_loc=0
Inject failure, to simulate the case of missing the MDT-object
fail_loc=0x1616
fail_loc=0
Trigger layout LFSCK on all devices to find out orphan OST-object
Started LFSCK on the device lustre-MDT0000: scrub layout
total 21
144115188109410307 dr-x------ 3 root root 10752 Dec 31  1969 .
144115205289279489 drwx------ 2 root root 10752 Apr 17 05:09 MDT0000
There should NOT be some stub under .lustre/lost+found/MDT0001/
There should be some stub under .lustre/lost+found/MDT0000/
PASS 18c (6s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 18d: Find out orphan OST-object and repair it (4) ========================================================== 05:09:29 (1713344969)
#####
The target MDT-object layout EA is corrupted, but the right
OST-object is still alive as an orphan. The layout LFSCK will
not create a new OST-object to occupy such slot.
#####
[0x200003ab1:0x8a:0x0]
/mnt/lustre/d18d.sanity-lfsck/a1/f1
lmm_stripe_count:  1
lmm_stripe_size:   4194304
lmm_pattern:       raid0
lmm_layout_gen:    0
lmm_stripe_offset: 0
        obdidx   objid   objid   group
             0    3223   0xc97   0x240000400
[0x200003ab1:0x8b:0x0]
/mnt/lustre/d18d.sanity-lfsck/a1/f2
lmm_stripe_count:  1
lmm_stripe_size:   4194304
lmm_pattern:       raid0
lmm_layout_gen:    0
lmm_stripe_offset: 0
        obdidx   objid   objid   group
             0    3224   0xc98   0x240000400
[0x200003ab1:0x8c:0x0]
/mnt/lustre/d18d.sanity-lfsck/a1/f3
lmm_stripe_count:  1
lmm_stripe_size:   4194304
lmm_pattern:       raid0
lmm_layout_gen:    0
lmm_stripe_offset: 0
        obdidx   objid   objid   group
             0    3225   0xc99   0x240000400
[0x200003ab1:0x8d:0x0]
/mnt/lustre/d18d.sanity-lfsck/a1/f4
  lcm_layout_gen:    2
  lcm_mirror_count:  1
  lcm_entry_count:   2
    lcme_id:             1
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   1048576
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x240000400:0xc9a:0x0] }
    lcme_id:             2
    lcme_mirror_id:      0
    lcme_flags:          0
    lcme_extent.e_start: 1048576
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: -1
Inject failure to make /mnt/lustre/d18d.sanity-lfsck/a1/f1 and /mnt/lustre/d18d.sanity-lfsck/a1/f2
reference the same OST-object (which is f1's OST-object). Then drop
/mnt/lustre/d18d.sanity-lfsck/a1/f1 and its OST-object, so f2 becomes a dangling
reference case, but f2's old OST-object is there.
The failure also makes /mnt/lustre/d18d.sanity-lfsck/a1/f3 and /mnt/lustre/d18d.sanity-lfsck/a1/f4 to reference the same OST-object (which is f3's OST-obejct). Then drop /mnt/lustre/d18d.sanity-lfsck/a1/f3 and its OST-object, so f4 becomes dangling reference case, but f4's old OST-object is there. fail_loc=0x1618 fail_loc=0 stopall to cleanup object cache setupall libkmod: kmod_module_get_holders: could not open '/sys/module/acpi_cpufreq/holders': No such file or directory oleg204-server: oleg204-server.virtnet: executing load_modules_local oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1 oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1 oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1 Using TIMEOUT=20 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout The file size should be correct after layout LFSCK scanning The LFSCK should find back the original data. foo [0x200003ab1:0x8b:0x0] /mnt/lustre/d18d.sanity-lfsck/a1/f2 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 0 obdidx objid objid group 0 3224 0xc98 0x240000400 foo [0x200003ab1:0x8d:0x0] /mnt/lustre/d18d.sanity-lfsck/a1/f4 lcm_layout_gen: 2 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x240000400:0xc9a:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: 0 lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: -1 PASS 18d (41s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 18e: Find out orphan OST-object and repair it (5) ========================================================== 05:10:11 (1713345011) ##### The target MDT-object layout EA slot is occpuied by some new created OST-object when repair dangling reference case. Such conflict OST-object has been modified by others. To keep the new data, the LFSCK will create a new file to refernece this old orphan OST-object. 
##### [0x200004281:0x3:0x0] /mnt/lustre/d18e.sanity-lfsck/a1/f1 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 3266 0xcc2 0x240000400 [0x200004281:0x4:0x0] /mnt/lustre/d18e.sanity-lfsck/a1/f2 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 3267 0xcc3 0x240000400 [0x200004281:0x5:0x0] /mnt/lustre/d18e.sanity-lfsck/a1/f3 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 3268 0xcc4 0x240000400 [0x200004281:0x6:0x0] /mnt/lustre/d18e.sanity-lfsck/a1/f4 lcm_layout_gen: 2 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x240000400:0xcc5:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: 0 lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: -1 Inject failure to make /mnt/lustre/d18e.sanity-lfsck/a1/f1 and /mnt/lustre/d18e.sanity-lfsck/a1/f2 to reference the same OST-object (which is f1's OST-obejct). Then drop /mnt/lustre/d18e.sanity-lfsck/a1/f1 and its OST-object, so f2 becomes dangling reference case, but f2's old OST-object is there. Also the failure makes /mnt/lustre/d18e.sanity-lfsck/a1/f3 and /mnt/lustre/d18e.sanity-lfsck/a1/f4 to reference the same OST-object (which is f3's OST-obejct). Then drop /mnt/lustre/d18e.sanity-lfsck/a1/f3 and its OST-object, so f4 becomes dangling reference case, but f4's old OST-object is there. fail_loc=0x1618 fail_loc=0 stopall to cleanup object cache setupall oleg204-server: oleg204-server.virtnet: executing load_modules_local oleg204-server: libkmod: kmod_module_get_holders: could not open '/sys/module/pcc_cpufreq/holders': No such file or directory oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1 oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1 oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1 Using TIMEOUT=20 fail_val=10 fail_loc=0x1602 debug=-1 debug_mb=150 debug=-1 debug_mb=150 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout Write new data to f2/f4 to modify the new created OST-object. 
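The fail_val/fail_loc pairs and debug settings above are the standard OBD fault-injection and tracing knobs; a minimal sketch of how they are driven from the shell on the node under test (the MDS here), with the meaning of the hex value 0x1602 taken only from the surrounding test context:

```sh
# Minimal sketch of the fail_loc/fail_val pattern used throughout this log.
# Run on the node where the fault is injected; the comment on 0x1602 is an
# assumption drawn from the surrounding context (it delays the LFSCK engine).
lctl set_param fail_val=10 fail_loc=0x1602   # hold LFSCK so it can be raced
lctl set_param debug=-1 debug_mb=150         # enable all debug flags, enlarge the trace buffer

# ... perform the racing step, e.g. overwrite f2/f4 while the LFSCK is held ...

lctl set_param fail_val=0 fail_loc=0         # clear the injection
lctl set_param debug_mb=21                   # shrink the trace buffer back
```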
fail_val=0 fail_loc=0 debug_mb=21 debug_mb=21 debug=super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck debug=super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck There should be stub file under .lustre/lost+found/MDT0000/ The stub file should keep the original f2 or f4 data foo [0x200000402:0x5:0x0] /mnt/lustre/.lustre/lost+found/MDT0000/[0x200000402:0x5:0x0]-[0x200004281:0x4:0x0]-0-C-0 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 0 obdidx objid objid group 0 3267 0xcc3 0x240000400 foo [0x200000402:0x6:0x0] /mnt/lustre/.lustre/lost+found/MDT0000/[0x200000402:0x6:0x0]-[0x200004281:0x6:0x0]-0-C-0 lcm_layout_gen: 4 lcm_mirror_count: 1 lcm_entry_count: 1 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x240000400:0xcc5:0x0] } The f2/f4 should contains new data. dummy [0x200004281:0x4:0x0] /mnt/lustre/d18e.sanity-lfsck/a1/f2 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 3266 0xcc2 0x240000400 dummy [0x200004281:0x6:0x0] /mnt/lustre/d18e.sanity-lfsck/a1/f4 lcm_layout_gen: 3 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x240000400:0xcc4:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x280000400:0xb22:0x0] } PASS 18e (44s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 18f: Skip the failed OST(s) when handle orphan OST-objects ========================================================== 05:10:56 (1713345056) ##### The target MDT-object is lost. The LFSCK should re-create the MDT-object under .lustre/lost+found/MDTxxxx. If some OST fail to verify some OST-object(s) during the first stage-scanning, the LFSCK should skip orphan OST-objects for such OST. Others should not be affected. ##### 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0176198 s, 119 MB/s 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0156239 s, 134 MB/s 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0160906 s, 130 MB/s /mnt/lustre/d18f.sanity-lfsck/a1/f1 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 3299 0xce3 0x240000400 /mnt/lustre/d18f.sanity-lfsck/a2/f2 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 3300 0xce4 0x240000400 1 2851 0xb23 0x280000400 Inject failure, to simulate the case of missing the MDT-object fail_loc=0x1616 fail_loc=0 Inject failure, to simulate the OST0 fail to handle MDT0 LFSCK request during the first-stage scanning. 
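Before the injected OST failure below, it is worth showing how the repeated "Trigger layout LFSCK on all devices" steps are normally issued and monitored; a hedged sketch using the device names from this run (option names such as -t, -A, -r, -o and --dryrun are the commonly documented ones):

```sh
# Hedged sketch of the "Trigger layout LFSCK" steps in this log
# (device names from this run; option names assumed from common usage).
lctl lfsck_start -M lustre-MDT0000 -t layout --dryrun on   # scan and report only
lctl get_param -n mdd.lustre-MDT0000.lfsck_layout          # status, counters, positions

lctl lfsck_start -M lustre-MDT0000 -t layout -A -r -o      # all targets, reset, handle orphans
lctl get_param -n obdfilter.lustre-OST0000.lfsck_layout    # per-OST view of the same run
```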
fail_loc=0x161c fail_val=0 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout fail_loc=0 fail_val=0 Trigger layout LFSCK on all devices again to cleanup Started LFSCK on the device lustre-MDT0000: scrub layout PASS 18f (7s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 18g: Find out orphan OST-object and repair it (7) ========================================================== 05:11:05 (1713345065) ##### The target MDT-object is lost, but related OI mapping is there. The LFSCK should recreate the lost MDT-object without being affected by the stale OI mapping. ##### 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0193388 s, 108 MB/s [0x200004a51:0xb:0x0] /mnt/lustre/d18g.sanity-lfsck/a1/f1 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 3301 0xce5 0x240000400 1 2852 0xb24 0x280000400 Inject failure to simulate lost MDT-object but keep OI mapping fail_loc=0x162e fail_loc=0 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout Move the files from .lustre/lost+found/MDTxxxx to namespace [0x200004a51:0xb:0x0] /mnt/lustre/d18g.sanity-lfsck/a1/f1 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 2 lmm_stripe_offset: 0 obdidx objid objid group 0 3301 0xce5 0x240000400 1 2852 0xb24 0x280000400 PASS 18g (4s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 18h: LFSCK can repair crashed PFL extent range ========================================================== 05:11:10 (1713345070) ##### The PFL extent crashed. During the first cycle LFSCK scanning, the layout LFSCK will keep the bad PFL file(s) there without scanning its OST-object(s). Then in the second stage scanning, the OST will return related OST-object(s) to the MDT as orphan. And then the LFSCK on the MDT can rebuild the PFL extent with the 'orphan(s)' stripe information.
##### 0+1 records in 0+1 records out 291280 bytes (291 kB) copied, 0.00260717 s, 112 MB/s Inject failure stub to simulate bad PFL extent range fail_loc=0x162f fail_loc=0 dd: error writing '/mnt/lustre/d18h.sanity-lfsck/f0': No data available 1+0 records in 0+0 records out 0 bytes (0 B) copied, 0.0039594 s, 0.0 kB/s Trigger layout LFSCK to find out the bad lmm_oi and fix them Started LFSCK on the device lustre-MDT0000: scrub layout Data in /mnt/lustre/d18h.sanity-lfsck/f0 should not be broken Write should succeed after LFSCK repairing the bad PFL range 1+0 records in 1+0 records out 1048576 bytes (1.0 MB) copied, 0.00798271 s, 131 MB/s PASS 18h (5s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 19a: OST-object inconsistency self detect ========================================================== 05:11:17 (1713345077) Inject failure, then client will offer wrong parent FID when read fail_loc=0x1619 Read RPC with wrong parent FID should be denied cat: /mnt/lustre/d19a.sanity-lfsck/a0: Operation not permitted cat: /mnt/lustre/d19a.sanity-lfsck/a1: Operation not permitted fail_loc=0 PASS 19a (4s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 19b: OST-object inconsistency self repair ========================================================== 05:11:23 (1713345083) Inject failure stub to make the OST-object to back point to non-exist MDT-object fail_loc=0x1611 fail_loc=0 Nothing should be fixed since self detect and repair is disabled Read RPC with right parent FID should be accepted, and cause parent FID on OST to be fixed foo1 foo2 PASS 19b (6s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 20a: Handle the orphan with dummy LOV EA slot properly ========================================================== 05:11:30 (1713345090) ##### The target MDT-object and some of its OST-object are lost. The LFSCK should find out the left OST-objects and re-create the MDT-object under the direcotry .lustre/lost+found/MDTxxxx/ with the partial OST-objects (LOV EA hole). New client can access the file with LOV EA hole via normal system tools or commands without crash the system. For old client, even though it cannot access the file with LOV EA hole, it should not cause the system crash. ##### 257+0 records in 257+0 records out 1052672 bytes (1.1 MB) copied, 0.0449445 s, 23.4 MB/s 257+0 records in 257+0 records out 1052672 bytes (1.1 MB) copied, 0.0438647 s, 24.0 MB/s 257+0 records in 257+0 records out 1052672 bytes (1.1 MB) copied, 0.0415538 s, 25.3 MB/s [0x200004a51:0x1c:0x0] /mnt/lustre/d20a.sanity-lfsck/a1/f0 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 3309 0xced 0x240000400 1 2854 0xb26 0x280000400 [0x200004a51:0x1d:0x0] /mnt/lustre/d20a.sanity-lfsck/a1/f1 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 3310 0xcee 0x240000400 1 2855 0xb27 0x280000400 [0x200004a51:0x1e:0x0] /mnt/lustre/d20a.sanity-lfsck/a1/f2 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 3311 0xcef 0x240000400 1 2856 0xb28 0x280000400 Inject failure... 
To simulate f0 lost MDT-object fail_loc=0x1616 To simulate f1 lost MDT-object and OST-object0 fail_loc=0x161a To simulate f2 lost MDT-object and OST-object1 fail_val=1 192.168.202.104@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg204-client.virtnet /mnt/lustre (opts:) fail_loc=0 fail_val=0 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout Starting client: oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre Check /mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a51:0x1c:0x0]-R-0, which is the old f0 /mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a51:0x1c:0x0]-R-0 lmm_magic: 0x0BD10BD0 lmm_seq: 0x200004a51 lmm_object_id: 0x1c lmm_fid: [0x200004a51:0x1c:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 2 lmm_stripe_offset: 0 obdidx objid objid group 0 3309 0xced 0x240000400 1 2854 0xb26 0x280000400 Check /mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a51:0x1d:0x0]-R-0, it contains the old f1's stripe1 /mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a51:0x1d:0x0]-R-0 lmm_magic: 0x0BD10BD0 lmm_seq: 0x200004a51 lmm_object_id: 0x1d lmm_fid: [0x200004a51:0x1d:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: 40000001 lmm_layout_gen: 1 lmm_stripe_offset: 0 obdidx objid objid group 0 0 0 0 1 2855 0xb27 0x280000400 cat: /mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a51:0x1d:0x0]-R-0: Input/output error dd: error writing '/mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a51:0x1d:0x0]-R-0': Input/output error 1+0 records in 0+0 records out 0 bytes (0 B) copied, 0.000443476 s, 0.0 kB/s 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.0194615 s, 210 kB/s /home/green/git/lustre-release/lustre/tests/sanity-lfsck.sh: line 3371: echo: write error: Input/output error Check /mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a51:0x1e:0x0]-R-0, it contains the old f2's stripe0 /mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a51:0x1e:0x0]-R-0 lmm_magic: 0x0BD10BD0 lmm_seq: 0x200004a51 lmm_object_id: 0x1e lmm_fid: [0x200004a51:0x1e:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: 40000001 lmm_layout_gen: 1 lmm_stripe_offset: 0 obdidx objid objid group 0 3311 0xcef 0x240000400 0 0 0 0 cat: /mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a51:0x1e:0x0]-R-0: Input/output error dd: error writing '/mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a51:0x1e:0x0]-R-0': Input/output error 1+0 records in 0+0 records out 0 bytes (0 B) copied, 0.000520904 s, 0.0 kB/s PASS 20a (7s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 20b: Handle the orphan with dummy LOV EA slot properly - PFL case ========================================================== 05:11:39 (1713345099) ##### The target MDT-object and some of its OST-object are lost. The LFSCK should find out the left OST-objects and re-create the MDT-object under the direcotry .lustre/lost+found/MDTxxxx/ with the partial OST-objects (LOV EA hole). New client can access the file with LOV EA hole via normal system tools or commands without crash the system - PFL case. 
##### 769+0 records in 769+0 records out 3149824 bytes (3.1 MB) copied, 0.140996 s, 22.3 MB/s 769+0 records in 769+0 records out 3149824 bytes (3.1 MB) copied, 0.12875 s, 24.5 MB/s 769+0 records in 769+0 records out 3149824 bytes (3.1 MB) copied, 0.129911 s, 24.2 MB/s [0x200004a52:0x6:0x0] /mnt/lustre/d20b.sanity-lfsck/f0 lcm_layout_gen: 3 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 2097152 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x240000400:0xcf0:0x0] } - 1: { l_ost_idx: 1, l_fid: [0x280000400:0xb2a:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 2097152 lcme_extent.e_end: EOF lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x280000400:0xb2d:0x0] } - 1: { l_ost_idx: 0, l_fid: [0x240000400:0xcf3:0x0] } [0x200004a52:0x7:0x0] /mnt/lustre/d20b.sanity-lfsck/f1 lcm_layout_gen: 3 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 2097152 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x280000400:0xb2b:0x0] } - 1: { l_ost_idx: 0, l_fid: [0x240000400:0xcf1:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 2097152 lcme_extent.e_end: EOF lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x240000400:0xcf4:0x0] } - 1: { l_ost_idx: 1, l_fid: [0x280000400:0xb2e:0x0] } [0x200004a52:0x8:0x0] /mnt/lustre/d20b.sanity-lfsck/f2 lcm_layout_gen: 3 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 2097152 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x240000400:0xcf2:0x0] } - 1: { l_ost_idx: 1, l_fid: [0x280000400:0xb2c:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 2097152 lcme_extent.e_end: EOF lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x280000400:0xb2f:0x0] } - 1: { l_ost_idx: 0, l_fid: [0x240000400:0xcf5:0x0] } Inject failure... 
To simulate f0 lost MDT-object fail_loc=0x1616 To simulate the case of f1 lost MDT-object and the first OST-object in each PFL component fail_loc=0x161a To simulate the case of f2 lost MDT-object and the second OST-object in each PFL component fail_val=1 fail_loc=0 fail_val=0 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout Check /mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a52:0x6:0x0]-R-0, which is the old f0 /mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a52:0x6:0x0]-R-0 composite_header: lcm_magic: 0x0BD60BD0 lcm_size: 288 lcm_flags: 0 lcm_layout_gen: 1 lcm_mirror_count: 1 lcm_entry_count: 2 components: - lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 2097152 lcme_offset: 128 lcme_size: 80 sub_layout: lmm_magic: 0x0BD10BD0 lmm_seq: 0x200004a52 lmm_object_id: 0x6 lmm_fid: [0x200004a52:0x6:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 2 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x240000400:0xcf0:0x0] } - 1: { l_ost_idx: 1, l_fid: [0x280000400:0xb2a:0x0] } - lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 2097152 lcme_extent.e_end: EOF lcme_offset: 208 lcme_size: 80 sub_layout: lmm_magic: 0x0BD10BD0 lmm_seq: 0x200004a52 lmm_object_id: 0x6 lmm_fid: [0x200004a52:0x6:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 2 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x280000400:0xb2d:0x0] } - 1: { l_ost_idx: 0, l_fid: [0x240000400:0xcf3:0x0] } Check /mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a52:0x7:0x0]-R-0, it contains f1's second OST-object in each COMP /mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a52:0x7:0x0]-R-0 composite_header: lcm_magic: 0x0BD60BD0 lcm_size: 288 lcm_flags: 0 lcm_layout_gen: 2 lcm_mirror_count: 1 lcm_entry_count: 2 components: - lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 2097152 lcme_offset: 128 lcme_size: 80 sub_layout: lmm_magic: 0x0BD10BD0 lmm_seq: 0x200004a52 lmm_object_id: 0x7 lmm_fid: [0x200004a52:0x7:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: 40000001 lmm_layout_gen: 1 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x0:0x0] } - 1: { l_ost_idx: 0, l_fid: [0x240000400:0xcf1:0x0] } - lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 2097152 lcme_extent.e_end: EOF lcme_offset: 208 lcme_size: 80 sub_layout: lmm_magic: 0x0BD10BD0 lmm_seq: 0x200004a52 lmm_object_id: 0x7 lmm_fid: [0x200004a52:0x7:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: 40000001 lmm_layout_gen: 1 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x0:0x0] } - 1: { l_ost_idx: 1, l_fid: [0x280000400:0xb2e:0x0] } cat: /mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a52:0x7:0x0]-R-0: Input/output error dd: error writing '/mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a52:0x7:0x0]-R-0': Input/output error 1+0 records in 0+0 records out 0 bytes (0 B) copied, 0.00043917 s, 0.0 kB/s 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00565087 s, 725 kB/s /home/green/git/lustre-release/lustre/tests/sanity-lfsck.sh: line 3729: echo: write error: Input/output error Check /mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a52:0x8:0x0]-R-0, it contains f2's first stripe in each COMP /mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a52:0x8:0x0]-R-0 composite_header: lcm_magic: 0x0BD60BD0 lcm_size: 288 
lcm_flags: 0 lcm_layout_gen: 3 lcm_mirror_count: 1 lcm_entry_count: 2 components: - lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 2097152 lcme_offset: 128 lcme_size: 80 sub_layout: lmm_magic: 0x0BD10BD0 lmm_seq: 0x200004a52 lmm_object_id: 0x8 lmm_fid: [0x200004a52:0x8:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: 40000001 lmm_layout_gen: 1 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x240000400:0xcf2:0x0] } - 1: { l_ost_idx: 0, l_fid: [0x100000000:0x0:0x0] } - lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 2097152 lcme_extent.e_end: EOF lcme_offset: 208 lcme_size: 80 sub_layout: lmm_magic: 0x0BD10BD0 lmm_seq: 0x200004a52 lmm_object_id: 0x8 lmm_fid: [0x200004a52:0x8:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: 40000001 lmm_layout_gen: 1 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x280000400:0xb2f:0x0] } - 1: { l_ost_idx: 0, l_fid: [0x100000000:0x0:0x0] } cat: /mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a52:0x8:0x0]-R-0: Input/output error dd: error writing '/mnt/lustre/.lustre/lost+found/MDT0000/[0x200004a52:0x8:0x0]-R-0': Input/output error 1+0 records in 0+0 records out 0 bytes (0 B) copied, 0.000491792 s, 0.0 kB/s 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00577253 s, 710 kB/s /home/green/git/lustre-release/lustre/tests/sanity-lfsck.sh: line 3811: echo: write error: Input/output error PASS 20b (8s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 21: run all LFSCK components by default ========================================================== 05:11:48 (1713345108) total: 100 open/close in 0.28 seconds: 351.80 ops/second Start all LFSCK components by default (-s 1) Started LFSCK on the device lustre-MDT0000: scrub layout namespace namespace LFSCK should be in 'scanning-phase1' status layout LFSCK should be in 'scanning-phase1' status Stop all LFSCK components by default Stopped LFSCK on the device lustre-MDT0000. PASS 21 (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 22a: LFSCK can repair unmatched pairs (1) ========================================================== 05:11:53 (1713345113) SKIP: sanity-lfsck test_22a needs >= 2 MDTs SKIP 22a (0s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 22b: LFSCK can repair unmatched pairs (2) ========================================================== 05:11:55 (1713345115) SKIP: sanity-lfsck test_22b needs >= 2 MDTs SKIP 22b (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 23a: LFSCK can repair dangling name entry (1) ========================================================== 05:11:57 (1713345117) SKIP: sanity-lfsck test_23a needs >= 2 MDTs SKIP 23a (1s) debug_raw_pointers=0 debug_raw_pointers=0 SKIP: sanity-lfsck test_23b skipping excluded test 23b debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 23c: LFSCK can repair dangling name entry (3) ========================================================== 05:12:00 (1713345120) ##### The objectA has multiple hard links, one of them corresponding to the name entry_B. But there is something wrong for the name entry_B and cause entry_B to references non-exist object_C. In the first-stage scanning, the LFSCK will think the entry_B as dangling, and re-create the lost object_C. 
And then others modified the re-created object_C. When the LFSCK comes to the second-stage scanning, it will find that the former re-creating object_C maybe wrong and try to replace the object_C with the real object_A. But because object_C has been modified, so the LFSCK cannot replace it. ##### debug=-1 debug_mb=150 debug=-1 debug_mb=150 parent_fid=[0x200004a52:0x76:0x0] total: 10 open/close in 0.07 seconds: 145.39 ops/second f0_fid=[0x200004a52:0x81:0x0] f1_fid=[0x200004a52:0x82:0x0] Inject failure stub on MDT0 to simulate dangling name entry fail_val=0x82 fail_loc=0x1621 fail_val=0 fail_loc=0 - unlinked 0 (time 1713345122 ; total 0 ; last 0) total: 10 unlinks in 0 seconds: inf unlinks/second 'ls' should fail because of dangling name entry fail_val=10 fail_loc=0x1602 Trigger namespace LFSCK to find out dangling name entry Started LFSCK on the device lustre-MDT0000: scrub namespace fail_val=0 fail_loc=0 debug_mb=21 debug_mb=21 PASS 23c (5s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 23d: LFSCK can repair a dangling name entry to a remote object ========================================================== 05:12:06 (1713345126) SKIP: sanity-lfsck test_23d needs >= 2 MDTs SKIP 23d (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 24: LFSCK can repair multiple-referenced name entry ========================================================== 05:12:09 (1713345129) SKIP: sanity-lfsck test_24 needs >= 2 MDTs SKIP 24 (0s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 25: LFSCK can repair bad file type in the name entry ========================================================== 05:12:11 (1713345131) SKIP: sanity-lfsck test_25 only ldiskfs fixes dirent type SKIP 25 (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 26a: LFSCK can add the missing local name entry back to the namespace ========================================================== 05:12:13 (1713345133) ##### The local name entry back referenced by the MDT-object is lost. The namespace LFSCK will add the missing local name entry back to the normal namespace. ##### Inject failure stub on MDT0 to simulate the case that foo's name entry will be removed, but the foo's object and its linkEA are kept in the system. fail_loc=0x1624 fail_loc=0 Trigger namespace LFSCK to repair the missing remote name entry Started LFSCK on the device lustre-MDT0000: scrub namespace 144115507279167621 -rw-r--r-- 2 root root 0 Apr 17 05:12 /mnt/lustre/d26a.sanity-lfsck/d0/foo PASS 26a (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 26b: LFSCK can add the missing remote name entry back to the namespace ========================================================== 05:12:18 (1713345138) SKIP: sanity-lfsck test_26b needs >= 2 MDTs SKIP 26b (0s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 27a: LFSCK can recreate the lost local parent directory as orphan ========================================================== 05:12:20 (1713345140) ##### The local parent referenced by the MDT-object linkEA is lost. The namespace LFSCK will re-create the lost parent as orphan. ##### Inject failure stub on MDT0 to simulate the case that foo's name entry will be removed, but the foo's object and its linkEA are kept in the system. 
And then remove another hard link and the parent directory. fail_loc=0x1624 fail_loc=0 Trigger namespace LFSCK to repair the lost parent Started LFSCK on the device lustre-MDT0000: scrub namespace There should be an orphan under .lustre/lost+found/MDT0000/ total 33 144115205289279489 drwx------ 3 root root 10752 Apr 17 05:11 . 144115188109410307 dr-x------ 3 root root 10752 Dec 31 1969 .. 144115507279167623 drwx------ 2 root root 11776 Dec 31 1969 [0x200004a52:0x87:0x0]-P-0 PASS 27a (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 27b: LFSCK can recreate the lost remote parent directory as orphan ========================================================== 05:12:24 (1713345144) SKIP: sanity-lfsck test_27b needs >= 2 MDTs SKIP 27b (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 28: Skip the failed MDT(s) when handle orphan MDT-objects ========================================================== 05:12:27 (1713345147) SKIP: sanity-lfsck test_28 needs >= 2 MDTs SKIP 28 (0s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 29b: LFSCK can repair bad nlink count (2) ========================================================== 05:12:29 (1713345149) ##### The object's nlink attribute is smaller than the object's known name entries count. The LFSCK will repair the object's nlink attribute to match the known name entries count ##### Inject failure stub on MDT0 to simulate the case that foo's nlink attribute is smaller than its name entries count. fail_loc=0x1626 fail_loc=0 Trigger namespace LFSCK to repair the nlink count Started LFSCK on the device lustre-MDT0000: scrub namespace PASS 29b (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 29c: verify linkEA size limitation == 05:12:33 (1713345153) ##### The namespace LFSCK will create many hard links to the target file as to exceed the linkEA size limitation. Under such case the linkEA will be marked as overflow that will prevent the target file to be migrated. Then remove some hard links to make the left hard links to be held within the linkEA size limitation. But before the namespace LFSCK adding all the missed linkEA entries back, the overflow mark (timestamp) will not be cleared. 
##### Create 150 hard links should succeed although the linkEA overflow total: 150 link in 0.27 seconds: 557.05 ops/second Remove 100 hard links to save space for the missed linkEA entries - unlinked 0 (time 1713345155 ; total 0 ; last 0) total: 100 unlinks in 0 seconds: inf unlinks/second Trigger namespace LFSCK to clear the overflow timestamp Started LFSCK on the device lustre-MDT0000: scrub namespace PASS 29c (7s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 29d: accessing non-existing inode shouldn't turn fs read-only (ldiskfs) ========================================================== 05:12:42 (1713345162) SKIP: sanity-lfsck test_29d ldiskfs only problem SKIP 29d (0s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 30: LFSCK can recover the orphans from backend /lost+found ========================================================== 05:12:44 (1713345164) SKIP: sanity-lfsck test_30 only ldiskfs has lost+found SKIP 30 (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 31a: The LFSCK can find/repair the name entry with bad name hash (1) ========================================================== 05:12:46 (1713345166) SKIP: sanity-lfsck test_31a needs >= 2 MDTs SKIP 31a (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 31b: The LFSCK can find/repair the name entry with bad name hash (2) ========================================================== 05:12:49 (1713345169) SKIP: sanity-lfsck test_31b needs >= 2 MDTs SKIP 31b (0s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 31c: Re-generate the lost master LMV EA for striped directory ========================================================== 05:12:51 (1713345171) SKIP: sanity-lfsck test_31c needs >= 2 MDTs SKIP 31c (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 31d: Set broken striped directory (modified after broken) as read-only ========================================================== 05:12:53 (1713345173) SKIP: sanity-lfsck test_31d needs >= 2 MDTs SKIP 31d (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 31e: Re-generate the lost slave LMV EA for striped directory (1) ========================================================== 05:12:55 (1713345175) SKIP: sanity-lfsck test_31e needs >= 2 MDTs SKIP 31e (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 31f: Re-generate the lost slave LMV EA for striped directory (2) ========================================================== 05:12:58 (1713345178) SKIP: sanity-lfsck test_31f needs >= 2 MDTs SKIP 31f (0s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 31g: Repair the corrupted slave LMV EA ========================================================== 05:13:00 (1713345180) SKIP: sanity-lfsck test_31g needs >= 2 MDTs SKIP 31g (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 31h: Repair the corrupted shard's name entry ========================================================== 05:13:02 (1713345182) SKIP: sanity-lfsck test_31h needs >= 2 MDTs SKIP 31h (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y 
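Tests 26a, 27a, 29b and 29c above all rely on the namespace component; a minimal sketch, assuming the same device name, of starting a namespace LFSCK and reading back the repair counters relevant to these tests:

```sh
# Minimal sketch for the namespace LFSCK runs above
# (device name from this run; option and counter names assumed).
lctl lfsck_start -M lustre-MDT0000 -t namespace -r
lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
        grep -E '^(status|dirent_repaired|linkea_repaired|nlinks_repaired):'
```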
debug_raw_pointers=Y == sanity-lfsck test 32a: stop LFSCK when some OST failed ========================================================== 05:13:04 (1713345184) preparing... 5 * 5 files will be created Wed Apr 17 05:13:05 EDT 2024. total: 5 mkdir in 0.01 seconds: 807.62 ops/second total: 5 create in 0.01 seconds: 925.24 ops/second total: 5 mkdir in 0.01 seconds: 855.42 ops/second prepared Wed Apr 17 05:13:06 EDT 2024. 192.168.202.104@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg204-client.virtnet /mnt/lustre (opts:) fail_val=3 fail_loc=0x162d Started LFSCK on the device lustre-MDT0000: scrub layout stop ost1 fail_loc=0 fail_val=0 stop LFSCK Stopped LFSCK on the device lustre-MDT0000. oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1 PASS 32a (12s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 32b: stop LFSCK when some MDT failed ========================================================== 05:13:17 (1713345197) SKIP: sanity-lfsck test_32b needs >= 2 MDTs SKIP 32b (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 33: check LFSCK paramters =========== 05:13:20 (1713345200) Checking servers environments Checking clients oleg204-client.virtnet environments Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs Force libcfs to create 2 CPU partitions libkmod: kmod_module_get_holders: could not open '/sys/module/acpi_cpufreq/holders': No such file or directory loading modules on: 'oleg204-server' oleg204-server: oleg204-server.virtnet: executing load_modules_local oleg204-server: Loading modules from /home/green/git/lustre-release/lustre oleg204-server: detected 4 online CPUs by sysfs oleg204-server: Force libcfs to create 2 CPU partitions Setup mgs, mdt, osts Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg204-server: mount.lustre: according to /etc/mtab lustre-mdt1/mdt1 is already mounted on /mnt/lustre-mds1 pdsh@oleg204-client: oleg204-server: ssh exited with exit code 17 Start of lustre-mdt1/mdt1 on mds1 failed 17 Starting ost1: -o localrecov lustre-ost1/ost1 /mnt/lustre-ost1 oleg204-server: mount.lustre: according to /etc/mtab lustre-ost1/ost1 is already mounted on /mnt/lustre-ost1 pdsh@oleg204-client: oleg204-server: ssh exited with exit code 17 seq.cli-lustre-OST0000-super.width=65536 Start of lustre-ost1/ost1 on ost1 failed 17 Starting ost2: -o localrecov lustre-ost2/ost2 /mnt/lustre-ost2 oleg204-server: mount.lustre: according to /etc/mtab lustre-ost2/ost2 is already mounted on /mnt/lustre-ost2 pdsh@oleg204-client: oleg204-server: ssh exited with exit code 17 seq.cli-lustre-OST0001-super.width=65536 Start of lustre-ost2/ost2 on ost2 failed 17 Starting client: oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre Starting client oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre Started clients oleg204-client.virtnet: 192.168.202.104@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project) Using TIMEOUT=20 osc.lustre-OST0000-osc-ffff880136a77800.idle_timeout=debug osc.lustre-OST0001-osc-ffff880136a77800.idle_timeout=debug 
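Test 33 below starts both LFSCK components with explicit parameters; a hedged sketch of the start/stop knobs that appear in this log (-t type, -A all targets, -r reset, -s speed limit), with the caveat that the exact values test 33 passes are not visible in the output:

```sh
# Hedged sketch of LFSCK start/stop parameters (values illustrative only;
# device name from this run, option meanings assumed from common usage).
lctl lfsck_start -M lustre-MDT0000 -t layout -A -r -s 1    # throttle to 1 item/sec
lctl lfsck_start -M lustre-MDT0000 -t namespace -r -s 0    # 0 = no speed limit
lctl lfsck_stop  -M lustre-MDT0000 -A                      # stop all components on all targets
```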
disable quota as required preparing... 5 * 5 files will be created Wed Apr 17 05:13:35 EDT 2024. total: 5 mkdir in 0.01 seconds: 762.30 ops/second total: 5 create in 0.01 seconds: 762.07 ops/second total: 5 mkdir in 0.01 seconds: 715.63 ops/second prepared Wed Apr 17 05:13:35 EDT 2024. Started LFSCK on the device lustre-MDT0000: scrub layout Started LFSCK on the device lustre-MDT0000: scrub namespace PASS 33 (18s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 34: LFSCK can rebuild the lost agent object ========================================================== 05:13:39 (1713345219) SKIP: sanity-lfsck test_34 needs >= 2 MDTs SKIP 34 (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 35: LFSCK can rebuild the lost agent entry ========================================================== 05:13:42 (1713345222) SKIP: sanity-lfsck test_35 needs >= 2 MDTs SKIP 35 (0s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 36a: rebuild LOV EA for mirrored file (1) ========================================================== 05:13:44 (1713345224) SKIP: sanity-lfsck test_36a needs >= 3 OSTs SKIP 36a (0s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 36b: rebuild LOV EA for mirrored file (2) ========================================================== 05:13:46 (1713345226) SKIP: sanity-lfsck test_36b needs >= 3 OSTs SKIP 36b (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 36c: rebuild LOV EA for mirrored file (3) ========================================================== 05:13:48 (1713345228) SKIP: sanity-lfsck test_36c needs >= 3 OSTs SKIP 36c (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 37: LFSCK must skip a ORPHAN ======== 05:13:50 (1713345230) multiop /mnt/lustre/d37.sanity-lfsck/d0 vD_c TMPPIPE=/tmp/multiop_open_wait_pipe.6886 Started LFSCK on the device lustre-MDT0000: scrub namespace stat: cannot stat '/mnt/lustre/d37.sanity-lfsck/d0': No such file or directory PASS 37 (2s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 38: LFSCK does not break foreign file and reverse is also true ========================================================== 05:13:54 (1713345234) lfm_magic: 0x0BD70BD0 lfm_length: 73 lfm_type: 0x00000000 (none) lfm_flags: 0x0000DA05 lfm_value: '4508f2ad-2f6b-4a5e-9911-be844cb254f1@f19ccee3-571c-4f79-a5b5-d642fd77647c' lfs setstripe: setstripe error for '/mnt/lustre/d38.sanity-lfsck/f38.sanity-lfsck': stripe already set Started LFSCK on the device lustre-MDT0000: scrub namespace Started LFSCK on the device lustre-MDT0000: scrub layout post-lfsck checks of foreign file lfm_magic: 0x0BD70BD0 lfm_length: 73 lfm_type: 0x00000000 (none) lfm_flags: 0x0000DA05 lfm_value: '4508f2ad-2f6b-4a5e-9911-be844cb254f1@f19ccee3-571c-4f79-a5b5-d642fd77647c' lfs setstripe: setstripe error for '/mnt/lustre/d38.sanity-lfsck/f38.sanity-lfsck': stripe already set cat: /mnt/lustre/d38.sanity-lfsck/f38.sanity-lfsck: No data available cat: write error: Bad file descriptor PASS 38 (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 39: LFSCK does not break foreign dir and reverse is also true ========================================================== 
05:13:58 (1713345238) lfm_magic: 0x0CD50CD0 lfm_length: 73 lfm_type: 0x00000000 (none) lfm_flags: 0x0000DA05 lfm_value: '6624055b-a2fa-4690-b39b-3c7e9f5ae3de@c538553c-d4d0-4ab9-beb5-99918320611e' touch: cannot touch '/mnt/lustre/d39.sanity-lfsck/d39.sanity-lfsck2/f39.sanity-lfsck': No data available Started LFSCK on the device lustre-MDT0000: scrub namespace Started LFSCK on the device lustre-MDT0000: scrub layout post-lfsck checks of foreign dir lfm_magic: 0x0CD50CD0 lfm_length: 73 lfm_type: 0x00000000 (none) lfm_flags: 0x0000DA05 lfm_value: '6624055b-a2fa-4690-b39b-3c7e9f5ae3de@c538553c-d4d0-4ab9-beb5-99918320611e' touch: cannot touch '/mnt/lustre/d39.sanity-lfsck/d39.sanity-lfsck2/f39.sanity-lfsck': No data available PASS 39 (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 40a: LFSCK correctly fixes lmm_oi in composite layout ========================================================== 05:14:03 (1713345243) SKIP: sanity-lfsck test_40a needs >= 2 MDTs SKIP 40a (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 41: SEL support in LFSCK ============ 05:14:05 (1713345245) debug=+lfsck trigger LFSCK for SEL layout Started LFSCK on the device lustre-MDT0000: scrub layout namespace debug=malloc neterror net warning nettrace error emerg console PASS 41 (7s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 42: LFSCK can repair inconsistent MDT-object/OST-object encryption flags ========================================================== 05:14:13 (1713345253) SKIP: sanity-lfsck test_42 skip ZFS backend SKIP 42 (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug=0 Stopping clients: oleg204-client.virtnet /mnt/lustre (opts:) Stopping client oleg204-client.virtnet /mnt/lustre opts: Stopping clients: oleg204-client.virtnet /mnt/lustre2 (opts:) Stopping /mnt/lustre-mds1 (opts:-f) on oleg204-server Stopping /mnt/lustre-ost1 (opts:-f) on oleg204-server Stopping /mnt/lustre-ost2 (opts:-f) on oleg204-server unloading modules on: 'oleg204-server' oleg204-server: oleg204-server.virtnet: executing unload_modules_local modules unloaded. 
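The teardown above and the reformat/setup that follows re-create the test filesystem; a hedged sketch of the plain mount commands behind the "Starting mds1/ost1/ost2" and "Starting client" lines, using the device, fsname and mount-point names from this run:

```sh
# Hedged sketch of the mounts behind the setup lines that follow
# (device, fsname and mount-point names are the ones used in this run).
mount -t lustre -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1   # on oleg204-server
mount -t lustre -o localrecov lustre-ost1/ost1 /mnt/lustre-ost1
mount -t lustre -o localrecov lustre-ost2/ost2 /mnt/lustre-ost2
mount -t lustre -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre   # on the client
```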
=== sanity-lfsck: start setup 05:14:43 (1713345283) === Stopping clients: oleg204-client.virtnet /mnt/lustre (opts:-f) Stopping clients: oleg204-client.virtnet /mnt/lustre2 (opts:-f) pdsh@oleg204-client: oleg204-server: ssh exited with exit code 2 oleg204-server: oleg204-server.virtnet: executing set_hostid Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs Force libcfs to create 2 CPU partitions libkmod: kmod_module_get_holders: could not open '/sys/module/acpi_cpufreq/holders': No such file or directory ../libcfs/libcfs/libcfs options: 'cpu_npartitions=2' ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1' quota/lquota options: 'hash_lqs_cur_bits=3' loading modules on: 'oleg204-server' oleg204-server: oleg204-server.virtnet: executing load_modules_local oleg204-server: Loading modules from /home/green/git/lustre-release/lustre oleg204-server: detected 4 online CPUs by sysfs oleg204-server: Force libcfs to create 2 CPU partitions oleg204-server: ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1' oleg204-server: quota/lquota options: 'hash_lqs_cur_bits=3' Formatting mgs, mds, osts Format mds1: lustre-mdt1/mdt1 Format ost1: lustre-ost1/ost1 Format ost2: lustre-ost2/ost2 Checking servers environments Checking clients oleg204-client.virtnet environments Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs Force libcfs to create 2 CPU partitions libkmod: kmod_module_get_holders: could not open '/sys/module/acpi_cpufreq/holders': No such file or directory loading modules on: 'oleg204-server' oleg204-server: oleg204-server.virtnet: executing load_modules_local oleg204-server: Loading modules from /home/green/git/lustre-release/lustre oleg204-server: detected 4 online CPUs by sysfs oleg204-server: Force libcfs to create 2 CPU partitions Setup mgs, mdt, osts Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1 oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1 Commit the device label on lustre-mdt1/mdt1 Started lustre-MDT0000 Starting ost1: -o localrecov lustre-ost1/ost1 /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1 Commit the device label on lustre-ost1/ost1 Started lustre-OST0000 Starting ost2: -o localrecov lustre-ost2/ost2 /mnt/lustre-ost2 seq.cli-lustre-OST0001-super.width=65536 oleg204-server: oleg204-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg204-client: oleg204-server: ssh exited with exit code 1 Commit the device label on lustre-ost2/ost2 Started lustre-OST0001 Starting client: oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre Starting client oleg204-client.virtnet: -o user_xattr,flock oleg204-server@tcp:/lustre /mnt/lustre Started clients oleg204-client.virtnet: 192.168.202.104@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project) Using TIMEOUT=20 osc.lustre-OST0000-osc-ffff8800a76a7000.idle_timeout=debug osc.lustre-OST0001-osc-ffff8800a76a7000.idle_timeout=debug setting jobstats to procname_uid Setting lustre.sys.jobid_var from disable to procname_uid 
Waiting 90s for 'procname_uid' Updated after 5s: want 'procname_uid' got 'procname_uid' disable quota as required === sanity-lfsck: finish setup 05:15:33 (1713345333) === == sanity-lfsck test complete, duration 1173 sec ========= 05:15:33 (1713345333) === sanity-lfsck: start cleanup 05:15:34 (1713345334) === === sanity-lfsck: finish cleanup 05:15:34 (1713345334) ===
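Several of the passing tests above (18b, 18c, 20a, 20b, 27a) leave repaired objects under .lustre/lost+found/MDTxxxx for an administrator to reclaim by hand; a minimal sketch, assuming the mount point from this run, of how such orphans are typically inspected and moved back into the namespace (the destination path is hypothetical):

```sh
# Minimal sketch for reclaiming LFSCK-recovered orphans (mount point and
# MDT index from this run; <FID> and the destination path are placeholders).
ls -l /mnt/lustre/.lustre/lost+found/MDT0000/               # stubs are named after their FIDs
lfs getstripe -v /mnt/lustre/.lustre/lost+found/MDT0000/*   # inspect layouts, note any LOV EA holes
mv '/mnt/lustre/.lustre/lost+found/MDT0000/<FID>-R-0' /mnt/lustre/restored-file
```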