-----============= acceptance-small: sanity-lfsck ============----- Thu Apr 18 04:40:40 EDT 2024
excepting tests: 23b
Loading modules from /home/green/git/lustre-release/lustre
detected 4 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
loading modules on: 'oleg413-server'
oleg413-server: oleg413-server.virtnet: executing load_modules_local
oleg413-server: Loading modules from /home/green/git/lustre-release/lustre
oleg413-server: detected 4 online CPUs by sysfs
oleg413-server: Force libcfs to create 2 CPU partitions
client=34553366 MDS=34553366 OSS=34553366
Stopping clients: oleg413-client.virtnet /mnt/lustre (opts:)
Stopping client oleg413-client.virtnet /mnt/lustre opts:
Stopping clients: oleg413-client.virtnet /mnt/lustre2 (opts:)
Stopping /mnt/lustre-mds1 (opts:-f) on oleg413-server
Stopping /mnt/lustre-mds2 (opts:-f) on oleg413-server
Stopping /mnt/lustre-ost1 (opts:-f) on oleg413-server
Stopping /mnt/lustre-ost2 (opts:-f) on oleg413-server
unloading modules on: 'oleg413-server'
oleg413-server: oleg413-server.virtnet: executing unload_modules_local
modules unloaded.
=== sanity-lfsck: start setup 04:41:27 (1713429687) === Stopping clients: oleg413-client.virtnet /mnt/lustre (opts:-f) Stopping clients: oleg413-client.virtnet /mnt/lustre2 (opts:-f) pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 2 oleg413-server: oleg413-server.virtnet: executing set_hostid Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs Force libcfs to create 2 CPU partitions ../libcfs/libcfs/libcfs options: 'cpu_npartitions=2' ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1' quota/lquota options: 'hash_lqs_cur_bits=3' loading modules on: 'oleg413-server' oleg413-server: oleg413-server.virtnet: executing load_modules_local oleg413-server: Loading modules from /home/green/git/lustre-release/lustre oleg413-server: detected 4 online CPUs by sysfs oleg413-server: Force libcfs to create 2 CPU partitions oleg413-server: ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1' oleg413-server: quota/lquota options: 'hash_lqs_cur_bits=3' Formatting mgs, mds, osts Format mds1: /dev/vdc pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Format mds2: /dev/vdd pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Format ost1: /dev/vde pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Format ost2: /dev/vdf pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Checking servers environments Checking clients oleg413-client.virtnet environments Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs Force libcfs to create 2 CPU partitions loading modules on: 'oleg413-server' oleg413-server: oleg413-server.virtnet: executing load_modules_local oleg413-server: Loading modules from 
/home/green/git/lustre-release/lustre oleg413-server: detected 4 online CPUs by sysfs oleg413-server: Force libcfs to create 2 CPU partitions Setup mgs, mdt, osts pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Commit the device label on /dev/vdc Started lustre-MDT0000 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting mds2: -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Commit the device label on /dev/vdd Started lustre-MDT0001 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Commit the device label on /dev/vde Started lustre-OST0000 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting ost2: -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2 seq.cli-lustre-OST0001-super.width=65536 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Commit the device label on /dev/vdf Started lustre-OST0001 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Starting client oleg413-client.virtnet: -o user_xattr,flock 
oleg413-server@tcp:/lustre /mnt/lustre Started clients oleg413-client.virtnet: 192.168.204.113@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project) Using TIMEOUT=20 osc.lustre-OST0000-osc-ffff8800b6cc7000.idle_timeout=debug osc.lustre-OST0001-osc-ffff8800b6cc7000.idle_timeout=debug setting jobstats to procname_uid Setting lustre.sys.jobid_var from disable to procname_uid Waiting 90s for 'procname_uid' Updated after 2s: want 'procname_uid' got 'procname_uid' disable quota as required osd-ldiskfs.track_declares_assert=1 === sanity-lfsck: finish setup 04:42:29 (1713429749) === debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 0: Control LFSCK manually =========== 04:42:30 (1713429750) preparing... 3 * 3 files will be created Thu Apr 18 04:42:31 EDT 2024. total: 3 mkdir in 0.01 seconds: 421.35 ops/second total: 3 create in 0.01 seconds: 557.60 ops/second total: 3 mkdir in 0.01 seconds: 578.68 ops/second prepared Thu Apr 18 04:42:32 EDT 2024. 
fail_val=3
fail_loc=0x1600
Started LFSCK on the device lustre-MDT0000: scrub namespace
name: lfsck_namespace
magic: 0xa06249ff
version: 2
status: scanning-phase1
flags:
param:
last_completed_time: N/A
time_since_last_completed: N/A
latest_start_time: 1713429753
time_since_latest_start: 0 seconds
last_checkpoint_time: N/A
time_since_last_checkpoint: N/A
latest_start_position: 13, N/A, N/A
last_checkpoint_position: N/A, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 0
checked_phase2: 0
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 0
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 0
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
linkea_overflow_cleared: 0
agent_entries_repaired: 0
success_count: 0
run_time_phase1: 0 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 0 items/sec
average_speed_phase2: N/A
average_speed_total: 0 items/sec
real_time_speed_phase1: 0 items/sec
real_time_speed_phase2: N/A
current_position: 12, N/A, N/A
Stopped LFSCK on the device lustre-MDT0000.
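The `lfsck_namespace` dump above is a flat list of `key: value` pairs. When post-processing a run like this one, it can help to load the dump into a dictionary; the parser below is an illustrative sketch for log analysis, not part of the Lustre tooling, and the truncated sample it parses is taken from the dump above.

```python
def parse_lfsck_status(text):
    """Parse an lfsck_namespace-style dump into a dict of strings.

    Splits each line on the first ':' only, so values that themselves
    contain colons (e.g. FID triples in *_position fields) stay intact.
    """
    result = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        result[key.strip()] = value.strip()
    return result


sample = """\
name: lfsck_namespace
magic: 0xa06249ff
status: scanning-phase1
checked_phase1: 0
current_position: 12, N/A, N/A
"""

info = parse_lfsck_status(sample)
print(info["status"])            # scanning-phase1
print(info["current_position"])  # 12, N/A, N/A
```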
Started LFSCK on the device lustre-MDT0000: scrub namespace fail_loc=0 fail_val=0 Started LFSCK on the device lustre-MDT0000: scrub namespace stopall, should NOT crash LU-3649 Stopping clients: oleg413-client.virtnet /mnt/lustre (opts:) Stopping client oleg413-client.virtnet /mnt/lustre opts: Stopping clients: oleg413-client.virtnet /mnt/lustre2 (opts:) Stopping /mnt/lustre-mds1 (opts:-f) on oleg413-server Stopping /mnt/lustre-mds2 (opts:-f) on oleg413-server Stopping /mnt/lustre-ost1 (opts:-f) on oleg413-server Stopping /mnt/lustre-ost2 (opts:-f) on oleg413-server PASS 0 (40s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 1a: LFSCK can find out and repair crashed FID-in-dirent ========================================================== 04:43:11 (1713429791) Checking servers environments Checking clients oleg413-client.virtnet environments Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs Force libcfs to create 2 CPU partitions libkmod: kmod_module_get_holders: could not open '/sys/module/acpi_cpufreq/holders': No such file or directory libkmod: kmod_module_get_holders: could not open '/sys/module/pcc_cpufreq/holders': No such file or directory libkmod: kmod_module_get_holders: could not open '/sys/module/intel_rapl/holders': No such file or directory loading modules on: 'oleg413-server' oleg413-server: oleg413-server.virtnet: executing load_modules_local oleg413-server: Loading modules from /home/green/git/lustre-release/lustre oleg413-server: detected 4 online CPUs by sysfs oleg413-server: Force libcfs to create 2 CPU partitions oleg413-server: libkmod: kmod_module_get_holders: could not open '/sys/module/intel_rapl/holders': No such file or directory Setup mgs, mdt, osts Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super 
lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Started lustre-MDT0000 Starting mds2: -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Started lustre-MDT0001 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Started lustre-OST0000 Starting ost2: -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2 seq.cli-lustre-OST0001-super.width=65536 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Started lustre-OST0001 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Starting client oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Started clients oleg413-client.virtnet: 192.168.204.113@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project) Using TIMEOUT=20 osc.lustre-OST0000-osc-ffff8800b6fc1800.idle_timeout=debug osc.lustre-OST0001-osc-ffff8800b6fc1800.idle_timeout=debug disable quota as required preparing... 1 * 1 files will be created Thu Apr 18 04:43:41 EDT 2024. total: 1 mkdir in 0.00 seconds: 543.44 ops/second total: 1 create in 0.00 seconds: 555.24 ops/second total: 1 mkdir in 0.00 seconds: 588.92 ops/second prepared Thu Apr 18 04:43:41 EDT 2024. 
fail_loc=0x1501 fail_loc=0 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) Started LFSCK on the device lustre-MDT0000: scrub namespace oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre fail_loc=0x1505 fail_loc=0 PASS 1a (44s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 1b: LFSCK can find out and repair the missing FID-in-LMA ========================================================== 04:43:56 (1713429836) preparing... 1 * 1 files will be created Thu Apr 18 04:43:57 EDT 2024. total: 1 mkdir in 0.00 seconds: 499.50 ops/second total: 1 create in 0.00 seconds: 469.74 ops/second total: 1 mkdir in 0.00 seconds: 440.35 ops/second prepared Thu Apr 18 04:43:58 EDT 2024. 
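These tests drive error injection through the `fail_loc`/`fail_val` tunables seen throughout the log. A value such as `0x80001608` (used later for test 6a) combines a fail-site id with control flags in the high bits; the sketch below assumes the usual libcfs layout, where `0x80000000` is the "fire once, then clear" flag and the low 16 bits hold the site id. Treat the masks as assumptions about that layout rather than a definitive decoder.

```python
# Assumed libcfs flag layout (CFS_FAIL_ONCE and the low-16-bit site mask).
CFS_FAIL_ONCE = 0x80000000
SITE_MASK = 0x0000FFFF


def decode_fail_loc(value):
    """Split a fail_loc value into its fail-site id and one-shot flag."""
    return {
        "site": value & SITE_MASK,
        "once": bool(value & CFS_FAIL_ONCE),
    }


print(decode_fail_loc(0x80001608))  # {'site': 5640, 'once': True}  (0x1608)
print(decode_fail_loc(0x1501))      # {'site': 5377, 'once': False} (0x1501)
```

This is why the log shows `fail_loc=0x80001608` for a one-time injected failure, but plain `fail_loc=0x1501` when the hook should stay armed until cleared with `fail_loc=0`.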
fail_loc=0x1502 fail_loc=0 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) fail_loc=0x1506 Started LFSCK on the device lustre-MDT0000: scrub namespace fail_loc=0 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre fail_loc=0x1505 fail_loc=0 PASS 1b (15s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 1c: LFSCK can find out and repair lost FID-in-dirent ========================================================== 04:44:13 (1713429853) preparing... 1 * 1 files will be created Thu Apr 18 04:44:14 EDT 2024. total: 1 mkdir in 0.00 seconds: 347.70 ops/second total: 1 create in 0.00 seconds: 371.08 ops/second total: 1 mkdir in 0.00 seconds: 420.73 ops/second prepared Thu Apr 18 04:44:15 EDT 2024. 
fail_loc=0x1504 fail_loc=0 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) Started LFSCK on the device lustre-MDT0000: scrub namespace oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre fail_loc=0x1505 fail_loc=0 PASS 1c (15s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 2a: LFSCK can find out and repair crashed linkEA entry ========================================================== 04:44:30 (1713429870) preparing... 1 * 1 files will be created Thu Apr 18 04:44:31 EDT 2024. total: 1 mkdir in 0.00 seconds: 290.10 ops/second total: 1 create in 0.01 seconds: 173.88 ops/second total: 1 mkdir in 0.00 seconds: 297.09 ops/second prepared Thu Apr 18 04:44:32 EDT 2024. 
fail_loc=0x1603 fail_loc=0 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) Started LFSCK on the device lustre-MDT0000: scrub namespace oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre PASS 2a (16s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 2b: LFSCK can find out and remove invalid linkEA entry ========================================================== 04:44:47 (1713429887) preparing... 1 * 1 files will be created Thu Apr 18 04:44:48 EDT 2024. total: 1 mkdir in 0.00 seconds: 448.44 ops/second total: 1 create in 0.00 seconds: 387.93 ops/second total: 1 mkdir in 0.00 seconds: 510.19 ops/second prepared Thu Apr 18 04:44:49 EDT 2024. fail_loc=0x1604 fail_loc=0 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) Started LFSCK on the device lustre-MDT0000: scrub namespace oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre PASS 2b (14s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 2c: LFSCK can find out and remove repeated linkEA entry ========================================================== 04:45:03 (1713429903) preparing... 
1 * 1 files will be created Thu Apr 18 04:45:03 EDT 2024. total: 1 mkdir in 0.00 seconds: 637.82 ops/second total: 1 create in 0.00 seconds: 701.74 ops/second total: 1 mkdir in 0.00 seconds: 752.34 ops/second prepared Thu Apr 18 04:45:04 EDT 2024. fail_loc=0x1605 fail_loc=0 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) Started LFSCK on the device lustre-MDT0000: scrub namespace oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre PASS 2c (13s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 2d: LFSCK can recover the missing linkEA entry ========================================================== 04:45:18 (1713429918) preparing... 1 * 1 files will be created Thu Apr 18 04:45:18 EDT 2024. total: 1 mkdir in 0.00 seconds: 425.00 ops/second total: 1 create in 0.00 seconds: 477.06 ops/second total: 1 mkdir in 0.00 seconds: 381.09 ops/second prepared Thu Apr 18 04:45:19 EDT 2024. 
fail_loc=0x161d fail_loc=0 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) Started LFSCK on the device lustre-MDT0000: scrub namespace oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre PASS 2d (14s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 2e: namespace LFSCK can verify remote object linkEA ========================================================== 04:45:33 (1713429933) fail_loc=0x1603 fail_loc=0 Started LFSCK on the device lustre-MDT0000: scrub namespace PASS 2e (4s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 3: LFSCK can verify multiple-linked objects ========================================================== 04:45:38 (1713429938) preparing... 4 * 4 files will be created Thu Apr 18 04:45:39 EDT 2024. total: 4 mkdir in 0.01 seconds: 435.50 ops/second total: 4 create in 0.01 seconds: 461.52 ops/second total: 4 mkdir in 0.01 seconds: 634.80 ops/second prepared Thu Apr 18 04:45:39 EDT 2024. fail_loc=0x1603 fail_loc=0x1604 fail_loc=0 Started LFSCK on the device lustre-MDT0000: scrub namespace PASS 3 (4s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 4: FID-in-dirent can be rebuilt after MDT file-level backup/restore ========================================================== 04:45:44 (1713429944) preparing... 3 * 3 files will be created Thu Apr 18 04:45:45 EDT 2024. 
total: 3 mkdir in 0.00 seconds: 667.99 ops/second total: 3 create in 0.00 seconds: 708.54 ops/second total: 3 mkdir in 0.01 seconds: 531.44 ops/second prepared Thu Apr 18 04:45:46 EDT 2024. Stopping clients: oleg413-client.virtnet /mnt/lustre (opts:) Stopping client oleg413-client.virtnet /mnt/lustre opts: file-level backup/restore on mds1:/dev/mapper/mds1_flakey backup data reformat new device Format mds1: /dev/mapper/mds1_flakey restore data remove recovery logs removed '/mnt/lustre-brpt/CATALOGS' start mds1 with disabling OI scrub oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 fail_val=1 fail_loc=0x1601 Started LFSCK on the device lustre-MDT0000: scrub namespace Waiting 32s for 'inconsistent' Updated after 3s: want 'inconsistent' got 'inconsistent' fail_loc=0 fail_val=0 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre fail_loc=0x1505 fail_loc=0 PASS 4 (27s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 5: LFSCK can handle IGIF object upgrading ========================================================== 04:46:12 (1713429972) preparing... 1 * 1 files will be created Thu Apr 18 04:46:13 EDT 2024. fail_loc=0x1504 total: 1 mkdir in 0.00 seconds: 568.87 ops/second total: 1 create in 0.00 seconds: 603.93 ops/second total: 1 mkdir in 0.00 seconds: 490.91 ops/second fail_loc=0 prepared Thu Apr 18 04:46:14 EDT 2024. 
Stopping clients: oleg413-client.virtnet /mnt/lustre (opts:) Stopping client oleg413-client.virtnet /mnt/lustre opts: file-level backup/restore on mds1:/dev/mapper/mds1_flakey backup data reformat new device Format mds1: /dev/mapper/mds1_flakey restore data remove recovery logs removed '/mnt/lustre-brpt/CATALOGS' start mds1 with disabling OI scrub oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 fail_val=1 fail_loc=0x1601 Started LFSCK on the device lustre-MDT0000: scrub namespace Waiting 32s for 'inconsistent,upgrade' Updated after 6s: want 'inconsistent,upgrade' got 'inconsistent,upgrade' fail_loc=0 fail_val=0 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre fail_loc=0x1505 fail_loc=0 PASS 5 (33s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 6a: LFSCK resumes from last checkpoint (1) ========================================================== 04:46:46 (1713430006) preparing... 5 * 5 files will be created Thu Apr 18 04:46:47 EDT 2024. total: 5 mkdir in 0.01 seconds: 685.32 ops/second total: 5 create in 0.01 seconds: 700.47 ops/second total: 5 mkdir in 0.01 seconds: 817.79 ops/second prepared Thu Apr 18 04:46:48 EDT 2024. 
fail_val=1
fail_loc=0x1600
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0x80001608
Waiting 32s for 'failed'
fail_val=1
fail_loc=0x1600
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0
fail_val=0
PASS 6a (10s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 6b: LFSCK resumes from last checkpoint (2) ========================================================== 04:46:58 (1713430018)
preparing... 5 * 5 files will be created Thu Apr 18 04:46:58 EDT 2024.
total: 5 mkdir in 0.01 seconds: 401.02 ops/second
total: 5 create in 0.01 seconds: 400.49 ops/second
total: 5 mkdir in 0.01 seconds: 506.53 ops/second
prepared Thu Apr 18 04:46:59 EDT 2024.
fail_val=1
fail_loc=0x1601
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0x80001609
Waiting 32s for 'failed'
fail_val=1
fail_loc=0x1601
Started LFSCK on the device lustre-MDT0000: scrub namespace
Additional debug for 6b
name: lfsck_namespace
magic: 0xa06249ff
version: 2
status: scanning-phase1
flags:
param:
last_completed_time: 1713430015
time_since_last_completed: 14 seconds
latest_start_time: 1713430028
time_since_latest_start: 1 seconds
last_checkpoint_time: 1713430026
time_since_last_checkpoint: 3 seconds
latest_start_position: 20091, [0x2000061c1:0x77:0x0], 0x20e519237b1d918
last_checkpoint_position: 20089, [0x2000061c1:0x77:0x0], 0x172998e7a39b23f
first_failure_position: N/A, N/A, N/A
checked_phase1: 4
checked_phase2: 0
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 2
dirent_repaired: 0
linkea_repaired: 0
nlinks_repaired: 0
multiple_linked_checked: 0
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 0
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
linkea_overflow_cleared: 0
agent_entries_repaired: 0
success_count: 14
run_time_phase1: 7 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 0 items/sec
average_speed_phase2: N/A
average_speed_total: 0 items/sec
real_time_speed_phase1: 0 items/sec
real_time_speed_phase2: N/A
current_position: 20090, [0x2000061c1:0x77:0x0], 0x20e519237b1d918
fail_loc=0
fail_val=0
PASS 6b (13s)
debug_raw_pointers=0
debug_raw_pointers=0
debug_raw_pointers=Y
debug_raw_pointers=Y
== sanity-lfsck test 7a: non-stopped LFSCK should auto restarts after MDS remount (1) ========================================================== 04:47:12 (1713430032)
preparing... 5 * 5 files will be created Thu Apr 18 04:47:13 EDT 2024.
total: 5 mkdir in 0.01 seconds: 588.23 ops/second
total: 5 create in 0.01 seconds: 640.21 ops/second
total: 5 mkdir in 0.01 seconds: 637.99 ops/second
prepared Thu Apr 18 04:47:14 EDT 2024.
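A detail worth noting in the 6b debug dump: `checked_phase1: 4` over `run_time_phase1: 7 seconds` is reported as `average_speed_phase1: 0 items/sec`. That is consistent with the speed fields being computed with integer division, so a phase that checks fewer items than it has run seconds truncates to zero. A quick check (a hypothetical helper mirroring those fields, not Lustre code):

```python
def average_speed(checked, run_time_seconds):
    """Integer items/sec, as the dump appears to truncate slow phases to 0."""
    if run_time_seconds == 0:
        return 0  # the dump prints '0 items/sec' before any time has elapsed
    return checked // run_time_seconds


print(average_speed(4, 7))  # 0, matching average_speed_phase1 in the 6b dump
print(average_speed(20089, 7))
```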
192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) fail_val=1 fail_loc=0x1601 Started LFSCK on the device lustre-MDT0000: scrub namespace stop mds1 fail_loc=0 fail_val=0 start mds1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Waiting 30s for 'completed' Updated after 3s: want 'completed' got 'completed' PASS 7a (17s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 7b: non-stopped LFSCK should auto restarts after MDS remount (2) ========================================================== 04:47:31 (1713430051) Checking servers environments Checking clients oleg413-client.virtnet environments Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs Force libcfs to create 2 CPU partitions loading modules on: 'oleg413-server' oleg413-server: oleg413-server.virtnet: executing load_modules_local oleg413-server: Loading modules from /home/green/git/lustre-release/lustre oleg413-server: detected 4 online CPUs by sysfs oleg413-server: Force libcfs to create 2 CPU partitions Setup mgs, mdt, osts Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg413-server: mount.lustre: according to /etc/mtab /dev/mapper/mds1_flakey is already mounted on /mnt/lustre-mds1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 17 Start of /dev/mapper/mds1_flakey on mds1 failed 17 Starting mds2: -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 oleg413-server: mount.lustre: according to /etc/mtab /dev/mapper/mds2_flakey is already mounted on /mnt/lustre-mds2 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 17 Start of /dev/mapper/mds2_flakey on mds2 
failed 17 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 oleg413-server: mount.lustre: according to /etc/mtab /dev/mapper/ost1_flakey is already mounted on /mnt/lustre-ost1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 17 seq.cli-lustre-OST0000-super.width=65536 Start of /dev/mapper/ost1_flakey on ost1 failed 17 Starting ost2: -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2 oleg413-server: mount.lustre: according to /etc/mtab /dev/mapper/ost2_flakey is already mounted on /mnt/lustre-ost2 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 17 seq.cli-lustre-OST0001-super.width=65536 Start of /dev/mapper/ost2_flakey on ost2 failed 17 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Starting client oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Started clients oleg413-client.virtnet: 192.168.204.113@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project) Using TIMEOUT=20 osc.lustre-OST0000-osc-ffff88012ab10800.idle_timeout=debug osc.lustre-OST0001-osc-ffff88012ab10800.idle_timeout=debug disable quota as required preparing... 2 * 2 files will be created Thu Apr 18 04:47:52 EDT 2024. total: 2 mkdir in 0.00 seconds: 572.41 ops/second total: 2 create in 0.00 seconds: 591.37 ops/second total: 2 mkdir in 0.00 seconds: 634.73 ops/second prepared Thu Apr 18 04:47:53 EDT 2024. 
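The `Start of /dev/mapper/... failed 17` lines above pair with `mount.lustre: according to /etc/mtab ... is already mounted`, which suggests the mount helper is exiting with the errno of the failure: on Linux, errno 17 is `EEXIST` ("File exists"). The stdlib can do the lookup when triaging such logs; mapping other exit codes the same way is an assumption, since not every tool exits with an errno value.

```python
import errno
import os


def describe_exit_code(code):
    """Interpret an exit status as an errno name and message (if it is one)."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{code} = {name} ({os.strerror(code)})"


print(describe_exit_code(17))  # 17 = EEXIST (File exists)
print(describe_exit_code(2))   # 2 = ENOENT (No such file or directory)
```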
fail_loc=0x1604 fail_val=1 fail_loc=0x1602 Started LFSCK on the device lustre-MDT0000: scrub namespace 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) stop mds1 fail_loc=0 fail_val=0 start mds1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Waiting 30s for 'completed' Updated after 4s: want 'completed' got 'completed' PASS 7b (35s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 8: LFSCK state machine ============== 04:48:08 (1713430088) formatall oleg413-server: oleg413-server.virtnet: executing set_hostid oleg413-server: oleg413-server.virtnet: executing load_modules_local setupall libkmod: kmod_module_get_holders: could not open '/sys/module/pcc_cpufreq/holders': No such file or directory libkmod: kmod_module_get_holders: could not open '/sys/module/acpi_cpufreq/holders': No such file or directory libkmod: kmod_module_get_holders: could not open '/sys/module/intel_rapl/holders': No such file or directory oleg413-server: oleg413-server.virtnet: executing load_modules_local oleg413-server: libkmod: kmod_module_get_holders: could not open '/sys/module/pcc_cpufreq/holders': No such file or directory oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha 
config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Using TIMEOUT=20 preparing... 20 * 20 files will be created Thu Apr 18 04:49:01 EDT 2024. total: 20 mkdir in 0.03 seconds: 665.09 ops/second total: 20 create in 0.03 seconds: 756.00 ops/second total: 20 mkdir in 0.02 seconds: 841.01 ops/second prepared Thu Apr 18 04:49:03 EDT 2024. fail_loc=0x1603 fail_loc=0x1604 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) fail_val=2 fail_loc=0x1601 Started LFSCK on the device lustre-MDT0000: scrub namespace Stopped LFSCK on the device lustre-MDT0000. Started LFSCK on the device lustre-MDT0000: scrub namespace fail_loc=0x80001609 Waiting 32s for 'failed' Updated after 2s: want 'failed' got 'failed' fail_loc=0x1600 Started LFSCK on the device lustre-MDT0000: scrub namespace fail_loc=0x160a stop mds1 fail_loc=0x160b start mds1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 fail_loc=0x1601 Started LFSCK on the device lustre-MDT0000: scrub namespace stop mds1 fail_loc=0x160b start mds1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 stop mds1 start mds1 without resume LFSCK oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 fail_val=2 
fail_loc=0x1602 Started LFSCK on the device lustre-MDT0000: scrub namespace fail_loc=0 fail_val=0 PASS 8 (86s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 9a: LFSCK speed control (1) ========= 04:49:35 (1713430175) Checking servers environments Checking clients oleg413-client.virtnet environments Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs Force libcfs to create 2 CPU partitions loading modules on: 'oleg413-server' oleg413-server: oleg413-server.virtnet: executing load_modules_local oleg413-server: Loading modules from /home/green/git/lustre-release/lustre oleg413-server: detected 4 online CPUs by sysfs oleg413-server: Force libcfs to create 2 CPU partitions Setup mgs, mdt, osts Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg413-server: mount.lustre: according to /etc/mtab /dev/mapper/mds1_flakey is already mounted on /mnt/lustre-mds1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 17 Start of /dev/mapper/mds1_flakey on mds1 failed 17 Starting mds2: -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 oleg413-server: mount.lustre: according to /etc/mtab /dev/mapper/mds2_flakey is already mounted on /mnt/lustre-mds2 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 17 Start of /dev/mapper/mds2_flakey on mds2 failed 17 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 oleg413-server: mount.lustre: according to /etc/mtab /dev/mapper/ost1_flakey is already mounted on /mnt/lustre-ost1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 17 seq.cli-lustre-OST0000-super.width=65536 Start of /dev/mapper/ost1_flakey on ost1 failed 17 Starting ost2: -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2 oleg413-server: mount.lustre: according to /etc/mtab /dev/mapper/ost2_flakey is already mounted on /mnt/lustre-ost2 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 17 
seq.cli-lustre-OST0001-super.width=65536 Start of /dev/mapper/ost2_flakey on ost2 failed 17 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Starting client oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Started clients oleg413-client.virtnet: 192.168.204.113@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project) Using TIMEOUT=20 osc.lustre-OST0000-osc-ffff8800b6fc5800.idle_timeout=debug osc.lustre-OST0001-osc-ffff8800b6fc5800.idle_timeout=debug disable quota as required total: 5000 open/close in 9.49 seconds: 526.64 ops/second Started LFSCK on the device lustre-MDT0000: scrub layout PASS 9a (54s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 9b: LFSCK speed control (2) ========= 04:50:31 (1713430231) preparing... 0 * 0 files will be created Thu Apr 18 04:50:38 EDT 2024. prepared Thu Apr 18 04:50:38 EDT 2024. Preparing another 50 * 50 files (with error) at Thu Apr 18 04:50:38 EDT 2024. fail_loc=0x1604 total: 50 mkdir in 0.06 seconds: 816.09 ops/second total: 50 create in 0.05 seconds: 970.82 ops/second fail_loc=0x160c Started LFSCK on the device lustre-MDT0000: scrub namespace fail_loc=0 Prepared at Thu Apr 18 04:50:43 EDT 2024. Started LFSCK on the device lustre-MDT0000: scrub namespace PASS 9b (34s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 10: System is available during LFSCK scanning ========================================================== 04:51:07 (1713430267) preparing... 1 * 1 files will be created Thu Apr 18 04:51:13 EDT 2024. total: 1 mkdir in 0.00 seconds: 500.75 ops/second total: 1 create in 0.00 seconds: 471.27 ops/second total: 1 mkdir in 0.00 seconds: 468.85 ops/second prepared Thu Apr 18 04:51:14 EDT 2024. 
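The paired `fail_loc=0x16xx` / `fail_loc=0` lines throughout this log are the framework arming and then clearing OBD fault-injection points on the server around a file-creation step, so that the created objects are deliberately inconsistent for LFSCK to find. A dry-run sketch of that arm/create/disarm pattern (the helper name and directory path are illustrative; `echo` stands in for running `lctl set_param` and `createmany` on a live system):

```shell
#!/bin/sh
# Dry-run sketch of the fault-injection pattern in the log: arm a fail_loc
# on the server, create files while the stub is active, then disarm it.
run() { echo "$@"; }   # stand-in for real execution on the MDS / client

inject_and_create() {
    loc=$1; dir=$2; count=$3
    run lctl set_param fail_loc="$loc"    # arm the injection point
    run createmany -o "$dir/f" "$count"   # these creations hit the stub
    run lctl set_param fail_loc=0         # disarm
}

# fail_loc value taken from the preparation step above; path illustrative:
inject_and_create 0x1604 /mnt/lustre/dX.sanity-lfsck 1
```

Some stubs also take an argument via `fail_val` (e.g. the `fail_val=2 fail_loc=0x1601` pairs), which is set and cleared the same way.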
Preparing more files with error at Thu Apr 18 04:51:14 EDT 2024. fail_loc=0x1603 fail_loc=0x1604 fail_loc=0 Prepared at Thu Apr 18 04:51:36 EDT 2024. 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Started LFSCK on the device lustre-MDT0000: scrub namespace Waiting 32s for 'completed' Updated after 8s: want 'completed' got 'completed' PASS 10 (62s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 11a: LFSCK can rebuild lost last_id ========================================================== 04:52:11 (1713430331) total: 64 open/close in 0.21 seconds: 311.58 ops/second stopall remove LAST_ID on ost1: idx=0 removed '/mnt/lustre-ost1/O/0/LAST_ID' oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 fail_val=3 fail_loc=0x160e trigger LFSCK for layout on ost1 to rebuild the LAST_ID(s) Started LFSCK on the device lustre-OST0000: scrub layout fail_val=0 fail_loc=0 Waiting 32s for 'completed' Updated after 2s: want 'completed' got 'completed' the LAST_ID(s) should have been rebuilt PASS 11a (129s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 11b: LFSCK can rebuild crashed last_id ========================================================== 04:54:21 (1713430461) Checking servers environments Checking clients oleg413-client.virtnet environments Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs Force libcfs to create 2 CPU partitions loading modules on: 'oleg413-server' oleg413-server: oleg413-server.virtnet: executing 
load_modules_local oleg413-server: Loading modules from /home/green/git/lustre-release/lustre oleg413-server: detected 4 online CPUs by sysfs oleg413-server: Force libcfs to create 2 CPU partitions Setup mgs, mdt, osts Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Started lustre-MDT0000 Starting mds2: -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Started lustre-MDT0001 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 oleg413-server: mount.lustre: according to /etc/mtab /dev/mapper/ost1_flakey is already mounted on /mnt/lustre-ost1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 17 seq.cli-lustre-OST0000-super.width=65536 Start of /dev/mapper/ost1_flakey on ost1 failed 17 Starting ost2: -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2 seq.cli-lustre-OST0001-super.width=65536 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Started lustre-OST0001 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Starting client oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Started clients oleg413-client.virtnet: 192.168.204.113@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project) Using TIMEOUT=20 osc.lustre-OST0000-osc-ffff8800abf83800.idle_timeout=debug 
osc.lustre-OST0001-osc-ffff8800abf83800.idle_timeout=debug disable quota as required set fail_loc=0x160d to skip the updating LAST_ID on-disk fail_loc=0x160d total: 64 open/close in 0.16 seconds: 400.83 ops/second 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) Stopping /mnt/lustre-ost1 (opts:) on oleg413-server fail_loc=0x215 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Started lustre-OST0000 the on-disk LAST_ID should be smaller than the expected one trigger LFSCK for layout on ost1 to rebuild the on-disk LAST_ID Started LFSCK on the device lustre-OST0000: scrub layout Stopping /mnt/lustre-ost1 (opts:) on oleg413-server Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Started lustre-OST0000 the on-disk LAST_ID should have been rebuilt fail_loc=0 Stopping clients: oleg413-client.virtnet /mnt/lustre (opts:) Stopping clients: oleg413-client.virtnet /mnt/lustre2 (opts:) Stopping /mnt/lustre-mds1 (opts:-f) on oleg413-server Stopping /mnt/lustre-mds2 (opts:-f) on oleg413-server Stopping /mnt/lustre-ost1 (opts:-f) on oleg413-server Stopping /mnt/lustre-ost2 (opts:-f) on oleg413-server PASS 11b (66s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 12a: single command to trigger LFSCK on all devices 
========================================================== 04:55:28 (1713430528) Checking servers environments Checking clients oleg413-client.virtnet environments Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs Force libcfs to create 2 CPU partitions libkmod: kmod_module_get_holders: could not open '/sys/module/pcc_cpufreq/holders': No such file or directory loading modules on: 'oleg413-server' oleg413-server: oleg413-server.virtnet: executing load_modules_local oleg413-server: Loading modules from /home/green/git/lustre-release/lustre oleg413-server: detected 4 online CPUs by sysfs oleg413-server: Force libcfs to create 2 CPU partitions Setup mgs, mdt, osts Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Started lustre-MDT0000 Starting mds2: -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Started lustre-MDT0001 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Started lustre-OST0000 Starting ost2: -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2 seq.cli-lustre-OST0001-super.width=65536 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Started lustre-OST0001 Starting client: 
oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Starting client oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Started clients oleg413-client.virtnet: 192.168.204.113@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project) Using TIMEOUT=20 osc.lustre-OST0000-osc-ffff8800a8ff8000.idle_timeout=debug osc.lustre-OST0001-osc-ffff8800a8ff8000.idle_timeout=debug disable quota as required total: 100 open/close in 0.31 seconds: 317.64 ops/second total: 100 open/close in 0.27 seconds: 376.62 ops/second Start namespace LFSCK on all targets by single command (-s 1). Started LFSCK on the device lustre-MDT0000: scrub namespace All the LFSCK targets should be in 'scanning-phase1' status. Stop namespace LFSCK on all targets by single lctl command. Stopped LFSCK on the device lustre-MDT0000. All the LFSCK targets should be in 'stopped' status. Re-start namespace LFSCK on all targets by single command (-s 0). Started LFSCK on the device lustre-MDT0000: scrub namespace All the LFSCK targets should be in 'completed' status. debug=-1 debug_mb=150 debug=-1 debug_mb=150 Start layout LFSCK on all targets by single command (-s 1). Started LFSCK on the device lustre-MDT0000: scrub layout All the LFSCK targets should be in 'scanning-phase1' status. Stop layout LFSCK on all targets by single lctl command. Stopped LFSCK on the device lustre-MDT0000. All the LFSCK targets should be in 'stopped' status. Re-start layout LFSCK on all targets by single command (-s 0). Started LFSCK on the device lustre-MDT0000: scrub layout All the LFSCK targets should be in 'completed' status. 
debug_mb=21 debug_mb=21 debug=super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck debug=super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck PASS 12a (34s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 12b: auto detect Lustre device ====== 04:56:03 (1713430563) Start LFSCK without '-M' specified. Started LFSCK on the device lustre-MDT0000: scrub layout namespace Start layout LFSCK on the node with multiple targets, but do not specify the '-M'/'-A' option. Should get failure. oleg413-server: Detect multiple devices on current node. Please specify the device explicitly via '-M' option or '-A' option for all. pdsh@oleg413-client: oleg413-server: ssh exited with exit code 22 PASS 12b (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 13: LFSCK can repair crashed lmm_oi ========================================================== 04:56:08 (1713430568) ##### The lmm_oi in layout EA should be consistent with the MDT-object FID; otherwise, the LFSCK should re-generate the lmm_oi from the MDT-object FID. ##### Inject failure stub to simulate bad lmm_oi fail_loc=0x160f total: 1 open/close in 0.01 seconds: 171.97 ops/second fail_loc=0 Trigger layout LFSCK to find out the bad lmm_oi and fix them Started LFSCK on the device lustre-MDT0000: scrub layout PASS 13 (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 14a: LFSCK can repair MDT-object with dangling LOV EA reference (1) ========================================================== 04:56:12 (1713430572) ##### The OST-object referenced by the MDT-object should be there; otherwise, the LFSCK should re-create the missing OST-object. without '--delay-create-ostobj' option. 
##### Inject failure stub to simulate dangling referenced MDT-object fail_loc=0x1610 total: 61 open/close in 0.14 seconds: 438.86 ops/second touch: setting times of '/mnt/lustre/d14a.sanity-lfsck/guard0': No such file or directory touch: setting times of '/mnt/lustre/d14a.sanity-lfsck/guard1': No such file or directory fail_loc=0 debug=-1 debug_mb=150 debug=-1 debug_mb=150 total: 30 open/close in 0.18 seconds: 169.75 ops/second 'ls' should fail because of dangling referenced MDT-object Trigger layout LFSCK to find out dangling reference Started LFSCK on the device lustre-MDT0000: scrub layout 'stat' should fail because dangling references are not repaired by default Trigger layout LFSCK to repair dangling reference Started LFSCK on the device lustre-MDT0000: scrub layout 'stat' should succeed after layout LFSCK repairing debug_mb=21 debug_mb=21 stopall to cleanup object cache setupall oleg413-server: oleg413-server.virtnet: executing load_modules_local oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Using TIMEOUT=20 PASS 14a (64s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 14b: LFSCK can repair MDT-object with dangling LOV EA reference (2) 
========================================================== 04:57:18 (1713430638) ##### The OST-object referenced by the MDT-object should be there; otherwise, the LFSCK should re-create the missing OST-object. with '--delay-create-ostobj' option. ##### Inject failure stub to simulate dangling referenced MDT-object fail_loc=0x1610 total: 63 open/close in 0.14 seconds: 439.08 ops/second touch: setting times of '/mnt/lustre/d14b.sanity-lfsck/guard': No such file or directory fail_loc=0 debug=-1 debug_mb=150 debug=-1 debug_mb=150 total: 32 open/close in 0.23 seconds: 139.79 ops/second 'ls' should fail because of dangling referenced MDT-object Trigger layout LFSCK to find out dangling reference Started LFSCK on the device lustre-MDT0000: scrub layout 'stat' should fail because dangling references are not repaired by default Trigger layout LFSCK to repair dangling reference Started LFSCK on the device lustre-MDT0000: scrub layout 'stat' should succeed after layout LFSCK repairing debug_mb=21 debug_mb=21 stopall to cleanup object cache setupall oleg413-server: oleg413-server.virtnet: executing load_modules_local oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Using TIMEOUT=20 PASS 14b (96s) debug_raw_pointers=0 debug_raw_pointers=0 
debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 15a: LFSCK can repair unmatched MDT-object/OST-object pairs (1) ========================================================== 04:58:56 (1713430736) ##### If the OST-object referenced by the MDT-object back points to some non-existent MDT-object, then the LFSCK should repair the OST-object to back point to the right MDT-object. ##### Inject failure stub to make the OST-object back point to a non-existent MDT-object. fail_loc=0x1611 1+0 records in 1+0 records out 1048576 bytes (1.0 MB) copied, 0.010968 s, 95.6 MB/s 257+0 records in 257+0 records out 1052672 bytes (1.1 MB) copied, 0.0661164 s, 15.9 MB/s fail_loc=0 Trigger layout LFSCK to find out unmatched pairs and fix them Started LFSCK on the device lustre-MDT0000: scrub layout PASS 15a (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 15b: LFSCK can repair unmatched MDT-object/OST-object pairs (2) ========================================================== 04:59:01 (1713430741) ##### If the OST-object referenced by the MDT-object back points to another MDT-object that doesn't recognize the OST-object, then the LFSCK should repair it to back point to the right MDT-object (the first one). 
##### 1+0 records in 1+0 records out 1048576 bytes (1.0 MB) copied, 0.0100708 s, 104 MB/s Inject failure stub to make the OST-object to back point to other MDT-object fail_loc=0x1612 1+0 records in 1+0 records out 1048576 bytes (1.0 MB) copied, 0.0116753 s, 89.8 MB/s 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0297736 s, 70.4 MB/s fail_loc=0 Trigger layout LFSCK to find out unmatched pairs and fix them Started LFSCK on the device lustre-MDT0000: scrub layout PASS 15b (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 15c: LFSCK can repair unmatched MDT-object/OST-object pairs (3) ========================================================== 04:59:06 (1713430746) SKIP: sanity-lfsck test_15c MDS newer than 2.7.55, LU-6475 SKIP 15c (0s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 15d: LFSCK don't crash upon dir migration failure ========================================================== 04:59:08 (1713430748) total: 100 open/close in 0.26 seconds: 391.10 ops/second total: 100 mkdir in 0.13 seconds: 770.34 ops/second Migrate /mnt/lustre/d15d.sanity-lfsck to MDT1 fail_loc=0x1709 lfs migrate: /mnt/lustre/d15d.sanity-lfsck/f58 migrate failed: Input/output error (5) fail_loc=0 fail_loc=0x1709 fail_loc=0 Started LFSCK on the device lustre-MDT0000: scrub namespace lfs rm_entry: error on ioctl 0xc03066f0 for '*' (3): No such file or directory (2) debug=0 Stopping clients: oleg413-client.virtnet /mnt/lustre (opts:) Stopping client oleg413-client.virtnet /mnt/lustre opts: Stopping clients: oleg413-client.virtnet /mnt/lustre2 (opts:) Stopping /mnt/lustre-mds1 (opts:-f) on oleg413-server Stopping /mnt/lustre-mds2 (opts:-f) on oleg413-server Stopping /mnt/lustre-ost1 (opts:-f) on oleg413-server Stopping /mnt/lustre-ost2 (opts:-f) on oleg413-server unloading modules on: 'oleg413-server' oleg413-server: oleg413-server.virtnet: executing 
unload_modules_local modules unloaded. === sanity-lfsck: start setup 04:59:40 (1713430780) === Stopping clients: oleg413-client.virtnet /mnt/lustre (opts:-f) Stopping clients: oleg413-client.virtnet /mnt/lustre2 (opts:-f) pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 2 oleg413-server: oleg413-server.virtnet: executing set_hostid Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs Force libcfs to create 2 CPU partitions ../libcfs/libcfs/libcfs options: 'cpu_npartitions=2' ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1' quota/lquota options: 'hash_lqs_cur_bits=3' loading modules on: 'oleg413-server' oleg413-server: oleg413-server.virtnet: executing load_modules_local oleg413-server: Loading modules from /home/green/git/lustre-release/lustre oleg413-server: detected 4 online CPUs by sysfs oleg413-server: Force libcfs to create 2 CPU partitions oleg413-server: ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1' oleg413-server: quota/lquota options: 'hash_lqs_cur_bits=3' Formatting mgs, mds, osts Format mds1: /dev/vdc pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Format mds2: /dev/vdd pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Format ost1: /dev/vde pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Format ost2: /dev/vdf pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Checking servers environments Checking clients oleg413-client.virtnet environments Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs Force libcfs to create 2 CPU partitions loading modules on: 'oleg413-server' oleg413-server: oleg413-server.virtnet: executing load_modules_local 
oleg413-server: Loading modules from /home/green/git/lustre-release/lustre oleg413-server: detected 4 online CPUs by sysfs oleg413-server: Force libcfs to create 2 CPU partitions Setup mgs, mdt, osts pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Commit the device label on /dev/vdc Started lustre-MDT0000 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting mds2: -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Commit the device label on /dev/vdd Started lustre-MDT0001 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Commit the device label on /dev/vde Started lustre-OST0000 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting ost2: -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2 seq.cli-lustre-OST0001-super.width=65536 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Commit the device label on /dev/vdf Started lustre-OST0001 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Starting client 
oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Started clients oleg413-client.virtnet: 192.168.204.113@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project) Using TIMEOUT=20 osc.lustre-OST0000-osc-ffff8800b6f8d800.idle_timeout=debug osc.lustre-OST0001-osc-ffff8800b6f8d800.idle_timeout=debug setting jobstats to procname_uid Setting lustre.sys.jobid_var from disable to procname_uid Waiting 90s for 'procname_uid' Updated after 3s: want 'procname_uid' got 'procname_uid' disable quota as required osd-ldiskfs.track_declares_assert=1 === sanity-lfsck: finish setup 05:00:37 (1713430837) === PASS 15d (91s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 16: LFSCK can repair inconsistent MDT-object/OST-object owner ========================================================== 05:00:40 (1713430840) ##### If the OST-object's owner information does not match the owner information stored in the MDT-object, then the LFSCK trusts the MDT-object and updates the OST-object's owner information. 
##### 1+0 records in 1+0 records out 1048576 bytes (1.0 MB) copied, 0.00890538 s, 118 MB/s running as uid/gid/euid/egid 500/500/500/500, groups: [createmany] [-o] [/mnt/lustre/d16.sanity-lfsck/d1/o] [100] total: 100 open/close in 0.23 seconds: 436.50 ops/second Inject failure stub to skip OST-object owner changing fail_loc=0x1613 fail_loc=0 Trigger layout LFSCK to find out inconsistent OST-object owner and fix them Started LFSCK on the device lustre-MDT0000: scrub layout PASS 16 (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 17: LFSCK can repair multiple references ========================================================== 05:00:45 (1713430845) ##### If more than one MDT-object references the same OST-object, and the OST-object only recognizes one MDT-object, then the LFSCK should create new OST-objects for such non-recognized MDT-objects. ##### Inject failure stub to make two MDT-objects reference the OST-object fail_val=0 fail_loc=0x1614 1+0 records in 1+0 records out 1048576 bytes (1.0 MB) copied, 0.00925307 s, 113 MB/s total: 1 open/close in 0.01 seconds: 183.05 ops/second fail_loc=0 fail_val=0 /mnt/lustre/d17.sanity-lfsck/f0 and /mnt/lustre/d17.sanity-lfsck/guard use the same OST-objects /mnt/lustre/d17.sanity-lfsck/f1 and /mnt/lustre/d17.sanity-lfsck/guard use the same OST-objects Trigger layout LFSCK to find out multiple referenced MDT-objects and fix them Started LFSCK on the device lustre-MDT0000: scrub layout /mnt/lustre/d17.sanity-lfsck/f0 and /mnt/lustre/d17.sanity-lfsck/guard should use diff OST-objects 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0146873 s, 143 MB/s /mnt/lustre/d17.sanity-lfsck/f1 and /mnt/lustre/d17.sanity-lfsck/guard should use diff OST-objects 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0154023 s, 136 MB/s PASS 17 (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 18a: Find 
out orphan OST-object and repair it (1) ========================================================== 05:00:49 (1713430849) ##### The target MDT-object is there, but related stripe information is lost or partly lost. The LFSCK should regenerate the missing layout EA entries. ##### 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0278253 s, 75.4 MB/s [0x200000402:0x70:0x0] /mnt/lustre/d18a.sanity-lfsck/a1/f1 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 108 0x6c 0x280000401 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.022722 s, 92.3 MB/s [0x240000402:0x2:0x0] /mnt/lustre/d18a.sanity-lfsck/a2/f2 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 1 obdidx objid objid group 1 2 0x2 0x2c0000400 0 2 0x2 0x280000400 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.039666 s, 52.9 MB/s [0x200000402:0x71:0x0] /mnt/lustre/d18a.sanity-lfsck/f3 lcm_layout_gen: 3 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0x6d:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x2c0000401:0x2:0x0] } Inject failure to make the MDT-object lose its layout EA fail_loc=0x1615 fail_loc=0x1615 fail_loc=0 fail_loc=0 The file size should be incorrect since layout EA is lost Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout [0x200000402:0x70:0x0] /mnt/lustre/d18a.sanity-lfsck/a1/f1 lmm_stripe_count: 1 
lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 0 obdidx objid objid group 0 108 0x6c 0x280000401 [0x240000402:0x2:0x0] /mnt/lustre/d18a.sanity-lfsck/a2/f2 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 2 lmm_stripe_offset: 1 obdidx objid objid group 1 2 0x2 0x2c0000400 0 2 0x2 0x280000400 [0x200000402:0x71:0x0] /mnt/lustre/d18a.sanity-lfsck/f3 lcm_layout_gen: 1 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0x6d:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x2c0000401:0x2:0x0] } The file size should be correct after layout LFSCK scanning PASS 18a (7s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 18b: Find out orphan OST-object and repair it (2) ========================================================== 05:00:58 (1713430858) ##### The target MDT-object is lost. The LFSCK should re-create the MDT-object under .lustre/lost+found/MDTxxxx. The admin can then move it back to the normal namespace manually.
##### 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0254437 s, 82.4 MB/s [0x200000402:0x75:0x0] /mnt/lustre/d18b.sanity-lfsck/a1/f1 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 110 0x6e 0x280000401 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0231892 s, 90.4 MB/s [0x240000402:0x4:0x0] /mnt/lustre/d18b.sanity-lfsck/a2/f2 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 1 obdidx objid objid group 1 3 0x3 0x2c0000400 0 3 0x3 0x280000400 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0364153 s, 57.6 MB/s [0x200000402:0x76:0x0] /mnt/lustre/d18b.sanity-lfsck/f3 lcm_layout_gen: 3 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0x6f:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x2c0000401:0x3:0x0] } Inject failure, to simulate the case of missing the MDT-object fail_loc=0x1616 fail_loc=0x1616 fail_loc=0 fail_loc=0 Trigger layout LFSCK --dryrun to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout Move the files from .lustre/lost+found/MDTxxxx to the namespace [0x200000402:0x75:0x0] /mnt/lustre/d18b.sanity-lfsck/a1/f1 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 0 obdidx objid objid group 0 110 0x6e 0x280000401 [0x240000402:0x4:0x0]
/mnt/lustre/d18b.sanity-lfsck/a2/f2 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 2 lmm_stripe_offset: 1 obdidx objid objid group 1 3 0x3 0x2c0000400 0 3 0x3 0x280000400 [0x200000402:0x76:0x0] /mnt/lustre/d18b.sanity-lfsck/f3 lcm_layout_gen: 1 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0x6f:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x2c0000401:0x3:0x0] } The file size should be correct after layout LFSCK scanning PASS 18b (8s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 18c: Find out orphan OST-object and repair it (3) ========================================================== 05:01:08 (1713430868) ##### The target MDT-object is lost, and the OST-object FID is missing. The LFSCK should re-create the MDT-object with new FID under the directory .lustre/lost+found/MDTxxxx. 
##### Inject failure, to simulate the case of missing parent FID fail_loc=0x1617 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0243731 s, 86.0 MB/s /mnt/lustre/d18c.sanity-lfsck/a1/f1 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 112 0x70 0x280000401 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0259816 s, 80.7 MB/s /mnt/lustre/d18c.sanity-lfsck/a2/f2 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 4 0x4 0x280000400 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0383395 s, 54.7 MB/s /mnt/lustre/d18c.sanity-lfsck/f3 lcm_layout_gen: 3 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0x71:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x2c0000401:0x4:0x0] } fail_loc=0 Inject failure, to simulate the case of missing the MDT-object fail_loc=0x1616 fail_loc=0x1616 fail_loc=0 fail_loc=0 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout total 12 144115188109410307 dr-x------ 4 root root 4096 Dec 31 1969 . 
144115205306056705 drwx------ 2 root root 4096 Apr 18 05:01 MDT0000 162129603815538689 drwx------ 2 root root 4096 Apr 18 05:01 MDT0001 There should NOT be any stub under .lustre/lost+found/MDT0001/ There should be some stub under .lustre/lost+found/MDT0000/ PASS 18c (7s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 18d: Find out orphan OST-object and repair it (4) ========================================================== 05:01:16 (1713430876) ##### The target MDT-object layout EA is corrupted, but the right OST-object is still alive as an orphan. The layout LFSCK will not create a new OST-object to occupy such a slot. ##### [0x200000402:0x7f:0x0] /mnt/lustre/d18d.sanity-lfsck/a1/f1 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 114 0x72 0x280000401 [0x200000402:0x80:0x0] /mnt/lustre/d18d.sanity-lfsck/a1/f2 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 115 0x73 0x280000401 [0x200000402:0x81:0x0] /mnt/lustre/d18d.sanity-lfsck/a1/f3 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 116 0x74 0x280000401 [0x200000402:0x82:0x0] /mnt/lustre/d18d.sanity-lfsck/a1/f4 lcm_layout_gen: 2 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0x75:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: 0 lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: -1 Inject failure to make /mnt/lustre/d18d.sanity-lfsck/a1/f1 and /mnt/lustre/d18d.sanity-lfsck/a1/f2
reference the same OST-object (which is f1's OST-object). Then drop /mnt/lustre/d18d.sanity-lfsck/a1/f1 and its OST-object, so f2 becomes a dangling-reference case, but f2's old OST-object is there. The failure also makes /mnt/lustre/d18d.sanity-lfsck/a1/f3 and /mnt/lustre/d18d.sanity-lfsck/a1/f4 reference the same OST-object (which is f3's OST-object). Then drop /mnt/lustre/d18d.sanity-lfsck/a1/f3 and its OST-object, so f4 becomes a dangling-reference case, but f4's old OST-object is there. fail_loc=0x1618 fail_loc=0 stopall to cleanup object cache setupall oleg413-server: oleg413-server.virtnet: executing load_modules_local oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Using TIMEOUT=20 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout The file size should be correct after layout LFSCK scanning The LFSCK should find back the original data.
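As an aside for readers decoding the dumps in this log: every object is identified by a FID printed as [0xSEQ:0xOID:0xVER]. Below is a minimal Python sketch of a decoder for that printed form; `parse_fid` is a hypothetical helper for log analysis, not part of sanity-lfsck.sh.

```python
import re

# A Lustre FID as printed in these logs: [0xSEQ:0xOID:0xVER]
FID_RE = re.compile(r"\[0x([0-9a-fA-F]+):0x([0-9a-fA-F]+):0x([0-9a-fA-F]+)\]")

def parse_fid(text):
    """Return (sequence, object_id, version) parsed from a printed FID."""
    m = FID_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a FID: {text!r}")
    return tuple(int(g, 16) for g in m.groups())

# e.g. the f1 FID from test 18a:
print(parse_fid("[0x200000402:0x70:0x0]"))  # (8589935618, 112, 0)
```

Such a helper makes it easy to grep a run log and compare which sequence each object landed in.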
foo [0x200000402:0x80:0x0] /mnt/lustre/d18d.sanity-lfsck/a1/f2 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 0 obdidx objid objid group 0 115 0x73 0x280000401 foo [0x200000402:0x82:0x0] /mnt/lustre/d18d.sanity-lfsck/a1/f4 lcm_layout_gen: 2 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0x75:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: 0 lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: -1 PASS 18d (65s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 18e: Find out orphan OST-object and repair it (5) ========================================================== 05:02:23 (1713430943) ##### The target MDT-object layout EA slot is occupied by some newly created OST-object when repairing the dangling reference case. Such a conflicting OST-object has been modified by others. To keep the new data, the LFSCK will create a new file to reference this old orphan OST-object.
##### [0x200000bd1:0x3:0x0] /mnt/lustre/d18e.sanity-lfsck/a1/f1 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 162 0xa2 0x280000401 [0x200000bd1:0x4:0x0] /mnt/lustre/d18e.sanity-lfsck/a1/f2 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 163 0xa3 0x280000401 [0x200000bd1:0x5:0x0] /mnt/lustre/d18e.sanity-lfsck/a1/f3 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 164 0xa4 0x280000401 [0x200000bd1:0x6:0x0] /mnt/lustre/d18e.sanity-lfsck/a1/f4 lcm_layout_gen: 2 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0xa5:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: 0 lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: -1 Inject failure to make /mnt/lustre/d18e.sanity-lfsck/a1/f1 and /mnt/lustre/d18e.sanity-lfsck/a1/f2 reference the same OST-object (which is f1's OST-object). Then drop /mnt/lustre/d18e.sanity-lfsck/a1/f1 and its OST-object, so f2 becomes a dangling-reference case, but f2's old OST-object is there. The failure also makes /mnt/lustre/d18e.sanity-lfsck/a1/f3 and /mnt/lustre/d18e.sanity-lfsck/a1/f4 reference the same OST-object (which is f3's OST-object). Then drop /mnt/lustre/d18e.sanity-lfsck/a1/f3 and its OST-object, so f4 becomes a dangling-reference case, but f4's old OST-object is there.
fail_loc=0x1618 fail_loc=0 stopall to cleanup object cache setupall oleg413-server: oleg413-server.virtnet: executing load_modules_local oleg413-server: libkmod: kmod_module_get_holders: could not open '/sys/module/pcc_cpufreq/holders': No such file or directory oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Using TIMEOUT=20 fail_val=10 fail_loc=0x1602 debug=-1 debug_mb=150 debug=-1 debug_mb=150 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout Write new data to f2/f4 to modify the newly created OST-object.
fail_val=0 fail_loc=0 debug_mb=21 debug_mb=21 debug=super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck debug=super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck There should be a stub file under .lustre/lost+found/MDT0000/ The stub file should keep the original f2 or f4 data foo [0x200000403:0x6:0x0] /mnt/lustre/.lustre/lost+found/MDT0000/[0x200000403:0x6:0x0]-[0x200000bd1:0x4:0x0]-0-C-0 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 0 obdidx objid objid group 0 163 0xa3 0x280000401 foo [0x200000403:0x7:0x0] /mnt/lustre/.lustre/lost+found/MDT0000/[0x200000403:0x7:0x0]-[0x200000bd1:0x6:0x0]-0-C-0 lcm_layout_gen: 4 lcm_mirror_count: 1 lcm_entry_count: 1 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 1 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0xa5:0x0] } The f2/f4 should contain new data.
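The stub names under .lustre/lost+found/MDT0000/ concatenate two FIDs and a few extra fields with dashes, e.g. [0x200000403:0x6:0x0]-[0x200000bd1:0x4:0x0]-0-C-0. A small Python sketch that splits such a name is shown below; `split_stub_name` is a hypothetical helper, and the semantics of the trailing fields (flag letters such as C or R plus indices) are an assumption, not something this log states.

```python
import re

FID_RE = re.compile(r"\[0x[0-9a-fA-F]+:0x[0-9a-fA-F]+:0x[0-9a-fA-F]+\]")

def split_stub_name(name):
    """Split a lost+found stub name into its FIDs and the remaining
    dash-separated fields (flag semantics are not decoded here)."""
    fids = FID_RE.findall(name)
    rest = [p for p in FID_RE.sub("", name).split("-") if p]
    return fids, rest

fids, rest = split_stub_name(
    "[0x200000403:0x6:0x0]-[0x200000bd1:0x4:0x0]-0-C-0")
# fids holds the stub's own FID and the original object's FID;
# rest holds the remaining dash-separated fields.
```

This is only a convenience for post-processing a run log; the authoritative naming rules live in the LFSCK implementation.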
dummy [0x200000bd1:0x4:0x0] /mnt/lustre/d18e.sanity-lfsck/a1/f2 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 162 0xa2 0x280000401 dummy [0x200000bd1:0x6:0x0] /mnt/lustre/d18e.sanity-lfsck/a1/f4 lcm_layout_gen: 3 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 1048576 lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0xa4:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 1048576 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x2c0000401:0x42:0x0] } PASS 18e (68s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 18f: Skip the failed OST(s) when handling orphan OST-objects ========================================================== 05:03:32 (1713431012) ##### The target MDT-object is lost. The LFSCK should re-create the MDT-object under .lustre/lost+found/MDTxxxx. If some OST fails to verify its OST-object(s) during the first-stage scanning, the LFSCK should skip orphan OST-objects for that OST. Others should not be affected.
##### 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0159235 s, 132 MB/s 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0146823 s, 143 MB/s 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0153819 s, 136 MB/s /mnt/lustre/d18f.sanity-lfsck/a1/f1 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 195 0xc3 0x280000401 /mnt/lustre/d18f.sanity-lfsck/a2/f2 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 196 0xc4 0x280000401 1 67 0x43 0x2c0000401 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.014336 s, 146 MB/s 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0140267 s, 150 MB/s 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0169983 s, 123 MB/s /mnt/lustre/d18f.sanity-lfsck/a3/f3 lmm_stripe_count: 1 lmm_stripe_size: 4194304 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 67 0x43 0x280000400 /mnt/lustre/d18f.sanity-lfsck/a4/f4 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 68 0x44 0x280000400 1 66 0x42 0x2c0000400 Inject failure, to simulate the case of missing the MDT-object fail_loc=0x1616 fail_loc=0x1616 fail_loc=0 fail_loc=0 Inject failure, to simulate OST0 failing to handle the MDT0 LFSCK request during the first-stage scanning.
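The fail_loc/fail_val lines that follow are OBD fault-injection parameters which the test framework arms and disarms around the operation under test. A hedged sketch of that pattern is below; it assumes a Lustre server node with lctl available, and 0x161c is simply the injection code this particular test uses (the codes themselves are defined in the Lustre sources).

```shell
# On the server: arm the fault-injection point before the LFSCK run
lctl set_param fail_loc=0x161c fail_val=0

# ... trigger the LFSCK and let it hit the injected failure ...

# Then disarm the injection so later operations run normally
lctl set_param fail_loc=0 fail_val=0
```

This is a configuration fragment, not a runnable standalone script; outside a Lustre server these parameters do not exist.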
fail_loc=0x161c fail_val=0 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout fail_loc=0 fail_val=0 Trigger layout LFSCK on all devices again to cleanup Started LFSCK on the device lustre-MDT0000: scrub layout PASS 18f (9s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 18g: Find out orphan OST-object and repair it (7) ========================================================== 05:03:43 (1713431023) ##### The target MDT-object is lost, but the related OI mapping is there. The LFSCK should recreate the lost MDT-object without being affected by the stale OI mapping. ##### 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.0189413 s, 111 MB/s [0x2000013a1:0xb:0x0] /mnt/lustre/d18g.sanity-lfsck/a1/f1 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 197 0xc5 0x280000401 1 68 0x44 0x2c0000401 Inject failure to simulate lost MDT-object but keep OI mapping fail_loc=0x162e fail_loc=0 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout Move the files from .lustre/lost+found/MDTxxxx to the namespace [0x2000013a1:0xb:0x0] /mnt/lustre/d18g.sanity-lfsck/a1/f1 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 2 lmm_stripe_offset: 0 obdidx objid objid group 0 197 0xc5 0x280000401 1 68 0x44 0x2c0000401 PASS 18g (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 18h: LFSCK can repair crashed PFL extent range ========================================================== 05:03:48 (1713431028) ##### The PFL extent crashed. During the first-cycle LFSCK scanning, the layout LFSCK will keep the bad PFL file(s) there without scanning its OST-object(s).
Then in the second-stage scanning, the OST will return the related OST-object(s) to the MDT as orphans. And then the LFSCK on the MDT can rebuild the PFL extent with the 'orphan(s)' stripe information. ##### 0+1 records in 0+1 records out 291280 bytes (291 kB) copied, 0.00316646 s, 92.0 MB/s Inject failure stub to simulate bad PFL extent range fail_loc=0x162f fail_loc=0 dd: error writing '/mnt/lustre/d18h.sanity-lfsck/f0': No data available 1+0 records in 0+0 records out 0 bytes (0 B) copied, 0.00277863 s, 0.0 kB/s Trigger layout LFSCK to find out the bad lmm_oi and fix them Started LFSCK on the device lustre-MDT0000: scrub layout Data in /mnt/lustre/d18h.sanity-lfsck/f0 should not be broken Write should succeed after LFSCK repairing the bad PFL range 1+0 records in 1+0 records out 1048576 bytes (1.0 MB) copied, 0.0071252 s, 147 MB/s PASS 18h (4s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 19a: OST-object inconsistency self detect ========================================================== 05:03:54 (1713431034) Inject failure, then the client will offer a wrong parent FID when reading fail_loc=0x1619 Read RPC with wrong parent FID should be denied cat: /mnt/lustre/d19a.sanity-lfsck/a0: Operation not permitted cat: /mnt/lustre/d19a.sanity-lfsck/a1: Operation not permitted fail_loc=0 PASS 19a (4s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 19b: OST-object inconsistency self repair ========================================================== 05:04:00 (1713431040) Inject failure stub to make the OST-object back-point to a non-existent MDT-object fail_loc=0x1611 fail_loc=0 Nothing should be fixed since self detect and repair is disabled Read RPC with right parent FID should be accepted, and cause parent FID on OST to be fixed foo1 foo2 PASS 19b (6s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 20a: Handle
the orphan with dummy LOV EA slot properly ========================================================== 05:04:07 (1713431047) ##### The target MDT-object and some of its OST-objects are lost. The LFSCK should find the remaining OST-objects and re-create the MDT-object under the directory .lustre/lost+found/MDTxxxx/ with the partial OST-objects (LOV EA hole). A new client can access the file with the LOV EA hole via normal system tools or commands without crashing the system. For an old client, even though it cannot access the file with the LOV EA hole, it should not cause a system crash. ##### 257+0 records in 257+0 records out 1052672 bytes (1.1 MB) copied, 0.0447939 s, 23.5 MB/s 257+0 records in 257+0 records out 1052672 bytes (1.1 MB) copied, 0.0458886 s, 22.9 MB/s 257+0 records in 257+0 records out 1052672 bytes (1.1 MB) copied, 0.0423799 s, 24.8 MB/s [0x2000013a1:0x1c:0x0] /mnt/lustre/d20a.sanity-lfsck/a1/f0 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 205 0xcd 0x280000401 1 70 0x46 0x2c0000401 [0x2000013a1:0x1d:0x0] /mnt/lustre/d20a.sanity-lfsck/a1/f1 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 206 0xce 0x280000401 1 71 0x47 0x2c0000401 [0x2000013a1:0x1e:0x0] /mnt/lustre/d20a.sanity-lfsck/a1/f2 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 obdidx objid objid group 0 207 0xcf 0x280000401 1 72 0x48 0x2c0000401 Inject failure...
To simulate f0 lost MDT-object fail_loc=0x1616 To simulate f1 lost MDT-object and OST-object0 fail_loc=0x161a To simulate f2 lost MDT-object and OST-object1 fail_val=1 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) fail_loc=0 fail_val=0 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Check /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x1c:0x0]-R-0, which is the old f0 /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x1c:0x0]-R-0 lmm_magic: 0x0BD10BD0 lmm_seq: 0x2000013a1 lmm_object_id: 0x1c lmm_fid: [0x2000013a1:0x1c:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 2 lmm_stripe_offset: 0 obdidx objid objid group 0 205 0xcd 0x280000401 1 70 0x46 0x2c0000401 Check /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x1d:0x0]-R-0, it contains the old f1's stripe1 /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x1d:0x0]-R-0 lmm_magic: 0x0BD10BD0 lmm_seq: 0x2000013a1 lmm_object_id: 0x1d lmm_fid: [0x2000013a1:0x1d:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: 40000001 lmm_layout_gen: 1 lmm_stripe_offset: 0 obdidx objid objid group 0 0 0 0 1 71 0x47 0x2c0000401 cat: /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x1d:0x0]-R-0: Input/output error dd: error writing '/mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x1d:0x0]-R-0': Input/output error 1+0 records in 0+0 records out 0 bytes (0 B) copied, 0.000608151 s, 0.0 kB/s 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00428 s, 957 kB/s /home/green/git/lustre-release/lustre/tests/sanity-lfsck.sh: line 3371: echo: write error: Input/output error Check 
/mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x1e:0x0]-R-0, it contains the old f2's stripe0 /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x1e:0x0]-R-0 lmm_magic: 0x0BD10BD0 lmm_seq: 0x2000013a1 lmm_object_id: 0x1e lmm_fid: [0x2000013a1:0x1e:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: 40000001 lmm_layout_gen: 1 lmm_stripe_offset: 0 obdidx objid objid group 0 207 0xcf 0x280000401 0 0 0 0 cat: /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x1e:0x0]-R-0: Input/output error dd: error writing '/mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a1:0x1e:0x0]-R-0': Input/output error 1+0 records in 0+0 records out 0 bytes (0 B) copied, 0.000561437 s, 0.0 kB/s PASS 20a (7s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 20b: Handle the orphan with dummy LOV EA slot properly - PFL case ========================================================== 05:04:16 (1713431056) ##### The target MDT-object and some of its OST-objects are lost. The LFSCK should find the remaining OST-objects and re-create the MDT-object under the directory .lustre/lost+found/MDTxxxx/ with the partial OST-objects (LOV EA hole). A new client can access the file with the LOV EA hole via normal system tools or commands without crashing the system - PFL case.
##### 769+0 records in 769+0 records out 3149824 bytes (3.1 MB) copied, 0.142415 s, 22.1 MB/s 769+0 records in 769+0 records out 3149824 bytes (3.1 MB) copied, 0.132725 s, 23.7 MB/s 769+0 records in 769+0 records out 3149824 bytes (3.1 MB) copied, 0.136197 s, 23.1 MB/s [0x2000013a3:0x6:0x0] /mnt/lustre/d20b.sanity-lfsck/f0 lcm_layout_gen: 3 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 2097152 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0xd0:0x0] } - 1: { l_ost_idx: 1, l_fid: [0x2c0000401:0x4a:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 2097152 lcme_extent.e_end: EOF lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x2c0000401:0x4d:0x0] } - 1: { l_ost_idx: 0, l_fid: [0x280000401:0xd3:0x0] } [0x2000013a3:0x7:0x0] /mnt/lustre/d20b.sanity-lfsck/f1 lcm_layout_gen: 3 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 2097152 lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x2c0000401:0x4b:0x0] } - 1: { l_ost_idx: 0, l_fid: [0x280000401:0xd1:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 2097152 lcme_extent.e_end: EOF lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0xd4:0x0] } - 1: { l_ost_idx: 1, l_fid: [0x2c0000401:0x4e:0x0] } [0x2000013a3:0x8:0x0] /mnt/lustre/d20b.sanity-lfsck/f2 lcm_layout_gen: 3 lcm_mirror_count: 1 lcm_entry_count: 2 lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 2097152 lmm_stripe_count: 2 
lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0xd2:0x0] } - 1: { l_ost_idx: 1, l_fid: [0x2c0000401:0x4c:0x0] } lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 2097152 lcme_extent.e_end: EOF lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x2c0000401:0x4f:0x0] } - 1: { l_ost_idx: 0, l_fid: [0x280000401:0xd5:0x0] } Inject failure... To simulate f0 lost MDT-object fail_loc=0x1616 To simulate the case of f1 lost MDT-object and the first OST-object in each PFL component fail_loc=0x161a To simulate the case of f2 lost MDT-object and the second OST-object in each PFL component fail_val=1 fail_loc=0 fail_val=0 Trigger layout LFSCK on all devices to find out orphan OST-object Started LFSCK on the device lustre-MDT0000: scrub layout Check /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a3:0x6:0x0]-R-0, which is the old f0 /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a3:0x6:0x0]-R-0 composite_header: lcm_magic: 0x0BD60BD0 lcm_size: 288 lcm_flags: 0 lcm_layout_gen: 1 lcm_mirror_count: 1 lcm_entry_count: 2 components: - lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 2097152 lcme_offset: 128 lcme_size: 80 sub_layout: lmm_magic: 0x0BD10BD0 lmm_seq: 0x2000013a3 lmm_object_id: 0x6 lmm_fid: [0x2000013a3:0x6:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 2 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0xd0:0x0] } - 1: { l_ost_idx: 1, l_fid: [0x2c0000401:0x4a:0x0] } - lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 2097152 lcme_extent.e_end: EOF lcme_offset: 208 lcme_size: 80 sub_layout: lmm_magic: 0x0BD10BD0 lmm_seq: 0x2000013a3 lmm_object_id: 0x6 lmm_fid: [0x2000013a3:0x6:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: raid0 
lmm_layout_gen: 2 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x2c0000401:0x4d:0x0] } - 1: { l_ost_idx: 0, l_fid: [0x280000401:0xd3:0x0] } Check /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a3:0x7:0x0]-R-0, it contains f1's second OST-object in each COMP /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a3:0x7:0x0]-R-0 composite_header: lcm_magic: 0x0BD60BD0 lcm_size: 288 lcm_flags: 0 lcm_layout_gen: 2 lcm_mirror_count: 1 lcm_entry_count: 2 components: - lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 2097152 lcme_offset: 128 lcme_size: 80 sub_layout: lmm_magic: 0x0BD10BD0 lmm_seq: 0x2000013a3 lmm_object_id: 0x7 lmm_fid: [0x2000013a3:0x7:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: 40000001 lmm_layout_gen: 1 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x0:0x0] } - 1: { l_ost_idx: 0, l_fid: [0x280000401:0xd1:0x0] } - lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 2097152 lcme_extent.e_end: EOF lcme_offset: 208 lcme_size: 80 sub_layout: lmm_magic: 0x0BD10BD0 lmm_seq: 0x2000013a3 lmm_object_id: 0x7 lmm_fid: [0x2000013a3:0x7:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: 40000001 lmm_layout_gen: 1 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x0:0x0] } - 1: { l_ost_idx: 1, l_fid: [0x2c0000401:0x4e:0x0] } cat: /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a3:0x7:0x0]-R-0: Input/output error dd: error writing '/mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a3:0x7:0x0]-R-0': Input/output error 1+0 records in 0+0 records out 0 bytes (0 B) copied, 0.0004532 s, 0.0 kB/s 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00459408 s, 892 kB/s /home/green/git/lustre-release/lustre/tests/sanity-lfsck.sh: line 3729: echo: write error: Input/output error Check /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a3:0x8:0x0]-R-0, it contains f2's first stripe in each COMP 
/mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a3:0x8:0x0]-R-0 composite_header: lcm_magic: 0x0BD60BD0 lcm_size: 288 lcm_flags: 0 lcm_layout_gen: 3 lcm_mirror_count: 1 lcm_entry_count: 2 components: - lcme_id: 1 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: 2097152 lcme_offset: 128 lcme_size: 80 sub_layout: lmm_magic: 0x0BD10BD0 lmm_seq: 0x2000013a3 lmm_object_id: 0x8 lmm_fid: [0x2000013a3:0x8:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: 40000001 lmm_layout_gen: 1 lmm_stripe_offset: 0 lmm_objects: - 0: { l_ost_idx: 0, l_fid: [0x280000401:0xd2:0x0] } - 1: { l_ost_idx: 0, l_fid: [0x100000000:0x0:0x0] } - lcme_id: 2 lcme_mirror_id: 0 lcme_flags: init lcme_extent.e_start: 2097152 lcme_extent.e_end: EOF lcme_offset: 208 lcme_size: 80 sub_layout: lmm_magic: 0x0BD10BD0 lmm_seq: 0x2000013a3 lmm_object_id: 0x8 lmm_fid: [0x2000013a3:0x8:0x0] lmm_stripe_count: 2 lmm_stripe_size: 1048576 lmm_pattern: 40000001 lmm_layout_gen: 1 lmm_stripe_offset: 1 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x2c0000401:0x4f:0x0] } - 1: { l_ost_idx: 0, l_fid: [0x100000000:0x0:0x0] } cat: /mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a3:0x8:0x0]-R-0: Input/output error dd: error writing '/mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a3:0x8:0x0]-R-0': Input/output error 1+0 records in 0+0 records out 0 bytes (0 B) copied, 0.000669699 s, 0.0 kB/s 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00580544 s, 706 kB/s /home/green/git/lustre-release/lustre/tests/sanity-lfsck.sh: line 3811: echo: write error: Input/output error PASS 20b (8s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 21: run all LFSCK components by default ========================================================== 05:04:26 (1713431066) total: 100 open/close in 0.26 seconds: 385.10 ops/second Start all LFSCK components by default (-s 1) Started LFSCK on the device lustre-MDT0000: scrub layout namespace 
namespace LFSCK should be in 'scanning-phase1' status
layout LFSCK should be in 'scanning-phase1' status
Stop all LFSCK components by default
Stopped LFSCK on the device lustre-MDT0000.
PASS 21 (3s)
debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y
== sanity-lfsck test 22a: LFSCK can repair unmatched pairs (1) ========================================================== 05:04:31 (1713431071)
#####
The parent_A references the child directory via some name entry, but the child directory back references another parent_B via its .. name entry. The parent_B does not exist. Then the namespace LFSCK will repair the child directory's .. name entry.
#####
Inject failure stub on MDT0 to simulate a bad dotdot name entry. The dummy's dotdot name entry references the guard.
fail_loc=0x161e
fail_loc=0
Trigger namespace LFSCK to repair unmatched pairs
Started LFSCK on the device lustre-MDT0000: scrub namespace
'ls' should succeed after namespace LFSCK repairing
PASS 22a (3s)
debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y
== sanity-lfsck test 22b: LFSCK can repair unmatched pairs (2) ========================================================== 05:04:36 (1713431076)
#####
The parent_A references the child directory via the name entry_B, but the child directory back references another parent_C via its .. name entry. The parent_C exists, but there is no name entry_B under the parent_C. Then the namespace LFSCK will repair the child directory's .. name entry and its linkEA.
#####
Inject failure stub on MDT0 to simulate a bad dotdot name entry and a bad linkEA. The dummy's dotdot name entry references the guard. The dummy's linkEA references a non-existent name entry.
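The `fail_loc` values echoed throughout this log are Lustre's OBD fault-injection knob (the `OBD_FAIL_*` codes built into debug builds). A minimal sketch of the pattern, assuming it is run directly on the MDS node; `0x161e` is the bad-dotdot stub from the log above, and the `mkdir` target is hypothetical:

```shell
# Arm the fault-injection point, perform the operation that should
# hit it, then disarm so later operations run clean.
lctl set_param fail_loc=0x161e    # inject: next create gets a bad dotdot entry
mkdir /mnt/lustre/dummy           # hypothetical victim operation
lctl set_param fail_loc=0         # disarm the injection point
```

In this log the `set_param` calls are issued on the server by the test harness, which is why only the bare `fail_loc=...` echoes appear.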
fail_loc=0x161e
fail_loc=0
fid2path should NOT work on the dummy's FID [0x2000013a3:0x78:0x0]
Trigger namespace LFSCK to repair unmatched pairs
Started LFSCK on the device lustre-MDT0000: scrub namespace
fid2path should work on the dummy's FID [0x2000013a3:0x78:0x0] after LFSCK
PASS 22b (3s)
debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y
== sanity-lfsck test 23a: LFSCK can repair dangling name entry (1) ========================================================== 05:04:40 (1713431080)
#####
The name entry is there, but the MDT-object for that name entry does not exist. The namespace LFSCK should find and repair the inconsistency as required.
#####
Inject failure stub on MDT1 to simulate a dangling name entry
fail_loc=0x1620
fail_loc=0
'ls' should fail because of the dangling name entry
Trigger namespace LFSCK to find the dangling name entry
Started LFSCK on the device lustre-MDT0000: scrub namespace
'ls' should still fail because the MDT-object is not re-created by default
Trigger namespace LFSCK again to repair the dangling name entry
Started LFSCK on the device lustre-MDT0000: scrub namespace
'ls' should succeed after namespace LFSCK repairing
PASS 23a (4s)
debug_raw_pointers=0 debug_raw_pointers=0
SKIP: sanity-lfsck test_23b skipping excluded test 23b
debug_raw_pointers=Y debug_raw_pointers=Y
== sanity-lfsck test 23c: LFSCK can repair dangling name entry (3) ========================================================== 05:04:46 (1713431086)
#####
The object_A has multiple hard links, one of them corresponding to the name entry_B. But something is wrong with the name entry_B, causing entry_B to reference the non-existent object_C. In the first-stage scanning, the LFSCK will regard the entry_B as dangling and re-create the lost object_C. Others then modify the re-created object_C. When the LFSCK comes to the second-stage scanning, it will find that the formerly re-created object_C may be wrong and try to replace the object_C with the real object_A.
But because object_C has been modified, the LFSCK cannot replace it.
#####
debug=-1
debug_mb=150
debug=-1
debug_mb=150
parent_fid=[0x2000013a3:0x7c:0x0]
total: 10 open/close in 0.06 seconds: 179.12 ops/second
f0_fid=[0x2000013a3:0x87:0x0]
f1_fid=[0x2000013a3:0x88:0x0]
Inject failure stub on MDT0 to simulate a dangling name entry
fail_val=0x88
fail_loc=0x1621
fail_val=0
fail_loc=0
- unlinked 0 (time 1713431088 ; total 0 ; last 0)
total: 10 unlinks in 0 seconds: inf unlinks/second
'ls' should fail because of the dangling name entry
fail_val=10
fail_loc=0x1602
Trigger namespace LFSCK to find the dangling name entry
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_val=0
fail_loc=0
debug_mb=21
debug_mb=21
PASS 23c (5s)
debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y
== sanity-lfsck test 23d: LFSCK can repair a dangling name entry to a remote object ========================================================== 05:04:52 (1713431092)
Stopping /mnt/lustre-mds1 (opts:) on oleg413-server
oleg413-server: debugfs 1.46.2.wc5 (26-Mar-2022)
Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all
pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1
Started lustre-MDT0000
cat: /mnt/lustre/d23d.sanity-lfsck/mdt1dir/foo: Bad address
Started LFSCK on the device lustre-MDT0001: scrub namespace
Started LFSCK on the device lustre-MDT0000: scrub layout
PASS 23d (11s)
debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y
== sanity-lfsck test 24: LFSCK can repair multiple-referenced name entry ========================================================== 05:05:05 (1713431105)
#####
Two MDT-objects back reference the same name entry via their own linkEA entries, but the name entry only references one MDT-object.
The namespace LFSCK will remove the linkEA entry for the MDT-object that is not recognized. If such an MDT-object has no other linkEA entry after the removal, then the LFSCK will add it as an orphan under .lustre/lost+found/MDTxxxx/.
#####
[0x2400013a2:0x8:0x0]
[0x2400013a2:0x9:0x0]
Inject failure stub on MDT0 to simulate the case that /mnt/lustre/d24.sanity-lfsck/d0/dummy/foo has a 'bad' linkEA entry that references /mnt/lustre/d24.sanity-lfsck/d0/guard/foo. Then remove the name entry /mnt/lustre/d24.sanity-lfsck/d0/dummy/foo. So the MDT-object /mnt/lustre/d24.sanity-lfsck/d0/dummy/foo will be left there with the same linkEA entry as the other MDT-object /mnt/lustre/d24.sanity-lfsck/d0/guard/foo has
fail_loc=0x1622
[0x2000013a3:0x8d:0x0]
fail_loc=0
stat /mnt/lustre/d24.sanity-lfsck/d0/dummy/foo should fail
Trigger namespace LFSCK to repair the multiple-referenced name entry
Started LFSCK on the device lustre-MDT0000: scrub namespace
There should be an orphan under .lustre/lost+found/MDT0000/
total 8
144115272414920845 drwxr-xr-x 2 root root 4096 Apr 18 05:05 .
144115205306056705 drwx------ 3 root root 4096 Apr 18 05:04 ..
PASS 24 (3s)
debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y
== sanity-lfsck test 25: LFSCK can repair bad file type in the name entry ========================================================== 05:05:09 (1713431109)
#####
The file type in the name entry does not match the file type claimed by the referenced object. Then the LFSCK will update the file type in the name entry.
#####
Inject failure stub on MDT0 to simulate the case that the file type stored in the name entry is wrong.
fail_loc=0x1623
fail_loc=0
Trigger namespace LFSCK to repair the bad file type in the name entry
Started LFSCK on the device lustre-MDT0000: scrub namespace
total 8
144115272414920847 drwxr-xr-x 2 root root 4096 Apr 18 05:05 .
144115272414920846 drwxr-xr-x 3 root root 4096 Apr 18 05:05 ..
144115272414920848 -rw-r--r-- 1 root root 0 Apr 18 05:05 foo PASS 25 (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 26a: LFSCK can add the missing local name entry back to the namespace ========================================================== 05:05:14 (1713431114) ##### The local name entry back referenced by the MDT-object is lost. The namespace LFSCK will add the missing local name entry back to the normal namespace. ##### Inject failure stub on MDT0 to simulate the case that foo's name entry will be removed, but the foo's object and its linkEA are kept in the system. fail_loc=0x1624 fail_loc=0 Trigger namespace LFSCK to repair the missing remote name entry Started LFSCK on the device lustre-MDT0000: scrub namespace 144115272414920851 -rw-r--r-- 2 root root 0 Apr 18 05:05 /mnt/lustre/d26a.sanity-lfsck/d0/foo PASS 26a (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 26b: LFSCK can add the missing remote name entry back to the namespace ========================================================== 05:05:18 (1713431118) ##### The remote name entry back referenced by the MDT-object is lost. The namespace LFSCK will add the missing remote name entry back to the normal namespace. ##### Inject failure stub on MDT0 to simulate the case that foo's name entry will be removed, but the foo's object and its linkEA are kept in the system. fail_loc=0x1624 fail_loc=0 Trigger namespace LFSCK to repair the missing remote name entry Started LFSCK on the device lustre-MDT0000: scrub namespace total 8 144115272414920853 drwxr-xr-x 2 root root 4096 Apr 18 05:05 . 162129670907625483 drwxr-xr-x 3 root root 4096 Apr 18 05:05 .. 
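Tests 26a/26b recover lost name entries from the object's linkEA, which records a (parent FID, name) pair for every hard link. From a client the linkEA can be inspected indirectly with the standard `lfs path2fid`/`lfs fid2path` subcommands; a sketch, where the file path is hypothetical:

```shell
# Resolve a file to its FID, then enumerate every path recorded
# in its linkEA (one line per hard link).
fid=$(lfs path2fid /mnt/lustre/d26/foo)   # prints a FID like [0x2000013a3:0x93:0x0]
lfs fid2path /mnt/lustre "$fid"
```

This linkEA is the metadata these namespace repairs are driven by.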
PASS 26b (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 27a: LFSCK can recreate the lost local parent directory as orphan ========================================================== 05:05:22 (1713431122) ##### The local parent referenced by the MDT-object linkEA is lost. The namespace LFSCK will re-create the lost parent as orphan. ##### Inject failure stub on MDT0 to simulate the case that foo's name entry will be removed, but the foo's object and its linkEA are kept in the system. And then remove another hard link and the parent directory. fail_loc=0x1624 fail_loc=0 Trigger namespace LFSCK to repair the lost parent Started LFSCK on the device lustre-MDT0000: scrub namespace There should be an orphan under .lustre/lost+found/MDT0000/ total 12 144115205306056705 drwx------ 3 root root 4096 Apr 18 05:05 . 144115188109410307 dr-x------ 4 root root 4096 Dec 31 1969 .. 144115272414920855 drwx------ 2 root root 4096 Dec 31 1969 [0x2000013a3:0x97:0x0]-P-0 PASS 27a (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 27b: LFSCK can recreate the lost remote parent directory as orphan ========================================================== 05:05:27 (1713431127) ##### The remote parent referenced by the MDT-object linkEA is lost. The namespace LFSCK will re-create the lost parent as orphan. ##### [0x2400013a2:0xc:0x0] Inject failure stub on MDT0 to simulate the case that foo's name entry will be removed, but the foo's object and its linkEA are kept in the system. And then remove the parent directory. fail_loc=0x1624 fail_loc=0 Trigger namespace LFSCK to repair the missing remote name entry Started LFSCK on the device lustre-MDT0000: scrub namespace total 12 144115188109410307 dr-x------ 4 root root 4096 Dec 31 1969 . 
144115205306056705 drwx------ 2 root root 4096 Apr 18 05:05 MDT0000
162129603815538689 drwx------ 3 root root 4096 Apr 18 05:03 MDT0001
There should be an orphan under .lustre/lost+found/MDT0001/
total 12
162129603815538689 drwx------ 3 root root 4096 Apr 18 05:03 .
144115188109410307 dr-x------ 4 root root 4096 Dec 31 1969 ..
162129670907625484 drwx------ 3 root root 4096 Dec 31 1969 [0x2400013a2:0xc:0x0]-P-0
PASS 27b (3s)
debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y
== sanity-lfsck test 28: Skip the failed MDT(s) when handling orphan MDT-objects ========================================================== 05:05:31 (1713431131)
#####
The target name entry is lost. The LFSCK should insert the orphan MDT-object under .lustre/lost+found/MDTxxxx. But if the MDT (on which the orphan MDT-object resides) has failed to respond to some name entry verification during the first-stage scanning, then the LFSCK should skip handling orphan MDT-objects on this MDT. Other MDTs should not be affected.
#####
Inject failure stub on MDT0 to simulate the case that d1/a1's name entry will be removed, but d1/a1's object and its linkEA are kept in the system; and the case that d2/a2's name entry will be removed, but d2/a2's object and its linkEA are kept in the system.
fail_loc=0x1624
fail_loc=0x1624
fail_loc=0
fail_loc=0
Inject failure to simulate MDT0 failing to handle the MDT1 LFSCK request during the first-stage scanning.
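The "Trigger namespace LFSCK on all devices" steps in this log map onto the `lctl lfsck_start` interface. A minimal sketch, assuming it is run on the MDS holding MDT0000 (device name as in this log; `-A` asks the master MDT to start the scan on all MDTs, `-r` resets and rescans from the beginning):

```shell
# Kick off a namespace scan from the master MDT across all MDTs,
# then check the per-device status file for the current phase.
lctl lfsck_start -M lustre-MDT0000 -t namespace -A -r
lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace | grep '^status'
```

The same `lfsck_namespace` parameter also carries the repaired/failed counters that the test assertions in this log are checking.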
fail_loc=0x161c
fail_val=0
Trigger namespace LFSCK on all devices to find the orphan object
Started LFSCK on the device lustre-MDT0000: scrub namespace
fail_loc=0
fail_val=0
Trigger namespace LFSCK on all devices again to clean up
Started LFSCK on the device lustre-MDT0000: scrub namespace
PASS 28 (6s)
debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y
== sanity-lfsck test 29b: LFSCK can repair bad nlink count (2) ========================================================== 05:05:38 (1713431138)
#####
The object's nlink attribute is smaller than the object's known name entries count. The LFSCK will repair the object's nlink attribute to match the known name entries count.
#####
Inject failure stub on MDT0 to simulate the case that foo's nlink attribute is smaller than its name entries count.
fail_loc=0x1626
fail_loc=0
Trigger namespace LFSCK to repair the nlink count
Started LFSCK on the device lustre-MDT0000: scrub namespace
PASS 29b (3s)
debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y
== sanity-lfsck test 29c: verify linkEA size limitation == 05:05:43 (1713431143)
#####
Create many hard links to the target file so as to exceed the linkEA size limitation. In such a case the linkEA will be marked as overflowed, which prevents the target file from being migrated. Then remove some hard links so that the remaining hard links fit within the linkEA size limitation. But until the namespace LFSCK adds all the missing linkEA entries back, the overflow mark (timestamp) will not be cleared.
#####
Creating 150 hard links should succeed although the linkEA overflows
total: 150 link in 0.54 seconds: 280.16 ops/second
The object with linkEA overflow should NOT be migrated
Remove 100 hard links to save space for the missing linkEA entries
- unlinked 0 (time 1713431145 ; total 0 ; last 0)
total: 100 unlinks in 0 seconds: inf unlinks/second
Trigger namespace LFSCK to clear the overflow timestamp
Started LFSCK on the device lustre-MDT0000: scrub namespace
PASS 29c (8s)
debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y
== sanity-lfsck test 29d: accessing non-existing inode shouldn't turn fs read-only (ldiskfs) ========================================================== 05:05:52 (1713431152)
#####
The object's nlink attribute is smaller than the object's known name entries count. The LFSCK will repair the object's nlink attribute to match the known name entries count.
#####
Inject failure stub on MDT0 to simulate the case that foo's nlink attribute is smaller than its name entries count.
fail_loc=0x1626
fail_loc=0
stat: cannot stat '/mnt/lustre/d29d.sanity-lfsck/d0/foo': No such file or directory
rm_entry
total 0
-rw-r--r-- 1 root root 0 Apr 18 05:05 foo0
PASS 29d (2s)
debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y
== sanity-lfsck test 30: LFSCK can recover the orphans from backend /lost+found ========================================================== 05:05:56 (1713431156)
#####
The namespace LFSCK will move the orphans from the backend /lost+found directory to the normal client-visible namespace or to the globally visible .lustre/lost+found/MDTxxxx/ directory.
#####
Inject failure stub on MDT0 to simulate the case that directory d0 has no linkEA entry; the LFSCK will later move it into .lustre/lost+found/MDTxxxx/.
fail_loc=0x161d
fail_loc=0
Inject failure stub on MDT0 to simulate the case that the object's name entry will be removed, but the object itself is not destroyed.
Then backend e2fsck will handle it as orphan and add them into the backend /lost+found directory. fail_loc=0x1624 fail_loc=0 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) Stopping /mnt/lustre-mds1 (opts:) on oleg413-server run e2fsck on mds1 e2fsck -d -v -t -t -f -y /dev/mapper/mds1_flakey -m8 oleg413-server: e2fsck 1.46.2.wc5 (26-Mar-2022) oleg413-server: Use max possible thread num: 1 instead Pass 1: Checking inodes, blocks, and sizes [Thread 0] Scan group range [0, 2) [Thread 0] jumping to group 0 [Thread 0] e2fsck_pass1_run:2564: increase inode 81 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 82 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 83 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 89 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 90 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 91 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 92 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 93 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 94 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 95 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 96 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 97 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 98 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 99 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 100 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 101 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 102 badness 0 to 2 for 10084 [Thread 0] 
e2fsck_pass1_run:2564: increase inode 103 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 104 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 105 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 106 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 107 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 108 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 109 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 110 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 111 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 112 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 113 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 114 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 115 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 116 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 117 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 118 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 119 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 120 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 121 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 122 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 123 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 125 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 126 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 127 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 128 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 129 badness 
0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 130 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 131 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 132 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 133 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 134 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 135 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 136 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 137 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 138 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 139 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 140 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 141 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 142 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 143 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 144 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 145 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 146 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 147 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 148 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 149 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 150 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 151 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 152 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 153 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 154 badness 0 to 2 for 10084 [Thread 0] 
e2fsck_pass1_run:2564: increase inode 155 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 156 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 157 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 158 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 159 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 160 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 161 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 162 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 163 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 164 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 165 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 166 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 167 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 168 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 169 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 170 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 171 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 172 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 173 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 174 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 175 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 176 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 177 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 178 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 179 badness 0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 180 badness 
0 to 2 for 10084 [Thread 0] e2fsck_pass1_run:2564: increase inode 181 badness 0 to 2 for 10084
[Thread 0] e2fsck_pass1_run:2564: increase inode 182 badness 0 to 2 for 10084
[Thread 0] e2fsck_pass1_run:2564: increase inode 185 badness 0 to 2 for 10084
[Thread 0] e2fsck_pass1_run:2564: increase inode 186 badness 0 to 2 for 10084
[Thread 0] group 1 finished
[Thread 0] e2fsck_pass1_run:2564: increase inode 20036 badness 0 to 2 for 10084
[Thread 0] group 2 finished
[Thread 0] Pass 1: Memory used: 264k/0k (131k/134k), time: 0.00/ 0.00/ 0.00
[Thread 0] Pass 1: I/O read: 1MB, write: 0MB, rate: 240.27MB/s
[Thread 0] Scanned group range [0, 2), inodes 389
Pass 2: Checking directory structure
Pass 2: Memory used: 264k/0k (88k/177k), time: 0.00/ 0.00/ 0.00
Pass 2: I/O read: 1MB, write: 0MB, rate: 318.47MB/s
oleg413-server: [QUOTA WARNING] Usage inconsistent for ID 0:actual (2719744, 281) != expected (2719744, 282)
oleg413-server: [QUOTA WARNING] Usage inconsistent for ID 0:actual (2719744, 281) != expected (2719744, 282)
oleg413-server: [QUOTA WARNING] Usage inconsistent for ID 0:actual (2719744, 281) != expected (2719744, 282)
pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1
Pass 3: Checking directory connectivity
Peak memory: Memory used: 264k/0k (89k/176k), time: 0.01/ 0.00/ 0.01
Unconnected directory inode 20101 (was in /ROOT/d30.sanity-lfsck/foo)
Connect to /lost+found? yes
Unconnected directory inode 20103 (was in /lost+found/#20101)
Connect to /lost+found? yes
Pass 3A: Memory used: 264k/0k (89k/176k), time: 0.00/ 0.00/ 0.00
Pass 3A: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 3: Memory used: 264k/0k (87k/178k), time: 0.00/ 0.00/ 0.00
Pass 3: I/O read: 1MB, write: 0MB, rate: 2070.39MB/s
Pass 4: Checking reference counts
Unattached inode 183
Connect to /lost+found? yes
Inode 183 ref count is 2, should be 1. Fix? yes
Unattached inode 184
Connect to /lost+found? yes
Inode 184 ref count is 2, should be 1. Fix?
yes Unattached inode 187 Connect to /lost+found? yes Inode 187 ref count is 2, should be 1. Fix? yes Unattached inode 195 Connect to /lost+found? yes Inode 195 ref count is 2, should be 1. Fix? yes Inode 20099 ref count is 1, should be 2. Fix? yes Inode 20103 ref count is 3, should be 2. Fix? yes Pass 4: Memory used: 264k/0k (69k/196k), time: 0.00/ 0.00/ 0.00 Pass 4: I/O read: 1MB, write: 1MB, rate: 500.00MB/s Pass 5: Checking group summary information Pass 5: Memory used: 264k/0k (68k/197k), time: 0.00/ 0.00/ 0.00 Pass 5: I/O read: 1MB, write: 1MB, rate: 601.32MB/s Update quota info for quota type 0? yes Update quota info for quota type 1? yes Update quota info for quota type 2? yes lustre-MDT0000: ***** FILE SYSTEM WAS MODIFIED ***** 291 inodes used (0.73%, out of 40000) 7 non-contiguous files (2.4%) 0 non-contiguous directories (0.0%) # of inodes with ind/dind/tind blocks: 2/0/0 12750 blocks used (51.00%, out of 25000) 0 bad blocks 1 large file 158 regular files 123 directories 0 character device files 0 block device files 0 fifos 4294967294 links 0 symbolic links (0 fast symbolic links) 0 sockets ------------ 275 files Memory used: 264k/0k (67k/197k), time: 0.02/ 0.01/ 0.01 I/O read: 1MB, write: 1MB, rate: 53.27MB/s oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Trigger namespace LFSCK to recover backend orphans Started LFSCK on the device lustre-MDT0000: scrub namespace Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre File: '/mnt/lustre/d30.sanity-lfsck/foo/f0' Size: 0 Blocks: 0 IO Block: 4194304 regular empty file Device: 2c54f966h/743766374d Inode: 144115272414920875 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-18 05:05:57.000000000 -0400 Modify: 2024-04-18 05:05:57.000000000 -0400 Change: 2024-04-18 05:06:03.000000000 
-0400 Birth: - total 12 144115188109410307 dr-x------ 4 root root 4096 Dec 31 1969 . 144115205306056705 drwx------ 3 root root 4096 Apr 18 05:05 MDT0000 162129603815538689 drwx------ 2 root root 4096 Apr 18 05:05 MDT0001 d0 should become orphan under .lustre/lost+found/MDT0000/ total 16 144115205306056705 drwx------ 3 root root 4096 Apr 18 05:05 . 144115188109410307 dr-x------ 4 root root 4096 Dec 31 1969 .. 144115272414920839 -rw-r--r-- 1 root root 6 Apr 18 05:04 [0x2000013a3:0x87:0x0]-O-0 144115272414920856 -rw-r--r-- 1 root root 0 Apr 18 05:05 [0x2000013a3:0x98:0x0]-O-0 144115272414920876 drwxr-xr-x 3 root root 4096 Apr 18 05:06 [0x2000013a3:0xac:0x0]-[0x2000013a3:0xaa:0x0]-D-0 File: '/mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a3:0xac:0x0]-[0x2000013a3:0xaa:0x0]-D-0/d1' Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115272414920878 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-18 05:05:57.000000000 -0400 Modify: 2024-04-18 05:05:57.000000000 -0400 Change: 2024-04-18 05:06:03.000000000 -0400 Birth: - File: '/mnt/lustre/.lustre/lost+found/MDT0000/[0x2000013a3:0xac:0x0]-[0x2000013a3:0xaa:0x0]-D-0/f1' Size: 0 Blocks: 0 IO Block: 4194304 regular empty file Device: 2c54f966h/743766374d Inode: 144115272414920877 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-18 05:05:57.000000000 -0400 Modify: 2024-04-18 05:05:57.000000000 -0400 Change: 2024-04-18 05:06:03.000000000 -0400 Birth: - PASS 30 (12s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 31a: The LFSCK can find/repair the name entry with bad name hash (1) ========================================================== 05:06:10 (1713431170) ##### For the name entry under a striped directory, if the name hash does not match the shard, then the LFSCK will repair the bad name entry ##### Inject failure stub on client to simulate the case that some 
name entry should be inserted into a non-first shard, but was wrongly inserted into the first shard fail_loc=0x1628 fail_val=0 total: 2 mkdir in 0.01 seconds: 202.39 ops/second fail_loc=0 fail_val=0 Trigger namespace LFSCK to repair bad name hash Started LFSCK on the device lustre-MDT0000: scrub namespace 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre File: '/mnt/lustre/d31a.sanity-lfsck/striped_dir/d0' Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339490230275 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-18 05:06:10.000000000 -0400 Modify: 2024-04-18 05:06:10.000000000 -0400 Change: 2024-04-18 05:06:10.000000000 -0400 Birth: - File: '/mnt/lustre/d31a.sanity-lfsck/striped_dir/d1' Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339490230276 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-18 05:06:10.000000000 -0400 Modify: 2024-04-18 05:06:10.000000000 -0400 Change: 2024-04-18 05:06:10.000000000 -0400 Birth: - PASS 31a (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 31b: The LFSCK can find/repair the name entry with bad name hash (2) ========================================================== 05:06:14 (1713431174) ##### For the name entry under a striped directory, if the name hash does not match the shard, then the LFSCK will repair the bad name entry ##### Inject failure stub on client to simulate the case that some name entry should be inserted into a non-second shard, but was wrongly inserted into the second shard fail_loc=0x1628 fail_val=1 total: 10 mkdir in 0.04 seconds: 240.04 
ops/second fail_loc=0 fail_val=0 Trigger namespace LFSCK to repair bad name hash Started LFSCK on the device lustre-MDT0000: scrub namespace repaired 1 name entries with bad hash 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre File: '/mnt/lustre/d31b.sanity-lfsck/striped_dir/d0' Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007491 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-18 05:06:15.000000000 -0400 Modify: 2024-04-18 05:06:15.000000000 -0400 Change: 2024-04-18 05:06:15.000000000 -0400 Birth: - File: '/mnt/lustre/d31b.sanity-lfsck/striped_dir/d1' Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007492 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-18 05:06:15.000000000 -0400 Modify: 2024-04-18 05:06:15.000000000 -0400 Change: 2024-04-18 05:06:15.000000000 -0400 Birth: - File: '/mnt/lustre/d31b.sanity-lfsck/striped_dir/d2' Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007493 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-18 05:06:15.000000000 -0400 Modify: 2024-04-18 05:06:15.000000000 -0400 Change: 2024-04-18 05:06:15.000000000 -0400 Birth: - File: '/mnt/lustre/d31b.sanity-lfsck/striped_dir/d3' Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007494 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-18 05:06:15.000000000 -0400 Modify: 2024-04-18 05:06:15.000000000 -0400 Change: 2024-04-18 05:06:15.000000000 -0400 Birth: - File: 
'/mnt/lustre/d31b.sanity-lfsck/striped_dir/d4' Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007495 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-18 05:06:15.000000000 -0400 Modify: 2024-04-18 05:06:15.000000000 -0400 Change: 2024-04-18 05:06:15.000000000 -0400 Birth: - File: '/mnt/lustre/d31b.sanity-lfsck/striped_dir/d5' Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007496 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-18 05:06:15.000000000 -0400 Modify: 2024-04-18 05:06:15.000000000 -0400 Change: 2024-04-18 05:06:15.000000000 -0400 Birth: - File: '/mnt/lustre/d31b.sanity-lfsck/striped_dir/d6' Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007497 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-18 05:06:15.000000000 -0400 Modify: 2024-04-18 05:06:15.000000000 -0400 Change: 2024-04-18 05:06:15.000000000 -0400 Birth: - File: '/mnt/lustre/d31b.sanity-lfsck/striped_dir/d7' Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007498 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-18 05:06:15.000000000 -0400 Modify: 2024-04-18 05:06:15.000000000 -0400 Change: 2024-04-18 05:06:15.000000000 -0400 Birth: - File: '/mnt/lustre/d31b.sanity-lfsck/striped_dir/d8' Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007499 Links: 2 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-18 05:06:15.000000000 -0400 Modify: 2024-04-18 05:06:15.000000000 -0400 Change: 2024-04-18 05:06:15.000000000 -0400 Birth: - File: '/mnt/lustre/d31b.sanity-lfsck/striped_dir/d9' Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 2c54f966h/743766374d Inode: 144115339507007500 Links: 2 Access: 
(0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-18 05:06:15.000000000 -0400 Modify: 2024-04-18 05:06:15.000000000 -0400 Change: 2024-04-18 05:06:15.000000000 -0400 Birth: - PASS 31b (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 31c: Re-generate the lost master LMV EA for striped directory ========================================================== 05:06:19 (1713431179) ##### For some reason, the master MDT-object of the striped directory may lose its master LMV EA. If nobody created files under the master directly after the master LMV EA was lost, then the LFSCK should re-generate the master LMV EA. ##### Inject failure stub on MDT0 to simulate the case that the master MDT-object of the striped directory lost the LMV EA. fail_loc=0x1629 fail_loc=0 Trigger namespace LFSCK to re-generate master LMV EA Started LFSCK on the device lustre-MDT0000: scrub namespace 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre PASS 31c (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 31d: Set broken striped directory (modified after broken) as read-only ========================================================== 05:06:24 (1713431184) ##### For some reason, the master MDT-object of the striped directory may lose its master LMV EA. If somebody created files under the master directly after the master LMV EA was lost, then the LFSCK should NOT re-generate the master LMV EA; instead, it should mark the broken striped directory read-only to prevent further damage ##### Inject failure stub on MDT0 to simulate the case that the master MDT-object of the striped directory lost the LMV EA. 
fail_loc=0x1629 fail_loc=0x0 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) Stopping /mnt/lustre-mds1 (opts:) on oleg413-server Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Started lustre-MDT0000 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Trigger namespace LFSCK to find out the inconsistency Started LFSCK on the device lustre-MDT0000: scrub namespace File: '/mnt/lustre/d31d.sanity-lfsck/striped_dir/dummy' Size: 0 Blocks: 0 IO Block: 4194304 regular empty file Device: 2c54f966h/743766374d Inode: 144115373044662273 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2024-04-18 05:06:39.000000000 -0400 Modify: 2024-04-18 05:06:39.000000000 -0400 Change: 2024-04-18 05:06:39.000000000 -0400 Birth: - touch: cannot touch '/mnt/lustre/d31d.sanity-lfsck/striped_dir/foo': Permission denied Trigger namespace LFSCK to find out the inconsistency Started LFSCK on the device lustre-MDT0000: scrub namespace Stopping /mnt/lustre-mds1 (opts:) on oleg413-server Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Started lustre-MDT0000 PASS 31d (25s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 31e: Re-generate the lost slave LMV EA for striped directory (1) ========================================================== 05:06:51 (1713431211) ##### For some 
reason, the slave MDT-object of the striped directory may lose its slave LMV EA. The LFSCK should re-generate the slave LMV EA. ##### Inject failure stub on MDT0 to simulate the case that the slave MDT-object (residing on the same MDT as the master MDT-object) lost the LMV EA. fail_loc=0x162a fail_val=0 fail_loc=0x0 fail_val=0 Trigger namespace LFSCK to re-generate slave LMV EA Started LFSCK on the device lustre-MDT0000: scrub namespace PASS 31e (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 31f: Re-generate the lost slave LMV EA for striped directory (2) ========================================================== 05:06:55 (1713431215) ##### For some reason, the slave MDT-object of the striped directory may lose its slave LMV EA. The LFSCK should re-generate the slave LMV EA. ##### Inject failure stub on MDT0 to simulate the case that the slave MDT-object (residing on a different MDT from the master MDT-object) lost the LMV EA. fail_loc=0x162a fail_val=1 fail_loc=0x0 fail_val=0 Trigger namespace LFSCK to re-generate slave LMV EA Started LFSCK on the device lustre-MDT0000: scrub namespace PASS 31f (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 31g: Repair the corrupted slave LMV EA ========================================================== 05:07:00 (1713431220) ##### For some reason, the stripe index in the slave LMV EA is corrupted. The LFSCK should repair the slave LMV EA. 
##### Inject failure stub on MDT0 to simulate the case that the slave LMV EA on the first shard of the striped directory claims the same index as the second shard claims fail_loc=0x162b fail_val=0 fail_loc=0x0 fail_val=0 Trigger namespace LFSCK to repair the slave LMV EA Started LFSCK on the device lustre-MDT0000: scrub namespace 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre PASS 31g (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 31h: Repair the corrupted shard's name entry ========================================================== 05:07:05 (1713431225) ##### For some reason, the shard's name entry in the striped directory may be corrupted. The LFSCK should repair the bad shard's name entry. ##### Inject failure stub on MDT0 to simulate the case that the first shard's name entry in the striped directory claims the same index as the second shard's name entry claims. fail_loc=0x162c fail_val=0 fail_loc=0x0 fail_val=0 Trigger namespace LFSCK to repair the shard's name entry Started LFSCK on the device lustre-MDT0000: scrub namespace 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre PASS 31h (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 32a: stop LFSCK when some OST failed ========================================================== 05:07:09 (1713431229) preparing... 5 * 5 files will be created Thu Apr 18 05:07:10 EDT 2024. 
total: 5 mkdir in 0.01 seconds: 782.23 ops/second total: 5 create in 0.01 seconds: 892.06 ops/second total: 5 mkdir in 0.01 seconds: 748.61 ops/second prepared Thu Apr 18 05:07:10 EDT 2024. 192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) fail_val=3 fail_loc=0x162d Started LFSCK on the device lustre-MDT0000: scrub layout stop ost1 fail_loc=0 fail_val=0 stop LFSCK Stopped LFSCK on the device lustre-MDT0000. oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 PASS 32a (13s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 32b: stop LFSCK when some MDT failed ========================================================== 05:07:23 (1713431243) Checking servers environments Checking clients oleg413-client.virtnet environments Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs Force libcfs to create 2 CPU partitions loading modules on: 'oleg413-server' oleg413-server: oleg413-server.virtnet: executing load_modules_local oleg413-server: Loading modules from /home/green/git/lustre-release/lustre oleg413-server: detected 4 online CPUs by sysfs oleg413-server: Force libcfs to create 2 CPU partitions oleg413-server: libkmod: kmod_module_get_holders: could not open '/sys/module/acpi_cpufreq/holders': No such file or directory Setup mgs, mdt, osts Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg413-server: mount.lustre: according to /etc/mtab /dev/mapper/mds1_flakey is already mounted on /mnt/lustre-mds1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 17 Start of /dev/mapper/mds1_flakey on mds1 failed 17 Starting mds2: -o localrecov 
/dev/mapper/mds2_flakey /mnt/lustre-mds2 oleg413-server: mount.lustre: according to /etc/mtab /dev/mapper/mds2_flakey is already mounted on /mnt/lustre-mds2 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 17 Start of /dev/mapper/mds2_flakey on mds2 failed 17 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 oleg413-server: mount.lustre: according to /etc/mtab /dev/mapper/ost1_flakey is already mounted on /mnt/lustre-ost1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 17 seq.cli-lustre-OST0000-super.width=65536 Start of /dev/mapper/ost1_flakey on ost1 failed 17 Starting ost2: -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2 oleg413-server: mount.lustre: according to /etc/mtab /dev/mapper/ost2_flakey is already mounted on /mnt/lustre-ost2 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 17 seq.cli-lustre-OST0001-super.width=65536 Start of /dev/mapper/ost2_flakey on ost2 failed 17 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Starting client oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Started clients oleg413-client.virtnet: 192.168.204.113@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project) Using TIMEOUT=20 osc.lustre-OST0000-osc-ffff880131027800.idle_timeout=debug osc.lustre-OST0001-osc-ffff880131027800.idle_timeout=debug disable quota as required preparing... 5 * 5 files will be created Thu Apr 18 05:07:43 EDT 2024. total: 5 mkdir in 0.01 seconds: 823.74 ops/second total: 5 create in 0.01 seconds: 834.72 ops/second total: 5 mkdir in 0.01 seconds: 824.26 ops/second prepared Thu Apr 18 05:07:44 EDT 2024. 
192.168.204.113@tcp:/lustre /mnt/lustre lustre rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project 0 0 Stopping client oleg413-client.virtnet /mnt/lustre (opts:) fail_val=3 fail_loc=0x162d Started LFSCK on the device lustre-MDT0000: scrub namespace stop mds2 fail_loc=0 fail_val=0 stop LFSCK Stopped LFSCK on the device lustre-MDT0000. oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 PASS 32b (34s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 33: check LFSCK parameters =========== 05:07:59 (1713431279) Checking servers environments Checking clients oleg413-client.virtnet environments Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs Force libcfs to create 2 CPU partitions loading modules on: 'oleg413-server' oleg413-server: oleg413-server.virtnet: executing load_modules_local oleg413-server: Loading modules from /home/green/git/lustre-release/lustre oleg413-server: detected 4 online CPUs by sysfs oleg413-server: Force libcfs to create 2 CPU partitions Setup mgs, mdt, osts Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg413-server: mount.lustre: according to /etc/mtab /dev/mapper/mds1_flakey is already mounted on /mnt/lustre-mds1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 17 Start of /dev/mapper/mds1_flakey on mds1 failed 17 Starting mds2: -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 oleg413-server: mount.lustre: according to /etc/mtab /dev/mapper/mds2_flakey is already mounted on /mnt/lustre-mds2 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 17 Start of /dev/mapper/mds2_flakey on mds2 failed 17 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 oleg413-server: mount.lustre: according 
to /etc/mtab /dev/mapper/ost1_flakey is already mounted on /mnt/lustre-ost1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 17 seq.cli-lustre-OST0000-super.width=65536 Start of /dev/mapper/ost1_flakey on ost1 failed 17 Starting ost2: -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2 oleg413-server: mount.lustre: according to /etc/mtab /dev/mapper/ost2_flakey is already mounted on /mnt/lustre-ost2 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 17 seq.cli-lustre-OST0001-super.width=65536 Start of /dev/mapper/ost2_flakey on ost2 failed 17 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Starting client oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Started clients oleg413-client.virtnet: 192.168.204.113@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project) Using TIMEOUT=20 osc.lustre-OST0000-osc-ffff8800aca1e000.idle_timeout=debug osc.lustre-OST0001-osc-ffff8800aca1e000.idle_timeout=debug disable quota as required preparing... 5 * 5 files will be created Thu Apr 18 05:08:18 EDT 2024. total: 5 mkdir in 0.01 seconds: 789.62 ops/second total: 5 create in 0.01 seconds: 889.98 ops/second total: 5 mkdir in 0.01 seconds: 896.72 ops/second prepared Thu Apr 18 05:08:19 EDT 2024. 
Started LFSCK on the device lustre-MDT0000: scrub layout Started LFSCK on the device lustre-MDT0000: scrub namespace PASS 33 (53s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 34: LFSCK can rebuild the lost agent object ========================================================== 05:08:54 (1713431334) SKIP: sanity-lfsck test_34 Only valid for ZFS backend SKIP 34 (0s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 35: LFSCK can rebuild the lost agent entry ========================================================== 05:08:56 (1713431336) preparing... 1 * 1 files will be created Thu Apr 18 05:08:57 EDT 2024. total: 1 mkdir in 0.00 seconds: 610.17 ops/second total: 1 create in 0.00 seconds: 730.46 ops/second total: 1 mkdir in 0.00 seconds: 590.33 ops/second prepared Thu Apr 18 05:08:57 EDT 2024. fail_loc=0x1631 fail_loc=0 Started LFSCK on the device lustre-MDT0000: scrub namespace stopall to cleanup object cache setupall oleg413-server: oleg413-server.virtnet: executing load_modules_local oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Using TIMEOUT=20 Started LFSCK on the device lustre-MDT0000: scrub namespace 
PASS 35 (67s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 36a: rebuild LOV EA for mirrored file (1) ========================================================== 05:10:05 (1713431405) SKIP: sanity-lfsck test_36a needs >= 3 OSTs SKIP 36a (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 36b: rebuild LOV EA for mirrored file (2) ========================================================== 05:10:08 (1713431408) SKIP: sanity-lfsck test_36b needs >= 3 OSTs SKIP 36b (0s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 36c: rebuild LOV EA for mirrored file (3) ========================================================== 05:10:10 (1713431410) SKIP: sanity-lfsck test_36c needs >= 3 OSTs SKIP 36c (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 37: LFSCK must skip an ORPHAN ======== 05:10:13 (1713431413) multiop /mnt/lustre/d37.sanity-lfsck/d0 vD_c TMPPIPE=/tmp/multiop_open_wait_pipe.7537 Started LFSCK on the device lustre-MDT0000: scrub namespace stat: cannot stat '/mnt/lustre/d37.sanity-lfsck/d0': No such file or directory PASS 37 (2s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 38: LFSCK does not break foreign file and the reverse is also true ========================================================== 05:10:16 (1713431416) striped dir -i0 -c2 -H crush /mnt/lustre/d38.sanity-lfsck lfm_magic: 0x0BD70BD0 lfm_length: 73 lfm_type: 0x00000000 (none) lfm_flags: 0x0000DA05 lfm_value: '821efe76-1d9f-4dd8-8fdd-abd20358de51@796d4b40-b7d0-463b-ba18-27dd1620bac6' lfs setstripe: setstripe error for '/mnt/lustre/d38.sanity-lfsck/f38.sanity-lfsck': stripe already set Started LFSCK on the device lustre-MDT0000: scrub namespace Started LFSCK on the device lustre-MDT0000: scrub layout post-lfsck checks of 
foreign file lfm_magic: 0x0BD70BD0 lfm_length: 73 lfm_type: 0x00000000 (none) lfm_flags: 0x0000DA05 lfm_value: '821efe76-1d9f-4dd8-8fdd-abd20358de51@796d4b40-b7d0-463b-ba18-27dd1620bac6' lfs setstripe: setstripe error for '/mnt/lustre/d38.sanity-lfsck/f38.sanity-lfsck': stripe already set cat: /mnt/lustre/d38.sanity-lfsck/f38.sanity-lfsck: No data available cat: write error: Bad file descriptor PASS 38 (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 39: LFSCK does not break foreign dir and the reverse is also true ========================================================== 05:10:21 (1713431421) striped dir -i1 -c2 -H all_char /mnt/lustre/d39.sanity-lfsck lfm_magic: 0x0CD50CD0 lfm_length: 73 lfm_type: 0x00000000 (none) lfm_flags: 0x0000DA05 lfm_value: 'c19e6063-4649-4b23-be3f-11dda4cc1616@b76972df-c457-408f-9aeb-a7b1bdbda807' touch: cannot touch '/mnt/lustre/d39.sanity-lfsck/d39.sanity-lfsck2/f39.sanity-lfsck': No data available Started LFSCK on the device lustre-MDT0000: scrub namespace Started LFSCK on the device lustre-MDT0000: scrub layout post-lfsck checks of foreign dir lfm_magic: 0x0CD50CD0 lfm_length: 73 lfm_type: 0x00000000 (none) lfm_flags: 0x0000DA05 lfm_value: 'c19e6063-4649-4b23-be3f-11dda4cc1616@b76972df-c457-408f-9aeb-a7b1bdbda807' touch: cannot touch '/mnt/lustre/d39.sanity-lfsck/d39.sanity-lfsck2/f39.sanity-lfsck': No data available PASS 39 (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 40a: LFSCK correctly fixes lmm_oi in composite layout ========================================================== 05:10:26 (1713431426) Migrate /mnt/lustre/d40a.sanity-lfsck/dir1 from MDT1 to MDT0 trigger LFSCK for layout Started LFSCK on the device lustre-MDT0000: scrub layout PASS 40a (3s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 41: SEL support in LFSCK ============ 05:10:31 
(1713431431) debug=+lfsck trigger LFSCK for SEL layout Started LFSCK on the device lustre-MDT0000: scrub layout namespace debug=super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck PASS 41 (5s) debug_raw_pointers=0 debug_raw_pointers=0 debug_raw_pointers=Y debug_raw_pointers=Y == sanity-lfsck test 42: LFSCK can repair inconsistent MDT-object/OST-object encryption flags ========================================================== 05:10:37 (1713431437) ##### If the MDT-object has the encryption flag but the OST-object does not, add it to the OST-object. ##### SKIP: sanity-lfsck test_42 client encryption not supported SKIP 42 (1s) debug_raw_pointers=0 debug_raw_pointers=0 debug=0 Stopping clients: oleg413-client.virtnet /mnt/lustre (opts:) Stopping client oleg413-client.virtnet /mnt/lustre opts: Stopping clients: oleg413-client.virtnet /mnt/lustre2 (opts:) Stopping /mnt/lustre-mds1 (opts:-f) on oleg413-server Stopping /mnt/lustre-mds2 (opts:-f) on oleg413-server Stopping /mnt/lustre-ost1 (opts:-f) on oleg413-server Stopping /mnt/lustre-ost2 (opts:-f) on oleg413-server unloading modules on: 'oleg413-server' oleg413-server: oleg413-server.virtnet: executing unload_modules_local modules unloaded. 
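The pattern that recurs throughout this log (set `fail_loc` to inject a corruption, clear it, then trigger and monitor an LFSCK run) can be reproduced by hand with `lctl`. A minimal sketch, assuming a live Lustre test filesystem with an MDT named lustre-MDT0000 and root access on the MDS; this is an illustrative admin sequence, not part of the test script:

```shell
# Sketch only: requires a running Lustre setup; device name is assumed.
# Inject a failure stub, e.g. the bad-name-hash case used by tests 31a/31b.
lctl set_param fail_loc=0x1628 fail_val=0

# ... create test entries here so the injected corruption lands on disk ...

# Clear the injection, then start a namespace LFSCK scan on the MDT.
lctl set_param fail_loc=0 fail_val=0
lctl lfsck_start -M lustre-MDT0000 -t namespace

# Inspect scan status and repair counters when it completes.
lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
```

The "repaired 1 name entries with bad hash" message in test 31b corresponds to a repair counter in that `lfsck_namespace` status output (field names vary by Lustre version).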
=== sanity-lfsck: start setup 05:11:17 (1713431477) === Stopping clients: oleg413-client.virtnet /mnt/lustre (opts:-f) Stopping clients: oleg413-client.virtnet /mnt/lustre2 (opts:-f) pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 2 oleg413-server: oleg413-server.virtnet: executing set_hostid Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs Force libcfs to create 2 CPU partitions ../libcfs/libcfs/libcfs options: 'cpu_npartitions=2' ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1' quota/lquota options: 'hash_lqs_cur_bits=3' loading modules on: 'oleg413-server' oleg413-server: oleg413-server.virtnet: executing load_modules_local oleg413-server: Loading modules from /home/green/git/lustre-release/lustre oleg413-server: detected 4 online CPUs by sysfs oleg413-server: Force libcfs to create 2 CPU partitions oleg413-server: ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1' oleg413-server: quota/lquota options: 'hash_lqs_cur_bits=3' Formatting mgs, mds, osts Format mds1: /dev/vdc pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Format mds2: /dev/vdd pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Format ost1: /dev/vde pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Format ost2: /dev/vdf pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Checking servers environments Checking clients oleg413-client.virtnet environments Loading modules from /home/green/git/lustre-release/lustre detected 4 online CPUs by sysfs Force libcfs to create 2 CPU partitions loading modules on: 'oleg413-server' oleg413-server: oleg413-server.virtnet: executing load_modules_local oleg413-server: Loading modules from 
/home/green/git/lustre-release/lustre oleg413-server: detected 4 online CPUs by sysfs oleg413-server: Force libcfs to create 2 CPU partitions Setup mgs, mdt, osts pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Commit the device label on /dev/vdc Started lustre-MDT0000 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting mds2: -o localrecov /dev/mapper/mds2_flakey /mnt/lustre-mds2 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Commit the device label on /dev/vdd Started lustre-MDT0001 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting ost1: -o localrecov /dev/mapper/ost1_flakey /mnt/lustre-ost1 seq.cli-lustre-OST0000-super.width=65536 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Commit the device label on /dev/vde Started lustre-OST0000 pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Starting ost2: -o localrecov /dev/mapper/ost2_flakey /mnt/lustre-ost2 seq.cli-lustre-OST0001-super.width=65536 oleg413-server: oleg413-server.virtnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all pdsh@oleg413-client: oleg413-server: ssh exited with exit code 1 Commit the device label on /dev/vdf Started lustre-OST0001 Starting client: oleg413-client.virtnet: -o user_xattr,flock oleg413-server@tcp:/lustre /mnt/lustre Starting client oleg413-client.virtnet: -o user_xattr,flock 
oleg413-server@tcp:/lustre /mnt/lustre Started clients oleg413-client.virtnet: 192.168.204.113@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,noencrypt,statfs_project) Using TIMEOUT=20 osc.lustre-OST0000-osc-ffff880136aa9800.idle_timeout=debug osc.lustre-OST0001-osc-ffff880136aa9800.idle_timeout=debug setting jobstats to procname_uid Setting lustre.sys.jobid_var from disable to procname_uid Waiting 90s for 'procname_uid' Updated after 4s: want 'procname_uid' got 'procname_uid' disable quota as required osd-ldiskfs.track_declares_assert=1 === sanity-lfsck: finish setup 05:12:28 (1713431548) === == sanity-lfsck test complete, duration 1908 sec ========= 05:12:28 (1713431548) === sanity-lfsck: start cleanup 05:12:28 (1713431548) === === sanity-lfsck: finish cleanup 05:12:28 (1713431548) ===