Panic when trying to remove files from _one_ of my datasets #16939

Open · arzeth opened this issue Jan 9, 2025 · 1 comment
Labels: Type: Defect (incorrect behavior, e.g. crash, hang)

arzeth commented Jan 9, 2025

System information

Type               Version/Name
Distribution Name  Arch Linux
Kernel Version     linux-xanmod-edge 6.12.5-1 PREEMPT_RT
Architecture       x86_64
OpenZFS Version    2.2.7
CPU                Ryzen 5 2600
RAM                DDR4 3200 MHz, 32 GB × 2, no ECC
MB                 MSI B450M PRO-M2 MAX

GRUB options: nospectre_v2 split_lock_mitigate=0 zfs.spa_slop_shift=6 zfs.zfs_dmu_offset_next_sync=0 zfs.zfs_txg_timeout=20 zfs.zio_taskq_batch_pct=42 zfs.zfs_arc_max=1610612736 nvme.max_host_mem_size_mb=256 scsi_mod.use_blk_mq=1. (I read somewhere that zfs_dmu_offset_next_sync=0 supposedly decreases the chance of corruption; txg_timeout=20 is because the constant noise of my Seagate Skyhawk HDD was unbearable; zio_taskq_batch_pct=42 is to keep my PC from becoming unresponsive during heavy writes to datasets with compression=zstd-19; and I limited the ARC to 1.5 GiB to save RAM.)
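
For what it's worth, the same zfs.* parameters could equivalently be set in a modprobe.d file instead of on the kernel command line; a minimal sketch, with the values copied from above (the parameter names are the standard OpenZFS module options, i.e. the same names without the zfs. prefix):

  # /etc/modprobe.d/zfs.conf
  options zfs spa_slop_shift=6
  options zfs zfs_dmu_offset_next_sync=0
  options zfs zfs_txg_timeout=20
  options zfs zio_taskq_batch_pct=42
  options zfs zfs_arc_max=1610612736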

I've been using this config (meaning the RT kernel) since 2024-12-15, i.e. for 23 days. It worked until today, except that waking from suspend failed in most cases (complete lock-up). Probably NVIDIA's driver is at fault (they don't officially support RT)? No idea. Lock-ups also occurred at random times, but only after many hours of uptime. A few times, immediately after a successful wake-up, I saw ZFS-related traces mentioning something about txg and unlock, but I ignored them...

My pools:

  • NVMe SSD: one pool named zroot;
  • HDD: three pools.

All pools have ashift=12 and properly aligned partitions.
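
A quick way to double-check both claims (standard commands; the device name comes from the zpool status output further below):

  zpool get ashift zroot                      # should print 12
  parted /dev/nvme0n1 align-check optimal 4   # partition 4 = nvme0n1p4, zroot's vdev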

So, 2 days ago I was seeding torrents (using files on the HDD), and then I decided to load an LLM (an 8 GB .gguf file on an unencrypted partition on the SSD; llama.cpp). During loading, my PC suddenly became fully unresponsive (no SSH, and even Alt+PrtScn+B, i.e. SysRq reboot, did nothing). At that moment the following was happening:

  • many random reads on the HDD (many encrypted datasets with compression) by qBittorrent;
  • one sequential read of a big file (mmap'ed) on the SSD (pool zroot, unencrypted dataset zroot/data/m/c/rseq without compression) by llama.cpp;
  • small writes on the SSD (pool zroot, encrypted dataset zroot/data/home) by qBittorrent (metadata) and Firefox;
  • all while running a non-stock RT kernel.

I found the timing of the lock-up suspicious, so today I decided to repeat the same scenario. This time the LLM did get fully loaded (in ~28 seconds), but 5 seconds later my PC became unresponsive again (as if there were some connection to zfs.zfs_txg_timeout=20).
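
(If anyone wants to test that hunch: the per-pool txg kstat is standard OpenZFS, so one could watch txg syncs while reproducing:

  watch -n1 cat /proc/spl/kstat/zfs/zroot/txgs   # per-txg timing history for pool zroot
)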

I rebooted my PC, selected the stock LTS kernel, then I saw:

Arch Linux 6.6.68-1-lts (tty1)

arzeth-old pc login: arzeth (automatic login)
Last login Wed Jan  8 10:37:02 on tty1
[   65.596542] VERIFY3(0 == zap_add_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx)) failed (0 == 5)
[   65.596573] PANIC at zfs_dir.c:464:zfs_unlinked_add()
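
(For context: zfs_unlinked_add() adds the object number of a file whose last link was removed into the dataset's "unlinked set" ZAP, also known as the delete queue, and error 5 is EIO, so the ZAP update itself failed with an I/O error. A read-only way to inspect that ZAP, assuming standard zdb behavior; <obj#> is whatever number the first command reports:

  zdb -dddd zroot/data/home 1        # master node ZAP; lists DELETE_QUEUE = <obj#>
  zdb -dddd zroot/data/home <obj#>   # dumps the unlinked-set ZAP the panic touched
)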

Afterwards, I booted from a LiveCD (with an even earlier kernel: Linux sysrescue 6.1.53-1-lts #1 SMP PREEMPT_DYNAMIC Wed, 13 Sep 2023 09:32:00 +0000 x86_64 GNU/Linux), installed ZFS 2.2.7 (the same, latest version), mounted all pools, ran zpool scrub zroot, and then zpool status zroot:

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:12:38 with 0 errors on Wed Jan  8 15:19:16 2025
config:

	NAME         STATE     READ WRITE CKSUM
	zroot        ONLINE       0     0     0
	  nvme0n1p4  ONLINE       0     0     0

errors: No known data errors
[root@sysrescue /tmp]#  smartctl -a /dev/nvme0
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.1.53-1-lts] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       m.2 Smartbuy PS5013-2280T 1024GB
Serial Number:                      296E079B18FC00010017
Firmware Version:                   EDFM00E3
PCI Vendor/Subsystem ID:            0x1987
IEEE OUI Identifier:                0x6479a7
Total NVM Capacity:                 1,024,209,543,168 [1.02 TB]
Unallocated NVM Capacity:           0
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size:     4096
Namespace 1 IEEE EUI-64:            6479a7 2ae2673137
Local Time is:                      Wed Jan  8 20:16:53 2025 UTC
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x001f):   Security Format Frmw_DL NS_Mngmt Self_Test
Optional NVM Commands (0x005e):     Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         64 Pages
Warning  Comp. Temp. Threshold:     68 Celsius
Critical Comp. Temp. Threshold:     70 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     4.50W       -        -    0  0  0  0        0       0
 1 +     2.70W       -        -    1  1  1  1        0       0
 2 +     2.16W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3     1000    1000
 4 -   0.0020W       -        -    4  4  4  4     5000   60000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 -     512       0         1
 1 +    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        25 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    11%
Data Units Read:                    351,607,959 [180 TB]
Data Units Written:                 157,926,176 [80.8 TB]
Host Read Commands:                 3,239,887,842
Host Write Commands:                5,191,310,103
Controller Busy Time:               92,845
Power Cycles:                       1,910
Power On Hours:                     27,620
Unsafe Shutdowns:                   201
Media and Data Integrity Errors:    0
Error Information Log Entries:      351,361
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 16 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS  Message
  0     351361     0  0x0014  0x4005  0x028            0     0     -  Invalid Field in Command

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
No Self-tests Logged

The bad dataset:

[root@sysrescue /tmp/x]#  zfs get all zroot/data/home
NAME             PROPERTY              VALUE                  SOURCE
zroot/data/home  type                  filesystem             -
zroot/data/home  creation              Sun Nov 28 16:27 2021  -
zroot/data/home  used                  76.0G                  -
zroot/data/home  available             16.9G                  -
zroot/data/home  referenced            75.0G                  -
zroot/data/home  compressratio         1.47x                  -
zroot/data/home  mounted               yes                    -
zroot/data/home  quota                 none                   default
zroot/data/home  reservation           none                   default
zroot/data/home  recordsize            256K                   local
zroot/data/home  mountpoint            /home                  local
zroot/data/home  sharenfs              off                    default
zroot/data/home  checksum              on                     default
zroot/data/home  compression           zstd-17                local
zroot/data/home  atime                 off                    inherited from zroot/data
zroot/data/home  devices               off                    inherited from zroot
zroot/data/home  exec                  on                     default
zroot/data/home  setuid                on                     default
zroot/data/home  readonly              off                    default
zroot/data/home  zoned                 off                    default
zroot/data/home  snapdir               hidden                 default
zroot/data/home  aclmode               discard                default
zroot/data/home  aclinherit            restricted             default
zroot/data/home  createtxg             13                     -
zroot/data/home  canmount              on                     default
zroot/data/home  xattr                 sa                     inherited from zroot
zroot/data/home  copies                1                      default
zroot/data/home  version               5                      -
zroot/data/home  utf8only              on                     -
zroot/data/home  normalization         formD                  -
zroot/data/home  casesensitivity       sensitive              -
zroot/data/home  vscan                 off                    default
zroot/data/home  nbmand                off                    default
zroot/data/home  sharesmb              off                    default
zroot/data/home  refquota              none                   default
zroot/data/home  refreservation        none                   default
zroot/data/home  guid                  6560759384699375364    -
zroot/data/home  primarycache          all                    default
zroot/data/home  secondarycache        all                    default
zroot/data/home  usedbysnapshots       0B                     -
zroot/data/home  usedbydataset         75.0G                  -
zroot/data/home  usedbychildren        933M                   -
zroot/data/home  usedbyrefreservation  0B                     -
zroot/data/home  logbias               latency                default
zroot/data/home  objsetid              77                     -
zroot/data/home  dedup                 off                    local
zroot/data/home  mlslabel              none                   default
zroot/data/home  sync                  standard               default
zroot/data/home  dnodesize             legacy                 inherited from zroot
zroot/data/home  refcompressratio      1.47x                  -
zroot/data/home  written               75.0G                  -
zroot/data/home  logicalused           108G                   -
zroot/data/home  logicalreferenced     107G                   -
zroot/data/home  volmode               default                default
zroot/data/home  filesystem_limit      none                   default
zroot/data/home  snapshot_limit        none                   default
zroot/data/home  filesystem_count      none                   default
zroot/data/home  snapshot_count        none                   default
zroot/data/home  snapdev               hidden                 default
zroot/data/home  acltype               posix                  inherited from zroot
zroot/data/home  context               none                   default
zroot/data/home  fscontext             none                   default
zroot/data/home  defcontext            none                   default
zroot/data/home  rootcontext           none                   default
zroot/data/home  relatime              on                     inherited from zroot
zroot/data/home  redundant_metadata    all                    default
zroot/data/home  overlay               on                     default
zroot/data/home  encryption            aes-256-gcm            -
zroot/data/home  keylocation           none                   default
zroot/data/home  keyformat             passphrase             -
zroot/data/home  pbkdf2iters           350000                 -
zroot/data/home  encryptionroot        zroot                  -
zroot/data/home  keystatus             available              -
zroot/data/home  special_small_blocks  0                      default
zroot/data/home  prefetch              all                    default

Its pool:

[root@sysrescue /tmp/x]#  zpool get all zroot
NAME   PROPERTY                       VALUE                          SOURCE
zroot  size                           887G                           -
zroot  capacity                       94%                            -
zroot  altroot                        -                              default
zroot  health                         ONLINE                         -
zroot  guid                           5900436512089678044            -
zroot  version                        -                              default
zroot  bootfs                         zroot/ROOT/default             local
zroot  delegation                     on                             default
zroot  autoreplace                    off                            default
zroot  cachefile                      -                              default
zroot  failmode                       wait                           default
zroot  listsnapshots                  off                            default
zroot  autoexpand                     off                            default
zroot  dedupratio                     1.00x                          -
zroot  free                           44.5G                          -
zroot  allocated                      842G                           -
zroot  readonly                       off                            -
zroot  ashift                         12                             local
zroot  comment                        -                              default
zroot  expandsize                     -                              -
zroot  freeing                        0                              -
zroot  fragmentation                  58%                            -
zroot  leaked                         0                              -
zroot  multihost                      off                            default
zroot  checkpoint                     -                              -
zroot  load_guid                      18281612753199259114           -
zroot  autotrim                       off                            default
zroot  compatibility                  off                            default
zroot  bcloneused                     2.24M                          -
zroot  bclonesaved                    2.24M                          -
zroot  bcloneratio                    2.00x                          -
zroot  feature@async_destroy          enabled                        local
zroot  feature@empty_bpobj            active                         local
zroot  feature@lz4_compress           active                         local
zroot  feature@multi_vdev_crash_dump  enabled                        local
zroot  feature@spacemap_histogram     active                         local
zroot  feature@enabled_txg            active                         local
zroot  feature@hole_birth             active                         local
zroot  feature@extensible_dataset     active                         local
zroot  feature@embedded_data          active                         local
zroot  feature@bookmarks              enabled                        local
zroot  feature@filesystem_limits      enabled                        local
zroot  feature@large_blocks           active                         local
zroot  feature@large_dnode            enabled                        local
zroot  feature@sha512                 enabled                        local
zroot  feature@skein                  enabled                        local
zroot  feature@edonr                  enabled                        local
zroot  feature@userobj_accounting     active                         local
zroot  feature@encryption             active                         local
zroot  feature@project_quota          active                         local
zroot  feature@device_removal         enabled                        local
zroot  feature@obsolete_counts        enabled                        local
zroot  feature@zpool_checkpoint       enabled                        local
zroot  feature@spacemap_v2            active                         local
zroot  feature@allocation_classes     enabled                        local
zroot  feature@resilver_defer         enabled                        local
zroot  feature@bookmark_v2            enabled                        local
zroot  feature@redaction_bookmarks    enabled                        local
zroot  feature@redacted_datasets      enabled                        local
zroot  feature@bookmark_written       enabled                        local
zroot  feature@log_spacemap           active                         local
zroot  feature@livelist               enabled                        local
zroot  feature@device_rebuild         enabled                        local
zroot  feature@zstd_compress          active                         local
zroot  feature@draid                  enabled                        local
zroot  feature@zilsaxattr             active                         local
zroot  feature@head_errlog            active                         local
zroot  feature@blake3                 enabled                        local
zroot  feature@block_cloning          active                         local
zroot  feature@vdev_zaps_v2           active                         local

Then I backed up this dataset into another dataset... on the same pool... (tar -I 'zstd -15 --long -T12' -cf /m/el/r/home.tar.zst /home) successfully.
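
(An alternative backup route, assuming plain reads from the dataset still work, as the tar run suggests; the snapshot name and target path are illustrative:

  zfs snapshot zroot/data/home@rescue
  zfs send -w zroot/data/home@rescue > /m/el/r/home.zfs   # -w: raw send, preserves the encryption as-is
)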

Then I tried to use zsh under my user, but zsh tried to overwrite a file using mv, with a bad result (note the D+ state in ps aux):

  1000 40360 0.0 0.0 26600 764 pts/7 D+ 15:34 0:00 mv -f /home/arzeth/.zcompdump-sysrescue-5.9.0.1-dev.sysrescue.59 /home/arzeth/.zcompdump-sysrescue-5.9.0.1-dev

[ 7207.536498] VERIFY3(0 == zap_add_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx)) failed (0 == 5)
[ 7207.536507] PANIC at zfs_dir.c:464:zfs_unlinked_add()
[ 7207.536511] Showing stack for process 40360
[ 7207.536514] CPU: 9 PID: 40360 Comm: mv Tainted: P        W  OE      6.1.53-1-lts #1 3321f1751995b4e489a9f363253659a863626916
[ 7207.536520] Hardware name: Micro-Star International Co., Ltd. MS-7B84/B450M PRO-M2 MAX (MS-7B84), BIOS A.I0 04/27/2023
[ 7207.536523] Call Trace:
[ 7207.536526]  <TASK>
[ 7207.536530]  dump_stack_lvl+0x48/0x60
[ 7207.536540]  spl_panic+0xf4/0x10c [spl 45e036db99f8bb7928be41aea061122d23d0d2f4]
[ 7207.536570]  ? zap_add_int+0x86/0xb0 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 7207.536825]  zfs_unlinked_add+0x67/0x70 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 7207.537068]  zfs_link_destroy+0x3bc/0x440 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 7207.537308]  ? dmu_buf_will_dirty_impl+0x154/0x210 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 7207.537548]  zfs_rename+0x10e8/0x1730 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 7207.537797]  ? __legitimize_path+0x27/0x60
[ 7207.537806]  zpl_rename2+0xa7/0x130 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 7207.538043]  vfs_rename+0xa69/0xc10
[ 7207.538052]  ? do_renameat2+0x52d/0x5a0
[ 7207.538057]  do_renameat2+0x52d/0x5a0
[ 7207.538067]  __x64_sys_renameat2+0x4f/0x60
[ 7207.538073]  do_syscall_64+0x60/0x90
[ 7207.538078]  ? exc_page_fault+0x7c/0x180
[ 7207.538084]  entry_SYSCALL_64_after_hwframe+0x64/0xce
[ 7207.538090] RIP: 0033:0x7f091902ac8e
[ 7207.538115] Code: 80 18 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 49 89 ca b8 3c 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 49
[ 7207.538118] RSP: 002b:00007ffdd186cb98 EFLAGS: 00000206 ORIG_RAX: 000000000000013c
[ 7207.538124] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f091902ac8e
[ 7207.538127] RDX: 00000000ffffff9c RSI: 00007ffdd186dc64 RDI: 00000000ffffff9c
[ 7207.538130] RBP: 00007ffdd186cd10 R08: 0000000000000000 R09: 0000000000000000
[ 7207.538132] R10: 00007ffdd186dc9f R11: 0000000000000206 R12: 00007ffdd186dc9f
[ 7207.538134] R13: 00007ffdd186dc64 R14: 00000000ffffff9c R15: 00000000ffffff9c
[ 7207.538141]  </TASK>

This panic causes any process (even ls) that touches this dataset to hang forever in the D (uninterruptible sleep) state, as the ps output above shows.
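
(Such stuck processes can be listed with standard tools; D means uninterruptible sleep:

  ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'
)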

Then I decided to reboot (still using the LiveCD), because I wanted to know whether a panic would also occur when deleting an old file.

So I tried to rm a 3-year-old file (12 bytes) on this dataset. The error in dmesg is similar:

[ 2637.661779] VERIFY3(0 == zap_add_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx)) failed (0 == 5)
[ 2637.661786] PANIC at zfs_dir.c:464:zfs_unlinked_add()
[ 2637.661790] Showing stack for process 4398
[ 2637.661793] CPU: 0 PID: 4398 Comm: rm Tainted: P        W  OE      6.1.53-1-lts #1 3321f1751995b4e489a9f363253659a863626916
[ 2637.661799] Hardware name: Micro-Star International Co., Ltd. MS-7B84/B450M PRO-M2 MAX (MS-7B84), BIOS A.I0 04/27/2023
[ 2637.661802] Call Trace:
[ 2637.661806]  <TASK>
[ 2637.661811]  dump_stack_lvl+0x48/0x60
[ 2637.661822]  spl_panic+0xf4/0x10c [spl 45e036db99f8bb7928be41aea061122d23d0d2f4]
[ 2637.661851]  ? zap_add_int+0x86/0xb0 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 2637.662105]  zfs_unlinked_add+0x67/0x70 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 2637.662346]  zfs_remove+0x7f6/0xa20 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 2637.662587]  zpl_unlink+0x64/0xb0 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 2637.662823]  vfs_unlink+0x112/0x280
[ 2637.662830]  do_unlinkat+0x148/0x320
[ 2637.662838]  __x64_sys_unlinkat+0x37/0x70
[ 2637.662843]  do_syscall_64+0x60/0x90
[ 2637.662849]  ? handle_mm_fault+0xdf/0x2d0
[ 2637.662854]  ? syscall_exit_to_user_mode+0x2b/0x40
[ 2637.662860]  ? do_syscall_64+0x6c/0x90
[ 2637.662865]  ? exc_page_fault+0x7c/0x180
[ 2637.662871]  entry_SYSCALL_64_after_hwframe+0x64/0xce
[ 2637.662877] RIP: 0033:0x7fa9f0e5d52b
[ 2637.662898] Code: 77 05 c3 0f 1f 40 00 48 8b 15 01 98 13 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 07 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 97 13 00 f7 d8 64 89 01 48
[ 2637.662901] RSP: 002b:00007fff259ee6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[ 2637.662906] RAX: ffffffffffffffda RBX: 000055e40c2486c0 RCX: 00007fa9f0e5d52b
[ 2637.662909] RDX: 0000000000000000 RSI: 000055e40c2474a0 RDI: 00000000ffffff9c
[ 2637.662912] RBP: 000055e40c247410 R08: 000055e40c2474a0 R09: 00007fff259ee7ec
[ 2637.662914] R10: 000055e40c249630 R11: 0000000000000246 R12: 0000000000000000
[ 2637.662916] R13: 00007fff259ee7f0 R14: 0000000000000000 R15: 000055e40c2486c0
[ 2637.662922]  </TASK>

BTW, even after a panic occurs, I can still use other datasets in this pool (without rebooting).

I had 2 child datasets inside this dataset, so I ran another experiment: despite ZFS already having panicked, I tried to move one of them out of ..../home/ (I am not talking about the mountpoint): zfs rename zroot/data/home{/,_}arzeth_dev. I got a very similar message in dmesg. After the panic, I didn't reboot, and zfs list showed that the rename had succeeded (even though the zfs rename process itself became a zombie).

Then I tried to rename the other child dataset (without rebooting), but this time zfs list showed no changes, so I rebooted and ran zfs list again just in case (the first rename did indeed succeed). I haven't yet tried to zfs rename this remaining child dataset again.


A panic does not occur when creating directories and moving files into them (within this corrupted dataset).

So what should I do?

  1. Try zfs destroy zroot/data/home, but what if this irreversibly corrupts the pool due to a panic in the middle of the process?
  2. Recreate the pool (back up everything, zpool destroy zroot, zpool create, zfs create, cp, cp, cp)
  3. Something else?

Maybe also ban PREEMPT_RT in configure.ac, just as the recent 410287f banned unsupported kernel versions?
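
(Illustratively, such a guard could be a configure-time check; this is only a sketch, not actual OpenZFS configure.ac code, and $kernelbuild stands in for the detected kernel build directory:

  # refuse to build against PREEMPT_RT, analogous to the kernel version gate
  if grep -q '^CONFIG_PREEMPT_RT=y' "$kernelbuild/.config" 2>/dev/null; then
      echo "configure: error: PREEMPT_RT kernels are not supported" >&2
      exit 1
  fi
)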

amotin (Member) commented Jan 9, 2025

I would start with a memory check, considering the RAM is non-ECC, so as not to cause more damage if that is the cause. Then make sure you have a backup, which your second option already covers. Then I would guess you have a good chance of being able to destroy only the specific dataset, since the ZAP it crashed on is dataset-specific, but it is difficult to be sure without knowing exactly what is wrong with it.
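
(One way to act on that advice, sketched with real tools but in an order I am assuming; memtester and the archive-verification step are illustrative, not something amotin specified:

  memtester 8G 3                # quick userspace RAM test; memtest86+ from boot is more thorough
  zstd -t /m/el/r/home.tar.zst  # verify the existing backup archive is readable
  zfs destroy zroot/data/home   # destructive: only after the two checks above pass
)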
