All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* btrfs errors over NFS
@ 2017-10-12 17:44 Steve Leung
  2017-10-13  2:36 ` Steve Leung
  2017-10-13  9:14 ` Nikolay Borisov
  0 siblings, 2 replies; 5+ messages in thread
From: Steve Leung @ 2017-10-12 17:44 UTC (permalink / raw
  To: linux-btrfs


Hi list,

  TL;DR: ran into some btrfs errors and weird behaviour, but things generally seem 
  to work.  Just posting some details in case it helps devs or other users.

I've run into a btrfs error trying to do a -j8 build of android on a btrfs 
filesystem exported over NFSv3.  That in itself might be unwise, but should be 
legal.  :)  This is on 64-bit Arch, kernel 4.13.3, and the filesystem is RAID1 on 4 
devices, with nearly 4TiB unallocated.

The build itself failed when part of the build process noticed that a file got 
created with mode 000.  i.e. in "ls -l" the mode column looked like ?---------.

That much happened without any errors in dmesg, though the NFS client complained 
about some inodes having a mode of 0.

Then when I tried to delete those files, I got the error below:

[21476.546060] BTRFS error (device sdc1): err add delayed dir index item(index: 17) into the deletion tree of the delayed node(root id: 21780, inode id: 15668343, errno: -17)
[21476.546194] ------------[ cut here ]------------
[21476.546196] kernel BUG at fs/btrfs/delayed-inode.c:1552!
[21476.546231] invalid opcode: 0000 [#1] PREEMPT SMP
[21476.546258] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_filter tun coretemp kvm_intel kvm iTCO_wdt ppdev gpio_ich iTCO_vendor_support evdev i915 input_leds psmouse pcspkr irqbypass i2c_i801 snd_hda_codec_idt snd_hda_codec_generic mac_hid lpc_ich tpm_tis tpm_tis_core parport_pc tpm parport video drm_kms_helper drm syscopyarea winbond_cir sysfillrect sysimgblt rc_core led_class snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer shpchp snd button e1000e soundcore intel_agp intel_gtt acpi_cpufreq mei_me mei fb_sys_fops i2c_algo_bit ptp pps_core sch_fq_codel nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc ip_tables x_tables crc32c_generic btrfs xor raid6_pq sr_mod cdrom
[21476.546623]  sd_mod hid_generic usbhid hid serio_raw atkbd libps2 uhci_hcd pata_it821x ahci libahci libata firewire_ohci scsi_mod firewire_core crc_itu_t i8042 serio ehci_pci ehci_hcd usbcore usb_common
[21476.546725] CPU: 0 PID: 984 Comm: rm Not tainted 4.13.3-1-ARCH #1
[21476.546756] Hardware name:                  /DG33TL, BIOS DPP3510J.86A.0572.2009.0715.2346 07/15/2009
[21476.546801] task: ffff943668131b80 task.stack: ffffaa3840da4000
[21476.546854] RIP: 0010:btrfs_delete_delayed_dir_index+0x24d/0x330 [btrfs]
[21476.546889] RSP: 0018:ffffaa3840da7d18 EFLAGS: 00010286
[21476.546918] RAX: 0000000000000000 RBX: ffff94367c074b90 RCX: 0000000000000000
[21476.546953] RDX: 0000000000000000 RSI: ffff9436abc0dc78 RDI: ffff9436abc0dc78
[21476.546989] RBP: ffffaa3840da7d88 R08: 0000000000000323 R09: 0000000000000000
[21476.547025] R10: ffffaa3840da7bd8 R11: 00000000ffffffff R12: ffff943583503100
[21476.547060] R13: ffff94367c074bd8 R14: ffff94367c074bd0 R15: ffff943583503218
[21476.547096] FS:  00007f57da0f6500(0000) GS:ffff9436abc00000(0000) knlGS:0000000000000000
[21476.547138] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21476.547167] CR2: 00007f57d9c809c0 CR3: 000000005e39b000 CR4: 00000000000006f0
[21476.547203] Call Trace:
[21476.547235]  __btrfs_unlink_inode+0x1bc/0x550 [btrfs]
[21476.547278]  btrfs_unlink_inode+0x1c/0x50 [btrfs]
[21476.547317]  btrfs_unlink+0x88/0xe0 [btrfs]
[21476.547343]  vfs_unlink+0x111/0x1c0
[21476.547365]  ? __lookup_hash+0x22/0xa0
[21476.547389]  do_unlinkat+0x2b1/0x310
[21476.547412]  SyS_unlinkat+0x1b/0x30
[21476.547434]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[21476.547460] RIP: 0033:0x7f57d9c303d7
[21476.547481] RSP: 002b:00007fff4bc1a6b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[21476.547520] RAX: ffffffffffffffda RBX: 00005644fee69540 RCX: 00007f57d9c303d7
[21476.547556] RDX: 0000000000000000 RSI: 00005644fee68310 RDI: 00000000ffffff9c
[21476.547591] RBP: 00005644fee69648 R08: 0000000000000003 R09: 0000000000000000
[21476.547627] R10: 0000000000000100 R11: 0000000000000246 R12: 00005644fee68280
[21476.547662] R13: 00007fff4bc1a7f0 R14: 00005644fd37094f R15: 0000000000000000
[21476.547700] Code: ff ff ff 48 8b 43 10 4c 8b 03 41 b9 ef ff ff ff 48 8b 55 b0 48 8b 7d a8 48 c7 c6 b0 99 3f c0 48 8b 88 38 03 00 00 e8 f3 0d f7 ff <0f> 0b 48 8b 4d a8 49 8b 7e 38 8b 81 70 10 00 00 48 8d b1 c8 01 
[21476.547840] RIP: btrfs_delete_delayed_dir_index+0x24d/0x330 [btrfs] RSP: ffffaa3840da7d18
[21476.549333] ---[ end trace 41039f5467d88882 ]---

After that, a bunch of processes started hanging and I eventually stopped being 
able to do useful things on the machine, so I hard rebooted it.

It booted back up fine.  I tried scrubbing the filesystem at that point and scrub 
returned with no errors.  I then noticed 4.13.5 had some btrfs fixes so I upgraded 
to that.  Now, I still have one bogus file in a directory - it's in the directory 
listing but nothing can stat the file to change it or remove it.  i.e.:

  > ls -l
  /bin/ls: cannot access 'tab_unselected_pressed_holo.9.png': No such file or directory
  total 0
  -????????? ? ? ? ?            ? tab_unselected_pressed_holo.9.png

  > stat tab_unselected_pressed_holo.9.png 
  stat: cannot stat 'tab_unselected_pressed_holo.9.png': No such file or directory

  > rm tab_unselected_pressed_holo.9.png 
  rm: cannot remove 'tab_unselected_pressed_holo.9.png': No such file or directory

  > cp /dev/null tab_unselected_pressed_holo.9.png 
  > ls -l
  total 0
  -rw-r--r-- 1 steve steve 0 Oct 12 10:43 tab_unselected_pressed_holo.9.png
  -rw-r--r-- 1 steve steve 0 Oct 12 10:43 tab_unselected_pressed_holo.9.png

  > rm tab_unselected_pressed_holo.9.png 

  > ls -l
  /bin/ls: cannot access 'tab_unselected_pressed_holo.9.png': No such file or directory
  total 0
  -????????? ? ? ? ?            ? tab_unselected_pressed_holo.9.png

I'm guessing a btrfs check would fix this but haven't gotten around to running it 
yet.

After moving aside the dir with the bogus file, I was able to complete the Android 
build successfully, but noticed these warnings in the log afterward:

[11319.762849] ------------[ cut here ]------------
[11319.762883] WARNING: CPU: 1 PID: 451 at fs/btrfs/ctree.h:1559 btrfs_update_device+0x1be/0x1d0 [btrfs]
[11319.762884] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_filter tun coretemp kvm_intel iTCO_wdt iTCO_vendor_support i915 gpio_ich ppdev kvm video i2c_algo_bit drm_kms_helper snd_hda_codec_idt snd_hda_codec_generic irqbypass drm syscopyarea psmouse sysfillrect snd_hda_intel pcspkr i2c_i801 e1000e mei_me sysimgblt snd_hda_codec evdev fb_sys_fops input_leds mei lpc_ich snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore parport_pc parport mac_hid ptp pps_core shpchp winbond_cir rc_core intel_agp intel_gtt led_class agpgart tpm_tis tpm_tis_core tpm button acpi_cpufreq nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc sch_fq_codel ip_tables x_tables crc32c_generic btrfs xor raid6_pq hid_generic
[11319.762937]  sr_mod cdrom sd_mod usbhid hid serio_raw atkbd libps2 uhci_hcd firewire_ohci pata_it821x ahci libahci libata firewire_core scsi_mod crc_itu_t i8042 serio ehci_pci ehci_hcd usbcore usb_common
[11319.762957] CPU: 1 PID: 451 Comm: nfsd Not tainted 4.13.5-1-ARCH #1
[11319.762959] Hardware name:                  /DG33TL, BIOS DPP3510J.86A.0572.2009.0715.2346 07/15/2009
[11319.762960] task: ffff90dce9e5b700 task.stack: ffff9e1fc1448000
[11319.762975] RIP: 0010:btrfs_update_device+0x1be/0x1d0 [btrfs]
[11319.762977] RSP: 0018:ffff9e1fc144b468 EFLAGS: 00010206
[11319.762978] RAX: 0000000000000fff RBX: ffff90dd227d7a80 RCX: 000003a3816d1e00
[11319.762980] RDX: 0000000000000004 RSI: 0000000000007e98 RDI: ffff90dc990fbb58
[11319.762981] RBP: ffff9e1fc144b4b0 R08: 0000000000007e9c R09: ffff9e1fc144b420
[11319.762982] R10: 0000000000001000 R11: 0000000000000007 R12: ffff90dd21f77c00
[11319.762984] R13: 0000000000000000 R14: 0000000000007e78 R15: ffff90dc990fbb58
[11319.762985] FS:  0000000000000000(0000) GS:ffff90dd2bc80000(0000) knlGS:0000000000000000
[11319.762987] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11319.762988] CR2: 00005572c6c61c90 CR3: 000000016741a000 CR4: 00000000000006e0
[11319.762990] Call Trace:
[11319.763008]  btrfs_finish_chunk_alloc+0x136/0x4f0 [btrfs]
[11319.763020]  ? btrfs_free_path.part.31+0x21/0x30 [btrfs]
[11319.763033]  btrfs_create_pending_block_groups+0x135/0x250 [btrfs]
[11319.763045]  ? btrfs_create_pending_block_groups+0x135/0x250 [btrfs]
[11319.763060]  __btrfs_end_transaction+0x8e/0x2e0 [btrfs]
[11319.763231]  btrfs_end_transaction+0x10/0x20 [btrfs]
[11319.763243]  find_free_extent+0xc05/0x1020 [btrfs]
[11319.763257]  btrfs_reserve_extent+0x9b/0x190 [btrfs]
[11319.763272]  cow_file_range.isra.67+0x13a/0x420 [btrfs]
[11319.763287]  run_delalloc_range+0x359/0x380 [btrfs]
[11319.763302]  ? find_lock_delalloc_range.constprop.56+0x1cf/0x1f0 [btrfs]
[11319.763317]  writepage_delalloc.isra.45+0xbc/0x170 [btrfs]
[11319.763332]  __extent_writepage+0x17c/0x320 [btrfs]
[11319.763348]  extent_write_cache_pages.constprop.53+0x1cd/0x460 [btrfs]
[11319.763364]  extent_writepages+0x5d/0x90 [btrfs]
[11319.763378]  ? btrfs_releasepage+0x20/0x20 [btrfs]
[11319.763392]  btrfs_writepages+0x28/0x30 [btrfs]
[11319.763396]  do_writepages+0x48/0xd0
[11319.763400]  ? exportfs_decode_fh+0x100/0x390
[11319.763404]  __filemap_fdatawrite_range+0xc1/0x100
[11319.763405]  ? __filemap_fdatawrite_range+0xc1/0x100
[11319.763408]  ? iter_div_u64_rem+0x20/0x20
[11319.763411]  filemap_fdatawrite_range+0x13/0x20
[11319.763425]  btrfs_fdatawrite_range+0x20/0x50 [btrfs]
[11319.763440]  start_ordered_ops+0x19/0x30 [btrfs]
[11319.763454]  btrfs_sync_file+0x76/0x420 [btrfs]
[11319.763457]  ? get_empty_filp+0xc9/0x1b0
[11319.763461]  vfs_fsync_range+0x4b/0xb0
[11319.763473]  nfsd_commit+0xb1/0xe0 [nfsd]
[11319.763481]  nfsd3_proc_commit+0x6e/0xd0 [nfsd]
[11319.763487]  nfsd_dispatch+0xfb/0x230 [nfsd]
[11319.763499]  svc_process_common+0x4a6/0x680 [sunrpc]
[11319.763507]  svc_process+0xec/0x1c0 [sunrpc]
[11319.763513]  nfsd+0xf4/0x160 [nfsd]
[11319.763516]  kthread+0x125/0x140
[11319.763522]  ? nfsd_destroy+0x70/0x70 [nfsd]
[11319.763524]  ? kthread_create_on_node+0x70/0x70
[11319.763528]  ret_from_fork+0x25/0x30
[11319.763530] Code: 4c 89 ff 45 31 c0 ba 10 00 00 00 4c 89 f6 e8 fa 21 ff ff 4c 89 ff e8 62 f3 fc ff e9 d3 fe ff ff 41 bd f4 ff ff ff e9 d0 fe ff ff <0f> ff eb b6 e8 d9 e3 c5 d0 66 0f 1f 84 00 00 00 00 00 66 66 66 
[11319.763568] ---[ end trace 40295351501208ff ]---

[11330.152285] ------------[ cut here ]------------
[11330.152319] WARNING: CPU: 0 PID: 203 at fs/btrfs/ctree.h:1559 btrfs_update_device+0x1be/0x1d0 [btrfs]
[11330.152320] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_filter tun coretemp kvm_intel iTCO_wdt iTCO_vendor_support i915 gpio_ich ppdev kvm video i2c_algo_bit drm_kms_helper snd_hda_codec_idt snd_hda_codec_generic irqbypass drm syscopyarea psmouse sysfillrect snd_hda_intel pcspkr i2c_i801 e1000e mei_me sysimgblt snd_hda_codec evdev fb_sys_fops input_leds mei lpc_ich snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore parport_pc parport mac_hid ptp pps_core shpchp winbond_cir rc_core intel_agp intel_gtt led_class agpgart tpm_tis tpm_tis_core tpm button acpi_cpufreq nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc sch_fq_codel ip_tables x_tables crc32c_generic btrfs xor raid6_pq hid_generic
[11330.152389]  sr_mod cdrom sd_mod usbhid hid serio_raw atkbd libps2 uhci_hcd firewire_ohci pata_it821x ahci libahci libata firewire_core scsi_mod crc_itu_t i8042 serio ehci_pci ehci_hcd usbcore usb_common
[11330.152409] CPU: 0 PID: 203 Comm: btrfs-cleaner Tainted: G        W       4.13.5-1-ARCH #1
[11330.152410] Hardware name:                  /DG33TL, BIOS DPP3510J.86A.0572.2009.0715.2346 07/15/2009
[11330.152411] task: ffff90dd221ec4c0 task.stack: ffff9e1fc13f4000
[11330.152427] RIP: 0010:btrfs_update_device+0x1be/0x1d0 [btrfs]
[11330.152428] RSP: 0018:ffff9e1fc13f7d28 EFLAGS: 00010206
[11330.152430] RAX: 0000000000000fff RBX: ffff90dd227d7c40 RCX: 000003a3816d1e00
[11330.152431] RDX: 0000000000000004 RSI: 0000000000007e98 RDI: ffff90dcf76e2578
[11330.152432] RBP: ffff9e1fc13f7d70 R08: 0000000000007e9c R09: ffff9e1fc13f7ce0
[11330.152434] R10: 0000000000001000 R11: 0000000000000007 R12: ffff90dd21f77c00
[11330.152435] R13: 0000000000000000 R14: 0000000000007e78 R15: ffff90dcf76e2578
[11330.152437] FS:  0000000000000000(0000) GS:ffff90dd2bc00000(0000) knlGS:0000000000000000
[11330.152438] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11330.152440] CR2: 00007fec55264d30 CR3: 00000001703bf000 CR4: 00000000000006f0
[11330.152441] Call Trace:
[11330.152459]  btrfs_remove_chunk+0x365/0x890 [btrfs]
[11330.152474]  btrfs_delete_unused_bgs+0x30d/0x3b0 [btrfs]
[11330.152488]  cleaner_kthread+0x165/0x180 [btrfs]
[11330.152493]  kthread+0x125/0x140
[11330.152507]  ? __btree_submit_bio_start+0x20/0x20 [btrfs]
[11330.152513]  ? kthread_create_on_node+0x70/0x70
[11330.152516]  ret_from_fork+0x25/0x30
[11330.152518] Code: 4c 89 ff 45 31 c0 ba 10 00 00 00 4c 89 f6 e8 fa 21 ff ff 4c 89 ff e8 62 f3 fc ff e9 d3 fe ff ff 41 bd f4 ff ff ff e9 d0 fe ff ff <0f> ff eb b6 e8 d9 e3 c5 d0 66 0f 1f 84 00 00 00 00 00 66 66 66 
[11330.152556] ---[ end trace 4029535150120900 ]---

That first warning appeared 13 times in the logs, and the second one appeared only 
once.

I haven't rebooted the machine since observing these warnings.  Things seem to be 
working ok.  Please let me know if there's anything interesting to be learned from 
it.

Thanks!

Steve

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: btrfs errors over NFS
  2017-10-12 17:44 btrfs errors over NFS Steve Leung
@ 2017-10-13  2:36 ` Steve Leung
  2017-10-13 16:39   ` Peter Grandi
  2017-10-13  9:14 ` Nikolay Borisov
  1 sibling, 1 reply; 5+ messages in thread
From: Steve Leung @ 2017-10-13  2:36 UTC (permalink / raw
  To: linux-btrfs

On 10/12/2017 11:44 AM, Steve Leung wrote:
> Hi list,
> 
>    TL;DR: ran into some btrfs errors and weird behaviour, but things generally seem
>    to work.  Just posting some details in case it helps devs or other users.
> 
> I've run into a btrfs error trying to do a -j8 build of android on a btrfs
> filesystem exported over NFSv3.  That in itself might be unwise, but should be
> legal.  :)  This is on 64-bit Arch, kernel 4.13.3, and the filesystem is RAID1 on 4
> devices, with nearly 4TiB unallocated.
> 
> The build itself failed when part of the build process noticed that a file got
> created with mode 000.  i.e. in "ls -l" the mode column looked like ?---------.
> 
....
> 
> It booted back up fine.  I tried scrubbing the filesystem at that point and scrub
> returned with no errors.  I then noticed 4.13.5 had some btrfs fixes so I upgraded
> to that.  Now, I still have one bogus file in a directory - it's in the directory
> listing but nothing can stat the file to change it or remove it.  i.e.:
> 
>    > ls -l
>    /bin/ls: cannot access 'tab_unselected_pressed_holo.9.png': No such file or directory
>    total 0
>    -????????? ? ? ? ?            ? tab_unselected_pressed_holo.9.png
> 
>    > stat tab_unselected_pressed_holo.9.png
>    stat: cannot stat 'tab_unselected_pressed_holo.9.png': No such file or directory
> 
>    > rm tab_unselected_pressed_holo.9.png
>    rm: cannot remove 'tab_unselected_pressed_holo.9.png': No such file or directory
> 
>    > cp /dev/null tab_unselected_pressed_holo.9.png
>    > ls -l
>    total 0
>    -rw-r--r-- 1 steve steve 0 Oct 12 10:43 tab_unselected_pressed_holo.9.png
>    -rw-r--r-- 1 steve steve 0 Oct 12 10:43 tab_unselected_pressed_holo.9.png
> 
>    > rm tab_unselected_pressed_holo.9.png
> 
>    > ls -l
>    /bin/ls: cannot access 'tab_unselected_pressed_holo.9.png': No such file or directory
>    total 0
>    -????????? ? ? ? ?            ? tab_unselected_pressed_holo.9.png
> 
> I'm guessing a btrfs check would fix this but haven't gotten around to running it
> yet.

Just looked through the archives and noticed this is something
Tomasz Chmielewski posted about a week or so ago.  But that thread 
didn't seem to come to a definite conclusion.

I'm also in the situation where it's not that easy for me to blow away 
the whole subvol that holds this file, but I can at least move its 
directory to somewhere unnoticeable.  Until I can get around to trying 
btrfs check anyway.

Steve

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: btrfs errors over NFS
  2017-10-12 17:44 btrfs errors over NFS Steve Leung
  2017-10-13  2:36 ` Steve Leung
@ 2017-10-13  9:14 ` Nikolay Borisov
       [not found]   ` <e110503c-ae55-8f8c-1097-3ccbb7a4d35d@shaw.ca>
  1 sibling, 1 reply; 5+ messages in thread
From: Nikolay Borisov @ 2017-10-13  9:14 UTC (permalink / raw
  To: Steve Leung, linux-btrfs



On 12.10.2017 20:44, Steve Leung wrote:
> Hi list,
> 
>   TL;DR: ran into some btrfs errors and weird behaviour, but things generally seem 
>   to work.  Just posting some details in case it helps devs or other users.
> 
> I've run into a btrfs error trying to do a -j8 build of android on a btrfs 
> filesystem exported over NFSv3.  That in itself might be unwise, but should be 
> legal.  :)  This is on 64-bit Arch, kernel 4.13.3, and the filesystem is RAID1 on 4 
> devices, with nearly 4TiB unallocated.
> 
> The build itself failed when part of the build process noticed that a file got 
> created with mode 000.  i.e. in "ls -l" the mode column looked like ?---------.
> 
> That much happened without any errors in dmesg, though the NFS client complained 
> about some inodes having a mode of 0.
> 
> Then when I tried to delete those files, I got the error below:
> 
> [21476.546060] BTRFS error (device sdc1): err add delayed dir index item(index: 17) into the deletion tree of the delayed node(root id: 21780, inode id: 15668343, errno: -17)
> [21476.546194] ------------[ cut here ]------------
> [21476.546196] kernel BUG at fs/btrfs/delayed-inode.c:1552!
> [21476.546231] invalid opcode: 0000 [#1] PREEMPT SMP
> [21476.546258] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_filter tun coretemp kvm_intel kvm iTCO_wdt ppdev gpio_ich iTCO_vendor_support evdev i915 input_leds psmouse pcspkr irqbypass i2c_i801 snd_hda_codec_idt snd_hda_codec_generic mac_hid lpc_ich tpm_tis tpm_tis_core parport_pc tpm parport video drm_kms_helper drm syscopyarea winbond_cir sysfillrect sysimgblt rc_core led_class snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer shpchp snd button e1000e soundcore intel_agp intel_gtt acpi_cpufreq mei_me mei fb_sys_fops i2c_algo_bit ptp pps_core sch_fq_codel nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc ip_tables x_tables crc32c_generic btrfs xor raid6_pq sr_mod cdrom
> [21476.546623]  sd_mod hid_generic usbhid hid serio_raw atkbd libps2 uhci_hcd pata_it821x ahci libahci libata firewire_ohci scsi_mod firewire_core crc_itu_t i8042 serio ehci_pci ehci_hcd usbcore usb_common
> [21476.546725] CPU: 0 PID: 984 Comm: rm Not tainted 4.13.3-1-ARCH #1
> [21476.546756] Hardware name:                  /DG33TL, BIOS DPP3510J.86A.0572.2009.0715.2346 07/15/2009
> [21476.546801] task: ffff943668131b80 task.stack: ffffaa3840da4000
> [21476.546854] RIP: 0010:btrfs_delete_delayed_dir_index+0x24d/0x330 [btrfs]
> [21476.546889] RSP: 0018:ffffaa3840da7d18 EFLAGS: 00010286
> [21476.546918] RAX: 0000000000000000 RBX: ffff94367c074b90 RCX: 0000000000000000
> [21476.546953] RDX: 0000000000000000 RSI: ffff9436abc0dc78 RDI: ffff9436abc0dc78
> [21476.546989] RBP: ffffaa3840da7d88 R08: 0000000000000323 R09: 0000000000000000
> [21476.547025] R10: ffffaa3840da7bd8 R11: 00000000ffffffff R12: ffff943583503100
> [21476.547060] R13: ffff94367c074bd8 R14: ffff94367c074bd0 R15: ffff943583503218
> [21476.547096] FS:  00007f57da0f6500(0000) GS:ffff9436abc00000(0000) knlGS:0000000000000000
> [21476.547138] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [21476.547167] CR2: 00007f57d9c809c0 CR3: 000000005e39b000 CR4: 00000000000006f0
> [21476.547203] Call Trace:
> [21476.547235]  __btrfs_unlink_inode+0x1bc/0x550 [btrfs]
> [21476.547278]  btrfs_unlink_inode+0x1c/0x50 [btrfs]
> [21476.547317]  btrfs_unlink+0x88/0xe0 [btrfs]
> [21476.547343]  vfs_unlink+0x111/0x1c0
> [21476.547365]  ? __lookup_hash+0x22/0xa0
> [21476.547389]  do_unlinkat+0x2b1/0x310
> [21476.547412]  SyS_unlinkat+0x1b/0x30
> [21476.547434]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> [21476.547460] RIP: 0033:0x7f57d9c303d7
> [21476.547481] RSP: 002b:00007fff4bc1a6b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
> [21476.547520] RAX: ffffffffffffffda RBX: 00005644fee69540 RCX: 00007f57d9c303d7
> [21476.547556] RDX: 0000000000000000 RSI: 00005644fee68310 RDI: 00000000ffffff9c
> [21476.547591] RBP: 00005644fee69648 R08: 0000000000000003 R09: 0000000000000000
> [21476.547627] R10: 0000000000000100 R11: 0000000000000246 R12: 00005644fee68280
> [21476.547662] R13: 00007fff4bc1a7f0 R14: 00005644fd37094f R15: 0000000000000000
> [21476.547700] Code: ff ff ff 48 8b 43 10 4c 8b 03 41 b9 ef ff ff ff 48 8b 55 b0 48 8b 7d a8 48 c7 c6 b0 99 3f c0 48 8b 88 38 03 00 00 e8 f3 0d f7 ff <0f> 0b 48 8b 4d a8 49 8b 7e 38 8b 81 70 10 00 00 48 8d b1 c8 01 
> [21476.547840] RIP: btrfs_delete_delayed_dir_index+0x24d/0x330 [btrfs] RSP: ffffaa3840da7d18
> [21476.549333] ---[ end trace 41039f5467d88882 ]---
> 

Well, the main thing here is that you hit the BUG() in btrfs_delete_delayed_dir_index. 
So this happened because you tried to delete a file, which was already staged for deletion. 

So which directory corresponds to inode 15668343, presumably the android build dir?

If you could print the output of the following commands that could help in narrowing down the problem: 

btrfs inspect-internal dump-tree /dev/sdc1 | grep -A3 "(15668343 DIR_INDEX"

That will print all the DIR_INDEX items of the directory, and I will be particularly 
interested in item with index 17. Which should print something like: 

	item 12 key (257 DIR_INDEX 17) itemoff 15624 itemsize 35
		location key (259 INODE_ITEM 0) type FILE
		namelen 5 datalen 0 name: file2

Your numbers will be different of course (apart from the index).

At which point you can get the inode item corresponding to this file : 

btrfs inspect-internal dump-tree /dev/sdc1 | grep -A4 "(259 INODE_ITEM 0) itemoff" 
where the stuff in the brackets should be the same as the contents of () following 
"location key" string. 

> After that, a bunch of processes started hanging and I eventually stopped being 
> able to do useful things on the machine, so I hard rebooted it.
> 
> It booted back up fine.  I tried scrubbing the filesystem at that point and scrub 
> returned with no errors.  I then noticed 4.13.5 had some btrfs fixes so I upgraded 
> to that.  Now, I still have one bogus file in a directory - it's in the directory 
> listing but nothing can stat the file to change it or remove it.  i.e.:
> 
>   > ls -l
>   /bin/ls: cannot access 'tab_unselected_pressed_holo.9.png': No such file or directory
>   total 0
>   -????????? ? ? ? ?            ? tab_unselected_pressed_holo.9.png
> 
>   > stat tab_unselected_pressed_holo.9.png 
>   stat: cannot stat 'tab_unselected_pressed_holo.9.png': No such file or directory
> 
>   > rm tab_unselected_pressed_holo.9.png 
>   rm: cannot remove 'tab_unselected_pressed_holo.9.png': No such file or directory
> 
>   > cp /dev/null tab_unselected_pressed_holo.9.png 
>   > ls -l
>   total 0
>   -rw-r--r-- 1 steve steve 0 Oct 12 10:43 tab_unselected_pressed_holo.9.png
>   -rw-r--r-- 1 steve steve 0 Oct 12 10:43 tab_unselected_pressed_holo.9.png
> 
>   > rm tab_unselected_pressed_holo.9.png 
> 
>   > ls -l
>   /bin/ls: cannot access 'tab_unselected_pressed_holo.9.png': No such file or directory
>   total 0
>   -????????? ? ? ? ?            ? tab_unselected_pressed_holo.9.png
> 
> I'm guessing a btrfs check would fix this but haven't gotten around to running it 
> yet.
> 
> After moving aside the dir with the bogus file, I was able to complete the Android 
> build successfully, but noticed these warnings in the log afterward:
> 
> [11319.762849] ------------[ cut here ]------------
> [11319.762883] WARNING: CPU: 1 PID: 451 at fs/btrfs/ctree.h:1559 btrfs_update_device+0x1be/0x1d0 [btrfs]
> [11319.762884] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_filter tun coretemp kvm_intel iTCO_wdt iTCO_vendor_support i915 gpio_ich ppdev kvm video i2c_algo_bit drm_kms_helper snd_hda_codec_idt snd_hda_codec_generic irqbypass drm syscopyarea psmouse sysfillrect snd_hda_intel pcspkr i2c_i801 e1000e mei_me sysimgblt snd_hda_codec evdev fb_sys_fops input_leds mei lpc_ich snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore parport_pc parport mac_hid ptp pps_core shpchp winbond_cir rc_core intel_agp intel_gtt led_class agpgart tpm_tis tpm_tis_core tpm button acpi_cpufreq nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc sch_fq_codel ip_tables x_tables crc32c_generic btrfs xor raid6_pq hid_generic
> [11319.762937]  sr_mod cdrom sd_mod usbhid hid serio_raw atkbd libps2 uhci_hcd firewire_ohci pata_it821x ahci libahci libata firewire_core scsi_mod crc_itu_t i8042 serio ehci_pci ehci_hcd usbcore usb_common
> [11319.762957] CPU: 1 PID: 451 Comm: nfsd Not tainted 4.13.5-1-ARCH #1
> [11319.762959] Hardware name:                  /DG33TL, BIOS DPP3510J.86A.0572.2009.0715.2346 07/15/2009
> [11319.762960] task: ffff90dce9e5b700 task.stack: ffff9e1fc1448000
> [11319.762975] RIP: 0010:btrfs_update_device+0x1be/0x1d0 [btrfs]
> [11319.762977] RSP: 0018:ffff9e1fc144b468 EFLAGS: 00010206
> [11319.762978] RAX: 0000000000000fff RBX: ffff90dd227d7a80 RCX: 000003a3816d1e00
> [11319.762980] RDX: 0000000000000004 RSI: 0000000000007e98 RDI: ffff90dc990fbb58
> [11319.762981] RBP: ffff9e1fc144b4b0 R08: 0000000000007e9c R09: ffff9e1fc144b420
> [11319.762982] R10: 0000000000001000 R11: 0000000000000007 R12: ffff90dd21f77c00
> [11319.762984] R13: 0000000000000000 R14: 0000000000007e78 R15: ffff90dc990fbb58
> [11319.762985] FS:  0000000000000000(0000) GS:ffff90dd2bc80000(0000) knlGS:0000000000000000
> [11319.762987] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [11319.762988] CR2: 00005572c6c61c90 CR3: 000000016741a000 CR4: 00000000000006e0
> [11319.762990] Call Trace:
> [11319.763008]  btrfs_finish_chunk_alloc+0x136/0x4f0 [btrfs]
> [11319.763020]  ? btrfs_free_path.part.31+0x21/0x30 [btrfs]
> [11319.763033]  btrfs_create_pending_block_groups+0x135/0x250 [btrfs]
> [11319.763045]  ? btrfs_create_pending_block_groups+0x135/0x250 [btrfs]
> [11319.763060]  __btrfs_end_transaction+0x8e/0x2e0 [btrfs]
> [11319.763231]  btrfs_end_transaction+0x10/0x20 [btrfs]
> [11319.763243]  find_free_extent+0xc05/0x1020 [btrfs]
> [11319.763257]  btrfs_reserve_extent+0x9b/0x190 [btrfs]
> [11319.763272]  cow_file_range.isra.67+0x13a/0x420 [btrfs]
> [11319.763287]  run_delalloc_range+0x359/0x380 [btrfs]
> [11319.763302]  ? find_lock_delalloc_range.constprop.56+0x1cf/0x1f0 [btrfs]
> [11319.763317]  writepage_delalloc.isra.45+0xbc/0x170 [btrfs]
> [11319.763332]  __extent_writepage+0x17c/0x320 [btrfs]
> [11319.763348]  extent_write_cache_pages.constprop.53+0x1cd/0x460 [btrfs]
> [11319.763364]  extent_writepages+0x5d/0x90 [btrfs]
> [11319.763378]  ? btrfs_releasepage+0x20/0x20 [btrfs]
> [11319.763392]  btrfs_writepages+0x28/0x30 [btrfs]
> [11319.763396]  do_writepages+0x48/0xd0
> [11319.763400]  ? exportfs_decode_fh+0x100/0x390
> [11319.763404]  __filemap_fdatawrite_range+0xc1/0x100
> [11319.763405]  ? __filemap_fdatawrite_range+0xc1/0x100
> [11319.763408]  ? iter_div_u64_rem+0x20/0x20
> [11319.763411]  filemap_fdatawrite_range+0x13/0x20
> [11319.763425]  btrfs_fdatawrite_range+0x20/0x50 [btrfs]
> [11319.763440]  start_ordered_ops+0x19/0x30 [btrfs]
> [11319.763454]  btrfs_sync_file+0x76/0x420 [btrfs]
> [11319.763457]  ? get_empty_filp+0xc9/0x1b0
> [11319.763461]  vfs_fsync_range+0x4b/0xb0
> [11319.763473]  nfsd_commit+0xb1/0xe0 [nfsd]
> [11319.763481]  nfsd3_proc_commit+0x6e/0xd0 [nfsd]
> [11319.763487]  nfsd_dispatch+0xfb/0x230 [nfsd]
> [11319.763499]  svc_process_common+0x4a6/0x680 [sunrpc]
> [11319.763507]  svc_process+0xec/0x1c0 [sunrpc]
> [11319.763513]  nfsd+0xf4/0x160 [nfsd]
> [11319.763516]  kthread+0x125/0x140
> [11319.763522]  ? nfsd_destroy+0x70/0x70 [nfsd]
> [11319.763524]  ? kthread_create_on_node+0x70/0x70
> [11319.763528]  ret_from_fork+0x25/0x30
> [11319.763530] Code: 4c 89 ff 45 31 c0 ba 10 00 00 00 4c 89 f6 e8 fa 21 ff ff 4c 89 ff e8 62 f3 fc ff e9 d3 fe ff ff 41 bd f4 ff ff ff e9 d0 fe ff ff <0f> ff eb b6 e8 d9 e3 c5 d0 66 0f 1f 84 00 00 00 00 00 66 66 66 
> [11319.763568] ---[ end trace 40295351501208ff ]---
> 
> [11330.152285] ------------[ cut here ]------------
> [11330.152319] WARNING: CPU: 0 PID: 203 at fs/btrfs/ctree.h:1559 btrfs_update_device+0x1be/0x1d0 [btrfs]
> [11330.152320] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_filter tun coretemp kvm_intel iTCO_wdt iTCO_vendor_support i915 gpio_ich ppdev kvm video i2c_algo_bit drm_kms_helper snd_hda_codec_idt snd_hda_codec_generic irqbypass drm syscopyarea psmouse sysfillrect snd_hda_intel pcspkr i2c_i801 e1000e mei_me sysimgblt snd_hda_codec evdev fb_sys_fops input_leds mei lpc_ich snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore parport_pc parport mac_hid ptp pps_core shpchp winbond_cir rc_core intel_agp intel_gtt led_class agpgart tpm_tis tpm_tis_core tpm button acpi_cpufreq nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc sch_fq_codel ip_tables x_tables crc32c_generic btrfs xor raid6_pq hid_generic
> [11330.152389]  sr_mod cdrom sd_mod usbhid hid serio_raw atkbd libps2 uhci_hcd firewire_ohci pata_it821x ahci libahci libata firewire_core scsi_mod crc_itu_t i8042 serio ehci_pci ehci_hcd usbcore usb_common
> [11330.152409] CPU: 0 PID: 203 Comm: btrfs-cleaner Tainted: G        W       4.13.5-1-ARCH #1
> [11330.152410] Hardware name:                  /DG33TL, BIOS DPP3510J.86A.0572.2009.0715.2346 07/15/2009
> [11330.152411] task: ffff90dd221ec4c0 task.stack: ffff9e1fc13f4000
> [11330.152427] RIP: 0010:btrfs_update_device+0x1be/0x1d0 [btrfs]
> [11330.152428] RSP: 0018:ffff9e1fc13f7d28 EFLAGS: 00010206
> [11330.152430] RAX: 0000000000000fff RBX: ffff90dd227d7c40 RCX: 000003a3816d1e00
> [11330.152431] RDX: 0000000000000004 RSI: 0000000000007e98 RDI: ffff90dcf76e2578
> [11330.152432] RBP: ffff9e1fc13f7d70 R08: 0000000000007e9c R09: ffff9e1fc13f7ce0
> [11330.152434] R10: 0000000000001000 R11: 0000000000000007 R12: ffff90dd21f77c00
> [11330.152435] R13: 0000000000000000 R14: 0000000000007e78 R15: ffff90dcf76e2578
> [11330.152437] FS:  0000000000000000(0000) GS:ffff90dd2bc00000(0000) knlGS:0000000000000000
> [11330.152438] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [11330.152440] CR2: 00007fec55264d30 CR3: 00000001703bf000 CR4: 00000000000006f0
> [11330.152441] Call Trace:
> [11330.152459]  btrfs_remove_chunk+0x365/0x890 [btrfs]
> [11330.152474]  btrfs_delete_unused_bgs+0x30d/0x3b0 [btrfs]
> [11330.152488]  cleaner_kthread+0x165/0x180 [btrfs]
> [11330.152493]  kthread+0x125/0x140
> [11330.152507]  ? __btree_submit_bio_start+0x20/0x20 [btrfs]
> [11330.152513]  ? kthread_create_on_node+0x70/0x70
> [11330.152516]  ret_from_fork+0x25/0x30
> [11330.152518] Code: 4c 89 ff 45 31 c0 ba 10 00 00 00 4c 89 f6 e8 fa 21 ff ff 4c 89 ff e8 62 f3 fc ff e9 d3 fe ff ff 41 bd f4 ff ff ff e9 d0 fe ff ff <0f> ff eb b6 e8 d9 e3 c5 d0 66 0f 1f 84 00 00 00 00 00 66 66 66 
> [11330.152556] ---[ end trace 4029535150120900 ]---

Those warnings are mostly, harmless and they are invoked when your device's size isn't a multiple of of sectorsize (which for now can only be 4kb). The easiest way to get rid of them is to resize your volume and make it a multiple of 4k. There are currently pending changes in btrfs progs which will be able to do that automatically. In any case there is nothing to worry about them. 

> 
> That first warning appeared 13 times in the logs, and the second one appeared only 
> once.
> 
> I haven't rebooted the machine since observing these warnings.  Things seem to be 
> working ok.  Please let me know if there's anything interesting to be learned from 
> it.
> 
> Thanks!
> 
> Steve
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: btrfs errors over NFS
  2017-10-13  2:36 ` Steve Leung
@ 2017-10-13 16:39   ` Peter Grandi
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Grandi @ 2017-10-13 16:39 UTC (permalink / raw
  To: Linux fs Btrfs

>> TL;DR: ran into some btrfs errors and weird behaviour, but
>> things generally seem to work. Just posting some details in
>> case it helps devs or other users. [ ... ] I've run into a
>> btrfs error trying to do a -j8 build of android on a btrfs
>> filesystem exported over NFSv3. [ ... ]

I have an NFS server from Btrfs filesystem, and it is mostly
read-only and low-use, unlike a massive build, but so far it has
worked for me. The issue that was reported a while ago was that
the kernel NFS server does not report as errors to clients
checksum validation failures, just prints a warning, so for that
reason and a few others I switch to the Ganesha NFS server.

>From your stack traces I noticed that some go pretty deep so
maybe there is an issue with that (but on 'amd64' the kernel
stack is much bigger than it used to be on 'i386'). Another
possibility is that the volume got somewhat damaged for other
reasons (bugs, media errors, ...) and this is have further
consequences.

BTW 'errno 17' is "File exists", so perhaps there is a race
condition over NFS. The bogus files with mode 0 seem to me to be
bogus directory entries with no files linked to them, which
could be again the result of race conditions. The problems with
chunks allocation reported as "WARNING" are unfamiliar to me.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: btrfs errors over NFS
       [not found]   ` <e110503c-ae55-8f8c-1097-3ccbb7a4d35d@shaw.ca>
@ 2017-10-16 11:30     ` Nikolay Borisov
  0 siblings, 0 replies; 5+ messages in thread
From: Nikolay Borisov @ 2017-10-16 11:30 UTC (permalink / raw
  To: Steve Leung, linux-btrfs



On 16.10.2017 08:00, Steve Leung wrote:
> On 10/13/2017 03:14 AM, Nikolay Borisov wrote:
>>
>>
>> On 12.10.2017 20:44, Steve Leung wrote:
>>> I've run into a btrfs error trying to do a -j8 build of android on a
>>> btrfs
>>> filesystem exported over NFSv3.  That in itself might be unwise, but
>>> should be
>>> legal.  :)  This is on 64-bit Arch, kernel 4.13.3, and the filesystem
>>> is RAID1 on 4
>>> devices, with nearly 4TiB unallocated.
>>>
>>> The build itself failed when part of the build process noticed that a
>>> file got
>>> created with mode 000.  i.e. in "ls -l" the mode column looked like
>>> ?---------.
>>>
>>> That much happened without any errors in dmesg, though the NFS client
>>> complained
>>> about some inodes having a mode of 0.
>>>
>>> Then when I tried to delete those files, I got the error below:
>>>
>>> [21476.546060] BTRFS error (device sdc1): err add delayed dir index
>>> item(index: 17) into the deletion tree of the delayed node(root id:
>>> 21780, inode id: 15668343, errno: -17)
>>> [21476.546194] ------------[ cut here ]------------
>>> [21476.546196] kernel BUG at fs/btrfs/delayed-inode.c:1552!
>>> [21476.546231] invalid opcode: 0000 [#1] PREEMPT SMP
>>> [21476.546258] Modules linked in: ipt_MASQUERADE
>>> nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
>>> nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_filter tun coretemp
>>> kvm_intel kvm iTCO_wdt ppdev gpio_ich iTCO_vendor_support evdev i915
>>> input_leds psmouse pcspkr irqbypass i2c_i801 snd_hda_codec_idt
>>> snd_hda_codec_generic mac_hid lpc_ich tpm_tis tpm_tis_core parport_pc
>>> tpm parport video drm_kms_helper drm syscopyarea winbond_cir
>>> sysfillrect sysimgblt rc_core led_class snd_hda_intel snd_hda_codec
>>> snd_hda_core snd_hwdep snd_pcm snd_timer shpchp snd button e1000e
>>> soundcore intel_agp intel_gtt acpi_cpufreq mei_me mei fb_sys_fops
>>> i2c_algo_bit ptp pps_core sch_fq_codel nfsd auth_rpcgss oid_registry
>>> nfs_acl lockd grace sunrpc ip_tables x_tables crc32c_generic btrfs
>>> xor raid6_pq sr_mod cdrom
>>> [21476.546623]  sd_mod hid_generic usbhid hid serio_raw atkbd libps2
>>> uhci_hcd pata_it821x ahci libahci libata firewire_ohci scsi_mod
>>> firewire_core crc_itu_t i8042 serio ehci_pci ehci_hcd usbcore usb_common
>>> [21476.546725] CPU: 0 PID: 984 Comm: rm Not tainted 4.13.3-1-ARCH #1
>>> [21476.546756] Hardware name:                  /DG33TL, BIOS
>>> DPP3510J.86A.0572.2009.0715.2346 07/15/2009
>>> [21476.546801] task: ffff943668131b80 task.stack: ffffaa3840da4000
>>> [21476.546854] RIP: 0010:btrfs_delete_delayed_dir_index+0x24d/0x330
>>> [btrfs]
>>> [21476.546889] RSP: 0018:ffffaa3840da7d18 EFLAGS: 00010286
>>> [21476.546918] RAX: 0000000000000000 RBX: ffff94367c074b90 RCX:
>>> 0000000000000000
>>> [21476.546953] RDX: 0000000000000000 RSI: ffff9436abc0dc78 RDI:
>>> ffff9436abc0dc78
>>> [21476.546989] RBP: ffffaa3840da7d88 R08: 0000000000000323 R09:
>>> 0000000000000000
>>> [21476.547025] R10: ffffaa3840da7bd8 R11: 00000000ffffffff R12:
>>> ffff943583503100
>>> [21476.547060] R13: ffff94367c074bd8 R14: ffff94367c074bd0 R15:
>>> ffff943583503218
>>> [21476.547096] FS:  00007f57da0f6500(0000) GS:ffff9436abc00000(0000)
>>> knlGS:0000000000000000
>>> [21476.547138] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [21476.547167] CR2: 00007f57d9c809c0 CR3: 000000005e39b000 CR4:
>>> 00000000000006f0
>>> [21476.547203] Call Trace:
>>> [21476.547235]  __btrfs_unlink_inode+0x1bc/0x550 [btrfs]
>>> [21476.547278]  btrfs_unlink_inode+0x1c/0x50 [btrfs]
>>> [21476.547317]  btrfs_unlink+0x88/0xe0 [btrfs]
>>> [21476.547343]  vfs_unlink+0x111/0x1c0
>>> [21476.547365]  ? __lookup_hash+0x22/0xa0
>>> [21476.547389]  do_unlinkat+0x2b1/0x310
>>> [21476.547412]  SyS_unlinkat+0x1b/0x30
>>> [21476.547434]  entry_SYSCALL_64_fastpath+0x1a/0xa5
>>> [21476.547460] RIP: 0033:0x7f57d9c303d7
>>> [21476.547481] RSP: 002b:00007fff4bc1a6b8 EFLAGS: 00000246 ORIG_RAX:
>>> 0000000000000107
>>> [21476.547520] RAX: ffffffffffffffda RBX: 00005644fee69540 RCX:
>>> 00007f57d9c303d7
>>> [21476.547556] RDX: 0000000000000000 RSI: 00005644fee68310 RDI:
>>> 00000000ffffff9c
>>> [21476.547591] RBP: 00005644fee69648 R08: 0000000000000003 R09:
>>> 0000000000000000
>>> [21476.547627] R10: 0000000000000100 R11: 0000000000000246 R12:
>>> 00005644fee68280
>>> [21476.547662] R13: 00007fff4bc1a7f0 R14: 00005644fd37094f R15:
>>> 0000000000000000
>>> [21476.547700] Code: ff ff ff 48 8b 43 10 4c 8b 03 41 b9 ef ff ff ff
>>> 48 8b 55 b0 48 8b 7d a8 48 c7 c6 b0 99 3f c0 48 8b 88 38 03 00 00 e8
>>> f3 0d f7 ff <0f> 0b 48 8b 4d a8 49 8b 7e 38 8b 81 70 10 00 00 48 8d
>>> b1 c8 01
>>> [21476.547840] RIP: btrfs_delete_delayed_dir_index+0x24d/0x330
>>> [btrfs] RSP: ffffaa3840da7d18
>>> [21476.549333] ---[ end trace 41039f5467d88882 ]---
>>>
>>
>> Well, the main thing here is that you hit the BUG() in
>> btrfs_delete_delayed_dir_index.
>> So this happened because you tried to delete a file, which was already
>> staged for deletion.
>>
>> So which directory corresponds to inode 15668343, presumably the
>> android build dir?
> 
> Hi Nikolay, thanks for looking at this.
> 
> Yup, this is definitely an Android build dir full of png files.
> 
>> If you could print the output of the following commands that could
>> help in narrowing down the problem:
>>
>> btrfs inspect-internal dump-tree /dev/sdc1 | grep -A3 "(15668343
>> DIR_INDEX"
>>
>> That will print all the DIR_INDEX items of the directory, and I will
>> be particularly
>> interested in item with index 17. Which should print something like:
> 
> I've attached the output.  I'm a bit surprised to find that the
> directory seems to have every index number in it besides 17. :)

Not having item with index 17 and the BUG_ON failure before. Means that the file was
scheduled for deletion. Was apparently deleted (since it's not on the disk), yet it 
somehow got queued up again for deletion, which caused the BUG_ON to be hit. That's a 
really odd failure scenario I have to admit. 

> 
> Possibly related due to unclean shutdowns - btrfs inspect internal also
> spits this out:
> 
>   parent transid verify failed on 4969828220928 wanted 3176121 found
> 3176123
>   parent transid verify failed on 4969828220928 wanted 3176121 found
> 3176123
>   parent transid verify failed on 4969828220928 wanted 3176121 found
> 3176123
>   parent transid verify failed on 4969828220928 wanted 3176121 found
> 3176123
>   Ignoring transid failure
>   WARNING: eb corrupted: item 4 eb level 0 next level 2, skipping the rest
>   parent transid verify failed on 4969825992704 wanted 3176121 found
> 3176123
>   parent transid verify failed on 4969825992704 wanted 3176121 found
> 3176123
>   parent transid verify failed on 4969825992704 wanted 3176121 found
> 3176123
>   parent transid verify failed on 4969825992704 wanted 3176121 found
> 3176123
>   Ignoring transid failure
>   WARNING: eb corrupted: item 1 eb level 0 next level 2, skipping the rest


Those error messages mean there is a discrepancy between the transactions which 
modified blocks 4969825992704 and 4969828220928 and their parent blocks. This 
is evidence there is internal filesystem inconsistency which might have been caused
by either a bug in the code or hardware failure. 

> 
>> btrfs inspect-internal dump-tree /dev/sdc1 | grep -A4 "(259 INODE_ITEM
>> 0) itemoff"
>> where the stuff in the brackets should be the same as the contents of
>> () following
>> "location key" string.
> 
> The problematic file (tab_unselected_pressed_holo.9.png) has inode
> number 15669905, but I'm unable to locate its details using btrfs
> inspect-internal.  I'm guessing that explains the weird behaviour with
> the directory thinking the file is there, but then nothing can find it.

So if there is no inode item which corresponds to this inode by e.g. 

btrfs inspect-internal dump-tree /dev/sdc1 | grep -A4 "(15669905 INODE_ITEM 0) itemoff"

But you are able to find dir index items by : 

btrfs inspect-internal dump-tree /dev/sdc1 | grep -B2 "name: tab_unselected_pressed_holo.9.png"

Then you are fs is indeed corrupted and you might want to run  btrfs check to see 
what else it finds. 


> 
> I've also started seeing some other weird behaviour from this machine -
> unexpected hangs and reboots.  It might be flaky hardware now, so I
> might need to look at replacing it.  In which case it might mean that
> whatever's gone wrong on the filesystem isn't btrfs's fault.

I'd advise to first rule out hw failure and then report back if you observe 
similar issues. 


> 
> Steve

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-10-16 11:30 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-12 17:44 btrfs errors over NFS Steve Leung
2017-10-13  2:36 ` Steve Leung
2017-10-13 16:39   ` Peter Grandi
2017-10-13  9:14 ` Nikolay Borisov
     [not found]   ` <e110503c-ae55-8f8c-1097-3ccbb7a4d35d@shaw.ca>
2017-10-16 11:30     ` Nikolay Borisov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.