From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756560AbbFQNf7 (ORCPT );
	Wed, 17 Jun 2015 09:35:59 -0400
Received: from arcturus.aphlor.org ([188.246.204.175]:46972 "EHLO
	arcturus.aphlor.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756426AbbFQNfv (ORCPT );
	Wed, 17 Jun 2015 09:35:51 -0400
Date: Wed, 17 Jun 2015 09:35:41 -0400
From: Dave Jones
To: Chris Mason
Cc: dsterba@suse.cz, jbacik@fb.com, Linux Kernel
Subject: Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c
Message-ID: <20150617133541.GA24428@codemonkey.org.uk>
Mail-Followup-To: Dave Jones, Chris Mason, dsterba@suse.cz,
	jbacik@fb.com, Linux Kernel
References: <20150610134017.GA14803@codemonkey.org.uk>
	<55787743.4070308@fb.com>
	<20150616171443.GS6761@suse.cz>
	<55805A98.9020609@fb.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <55805A98.9020609@fb.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Score: -2.9 (--)
X-Spam-Report: Spam report generated by SpamAssassin on "arcturus.aphlor.org"
	Content analysis details: (-2.9 points, 5.0 required)
	pts  rule name    description
	---- ------------ --------------------------------------------------
	-1.0 ALL_TRUSTED  Passed through trusted hosts only via SMTP
	-1.9 BAYES_00     BODY: Bayes spam probability is 0 to 1%
	                  [score: 0.0000]
X-Authenticated-User: davej@codemonkey.org.uk
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 16, 2015 at 01:19:20PM -0400, Chris Mason wrote:
> On 06/16/2015 01:14 PM, David Sterba wrote:
> > On Wed, Jun 10, 2015 at 01:43:31PM -0400, Chris Mason wrote:
> >> On 06/10/2015 09:40 AM, Dave Jones wrote:
> >>> Found this on serial console this morning. The machine had rebooted itself shortly
> >>> afterwards (surprising, given I don't have panic-on-oops or similar set).
> >>
> >> We had one other report of this a few months ago.
> >> Josef and I read through all of this and decided it was impossible,
> >> so someone else must be holding on to that page and unlocking it.
> >>
> >> (that someone else could easily be btrfs, just not in this code path)
> >
> > https://patchwork.kernel.org/patch/6478941/ looks like the fix, bug
> > symptoms match the "keywords", I haven't inspected it closely.
> >
>
> That one is in my integration-4.2 branch if you want to give it a shot.

I was sceptical about this being the same bug, and it looks like I was right..

page:ffffea00027cc640 count:4 mapcount:0 mapping:ffff8800af11d8a0 index:0x0
flags: 0x4000000000000846(error|referenced|active|private)
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
------------[ cut here ]------------
kernel BUG at mm/filemap.c:745!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 1 PID: 5931 Comm: trinity-c5 Not tainted 4.1.0-rc8-gelk-debug+ #2
task: ffff8800b9ec0000 ti: ffff8800843ec000 task.ti: ffff8800843ec000
RIP: 0010:[] [] unlock_page+0x7c/0x80
RSP: 0018:ffff8800843efa58 EFLAGS: 00010292
RAX: 0000000000000036 RBX: 0000000000001000 RCX: 0000000000000000
RDX: 0000000080000000 RSI: ffffffffb20c80c9 RDI: ffffffffb20c7ce4
RBP: ffff8800843efa58 R08: 0000000000000001 R09: 0000000000000d1d
R10: 000000000000037c R11: 0000000000000001 R12: ffffea00027cc640
R13: 0000000000000000 R14: 0000000000000fff R15: 0000000000000000
FS: 00007fc9c42b5700(0000) GS:ffff8800bf700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000050978000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Stack:
 ffff8800843efb68 ffffffffc02d06ec 0000000000000fff 0000100800000008
 ffff8800af11d548 0000000000000000 ffff8800843efab8 0000000000000fff
 0000000000000000 ffff88009f319000 ffff8800843efc08 ffff8800af11d728
Call Trace:
 [] __do_readpage+0x61c/0x7c0 [btrfs]
 [] ? lock_extent_bits+0x83/0x2e0 [btrfs]
 [] ? get_parent_ip+0x11/0x50
 [] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
 [] ? btrfs_lookup_ordered_extent+0x9a/0xd0 [btrfs]
 [] __extent_read_full_page+0xc5/0xe0 [btrfs]
 [] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
 [] extent_read_full_page+0x37/0x60 [btrfs]
 [] btrfs_readpage+0x25/0x30 [btrfs]
 [] prepare_uptodate_page+0x4a/0x90 [btrfs]
 [] prepare_pages+0x101/0x190 [btrfs]
 [] __btrfs_buffered_write+0x1d3/0x650 [btrfs]
 [] btrfs_file_write_iter+0x463/0x570 [btrfs]
 [] ? bad_area+0x4a/0x60
 [] __vfs_write+0xb1/0xf0
 [] vfs_write+0xa9/0x1b0
 [] SyS_pwrite64+0x72/0xb0
 [] ? syscall_trace_enter_phase2+0x220/0x260
 [] ? syscall_trace_leave+0x95/0x140
 [] tracesys_phase2+0x84/0x89
Code: 10 48 d3 ee 48 8d 0c b6 48 89 c6 48 8d 3c ca 31 d2 e8 29 ca f4 ff 5d c3 0f 1f 80 00 00 00 00 48 c7 c6 c0 ed a2 b2 e8 f4 84 02 00 <0f> 0b 66 90 66 66 66 66 90 55 85 f6 48 89 e5 75 13 85 d2 74 3f
RIP [] unlock_page+0x7c/0x80

Still haven't managed to narrow down a reproducer, but it shows up
consistently within 6 hrs or so of fuzzing.

	Dave