From: dai.ngo@oracle.com
To: Olga Kornievskaia <aglo@umich.edu>
Cc: Chuck Lever <chuck.lever@oracle.com>,
	Jeff Layton <jlayton@kernel.org>,
	linux-nfs <linux-nfs@vger.kernel.org>
Subject: Re: help with nfsd kernel oops
Date: Fri, 12 Aug 2022 12:43:21 -0700
Message-ID: <e86bf23c-2260-5b2b-f6fc-7be22ec6d9d5@oracle.com>
In-Reply-To: <CAN-5tyFv+fPQ4T1HzWUAs7OcGvz_FyrAsAWwK8X=yu6yxP+Tcw@mail.gmail.com>


On 8/12/22 7:23 AM, Olga Kornievskaia wrote:
> On Thu, Aug 11, 2022 at 9:34 PM <dai.ngo@oracle.com> wrote:
>> On 8/10/22 5:50 PM, Olga Kornievskaia wrote:
>>> Hi folks (Chuck/Jeff specifically),
>>>
>>> We've had this outstanding kernel oops that happens (infrequently) in
>>> copy_offload testing (stack trace at the end). I've been trying to
>>> debug it for a while, adding printks and such. I can explain roughly
>>> what I'm seeing, but I need help (a) nailing down exactly what the
>>> problem is and (b) getting a hint on how to address it.
>>>
>>> Ok, so what happens. Test case: the source file is opened and locked
>>> (likewise for the destination file), a COPY_NOTIFY is sent to the
>>> source server, the copy is done (src to dst), the source file is
>>> unlocked (likewise for the destination file), and the files are
>>> closed. COPY/COPY_NOTIFY use a locking stateid.
>>>
>>> The unlocking triggers LOCKU and FREE_STATEID. The copy_notify
>>> stateid is associated with the locking stateid on the server. When
>>> the last reference on the locking stateid is dropped, nfs4_put_stid()
>>> also calls nfs4_free_cpntf_statelist(), which cleans up the
>>> copy_notify stateid.
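
For reference, that cleanup path looks roughly like this (paraphrased
from fs/nfsd/nfs4state.c around v5.19, abridged; details may differ):

void nfs4_put_stid(struct nfs4_stid *s)
{
	struct nfs4_file *fp = s->sc_file;
	struct nfs4_client *clp = s->sc_client;

	might_lock(&clp->cl_lock);

	if (!refcount_dec_and_lock(&s->sc_count, &clp->cl_lock)) {
		wake_up_all(&close_wq);
		return;
	}
	idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
	/* reaps any copy_notify stateids hanging off this stateid */
	nfs4_free_cpntf_statelist(clp->net, s);
	spin_unlock(&clp->cl_lock);
	s->sc_free(s);
	if (fp)
		put_nfs4_file(fp);
}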
>>>
>>> The laundromat thread has a failsafe: if for some reason the
>>> copy_notify state was not cleaned up and has expired, it is deleted
>>> there.
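
(That failsafe is the expiry sweep in nfs4_laundromat(); roughly, with
the type and expiry checks abridged:

	spin_lock(&nn->s2s_cp_lock);
	idr_for_each_entry(&nn->s2s_cp_stateids, cps_t, i) {
		cps = container_of(cps_t, struct nfs4_cpntf_state,
				   cp_stateid);
		if (state_expired(&lt, cps->cpntf_time))
			_free_cpntf_state_locked(nn, cps);
	}
	spin_unlock(&nn->s2s_cp_lock);

This is the walk that trips over the already-freed parent in the oops
below.)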
>>>
>>> However, in the failing case, where everything should be cleaned up
>>> as usual, I instead see a call to put_ol_stateid_locked() (before
>>> FREE_STATEID is processed), which cleans up the parent stateid but
>>> doesn't touch the copy_notify stateids. The laundromat thread then
>>> runs, walks the copy_notify list (since it's not empty), and tries
>>> to free the entries, which leads to this oops (since part of the
>>> memory, the parent stateid, was already freed via
>>> put_ol_stateid_locked()).
>>>
>>> Perhaps the fix is to add a call to nfs4_free_cpntf_statelist() in
>>> put_ol_stateid_locked(), which I tried and which seems to work. But
>>> I'm not confident about it.
>>>
>>> Thoughts?
>>>
>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>> index fa67ecd5fe63..b988d3c4e5fd 100644
>>> --- a/fs/nfsd/nfs4state.c
>>> +++ b/fs/nfsd/nfs4state.c
>>> @@ -1408,6 +1408,7 @@ static void put_ol_stateid_locked(struct nfs4_ol_stateid *stp,
>>>           }
>>>
>>>           idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
>>> +       nfs4_free_cpntf_statelist(clp->net, s);
>>>           list_add(&stp->st_locks, reaplist);
>>>    }
>>>
>>>
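
For context, here is put_ol_stateid_locked() with that addition in
place (paraphrased from fs/nfsd/nfs4state.c around v5.19; note the
early return when this isn't the last reference, which the proposed
call never reaches):

static void put_ol_stateid_locked(struct nfs4_ol_stateid *stp,
				  struct list_head *reaplist)
{
	struct nfs4_stid *s = &stp->st_stid;
	struct nfs4_client *clp = s->sc_client;

	lockdep_assert_held(&clp->cl_lock);

	WARN_ON_ONCE(!list_empty(&stp->st_locks));

	if (!refcount_dec_and_test(&s->sc_count)) {
		wake_up_all(&close_wq);
		return;		/* proposed cleanup is skipped here */
	}

	idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
	nfs4_free_cpntf_statelist(clp->net, s);	/* proposed addition */
	list_add(&stp->st_locks, reaplist);
}
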
>> In the use-after-free scenario, the copy_state is inserted on the
>> sc_cp_list of the lock state.
>>
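
The relationship between the two, abridged from fs/nfsd/state.h
(v5.19-era):

struct nfs4_stid {
	/* ... */
	struct list_head	sc_cp_list;	/* copy_notify children */
	/* ... */
};

struct nfs4_cpntf_state {
	copy_stateid_t		cp_stateid;
	struct list_head	cp_list;	/* on parent's sc_cp_list */
	stateid_t		cp_p_stateid;	/* copy of parent's stateid */
	struct nfs4_stid	*cp_p_stid;	/* pointer to parent */
	time64_t		cpntf_time;	/* last time stateid used */
};
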
>> So when put_ol_stateid_locked is called from nfsd4_close_open_stateid
>> with your proposed patch, nfs4_free_cpntf_statelist does not remove
>> any copy state, since 's' is the open stateid.
> I don't know about all the cases, but I'm 100% certain that "s" there
> is a locking stateid (with the Linux client, using nfstest_ssc). I've
> tested it and it works, and I have heavy debugging added to knfsd
> tracking all the pointers of the copy_notify stateids and their
> parents.

The fix for this problem requires the copy state to be freed when
nfsd4_close_open_stateid is called. With your patch, the copy state
is freed from:

	nfsd4_close_open_stateid
		unhash_open_stateid
			release_open_stateid_locks
				put_ol_stateid_locked
				nfs4_free_cpntf_statelist

With your patch, nfs4_free_cpntf_statelist is also called for v4.0
clients, which is unnecessary (v4.0 has no copy_notify state).

In addition, there are 5 places that call put_ol_stateid_locked:

release_open_stateid_locks
release_open_stateid
release_openowner
nfsd4_close_open_stateid
nfsd4_release_lockowner

Only the call to put_ol_stateid_locked from release_open_stateid_locks
is needed. The other 4 are unnecessary overhead.

My patch frees the copy state only from nfsd4_close_open_stateid, and
only for v4.x clients (i.e., not v4.0).
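
Roughly, the idea looks like this (a sketch of the approach just
described, based on the v5.19-era nfsd4_close_open_stateid(); not the
patch as posted):

static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s)
{
	struct nfs4_client *clp = s->st_stid.sc_client;
	struct nfs4_ol_stateid *stp;
	bool unhashed;
	LIST_HEAD(reaplist);

	spin_lock(&clp->cl_lock);
	unhashed = unhash_open_stateid(s, &reaplist);

	if (clp->cl_minorversion) {	/* skip for v4.0 clients */
		if (unhashed)
			put_ol_stateid_locked(s, &reaplist);
		spin_unlock(&clp->cl_lock);
		/* reap copy_notify state before the stateids are freed */
		list_for_each_entry(stp, &reaplist, st_locks)
			nfs4_free_cpntf_statelist(clp->net, &stp->st_stid);
		free_ol_stateid_reaplist(&reaplist);
	} else {
		spin_unlock(&clp->cl_lock);
		free_ol_stateid_reaplist(&reaplist);
		if (unhashed)
			move_to_close_lru(s, clp->net);
	}
}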

-Dai

>
>> Also, put_ol_stateid_locked can return early without calling
>> idr_remove and nfs4_free_cpntf_statelist (your proposed fix).
>>
>> -Dai
>>
>>>
>>> [  338.681529] ------------[ cut here ]------------
>>> [  338.683090] kernel BUG at lib/list_debug.c:53!
>>> [  338.684372] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
>>> [  338.685977] CPU: 1 PID: 493 Comm: kworker/u256:27 Tainted: G    B             5.19.0-rc6+ #104
>>> [  338.688266] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
>>> [  338.691019] Workqueue: nfsd4 laundromat_main [nfsd]
>>> [  338.692224] RIP: 0010:__list_del_entry_valid.cold.3+0x3d/0x53
>>> [  338.693626] Code: 0b 4c 89 e1 4c 89 ee 48 c7 c7 e0 1a e3 8f e8 5b 60 fe ff 0f 0b 48 89 e9 4c 89 ea 48 89 de 48 c7 c7 60 1a e3 8f e8 44 60 fe ff <0f> 0b 48 89 ea 48 89 de 48 c7 c7 00 1a e3 8f e8 30 60 fe ff 0f 0b
>>> [  338.697651] RSP: 0018:ffff88800d03fc68 EFLAGS: 00010286
>>> [  338.698762] RAX: 000000000000006d RBX: ffff888028a14798 RCX: 0000000000000000
>>> [  338.700442] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffffff917e9240
>>> [  338.702257] RBP: ffff88801bb0ae90 R08: ffffed100a795f0e R09: ffffed100a795f0e
>>> [  338.704016] R10: ffff888053caf86b R11: ffffed100a795f0d R12: ffff88801bb0ae90
>>> [  338.705703] R13: d9c0013300000a39 R14: 000000000000005a R15: ffff88801b9f5800
>>> [  338.707510] FS:  0000000000000000(0000) GS:ffff888053c80000(0000) knlGS:0000000000000000
>>> [  338.709319] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  338.710715] CR2: 00005640baab74d0 CR3: 0000000017574005 CR4: 00000000001706e0
>>> [  338.712282] Call Trace:
>>> [  338.712898]  <TASK>
>>> [  338.713430]  _free_cpntf_state_locked+0x6b/0x120 [nfsd]
>>> [  338.714806]  nfs4_laundromat+0x8ef/0xf30 [nfsd]
>>> [  338.716013]  ? dequeue_entity+0x18b/0x6c0
>>> [  338.716970]  ? release_lock_stateid+0x60/0x60 [nfsd]
>>> [  338.718169]  ? _raw_spin_unlock+0x15/0x30
>>> [  338.719064]  ? __switch_to+0x2fa/0x690
>>> [  338.719879]  ? __schedule+0x67d/0xf20
>>> [  338.720678]  laundromat_main+0x15/0x40 [nfsd]
>>> [  338.721760]  process_one_work+0x3b4/0x6b0
>>> [  338.722629]  worker_thread+0x5a/0x640
>>> [  338.723425]  ? process_one_work+0x6b0/0x6b0
>>> [  338.724324]  kthread+0x162/0x190
>>> [  338.725030]  ? kthread_complete_and_exit+0x20/0x20
>>> [  338.726074]  ret_from_fork+0x22/0x30
>>> [  338.726856]  </TASK>
