From: Ryan Roberts
Date: Tue, 2 Apr 2024 08:54:42 +0100
Subject: Re: [PATCH 6.1 080/272] mm: swap: fix race between free_swap_and_cache() and swapoff()
To: Greg Kroah-Hartman, stable@vger.kernel.org
Cc: patches@lists.linux.dev, David Hildenbrand, "Huang, Ying", Andrew Morton, Sasha Levin
Message-ID: <582b72c3-e572-42e8-88ee-ce2705eceef2@arm.com>
In-Reply-To: <20240401152533.098385229@linuxfoundation.org>
References: <20240401152530.237785232@linuxfoundation.org> <20240401152533.098385229@linuxfoundation.org>

On 01/04/2024 16:44, Greg Kroah-Hartman wrote:
> 6.1-stable review patch. If anyone has any objections, please let me know.

LGTM!

>
> ------------------
>
> From: Ryan Roberts
>
> [ Upstream commit 82b1c07a0af603e3c47b906c8e991dc96f01688e ]
>
> There was previously a theoretical window where swapoff() could run and
> tear down a swap_info_struct while a call to free_swap_and_cache() was
> running in another thread. This could cause, amongst other bad
> possibilities, swap_page_trans_huge_swapped() (called by
> free_swap_and_cache()) to access the freed memory for swap_map.
>
> This is a theoretical problem and I haven't been able to provoke it from a
> test case. But there has been agreement based on code review that this is
> possible (see link below).
>
> Fix it by using get_swap_device()/put_swap_device(), which will stall
> swapoff(). There was an extra check in _swap_info_get() to confirm that
> the swap entry was not free. This isn't present in get_swap_device()
> because it doesn't make sense in general due to the race between getting
> the reference and swapoff.
> So I've added an equivalent check directly in free_swap_and_cache().
>
> Details of how to provoke one possible issue (thanks to David Hildenbrand
> for deriving this):
>
> --8<-----
>
> __swap_entry_free() might be the last user and result in
> "count == SWAP_HAS_CACHE".
>
> swapoff->try_to_unuse() will stop as soon as si->inuse_pages==0.
>
> So the question is: could someone reclaim the folio and turn
> si->inuse_pages==0, before we completed swap_page_trans_huge_swapped()?
>
> Imagine the following: 2 MiB folio in the swapcache. Only 2 subpages are
> still referenced by swap entries.
>
> Process 1 still references subpage 0 via swap entry.
> Process 2 still references subpage 1 via swap entry.
>
> Process 1 quits. Calls free_swap_and_cache().
> -> count == SWAP_HAS_CACHE
> [then, preempted in the hypervisor etc.]
>
> Process 2 quits. Calls free_swap_and_cache().
> -> count == SWAP_HAS_CACHE
>
> Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls
> __try_to_reclaim_swap().
>
> __try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
> put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
> swap_entry_free()->swap_range_free()->
> ...
> WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
>
> What stops swapoff from succeeding after process 2 reclaimed the swap cache
> but before process 1 finished its call to swap_page_trans_huge_swapped()?
>
> --8<-----
>
> Link: https://lkml.kernel.org/r/20240306140356.3974886-1-ryan.roberts@arm.com
> Fixes: 7c00bafee87c ("mm/swap: free swap slots in batch")
> Closes: https://lore.kernel.org/linux-mm/65a66eb9-41f8-4790-8db2-0c70ea15979f@redhat.com/
> Signed-off-by: Ryan Roberts
> Cc: David Hildenbrand
> Cc: "Huang, Ying"
> Cc:
> Signed-off-by: Andrew Morton
> Signed-off-by: Sasha Levin
> ---
>  mm/swapfile.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 324844f98d67c..0d6182db44a6a 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1229,6 +1229,11 @@ static unsigned char __swap_entry_free_locked(struct swap_info_struct *p,
>   * with get_swap_device() and put_swap_device(), unless the swap
>   * functions call get/put_swap_device() by themselves.
>   *
> + * Note that when only holding the PTL, swapoff might succeed immediately
> + * after freeing a swap entry. Therefore, immediately after
> + * __swap_entry_free(), the swap info might become stale and should not
> + * be touched without a prior get_swap_device().
> + *
>   * Check whether swap entry is valid in the swap device. If so,
>   * return pointer to swap_info_struct, and keep the swap entry valid
>   * via preventing the swap device from being swapoff, until
> @@ -1630,13 +1635,19 @@ int free_swap_and_cache(swp_entry_t entry)
>  	if (non_swap_entry(entry))
>  		return 1;
>
> -	p = _swap_info_get(entry);
> +	p = get_swap_device(entry);
>  	if (p) {
> +		if (WARN_ON(data_race(!p->swap_map[swp_offset(entry)]))) {
> +			put_swap_device(p);
> +			return 0;
> +		}
> +
>  		count = __swap_entry_free(p, entry);
>  		if (count == SWAP_HAS_CACHE &&
>  		    !swap_page_trans_huge_swapped(p, entry))
>  			__try_to_reclaim_swap(p, swp_offset(entry),
>  					      TTRS_UNMAPPED | TTRS_FULL);
> +		put_swap_device(p);
>  	}
>  	return p != NULL;
>  }
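
For reference, the whole fix boils down to the usual get_swap_device()/put_swap_device()
bracketing around any access to the swap_info_struct / swap_map. A minimal sketch of that
pattern follows; swap_entry_in_use() is a hypothetical helper used only for illustration
(it is not in mm/swapfile.c), and error handling beyond the NULL check is elided:

	/*
	 * Illustrative only: pin the swap device so a concurrent swapoff()
	 * cannot free si->swap_map while we inspect the entry.
	 */
	static bool swap_entry_in_use(swp_entry_t entry)
	{
		struct swap_info_struct *si;
		bool in_use;

		si = get_swap_device(entry);	/* NULL if swapped off or entry invalid */
		if (!si)
			return false;

		in_use = data_race(si->swap_map[swp_offset(entry)]) != 0;

		put_swap_device(si);		/* let swapoff() make progress again */
		return in_use;
	}

The extra WARN_ON(data_race(!p->swap_map[...])) in the patch exists because, as the commit
message says, get_swap_device() only guarantees the device won't disappear; the free-entry
check that _swap_info_get() used to do has to be repeated by the caller.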