From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dan Williams
Date: Tue, 15 Jun 2021 23:49:57 -0700
Subject: Re: [PATCH v4 05/10] mm, pmem: Implement ->memory_failure() in pmem driver
To: Shiyang Ruan
Cc: Linux Kernel Mailing List, linux-xfs, Linux MM, linux-fsdevel,
 device-mapper development, "Darrick J. Wong", david, Christoph Hellwig,
 Alasdair Kergon, Mike Snitzer, Goldwyn Rodrigues, Linux NVDIMM
In-Reply-To: <20210604011844.1756145-6-ruansy.fnst@fujitsu.com>
References: <20210604011844.1756145-1-ruansy.fnst@fujitsu.com> <20210604011844.1756145-6-ruansy.fnst@fujitsu.com>

On Thu, Jun 3, 2021 at 6:19 PM Shiyang Ruan wrote:
>
> Call the ->memory_failure() which is implemented by the pmem driver, in
> order to finally notify the filesystem to handle the corrupted data.
> The handler which collects and kills processes is moved into
> mf_dax_kill_procs(), which will be called by the filesystem.
>
> Keep the old handler in order to roll back if the driver or filesystem
> does not support ->memory_failure()/->corrupted_range().
>
> Signed-off-by: Shiyang Ruan
> ---
>  block/genhd.c         | 30 ++++++++++++++++++
>  drivers/nvdimm/pmem.c | 14 +++++++++
>  include/linux/genhd.h |  1 +
>  mm/memory-failure.c   | 71 +++++++++++++++++++++++++++----------------

I would not expect a patch that converts the pmem driver to touch
mm/memory-failure.c. mf_generic_kill_procs() has nothing to do with the
pmem driver.

>  4 files changed, 90 insertions(+), 26 deletions(-)
>
> diff --git a/block/genhd.c b/block/genhd.c
> index 9f8cb7beaad1..75834bd057df 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -718,6 +718,36 @@ struct block_device *bdget_disk(struct gendisk *disk, int partno)
>  	return bdev;
>  }
>
> +/**
> + * bdget_disk_sector - get block device by given sector number
> + * @disk: gendisk of interest
> + * @sector: sector number
> + *
> + * RETURNS: the found block device where sector locates in
> + */
> +struct block_device *bdget_disk_sector(struct gendisk *disk, sector_t sector)
> +{
> +	struct block_device *part = NULL, *p;
> +	unsigned long idx;
> +
> +	rcu_read_lock();
> +	xa_for_each(&disk->part_tbl, idx, p) {
> +		if (p->bd_partno == 0)
> +			continue;
> +		if (p->bd_start_sect <= sector &&
> +		    sector < p->bd_start_sect + bdev_nr_sectors(p)) {
> +			part = p;
> +			break;
> +		}
> +	}
> +	rcu_read_unlock();
> +	if (!part)
> +		part = disk->part0;
> +
> +	return bdget_disk(disk, part->bd_partno);
> +}
> +EXPORT_SYMBOL(bdget_disk_sector);

I can't see any justification for this function. The pmem driver does
not need it, and the upper-layer holders should have their own mechanism
to go from a dax_dev offset to a bdev if they need to, but hopefully
they don't. A dax_device failure should be independent of a
block_device failure.
> +
>  /*
>   * print a full list of all partitions - intended for places where the root
>   * filesystem can't be mounted and thus to give the victim some idea of what
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index ed10a8b66068..98349e7d0a28 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -364,9 +364,23 @@ static void pmem_release_disk(void *__pmem)
>  	put_disk(pmem->disk);
>  }
>
> +static int pmem_pagemap_memory_failure(struct dev_pagemap *pgmap,
> +		unsigned long pfn, int flags)
> +{
> +	struct pmem_device *pdev =
> +			container_of(pgmap, struct pmem_device, pgmap);

This local variable is called @pmem, not @pdev, in other parts of the
driver.

> +	loff_t offset = PFN_PHYS(pfn) - pdev->phys_addr - pdev->data_offset;
> +	struct block_device *bdev =
> +			bdget_disk_sector(pdev->disk, offset >> SECTOR_SHIFT);
> +
> +	return dax_corrupted_range(pdev->dax_dev, bdev, offset,
> +			page_size(pfn_to_page(pfn)), &flags);

Per previous comments, this interface should be range based.

Why is the last argument &flags? It was passed in by value, so any
changes to it will not be reflected back to memory_failure().
> +}
> +
>  static const struct dev_pagemap_ops fsdax_pagemap_ops = {
>  	.kill			= pmem_pagemap_kill,
>  	.cleanup		= pmem_pagemap_cleanup,
> +	.memory_failure		= pmem_pagemap_memory_failure,
>  };
>
>  static int pmem_attach_disk(struct device *dev,
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index 6fc26f7bdf71..2ad70c02c343 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -219,6 +219,7 @@ static inline void add_disk_no_queue_reg(struct gendisk *disk)
>
>  extern void del_gendisk(struct gendisk *gp);
>  extern struct block_device *bdget_disk(struct gendisk *disk, int partno);
> +extern struct block_device *bdget_disk_sector(struct gendisk *disk, sector_t sector);
>
>  void set_disk_ro(struct gendisk *disk, bool read_only);
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 4377e727d478..43017d7f3918 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1247,6 +1247,36 @@ static void unmap_and_kill(struct list_head *to_kill, unsigned long pfn,
>  	kill_procs(to_kill, flags & MF_MUST_KILL, false, pfn, flags);
>  }
>
> +static int mf_generic_kill_procs(unsigned long long pfn, int flags)
> +{
> +	struct page *page = pfn_to_page(pfn);
> +	LIST_HEAD(to_kill);
> +	dax_entry_t cookie;
> +
> +	/*
> +	 * Prevent the inode from being freed while we are interrogating
> +	 * the address_space, typically this would be handled by
> +	 * lock_page(), but dax pages do not use the page lock. This
> +	 * also prevents changes to the mapping of this pfn until
> +	 * poison signaling is complete.
> +	 */
> +	cookie = dax_lock_page(page);
> +	if (!cookie)
> +		return -EBUSY;
> +	/*
> +	 * Unlike System-RAM there is no possibility to swap in a
> +	 * different physical page at a given virtual address, so all
> +	 * userspace consumption of ZONE_DEVICE memory necessitates
> +	 * SIGBUS (i.e. MF_MUST_KILL)
> +	 */
> +	flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
> +	collect_procs(page, &to_kill, flags & MF_ACTION_REQUIRED);
> +
> +	unmap_and_kill(&to_kill, pfn, page->mapping, page->index, flags);
> +	dax_unlock_page(page, cookie);
> +	return 0;
> +}
> +

Per above, I am surprised to find this in this patch; it belongs in its
own patch, if at all.

>  int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index, int flags)
>  {
>  	LIST_HEAD(to_kill);
> @@ -1348,9 +1378,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>  		struct dev_pagemap *pgmap)
>  {
>  	struct page *page = pfn_to_page(pfn);
> -	LIST_HEAD(to_kill);
>  	int rc = -EBUSY;
> -	dax_entry_t cookie;
>
>  	if (flags & MF_COUNT_INCREASED)
>  		/*
> @@ -1364,20 +1392,9 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>  		goto out;
>  	}
>
> -	/*
> -	 * Prevent the inode from being freed while we are interrogating
> -	 * the address_space, typically this would be handled by
> -	 * lock_page(), but dax pages do not use the page lock. This
> -	 * also prevents changes to the mapping of this pfn until
> -	 * poison signaling is complete.
> -	 */
> -	cookie = dax_lock_page(page);
> -	if (!cookie)
> -		goto out;
> -
>  	if (hwpoison_filter(page)) {
>  		rc = 0;
> -		goto unlock;
> +		goto out;
>  	}
>
>  	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
> @@ -1385,7 +1402,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>  		 * TODO: Handle HMM pages which may need coordination
>  		 * with device-side memory.
>  		 */
> -		goto unlock;
> +		goto out;
>  	}
>
>  	/*
> @@ -1395,19 +1412,21 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>  	SetPageHWPoison(page);
>
>  	/*
> -	 * Unlike System-RAM there is no possibility to swap in a
> -	 * different physical page at a given virtual address, so all
> -	 * userspace consumption of ZONE_DEVICE memory necessitates
> -	 * SIGBUS (i.e. MF_MUST_KILL)
> +	 * Call driver's implementation to handle the memory failure,
> +	 * otherwise roll back to generic handler.
>  	 */
> -	flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
> -	collect_procs_file(page, page->mapping, page->index, &to_kill,
> -			   flags & MF_ACTION_REQUIRED);
> +	if (pgmap->ops->memory_failure) {
> +		rc = pgmap->ops->memory_failure(pgmap, pfn, flags);
> +		/*
> +		 * Roll back to generic handler too if operation is not
> +		 * supported inside the driver/device/filesystem.
> +		 */
> +		if (rc != EOPNOTSUPP)
> +			goto out;
> +	}
> +
> +	rc = mf_generic_kill_procs(pfn, flags);
>
> -	unmap_and_kill(&to_kill, pfn, page->mapping, page->index, flags);
> -	rc = 0;
> -unlock:
> -	dax_unlock_page(page, cookie);
>  out:
>  	/* drop pgmap ref acquired in caller */
>  	put_dev_pagemap(pgmap);
> --
> 2.31.1