patches.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Alexander Gordeev <agordeev@linux.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Justin Stitt <justinstitt@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Leon Romanovsky <leon@kernel.org>,
	linux-rdma@vger.kernel.org, linux-s390@vger.kernel.org,
	llvm@lists.linux.dev, Ingo Molnar <mingo@redhat.com>,
	Bill Wendling <morbo@google.com>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	netdev@vger.kernel.org, Paolo Abeni <pabeni@redhat.com>,
	Salil Mehta <salil.mehta@huawei.com>,
	Jijie Shao <shaojijie@huawei.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	x86@kernel.org, Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Arnd Bergmann <arnd@arndb.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Leon Romanovsky <leonro@mellanox.com>,
	linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Mark Rutland <mark.rutland@arm.com>,
	Michael Guralnik <michaelgur@mellanox.com>,
	patches@lists.linux.dev, Niklas Schnelle <schnelle@linux.ibm.com>,
	Will Deacon <will@kernel.org>
Subject: [PATCH 0/6] Fix mlx5 write combining support on new ARM64 cores
Date: Tue, 20 Feb 2024 21:17:04 -0400	[thread overview]
Message-ID: <0-v1-38290193eace+5-mlx5_arm_wc_jgg@nvidia.com> (raw)

mlx5 has a built in self-test at driver startup to evaluate if the
platform supports write combining to generate a 64 byte PCIe TLP or
not. This has proven necessary because a lot of common scenarios end up
with broken write combining (especially inside virtual machines) and there
is no other way to learn this information.

This self test has been consistently failing on new ARM64 CPU
designs (specifically with NVIDIA Grace's implementation of Neoverse
V2). The C loop around writeq() generates some pretty terrible ARM64
assembly, but historically this has worked on alot of existing ARM64 CPUs
till now.

We see it succeed about 1 time in 10,000 on the worst affected
systems. The CPU architects speculate that the load instructions
interspersed with the stores make the test unreliable.

Arrange things so that the ARM64 uses a predictable inline assembly block
of 8 STR instructions.

Catalin suggested implementing this in terms of the obscure
__iowrite64_copy() interface which was long ago added to optimize write
combining stores on Pathscale RDMA HW for x86. These copy routines have
the advantage of requiring the caller to supply alignment which allows an
optimal assembly implementation.

This is a good suggestion because it turns out that S390 has much the same
problem and already uses the __iowrite64_copy() to try to make its WC
operations work.

The first several patches modernize and improve the performance of
__iowriteXX_copy() so that an ARM64 implementation can be provided which
relies on __builtin_constant_p to generate fast inlined assembly code in a
few common cases.

It should come after the S390 fix:

https://lore.kernel.org/r/0-v1-9b1ea6869554+110c60-iommufd_ck_data_type_jgg@nvidia.com

With acks, I'd like to take this through the RDMA tree.

v2:
 - Rework everything to use __iowrite64_copy().
 - Don't use STP since that is not reliably supported in ARM VMs
 - New patches to tidy up __iowriteXX_copy() on x86 and s390
v1: https://lore.kernel.org/r/cover.1700766072.git.leon@kernel.org

Jason Gunthorpe (6):
  x86: Stop using weak symbols for __iowrite32_copy()
  s390: Implement __iowrite32_copy()
  s390: Stop using weak symbols for __iowrite64_copy()
  arm64/io: Provide a WC friendly __iowriteXX_copy()
  net: hns3: Remove io_stop_wc() calls after __iowrite64_copy()
  IB/mlx5: Use __iowrite64_copy() for write combining stores

 arch/arm64/include/asm/io.h                   | 132 ++++++++++++++++++
 arch/arm64/kernel/io.c                        |  42 ++++++
 arch/s390/include/asm/io.h                    |  15 ++
 arch/s390/pci/pci.c                           |   6 -
 arch/x86/include/asm/io.h                     |  17 +++
 arch/x86/lib/Makefile                         |   1 -
 arch/x86/lib/iomap_copy_64.S                  |  15 --
 drivers/infiniband/hw/mlx5/mem.c              |   8 +-
 .../net/ethernet/hisilicon/hns3/hns3_enet.c   |   4 -
 include/linux/io.h                            |   8 +-
 lib/iomap_copy.c                              |  13 +-
 11 files changed, 222 insertions(+), 39 deletions(-)
 delete mode 100644 arch/x86/lib/iomap_copy_64.S


base-commit: 3856559a378586366dc602f93febe1c57cdc366c
-- 
2.43.2


             reply	other threads:[~2024-02-21  1:17 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-21  1:17 Jason Gunthorpe [this message]
2024-02-21  1:17 ` [PATCH 1/6] x86: Stop using weak symbols for __iowrite32_copy() Jason Gunthorpe
2024-02-21  1:17 ` [PATCH 2/6] s390: Implement __iowrite32_copy() Jason Gunthorpe
2024-02-21  1:17 ` [PATCH 3/6] s390: Stop using weak symbols for __iowrite64_copy() Jason Gunthorpe
2024-02-21  1:17 ` [PATCH 4/6] arm64/io: Provide a WC friendly __iowriteXX_copy() Jason Gunthorpe
2024-02-21 19:22   ` Will Deacon
2024-02-21 23:28     ` Jason Gunthorpe
2024-02-22 22:05   ` David Laight
2024-02-22 22:36     ` Jason Gunthorpe
2024-02-23  9:07       ` David Laight
2024-02-23 11:01         ` Niklas Schnelle
2024-02-23 11:05           ` David Laight
2024-02-23 12:53             ` Jason Gunthorpe
2024-02-23 11:38         ` Niklas Schnelle
2024-02-23 12:19           ` David Laight
2024-02-23 13:03             ` Jason Gunthorpe
2024-02-23 13:52               ` David Laight
2024-02-23 14:44                 ` Jason Gunthorpe
2024-02-23 12:58           ` Jason Gunthorpe
2024-02-23 16:35             ` Niklas Schnelle
2024-02-23 17:05               ` Jason Gunthorpe
2024-02-27 10:37   ` Catalin Marinas
2024-02-28 23:06     ` Jason Gunthorpe
2024-02-29 10:24       ` Catalin Marinas
2024-02-29 13:28         ` Jason Gunthorpe
2024-02-29 10:33   ` Catalin Marinas
2024-02-29 13:29     ` Jason Gunthorpe
2024-03-01 18:52   ` Catalin Marinas
2024-02-21  1:17 ` [PATCH 5/6] net: hns3: Remove io_stop_wc() calls after __iowrite64_copy() Jason Gunthorpe
2024-02-22  0:57   ` Jijie Shao
2024-02-21  1:17 ` [PATCH 6/6] IB/mlx5: Use __iowrite64_copy() for write combining stores Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0-v1-38290193eace+5-mlx5_arm_wc_jgg@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=agordeev@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=borntraeger@linux.ibm.com \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=hpa@zytor.com \
    --cc=justinstitt@google.com \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=leonro@mellanox.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=llvm@lists.linux.dev \
    --cc=mark.rutland@arm.com \
    --cc=michaelgur@mellanox.com \
    --cc=mingo@redhat.com \
    --cc=morbo@google.com \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=patches@lists.linux.dev \
    --cc=salil.mehta@huawei.com \
    --cc=schnelle@linux.ibm.com \
    --cc=shaojijie@huawei.com \
    --cc=svens@linux.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    --cc=yisen.zhuang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).