From: David Laight
To: 'Jisheng Zhang', Paul Walmsley, Palmer Dabbelt, Albert Ou
CC: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Matteo Croce, kernel test robot
Subject: RE: [PATCH 1/3] riscv: optimized memcpy
Date: Sun, 28 Jan 2024 12:35:53 +0000
References: <20240128111013.2450-1-jszhang@kernel.org> <20240128111013.2450-2-jszhang@kernel.org>
In-Reply-To: <20240128111013.2450-2-jszhang@kernel.org>

From: Jisheng Zhang
> Sent: 28 January 2024 11:10
>
> From: Matteo Croce
>
> Write a C version of memcpy() which uses the biggest data size allowed,
> without generating unaligned accesses.
>
> The procedure is made of three steps:
> First copy data one byte at time until the destination buffer is aligned
> to a long boundary.
> Then copy the data one long at time shifting the current and the next u8
> to compose a long at every cycle.
> Finally, copy the remainder one byte at time.
>
> On a BeagleV, the TCP RX throughput increased by 45%:
...
> +static void __memcpy_aligned(unsigned long *dest, const unsigned long *src, size_t count)
> +{

You should be able to remove an instruction from the loop by using:
	const unsigned long *src_lim = src + count / BYTES_LONG;
	for (; src < src_lim; ) {

> +	for (; count > 0; count -= BYTES_LONG * 8) {
> +		register unsigned long d0, d1, d2, d3, d4, d5, d6, d7;

register is completely ignored and pointless.
(More annoyingly, auto is also ignored.)

> +		d0 = src[0];
> +		d1 = src[1];
> +		d2 = src[2];
> +		d3 = src[3];
> +		d4 = src[4];
> +		d5 = src[5];
> +		d6 = src[6];
> +		d7 = src[7];
> +		dest[0] = d0;
> +		dest[1] = d1;
> +		dest[2] = d2;
> +		dest[3] = d3;
> +		dest[4] = d4;
> +		dest[5] = d5;
> +		dest[6] = d6;
> +		dest[7] = d7;
> +		dest += 8;
> +		src += 8;

These two lines belong in the for (...) statement.

> +	}
> +}

If you __always_inline the function, you can pass &src and &dest and
use the updated pointers after the loop.

I don't believe that RISC-V supports 'reg+reg+(imm5<<3)' addressing
(although there is probably space in the instruction for it).
Actually, 'reg+reg' addressing could be supported for loads but not
stores, since a store would need to read three registers.

We use the Nios-II CPU in some FPGAs.
Intel are removing support in favour of RISC-V; we are thinking of
re-implementing Nios-II ourselves!
I don't think they understand what the CPU gets used for!
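For illustration, a minimal sketch of the aligned copy loop with the loop suggestions above applied: a precomputed source limit, the pointer increments moved into the for statement, and the updated pointers handed back to the caller through an always-inlined function. This is not code from the patch. memcpy_aligned_sketch is a hypothetical name, and the sketch assumes BYTES_LONG means sizeof(unsigned long) and that count is a byte count that is a multiple of 8 * BYTES_LONG, as the original loop appears to assume.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: BYTES_LONG means sizeof(unsigned long), as its use in the patch suggests. */
#ifndef BYTES_LONG
#define BYTES_LONG sizeof(unsigned long)
#endif

/* Outside the kernel __always_inline may not be predefined; this is the GCC/Clang spelling. */
#ifndef __always_inline
#define __always_inline inline __attribute__((always_inline))
#endif

/*
 * Hypothetical helper: copy 'count' bytes between word-aligned buffers,
 * eight longs per iteration. 'count' is assumed to be a multiple of
 * 8 * BYTES_LONG. On return *destp and *srcp point just past the copied
 * data, so the caller can carry on with any tail from there.
 */
static __always_inline void memcpy_aligned_sketch(unsigned long **destp,
						  const unsigned long **srcp,
						  size_t count)
{
	unsigned long *dest = *destp;
	const unsigned long *src = *srcp;
	/* Compute the end once; the loop then compares pointers instead of
	 * maintaining a separate byte counter. */
	const unsigned long *src_lim = src + count / BYTES_LONG;

	assert(count % (8 * BYTES_LONG) == 0);

	for (; src < src_lim; src += 8, dest += 8) {
		/* Plain locals: the review notes 'register' buys nothing here. */
		unsigned long d0 = src[0], d1 = src[1], d2 = src[2], d3 = src[3];
		unsigned long d4 = src[4], d5 = src[5], d6 = src[6], d7 = src[7];

		dest[0] = d0; dest[1] = d1; dest[2] = d2; dest[3] = d3;
		dest[4] = d4; dest[5] = d5; dest[6] = d6; dest[7] = d7;
	}

	/* Hand the advanced pointers back; after inlining this costs nothing. */
	*destp = dest;
	*srcp = src;
}
```

With the limit precomputed, each iteration should need only the two pointer increments and one compare-and-branch, rather than also updating a separate count; whether the compiler actually emits that is worth checking in the generated RISC-V code.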
David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv