From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1758015AbcBDNmT (ORCPT ); Thu, 4 Feb 2016 08:42:19 -0500
Received: from pegase1.c-s.fr ([93.17.236.30]:57662 "EHLO mailhub1.si.c-s.fr"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1755329AbcBDNmR (ORCPT ); Thu, 4 Feb 2016 08:42:17 -0500
Subject: Re: [PATCH v5 21/23] powerpc: Simplify test in __dma_sync()
To: Denis Kirjanov
References: <42d5343703a9e67b5a2d94c8877bc0098448f71b.1454538980.git.christophe.leroy@c-s.fr>
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Scott Wood,
        linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
From: Christophe Leroy
Message-ID: <56B35537.3050708@c-s.fr>
Date: Thu, 4 Feb 2016 14:42:15 +0100
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:38.0) Gecko/20100101 Thunderbird/38.5.1
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/02/2016 12:37, Denis Kirjanov wrote:
> On 2/4/16, Christophe Leroy wrote:
>> This simplification helps the compiler. We now have only one test
>> instead of two, so it reduces the number of branches.
>>
>> Signed-off-by: Christophe Leroy
>> ---
>> v2: new
>> v3: no change
>> v4: no change
>> v5: no change
>>
>>  arch/powerpc/mm/dma-noncoherent.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
>> index 169aba4..2dc74e5 100644
>> --- a/arch/powerpc/mm/dma-noncoherent.c
>> +++ b/arch/powerpc/mm/dma-noncoherent.c
>> @@ -327,7 +327,7 @@ void __dma_sync(void *vaddr, size_t size, int direction)
>>  	 * invalidate only when cache-line aligned otherwise there is
>>  	 * the potential for discarding uncommitted data from the cache
>>  	 */
>> -	if ((start & (L1_CACHE_BYTES - 1)) || (size & (L1_CACHE_BYTES - 1)))
>> +	if ((start | end) & (L1_CACHE_BYTES - 1))
>>  		flush_dcache_range(start, end);
>>  	else
>>  		invalidate_dcache_range(start, end);
> The previous version of address cache-line aligned check reads perfectly fine.
> What's the benefit of this micro optimization?

With this optimisation we avoid one unnecessary test and its two
associated jumps. Taking into account that __dma_sync() is one of the
top ten CPU consumers, I believe it is worth it:

Without the patch:

c000d894:   70 6a 00 0f   andi.   r10,r3,15
c000d898:   39 29 00 0f   addi    r9,r9,15
c000d89c:   54 63 00 36   rlwinm  r3,r3,0,0,27
c000d8a0:   7d 23 48 50   subf    r9,r3,r9
c000d8a4:   41 82 00 84   beq     c000d928 <__dma_sync+0xb8>
[...]
c000d8c0:   7c 00 04 ac   sync
c000d8c4:   4e 80 00 20   blr
[...]
c000d928:   70 8a 00 0f   andi.   r10,r4,15
c000d92c:   40 a2 ff 7c   bne     c000d8a8 <__dma_sync+0x38>
c000d930:   55 2a e1 3f   rlwinm. r10,r9,28,4,31
c000d934:   41 a2 ff 8c   beq     c000d8c0 <__dma_sync+0x50>

With the patch:

c000d894:   7c 89 1b 78   or      r9,r4,r3
c000d898:   71 2a 00 0f   andi.   r10,r9,15
c000d89c:   54 63 00 36   rlwinm  r3,r3,0,0,27
c000d8a0:   38 84 00 0f   addi    r4,r4,15
c000d8a4:   7c 83 20 50   subf    r4,r3,r4
c000d8a8:   41 82 00 84   beq     c000d92c <__dma_sync+0xbc>
[...]
c000d8c4:   7c 00 04 ac   sync
c000d8c8:   4e 80 00 20   blr
[...]
c000d92c:   54 89 e1 3f   rlwinm. r9,r4,28,4,31
c000d930:   41 a2 ff 94   beq     c000d8c4 <__dma_sync+0x54>

Christophe

>> --
>> 2.1.0
>>
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev
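
For anyone who wants to convince themselves that the single test is
equivalent to the original pair, here is a minimal userspace sketch. It
assumes that end = start + size, as computed earlier in __dma_sync(), and
a 16-byte cache line (matching the mask of 15 in the andi. instructions
in the listings). The names needs_flush_old(), needs_flush_new() and
count_mismatches() are made up for this illustration and are not kernel
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 16-byte cache lines, matching the "andi. ...,15" mask. */
#define CACHE_LINE 16UL

/* Old test: start and size are checked separately (two tests). */
bool needs_flush_old(unsigned long start, size_t size)
{
	return (start & (CACHE_LINE - 1)) || (size & (CACHE_LINE - 1));
}

/* New test: OR both bounds together, then mask once. */
bool needs_flush_new(unsigned long start, size_t size)
{
	unsigned long end = start + size; /* assumed definition of end */

	return ((start | end) & (CACHE_LINE - 1)) != 0;
}

/* Count the (start, size) pairs below 'limit' where the tests disagree. */
unsigned long count_mismatches(unsigned long limit)
{
	unsigned long start, size, bad = 0;

	/* The mask trick only works for a power-of-two line size. */
	assert((CACHE_LINE & (CACHE_LINE - 1)) == 0);

	for (start = 0; start < limit; start++)
		for (size = 0; size < limit; size++)
			if (needs_flush_old(start, size) !=
			    needs_flush_new(start, size))
				bad++;
	return bad;
}
```

The reasoning behind it: if start is unaligned, both tests are true
regardless of size. If start is aligned, the low bits of end are exactly
the low bits of size, so both tests reduce to the same check. Calling
count_mismatches() over a small range is a brute-force confirmation of
that argument, not a replacement for it.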