From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758015AbcBDNmT (ORCPT <rfc822;w@1wt.eu>);
	Thu, 4 Feb 2016 08:42:19 -0500
Received: from pegase1.c-s.fr ([93.17.236.30]:57662 "EHLO mailhub1.si.c-s.fr"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755329AbcBDNmR (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Thu, 4 Feb 2016 08:42:17 -0500
Subject: Re: [PATCH v5 21/23] powerpc: Simplify test in __dma_sync()
To: Denis Kirjanov <kda@linux-powerpc.org>
References: <cover.1454538974.git.christophe.leroy@c-s.fr>
 <42d5343703a9e67b5a2d94c8877bc0098448f71b.1454538980.git.christophe.leroy@c-s.fr>
 <CAOJe8K2qmRARai6okSXtvpkt2JOfJCrqwUOinDAyo2Qoypd7uw@mail.gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
        Paul Mackerras <paulus@samba.org>,
        Michael Ellerman <mpe@ellerman.id.au>, Scott Wood <oss@buserror.net>,
        linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
From: Christophe Leroy <christophe.leroy@c-s.fr>
Message-ID: <56B35537.3050708@c-s.fr>
Date: Thu, 4 Feb 2016 14:42:15 +0100
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:38.0) Gecko/20100101
 Thunderbird/38.5.1
MIME-Version: 1.0
In-Reply-To: <CAOJe8K2qmRARai6okSXtvpkt2JOfJCrqwUOinDAyo2Qoypd7uw@mail.gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


On 04/02/2016 12:37, Denis Kirjanov wrote:
> On 2/4/16, Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>> This simplification helps the compiler. We now have only one test
>> instead of two, so it reduces the number of branches.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>> v2: new
>> v3: no change
>> v4: no change
>> v5: no change
>>
>>   arch/powerpc/mm/dma-noncoherent.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/mm/dma-noncoherent.c
>> b/arch/powerpc/mm/dma-noncoherent.c
>> index 169aba4..2dc74e5 100644
>> --- a/arch/powerpc/mm/dma-noncoherent.c
>> +++ b/arch/powerpc/mm/dma-noncoherent.c
>> @@ -327,7 +327,7 @@ void __dma_sync(void *vaddr, size_t size, int direction)
>>   		 * invalidate only when cache-line aligned otherwise there is
>>   		 * the potential for discarding uncommitted data from the cache
>>   		 */
>> -		if ((start & (L1_CACHE_BYTES - 1)) || (size & (L1_CACHE_BYTES - 1)))
>> +		if ((start | end) & (L1_CACHE_BYTES - 1))
>>   			flush_dcache_range(start, end);
>>   		else
>>   			invalidate_dcache_range(start, end);
> The previous version of address cache-line aligned check reads perfectly fine.
> What's the benefit of this micro optimization?
With this optimisation we avoid one unnecessary test and the two 
associated jumps. Given that __dma_sync() is one of the top ten CPU 
consumers, I believe it is worth it:

Without the patch:

c000d894:    70 6a 00 0f     andi.   r10,r3,15
c000d898:    39 29 00 0f     addi    r9,r9,15
c000d89c:    54 63 00 36     rlwinm  r3,r3,0,0,27
c000d8a0:    7d 23 48 50     subf    r9,r3,r9
c000d8a4:    41 82 00 84     beq     c000d928 <__dma_sync+0xb8>
[...]
c000d8c0:    7c 00 04 ac     sync
c000d8c4:    4e 80 00 20     blr
[...]
c000d928:    70 8a 00 0f     andi.   r10,r4,15
c000d92c:    40 a2 ff 7c     bne     c000d8a8 <__dma_sync+0x38>
c000d930:    55 2a e1 3f     rlwinm. r10,r9,28,4,31
c000d934:    41 a2 ff 8c     beq     c000d8c0 <__dma_sync+0x50>

With the patch:

c000d894:    7c 89 1b 78     or      r9,r4,r3
c000d898:    71 2a 00 0f     andi.   r10,r9,15
c000d89c:    54 63 00 36     rlwinm  r3,r3,0,0,27
c000d8a0:    38 84 00 0f     addi    r4,r4,15
c000d8a4:    7c 83 20 50     subf    r4,r3,r4
c000d8a8:    41 82 00 84     beq     c000d92c <__dma_sync+0xbc>
[...]
c000d8c4:    7c 00 04 ac     sync
c000d8c8:    4e 80 00 20     blr
[...]
c000d92c:    54 89 e1 3f     rlwinm. r9,r4,28,4,31
c000d930:    41 a2 ff 94     beq     c000d8c4 <__dma_sync+0x54>
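
For reference, the two tests can be compared in isolation. What follows is only a minimal user-space sketch, not the kernel code: it assumes end is computed as start + size (as the diff context suggests) and takes L1_CACHE_BYTES to be 16, matching the "andi. ...,15" mask in the listings above. The helper names needs_flush_old(), needs_flush_new() and count_mismatches() are made up for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 16-byte cache lines, as implied by the mask of 15 in the listings. */
#define L1_CACHE_BYTES 16u

/* Original test: start and size are checked separately (two tests). */
static int needs_flush_old(uintptr_t start, size_t size)
{
	return (start & (L1_CACHE_BYTES - 1)) || (size & (L1_CACHE_BYTES - 1));
}

/* Patched test: a single mask applied to (start | end). */
static int needs_flush_new(uintptr_t start, size_t size)
{
	uintptr_t end = start + size;	/* assumed definition of end */

	return ((start | end) & (L1_CACHE_BYTES - 1)) != 0;
}

/*
 * Count the (start, size) pairs for which the two tests disagree.
 * The low bits repeat every L1_CACHE_BYTES, so a small range covers
 * every alignment combination.
 */
static int count_mismatches(void)
{
	int mismatches = 0;
	uintptr_t start;
	size_t size;

	for (start = 0; start < 4 * L1_CACHE_BYTES; start++)
		for (size = 0; size < 4 * L1_CACHE_BYTES; size++)
			if (needs_flush_old(start, size) != needs_flush_new(start, size))
				mismatches++;

	return mismatches;
}
```

The two forms agree because, when start is aligned, end = start + size is aligned exactly when size is aligned, and when start is not aligned both tests are already true. Calling count_mismatches() from a small main() should therefore report no disagreement.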


Christophe
>> --
>> 2.1.0