From: Denis Kirjanov <kda@linux-powerpc.org>
To: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Scott Wood <oss@buserror.net>,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 21/23] powerpc: Simplify test in __dma_sync()
Date: Fri, 5 Feb 2016 10:52:32 +0300
Message-ID: <CAOJe8K035FFSgew8+fCHL9YBkScRZaHL_fEJFsAsVnT5qCm_ZQ@mail.gmail.com>
In-Reply-To: <56B35537.3050708@c-s.fr>

On 2/4/16, Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>
>
> On 04/02/2016 12:37, Denis Kirjanov wrote:
>> On 2/4/16, Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>> This simplification helps the compiler. We now have only one test
>>> instead of two, so it reduces the number of branches.
>>>
>>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>>> ---
>>> v2: new
>>> v3: no change
>>> v4: no change
>>> v5: no change
>>>
>>>   arch/powerpc/mm/dma-noncoherent.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
>>> index 169aba4..2dc74e5 100644
>>> --- a/arch/powerpc/mm/dma-noncoherent.c
>>> +++ b/arch/powerpc/mm/dma-noncoherent.c
>>> @@ -327,7 +327,7 @@ void __dma_sync(void *vaddr, size_t size, int direction)
>>>   		 * invalidate only when cache-line aligned otherwise there is
>>>   		 * the potential for discarding uncommitted data from the cache
>>>   		 */
>>> -		if ((start & (L1_CACHE_BYTES - 1)) || (size & (L1_CACHE_BYTES - 1)))
>>> +		if ((start | end) & (L1_CACHE_BYTES - 1))
>>>   			flush_dcache_range(start, end);
>>>   		else
>>>   			invalidate_dcache_range(start, end);
>> The previous version of the cache-line alignment check reads perfectly
>> fine.
>> What's the benefit of this micro-optimization?
> With this optimisation we avoid one unnecessary test and two associated
> jumps. Taking into account that __dma_sync() is one of the top ten CPU
> consumers, I believe it is worth it:
>
> Without the patch:
>
> c000d894:    70 6a 00 0f     andi.   r10,r3,15
> c000d898:    39 29 00 0f     addi    r9,r9,15
> c000d89c:    54 63 00 36     rlwinm  r3,r3,0,0,27
> c000d8a0:    7d 23 48 50     subf    r9,r3,r9
> c000d8a4:    41 82 00 84     beq     c000d928 <__dma_sync+0xb8>
> [...]
> c000d8c0:    7c 00 04 ac     sync
> c000d8c4:    4e 80 00 20     blr
> [...]
> c000d928:    70 8a 00 0f     andi.   r10,r4,15
> c000d92c:    40 a2 ff 7c     bne     c000d8a8 <__dma_sync+0x38>
> c000d930:    55 2a e1 3f     rlwinm. r10,r9,28,4,31
> c000d934:    41 a2 ff 8c     beq     c000d8c0 <__dma_sync+0x50>
>
> With the patch:
>
> c000d894:    7c 89 1b 78     or      r9,r4,r3
> c000d898:    71 2a 00 0f     andi.   r10,r9,15
> c000d89c:    54 63 00 36     rlwinm  r3,r3,0,0,27
> c000d8a0:    38 84 00 0f     addi    r4,r4,15
> c000d8a4:    7c 83 20 50     subf    r4,r3,r4
> c000d8a8:    41 82 00 84     beq     c000d92c <__dma_sync+0xbc>
> [...]
> c000d8c4:    7c 00 04 ac     sync
> c000d8c8:    4e 80 00 20     blr
> [...]
> c000d92c:    54 89 e1 3f     rlwinm. r9,r4,28,4,31
> c000d930:    41 a2 ff 94     beq     c000d8c4 <__dma_sync+0x54>

Yeah, looks better. Did you compile the kernel with default compiler flags?

Thanks!
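
For readers following the thread, the claim that the single test is equivalent to the original pair can be checked in user space. The sketch below is not kernel code. It assumes that `end` is computed as `start + size`, and it sets `L1_CACHE_BYTES` to 16 only because the quoted disassembly masks with 15. The names `needs_flush_old()`, `needs_flush_new()` and `check_equivalence()` are hypothetical helpers introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed for illustration: the quoted disassembly uses "andi. ...,15",
 * which matches 16-byte cache lines. */
#define L1_CACHE_BYTES 16UL

/* Test before the patch: start and size are masked separately. */
static bool needs_flush_old(unsigned long start, size_t size)
{
	return (start & (L1_CACHE_BYTES - 1)) || (size & (L1_CACHE_BYTES - 1));
}

/* Test after the patch: OR start and end together, then mask once.
 * Assumes end = start + size. */
static bool needs_flush_new(unsigned long start, size_t size)
{
	unsigned long end = start + size;

	return ((start | end) & (L1_CACHE_BYTES - 1)) != 0;
}

/* Hypothetical helper: compare both tests for every start and size
 * below the given limit; aborts on the first disagreement. */
static void check_equivalence(unsigned long limit)
{
	for (unsigned long start = 0; start < limit; start++)
		for (size_t size = 0; size < limit; size++)
			assert(needs_flush_old(start, size) ==
			       needs_flush_new(start, size));
}
```

The two forms agree because, when `start` is aligned, the low bits of `start + size` are exactly the low bits of `size`; when `start` is not aligned, both tests are already true. Calling `check_equivalence(512)` from a small `main()` exercises every combination of low bits for a 16-byte line.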

>
>
> Christophe
>
>

Thread overview: 30+ messages
2016-02-03 22:53 [PATCH v5 00/23] powerpc/8xx: Use large pages for RAM and IMMR and other improvments Christophe Leroy
2016-02-03 22:53 ` [PATCH v5 01/23] powerpc/8xx: Save r3 all the time in DTLB miss handler Christophe Leroy
2016-02-03 22:53 ` [PATCH v5 02/23] powerpc/8xx: Map linear kernel RAM with 8M pages Christophe Leroy
2016-02-03 22:53 ` [PATCH v5 03/23] powerpc: Update documentation for noltlbs kernel parameter Christophe Leroy
2016-02-03 22:53 ` [PATCH v5 04/23] powerpc/8xx: move setup_initial_memory_limit() into 8xx_mmu.c Christophe Leroy
2016-02-03 22:53 ` [PATCH v5 05/23] powerpc32: Fix pte_offset_kernel() to return NULL for bad pages Christophe Leroy
2016-02-03 22:54 ` [PATCH v5 06/23] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together Christophe Leroy
2016-02-07  9:42   ` kbuild test robot
2016-02-03 22:54 ` [PATCH v5 07/23] powerpc/8xx: Fix vaddr for IMMR early remap Christophe Leroy
2016-02-03 22:54 ` [PATCH v5 08/23] powerpc/8xx: Map IMMR area with 512k page at a fixed address Christophe Leroy
2016-02-04  9:58   ` kbuild test robot
2016-02-03 22:54 ` [PATCH v5 09/23] powerpc/8xx: CONFIG_PIN_TLB unneeded for CONFIG_PPC_EARLY_DEBUG_CPM Christophe Leroy
2016-02-03 22:54 ` [PATCH v5 10/23] powerpc/8xx: map more RAM at startup when needed Christophe Leroy
2016-02-03 22:54 ` [PATCH v5 11/23] powerpc32: Remove useless/wrong MMU:setio progress message Christophe Leroy
2016-02-03 22:54 ` [PATCH v5 12/23] powerpc32: remove ioremap_base Christophe Leroy
2016-02-03 22:54 ` [PATCH v5 13/23] powerpc/8xx: Add missing SPRN defines into reg_8xx.h Christophe Leroy
2016-02-03 22:54 ` [PATCH v5 14/23] powerpc/8xx: Handle CPU6 ERRATA directly in mtspr() macro Christophe Leroy
2016-02-03 22:54 ` [PATCH v5 15/23] powerpc/8xx: remove special handling of CPU6 errata in set_dec() Christophe Leroy
2016-02-03 22:54 ` [PATCH v5 16/23] powerpc/8xx: rewrite set_context() in C Christophe Leroy
2016-02-03 22:54 ` [PATCH v5 17/23] powerpc/8xx: rewrite flush_instruction_cache() " Christophe Leroy
2016-02-03 22:54 ` [PATCH v5 18/23] powerpc: add inline functions for cache related instructions Christophe Leroy
2016-02-03 22:54 ` [PATCH v5 19/23] powerpc32: Remove clear_pages() and define clear_page() inline Christophe Leroy
2016-02-03 22:54 ` [PATCH v5 20/23] powerpc32: move xxxxx_dcache_range() functions inline Christophe Leroy
2016-02-03 22:54 ` [PATCH v5 21/23] powerpc: Simplify test in __dma_sync() Christophe Leroy
2016-02-04 11:37   ` Denis Kirjanov
2016-02-04 13:42     ` Christophe Leroy
2016-02-05  7:52       ` Denis Kirjanov [this message]
2016-02-05  7:56         ` Christophe Leroy
2016-02-03 22:54 ` [PATCH v5 22/23] powerpc32: small optimisation in flush_icache_range() Christophe Leroy
2016-02-03 22:54 ` [PATCH v5 23/23] powerpc32: Remove one insn in mulhdu Christophe Leroy
