From: John Garry <john.garry@huawei.com>
To: Robin Murphy <robin.murphy@arm.com>, <joro@8bytes.org>,
	<will@kernel.org>, <jejb@linux.ibm.com>,
	<martin.petersen@oracle.com>, <hch@lst.de>,
	<m.szyprowski@samsung.com>
Cc: <iommu@lists.linux-foundation.org>,
	<linux-kernel@vger.kernel.org>, <linux-scsi@vger.kernel.org>,
	<linuxarm@huawei.com>
Subject: Re: [PATCH 3/6] iova: Allow rcache range upper limit to be configurable
Date: Fri, 19 Mar 2021 17:26:12 +0000	[thread overview]
Message-ID: <3375b67f-438c-32d3-a5a6-7e08f37b04e3@huawei.com> (raw)
In-Reply-To: <26fb1b79-2e46-09f6-1814-48fec4205f32@arm.com>

On 19/03/2021 16:25, Robin Murphy wrote:
> On 2021-03-19 13:25, John Garry wrote:
>> Some LLDs may request DMA mappings whose IOVA length exceeds the
>> current rcache upper limit.
>>
>> This means that allocations for those IOVAs will never be cached, and
>> must always be allocated and freed from the RB tree on every DMA
>> mapping cycle. This has a significant effect on performance, more so
>> since commit 4e89dce72521 ("iommu/iova: Retry from last rb tree node
>> if iova search fails"), as discussed at [0].
>>
>> Allow the range of cached IOVAs to be increased by providing an API to
>> set the upper limit, which is capped at IOVA_RANGE_CACHE_MAX_SIZE.
>>
>> For simplicity, the full range of IOVA rcaches is allocated and
>> initialized at IOVA domain init time.
>>
>> Setting the range upper limit will fail if the domain is already live
>> (it must be done before the tree contains IOVA nodes). This is
>> required to ensure that any cached IOVAs comply with the rule that
>> their size is a power of two.
>>
>> [0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leizhen@huawei.com/
>>
>> Signed-off-by: John Garry <john.garry@huawei.com>
>> ---
>>   drivers/iommu/iova.c | 37 +++++++++++++++++++++++++++++++++++--
>>   include/linux/iova.h | 11 ++++++++++-
>>   2 files changed, 45 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>> index cecc74fb8663..d4f2f7fbbd84 100644
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -49,6 +49,7 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
>>       iovad->flush_cb = NULL;
>>       iovad->fq = NULL;
>>       iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
>> +    iovad->rcache_max_size = IOVA_RANGE_CACHE_DEFAULT_SIZE;
>>       rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
>>       rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
>>       init_iova_rcaches(iovad);
>> @@ -194,7 +195,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>>        * rounding up anything cacheable to make sure that can't happen. The
>>        * order of the unadjusted size will still match upon freeing.
>>        */
>> -    if (fast && size < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
>> +    if (fast && size < (1 << (iovad->rcache_max_size - 1)))
>>           size = roundup_pow_of_two(size);
>>       if (size_aligned)
>> @@ -901,7 +902,7 @@ static bool iova_rcache_insert(struct iova_domain *iovad, unsigned long pfn,
>>   {
>>       unsigned int log_size = order_base_2(size);
>> -    if (log_size >= IOVA_RANGE_CACHE_MAX_SIZE)
>> +    if (log_size >= iovad->rcache_max_size)
>>           return false;
>>       return __iova_rcache_insert(iovad, &iovad->rcaches[log_size], pfn);
>> @@ -946,6 +947,38 @@ static unsigned long __iova_rcache_get(struct iova_rcache *rcache,
>>       return iova_pfn;
>>   }
>> +void iova_rcache_set_upper_limit(struct iova_domain *iovad,
>> +                 unsigned long iova_len)
>> +{
>> +    unsigned int rcache_index = order_base_2(iova_len) + 1;
>> +    struct rb_node *rb_node = &iovad->anchor.node;
>> +    unsigned long flags;
>> +    int count = 0;
>> +
>> +    spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
>> +    if (rcache_index <= iovad->rcache_max_size)
>> +        goto out;
>> +
>> +    while (1) {
>> +        rb_node = rb_prev(rb_node);
>> +        if (!rb_node)
>> +            break;
>> +        count++;
>> +    }
>> +
>> +    /*
>> +     * If there are already IOVA nodes present in the tree, then don't
>> +     * allow range upper limit to be set.
>> +     */
>> +    if (count != iovad->reserved_node_count)
>> +        goto out;
>> +
>> +    iovad->rcache_max_size = min_t(unsigned long, rcache_index,
>> +                       IOVA_RANGE_CACHE_MAX_SIZE);
>> +out:
>> +    spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
>> +}
>> +
>>   /*
>>    * Try to satisfy IOVA allocation range from rcache.  Fail if requested
>>    * size is too big or the DMA limit we are given isn't satisfied by the
>> diff --git a/include/linux/iova.h b/include/linux/iova.h
>> index fd3217a605b2..952b81b54ef7 100644
>> --- a/include/linux/iova.h
>> +++ b/include/linux/iova.h
>> @@ -25,7 +25,8 @@ struct iova {
>>   struct iova_magazine;
>>   struct iova_cpu_rcache;
>> -#define IOVA_RANGE_CACHE_MAX_SIZE 6    /* log of max cached IOVA range size (in pages) */
>> +#define IOVA_RANGE_CACHE_DEFAULT_SIZE 6
>> +#define IOVA_RANGE_CACHE_MAX_SIZE 10 /* log of max cached IOVA range size (in pages) */
> 
> No.
> 
> And why? If we're going to allocate massive caches anyway, whatever is 
> the point of *not* using all of them?
> 

I wanted to keep the same effective threshold for today's devices unless 
it is explicitly set otherwise.

The reason is that I didn't know whether a blanket increase would cause 
regressions, so I took the super-safe road. Specifically, some systems 
may be very limited in IOVA space, and only work today because large 
IOVAs are not cached.
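
To put rough numbers on that (an illustration only, not part of the 
patch, assuming 4K pages and the bucket check in iova_rcache_insert() 
quoted above):

#include <linux/log2.h>

/*
 * Sketch: how the quoted iova_rcache_insert() check decides whether a
 * mapping of a given size (in pages) is eligible for the rcache.
 */
static bool iova_size_is_cacheable(unsigned long size_pages,
				   unsigned long rcache_max_size)
{
	/* order_base_2(32) == 5, order_base_2(33) == 6 */
	return order_base_2(size_pages) < rcache_max_size;
}

/*
 * rcache_max_size == 6  (today's default): up to 32 pages (128K) cacheable
 * rcache_max_size == 10 (proposed max):    up to 512 pages (2M) cacheable
 */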

And in the precursor thread you wrote "We can't arbitrarily *increase* 
the scope of caching once a domain is active due to the size-rounding-up 
requirement, which would be prohibitive to larger allocations if applied 
universally" (sorry for quoting).

I took the last part to mean that we shouldn't apply this increase in 
threshold globally.
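
As an illustration of that rounding cost (sketch only, assuming 4K pages 
and the rounding rule in __alloc_and_insert_iova_range() quoted above):

#include <linux/log2.h>

/*
 * Sketch: IOVA space actually consumed by a "fast" allocation of
 * size_pages, per the quoted rounding rule.
 */
static unsigned long fast_alloc_size(unsigned long size_pages,
				     unsigned long rcache_max_size)
{
	if (size_pages < (1UL << (rcache_max_size - 1)))
		return roundup_pow_of_two(size_pages);
	return size_pages;
}

/*
 * rcache_max_size == 6:  a 260-page mapping stays at 260 pages.
 * rcache_max_size == 10: the same mapping would be rounded up to 512
 *                        pages, nearly doubling its IOVA footprint.
 */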

> It only makes sense for a configuration knob to affect the actual rcache 
> and depot allocations - that way, big high-throughput systems with 
> plenty of memory can spend it on better performance, while small systems 
> - that often need IOMMU scatter-gather precisely *because* memory is 
> tight and thus easily fragmented - don't have to pay the (not 
> insignificant) cost for caches they don't need.

So are you suggesting that we just make IOVA_RANGE_CACHE_MAX_SIZE a 
Kconfig option?
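
i.e. something along these lines in iova.h (just a sketch of what I 
think you mean; the CONFIG_ symbol name here is made up):

/* Sketch only: CONFIG_IOMMU_IOVA_RCACHE_MAX_ORDER does not exist today */
#ifdef CONFIG_IOMMU_IOVA_RCACHE_MAX_ORDER
#define IOVA_RANGE_CACHE_MAX_SIZE CONFIG_IOMMU_IOVA_RCACHE_MAX_ORDER
#else
#define IOVA_RANGE_CACHE_MAX_SIZE 6 /* log of max cached IOVA range size (in pages) */
#endif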

Thanks,
John

> 
> Robin.
> 
>>   #define MAX_GLOBAL_MAGS 32    /* magazines per bin */
>>   struct iova_rcache {
>> @@ -74,6 +75,7 @@ struct iova_domain {
>>       unsigned long    start_pfn;    /* Lower limit for this domain */
>>       unsigned long    dma_32bit_pfn;
>>       unsigned long    max32_alloc_size; /* Size of last failed allocation */
>> +    unsigned long    rcache_max_size; /* Upper limit of cached IOVA RANGE */
>>       struct iova_fq __percpu *fq;    /* Flush Queue */
>>       atomic64_t    fq_flush_start_cnt;    /* Number of TLB flushes that
>> @@ -158,6 +160,8 @@ int init_iova_flush_queue(struct iova_domain *iovad,
>>   struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn);
>>   void put_iova_domain(struct iova_domain *iovad);
>>   void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad);
>> +void iova_rcache_set_upper_limit(struct iova_domain *iovad,
>> +                 unsigned long iova_len);
>>   #else
>>   static inline int iova_cache_get(void)
>>   {
>> @@ -238,6 +242,11 @@ static inline void free_cpu_cached_iovas(unsigned int cpu,
>>                        struct iova_domain *iovad)
>>   {
>>   }
>> +
>> +static inline void iova_rcache_set_upper_limit(struct iova_domain *iovad,
>> +                           unsigned long iova_len)
>> +{
>> +}
>>   #endif
>>   #endif
>>
> .

