* [PATCH for-next] RDMA/core: Fix best page size finding when it can cross SG entries
@ 2025-02-09 14:26 Michael Margolin
2025-02-13 12:51 ` Leon Romanovsky
2025-02-13 12:53 ` Leon Romanovsky
0 siblings, 2 replies; 16+ messages in thread
From: Michael Margolin @ 2025-02-09 14:26 UTC (permalink / raw)
To: jgg, leon, linux-rdma
Cc: sleybo, matua, gal.pressman, Firas Jahjah, Yonatan Nachum
A single scatter-gather entry is limited by a 32-bit "length" field,
which in practice means 4GB - PAGE_SIZE. This means that even when the
memory is physically contiguous, we might need more than one entry to
represent it. Additionally, when using dmabuf, the sg_table might
originate outside the subsystem and be optimized for other needs.
For instance, an SGT of 16GB of contiguous GPU memory might look like
this (a real-life example):
dma_address 34401400000, length fffff000
dma_address 345013ff000, length fffff000
dma_address 346013fe000, length fffff000
dma_address 347013fd000, length fffff000
dma_address 348013fc000, length 4000
Since ib_umem_find_best_pgsz works within SG entries, in the above case
we end up with the worst possible 4KB page size.
Fix this by taking into consideration only the alignment of addresses at
real discontinuity points, rather than treating SG entry boundaries as
such, and adjust the page iterator to correctly handle pages that cross
SG entries.
There is currently an assumption that drivers do not ask for pages
bigger than the maximal DMA size supported by their devices.
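As a quick illustration (user-space sketch, not part of the patch), the five
entries in the dump above are in fact physically contiguous, so the whole 16GB
region could be covered by far larger pages than 4KB:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* dma_address/length pairs copied from the SGT dump above */
	uint64_t addr[] = { 0x34401400000ULL, 0x345013ff000ULL, 0x346013fe000ULL,
			    0x347013fd000ULL, 0x348013fc000ULL };
	uint64_t len[]  = { 0xfffff000ULL, 0xfffff000ULL, 0xfffff000ULL,
			    0xfffff000ULL, 0x4000ULL };
	int i;

	for (i = 1; i < 5; i++)
		printf("entry %d starts where entry %d ends: %s\n", i, i - 1,
		       addr[i - 1] + len[i - 1] == addr[i] ? "yes" : "no");
	return 0;
}

Every pair prints "yes".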
Reviewed-by: Firas Jahjah <firasj@amazon.com>
Reviewed-by: Yonatan Nachum <ynachum@amazon.com>
Signed-off-by: Michael Margolin <mrgolin@amazon.com>
---
drivers/infiniband/core/umem.c | 34 +++++++++++++++++++++++----------
drivers/infiniband/core/verbs.c | 11 ++++++-----
2 files changed, 30 insertions(+), 15 deletions(-)
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 07c571c7b699..e7e428369159 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -80,8 +80,10 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
unsigned long pgsz_bitmap,
unsigned long virt)
{
- struct scatterlist *sg;
+ unsigned long curr_len = 0;
+ dma_addr_t curr_base = ~0;
unsigned long va, pgoff;
+ struct scatterlist *sg;
dma_addr_t mask;
int i;
@@ -107,17 +109,29 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
pgoff = umem->address & ~PAGE_MASK;
for_each_sgtable_dma_sg(&umem->sgt_append.sgt, sg, i) {
- /* Walk SGL and reduce max page size if VA/PA bits differ
- * for any address.
+ /* If the current entry is physically contiguous with the previous
+ * one, no need to take its start addresses into consideration.
*/
- mask |= (sg_dma_address(sg) + pgoff) ^ va;
+ if (curr_base + curr_len != sg_dma_address(sg)) {
+
+ curr_base = sg_dma_address(sg);
+ curr_len = 0;
+
+ /* Reduce max page size if VA/PA bits differ */
+ mask |= (curr_base + pgoff) ^ va;
+
+ /* The alignment of any VA matching a discontinuity point
+ * in the physical memory sets the maximum possible page
+ * size as this must be a starting point of a new page that
+ * needs to be aligned.
+ */
+ if (i != 0)
+ mask |= va;
+ }
+
+ curr_len += sg_dma_len(sg);
va += sg_dma_len(sg) - pgoff;
- /* Except for the last entry, the ending iova alignment sets
- * the maximum possible page size as the low bits of the iova
- * must be zero when starting the next chunk.
- */
- if (i != (umem->sgt_append.sgt.nents - 1))
- mask |= va;
+
pgoff = 0;
}
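For context, a simplified user-space sketch (made-up helper name, not the
kernel function) of how an accumulated VA/DMA difference mask like the one
built above maps to a page size: the lowest set bit of the mask bounds the
largest usable page size, and the best choice is the biggest device-supported
size that does not exceed it (assumes a 64-bit unsigned long):

#include <stdint.h>

static unsigned long pick_best_pgsz(unsigned long pgsz_bitmap, uint64_t mask)
{
	if (mask) {
		uint64_t low = mask & -mask;	/* isolate the lowest set bit */

		pgsz_bitmap &= low | (low - 1);	/* drop page sizes above it */
	}
	if (!pgsz_bitmap)
		return 0;
	return 1UL << (63 - __builtin_clzl(pgsz_bitmap)); /* largest size left */
}

With the fix, a fully contiguous SGT like the 16GB example contributes no
interior discontinuity bits to the mask, so the result is limited only by the
device's pgsz_bitmap and the VA/IOVA alignment instead of collapsing to 4KB.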
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 473ee0831307..dc40001072a5 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -3109,22 +3109,23 @@ EXPORT_SYMBOL(__rdma_block_iter_start);
bool __rdma_block_iter_next(struct ib_block_iter *biter)
{
unsigned int block_offset;
- unsigned int sg_delta;
+ unsigned int delta;
if (!biter->__sg_nents || !biter->__sg)
return false;
biter->__dma_addr = sg_dma_address(biter->__sg) + biter->__sg_advance;
block_offset = biter->__dma_addr & (BIT_ULL(biter->__pg_bit) - 1);
- sg_delta = BIT_ULL(biter->__pg_bit) - block_offset;
+ delta = BIT_ULL(biter->__pg_bit) - block_offset;
- if (sg_dma_len(biter->__sg) - biter->__sg_advance > sg_delta) {
- biter->__sg_advance += sg_delta;
- } else {
+ while (biter->__sg_nents && biter->__sg &&
+ sg_dma_len(biter->__sg) - biter->__sg_advance <= delta) {
+ delta -= sg_dma_len(biter->__sg) - biter->__sg_advance;
biter->__sg_advance = 0;
biter->__sg = sg_next(biter->__sg);
biter->__sg_nents--;
}
+ biter->__sg_advance += delta;
return true;
}
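To see what the reworked loop buys us, here is a stand-alone user-space model
of the iterator (struct and field names are simplified stand-ins, not the
kernel's ib_block_iter): with two physically contiguous SG entries whose
lengths are not multiples of the block size, the blocks are still produced
aligned, with one of them assembled across the entry boundary:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct fake_sg { uint64_t dma_addr; uint32_t dma_len; };

struct block_iter {
	const struct fake_sg *sg;
	int nents;
	uint64_t dma_addr;
	uint64_t advance;
	unsigned int pg_bit;
};

static bool block_iter_next(struct block_iter *b)
{
	uint64_t block_offset, delta;

	if (!b->nents)
		return false;

	b->dma_addr = b->sg->dma_addr + b->advance;
	block_offset = b->dma_addr & ((1ULL << b->pg_bit) - 1);
	delta = (1ULL << b->pg_bit) - block_offset;

	/* consume whole entries while the current block still has room */
	while (b->nents && b->sg->dma_len - b->advance <= delta) {
		delta -= b->sg->dma_len - b->advance;
		b->advance = 0;
		b->sg++;
		b->nents--;
	}
	b->advance += delta;
	return true;
}

int main(void)
{
	/* two contiguous entries forming 8MB starting on a 2MB boundary */
	const struct fake_sg sgl[] = {
		{ 0x100000000ULL, 0x3ff000 },	/* ends 4KB short of 4MB in */
		{ 0x1003ff000ULL, 0x401000 },	/* completes the 8MB range  */
	};
	struct block_iter b = { sgl, 2, 0, 0, 21 };	/* 2MB blocks */

	while (block_iter_next(&b))
		printf("block at 0x%llx\n", (unsigned long long)b.dma_addr);
	return 0;
}

This prints four 2MB-aligned block addresses; the second one spans the
boundary between the two entries, which is exactly the case the new while
loop handles.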
--
2.47.1
* Re: [PATCH for-next] RDMA/core: Fix best page size finding when it can cross SG entries
2025-02-09 14:26 [PATCH for-next] RDMA/core: Fix best page size finding when it can cross SG entries Michael Margolin
@ 2025-02-13 12:51 ` Leon Romanovsky
2025-02-13 14:04 ` Jason Gunthorpe
2025-02-13 12:53 ` Leon Romanovsky
1 sibling, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2025-02-13 12:51 UTC (permalink / raw)
To: Michael Margolin
Cc: jgg, linux-rdma, sleybo, matua, gal.pressman, Firas Jahjah,
Yonatan Nachum
On Sun, Feb 09, 2025 at 02:26:08PM +0000, Michael Margolin wrote:
> A single scatter-gather entry is limited by a 32 bits "length" field
> that is practically 4GB - PAGE_SIZE. This means that even when the
> memory is physically contiguous, we might need more than one entry to
> represent it. Additionally when using dmabuf, the sg_table might be
> originated outside the subsystem and optimized for other needs.
>
> For instance an SGT of 16GB GPU continuous memory might look like this:
> (a real life example)
>
> dma_address 34401400000, length fffff000
> dma_address 345013ff000, length fffff000
> dma_address 346013fe000, length fffff000
> dma_address 347013fd000, length fffff000
> dma_address 348013fc000, length 4000
>
> Since ib_umem_find_best_pgsz works within SG entries, in the above case
> we will result with the worst possible 4KB page size.
>
> Fix this by taking into consideration only the alignment of addresses of
> real discontinuity points rather than treating SG entries as such, and
> adjust the page iterator to correctly handle cross SG entry pages.
>
> There is currently an assumption that drivers do not ask for pages
> bigger than maximal DMA size supported by their devices.
>
> Reviewed-by: Firas Jahjah <firasj@amazon.com>
> Reviewed-by: Yonatan Nachum <ynachum@amazon.com>
> Signed-off-by: Michael Margolin <mrgolin@amazon.com>
> ---
> drivers/infiniband/core/umem.c | 34 +++++++++++++++++++++++----------
> drivers/infiniband/core/verbs.c | 11 ++++++-----
> 2 files changed, 30 insertions(+), 15 deletions(-)
Applied with the following change to prevent arithmetic overflow.
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index e7e428369159..63a92d6cfbc2 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -112,8 +112,7 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
/* If the current entry is physically contiguous with the previous
* one, no need to take its start addresses into consideration.
*/
- if (curr_base + curr_len != sg_dma_address(sg)) {
-
+ if (curr_base != sg_dma_address(sg) - curr_len) {
curr_base = sg_dma_address(sg);
curr_len = 0;
* Re: [PATCH for-next] RDMA/core: Fix best page size finding when it can cross SG entries
2025-02-09 14:26 [PATCH for-next] RDMA/core: Fix best page size finding when it can cross SG entries Michael Margolin
2025-02-13 12:51 ` Leon Romanovsky
@ 2025-02-13 12:53 ` Leon Romanovsky
2025-02-14 6:57 ` Leon Romanovsky
1 sibling, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2025-02-13 12:53 UTC (permalink / raw)
To: linux-rdma, Jason Gunthorpe, Michael Margolin
Cc: sleybo, matua, gal.pressman, Firas Jahjah, Yonatan Nachum
On Sun, 09 Feb 2025 14:26:08 +0000, Michael Margolin wrote:
> A single scatter-gather entry is limited by a 32 bits "length" field
> that is practically 4GB - PAGE_SIZE. This means that even when the
> memory is physically contiguous, we might need more than one entry to
> represent it. Additionally when using dmabuf, the sg_table might be
> originated outside the subsystem and optimized for other needs.
>
> For instance an SGT of 16GB GPU continuous memory might look like this:
> (a real life example)
>
> [...]
Applied, thanks!
[1/1] RDMA/core: Fix best page size finding when it can cross SG entries
https://git.kernel.org/rdma/rdma/c/a4b57de5dfef29
Best regards,
--
Leon Romanovsky <leon@kernel.org>
* Re: [PATCH for-next] RDMA/core: Fix best page size finding when it can cross SG entries
2025-02-13 12:51 ` Leon Romanovsky
@ 2025-02-13 14:04 ` Jason Gunthorpe
2025-02-13 14:30 ` Margolin, Michael
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2025-02-13 14:04 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Michael Margolin, linux-rdma, sleybo, matua, gal.pressman,
Firas Jahjah, Yonatan Nachum
On Thu, Feb 13, 2025 at 02:51:26PM +0200, Leon Romanovsky wrote:
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index e7e428369159..63a92d6cfbc2 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -112,8 +112,7 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
> /* If the current entry is physically contiguous with the previous
> * one, no need to take its start addresses into consideration.
> */
> - if (curr_base + curr_len != sg_dma_address(sg)) {
> -
> + if (curr_base != sg_dma_address(sg) - curr_len) {
> curr_base = sg_dma_address(sg);
> curr_len = 0;
I'm not sure about this: what ensures sg_dma_address() > curr_len?
curr_base + curr_len could also overflow; we've seen that AMD IOMMU
sometimes uses the very high addresses already
Jason
* Re: [PATCH for-next] RDMA/core: Fix best page size finding when it can cross SG entries
2025-02-13 14:04 ` Jason Gunthorpe
@ 2025-02-13 14:30 ` Margolin, Michael
2025-02-13 14:42 ` Jason Gunthorpe
0 siblings, 1 reply; 16+ messages in thread
From: Margolin, Michael @ 2025-02-13 14:30 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky
Cc: linux-rdma, sleybo, matua, gal.pressman, Firas Jahjah,
Yonatan Nachum
On 2/13/2025 4:04 PM, Jason Gunthorpe wrote:
> On Thu, Feb 13, 2025 at 02:51:26PM +0200, Leon Romanovsky wrote:
>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>> index e7e428369159..63a92d6cfbc2 100644
>> --- a/drivers/infiniband/core/umem.c
>> +++ b/drivers/infiniband/core/umem.c
>> @@ -112,8 +112,7 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
>> /* If the current entry is physically contiguous with the previous
>> * one, no need to take its start addresses into consideration.
>> */
>> - if (curr_base + curr_len != sg_dma_address(sg)) {
>> -
>> + if (curr_base != sg_dma_address(sg) - curr_len) {
>> curr_base = sg_dma_address(sg);
>> curr_len = 0;
> I'm not sure about this, what ensures sg_dma_address() > curr_len?
>
> curr_base + curr_len could also overflow, we've seen that AMD IOMMU
> sometimes uses the very high addresess already
I think the only case we care about where curr_base + curr_len can
overflow is when next sg_dma_address() == 0.
But maybe we should just add an explicit check:
- if (curr_base + curr_len != sg_dma_address(sg)) {
+ if (curr_base + curr_len < curr_base ||
+ curr_base + curr_len != sg_dma_address(sg)) {
curr_base = sg_dma_address(sg);
curr_len = 0;
Michael
* Re: [PATCH for-next] RDMA/core: Fix best page size finding when it can cross SG entries
2025-02-13 14:30 ` Margolin, Michael
@ 2025-02-13 14:42 ` Jason Gunthorpe
2025-02-13 17:25 ` Margolin, Michael
2025-02-13 17:35 ` Leon Romanovsky
0 siblings, 2 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2025-02-13 14:42 UTC (permalink / raw)
To: Margolin, Michael
Cc: Leon Romanovsky, linux-rdma, sleybo, matua, gal.pressman,
Firas Jahjah, Yonatan Nachum
On Thu, Feb 13, 2025 at 04:30:11PM +0200, Margolin, Michael wrote:
>
> On 2/13/2025 4:04 PM, Jason Gunthorpe wrote:
> > On Thu, Feb 13, 2025 at 02:51:26PM +0200, Leon Romanovsky wrote:
> > > diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> > > index e7e428369159..63a92d6cfbc2 100644
> > > --- a/drivers/infiniband/core/umem.c
> > > +++ b/drivers/infiniband/core/umem.c
> > > @@ -112,8 +112,7 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
> > > /* If the current entry is physically contiguous with the previous
> > > * one, no need to take its start addresses into consideration.
> > > */
> > > - if (curr_base + curr_len != sg_dma_address(sg)) {
> > > -
> > > + if (curr_base != sg_dma_address(sg) - curr_len) {
> > > curr_base = sg_dma_address(sg);
> > > curr_len = 0;
> > I'm not sure about this, what ensures sg_dma_address() > curr_len?
> >
> > curr_base + curr_len could also overflow, we've seen that AMD IOMMU
> > sometimes uses the very high addresess already
>
> I think the only case we care about where curr_base + curr_len can overflow
> is when next sg_dma_address() == 0.
>
> But maybe we should just add an explicit check:
>
> - if (curr_base + curr_len != sg_dma_address(sg)) {
> + if (curr_base + curr_len < curr_base ||
> + curr_base + curr_len != sg_dma_address(sg)) {
> curr_base = sg_dma_address(sg);
> curr_len = 0;
Ugh
I wonder if we should try to make an overflow.h helper for these kinds
of problems.
/* Check if a + n == b, failing if a+n overflows */
check_consecutive(a, n, b)
?
It is a fairly common problem
I suggest we take the patch as it originally was and try to propose
the above helper?
Jason
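One possible shape for such a helper (hypothetical, not an existing kernel
API), built on check_add_overflow() from <linux/overflow.h>:

/* True iff a + n == b, treating a wrapping addition as "not consecutive"
 * rather than silently overflowing.
 */
#define check_consecutive(a, n, b) ({				\
	typeof(b) __sum;					\
	!check_add_overflow(a, n, &__sum) && __sum == (b);	\
})

The test in the patch would then read
if (!check_consecutive(curr_base, curr_len, sg_dma_address(sg))) without any
overflow concerns.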
* Re: [PATCH for-next] RDMA/core: Fix best page size finding when it can cross SG entries
2025-02-13 14:42 ` Jason Gunthorpe
@ 2025-02-13 17:25 ` Margolin, Michael
2025-02-13 17:35 ` Leon Romanovsky
1 sibling, 0 replies; 16+ messages in thread
From: Margolin, Michael @ 2025-02-13 17:25 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, linux-rdma, sleybo, matua, gal.pressman,
Firas Jahjah, Yonatan Nachum
On 2/13/2025 4:42 PM, Jason Gunthorpe wrote:
> I wonder if we should try to make a overflow.h helper for these kinds
> of problems.
>
> /* Check if a + n == b, failing if a+n overflows */
> check_consecutive(a, n, b)
>
> ?
>
> It is a fairly common problem
>
> I suggest to take the patch as it originally was and try to propose
> the above helper?
Sounds reasonable; I can propose such check helpers.
Michael
* Re: [PATCH for-next] RDMA/core: Fix best page size finding when it can cross SG entries
2025-02-13 14:42 ` Jason Gunthorpe
2025-02-13 17:25 ` Margolin, Michael
@ 2025-02-13 17:35 ` Leon Romanovsky
2025-02-13 17:40 ` Jason Gunthorpe
1 sibling, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2025-02-13 17:35 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Margolin, Michael, linux-rdma, sleybo, matua, gal.pressman,
Firas Jahjah, Yonatan Nachum
On Thu, Feb 13, 2025 at 10:42:19AM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 13, 2025 at 04:30:11PM +0200, Margolin, Michael wrote:
> >
> > On 2/13/2025 4:04 PM, Jason Gunthorpe wrote:
> > > On Thu, Feb 13, 2025 at 02:51:26PM +0200, Leon Romanovsky wrote:
> > > > diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> > > > index e7e428369159..63a92d6cfbc2 100644
> > > > --- a/drivers/infiniband/core/umem.c
> > > > +++ b/drivers/infiniband/core/umem.c
> > > > @@ -112,8 +112,7 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
> > > > /* If the current entry is physically contiguous with the previous
> > > > * one, no need to take its start addresses into consideration.
> > > > */
> > > > - if (curr_base + curr_len != sg_dma_address(sg)) {
> > > > -
> > > > + if (curr_base != sg_dma_address(sg) - curr_len) {
> > > > curr_base = sg_dma_address(sg);
> > > > curr_len = 0;
> > > I'm not sure about this, what ensures sg_dma_address() > curr_len?
> > >
> > > curr_base + curr_len could also overflow, we've seen that AMD IOMMU
> > > sometimes uses the very high addresess already
> >
> > I think the only case we care about where curr_base + curr_len can overflow
> > is when next sg_dma_address() == 0.
> >
> > But maybe we should just add an explicit check:
> >
> > - if (curr_base + curr_len != sg_dma_address(sg)) {
> > + if (curr_base + curr_len < curr_base ||
> > + curr_base + curr_len != sg_dma_address(sg)) {
> > curr_base = sg_dma_address(sg);
> > curr_len = 0;
>
> Ugh
>
> I wonder if we should try to make a overflow.h helper for these kinds
> of problems.
>
> /* Check if a + n == b, failing if a+n overflows */
> check_consecutive(a, n, b)
>
> ?
>
> It is a fairly common problem
>
> I suggest to take the patch as it originally was and try to propose
> the above helper?
My concern was with this line:
if (curr_base + curr_len != sg_dma_address(sg)) {
Initially curr_base is 0xFF.....FF and curr_len is 0.
So if this "if ..." is skipped (not possible but static checkers don't know),
we will advance curr_len and curr_base + curr_len will overflow.
I don't want to take the original patch.
Thanks
>
> Jason
* Re: [PATCH for-next] RDMA/core: Fix best page size finding when it can cross SG entries
2025-02-13 17:35 ` Leon Romanovsky
@ 2025-02-13 17:40 ` Jason Gunthorpe
2025-02-13 17:55 ` Leon Romanovsky
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2025-02-13 17:40 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Margolin, Michael, linux-rdma, sleybo, matua, gal.pressman,
Firas Jahjah, Yonatan Nachum
On Thu, Feb 13, 2025 at 07:35:10PM +0200, Leon Romanovsky wrote:
> Initially curr_base is 0xFF.....FF and curr_len is 0.
curr_base can't be so unaligned, can it?
> So if this "if ..." is skipped (not possible but static checkers don't know),
> we will advance curr_len and curr_base + curr_len will overflow.
>
> I don't want to take original patch.
Subtracting is no better; it will just randomly fail for low dma addrs
instead of high ones.
You need to call check_add_overflow()
Jason
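In the patch context that would look roughly like this (untested sketch; only
the contiguity test is shown, the mask handling from the original hunk stays
as is):

	dma_addr_t next;

	if (check_add_overflow(curr_base, (dma_addr_t)curr_len, &next) ||
	    next != sg_dma_address(sg)) {
		/* discontinuity (or wrapped addition): restart the run */
		curr_base = sg_dma_address(sg);
		curr_len = 0;
	}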
* Re: [PATCH for-next] RDMA/core: Fix best page size finding when it can cross SG entries
2025-02-13 17:40 ` Jason Gunthorpe
@ 2025-02-13 17:55 ` Leon Romanovsky
2025-02-13 18:12 ` Jason Gunthorpe
0 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2025-02-13 17:55 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Margolin, Michael, linux-rdma, sleybo, matua, gal.pressman,
Firas Jahjah, Yonatan Nachum
On Thu, Feb 13, 2025 at 01:40:43PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 13, 2025 at 07:35:10PM +0200, Leon Romanovsky wrote:
>
> > Initially curr_base is 0xFF.....FF and curr_len is 0.
>
> curr base can't be so unaligned can it?
It is only for first iteration where it is compared with
sg_dma_address(), immediately after that it is overwritten.
>
> > So if this "if ..." is skipped (not possible but static checkers don't know),
> > we will advance curr_len and curr_base + curr_len will overflow.
> >
> > I don't want to take original patch.
>
> Subtracting is no better, it will just randomly fail for low dma addrs
> instead of high.
Aren't sg_dma_address values placed in increasing order?
If not, the whole if block is not correct.
If yes, we won't see any failures.
>
> You need to call check_add_overflow()
>
> Jason
* Re: [PATCH for-next] RDMA/core: Fix best page size finding when it can cross SG entries
2025-02-13 17:55 ` Leon Romanovsky
@ 2025-02-13 18:12 ` Jason Gunthorpe
2025-02-14 5:55 ` Leon Romanovsky
2025-02-16 8:07 ` Leon Romanovsky
0 siblings, 2 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2025-02-13 18:12 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Margolin, Michael, linux-rdma, sleybo, matua, gal.pressman,
Firas Jahjah, Yonatan Nachum
On Thu, Feb 13, 2025 at 07:55:17PM +0200, Leon Romanovsky wrote:
> On Thu, Feb 13, 2025 at 01:40:43PM -0400, Jason Gunthorpe wrote:
> > On Thu, Feb 13, 2025 at 07:35:10PM +0200, Leon Romanovsky wrote:
> >
> > > Initially curr_base is 0xFF.....FF and curr_len is 0.
> >
> > curr base can't be so unaligned can it?
>
> It is only for first iteration where it is compared with
> sg_dma_address(), immediately after that it is overwritten.
But this is all working with inherently page-aligned stuff; cur_base +
len1 + len2 + len3 + len_n should be page aligned for interior segments.
> > > So if this "if ..." is skipped (not possible but static checkers don't know),
> > > we will advance curr_len and curr_base + curr_len will overflow.
> > >
> > > I don't want to take original patch.
> >
> > Subtracting is no better, it will just randomly fail for low dma addrs
> > instead of high.
>
> Aren't sg_dma_address placed in increasing order?
Not necessarily, dma direct produces something random
> If not, whole if loop is not correct.
The point is to detect cases where they happen to be in order because
they were split up due to the 32-bit limit in the SGL.
Jason
* Re: [PATCH for-next] RDMA/core: Fix best page size finding when it can cross SG entries
2025-02-13 18:12 ` Jason Gunthorpe
@ 2025-02-14 5:55 ` Leon Romanovsky
2025-02-14 15:33 ` Jason Gunthorpe
2025-02-16 8:07 ` Leon Romanovsky
1 sibling, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2025-02-14 5:55 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Margolin, Michael, linux-rdma, sleybo, matua, gal.pressman,
Firas Jahjah, Yonatan Nachum
On Thu, Feb 13, 2025 at 02:12:42PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 13, 2025 at 07:55:17PM +0200, Leon Romanovsky wrote:
> > On Thu, Feb 13, 2025 at 01:40:43PM -0400, Jason Gunthorpe wrote:
> > > On Thu, Feb 13, 2025 at 07:35:10PM +0200, Leon Romanovsky wrote:
> > >
> > > > Initially curr_base is 0xFF.....FF and curr_len is 0.
> > >
> > > curr base can't be so unaligned can it?
> >
> > It is only for first iteration where it is compared with
> > sg_dma_address(), immediately after that it is overwritten.
>
> But this is all working with inherently page aligned stuff, cur_base +
> len1 + len2 + len3 + len_n should be page aligned for interior segments..
>
> > > > So if this "if ..." is skipped (not possible but static checkers don't know),
> > > > we will advance curr_len and curr_base + curr_len will overflow.
> > > >
> > > > I don't want to take original patch.
> > >
> > > Subtracting is no better, it will just randomly fail for low dma addrs
> > > instead of high.
> >
> > Aren't sg_dma_address placed in increasing order?
>
> Not necessarily, dma direct produces something random
>
> > If not, whole if loop is not correct.
>
> The point is to detect cases where they happen to be in order because
> they were split up due to the 32 bit limit in the sgl.
There is no promise that this split will create consecutive SG entries
and that the iterator will fetch them in that order.
This discussion leads me to the understanding that we need to drop the patch.
Thanks
>
> Jason
* Re: [PATCH for-next] RDMA/core: Fix best page size finding when it can cross SG entries
2025-02-13 12:53 ` Leon Romanovsky
@ 2025-02-14 6:57 ` Leon Romanovsky
0 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2025-02-14 6:57 UTC (permalink / raw)
To: linux-rdma, Jason Gunthorpe, Michael Margolin
Cc: sleybo, matua, gal.pressman, Firas Jahjah, Yonatan Nachum
On Thu, Feb 13, 2025 at 07:53:44AM -0500, Leon Romanovsky wrote:
>
> On Sun, 09 Feb 2025 14:26:08 +0000, Michael Margolin wrote:
> > A single scatter-gather entry is limited by a 32 bits "length" field
> > that is practically 4GB - PAGE_SIZE. This means that even when the
> > memory is physically contiguous, we might need more than one entry to
> > represent it. Additionally when using dmabuf, the sg_table might be
> > originated outside the subsystem and optimized for other needs.
> >
> > For instance an SGT of 16GB GPU continuous memory might look like this:
> > (a real life example)
> >
> > [...]
>
> Applied, thanks!
>
> [1/1] RDMA/core: Fix best page size finding when it can cross SG entries
> https://git.kernel.org/rdma/rdma/c/a4b57de5dfef29
I dropped this patch for now.
Thanks
>
> Best regards,
> --
> Leon Romanovsky <leon@kernel.org>
>
>
* Re: [PATCH for-next] RDMA/core: Fix best page size finding when it can cross SG entries
2025-02-14 5:55 ` Leon Romanovsky
@ 2025-02-14 15:33 ` Jason Gunthorpe
0 siblings, 0 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2025-02-14 15:33 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Margolin, Michael, linux-rdma, sleybo, matua, gal.pressman,
Firas Jahjah, Yonatan Nachum
On Fri, Feb 14, 2025 at 07:55:11AM +0200, Leon Romanovsky wrote:
> > The point is to detect cases where they happen to be in order because
> > they were split up due to the 32 bit limit in the sgl.
>
> There is no promise that this split will create consecutive SG entries
> and iterator will fetch them in that order too.
Yes, that is guaranteed: if the memory is contiguous and split, then
the SG entries have to be consecutive and the sg iterator has to go in
order.
Otherwise the DMA transfer would be scrambled.
In normal cases we expect the SGL to already be fully aggregated, so
all SGL boundaries also have a dma_addr discontinuity. This special
case of exceeding the 32-bit limit is the only time we should have
consecutive SGLs that are dma_addr contiguous.
Jason
* Re: [PATCH for-next] RDMA/core: Fix best page size finding when it can cross SG entries
2025-02-13 18:12 ` Jason Gunthorpe
2025-02-14 5:55 ` Leon Romanovsky
@ 2025-02-16 8:07 ` Leon Romanovsky
2025-02-17 8:40 ` Margolin, Michael
1 sibling, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2025-02-16 8:07 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Margolin, Michael, linux-rdma, sleybo, matua, gal.pressman,
Firas Jahjah, Yonatan Nachum
On Thu, Feb 13, 2025 at 02:12:42PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 13, 2025 at 07:55:17PM +0200, Leon Romanovsky wrote:
> > On Thu, Feb 13, 2025 at 01:40:43PM -0400, Jason Gunthorpe wrote:
> > > On Thu, Feb 13, 2025 at 07:35:10PM +0200, Leon Romanovsky wrote:
> > >
> > > > Initially curr_base is 0xFF.....FF and curr_len is 0.
> > >
> > > curr base can't be so unaligned can it?
> >
> > It is only for first iteration where it is compared with
> > sg_dma_address(), immediately after that it is overwritten.
>
> But this is all working with inherently page aligned stuff, cur_base +
> len1 + len2 + len3 + len_n should be page aligned for interior segments..
This is unknown to static code analysis tools. I'm not concerned about
the logical change, but about possible static code analysis failures.
Thanks
* Re: [PATCH for-next] RDMA/core: Fix best page size finding when it can cross SG entries
2025-02-16 8:07 ` Leon Romanovsky
@ 2025-02-17 8:40 ` Margolin, Michael
0 siblings, 0 replies; 16+ messages in thread
From: Margolin, Michael @ 2025-02-17 8:40 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe
Cc: linux-rdma, sleybo, matua, gal.pressman, Firas Jahjah,
Yonatan Nachum
On 2/16/2025 10:07 AM, Leon Romanovsky wrote:
> On Thu, Feb 13, 2025 at 02:12:42PM -0400, Jason Gunthorpe wrote:
>> On Thu, Feb 13, 2025 at 07:55:17PM +0200, Leon Romanovsky wrote:
>>> On Thu, Feb 13, 2025 at 01:40:43PM -0400, Jason Gunthorpe wrote:
>>>> On Thu, Feb 13, 2025 at 07:35:10PM +0200, Leon Romanovsky wrote:
>>>>
>>>>> Initially curr_base is 0xFF.....FF and curr_len is 0.
>>>> curr base can't be so unaligned can it?
>>> It is only for first iteration where it is compared with
>>> sg_dma_address(), immediately after that it is overwritten.
>> But this is all working with inherently page aligned stuff, cur_base +
>> len1 + len2 + len3 + len_n should be page aligned for interior segments..
> This is unknown to static code analyze tools. I'm not concerned about
> logical change, but about possible static code analyze failures.
>
> Thanks
OK, I'll edit the patch to make sure it doesn't produce any SA failures.
Michael