All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] dma: mark unmapped DMA scatter/gather invalid
@ 2021-01-11 15:43 ` Marc Orr via iommu
  0 siblings, 0 replies; 12+ messages in thread
From: Marc Orr @ 2021-01-11 15:43 UTC (permalink / raw)
  To: hch, m.szyprowski, robin.murphy, jxgao, iommu, linux-kernel
  Cc: stable, Marc Orr

This patch updates dma_direct_unmap_sg() to mark each scatter/gather
entry invalid, after it's unmapped. This fixes two issues:

1. It makes the unmapping code able to tolerate a double unmap.
2. It prevents the NVMe driver from erroneously treating an unmapped DMA
address as mapped.

The bug that motivated this patch was the following sequence, which
occurred within the NVMe driver, with the kernel flag `swiotlb=force`.

* NVMe driver calls dma_direct_map_sg()
* dma_direct_map_sg() fails part way through the scatter gather/list
* dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any entries
  succeeded.
* NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
  double unmap, which is a bug.

With this patch, a hadoop workload running on a cluster of three AMD
SEV VMs, is able to succeed. Without the patch, the hadoop workload
suffers application-level and even VM-level failures.

Tested-by: Jianxiong Gao <jxgao@google.com>
Tested-by: Marc Orr <marcorr@google.com>
Reviewed-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Marc Orr <marcorr@google.com>
---
 kernel/dma/direct.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 0a4881e59aa7..3d9b17fe5771 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -374,9 +374,11 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
 	struct scatterlist *sg;
 	int i;
 
-	for_each_sg(sgl, sg, nents, i)
+	for_each_sg(sgl, sg, nents, i) {
 		dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
 			     attrs);
+		sg->dma_address = DMA_MAPPING_ERROR;
+	}
 }
 EXPORT_SYMBOL(dma_direct_unmap_sg);
 #endif
-- 
2.30.0.284.gd98b1dd5eaa7-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH] dma: mark unmapped DMA scatter/gather invalid
@ 2021-01-11 15:43 ` Marc Orr via iommu
  0 siblings, 0 replies; 12+ messages in thread
From: Marc Orr via iommu @ 2021-01-11 15:43 UTC (permalink / raw)
  To: hch, m.szyprowski, robin.murphy, jxgao, iommu, linux-kernel
  Cc: Marc Orr, stable

This patch updates dma_direct_unmap_sg() to mark each scatter/gather
entry invalid, after it's unmapped. This fixes two issues:

1. It makes the unmapping code able to tolerate a double unmap.
2. It prevents the NVMe driver from erroneously treating an unmapped DMA
address as mapped.

The bug that motivated this patch was the following sequence, which
occurred within the NVMe driver, with the kernel flag `swiotlb=force`.

* NVMe driver calls dma_direct_map_sg()
* dma_direct_map_sg() fails part way through the scatter gather/list
* dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any entries
  succeeded.
* NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
  double unmap, which is a bug.

With this patch, a hadoop workload running on a cluster of three AMD
SEV VMs, is able to succeed. Without the patch, the hadoop workload
suffers application-level and even VM-level failures.

Tested-by: Jianxiong Gao <jxgao@google.com>
Tested-by: Marc Orr <marcorr@google.com>
Reviewed-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Marc Orr <marcorr@google.com>
---
 kernel/dma/direct.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 0a4881e59aa7..3d9b17fe5771 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -374,9 +374,11 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
 	struct scatterlist *sg;
 	int i;
 
-	for_each_sg(sgl, sg, nents, i)
+	for_each_sg(sgl, sg, nents, i) {
 		dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
 			     attrs);
+		sg->dma_address = DMA_MAPPING_ERROR;
+	}
 }
 EXPORT_SYMBOL(dma_direct_unmap_sg);
 #endif
-- 
2.30.0.284.gd98b1dd5eaa7-goog

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] dma: mark unmapped DMA scatter/gather invalid
  2021-01-11 15:43 ` Marc Orr via iommu
@ 2021-01-11 16:08   ` Robin Murphy
  -1 siblings, 0 replies; 12+ messages in thread
From: Robin Murphy @ 2021-01-11 16:08 UTC (permalink / raw)
  To: Marc Orr, hch, m.szyprowski, jxgao, iommu, linux-kernel; +Cc: stable

On 2021-01-11 15:43, Marc Orr wrote:
> This patch updates dma_direct_unmap_sg() to mark each scatter/gather
> entry invalid, after it's unmapped. This fixes two issues:

s/fixes/bodges around (badly)/

> 1. It makes the unmapping code able to tolerate a double unmap.
> 2. It prevents the NVMe driver from erroneously treating an unmapped DMA
> address as mapped.
> 
> The bug that motivated this patch was the following sequence, which
> occurred within the NVMe driver, with the kernel flag `swiotlb=force`.
> 
> * NVMe driver calls dma_direct_map_sg()
> * dma_direct_map_sg() fails part way through the scatter gather/list
> * dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any entries
>    succeeded.
> * NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
>    double unmap, which is a bug.

So why not just fix that actual bug?

> With this patch, a hadoop workload running on a cluster of three AMD
> SEV VMs, is able to succeed. Without the patch, the hadoop workload
> suffers application-level and even VM-level failures.
> 
> Tested-by: Jianxiong Gao <jxgao@google.com>
> Tested-by: Marc Orr <marcorr@google.com>
> Reviewed-by: Jianxiong Gao <jxgao@google.com>
> Signed-off-by: Marc Orr <marcorr@google.com>
> ---
>   kernel/dma/direct.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 0a4881e59aa7..3d9b17fe5771 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -374,9 +374,11 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
>   	struct scatterlist *sg;
>   	int i;
>   
> -	for_each_sg(sgl, sg, nents, i)
> +	for_each_sg(sgl, sg, nents, i) {
>   		dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
>   			     attrs);
> +		sg->dma_address = DMA_MAPPING_ERROR;

There are more DMA API backends than just dma-direct, so while this 
might help paper over bugs when SWIOTLB is in use, it's not going to 
have any effect when those same bugs are hit under other circumstances. 
Once again, the moral of the story is that effort is better spent just 
fixing the bugs ;)

Robin.

> +	}
>   }
>   EXPORT_SYMBOL(dma_direct_unmap_sg);
>   #endif
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] dma: mark unmapped DMA scatter/gather invalid
@ 2021-01-11 16:08   ` Robin Murphy
  0 siblings, 0 replies; 12+ messages in thread
From: Robin Murphy @ 2021-01-11 16:08 UTC (permalink / raw)
  To: Marc Orr, hch, m.szyprowski, jxgao, iommu, linux-kernel; +Cc: stable

On 2021-01-11 15:43, Marc Orr wrote:
> This patch updates dma_direct_unmap_sg() to mark each scatter/gather
> entry invalid, after it's unmapped. This fixes two issues:

s/fixes/bodges around (badly)/

> 1. It makes the unmapping code able to tolerate a double unmap.
> 2. It prevents the NVMe driver from erroneously treating an unmapped DMA
> address as mapped.
> 
> The bug that motivated this patch was the following sequence, which
> occurred within the NVMe driver, with the kernel flag `swiotlb=force`.
> 
> * NVMe driver calls dma_direct_map_sg()
> * dma_direct_map_sg() fails part way through the scatter gather/list
> * dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any entries
>    succeeded.
> * NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
>    double unmap, which is a bug.

So why not just fix that actual bug?

> With this patch, a hadoop workload running on a cluster of three AMD
> SEV VMs, is able to succeed. Without the patch, the hadoop workload
> suffers application-level and even VM-level failures.
> 
> Tested-by: Jianxiong Gao <jxgao@google.com>
> Tested-by: Marc Orr <marcorr@google.com>
> Reviewed-by: Jianxiong Gao <jxgao@google.com>
> Signed-off-by: Marc Orr <marcorr@google.com>
> ---
>   kernel/dma/direct.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 0a4881e59aa7..3d9b17fe5771 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -374,9 +374,11 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
>   	struct scatterlist *sg;
>   	int i;
>   
> -	for_each_sg(sgl, sg, nents, i)
> +	for_each_sg(sgl, sg, nents, i) {
>   		dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
>   			     attrs);
> +		sg->dma_address = DMA_MAPPING_ERROR;

There are more DMA API backends than just dma-direct, so while this 
might help paper over bugs when SWIOTLB is in use, it's not going to 
have any effect when those same bugs are hit under other circumstances. 
Once again, the moral of the story is that effort is better spent just 
fixing the bugs ;)

Robin.

> +	}
>   }
>   EXPORT_SYMBOL(dma_direct_unmap_sg);
>   #endif
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] dma: mark unmapped DMA scatter/gather invalid
  2021-01-11 15:43 ` Marc Orr via iommu
@ 2021-01-11 16:15   ` Greg KH
  -1 siblings, 0 replies; 12+ messages in thread
From: Greg KH @ 2021-01-11 16:15 UTC (permalink / raw)
  To: Marc Orr; +Cc: linux-kernel, stable, iommu, robin.murphy, hch, jxgao

On Mon, Jan 11, 2021 at 07:43:35AM -0800, Marc Orr wrote:
> This patch updates dma_direct_unmap_sg() to mark each scatter/gather
> entry invalid, after it's unmapped. This fixes two issues:
> 
> 1. It makes the unmapping code able to tolerate a double unmap.
> 2. It prevents the NVMe driver from erroneously treating an unmapped DMA
> address as mapped.
> 
> The bug that motivated this patch was the following sequence, which
> occurred within the NVMe driver, with the kernel flag `swiotlb=force`.
> 
> * NVMe driver calls dma_direct_map_sg()
> * dma_direct_map_sg() fails part way through the scatter gather/list
> * dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any entries
>   succeeded.
> * NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
>   double unmap, which is a bug.
> 
> With this patch, a hadoop workload running on a cluster of three AMD
> SEV VMs, is able to succeed. Without the patch, the hadoop workload
> suffers application-level and even VM-level failures.
> 
> Tested-by: Jianxiong Gao <jxgao@google.com>
> Tested-by: Marc Orr <marcorr@google.com>
> Reviewed-by: Jianxiong Gao <jxgao@google.com>
> Signed-off-by: Marc Orr <marcorr@google.com>
> ---
>  kernel/dma/direct.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] dma: mark unmapped DMA scatter/gather invalid
@ 2021-01-11 16:15   ` Greg KH
  0 siblings, 0 replies; 12+ messages in thread
From: Greg KH @ 2021-01-11 16:15 UTC (permalink / raw)
  To: Marc Orr
  Cc: hch, m.szyprowski, robin.murphy, jxgao, iommu, linux-kernel,
	stable

On Mon, Jan 11, 2021 at 07:43:35AM -0800, Marc Orr wrote:
> This patch updates dma_direct_unmap_sg() to mark each scatter/gather
> entry invalid, after it's unmapped. This fixes two issues:
> 
> 1. It makes the unmapping code able to tolerate a double unmap.
> 2. It prevents the NVMe driver from erroneously treating an unmapped DMA
> address as mapped.
> 
> The bug that motivated this patch was the following sequence, which
> occurred within the NVMe driver, with the kernel flag `swiotlb=force`.
> 
> * NVMe driver calls dma_direct_map_sg()
> * dma_direct_map_sg() fails part way through the scatter gather/list
> * dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any entries
>   succeeded.
> * NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
>   double unmap, which is a bug.
> 
> With this patch, a hadoop workload running on a cluster of three AMD
> SEV VMs, is able to succeed. Without the patch, the hadoop workload
> suffers application-level and even VM-level failures.
> 
> Tested-by: Jianxiong Gao <jxgao@google.com>
> Tested-by: Marc Orr <marcorr@google.com>
> Reviewed-by: Jianxiong Gao <jxgao@google.com>
> Signed-off-by: Marc Orr <marcorr@google.com>
> ---
>  kernel/dma/direct.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] dma: mark unmapped DMA scatter/gather invalid
  2021-01-11 16:08   ` Robin Murphy
@ 2021-01-11 18:03     ` Marc Orr via iommu
  -1 siblings, 0 replies; 12+ messages in thread
From: Marc Orr @ 2021-01-11 18:03 UTC (permalink / raw)
  To: Robin Murphy; +Cc: hch, m.szyprowski, Jianxiong Gao, iommu, linux-kernel

> On 2021-01-11 15:43, Marc Orr wrote:

minus stable@vger.kernel.org, per gregkh@'s email.

> > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > index 0a4881e59aa7..3d9b17fe5771 100644
> > --- a/kernel/dma/direct.c
> > +++ b/kernel/dma/direct.c
> > @@ -374,9 +374,11 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
> >       struct scatterlist *sg;
> >       int i;
> >
> > -     for_each_sg(sgl, sg, nents, i)
> > +     for_each_sg(sgl, sg, nents, i) {
> >               dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
> >                            attrs);
> > +             sg->dma_address = DMA_MAPPING_ERROR;
>
> There are more DMA API backends than just dma-direct, so while this
> might help paper over bugs when SWIOTLB is in use, it's not going to
> have any effect when those same bugs are hit under other circumstances.
> Once again, the moral of the story is that effort is better spent just
> fixing the bugs ;)

Thanks for the quick feedback. What is the correct fix? I understand
the first half. The NVMe driver should be updated to not call unmap on
an address that has already been unmapped within the DMA direct code.
Where I'm less certain is how to communicate to the NVMe driver that
the mapping failed. In particular, the NVMe code explicitly checks if
the first DMA address in the scatter/gather list is set to
DMA_MAPPING_ERROR. Thus, don't we need to update the DMA direct code
to propagate DMA_MAPPING_ERROR back up to the driver, via the
scatter/gather struct?

I skimmed arch/arm/mm/dma-mapping.c, just now. I can see that this
code sets the address within the scatter/gather struct to
DMA_MAPPING_ERROR before trying to map an IO address and write it into
the struct. Is this a good example to follow?

Thanks,
Marc

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] dma: mark unmapped DMA scatter/gather invalid
@ 2021-01-11 18:03     ` Marc Orr via iommu
  0 siblings, 0 replies; 12+ messages in thread
From: Marc Orr via iommu @ 2021-01-11 18:03 UTC (permalink / raw)
  To: Robin Murphy; +Cc: linux-kernel, iommu, Jianxiong Gao, hch

> On 2021-01-11 15:43, Marc Orr wrote:

minus stable@vger.kernel.org, per gregkh@'s email.

> > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > index 0a4881e59aa7..3d9b17fe5771 100644
> > --- a/kernel/dma/direct.c
> > +++ b/kernel/dma/direct.c
> > @@ -374,9 +374,11 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
> >       struct scatterlist *sg;
> >       int i;
> >
> > -     for_each_sg(sgl, sg, nents, i)
> > +     for_each_sg(sgl, sg, nents, i) {
> >               dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
> >                            attrs);
> > +             sg->dma_address = DMA_MAPPING_ERROR;
>
> There are more DMA API backends than just dma-direct, so while this
> might help paper over bugs when SWIOTLB is in use, it's not going to
> have any effect when those same bugs are hit under other circumstances.
> Once again, the moral of the story is that effort is better spent just
> fixing the bugs ;)

Thanks for the quick feedback. What is the correct fix? I understand
the first half. The NVMe driver should be updated to not call unmap on
an address that has already been unmapped within the DMA direct code.
Where I'm less certain is how to communicate to the NVMe driver that
the mapping failed. In particular, the NVMe code explicitly checks if
the first DMA address in the scatter/gather list is set to
DMA_MAPPING_ERROR. Thus, don't we need to update the DMA direct code
to propagate DMA_MAPPING_ERROR back up to the driver, via the
scatter/gather struct?

I skimmed arch/arm/mm/dma-mapping.c, just now. I can see that this
code sets the address within the scatter/gather struct to
DMA_MAPPING_ERROR before trying to map an IO address and write it into
the struct. Is this a good example to follow?

Thanks,
Marc
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] dma: mark unmapped DMA scatter/gather invalid
  2021-01-11 18:03     ` Marc Orr via iommu
@ 2021-01-11 20:39       ` Robin Murphy
  -1 siblings, 0 replies; 12+ messages in thread
From: Robin Murphy @ 2021-01-11 20:39 UTC (permalink / raw)
  To: Marc Orr; +Cc: hch, m.szyprowski, Jianxiong Gao, iommu, linux-kernel

On 2021-01-11 18:03, Marc Orr wrote:
>> On 2021-01-11 15:43, Marc Orr wrote:
> 
> minus stable@vger.kernel.org, per gregkh@'s email.
> 
>>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
>>> index 0a4881e59aa7..3d9b17fe5771 100644
>>> --- a/kernel/dma/direct.c
>>> +++ b/kernel/dma/direct.c
>>> @@ -374,9 +374,11 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
>>>        struct scatterlist *sg;
>>>        int i;
>>>
>>> -     for_each_sg(sgl, sg, nents, i)
>>> +     for_each_sg(sgl, sg, nents, i) {
>>>                dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
>>>                             attrs);
>>> +             sg->dma_address = DMA_MAPPING_ERROR;
>>
>> There are more DMA API backends than just dma-direct, so while this
>> might help paper over bugs when SWIOTLB is in use, it's not going to
>> have any effect when those same bugs are hit under other circumstances.
>> Once again, the moral of the story is that effort is better spent just
>> fixing the bugs ;)
> 
> Thanks for the quick feedback. What is the correct fix? I understand
> the first half. The NVMe driver should be updated to not call unmap on
> an address that has already been unmapped within the DMA direct code.
> Where I'm less certain is how to communicate to the NVMe driver that
> the mapping failed. In particular, the NVMe code explicitly checks if
> the first DMA address in the scatter/gather list is set to
> DMA_MAPPING_ERROR. Thus, don't we need to update the DMA direct code
> to propagate DMA_MAPPING_ERROR back up to the driver, via the
> scatter/gather struct?

Erm, you check the return value of dma_map_sg(). If it's zero, the 
request failed; if it's nonzero, that's how many DMA segments you now 
have to process. See Documentation/core-api/dma-api.rst.

The only guarantee offered about the state of the scatterlist itself is 
that if it is successfully mapped, then the dma_address and dma_length 
fields are valid for that many segments, and if that is fewer than the 
total number of physical segments then the next one after the final DMA 
segment will have a dma_length of 0. In particular there are no 
guarantees at all about the state if the mapping was unsuccessful.

If a driver is failing to keep track of the success/failure status and 
later down the line trying to guess what to do with a list that may or 
may not have been mapped, then frankly that driver should be redesigned 
because that is a terrible anti-pattern. At the very very least it 
should explicitly encode its own "known bad" state upon failure that it 
can then reliably recognise later.

Robin.

> I skimmed arch/arm/mm/dma-mapping.c, just now. I can see that this
> code sets the address within the scatter/gather struct to
> DMA_MAPPING_ERROR before trying to map an IO address and write it into
> the struct. Is this a good example to follow?
> 
> Thanks,
> Marc
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] dma: mark unmapped DMA scatter/gather invalid
@ 2021-01-11 20:39       ` Robin Murphy
  0 siblings, 0 replies; 12+ messages in thread
From: Robin Murphy @ 2021-01-11 20:39 UTC (permalink / raw)
  To: Marc Orr; +Cc: linux-kernel, iommu, Jianxiong Gao, hch

On 2021-01-11 18:03, Marc Orr wrote:
>> On 2021-01-11 15:43, Marc Orr wrote:
> 
> minus stable@vger.kernel.org, per gregkh@'s email.
> 
>>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
>>> index 0a4881e59aa7..3d9b17fe5771 100644
>>> --- a/kernel/dma/direct.c
>>> +++ b/kernel/dma/direct.c
>>> @@ -374,9 +374,11 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
>>>        struct scatterlist *sg;
>>>        int i;
>>>
>>> -     for_each_sg(sgl, sg, nents, i)
>>> +     for_each_sg(sgl, sg, nents, i) {
>>>                dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
>>>                             attrs);
>>> +             sg->dma_address = DMA_MAPPING_ERROR;
>>
>> There are more DMA API backends than just dma-direct, so while this
>> might help paper over bugs when SWIOTLB is in use, it's not going to
>> have any effect when those same bugs are hit under other circumstances.
>> Once again, the moral of the story is that effort is better spent just
>> fixing the bugs ;)
> 
> Thanks for the quick feedback. What is the correct fix? I understand
> the first half. The NVMe driver should be updated to not call unmap on
> an address that has already been unmapped within the DMA direct code.
> Where I'm less certain is how to communicate to the NVMe driver that
> the mapping failed. In particular, the NVMe code explicitly checks if
> the first DMA address in the scatter/gather list is set to
> DMA_MAPPING_ERROR. Thus, don't we need to update the DMA direct code
> to propagate DMA_MAPPING_ERROR back up to the driver, via the
> scatter/gather struct?

Erm, you check the return value of dma_map_sg(). If it's zero, the 
request failed; if it's nonzero, that's how many DMA segments you now 
have to process. See Documentation/core-api/dma-api.rst.

The only guarantee offered about the state of the scatterlist itself is 
that if it is successfully mapped, then the dma_address and dma_length 
fields are valid for that many segments, and if that is fewer than the 
total number of physical segments then the next one after the final DMA 
segment will have a dma_length of 0. In particular there are no 
guarantees at all about the state if the mapping was unsuccessful.

If a driver is failing to keep track of the success/failure status and 
later down the line trying to guess what to do with a list that may or 
may not have been mapped, then frankly that driver should be redesigned 
because that is a terrible anti-pattern. At the very very least it 
should explicitly encode its own "known bad" state upon failure that it 
can then reliably recognise later.

Robin.

> I skimmed arch/arm/mm/dma-mapping.c, just now. I can see that this
> code sets the address within the scatter/gather struct to
> DMA_MAPPING_ERROR before trying to map an IO address and write it into
> the struct. Is this a good example to follow?
> 
> Thanks,
> Marc
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] dma: mark unmapped DMA scatter/gather invalid
  2021-01-11 20:39       ` Robin Murphy
@ 2021-01-11 21:36         ` Marc Orr
  -1 siblings, 0 replies; 12+ messages in thread
From: Marc Orr via iommu @ 2021-01-11 21:36 UTC (permalink / raw)
  To: Robin Murphy; +Cc: linux-kernel, iommu, Jianxiong Gao, hch

> >>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> >>> index 0a4881e59aa7..3d9b17fe5771 100644
> >>> --- a/kernel/dma/direct.c
> >>> +++ b/kernel/dma/direct.c
> >>> @@ -374,9 +374,11 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
> >>>        struct scatterlist *sg;
> >>>        int i;
> >>>
> >>> -     for_each_sg(sgl, sg, nents, i)
> >>> +     for_each_sg(sgl, sg, nents, i) {
> >>>                dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
> >>>                             attrs);
> >>> +             sg->dma_address = DMA_MAPPING_ERROR;
> >>
> >> There are more DMA API backends than just dma-direct, so while this
> >> might help paper over bugs when SWIOTLB is in use, it's not going to
> >> have any effect when those same bugs are hit under other circumstances.
> >> Once again, the moral of the story is that effort is better spent just
> >> fixing the bugs ;)
> >
> > Thanks for the quick feedback. What is the correct fix? I understand
> > the first half. The NVMe driver should be updated to not call unmap on
> > an address that has already been unmapped within the DMA direct code.
> > Where I'm less certain is how to communicate to the NVMe driver that
> > the mapping failed. In particular, the NVMe code explicitly checks if
> > the first DMA address in the scatter/gather list is set to
> > DMA_MAPPING_ERROR. Thus, don't we need to update the DMA direct code
> > to propagate DMA_MAPPING_ERROR back up to the driver, via the
> > scatter/gather struct?
>
> Erm, you check the return value of dma_map_sg(). If it's zero, the
> request failed; if it's nonzero, that's how many DMA segments you now
> have to process. See Documentation/core-api/dma-api.rst.
>
> The only guarantee offered about the state of the scatterlist itself is
> that if it is successfully mapped, then the dma_address and dma_length
> fields are valid for that many segments, and if that is fewer than the
> total number of physical segments then the next one after the final DMA
> segment will have a dma_length of 0. In particular there are no
> guarantees at all about the state if the mapping was unsuccessful.
>
> If a driver is failing to keep track of the success/failure status and
> later down the line trying to guess what to do with a list that may or
> may not have been mapped, then frankly that driver should be redesigned
> because that is a terrible anti-pattern. At the very very least it
> should explicitly encode its own "known bad" state upon failure that it
> can then reliably recognise later.

Got it now. I'll get to work on a patch for the NVMe driver to fix the
bug. Thanks for all of these pointers. They are immensely helpful.

Thanks,
Marc
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] dma: mark unmapped DMA scatter/gather invalid
@ 2021-01-11 21:36         ` Marc Orr
  0 siblings, 0 replies; 12+ messages in thread
From: Marc Orr @ 2021-01-11 21:36 UTC (permalink / raw)
  To: Robin Murphy; +Cc: hch, m.szyprowski, Jianxiong Gao, iommu, linux-kernel

> >>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> >>> index 0a4881e59aa7..3d9b17fe5771 100644
> >>> --- a/kernel/dma/direct.c
> >>> +++ b/kernel/dma/direct.c
> >>> @@ -374,9 +374,11 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
> >>>        struct scatterlist *sg;
> >>>        int i;
> >>>
> >>> -     for_each_sg(sgl, sg, nents, i)
> >>> +     for_each_sg(sgl, sg, nents, i) {
> >>>                dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
> >>>                             attrs);
> >>> +             sg->dma_address = DMA_MAPPING_ERROR;
> >>
> >> There are more DMA API backends than just dma-direct, so while this
> >> might help paper over bugs when SWIOTLB is in use, it's not going to
> >> have any effect when those same bugs are hit under other circumstances.
> >> Once again, the moral of the story is that effort is better spent just
> >> fixing the bugs ;)
> >
> > Thanks for the quick feedback. What is the correct fix? I understand
> > the first half. The NVMe driver should be updated to not call unmap on
> > an address that has already been unmapped within the DMA direct code.
> > Where I'm less certain is how to communicate to the NVMe driver that
> > the mapping failed. In particular, the NVMe code explicitly checks if
> > the first DMA address in the scatter/gather list is set to
> > DMA_MAPPING_ERROR. Thus, don't we need to update the DMA direct code
> > to propagate DMA_MAPPING_ERROR back up to the driver, via the
> > scatter/gather struct?
>
> Erm, you check the return value of dma_map_sg(). If it's zero, the
> request failed; if it's nonzero, that's how many DMA segments you now
> have to process. See Documentation/core-api/dma-api.rst.
>
> The only guarantee offered about the state of the scatterlist itself is
> that if it is successfully mapped, then the dma_address and dma_length
> fields are valid for that many segments, and if that is fewer than the
> total number of physical segments then the next one after the final DMA
> segment will have a dma_length of 0. In particular there are no
> guarantees at all about the state if the mapping was unsuccessful.
>
> If a driver is failing to keep track of the success/failure status and
> later down the line trying to guess what to do with a list that may or
> may not have been mapped, then frankly that driver should be redesigned
> because that is a terrible anti-pattern. At the very very least it
> should explicitly encode its own "known bad" state upon failure that it
> can then reliably recognise later.

Got it now. I'll get to work on a patch for the NVMe driver to fix the
bug. Thanks for all of these pointers. They are immensely helpful.

Thanks,
Marc

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-01-11 21:37 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-11 15:43 [PATCH] dma: mark unmapped DMA scatter/gather invalid Marc Orr
2021-01-11 15:43 ` Marc Orr via iommu
2021-01-11 16:08 ` Robin Murphy
2021-01-11 16:08   ` Robin Murphy
2021-01-11 18:03   ` Marc Orr
2021-01-11 18:03     ` Marc Orr via iommu
2021-01-11 20:39     ` Robin Murphy
2021-01-11 20:39       ` Robin Murphy
2021-01-11 21:36       ` Marc Orr via iommu
2021-01-11 21:36         ` Marc Orr
2021-01-11 16:15 ` Greg KH
2021-01-11 16:15   ` Greg KH

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.