Hi Sebastian, On Wed, Apr 10, 2024 at 11:55:55AM +0200, Sebastian Fricke wrote: > Hey Matthijs, > > On 07.04.2024 23:16, Matthijs Kooijman wrote: > > Hi, > > > > I've been chasing a decoder bug in the cedrus h264 hardware decoder and after a > > few days have been able to work around it by removing the > > DMA_ATTR_NO_KERNEL_MAPPING argument when allocating the neighbour info dma > > buffer (full patch at the end of the mail): > > > > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > @@ -581,7 +581,7 @@ static int cedrus_h264_start(struct cedrus_ctx *ctx) > > ctx->codec.h264.neighbor_info_buf = > > dma_alloc_attrs(dev->dev, CEDRUS_NEIGHBOR_INFO_BUF_SIZE, > > &ctx->codec.h264.neighbor_info_buf_dma, > > - GFP_KERNEL, DMA_ATTR_NO_KERNEL_MAPPING); > > + GFP_KERNEL, 0); > > > > This does not seem like a proper fix to me (or maybe it is - I do not really > > understand the problem yet), but it very effective. > > thanks for the patch, I don't think we can take it as due to, as you > highlighted, it relies on unknown side effects. Perhaps if you or > someone else in the community can investigate this further and answer > some more questions we can move it forward. > > Some questions to help move the discussions along: > > - Does the platform have an IOMMU? > - What does the IOMMU driver do when it sees the > DMA_ATTR_NO_KERNEL_MAPPING flag? > - On the platform in question do coherent allocations get handled with > cached snooped mappings or uncached? > - Does changing that flag affect any cacheablility > - Does the codec block expect any particular intialized data in that > buffer? I have no idea about any of these, but I agree that they are good questions to ask. I've CC'd linux-sunxi@lists.linux.dev in this mail, maybe someone there can answer these questions for the Allwinner H3 platform. > > The actual problem is that sometimes (about 10%-50%) the first frame > > of a h264 stream incorrectly decodes, producing all-zero data in the bottom > > part of the frame. See attached error-green.png for an example. In some even > > more rare cases, more subtle decoding errors would occur, for example a lighter > > rectangle in the bottom left dark blue band in error-blue-bar.png, or some > > misplaced pixels in the bottom right "noise" area (compare error-noise.png with > > error-blue-bar.png to see). > > > > To reproduce: > > > > # Create a h264-encoded 320x240 single-frame stream > > gst-launch-1.0 videotestsrc num-buffers=1 ! openh264enc ! filesink location=test.h264 > > # Decode it again using hardware decoding (repeat until seeing a broken frame) > > gst-launch-1.0 filesrc location=test.h264 ! v4l2slh264dec ! videoconvert ! pngenc ! filesink location=test.png > > > > I've been testing on an Orange Pi PC with an allwinner H3 running > > Armbian 22.04 (based on Ubuntu Jammy), a 6.7.12 kernel (mostly with > > Armbian-patched, but also clean mainline) and gstreamer > > 1.20.3. The problem also occurred on older kernel versions (tested clean > > mainline down to 6.2 and then a bit earlier, but 6.1 mainline would > > not boot for me). I have also observed the problem (or at least the same > > symptom) on the Armbian-patched 6.1 kernel, but there it was extremely unlikely > > (saw it once in 700 or so playbacks). > > > > I've seen three variants of this issue: > > > > 1. With the 320x240 test pattern, showing the green area in the frame. > > In this case, the decoder would actually return an error IRQ status > > (VE_H264_STATUS register would be 0x10010003 or 0x70010003, instead > > of the normal 0x60000001, bit 0 is VE_H264_STATUS_SLICE_DECODE_INT, > > bit 1 is VE_H264_STATUS_DECODE_ERR_INT). > > > > 2. With the 320x240 test pattern, showing more subtle decoding errors (as > > shown in attachments). In these cases, the IRQ status would not indicate a > > decoding error. > > > > 3. With another test video (1920x1080) that I cannot share, there would > > be the same green areas (occasionally also green squares splattered > > around the frame), but *no* decoding error in the IRQ status. > > > > None of these three issues occur anymore with the workaround applied (both on > > clean 6.7.12 and on top of the Armbian patches), the gstreamer decoding then > > produces bit-for-bit identical results every time (tested hundreds of > > decodings). > > > > > > I have no clue why removing DMA_ATTR_NO_KERNEL_MAPPING from this > > allocation helps - I just tried it on a whim when I nothing else seemed > > to help. I suspect that mapping the memory into kernel space might have > > some side effect that helps, but I am not familiar enough with either > > the cedrus hardware or the DMA system to tell. > > > > I initially thought there might be an alignment issue and tried > > increasing the allocation sizes of all DMA buffers to 256K (which > > I think enforces a bigger alignment), but that did not help. The reason > > I was looking at memory allocation (and also timing differences, but no > > luck there) was that bisecting pointed me at commit fec94f8c995 ("media: > > cedrus: h264: Optimize mv col buffer allocation") which seemed to make > > the problem 2-4× likely to occur (but looking back, I think I might have > > been bisecting noise there). > > > > > > I hope you folks can make more sense of this to figure out a proper fix. > > For the short term my problem is solved (I'll just apply this patch and ship > > it), so will not be investing more time right now. > > > > If there's anything I can contribute, just let me know (I have done some tests > > with extra debug output added to the kernel, and a testscript for repeatedly > > testing the decoding that I could share, and I am happy to test any > > patches/hypothesis that you have). > > > > Gr. > > > > Matthijs > > > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > index dfb401df138..7cd3b6a5434 100644 > > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > @@ -581,7 +581,7 @@ static int cedrus_h264_start(struct cedrus_ctx *ctx) > > ctx->codec.h264.neighbor_info_buf = > > dma_alloc_attrs(dev->dev, CEDRUS_NEIGHBOR_INFO_BUF_SIZE, > > &ctx->codec.h264.neighbor_info_buf_dma, > > - GFP_KERNEL, DMA_ATTR_NO_KERNEL_MAPPING); > > + GFP_KERNEL, 0); > > if (!ctx->codec.h264.neighbor_info_buf) { > > ret = -ENOMEM; > > goto err_pic_buf; > > @@ -634,7 +634,7 @@ static int cedrus_h264_start(struct cedrus_ctx *ctx) > > dma_free_attrs(dev->dev, CEDRUS_NEIGHBOR_INFO_BUF_SIZE, > > ctx->codec.h264.neighbor_info_buf, > > ctx->codec.h264.neighbor_info_buf_dma, > > - DMA_ATTR_NO_KERNEL_MAPPING); > > + 0); > > > > err_pic_buf: > > dma_free_attrs(dev->dev, ctx->codec.h264.pic_info_buf_size, > > @@ -670,7 +670,7 @@ static void cedrus_h264_stop(struct cedrus_ctx *ctx) > > dma_free_attrs(dev->dev, CEDRUS_NEIGHBOR_INFO_BUF_SIZE, > > ctx->codec.h264.neighbor_info_buf, > > ctx->codec.h264.neighbor_info_buf_dma, > > - DMA_ATTR_NO_KERNEL_MAPPING); > > + 0); > > dma_free_attrs(dev->dev, ctx->codec.h264.pic_info_buf_size, > > ctx->codec.h264.pic_info_buf, > > ctx->codec.h264.pic_info_buf_dma,