In-Reply-To: <20160205010906.GK10826@n2100.arm.linux.org.uk>
References: <20160204231528.GI10826@n2100.arm.linux.org.uk>
 <20160205010906.GK10826@n2100.arm.linux.org.uk>
Date: Mon, 8 Feb 2016 10:51:26 +0200
Subject: Re: Data corruption on serial interface under load
From: Andy Shevchenko
To: Russell King - ARM Linux
Cc: Peter Hurley, "linux-kernel@vger.kernel.org",
 "linux-serial@vger.kernel.org"

On Fri, Feb 5, 2016 at 3:09 AM, Russell King - ARM Linux wrote:
> On Fri, Feb 05, 2016 at 01:19:44AM +0200, Andy Shevchenko wrote:
>> On Fri, Feb 5, 2016 at 1:15 AM, Russell King - ARM Linux
>> wrote:
>> > On Thu, Feb 04, 2016 at 08:55:48PM +0200, Andy Shevchenko wrote:
>> >> Hi!
>> >>
>> >> Today I observed interesting bug / feature of uart layer in the kernel.
>> >> I do have a setup which connects two identical devices by serial line.
>> >> I run data transferring in one direction and got data corruption on
>> >> receiver side (in uart layer, not the driver).
>> >>
>> >> Here is the dump from test suite and real data from 8250 registers:
>> >>
>> >> === 8< ===
>> >>
>> >> Needed 16 reads 0 writes Oh oh, inconsistency at pos 1 (0x1).
>> >>
>> >> Original sample:
>> >> 00000000: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00  .ELF............
>> >> 00000010: 02 00 03 00 01 00 00 00 19 8d 04 08 34 00 00 00  ............4...
>> >> 00000020: 2c f2 00 00 00 00 00 00 34 00 20 00 04 00 28 00  ,.......4. ...(.
>> >>
>> >> Received sample:
>> >> 00000000: 7f 00 45 00 4c 00 46 00 01 00 01 00 01 00 00 00  ..E.L.F.........
>> >> 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>> >> 00000020: 02 00 00 00 03 00 00 00 01 00 00 00 00 19 8d 04  ................
>> >> loops 1 / 1
>> >>
>> >> cts: 0 dsr: 0 rng: 0 dcd: 0 rx: 53434 tx: 0 frame 0 ovr 34201 par: 0
>> >> brk: 0 buf_ovrr: 0
>> >>
>> >> === 8< ===
>> >>
>> >> R 356.360109 IIR 0xc4
>> >> R 356.360114 LSR 0x63
>> >> R 356.360119 RX 0x7f
>> >
>> > I think the obvious question here is: why is your serial port reporting
>> > overrun errors in loopback mode.
>> >
>> > If you have no flow control, I suspect this is likely to happen: if we
>> > try to fill the Tx FIFO, we won't be servicing the port trying to receive
>> > characters.
>> >
>> > So if (eg) the port already contains 12 characters in the RX FIFO, and
>> > we load up a full complement of characters into the TX FIFO, the port
>> > will transmit them to the RX side. As we will not be reading the RX
>> > side (as we're busy loading the TX side), if we fill the RX FIFO, you'll
>> > then get overruns.
>> >
>> > Even so, with a dumb 8250 based UART, there's no hardware assisted flow
>> > control, so it's never going to be particularly reliable. More modern
>> > UARTs have realised this, and have implemented hardware (and software)
>> > flow control mechanisms in hardware to reduce the chances of overruns.
>> >
>>
>> Yeah, above makes sense to me, but that is another issue I'm
>> investigating. The issue I complained about is additional '\0'
>> characters (seems uart_insert_char() does this).
>
> Firstly, let's establish why this happens. When an overflow error occurs,
> what has happened is that a character was received by the hardware which
> it had no room in its receive FIFO, and so the character is discarded.
> However, the UART records that act in a flag.
>
> Sensible ports attach the flag to the preceding character so that software
> can read the successfully received characters without needing to care for
> the overflow.
>
> The Linux behaviour on encountering an overflow condition is to "undo"
> the discarding: a NUL character is inserted into the stream which is
> marked with a TTY_OVERRUN status. (Standard Linux behaviour is to mark
> the in-error characters with their error status if they are to be
> received.)
>
> When in-band error reporting to the application is disabled, this appears
> as a plain NUL character.
>
> I think the issue here is "if they are to be received". If you have
> cleared IGNBRK, break characters will be reported as NUL character. If
> IGNPAR is clear, a character with incorrect parity could be reported to
> the application as a NUL character (it depends on other settings.)
>
> Overflow is not covered in the standard termios modes, and it's been
> standard Linux behaviour to pass these through unless both IGNPAR and
> IGNBRK are set.
>
> cfmakeraw clears IGNPAR, which means it's not in "real raw" mode. If
> you want to ignore parity, break, framing and overflow errors in the
> resulting byte stream, you need to ensure IGNPAR and IGNBRK are both
> set.

Thank you for such a detailed explanation!

-- 
With Best Regards,
Andy Shevchenko
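
The closing advice in the quoted explanation (set both IGNPAR and IGNBRK so
that the NUL placeholders for overruns are not passed through) can be
sketched from user space roughly as follows. This is a minimal illustration
using Python's standard termios module, not code from the thread; the
function names ignore_line_errors() and apply_to_port() are hypothetical, and
the claim that setting both flags suppresses overrun NULs is taken from the
explanation above rather than verified here.

```python
import termios

# Index of c_iflag in the list returned by termios.tcgetattr():
# [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
IFLAG = 0
CC = 6


def ignore_line_errors(attrs):
    """Return a copy of a tcgetattr()-style list with IGNPAR and IGNBRK set.

    Hypothetical helper: it only edits the flag word and leaves every
    other setting (baud rate, raw/cooked mode, ...) as it was.
    """
    new_attrs = list(attrs)
    new_attrs[CC] = list(attrs[CC])  # do not share the control-char list
    new_attrs[IFLAG] |= termios.IGNPAR | termios.IGNBRK
    return new_attrs


def apply_to_port(fd):
    """Apply the flags to an already opened serial port descriptor."""
    attrs = termios.tcgetattr(fd)
    termios.tcsetattr(fd, termios.TCSANOW, ignore_line_errors(attrs))
```

A program that already calls cfmakeraw() (or an equivalent) would apply this
afterwards. Note that cfmakeraw(3) documents IGNBRK among the input flags it
clears, so after that call at least one of the two flags is unset either way.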