LKML Archive mirror
* Data corruption on serial interface under load
From: Andy Shevchenko @ 2016-02-04 18:55 UTC (permalink / raw)
  To: Russell King, Peter Hurley, linux-kernel@vger.kernel.org,
	linux-serial@vger.kernel.org

Hi!

Today I observed an interesting bug / feature of the UART layer in the
kernel. I have a setup which connects two identical devices by a serial
line. I ran a data transfer in one direction and got data corruption on
the receiver side (in the UART layer, not the driver).

Here is the dump from test suite and real data from 8250 registers:

=== 8< ===

Needed 16 reads 0 writes Oh oh, inconsistency at pos 1 (0x1).

Original sample:
00000000: 7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00   .ELF............
00000010: 02 00 03 00 01 00 00 00  19 8d 04 08 34 00 00 00   ............4...
00000020: 2c f2 00 00 00 00 00 00  34 00 20 00 04 00 28 00   ,.......4. ...(.

Received sample:
00000000: 7f 00 45 00 4c 00 46 00  01 00 01 00 01 00 00 00   ..E.L.F.........
00000010: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ................
00000020: 02 00 00 00 03 00 00 00  01 00 00 00 00 19 8d 04   ................
loops 1 / 1

cts: 0 dsr: 0 rng: 0 dcd: 0 rx: 53434 tx: 0 frame 0 ovr 34201 par: 0
brk: 0 buf_ovrr: 0

=== 8< ===

R 356.360109 IIR 0xc4
R 356.360114 LSR 0x63
R 356.360119 RX 0x7f
R 356.360124 LSR 0x63
R 356.360128 RX 0x45
R 356.360133 LSR 0x63
R 356.360137 RX 0x4c
R 356.360142 LSR 0x63
R 356.360147 RX 0x46
R 356.360151 LSR 0x63
R 356.360156 RX 0x01
R 356.360160 LSR 0x63
R 356.360165 RX 0x01
R 356.360169 LSR 0x63
R 356.360174 RX 0x01

As we can see, the data is corrupted on the Linux side. Can we somehow fix
this bug/feature?
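
For anyone decoding the register trace above, here is a minimal sketch that
turns an LSR value into bit names. The bit values are the standard 16550
ones (they match the UART_LSR_* constants in the kernel's serial_reg.h);
the helper name lsr_decode() is hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 16550 Line Status Register bits (same values as the kernel's UART_LSR_*). */
#define LSR_DR    0x01 /* receive data ready */
#define LSR_OE    0x02 /* overrun error */
#define LSR_PE    0x04 /* parity error */
#define LSR_FE    0x08 /* framing error */
#define LSR_BI    0x10 /* break indicator */
#define LSR_THRE  0x20 /* transmit holding register empty */
#define LSR_TEMT  0x40 /* transmitter empty */
#define LSR_FIFOE 0x80 /* error somewhere in the RX FIFO */

/* Hypothetical helper: write the names of the set bits into buf. */
const char *lsr_decode(unsigned int lsr, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} bits[] = {
		{ LSR_DR, "DR" }, { LSR_OE, "OE" }, { LSR_PE, "PE" },
		{ LSR_FE, "FE" }, { LSR_BI, "BI" }, { LSR_THRE, "THRE" },
		{ LSR_TEMT, "TEMT" }, { LSR_FIFOE, "FIFOE" },
	};
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
		if (!(lsr & bits[i].bit))
			continue;
		/* Separate names with a single space. */
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, bits[i].name, len - strlen(buf) - 1);
	}
	return buf;
}
```

With this, the value 0x63 seen on every LSR read decodes to data ready,
overrun error, and an idle transmitter (THRE and TEMT).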

-- 
With Best Regards,
Andy Shevchenko

* Re: Data corruption on serial interface under load
From: Peter Hurley @ 2016-02-04 20:06 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Russell King, linux-kernel@vger.kernel.org,
	linux-serial@vger.kernel.org

Hi Andy,

On 02/04/2016 10:55 AM, Andy Shevchenko wrote:
> Hi!
> 
> Today I observed interesting bug / feature of uart layer in the kernel.
> I do have a setup which connects two identical devices by serial line.
> I run data transferring in one direction and got data corruption on
> receiver side (in uart layer, not the driver).
> 
> Here is the dump from test suite and real data from 8250 registers:
> 
> === 8< ===
> 
> Needed 16 reads 0 writes Oh oh, inconsistency at pos 1 (0x1).
> 
> Original sample:
> 00000000: 7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00   .ELF............
> 00000010: 02 00 03 00 01 00 00 00  19 8d 04 08 34 00 00 00   ............4...
> 00000020: 2c f2 00 00 00 00 00 00  34 00 20 00 04 00 28 00   ,.......4. ...(.
> 
> Received sample:
> 00000000: 7f 00 45 00 4c 00 46 00  01 00 01 00 01 00 00 00   ..E.L.F.........
> 00000010: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ................
> 00000020: 02 00 00 00 03 00 00 00  01 00 00 00 00 19 8d 04   ................
> loops 1 / 1
> 
> cts: 0 dsr: 0 rng: 0 dcd: 0 rx: 53434 tx: 0 frame 0 ovr 34201 par: 0
> brk: 0 buf_ovrr: 0
> 
> === 8< ===
> 
> R 356.360109 IIR 0xc4           RDI interrupt
> R 356.360114 LSR 0x63           DR + OE
> R 356.360119 RX 0x7f
> R 356.360124 LSR 0x63           DR + still OE
> R 356.360128 RX 0x45
> R 356.360133 LSR 0x63           DR + still OE
> R 356.360137 RX 0x4c
> R 356.360142 LSR 0x63           DR + still OE
> R 356.360147 RX 0x46
> R 356.360151 LSR 0x63           DR + still OE
> R 356.360156 RX 0x01
> R 356.360160 LSR 0x63           DR + still OE
> R 356.360165 RX 0x01
> R 356.360169 LSR 0x63
> R 356.360174 RX 0x01
> 
> As we can see the data is corrupted on Linux side. Can we somehow fix
> this bug/feature?

Not quite sure what you see as the issue.

1) That is a lot of overruns. Is that part of the test or are the overruns
   a regression?
2) If you mean the NUL bytes for overruns, I could have some functional mode
   mis-branched in the N_TTY line discipline. What are the termios settings
   on the rx side?

Regards,
Peter Hurley

* Re: Data corruption on serial interface under load
From: Andy Shevchenko @ 2016-02-04 22:24 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Russell King, linux-kernel@vger.kernel.org,
	linux-serial@vger.kernel.org

On Thu, Feb 4, 2016 at 10:06 PM, Peter Hurley <peter@hurleysoftware.com> wrote:
> Not quite sure what you see as the issue.
>
> 1) That is a lot of overruns. Is that part of the test or are the overruns
>    a regression?

This is part of the other problem which I'm investigating.

> 2) If you mean the NUL bytes for overruns, I could have some functional mode
>    mis-branched in the N_TTY line discipline.

Yeah, this one.

> What are the termios settings
>    on the rx side?

I'm using this tool [1] with a small patch applied that enables internal
loopback (TIOCM_LOOP).

[1] https://git.breakpoint.cc/cgit/bigeasy/serialcheck.git/

-- 
With Best Regards,
Andy Shevchenko

* Re: Data corruption on serial interface under load
From: Andy Shevchenko @ 2016-02-04 22:27 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Russell King, linux-kernel@vger.kernel.org,
	linux-serial@vger.kernel.org

On Fri, Feb 5, 2016 at 12:24 AM, Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> On Thu, Feb 4, 2016 at 10:06 PM, Peter Hurley <peter@hurleysoftware.com> wrote:

>>> Original sample:
>>> 00000000: 7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00   .ELF............
>>> 00000010: 02 00 03 00 01 00 00 00  19 8d 04 08 34 00 00 00   ............4...
>>> 00000020: 2c f2 00 00 00 00 00 00  34 00 20 00 04 00 28 00   ,.......4. ...(.
>>>
>>> Received sample:
>>> 00000000: 7f 00 45 00 4c 00 46 00  01 00 01 00 01 00 00 00   ..E.L.F.........
>>> 00000010: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ................
>>> 00000020: 02 00 00 00 03 00 00 00  01 00 00 00 00 19 8d 04   ................

>> 2) If you mean the NUL bytes for overruns, I could have some functional mode
>>    mis-branched in the N_TTY line discipline.
>
> Yeah, this one.
>
>> What are the termios settings
>>    on the rx side?
>
> I'm using this tool [1] with a small patch applied that enables internal
> loopback (TIOCM_LOOP).

Here are the calls

ret = cfsetspeed(&new_term, opts.baudrate);
cfmakeraw(&new_term);
new_term.c_cflag |= CREAD;
new_term.c_cflag &= ~CRTSCTS;
new_term.c_cc[VMIN] = 64;
new_term.c_cc[VTIME] = 8;
...
ret = tcflush(fd, TCIFLUSH);
ret = fcntl(fd, F_SETFL, 0);

> [1] https://git.breakpoint.cc/cgit/bigeasy/serialcheck.git/

-- 
With Best Regards,
Andy Shevchenko

* Re: Data corruption on serial interface under load
From: Russell King - ARM Linux @ 2016-02-04 23:15 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Peter Hurley, linux-kernel@vger.kernel.org,
	linux-serial@vger.kernel.org

On Thu, Feb 04, 2016 at 08:55:48PM +0200, Andy Shevchenko wrote:
> cts: 0 dsr: 0 rng: 0 dcd: 0 rx: 53434 tx: 0 frame 0 ovr 34201 par: 0
> brk: 0 buf_ovrr: 0
> 
> === 8< ===
> 
> R 356.360109 IIR 0xc4
> R 356.360114 LSR 0x63
> R 356.360119 RX 0x7f

I think the obvious question here is: why is your serial port reporting
overrun errors in loopback mode?

If you have no flow control, I suspect this is likely to happen: if we
try to fill the Tx FIFO, we won't be servicing the port trying to receive
characters.

So if (eg) the port already contains 12 characters in the RX FIFO, and
we load up a full complement of characters into the TX FIFO, the port
will transmit them to the RX side.  As we will not be reading the RX
side (as we're busy loading the TX side), if we fill the RX FIFO, you'll
then get overruns.

Even so, with a dumb 8250 based UART, there's no hardware assisted flow
control, so it's never going to be particularly reliable.  More modern
UARTs have realised this, and have implemented hardware (and software)
flow control mechanisms in hardware to reduce the chances of overruns.
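
The arithmetic in the example above can be sketched with a toy model. It
assumes a 16-byte RX FIFO (typical of 16550-class parts, but an assumption
here), ignores the receive shift register, and assumes nothing drains the
FIFO while the burst arrives. The function name rx_overruns() is
hypothetical.

```c
#include <assert.h>

#define FIFO_DEPTH 16 /* assumption: 16550-style 16-byte RX FIFO */

/*
 * Toy model: number of bytes lost when tx_burst looped-back bytes arrive
 * at an RX FIFO that already holds rx_fill bytes and is not read meanwhile.
 */
unsigned int rx_overruns(unsigned int rx_fill, unsigned int tx_burst)
{
	unsigned int room;

	assert(rx_fill <= FIFO_DEPTH);
	room = FIFO_DEPTH - rx_fill;
	return tx_burst > room ? tx_burst - room : 0;
}
```

With 12 bytes already queued and a full 16-byte TX load, only 4 bytes fit
and the model reports 12 lost, which is the situation described above.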

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

* Re: Data corruption on serial interface under load
From: Russell King - ARM Linux @ 2016-02-04 23:16 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Peter Hurley, linux-kernel@vger.kernel.org,
	linux-serial@vger.kernel.org

On Fri, Feb 05, 2016 at 12:27:19AM +0200, Andy Shevchenko wrote:
> Here are the calls
> 
> ret = cfsetspeed(&new_term, opts.baudrate);
> cfmakeraw(&new_term);
> new_term.c_cflag |= CREAD;
> new_term.c_cflag &= ~CRTSCTS;

So no hardware flow control...

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

* Re: Data corruption on serial interface under load
From: Andy Shevchenko @ 2016-02-04 23:19 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Peter Hurley, linux-kernel@vger.kernel.org,
	linux-serial@vger.kernel.org

On Fri, Feb 5, 2016 at 1:15 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Thu, Feb 04, 2016 at 08:55:48PM +0200, Andy Shevchenko wrote:
>> Hi!
>>
>> Today I observed interesting bug / feature of uart layer in the kernel.
>> I do have a setup which connects two identical devices by serial line.
>> I run data transferring in one direction and got data corruption on
>> receiver side (in uart layer, not the driver).
>>
>> Here is the dump from test suite and real data from 8250 registers:
>>
>> === 8< ===
>>
>> Needed 16 reads 0 writes Oh oh, inconsistency at pos 1 (0x1).
>>
>> Original sample:
>> 00000000: 7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00   .ELF............
>> 00000010: 02 00 03 00 01 00 00 00  19 8d 04 08 34 00 00 00   ............4...
>> 00000020: 2c f2 00 00 00 00 00 00  34 00 20 00 04 00 28 00   ,.......4. ...(.
>>
>> Received sample:
>> 00000000: 7f 00 45 00 4c 00 46 00  01 00 01 00 01 00 00 00   ..E.L.F.........
>> 00000010: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ................
>> 00000020: 02 00 00 00 03 00 00 00  01 00 00 00 00 19 8d 04   ................
>> loops 1 / 1
>>
>> cts: 0 dsr: 0 rng: 0 dcd: 0 rx: 53434 tx: 0 frame 0 ovr 34201 par: 0
>> brk: 0 buf_ovrr: 0
>>
>> === 8< ===
>>
>> R 356.360109 IIR 0xc4
>> R 356.360114 LSR 0x63
>> R 356.360119 RX 0x7f
>
> I think the obvious question here is: why is your serial port reporting
> overrun errors in loopback mode.
>
> If you have no flow control, I suspect this is likely to happen: if we
> try to fill the Tx FIFO, we won't be servicing the port trying to receive
> characters.
>
> So if (eg) the port already contains 12 characters in the RX FIFO, and
> we load up a full complement of characters into the TX FIFO, the port
> will transmit them to the RX side.  As we will not be reading the RX
> side (as we're busy loading the TX side), if we fill the RX FIFO, you'll
> then get overruns.
>
> Even so, with a dumb 8250 based UART, there's no hardware assisted flow
> control, so it's never going to be particularly reliable.  More modern
> UARTs have realised this, and have implemented hardware (and software)
> flow control mechanisms in hardware to reduce the chances of overruns.
>

Yeah, the above makes sense to me, but that is another issue I'm
investigating. The issue I complained about is the additional '\0'
characters (it seems uart_insert_char() does this).


-- 
With Best Regards,
Andy Shevchenko

* Re: Data corruption on serial interface under load
From: Russell King - ARM Linux @ 2016-02-05  1:09 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Peter Hurley, linux-kernel@vger.kernel.org,
	linux-serial@vger.kernel.org

On Fri, Feb 05, 2016 at 01:19:44AM +0200, Andy Shevchenko wrote:
> Yeah, above makes sense to me, but that is another issue I'm
> investigating. The issue I complained about is additional '\0'
> characters (seems uart_insert_char() does this).

Firstly, let's establish why this happens.  When an overflow error occurs,
the hardware has received a character for which it had no room in its
receive FIFO, and so the character is discarded.  However, the UART
records that act in a flag.

Sensible ports attach the flag to the preceding character so that software
can read the successfully received characters without needing to care for
the overflow.

The Linux behaviour on encountering an overflow condition is to "undo"
the discarding: a NUL character is inserted into the stream which is
marked with a TTY_OVERRUN status.  (Standard Linux behaviour is to mark
the in-error characters with their error status if they are to be
received.)

When in-band error reporting to the application is disabled, this appears
as a plain NUL character.

I think the issue here is "if they are to be received".  If you have
cleared IGNBRK, break characters will be reported as a NUL character.  If
IGNPAR is clear, a character with incorrect parity could be reported to
the application as a NUL character (it depends on other settings).

Overflow is not covered in the standard termios modes, and it's been
standard Linux behaviour to pass these through unless both IGNPAR and
IGNBRK are set.
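
A rough user-space model of the behaviour described above, not the kernel
code itself: the received byte is stored, and an extra NUL follows whenever
the overrun bit is set and not masked. Only the overrun and break bits are
modelled; the names ignore_mask() and model_insert() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>

#define LSR_OE 0x02 /* overrun error */
#define LSR_BI 0x10 /* break indicator */

/*
 * Simplified: which LSR error bits get dropped for a given c_iflag.
 * Overrun is only ignored when both IGNBRK and IGNPAR are set.
 */
unsigned int ignore_mask(tcflag_t iflag)
{
	unsigned int mask = 0;

	if (iflag & IGNBRK) {
		mask |= LSR_BI;
		if (iflag & IGNPAR)
			mask |= LSR_OE;
	}
	return mask;
}

/*
 * Toy stand-in for the receive path: store the byte, then a NUL
 * placeholder if an unmasked overrun was flagged. Returns bytes written.
 */
size_t model_insert(unsigned char *out, size_t cap, unsigned int lsr,
		    unsigned int ignore, unsigned char ch)
{
	size_t n = 0;

	assert(out != NULL);
	if (n < cap)
		out[n++] = ch;
	if ((lsr & ~ignore & LSR_OE) && n < cap)
		out[n++] = 0;
	return n;
}
```

Feeding 0x7f, 0x45, 0x4c with LSR 0x63 and an empty mask yields
7f 00 45 00 4c 00, the pattern in the received sample; with both flags
set the NULs disappear.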

cfmakeraw clears IGNBRK (and does not set IGNPAR), which means it's not
in "real raw" mode.  If you want to ignore parity, break, framing and
overflow errors in the resulting byte stream, you need to ensure IGNPAR
and IGNBRK are both set.
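
As a minimal sketch of that advice: set both flags after cfmakeraw(). The
wrapper name make_real_raw() is hypothetical; cfmakeraw(), IGNBRK and
IGNPAR are the standard termios interfaces.

```c
#define _DEFAULT_SOURCE /* for cfmakeraw() on glibc */
#include <assert.h>
#include <stddef.h>
#include <termios.h>

/* Put *t into raw mode and also drop break/parity/overrun placeholders. */
void make_real_raw(struct termios *t)
{
	assert(t != NULL);
	cfmakeraw(t);
	t->c_iflag |= IGNBRK | IGNPAR;
}
```

A caller would typically fill the structure with tcgetattr(), call this,
and apply it with tcsetattr(fd, TCSANOW, &t). Note that this only hides
the NUL placeholders; the overrun bytes themselves are still lost.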

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

* Re: Data corruption on serial interface under load
From: Andy Shevchenko @ 2016-02-08  8:51 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Peter Hurley, linux-kernel@vger.kernel.org,
	linux-serial@vger.kernel.org

On Fri, Feb 5, 2016 at 3:09 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> cfmakeraw clears IGNBRK (and does not set IGNPAR), which means it's not
> in "real raw" mode.  If you want to ignore parity, break, framing and
> overflow errors in the resulting byte stream, you need to ensure IGNPAR
> and IGNBRK are both set.

Thank you for such a detailed explanation!

-- 
With Best Regards,
Andy Shevchenko
