All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Norris <computersforpeace@gmail.com>
To: Stefan Agner <stefan@agner.ch>
Cc: Richard Weinberger <richard@nod.at>,
	Bhuvanchandra DV <bhuvanchandradv@gmail.com>,
	linux-mtd@lists.infradead.org
Subject: Re: UBIFS errors when file-system is full
Date: Mon, 31 Aug 2015 18:43:15 -0700	[thread overview]
Message-ID: <20150901014315.GF81844@google.com> (raw)
In-Reply-To: <812d355821f585decf92a66628caef7b@agner.ch>

Hi Stefan,

On Mon, Aug 31, 2015 at 05:45:47PM -0700, Stefan Agner wrote:
> On 2015-08-12 00:27, Richard Weinberger wrote:
> > Am 12.08.2015 um 09:01 schrieb Stefan Agner:
> >> [also added Brian to the discussion, since he had a look into that
> >> driver before]
> > 
> > Good idea, maybe Brian has an idea.

Sorry, didn't look until now.

> >> I worked on the driver since quite some time, currently v10 is in
> >> review. With this issue in mind, I went through the driver however I
> >> currently can't see an issue.
> >>
> >> The error position is always page aligned, but at different pages. We
> >> printed the reread buffers once: It seems that one page lands on flash
> >> twice. My guess is that the second page doesn't get transmitted
> >> properly, while the new column/row gets transmitted and
> >> NAND_CMD_PAGEPROG executed... Hence the same buffer would be written to
> >> the device again.
> >>
> >> The NFC IP in Vybrid (vf610) has a higher level programming model which
> >> takes care of the command sequencing. Therefore some callbacks are not
> >> actually sending a command to the device (e.g. NAND_CMD_SEQIN) since
> >> this will be done one command later, on in NAND_CMD_PAGEPROG. Now, of
> >> course, the driver relies heavily on not being interrupted by other
> >> requests in between, (also not read!) but I thought that this is taken
> >> care of by the MTD subsystem? So for me it is a bit hard to spot the
> >> error since I'm always unsure whether the assumptions regarding
> >> locking/exclusiveness between the calls is really guaranteed...
> > 
> > NAND access is serialized using nand_get_device() and nand_release_device()
> > in nand_base.c, so serialization should be fine.

Right, on a macro level. But I suspect you may have serialization
problems within your driver routines themselves. See below.

> I did some more testing with v11 of this driver, and it seems when I
> continuously reset the system, I get read errors from time to time. The
> read errors end up in various errors, I have seen ubifs_read_node: bad
> node type or ubifs_tnc_read_node: bad key in node. However, particularly
> often and interesting are ubifs_decompress: cannot decompress errors,
> such as this one. It seems that same page is read twice (or has been
> written twice, see data pattern at offset 0x00000000 and 0x00000800).
> 
[...]
> 
> I checked the code regarding glitches on read/write side which could
> lead to page number not getting updated or something along this line, so
> far I could not nail the issue.
> 
> Brian, with the pattern above in mind, do you see possible issues in the
> driver?

I don't have any particular knowledge about that pattern. Regarding
patterns, I've mostly been worried about incomplete "check for 0xff"
algorithms. Doesn't seem to be relevant here.

I don't know too much of this IP other than what I glean from general
NAND code review. With that in mind, this snippet does look a bit
suspect:

+static void vf610_nfc_done(struct vf610_nfc *nfc)
+{
+	unsigned long timeout = msecs_to_jiffies(100);
+
+	/*
+	 * Barrier is needed after this write. This write need
+	 * to be done before reading the next register the first
+	 * time.
+	 * vf610_nfc_set implicates such a barrier by using writel
+	 * to write to the register.
+	 */
+	vf610_nfc_set(nfc, NFC_IRQ_STATUS, IDLE_EN_BIT);
+	vf610_nfc_set(nfc, NFC_FLASH_CMD2, START_BIT);
+
+	if (!(vf610_nfc_read(nfc, NFC_IRQ_STATUS) & IDLE_IRQ_BIT)) {

^^^ this line

+		if (!wait_for_completion_timeout(&nfc->cmd_done, timeout))
+			dev_warn(nfc->dev, "Timeout while waiting for BUSY.\n");
+	}
+	vf610_nfc_clear_status(nfc);
+}

Why do you actually need to check the idle bit? If you need to read it
to clear out some FIFO, then that's fine -- just read it, but don't use
it as a second condition. The complete()/wait_for_completion()
synchronization should be sufficient on its own. (If not, then I think
you have other problems.)

The NFC_IRQ_STATUS-based condition looks like it could lead to races
because your interrupt may fire between setting and checking
the idle bit. So the IRQ handler will increment the completion struct
(cmd_done), but you *won't* be doing the corresponding decrement via
wait_for_completion(). If you don't do that... then the subsequent
wait_for_completion() will immediately succeed.

Anyway, that's a quick and possibly disorganized explanation of my
initial thoughts. Let me know if you
(1) don't understand me;
(2) disagree; or
(3) have tested my suggested fix and still see the behavior.

I can take another look if we're on to option (2) or (3) :)

Regards,
Brian

  reply	other threads:[~2015-09-01  1:43 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-13 13:55 UBIFS errors when file-system is full Bhuvanchandra DV
2015-07-13 14:09 ` Richard Weinberger
2015-07-14  4:23   ` Bhuvanchandra DV
2015-07-14  6:13     ` Richard Weinberger
2015-07-14  6:30       ` Bhuvanchandra DV
2015-07-14  6:32         ` Richard Weinberger
2015-07-14  8:29           ` Bhuvanchandra DV
2015-07-14  8:42             ` Richard Weinberger
2015-07-14 10:08               ` Bhuvanchandra DV
2015-07-14 11:06                 ` Richard Weinberger
     [not found]                   ` <55A74812.2020906@gmail.com>
     [not found]                     ` <55A74AA1.2000000@nod.at>
     [not found]                       ` <55A7540C.3050900@nod.at>
     [not found]                         ` <55A7592D.6010906@gmail.com>
     [not found]                           ` <55A75A05.7040603@nod.at>
2015-07-16  7:33                             ` Bhuvanchandra DV
2015-07-21  6:04                             ` Bhuvanchandra DV
2015-07-21  6:14                               ` Richard Weinberger
2015-07-22  7:10                                 ` Bhuvanchandra DV
2015-07-22  7:20                                   ` Richard Weinberger
2015-07-24 14:43                                     ` Bhuvanchandra DV
2015-07-24 16:51                                       ` Richard Weinberger
2015-07-31 16:47                                         ` Bhuvanchandra DV
2015-08-01  6:36                                           ` Richard Weinberger
2015-08-06 10:31                                             ` Bhuvanchandra DV
2015-08-07 12:37                                               ` Richard Weinberger
2015-08-12  7:01                                                 ` Stefan Agner
2015-08-12  7:27                                                   ` Richard Weinberger
2015-09-01  0:45                                                     ` Stefan Agner
2015-09-01  1:43                                                       ` Brian Norris [this message]
2015-09-01  2:20                                                         ` Stefan Agner
2015-09-02 19:58                                                         ` Stefan Agner
2015-09-02 20:13                                                           ` Brian Norris
2015-09-02 20:41                                                             ` Stefan Agner
2015-09-07 13:47                                                           ` Bhuvanchandra
2015-09-11  4:03                                           ` Bhuvanchandra
2015-09-12  9:39                                             ` Richard Weinberger
2015-09-13  4:45                                               ` Bhuvanchandra
2015-07-16  7:35                     ` Fwd: " Bhuvanchandra DV

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150901014315.GF81844@google.com \
    --to=computersforpeace@gmail.com \
    --cc=bhuvanchandradv@gmail.com \
    --cc=linux-mtd@lists.infradead.org \
    --cc=richard@nod.at \
    --cc=stefan@agner.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.