From mboxrd@z Thu Jan 1 00:00:00 1970 From: Pantelis Antoniou Date: Mon, 21 Sep 2015 18:32:51 +0300 Subject: [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds In-Reply-To: <20150918192758.GQ26226@bill-the-cat> References: <1440769821-24005-1-git-send-email-l.majewski@samsung.com> <1442225716.9459.11.camel@synopsys.com> <20150914132220.128a988c@amdc2363> <201509141536.17419.marex@denx.de> <20150917164333.10fdabec@amdc2363> <20150918003149.GG26226@bill-the-cat> <20150918093247.7b016180@amdc2363> <20150918192758.GQ26226@bill-the-cat> Message-ID: List-Id: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: u-boot@lists.denx.de Hi Tom, > On Sep 18, 2015, at 22:27 , Tom Rini wrote: > > On Fri, Sep 18, 2015 at 09:32:47AM +0200, Lukasz Majewski wrote: >> Hi Tom, >> >>> On Thu, Sep 17, 2015 at 04:43:33PM +0200, Lukasz Majewski wrote: >>> >>>> Hi Tom, >>>> >>>>> On Monday, September 14, 2015 at 01:22:20 PM, Lukasz Majewski >>>>> wrote: >>>>>> Hi Alexey, >>>>>> >>>>>>> Hi Marek, Lukasz, >>>>>>> >>>>>>> On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote: >>>>>>>> On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz >>>>>>>> Majewski wrote: >>>>>>>>> Hi Marek, >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> [...] >>>>>>>> >>>>>>>>>>>> Still we need to fix regression first with virtually >>>>>>>>>>>> infinite timeout :) I would even thing that simple >>>>>>>>>>>> revert of Marek's patch may make sense for now. >>>>>>>>>>> >>>>>>>>>>> +1 - unfortunately there were some other patches >>>>>>>>>>> applied to this particular patch. Simple revert might >>>>>>>>>>> be a bit tricky here. >>>>>>>>>> >>>>>>>>>> -1 - In case the card gets removed during the DMA >>>>>>>>>> transfer and the board doesn't have a watchdog, it will >>>>>>>>>> get stuck indefinitelly. >>>>>>>>> >>>>>>>>> I'm just wondering here - why the indefinite loop was >>>>>>>>> working previously? Was anybody complaining (on the ML) >>>>>>>>> about the problem of removing the SD card when some >>>>>>>>> operation is ongoing? >>>>>>>> >>>>>>>> It worked for me for all the workloads I used. Noone was >>>>>>>> complaining. >>>>>>> >>>>>>> The same story here - previous code with infinite loop was >>>>>>> working for my boards. And now I do see a problem with pretty >>>>>>> simple scenario that we do use in our products. >>>>>>> >>>>>>>>> The problem with potential removal of SD card (after >>>>>>>>> booting the board) is with us for quite long time. Even >>>>>>>>> with indefinite loop (without your patch) we also could >>>>>>>>> "hang" the board if the SD card was removed during a >>>>>>>>> transfer. >>>>>>>> >>>>>>>> Which is why we should weed out the unbounded loops. >>>>>>>> >>>>>>>>>> We >>>>>>>>>> absolutelly don't want this sort of behavior in U-Boot. >>>>>>>>>> I understand that this is the easiest way for everyone >>>>>>>>>> to achieve some sort of "working" solution, but it is >>>>>>>>>> definitelly not the correct one. While I do agree to >>>>>>>>>> increasing the timeout, I do not agree to unbounded >>>>>>>>>> loops, sorry. >>>>>>>>> >>>>>>>>> We have agreed to not agree :-) >>>>>>>> >>>>>>>> Yes :-) >>>>>>> >>>>>>> The first thing I care is working U-Boot v2015.10 out of the >>>>>>> box on my boards. And so I may agree on any temporary >>>>>>> solution. I see it as timeout value either being infinite or >>>>>>> obviously very high like 60 seconds. >>>>>>> >>>>>>> 60 seconds might sound stupid but my thought behind this is to >>>>>>> make sure even long transfers succeed. Imagine 100 Mb rootfs >>>>>>> or update file downloaded from slow SD-card. >>>>>> >>>>>> Transfer of rootfs to SD-card (downloaded to memory via tftp) is >>>>>> definitely valid scenario. >>>>>> >>>>>>>>>>>> From both points of view for keeping history >>>>>>>>>>>> clean (compared to stacked fixes/workarounds) and >>>>>>>>>>>> from removal of regression root cause. >>>>>>>>>>> >>>>>>>>>>> As I said before - +1 from me. >>>>>>>>>> >>>>>>>>>> As I said before, -1 from me. Btw. did anything regress >>>>>>>>>> in here? To me, this seems like a newly discovered >>>>>>>>>> bug ... >>>>>>>>> >>>>>>>>> Yes, this is a bug. We had similar problem with Samsung's >>>>>>>>> SDHCI, before we switched to dw_mmc. This issue is new at >>>>>>>>> dw_mmc. >>>>>>>>> >>>>>>>>>>>> It's not that I like to have infinite loops but >>>>>>>>>>>> given previous implementation worked fine for >>>>>>>>>>>> people in the previous U-Boot release. >>>>>>>>>>> >>>>>>>>>>> Good justification >>>>>>>>>> >>>>>>>>>> It is never a justified to return to a potentially >>>>>>>>>> problematic version >>>>>>>>> >>>>>>>>> IMHO revering the change (before the release) is from the >>>>>>>>> software development point of view better solution than >>>>>>>>> adding some heuristic delta to timeout. >>>>>>>>> >>>>>>>>>> for the sake of getting some sort of crappy hardware >>>>>>>>>> operational. >>>>>>>>> >>>>>>>>> Unfortunately this "crappy hardware" is pervasive and we >>>>>>>>> cannot do anything about it. >>>>>>>>> >>>>>>>>> To sum up (my point of view): >>>>>>>>> 1. The best would be to revert the patch - but if simple >>>>>>>>> "git revert" is not working then, >>>>>>> >>>>>>> Well even if clean revert won't work we may do manual tweaks >>>>>>> so that functionally it is "revert". If of any interest I may >>>>>>> come up with that sort of patch. >>>>>>> >>>>>>>>> 2. We should increase the timeout (with my patch) for >>>>>>>>> v2015.10 release >>>>>>> >>>>>>> If everybody is OK with that let's go do it. Because release >>>>>>> is around the corner and I don't want to explain each and >>>>>>> every user how to fix their new problem. >>>>>> >>>>>> v2015.10 correctness is crucial here. >>> >>> Yes. >>> >>>>>>>> Let's do this for the sake of crappy cards. >>>>>>>> >>>>>>>>> 3. After release we can devise some kind of solution >>>>>>>>> 4. It is a good topic for U-boot's minisummit discussion >>>>>>>>> at Dublin >>>>>>>>> >>>>>>>>> Marek, Alexey, Tom, Pantelis what do you think? >>>>>>>> >>>>>>>> I think yes. >>>>>>> >>>>>>> What's important we need to make sure Tom is aware of this >>>>>>> situation and he won't cut a release until our fix is in place >>>>>>> and all involved parties reported their happiness. >>>>>> >>>> >>>> >>>>>> I also think that Tom should speak up on this issue. >>>> >>>> Tom, could you give your opinion on this issue? >>> >>> Well, is there a concensus patch now? >>> >> >> There isn't a consensus patch. >> >> There are two options: >> 1. Try to revert changes (which remove endless loop) >> >> 2. Increase the delay to e.g. 4 minutes (as it is done in Linux) before >> v2015.10 and provide correct solution (based on internal SD card >> information) after the release. > > OK, lets go with #2. > Send me a patch with the delay to 4mins please (blergh). > -- > Tom Regards ? Pantelis