Subject: Re: RAID10 Balancing Request for Comments and Advices
From: Vincent Olivier
Date: Wed, 17 Jun 2015 09:46:50 -0400
To: linux-btrfs@vger.kernel.org

> On Jun 16, 2015, at 7:58 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> 
> Vincent Olivier posted on Tue, 16 Jun 2015 09:34:29 -0400 as excerpted:
> 
>>> On Jun 16, 2015, at 8:25 AM, Hugo Mills wrote:
>>> 
>>> On Tue, Jun 16, 2015 at 08:09:17AM -0400, Vincent Olivier wrote:
>>>> 
>>>> My first question is this: is it normal to have “single” blocks?
>>>> Why not only RAID10? I don’t remember the exact mkfs options I used,
>>>> but I certainly didn’t ask for “single”, so this is unexpected.
>>> 
>>> Yes. It's an artefact of the way that mkfs works. If you run a
>>> balance on those chunks, they'll go away. (btrfs balance start
>>> -dusage=0 -musage=0 /mountpoint)
>> 
>> Thanks! I did, and it did go away, except for the “GlobalReserve, single:
>> total=512.00MiB, used=0.00B”. But I suppose this is a permanent fixture,
>> right?
> 
> Yes. GlobalReserve is for short-term btrfs-internal use, reserved for
> times when btrfs needs to (temporarily) allocate some space in order to
> free space, etc. It's always single, and you'll rarely see anything but
> 0 used except perhaps in the middle of a balance or something.

Got it. Thanks.

Is there any way to put that on another device, say, an SSD? I am thinking
of backing up this RAID10 on a 2x8TB device-managed SMR RAID1, and I want
to minimize random write operations (noatime et al.). I will maybe start a
new thread for that, but first, is there something substantial I can read
about btrfs+SMR? Or should I avoid SMR+btrfs?
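For reference, the chunk cleanup Hugo suggested above, plus the listing where
the leftover “single” chunks and GlobalReserve show up, is roughly the
following — a minimal sketch, assuming the filesystem is mounted at
/mnt/raid10 (the mountpoint is illustrative):

    # remove the empty "single" data/metadata chunks left over from mkfs
    btrfs balance start -dusage=0 -musage=0 /mnt/raid10

    # verify: the single data/metadata chunks should be gone;
    # GlobalReserve remains and is always reported as "single"
    btrfs fi df /mnt/raid10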
>>> For maintenance, I would suggest running a scrub regularly, to
>>> check for various forms of bitrot. Typical frequencies for a scrub are
>>> once a week or once a month -- opinions vary (as do runtimes).
>> 
>> Yes. I cronned it weekly for now. Takes about 5 hours. Is it
>> automatically corrected on RAID10, since a copy of it exists within the
>> filesystem? What happens for RAID0?
> 
> For raid10 (and the raid1 I use), yes, it's corrected, from the other
> existing copy, assuming it's good, tho if there are metadata checksum
> errors, there may be corresponding unverified checksums as well, where
> the verification couldn't be done because the metadata containing the
> checksums was bad. Thus, if there are errors found and corrected, and
> you see unverified errors as well, rerun the scrub, so the newly
> corrected metadata can now be used to verify the previously unverified
> errors.

OK then, rule of thumb: re-run the scrub on “unverified checksum error(s)”.
I have yet to see a checksum error, but I will keep it in mind.
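For what it's worth, the weekly cron job I have in mind is roughly the sketch
below — only a sketch: the mountpoint is illustrative, and the grep assumes
the “unverified errors:” wording of the btrfs-progs scrub status output,
which may vary between versions:

    #!/bin/sh
    # e.g. /etc/cron.weekly/btrfs-scrub (illustrative path)
    MNT=/mnt/raid10

    # -B: stay in the foreground until done, -d: per-device statistics
    btrfs scrub start -Bd "$MNT"

    # rule of thumb from above: if errors were left unverified, run a
    # second pass so the freshly repaired metadata can verify them
    if btrfs scrub status "$MNT" | grep -q "unverified errors: [1-9]"; then
        btrfs scrub start -Bd "$MNT"
    fi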
> I'm presently getting a lot of experience with this, as one of the ssds in
> my raid1 is gradually failing and rewriting sectors. Generally what
> happens is that the ssd will take too long, triggering a SATA reset (30
> second timeout), and btrfs will call that an error. The scrub then
> rewrites the bad copy on the unreliable device with the good copy from
> the more reliable device, with the write triggering a sector relocation
> on the bad device. The newly written copy then checks out good, but if
> it was metadata, it very likely contained checksums for several other
> blocks, which couldn't be verified because the block containing their
> checksums was itself bad. Typically I'll see dozens to a couple hundred
> unverified errors for every bad metadata block rewritten in this way.
> Rerunning the scrub then either verifies or fixes the previously
> unverified blocks, tho sometimes one of those in turn ends up bad, and if
> it's a metadata block, I may end up rerunning the scrub another time or
> two, until everything checks out.
> 
> FWIW, on the bad device, smartctl -A reports (excerpted):
> 
> ID# ATTRIBUTE_NAME           FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
>   5 Reallocated_Sector_Ct    0x0032 098   098   036    Old_age  Always  -           259
> 182 Erase_Fail_Count_Total   0x0032 100   100   000    Old_age  Always  -           132
> 
> While on the paired good device:
> 
>   5 Reallocated_Sector_Ct    0x0032 253   253   036    Old_age  Always  -           0
> 182 Erase_Fail_Count_Total   0x0032 253   253   000    Old_age  Always  -           0
> 
> Meanwhile, smartctl -H has already warned once that the device is
> failing, tho it went back to passing status again, but as of now it's
> saying failing, again. The attribute that actually registers as failing,
> again from the bad device, followed by the good, is:
> 
>   1 Raw_Read_Error_Rate      0x000f 001   001   006    Pre-fail Always  FAILING_NOW 3081
> 
>   1 Raw_Read_Error_Rate      0x000f 160   159   006    Pre-fail Always  -           41
> 
> When it's not actually reporting failing, the FAILING_NOW status is
> replaced with IN_THE_PAST.
> 
> 250 Read_Error_Retry_Rate is the other attribute of interest, with values
> of 100 current and worst for both devices, threshold 0, but a raw value
> of 2488 for the good device and over 17,000,000 for the failing device.
> But with the "cooked" value never moving from 100, and with no real
> guidance on how to interpret the raw values, while it's interesting,
> I am left relying on the others for indicators I can actually understand.
> 
> The 5 and 182 raw counts have been increasing gradually over time, and I
> scrub every time I do a major update, with another reallocated sector or
> two often appearing. But as long as the paired good device keeps its zero
> count and I have backups (as I do!), btrfs is actually allowing me to
> continue using the unreliable device, relying on btrfs checksums and
> scrubbing to keep it usable. And FWIW, I do have another device ready to
> go in when I decide I've had enough of this, but as long as I have
> backups and btrfs scrub keeps things fixed up, there's no real hurry
> unless I decide I'm tired of dealing with it. Meanwhile, I'm having a
> bit of morbid fun watching as it slowly decays, getting experience of
> the process in a reasonably controlled setting without serious danger
> to my data, since it is backed up.

You sure have morbid inclinations! ;-)

Out of curiosity, what are the frequency and sequence of the smartctl
long/short tests + btrfs scrubs? Is it all automated?

> As for raid0 (and single), there's only one copy. Btrfs detects checksum
> failure as it does above, but since there's only the one copy, if it's
> bad, well, for data you simply can't access that file any longer. For
> metadata, you can't access whatever directories and files it referenced
> any longer. (FWIW, for the truly desperate who hope that at least some of
> it can be recovered, even if it's not a bit-perfect match, there's a btrfs
> command that wipes the checksum tree, which will let you access the
> previously bad-checksum files again, but it works on the entire
> filesystem, so it's all or nothing, and of course with known corruption,
> there are no guarantees.)

But is it possible to manually correct the corruption by overwriting the
corrupted files with a copy from a backup? I mean, is there enough
information reported in order to do that?

Thanks!

v
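P.S. From what I can tell, when scrub hits a data block it cannot fix, the
kernel log usually includes the affected file's path, so the manual restore I
am asking about could look roughly like the sketch below — the paths are
illustrative, and the exact log wording varies by kernel version:

    # list the paths that scrub flagged (log wording varies by kernel)
    dmesg | grep -i "checksum error" | grep -o "path: [^)]*"

    # then overwrite each damaged file with the known-good backup copy
    cp --preserve=all /backup/photos/img_0001.jpg /mnt/raid10/photos/img_0001.jpg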