* Issue with RHEL6 mkfs.xfs (3.1.1+), HP P420 RAID, and MySQL replication
@ 2015-07-09 17:32 Hogan Whittall
  2015-07-09 19:05 ` Brian Foster
  2015-07-09 23:02 ` Dave Chinner
  0 siblings, 2 replies; 9+ messages in thread
From: Hogan Whittall @ 2015-07-09 17:32 UTC (permalink / raw
  To: xfs@oss.sgi.com



Hello,
Recently we encountered a previously reported issue regarding write amplification with MySQL replication and XFS when used with certain RAID controllers (in our case, the HP P420).  That issue exactly matches ours and was documented by someone else here - http://oss.sgi.com/archives/xfs/2013-03/msg00133.html - but I don't see any resolution.  I will say that the problem *does not* exist when mkfs.xfs 2.9.6 is used to format the filesystem on RHEL6, as that sets sunit=0 and swidth=0 instead of setting them based on minimum_io_size and optimal_io_size.
We have systems that are identical in how they are built and configured.  We can take a RHEL6 box that has the MySQL partition formatted with mkfs.xfs v3.1.1 and reproduce the write amplification problem with MySQL replication every single time.  If we take the same box, format the MySQL partition with mkfs.xfs 2.9.6, and then bring up MySQL with the exact same configuration, there is no problem.  I've included the working and broken settings below.  If it's not the sunit/swidth settings, then what would cause 7-10MB/s worth of writes to the XFS partition to become over 200MB/s downstream?  The actual data change on the disks is not 200MB/s, but because the write ops are truly being amplified and not just misreported, our MySQL slaves with the bad XFS settings cannot keep up and the lag steadily increases with no hope of ever becoming current.
I am happy to try some other settings/options with the RHEL6 mkfs.xfs to see if replication performance can match that of systems formatted with mkfs.xfs 2.9.6, but the values set by 3.1.1 with the P420 RAID do not work for MySQL replication.  We have ruled out everything else as a possible cause; the only difference between these systems is the values set by mkfs.xfs.
============================================================
Working RHEL6 XFS partition:
meta-data=/dev/mapper/sys-home   isize=256    agcount=4, agsize=71271680 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=285086720, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
============================================================
Broken RHEL6 XFS partition:
meta-data=/dev/mapper/sys-home   isize=256    agcount=32, agsize=8908992 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=285086720, imaxpct=5
         =                       sunit=64     swidth=128 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=139264, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
============================================================

Thanks!
-Hogan


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Issue with RHEL6 mkfs.xfs (3.1.1+), HP P420 RAID, and MySQL replication
  2015-07-09 17:32 Issue with RHEL6 mkfs.xfs (3.1.1+), HP P420 RAID, and MySQL replication Hogan Whittall
@ 2015-07-09 19:05 ` Brian Foster
  2015-07-09 19:23   ` Hogan Whittall
  2015-07-09 23:02 ` Dave Chinner
  1 sibling, 1 reply; 9+ messages in thread
From: Brian Foster @ 2015-07-09 19:05 UTC (permalink / raw
  To: Hogan Whittall; +Cc: xfs@oss.sgi.com

On Thu, Jul 09, 2015 at 05:32:50PM +0000, Hogan Whittall wrote:
> Hello,
> Recently we encountered a previously-reported issue regarding write amplification with MySQL replication and XFS when used with certain RAID controllers (In our case, HP P420).  That issue exactly matches our issue and was documented by someone else here - http://oss.sgi.com/archives/xfs/2013-03/msg00133.html - but I don't see any resolution.  I will say that the problem *does not* exist when mkfs.xfs 2.9.6 is used to format the filesystem on RHEL6 as that sets sunit=0 and swidth=0 instead of setting based on minimum_io_size and optimal_io_size.

I'm not very familiar with MySQL and thus not sure what your workload
is, but either version of mkfs.xfs should support setting options such
that the fs is formatted as with the defaults of another version...

> We have systems that are identical in how they are built and configured, we can take a RHEL6 box that has the MySQL partition formatted with mkfs.xfs v3.1.1 and reproduce the write amplification problem with MySQL replication every single time.  If we take the same box and format the MySQL partition with mkfs.xfs 2.9.6, then bring up MySQL with the exact same configuration there is no problem.  I've included the working and broken settings below.  If it's not the sunit/swidth settings then what will cause 7-10MB/s worth of writes to the XFS partition to become over 200MB/s downstream?  The actual data change on the disks is not 200MB/s, but because the write ops are truly being amplified and not just being misreported our MySQL slaves with the bad XFS settings cannot keep up and the lag steadily increases with no hope of ever becoming current.

It would be nice to somehow see what requests are being made at the
application level. Perhaps via strace or something of that nature if you
can demonstrate a relatively isolated operation at the app. level
resulting in the same I/O requests to the kernel but different I/O out
of the filesystem..?

> I am happy to try some other settings/options with the RHEL6 mkfs.xfs to see if replication performance is able to match that of systems formatted with mkfs.xfs 2.9.6, but the values set by 3.1.1 with the P420 RAID do not work for MySQL replication.  We have ruled out everything else as a possible cause, the absolute only difference on these systems is what values are set by mkfs.xfs.
> ============================================================
> Working RHEL6 XFS partition:
> meta-data=/dev/mapper/sys-home   isize=256    agcount=4, agsize=71271680 blks
>          =                       sectsz=512   attr=2, projid32bit=0
> data     =                       bsize=4096   blocks=285086720, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=32768, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=0
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> ============================================================
> Broken RHEL6 XFS partition:
> meta-data=/dev/mapper/sys-home   isize=256    agcount=32, agsize=8908992 blks
>          =                       sectsz=512   attr=2, projid32bit=0
> data     =                       bsize=4096   blocks=285086720, imaxpct=5
>          =                       sunit=64     swidth=128 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=139264, version=2
>          =                       sectsz=512   sunit=64 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> ============================================================
> 

The differences I see for the second mkfs:

- agcount of 32 instead of 4
- sunit/swidth of 64/128 rather than 0/0
- log size of 139264 blocks rather than 32768
- lazy-count=1 rather than lazy-count=0

As mentioned above, I would take the "broken" mkfs.xfs and add options
one at a time that format the fs as the previous version did and try to
identify what leads to the behavior. E.g., maybe first use '-d
su=0,sw=0' to reset the stripe unit, then try adding '-l
size=<32768*blksize>' to set the log size, '-d agcount=N' to set the
allocation group count, etc.
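
The one-option-at-a-time approach could be scripted roughly as below. This is only a sketch: the device path is copied from the xfs_info output quoted above, the helper name log_size_bytes is made up for illustration, and the commands are printed rather than executed. -N asks mkfs.xfs for a dry run that prints the resulting geometry without writing anything, so each step can be checked against the "working" values before formatting for real.

```shell
#!/bin/sh
# Sketch only: prints candidate mkfs.xfs invocations, runs nothing.

# log_size_bytes BLOCKS BLOCK_SIZE -> log size in bytes (hypothetical helper)
log_size_bytes() {
    echo $(( $1 * $2 ))
}

# Log of the "working" filesystem: 32768 blocks of 4096 bytes.
size=$(log_size_bytes 32768 4096)
echo "target log size: ${size} bytes ($(( size / 1048576 )) MiB)"

# Placeholder device, taken from the quoted xfs_info output.
DEV=/dev/mapper/sys-home

# Add one difference at a time and compare the printed geometry.
echo "mkfs.xfs -N -f -d su=0,sw=0 $DEV"
echo "mkfs.xfs -N -f -d su=0,sw=0 -l size=${size} $DEV"
echo "mkfs.xfs -N -f -d su=0,sw=0,agcount=4 -l size=${size} $DEV"
echo "mkfs.xfs -N -f -d su=0,sw=0,agcount=4 -l size=${size},lazy-count=0 $DEV"
```

Whether a particular mkfs.xfs version honours each option is exactly what the dry-run output is meant to reveal.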

Brian

> Thanks!
> -Hogan


* Re: Issue with RHEL6 mkfs.xfs (3.1.1+), HP P420 RAID, and MySQL replication
  2015-07-09 19:05 ` Brian Foster
@ 2015-07-09 19:23   ` Hogan Whittall
  0 siblings, 0 replies; 9+ messages in thread
From: Hogan Whittall @ 2015-07-09 19:23 UTC (permalink / raw
  To: Brian Foster; +Cc: xfs@oss.sgi.com

Apologies for top-posting, our mail UI makes inline replies virtually impossible.

I will see if I can start with the good XFS settings and change them one at a time to see exactly which setting triggers the issue.  The other issue, which I forgot to mention, is that mkfs.xfs 3.1.1 (shipped with RHEL6) will not let me set -d sunit=0,swidth=0.  There are no errors; it simply ignores those values and uses the values calculated from minimum_io_size and optimal_io_size, so the only way I have any chance of doing this test is by using the same version of mkfs.xfs that doesn't cause a problem in the first place.  It seems that mkfs.xfs 3.1.1 and 3.2.3 (pulled from git) function the same way: they ignore 0 and only allow values that fall within a range they deem acceptable.  Also, specifying those values at mount time, either in fstab or via the mount command, changes nothing.  Again there are no errors; the values are simply ignored and the ones set when mkfs.xfs ran are used.

Thanks for the suggestions, I'll see what I can make happen.  Honestly, I'd be perfectly happy if we could simply replicate the same values with the RHEL6 version of mkfs.xfs since those values work just fine for our various workloads.  3.x resulting in different parameters and being unable to set the same parameters as 2.x just smells like a bug.  Since "0" is a perfectly valid setting when minimum_io_size is 0 and/or optimal_io_size is 512 there really should be a way to manually set 0 as well.

-Hogan


* Re: Issue with RHEL6 mkfs.xfs (3.1.1+), HP P420 RAID, and MySQL replication
  2015-07-09 17:32 Issue with RHEL6 mkfs.xfs (3.1.1+), HP P420 RAID, and MySQL replication Hogan Whittall
  2015-07-09 19:05 ` Brian Foster
@ 2015-07-09 23:02 ` Dave Chinner
  2015-07-10 15:59   ` Hogan Whittall
  1 sibling, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2015-07-09 23:02 UTC (permalink / raw
  To: Hogan Whittall; +Cc: xfs@oss.sgi.com

On Thu, Jul 09, 2015 at 05:32:50PM +0000, Hogan Whittall wrote:
> Hello,
>
> Recently we encountered a previously-reported issue
> regarding write amplification with MySQL replication and XFS when
> used with certain RAID controllers (In our case, HP P420).  That
> issue exactly matches our issue and was documented by someone else
> here - http://oss.sgi.com/archives/xfs/2013-03/msg00133.html -
> but I don't see any resolution.  I will say that the problem
> *does not* exist when mkfs.xfs 2.9.6 is used to format the
> filesystem on RHEL6 as that sets sunit=0 and swidth=0 instead of
> setting based on minimum_io_size and optimal_io_size.

The issue is the log stripe unit padding log buffers on log
writes.  Your workload likely has lots of fsync() calls, which means
log writes go from being padded to the next sector boundary to being
padded to the next log stripe unit boundary.
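
As a rough illustration of that padding effect, the sketch below rounds a single log write up to each kind of boundary. The 2 KiB payload is a made-up figure, not a measurement from this thread, and the model ignores any batching of transactions into one log write. The log stripe unit of 64 filesystem blocks of 4096 bytes is taken from the "broken" geometry in the original report.

```shell
#!/bin/sh
# round_up N UNIT -> smallest multiple of UNIT that is >= N
round_up() {
    echo $(( ( $1 + $2 - 1 ) / $2 * $2 ))
}

payload=2048                  # hypothetical bytes logged per fsync()
sector=512                    # sectsz=512 from the report
lsunit=$(( 64 * 4096 ))       # log sunit=64 blks at bsize=4096 -> 262144 bytes

padded_sector=$(round_up "$payload" "$sector")
padded_lsunit=$(round_up "$payload" "$lsunit")

echo "padded to sector boundary:      $padded_sector bytes"
echo "padded to log stripe unit:      $padded_lsunit bytes"
echo "amplification for this payload: $(( padded_lsunit / padded_sector ))x"
```

Under these assumptions a small log write grows from 2048 to 262144 bytes, so an fsync-heavy workload could plausibly see bandwidth multiplied by a large factor.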

> We have systems that are identical in how they are built and
> configured, we can take a RHEL6 box that has the MySQL partition
> formatted with mkfs.xfs v3.1.1 and reproduce the write
> amplification problem with MySQL replication every single time.

Because the more recent mkfs is probably getting sunit/swidth
direct from the hardware via the kernel.

>  If we take the same box and format the MySQL partition with
> mkfs.xfs 2.9.6, then bring up MySQL with the exact same
> configuration there is no problem.

Because that version of mkfs doesn't know about the kernel optimum
IO size parameters in sysfs that are set based on hardware mode page
support. Hence older mkfs is not able to set stripe unit defaults
for hardware RAID automatically....
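
A sketch of how such defaults might be derived from the sysfs topology values follows. The byte values 262144 and 524288 are inferred from the sunit=64/swidth=128 geometry (4096-byte blocks) in the original report, not read from an actual controller. The rule of treating 0 or a sector-sized value as "no stripe geometry" is an assumption modelled on the behaviour described in this thread, not a copy of the mkfs.xfs logic, and the function name is invented for the example.

```shell
#!/bin/sh
# stripe_from_topology MIN_IO OPT_IO BLOCK_SIZE [SECTOR_SIZE]
# Prints data sunit/swidth in filesystem blocks (hypothetical helper).
stripe_from_topology() {
    min_io=$1; opt_io=$2; bsize=$3; sector=${4:-512}
    # Assumption: values no larger than one sector carry no stripe information.
    if [ "$min_io" -le "$sector" ] || [ "$opt_io" -le "$sector" ]; then
        echo "sunit=0 swidth=0"
    else
        echo "sunit=$(( min_io / bsize )) swidth=$(( opt_io / bsize ))"
    fi
}

stripe_from_topology 262144 524288 4096   # prints: sunit=64 swidth=128
stripe_from_topology 512 0 4096           # prints: sunit=0 swidth=0

# On a real system the inputs would come from sysfs (sdX is a placeholder):
#   stripe_from_topology "$(cat /sys/block/sdX/queue/minimum_io_size)" \
#                        "$(cat /sys/block/sdX/queue/optimal_io_size)" 4096
```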

Your other option is to use a small log, so that the log writes end
up being permanently pinned in the RAID BBWC, and so the bandwidth
they consume doesn't matter because it never hits the platters...

FWIW, this problem has only been reported for HP RAID hardware, so I
suspect that there is something in the HP RAID firmware that doesn't
handle streaming FUA writes (the log writes) mixed with other random
IO particularly well.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Issue with RHEL6 mkfs.xfs (3.1.1+), HP P420 RAID, and MySQL replication
  2015-07-09 23:02 ` Dave Chinner
@ 2015-07-10 15:59   ` Hogan Whittall
  2015-07-10 22:42     ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Hogan Whittall @ 2015-07-10 15:59 UTC (permalink / raw
  To: Dave Chinner; +Cc: xfs@oss.sgi.com

Hi Dave,

Thanks for the reply, we can certainly try with the smaller log, but IIRC the performance hit wasn't because the disks were busy, it was the controller itself trying to determine what changed and then write that to disk.  Smaller anything should help the controller be able to cope better, but that's not really a solution.

Doing disk write performance tests on these systems produces very different results; they are capable of much more I/O than what was being triggered with this issue.

Back to why I think this should be considered a bug: 2.9.6 sets 0 as the default for sunit/swidth and 3.1.1 has no way to set 0 for sunit/swidth, so the newer versions behave differently and don't provide any way to set the same options as 2.x.x.  To me, that kind of behavior is a bug, especially when the new defaults provide horrible performance under specific workloads with specific hardware.  If the newer versions are going to automatically calculate sunit/swidth, then there needs to be a way to either disable that functionality or override it by allowing 0 to be set manually.

If the only way for us to truly restore performance on these HP systems is to run a 2.x.x version of mkfs.xfs then how is this not a bug?

We have a number of non-HP boxes running RHEL6 with hardware RAID, it's only the HP P420 RAID that is exposing IO size parameters to the kernel, all of the others show 0 or 512 and mkfs.xfs 3.1.1 knows to set sunit/swidth to 0 when those values are encountered.  Not being able to manually set 0 when it is a valid setting...that's a bug, IMO.

Thanks for your time!

-Hogan


* Re: Issue with RHEL6 mkfs.xfs (3.1.1+), HP P420 RAID, and MySQL replication
  2015-07-10 15:59   ` Hogan Whittall
@ 2015-07-10 22:42     ` Dave Chinner
  2015-07-10 23:15       ` Hogan Whittall
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2015-07-10 22:42 UTC (permalink / raw
  To: Hogan Whittall; +Cc: xfs@oss.sgi.com

On Fri, Jul 10, 2015 at 03:59:48PM +0000, Hogan Whittall wrote:
> Hi Dave,
> 
> Thanks for the reply, we can certainly try with the smaller log,
> but IIRC the performance hit wasn't because the disks were busy,
> it was the controller itself trying to determine what changed and
> then write that to disk.

That makes no sense to me - the controller is almost never the IO
limitation in a hardware RAID when random small IO is being issued
by the host.

> Smaller anything should help the
> controller be able to cope better, but that's not really a
> solution.
> 
> Doing disk write performance tests on these systems produce very
> different results, they are capable of much more I/O than what was
> being triggered with this issue.
> 
> Back to why I think this should be considered a bug, by 2.9.6
> setting 0 as the default for sunit/swidth and 3.1.1 having no way
> to set 0 for sunit/swidth the newer versions behave differently

False:

# man mkfs.xfs
....
	noalign
		This  option  disables  automatic geometry detection
		and creates the filesystem without stripe geometry
		alignment even if the underlying storage device
		provides this information.

IOWs:

# mkfs.xfs -d noalign ....

Will do exactly what you want.  Or alternatively:

# mkfs.xfs -d sunit=0,swidth=0 ....

Or perhaps just turning off log stripe unit alignment will be enough:

# mkfs.xfs -l sunit=1 ....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Issue with RHEL6 mkfs.xfs (3.1.1+), HP P420 RAID, and MySQL replication
  2015-07-10 22:42     ` Dave Chinner
@ 2015-07-10 23:15       ` Hogan Whittall
  2015-07-13  0:13         ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Hogan Whittall @ 2015-07-10 23:15 UTC (permalink / raw
  To: Dave Chinner; +Cc: xfs@oss.sgi.com

The issue isn't random small I/O being sent to disk and the disks maxing out IOPS, it's this particular workload created by MySQL replication and the XFS options that trigger something bad to happen on the controller.  I can run disk I/O tests at the same time that replication is choking and see perfectly fine throughput and response time.  Yeah, it's weird.

As for the noalign option, that would be great to have but it does not exist in version 3.1.1 which RHEL6 uses.  It sounds like not having it in previous 3.x versions was enough of an issue that it was added in 3.2.x, which is great.  I can probably work with this and get our ramdisks used for cloning updated with a stable 3.2.x since that would be more desirable than reverting back to 2.x.x.

Thanks again for your help, now I have a fix that doesn't involve using an old version of mkfs.xfs.

-Hogan


* Re: Issue with RHEL6 mkfs.xfs (3.1.1+), HP P420 RAID, and MySQL replication
  2015-07-10 23:15       ` Hogan Whittall
@ 2015-07-13  0:13         ` Dave Chinner
  2015-07-13  3:59           ` Hogan Whittall
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2015-07-13  0:13 UTC (permalink / raw
  To: Hogan Whittall; +Cc: xfs@oss.sgi.com

On Fri, Jul 10, 2015 at 11:15:04PM +0000, Hogan Whittall wrote:
> As for the noalign option, that would be great to have but it does
> not exist in version 3.1.1 which RHEL6 uses.  It sounds like not

$ gl -n 1 63a6384
commit 63a63844f8a02f34cbb724086a1f0bac492f25b3
Author: Nathan Scott <nathans@sgi.com>
Date:   Wed Mar 23 02:56:17 2005 +0000

    Add noalign suboptions to -d and -r to allow auto-stripe-alignment to be switched off.
    Merge of master-melb:xfs-cmds:21924a by kenmcd.
$

The noalign option has been in mkfs since 2005, so it's most
certainly supported on RHEL6.

Looking at the man page history, I forgot that it wasn't documented
until 2013 and hence is missing from the RHEL6 man page, so I can
understand why you might be saying this.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Issue with RHEL6 mkfs.xfs (3.1.1+), HP P420 RAID, and MySQL replication
  2015-07-13  0:13         ` Dave Chinner
@ 2015-07-13  3:59           ` Hogan Whittall
  0 siblings, 0 replies; 9+ messages in thread
From: Hogan Whittall @ 2015-07-13  3:59 UTC (permalink / raw
  To: Dave Chinner; +Cc: xfs@oss.sgi.com

It's missing from the command help output as well, but it does seem to be there.  So, yeah, no man page or command output updates made me think that it was missing completely.  Thanks!  This makes things much easier.  :)


[root ~]# mkfs.xfs -V
mkfs.xfs version 3.1.1


[root ~]# mkfs.xfs
no device name given in argument list
Usage: mkfs.xfs
/* blocksize */         [-b log=n|size=num]
/* data subvol */       [-d agcount=n,agsize=n,file,name=xxx,size=num,
(sunit=value,swidth=value|su=num,sw=num),
sectlog=n|sectsize=num
/* inode size */        [-i log=n|perblock=n|size=num,maxpct=n,attr=0|1|2,
projid32bit=0|1]
/* log subvol */        [-l agnum=n,internal,size=num,logdev=xxx,version=n
sunit=value|su=num,sectlog=n|sectsize=num,
lazy-count=0|1]
/* label */             [-L label (maximum 12 characters)]
/* naming */            [-n log=n|size=num,version=2|ci]
/* prototype file */    [-p fname]
/* quiet */             [-q]
/* realtime subvol */   [-r extsize=num,size=num,rtdev=xxx]
/* sectorsize */        [-s log=n|size=num]
/* version */           [-V]
devicename
<devicename> is required unless -d name=xxx is given.
<num> is xxx (bytes), xxxs (sectors), xxxb (fs blocks), xxxk (xxx KiB),
xxxm (xxx MiB), xxxg (xxx GiB), xxxt (xxx TiB) or xxxp (xxx PiB).
<value> is xxx (512 byte blocks).
[root@ ~]# mkfs.xfs -N -f /dev/mapper/sys-home -d noalign
meta-data=/dev/mapper/sys-home   isize=256    agcount=4, agsize=71271680 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=285086720, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=139202, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
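
A small helper along the following lines could pull individual fields out of output like the above, for example to confirm that both the data and log stripe units are 0 before formatting for real. The function name geometry_field is invented for this sketch, and the parsing assumes the section/field layout shown in this thread rather than any documented format.

```shell
#!/bin/sh
# geometry_field SECTION FIELD < mkfs.xfs/xfs_info output
# Prints the value of FIELD within SECTION (meta-data, data, naming, log, realtime).
geometry_field() {
    awk -v sec="$1" -v key="$2" '
        # A line starting with a name opens a new section;
        # continuation lines start with whitespace or "=".
        /^[a-z]/ { cur = $1; sub(/=.*/, "", cur) }
        cur == sec {
            for (i = 1; i <= NF; i++) {
                tok = $i
                sub(/,$/, "", tok)              # drop a trailing comma
                if (index(tok, key "=") == 1) {
                    print substr(tok, length(key) + 2)
                    exit
                }
            }
        }
    '
}

# Sample input: the dry-run output shown above.
sample='meta-data=/dev/mapper/sys-home   isize=256    agcount=4, agsize=71271680 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=285086720, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=139202, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0'

echo "data sunit: $(printf '%s\n' "$sample" | geometry_field data sunit)"
echo "log sunit:  $(printf '%s\n' "$sample" | geometry_field log sunit)"
echo "log blocks: $(printf '%s\n' "$sample" | geometry_field log blocks)"
```

In practice the input would be piped straight from the dry run, e.g. `mkfs.xfs -N -f -d noalign <device> | geometry_field log sunit`.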


-Hogan

