From: Ming Lei <ming.lei@redhat.com>
To: Guangwu Zhang <guazhang@redhat.com>
Cc: linux-block@vger.kernel.org, ming.lei@redhat.com
Subject: Re: [bug report] Format FS failed with ublk device
Date: Sun, 28 Apr 2024 18:09:59 +0800
Message-ID: <Zi4gdxplmOFyE464@fedora>
In-Reply-To: <Zir+5VZOgip+HiFU@fedora>

On Fri, Apr 26, 2024 at 09:09:57AM +0800, Ming Lei wrote:
> Hi Guangwu,
> 
> Thanks for the report!
> 
> On Thu, Apr 25, 2024 at 09:54:04AM +0800, Guangwu Zhang wrote:
> > Hi,
> > the format FS command hangs with a ublk device.
> > 
> > # ublk --version
> > ublksrv 1.1-7-gf01c509
> > 
> > kernel: 6.9.0-rc4.kasan
> > 
> > 
> > nvme0n1                     259:1    0   1.5T  0 disk
> > └─nvme0n1p1                 259:2    0     5G  0 part
> > # ublk add -t loop -f /dev/nvme0n1p1
> > dev id 0: nr_hw_queues 1 queue_depth 128 block size 4096 dev_capacity 10485760
> > max rq size 524288 daemon pid 3227 flags 0x42 state LIVE
> > ublkc: 245:0 ublkb: 259:3 owner: 0:0
> > queue 0: tid 3228 affinity(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
> > 18 19 20 21 22 23 24 25 26 27 28 29 30 31 )
> > target {"backing_file":"/dev/nvme0n1p1","dev_size":5368709120,"direct_io":1,"name":"loop","type":1}
> > 
> > # mkfs.xfs -f /dev/ublkb0    << can not finish,  pid 3239
> > meta-data=/dev/ublkb0            isize=512    agcount=4, agsize=327680 blks
> >          =                       sectsz=4096  attr=2, projid32bit=1
> >          =                       crc=1        finobt=1, sparse=1, rmapbt=0
> >          =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
> > data     =                       bsize=4096   blocks=1310720, imaxpct=25
> >          =                       sunit=0      swidth=0 blks
> > naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> > log      =internal log           bsize=4096   blocks=16384, version=2
> >          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > 
> > # cat /proc/3239/stack
> > [<0>] rq_qos_wait+0x12a/0x1f0
> > [<0>] wbt_wait+0x11a/0x240
> > [<0>] __rq_qos_throttle+0x49/0x90
> > [<0>] blk_mq_submit_bio+0x58c/0x19d0
> > [<0>] submit_bio_noacct_nocheck+0x40d/0x780
> > [<0>] blk_next_bio+0x41/0x50
> > [<0>] __blkdev_issue_zero_pages+0x1ba/0x370
> > [<0>] blkdev_issue_zeroout+0x1a7/0x390
> > [<0>] blkdev_fallocate+0x264/0x3d0
> > [<0>] vfs_fallocate+0x2b0/0xad0
> > [<0>] __x64_sys_fallocate+0xb4/0x100
> > [<0>] do_syscall_64+0x7b/0x160
> > [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > 
> > [  862.171377] INFO: task mkfs.xfs:3239 blocked for more than 122 seconds.
> > [  862.178073]       Not tainted 6.9.0-rc4.kasan+ #1
> > [  862.182820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> 
> Looks like it might be a blk-wbt issue. ublk-loop doesn't set up
> write_zero_max_bytes, and __blkdev_issue_zero_pages may take a bit long
> to complete, but it shouldn't hang.
> 
> I can't reproduce it in my environment. Can you collect the following
> bpftrace output by starting the script before running mkfs?
> 
> #!/usr/bin/bpftrace
> kretfunc:vfs_fallocate
> {
> 	printf("vfs_fallocate on %s ret %d (%x %lx %u)\n",
> 		str(args->file->f_path.dentry->d_name.name),
> 		retval, args->mode, args->offset, args->len);
> }

After working with Guangwu, the issue is now root-caused:

1) vfs_fallocate() can't translate a block-device DISCARD into a real
discard. 'FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE'
is supposed to be capable of doing that, but the VFS doesn't allow
FALLOC_FL_NO_HIDE_STALE.

2) so a ublk discard is actually converted to write-zeroes, because ublksrv
converts discard into fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE).
That is why mkfs.xfs takes so long. Guangwu confirmed that zeroing out is
actually in progress; it is not a hang.

Does FALLOC_FL_PUNCH_HOLE have to imply zeroing out for a block device?

3) a fix is now pushed to ublksrv: it translates ublk discard into
ioctl(DISCARD) for block devices.

The same issue exists in the kernel loop driver too.
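
To illustrate point 3), here is a minimal sketch of dispatching a discard
by backing-file type. It is not the actual ublksrv change:
pick_discard_method() and discard_range() are hypothetical names, and it
assumes that "ioctl(DISCARD)" means the Linux BLKDISCARD ioctl.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <linux/falloc.h>	/* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <linux/fs.h>		/* BLKDISCARD */

/* Hypothetical names, for illustration only. */
enum discard_method {
	DISCARD_VIA_FALLOCATE,	/* regular file: punch a hole */
	DISCARD_VIA_IOCTL,	/* block device: issue a real discard */
};

/* Choose the discard path from the st_mode of the backing file. */
enum discard_method pick_discard_method(mode_t mode)
{
	return S_ISBLK(mode) ? DISCARD_VIA_IOCTL : DISCARD_VIA_FALLOCATE;
}

/*
 * Discard the byte range [offset, offset + len) on the backing fd.
 * Returns 0 on success, -1 with errno set on failure.
 */
int discard_range(int fd, uint64_t offset, uint64_t len)
{
	struct stat st;

	assert(fd >= 0);
	if (fstat(fd, &st) < 0)
		return -1;

	if (pick_discard_method(st.st_mode) == DISCARD_VIA_IOCTL) {
		/* BLKDISCARD takes { start, length } in bytes. */
		uint64_t range[2] = { offset, len };

		return ioctl(fd, BLKDISCARD, range);
	}

	/* Regular files keep the hole-punch path. */
	return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 (off_t)offset, (off_t)len);
}
```

A caller would pass the backing file descriptor and the byte range from
the ublk discard request, e.g. discard_range(fd, offset, len). Only block
devices change behaviour; regular files still go through fallocate().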

Thanks,
Ming

