Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
       [not found] <55803637.3060607@kamp.de>
@ 2015-06-16 15:34 ` Stefan Hajnoczi
  2015-06-17  8:35   ` Kevin Wolf
  0 siblings, 1 reply; 25+ messages in thread
From: Stefan Hajnoczi @ 2015-06-16 15:34 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel, qemu block

On Tue, Jun 16, 2015 at 3:44 PM, Peter Lieven <pl@kamp.de> wrote:
> I wonder how difficult it would be to have the IDE CDROM run in its own
> thread?
> We usually have ISOs mounted on an NFS share as CDROM. Problem: If the NFS
> Share
> goes down, it takes down monitor, qmp, vnc etc. with it.
>
> Maybe its already possible to do this via cmdline args?
>
> Any ideas, comments?

If QEMU hangs in the read/write/flush/discard code path due to NFS
downtime it is a bug.

QEMU is expected to hang in open/reopen because those are performed in
a blocking fashion.

Which of these cases applies to what you are seeing?  Maybe it can be fixed.

Stefan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-16 15:34 ` [Qemu-devel] [Qemu-block] RFC cdrom in own thread? Stefan Hajnoczi
@ 2015-06-17  8:35   ` Kevin Wolf
  2015-06-18  6:03     ` Peter Lieven
  2015-06-18  6:39     ` Peter Lieven
  0 siblings, 2 replies; 25+ messages in thread
From: Kevin Wolf @ 2015-06-17  8:35 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Peter Lieven, qemu-devel, qemu block

Am 16.06.2015 um 17:34 hat Stefan Hajnoczi geschrieben:
> On Tue, Jun 16, 2015 at 3:44 PM, Peter Lieven <pl@kamp.de> wrote:
> > I wonder how difficult it would be to have the IDE CDROM run in its own
> > thread?
> > We usually have ISOs mounted on an NFS share as CDROM. Problem: If the NFS
> > Share
> > goes down, it takes down monitor, qmp, vnc etc. with it.
> >
> > Maybe its already possible to do this via cmdline args?
> >
> > Any ideas, comments?
> 
> If QEMU hangs in the read/write/flush/discard code path due to NFS
> downtime it is a bug.
> 
> QEMU is expected to hang in open/reopen because those are performed in
> a blocking fashion.
> 
> Which of these cases applies to what you are seeing?  Maybe it can be fixed.

Don't forget bdrv_drain_all(), which is called a lot by the monitor. So
no matter what you do (and this includes moving to a thread as in a
hypothetical "ATAPI dataplane"), you end up with a hang sooner or later.

Kevin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-17  8:35   ` Kevin Wolf
@ 2015-06-18  6:03     ` Peter Lieven
  2015-06-18  6:57       ` Markus Armbruster
  2015-06-18  6:39     ` Peter Lieven
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Lieven @ 2015-06-18  6:03 UTC (permalink / raw)
  To: Kevin Wolf, Stefan Hajnoczi; +Cc: qemu-devel, qemu block

Am 17.06.2015 um 10:35 schrieb Kevin Wolf:
> Am 16.06.2015 um 17:34 hat Stefan Hajnoczi geschrieben:
>> On Tue, Jun 16, 2015 at 3:44 PM, Peter Lieven <pl@kamp.de> wrote:
>>> I wonder how difficult it would be to have the IDE CDROM run in its own
>>> thread?
>>> We usually have ISOs mounted on an NFS share as CDROM. Problem: If the NFS
>>> Share
>>> goes down, it takes down monitor, qmp, vnc etc. with it.
>>>
>>> Maybe its already possible to do this via cmdline args?
>>>
>>> Any ideas, comments?
>> If QEMU hangs in the read/write/flush/discard code path due to NFS
>> downtime it is a bug.
>>
>> QEMU is expected to hang in open/reopen because those are performed in
>> a blocking fashion.
>>
>> Which of these cases applies to what you are seeing?  Maybe it can be fixed.
> Don't forget bdrv_drain_all(), which is called a lot by the monitor. So
> no matter what you do (and this includes moving to a thread as in a
> hypothetical "ATAPI dataplane"), you end up with a hang sooner or later.

I will have a look where qemu hangs. The problem exists with an NFS share
mounted by the kernel and also with libnfs. So it might be a bdrv_drain_all.
I regularly query info block and info blockstats. Do these commands always
call bdrv_drain_all()?.

Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-17  8:35   ` Kevin Wolf
  2015-06-18  6:03     ` Peter Lieven
@ 2015-06-18  6:39     ` Peter Lieven
  2015-06-18  6:59       ` Paolo Bonzini
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Lieven @ 2015-06-18  6:39 UTC (permalink / raw)
  To: Kevin Wolf, Stefan Hajnoczi; +Cc: qemu-devel, qemu block

Am 17.06.2015 um 10:35 schrieb Kevin Wolf:
> Am 16.06.2015 um 17:34 hat Stefan Hajnoczi geschrieben:
>> On Tue, Jun 16, 2015 at 3:44 PM, Peter Lieven <pl@kamp.de> wrote:
>>> I wonder how difficult it would be to have the IDE CDROM run in its own
>>> thread?
>>> We usually have ISOs mounted on an NFS share as CDROM. Problem: If the NFS
>>> Share
>>> goes down, it takes down monitor, qmp, vnc etc. with it.
>>>
>>> Maybe its already possible to do this via cmdline args?
>>>
>>> Any ideas, comments?
>> If QEMU hangs in the read/write/flush/discard code path due to NFS
>> downtime it is a bug.
>>
>> QEMU is expected to hang in open/reopen because those are performed in
>> a blocking fashion.
>>
>> Which of these cases applies to what you are seeing?  Maybe it can be fixed.
> Don't forget bdrv_drain_all(), which is called a lot by the monitor. So
> no matter what you do (and this includes moving to a thread as in a
> hypothetical "ATAPI dataplane"), you end up with a hang sooner or later.

It seems like the mainloop is waiting here:

#0  0x00007ffff606c89c in __lll_lock_wait ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x00007ffff6068065 in _L_lock_858 ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#2  0x00007ffff6067eba in pthread_mutex_lock ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#3  0x00005555559f2557 in qemu_mutex_lock (mutex=0x555555ed6d40)
     at util/qemu-thread-posix.c:76
         err = 0
         __func__ = "qemu_mutex_lock"
#4  0x00005555556306ef in qemu_mutex_lock_iothread ()
     at /usr/src/qemu-2.2.0/cpus.c:1123
No locals.
#5  0x0000555555954a87 in os_host_main_loop_wait (timeout=79413589)
     at main-loop.c:242
         ret = 1
         spin_counter = 0
#6  0x0000555555954b5f in main_loop_wait (nonblocking=0) at main-loop.c:494
         ret = 15
         timeout = 4294967295
         timeout_ns = 79413589
#7  0x000055555575e702 in main_loop () at vl.c:1882
         nonblocking = false
         last_io = 1

Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-18  6:03     ` Peter Lieven
@ 2015-06-18  6:57       ` Markus Armbruster
  0 siblings, 0 replies; 25+ messages in thread
From: Markus Armbruster @ 2015-06-18  6:57 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, qemu block

Peter Lieven <pl@kamp.de> writes:

> Am 17.06.2015 um 10:35 schrieb Kevin Wolf:
>> Am 16.06.2015 um 17:34 hat Stefan Hajnoczi geschrieben:
>>> On Tue, Jun 16, 2015 at 3:44 PM, Peter Lieven <pl@kamp.de> wrote:
>>>> I wonder how difficult it would be to have the IDE CDROM run in its own
>>>> thread?
>>>> We usually have ISOs mounted on an NFS share as CDROM. Problem: If the NFS
>>>> Share
>>>> goes down, it takes down monitor, qmp, vnc etc. with it.
>>>>
>>>> Maybe its already possible to do this via cmdline args?
>>>>
>>>> Any ideas, comments?
>>> If QEMU hangs in the read/write/flush/discard code path due to NFS
>>> downtime it is a bug.
>>>
>>> QEMU is expected to hang in open/reopen because those are performed in
>>> a blocking fashion.
>>>
>>> Which of these cases applies to what you are seeing?  Maybe it can be fixed.
>> Don't forget bdrv_drain_all(), which is called a lot by the monitor. So
>> no matter what you do (and this includes moving to a thread as in a
>> hypothetical "ATAPI dataplane"), you end up with a hang sooner or later.
>
> I will have a look where qemu hangs. The problem exists with an NFS share
> mounted by the kernel and also with libnfs. So it might be a bdrv_drain_all.
> I regularly query info block and info blockstats. Do these commands always
> call bdrv_drain_all()?.

As far as I can tell, they don't.

In general, it's hard to see.  Wish it wasn't.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-18  6:39     ` Peter Lieven
@ 2015-06-18  6:59       ` Paolo Bonzini
  2015-06-18  7:03         ` Peter Lieven
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2015-06-18  6:59 UTC (permalink / raw)
  To: Peter Lieven, Kevin Wolf, Stefan Hajnoczi; +Cc: qemu-devel, qemu block



On 18/06/2015 08:39, Peter Lieven wrote:
> 
> It seems like the mainloop is waiting here:
> 
> #0  0x00007ffff606c89c in __lll_lock_wait ()
>    from /lib/x86_64-linux-gnu/libpthread.so.0
> No symbol table info available.
> #1  0x00007ffff6068065 in _L_lock_858 ()
>    from /lib/x86_64-linux-gnu/libpthread.so.0
> No symbol table info available.
> #2  0x00007ffff6067eba in pthread_mutex_lock ()
>    from /lib/x86_64-linux-gnu/libpthread.so.0
> No symbol table info available.
> #3  0x00005555559f2557 in qemu_mutex_lock (mutex=0x555555ed6d40)
>     at util/qemu-thread-posix.c:76
>         err = 0
>         __func__ = "qemu_mutex_lock"
> #4  0x00005555556306ef in qemu_mutex_lock_iothread ()
>     at /usr/src/qemu-2.2.0/cpus.c:1123
> No locals.

This means the VCPU is busy with some synchronous activity---maybe a
bdrv_aio_cancel?

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-18  6:59       ` Paolo Bonzini
@ 2015-06-18  7:03         ` Peter Lieven
  2015-06-18  7:12           ` Peter Lieven
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Lieven @ 2015-06-18  7:03 UTC (permalink / raw)
  To: Paolo Bonzini, Kevin Wolf, Stefan Hajnoczi; +Cc: qemu-devel, qemu block

Am 18.06.2015 um 08:59 schrieb Paolo Bonzini:
>
> On 18/06/2015 08:39, Peter Lieven wrote:
>> It seems like the mainloop is waiting here:
>>
>> #0  0x00007ffff606c89c in __lll_lock_wait ()
>>     from /lib/x86_64-linux-gnu/libpthread.so.0
>> No symbol table info available.
>> #1  0x00007ffff6068065 in _L_lock_858 ()
>>     from /lib/x86_64-linux-gnu/libpthread.so.0
>> No symbol table info available.
>> #2  0x00007ffff6067eba in pthread_mutex_lock ()
>>     from /lib/x86_64-linux-gnu/libpthread.so.0
>> No symbol table info available.
>> #3  0x00005555559f2557 in qemu_mutex_lock (mutex=0x555555ed6d40)
>>      at util/qemu-thread-posix.c:76
>>          err = 0
>>          __func__ = "qemu_mutex_lock"
>> #4  0x00005555556306ef in qemu_mutex_lock_iothread ()
>>      at /usr/src/qemu-2.2.0/cpus.c:1123
>> No locals.
> This means the VCPU is busy with some synchronous activity---maybe a
> bdrv_aio_cancel?

Here is what the other threads are doing (dropped VNC thread):

Thread 3 (Thread 0x7ffff4d4f700 (LWP 2637)):
#0  0x00007ffff606c89c in __lll_lock_wait ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x00007ffff6068065 in _L_lock_858 ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#2  0x00007ffff6067eba in pthread_mutex_lock ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#3  0x00005555559f2557 in qemu_mutex_lock (mutex=0x555555ed6d40)
     at util/qemu-thread-posix.c:76
         err = 0
         __func__ = "qemu_mutex_lock"
#4  0x00005555556306ef in qemu_mutex_lock_iothread ()
     at /usr/src/qemu-2.2.0/cpus.c:1123
No locals.
#5  0x000055555564b9ac in kvm_cpu_exec (cpu=0x5555563cb870)
     at /usr/src/qemu-2.2.0/kvm-all.c:1770
         run = 0x7ffff7ee2000
         ret = 65536
         run_ret = -4
#6  0x00005555556301dc in qemu_kvm_cpu_thread_fn (arg=0x5555563cb870)
     at /usr/src/qemu-2.2.0/cpus.c:953
         cpu = 0x5555563cb870
         r = 65536
#5  0x00007ffff5d9338d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#6  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 3 (Thread 0x7ffff4d4f700 (LWP 2637)):
#0  0x00007ffff606c89c in __lll_lock_wait ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x00007ffff6068065 in _L_lock_858 ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#2  0x00007ffff6067eba in pthread_mutex_lock ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#3  0x00005555559f2557 in qemu_mutex_lock (mutex=0x555555ed6d40)
     at util/qemu-thread-posix.c:76
         err = 0
         __func__ = "qemu_mutex_lock"
#4  0x00005555556306ef in qemu_mutex_lock_iothread ()
     at /usr/src/qemu-2.2.0/cpus.c:1123
No locals.
#5  0x000055555564b9ac in kvm_cpu_exec (cpu=0x5555563cb870)
     at /usr/src/qemu-2.2.0/kvm-all.c:1770
         run = 0x7ffff7ee2000
         ret = 65536
         run_ret = -4
#6  0x00005555556301dc in qemu_kvm_cpu_thread_fn (arg=0x5555563cb870)
     at /usr/src/qemu-2.2.0/cpus.c:953
         cpu = 0x5555563cb870
         r = 65536
#7  0x00007ffff6065e9a in start_thread ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#8  0x00007ffff5d9338d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#9  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
#0  0x00007ffff5d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0, nfds=3,
     timeout=4999424576) at qemu-timer.c:326
         ts = {tv_sec = 4, tv_nsec = 999424576}
         tvsec = 4
#2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0, blocking=true)
     at aio-posix.c:231
         node = 0x0
         was_dispatching = false
         ret = 1
         progress = false
#3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0, offset=4292007936,
     qiov=0x7ffff554f760, is_write=false, flags=0) at block.c:2699
         aio_context = 0x5555563528e0
         co = 0x5555563888a0
         rwco = {bs = 0x55555637eae0, offset = 4292007936,
           qiov = 0x7ffff554f760, is_write = false, ret = 2147483647, flags = 0}
#4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0, sector_num=8382828,
     buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false, flags=0)
     at block.c:2722
         qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1, size = 2048}
         iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#8  0x00007ffff5d9338d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#9  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
#0  0x00007ffff5d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0, nfds=3,
     timeout=4999424576) at qemu-timer.c:326
         ts = {tv_sec = 4, tv_nsec = 999424576}
         tvsec = 4
#2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0, blocking=true)
     at aio-posix.c:231
         node = 0x0
         was_dispatching = false
         ret = 1
         progress = false
#3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0, offset=4292007936,
     qiov=0x7ffff554f760, is_write=false, flags=0) at block.c:2699
         aio_context = 0x5555563528e0
         co = 0x5555563888a0
         rwco = {bs = 0x55555637eae0, offset = 4292007936,
           qiov = 0x7ffff554f760, is_write = false, ret = 2147483647, flags = 0}
#4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0, sector_num=8382828,
     buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false, flags=0)
     at block.c:2722
         qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1, size = 2048}
         iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
#5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0, sector_num=8382828,
     buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
No locals.
#6  0x000055555599acef in blk_read (blk=0x555556376820, sector_num=8382828,
     buf=0x7ffff44cc800 "(", nb_sectors=4) at block/block-backend.c:404
No locals.
#7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88, lba=2095707,
     buf=0x7ffff44cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
         ret = 32767
#8  0x0000555555834202 in ide_atapi_cmd_reply_end (s=0x555556408f88)
     at hw/ide/atapi.c:190
         byte_count_limit = 21845
         size = 1801980
         ret = 0
#9  0x0000555555834657 in ide_atapi_cmd_read_pio (s=0x555556408f88,
     lba=2095707, nb_sectors=16, sector_size=2048) at hw/ide/atapi.c:279
No locals.
#10 0x0000555555834b25 in ide_atapi_cmd_read (s=0x555556408f88, lba=2095707,
     nb_sectors=16, sector_size=2048) at hw/ide/atapi.c:393
No locals.
#11 0x00005555558358ed in cmd_read (s=0x555556408f88, buf=0x7ffff44cc800 "(")
     at hw/ide/atapi.c:824
         nb_sectors = 16
         lba = 2095707
#12 0x0000555555836373 in ide_atapi_cmd (s=0x555556408f88)
     at hw/ide/atapi.c:1152
         buf = 0x7ffff44cc800 "("
#13 0x00005555558323e1 in ide_data_writew (opaque=0x555556408f08, addr=368,
     val=0) at hw/ide/core.c:2020
         bus = 0x555556408f08
         s = 0x555556408f88
         p = 0x7ffff44cc80c "IHDR"
     buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
No locals.
#6  0x000055555599acef in blk_read (blk=0x555556376820, sector_num=8382828,
     buf=0x7ffff44cc800 "(", nb_sectors=4) at block/block-backend.c:404
No locals.
#7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88, lba=2095707,
     buf=0x7ffff44cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
         ret = 32767
#8  0x0000555555834202 in ide_atapi_cmd_reply_end (s=0x555556408f88)
     at hw/ide/atapi.c:190
         byte_count_limit = 21845
         size = 1801980
         ret = 0
#9  0x0000555555834657 in ide_atapi_cmd_read_pio (s=0x555556408f88,
     lba=2095707, nb_sectors=16, sector_size=2048) at hw/ide/atapi.c:279
No locals.
#10 0x0000555555834b25 in ide_atapi_cmd_read (s=0x555556408f88, lba=2095707,
     nb_sectors=16, sector_size=2048) at hw/ide/atapi.c:393
No locals.
#11 0x00005555558358ed in cmd_read (s=0x555556408f88, buf=0x7ffff44cc800 "(")
     at hw/ide/atapi.c:824
         nb_sectors = 16
         lba = 2095707
#12 0x0000555555836373 in ide_atapi_cmd (s=0x555556408f88)
     at hw/ide/atapi.c:1152
         buf = 0x7ffff44cc800 "("
#13 0x00005555558323e1 in ide_data_writew (opaque=0x555556408f08, addr=368,
     val=0) at hw/ide/core.c:2020
         bus = 0x555556408f08
         s = 0x555556408f88
         p = 0x7ffff44cc80c "IHDR"
#14 0x000055555564285f in portio_write (opaque=0x55555641d5d0, addr=0, data=0,
     size=2) at /usr/src/qemu-2.2.0/ioport.c:204
         mrpio = 0x55555641d5d0
         mrp = 0x55555641d6f8
         __PRETTY_FUNCTION__ = "portio_write"
#15 0x000055555564f07c in memory_region_write_accessor (mr=0x55555641d5d0,
     addr=0, value=0x7ffff554fb28, size=2, shift=0, mask=65535)
     at /usr/src/qemu-2.2.0/memory.c:443
         tmp = 0
#16 0x000055555564f1c4 in access_with_adjusted_size (addr=0,
     value=0x7ffff554fb28, size=2, access_size_min=1, access_size_max=4,
     access=0x55555564efe0 <memory_region_write_accessor>, mr=0x55555641d5d0)
     at /usr/src/qemu-2.2.0/memory.c:480
         access_mask = 65535
         access_size = 2
         i = 0
#17 0x000055555565209f in memory_region_dispatch_write (mr=0x55555641d5d0,
     addr=0, data=0, size=2) at /usr/src/qemu-2.2.0/memory.c:1117
No locals.
#18 0x00005555556559c7 in io_mem_write (mr=0x55555641d5d0, addr=0, val=0,
     size=2) at /usr/src/qemu-2.2.0/memory.c:1973
No locals.
#19 0x00005555555fc4be in address_space_rw (as=0x555555e7a880, addr=368,
     buf=0x7ffff7ee6000 "", len=2, is_write=true)
     at /usr/src/qemu-2.2.0/exec.c:2141
         l = 2
         ptr = 0x55555567a7a6 "H\213E\370dH3\004%("
         val = 0
         addr1 = 0
         mr = 0x55555641d5d0
         error = false
#20 0x000055555564b454 in kvm_handle_io (port=368, data=0x7ffff7ee6000,
     size=2) at /usr/src/qemu-2.2.0/ioport.c:204
         mrpio = 0x55555641d5d0
         mrp = 0x55555641d6f8
         __PRETTY_FUNCTION__ = "portio_write"
#15 0x000055555564f07c in memory_region_write_accessor (mr=0x55555641d5d0,
     addr=0, value=0x7ffff554fb28, size=2, shift=0, mask=65535)
     at /usr/src/qemu-2.2.0/memory.c:443
         tmp = 0
#16 0x000055555564f1c4 in access_with_adjusted_size (addr=0,
     value=0x7ffff554fb28, size=2, access_size_min=1, access_size_max=4,
     access=0x55555564efe0 <memory_region_write_accessor>, mr=0x55555641d5d0)
     at /usr/src/qemu-2.2.0/memory.c:480
         access_mask = 65535
         access_size = 2
         i = 0
#17 0x000055555565209f in memory_region_dispatch_write (mr=0x55555641d5d0,
     addr=0, data=0, size=2) at /usr/src/qemu-2.2.0/memory.c:1117
No locals.
#18 0x00005555556559c7 in io_mem_write (mr=0x55555641d5d0, addr=0, val=0,
     size=2) at /usr/src/qemu-2.2.0/memory.c:1973
No locals.
#19 0x00005555555fc4be in address_space_rw (as=0x555555e7a880, addr=368,
     buf=0x7ffff7ee6000 "", len=2, is_write=true)
     at /usr/src/qemu-2.2.0/exec.c:2141
         l = 2
         ptr = 0x55555567a7a6 "H\213E\370dH3\004%("
         val = 0
         addr1 = 0
         mr = 0x55555641d5d0
         error = false
#20 0x000055555564b454 in kvm_handle_io (port=368, data=0x7ffff7ee6000,
     direction=1, size=2, count=1) at /usr/src/qemu-2.2.0/kvm-all.c:1632
         i = 0
         ptr = 0x7ffff7ee6000 ""
#21 0x000055555564baa4 in kvm_cpu_exec (cpu=0x55555638e7e0)
     at /usr/src/qemu-2.2.0/kvm-all.c:1789
         run = 0x7ffff7ee5000
         ret = 0
         run_ret = 0
#22 0x00005555556301dc in qemu_kvm_cpu_thread_fn (arg=0x55555638e7e0)
     at /usr/src/qemu-2.2.0/cpus.c:953
         cpu = 0x55555638e7e0
         r = 0
#23 0x00007ffff6065e9a in start_thread ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#24 0x00007ffff5d9338d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#25 0x0000000000000000 in ?? ()
No symbol table info available.

Thank you,
Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-18  7:03         ` Peter Lieven
@ 2015-06-18  7:12           ` Peter Lieven
  2015-06-18  7:45             ` Kevin Wolf
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Lieven @ 2015-06-18  7:12 UTC (permalink / raw)
  To: Paolo Bonzini, Kevin Wolf, Stefan Hajnoczi; +Cc: qemu-devel, qemu block

Am 18.06.2015 um 09:03 schrieb Peter Lieven:
> Am 18.06.2015 um 08:59 schrieb Paolo Bonzini:
>>
>> On 18/06/2015 08:39, Peter Lieven wrote:
>>> It seems like the mainloop is waiting here:
>>>
>>> #0  0x00007ffff606c89c in __lll_lock_wait ()
>>>     from /lib/x86_64-linux-gnu/libpthread.so.0
>>> No symbol table info available.
>>> #1  0x00007ffff6068065 in _L_lock_858 ()
>>>     from /lib/x86_64-linux-gnu/libpthread.so.0
>>> No symbol table info available.
>>> #2  0x00007ffff6067eba in pthread_mutex_lock ()
>>>     from /lib/x86_64-linux-gnu/libpthread.so.0
>>> No symbol table info available.
>>> #3  0x00005555559f2557 in qemu_mutex_lock (mutex=0x555555ed6d40)
>>>      at util/qemu-thread-posix.c:76
>>>          err = 0
>>>          __func__ = "qemu_mutex_lock"
>>> #4  0x00005555556306ef in qemu_mutex_lock_iothread ()
>>>      at /usr/src/qemu-2.2.0/cpus.c:1123
>>> No locals.
>> This means the VCPU is busy with some synchronous activity---maybe a
>> bdrv_aio_cancel?
>
> Here is what the other threads are doing (dropped VNC thread):

Sorry, sth messed up while copying the buffer. Here should be the correct output:

(gdb) thread apply all bt full

Thread 4 (Thread 0x7fffee9ff700 (LWP 2640)):
#0  0x00007ffff6069d84 in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x00005555559f27ae in qemu_cond_wait (cond=0x5555563beed0,
     mutex=0x5555563bef00) at util/qemu-thread-posix.c:135
         err = 0
         __func__ = "qemu_cond_wait"
#2  0x000055555593f12e in vnc_worker_thread_loop (queue=0x5555563beed0)
     at ui/vnc-jobs.c:222
         job = 0x55555637bbd0
         entry = 0x0
         tmp = 0x0
         vs = {csock = -1, dirty = {{0, 0, 0} <repeats 2048 times>},
           lossy_rect = 0x5555563ecd10, vd = 0x7ffff4465010, need_update = 0,
           force_update = 0, has_dirty = 0, features = 195, absolute = 0,
           last_x = 0, last_y = 0, last_bmask = 0, client_width = 0,
           client_height = 0, share_mode = 0, vnc_encoding = 5, major = 0,
           minor = 0, auth = 0, challenge = '\000' <repeats 15 times>,
           info = 0x0, output = {capacity = 6257, offset = 1348,
             buffer = 0x7fffe4000d10 ""}, input = {capacity = 0, offset = 0,
             buffer = 0x0},
           write_pixels = 0x555555925d57 <vnc_write_pixels_copy>, client_pf = {
             bits_per_pixel = 32 ' ', bytes_per_pixel = 4 '\004',
             depth = 24 '\030', rmask = 16711680, gmask = 65280, bmask = 255,
             amask = 0, rshift = 16 '\020', gshift = 8 '\b', bshift = 0 '\000',
             ashift = 24 '\030', rmax = 255 '\377', gmax = 255 '\377',
             bmax = 255 '\377', amax = 0 '\000', rbits = 8 '\b',
             gbits = 8 '\b', bbits = 8 '\b', abits = 0 '\000'},
           client_format = 0, client_be = false, audio_cap = 0x0, as = {
             freq = 0, nchannels = 0, fmt = AUD_FMT_U8, endianness = 0},
           read_handler = 0, read_handler_expect = 0,
           modifiers_state = '\000' <repeats 255 times>, led = 0x0,
           abort = false, initialized = false, output_mutex = {lock = {
               __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
                 __kind = 0, __spins = 0, __list = {__prev = 0x0,
                   __next = 0x0}}, __size = '\000' <repeats 39 times>,
               __align = 0}}, bh = 0x0, jobs_buffer = {capacity = 0,
             offset = 0, buffer = 0x0}, tight = {type = 0,
             quality = 255 '\377', compression = 9 '\t', pixel24 = 0 '\000',
             tight = {capacity = 0, offset = 0, buffer = 0x0}, tmp = {
               capacity = 0, offset = 0, buffer = 0x0}, zlib = {capacity = 0,
               offset = 0, buffer = 0x0}, gradient = {capacity = 0, offset = 0,
               buffer = 0x0}, levels = {0, 0, 0, 0}, stream = {{next_in = 0x0,
                 avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0,
                 total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0,
                 opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, {
                 next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0,
                 avail_out = 0, total_out = 0, msg = 0x0, state = 0x0,
                 zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, adler = 0,
                 reserved = 0}, {next_in = 0x0, avail_in = 0, total_in = 0,
                 next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0,
                 state = 0x0, zalloc = 0, zfree = 0, opaque = 0x0,
                 data_type = 0, adler = 0, reserved = 0}, {next_in = 0x0,
                 avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0,
                 total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0,
                 opaque = 0x0, data_type = 0, adler = 0, reserved = 0}}},
           zlib = {zlib = {capacity = 0, offset = 0, buffer = 0x0}, tmp = {
               capacity = 0, offset = 0, buffer = 0x0}, stream = {
               next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0,
               avail_out = 0, total_out = 0, msg = 0x0, state = 0x0,
               zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, adler = 0,
               reserved = 0}, level = 0}, hextile = {
             send_tile = 0x55555592d95a <send_hextile_tile_32>}, zrle = {
             type = 0, fb = {capacity = 0, offset = 0, buffer = 0x0}, zrle = {
               capacity = 0, offset = 0, buffer = 0x0}, tmp = {capacity = 0,
               offset = 0, buffer = 0x0}, zlib = {capacity = 0, offset = 0,
               buffer = 0x0}, stream = {next_in = 0x0, avail_in = 0,
               total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0,
               msg = 0x0, state = 0x0, zalloc = 0, zfree = 0, opaque = 0x0,
               data_type = 0, adler = 0, reserved = 0}, palette = {pool = {{
                   idx = 0, color = 0, next = {le_next = 0x0,
                     le_prev = 0x0}} <repeats 256 times>}, size = 0, max = 0,
               bpp = 0, table = {{lh_first = 0x0} <repeats 256 times>}}},
           zywrle = {buf = {0 <repeats 4090 times>, 128, 0, -167350680, 32767,
               0, 0}}, mouse_mode_notifier = {notify = 0, node = {
               le_next = 0x0, le_prev = 0x5555559f2f90}}, next = {
             tqe_next = 0x0, tqe_prev = 0x5555563bef28}}
         n_rectangles = 14
         saved_offset = 2
#3  0x000055555593f691 in vnc_worker_thread (arg=0x5555563beed0)
     at ui/vnc-jobs.c:323
         queue = 0x5555563beed0
#4  0x00007ffff6065e9a in start_thread ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#5  0x00007ffff5d9338d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#6  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 3 (Thread 0x7ffff4d4f700 (LWP 2637)):
#0  0x00007ffff606c89c in __lll_lock_wait ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x00007ffff6068065 in _L_lock_858 ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#2  0x00007ffff6067eba in pthread_mutex_lock ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#3  0x00005555559f2557 in qemu_mutex_lock (mutex=0x555555ed6d40)
     at util/qemu-thread-posix.c:76
         err = 0
         __func__ = "qemu_mutex_lock"
#4  0x00005555556306ef in qemu_mutex_lock_iothread ()
     at /usr/src/qemu-2.2.0/cpus.c:1123
No locals.
#5  0x000055555564b9ac in kvm_cpu_exec (cpu=0x5555563cb870)
     at /usr/src/qemu-2.2.0/kvm-all.c:1770
         run = 0x7ffff7ee2000
         ret = 65536
         run_ret = -4
#6  0x00005555556301dc in qemu_kvm_cpu_thread_fn (arg=0x5555563cb870)
     at /usr/src/qemu-2.2.0/cpus.c:953
         cpu = 0x5555563cb870
         r = 65536
#7  0x00007ffff6065e9a in start_thread ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#8  0x00007ffff5d9338d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#9  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
#0  0x00007ffff5d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0, nfds=3,
     timeout=4999424576) at qemu-timer.c:326
         ts = {tv_sec = 4, tv_nsec = 999424576}
         tvsec = 4
#2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0, blocking=true)
     at aio-posix.c:231
         node = 0x0
         was_dispatching = false
         ret = 1
         progress = false
#3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0, offset=4292007936,
     qiov=0x7ffff554f760, is_write=false, flags=0) at block.c:2699
         aio_context = 0x5555563528e0
         co = 0x5555563888a0
         rwco = {bs = 0x55555637eae0, offset = 4292007936,
           qiov = 0x7ffff554f760, is_write = false, ret = 2147483647, flags = 0}
#4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0, sector_num=8382828,
     buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false, flags=0)
     at block.c:2722
         qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1, size = 2048}
         iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
#5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0, sector_num=8382828,
     buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
No locals.
#6  0x000055555599acef in blk_read (blk=0x555556376820, sector_num=8382828,
     buf=0x7ffff44cc800 "(", nb_sectors=4) at block/block-backend.c:404
No locals.
#7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88, lba=2095707,
     buf=0x7ffff44cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
         ret = 32767
#8  0x0000555555834202 in ide_atapi_cmd_reply_end (s=0x555556408f88)
     at hw/ide/atapi.c:190
         byte_count_limit = 21845
         size = 1801980
         ret = 0
#9  0x0000555555834657 in ide_atapi_cmd_read_pio (s=0x555556408f88,
     lba=2095707, nb_sectors=16, sector_size=2048) at hw/ide/atapi.c:279
No locals.
#10 0x0000555555834b25 in ide_atapi_cmd_read (s=0x555556408f88, lba=2095707,
     nb_sectors=16, sector_size=2048) at hw/ide/atapi.c:393
No locals.
#11 0x00005555558358ed in cmd_read (s=0x555556408f88, buf=0x7ffff44cc800 "(")
     at hw/ide/atapi.c:824
         nb_sectors = 16
         lba = 2095707
#12 0x0000555555836373 in ide_atapi_cmd (s=0x555556408f88)
     at hw/ide/atapi.c:1152
         buf = 0x7ffff44cc800 "("
#13 0x00005555558323e1 in ide_data_writew (opaque=0x555556408f08, addr=368,
     val=0) at hw/ide/core.c:2020
         bus = 0x555556408f08
         s = 0x555556408f88
         p = 0x7ffff44cc80c "IHDR"
#14 0x000055555564285f in portio_write (opaque=0x55555641d5d0, addr=0, data=0,
     size=2) at /usr/src/qemu-2.2.0/ioport.c:204
         mrpio = 0x55555641d5d0
         mrp = 0x55555641d6f8
         __PRETTY_FUNCTION__ = "portio_write"
#15 0x000055555564f07c in memory_region_write_accessor (mr=0x55555641d5d0,
     addr=0, value=0x7ffff554fb28, size=2, shift=0, mask=65535)
     at /usr/src/qemu-2.2.0/memory.c:443
         tmp = 0
#16 0x000055555564f1c4 in access_with_adjusted_size (addr=0,
     value=0x7ffff554fb28, size=2, access_size_min=1, access_size_max=4,
     access=0x55555564efe0 <memory_region_write_accessor>, mr=0x55555641d5d0)
     at /usr/src/qemu-2.2.0/memory.c:480
         access_mask = 65535
         access_size = 2
         i = 0
#17 0x000055555565209f in memory_region_dispatch_write (mr=0x55555641d5d0,
     addr=0, data=0, size=2) at /usr/src/qemu-2.2.0/memory.c:1117
No locals.
#18 0x00005555556559c7 in io_mem_write (mr=0x55555641d5d0, addr=0, val=0,
     size=2) at /usr/src/qemu-2.2.0/memory.c:1973
No locals.
#19 0x00005555555fc4be in address_space_rw (as=0x555555e7a880, addr=368,
     buf=0x7ffff7ee6000 "", len=2, is_write=true)
     at /usr/src/qemu-2.2.0/exec.c:2141
         l = 2
         ptr = 0x55555567a7a6 "H\213E\370dH3\004%("
         val = 0
         addr1 = 0
         mr = 0x55555641d5d0
         error = false
#20 0x000055555564b454 in kvm_handle_io (port=368, data=0x7ffff7ee6000,
     direction=1, size=2, count=1) at /usr/src/qemu-2.2.0/kvm-all.c:1632
         i = 0
         ptr = 0x7ffff7ee6000 ""
#21 0x000055555564baa4 in kvm_cpu_exec (cpu=0x55555638e7e0)
     at /usr/src/qemu-2.2.0/kvm-all.c:1789
         run = 0x7ffff7ee5000
         ret = 0
         run_ret = 0
#22 0x00005555556301dc in qemu_kvm_cpu_thread_fn (arg=0x55555638e7e0)
     at /usr/src/qemu-2.2.0/cpus.c:953
         cpu = 0x55555638e7e0
         r = 0
#23 0x00007ffff6065e9a in start_thread ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#24 0x00007ffff5d9338d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#25 0x0000000000000000 in ?? ()
No symbol table info available.

Thread 1 (Thread 0x7ffff7fea900 (LWP 2633)):
#0  0x00007ffff606c89c in __lll_lock_wait ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x00007ffff6068065 in _L_lock_858 ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#2  0x00007ffff6067eba in pthread_mutex_lock ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#3  0x00005555559f2557 in qemu_mutex_lock (mutex=0x555555ed6d40)
     at util/qemu-thread-posix.c:76
         err = 0
         __func__ = "qemu_mutex_lock"
#4  0x00005555556306ef in qemu_mutex_lock_iothread ()
     at /usr/src/qemu-2.2.0/cpus.c:1123
No locals.
#5  0x0000555555954a87 in os_host_main_loop_wait (timeout=79413589)
     at main-loop.c:242
         ret = 1
         spin_counter = 0
#6  0x0000555555954b5f in main_loop_wait (nonblocking=0) at main-loop.c:494
         ret = 15
         timeout = 4294967295
         timeout_ns = 79413589
#7  0x000055555575e702 in main_loop () at vl.c:1882
         nonblocking = false
         last_io = 1
#8  0x00005555557662ee in main (argc=52, argv=0x7fffffffe278,
     envp=0x7fffffffe420) at vl.c:4401
         i = 128
         snapshot = 0
         linux_boot = 0
         initrd_filename = 0x0
         kernel_filename = 0x0
         kernel_cmdline = 0x555555a3116e ""
         boot_order = 0x555556352270 "dc"
         ds = 0x5555563e2e20
         cyls = 0
         heads = 0
         secs = 0
         translation = 0
         hda_opts = 0x0
         opts = 0x555556352140
         machine_opts = 0x55555634c5b0
         icount_opts = 0x0
         olist = 0x555555e27a40
         optind = 52
         optarg = 0x0
         loadvm = 0x0
         machine_class = 0x555556345cb0
         cpu_model = 0x7fffffffe9d2 "qemu64,+fpu,+vme,+de,+pse,+tsc,+msr,+pae,+mce,+cx8,+apic,+sep,+mtrr,+pge,+mca,+cmov,+pat,+pse36,+clflush,+acpi,+mmx,+fxsr,+sse,+sse2,+ss,+ht,+tm,+pbe,+syscall,+nx,+pdpe1gb,+rdts
cp,+lm,+pni,+pclmulqdq,"...
         vga_model = 0x7fffffffeb67 "vmware"
         qtest_chrdev = 0x0
         qtest_log = 0x0
         pid_file = 0x7fffffffe990 "/var/run/qemu/vm-3092.pid"
         incoming = 0x0
         show_vnc_port = 0
         defconfig = true
         userconfig = true
         log_mask = 0x0
         log_file = 0x0
         mem_trace = {malloc = 0x555555761bf9 <malloc_and_trace>,
           realloc = 0x555555761c51 <realloc_and_trace>,
           free = 0x555555761cb8 <free_and_trace>, calloc = 0, try_malloc = 0,
           try_realloc = 0}
         trace_events = 0x0
         trace_file = 0x0
         default_ram_size = 134217728
         maxram_size = 8589934592
         ram_slots = 0
         vmstate_dump_file = 0x0
         main_loop_err = 0x0
         __func__ = "main"
(gdb)

Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-18  7:12           ` Peter Lieven
@ 2015-06-18  7:45             ` Kevin Wolf
  2015-06-18  8:30               ` Peter Lieven
  0 siblings, 1 reply; 25+ messages in thread
From: Kevin Wolf @ 2015-06-18  7:45 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Paolo Bonzini, jsnow, qemu-devel, qemu block, Stefan Hajnoczi

Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
> #0  0x00007ffff5d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
> No symbol table info available.
> #1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0, nfds=3,
>     timeout=4999424576) at qemu-timer.c:326
>         ts = {tv_sec = 4, tv_nsec = 999424576}
>         tvsec = 4
> #2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0, blocking=true)
>     at aio-posix.c:231
>         node = 0x0
>         was_dispatching = false
>         ret = 1
>         progress = false
> #3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0, offset=4292007936,
>     qiov=0x7ffff554f760, is_write=false, flags=0) at block.c:2699
>         aio_context = 0x5555563528e0
>         co = 0x5555563888a0
>         rwco = {bs = 0x55555637eae0, offset = 4292007936,
>           qiov = 0x7ffff554f760, is_write = false, ret = 2147483647, flags = 0}
> #4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0, sector_num=8382828,
>     buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false, flags=0)
>     at block.c:2722
>         qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1, size = 2048}
>         iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
> #5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0, sector_num=8382828,
>     buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
> No locals.
> #6  0x000055555599acef in blk_read (blk=0x555556376820, sector_num=8382828,
>     buf=0x7ffff44cc800 "(", nb_sectors=4) at block/block-backend.c:404
> No locals.
> #7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88, lba=2095707,
>     buf=0x7ffff44cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
>         ret = 32767

Here is the problem: The ATAPI emulation uses synchronous blk_read()
instead of the AIO or coroutine interfaces. This means that it keeps
polling for request completion while it holds the BQL until the request
is completed.

We can (and should) fix that, otherwise the VCPUs is blocked while we're
reading from the image, even without a hang. It doesn't fully fix your
problem, though, as bdrv_drain_all() and friends still exist.

Kevin

> #8  0x0000555555834202 in ide_atapi_cmd_reply_end (s=0x555556408f88)
>     at hw/ide/atapi.c:190
>         byte_count_limit = 21845
>         size = 1801980
>         ret = 0
> #9  0x0000555555834657 in ide_atapi_cmd_read_pio (s=0x555556408f88,
>     lba=2095707, nb_sectors=16, sector_size=2048) at hw/ide/atapi.c:279
> No locals.
> #10 0x0000555555834b25 in ide_atapi_cmd_read (s=0x555556408f88, lba=2095707,
>     nb_sectors=16, sector_size=2048) at hw/ide/atapi.c:393
> No locals.
> #11 0x00005555558358ed in cmd_read (s=0x555556408f88, buf=0x7ffff44cc800 "(")
>     at hw/ide/atapi.c:824
>         nb_sectors = 16
>         lba = 2095707
> #12 0x0000555555836373 in ide_atapi_cmd (s=0x555556408f88)
>     at hw/ide/atapi.c:1152
>         buf = 0x7ffff44cc800 "("
> #13 0x00005555558323e1 in ide_data_writew (opaque=0x555556408f08, addr=368,
>     val=0) at hw/ide/core.c:2020
>         bus = 0x555556408f08
>         s = 0x555556408f88
>         p = 0x7ffff44cc80c "IHDR"
> #14 0x000055555564285f in portio_write (opaque=0x55555641d5d0, addr=0, data=0,
>     size=2) at /usr/src/qemu-2.2.0/ioport.c:204
>         mrpio = 0x55555641d5d0
>         mrp = 0x55555641d6f8
>         __PRETTY_FUNCTION__ = "portio_write"
> #15 0x000055555564f07c in memory_region_write_accessor (mr=0x55555641d5d0,
>     addr=0, value=0x7ffff554fb28, size=2, shift=0, mask=65535)
>     at /usr/src/qemu-2.2.0/memory.c:443
>         tmp = 0
> #16 0x000055555564f1c4 in access_with_adjusted_size (addr=0,
>     value=0x7ffff554fb28, size=2, access_size_min=1, access_size_max=4,
>     access=0x55555564efe0 <memory_region_write_accessor>, mr=0x55555641d5d0)
>     at /usr/src/qemu-2.2.0/memory.c:480
>         access_mask = 65535
>         access_size = 2
>         i = 0
> #17 0x000055555565209f in memory_region_dispatch_write (mr=0x55555641d5d0,
>     addr=0, data=0, size=2) at /usr/src/qemu-2.2.0/memory.c:1117
> No locals.
> #18 0x00005555556559c7 in io_mem_write (mr=0x55555641d5d0, addr=0, val=0,
>     size=2) at /usr/src/qemu-2.2.0/memory.c:1973
> No locals.
> #19 0x00005555555fc4be in address_space_rw (as=0x555555e7a880, addr=368,
>     buf=0x7ffff7ee6000 "", len=2, is_write=true)
>     at /usr/src/qemu-2.2.0/exec.c:2141
>         l = 2
>         ptr = 0x55555567a7a6 "H\213E\370dH3\004%("
>         val = 0
>         addr1 = 0
>         mr = 0x55555641d5d0
>         error = false
> #20 0x000055555564b454 in kvm_handle_io (port=368, data=0x7ffff7ee6000,
>     direction=1, size=2, count=1) at /usr/src/qemu-2.2.0/kvm-all.c:1632
>         i = 0
>         ptr = 0x7ffff7ee6000 ""
> #21 0x000055555564baa4 in kvm_cpu_exec (cpu=0x55555638e7e0)
>     at /usr/src/qemu-2.2.0/kvm-all.c:1789
>         run = 0x7ffff7ee5000
>         ret = 0
>         run_ret = 0
> #22 0x00005555556301dc in qemu_kvm_cpu_thread_fn (arg=0x55555638e7e0)
>     at /usr/src/qemu-2.2.0/cpus.c:953
>         cpu = 0x55555638e7e0
>         r = 0
> #23 0x00007ffff6065e9a in start_thread ()
>    from /lib/x86_64-linux-gnu/libpthread.so.0
> No symbol table info available.
> #24 0x00007ffff5d9338d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> No symbol table info available.
> #25 0x0000000000000000 in ?? ()
> No symbol table info available.
> 
> Thread 1 (Thread 0x7ffff7fea900 (LWP 2633)):
> #0  0x00007ffff606c89c in __lll_lock_wait ()
>    from /lib/x86_64-linux-gnu/libpthread.so.0
> No symbol table info available.
> #1  0x00007ffff6068065 in _L_lock_858 ()
>    from /lib/x86_64-linux-gnu/libpthread.so.0
> No symbol table info available.
> #2  0x00007ffff6067eba in pthread_mutex_lock ()
>    from /lib/x86_64-linux-gnu/libpthread.so.0
> No symbol table info available.
> #3  0x00005555559f2557 in qemu_mutex_lock (mutex=0x555555ed6d40)
>     at util/qemu-thread-posix.c:76
>         err = 0
>         __func__ = "qemu_mutex_lock"
> #4  0x00005555556306ef in qemu_mutex_lock_iothread ()
>     at /usr/src/qemu-2.2.0/cpus.c:1123
> No locals.
> #5  0x0000555555954a87 in os_host_main_loop_wait (timeout=79413589)
>     at main-loop.c:242
>         ret = 1
>         spin_counter = 0
> #6  0x0000555555954b5f in main_loop_wait (nonblocking=0) at main-loop.c:494
>         ret = 15
>         timeout = 4294967295
>         timeout_ns = 79413589
> #7  0x000055555575e702 in main_loop () at vl.c:1882
>         nonblocking = false
>         last_io = 1
> #8  0x00005555557662ee in main (argc=52, argv=0x7fffffffe278,
>     envp=0x7fffffffe420) at vl.c:4401
>         i = 128
>         snapshot = 0
>         linux_boot = 0
>         initrd_filename = 0x0
>         kernel_filename = 0x0
>         kernel_cmdline = 0x555555a3116e ""
>         boot_order = 0x555556352270 "dc"
>         ds = 0x5555563e2e20
>         cyls = 0
>         heads = 0
>         secs = 0
>         translation = 0
>         hda_opts = 0x0
>         opts = 0x555556352140
>         machine_opts = 0x55555634c5b0
>         icount_opts = 0x0
>         olist = 0x555555e27a40
>         optind = 52
>         optarg = 0x0
>         loadvm = 0x0
>         machine_class = 0x555556345cb0
>         cpu_model = 0x7fffffffe9d2 "qemu64,+fpu,+vme,+de,+pse,+tsc,+msr,+pae,+mce,+cx8,+apic,+sep,+mtrr,+pge,+mca,+cmov,+pat,+pse36,+clflush,+acpi,+mmx,+fxsr,+sse,+sse2,+ss,+ht,+tm,+pbe,+syscall,+nx,+pdpe1gb,+rdts
> cp,+lm,+pni,+pclmulqdq,"...
>         vga_model = 0x7fffffffeb67 "vmware"
>         qtest_chrdev = 0x0
>         qtest_log = 0x0
>         pid_file = 0x7fffffffe990 "/var/run/qemu/vm-3092.pid"
>         incoming = 0x0
>         show_vnc_port = 0
>         defconfig = true
>         userconfig = true
>         log_mask = 0x0
>         log_file = 0x0
>         mem_trace = {malloc = 0x555555761bf9 <malloc_and_trace>,
>           realloc = 0x555555761c51 <realloc_and_trace>,
>           free = 0x555555761cb8 <free_and_trace>, calloc = 0, try_malloc = 0,
>           try_realloc = 0}
>         trace_events = 0x0
>         trace_file = 0x0
>         default_ram_size = 134217728
>         maxram_size = 8589934592
>         ram_slots = 0
>         vmstate_dump_file = 0x0
>         main_loop_err = 0x0
>         __func__ = "main"
> (gdb)
> 
> Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-18  7:45             ` Kevin Wolf
@ 2015-06-18  8:30               ` Peter Lieven
  2015-06-18  8:42                 ` Kevin Wolf
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Lieven @ 2015-06-18  8:30 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, jsnow, qemu-devel, qemu block, Stefan Hajnoczi

Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
> Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
>> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
>> #0  0x00007ffff5d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
>> No symbol table info available.
>> #1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0, nfds=3,
>>      timeout=4999424576) at qemu-timer.c:326
>>          ts = {tv_sec = 4, tv_nsec = 999424576}
>>          tvsec = 4
>> #2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0, blocking=true)
>>      at aio-posix.c:231
>>          node = 0x0
>>          was_dispatching = false
>>          ret = 1
>>          progress = false
>> #3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0, offset=4292007936,
>>      qiov=0x7ffff554f760, is_write=false, flags=0) at block.c:2699
>>          aio_context = 0x5555563528e0
>>          co = 0x5555563888a0
>>          rwco = {bs = 0x55555637eae0, offset = 4292007936,
>>            qiov = 0x7ffff554f760, is_write = false, ret = 2147483647, flags = 0}
>> #4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0, sector_num=8382828,
>>      buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false, flags=0)
>>      at block.c:2722
>>          qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1, size = 2048}
>>          iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
>> #5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0, sector_num=8382828,
>>      buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
>> No locals.
>> #6  0x000055555599acef in blk_read (blk=0x555556376820, sector_num=8382828,
>>      buf=0x7ffff44cc800 "(", nb_sectors=4) at block/block-backend.c:404
>> No locals.
>> #7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88, lba=2095707,
>>      buf=0x7ffff44cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
>>          ret = 32767
> Here is the problem: The ATAPI emulation uses synchronous blk_read()
> instead of the AIO or coroutine interfaces. This means that it keeps
> polling for request completion while it holds the BQL until the request
> is completed.

I will look at this.

>
> We can (and should) fix that, otherwise the VCPUs is blocked while we're
> reading from the image, even without a hang. It doesn't fully fix your
> problem, though, as bdrv_drain_all() and friends still exist.

Any idea which commands actually call bdrv_drain_alll?

Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-18  8:30               ` Peter Lieven
@ 2015-06-18  8:42                 ` Kevin Wolf
  2015-06-18  9:29                   ` Peter Lieven
  2015-06-18 10:17                   ` Peter Lieven
  0 siblings, 2 replies; 25+ messages in thread
From: Kevin Wolf @ 2015-06-18  8:42 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Paolo Bonzini, jsnow, qemu-devel, qemu block, Stefan Hajnoczi

Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
> Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
> >Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
> >>Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
> >>#0  0x00007ffff5d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
> >>No symbol table info available.
> >>#1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0, nfds=3,
> >>     timeout=4999424576) at qemu-timer.c:326
> >>         ts = {tv_sec = 4, tv_nsec = 999424576}
> >>         tvsec = 4
> >>#2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0, blocking=true)
> >>     at aio-posix.c:231
> >>         node = 0x0
> >>         was_dispatching = false
> >>         ret = 1
> >>         progress = false
> >>#3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0, offset=4292007936,
> >>     qiov=0x7ffff554f760, is_write=false, flags=0) at block.c:2699
> >>         aio_context = 0x5555563528e0
> >>         co = 0x5555563888a0
> >>         rwco = {bs = 0x55555637eae0, offset = 4292007936,
> >>           qiov = 0x7ffff554f760, is_write = false, ret = 2147483647, flags = 0}
> >>#4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0, sector_num=8382828,
> >>     buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false, flags=0)
> >>     at block.c:2722
> >>         qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1, size = 2048}
> >>         iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
> >>#5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0, sector_num=8382828,
> >>     buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
> >>No locals.
> >>#6  0x000055555599acef in blk_read (blk=0x555556376820, sector_num=8382828,
> >>     buf=0x7ffff44cc800 "(", nb_sectors=4) at block/block-backend.c:404
> >>No locals.
> >>#7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88, lba=2095707,
> >>     buf=0x7ffff44cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
> >>         ret = 32767
> >Here is the problem: The ATAPI emulation uses synchronous blk_read()
> >instead of the AIO or coroutine interfaces. This means that it keeps
> >polling for request completion while it holds the BQL until the request
> >is completed.
> 
> I will look at this.
> 
> >
> >We can (and should) fix that, otherwise the VCPUs is blocked while we're
> >reading from the image, even without a hang. It doesn't fully fix your
> >problem, though, as bdrv_drain_all() and friends still exist.
> 
> Any idea which commands actually call bdrv_drain_alll?

At least 'stop' and all commands changing the BDS graph (block jobs,
snapshots, commit, etc.). For a full list, I would have to inspect each
command in the code.

The guest can even trigger bdrv_drain_all() by stopping a running DMA
operation.

Kevin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-18  8:42                 ` Kevin Wolf
@ 2015-06-18  9:29                   ` Peter Lieven
  2015-06-18  9:36                     ` Stefan Hajnoczi
  2015-06-18 10:17                   ` Peter Lieven
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Lieven @ 2015-06-18  9:29 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, jsnow, qemu-devel, qemu block, Stefan Hajnoczi

Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
> Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
>> Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
>>> Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
>>>> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
>>>> #0  0x00007ffff5d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
>>>> No symbol table info available.
>>>> #1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0, nfds=3,
>>>>      timeout=4999424576) at qemu-timer.c:326
>>>>          ts = {tv_sec = 4, tv_nsec = 999424576}
>>>>          tvsec = 4
>>>> #2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0, blocking=true)
>>>>      at aio-posix.c:231
>>>>          node = 0x0
>>>>          was_dispatching = false
>>>>          ret = 1
>>>>          progress = false
>>>> #3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0, offset=4292007936,
>>>>      qiov=0x7ffff554f760, is_write=false, flags=0) at block.c:2699
>>>>          aio_context = 0x5555563528e0
>>>>          co = 0x5555563888a0
>>>>          rwco = {bs = 0x55555637eae0, offset = 4292007936,
>>>>            qiov = 0x7ffff554f760, is_write = false, ret = 2147483647, flags = 0}
>>>> #4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0, sector_num=8382828,
>>>>      buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false, flags=0)
>>>>      at block.c:2722
>>>>          qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1, size = 2048}
>>>>          iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
>>>> #5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0, sector_num=8382828,
>>>>      buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
>>>> No locals.
>>>> #6  0x000055555599acef in blk_read (blk=0x555556376820, sector_num=8382828,
>>>>      buf=0x7ffff44cc800 "(", nb_sectors=4) at block/block-backend.c:404
>>>> No locals.
>>>> #7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88, lba=2095707,
>>>>      buf=0x7ffff44cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
>>>>          ret = 32767
>>> Here is the problem: The ATAPI emulation uses synchronous blk_read()
>>> instead of the AIO or coroutine interfaces. This means that it keeps
>>> polling for request completion while it holds the BQL until the request
>>> is completed.
>> I will look at this.

I need some further help. My way to "emulate" a hung NFS Server is to
block it in the Firewall. Currently I face the problem that I cannot mount
a CD Iso via libnfs (nfs://) without hanging Qemu (i previously tried with
a kernel NFS mount). It reads a few sectors and then stalls (maybe another bug):

(gdb) thread apply all bt full

Thread 3 (Thread 0x7ffff0c21700 (LWP 29710)):
#0  qemu_cond_broadcast (cond=cond@entry=0x555556259940) at util/qemu-thread-posix.c:120
         err = <optimized out>
         __func__ = "qemu_cond_broadcast"
#1  0x0000555555911164 in rfifolock_unlock (r=r@entry=0x555556259910) at util/rfifolock.c:75
         __PRETTY_FUNCTION__ = "rfifolock_unlock"
#2  0x0000555555875921 in aio_context_release (ctx=ctx@entry=0x5555562598b0) at async.c:329
No locals.
#3  0x000055555588434c in aio_poll (ctx=ctx@entry=0x5555562598b0, blocking=blocking@entry=true) at aio-posix.c:272
         node = <optimized out>
         was_dispatching = false
         i = <optimized out>
         ret = <optimized out>
         progress = false
         timeout = 611734526
         __PRETTY_FUNCTION__ = "aio_poll"
#4  0x00005555558bc43d in bdrv_prwv_co (bs=bs@entry=0x55555627c0f0, offset=offset@entry=7038976, qiov=qiov@entry=0x7ffff0c208f0, is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at block/io.c:552
         aio_context = 0x5555562598b0
         co = <optimized out>
         rwco = {bs = 0x55555627c0f0, offset = 7038976, qiov = 0x7ffff0c208f0, is_write = false, ret = 2147483647, flags = (unknown: 0)}
#5  0x00005555558bc533 in bdrv_rw_co (bs=0x55555627c0f0, sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(", nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
     flags=flags@entry=(unknown: 0)) at block/io.c:575
         qiov = {iov = 0x7ffff0c208e0, niov = 1, nalloc = -1, size = 2048}
         iov = {iov_base = 0x555557874800, iov_len = 2048}
#6  0x00005555558bc593 in bdrv_read (bs=<optimized out>, sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(", nb_sectors=nb_sectors@entry=4) at block/io.c:583
No locals.
#7  0x00005555558af75d in blk_read (blk=<optimized out>, sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(", nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
         ret = <optimized out>
#8  0x00005555557abb88 in cd_read_sector (sector_size=<optimized out>, buf=0x555557874800 "(", lba=3437, s=0x55555760db70) at hw/ide/atapi.c:116
         ret = <optimized out>
#9  ide_atapi_cmd_reply_end (s=0x55555760db70) at hw/ide/atapi.c:190
         byte_count_limit = <optimized out>
         size = <optimized out>
         ret = 2
#10 0x00005555556398a6 in memory_region_write_accessor (mr=0x5555577f85d0, addr=<optimized out>, value=0x7ffff0c20a68, size=2, shift=<optimized out>, mask=<optimized out>, attrs=...)
     at /home/lieven/git/qemu/memory.c:459
         tmp = <optimized out>
#11 0x000055555563956b in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7ffff0c20a68, size=size@entry=2, access_size_min=<optimized out>, access_size_max=<optimized out>,
     access=access@entry=0x555555639840 <memory_region_write_accessor>, mr=mr@entry=0x5555577f85d0, attrs=attrs@entry=...) at /home/lieven/git/qemu/memory.c:518
         access_mask = 65535
         access_size = 2
         i = <optimized out>
         r = 0
#12 0x000055555563b3a9 in memory_region_dispatch_write (mr=mr@entry=0x5555577f85d0, addr=0, data=0, size=2, attrs=...) at /home/lieven/git/qemu/memory.c:1174
No locals.
#13 0x00005555555fcc00 in address_space_rw (as=0x555555d7c7c0 <address_space_io>, addr=addr@entry=368, attrs=..., attrs@entry=..., buf=buf@entry=0x7ffff7ff1000 "", len=len@entry=2, is_write=is_write@entry=true)
     at /home/lieven/git/qemu/exec.c:2357
         l = 2
         ptr = <optimized out>
         val = 0
         addr1 = 0
         mr = 0x5555577f85d0
         result = 0
#14 0x0000555555638610 in kvm_handle_io (count=1, size=2, direction=<optimized out>, data=<optimized out>, attrs=..., port=368) at /home/lieven/git/qemu/kvm-all.c:1636
         i = 0
         ptr = 0x7ffff7ff1000 ""
#15 kvm_cpu_exec (cpu=cpu@entry=0x555556295c30) at /home/lieven/git/qemu/kvm-all.c:1804
         attrs = {unspecified = 0, secure = 0, user = 0, stream_id = 0}
         run = 0x7ffff7ff0000
---Type <return> to continue, or q <return> to quit---
         ret = <optimized out>
         run_ret = <optimized out>
#16 0x00005555556232f2 in qemu_kvm_cpu_thread_fn (arg=0x555556295c30) at /home/lieven/git/qemu/cpus.c:976
         cpu = 0x555556295c30
         r = <optimized out>
#17 0x00007ffff5a49182 in start_thread (arg=0x7ffff0c21700) at pthread_create.c:312
         __res = <optimized out>
         pd = 0x7ffff0c21700
         now = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737232639744, 6130646130327736738, 1, 0, 140737232640448, 140737232639744, -6130648513365749342, -6130659796022144606}, mask_was_saved = 0}}, priv = {pad = {
               0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
         pagesize_m1 = <optimized out>
         sp = <optimized out>
         freesize = <optimized out>
         __PRETTY_FUNCTION__ = "start_thread"
#18 0x00007ffff577647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
No locals.

Thread 2 (Thread 0x7ffff1911700 (LWP 29709)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
No locals.
#1  0x00005555559006a2 in futex_wait (val=4294967295, ev=0x55555620a124 <rcu_call_ready_event>) at util/qemu-thread-posix.c:301
No locals.
#2  qemu_event_wait (ev=ev@entry=0x55555620a124 <rcu_call_ready_event>) at util/qemu-thread-posix.c:399
         value = <optimized out>
#3  0x00005555559114e6 in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:233
         tries = 0
         n = <optimized out>
         node = <optimized out>
#4  0x00007ffff5a49182 in start_thread (arg=0x7ffff1911700) at pthread_create.c:312
         __res = <optimized out>
         pd = 0x7ffff1911700
         now = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737246205696, 6130646130327736738, 1, 0, 140737246206400, 140737246205696, -6130651373813968478, -6130659796022144606}, mask_was_saved = 0}}, priv = {pad = {
               0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
         pagesize_m1 = <optimized out>
         sp = <optimized out>
         freesize = <optimized out>
         __PRETTY_FUNCTION__ = "start_thread"
#5  0x00007ffff577647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
No locals.

Thread 1 (Thread 0x7ffff7fc8a80 (LWP 29705)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
No locals.
#1  0x00007ffff5a4b657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#2  0x00007ffff5a4b480 in __GI___pthread_mutex_lock (mutex=0x555555dd5880 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:79
         __PRETTY_FUNCTION__ = "__pthread_mutex_lock"
         type = 4294966784
#3  0x0000555555900039 in qemu_mutex_lock (mutex=mutex@entry=0x555555dd5880 <qemu_global_mutex>) at util/qemu-thread-posix.c:73
         err = <optimized out>
         __func__ = "qemu_mutex_lock"
#4  0x0000555555624cbc in qemu_mutex_lock_iothread () at /home/lieven/git/qemu/cpus.c:1152
No locals.
#5  0x00005555558823fb in os_host_main_loop_wait (timeout=11000972) at main-loop.c:241
         ret = 1
         spin_counter = 0
---Type <return> to continue, or q <return> to quit---
#6  main_loop_wait (nonblocking=<optimized out>) at main-loop.c:493
         ret = 1
         timeout = 1000
#7  0x00005555555f19ee in main_loop () at vl.c:1808
         nonblocking = <optimized out>
         last_io = 1
#8  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4470
         i = <optimized out>
         snapshot = <optimized out>
         linux_boot = <optimized out>
         initrd_filename = <optimized out>
         kernel_filename = <optimized out>
         kernel_cmdline = <optimized out>
         boot_order = <optimized out>
         boot_once = 0x0
         ds = <optimized out>
         cyls = <optimized out>
         heads = <optimized out>
         secs = <optimized out>
         translation = <optimized out>
         hda_opts = <optimized out>
         opts = <optimized out>
         icount_opts = <optimized out>
         olist = <optimized out>
         optind = 12
         optarg = 0x0
         loadvm = <optimized out>
         machine_class = 0x55555623d910
         cpu_model = <optimized out>
         vga_model = 0x55555592b65b "std"
         qtest_chrdev = <optimized out>
         qtest_log = <optimized out>
         pid_file = <optimized out>
         incoming = <optimized out>
         defconfig = <optimized out>
         userconfig = 48
         log_mask = <optimized out>
         log_file = <optimized out>
         mem_trace = {malloc = 0x55555570b380 <malloc_and_trace>, realloc = 0x55555570b340 <realloc_and_trace>, free = 0x55555570b300 <free_and_trace>, calloc = 0x0, try_malloc = 0x0, try_realloc = 0x0}
         trace_events = <optimized out>
         trace_file = <optimized out>
         maxram_size = <optimized out>
         ram_slots = <optimized out>
         vmstate_dump_file = <optimized out>
         main_loop_err = 0x0
         __func__ = "main"

Any ideas?

Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-18  9:29                   ` Peter Lieven
@ 2015-06-18  9:36                     ` Stefan Hajnoczi
  2015-06-18  9:53                       ` Peter Lieven
  2015-06-19 13:14                       ` Peter Lieven
  0 siblings, 2 replies; 25+ messages in thread
From: Stefan Hajnoczi @ 2015-06-18  9:36 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Kevin Wolf, Paolo Bonzini, John Snow, qemu-devel, qemu block

On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven <pl@kamp.de> wrote:
> Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
>>
>> Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
>>>
>>> Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
>>>>
>>>> Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
>>>>>
>>>>> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
>>>>> #0  0x00007ffff5d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
>>>>> No symbol table info available.
>>>>> #1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0, nfds=3,
>>>>>      timeout=4999424576) at qemu-timer.c:326
>>>>>          ts = {tv_sec = 4, tv_nsec = 999424576}
>>>>>          tvsec = 4
>>>>> #2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0, blocking=true)
>>>>>      at aio-posix.c:231
>>>>>          node = 0x0
>>>>>          was_dispatching = false
>>>>>          ret = 1
>>>>>          progress = false
>>>>> #3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0,
>>>>> offset=4292007936,
>>>>>      qiov=0x7ffff554f760, is_write=false, flags=0) at block.c:2699
>>>>>          aio_context = 0x5555563528e0
>>>>>          co = 0x5555563888a0
>>>>>          rwco = {bs = 0x55555637eae0, offset = 4292007936,
>>>>>            qiov = 0x7ffff554f760, is_write = false, ret = 2147483647,
>>>>> flags = 0}
>>>>> #4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0,
>>>>> sector_num=8382828,
>>>>>      buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false, flags=0)
>>>>>      at block.c:2722
>>>>>          qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1, size =
>>>>> 2048}
>>>>>          iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
>>>>> #5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0,
>>>>> sector_num=8382828,
>>>>>      buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
>>>>> No locals.
>>>>> #6  0x000055555599acef in blk_read (blk=0x555556376820,
>>>>> sector_num=8382828,
>>>>>      buf=0x7ffff44cc800 "(", nb_sectors=4) at block/block-backend.c:404
>>>>> No locals.
>>>>> #7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88,
>>>>> lba=2095707,
>>>>>      buf=0x7ffff44cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
>>>>>          ret = 32767
>>>>
>>>> Here is the problem: The ATAPI emulation uses synchronous blk_read()
>>>> instead of the AIO or coroutine interfaces. This means that it keeps
>>>> polling for request completion while it holds the BQL until the request
>>>> is completed.
>>>
>>> I will look at this.
>
>
> I need some further help. My way to "emulate" a hung NFS Server is to
> block it in the Firewall. Currently I face the problem that I cannot mount
> a CD Iso via libnfs (nfs://) without hanging Qemu (i previously tried with
> a kernel NFS mount). It reads a few sectors and then stalls (maybe another
> bug):
>
> (gdb) thread apply all bt full
>
> Thread 3 (Thread 0x7ffff0c21700 (LWP 29710)):
> #0  qemu_cond_broadcast (cond=cond@entry=0x555556259940) at
> util/qemu-thread-posix.c:120
>         err = <optimized out>
>         __func__ = "qemu_cond_broadcast"
> #1  0x0000555555911164 in rfifolock_unlock (r=r@entry=0x555556259910) at
> util/rfifolock.c:75
>         __PRETTY_FUNCTION__ = "rfifolock_unlock"
> #2  0x0000555555875921 in aio_context_release (ctx=ctx@entry=0x5555562598b0)
> at async.c:329
> No locals.
> #3  0x000055555588434c in aio_poll (ctx=ctx@entry=0x5555562598b0,
> blocking=blocking@entry=true) at aio-posix.c:272
>         node = <optimized out>
>         was_dispatching = false
>         i = <optimized out>
>         ret = <optimized out>
>         progress = false
>         timeout = 611734526
>         __PRETTY_FUNCTION__ = "aio_poll"
> #4  0x00005555558bc43d in bdrv_prwv_co (bs=bs@entry=0x55555627c0f0,
> offset=offset@entry=7038976, qiov=qiov@entry=0x7ffff0c208f0,
> is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
> block/io.c:552
>         aio_context = 0x5555562598b0
>         co = <optimized out>
>         rwco = {bs = 0x55555627c0f0, offset = 7038976, qiov =
> 0x7ffff0c208f0, is_write = false, ret = 2147483647, flags = (unknown: 0)}
> #5  0x00005555558bc533 in bdrv_rw_co (bs=0x55555627c0f0,
> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
> nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
>     flags=flags@entry=(unknown: 0)) at block/io.c:575
>         qiov = {iov = 0x7ffff0c208e0, niov = 1, nalloc = -1, size = 2048}
>         iov = {iov_base = 0x555557874800, iov_len = 2048}
> #6  0x00005555558bc593 in bdrv_read (bs=<optimized out>,
> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
> nb_sectors=nb_sectors@entry=4) at block/io.c:583
> No locals.
> #7  0x00005555558af75d in blk_read (blk=<optimized out>,
> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
> nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
>         ret = <optimized out>
> #8  0x00005555557abb88 in cd_read_sector (sector_size=<optimized out>,
> buf=0x555557874800 "(", lba=3437, s=0x55555760db70) at hw/ide/atapi.c:116
>         ret = <optimized out>
> #9  ide_atapi_cmd_reply_end (s=0x55555760db70) at hw/ide/atapi.c:190
>         byte_count_limit = <optimized out>
>         size = <optimized out>
>         ret = 2

This is still the same scenario Kevin explained.

The ATAPI CD-ROM emulation code is using synchronous blk_read().  This
function holds the QEMU global mutex while waiting for the I/O request
to complete.  This blocks other vcpu threads and the main loop thread.

The solution is to convert the CD-ROM emulation code to use
blk_aio_readv() instead of blk_read().

Stefan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-18  9:36                     ` Stefan Hajnoczi
@ 2015-06-18  9:53                       ` Peter Lieven
  2015-06-19 13:14                       ` Peter Lieven
  1 sibling, 0 replies; 25+ messages in thread
From: Peter Lieven @ 2015-06-18  9:53 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, John Snow, qemu-devel, qemu block

Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:
> On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven <pl@kamp.de> wrote:
>> Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
>>> Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
>>>> Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
>>>>> Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
>>>>>> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
>>>>>> #0  0x00007ffff5d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
>>>>>> No symbol table info available.
>>>>>> #1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0, nfds=3,
>>>>>>       timeout=4999424576) at qemu-timer.c:326
>>>>>>           ts = {tv_sec = 4, tv_nsec = 999424576}
>>>>>>           tvsec = 4
>>>>>> #2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0, blocking=true)
>>>>>>       at aio-posix.c:231
>>>>>>           node = 0x0
>>>>>>           was_dispatching = false
>>>>>>           ret = 1
>>>>>>           progress = false
>>>>>> #3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0,
>>>>>> offset=4292007936,
>>>>>>       qiov=0x7ffff554f760, is_write=false, flags=0) at block.c:2699
>>>>>>           aio_context = 0x5555563528e0
>>>>>>           co = 0x5555563888a0
>>>>>>           rwco = {bs = 0x55555637eae0, offset = 4292007936,
>>>>>>             qiov = 0x7ffff554f760, is_write = false, ret = 2147483647,
>>>>>> flags = 0}
>>>>>> #4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0,
>>>>>> sector_num=8382828,
>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false, flags=0)
>>>>>>       at block.c:2722
>>>>>>           qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1, size =
>>>>>> 2048}
>>>>>>           iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
>>>>>> #5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0,
>>>>>> sector_num=8382828,
>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
>>>>>> No locals.
>>>>>> #6  0x000055555599acef in blk_read (blk=0x555556376820,
>>>>>> sector_num=8382828,
>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4) at block/block-backend.c:404
>>>>>> No locals.
>>>>>> #7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88,
>>>>>> lba=2095707,
>>>>>>       buf=0x7ffff44cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
>>>>>>           ret = 32767
>>>>> Here is the problem: The ATAPI emulation uses synchronous blk_read()
>>>>> instead of the AIO or coroutine interfaces. This means that it keeps
>>>>> polling for request completion while it holds the BQL until the request
>>>>> is completed.
>>>> I will look at this.
>>
>> I need some further help. My way to "emulate" a hung NFS Server is to
>> block it in the Firewall. Currently I face the problem that I cannot mount
>> a CD Iso via libnfs (nfs://) without hanging Qemu (i previously tried with
>> a kernel NFS mount). It reads a few sectors and then stalls (maybe another
>> bug):
>>
>> (gdb) thread apply all bt full
>>
>> Thread 3 (Thread 0x7ffff0c21700 (LWP 29710)):
>> #0  qemu_cond_broadcast (cond=cond@entry=0x555556259940) at
>> util/qemu-thread-posix.c:120
>>          err = <optimized out>
>>          __func__ = "qemu_cond_broadcast"
>> #1  0x0000555555911164 in rfifolock_unlock (r=r@entry=0x555556259910) at
>> util/rfifolock.c:75
>>          __PRETTY_FUNCTION__ = "rfifolock_unlock"
>> #2  0x0000555555875921 in aio_context_release (ctx=ctx@entry=0x5555562598b0)
>> at async.c:329
>> No locals.
>> #3  0x000055555588434c in aio_poll (ctx=ctx@entry=0x5555562598b0,
>> blocking=blocking@entry=true) at aio-posix.c:272
>>          node = <optimized out>
>>          was_dispatching = false
>>          i = <optimized out>
>>          ret = <optimized out>
>>          progress = false
>>          timeout = 611734526
>>          __PRETTY_FUNCTION__ = "aio_poll"
>> #4  0x00005555558bc43d in bdrv_prwv_co (bs=bs@entry=0x55555627c0f0,
>> offset=offset@entry=7038976, qiov=qiov@entry=0x7ffff0c208f0,
>> is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
>> block/io.c:552
>>          aio_context = 0x5555562598b0
>>          co = <optimized out>
>>          rwco = {bs = 0x55555627c0f0, offset = 7038976, qiov =
>> 0x7ffff0c208f0, is_write = false, ret = 2147483647, flags = (unknown: 0)}
>> #5  0x00005555558bc533 in bdrv_rw_co (bs=0x55555627c0f0,
>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>> nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
>>      flags=flags@entry=(unknown: 0)) at block/io.c:575
>>          qiov = {iov = 0x7ffff0c208e0, niov = 1, nalloc = -1, size = 2048}
>>          iov = {iov_base = 0x555557874800, iov_len = 2048}
>> #6  0x00005555558bc593 in bdrv_read (bs=<optimized out>,
>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>> nb_sectors=nb_sectors@entry=4) at block/io.c:583
>> No locals.
>> #7  0x00005555558af75d in blk_read (blk=<optimized out>,
>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>> nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
>>          ret = <optimized out>
>> #8  0x00005555557abb88 in cd_read_sector (sector_size=<optimized out>,
>> buf=0x555557874800 "(", lba=3437, s=0x55555760db70) at hw/ide/atapi.c:116
>>          ret = <optimized out>
>> #9  ide_atapi_cmd_reply_end (s=0x55555760db70) at hw/ide/atapi.c:190
>>          byte_count_limit = <optimized out>
>>          size = <optimized out>
>>          ret = 2
> This is still the same scenario Kevin explained.
>
> The ATAPI CD-ROM emulation code is using synchronous blk_read().  This
> function holds the QEMU global mutex while waiting for the I/O request
> to complete.  This blocks other vcpu threads and the main loop thread.
>
> The solution is to convert the CD-ROM emulation code to use
> blk_aio_readv() instead of blk_read().



Yes, my problem was that I was using a libnfs version that was broken.
I can now continue debugging. Sorry for the noise.

A problem I faced is that 'info block' queries the allocated file size of the
CDROM Image and thus hangs. I think nfs_get_allocated_file_size additionally
needs a timeout?

Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-18  8:42                 ` Kevin Wolf
  2015-06-18  9:29                   ` Peter Lieven
@ 2015-06-18 10:17                   ` Peter Lieven
  1 sibling, 0 replies; 25+ messages in thread
From: Peter Lieven @ 2015-06-18 10:17 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, jsnow, qemu-devel, qemu block, Stefan Hajnoczi

Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
> Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
>> Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
>>> Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
>>>> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
>>>> #0  0x00007ffff5d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
>>>> No symbol table info available.
>>>> #1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0, nfds=3,
>>>>      timeout=4999424576) at qemu-timer.c:326
>>>>          ts = {tv_sec = 4, tv_nsec = 999424576}
>>>>          tvsec = 4
>>>> #2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0, blocking=true)
>>>>      at aio-posix.c:231
>>>>          node = 0x0
>>>>          was_dispatching = false
>>>>          ret = 1
>>>>          progress = false
>>>> #3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0, offset=4292007936,
>>>>      qiov=0x7ffff554f760, is_write=false, flags=0) at block.c:2699
>>>>          aio_context = 0x5555563528e0
>>>>          co = 0x5555563888a0
>>>>          rwco = {bs = 0x55555637eae0, offset = 4292007936,
>>>>            qiov = 0x7ffff554f760, is_write = false, ret = 2147483647, flags = 0}
>>>> #4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0, sector_num=8382828,
>>>>      buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false, flags=0)
>>>>      at block.c:2722
>>>>          qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1, size = 2048}
>>>>          iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
>>>> #5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0, sector_num=8382828,
>>>>      buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
>>>> No locals.
>>>> #6  0x000055555599acef in blk_read (blk=0x555556376820, sector_num=8382828,
>>>>      buf=0x7ffff44cc800 "(", nb_sectors=4) at block/block-backend.c:404
>>>> No locals.
>>>> #7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88, lba=2095707,
>>>>      buf=0x7ffff44cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
>>>>          ret = 32767
>>> Here is the problem: The ATAPI emulation uses synchronous blk_read()
>>> instead of the AIO or coroutine interfaces. This means that it keeps
>>> polling for request completion while it holds the BQL until the request
>>> is completed.
>> I will look at this.
>>
>>> We can (and should) fix that, otherwise the VCPUs is blocked while we're
>>> reading from the image, even without a hang. It doesn't fully fix your
>>> problem, though, as bdrv_drain_all() and friends still exist.
>> Any idea which commands actually call bdrv_drain_alll?
> At least 'stop' and all commands changing the BDS graph (block jobs,
> snapshots, commit, etc.). For a full list, I would have to inspect each
> command in the code.
>
> The guest can even trigger bdrv_drain_all() by stopping a running DMA
> operation.

Unfortunately, excactly this is happening...
Is there any way to avoid the bdrv_drain_all in bmdma_cmd_writeb?

Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-18  9:36                     ` Stefan Hajnoczi
  2015-06-18  9:53                       ` Peter Lieven
@ 2015-06-19 13:14                       ` Peter Lieven
  2015-06-22  9:25                         ` Stefan Hajnoczi
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Lieven @ 2015-06-19 13:14 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, John Snow, qemu-devel, qemu block

Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:
> On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven <pl@kamp.de> wrote:
>> Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
>>> Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
>>>> Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
>>>>> Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
>>>>>> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
>>>>>> #0  0x00007ffff5d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
>>>>>> No symbol table info available.
>>>>>> #1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0, nfds=3,
>>>>>>      timeout=4999424576) at qemu-timer.c:326
>>>>>>          ts = {tv_sec = 4, tv_nsec = 999424576}
>>>>>>          tvsec = 4
>>>>>> #2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0, blocking=true)
>>>>>>      at aio-posix.c:231
>>>>>>          node = 0x0
>>>>>>          was_dispatching = false
>>>>>>          ret = 1
>>>>>>          progress = false
>>>>>> #3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0,
>>>>>> offset=4292007936,
>>>>>>      qiov=0x7ffff554f760, is_write=false, flags=0) at block.c:2699
>>>>>>          aio_context = 0x5555563528e0
>>>>>>          co = 0x5555563888a0
>>>>>>          rwco = {bs = 0x55555637eae0, offset = 4292007936,
>>>>>>            qiov = 0x7ffff554f760, is_write = false, ret = 2147483647,
>>>>>> flags = 0}
>>>>>> #4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0,
>>>>>> sector_num=8382828,
>>>>>>      buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false, flags=0)
>>>>>>      at block.c:2722
>>>>>>          qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1, size =
>>>>>> 2048}
>>>>>>          iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
>>>>>> #5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0,
>>>>>> sector_num=8382828,
>>>>>>      buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
>>>>>> No locals.
>>>>>> #6  0x000055555599acef in blk_read (blk=0x555556376820,
>>>>>> sector_num=8382828,
>>>>>>      buf=0x7ffff44cc800 "(", nb_sectors=4) at block/block-backend.c:404
>>>>>> No locals.
>>>>>> #7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88,
>>>>>> lba=2095707,
>>>>>>      buf=0x7ffff44cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
>>>>>>          ret = 32767
>>>>> Here is the problem: The ATAPI emulation uses synchronous blk_read()
>>>>> instead of the AIO or coroutine interfaces. This means that it keeps
>>>>> polling for request completion while it holds the BQL until the request
>>>>> is completed.
>>>> I will look at this.
>>
>> I need some further help. My way to "emulate" a hung NFS Server is to
>> block it in the Firewall. Currently I face the problem that I cannot mount
>> a CD Iso via libnfs (nfs://) without hanging Qemu (i previously tried with
>> a kernel NFS mount). It reads a few sectors and then stalls (maybe another
>> bug):
>>
>> (gdb) thread apply all bt full
>>
>> Thread 3 (Thread 0x7ffff0c21700 (LWP 29710)):
>> #0  qemu_cond_broadcast (cond=cond@entry=0x555556259940) at
>> util/qemu-thread-posix.c:120
>>         err = <optimized out>
>>         __func__ = "qemu_cond_broadcast"
>> #1  0x0000555555911164 in rfifolock_unlock (r=r@entry=0x555556259910) at
>> util/rfifolock.c:75
>>         __PRETTY_FUNCTION__ = "rfifolock_unlock"
>> #2  0x0000555555875921 in aio_context_release (ctx=ctx@entry=0x5555562598b0)
>> at async.c:329
>> No locals.
>> #3  0x000055555588434c in aio_poll (ctx=ctx@entry=0x5555562598b0,
>> blocking=blocking@entry=true) at aio-posix.c:272
>>         node = <optimized out>
>>         was_dispatching = false
>>         i = <optimized out>
>>         ret = <optimized out>
>>         progress = false
>>         timeout = 611734526
>>         __PRETTY_FUNCTION__ = "aio_poll"
>> #4  0x00005555558bc43d in bdrv_prwv_co (bs=bs@entry=0x55555627c0f0,
>> offset=offset@entry=7038976, qiov=qiov@entry=0x7ffff0c208f0,
>> is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
>> block/io.c:552
>>         aio_context = 0x5555562598b0
>>         co = <optimized out>
>>         rwco = {bs = 0x55555627c0f0, offset = 7038976, qiov =
>> 0x7ffff0c208f0, is_write = false, ret = 2147483647, flags = (unknown: 0)}
>> #5  0x00005555558bc533 in bdrv_rw_co (bs=0x55555627c0f0,
>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>> nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
>>     flags=flags@entry=(unknown: 0)) at block/io.c:575
>>         qiov = {iov = 0x7ffff0c208e0, niov = 1, nalloc = -1, size = 2048}
>>         iov = {iov_base = 0x555557874800, iov_len = 2048}
>> #6  0x00005555558bc593 in bdrv_read (bs=<optimized out>,
>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>> nb_sectors=nb_sectors@entry=4) at block/io.c:583
>> No locals.
>> #7  0x00005555558af75d in blk_read (blk=<optimized out>,
>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>> nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
>>         ret = <optimized out>
>> #8  0x00005555557abb88 in cd_read_sector (sector_size=<optimized out>,
>> buf=0x555557874800 "(", lba=3437, s=0x55555760db70) at hw/ide/atapi.c:116
>>         ret = <optimized out>
>> #9  ide_atapi_cmd_reply_end (s=0x55555760db70) at hw/ide/atapi.c:190
>>         byte_count_limit = <optimized out>
>>         size = <optimized out>
>>         ret = 2
> This is still the same scenario Kevin explained.
>
> The ATAPI CD-ROM emulation code is using synchronous blk_read().  This
> function holds the QEMU global mutex while waiting for the I/O request
> to complete.  This blocks other vcpu threads and the main loop thread.
>
> The solution is to convert the CD-ROM emulation code to use
> blk_aio_readv() instead of blk_read().

I tried a little, but i am stuck with my approach. I reads one sector
and then doesn't continue. Maybe someone with more knowledge
of ATAPI/IDE could help?

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 950e311..cdcbd49 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -27,6 +27,8 @@
 #include "hw/scsi/scsi.h"
 #include "sysemu/block-backend.h"
 
+#define DEBUG_IDE_ATAPI 1
+
 static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret);
 
 static void padstr8(uint8_t *buf, int buf_size, const char *src)
@@ -105,31 +107,55 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
     memset(buf, 0, 288);
 }
 
-static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
+static void cd_read_sector_cb(void *opaque, int ret) {
+    IDEState *s = opaque;
+
+    block_acct_done(blk_get_stats(s->blk), &s->acct);
+
+    printf("cd_read_sector_cb lba %d ret = %d\n", s->lba, ret);
+
+    if (ret < 0) {
+        ide_atapi_io_error(s, ret);
+        return;
+    }
+
+    if (s->cd_sector_size == 2352) {
+        cd_data_to_raw(s->io_buffer, s->lba);
+    }
+
+    s->lba++;
+    s->io_buffer_index = 0;
+
+    ide_atapi_cmd_reply_end(s);
+}
+
+static BlockAIOCB *cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
 {
-    int ret;
+    BlockAIOCB *aiocb = NULL;
 
-    switch(sector_size) {
-    case 2048:
-        block_acct_start(blk_get_stats(s->blk), &s->acct,
-                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
-        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
-        block_acct_done(blk_get_stats(s->blk), &s->acct);
-        break;
-    case 2352:
+    if (sector_size != 2048 && sector_size != 2352) {
+        return NULL;
+    }
+
+    s->iov.iov_base = buf;
+    if (sector_size == 2352) {
+        buf += 4;
+    }
+
+    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
+    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
+
+    printf("cd_read_sector lba %d\n", lba);
+
+    aiocb = blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
+                          cd_read_sector_cb, s);
+
+    if (aiocb != NULL) {
         block_acct_start(blk_get_stats(s->blk), &s->acct,
                          4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
-        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
-        block_acct_done(blk_get_stats(s->blk), &s->acct);
-        if (ret < 0)
-            return ret;
-        cd_data_to_raw(buf, lba);
-        break;
-    default:
-        ret = -EIO;
-        break;
     }
-    return ret;
+
+    return aiocb;
 }
 
 void ide_atapi_cmd_ok(IDEState *s)
@@ -170,7 +196,7 @@ void ide_atapi_io_error(IDEState *s, int ret)
 /* The whole ATAPI transfer logic is handled in this function */
 void ide_atapi_cmd_reply_end(IDEState *s)
 {
-    int byte_count_limit, size, ret;
+    int byte_count_limit, size;
 #ifdef DEBUG_IDE_ATAPI
     printf("reply: tx_size=%d elem_tx_size=%d index=%d\n",
            s->packet_transfer_size,
@@ -187,13 +213,10 @@ void ide_atapi_cmd_reply_end(IDEState *s)
     } else {
         /* see if a new sector must be read */
         if (s->lba != -1 && s->io_buffer_index >= s->cd_sector_size) {
-            ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
-            if (ret < 0) {
-                ide_atapi_io_error(s, ret);
-                return;
+            if (cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size) == NULL) {
+                ide_atapi_io_error(s, -EIO);
             }
-            s->lba++;
-            s->io_buffer_index = 0;
+            return;
         }
         if (s->elementary_transfer_size > 0) {
             /* there are some data left to transmit in this elementary

Thanks,
Peter

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-19 13:14                       ` Peter Lieven
@ 2015-06-22  9:25                         ` Stefan Hajnoczi
  2015-06-22 13:09                           ` Peter Lieven
  0 siblings, 1 reply; 25+ messages in thread
From: Stefan Hajnoczi @ 2015-06-22  9:25 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Kevin Wolf, Paolo Bonzini, John Snow, qemu-devel, qemu block

On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven <pl@kamp.de> wrote:
> Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:
>> On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven <pl@kamp.de> wrote:
>>> Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
>>>> Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
>>>>> Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
>>>>>> Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
>>>>>>> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
>>>>>>> #0  0x00007ffff5d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
>>>>>>> No symbol table info available.
>>>>>>> #1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0, nfds=3,
>>>>>>>      timeout=4999424576) at qemu-timer.c:326
>>>>>>>          ts = {tv_sec = 4, tv_nsec = 999424576}
>>>>>>>          tvsec = 4
>>>>>>> #2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0, blocking=true)
>>>>>>>      at aio-posix.c:231
>>>>>>>          node = 0x0
>>>>>>>          was_dispatching = false
>>>>>>>          ret = 1
>>>>>>>          progress = false
>>>>>>> #3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0,
>>>>>>> offset=4292007936,
>>>>>>>      qiov=0x7ffff554f760, is_write=false, flags=0) at block.c:2699
>>>>>>>          aio_context = 0x5555563528e0
>>>>>>>          co = 0x5555563888a0
>>>>>>>          rwco = {bs = 0x55555637eae0, offset = 4292007936,
>>>>>>>            qiov = 0x7ffff554f760, is_write = false, ret = 2147483647,
>>>>>>> flags = 0}
>>>>>>> #4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0,
>>>>>>> sector_num=8382828,
>>>>>>>      buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false, flags=0)
>>>>>>>      at block.c:2722
>>>>>>>          qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1, size =
>>>>>>> 2048}
>>>>>>>          iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
>>>>>>> #5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0,
>>>>>>> sector_num=8382828,
>>>>>>>      buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
>>>>>>> No locals.
>>>>>>> #6  0x000055555599acef in blk_read (blk=0x555556376820,
>>>>>>> sector_num=8382828,
>>>>>>>      buf=0x7ffff44cc800 "(", nb_sectors=4) at block/block-backend.c:404
>>>>>>> No locals.
>>>>>>> #7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88,
>>>>>>> lba=2095707,
>>>>>>>      buf=0x7ffff44cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
>>>>>>>          ret = 32767
>>>>>> Here is the problem: The ATAPI emulation uses synchronous blk_read()
>>>>>> instead of the AIO or coroutine interfaces. This means that it keeps
>>>>>> polling for request completion while it holds the BQL until the request
>>>>>> is completed.
>>>>> I will look at this.
>>>
>>> I need some further help. My way to "emulate" a hung NFS Server is to
>>> block it in the Firewall. Currently I face the problem that I cannot mount
>>> a CD Iso via libnfs (nfs://) without hanging Qemu (i previously tried with
>>> a kernel NFS mount). It reads a few sectors and then stalls (maybe another
>>> bug):
>>>
>>> (gdb) thread apply all bt full
>>>
>>> Thread 3 (Thread 0x7ffff0c21700 (LWP 29710)):
>>> #0  qemu_cond_broadcast (cond=cond@entry=0x555556259940) at
>>> util/qemu-thread-posix.c:120
>>>         err = <optimized out>
>>>         __func__ = "qemu_cond_broadcast"
>>> #1  0x0000555555911164 in rfifolock_unlock (r=r@entry=0x555556259910) at
>>> util/rfifolock.c:75
>>>         __PRETTY_FUNCTION__ = "rfifolock_unlock"
>>> #2  0x0000555555875921 in aio_context_release (ctx=ctx@entry=0x5555562598b0)
>>> at async.c:329
>>> No locals.
>>> #3  0x000055555588434c in aio_poll (ctx=ctx@entry=0x5555562598b0,
>>> blocking=blocking@entry=true) at aio-posix.c:272
>>>         node = <optimized out>
>>>         was_dispatching = false
>>>         i = <optimized out>
>>>         ret = <optimized out>
>>>         progress = false
>>>         timeout = 611734526
>>>         __PRETTY_FUNCTION__ = "aio_poll"
>>> #4  0x00005555558bc43d in bdrv_prwv_co (bs=bs@entry=0x55555627c0f0,
>>> offset=offset@entry=7038976, qiov=qiov@entry=0x7ffff0c208f0,
>>> is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
>>> block/io.c:552
>>>         aio_context = 0x5555562598b0
>>>         co = <optimized out>
>>>         rwco = {bs = 0x55555627c0f0, offset = 7038976, qiov =
>>> 0x7ffff0c208f0, is_write = false, ret = 2147483647, flags = (unknown: 0)}
>>> #5  0x00005555558bc533 in bdrv_rw_co (bs=0x55555627c0f0,
>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>> nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
>>>     flags=flags@entry=(unknown: 0)) at block/io.c:575
>>>         qiov = {iov = 0x7ffff0c208e0, niov = 1, nalloc = -1, size = 2048}
>>>         iov = {iov_base = 0x555557874800, iov_len = 2048}
>>> #6  0x00005555558bc593 in bdrv_read (bs=<optimized out>,
>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>> nb_sectors=nb_sectors@entry=4) at block/io.c:583
>>> No locals.
>>> #7  0x00005555558af75d in blk_read (blk=<optimized out>,
>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>> nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
>>>         ret = <optimized out>
>>> #8  0x00005555557abb88 in cd_read_sector (sector_size=<optimized out>,
>>> buf=0x555557874800 "(", lba=3437, s=0x55555760db70) at hw/ide/atapi.c:116
>>>         ret = <optimized out>
>>> #9  ide_atapi_cmd_reply_end (s=0x55555760db70) at hw/ide/atapi.c:190
>>>         byte_count_limit = <optimized out>
>>>         size = <optimized out>
>>>         ret = 2
>> This is still the same scenario Kevin explained.
>>
>> The ATAPI CD-ROM emulation code is using synchronous blk_read().  This
>> function holds the QEMU global mutex while waiting for the I/O request
>> to complete.  This blocks other vcpu threads and the main loop thread.
>>
>> The solution is to convert the CD-ROM emulation code to use
>> blk_aio_readv() instead of blk_read().
>
> I tried a little, but i am stuck with my approach. I reads one sector
> and then doesn't continue. Maybe someone with more knowledge
> of ATAPI/IDE could help?

Converting synchronous code to asynchronous requires an understanding
of the device's state transitions.  Asynchronous code has to put the
device registers into a busy state until the request completes.  It
also needs to handle hardware register accesses that occur while the
request is still pending.

I don't know ATAPI/IDE code well enough to suggest a fix.

> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index 950e311..cdcbd49 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -27,6 +27,8 @@
>  #include "hw/scsi/scsi.h"
>  #include "sysemu/block-backend.h"
>
> +#define DEBUG_IDE_ATAPI 1
> +
>  static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret);
>
>  static void padstr8(uint8_t *buf, int buf_size, const char *src)
> @@ -105,31 +107,55 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>      memset(buf, 0, 288);
>  }
>
> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
> +static void cd_read_sector_cb(void *opaque, int ret) {
> +    IDEState *s = opaque;
> +
> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
> +
> +    printf("cd_read_sector_cb lba %d ret = %d\n", s->lba, ret);
> +
> +    if (ret < 0) {
> +        ide_atapi_io_error(s, ret);
> +        return;
> +    }
> +
> +    if (s->cd_sector_size == 2352) {
> +        cd_data_to_raw(s->io_buffer, s->lba);
> +    }
> +
> +    s->lba++;
> +    s->io_buffer_index = 0;
> +
> +    ide_atapi_cmd_reply_end(s);
> +}
> +
> +static BlockAIOCB *cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
>  {
> -    int ret;
> +    BlockAIOCB *aiocb = NULL;
>
> -    switch(sector_size) {
> -    case 2048:
> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
> -        break;
> -    case 2352:
> +    if (sector_size != 2048 && sector_size != 2352) {
> +        return NULL;
> +    }
> +
> +    s->iov.iov_base = buf;
> +    if (sector_size == 2352) {
> +        buf += 4;
> +    }
> +
> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> +
> +    printf("cd_read_sector lba %d\n", lba);
> +
> +    aiocb = blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
> +                          cd_read_sector_cb, s);
> +
> +    if (aiocb != NULL) {
>          block_acct_start(blk_get_stats(s->blk), &s->acct,
>                           4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
> -        if (ret < 0)
> -            return ret;
> -        cd_data_to_raw(buf, lba);
> -        break;
> -    default:
> -        ret = -EIO;
> -        break;
>      }
> -    return ret;
> +
> +    return aiocb;
>  }
>
>  void ide_atapi_cmd_ok(IDEState *s)
> @@ -170,7 +196,7 @@ void ide_atapi_io_error(IDEState *s, int ret)
>  /* The whole ATAPI transfer logic is handled in this function */
>  void ide_atapi_cmd_reply_end(IDEState *s)
>  {
> -    int byte_count_limit, size, ret;
> +    int byte_count_limit, size;
>  #ifdef DEBUG_IDE_ATAPI
>      printf("reply: tx_size=%d elem_tx_size=%d index=%d\n",
>             s->packet_transfer_size,
> @@ -187,13 +213,10 @@ void ide_atapi_cmd_reply_end(IDEState *s)
>      } else {
>          /* see if a new sector must be read */
>          if (s->lba != -1 && s->io_buffer_index >= s->cd_sector_size) {
> -            ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
> -            if (ret < 0) {
> -                ide_atapi_io_error(s, ret);
> -                return;
> +            if (cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size) == NULL) {
> +                ide_atapi_io_error(s, -EIO);
>              }
> -            s->lba++;
> -            s->io_buffer_index = 0;
> +            return;
>          }
>          if (s->elementary_transfer_size > 0) {
>              /* there are some data left to transmit in this elementary
>
> Thanks,
> Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-22  9:25                         ` Stefan Hajnoczi
@ 2015-06-22 13:09                           ` Peter Lieven
  2015-06-22 21:54                             ` John Snow
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Lieven @ 2015-06-22 13:09 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, John Snow, qemu-devel, qemu block

Am 22.06.2015 um 11:25 schrieb Stefan Hajnoczi:
> On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven <pl@kamp.de> wrote:
>> Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:
>>> On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven <pl@kamp.de> wrote:
>>>> Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
>>>>> Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
>>>>>> Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
>>>>>>> Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
>>>>>>>> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
>>>>>>>> #0  0x00007ffff5d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
>>>>>>>> No symbol table info available.
>>>>>>>> #1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0, nfds=3,
>>>>>>>>       timeout=4999424576) at qemu-timer.c:326
>>>>>>>>           ts = {tv_sec = 4, tv_nsec = 999424576}
>>>>>>>>           tvsec = 4
>>>>>>>> #2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0, blocking=true)
>>>>>>>>       at aio-posix.c:231
>>>>>>>>           node = 0x0
>>>>>>>>           was_dispatching = false
>>>>>>>>           ret = 1
>>>>>>>>           progress = false
>>>>>>>> #3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0,
>>>>>>>> offset=4292007936,
>>>>>>>>       qiov=0x7ffff554f760, is_write=false, flags=0) at block.c:2699
>>>>>>>>           aio_context = 0x5555563528e0
>>>>>>>>           co = 0x5555563888a0
>>>>>>>>           rwco = {bs = 0x55555637eae0, offset = 4292007936,
>>>>>>>>             qiov = 0x7ffff554f760, is_write = false, ret = 2147483647,
>>>>>>>> flags = 0}
>>>>>>>> #4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0,
>>>>>>>> sector_num=8382828,
>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false, flags=0)
>>>>>>>>       at block.c:2722
>>>>>>>>           qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1, size =
>>>>>>>> 2048}
>>>>>>>>           iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
>>>>>>>> #5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0,
>>>>>>>> sector_num=8382828,
>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
>>>>>>>> No locals.
>>>>>>>> #6  0x000055555599acef in blk_read (blk=0x555556376820,
>>>>>>>> sector_num=8382828,
>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4) at block/block-backend.c:404
>>>>>>>> No locals.
>>>>>>>> #7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88,
>>>>>>>> lba=2095707,
>>>>>>>>       buf=0x7ffff44cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
>>>>>>>>           ret = 32767
>>>>>>> Here is the problem: The ATAPI emulation uses synchronous blk_read()
>>>>>>> instead of the AIO or coroutine interfaces. This means that it keeps
>>>>>>> polling for request completion while it holds the BQL until the request
>>>>>>> is completed.
>>>>>> I will look at this.
>>>> I need some further help. My way to "emulate" a hung NFS Server is to
>>>> block it in the Firewall. Currently I face the problem that I cannot mount
>>>> a CD Iso via libnfs (nfs://) without hanging Qemu (i previously tried with
>>>> a kernel NFS mount). It reads a few sectors and then stalls (maybe another
>>>> bug):
>>>>
>>>> (gdb) thread apply all bt full
>>>>
>>>> Thread 3 (Thread 0x7ffff0c21700 (LWP 29710)):
>>>> #0  qemu_cond_broadcast (cond=cond@entry=0x555556259940) at
>>>> util/qemu-thread-posix.c:120
>>>>          err = <optimized out>
>>>>          __func__ = "qemu_cond_broadcast"
>>>> #1  0x0000555555911164 in rfifolock_unlock (r=r@entry=0x555556259910) at
>>>> util/rfifolock.c:75
>>>>          __PRETTY_FUNCTION__ = "rfifolock_unlock"
>>>> #2  0x0000555555875921 in aio_context_release (ctx=ctx@entry=0x5555562598b0)
>>>> at async.c:329
>>>> No locals.
>>>> #3  0x000055555588434c in aio_poll (ctx=ctx@entry=0x5555562598b0,
>>>> blocking=blocking@entry=true) at aio-posix.c:272
>>>>          node = <optimized out>
>>>>          was_dispatching = false
>>>>          i = <optimized out>
>>>>          ret = <optimized out>
>>>>          progress = false
>>>>          timeout = 611734526
>>>>          __PRETTY_FUNCTION__ = "aio_poll"
>>>> #4  0x00005555558bc43d in bdrv_prwv_co (bs=bs@entry=0x55555627c0f0,
>>>> offset=offset@entry=7038976, qiov=qiov@entry=0x7ffff0c208f0,
>>>> is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
>>>> block/io.c:552
>>>>          aio_context = 0x5555562598b0
>>>>          co = <optimized out>
>>>>          rwco = {bs = 0x55555627c0f0, offset = 7038976, qiov =
>>>> 0x7ffff0c208f0, is_write = false, ret = 2147483647, flags = (unknown: 0)}
>>>> #5  0x00005555558bc533 in bdrv_rw_co (bs=0x55555627c0f0,
>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>> nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
>>>>      flags=flags@entry=(unknown: 0)) at block/io.c:575
>>>>          qiov = {iov = 0x7ffff0c208e0, niov = 1, nalloc = -1, size = 2048}
>>>>          iov = {iov_base = 0x555557874800, iov_len = 2048}
>>>> #6  0x00005555558bc593 in bdrv_read (bs=<optimized out>,
>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>> nb_sectors=nb_sectors@entry=4) at block/io.c:583
>>>> No locals.
>>>> #7  0x00005555558af75d in blk_read (blk=<optimized out>,
>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>> nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
>>>>          ret = <optimized out>
>>>> #8  0x00005555557abb88 in cd_read_sector (sector_size=<optimized out>,
>>>> buf=0x555557874800 "(", lba=3437, s=0x55555760db70) at hw/ide/atapi.c:116
>>>>          ret = <optimized out>
>>>> #9  ide_atapi_cmd_reply_end (s=0x55555760db70) at hw/ide/atapi.c:190
>>>>          byte_count_limit = <optimized out>
>>>>          size = <optimized out>
>>>>          ret = 2
>>> This is still the same scenario Kevin explained.
>>>
>>> The ATAPI CD-ROM emulation code is using synchronous blk_read().  This
>>> function holds the QEMU global mutex while waiting for the I/O request
>>> to complete.  This blocks other vcpu threads and the main loop thread.
>>>
>>> The solution is to convert the CD-ROM emulation code to use
>>> blk_aio_readv() instead of blk_read().
>> I tried a little, but i am stuck with my approach. I reads one sector
>> and then doesn't continue. Maybe someone with more knowledge
>> of ATAPI/IDE could help?
> Converting synchronous code to asynchronous requires an understanding
> of the device's state transitions.  Asynchronous code has to put the
> device registers into a busy state until the request completes.  It
> also needs to handle hardware register accesses that occur while the
> request is still pending.

That was my assumption as well. But I don't know how to proceed...

>
> I don't know ATAPI/IDE code well enough to suggest a fix.

Maybe @John can help?

Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-22 13:09                           ` Peter Lieven
@ 2015-06-22 21:54                             ` John Snow
  2015-06-23  6:36                               ` Peter Lieven
  2015-08-14 13:43                               ` Peter Lieven
  0 siblings, 2 replies; 25+ messages in thread
From: John Snow @ 2015-06-22 21:54 UTC (permalink / raw)
  To: Peter Lieven, Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, qemu block,
	Alexander Bezzubikov



On 06/22/2015 09:09 AM, Peter Lieven wrote:
> Am 22.06.2015 um 11:25 schrieb Stefan Hajnoczi:
>> On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven <pl@kamp.de> wrote:
>>> Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:
>>>> On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven <pl@kamp.de> wrote:
>>>>> Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
>>>>>> Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
>>>>>>> Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
>>>>>>>> Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
>>>>>>>>> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
>>>>>>>>> #0  0x00007ffff5d87aa3 in ppoll () from
>>>>>>>>> /lib/x86_64-linux-gnu/libc.so.6
>>>>>>>>> No symbol table info available.
>>>>>>>>> #1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0,
>>>>>>>>> nfds=3,
>>>>>>>>>       timeout=4999424576) at qemu-timer.c:326
>>>>>>>>>           ts = {tv_sec = 4, tv_nsec = 999424576}
>>>>>>>>>           tvsec = 4
>>>>>>>>> #2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0,
>>>>>>>>> blocking=true)
>>>>>>>>>       at aio-posix.c:231
>>>>>>>>>           node = 0x0
>>>>>>>>>           was_dispatching = false
>>>>>>>>>           ret = 1
>>>>>>>>>           progress = false
>>>>>>>>> #3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0,
>>>>>>>>> offset=4292007936,
>>>>>>>>>       qiov=0x7ffff554f760, is_write=false, flags=0) at
>>>>>>>>> block.c:2699
>>>>>>>>>           aio_context = 0x5555563528e0
>>>>>>>>>           co = 0x5555563888a0
>>>>>>>>>           rwco = {bs = 0x55555637eae0, offset = 4292007936,
>>>>>>>>>             qiov = 0x7ffff554f760, is_write = false, ret =
>>>>>>>>> 2147483647,
>>>>>>>>> flags = 0}
>>>>>>>>> #4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0,
>>>>>>>>> sector_num=8382828,
>>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false,
>>>>>>>>> flags=0)
>>>>>>>>>       at block.c:2722
>>>>>>>>>           qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1,
>>>>>>>>> size =
>>>>>>>>> 2048}
>>>>>>>>>           iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
>>>>>>>>> #5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0,
>>>>>>>>> sector_num=8382828,
>>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
>>>>>>>>> No locals.
>>>>>>>>> #6  0x000055555599acef in blk_read (blk=0x555556376820,
>>>>>>>>> sector_num=8382828,
>>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4) at
>>>>>>>>> block/block-backend.c:404
>>>>>>>>> No locals.
>>>>>>>>> #7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88,
>>>>>>>>> lba=2095707,
>>>>>>>>>       buf=0x7ffff44cc800 "(", sector_size=2048) at
>>>>>>>>> hw/ide/atapi.c:116
>>>>>>>>>           ret = 32767
>>>>>>>> Here is the problem: The ATAPI emulation uses synchronous
>>>>>>>> blk_read()
>>>>>>>> instead of the AIO or coroutine interfaces. This means that it
>>>>>>>> keeps
>>>>>>>> polling for request completion while it holds the BQL until the
>>>>>>>> request
>>>>>>>> is completed.
>>>>>>> I will look at this.
>>>>> I need some further help. My way to "emulate" a hung NFS Server is to
>>>>> block it in the Firewall. Currently I face the problem that I
>>>>> cannot mount
>>>>> a CD Iso via libnfs (nfs://) without hanging Qemu (i previously
>>>>> tried with
>>>>> a kernel NFS mount). It reads a few sectors and then stalls (maybe
>>>>> another
>>>>> bug):
>>>>>
>>>>> (gdb) thread apply all bt full
>>>>>
>>>>> Thread 3 (Thread 0x7ffff0c21700 (LWP 29710)):
>>>>> #0  qemu_cond_broadcast (cond=cond@entry=0x555556259940) at
>>>>> util/qemu-thread-posix.c:120
>>>>>          err = <optimized out>
>>>>>          __func__ = "qemu_cond_broadcast"
>>>>> #1  0x0000555555911164 in rfifolock_unlock
>>>>> (r=r@entry=0x555556259910) at
>>>>> util/rfifolock.c:75
>>>>>          __PRETTY_FUNCTION__ = "rfifolock_unlock"
>>>>> #2  0x0000555555875921 in aio_context_release
>>>>> (ctx=ctx@entry=0x5555562598b0)
>>>>> at async.c:329
>>>>> No locals.
>>>>> #3  0x000055555588434c in aio_poll (ctx=ctx@entry=0x5555562598b0,
>>>>> blocking=blocking@entry=true) at aio-posix.c:272
>>>>>          node = <optimized out>
>>>>>          was_dispatching = false
>>>>>          i = <optimized out>
>>>>>          ret = <optimized out>
>>>>>          progress = false
>>>>>          timeout = 611734526
>>>>>          __PRETTY_FUNCTION__ = "aio_poll"
>>>>> #4  0x00005555558bc43d in bdrv_prwv_co (bs=bs@entry=0x55555627c0f0,
>>>>> offset=offset@entry=7038976, qiov=qiov@entry=0x7ffff0c208f0,
>>>>> is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
>>>>> block/io.c:552
>>>>>          aio_context = 0x5555562598b0
>>>>>          co = <optimized out>
>>>>>          rwco = {bs = 0x55555627c0f0, offset = 7038976, qiov =
>>>>> 0x7ffff0c208f0, is_write = false, ret = 2147483647, flags =
>>>>> (unknown: 0)}
>>>>> #5  0x00005555558bc533 in bdrv_rw_co (bs=0x55555627c0f0,
>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>> nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
>>>>>      flags=flags@entry=(unknown: 0)) at block/io.c:575
>>>>>          qiov = {iov = 0x7ffff0c208e0, niov = 1, nalloc = -1, size
>>>>> = 2048}
>>>>>          iov = {iov_base = 0x555557874800, iov_len = 2048}
>>>>> #6  0x00005555558bc593 in bdrv_read (bs=<optimized out>,
>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>> nb_sectors=nb_sectors@entry=4) at block/io.c:583
>>>>> No locals.
>>>>> #7  0x00005555558af75d in blk_read (blk=<optimized out>,
>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>> nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
>>>>>          ret = <optimized out>
>>>>> #8  0x00005555557abb88 in cd_read_sector (sector_size=<optimized out>,
>>>>> buf=0x555557874800 "(", lba=3437, s=0x55555760db70) at
>>>>> hw/ide/atapi.c:116
>>>>>          ret = <optimized out>
>>>>> #9  ide_atapi_cmd_reply_end (s=0x55555760db70) at hw/ide/atapi.c:190
>>>>>          byte_count_limit = <optimized out>
>>>>>          size = <optimized out>
>>>>>          ret = 2
>>>> This is still the same scenario Kevin explained.
>>>>
>>>> The ATAPI CD-ROM emulation code is using synchronous blk_read().  This
>>>> function holds the QEMU global mutex while waiting for the I/O request
>>>> to complete.  This blocks other vcpu threads and the main loop thread.
>>>>
>>>> The solution is to convert the CD-ROM emulation code to use
>>>> blk_aio_readv() instead of blk_read().
>>> I tried a little, but i am stuck with my approach. I reads one sector
>>> and then doesn't continue. Maybe someone with more knowledge
>>> of ATAPI/IDE could help?
>> Converting synchronous code to asynchronous requires an understanding
>> of the device's state transitions.  Asynchronous code has to put the
>> device registers into a busy state until the request completes.  It
>> also needs to handle hardware register accesses that occur while the
>> request is still pending.
> 
> That was my assumption as well. But I don't know how to proceed...
> 
>>
>> I don't know ATAPI/IDE code well enough to suggest a fix.
> 
> Maybe @John can help?
> 
> Peter
> 

Sure thing. I will take a deep look as soon as I get my NCQ patches out
the door. I don't have high hopes for a proper comprehensive fix for
2.4, of course, but is there anything we should stick a band-aid on for
2.4? My current reading is "It's just as broken as it's always been, so
it's not necessarily dire."

Also: since ATAPI apparently is doing all of its reads in a synchronous
manner in the SCSI fakery layer it has, I think a lot of the work is
done by making sure the IDE device is set +BSY +DRQ which will prevent
any new commands being sent to it just like DMA_READ commands already do.

Looks like the ATAPI state machine is supposed to be something like this:

CMD_PACKET is received: BSY bit is set.
Device is ready to receive command packet: -BSY +DRQ
Based on the nIEN bit, we either wait for an interrupt, or:
Poll the status register until BSY and DRQ clear.

The IDE layer already prevents new commands from showing up while we
have BSY or DRQ set, and it looks like the flow is:

- ide_exec_cmd sets +BSY
- cmd_packet sets -BST
- ide_transfer_start sets +DRQ
  (Note: this is fully synchronous for e.g. AHCI, PCI/ISA will wait
   for PIO data)
- After the last byte is transferred, we'll invoke ide_atapi_cmd.
- ide_atapi_cmd invokes e.g. cmd_inquiry
- cmd_inquiry will fill its buffer and invoke ide_atapi_cmd_reply.
- ide_atapi_cmd_reply either does a DMA transfer (-BSY +DRQ)
  or a PIO reply (ide_atapi_cmd_reply_end) (-BSY -DRQ)
  (Note again: AHCI is still fully synchronous here, PCI/ISA will
   wait for data reads.)

(Hmm, it looks like there's an opening for new commands to show up here,
since we've got -BSY and -DRQ)

- ide_atapi_cmd_reply_end will call ide_atapi_cmd_ok, which will
  clear the error bits, definitely set -BSY -DRQ +RDY, and set the IRQ
  if nIEN is not set.


I think this won't be too bad, since the ide_exec_cmd layer itself is
already used to commands returning that aren't actually finished yet,
and the cmd_packet launcher itself also assumes the same.

The way the ATAPI commands seem to work is: Tell the core layer that
we're not finished (even if we possibly are already) and set the
appropriate status bits ourselves after we're done, synchronously or not.

A lot of the pathways are almost all protected by BSY/DRQ the whole way
and we already have a nearly asynchronous method for clearing them only
when the command is actually complete.

Maybe I'll start hacking away at this after hard freeze to see what I
can do. If you already started, want to link me to a git and I'll start
from there?

--js

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-22 21:54                             ` John Snow
@ 2015-06-23  6:36                               ` Peter Lieven
  2015-08-14 13:43                               ` Peter Lieven
  1 sibling, 0 replies; 25+ messages in thread
From: Peter Lieven @ 2015-06-23  6:36 UTC (permalink / raw)
  To: John Snow, Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, qemu block,
	Alexander Bezzubikov

Am 22.06.2015 um 23:54 schrieb John Snow:
>
> On 06/22/2015 09:09 AM, Peter Lieven wrote:
>> Am 22.06.2015 um 11:25 schrieb Stefan Hajnoczi:
>>> On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven <pl@kamp.de> wrote:
>>>> Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:
>>>>> On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven <pl@kamp.de> wrote:
>>>>>> Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
>>>>>>> Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
>>>>>>>> Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
>>>>>>>>> Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
>>>>>>>>>> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
>>>>>>>>>> #0  0x00007ffff5d87aa3 in ppoll () from
>>>>>>>>>> /lib/x86_64-linux-gnu/libc.so.6
>>>>>>>>>> No symbol table info available.
>>>>>>>>>> #1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0,
>>>>>>>>>> nfds=3,
>>>>>>>>>>        timeout=4999424576) at qemu-timer.c:326
>>>>>>>>>>            ts = {tv_sec = 4, tv_nsec = 999424576}
>>>>>>>>>>            tvsec = 4
>>>>>>>>>> #2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0,
>>>>>>>>>> blocking=true)
>>>>>>>>>>        at aio-posix.c:231
>>>>>>>>>>            node = 0x0
>>>>>>>>>>            was_dispatching = false
>>>>>>>>>>            ret = 1
>>>>>>>>>>            progress = false
>>>>>>>>>> #3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0,
>>>>>>>>>> offset=4292007936,
>>>>>>>>>>        qiov=0x7ffff554f760, is_write=false, flags=0) at
>>>>>>>>>> block.c:2699
>>>>>>>>>>            aio_context = 0x5555563528e0
>>>>>>>>>>            co = 0x5555563888a0
>>>>>>>>>>            rwco = {bs = 0x55555637eae0, offset = 4292007936,
>>>>>>>>>>              qiov = 0x7ffff554f760, is_write = false, ret =
>>>>>>>>>> 2147483647,
>>>>>>>>>> flags = 0}
>>>>>>>>>> #4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0,
>>>>>>>>>> sector_num=8382828,
>>>>>>>>>>        buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false,
>>>>>>>>>> flags=0)
>>>>>>>>>>        at block.c:2722
>>>>>>>>>>            qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1,
>>>>>>>>>> size =
>>>>>>>>>> 2048}
>>>>>>>>>>            iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
>>>>>>>>>> #5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0,
>>>>>>>>>> sector_num=8382828,
>>>>>>>>>>        buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
>>>>>>>>>> No locals.
>>>>>>>>>> #6  0x000055555599acef in blk_read (blk=0x555556376820,
>>>>>>>>>> sector_num=8382828,
>>>>>>>>>>        buf=0x7ffff44cc800 "(", nb_sectors=4) at
>>>>>>>>>> block/block-backend.c:404
>>>>>>>>>> No locals.
>>>>>>>>>> #7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88,
>>>>>>>>>> lba=2095707,
>>>>>>>>>>        buf=0x7ffff44cc800 "(", sector_size=2048) at
>>>>>>>>>> hw/ide/atapi.c:116
>>>>>>>>>>            ret = 32767
>>>>>>>>> Here is the problem: The ATAPI emulation uses synchronous
>>>>>>>>> blk_read()
>>>>>>>>> instead of the AIO or coroutine interfaces. This means that it
>>>>>>>>> keeps
>>>>>>>>> polling for request completion while it holds the BQL until the
>>>>>>>>> request
>>>>>>>>> is completed.
>>>>>>>> I will look at this.
>>>>>> I need some further help. My way to "emulate" a hung NFS Server is to
>>>>>> block it in the Firewall. Currently I face the problem that I
>>>>>> cannot mount
>>>>>> a CD Iso via libnfs (nfs://) without hanging Qemu (i previously
>>>>>> tried with
>>>>>> a kernel NFS mount). It reads a few sectors and then stalls (maybe
>>>>>> another
>>>>>> bug):
>>>>>>
>>>>>> (gdb) thread apply all bt full
>>>>>>
>>>>>> Thread 3 (Thread 0x7ffff0c21700 (LWP 29710)):
>>>>>> #0  qemu_cond_broadcast (cond=cond@entry=0x555556259940) at
>>>>>> util/qemu-thread-posix.c:120
>>>>>>           err = <optimized out>
>>>>>>           __func__ = "qemu_cond_broadcast"
>>>>>> #1  0x0000555555911164 in rfifolock_unlock
>>>>>> (r=r@entry=0x555556259910) at
>>>>>> util/rfifolock.c:75
>>>>>>           __PRETTY_FUNCTION__ = "rfifolock_unlock"
>>>>>> #2  0x0000555555875921 in aio_context_release
>>>>>> (ctx=ctx@entry=0x5555562598b0)
>>>>>> at async.c:329
>>>>>> No locals.
>>>>>> #3  0x000055555588434c in aio_poll (ctx=ctx@entry=0x5555562598b0,
>>>>>> blocking=blocking@entry=true) at aio-posix.c:272
>>>>>>           node = <optimized out>
>>>>>>           was_dispatching = false
>>>>>>           i = <optimized out>
>>>>>>           ret = <optimized out>
>>>>>>           progress = false
>>>>>>           timeout = 611734526
>>>>>>           __PRETTY_FUNCTION__ = "aio_poll"
>>>>>> #4  0x00005555558bc43d in bdrv_prwv_co (bs=bs@entry=0x55555627c0f0,
>>>>>> offset=offset@entry=7038976, qiov=qiov@entry=0x7ffff0c208f0,
>>>>>> is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
>>>>>> block/io.c:552
>>>>>>           aio_context = 0x5555562598b0
>>>>>>           co = <optimized out>
>>>>>>           rwco = {bs = 0x55555627c0f0, offset = 7038976, qiov =
>>>>>> 0x7ffff0c208f0, is_write = false, ret = 2147483647, flags =
>>>>>> (unknown: 0)}
>>>>>> #5  0x00005555558bc533 in bdrv_rw_co (bs=0x55555627c0f0,
>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>> nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
>>>>>>       flags=flags@entry=(unknown: 0)) at block/io.c:575
>>>>>>           qiov = {iov = 0x7ffff0c208e0, niov = 1, nalloc = -1, size
>>>>>> = 2048}
>>>>>>           iov = {iov_base = 0x555557874800, iov_len = 2048}
>>>>>> #6  0x00005555558bc593 in bdrv_read (bs=<optimized out>,
>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>> nb_sectors=nb_sectors@entry=4) at block/io.c:583
>>>>>> No locals.
>>>>>> #7  0x00005555558af75d in blk_read (blk=<optimized out>,
>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>> nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
>>>>>>           ret = <optimized out>
>>>>>> #8  0x00005555557abb88 in cd_read_sector (sector_size=<optimized out>,
>>>>>> buf=0x555557874800 "(", lba=3437, s=0x55555760db70) at
>>>>>> hw/ide/atapi.c:116
>>>>>>           ret = <optimized out>
>>>>>> #9  ide_atapi_cmd_reply_end (s=0x55555760db70) at hw/ide/atapi.c:190
>>>>>>           byte_count_limit = <optimized out>
>>>>>>           size = <optimized out>
>>>>>>           ret = 2
>>>>> This is still the same scenario Kevin explained.
>>>>>
>>>>> The ATAPI CD-ROM emulation code is using synchronous blk_read().  This
>>>>> function holds the QEMU global mutex while waiting for the I/O request
>>>>> to complete.  This blocks other vcpu threads and the main loop thread.
>>>>>
>>>>> The solution is to convert the CD-ROM emulation code to use
>>>>> blk_aio_readv() instead of blk_read().
>>>> I tried a little, but i am stuck with my approach. I reads one sector
>>>> and then doesn't continue. Maybe someone with more knowledge
>>>> of ATAPI/IDE could help?
>>> Converting synchronous code to asynchronous requires an understanding
>>> of the device's state transitions.  Asynchronous code has to put the
>>> device registers into a busy state until the request completes.  It
>>> also needs to handle hardware register accesses that occur while the
>>> request is still pending.
>> That was my assumption as well. But I don't know how to proceed...
>>
>>> I don't know ATAPI/IDE code well enough to suggest a fix.
>> Maybe @John can help?
>>
>> Peter
>>
> Sure thing. I will take a deep look as soon as I get my NCQ patches out
> the door. I don't have high hopes for a proper comprehensive fix for
> 2.4, of course, but is there anything we should stick a band-aid on for
> 2.4? My current reading is "It's just as broken as it's always been, so
> it's not necessarily dire."

It was broken all the time. So there is no particular need to fix
it for 2.4. But it should be fixed.

The second problem in the code is that a DMA cancel drains
all block devices. This is the second point I have seen a VM
hang when I forcibly should down the NFS server where my ISOs
are on. I don't know if thats something that can be fixed as well.

>
> Also: since ATAPI apparently is doing all of its reads in a synchronous
> manner in the SCSI fakery layer it has, I think a lot of the work is
> done by making sure the IDE device is set +BSY +DRQ which will prevent
> any new commands being sent to it just like DMA_READ commands already do.
>
> Looks like the ATAPI state machine is supposed to be something like this:
>
> CMD_PACKET is received: BSY bit is set.
> Device is ready to receive command packet: -BSY +DRQ
> Based on the nIEN bit, we either wait for an interrupt, or:
> Poll the status register until BSY and DRQ clear.
>
> The IDE layer already prevents new commands from showing up while we
> have BSY or DRQ set, and it looks like the flow is:
>
> - ide_exec_cmd sets +BSY
> - cmd_packet sets -BST
> - ide_transfer_start sets +DRQ
>    (Note: this is fully synchronous for e.g. AHCI, PCI/ISA will wait
>     for PIO data)
> - After the last byte is transferred, we'll invoke ide_atapi_cmd.
> - ide_atapi_cmd invokes e.g. cmd_inquiry
> - cmd_inquiry will fill its buffer and invoke ide_atapi_cmd_reply.
> - ide_atapi_cmd_reply either does a DMA transfer (-BSY +DRQ)
>    or a PIO reply (ide_atapi_cmd_reply_end) (-BSY -DRQ)
>    (Note again: AHCI is still fully synchronous here, PCI/ISA will
>     wait for data reads.)
>
> (Hmm, it looks like there's an opening for new commands to show up here,
> since we've got -BSY and -DRQ)
>
> - ide_atapi_cmd_reply_end will call ide_atapi_cmd_ok, which will
>    clear the error bits, definitely set -BSY -DRQ +RDY, and set the IRQ
>    if nIEN is not set.
>
>
> I think this won't be too bad, since the ide_exec_cmd layer itself is
> already used to commands returning that aren't actually finished yet,
> and the cmd_packet launcher itself also assumes the same.
>
> The way the ATAPI commands seem to work is: Tell the core layer that
> we're not finished (even if we possibly are already) and set the
> appropriate status bits ourselves after we're done, synchronously or not.
>
> A lot of the pathways are almost all protected by BSY/DRQ the whole way
> and we already have a nearly asynchronous method for clearing them only
> when the command is actually complete.

>
> Maybe I'll start hacking away at this after hard freeze to see what I
> can do. If you already started, want to link me to a git and I'll start
> from there?

Thanks for your comprehensive explainations. It makes the states
at least a bit clearer to me. I try to find some time to check why
my patch does not work. What I have is in this repo. Its not much
AND does not work, but maybe its just a small thing that needs
to be changed...

https://github.com/plieven/qemu/tree/atapi_async

Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-06-22 21:54                             ` John Snow
  2015-06-23  6:36                               ` Peter Lieven
@ 2015-08-14 13:43                               ` Peter Lieven
  2015-08-14 14:08                                 ` Kevin Wolf
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Lieven @ 2015-08-14 13:43 UTC (permalink / raw)
  To: John Snow, Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, qemu block,
	Alexander Bezzubikov

Am 22.06.2015 um 23:54 schrieb John Snow:
>
> On 06/22/2015 09:09 AM, Peter Lieven wrote:
>> Am 22.06.2015 um 11:25 schrieb Stefan Hajnoczi:
>>> On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven <pl@kamp.de> wrote:
>>>> Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:
>>>>> On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven <pl@kamp.de> wrote:
>>>>>> Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
>>>>>>> Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
>>>>>>>> Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
>>>>>>>>> Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
>>>>>>>>>> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
>>>>>>>>>> #0  0x00007ffff5d87aa3 in ppoll () from
>>>>>>>>>> /lib/x86_64-linux-gnu/libc.so.6
>>>>>>>>>> No symbol table info available.
>>>>>>>>>> #1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0,
>>>>>>>>>> nfds=3,
>>>>>>>>>>       timeout=4999424576) at qemu-timer.c:326
>>>>>>>>>>           ts = {tv_sec = 4, tv_nsec = 999424576}
>>>>>>>>>>           tvsec = 4
>>>>>>>>>> #2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0,
>>>>>>>>>> blocking=true)
>>>>>>>>>>       at aio-posix.c:231
>>>>>>>>>>           node = 0x0
>>>>>>>>>>           was_dispatching = false
>>>>>>>>>>           ret = 1
>>>>>>>>>>           progress = false
>>>>>>>>>> #3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0,
>>>>>>>>>> offset=4292007936,
>>>>>>>>>>       qiov=0x7ffff554f760, is_write=false, flags=0) at
>>>>>>>>>> block.c:2699
>>>>>>>>>>           aio_context = 0x5555563528e0
>>>>>>>>>>           co = 0x5555563888a0
>>>>>>>>>>           rwco = {bs = 0x55555637eae0, offset = 4292007936,
>>>>>>>>>>             qiov = 0x7ffff554f760, is_write = false, ret =
>>>>>>>>>> 2147483647,
>>>>>>>>>> flags = 0}
>>>>>>>>>> #4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0,
>>>>>>>>>> sector_num=8382828,
>>>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false,
>>>>>>>>>> flags=0)
>>>>>>>>>>       at block.c:2722
>>>>>>>>>>           qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1,
>>>>>>>>>> size =
>>>>>>>>>> 2048}
>>>>>>>>>>           iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
>>>>>>>>>> #5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0,
>>>>>>>>>> sector_num=8382828,
>>>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
>>>>>>>>>> No locals.
>>>>>>>>>> #6  0x000055555599acef in blk_read (blk=0x555556376820,
>>>>>>>>>> sector_num=8382828,
>>>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4) at
>>>>>>>>>> block/block-backend.c:404
>>>>>>>>>> No locals.
>>>>>>>>>> #7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88,
>>>>>>>>>> lba=2095707,
>>>>>>>>>>       buf=0x7ffff44cc800 "(", sector_size=2048) at
>>>>>>>>>> hw/ide/atapi.c:116
>>>>>>>>>>           ret = 32767
>>>>>>>>> Here is the problem: The ATAPI emulation uses synchronous
>>>>>>>>> blk_read()
>>>>>>>>> instead of the AIO or coroutine interfaces. This means that it
>>>>>>>>> keeps
>>>>>>>>> polling for request completion while it holds the BQL until the
>>>>>>>>> request
>>>>>>>>> is completed.
>>>>>>>> I will look at this.
>>>>>> I need some further help. My way to "emulate" a hung NFS Server is to
>>>>>> block it in the Firewall. Currently I face the problem that I
>>>>>> cannot mount
>>>>>> a CD Iso via libnfs (nfs://) without hanging Qemu (i previously
>>>>>> tried with
>>>>>> a kernel NFS mount). It reads a few sectors and then stalls (maybe
>>>>>> another
>>>>>> bug):
>>>>>>
>>>>>> (gdb) thread apply all bt full
>>>>>>
>>>>>> Thread 3 (Thread 0x7ffff0c21700 (LWP 29710)):
>>>>>> #0  qemu_cond_broadcast (cond=cond@entry=0x555556259940) at
>>>>>> util/qemu-thread-posix.c:120
>>>>>>          err = <optimized out>
>>>>>>          __func__ = "qemu_cond_broadcast"
>>>>>> #1  0x0000555555911164 in rfifolock_unlock
>>>>>> (r=r@entry=0x555556259910) at
>>>>>> util/rfifolock.c:75
>>>>>>          __PRETTY_FUNCTION__ = "rfifolock_unlock"
>>>>>> #2  0x0000555555875921 in aio_context_release
>>>>>> (ctx=ctx@entry=0x5555562598b0)
>>>>>> at async.c:329
>>>>>> No locals.
>>>>>> #3  0x000055555588434c in aio_poll (ctx=ctx@entry=0x5555562598b0,
>>>>>> blocking=blocking@entry=true) at aio-posix.c:272
>>>>>>          node = <optimized out>
>>>>>>          was_dispatching = false
>>>>>>          i = <optimized out>
>>>>>>          ret = <optimized out>
>>>>>>          progress = false
>>>>>>          timeout = 611734526
>>>>>>          __PRETTY_FUNCTION__ = "aio_poll"
>>>>>> #4  0x00005555558bc43d in bdrv_prwv_co (bs=bs@entry=0x55555627c0f0,
>>>>>> offset=offset@entry=7038976, qiov=qiov@entry=0x7ffff0c208f0,
>>>>>> is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
>>>>>> block/io.c:552
>>>>>>          aio_context = 0x5555562598b0
>>>>>>          co = <optimized out>
>>>>>>          rwco = {bs = 0x55555627c0f0, offset = 7038976, qiov =
>>>>>> 0x7ffff0c208f0, is_write = false, ret = 2147483647, flags =
>>>>>> (unknown: 0)}
>>>>>> #5  0x00005555558bc533 in bdrv_rw_co (bs=0x55555627c0f0,
>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>> nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
>>>>>>      flags=flags@entry=(unknown: 0)) at block/io.c:575
>>>>>>          qiov = {iov = 0x7ffff0c208e0, niov = 1, nalloc = -1, size
>>>>>> = 2048}
>>>>>>          iov = {iov_base = 0x555557874800, iov_len = 2048}
>>>>>> #6  0x00005555558bc593 in bdrv_read (bs=<optimized out>,
>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>> nb_sectors=nb_sectors@entry=4) at block/io.c:583
>>>>>> No locals.
>>>>>> #7  0x00005555558af75d in blk_read (blk=<optimized out>,
>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>> nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
>>>>>>          ret = <optimized out>
>>>>>> #8  0x00005555557abb88 in cd_read_sector (sector_size=<optimized out>,
>>>>>> buf=0x555557874800 "(", lba=3437, s=0x55555760db70) at
>>>>>> hw/ide/atapi.c:116
>>>>>>          ret = <optimized out>
>>>>>> #9  ide_atapi_cmd_reply_end (s=0x55555760db70) at hw/ide/atapi.c:190
>>>>>>          byte_count_limit = <optimized out>
>>>>>>          size = <optimized out>
>>>>>>          ret = 2
>>>>> This is still the same scenario Kevin explained.
>>>>>
>>>>> The ATAPI CD-ROM emulation code is using synchronous blk_read().  This
>>>>> function holds the QEMU global mutex while waiting for the I/O request
>>>>> to complete.  This blocks other vcpu threads and the main loop thread.
>>>>>
>>>>> The solution is to convert the CD-ROM emulation code to use
>>>>> blk_aio_readv() instead of blk_read().
>>>> I tried a little, but i am stuck with my approach. I reads one sector
>>>> and then doesn't continue. Maybe someone with more knowledge
>>>> of ATAPI/IDE could help?
>>> Converting synchronous code to asynchronous requires an understanding
>>> of the device's state transitions.  Asynchronous code has to put the
>>> device registers into a busy state until the request completes.  It
>>> also needs to handle hardware register accesses that occur while the
>>> request is still pending.
>> That was my assumption as well. But I don't know how to proceed...
>>
>>> I don't know ATAPI/IDE code well enough to suggest a fix.
>> Maybe @John can help?
>>
>> Peter
>>
>

I looked into this again and it seems that the remaining problem (at least when the CDROM is
mounted via libnfs) is the blk_drain_all() in bmdma_cmd_writeb. At least I end there if I have
a proper OS booted and cut off the NFS server. The VM remains responsive until the guest OS
issues a DMA cancel.

I do not know what the proper solution is. I had the following ideas so far (not knowing if the
approaches would be correct or not).

a) Do not clear BM_STATUS_DMAING if we are not able to drain all requests. This works until
the connection is reestablished. The guest OS issues DMA cancel operations again and
again, but when the connectivity is back I end in the following assertion:

qemu-system-x86_64: ./hw/ide/pci.h:65: bmdma_active_if: Assertion `bmdma->bus->retry_unit != (uint8_t)-1' failed.

b) Call the aiocb with -ECANCELED and somehow (?) turn all the callbacks of the outstanding IOs into NOPs.

c) Follow the hint in the comment in bmdma_cmd_writeb (however this works out):
             * In the future we'll be able to safely cancel the I/O if the
             * whole DMA operation will be submitted to disk with a single
             * aio operation with preadv/pwritev.

Your help is appreciated.

Thanks,
Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-08-14 13:43                               ` Peter Lieven
@ 2015-08-14 14:08                                 ` Kevin Wolf
  2015-08-14 14:21                                   ` Peter Lieven
  2015-08-14 14:45                                   ` Peter Lieven
  0 siblings, 2 replies; 25+ messages in thread
From: Kevin Wolf @ 2015-08-14 14:08 UTC (permalink / raw)
  To: Peter Lieven
  Cc: qemu block, Stefan Hajnoczi, Alexander Bezzubikov, qemu-devel,
	Paolo Bonzini, John Snow

Am 14.08.2015 um 15:43 hat Peter Lieven geschrieben:
> Am 22.06.2015 um 23:54 schrieb John Snow:
> >
> > On 06/22/2015 09:09 AM, Peter Lieven wrote:
> >> Am 22.06.2015 um 11:25 schrieb Stefan Hajnoczi:
> >>> On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven <pl@kamp.de> wrote:
> >>>> Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:
> >>>>> On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven <pl@kamp.de> wrote:
> >>>>>> Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
> >>>>>>> Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
> >>>>>>>> Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
> >>>>>>>>> Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
> >>>>>>>>>> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
> >>>>>>>>>> #0  0x00007ffff5d87aa3 in ppoll () from
> >>>>>>>>>> /lib/x86_64-linux-gnu/libc.so.6
> >>>>>>>>>> No symbol table info available.
> >>>>>>>>>> #1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0,
> >>>>>>>>>> nfds=3,
> >>>>>>>>>>       timeout=4999424576) at qemu-timer.c:326
> >>>>>>>>>>           ts = {tv_sec = 4, tv_nsec = 999424576}
> >>>>>>>>>>           tvsec = 4
> >>>>>>>>>> #2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0,
> >>>>>>>>>> blocking=true)
> >>>>>>>>>>       at aio-posix.c:231
> >>>>>>>>>>           node = 0x0
> >>>>>>>>>>           was_dispatching = false
> >>>>>>>>>>           ret = 1
> >>>>>>>>>>           progress = false
> >>>>>>>>>> #3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0,
> >>>>>>>>>> offset=4292007936,
> >>>>>>>>>>       qiov=0x7ffff554f760, is_write=false, flags=0) at
> >>>>>>>>>> block.c:2699
> >>>>>>>>>>           aio_context = 0x5555563528e0
> >>>>>>>>>>           co = 0x5555563888a0
> >>>>>>>>>>           rwco = {bs = 0x55555637eae0, offset = 4292007936,
> >>>>>>>>>>             qiov = 0x7ffff554f760, is_write = false, ret =
> >>>>>>>>>> 2147483647,
> >>>>>>>>>> flags = 0}
> >>>>>>>>>> #4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0,
> >>>>>>>>>> sector_num=8382828,
> >>>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false,
> >>>>>>>>>> flags=0)
> >>>>>>>>>>       at block.c:2722
> >>>>>>>>>>           qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1,
> >>>>>>>>>> size =
> >>>>>>>>>> 2048}
> >>>>>>>>>>           iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
> >>>>>>>>>> #5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0,
> >>>>>>>>>> sector_num=8382828,
> >>>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
> >>>>>>>>>> No locals.
> >>>>>>>>>> #6  0x000055555599acef in blk_read (blk=0x555556376820,
> >>>>>>>>>> sector_num=8382828,
> >>>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4) at
> >>>>>>>>>> block/block-backend.c:404
> >>>>>>>>>> No locals.
> >>>>>>>>>> #7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88,
> >>>>>>>>>> lba=2095707,
> >>>>>>>>>>       buf=0x7ffff44cc800 "(", sector_size=2048) at
> >>>>>>>>>> hw/ide/atapi.c:116
> >>>>>>>>>>           ret = 32767
> >>>>>>>>> Here is the problem: The ATAPI emulation uses synchronous
> >>>>>>>>> blk_read()
> >>>>>>>>> instead of the AIO or coroutine interfaces. This means that it
> >>>>>>>>> keeps
> >>>>>>>>> polling for request completion while it holds the BQL until the
> >>>>>>>>> request
> >>>>>>>>> is completed.
> >>>>>>>> I will look at this.
> >>>>>> I need some further help. My way to "emulate" a hung NFS Server is to
> >>>>>> block it in the Firewall. Currently I face the problem that I
> >>>>>> cannot mount
> >>>>>> a CD Iso via libnfs (nfs://) without hanging Qemu (i previously
> >>>>>> tried with
> >>>>>> a kernel NFS mount). It reads a few sectors and then stalls (maybe
> >>>>>> another
> >>>>>> bug):
> >>>>>>
> >>>>>> (gdb) thread apply all bt full
> >>>>>>
> >>>>>> Thread 3 (Thread 0x7ffff0c21700 (LWP 29710)):
> >>>>>> #0  qemu_cond_broadcast (cond=cond@entry=0x555556259940) at
> >>>>>> util/qemu-thread-posix.c:120
> >>>>>>          err = <optimized out>
> >>>>>>          __func__ = "qemu_cond_broadcast"
> >>>>>> #1  0x0000555555911164 in rfifolock_unlock
> >>>>>> (r=r@entry=0x555556259910) at
> >>>>>> util/rfifolock.c:75
> >>>>>>          __PRETTY_FUNCTION__ = "rfifolock_unlock"
> >>>>>> #2  0x0000555555875921 in aio_context_release
> >>>>>> (ctx=ctx@entry=0x5555562598b0)
> >>>>>> at async.c:329
> >>>>>> No locals.
> >>>>>> #3  0x000055555588434c in aio_poll (ctx=ctx@entry=0x5555562598b0,
> >>>>>> blocking=blocking@entry=true) at aio-posix.c:272
> >>>>>>          node = <optimized out>
> >>>>>>          was_dispatching = false
> >>>>>>          i = <optimized out>
> >>>>>>          ret = <optimized out>
> >>>>>>          progress = false
> >>>>>>          timeout = 611734526
> >>>>>>          __PRETTY_FUNCTION__ = "aio_poll"
> >>>>>> #4  0x00005555558bc43d in bdrv_prwv_co (bs=bs@entry=0x55555627c0f0,
> >>>>>> offset=offset@entry=7038976, qiov=qiov@entry=0x7ffff0c208f0,
> >>>>>> is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
> >>>>>> block/io.c:552
> >>>>>>          aio_context = 0x5555562598b0
> >>>>>>          co = <optimized out>
> >>>>>>          rwco = {bs = 0x55555627c0f0, offset = 7038976, qiov =
> >>>>>> 0x7ffff0c208f0, is_write = false, ret = 2147483647, flags =
> >>>>>> (unknown: 0)}
> >>>>>> #5  0x00005555558bc533 in bdrv_rw_co (bs=0x55555627c0f0,
> >>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
> >>>>>> nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
> >>>>>>      flags=flags@entry=(unknown: 0)) at block/io.c:575
> >>>>>>          qiov = {iov = 0x7ffff0c208e0, niov = 1, nalloc = -1, size
> >>>>>> = 2048}
> >>>>>>          iov = {iov_base = 0x555557874800, iov_len = 2048}
> >>>>>> #6  0x00005555558bc593 in bdrv_read (bs=<optimized out>,
> >>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
> >>>>>> nb_sectors=nb_sectors@entry=4) at block/io.c:583
> >>>>>> No locals.
> >>>>>> #7  0x00005555558af75d in blk_read (blk=<optimized out>,
> >>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
> >>>>>> nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
> >>>>>>          ret = <optimized out>
> >>>>>> #8  0x00005555557abb88 in cd_read_sector (sector_size=<optimized out>,
> >>>>>> buf=0x555557874800 "(", lba=3437, s=0x55555760db70) at
> >>>>>> hw/ide/atapi.c:116
> >>>>>>          ret = <optimized out>
> >>>>>> #9  ide_atapi_cmd_reply_end (s=0x55555760db70) at hw/ide/atapi.c:190
> >>>>>>          byte_count_limit = <optimized out>
> >>>>>>          size = <optimized out>
> >>>>>>          ret = 2
> >>>>> This is still the same scenario Kevin explained.
> >>>>>
> >>>>> The ATAPI CD-ROM emulation code is using synchronous blk_read().  This
> >>>>> function holds the QEMU global mutex while waiting for the I/O request
> >>>>> to complete.  This blocks other vcpu threads and the main loop thread.
> >>>>>
> >>>>> The solution is to convert the CD-ROM emulation code to use
> >>>>> blk_aio_readv() instead of blk_read().
> >>>> I tried a little, but i am stuck with my approach. I reads one sector
> >>>> and then doesn't continue. Maybe someone with more knowledge
> >>>> of ATAPI/IDE could help?
> >>> Converting synchronous code to asynchronous requires an understanding
> >>> of the device's state transitions.  Asynchronous code has to put the
> >>> device registers into a busy state until the request completes.  It
> >>> also needs to handle hardware register accesses that occur while the
> >>> request is still pending.
> >> That was my assumption as well. But I don't know how to proceed...
> >>
> >>> I don't know ATAPI/IDE code well enough to suggest a fix.
> >> Maybe @John can help?
> >>
> >> Peter
> >>
> >
> 
> I looked into this again and it seems that the remaining problem (at least when the CDROM is
> mounted via libnfs) is the blk_drain_all() in bmdma_cmd_writeb. At least I end there if I have
> a proper OS booted and cut off the NFS server. The VM remains responsive until the guest OS
> issues a DMA cancel.
> 
> I do not know what the proper solution is. I had the following ideas so far (not knowing if the
> approaches would be correct or not).
> 
> a) Do not clear BM_STATUS_DMAING if we are not able to drain all requests. This works until
> the connection is reestablished. The guest OS issues DMA cancel operations again and
> again, but when the connectivity is back I end in the following assertion:
> 
> qemu-system-x86_64: ./hw/ide/pci.h:65: bmdma_active_if: Assertion `bmdma->bus->retry_unit != (uint8_t)-1' failed.

I would have to check the specs to see if this is allowed.

> b) Call the aiocb with -ECANCELED and somehow (?) turn all the callbacks of the outstanding IOs into NOPs.

This wouldn't be correct for write requests: We would tell the guest
that the request is cancelled when it's actually still in flight. At
some point it could still complete, however, and that's not expected by
the guest.

> c) Follow the hint in the comment in bmdma_cmd_writeb (however this works out):
>              * In the future we'll be able to safely cancel the I/O if the
>              * whole DMA operation will be submitted to disk with a single
>              * aio operation with preadv/pwritev.

Not sure how likely it is that cancelling that single AIOCB will
actually cancel the operation and not end up doing bdrv_drain_all()
internally instead because there is no good way of cancelling the
request.

Kevin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-08-14 14:08                                 ` Kevin Wolf
@ 2015-08-14 14:21                                   ` Peter Lieven
  2015-08-14 14:45                                   ` Peter Lieven
  1 sibling, 0 replies; 25+ messages in thread
From: Peter Lieven @ 2015-08-14 14:21 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu block, Stefan Hajnoczi, Alexander Bezzubikov, qemu-devel,
	Paolo Bonzini, John Snow

Am 14.08.2015 um 16:08 schrieb Kevin Wolf:
> Am 14.08.2015 um 15:43 hat Peter Lieven geschrieben:
>> Am 22.06.2015 um 23:54 schrieb John Snow:
>>> On 06/22/2015 09:09 AM, Peter Lieven wrote:
>>>> Am 22.06.2015 um 11:25 schrieb Stefan Hajnoczi:
>>>>> On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven <pl@kamp.de> wrote:
>>>>>> Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:
>>>>>>> On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven <pl@kamp.de> wrote:
>>>>>>>> Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
>>>>>>>>> Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
>>>>>>>>>> Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
>>>>>>>>>>> Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
>>>>>>>>>>>> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
>>>>>>>>>>>> #0  0x00007ffff5d87aa3 in ppoll () from
>>>>>>>>>>>> /lib/x86_64-linux-gnu/libc.so.6
>>>>>>>>>>>> No symbol table info available.
>>>>>>>>>>>> #1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0,
>>>>>>>>>>>> nfds=3,
>>>>>>>>>>>>       timeout=4999424576) at qemu-timer.c:326
>>>>>>>>>>>>           ts = {tv_sec = 4, tv_nsec = 999424576}
>>>>>>>>>>>>           tvsec = 4
>>>>>>>>>>>> #2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0,
>>>>>>>>>>>> blocking=true)
>>>>>>>>>>>>       at aio-posix.c:231
>>>>>>>>>>>>           node = 0x0
>>>>>>>>>>>>           was_dispatching = false
>>>>>>>>>>>>           ret = 1
>>>>>>>>>>>>           progress = false
>>>>>>>>>>>> #3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0,
>>>>>>>>>>>> offset=4292007936,
>>>>>>>>>>>>       qiov=0x7ffff554f760, is_write=false, flags=0) at
>>>>>>>>>>>> block.c:2699
>>>>>>>>>>>>           aio_context = 0x5555563528e0
>>>>>>>>>>>>           co = 0x5555563888a0
>>>>>>>>>>>>           rwco = {bs = 0x55555637eae0, offset = 4292007936,
>>>>>>>>>>>>             qiov = 0x7ffff554f760, is_write = false, ret =
>>>>>>>>>>>> 2147483647,
>>>>>>>>>>>> flags = 0}
>>>>>>>>>>>> #4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0,
>>>>>>>>>>>> sector_num=8382828,
>>>>>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false,
>>>>>>>>>>>> flags=0)
>>>>>>>>>>>>       at block.c:2722
>>>>>>>>>>>>           qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1,
>>>>>>>>>>>> size =
>>>>>>>>>>>> 2048}
>>>>>>>>>>>>           iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
>>>>>>>>>>>> #5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0,
>>>>>>>>>>>> sector_num=8382828,
>>>>>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
>>>>>>>>>>>> No locals.
>>>>>>>>>>>> #6  0x000055555599acef in blk_read (blk=0x555556376820,
>>>>>>>>>>>> sector_num=8382828,
>>>>>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4) at
>>>>>>>>>>>> block/block-backend.c:404
>>>>>>>>>>>> No locals.
>>>>>>>>>>>> #7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88,
>>>>>>>>>>>> lba=2095707,
>>>>>>>>>>>>       buf=0x7ffff44cc800 "(", sector_size=2048) at
>>>>>>>>>>>> hw/ide/atapi.c:116
>>>>>>>>>>>>           ret = 32767
>>>>>>>>>>> Here is the problem: The ATAPI emulation uses synchronous
>>>>>>>>>>> blk_read()
>>>>>>>>>>> instead of the AIO or coroutine interfaces. This means that it
>>>>>>>>>>> keeps
>>>>>>>>>>> polling for request completion while it holds the BQL until the
>>>>>>>>>>> request
>>>>>>>>>>> is completed.
>>>>>>>>>> I will look at this.
>>>>>>>> I need some further help. My way to "emulate" a hung NFS Server is to
>>>>>>>> block it in the Firewall. Currently I face the problem that I
>>>>>>>> cannot mount
>>>>>>>> a CD Iso via libnfs (nfs://) without hanging Qemu (i previously
>>>>>>>> tried with
>>>>>>>> a kernel NFS mount). It reads a few sectors and then stalls (maybe
>>>>>>>> another
>>>>>>>> bug):
>>>>>>>>
>>>>>>>> (gdb) thread apply all bt full
>>>>>>>>
>>>>>>>> Thread 3 (Thread 0x7ffff0c21700 (LWP 29710)):
>>>>>>>> #0  qemu_cond_broadcast (cond=cond@entry=0x555556259940) at
>>>>>>>> util/qemu-thread-posix.c:120
>>>>>>>>          err = <optimized out>
>>>>>>>>          __func__ = "qemu_cond_broadcast"
>>>>>>>> #1  0x0000555555911164 in rfifolock_unlock
>>>>>>>> (r=r@entry=0x555556259910) at
>>>>>>>> util/rfifolock.c:75
>>>>>>>>          __PRETTY_FUNCTION__ = "rfifolock_unlock"
>>>>>>>> #2  0x0000555555875921 in aio_context_release
>>>>>>>> (ctx=ctx@entry=0x5555562598b0)
>>>>>>>> at async.c:329
>>>>>>>> No locals.
>>>>>>>> #3  0x000055555588434c in aio_poll (ctx=ctx@entry=0x5555562598b0,
>>>>>>>> blocking=blocking@entry=true) at aio-posix.c:272
>>>>>>>>          node = <optimized out>
>>>>>>>>          was_dispatching = false
>>>>>>>>          i = <optimized out>
>>>>>>>>          ret = <optimized out>
>>>>>>>>          progress = false
>>>>>>>>          timeout = 611734526
>>>>>>>>          __PRETTY_FUNCTION__ = "aio_poll"
>>>>>>>> #4  0x00005555558bc43d in bdrv_prwv_co (bs=bs@entry=0x55555627c0f0,
>>>>>>>> offset=offset@entry=7038976, qiov=qiov@entry=0x7ffff0c208f0,
>>>>>>>> is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
>>>>>>>> block/io.c:552
>>>>>>>>          aio_context = 0x5555562598b0
>>>>>>>>          co = <optimized out>
>>>>>>>>          rwco = {bs = 0x55555627c0f0, offset = 7038976, qiov =
>>>>>>>> 0x7ffff0c208f0, is_write = false, ret = 2147483647, flags =
>>>>>>>> (unknown: 0)}
>>>>>>>> #5  0x00005555558bc533 in bdrv_rw_co (bs=0x55555627c0f0,
>>>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>>>> nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
>>>>>>>>      flags=flags@entry=(unknown: 0)) at block/io.c:575
>>>>>>>>          qiov = {iov = 0x7ffff0c208e0, niov = 1, nalloc = -1, size
>>>>>>>> = 2048}
>>>>>>>>          iov = {iov_base = 0x555557874800, iov_len = 2048}
>>>>>>>> #6  0x00005555558bc593 in bdrv_read (bs=<optimized out>,
>>>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>>>> nb_sectors=nb_sectors@entry=4) at block/io.c:583
>>>>>>>> No locals.
>>>>>>>> #7  0x00005555558af75d in blk_read (blk=<optimized out>,
>>>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>>>> nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
>>>>>>>>          ret = <optimized out>
>>>>>>>> #8  0x00005555557abb88 in cd_read_sector (sector_size=<optimized out>,
>>>>>>>> buf=0x555557874800 "(", lba=3437, s=0x55555760db70) at
>>>>>>>> hw/ide/atapi.c:116
>>>>>>>>          ret = <optimized out>
>>>>>>>> #9  ide_atapi_cmd_reply_end (s=0x55555760db70) at hw/ide/atapi.c:190
>>>>>>>>          byte_count_limit = <optimized out>
>>>>>>>>          size = <optimized out>
>>>>>>>>          ret = 2
>>>>>>> This is still the same scenario Kevin explained.
>>>>>>>
>>>>>>> The ATAPI CD-ROM emulation code is using synchronous blk_read().  This
>>>>>>> function holds the QEMU global mutex while waiting for the I/O request
>>>>>>> to complete.  This blocks other vcpu threads and the main loop thread.
>>>>>>>
>>>>>>> The solution is to convert the CD-ROM emulation code to use
>>>>>>> blk_aio_readv() instead of blk_read().
>>>>>> I tried a little, but i am stuck with my approach. I reads one sector
>>>>>> and then doesn't continue. Maybe someone with more knowledge
>>>>>> of ATAPI/IDE could help?
>>>>> Converting synchronous code to asynchronous requires an understanding
>>>>> of the device's state transitions.  Asynchronous code has to put the
>>>>> device registers into a busy state until the request completes.  It
>>>>> also needs to handle hardware register accesses that occur while the
>>>>> request is still pending.
>>>> That was my assumption as well. But I don't know how to proceed...
>>>>
>>>>> I don't know ATAPI/IDE code well enough to suggest a fix.
>>>> Maybe @John can help?
>>>>
>>>> Peter
>>>>
>> I looked into this again and it seems that the remaining problem (at least when the CDROM is
>> mounted via libnfs) is the blk_drain_all() in bmdma_cmd_writeb. At least I end there if I have
>> a proper OS booted and cut off the NFS server. The VM remains responsive until the guest OS
>> issues a DMA cancel.
>>
>> I do not know what the proper solution is. I had the following ideas so far (not knowing if the
>> approaches would be correct or not).
>>
>> a) Do not clear BM_STATUS_DMAING if we are not able to drain all requests. This works until
>> the connection is reestablished. The guest OS issues DMA cancel operations again and
>> again, but when the connectivity is back I end in the following assertion:
>>
>> qemu-system-x86_64: ./hw/ide/pci.h:65: bmdma_active_if: Assertion `bmdma->bus->retry_unit != (uint8_t)-1' failed.
> I would have to check the specs to see if this is allowed.

Maybe there is a better approach after all...

>
>> b) Call the aiocb with -ECANCELED and somehow (?) turn all the callbacks of the outstanding IOs into NOPs.
> This wouldn't be correct for write requests: We would tell the guest
> that the request is cancelled when it's actually still in flight. At
> some point it could still complete, however, and that's not expected by
> the guest.

In case of a CDROM we have a read only device. So this could work?

>
>> c) Follow the hint in the comment in bmdma_cmd_writeb (however this works out):
>>              * In the future we'll be able to safely cancel the I/O if the
>>              * whole DMA operation will be submitted to disk with a single
>>              * aio operation with preadv/pwritev.
> Not sure how likely it is that cancelling that single AIOCB will
> actually cancel the operation and not end up doing bdrv_drain_all()
> internally instead because there is no good way of cancelling the
> request.

You might be right.

It seems that the whole thing is not trivial. But it seems so wrong that a whole vServer goes
down just because someone forgot to eject a CDROM he used once and then the NFS
server has a hickup.

Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-08-14 14:08                                 ` Kevin Wolf
  2015-08-14 14:21                                   ` Peter Lieven
@ 2015-08-14 14:45                                   ` Peter Lieven
  2015-08-15 19:02                                     ` Peter Lieven
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Lieven @ 2015-08-14 14:45 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu block, Stefan Hajnoczi, Alexander Bezzubikov, qemu-devel,
	Paolo Bonzini, John Snow

Am 14.08.2015 um 16:08 schrieb Kevin Wolf:
> Am 14.08.2015 um 15:43 hat Peter Lieven geschrieben:
>> Am 22.06.2015 um 23:54 schrieb John Snow:
>>> On 06/22/2015 09:09 AM, Peter Lieven wrote:
>>>> Am 22.06.2015 um 11:25 schrieb Stefan Hajnoczi:
>>>>> On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven <pl@kamp.de> wrote:
>>>>>> Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:
>>>>>>> On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven <pl@kamp.de> wrote:
>>>>>>>> Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
>>>>>>>>> Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
>>>>>>>>>> Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
>>>>>>>>>>> Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
>>>>>>>>>>>> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
>>>>>>>>>>>> #0  0x00007ffff5d87aa3 in ppoll () from
>>>>>>>>>>>> /lib/x86_64-linux-gnu/libc.so.6
>>>>>>>>>>>> No symbol table info available.
>>>>>>>>>>>> #1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0,
>>>>>>>>>>>> nfds=3,
>>>>>>>>>>>>       timeout=4999424576) at qemu-timer.c:326
>>>>>>>>>>>>           ts = {tv_sec = 4, tv_nsec = 999424576}
>>>>>>>>>>>>           tvsec = 4
>>>>>>>>>>>> #2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0,
>>>>>>>>>>>> blocking=true)
>>>>>>>>>>>>       at aio-posix.c:231
>>>>>>>>>>>>           node = 0x0
>>>>>>>>>>>>           was_dispatching = false
>>>>>>>>>>>>           ret = 1
>>>>>>>>>>>>           progress = false
>>>>>>>>>>>> #3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0,
>>>>>>>>>>>> offset=4292007936,
>>>>>>>>>>>>       qiov=0x7ffff554f760, is_write=false, flags=0) at
>>>>>>>>>>>> block.c:2699
>>>>>>>>>>>>           aio_context = 0x5555563528e0
>>>>>>>>>>>>           co = 0x5555563888a0
>>>>>>>>>>>>           rwco = {bs = 0x55555637eae0, offset = 4292007936,
>>>>>>>>>>>>             qiov = 0x7ffff554f760, is_write = false, ret =
>>>>>>>>>>>> 2147483647,
>>>>>>>>>>>> flags = 0}
>>>>>>>>>>>> #4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0,
>>>>>>>>>>>> sector_num=8382828,
>>>>>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false,
>>>>>>>>>>>> flags=0)
>>>>>>>>>>>>       at block.c:2722
>>>>>>>>>>>>           qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1,
>>>>>>>>>>>> size =
>>>>>>>>>>>> 2048}
>>>>>>>>>>>>           iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
>>>>>>>>>>>> #5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0,
>>>>>>>>>>>> sector_num=8382828,
>>>>>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
>>>>>>>>>>>> No locals.
>>>>>>>>>>>> #6  0x000055555599acef in blk_read (blk=0x555556376820,
>>>>>>>>>>>> sector_num=8382828,
>>>>>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4) at
>>>>>>>>>>>> block/block-backend.c:404
>>>>>>>>>>>> No locals.
>>>>>>>>>>>> #7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88,
>>>>>>>>>>>> lba=2095707,
>>>>>>>>>>>>       buf=0x7ffff44cc800 "(", sector_size=2048) at
>>>>>>>>>>>> hw/ide/atapi.c:116
>>>>>>>>>>>>           ret = 32767
>>>>>>>>>>> Here is the problem: The ATAPI emulation uses synchronous
>>>>>>>>>>> blk_read()
>>>>>>>>>>> instead of the AIO or coroutine interfaces. This means that it
>>>>>>>>>>> keeps
>>>>>>>>>>> polling for request completion while it holds the BQL until the
>>>>>>>>>>> request
>>>>>>>>>>> is completed.
>>>>>>>>>> I will look at this.
>>>>>>>> I need some further help. My way to "emulate" a hung NFS Server is to
>>>>>>>> block it in the Firewall. Currently I face the problem that I
>>>>>>>> cannot mount
>>>>>>>> a CD Iso via libnfs (nfs://) without hanging Qemu (i previously
>>>>>>>> tried with
>>>>>>>> a kernel NFS mount). It reads a few sectors and then stalls (maybe
>>>>>>>> another
>>>>>>>> bug):
>>>>>>>>
>>>>>>>> (gdb) thread apply all bt full
>>>>>>>>
>>>>>>>> Thread 3 (Thread 0x7ffff0c21700 (LWP 29710)):
>>>>>>>> #0  qemu_cond_broadcast (cond=cond@entry=0x555556259940) at
>>>>>>>> util/qemu-thread-posix.c:120
>>>>>>>>          err = <optimized out>
>>>>>>>>          __func__ = "qemu_cond_broadcast"
>>>>>>>> #1  0x0000555555911164 in rfifolock_unlock
>>>>>>>> (r=r@entry=0x555556259910) at
>>>>>>>> util/rfifolock.c:75
>>>>>>>>          __PRETTY_FUNCTION__ = "rfifolock_unlock"
>>>>>>>> #2  0x0000555555875921 in aio_context_release
>>>>>>>> (ctx=ctx@entry=0x5555562598b0)
>>>>>>>> at async.c:329
>>>>>>>> No locals.
>>>>>>>> #3  0x000055555588434c in aio_poll (ctx=ctx@entry=0x5555562598b0,
>>>>>>>> blocking=blocking@entry=true) at aio-posix.c:272
>>>>>>>>          node = <optimized out>
>>>>>>>>          was_dispatching = false
>>>>>>>>          i = <optimized out>
>>>>>>>>          ret = <optimized out>
>>>>>>>>          progress = false
>>>>>>>>          timeout = 611734526
>>>>>>>>          __PRETTY_FUNCTION__ = "aio_poll"
>>>>>>>> #4  0x00005555558bc43d in bdrv_prwv_co (bs=bs@entry=0x55555627c0f0,
>>>>>>>> offset=offset@entry=7038976, qiov=qiov@entry=0x7ffff0c208f0,
>>>>>>>> is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
>>>>>>>> block/io.c:552
>>>>>>>>          aio_context = 0x5555562598b0
>>>>>>>>          co = <optimized out>
>>>>>>>>          rwco = {bs = 0x55555627c0f0, offset = 7038976, qiov =
>>>>>>>> 0x7ffff0c208f0, is_write = false, ret = 2147483647, flags =
>>>>>>>> (unknown: 0)}
>>>>>>>> #5  0x00005555558bc533 in bdrv_rw_co (bs=0x55555627c0f0,
>>>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>>>> nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
>>>>>>>>      flags=flags@entry=(unknown: 0)) at block/io.c:575
>>>>>>>>          qiov = {iov = 0x7ffff0c208e0, niov = 1, nalloc = -1, size
>>>>>>>> = 2048}
>>>>>>>>          iov = {iov_base = 0x555557874800, iov_len = 2048}
>>>>>>>> #6  0x00005555558bc593 in bdrv_read (bs=<optimized out>,
>>>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>>>> nb_sectors=nb_sectors@entry=4) at block/io.c:583
>>>>>>>> No locals.
>>>>>>>> #7  0x00005555558af75d in blk_read (blk=<optimized out>,
>>>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>>>> nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
>>>>>>>>          ret = <optimized out>
>>>>>>>> #8  0x00005555557abb88 in cd_read_sector (sector_size=<optimized out>,
>>>>>>>> buf=0x555557874800 "(", lba=3437, s=0x55555760db70) at
>>>>>>>> hw/ide/atapi.c:116
>>>>>>>>          ret = <optimized out>
>>>>>>>> #9  ide_atapi_cmd_reply_end (s=0x55555760db70) at hw/ide/atapi.c:190
>>>>>>>>          byte_count_limit = <optimized out>
>>>>>>>>          size = <optimized out>
>>>>>>>>          ret = 2
>>>>>>> This is still the same scenario Kevin explained.
>>>>>>>
>>>>>>> The ATAPI CD-ROM emulation code is using synchronous blk_read().  This
>>>>>>> function holds the QEMU global mutex while waiting for the I/O request
>>>>>>> to complete.  This blocks other vcpu threads and the main loop thread.
>>>>>>>
>>>>>>> The solution is to convert the CD-ROM emulation code to use
>>>>>>> blk_aio_readv() instead of blk_read().
>>>>>> I tried a little, but i am stuck with my approach. I reads one sector
>>>>>> and then doesn't continue. Maybe someone with more knowledge
>>>>>> of ATAPI/IDE could help?
>>>>> Converting synchronous code to asynchronous requires an understanding
>>>>> of the device's state transitions.  Asynchronous code has to put the
>>>>> device registers into a busy state until the request completes.  It
>>>>> also needs to handle hardware register accesses that occur while the
>>>>> request is still pending.
>>>> That was my assumption as well. But I don't know how to proceed...
>>>>
>>>>> I don't know ATAPI/IDE code well enough to suggest a fix.
>>>> Maybe @John can help?
>>>>
>>>> Peter
>>>>
>> I looked into this again and it seems that the remaining problem (at least when the CDROM is
>> mounted via libnfs) is the blk_drain_all() in bmdma_cmd_writeb. At least I end there if I have
>> a proper OS booted and cut off the NFS server. The VM remains responsive until the guest OS
>> issues a DMA cancel.
>>
>> I do not know what the proper solution is. I had the following ideas so far (not knowing if the
>> approaches would be correct or not).
>>
>> a) Do not clear BM_STATUS_DMAING if we are not able to drain all requests. This works until
>> the connection is reestablished. The guest OS issues DMA cancel operations again and
>> again, but when the connectivity is back I end in the following assertion:
>>
>> qemu-system-x86_64: ./hw/ide/pci.h:65: bmdma_active_if: Assertion `bmdma->bus->retry_unit != (uint8_t)-1' failed.
> I would have to check the specs to see if this is allowed.
>
>> b) Call the aiocb with -ECANCELED and somehow (?) turn all the callbacks of the outstanding IOs into NOPs.
> This wouldn't be correct for write requests: We would tell the guest
> that the request is cancelled when it's actually still in flight. At
> some point it could still complete, however, and that's not expected by
> the guest.
>
>> c) Follow the hint in the comment in bmdma_cmd_writeb (however this works out):
>>              * In the future we'll be able to safely cancel the I/O if the
>>              * whole DMA operation will be submitted to disk with a single
>>              * aio operation with preadv/pwritev.
> Not sure how likely it is that cancelling that single AIOCB will
> actually cancel the operation and not end up doing bdrv_drain_all()
> internally instead because there is no good way of cancelling the
> request.

Maybe this is a solution? It seems to work for the CDROM only case:

diff --git a/block/io.c b/block/io.c
index d4bc83b..475d44c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2075,7 +2075,9 @@ static const AIOCBInfo bdrv_em_co_aiocb_info = {
 static void bdrv_co_complete(BlockAIOCBCoroutine *acb)
 {
     if (!acb->need_bh) {
-        acb->common.cb(acb->common.opaque, acb->req.error);
+        if (acb->common.cb) {
+            acb->common.cb(acb->common.opaque, acb->req.error);
+        }
         qemu_aio_unref(acb);
     }
 }
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index d31ff88..fecfa3e 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -252,11 +252,17 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
              * whole DMA operation will be submitted to disk with a single
              * aio operation with preadv/pwritev.
              */
-            if (bm->bus->dma->aiocb) {
-                blk_drain_all();
-                assert(bm->bus->dma->aiocb == NULL);
-            }
-            bm->status &= ~BM_STATUS_DMAING;
+             if (bm->bus->dma->aiocb) {
+                 bool read_write = false;
+                 read_write |= bm->bus->master && !blk_is_read_only(bm->bus->master->conf.blk);
+                 read_write |= bm->bus->slave && !blk_is_read_only(bm->bus->slave->conf.blk);
+                 if (read_write) {
+                     blk_drain_all();
+                 } else {
+                     bm->bus->dma->aiocb->cb = NULL;
+                 }
+             }
+             bm->status &= ~BM_STATUS_DMAING;
         } else {
             bm->cur_addr = bm->addr;
             if (!(bm->status & BM_STATUS_DMAING)) {


>
> Kevin

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
  2015-08-14 14:45                                   ` Peter Lieven
@ 2015-08-15 19:02                                     ` Peter Lieven
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Lieven @ 2015-08-15 19:02 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu block, Stefan Hajnoczi, Alexander Bezzubikov, qemu-devel,
	Paolo Bonzini, John Snow

Am 14.08.2015 um 16:45 schrieb Peter Lieven:
> Am 14.08.2015 um 16:08 schrieb Kevin Wolf:
>> Am 14.08.2015 um 15:43 hat Peter Lieven geschrieben:
>>> Am 22.06.2015 um 23:54 schrieb John Snow:
>>>> On 06/22/2015 09:09 AM, Peter Lieven wrote:
>>>>> Am 22.06.2015 um 11:25 schrieb Stefan Hajnoczi:
>>>>>> On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven <pl@kamp.de> wrote:
>>>>>>> Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:
>>>>>>>> On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven <pl@kamp.de> wrote:
>>>>>>>>> Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
>>>>>>>>>> Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
>>>>>>>>>>> Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
>>>>>>>>>>>> Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
>>>>>>>>>>>>> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
>>>>>>>>>>>>> #0  0x00007ffff5d87aa3 in ppoll () from
>>>>>>>>>>>>> /lib/x86_64-linux-gnu/libc.so.6
>>>>>>>>>>>>> No symbol table info available.
>>>>>>>>>>>>> #1  0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0,
>>>>>>>>>>>>> nfds=3,
>>>>>>>>>>>>>       timeout=4999424576) at qemu-timer.c:326
>>>>>>>>>>>>>           ts = {tv_sec = 4, tv_nsec = 999424576}
>>>>>>>>>>>>>           tvsec = 4
>>>>>>>>>>>>> #2  0x0000555555956feb in aio_poll (ctx=0x5555563528e0,
>>>>>>>>>>>>> blocking=true)
>>>>>>>>>>>>>       at aio-posix.c:231
>>>>>>>>>>>>>           node = 0x0
>>>>>>>>>>>>>           was_dispatching = false
>>>>>>>>>>>>>           ret = 1
>>>>>>>>>>>>>           progress = false
>>>>>>>>>>>>> #3  0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0,
>>>>>>>>>>>>> offset=4292007936,
>>>>>>>>>>>>>       qiov=0x7ffff554f760, is_write=false, flags=0) at
>>>>>>>>>>>>> block.c:2699
>>>>>>>>>>>>>           aio_context = 0x5555563528e0
>>>>>>>>>>>>>           co = 0x5555563888a0
>>>>>>>>>>>>>           rwco = {bs = 0x55555637eae0, offset = 4292007936,
>>>>>>>>>>>>>             qiov = 0x7ffff554f760, is_write = false, ret =
>>>>>>>>>>>>> 2147483647,
>>>>>>>>>>>>> flags = 0}
>>>>>>>>>>>>> #4  0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0,
>>>>>>>>>>>>> sector_num=8382828,
>>>>>>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false,
>>>>>>>>>>>>> flags=0)
>>>>>>>>>>>>>       at block.c:2722
>>>>>>>>>>>>>           qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1,
>>>>>>>>>>>>> size =
>>>>>>>>>>>>> 2048}
>>>>>>>>>>>>>           iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
>>>>>>>>>>>>> #5  0x000055555594b008 in bdrv_read (bs=0x55555637eae0,
>>>>>>>>>>>>> sector_num=8382828,
>>>>>>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
>>>>>>>>>>>>> No locals.
>>>>>>>>>>>>> #6  0x000055555599acef in blk_read (blk=0x555556376820,
>>>>>>>>>>>>> sector_num=8382828,
>>>>>>>>>>>>>       buf=0x7ffff44cc800 "(", nb_sectors=4) at
>>>>>>>>>>>>> block/block-backend.c:404
>>>>>>>>>>>>> No locals.
>>>>>>>>>>>>> #7  0x0000555555833ed2 in cd_read_sector (s=0x555556408f88,
>>>>>>>>>>>>> lba=2095707,
>>>>>>>>>>>>>       buf=0x7ffff44cc800 "(", sector_size=2048) at
>>>>>>>>>>>>> hw/ide/atapi.c:116
>>>>>>>>>>>>>           ret = 32767
>>>>>>>>>>>> Here is the problem: The ATAPI emulation uses synchronous
>>>>>>>>>>>> blk_read()
>>>>>>>>>>>> instead of the AIO or coroutine interfaces. This means that it
>>>>>>>>>>>> keeps
>>>>>>>>>>>> polling for request completion while it holds the BQL until the
>>>>>>>>>>>> request
>>>>>>>>>>>> is completed.
>>>>>>>>>>> I will look at this.
>>>>>>>>> I need some further help. My way to "emulate" a hung NFS Server is to
>>>>>>>>> block it in the Firewall. Currently I face the problem that I
>>>>>>>>> cannot mount
>>>>>>>>> a CD Iso via libnfs (nfs://) without hanging Qemu (i previously
>>>>>>>>> tried with
>>>>>>>>> a kernel NFS mount). It reads a few sectors and then stalls (maybe
>>>>>>>>> another
>>>>>>>>> bug):
>>>>>>>>>
>>>>>>>>> (gdb) thread apply all bt full
>>>>>>>>>
>>>>>>>>> Thread 3 (Thread 0x7ffff0c21700 (LWP 29710)):
>>>>>>>>> #0  qemu_cond_broadcast (cond=cond@entry=0x555556259940) at
>>>>>>>>> util/qemu-thread-posix.c:120
>>>>>>>>>          err = <optimized out>
>>>>>>>>>          __func__ = "qemu_cond_broadcast"
>>>>>>>>> #1  0x0000555555911164 in rfifolock_unlock
>>>>>>>>> (r=r@entry=0x555556259910) at
>>>>>>>>> util/rfifolock.c:75
>>>>>>>>>          __PRETTY_FUNCTION__ = "rfifolock_unlock"
>>>>>>>>> #2  0x0000555555875921 in aio_context_release
>>>>>>>>> (ctx=ctx@entry=0x5555562598b0)
>>>>>>>>> at async.c:329
>>>>>>>>> No locals.
>>>>>>>>> #3  0x000055555588434c in aio_poll (ctx=ctx@entry=0x5555562598b0,
>>>>>>>>> blocking=blocking@entry=true) at aio-posix.c:272
>>>>>>>>>          node = <optimized out>
>>>>>>>>>          was_dispatching = false
>>>>>>>>>          i = <optimized out>
>>>>>>>>>          ret = <optimized out>
>>>>>>>>>          progress = false
>>>>>>>>>          timeout = 611734526
>>>>>>>>>          __PRETTY_FUNCTION__ = "aio_poll"
>>>>>>>>> #4  0x00005555558bc43d in bdrv_prwv_co (bs=bs@entry=0x55555627c0f0,
>>>>>>>>> offset=offset@entry=7038976, qiov=qiov@entry=0x7ffff0c208f0,
>>>>>>>>> is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
>>>>>>>>> block/io.c:552
>>>>>>>>>          aio_context = 0x5555562598b0
>>>>>>>>>          co = <optimized out>
>>>>>>>>>          rwco = {bs = 0x55555627c0f0, offset = 7038976, qiov =
>>>>>>>>> 0x7ffff0c208f0, is_write = false, ret = 2147483647, flags =
>>>>>>>>> (unknown: 0)}
>>>>>>>>> #5  0x00005555558bc533 in bdrv_rw_co (bs=0x55555627c0f0,
>>>>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>>>>> nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
>>>>>>>>>      flags=flags@entry=(unknown: 0)) at block/io.c:575
>>>>>>>>>          qiov = {iov = 0x7ffff0c208e0, niov = 1, nalloc = -1, size
>>>>>>>>> = 2048}
>>>>>>>>>          iov = {iov_base = 0x555557874800, iov_len = 2048}
>>>>>>>>> #6  0x00005555558bc593 in bdrv_read (bs=<optimized out>,
>>>>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>>>>> nb_sectors=nb_sectors@entry=4) at block/io.c:583
>>>>>>>>> No locals.
>>>>>>>>> #7  0x00005555558af75d in blk_read (blk=<optimized out>,
>>>>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>>>>> nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
>>>>>>>>>          ret = <optimized out>
>>>>>>>>> #8  0x00005555557abb88 in cd_read_sector (sector_size=<optimized out>,
>>>>>>>>> buf=0x555557874800 "(", lba=3437, s=0x55555760db70) at
>>>>>>>>> hw/ide/atapi.c:116
>>>>>>>>>          ret = <optimized out>
>>>>>>>>> #9  ide_atapi_cmd_reply_end (s=0x55555760db70) at hw/ide/atapi.c:190
>>>>>>>>>          byte_count_limit = <optimized out>
>>>>>>>>>          size = <optimized out>
>>>>>>>>>          ret = 2
>>>>>>>> This is still the same scenario Kevin explained.
>>>>>>>>
>>>>>>>> The ATAPI CD-ROM emulation code is using synchronous blk_read().  This
>>>>>>>> function holds the QEMU global mutex while waiting for the I/O request
>>>>>>>> to complete.  This blocks other vcpu threads and the main loop thread.
>>>>>>>>
>>>>>>>> The solution is to convert the CD-ROM emulation code to use
>>>>>>>> blk_aio_readv() instead of blk_read().
>>>>>>> I tried a little, but i am stuck with my approach. I reads one sector
>>>>>>> and then doesn't continue. Maybe someone with more knowledge
>>>>>>> of ATAPI/IDE could help?
>>>>>> Converting synchronous code to asynchronous requires an understanding
>>>>>> of the device's state transitions.  Asynchronous code has to put the
>>>>>> device registers into a busy state until the request completes.  It
>>>>>> also needs to handle hardware register accesses that occur while the
>>>>>> request is still pending.
>>>>> That was my assumption as well. But I don't know how to proceed...
>>>>>
>>>>>> I don't know ATAPI/IDE code well enough to suggest a fix.
>>>>> Maybe @John can help?
>>>>>
>>>>> Peter
>>>>>
>>> I looked into this again and it seems that the remaining problem (at least when the CDROM is
>>> mounted via libnfs) is the blk_drain_all() in bmdma_cmd_writeb. At least I end there if I have
>>> a proper OS booted and cut off the NFS server. The VM remains responsive until the guest OS
>>> issues a DMA cancel.
>>>
>>> I do not know what the proper solution is. I had the following ideas so far (not knowing if the
>>> approaches would be correct or not).
>>>
>>> a) Do not clear BM_STATUS_DMAING if we are not able to drain all requests. This works until
>>> the connection is reestablished. The guest OS issues DMA cancel operations again and
>>> again, but when the connectivity is back I end in the following assertion:
>>>
>>> qemu-system-x86_64: ./hw/ide/pci.h:65: bmdma_active_if: Assertion `bmdma->bus->retry_unit != (uint8_t)-1' failed.
>> I would have to check the specs to see if this is allowed.
>>
>>> b) Call the aiocb with -ECANCELED and somehow (?) turn all the callbacks of the outstanding IOs into NOPs.
>> This wouldn't be correct for write requests: We would tell the guest
>> that the request is cancelled when it's actually still in flight. At
>> some point it could still complete, however, and that's not expected by
>> the guest.
>>
>>> c) Follow the hint in the comment in bmdma_cmd_writeb (however this works out):
>>>              * In the future we'll be able to safely cancel the I/O if the
>>>              * whole DMA operation will be submitted to disk with a single
>>>              * aio operation with preadv/pwritev.
>> Not sure how likely it is that cancelling that single AIOCB will
>> actually cancel the operation and not end up doing bdrv_drain_all()
>> internally instead because there is no good way of cancelling the
>> request.
> Maybe this is a solution? It seems to work for the CDROM only case:
>
> diff --git a/block/io.c b/block/io.c
> index d4bc83b..475d44c 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2075,7 +2075,9 @@ static const AIOCBInfo bdrv_em_co_aiocb_info = {
>  static void bdrv_co_complete(BlockAIOCBCoroutine *acb)
>  {
>      if (!acb->need_bh) {
> -        acb->common.cb(acb->common.opaque, acb->req.error);
> +        if (acb->common.cb) {
> +            acb->common.cb(acb->common.opaque, acb->req.error);
> +        }
>          qemu_aio_unref(acb);
>      }
>  }
> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> index d31ff88..fecfa3e 100644
> --- a/hw/ide/pci.c
> +++ b/hw/ide/pci.c
> @@ -252,11 +252,17 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
>               * whole DMA operation will be submitted to disk with a single
>               * aio operation with preadv/pwritev.
>               */
> -            if (bm->bus->dma->aiocb) {
> -                blk_drain_all();
> -                assert(bm->bus->dma->aiocb == NULL);
> -            }
> -            bm->status &= ~BM_STATUS_DMAING;
> +             if (bm->bus->dma->aiocb) {
> +                 bool read_write = false;
> +                 read_write |= bm->bus->master && !blk_is_read_only(bm->bus->master->conf.blk);
> +                 read_write |= bm->bus->slave && !blk_is_read_only(bm->bus->slave->conf.blk);
> +                 if (read_write) {
> +                     blk_drain_all();
> +                 } else {
> +                     bm->bus->dma->aiocb->cb = NULL;
> +                 }
> +             }
> +             bm->status &= ~BM_STATUS_DMAING;
>          } else {
>              bm->cur_addr = bm->addr;
>              if (!(bm->status & BM_STATUS_DMAING)) {
>
>
>> Kevin

I meanwhile tested a little with this approach. It seems to work absolutely perfect. I can even interrupt a booting vServer (after the kernel is loaded)
and shut the NFS for a long time. The vServer main thread stays responsive. When the NFS comes back I see the AIO requests completed that
where cancelled and their callbacks being ignored. The vServer then starts up perfectly as the guest OS does all the retrying stuff.

I also found that I do not have to do such vodoo to find out if I have to deal with a read/write media. The AIOCB has a pointer to the BDS ;-)

If noone has objections I would create a proper patch (or two) for that.

Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2015-08-15 19:02 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <55803637.3060607@kamp.de>
2015-06-16 15:34 ` [Qemu-devel] [Qemu-block] RFC cdrom in own thread? Stefan Hajnoczi
2015-06-17  8:35   ` Kevin Wolf
2015-06-18  6:03     ` Peter Lieven
2015-06-18  6:57       ` Markus Armbruster
2015-06-18  6:39     ` Peter Lieven
2015-06-18  6:59       ` Paolo Bonzini
2015-06-18  7:03         ` Peter Lieven
2015-06-18  7:12           ` Peter Lieven
2015-06-18  7:45             ` Kevin Wolf
2015-06-18  8:30               ` Peter Lieven
2015-06-18  8:42                 ` Kevin Wolf
2015-06-18  9:29                   ` Peter Lieven
2015-06-18  9:36                     ` Stefan Hajnoczi
2015-06-18  9:53                       ` Peter Lieven
2015-06-19 13:14                       ` Peter Lieven
2015-06-22  9:25                         ` Stefan Hajnoczi
2015-06-22 13:09                           ` Peter Lieven
2015-06-22 21:54                             ` John Snow
2015-06-23  6:36                               ` Peter Lieven
2015-08-14 13:43                               ` Peter Lieven
2015-08-14 14:08                                 ` Kevin Wolf
2015-08-14 14:21                                   ` Peter Lieven
2015-08-14 14:45                                   ` Peter Lieven
2015-08-15 19:02                                     ` Peter Lieven
2015-06-18 10:17                   ` Peter Lieven

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.