Date: Fri, 14 Aug 2015 16:08:52 +0200
From: Kevin Wolf
To: Peter Lieven
Cc: qemu block, Stefan Hajnoczi, Alexander Bezzubikov, qemu-devel, Paolo Bonzini, John Snow
Subject: Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
Message-ID: <20150814140852.GB5856@noname.redhat.com>
In-Reply-To: <55CDF083.9080503@kamp.de>
References: <20150618074500.GB4270@noname.redhat.com> <558281B2.6020905@kamp.de>
 <20150618084241.GC4270@noname.redhat.com> <55828F73.3080809@kamp.de>
 <558415C8.3060207@kamp.de> <55880926.5070800@kamp.de> <55888430.50504@redhat.com>
 <55CDF083.9080503@kamp.de>

On 14.08.2015 at 15:43, Peter Lieven wrote:
> On 22.06.2015 at 23:54, John Snow wrote:
> >
> > On 06/22/2015 09:09 AM, Peter Lieven wrote:
> >> On 22.06.2015 at 11:25, Stefan Hajnoczi wrote:
> >>> On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven wrote:
> >>>> On 18.06.2015 at 11:36, Stefan Hajnoczi wrote:
> >>>>> On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven wrote:
> >>>>>> On 18.06.2015 at 10:42, Kevin Wolf wrote:
> >>>>>>> On 18.06.2015 at 10:30, Peter Lieven wrote:
> >>>>>>>> On 18.06.2015 at 09:45, Kevin Wolf wrote:
> >>>>>>>>> On 18.06.2015 at 09:12, Peter Lieven wrote:
> >>>>>>>>>> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
> >>>>>>>>>> #0 0x00007ffff5d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
> >>>>>>>>>> No symbol table info available.
> >>>>>>>>>> #1 0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0, nfds=3,
> >>>>>>>>>>     timeout=4999424576) at qemu-timer.c:326
> >>>>>>>>>>         ts = {tv_sec = 4, tv_nsec = 999424576}
> >>>>>>>>>>         tvsec = 4
> >>>>>>>>>> #2 0x0000555555956feb in aio_poll (ctx=0x5555563528e0, blocking=true)
> >>>>>>>>>>     at aio-posix.c:231
> >>>>>>>>>>         node = 0x0
> >>>>>>>>>>         was_dispatching = false
> >>>>>>>>>>         ret = 1
> >>>>>>>>>>         progress = false
> >>>>>>>>>> #3 0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0, offset=4292007936,
> >>>>>>>>>>     qiov=0x7ffff554f760, is_write=false, flags=0) at block.c:2699
> >>>>>>>>>>         aio_context = 0x5555563528e0
> >>>>>>>>>>         co = 0x5555563888a0
> >>>>>>>>>>         rwco = {bs = 0x55555637eae0, offset = 4292007936,
> >>>>>>>>>>           qiov = 0x7ffff554f760, is_write = false, ret = 2147483647, flags = 0}
> >>>>>>>>>> #4 0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0, sector_num=8382828,
> >>>>>>>>>>     buf=0x7ffff44cc800 "(", nb_sectors=4, is_write=false, flags=0)
> >>>>>>>>>>     at block.c:2722
> >>>>>>>>>>         qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1, size = 2048}
> >>>>>>>>>>         iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
> >>>>>>>>>> #5 0x000055555594b008 in bdrv_read (bs=0x55555637eae0, sector_num=8382828,
> >>>>>>>>>>     buf=0x7ffff44cc800 "(", nb_sectors=4) at block.c:2730
> >>>>>>>>>> No locals.
> >>>>>>>>>> #6 0x000055555599acef in blk_read (blk=0x555556376820, sector_num=8382828,
> >>>>>>>>>>     buf=0x7ffff44cc800 "(", nb_sectors=4) at block/block-backend.c:404
> >>>>>>>>>> No locals.
> >>>>>>>>>> #7 0x0000555555833ed2 in cd_read_sector (s=0x555556408f88, lba=2095707,
> >>>>>>>>>>     buf=0x7ffff44cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
> >>>>>>>>>>         ret = 32767
> >>>>>>>>> Here is the problem: The ATAPI emulation uses synchronous blk_read()
> >>>>>>>>> instead of the AIO or coroutine interfaces. This means that it keeps
> >>>>>>>>> polling for request completion while it holds the BQL until the request
> >>>>>>>>> is completed.
> >>>>>>>> I will look at this.
> >>>>>> I need some further help. My way to "emulate" a hung NFS server is to
> >>>>>> block it in the firewall. Currently I face the problem that I cannot mount
> >>>>>> a CD ISO via libnfs (nfs://) without hanging QEMU (I previously tried with
> >>>>>> a kernel NFS mount). It reads a few sectors and then stalls (maybe another
> >>>>>> bug):
> >>>>>>
> >>>>>> (gdb) thread apply all bt full
> >>>>>>
> >>>>>> Thread 3 (Thread 0x7ffff0c21700 (LWP 29710)):
> >>>>>> #0 qemu_cond_broadcast (cond=cond@entry=0x555556259940) at
> >>>>>>     util/qemu-thread-posix.c:120
> >>>>>>         err = <optimized out>
> >>>>>>         __func__ = "qemu_cond_broadcast"
> >>>>>> #1 0x0000555555911164 in rfifolock_unlock (r=r@entry=0x555556259910) at
> >>>>>>     util/rfifolock.c:75
> >>>>>>         __PRETTY_FUNCTION__ = "rfifolock_unlock"
> >>>>>> #2 0x0000555555875921 in aio_context_release (ctx=ctx@entry=0x5555562598b0)
> >>>>>>     at async.c:329
> >>>>>> No locals.
> >>>>>> #3 0x000055555588434c in aio_poll (ctx=ctx@entry=0x5555562598b0,
> >>>>>>     blocking=blocking@entry=true) at aio-posix.c:272
> >>>>>>         node = <optimized out>
> >>>>>>         was_dispatching = false
> >>>>>>         i = <optimized out>
> >>>>>>         ret = <optimized out>
> >>>>>>         progress = false
> >>>>>>         timeout = 611734526
> >>>>>>         __PRETTY_FUNCTION__ = "aio_poll"
> >>>>>> #4 0x00005555558bc43d in bdrv_prwv_co (bs=bs@entry=0x55555627c0f0,
> >>>>>>     offset=offset@entry=7038976, qiov=qiov@entry=0x7ffff0c208f0,
> >>>>>>     is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
> >>>>>>     block/io.c:552
> >>>>>>         aio_context = 0x5555562598b0
> >>>>>>         co = <optimized out>
> >>>>>>         rwco = {bs = 0x55555627c0f0, offset = 7038976, qiov = 0x7ffff0c208f0,
> >>>>>>           is_write = false, ret = 2147483647, flags = (unknown: 0)}
> >>>>>> #5 0x00005555558bc533 in bdrv_rw_co (bs=0x55555627c0f0,
> >>>>>>     sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
> >>>>>>     nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
> >>>>>>     flags=flags@entry=(unknown: 0)) at block/io.c:575
> >>>>>>         qiov = {iov = 0x7ffff0c208e0, niov = 1, nalloc = -1, size = 2048}
> >>>>>>         iov = {iov_base = 0x555557874800, iov_len = 2048}
> >>>>>> #6 0x00005555558bc593 in bdrv_read (bs=<optimized out>,
> >>>>>>     sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
> >>>>>>     nb_sectors=nb_sectors@entry=4) at block/io.c:583
> >>>>>> No locals.
> >>>>>> #7 0x00005555558af75d in blk_read (blk=<optimized out>,
> >>>>>>     sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
> >>>>>>     nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
> >>>>>>         ret = <optimized out>
> >>>>>> #8 0x00005555557abb88 in cd_read_sector (sector_size=<optimized out>,
> >>>>>>     buf=0x555557874800 "(", lba=3437, s=0x55555760db70) at hw/ide/atapi.c:116
> >>>>>>         ret = <optimized out>
> >>>>>> #9 ide_atapi_cmd_reply_end (s=0x55555760db70) at hw/ide/atapi.c:190
> >>>>>>         byte_count_limit = <optimized out>
> >>>>>>         size = <optimized out>
> >>>>>>         ret = 2
> >>>>> This is still the same scenario Kevin explained.
> >>>>>
> >>>>> The ATAPI CD-ROM emulation code is using synchronous blk_read(). This
> >>>>> function holds the QEMU global mutex while waiting for the I/O request
> >>>>> to complete. This blocks other vcpu threads and the main loop thread.
> >>>>>
> >>>>> The solution is to convert the CD-ROM emulation code to use
> >>>>> blk_aio_readv() instead of blk_read().
> >>>> I tried a little, but I am stuck with my approach. It reads one sector
> >>>> and then doesn't continue. Maybe someone with more knowledge
> >>>> of ATAPI/IDE could help?
> >>> Converting synchronous code to asynchronous requires an understanding
> >>> of the device's state transitions. Asynchronous code has to put the
> >>> device registers into a busy state until the request completes. It
> >>> also needs to handle hardware register accesses that occur while the
> >>> request is still pending.
> >> That was my assumption as well. But I don't know how to proceed...
> >>
> >>> I don't know ATAPI/IDE code well enough to suggest a fix.
> >> Maybe @John can help?
> >>
> >> Peter
> >>
> >
>
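To make Stefan's suggestion above a bit more concrete, here is a rough and
completely untested sketch of what an asynchronous cd_read_sector() could look
like. cd_read_sector_cb() is a made-up name, the 2352-byte raw sector case is
left out, and the bookkeeping that ide_atapi_cmd_reply_end() currently does
after the synchronous read would have to move into the callback. This is only
meant to illustrate the direction, not an actual patch:

/* Sketch against hw/ide/atapi.c: submit the read with blk_aio_readv(), keep
 * the drive in BSY while the request is in flight, and continue the PIO
 * state machine from the completion callback instead of blocking in
 * blk_read() under the BQL. */

static void cd_read_sector_cb(void *opaque, int ret)
{
    IDEState *s = opaque;

    s->status &= ~BUSY_STAT;            /* request finished, drop busy state */

    if (ret < 0) {
        ide_atapi_io_error(s, ret);     /* report the error to the guest */
        return;
    }

    /* (the 2352-byte case with buf + 16 and cd_data_to_raw() would need the
     * same treatment here) */

    s->lba++;
    s->io_buffer_index = 0;
    ide_atapi_cmd_reply_end(s);         /* resume the transfer */
}

static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
{
    if (sector_size != 2048) {
        return -EINVAL;
    }

    s->iov.iov_base = buf;
    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
    qemu_iovec_init_external(&s->qiov, &s->iov, 1);

    /* The device stays busy until the AIO completes; register accesses that
     * happen in the meantime have to see BSY set. */
    s->status |= BUSY_STAT;

    blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
                  cd_read_sector_cb, s);
    return 0;
}

The tricky part is exactly what you ran into: every caller of cd_read_sector()
that currently expects the data to be in the buffer on return has to be
reworked so that it only continues from the callback.
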
> I looked into this again and it seems that the remaining problem (at least when the CDROM is
> mounted via libnfs) is the blk_drain_all() in bmdma_cmd_writeb. At least I end there if I have
> a proper OS booted and cut off the NFS server. The VM remains responsive until the guest OS
> issues a DMA cancel.
>
> I do not know what the proper solution is. I had the following ideas so far (not knowing if the
> approaches would be correct or not).
>
> a) Do not clear BM_STATUS_DMAING if we are not able to drain all requests. This works until
>    the connection is reestablished. The guest OS issues DMA cancel operations again and
>    again, but when the connectivity is back I end in the following assertion:
>
> qemu-system-x86_64: ./hw/ide/pci.h:65: bmdma_active_if: Assertion `bmdma->bus->retry_unit != (uint8_t)-1' failed.

I would have to check the specs to see if this is allowed.

> b) Call the aiocb with -ECANCELED and somehow (?) turn all the callbacks of the outstanding IOs
>    into NOPs.

This wouldn't be correct for write requests: We would tell the guest that
the request is cancelled when it's actually still in flight. At some
point it could still complete, however, and that's not expected by the
guest.

> c) Follow the hint in the comment in bmdma_cmd_writeb (however this works out):
>    * In the future we'll be able to safely cancel the I/O if the
>    * whole DMA operation will be submitted to disk with a single
>    * aio operation with preadv/pwritev.

Not sure how likely it is that cancelling that single AIOCB will actually
cancel the operation and not end up doing bdrv_drain_all() internally
instead because there is no good way of cancelling the request.
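Just to spell out what c) would mean in the stop branch of bmdma_cmd_writeb(),
assuming the whole transfer really were submitted as one AIOCB stored in
bm->bus->dma->aiocb (which is not the case today), something like this sketch,
untested:

    if (!(val & BM_CMD_START)) {
        if (bm->bus->dma->aiocb) {
            /* Cancel only the one outstanding request instead of draining
             * everything.  The caveat from above applies: if the driver
             * (libnfs here) cannot really abort the request,
             * blk_aio_cancel() waits for completion internally and we
             * block just like with blk_drain_all() today. */
            blk_aio_cancel(bm->bus->dma->aiocb);
        }
        bm->status &= ~BM_STATUS_DMAING;
    }

Kevin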