Date: Wed, 22 Jul 2015 15:50:52 +0100
From: Stefano Stabellini
Subject: Re: [Qemu-devel] [PATCH] xen/HVM: atomically access pointers in bufioreq handling
To: Jan Beulich
Cc: xen-devel, qemu-devel@nongnu.org, Stefano Stabellini

On Wed, 22 Jul 2015, Jan Beulich wrote:
> >> The number of slots per page being 511 (i.e. not a power of two) means
> >> that the (32-bit) read and write indexes going beyond 2^32 will likely
> >> disturb operation. The hypervisor side gets I/O req server creation
> >> extended so we can indicate that we're using suitable atomic accesses
> >> where needed (not all accesses to the two pointers really need to be
> >> atomic), allowing it to atomically canonicalize both pointers when both
> >> have gone through at least one cycle.
> >
> > The description is a bit terse: which accesses don't really need to be
> > atomic?
>
> Perhaps I should drop this part - I more or less copied the hypervisor
> side's commit message, and the above really applies to e.g.
>
>     if ( (pg->ptrs.write_pointer - pg->ptrs.read_pointer) >=
>          (IOREQ_BUFFER_SLOT_NUM - qw) )
>
> in hypervisor code.

Yes, please remove it, as it is confusing.


> >> --- a/xen-hvm.c
> >> +++ b/xen-hvm.c
> >> @@ -981,19 +981,30 @@ static void handle_ioreq(XenIOState *sta
> >>
> >>  static int handle_buffered_iopage(XenIOState *state)
> >>  {
> >> +    buffered_iopage_t *buf_page = state->buffered_io_page;
> >>      buf_ioreq_t *buf_req = NULL;
> >>      ioreq_t req;
> >>      int qw;
> >>
> >> -    if (!state->buffered_io_page) {
> >> +    if (!buf_page) {
> >>          return 0;
> >>      }
> >>
> >>      memset(&req, 0x00, sizeof(req));
> >>
> >> -    while (state->buffered_io_page->read_pointer != state->buffered_io_page->write_pointer) {
> >> -        buf_req = &state->buffered_io_page->buf_ioreq[
> >> -            state->buffered_io_page->read_pointer % IOREQ_BUFFER_SLOT_NUM];
> >> +    for (;;) {
> >> +        uint32_t rdptr = buf_page->read_pointer, wrptr;
> >> +
> >> +        xen_rmb();
> >
> > We don't need this barrier.
>
> How would we not? We need to make sure we read in this order:
> read_pointer, write_pointer, and read_pointer again (in the
> comparison). Only that way can we be certain to hold a matching
> pair in our hands at the end.

See below.


> >> +        wrptr = buf_page->write_pointer;
> >> +        xen_rmb();
> >> +        if (rdptr != buf_page->read_pointer) {
> >
> > I think you have to use atomic_read to be sure that the second read of
> > buf_page->read_pointer is up to date and not optimized away.
>
> No, suppressing such an optimization is an intended (side) effect
> of the barriers used.

I understand what you are saying, but I am not sure that your assumption
is correct. Can the compiler optimize away the second read anyway? In any
case, if you follow my suggestion below, it won't matter.
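For reference, the read sequence Jan describes boils down to the sketch
below, inside the patch's for (;;) loop (this assumes xen_rmb() acts at
least as a compiler barrier, which suffices on x86, where loads are not
reordered against other loads; the comments are illustrative, not from
the patch):

    /* Sample a consistent (read_pointer, write_pointer) pair without
     * locking: read rdptr, then wrptr, then re-read rdptr.  If rdptr
     * is unchanged, wrptr was sampled while read_pointer held that
     * value, so the pair matches.  The barriers stop the compiler
     * (and, on weakly ordered CPUs, the processor) from caching the
     * first read or reordering the three loads. */
    uint32_t rdptr = buf_page->read_pointer, wrptr;

    xen_rmb();                              /* rdptr read before wrptr */
    wrptr = buf_page->write_pointer;
    xen_rmb();                              /* wrptr read before re-check */
    if (rdptr != buf_page->read_pointer) {
        continue;                           /* producer moved it: retry */
    }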
> > But I think that it would be best to simply use atomic_read to read
> > both pointers at once using uint64_t as the type, so you are sure to
> > get a consistent view and there is no need for this check.
>
> But I'm specifically trying to avoid e.g. a locked cmpxchg8b here on
> ix86.

OK, but we don't need cmpxchg8b, just:

#define atomic_read(ptr)       (*(__typeof__(*ptr) volatile*) (ptr))

something like:

    for (;;) {
        uint64_t ptrs;
        uint32_t rdptr, wrptr;

        /* one 64-bit read covers both 32-bit pointers at once */
        ptrs = atomic_read((uint64_t*)&state->buffered_io_page->read_pointer);
        rdptr = (uint32_t)ptrs;
        wrptr = *(((uint32_t*)&ptrs) + 1);

        if (rdptr == wrptr) {
            break;      /* ring empty: no more requests to handle */
        }

        [work]

        atomic_add(&buf_page->read_pointer, qw + 1);
    }

It would work, wouldn't it?


> >>          handle_ioreq(state, &req);
> >>
> >> -        xen_mb();
> >> -        state->buffered_io_page->read_pointer += qw ? 2 : 1;
> >> +        atomic_add(&buf_page->read_pointer, qw + 1);
> >
> > I couldn't find specific info on the type of barrier implemented by
> > __sync_fetch_and_add, so I cannot tell for sure whether removing
> > xen_mb() is appropriate. Do you have a link? I suspect that given the
> > strong guarantees of the x86 architecture we'll be fine. I would be
> > less confident if this code were used on other archs.
>
> gcc.pdf, in the section covering them, says "In most cases, these
> built-in functions are considered a full barrier. [...] Further,
> instructions are issued as necessary to prevent the processor from
> speculating loads across the operation and from queuing stores
> after the operation." Details on the individual builtins subsequently
> list the exceptions to this general rule, but the one used here is
> not among the exceptions.

Good, thanks for looking it up.


> >> --- a/include/hw/xen/xen_common.h
> >> +++ b/include/hw/xen/xen_common.h
> >> @@ -370,7 +370,8 @@ static inline void xen_unmap_pcidev(XenX
> >>  static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
> >>                                            ioservid_t *ioservid)
> >>  {
> >> -    int rc = xc_hvm_create_ioreq_server(xc, dom, 1, ioservid);
> >> +    int rc = xc_hvm_create_ioreq_server(xc, dom, HVM_IOREQSRV_BUFIOREQ_ATOMIC,
> >> +                                        ioservid);
> >
> > I am concerned that passing 2 instead of 1 could break older
> > hypervisors. However, handle_bufioreq was never defined as a true
> > boolean, so maybe it is OK?
>
> Indeed I'm building on it only having done == 0 or != 0 checks.
>
> > The alternative would be to create a xen_xc_hvm_create_ioreq_server
> > versioned wrapper in include/hw/xen/xen_common.h.
>
> Which is what I aimed at avoiding.

OK.
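For completeness, the versioned-wrapper alternative mentioned above could
look something like this (a sketch only: the version-guard value and the
fallback are assumptions, not code from the patch):

    /* Hypothetical wrapper in include/hw/xen/xen_common.h: request
     * atomic bufioreq handling only from hypervisors new enough to
     * know the flag, and fall back to the plain boolean otherwise. */
    static inline int xen_xc_hvm_create_ioreq_server(XenXC xc, domid_t dom,
                                                     ioservid_t *ioservid)
    {
    #if CONFIG_XEN_CTRL_INTERFACE_VERSION >= 460
        return xc_hvm_create_ioreq_server(xc, dom,
                                          HVM_IOREQSRV_BUFIOREQ_ATOMIC,
                                          ioservid);
    #else
        return xc_hvm_create_ioreq_server(xc, dom, 1, ioservid);
    #endif
    }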