Date: Wed, 22 Jul 2015 15:50:52 +0100
From: Stefano Stabellini
Subject: Re: [Qemu-devel] [PATCH] xen/HVM: atomically access pointers in bufioreq handling
To: Jan Beulich
Cc: xen-devel, qemu-devel@nongnu.org, Stefano Stabellini

On Wed, 22 Jul 2015, Jan Beulich wrote:
> >> The number of slots per page being 511 (i.e. not a power of two) means
> >> that the (32-bit) read and write indexes going beyond 2^32 will likely
> >> disturb operation. The hypervisor side gets I/O req server creation
> >> extended so we can indicate that we're using suitable atomic accesses
> >> where needed (not all accesses to the two pointers really need to be
> >> atomic), allowing it to atomically canonicalize both pointers when both
> >> have gone through at least one cycle.
> >
> > The description is a bit terse: which accesses don't really need to be
> > atomic?
>
> Perhaps I should drop this part - I more or less copied the hypervisor
> side's commit message, and the above really applies to e.g.
>
>     if ( (pg->ptrs.write_pointer - pg->ptrs.read_pointer) >=
>          (IOREQ_BUFFER_SLOT_NUM - qw) )
>
> in hypervisor code.

Yes, please remove it, as it is confusing.


> >> --- a/xen-hvm.c
> >> +++ b/xen-hvm.c
> >> @@ -981,19 +981,30 @@ static void handle_ioreq(XenIOState *sta
> >>
> >>  static int handle_buffered_iopage(XenIOState *state)
> >>  {
> >> +    buffered_iopage_t *buf_page = state->buffered_io_page;
> >>      buf_ioreq_t *buf_req = NULL;
> >>      ioreq_t req;
> >>      int qw;
> >>
> >> -    if (!state->buffered_io_page) {
> >> +    if (!buf_page) {
> >>          return 0;
> >>      }
> >>
> >>      memset(&req, 0x00, sizeof(req));
> >>
> >> -    while (state->buffered_io_page->read_pointer != state->buffered_io_page->write_pointer) {
> >> -        buf_req = &state->buffered_io_page->buf_ioreq[
> >> -            state->buffered_io_page->read_pointer % IOREQ_BUFFER_SLOT_NUM];
> >> +    for (;;) {
> >> +        uint32_t rdptr = buf_page->read_pointer, wrptr;
> >> +
> >> +        xen_rmb();
> >
> > We don't need this barrier.
>
> How would we not? We need to make sure we read in this order:
> read_pointer, write_pointer, and read_pointer again (in the
> comparison). Only that way can we be certain to hold a matching
> pair in our hands at the end.

See below.


> >> +        wrptr = buf_page->write_pointer;
> >> +        xen_rmb();
> >> +        if (rdptr != buf_page->read_pointer) {
> >
> > I think you have to use atomic_read to be sure that the second read of
> > buf_page->read_pointer is up to date and not optimized away.
>
> No, suppressing such an optimization is an intended (side) effect
> of the barriers used.

I understand what you are saying, but I am not sure that your assumption
is correct. Can the compiler optimize away the second read anyway? In any
case, if you follow my suggestion below, it won't matter.
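For reference, the read sequence Jan describes boils down to the sketch
below, inside the patch's for (;;) loop (this assumes xen_rmb() acts at
least as a compiler barrier, which suffices on x86, where loads are not
reordered against other loads; the comments are illustrative, not from
the patch):

    /* Sample a consistent (read_pointer, write_pointer) pair without
     * locking: read rdptr, then wrptr, then re-read rdptr.  If rdptr
     * is unchanged, wrptr was sampled while read_pointer held that
     * value, so the pair matches.  The barriers stop the compiler
     * (and, on weakly ordered CPUs, the processor) from caching the
     * first read or reordering the three loads. */
    uint32_t rdptr = buf_page->read_pointer, wrptr;

    xen_rmb();                              /* rdptr read before wrptr */
    wrptr = buf_page->write_pointer;
    xen_rmb();                              /* wrptr read before re-check */
    if (rdptr != buf_page->read_pointer) {
        continue;                           /* producer moved it: retry */
    }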
> > But I think that it would be best to simply use atomic_read to read
> > both pointers at once using uint64_t as the type, so you are sure to
> > get a consistent view and there is no need for this check.
>
> But I'm specifically trying to avoid e.g. a locked cmpxchg8b here on
> ix86.

OK, but we don't need cmpxchg8b, just:

#define atomic_read(ptr)       (*(__typeof__(*ptr) volatile*) (ptr))

something like:

    for (;;) {
        uint64_t ptrs;
        uint32_t rdptr, wrptr;

        /* one 64-bit read covers both 32-bit pointers at once */
        ptrs = atomic_read((uint64_t*)&state->buffered_io_page->read_pointer);
        rdptr = (uint32_t)ptrs;
        wrptr = *(((uint32_t*)&ptrs) + 1);

        if (rdptr == wrptr) {
            break;      /* ring empty: no more requests to handle */
        }

        [work]

        atomic_add(&buf_page->read_pointer, qw + 1);
    }

It would work, wouldn't it?


> >>          handle_ioreq(state, &req);
> >>
> >> -        xen_mb();
> >> -        state->buffered_io_page->read_pointer += qw ? 2 : 1;
> >> +        atomic_add(&buf_page->read_pointer, qw + 1);
> >
> > I couldn't find specific info on the type of barrier implemented by
> > __sync_fetch_and_add, so I cannot tell for sure whether removing
> > xen_mb() is appropriate. Do you have a link? I suspect that given the
> > strong guarantees of the x86 architecture we'll be fine. I would be
> > less confident if this code were used on other archs.
>
> gcc.pdf, in the section covering them, says "In most cases, these
> built-in functions are considered a full barrier. [...] Further,
> instructions are issued as necessary to prevent the processor from
> speculating loads across the operation and from queuing stores
> after the operation." Details on the individual builtins subsequently
> list the exceptions to this general rule, but the one used here is
> not among the exceptions.

Good, thanks for looking it up.


> >> --- a/include/hw/xen/xen_common.h
> >> +++ b/include/hw/xen/xen_common.h
> >> @@ -370,7 +370,8 @@ static inline void xen_unmap_pcidev(XenX
> >>  static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
> >>                                            ioservid_t *ioservid)
> >>  {
> >> -    int rc = xc_hvm_create_ioreq_server(xc, dom, 1, ioservid);
> >> +    int rc = xc_hvm_create_ioreq_server(xc, dom, HVM_IOREQSRV_BUFIOREQ_ATOMIC,
> >> +                                        ioservid);
> >
> > I am concerned that passing 2 instead of 1 could break older
> > hypervisors. However, handle_bufioreq was never defined as a true
> > boolean, so maybe it is OK?
>
> Indeed I'm building on it only having done == 0 or != 0 checks.
>
> > The alternative would be to create a xen_xc_hvm_create_ioreq_server
> > versioned wrapper in include/hw/xen/xen_common.h.
>
> Which is what I aimed at avoiding.

OK.
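For completeness, the versioned-wrapper alternative mentioned above could
look something like this (a sketch only: the version-guard value and the
fallback are assumptions, not code from the patch):

    /* Hypothetical wrapper in include/hw/xen/xen_common.h: request
     * atomic bufioreq handling only from hypervisors new enough to
     * know the flag, and fall back to the plain boolean otherwise. */
    static inline int xen_xc_hvm_create_ioreq_server(XenXC xc, domid_t dom,
                                                     ioservid_t *ioservid)
    {
    #if CONFIG_XEN_CTRL_INTERFACE_VERSION >= 460
        return xc_hvm_create_ioreq_server(xc, dom,
                                          HVM_IOREQSRV_BUFIOREQ_ATOMIC,
                                          ioservid);
    #else
        return xc_hvm_create_ioreq_server(xc, dom, 1, ioservid);
    #endif
    }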