From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefano Stabellini
Subject: Re: [PATCH] xen/HVM: atomically access pointers in bufioreq handling
Date: Wed, 22 Jul 2015 18:24:48 +0100
Message-ID: 
References: <5582E14F0200007800086A37@mail.emea.novell.com>
 <55AFBEDC020000780009417A@prv-mh.provo.novell.com>
 <55AFD416020000780009426D@prv-mh.provo.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta5.messagelabs.com ([195.245.231.135])
 by lists.xen.org with esmtp (Exim 4.72) (envelope-from )
 id 1ZHxmM-0004QD-EL for xen-devel@lists.xenproject.org;
 Wed, 22 Jul 2015 17:26:02 +0000
In-Reply-To: <55AFD416020000780009426D@prv-mh.provo.novell.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: xen-devel, qemu-devel@nongnu.org, Stefano Stabellini
List-Id: xen-devel@lists.xenproject.org

On Wed, 22 Jul 2015, Jan Beulich wrote:
> >>> On 22.07.15 at 16:50, wrote:
> > On Wed, 22 Jul 2015, Jan Beulich wrote:
> >> >> --- a/xen-hvm.c
> >> >> +++ b/xen-hvm.c
> >> >> @@ -981,19 +981,30 @@ static void handle_ioreq(XenIOState *sta
> >> >>
> >> >>  static int handle_buffered_iopage(XenIOState *state)
> >> >>  {
> >> >> +    buffered_iopage_t *buf_page = state->buffered_io_page;
> >> >>      buf_ioreq_t *buf_req = NULL;
> >> >>      ioreq_t req;
> >> >>      int qw;
> >> >>
> >> >> -    if (!state->buffered_io_page) {
> >> >> +    if (!buf_page) {
> >> >>          return 0;
> >> >>      }
> >> >>
> >> >>      memset(&req, 0x00, sizeof(req));
> >> >>
> >> >> -    while (state->buffered_io_page->read_pointer != state->buffered_io_page->write_pointer) {
> >> >> -        buf_req = &state->buffered_io_page->buf_ioreq[
> >> >> -            state->buffered_io_page->read_pointer % IOREQ_BUFFER_SLOT_NUM];
> >> >> +    for (;;) {
> >> >> +        uint32_t rdptr = buf_page->read_pointer, wrptr;
> >> >> +
> >> >> +        xen_rmb();
> >> >
> >> > We don't need this barrier.
> >>
> >> How would we not? We need to make sure we read in this order
> >> read_pointer, write_pointer, and read_pointer again (in the
> >> comparison). Only that way can we be certain to hold a matching
> >> pair in hand at the end.
> >
> > See below
> >
> >
> >> >> +        wrptr = buf_page->write_pointer;
> >> >> +        xen_rmb();
> >> >> +        if (rdptr != buf_page->read_pointer) {
> >> >
> >> > I think you have to use atomic_read to be sure that the second read of
> >> > buf_page->read_pointer is up to date and not optimized away.
> >>
> >> No, suppressing such an optimization is an intended (side) effect
> >> of the barriers used.
> >
> > I understand what you are saying, but I am not sure your assumption
> > is correct. Can the compiler optimize the second read anyway?
>
> No, it can't, due to the barrier.

OK

> >> > But I think it would be best to simply use atomic_read to read
> >> > both pointers at once using uint64_t as the type, so you are sure to
> >> > get a consistent view and there is no need for this check.
> >>
> >> But I'm specifically trying to avoid e.g. a locked cmpxchg8b here on
> >> ix86.
> >
> > OK, but we don't need cmpxchg8b, just:
> >
> > #define atomic_read(ptr) (*(__typeof__(*ptr) volatile*) (ptr))
>
> This only gives the impression of being atomic when the type is wider
> than a machine word. There's no ix86 (i.e. 32-bit) instruction other
> than LOCK CMPXCHG8B (and possibly MMX/SSE/AVX ones) allowing
> a 64-bit quantity to be read atomically.

Damn!

> > something like:
> >
> >     for (;;) {
> >         uint64_t ptrs;
> >         uint32_t rdptr, wrptr;
> >
> >         ptrs = atomic_read((uint64_t *)&state->buffered_io_page->read_pointer);
> >         rdptr = (uint32_t)ptrs;
> >         wrptr = *(((uint32_t *)&ptrs) + 1);
> >
> >         if (rdptr == wrptr) {
> >             continue;
> >         }
> >
> >         [work]
> >
> >         atomic_add(&buf_page->read_pointer, qw + 1);
> >     }
> >
> > it would work, wouldn't it?
>
> Looks like so, but the amount of casts alone makes me wish for
> no-one to consider this (but I agree that the casts could be
> taken care of). Still I think (as btw done elsewhere) the lock
> free access is preferable.

Actually I think it is conceptually easier to understand, but the
current implementation of atomic_read not working with uint64_t on
x86_32 is a real bummer. In that case I am OK with the lock-free loop
too. Thanks for the explanation.

I'll queue this change up for the next QEMU release cycle.