From mboxrd@z Thu Jan  1 00:00:00 1970
From: Stefano Stabellini
Subject: Re: [PATCH] xen/HVM: atomically access pointers in bufioreq handling
Date: Wed, 22 Jul 2015 18:26:01 +0100
References: <5582E14F0200007800086A37@mail.emea.novell.com>
 <55AFBEDC020000780009417A@prv-mh.provo.novell.com>
 <55AFD416020000780009426D@prv-mh.provo.novell.com>
To: Stefano Stabellini
Cc: xen-devel, qemu-devel@nongnu.org, Jan Beulich
List-Id: xen-devel@lists.xenproject.org

On Wed, 22 Jul 2015, Stefano Stabellini wrote:
> On Wed, 22 Jul 2015, Jan Beulich wrote:
> > >>> On 22.07.15 at 16:50, wrote:
> > > On Wed, 22 Jul 2015, Jan Beulich wrote:
> > >> >> --- a/xen-hvm.c
> > >> >> +++ b/xen-hvm.c
> > >> >> @@ -981,19 +981,30 @@ static void handle_ioreq(XenIOState *sta
> > >> >>
> > >> >>  static int handle_buffered_iopage(XenIOState *state)
> > >> >>  {
> > >> >> +    buffered_iopage_t *buf_page = state->buffered_io_page;
> > >> >>      buf_ioreq_t *buf_req = NULL;
> > >> >>      ioreq_t req;
> > >> >>      int qw;
> > >> >>
> > >> >> -    if (!state->buffered_io_page) {
> > >> >> +    if (!buf_page) {
> > >> >>          return 0;
> > >> >>      }
> > >> >>
> > >> >>      memset(&req, 0x00, sizeof(req));
> > >> >>
> > >> >> -    while (state->buffered_io_page->read_pointer != state->buffered_io_page->write_pointer) {
> > >> >> -        buf_req = &state->buffered_io_page->buf_ioreq[
> > >> >> -            state->buffered_io_page->read_pointer % IOREQ_BUFFER_SLOT_NUM];
> > >> >> +    for (;;) {
> > >> >> +        uint32_t rdptr = buf_page->read_pointer, wrptr;
> > >> >> +
> > >> >> +        xen_rmb();
> > >> >
> > >> > We don't need this barrier.
> > >>
> > >> How would we not? We need to make sure we read, in this order:
> > >> read_pointer, write_pointer, and read_pointer again (in the
> > >> comparison). Only that way can we be certain to hold a matching
> > >> pair in hands at the end.
> > >
> > > See below
> > >
> > >
> > >> >> +        wrptr = buf_page->write_pointer;
> > >> >> +        xen_rmb();
> > >> >> +        if (rdptr != buf_page->read_pointer) {
> > >> >
> > >> > I think you have to use atomic_read to be sure that the second read of
> > >> > buf_page->read_pointer is up to date and not optimized away.
> > >>
> > >> No, suppressing such an optimization is an intended (side) effect
> > >> of the barriers used.
> > >
> > > I understand what you are saying, but I am not sure your assumption
> > > is correct. Can the compiler optimize the second read anyway?
> >
> > No, it can't, due to the barrier.
>
> OK
>
> > >> > But I think that it would be best to simply use atomic_read to read
> > >> > both pointers at once using uint64_t as the type, so you are sure to
> > >> > get a consistent view and there is no need for this check.
> > >>
> > >> But I'm specifically trying to avoid e.g. a locked cmpxchg8b here on
> > >> ix86.
> > >
> > > OK, but we don't need cmpxchg8b, just:
> > >
> > > #define atomic_read(ptr)       (*(__typeof__(*ptr) volatile*) (ptr))
> >
> > This only gives the impression of being atomic when the type is wider
> > than a machine word. There's no ix86 (i.e. 32-bit) instruction other
> > than LOCK CMPXCHG8B (and possibly MMX/SSE/AVX ones) allowing one
> > to atomically read a 64-bit quantity.
>
> Damn!
>
> > > something like:
> > >
> > >     for (;;) {
> > >         uint64_t ptrs;
> > >         uint32_t rdptr, wrptr;
> > >
> > >         ptrs = atomic_read((uint64_t*)&state->buffered_io_page->read_pointer);
> > >         rdptr = (uint32_t)ptrs;
> > >         wrptr = *(((uint32_t*)&ptrs) + 1);
> > >
> > >         if (rdptr == wrptr) {
> > >             continue;
> > >         }
> > >
> > >         [work]
> > >
> > >         atomic_add(&buf_page->read_pointer, qw + 1);
> > >     }
> > >
> > > it would work, wouldn't it?
> >
> > Looks like so, but the amount of casts alone makes me wish for
> > no-one to consider this (though I agree that the casts could be
> > taken care of). Still, I think (as is btw done elsewhere) the
> > lock-free access is preferable.
>
> Actually, I think it is conceptually easier to understand, but the
> current implementation of atomic_read not working with uint64_t on
> x86_32 is a real bummer. In that case I am OK with the lock-free loop
> too. Thanks for the explanation.
>
> I'll queue this change up for the next QEMU release cycle.

I forgot about the commit message change. Please resend; then, provided
that everything is OK, I'll queue it up.