Re: [PATCH v2] KVM: Mark a vCPU as preempted/ready iff it's scheduled out while running

All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: David Matlack <dmatlack@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, kvm@vger.kernel.org
Subject: Re: [PATCH v2] KVM: Mark a vCPU as preempted/ready iff it's scheduled out while running
Date: Mon, 29 Apr 2024 11:05:09 -0700	[thread overview]
Message-ID: <Zi_hVeBVhEODwP4x@google.com> (raw)
In-Reply-To: <CALzav=dyeNtDKW5-s=vGhWASbbxtBY4gkHKv_eqPnqavPfoa+g@mail.gmail.com>

On Mon, Apr 29, 2024, David Matlack wrote:
> On Fri, Apr 26, 2024 at 2:01 PM Sean Christopherson <seanjc@google.com> wrote:
> > On Thu, Mar 07, 2024, David Matlack wrote:
> > >
> > > -     if (current->on_rq) {
> > > +     if (current->on_rq && vcpu->wants_to_run) {
> > >               WRITE_ONCE(vcpu->preempted, true);
> > >               WRITE_ONCE(vcpu->ready, true);
> > >       }
> >
> > Long story short, I was playing around with wants_to_run for a few hairbrained
> > ideas, and realized that there's a TOCTOU bug here.  Userspace can toggle
> > run->immediate_exit at will, e.g. can clear it after the kernel loads it to
> > compute vcpu->wants_to_run.
> >
> > That's not fatal for this use case, since userspace would only be shooting itself
> > in the foot, but it leaves a very dangerous landmine, e.g. if something else in
> > KVM keys off of vcpu->wants_to_run to detect that KVM is in its run loop, i.e.
> > relies on wants_to_run being set if KVM is in its core run loop.
> >
> > To address that, I think we should have all architectures check wants_to_run, not
> > immediate_exit.
> 
> Rephrasing to make sure I understand you correctly: It's possible for
> KVM to see conflicting values of wants_to_run and immediate_exit,
> since userspace can change immediate_exit at any point. e.g. It's
> possible for KVM to see wants_to_run=true and immediate_exit=true,
> which wouldn't make sense. This wouldn't cause any bugs today, but
> could result in buggy behavior down the road so we might as well clean
> it up now.

Yep.

> > Hmm, and we should probably go a step further and actively prevent using
> > immediate_exit from the kernel, e.g. rename it to something scary like:
> >
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 2190adbe3002..9c5fe1dae744 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -196,7 +196,11 @@ struct kvm_xen_exit {
> >  struct kvm_run {
> >         /* in */
> >         __u8 request_interrupt_window;
> > +#ifndef __KERNEL__
> >         __u8 immediate_exit;
> > +#else
> > +       __u8 hidden_do_not_touch;
> > +#endif
> 
> This would result in:
> 
>   vcpu->wants_to_run = !READ_ONCE(vcpu->run->hidden_do_not_touch);
> 
> :)
> 
> Of course we could pick a better name...

Heh, yeah, for demonstration purposes only.

> but isn't every field in kvm_run open to TOCTOU issues?

Yep, and we've had bugs, e.g. see commit 0d033770d43a ("KVM: x86: Fix
KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issues").

> (Is immediate_exit really special enough to need this protection?)

I think so.

It's not that immediate_exit is more prone to TOCTOU bugs than other fields in
kvm_run (though I do think immediate_exit does have higher potential for future
bugs), or even that the severity of bugs that could occur with immediate_exit is
high (which I again think is the case), it's that it's actually feasible to
effectively prevent TOCTOU bugs with minimal cost (including ongoing maintenance
cost).  At the cost of a small-ish one-time change, we can protect *all* KVM code
against improer usage of immediate_exit.

Doing the same for other kvm_run fields is less feasiable, as the relevant logic
is much more architecture specific.  E.g. x86 now does a full copy of sregs and
events in kvm_sync_regs, but not regs because the input for regs is never checked.
And blindly creating an in-kernel copy of all state would be extremely wasteful
for s390, which IIUC uses kvm_run.s.regs as _the_ buffer for guest register
state.

     prev parent reply	other threads:[~2024-04-29 18:05 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-07 16:35 [PATCH v2] KVM: Mark a vCPU as preempted/ready iff it's scheduled out while running David Matlack
2024-04-02 16:41 ` David Matlack
2024-04-26 21:01 ` Sean Christopherson
2024-04-29 17:22   ` David Matlack
2024-04-29 18:05     ` Sean Christopherson [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zi_hVeBVhEODwP4x@google.com \
    --to=seanjc@google.com \
    --cc=dmatlack@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.