All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Thiago Macieira <thiago.macieira@intel.com>
To: "Bae, Chang Seok" <chang.seok.bae@intel.com>
Cc: "bp@suse.de" <bp@suse.de>, "Lutomirski, Andy" <luto@kernel.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@kernel.org" <mingo@kernel.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"Brown, Len" <len.brown@intel.com>,
	"Hansen, Dave" <dave.hansen@intel.com>,
	"Liu, Jing2" <jing2.liu@intel.com>,
	"Shankar, Ravi V" <ravi.v.shankar@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v9 14/26] x86/arch_prctl: Create ARCH_SET_STATE_ENABLE/ARCH_GET_STATE_ENABLE
Date: Mon, 9 Aug 2021 16:42:30 -0700	[thread overview]
Message-ID: <3144206.qcAzhSVzjS@tjmaciei-mobl5> (raw)
In-Reply-To: <E9C8AF5E-E229-4BA2-AEC8-4289EF7428CA@intel.com>

On Monday, 9 August 2021 15:08:19 PDT Bae, Chang Seok wrote:
> I suspect the EBUSY situation is somewhat imaginative. In theory, the
> situation might be possible one thread calls this syscall at some point when
> a new task is being created -- after task data duplication [1] and before
> enlisted [2].
> 
> As stated in the changelog, the alternative is possible:
> > An alternative implementation would not save the permission bitmap in
> > every task. But instead would extend the per-process signal data, and
> > that would not be subject to this race.
> 
> But it involves quite a bit of code complexity and this is pretty much
> backend. I think it is possible to follow up and update when the case ever
> turns out to be real. At least, I'm not aware of any report against the
> PR_SET_FP_MODE prctl(2) [3] which took the same way -- walk and update the
> task list.
> 
> Perhaps, the hunk above can be improved to be atomic.
> 
> <snip>

Hello Chang

Thanks for taking a look at this. I agree that this is a very, very tiny 
corner case and the issue can be treated as a bugfix later. The API between 
userspace and kernel is fine, which is the big issue right now.

But explaining what the issue I see is: a userspace library cannot enforce 
that other threads in the same process aren't either making this same system 
call or starting/exiting threads. So I see two scenarios where the corner case 
can be reached:

1) two simultaneous ARCH_SET_STATE_ENABLE calls
Imagine two threads, each setting a different bit (say bits 18 and 19). Since 
they race with each other and this line:
              t->thread.fpu.dynamic_state_perm |= req_dynstate_perm;
is not using an atomic, the compiler won't emit LOCK OR, so it's possible the 
two calls will step over each other and partially undo the other's work. The 
result after the two calls is completely indeterminate, yet both functions 
returned success.

Since right now there's only one bit that can be set, we know that the two 
calls are doing the same thing, so they're not effectively racing each other. 
So this case is not an issue *right* *now*. There's only duplicate work.

2) one thread calls ARCH_SET_STATE_ENABLE while another thread exits/starts
In this case, the nr_threads != tsk->signal->nr_threads test will fail 
resulting in the dynamic state to be rolled back and the EBUSY condition. I'd 
like a recommendation from the kernel on how to deal with that signal: should 
I retry? For now, I'm going to treat EBUSY like EINVAL and assume I cannot use 
the feature.

1+2) both situations at the same time
This means the corruption can get worse since the rollback code can undo or 
partially undo the progression of the other ARCH_SET_STATE_ENABLE.

> > So I have to insist that the XGETBV instruction's result match exactly
> > what is permitted to run. That means we either enable AMX unconditionally
> > with no need for system calls (with or without XFD trapping to
> > dynamically allocate more state), or that the XCR0 register be set
> > without the AMX bits by default, until the system call is issued.
> 
> XCR0 provokes VMEXIT which will impact the performance hardly. At least the
> opt-in model is a consensus out of the long debate [4]. Let alone the
> question on how well advertise this new syscall though.

I understand.

I am pointing out that this will cause crashes because of improperly / 
insufficiently-tested software. That is, software that violates the contract 
of the new API because we inserted a new requirement that didn't exist for old 
features. Yes, said software is buggy.

The problem is that the crashes can be surprising and will only show up after 
the software gets run on an AMX-capable machine. That may happen, for example, 
if a cloud provider "upgrades" the instance of a VM from a previous generation 
of processor to a new one, or if a batch job does include the new instance 
type. That is, the crashes will not happen for the developer of the software 
in question, but instead for the user.

However, given the requirements that:
 a) XCR0 not be context-switched
 b) a new API call be required to allow the new instructions

Then there's no alternative.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




  reply	other threads:[~2021-08-09 23:42 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-30 14:59 [PATCH v9 00/26] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 01/26] x86/fpu/xstate: Modify the initialization helper to handle both static and dynamic buffers Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 02/26] x86/fpu/xstate: Modify state copy helpers " Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 03/26] x86/fpu/xstate: Modify address finders " Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 04/26] x86/fpu/xstate: Add a new variable to indicate dynamic user states Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 05/26] x86/fpu/xstate: Add new variables to indicate dynamic XSTATE buffer size Chang S. Bae
2021-08-12 15:03   ` Borislav Petkov
2021-07-30 14:59 ` [PATCH v9 06/26] x86/fpu/xstate: Calculate and remember dynamic XSTATE buffer sizes Chang S. Bae
2021-08-12 16:36   ` Borislav Petkov
2021-07-30 14:59 ` [PATCH v9 07/26] x86/fpu/xstate: Convert the struct fpu 'state' field to a pointer Chang S. Bae
2021-08-12 17:09   ` Borislav Petkov
2021-07-30 14:59 ` [PATCH v9 08/26] x86/fpu/xstate: Introduce helpers to manage the XSTATE buffer dynamically Chang S. Bae
2021-08-12 19:44   ` Borislav Petkov
2021-08-13  8:04     ` Bae, Chang Seok
2021-08-13 10:04       ` Borislav Petkov
2021-08-13 19:43         ` Bae, Chang Seok
2021-08-18  9:28           ` Borislav Petkov
2021-08-18 19:46             ` Bae, Chang Seok
2021-08-25 16:01               ` Bae, Chang Seok
2021-08-30 17:07               ` Borislav Petkov
2021-08-30 23:39                 ` Bae, Chang Seok
2021-08-16 18:33     ` Bae, Chang Seok
2021-08-16 18:53       ` Borislav Petkov
2021-08-30 17:45   ` Dave Hansen
2021-08-30 23:39     ` Bae, Chang Seok
2021-07-30 14:59 ` [PATCH v9 09/26] x86/fpu/xstate: Update the XSTATE save function to support dynamic states Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 10/26] x86/fpu/xstate: Update the XSTATE buffer address finder " Chang S. Bae
2021-08-18 11:33   ` Borislav Petkov
2021-08-18 19:47     ` Bae, Chang Seok
2021-08-30 17:18       ` Borislav Petkov
2021-08-30 23:38         ` Bae, Chang Seok
2021-07-30 14:59 ` [PATCH v9 11/26] x86/fpu/xstate: Update the XSTATE context copy function " Chang S. Bae
2021-08-18 12:03   ` Borislav Petkov
2021-08-18 19:47     ` Bae, Chang Seok
2021-07-30 14:59 ` [PATCH v9 12/26] x86/fpu/xstate: Use feature disable (XFD) to protect dynamic user state Chang S. Bae
2021-08-18 16:24   ` Borislav Petkov
2021-08-18 17:20     ` Thiago Macieira
2021-08-18 17:46       ` Borislav Petkov
2021-08-18 17:58         ` Thiago Macieira
2021-08-18 18:10           ` Borislav Petkov
2021-08-24 22:51             ` Len Brown
2021-08-18 20:43         ` Bae, Chang Seok
2021-08-18 21:04           ` Thiago Macieira
2021-08-18 21:12             ` Bae, Chang Seok
2021-08-18 22:27               ` Thiago Macieira
2021-08-19  1:21             ` Andy Lutomirski
2021-08-19 16:06               ` Thiago Macieira
2021-08-18 21:17           ` Borislav Petkov
2021-08-18 21:37             ` Bae, Chang Seok
2021-08-19  8:00               ` Borislav Petkov
2021-08-19 15:24                 ` Bae, Chang Seok
2021-08-24 23:22             ` Len Brown
2021-08-30 17:31               ` Borislav Petkov
2021-09-17  3:48                 ` Len Brown
2021-08-18 19:47     ` Bae, Chang Seok
2021-08-24 22:21     ` Len Brown
2021-08-30 17:41       ` Borislav Petkov
2021-08-31 21:44         ` Len Brown
2021-08-24 23:17     ` Len Brown
2021-08-30 17:53       ` Borislav Petkov
2021-08-31 22:07         ` Len Brown
2021-08-31 22:11           ` Dave Hansen
2021-08-30 18:04       ` Dave Hansen
2021-08-31 22:15         ` Len Brown
2021-08-31 22:16           ` Len Brown
2021-08-31 22:39           ` Thiago Macieira
2021-08-31 22:44             ` Len Brown
2021-07-30 14:59 ` [PATCH v9 13/26] x86/fpu/xstate: Support ptracer-induced XSTATE buffer expansion Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 14/26] x86/arch_prctl: Create ARCH_SET_STATE_ENABLE/ARCH_GET_STATE_ENABLE Chang S. Bae
2021-08-06 16:46   ` Thiago Macieira
2021-08-09 22:08     ` Bae, Chang Seok
2021-08-09 23:42       ` Thiago Macieira [this message]
2021-08-10  0:57         ` Bae, Chang Seok
2021-08-13 19:44           ` Bae, Chang Seok
2021-07-30 14:59 ` [PATCH v9 15/26] x86/fpu/xstate: Support both legacy and expanded signal XSTATE size Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 16/26] x86/fpu/xstate: Adjust the XSAVE feature table to address gaps in state component numbers Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 17/26] x86/fpu/xstate: Disable XSTATE support if an inconsistent state is detected Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 18/26] x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature bits Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 19/26] x86/fpu/amx: Define AMX state components and have it used for boot-time checks Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 20/26] x86/fpu/amx: Initialize child's AMX state Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 21/26] x86/fpu/amx: Enable the AMX feature in 64-bit mode Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 22/26] x86/fpu/xstate: Skip writing zeros to signal frame for dynamic user states if in INIT-state Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 23/26] selftest/x86/amx: Test cases for the AMX state management Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 24/26] x86/insn/amx: Add TILERELEASE instruction to the opcode map Chang S. Bae
2021-07-30 14:59 ` [PATCH v9 25/26] intel_idle/amx: Add SPR support with XTILEDATA capability Chang S. Bae
2021-07-30 18:41   ` Dave Hansen
2021-08-03 21:32     ` Bae, Chang Seok
2021-08-03 21:38       ` Dave Hansen
2021-08-03 21:43         ` Brown, Len
2021-07-30 20:15   ` Dave Hansen
2021-07-30 14:59 ` [PATCH v9 26/26] x86/fpu/xstate: Add a sanity check for XFD state when saving XSTATE Chang S. Bae

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3144206.qcAzhSVzjS@tjmaciei-mobl5 \
    --to=thiago.macieira@intel.com \
    --cc=bp@suse.de \
    --cc=chang.seok.bae@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=jing2.liu@intel.com \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=ravi.v.shankar@intel.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.