Linux-Sgx Archive mirror
 help / color / mirror / Atom feed
From: "Haitao Huang" <haitao.huang@linux.intel.com>
To: "jarkko@kernel.org" <jarkko@kernel.org>,
	"Huang, Kai" <kai.huang@intel.com>
Cc: "linux-sgx@vger.kernel.org" <linux-sgx@vger.kernel.org>,
	"Chatre, Reinette" <reinette.chatre@intel.com>,
	"Dhanraj, Vijay" <vijay.dhanraj@intel.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>
Subject: Re: [RFC PATCH v4 2/4] x86/sgx: Implement support for MADV_WILLNEED
Date: Wed, 08 Mar 2023 18:50:06 -0600	[thread overview]
Message-ID: <op.11iklswmwjvjmi@hhuan26-mobl.amr.corp.intel.com> (raw)
In-Reply-To: <5de607230294552829b075846a66688f65f3f74e.camel@intel.com>

Hi Kai

Thanks for your detailed review.

On Tue, 07 Mar 2023 17:32:15 -0600, Huang, Kai <kai.huang@intel.com> wrote:

> On Tue, 2023-02-21 at 19:37 -0600, Haitao Huang wrote:
>> On Tue, 21 Feb 2023 16:10:02 -0600, jarkko@kernel.org  
>> <jarkko@kernel.org>
>> wrote:
>>
>> > On Fri, Feb 17, 2023 at 05:03:05PM -0600, Haitao Huang wrote:
>> > > Hi Jarkko
>> > >
>> > > On Fri, 17 Feb 2023 16:32:12 -0600, jarkko@kernel.org
>> > > <jarkko@kernel.org>
>> > > wrote:
>> > >
>> > > > On Wed, Feb 15, 2023 at 09:42:30AM -0600, Haitao Huang wrote:
>> > > > > On Wed, 15 Feb 2023 02:51:23 -0600, Huang, Kai  
>> <kai.huang@intel.com>
>> > > > > wrote:
>> > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > > > Since sgx_mmap() can happen before enclave is created,
>> > > > > calculating the
>> > > > > > > > vm_pgoff
>> > > > > > > > from enclave_base is conceptually wrong.  Even if you  
>> really
>> > > want
>> > > > > > > to do
>> > > > > > > > it, it
>> > > > > > > > should be:
>> > > > > > > >
>> > > > > > > > 	if (enclave_has_initialized())
>> > > > > > > > 		vma->vm_pgoff = ...;
>> > > > > > >
>> > > > > > > I got your point now. I can add a condition to test the
>> > > > > SGX_ENCL_CREATED
>> > > > > > > bit. However, we still have a hole if we must handle the
>> > > sequence
>> > > > > > > mmap(..., enclave_fd) being called before ECREATE ioctl. We
>> > > > > can't leave
>> > > > > > > vm_pgoff not set for those cases.
>> > > > > > >
>> > > > > > > Since no one does that so far, can we explicitly return an  
>> error
>> > > > > from
>> > > > > > > sgx_mmap when that happens?
>> > > > > > > Other suggestions?
>> > > > > >
>> > > > > > As I replied to patch 4/4, I believe userspace should pass the
>> > > correct
>> > > > > > pgoff in
>> > > > > > mmap().  It's wrong to always pass 0 or any random value.
>> > > > > >
>> > > > > > If userspace follow the mmap() rule, you won't need to  
>> manually
>> > > set
>> > > > > > vm_pgoff
>> > > > > > here (which is hacky IMHO).  Everything works fine.
>> > > > > >
>> > > > >
>> > > > > SGX driver was following MAP_ANONYMOUS semantics. If we change  
>> that,
>> > > > > it'd
>> > > > > break current usage/ABI.
>> > > > >
>> > > > > I still think returning error for cases mmap(..., enclave_fd) if
>> > > > > enclave is
>> > > > > not created would be less intrusive change.
>> > > >
>> > > > Is this something you care in SGX SDK?
>> > > >
>> > > > It is not categorically forbidden so that's why I'm asking.
>> > >
>> > > SDK does not care as we would never do this and don't think anyone  
>> is
>> > > doing
>> > > that either. Suggesting returning error to cover all cases so user  
>> space
>> > > would not accidentally cause incorrect vm_pgoff set.
>> >
>> > vm_pgoff != 0 should result -EINVAL.
>> >
>>
>> Do you mean 'offset' passed in from mmap syscall and converted to pgoff  
>> in
>> kernel? I can see it only passed to driver in get_unmapped_area. I can  
>> add
>> this enforcement there so it is consistent to the MAP_ANONYMOUS spec if
>> that's what you suggest.
>>
>>   vma->vm_pgoff is the offset in pages of the vma->vm_start relative to
>> 'file'.
>>   It can not be zero for all VMAs of the enclave.
>>
>> Kernel sets vma->vm_pgoff == vma->vm_start>>PAGE_SHIFT for private anon
>> VMAs, and ignore it (zero) for shared anon mapping.
>>
>> For us, we propose to set it to PFN_DOWN(vma->vm_start - encl->base)  
>> upon
>> mmap.
>
> Sorry for late reply.  Basically I was sick and having limited working  
> time in
> the passed two weeks.
>
> For what I understand now, SGX driver wants to use MAP_ANONYMOUS  
> semantic for
> mmap()s against /dev/sgx_enclave (for whatever reason that I don't  
> understand).
> And because of that, we even want to explicitly enforce userspace to  
> always pass
> 0 as pgoff in mmap()s (hasn't been done yet).
>
> And then here, we want to rewrite vma->pgoff so that the VMA can act  
> _like_ a
> normal file-based mmap() semantics (despite it is indeed a file-based  
> VMA),
> because the VFS fadvice() implementation assumes file-based VMAs are  
> always
> following file-based mmap() semantics.
>
> Hmm..
>
> Doesn't this sound weird?
>

Sometime weird means good or creative as expressed in a popular slogan in  
the city where I live. Just joking:-)

> I must have been missing something, but why cannot SGX driver just  
> always follow
> file-based mmap() semantics at the beginning?
>
> For instance, what's wrong with:
>
> 	1) encl_fd = open("/dev/sgx_enclave")
> 	2) encl_base = mmap(..., encl_size, MAP_SHARED, encl_fd, 0 /* pgoff */)
> 	3) ioctl(ECREATE, encl_base, encl_size)
>
> (In ioctl(ECREATE), we might want to verify the VMA we found has the same
> encl_base as starting address, and 0 pgoff).
>
> And in following mmap()s in which we want to map a small range of  
> enclave:
>
> 	encl_addr = mmap(encl_addr, MAP_SHARED|MAP_FIXED, encl_fd,
> 			(encl_addr - encl_base) >> PAGE_SHIFT);
>
> ?
>
> Anything wrong above?
>

I believe this can be done except for that you are breaking existing code  
by requiring pg_off for all mmaps now.

Doing mmap before ECREATE also effectively bypasses sgx_encl_may_map. And  
we have a very long thread of discussion about this[1] and that's why  
sgx_encl_may_map come into play to limit VMA permissions to those given in  
SecInfo by EADD ioctl. However, current code does not explicitly block it  
and I think it'd be better do it.  More comment below on this point.

> In fact, if the first mmap() in step 2) before IOCTL(ecreate) used
> MAP_ANONYMOUS, IIUC that VMA will stay as anonymous VMA but will never be
> converted to SGX file-based one, because by looking at the code, I see  
> neither
> ioctl(ECREATE) nor ioctl(EINIT) converts that VMA to file-based one.
>
> That being said, all page faults caused by userspace access to that VMA  
> will
> still go to anonymous VMA's fault path, but not SGX driver's.
>

It is required to mmap anonymously with PROT_NONE initially and
user space must invoke mmap with enclave fd after EADDing each  
segments/areas. PF handler will check VMA prot bits against  
vm_max_prot_bits in sgx_encl_load_page_in_vma, and report failure on any  
mismatch.
If user space does not mmap(...,encl_fd) after EADD, pages will have  
PROT_NONE and fail on any access.

> This is fine for SGX1 as all enclave pages are fully populated.  For  
> SGX2,
> _unless_ you explicitly mmap("/dev/sgx_enclave") those dynamical ranges,  
> the
> fault will never be handled by SGX driver.
>

Indeed, it is required user space invoke mmap to config the regions for  
EAUG.

> Architecturally, for SGX2, if you mmap() the entire enclave range (as in  
> step
> 2), you don't need explicitly mmap() all dynamic ranges.  The userspace  
> should
> just be able to access those dynamic ranges (i.e. EACCEPT) and the  
> kernel driver
> should gracefully handle the fault.
>
> Shouldn't this be a problem?
>
> (Btw, there might be other corner cases that could cause VMA  
> splitting/merging,
> etc, but I haven't thought about those)
>
>> The concern Kai raised about encl->base not available in the window
>> between mmap and ECREATE can be addressed by disallowing mmap before
>> ECREATE is done. It does not make much sense anyway to mmap(enclave_fd)
>> without a valid enclave range associated with enclave_fd.
>>
>
> We can, but I don't understand why the first mmap() before ECREATE  
> must/should
> be done via MMAP_ANONYMOUS.  In fact, I think it might be wrong for SGX2.
>

There is a long history behind it. The use of anonymous mapping can be  
traced back to v29.
See this thread[2].

Before v29, user space can do mmap(PROT_NONE, ..., enclave_fd) before  
ECREATE to reserve a range for the enclave. But it must be PROT_NONE for  
the reservation, otherwise sgx_encl_may_map will block it by the "!page"  
check below [3]:

int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
		     unsigned long end, unsigned long vm_prot_bits)
{
	unsigned long idx, idx_start, idx_end;
	struct sgx_encl_page *page;

	/* PROT_NONE always succeeds. */
	if (!vm_prot_bits)
		return 0;

	idx_start = PFN_DOWN(start);
	idx_end = PFN_DOWN(end - 1);
	for (idx = idx_start; idx <= idx_end; ++idx) {
		mutex_lock(&encl->lock);
		page = radix_tree_lookup(&encl->page_tree, idx);
		mutex_unlock(&encl->lock);

		if (!page || (~page->vm_prot_bits & vm_prot_bits))
			return -EACCES;
...

Then in v29, PROT_NONE mapping was disallowed for encl_fd before pages  
EADDed so user space has to mmap anonymously to reserve the range.The  
intent was still not to allow mmap before pages EADDed (the !page check  
was still there up to v38)

Later version (around v39?) we switched to enforce the permissions in PF  
handler and mmap with any permissions before EADD is allowed, but may  
cause failure later on PF. I'm not sure it was intentional but pretty sure  
no meaningful usage for doing mmap before ECREATE due to PF handler  
enforcement.

I think that's the history behind it. Others more knowledgeable can  
correct me as needed.

Again, even if we redesign the whole thing to be less "weird", then  
requiring pgoff for each mmap would break existing user space code. And I  
think explicitly block mmap before ECREATE make the API more consistent  
with the PF handler enforcement and original intent of sgx_encl_may_map.

Thanks again for detailed review and those thoughtful questions.

Haitao

[1]https://lore.kernel.org/linux-sgx/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/
[2]https://lore.kernel.org/linux-sgx/CAOASepPFe_ucuwe7JW_-+VBQ4=+sHqyGXOdA9kUbcYA_9=v0sA@mail.gmail.com/
[3]https://lore.kernel.org/linux-sgx/20190713170804.2340-17-jarkko.sakkinen@linux.intel.com/

  reply	other threads:[~2023-03-09  0:50 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-28  4:55 [RFC PATCH v4 0/4] x86/sgx: implement support for MADV_WILLNEED Haitao Huang
2023-01-28  4:55 ` [RFC PATCH v4 1/4] x86/sgx: Export sgx_encl_eaug_page Haitao Huang
2023-01-28  4:55   ` [RFC PATCH v4 2/4] x86/sgx: Implement support for MADV_WILLNEED Haitao Huang
2023-01-28  4:55     ` [RFC PATCH v4 3/4] selftests/sgx: add len field for EACCEPT op Haitao Huang
2023-01-28  4:55       ` [RFC PATCH v4 4/4] selftests/sgx: Add test for madvise(..., WILLNEED) Haitao Huang
2023-02-07 23:30         ` Jarkko Sakkinen
2023-02-15  2:38         ` Huang, Kai
2023-02-15  4:42           ` Haitao Huang
2023-02-15  8:46             ` Huang, Kai
2023-02-17 22:29               ` jarkko
2023-02-07 23:29       ` [RFC PATCH v4 3/4] selftests/sgx: add len field for EACCEPT op Jarkko Sakkinen
2023-02-07 23:28     ` [RFC PATCH v4 2/4] x86/sgx: Implement support for MADV_WILLNEED Jarkko Sakkinen
2023-02-14  9:47     ` Huang, Kai
2023-02-14 19:18       ` Haitao Huang
2023-02-14 20:54         ` Huang, Kai
2023-02-14 21:42           ` Haitao Huang
2023-02-14 22:36             ` Huang, Kai
2023-02-15  3:59               ` Haitao Huang
2023-02-15  8:51                 ` Huang, Kai
2023-02-15 15:42                   ` Haitao Huang
2023-02-16  7:53                     ` Huang, Kai
2023-02-16 17:12                       ` Haitao Huang
2023-02-17 22:32                     ` jarkko
2023-02-17 23:03                       ` Haitao Huang
2023-02-21 22:10                         ` jarkko
2023-02-22  1:37                           ` Haitao Huang
2023-03-07 23:32                             ` Huang, Kai
2023-03-09  0:50                               ` Haitao Huang [this message]
2023-03-09 11:31                                 ` Huang, Kai
2023-03-14 14:54                                   ` Haitao Huang
2023-03-19 13:26                                     ` jarkko
2023-03-20  9:36                                       ` Huang, Kai
2023-03-20 14:04                                         ` jarkko
2023-05-27  0:32                                   ` Haitao Huang
2023-06-06  4:11                                     ` Huang, Kai
2023-06-07 16:59                                       ` Haitao Huang
2023-06-16  3:49                                       ` Huang, Kai
2023-06-16 22:05                                         ` Sean Christopherson
2023-06-19 11:17                                           ` Huang, Kai
2023-06-22 22:01                                             ` Sean Christopherson
2023-06-22 23:21                                               ` Huang, Kai
2023-06-26 22:28                                                 ` Sean Christopherson
2023-06-27 11:43                                                   ` Huang, Kai
2023-06-27 14:50                                                     ` Sean Christopherson
2023-06-28  9:37                                                       ` Huang, Kai
2023-06-28 14:57                                                         ` Sean Christopherson
2023-06-29  3:10                                                           ` Huang, Kai
2023-06-29 14:23                                                             ` Sean Christopherson
2023-06-29 23:29                                                               ` Huang, Kai
2023-06-30  0:14                                                                 ` Sean Christopherson
2023-06-30  0:56                                                                   ` Huang, Kai
2023-06-30  1:54                                                                 ` Jarkko Sakkinen
2023-06-30  1:57                                                                   ` Jarkko Sakkinen
2023-06-30  4:26                                                                     ` Huang, Kai
2023-06-30  9:35                                                                       ` Jarkko Sakkinen
2023-03-12  1:25                               ` jarkko
2023-03-12 22:25                                 ` Huang, Kai
2023-02-17 22:07           ` jarkko
2023-02-17 21:50       ` jarkko
2023-02-07 23:26   ` [RFC PATCH v4 1/4] x86/sgx: Export sgx_encl_eaug_page Jarkko Sakkinen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=op.11iklswmwjvjmi@hhuan26-mobl.amr.corp.intel.com \
    --to=haitao.huang@linux.intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=jarkko@kernel.org \
    --cc=kai.huang@intel.com \
    --cc=linux-sgx@vger.kernel.org \
    --cc=reinette.chatre@intel.com \
    --cc=vijay.dhanraj@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).