Re: [PATCH v8 1/1] ns: add binfmt_misc to the user namespace

From: Laurent Vivier <laurent@vivier.eu>
To: Jann Horn <jannh@google.com>
Cc: "kernel list" <linux-kernel@vger.kernel.org>,
	"Greg Kurz" <groug@kaod.org>, "Andrei Vagin" <avagin@gmail.com>,
	"Linux API" <linux-api@vger.kernel.org>,
	"Dmitry Safonov" <dima@arista.com>,
	"James Bottomley" <James.Bottomley@hansenpartnership.com>,
	"Jan Kiszka" <jan.kiszka@siemens.com>,
	"Christian Brauner" <christian.brauner@ubuntu.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"Linux Containers" <containers@lists.linux-foundation.org>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Eric Biederman" <ebiederm@xmission.com>,
	"Henning Schild" <henning.schild@siemens.com>,
	"Cédric Le Goater" <clg@kaod.org>
Subject: Re: [PATCH v8 1/1] ns: add binfmt_misc to the user namespace
Date: Mon, 16 Dec 2019 21:05:02 +0100
Message-ID: <15d270a6-2264-adc5-3f56-fdb8b67ad267@vivier.eu>
In-Reply-To: <CAG48ez2xNCRmuzpNqYW5R+XMKzW8YiemsPUPgk42KSkSZXmvLg@mail.gmail.com>

On 16/12/2019 at 20:08, Jann Horn wrote:
> On Mon, Dec 16, 2019 at 10:12 AM Laurent Vivier <laurent@vivier.eu> wrote:
>> This patch allows to have a different binfmt_misc configuration
>> for each new user namespace. By default, the binfmt_misc configuration
>> is the one of the previous level, but if the binfmt_misc filesystem is
>> mounted in the new namespace a new empty binfmt instance is created and
>> used in this namespace.
>>
>> For instance, using "unshare" we can start a chroot of another
>> architecture and configure the binfmt_misc interpreter without being root
>> to run the binaries in this chroot.
> 
> How do you ensure that when userspace is no longer using the user
> namespace and mount namespace, the entries and the binfmt_misc
> superblock are deleted? As far as I can tell from looking at the code,
> at the moment, if I create a user namespace+mount namespace, mount
> binfmt_misc in there, register a file format and then let all
> processes inside the namespaces exit, the binfmt_misc mount will be
> kept alive by the simple_pin_fs() stuff, and the binfmt_misc entries
> will also stay in memory.
> 
> [...]

Do you have any idea how I can fix this issue?

>> @@ -718,7 +736,9 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer,
>>         if (!inode)
>>                 goto out2;
>>
>> -       err = simple_pin_fs(&bm_fs_type, &bm_mnt, &entry_count);
>> +       ns = binfmt_ns(file_dentry(file)->d_sb->s_user_ns);
>> +       err = simple_pin_fs(&bm_fs_type, &ns->bm_mnt,
>> +                           &ns->entry_count);
> 
> When you call simple_pin_fs() here, the user namespace of `current`
> and the user namespace of the superblock are not necessarily related.
> So simple_pin_fs() may end up taking a reference on the mountpoint for
> a user namespace that has nothing to do with the namespace for which
> an entry is being created.

Do you have any idea how I can fix this issue?

> 
> [...]
>>  static int bm_fill_super(struct super_block *sb, struct fs_context *fc)
>>  {
>>         int err;
>> +       struct user_namespace *ns = sb->s_user_ns;
> [...]
>> +       /* create a new binfmt namespace
>> +        * if we are not in the first user namespace
>> +        * but the binfmt namespace is the first one
>> +        */
>> +       if (READ_ONCE(ns->binfmt_ns) == NULL) {
> 
> The READ_ONCE() here is unnecessary, right? AFAIK the VFS layer is
> going to ensure that bm_fill_super() can't run concurrently for the
> same namespace?

So I understand the "READ_ONCE()" is unnecessary and I will remove it.

> 
>> +               struct binfmt_namespace *new_ns;
>> +
>> +               new_ns = kmalloc(sizeof(struct binfmt_namespace),
>> +                                GFP_KERNEL);
>> +               if (new_ns == NULL)
>> +                       return -ENOMEM;
>> +               INIT_LIST_HEAD(&new_ns->entries);
>> +               new_ns->enabled = 1;
>> +               rwlock_init(&new_ns->entries_lock);
>> +               new_ns->bm_mnt = NULL;
>> +               new_ns->entry_count = 0;
>> +               /* ensure new_ns is completely initialized before sharing it */
>> +               smp_wmb();
>> +               WRITE_ONCE(ns->binfmt_ns, new_ns);
> 
> Nit: This would be a little bit semantically clearer if you used
> smp_store_release() instead of smp_wmb()+WRITE_ONCE().

I will.

> 
>> +       }
>> +
>>         err = simple_fill_super(sb, BINFMTFS_MAGIC, bm_files);
> [...]
>> +static void bm_free(struct fs_context *fc)
>> +{
>> +       if (fc->s_fs_info)
>> +               put_user_ns(fc->s_fs_info);
>> +}
> 
> Silly question: Why the "if"? Can you ever reach this with fc->s_fs_info==NULL?

So I understand the "if" is unnecessary and I will remove it.

> 
>> +
>>  static int bm_get_tree(struct fs_context *fc)
>>  {
>> -       return get_tree_single(fc, bm_fill_super);
>> +       return get_tree_keyed(fc, bm_fill_super, get_user_ns(fc->user_ns));
> 
> get_user_ns() increments the refcount of the namespace, but in the
> case where a binfmt_misc mount already exists, that refcount is never
> dropped, right? That would be a security bug, since an attacker could
> overflow the refcount of the user namespace and then trigger a UAF.
> (And the refcount hardening won't catch it because user namespaces
> still use raw atomics instead of refcount_t.)

Do you have any idea how I can fix this issue?

> [...]
>> +#if IS_ENABLED(CONFIG_BINFMT_MISC)
> 
> Nit: Isn't this kind of check normally written as "#ifdef"?
> 

What is the difference?

Thanks,
Laurent
