linux-ia64.vger.kernel.org archive mirror
From: Helge Deller <deller@gmx.de>
To: matoro <matoro_mailinglist_kernel@matoro.tk>
Cc: Jens Axboe <axboe@kernel.dk>,
	io-uring@vger.kernel.org, Linux Ia64 <linux-ia64@vger.kernel.org>,
	glaubitz@physik.fu-berlin.de, Sam James <sam@gentoo.org>
Subject: Re: [PATCH 1/5] io_uring: Adjust mapping wrt architecture aliasing requirements
Date: Wed, 12 Jul 2023 22:30:31 +0200
Message-ID: <802b84f7-94f4-638b-3742-26bca00b262d@gmx.de>
In-Reply-To: <a3ae1656-be97-ccc2-8962-1cb70ebc67fa@gmx.de>

On 7/12/23 21:05, Helge Deller wrote:
> On 7/12/23 19:28, matoro wrote:
>> On 2023-07-12 12:24, Helge Deller wrote:
>>> Hi Matoro,
>>>
>>> * matoro <matoro_mailinglist_kernel@matoro.tk>:
>>>> On 2023-03-14 13:16, Jens Axboe wrote:
>>>> > From: Helge Deller <deller@gmx.de>
>>>> >
>>>> > Some architectures have memory cache aliasing requirements (e.g. parisc)
>>>> > if memory is shared between userspace and kernel. This patch fixes the
>>>> > kernel to return an aliased address when asked by userspace via mmap().
>>>> >
>>>> > Signed-off-by: Helge Deller <deller@gmx.de>
>>>> > Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>> > ---
>>>> >  io_uring/io_uring.c | 51 +++++++++++++++++++++++++++++++++++++++++++++
>>>> >  1 file changed, 51 insertions(+)
>>>> >
>>>> > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>> > index 722624b6d0dc..3adecebbac71 100644
>>>> > --- a/io_uring/io_uring.c
>>>> > +++ b/io_uring/io_uring.c
>>>> > @@ -72,6 +72,7 @@
>>>> >  #include <linux/io_uring.h>
>>>> >  #include <linux/audit.h>
>>>> >  #include <linux/security.h>
>>>> > +#include <asm/shmparam.h>
>>>> >
>>>> >  #define CREATE_TRACE_POINTS
>>>> >  #include <trace/events/io_uring.h>
>>>> > @@ -3317,6 +3318,54 @@ static __cold int io_uring_mmap(struct file
>>>> > *file, struct vm_area_struct *vma)
>>>> >      return remap_pfn_range(vma, vma->vm_start, pfn, sz,
>>>> > vma->vm_page_prot);
>>>> >  }
>>>> >
>>>> > +static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
>>>> > +            unsigned long addr, unsigned long len,
>>>> > +            unsigned long pgoff, unsigned long flags)
>>>> > +{
>>>> > +    const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
>>>> > +    struct vm_unmapped_area_info info;
>>>> > +    void *ptr;
>>>> > +
>>>> > +    /*
>>>> > +     * Do not allow to map to user-provided address to avoid breaking the
>>>> > +     * aliasing rules. Userspace is not able to guess the offset address
>>>> > of
>>>> > +     * kernel kmalloc()ed memory area.
>>>> > +     */
>>>> > +    if (addr)
>>>> > +        return -EINVAL;
>>>> > +
>>>> > +    ptr = io_uring_validate_mmap_request(filp, pgoff, len);
>>>> > +    if (IS_ERR(ptr))
>>>> > +        return -ENOMEM;
>>>> > +
>>>> > +    info.flags = VM_UNMAPPED_AREA_TOPDOWN;
>>>> > +    info.length = len;
>>>> > +    info.low_limit = max(PAGE_SIZE, mmap_min_addr);
>>>> > +    info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base);
>>>> > +#ifdef SHM_COLOUR
>>>> > +    info.align_mask = PAGE_MASK & (SHM_COLOUR - 1UL);
>>>> > +#else
>>>> > +    info.align_mask = PAGE_MASK & (SHMLBA - 1UL);
>>>> > +#endif
>>>> > +    info.align_offset = (unsigned long) ptr;
>>>> > +
>>>> > +    /*
>>>> > +     * A failed mmap() very likely causes application failure,
>>>> > +     * so fall back to the bottom-up function here. This scenario
>>>> > +     * can happen with large stack limits and large mmap()
>>>> > +     * allocations.
>>>> > +     */
>>>> > +    addr = vm_unmapped_area(&info);
>>>> > +    if (offset_in_page(addr)) {
>>>> > +        info.flags = 0;
>>>> > +        info.low_limit = TASK_UNMAPPED_BASE;
>>>> > +        info.high_limit = mmap_end;
>>>> > +        addr = vm_unmapped_area(&info);
>>>> > +    }
>>>> > +
>>>> > +    return addr;
>>>> > +}
>>>> > +
>>>> >  #else /* !CONFIG_MMU */
>>>> >
>>>> >  static int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
>>>> > @@ -3529,6 +3578,8 @@ static const struct file_operations io_uring_fops
>>>> > = {
>>>> >  #ifndef CONFIG_MMU
>>>> >      .get_unmapped_area = io_uring_nommu_get_unmapped_area,
>>>> >      .mmap_capabilities = io_uring_nommu_mmap_capabilities,
>>>> > +#else
>>>> > +    .get_unmapped_area = io_uring_mmu_get_unmapped_area,
>>>> >  #endif
>>>> >      .poll        = io_uring_poll,
>>>> >  #ifdef CONFIG_PROC_FS
>>>>
>>>> Hi Jens, Helge - I've bisected a regression with io_uring on ia64 to this
>>>> patch in 6.4.  Unfortunately this breaks userspace programs using io_uring,
>>>> the easiest one to test is cmake with an io_uring enabled libuv (i.e., libuv
>>>> >= 1.45.0) which will hang.
>>>>
>>>> I am aware that ia64 is in a vulnerable place right now, which is why I am
>>>> keeping this spread limited.  Since this clearly involves
>>>> architecture-specific changes for parisc,
>>>
>>> it isn't so much architecture-specific... (just one ifdef)
>>>
>>>> is there any chance of looking at
>>>> what is required to do the same for ia64?  I looked at
>>>> 0ef36bd2b37815719e31a72d2beecc28ca8ecd26 ("parisc: change value of SHMLBA
>>>> from 0x00400000 to PAGE_SIZE") and tried to replicate the SHMLBA ->
>>>> SHM_COLOUR change, but it made no difference.
>>>>
>>>> If hardware is necessary for testing, I can provide it, including remote BMC
>>>> access for restarts/kernel debugging.  Any takers?
>>>
>>> I won't have time to test myself, but maybe you could test?
>>>
>>> Basically we should try to find out why io_uring_mmu_get_unmapped_area()
>>> doesn't return valid addresses, while arch_get_unmapped_area()
>>> [in arch/ia64/kernel/sys_ia64.c] does.
>>>
>>> You could apply this patch first:
>>> It introduces a memory leak (as it requests memory twice), but maybe we
>>> get an idea?
>>> The ia64 arch_get_unmapped_area() searches for memory from bottom
>>> (flags=0), while io_uring function tries top-down first. Maybe that's
>>> the problem. And I don't understand the offset_in_page() check right
>>> now.
>>>
>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>> index 3bca7a79efda..93b1964d2bbb 100644
>>> --- a/io_uring/io_uring.c
>>> +++ b/io_uring/io_uring.c
>>> @@ -3431,13 +3431,17 @@ static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
>>>       * can happen with large stack limits and large mmap()
>>>       * allocations.
>>>       */
>>> +/* compare to arch_get_unmapped_area() in arch/ia64/kernel/sys_ia64.c */
>>>      addr = vm_unmapped_area(&info);
>>> -    if (offset_in_page(addr)) {
>>> +printk("io_uring_mmu_get_unmapped_area() address 1 is: %px\n", addr);
>>> +    addr = NULL;
>>> +    if (!addr) {
>>>          info.flags = 0;
>>>          info.low_limit = TASK_UNMAPPED_BASE;
>>>          info.high_limit = mmap_end;
>>>          addr = vm_unmapped_area(&info);
>>>      }
>>> +printk("io_uring_mmu_get_unmapped_area() returns address %px\n", addr);
>>>
>>>      return addr;
>>>  }
>>>
>>>
>>> Another option is to disable the call to io_uring_mmu_get_unmapped_area()
>>> with the next patch. Maybe you could add printks() to ia64's arch_get_unmapped_area()
>>> and check what it returns there?
>>>
>>> @@ -3654,6 +3658,8 @@ static const struct file_operations io_uring_fops = {
>>>  #ifndef CONFIG_MMU
>>>      .get_unmapped_area = io_uring_nommu_get_unmapped_area,
>>>      .mmap_capabilities = io_uring_nommu_mmap_capabilities,
>>> +#elif 0    /* IS_ENABLED(CONFIG_IA64) */
>>> +    .get_unmapped_area = NULL,
>>>  #else
>>>      .get_unmapped_area = io_uring_mmu_get_unmapped_area,
>>>  #endif
>>>
>>> Helge
>>
>> Thanks Helge.  Sample output from that first patch:
>>
>> [Wed Jul 12 13:09:50 2023] io_uring_mmu_get_unmapped_area() address 1 is: 1ffffffffff40000
>> [Wed Jul 12 13:09:50 2023] io_uring_mmu_get_unmapped_area() returns address 2000000001e40000
>> [Wed Jul 12 13:09:50 2023] io_uring_mmu_get_unmapped_area() address 1 is: 1ffffffffff20000
>> [Wed Jul 12 13:09:50 2023] io_uring_mmu_get_unmapped_area() returns address 2000000001f20000
>> [Wed Jul 12 13:09:50 2023] io_uring_mmu_get_unmapped_area() address 1 is: 1ffffffffff30000
>> [Wed Jul 12 13:09:50 2023] io_uring_mmu_get_unmapped_area() returns address 2000000001f30000
>> [Wed Jul 12 13:09:50 2023] io_uring_mmu_get_unmapped_area() address 1 is: 1ffffffffff90000
>> [Wed Jul 12 13:09:50 2023] io_uring_mmu_get_unmapped_area() returns address 2000000001f90000
>>
>> This pattern seems to be pretty stable. I also tried directly returning the result of a call to arch_get_unmapped_area() at the end of the function, and the output looks similar:
>>
>> [Wed Jul 12 13:27:07 2023] io_uring_mmu_get_unmapped_area() would return address 1ffffffffffd0000
>> [Wed Jul 12 13:27:07 2023] but arch_get_unmapped_area() would return address 2000000001f00000
>> [Wed Jul 12 13:27:07 2023] io_uring_mmu_get_unmapped_area() would return address 1ffffffffff00000
>> [Wed Jul 12 13:27:07 2023] but arch_get_unmapped_area() would return address 1ffffffffff00000
>> [Wed Jul 12 13:27:07 2023] io_uring_mmu_get_unmapped_area() would return address 1fffffffffe20000
>> [Wed Jul 12 13:27:07 2023] but arch_get_unmapped_area() would return address 2000000002000000
>> [Wed Jul 12 13:27:07 2023] io_uring_mmu_get_unmapped_area() would return address 1fffffffffe30000
>> [Wed Jul 12 13:27:07 2023] but arch_get_unmapped_area() would return address 2000000002100000
>>
>> Is that enough of a clue to go on?
>
> SHMLBA on ia64 is 0x100000:
> arch/ia64/include/asm/shmparam.h:#define        SHMLBA  (1024*1024)
> but the values returned by io_uring_mmu_get_unmapped_area() do not fulfill this.
>
> So, probably ia64's SHMLBA isn't pulled in correctly in io_uring/io_uring.c.
> Check value of this line:
>      info.align_mask = PAGE_MASK & (SHMLBA - 1UL);
>
> You could also add
> #define SHM_COLOUR  0x100000
> in front of the
>      #ifdef SHM_COLOUR
> (define SHM_COLOUR in io_uring/kbuf.c too).

What is the value of PAGE_SIZE and "ptr" on your machine?
For 4k page size I get:
SHMLBA -1   ->        FFFFF
PAGE_MASK   -> FFFFFFFFF000
so,
info.align_mask = PAGE_MASK & (SHMLBA - 1UL) = 0xFF000;
You could try to set info.align_mask = 0xfffff;

Helge
