Netdev Archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements
       [not found]       ` <c5b26bbe-7d95-ec86-5ddb-c2bd2b6c79a7@gmail.com>
@ 2021-03-18  0:15         ` Stefan Metzmacher
  2021-03-18 13:00           ` Pavel Begunkov
  2021-03-18 13:08           ` Pavel Begunkov
  0 siblings, 2 replies; 3+ messages in thread
From: Stefan Metzmacher @ 2021-03-18  0:15 UTC (permalink / raw
  To: Pavel Begunkov, Jens Axboe, io-uring; +Cc: netdev@vger.kernel.org

Hi Pavel,

>>>>> here're patches which fix linking of send[msg]()/recv[msg]() calls
>>>>> and make sure io_uring_enter() never generate a SIGPIPE.
>>>
>>> 1/2 breaks userspace.
>>
>> Can you explain that a bit please, how could some application ever
>> have a useful use of IOSQE_IO_LINK with these socket calls?
> 
> Packet delivery of variable size, i.e. recv(max_size). Byte stream
> that consumes whatever you've got and links something (e.g. notification
> delivery, or poll). Not sure about netlink, but maybe. Or some
> "create a file via send" crap, or some made-up custom protocols

Ok, then we need a flag or a new opcode to provide that behavior?

For recv() and recvmsg() MSG_WAITALL might be usable.

It's not defined in 'man 2 sendmsg', but should we use it anyway
for IORING_OP_SEND[MSG] in order to activate the short send check
as the low level sock_sendmsg() call seem to ignore unused flags,
which seems to be the reason for the following logic in tcp_sendmsg_locked:

if (flags & MSG_ZEROCOPY && size && sock_flag(sk, SOCK_ZEROCOPY)) {

You need to set SOCK_ZEROCOPY in the socket in order to give a meaning
to MSG_ZEROCOPY.

Should I prepare an add-on patch to make the short send/recv logic depend
on MSG_WAITALL?

I'm cc'ing netdev@vger.kernel.org in order to more feedback of
MSG_WAITALL can be passed to sendmsg without fear to trigger
-EINVAL.

The example for io_sendmsg() would look like this:

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4383,7 +4383,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
        struct io_async_msghdr iomsg, *kmsg;
        struct socket *sock;
        unsigned flags;
-       int expected_ret;
+       int min_ret = 0;
        int ret;

        sock = sock_from_file(req->file);
@@ -4404,9 +4404,11 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
        else if (issue_flags & IO_URING_F_NONBLOCK)
                flags |= MSG_DONTWAIT;

-       expected_ret = iov_iter_count(&kmsg->msg.msg_iter);
-       if (unlikely(expected_ret == MAX_RW_COUNT))
-               expected_ret += 1;
+       if (flags & MSG_WAITALL) {
+               min_ret = iov_iter_count(&kmsg->msg.msg_iter);
+               if (unlikely(min_ret == MAX_RW_COUNT))
+                       min_ret += 1;
+       }
        ret = __sys_sendmsg_sock(sock, &kmsg->msg, flags);
        if ((issue_flags & IO_URING_F_NONBLOCK) && ret == -EAGAIN)
                return io_setup_async_msg(req, kmsg);
@@ -4417,7 +4419,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
        if (kmsg->free_iov)
                kfree(kmsg->free_iov);
        req->flags &= ~REQ_F_NEED_CLEANUP;
-       if (ret != expected_ret)
+       if (ret < min_ret)
                req_set_fail_links(req);
        __io_req_complete(req, issue_flags, ret, 0);
        return 0;

Which means the default of min_ret = 0 would result in:

        if (ret < 0)
                req_set_fail_links(req);

again...

>>> Sounds like 2/2 might too, does it?
>>
>> Do you think any application really expects to get a SIGPIPE
>> when calling io_uring_enter()?
> 
> If it was about what I think I would remove lots of old garbage :)
> I doubt it wasn't working well before, e.g. because of iowq, but
> who knows

Yes, it was inconsistent before and now it's reliable.

metze




^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements
  2021-03-18  0:15         ` [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements Stefan Metzmacher
@ 2021-03-18 13:00           ` Pavel Begunkov
  2021-03-18 13:08           ` Pavel Begunkov
  1 sibling, 0 replies; 3+ messages in thread
From: Pavel Begunkov @ 2021-03-18 13:00 UTC (permalink / raw
  To: Stefan Metzmacher, Jens Axboe, io-uring; +Cc: netdev@vger.kernel.org

On 18/03/2021 00:15, Stefan Metzmacher wrote:
> Hi Pavel,
> 
>>>>>> here're patches which fix linking of send[msg]()/recv[msg]() calls
>>>>>> and make sure io_uring_enter() never generate a SIGPIPE.
>>>>
>>>> 1/2 breaks userspace.
>>>
>>> Can you explain that a bit please, how could some application ever
>>> have a useful use of IOSQE_IO_LINK with these socket calls?
>>
>> Packet delivery of variable size, i.e. recv(max_size). Byte stream
>> that consumes whatever you've got and links something (e.g. notification
>> delivery, or poll). Not sure about netlink, but maybe. Or some
>> "create a file via send" crap, or some made-up custom protocols
> 
> Ok, then we need a flag or a new opcode to provide that behavior?
> 
> For recv() and recvmsg() MSG_WAITALL might be usable.

Hmm, unrelated, but there is a good chance MSG_WAITALL with io_uring
is broken because of our first MSG_DONTWAIT attempt. 

> It's not defined in 'man 2 sendmsg', but should we use it anyway
> for IORING_OP_SEND[MSG] in order to activate the short send check
> as the low level sock_sendmsg() call seem to ignore unused flags,
> which seems to be the reason for the following logic in tcp_sendmsg_locked:
> 
> if (flags & MSG_ZEROCOPY && size && sock_flag(sk, SOCK_ZEROCOPY)) {

Yep, it maintains compatibility because of unchecked unsupported flags.
Alleviating an old design problem, IIRC.

> 
> You need to set SOCK_ZEROCOPY in the socket in order to give a meaning
> to MSG_ZEROCOPY.
> 
> Should I prepare an add-on patch to make the short send/recv logic depend
> on MSG_WAITALL?

IMHO, conceptually it would make much more sense with MSG_WAITALL.

> 
> I'm cc'ing netdev@vger.kernel.org in order to more feedback of
> MSG_WAITALL can be passed to sendmsg without fear to trigger
> -EINVAL.
> 
> The example for io_sendmsg() would look like this:
> 
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -4383,7 +4383,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>         struct io_async_msghdr iomsg, *kmsg;
>         struct socket *sock;
>         unsigned flags;
> -       int expected_ret;
> +       int min_ret = 0;
>         int ret;
> 
>         sock = sock_from_file(req->file);
> @@ -4404,9 +4404,11 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>         else if (issue_flags & IO_URING_F_NONBLOCK)
>                 flags |= MSG_DONTWAIT;
> 
> -       expected_ret = iov_iter_count(&kmsg->msg.msg_iter);
> -       if (unlikely(expected_ret == MAX_RW_COUNT))
> -               expected_ret += 1;
> +       if (flags & MSG_WAITALL) {
> +               min_ret = iov_iter_count(&kmsg->msg.msg_iter);
> +               if (unlikely(min_ret == MAX_RW_COUNT))
> +                       min_ret += 1;
> +       }
>         ret = __sys_sendmsg_sock(sock, &kmsg->msg, flags);
>         if ((issue_flags & IO_URING_F_NONBLOCK) && ret == -EAGAIN)
>                 return io_setup_async_msg(req, kmsg);
> @@ -4417,7 +4419,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>         if (kmsg->free_iov)
>                 kfree(kmsg->free_iov);
>         req->flags &= ~REQ_F_NEED_CLEANUP;
> -       if (ret != expected_ret)
> +       if (ret < min_ret)
>                 req_set_fail_links(req);
>         __io_req_complete(req, issue_flags, ret, 0);
>         return 0;
> 
> Which means the default of min_ret = 0 would result in:
> 
>         if (ret < 0)
>                 req_set_fail_links(req);
> 
> again...
> 
>>>> Sounds like 2/2 might too, does it?
>>>
>>> Do you think any application really expects to get a SIGPIPE
>>> when calling io_uring_enter()?
>>
>> If it was about what I think I would remove lots of old garbage :)
>> I doubt it wasn't working well before, e.g. because of iowq, but
>> who knows
> 
> Yes, it was inconsistent before and now it's reliable.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements
  2021-03-18  0:15         ` [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements Stefan Metzmacher
  2021-03-18 13:00           ` Pavel Begunkov
@ 2021-03-18 13:08           ` Pavel Begunkov
  1 sibling, 0 replies; 3+ messages in thread
From: Pavel Begunkov @ 2021-03-18 13:08 UTC (permalink / raw
  To: Stefan Metzmacher, Jens Axboe, io-uring; +Cc: netdev@vger.kernel.org

On 18/03/2021 00:15, Stefan Metzmacher wrote:
>>>> Sounds like 2/2 might too, does it?
>>>
>>> Do you think any application really expects to get a SIGPIPE
>>> when calling io_uring_enter()?
>>
>> If it was about what I think I would remove lots of old garbage :)
>> I doubt it wasn't working well before, e.g. because of iowq, but
>> who knows
> 
> Yes, it was inconsistent before and now it's reliable.

Yep, that where my hesitation was coming from, but the case I had
in mind is 

1) send() -> gone to io-wq
2) close the other end
3) send() fails, probably without SIGPIPE (because io-wq)
4) userspace retries send() and inline execution delivers SIGPIPE

But I guess we don't really care. In any case, let's drop stable tag,
maybe? I don't see a reason for it, considering that stable tries hard
to preserve ABI.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-03-18 13:13 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <cover.1615908477.git.metze@samba.org>
     [not found] ` <47ae1117-0de3-47a9-26a2-80f92e242426@kernel.dk>
     [not found]   ` <b2f00537-a8a3-9243-6990-d6708e7f7691@gmail.com>
     [not found]     ` <e15f23a2-4efc-c12a-9a4f-b4e3c347ae63@samba.org>
     [not found]       ` <c5b26bbe-7d95-ec86-5ddb-c2bd2b6c79a7@gmail.com>
2021-03-18  0:15         ` [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements Stefan Metzmacher
2021-03-18 13:00           ` Pavel Begunkov
2021-03-18 13:08           ` Pavel Begunkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).