From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752324AbbBRDPz (ORCPT ); Tue, 17 Feb 2015 22:15:55 -0500
Received: from prod-mail-xrelay02.akamai.com ([72.246.2.14]:34484 "EHLO prod-mail-xrelay02.akamai.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751847AbbBRDPx (ORCPT ); Tue, 17 Feb 2015 22:15:53 -0500
Message-ID: <54E403E7.2060209@akamai.com>
Date: Tue, 17 Feb 2015 22:15:51 -0500
From: Jason Baron
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.2
MIME-Version: 1.0
To: Andy Lutomirski
CC: Peter Zijlstra , Ingo Molnar , Al Viro , Andrew Morton , Eric Wong , Davide Libenzi , Michael Kerrisk-manpages , "linux-kernel@vger.kernel.org" , Linux FS Devel , Linux API , Linus Torvalds , Mathieu Desnoyers , edumazet@google.com
Subject: Re: [PATCH v2 0/2] Add epoll round robin wakeup mode
References: <54E3A591.2050806@akamai.com>
In-Reply-To:
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/17/2015 04:09 PM, Andy Lutomirski wrote:
> On Tue, Feb 17, 2015 at 12:33 PM, Jason Baron wrote:
>> On 02/17/2015 02:46 PM, Andy Lutomirski wrote:
>>> On Tue, Feb 17, 2015 at 11:33 AM, Jason Baron wrote:
>>>> When we are sharing a wakeup source among multiple epoll fds, we end up with
>>>> thundering herd wakeups, since there is currently no way to add to the
>>>> wakeup source exclusively. This series introduces 2 new epoll flags,
>>>> EPOLLEXCLUSIVE for adding to a wakeup source exclusively. And EPOLLROUNDROBIN
>>>> which is to be used in conjunction to EPOLLEXCLUSIVE to evenly
>>>> distribute the wakeups. This patch was originally motivated by a desire to
>>>> improve wakeup balance and cpu usage for a listen socket() shared amongst
>>>> multiple epoll fd sets.
>>>>
>>>> See: http://lwn.net/Articles/632590/ for previous test program and testing
>>>> results.
>>>>
>>>> Epoll manpage text:
>>>>
>>>> EPOLLEXCLUSIVE
>>>> Provides exclusive wakeups when attaching multiple epoll fds to a
>>>> shared wakeup source. Must be specified with an EPOLL_CTL_ADD operation.
>>>>
>>>> EPOLLROUNDROBIN
>>>> Provides balancing for exclusive wakeups when attaching multiple epoll
>>>> fds to a shared wakeup source. Depends on EPOLLEXCLUSIVE being set and
>>>> must be specified with an EPOLL_CTL_ADD operation.
>>>>
>>>> Thanks,
>>> What permissions do you need on the file descriptor to do this? This
>>> will be the first case where a poll-like operation has side effects,
>>> and that's rather weird IMO.
>>>
>> So in the case where you have both non-exclusive and exclusive
>> waiters, all of the non-exclusive waiters will continue to get woken
>> up. However, I think you're getting at having multiple exclusive
>> waiters and potentially 'starving' out other exclusive waiters.
>>
>> In general, I think wait queues are associated with a 'struct file',
>> so I think unless you are sharing your fd table, this isn't an issue.
>> However, there may be cases where this is not true? In which
>> case, perhaps, we could limit this to CAP_SYS_ADMIN...
> There's also SCM_RIGHTS, which can be used in conjunction with file
> sealing and such.
>
> In general, I feel like this patch series solves a problem that isn't
> well understood and does it by adding a rather strange new mechanism.
> Is there really a problem that can't be addressed by more normal epoll
> features?
>
> --Andy

Hmm... so I dug through some of the Linux archives a bit, and this problem
seems to crop up every so often without resolution. So I do believe that
it's an issue that people are more generally interested in.
See:

http://lkml.iu.edu/hypermail/linux/kernel/1201.1/02620.html
http://marc.info/?l=linux-kernel&m=128638781921073&w=2

In the latter thread, Linus suggests adding it to the "requested events"
field to poll:

http://marc.info/?l=linux-kernel&m=128639416832335&w=2

So, I think that this series at least moves in that suggested direction.

Thanks,

-Jason
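
The thundering herd described in this thread can be observed from userspace
with a rough sketch like the one below. It is not the test program from the
LWN article and not part of the patch series. It assumes a Linux kernel and a
Python build that expose an exclusive-wakeup flag as select.EPOLLEXCLUSIVE
(falling back to a plain, non-exclusive registration when that attribute is
missing). The helper name count_wakeups and its parameters are made up for
illustration. EPOLLROUNDROBIN is only a proposal in this series, so it is not
used here.

```python
import select
import socket
import threading
import time

# Assumption: select.EPOLLEXCLUSIVE exists on this platform. If it does not,
# fall back to 0, which degrades to an ordinary non-exclusive registration.
EPOLLEXCLUSIVE = getattr(select, "EPOLLEXCLUSIVE", 0)


def count_wakeups(num_waiters=4, exclusive=True, timeout=0.5):
    """Share one listen socket among several epoll fds and count how many
    blocked waiters are woken by a single incoming connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(16)

    flags = select.EPOLLIN | (EPOLLEXCLUSIVE if exclusive else 0)

    # One epoll fd per waiter, all attached to the same wakeup source.
    epfds = []
    for _ in range(num_waiters):
        ep = select.epoll()
        # register() issues EPOLL_CTL_ADD, the only operation for which the
        # manpage text above allows the exclusive flag.
        ep.register(listener.fileno(), flags)
        epfds.append(ep)

    woken = []

    def waiter(ep):
        # poll() returns an empty list on timeout, i.e. "not woken".
        if ep.poll(timeout):
            woken.append(ep.fileno())

    threads = [threading.Thread(target=waiter, args=(ep,)) for ep in epfds]
    for t in threads:
        t.start()
    time.sleep(0.2)  # give every waiter time to block in epoll_wait()

    # A single connection is the single event on the shared wakeup source.
    client = socket.create_connection(listener.getsockname())
    for t in threads:
        t.join()

    client.close()
    for ep in epfds:
        ep.close()
    listener.close()
    return len(woken)


if __name__ == "__main__":
    print("non-exclusive wakeups:", count_wakeups(exclusive=False))
    print("exclusive wakeups:    ", count_wakeups(exclusive=True))
```

Without the flag, every waiter should be woken for the one connection. With
it, the kernel may wake as few as one, but the exact count is up to the
kernel, so treat the number as an observation rather than a guarantee.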