Date: Mon, 14 Sep 2015 15:45:05 +0530
Subject: Re: [PATCH 0/7] devcg: device cgroup extension for rdma resource
From: Parav Pandit
To: "Hefty, Sean"
Cc: Tejun Heo, Doug Ledford, cgroups@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-rdma@vger.kernel.org, lizefan@huawei.com, Johannes Weiner,
    Jonathan Corbet, james.l.morris@oracle.com, serge@hallyn.com,
    Haggai Eran, Or Gerlitz, Matan Barak, raindel@mellanox.com,
    akpm@linux-foundation.org, linux-security-module@vger.kernel.org

On Sat, Sep 12, 2015 at 12:52 AM, Hefty, Sean wrote:
>> So, the existence of resource limitations is fine. That's what we
>> deal with all the time. The problem usually with this sort of
>> interfaces which expose implementation details to users directly is
>> that it severely limits engineering maneuvering space. You usually
>> want your users to express their intentions and a mechanism to
>> arbitrate resources to satisfy those intentions (and in a way more
>> graceful than "we can't, maybe try later?"); otherwise, implementing
>> any sort of high level resource distribution scheme becomes painful
>> and usually the only thing possible is preventing runaway disasters -
>> you don't wanna pin unused resource permanently if there actually is
>> contention around it, so usually all you can do with hard limits is
>> overcommitting limits so that it at least prevents disasters.
>
> I agree with Tejun that this proposal is at the wrong level of
> abstraction.
>
> If you look at just trying to limit QPs, it's not clear what that
> attempts to accomplish. Conceptually, a QP is little more than an
> addressable endpoint. It may or may not map to HW resources (for
> Intel NICs it does not). Even when HW resources do back the QP, the
> hardware is limited by how many QPs can realistically be active at
> any one time, based on how much caching is available in the NIC.
>

cgroups as it stands today provides effective controls for existing,
well-defined resources such as CPU cycles, user- and kernel-space
memory, TCP bytes, IOPS, etc. Similarly, the RDMA programming model
defines its own set of resources, which applications access directly
(a minimal verbs-level sketch follows the two points below). What we
are debating here is whether RDMA exposing hardware resources is
correct, and therefore whether a cgroup controller is needed. There
are two points here:

1. Whether the RDMA programming model, which operates on the
resources defined by the IB spec, is the right model in the first
place.
2. Assuming the programming model is fine (because we have an
actively maintained IB stack in the kernel and its user-space
components have been adopted in the OS), whether we need to control
those resources via cgroup.
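
To make the resource under discussion concrete: a QP is allocated by
the application directly through the verbs API. A minimal sketch,
assuming libibverbs is installed (link with -libverbs) and an RDMA
device is present; with a per-cgroup QP quota in place, exhaustion
would surface as an allocation failure here rather than as a hardware
error:

#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
	int num;
	struct ibv_device **devs = ibv_get_device_list(&num);

	if (!devs || num == 0) {
		fprintf(stderr, "no RDMA devices found\n");
		return 1;
	}

	struct ibv_context *ctx = ibv_open_device(devs[0]);
	if (!ctx) {
		fprintf(stderr, "cannot open device\n");
		return 1;
	}

	struct ibv_pd *pd = ibv_alloc_pd(ctx);
	struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);
	if (!pd || !cq) {
		fprintf(stderr, "cannot allocate PD/CQ\n");
		return 1;
	}

	struct ibv_qp_init_attr attr = {
		.send_cq = cq,
		.recv_cq = cq,
		.cap = {
			.max_send_wr = 16, .max_recv_wr = 16,
			.max_send_sge = 1, .max_recv_sge = 1,
		},
		.qp_type = IBV_QPT_RC,
	};

	/* This is the verb-level allocation a QP limit would gate. */
	struct ibv_qp *qp = ibv_create_qp(pd, &attr);
	if (!qp)
		perror("ibv_create_qp");	/* quota or device limit hit */
	else
		ibv_destroy_qp(qp);

	ibv_destroy_cq(cq);
	ibv_dealloc_pd(pd);
	ibv_close_device(ctx);
	ibv_free_device_list(devs);
	return 0;
}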
Tejun is trying to say that because point 1 doesn't seem to be the
right way to solve the problem, point 2 should not be done, or should
be done at a different level of abstraction.

More questions/comments in the Jason and Sean thread.

Sean,
Even though there is no one-to-one mapping from a verb QP to a
hardware QP, in order for the driver or a lower layer to map the
right verb QPs onto hardware QPs effectively, that vendor-specific
layer needs to know how each QP is going to be used. Otherwise, two
applications contending for QPs may not get the right number of
hardware QPs to use. A hypothetical sketch of that arbitration
follows at the end of this mail.

> Trying to limit the number of QPs that an app can allocate,
> therefore, just limits how much of the address space an app can use.
> There's no clear link between QP limits and HW resource limits,
> unless you assume a very specific underlying implementation.
>
> - Sean
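
A hypothetical illustration of the arbitration problem above. All
names here (HW_QP_CACHE_SLOTS, hw_qp_slot, vendor_bind_qp, the hint
parameter) are invented for this sketch and correspond to no real
driver; the point is only that, with a fixed number of cached
hardware QP contexts, the vendor layer cannot arbitrate between
contending consumers unless something (e.g. a cgroup) tells it how
each one intends to use its QPs:

#include <stdio.h>

#define HW_QP_CACHE_SLOTS 64	/* e.g. on-NIC QP context cache size */

struct hw_qp_slot {
	int in_use;
	int usage_hint;		/* importance hint from the consumer */
};

static struct hw_qp_slot hw_cache[HW_QP_CACHE_SLOTS];

/*
 * Bind a verb QP to a cached hardware slot. Returns the slot index,
 * or -1 meaning "keep this QP's state in host memory" (slower path).
 */
static int vendor_bind_qp(int usage_hint)
{
	int victim = 0;

	for (int i = 0; i < HW_QP_CACHE_SLOTS; i++) {
		if (!hw_cache[i].in_use) {
			hw_cache[i].in_use = 1;
			hw_cache[i].usage_hint = usage_hint;
			return i;
		}
		/* track the least important occupant as eviction victim */
		if (hw_cache[i].usage_hint < hw_cache[victim].usage_hint)
			victim = i;
	}

	/*
	 * Cache full. With per-consumer hints we can evict a less
	 * important QP; without them every consumer looks identical
	 * and all we can do is refuse.
	 */
	if (usage_hint > hw_cache[victim].usage_hint) {
		hw_cache[victim].usage_hint = usage_hint;
		return victim;
	}
	return -1;
}

int main(void)
{
	/* two contenders, one latency-critical and one background */
	printf("critical consumer got slot %d\n", vendor_bind_qp(10));
	printf("background consumer got slot %d\n", vendor_bind_qp(1));
	return 0;
}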