From: Parav Pandit
To: Jason Gunthorpe
Cc: "Hefty, Sean", Tejun Heo, Doug Ledford,
    "cgroups@vger.kernel.org", "linux-doc@vger.kernel.org",
    "linux-kernel@vger.kernel.org", "linux-rdma@vger.kernel.org",
    "lizefan@huawei.com", Johannes Weiner, Jonathan Corbet,
    "james.l.morris@oracle.com", "serge@hallyn.com", Haggai Eran,
    Or Gerlitz, Matan Barak, "raindel@mellanox.com",
    "akpm@linux-foundation.org", "linux-security-module@vger.kernel.org"
Date: Tue, 15 Sep 2015 08:38:54 +0530
Subject: Re: [PATCH 0/7] devcg: device cgroup extension for rdma resource
In-Reply-To: <20150914201840.GA8764@obsidianresearch.com>
References: <20150911040413.GA18850@htj.duckdns.org>
    <55F25781.20308@redhat.com>
    <20150911145213.GQ8114@mtj.duckdns.org>
    <1828884A29C6694DAF28B7E6B8A82373A903A586@ORSMSX109.amr.corp.intel.com>
    <20150911194311.GA18755@obsidianresearch.com>
    <1828884A29C6694DAF28B7E6B8A82373A903A5DB@ORSMSX109.amr.corp.intel.com>
    <20150914172832.GA21652@obsidianresearch.com>
    <20150914201840.GA8764@obsidianresearch.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> Because actual hardware resources *ARE* the limit. We cannot abstract
> it away. The hardware/driver has real, fixed, immutable limits. No API
> abstraction can possibly change that.
>
> The limits are such there *IS NO* API boundary that can bundle them
> into something simpler.
> There will always be apps that require wildly
> different ratios of the basic verbs resources (PD/QP/CQ/AH/MR)
>
> Either we control each and every vendor's limited resource directly
> (which is where you started), or we just roll them up into a 'all
> resource' bundle and control them indirectly. There just isn't a
> mythical third 'better API' choice with the hardware we have today.
>

As you precisely described with the wild ratios, we are asking the
vendor driver (the bottom-most layer) to statically define what the
resource pool is, without telling it which applications are going to
run and use that pool. Therefore the vendor layer can never define the
"right" resource pool.

If we try to fix this by defining the "right" resource pool, we will
have to come up with an API to modify/tune individual elements of the
pool. Once we bring in that complexity, it becomes what is proposed in
this patchset.

Instead of bringing in such a complex solution, which affects all the
layers and solves the same problem as this patchset, it's better to keep
the definition of a "bundle" in the user library/application deployment
engine, where a bundle is a set of those resources.

Maybe instead of having individual files for each resource at the user
interface level, we can have an rdma.bundle file. This bundle cgroup
file defines these resources, such as
"ah 100 mr 100 qp 10"

> So? I don't think it is really important to have an exact, precise,
> limit. The HW pools are pretty big, unless you plan to run tens of
> thousands of containers each with tiny RDMA limits, it is fine to talk
> in broader terms (ie 10% of all HW limited resource) which is totally
> adequate to hard-prevent run away or exhaustion scenarios.
>

The rdma cgroup will allow us to run upwards of 512 or 1024 containers
without using PCIe SR-IOV and without creating any vendor-specific
resource pools.

> Jason
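
For illustration, a bundle line like the "ah 100 mr 100 qp 10" example
above could be handled in a deployment tool roughly as sketched below.
This is only a sketch under assumptions: rdma.bundle is a suggestion in
this thread and not an existing interface, and the function names and
the accepted resource names (taken from the PD/QP/CQ/AH/MR list quoted
earlier) are hypothetical.

```python
# Hypothetical parser for the suggested "rdma.bundle" line format:
# whitespace-separated "<resource> <limit>" pairs on a single line.
# Nothing here is an existing kernel or library interface.

# Verbs resources named in the thread (PD/QP/CQ/AH/MR); assumed set.
KNOWN_RESOURCES = {"pd", "qp", "cq", "ah", "mr"}


def parse_bundle(line):
    """Parse a line such as 'ah 100 mr 100 qp 10' into a dict of limits."""
    tokens = line.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected '<resource> <limit>' pairs")
    bundle = {}
    for name, value in zip(tokens[0::2], tokens[1::2]):
        if name not in KNOWN_RESOURCES:
            raise ValueError(f"unknown resource: {name}")
        if name in bundle:
            raise ValueError(f"duplicate resource: {name}")
        limit = int(value)  # raises ValueError on non-numeric input
        if limit < 0:
            raise ValueError(f"negative limit for {name}")
        bundle[name] = limit
    return bundle


def format_bundle(bundle):
    """Render a bundle dict back into the single-line form, sorted by name."""
    return " ".join(f"{name} {limit}" for name, limit in sorted(bundle.items()))


if __name__ == "__main__":
    print(parse_bundle("ah 100 mr 100 qp 10"))
```

Keeping the whole bundle on one line means a deployment engine can
validate and apply the set of limits in a single write, rather than
updating one file per resource.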