From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752534AbbIODqR (ORCPT ); Mon, 14 Sep 2015 23:46:17 -0400
Received: from quartz.orcorp.ca ([184.70.90.242]:44228 "EHLO quartz.orcorp.ca" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751633AbbIODqO (ORCPT ); Mon, 14 Sep 2015 23:46:14 -0400
Date: Mon, 14 Sep 2015 21:45:49 -0600
From: Jason Gunthorpe
To: Parav Pandit
Cc: "Hefty, Sean" , Tejun Heo , Doug Ledford , "cgroups@vger.kernel.org" , "linux-doc@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "linux-rdma@vger.kernel.org" , "lizefan@huawei.com" , Johannes Weiner , Jonathan Corbet , "james.l.morris@oracle.com" , "serge@hallyn.com" , Haggai Eran , Or Gerlitz , Matan Barak , "raindel@mellanox.com" , "akpm@linux-foundation.org" , "linux-security-module@vger.kernel.org"
Subject: Re: [PATCH 0/7] devcg: device cgroup extension for rdma resource
Message-ID: <20150915034549.GA27847@obsidianresearch.com>
References: <55F25781.20308@redhat.com> <20150911145213.GQ8114@mtj.duckdns.org> <1828884A29C6694DAF28B7E6B8A82373A903A586@ORSMSX109.amr.corp.intel.com> <20150911194311.GA18755@obsidianresearch.com> <1828884A29C6694DAF28B7E6B8A82373A903A5DB@ORSMSX109.amr.corp.intel.com> <20150914172832.GA21652@obsidianresearch.com> <20150914201840.GA8764@obsidianresearch.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Sep 15, 2015 at 08:38:54AM +0530, Parav Pandit wrote:

> As you precisely described, about wild ratio,
> we are asking vendor driver (bottom most layer) to statically define
> what the resource pool is, without telling him which application are
> we going to run to use those pool.
> Therefore vendor layer cannot ever define "right" resource pool.

No, I'm saying the resource pool is *well defined* and *fixed* by each
hardware.

The only question is how do we expose the N resource limits, the list
of which is totally vendor specific.

Yes, using a % scheme fixes the ratios: 1% is going to be a certain
number of PDs, QPs, MRs, CQs, etc. at a ratio fixed by the driver
configuration. That is the trade-off for API simplicity.

Yes, this results in some resources being over-provisioned.

I have no idea if that is usable for the workloads people want to
run.

But *there is no middle option*. Either each and every
hardware-limited resource has a dedicated per-container limit, or
they are *somehow* bundled and the ratios become fixed.

If Tejun says we can't have something so ephemeral as a vendor
specific list of hardware resource pools - then what choice is left?

> Instead of bringing such complex solution, that affecting all the
> layers which solves the same problem as this patch,
> its better to keep definition of "bundle" in the user
> library/application deployment engine.
> where bundle is set of those resources.

The kernel has to do the restriction, so at some point you are telling
the kernel to limit each and every unique resource the HW has, which
is back to the original patch set. Munging how the data is passed
makes no difference to the basic objection, IMHO.

> rdma cgroup will allow us to run post 512 or 1024 containers without
> using PCIe SR-IOV, without creating any vendor specific resource
> pools.

If you ignore any vendor specific resource limits then you've just
left open a hole: a wayward container can exhaust all others - so what
was the point of doing all this work?

Jason
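
Illustration (not part of the original message): the trade-off of the
% scheme discussed above can be sketched in a few lines of Python. This
is a minimal sketch under stated assumptions. The device totals, the
resource names used as keys, and both helper functions are hypothetical
and only serve to show why a single percentage fixes the ratios between
resources and over-provisions whatever the workload does not need.

```python
# Hypothetical per-device totals. Real values are vendor and driver specific.
DEVICE_TOTALS = {"pd": 32_768, "qp": 262_144, "mr": 524_288, "cq": 65_536}


def limits_for_percent(percent, totals=DEVICE_TOTALS):
    """Return the per-resource limits a container gets for a given share."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Every resource is scaled by the same factor, so the ratios are fixed.
    return {name: total * percent // 100 for name, total in totals.items()}


def percent_needed(demand, totals=DEVICE_TOTALS):
    """Smallest whole percent that covers every resource in `demand`."""
    # Ceiling of 100 * need / total per resource, then the maximum of those.
    return max(-(-100 * need // totals[name]) for name, need in demand.items())


# A made-up workload that is heavy on QPs and light on everything else.
demand = {"pd": 16, "qp": 20_000, "mr": 1_000, "cq": 64}

share = percent_needed(demand)        # driven entirely by the QP demand
granted = limits_for_percent(share)   # what the container is allowed to use
surplus = {name: granted[name] - demand[name] for name in granted}

print(f"share: {share}%")
for name in granted:
    print(f"{name}: granted {granted[name]}, needed {demand[name]}, "
          f"surplus {surplus[name]}")
```

With these made-up numbers the QP demand alone determines the share,
and the container is granted far more MRs, PDs and CQs than it asked
for. A dedicated per-resource limit would avoid that surplus, at the
cost of exposing the vendor specific list of resources.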