From: Parav Pandit <pandit.parav@gmail.com>
To: "Hefty, Sean" <sean.hefty@intel.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>,
	Tejun Heo <tj@kernel.org>, Doug Ledford <dledford@redhat.com>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"lizefan@huawei.com" <lizefan@huawei.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>,
	"james.l.morris@oracle.com" <james.l.morris@oracle.com>,
	"serge@hallyn.com" <serge@hallyn.com>,
	Haggai Eran <haggaie@mellanox.com>,
	Or Gerlitz <ogerlitz@mellanox.com>,
	Matan Barak <matanb@mellanox.com>,
	"raindel@mellanox.com" <raindel@mellanox.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-security-module@vger.kernel.org" 
	<linux-security-module@vger.kernel.org>
Subject: Re: [PATCH 0/7] devcg: device cgroup extension for rdma resource
Date: Mon, 14 Sep 2015 19:34:09 +0530	[thread overview]
Message-ID: <CAG53R5U7sYnR2w+Wrhh58Ud1HOrKLDCYxZZgK58FyAkJ8exshw@mail.gmail.com> (raw)
In-Reply-To: <CAG53R5XsMwnLK7L4q1mQx3_wEJNv1qthOr5TsX0o43kRWaiWrg@mail.gmail.com>

Hi Tejun,

I forgot to acknowledge your point that we need both - a hard limit and
a soft limit/weight. The current patchset is based only on hard limits.
I see weight as another helpful layer in the chain that we can
implement incrementally after this, which keeps review and debugging
manageable.
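To make the distinction concrete, here is a rough sketch of the two semantics being discussed. This is illustration only - the function names, weights, and numbers are made up and do not correspond to any real cgroup interface:

```python
def allowed(requested, hard_limit):
    # Hard limit: the cgroup can never exceed its cap, even when the
    # device still has free resources.
    return min(requested, hard_limit)

def weighted_share(total, weights, cg):
    # Soft limit / weight: under contention, each cgroup receives a
    # share of the total proportional to its weight (memcg/io style).
    return total * weights[cg] // sum(weights.values())

# A hard cap of 20 QPs bounds a request for 50 down to 20.
print(allowed(50, 20))  # 20
# With weights {"A": 100, "B": 300} over a pool of 100 QPs,
# A is entitled to 25 and B to 75 under contention.
print(weighted_share(100, {"A": 100, "B": 300}, "A"))  # 25
```

The hard limit alone (this patchset) prevents hoarding; the weight layer would additionally arbitrate fair use of the remaining pool.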

Parav



On Mon, Sep 14, 2015 at 4:39 PM, Parav Pandit <pandit.parav@gmail.com> wrote:
> On Sat, Sep 12, 2015 at 1:36 AM, Hefty, Sean <sean.hefty@intel.com> wrote:
>>> > Trying to limit the number of QPs that an app can allocate,
>>> > therefore, just limits how much of the address space an app can use.
>>> > There's no clear link between QP limits and HW resource limits,
>>> > unless you assume a very specific underlying implementation.
>>>
>>> Isn't that the point though? We have several vendors with hardware
>>> that does impose hard limits on specific resources. There is no way to
>>> avoid that, and ultimately, those exact HW resources need to be
>>> limited.
>>
>> My point is that limiting the number of QPs that an app can allocate doesn't necessarily mean anything.  Is allocating 1000 QPs with 1 entry each better or worse than 1 QP with 10,000 entries?  Who knows?
>
> I think it does mean something: for RDMA RC QPs, it determines
> whether you can talk to 1000 nodes or to 1 node in the network.
> When we deploy an MPI application, it knows its rank, we know the
> cluster size of the deployment, and resource allocation can be done
> based on that.
> If you meant it from a performance point of view, then resource
> count is possibly not the right measure.
>
> Just because we have not defined such a performance interface in
> this patch set doesn't mean that we won't do it.
> I could easily see number_of_messages/sec as one interface to be
> added in the future.
> But that alone won't stop a hoarding process from taking away all
> the QPs, just as we needed the PID controller.
>
> Now, when it comes to an implementation such as Intel's (uSNIC), the
> driver layer could know (via new APIs in the future) whether 10 or
> 100 user QPs should map to a few hw-QPs or to more hw-QPs,
> so that the hw-QPs exposed to one cgroup are isolated from the
> hw-QPs exposed to another cgroup.
> If a hardware implementation doesn't require isolation, it can just
> continue allocating from a single pool; it is left to the vendor
> implementation how to use this information (this API is not present
> in the patch).
>
> So the cgroup also provides a control point for the vendor layer to
> tune internal resource allocation based on the provided metrics,
> which cannot be done by providing only "memory usage by RDMA
> structures".
>
> Comparing this with other cgroup knobs: a low-level individual knob
> by itself doesn't serve any meaningful purpose either.
> Defining how much CPU or how much memory to use does not by itself
> define application performance.
> I am not sure the io controller can achieve 10 million IOPS when
> given a single CPU and 64KB of memory.
> All the knobs need to be set the right way to reach the desired
> number.
>
> Along the same lines, RDMA resource knobs individually are not a
> definition of performance; each is just another knob.
>
>>
>>> If we want to talk about abstraction, then I'd suggest something very
>>> general and simple - two limits:
>>>  '% of the RDMA hardware resource pool' (per device or per ep?)
>>>  'bytes of kernel memory for RDMA structures' (all devices)
>>
>> Yes - this makes more sense to me.
>>
>
> Sean, Jason,
> Help me to understand this scheme.
>
> 1. How is a % of a resource different from an absolute number? In
> the rest of the cgroup subsystems we define absolute numbers in most
> places, to my knowledge.
> Such as (a) number of TCP bytes, (b) IOPS of a block device, (c) CPU
> cycles, etc.
> 20% of QPs = 20 QPs when the hw has 100 QPs.
> I prefer to keep the resource scheme consistent with other resource
> control points - i.e. an absolute number.
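The equivalence above can be sketched in a couple of lines - a percentage limit is just an absolute limit divided through by the device total, and its meaning shifts with the device (numbers here are illustrative only):

```python
def pct_to_absolute(hw_total, pct):
    # A percentage limit carries the same information as an absolute
    # one, but its effective value changes with the device total.
    return hw_total * pct // 100

print(pct_to_absolute(100, 20))  # 20 QPs on a device with 100 QPs
print(pct_to_absolute(40, 20))   # only 8 QPs on a device with 40 QPs
```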
>
> 2. Bytes of kernel memory for RDMA structures:
> One vendor's QP might consume X bytes and another's Y bytes. How
> does the application know how much memory to ask for?
> An application can allocate 100 QPs each 1 entry deep, or 1 QP 100
> entries deep, as in Sean's example.
> Both might consume almost the same memory.
> An application allocating 100 QPs, while still within the cgroup's
> memory limit, leaves other applications without any QP.
> I don't see the point of a memory-footprint-based scheme, as memory
> limits are already well addressed by the smarter memory controller
> anyway.
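The contrast can be sketched with made-up numbers (QP_FIXED and WQE_SIZE are hypothetical, not any vendor's real costs): two allocations with similar byte footprints consume wildly different numbers of hw QP slots, which is why a bytes-only limit cannot prevent QP exhaustion.

```python
# Hypothetical sizes, for illustration only; real per-QP and per-WQE
# costs vary by vendor.
QP_FIXED = 64   # fixed bookkeeping bytes per QP
WQE_SIZE = 64   # bytes per queue entry

def qp_memory(num_qps, depth):
    return num_qps * (QP_FIXED + depth * WQE_SIZE)

many_shallow = qp_memory(100, 1)  # 100 QPs, 1 entry each
one_deep = qp_memory(1, 100)      # 1 QP, 100 entries

# The entry memory is identical (100 * WQE_SIZE in both cases); the
# totals differ only by per-QP overhead, yet the first case consumes
# 100 hw QP slots and the second only 1.
print(many_shallow, one_deep)
```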
>
> I do agree with Tejun and Sean on the point that the abstraction
> level for using RDMA has to be different; that is why libfabric and
> other interfaces are emerging, which will take their own time to
> stabilize and be integrated.
>
> As long as the pure IB-style RDMA programming model - based on RDMA
> resources - is what exists, I think the control point also has to be
> on those resources.
> Once a stable abstraction level is on the table (possibly across
> fabrics, not just RDMA), then the right resource controller can be
> implemented.
> Even when an RDMA abstraction layer arrives, as Jason mentioned, it
> would in the end consume some hw resources anyway, and those need to
> be controlled too.
>
> Jason,
> If the hardware vendor defines the resource pool without saying
> whether the resource is a QP or an MR, how would a
> management/control point decide what should be controlled to what
> limit?
> We would need an additional user-space library component to decode
> it, and after that it would need to be abstracted out as QPs or MRs
> so the application layer can deal with it in a vendor-agnostic way.
> And then it would look similar to what is being proposed here?
