From: Haggai Eran <haggaie@mellanox.com> To: Parav Pandit <pandit.parav@gmail.com> Cc: <cgroups@vger.kernel.org>, <linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <linux-rdma@vger.kernel.org>, <tj@kernel.org>, <lizefan@huawei.com>, Johannes Weiner <hannes@cmpxchg.org>, Doug Ledford <dledford@redhat.com>, Jonathan Corbet <corbet@lwn.net>, <james.l.morris@oracle.com>, <serge@hallyn.com>, Or Gerlitz <ogerlitz@mellanox.com>, Matan Barak <matanb@mellanox.com>, <raindel@mellanox.com>, <akpm@linux-foundation.org>, <linux-security-module@vger.kernel.org> Subject: Re: [PATCH 5/7] devcg: device cgroup's extension for RDMA resource. Date: Tue, 8 Sep 2015 16:50:11 +0300 [thread overview] Message-ID: <55EEE793.9020105@mellanox.com> (raw) In-Reply-To: <CAG53R5V9ydty2--amXFFDiDd01gGYE0Oh5jV0OhysQmfhWHDPQ@mail.gmail.com> On 08/09/2015 13:18, Parav Pandit wrote: >> > >>> >> + * RDMA resource limits are hierarchical, so the highest configured limit of >>> >> + * the hierarchy is enforced. Allowing resource limit configuration to default >>> >> + * cgroup allows fair share to kernel space ULPs as well. >> > In what way is the highest configured limit of the hierarchy enforced? I >> > would expect all the limits along the hierarchy to be enforced. >> > > In hierarchy, of say 3 cgroups, the smallest limit of the cgroup is applied. > > Lets take example to clarify. > Say cg_A, cg_B, cg_C > Role name limit > Parent cg_A 100 > Child_level1 cg_B (child of cg_A) 20 > Child_level2: cg_C (child of cg_B) 50 > > If the process allocating rdma resource belongs to cg_C, limit lowest > limit in the hierarchy is applied during charge() stage. > If cg_A limit happens to be 10, since 10 is lowest, its limit would be > applicable as you expected. Looking at the code, the usage in every level is charged. This is what I would expect. I just think the comment is a bit misleading. >>> +int devcgroup_rdma_get_max_resource(struct seq_file *sf, void *v) >>> +{ >>> + struct dev_cgroup *dev_cg = css_to_devcgroup(seq_css(sf)); >>> + int type = seq_cft(sf)->private; >>> + u32 usage; >>> + >>> + if (dev_cg->rdma.tracker[type].limit == DEVCG_RDMA_MAX_RESOURCES) { >>> + seq_printf(sf, "%s\n", DEVCG_RDMA_MAX_RESOURCE_STR); >> I'm not sure hiding the actual number is good, especially in the >> show_usage case. > > This is similar to following other controller same as newly added PID > subsystem in showing max limit. Okay. >>> +void devcgroup_rdma_uncharge_resource(struct ib_ucontext *ucontext, >>> + enum devcgroup_rdma_rt type, int num) >>> +{ >>> + struct dev_cgroup *dev_cg, *p; >>> + struct task_struct *ctx_task; >>> + >>> + if (!num) >>> + return; >>> + >>> + /* get cgroup of ib_ucontext it belong to, to uncharge >>> + * so that when its called from any worker tasks or any >>> + * other tasks to which this resource doesn't belong to, >>> + * it can be uncharged correctly. >>> + */ >>> + if (ucontext) >>> + ctx_task = get_pid_task(ucontext->tgid, PIDTYPE_PID); >>> + else >>> + ctx_task = current; >>> + dev_cg = task_devcgroup(ctx_task); >>> + >>> + spin_lock(&ctx_task->rdma_res_counter->lock); >> Don't you need an rcu read lock and rcu_dereference to access >> rdma_res_counter? > > I believe, its not required because when uncharge() is happening, it > can happen only from 3 contexts. > (a) from the caller task context, who has made allocation call, so no > synchronizing needed. > (b) from the dealloc resource context, again this is from the same > task context which allocated, it so this is single threaded, no need > to syncronize. I don't think it is true. You can access uverbs from multiple threads. What may help your case here I think is the fact that only when the last ucontext is released you can change the rdma_res_counter field, and ucontext release takes the ib_uverbs_file->mutex. Still, I think it would be best to use rcu_dereference(), if only for documentation and sparse. > (c) from the fput() context when process is terminated abruptly or as > part of differed cleanup, when this is happening there cannot be > allocator task anyway.
WARNING: multiple messages have this Message-ID (diff)
From: Haggai Eran <haggaie@mellanox.com> To: Parav Pandit <pandit.parav@gmail.com> Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, tj@kernel.org, lizefan@huawei.com, Johannes Weiner <hannes@cmpxchg.org>, Doug Ledford <dledford@redhat.com>, Jonathan Corbet <corbet@lwn.net>, james.l.morris@oracle.com, serge@hallyn.com, Or Gerlitz <ogerlitz@mellanox.com>, Matan Barak <matanb@mellanox.com>, raindel@mellanox.com, akpm@linux-foundation.org, linux-security-module@vger.kernel.org Subject: Re: [PATCH 5/7] devcg: device cgroup's extension for RDMA resource. Date: Tue, 8 Sep 2015 16:50:11 +0300 [thread overview] Message-ID: <55EEE793.9020105@mellanox.com> (raw) In-Reply-To: <CAG53R5V9ydty2--amXFFDiDd01gGYE0Oh5jV0OhysQmfhWHDPQ@mail.gmail.com> On 08/09/2015 13:18, Parav Pandit wrote: >> > >>> >> + * RDMA resource limits are hierarchical, so the highest configured limit of >>> >> + * the hierarchy is enforced. Allowing resource limit configuration to default >>> >> + * cgroup allows fair share to kernel space ULPs as well. >> > In what way is the highest configured limit of the hierarchy enforced? I >> > would expect all the limits along the hierarchy to be enforced. >> > > In hierarchy, of say 3 cgroups, the smallest limit of the cgroup is applied. > > Lets take example to clarify. > Say cg_A, cg_B, cg_C > Role name limit > Parent cg_A 100 > Child_level1 cg_B (child of cg_A) 20 > Child_level2: cg_C (child of cg_B) 50 > > If the process allocating rdma resource belongs to cg_C, limit lowest > limit in the hierarchy is applied during charge() stage. > If cg_A limit happens to be 10, since 10 is lowest, its limit would be > applicable as you expected. Looking at the code, the usage in every level is charged. This is what I would expect. I just think the comment is a bit misleading. >>> +int devcgroup_rdma_get_max_resource(struct seq_file *sf, void *v) >>> +{ >>> + struct dev_cgroup *dev_cg = css_to_devcgroup(seq_css(sf)); >>> + int type = seq_cft(sf)->private; >>> + u32 usage; >>> + >>> + if (dev_cg->rdma.tracker[type].limit == DEVCG_RDMA_MAX_RESOURCES) { >>> + seq_printf(sf, "%s\n", DEVCG_RDMA_MAX_RESOURCE_STR); >> I'm not sure hiding the actual number is good, especially in the >> show_usage case. > > This is similar to following other controller same as newly added PID > subsystem in showing max limit. Okay. >>> +void devcgroup_rdma_uncharge_resource(struct ib_ucontext *ucontext, >>> + enum devcgroup_rdma_rt type, int num) >>> +{ >>> + struct dev_cgroup *dev_cg, *p; >>> + struct task_struct *ctx_task; >>> + >>> + if (!num) >>> + return; >>> + >>> + /* get cgroup of ib_ucontext it belong to, to uncharge >>> + * so that when its called from any worker tasks or any >>> + * other tasks to which this resource doesn't belong to, >>> + * it can be uncharged correctly. >>> + */ >>> + if (ucontext) >>> + ctx_task = get_pid_task(ucontext->tgid, PIDTYPE_PID); >>> + else >>> + ctx_task = current; >>> + dev_cg = task_devcgroup(ctx_task); >>> + >>> + spin_lock(&ctx_task->rdma_res_counter->lock); >> Don't you need an rcu read lock and rcu_dereference to access >> rdma_res_counter? > > I believe, its not required because when uncharge() is happening, it > can happen only from 3 contexts. > (a) from the caller task context, who has made allocation call, so no > synchronizing needed. > (b) from the dealloc resource context, again this is from the same > task context which allocated, it so this is single threaded, no need > to syncronize. I don't think it is true. You can access uverbs from multiple threads. What may help your case here I think is the fact that only when the last ucontext is released you can change the rdma_res_counter field, and ucontext release takes the ib_uverbs_file->mutex. Still, I think it would be best to use rcu_dereference(), if only for documentation and sparse. > (c) from the fput() context when process is terminated abruptly or as > part of differed cleanup, when this is happening there cannot be > allocator task anyway.
next prev parent reply other threads:[~2015-09-08 13:50 UTC|newest] Thread overview: 95+ messages / expand[flat|nested] mbox.gz Atom feed top 2015-09-07 20:38 [PATCH 0/7] devcg: device cgroup extension for rdma resource Parav Pandit 2015-09-07 20:38 ` Parav Pandit 2015-09-07 20:38 ` [PATCH 1/7] devcg: Added user option to rdma resource tracking Parav Pandit 2015-09-07 20:38 ` Parav Pandit 2015-09-07 20:38 ` [PATCH 2/7] devcg: Added rdma resource tracking module Parav Pandit 2015-09-07 20:38 ` Parav Pandit 2015-09-07 20:38 ` [PATCH 3/7] devcg: Added infrastructure for rdma device cgroup Parav Pandit 2015-09-08 5:31 ` Haggai Eran 2015-09-08 5:31 ` Haggai Eran 2015-09-08 7:02 ` Parav Pandit 2015-09-08 7:02 ` Parav Pandit 2015-09-07 20:38 ` [PATCH 4/7] devcg: Added rdma resource tracker object per task Parav Pandit 2015-09-08 5:48 ` Haggai Eran 2015-09-08 5:48 ` Haggai Eran 2015-09-08 7:04 ` Parav Pandit 2015-09-08 8:24 ` Haggai Eran 2015-09-08 8:24 ` Haggai Eran 2015-09-08 8:26 ` Parav Pandit 2015-09-07 20:38 ` [PATCH 5/7] devcg: device cgroup's extension for RDMA resource Parav Pandit 2015-09-07 20:38 ` Parav Pandit 2015-09-08 8:22 ` Haggai Eran 2015-09-08 8:22 ` Haggai Eran 2015-09-08 10:18 ` Parav Pandit 2015-09-08 13:50 ` Haggai Eran [this message] 2015-09-08 13:50 ` Haggai Eran 2015-09-08 14:13 ` Parav Pandit 2015-09-08 8:36 ` Haggai Eran 2015-09-08 8:36 ` Haggai Eran 2015-09-08 10:50 ` Parav Pandit 2015-09-08 10:50 ` Parav Pandit 2015-09-08 14:10 ` Haggai Eran 2015-09-08 14:10 ` Haggai Eran 2015-09-07 20:38 ` [PATCH 6/7] devcg: Added support to use RDMA device cgroup Parav Pandit 2015-09-08 8:40 ` Haggai Eran 2015-09-08 8:40 ` Haggai Eran 2015-09-08 10:22 ` Parav Pandit 2015-09-08 13:40 ` Haggai Eran 2015-09-08 13:40 ` Haggai Eran 2015-09-07 20:38 ` [PATCH 7/7] devcg: Added Documentation of " Parav Pandit 2015-09-07 20:38 ` Parav Pandit 2015-09-07 20:55 ` [PATCH 0/7] devcg: device cgroup extension for rdma resource Parav Pandit 2015-09-08 12:45 ` Haggai Eran 2015-09-08 12:45 ` Haggai Eran 2015-09-08 15:23 ` Tejun Heo 2015-09-08 15:23 ` Tejun Heo 2015-09-09 3:57 ` Parav Pandit 2015-09-10 16:49 ` Tejun Heo 2015-09-10 17:46 ` Parav Pandit 2015-09-10 17:46 ` Parav Pandit 2015-09-10 20:22 ` Tejun Heo 2015-09-11 3:39 ` Parav Pandit 2015-09-11 4:04 ` Tejun Heo 2015-09-11 4:04 ` Tejun Heo 2015-09-11 4:24 ` Doug Ledford 2015-09-11 4:24 ` Doug Ledford 2015-09-11 14:52 ` Tejun Heo 2015-09-11 14:52 ` Tejun Heo 2015-09-11 16:26 ` Parav Pandit 2015-09-11 16:34 ` Tejun Heo 2015-09-11 16:34 ` Tejun Heo 2015-09-11 16:39 ` Parav Pandit 2015-09-11 16:39 ` Parav Pandit 2015-09-11 19:25 ` Tejun Heo 2015-09-14 10:18 ` Parav Pandit 2015-09-14 10:18 ` Parav Pandit 2015-09-11 16:47 ` Parav Pandit 2015-09-11 16:47 ` Parav Pandit 2015-09-11 19:05 ` Tejun Heo 2015-09-11 19:05 ` Tejun Heo 2015-09-11 19:22 ` Hefty, Sean 2015-09-11 19:43 ` Jason Gunthorpe 2015-09-11 19:43 ` Jason Gunthorpe 2015-09-11 20:06 ` Hefty, Sean 2015-09-14 11:09 ` Parav Pandit 2015-09-14 14:04 ` Parav Pandit 2015-09-14 15:21 ` Tejun Heo 2015-09-14 15:21 ` Tejun Heo 2015-09-14 17:28 ` Jason Gunthorpe 2015-09-14 17:28 ` Jason Gunthorpe 2015-09-14 18:54 ` Parav Pandit 2015-09-14 18:54 ` Parav Pandit 2015-09-14 20:18 ` Jason Gunthorpe 2015-09-15 3:08 ` Parav Pandit 2015-09-15 3:45 ` Jason Gunthorpe 2015-09-15 3:45 ` Jason Gunthorpe 2015-09-16 4:41 ` Parav Pandit 2015-09-16 4:41 ` Parav Pandit 2015-09-20 10:35 ` Haggai Eran 2015-09-20 10:35 ` Haggai Eran 2015-10-28 8:14 ` Parav Pandit 2015-10-28 8:14 ` Parav Pandit 2015-09-14 10:15 ` Parav Pandit 2015-09-11 4:43 ` Parav Pandit 2015-09-11 15:03 ` Tejun Heo 2015-09-10 17:48 ` Hefty, Sean
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=55EEE793.9020105@mellanox.com \ --to=haggaie@mellanox.com \ --cc=akpm@linux-foundation.org \ --cc=cgroups@vger.kernel.org \ --cc=corbet@lwn.net \ --cc=dledford@redhat.com \ --cc=hannes@cmpxchg.org \ --cc=james.l.morris@oracle.com \ --cc=linux-doc@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-rdma@vger.kernel.org \ --cc=linux-security-module@vger.kernel.org \ --cc=lizefan@huawei.com \ --cc=matanb@mellanox.com \ --cc=ogerlitz@mellanox.com \ --cc=pandit.parav@gmail.com \ --cc=raindel@mellanox.com \ --cc=serge@hallyn.com \ --cc=tj@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.