From: Parav Pandit
To: Haggai Eran
Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, tj@kernel.org, lizefan@huawei.com, Johannes Weiner, Doug Ledford, Jonathan Corbet, james.l.morris@oracle.com, serge@hallyn.com, Or Gerlitz, Matan Barak, raindel@mellanox.com, akpm@linux-foundation.org, linux-security-module@vger.kernel.org
Date: Tue, 8 Sep 2015 19:43:04 +0530
Subject: Re: [PATCH 5/7] devcg: device cgroup's extension for RDMA resource.
In-Reply-To: <55EEE793.9020105@mellanox.com>
References: <1441658303-18081-1-git-send-email-pandit.parav@gmail.com> <1441658303-18081-6-git-send-email-pandit.parav@gmail.com> <55EE9AE0.5030508@mellanox.com> <55EEE793.9020105@mellanox.com>

On Tue, Sep 8, 2015 at 7:20 PM, Haggai Eran wrote:
> On 08/09/2015 13:18, Parav Pandit wrote:
>>>
>>>> + * RDMA resource limits are hierarchical, so the highest configured limit of
>>>> + * the hierarchy is enforced. Allowing resource limit configuration to default
>>>> + * cgroup allows fair share to kernel space ULPs as well.
>>> In what way is the highest configured limit of the hierarchy enforced? I
>>> would expect all the limits along the hierarchy to be enforced.
>>>
>> In a hierarchy of, say, 3 cgroups, the smallest limit along the hierarchy is applied.
>>
>> Let's take an example to clarify.
>> Say cg_A, cg_B, cg_C:
>>
>> Role                                    Name   Limit
>> Parent                                  cg_A   100
>> Child level 1 (child of cg_A)           cg_B   20
>> Child level 2 (child of cg_B)           cg_C   50
>>
>> If the process allocating an RDMA resource belongs to cg_C, the lowest
>> limit in the hierarchy is applied during the charge() stage.
>> If cg_A's limit happens to be 10, then since 10 is the lowest, its limit
>> would apply, as you expected.
>
> Looking at the code, the usage in every level is charged. This is what I
> would expect. I just think the comment is a bit misleading.
>
>>>> +int devcgroup_rdma_get_max_resource(struct seq_file *sf, void *v)
>>>> +{
>>>> +	struct dev_cgroup *dev_cg = css_to_devcgroup(seq_css(sf));
>>>> +	int type = seq_cft(sf)->private;
>>>> +	u32 usage;
>>>> +
>>>> +	if (dev_cg->rdma.tracker[type].limit == DEVCG_RDMA_MAX_RESOURCES) {
>>>> +		seq_printf(sf, "%s\n", DEVCG_RDMA_MAX_RESOURCE_STR);
>>> I'm not sure hiding the actual number is good, especially in the
>>> show_usage case.
>>
>> This follows other controllers, such as the newly added PID controller,
>> in how the max limit is shown.
>
> Okay.
>
>>>> +void devcgroup_rdma_uncharge_resource(struct ib_ucontext *ucontext,
>>>> +				      enum devcgroup_rdma_rt type, int num)
>>>> +{
>>>> +	struct dev_cgroup *dev_cg, *p;
>>>> +	struct task_struct *ctx_task;
>>>> +
>>>> +	if (!num)
>>>> +		return;
>>>> +
>>>> +	/* get cgroup of ib_ucontext it belong to, to uncharge
>>>> +	 * so that when its called from any worker tasks or any
>>>> +	 * other tasks to which this resource doesn't belong to,
>>>> +	 * it can be uncharged correctly.
>>>> +	 */
>>>> +	if (ucontext)
>>>> +		ctx_task = get_pid_task(ucontext->tgid, PIDTYPE_PID);
>>>> +	else
>>>> +		ctx_task = current;
>>>> +	dev_cg = task_devcgroup(ctx_task);
>>>> +
>>>> +	spin_lock(&ctx_task->rdma_res_counter->lock);
>>> Don't you need an rcu read lock and rcu_dereference to access
>>> rdma_res_counter?
>>
>> I believe it's not required, because uncharge() can happen only from
>> 3 contexts.
>> (a) from the context of the caller task that made the allocation call,
>> so no synchronization is needed.
>> (b) from the resource dealloc context; again this is the same task
>> context that allocated, so it is single threaded and there is no need
>> to synchronize.
> I don't think it is true. You can access uverbs from multiple threads.

Yes, that's right. Though I designed the counter structure to be allocated
on a per-task basis for individual thread access, I completely missed that
a ucontext can be shared among threads. As I replied in the other thread,
I will make the counters atomic during charge and uncharge to cover that
case. Therefore I need the RCU read lock and rcu_dereference() as well.

> What may help your case here I think is the fact that only when the last
> ucontext is released you can change the rdma_res_counter field, and
> ucontext release takes the ib_uverbs_file->mutex.
>
> Still, I think it would be best to use rcu_dereference(), if only for
> documentation and sparse.

Yes.

>
>> (c) from the fput() context, when the process is terminated abruptly or
>> as part of deferred cleanup; when this happens there cannot be an
>> allocator task anyway.
>
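The semantics both sides settle on (every level of the hierarchy is charged, and a charge fails as soon as any level would exceed its own limit, so the smallest limit on the path is what binds) can be illustrated with a minimal user-space sketch. It is not the patch's code: `struct rdma_cg`, `rdma_cg_try_charge()` and `rdma_cg_uncharge()` are hypothetical names, C11 `<stdatomic.h>` atomics stand in for kernel atomics in line with the plan to make the counters atomic, and the RCU-protected lookup of the task's counter is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cgroup tracker for a single RDMA resource type. */
struct rdma_cg {
	struct rdma_cg *parent;	/* NULL for the root cgroup */
	int limit;		/* max resources allowed at this level */
	atomic_int usage;	/* resources currently charged at this level */
};

/*
 * Charge @num resources to @cg and to every ancestor up to the root.
 * If any level would go over its limit, revert the charges made so far
 * and fail; the smallest limit on the path therefore decides the outcome.
 * Usage may briefly exceed a limit between the add and the revert.
 */
static bool rdma_cg_try_charge(struct rdma_cg *cg, int num)
{
	struct rdma_cg *p, *q;

	for (p = cg; p; p = p->parent) {
		/* Atomic add: threads sharing one context may charge concurrently. */
		int new_usage = atomic_fetch_add(&p->usage, num) + num;

		if (new_usage > p->limit) {
			/* Revert this level, then every level below it. */
			atomic_fetch_sub(&p->usage, num);
			for (q = cg; q != p; q = q->parent)
				atomic_fetch_sub(&q->usage, num);
			return false;
		}
	}
	return true;
}

/* Uncharge @num resources from @cg and from every ancestor. */
static void rdma_cg_uncharge(struct rdma_cg *cg, int num)
{
	for (struct rdma_cg *p = cg; p; p = p->parent) {
		int old = atomic_fetch_sub(&p->usage, num);

		/* Uncharging more than was charged indicates a bookkeeping bug. */
		assert(old >= num);
		(void)old;
	}
}
```

With the cg_A (100), cg_B (20), cg_C (50) hierarchy from the example, charging 15 to cg_C succeeds at all three levels, while a further charge of 10 fails at cg_B and leaves every counter at 15. The add-then-check-then-revert pattern is similar in spirit to the kernel's page_counter_try_charge(); it avoids a lock per level at the cost of a brief overshoot that concurrent chargers may observe.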