From mboxrd@z Thu Jan 1 00:00:00 1970
From: Haggai Eran
Subject: Re: [PATCH 5/7] devcg: device cgroup's extension for RDMA resource.
Date: Tue, 8 Sep 2015 16:50:11 +0300
Message-ID: <55EEE793.9020105@mellanox.com>
References: <1441658303-18081-1-git-send-email-pandit.parav@gmail.com>
 <1441658303-18081-6-git-send-email-pandit.parav@gmail.com>
 <55EE9AE0.5030508@mellanox.com>
To: Parav Pandit
Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, tj@kernel.org,
 lizefan@huawei.com, Johannes Weiner, Doug Ledford, Jonathan Corbet,
 james.l.morris@oracle.com, serge@hallyn.com, Or Gerlitz, Matan Barak,
 raindel@mellanox.com, akpm@linux-foundation.org,
 linux-security-module@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

On 08/09/2015 13:18, Parav Pandit wrote:
>> >
>>> >> + * RDMA resource limits are hierarchical, so the highest configured limit of
>>> >> + * the hierarchy is enforced. Allowing resource limit configuration to default
>>> >> + * cgroup allows fair share to kernel space ULPs as well.
>> > In what way is the highest configured limit of the hierarchy enforced? I
>> > would expect all the limits along the hierarchy to be enforced.
>> >
> In a hierarchy of, say, 3 cgroups, the smallest limit among the cgroups is applied.
>
> Let's take an example to clarify.
> Say cg_A, cg_B, cg_C
> Role            name                      limit
> Parent          cg_A                      100
> Child_level1    cg_B (child of cg_A)      20
> Child_level2    cg_C (child of cg_B)      50
>
> If the process allocating the rdma resource belongs to cg_C, the lowest
> limit in the hierarchy is applied during the charge() stage.
> If cg_A's limit happens to be 10, then since 10 is the lowest, its limit
> would be applicable, as you expected.

Looking at the code, the usage in every level is charged. This is what I
would expect. I just think the comment is a bit misleading.
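
Just so we are talking about the same behaviour, this is roughly the
charging scheme I read from the patch, written out as a sketch --
parent_devcgroup() and the tracker field layout here are placeholders for
illustration, not your actual code:

static int devcgroup_rdma_try_charge(struct dev_cgroup *cg,
				     enum devcgroup_rdma_rt type, int num)
{
	struct dev_cgroup *p, *q;

	/* charge every level; fail if any ancestor's limit would be exceeded */
	for (p = cg; p; p = parent_devcgroup(p)) {
		spin_lock(&p->rdma.tracker[type].lock);
		if (p->rdma.tracker[type].usage + num >
		    p->rdma.tracker[type].limit) {
			spin_unlock(&p->rdma.tracker[type].lock);
			goto undo;
		}
		p->rdma.tracker[type].usage += num;
		spin_unlock(&p->rdma.tracker[type].lock);
	}
	return 0;

undo:
	/* roll back the levels that were already charged */
	for (q = cg; q != p; q = parent_devcgroup(q)) {
		spin_lock(&q->rdma.tracker[type].lock);
		q->rdma.tracker[type].usage -= num;
		spin_unlock(&q->rdma.tracker[type].lock);
	}
	return -EAGAIN;
}

With that scheme the effective limit is the smallest one along the path to
the root, so I would word the comment that way rather than "the highest
configured limit of the hierarchy is enforced".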
>>> +int devcgroup_rdma_get_max_resource(struct seq_file *sf, void *v)
>>> +{
>>> +	struct dev_cgroup *dev_cg = css_to_devcgroup(seq_css(sf));
>>> +	int type = seq_cft(sf)->private;
>>> +	u32 usage;
>>> +
>>> +	if (dev_cg->rdma.tracker[type].limit == DEVCG_RDMA_MAX_RESOURCES) {
>>> +		seq_printf(sf, "%s\n", DEVCG_RDMA_MAX_RESOURCE_STR);
>> I'm not sure hiding the actual number is good, especially in the
>> show_usage case.
>
> This is similar to other controllers, such as the newly added PID
> subsystem, in showing the max limit.

Okay.
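
For reference, the pids controller convention being referred to is, from
memory, along these lines (the exact symbol names may differ):

	/* pids controller style: print "max" when no limit is configured */
	if (limit >= PIDS_MAX)
		seq_printf(sf, "%s\n", PIDS_MAX_STR);
	else
		seq_printf(sf, "%lld\n", limit);

so printing "max" for the limit file is consistent with that; my concern
above was mainly about the usage file.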
>>> +void devcgroup_rdma_uncharge_resource(struct ib_ucontext *ucontext,
>>> +				      enum devcgroup_rdma_rt type, int num)
>>> +{
>>> +	struct dev_cgroup *dev_cg, *p;
>>> +	struct task_struct *ctx_task;
>>> +
>>> +	if (!num)
>>> +		return;
>>> +
>>> +	/* get cgroup of ib_ucontext it belong to, to uncharge
>>> +	 * so that when its called from any worker tasks or any
>>> +	 * other tasks to which this resource doesn't belong to,
>>> +	 * it can be uncharged correctly.
>>> +	 */
>>> +	if (ucontext)
>>> +		ctx_task = get_pid_task(ucontext->tgid, PIDTYPE_PID);
>>> +	else
>>> +		ctx_task = current;
>>> +	dev_cg = task_devcgroup(ctx_task);
>>> +
>>> +	spin_lock(&ctx_task->rdma_res_counter->lock);
>> Don't you need an rcu read lock and rcu_dereference to access
>> rdma_res_counter?
>
> I believe it's not required, because uncharge() can happen only from
> 3 contexts:
> (a) from the caller task context, which made the allocation call, so no
> synchronization is needed.
> (b) from the dealloc resource context; again, this is the same task
> context that did the allocation, so this is single threaded and there is
> no need to synchronize.

I don't think that is true. You can access uverbs from multiple threads.
What may help your case here, I think, is the fact that the
rdma_res_counter field can only change when the last ucontext is released,
and ucontext release takes the ib_uverbs_file->mutex.

Still, I think it would be best to use rcu_dereference(), if only for
documentation and sparse (see the sketch at the end of this mail).

> (c) from the fput() context, when the process is terminated abruptly or
> as part of deferred cleanup; when this is happening there cannot be an
> allocator task anyway.
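
Going back to the rcu_dereference() point, the access pattern I have in
mind is roughly the following -- an untested sketch, assuming the task's
rdma_res_counter pointer is __rcu-annotated; the counter struct layout and
field names here are only for illustration:

	struct task_rdma_res_counter *res;

	/* the counter can be freed once the last ucontext goes away,
	 * so pin it for the duration of the update */
	rcu_read_lock();
	res = rcu_dereference(ctx_task->rdma_res_counter);
	if (res) {
		spin_lock(&res->lock);
		res->usage[type] -= num;
		spin_unlock(&res->lock);
	}
	rcu_read_unlock();

Even if the lifetime rules make a plain dereference safe in practice, the
__rcu annotation plus rcu_dereference() documents that assumption and lets
sparse check it.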