From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1755012AbbIHOLY (ORCPT ); Tue, 8 Sep 2015 10:11:24 -0400
Received: from mail-db3on0064.outbound.protection.outlook.com ([157.55.234.64]:35004
 "EHLO emea01-db3-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL)
 by vger.kernel.org with ESMTP id S1754603AbbIHOLQ (ORCPT );
 Tue, 8 Sep 2015 10:11:16 -0400
Authentication-Results: spf=pass (sender IP is 193.47.165.134)
 smtp.mailfrom=mellanox.com; vger.kernel.org; dkim=none (message not signed)
 header.d=none;vger.kernel.org; dmarc=bestguesspass action=none
 header.from=mellanox.com;
Subject: Re: [PATCH 5/7] devcg: device cgroup's extension for RDMA resource.
To: Parav Pandit
References: <1441658303-18081-1-git-send-email-pandit.parav@gmail.com>
 <1441658303-18081-6-git-send-email-pandit.parav@gmail.com>
 <55EE9DF5.7030401@mellanox.com>
CC: cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, tj@kernel.org,
 lizefan@huawei.com, Johannes Weiner, Doug Ledford, Jonathan Corbet,
 james.l.morris@oracle.com, serge@hallyn.com, Or Gerlitz, Matan Barak,
 raindel@mellanox.com, akpm@linux-foundation.org,
 linux-security-module@vger.kernel.org
From: Haggai Eran
Message-ID: <55EEEC6A.4030702@mellanox.com>
Date: Tue, 8 Sep 2015 17:10:50 +0300
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 08/09/2015 13:50, Parav Pandit wrote:
> On Tue, Sep 8, 2015 at 2:06 PM, Haggai Eran wrote:
>> On 07/09/2015 23:38, Parav Pandit wrote:
>>> +void devcgroup_rdma_uncharge_resource(struct ib_ucontext *ucontext,
>>> +			enum devcgroup_rdma_rt type, int num)
>>> +{
>>> +	struct dev_cgroup *dev_cg, *p;
>>> +	struct task_struct *ctx_task;
>>> +
>>> +	if (!num)
>>> +		return;
>>> +
>>> +	/* get cgroup of ib_ucontext it belong to, to uncharge
>>> +	 * so that when its called from any worker tasks or any
>>> +	 * other tasks to which this resource doesn't belong to,
>>> +	 * it can be uncharged correctly.
>>> +	 */
>>> +	if (ucontext)
>>> +		ctx_task = get_pid_task(ucontext->tgid, PIDTYPE_PID);
>>> +	else
>>> +		ctx_task = current;
>> So what happens if a process creates a ucontext, forks, and then the
>> child creates and destroys a CQ? If I understand correctly, created
>> resources are always charged to the current process (the child), but
>> when it is destroyed the owner of the ucontext (the parent) will be
>> uncharged.
>>
>> Since ucontexts are not meant to be used by multiple processes, I think
>> it would be okay to always charge the owner process (the one that
>> created the ucontext).
>
> I need to think about it. I would like to avoid keep per task resource
> counters for two reasons.
> For a while I thought that native fork() doesn't take care to share
> the RDMA resources and all CQ, QP dmaable memory from PID namespace
> perspective.
>
> 1.
> Because, it could well happen that process and its child process is
> created in PID namespace_A, after which child is migrated to new PID
> namespace_B.
> after which parent from the namespace_A is terminated. I am not sure
> how the ucontext ownership changes from parent to child process at
> that point today.
> I prefer to keep this complexity out if at all it exists as process
> migration across namespaces is not a frequent event for which to
> optimize the code for.
>
> 2. by having per task counter (as cost of memory some memory) allows
> to avoid using atomic during charge(), uncharge().
>
> The intent is to have per task (process and thread) to have their
> resource counter instance, but I can see that its broken where its
> charging parent process as of now without atomics.
> As you said its ok to always charge the owner process, I have to relax
> 2nd requirement and fallback to use atomics for charge(), uncharge()
> or I have to get rid of ucontext from the uncharge() API which is
> difficult due to fput() being in worker thread context.
>

I think the cost of atomic operations here would normally be negligible
compared to the cost of accessing the hardware to allocate or deallocate
these resources.