All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* mmap()ed AF_NETLINK: lockdep and sleep-in-atomic warnings
@ 2015-07-13 13:18 Kirill A. Shutemov
  2015-07-14  5:11 ` Cong Wang
  0 siblings, 1 reply; 5+ messages in thread
From: Kirill A. Shutemov @ 2015-07-13 13:18 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, linux-kernel

Hi,

This simple test-case trigers few locking asserts in kernel:

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <linux/netlink.h>

#define SOL_NETLINK 270

int main(int argc, char **argv)
{
        unsigned int block_size = 16 * 4096;
        struct nl_mmap_req req = {
                .nm_block_size          = block_size,
                .nm_block_nr            = 64,
                .nm_frame_size          = 16384,
                .nm_frame_nr            = 64 * block_size / 16384,
        };
        unsigned int ring_size;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
        if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
                exit(1);
        if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
                exit(1);

	ring_size = req.nm_block_nr * req.nm_block_size;
	mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	return 0;
}

+++ exited with 0 +++
[    2.500126] BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
[    2.501328] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
[    2.501997] 3 locks held by init/1:
[    2.502380]  #0:  (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220
[    2.503328]  #1:  ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70
[    2.504659]  #2:  (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0
[    2.505724] Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20
[    2.506443] 
[    2.506612] CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253
[    2.507378] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
[    2.508386]  ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102
[    2.509233]  0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002
[    2.510057]  ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98
[    2.510882] Call Trace:
[    2.511146]  <IRQ>  [<ffffffff81929ceb>] dump_stack+0x4f/0x7b
[    2.511763]  [<ffffffff81085a9d>] ___might_sleep+0x16d/0x270
[    2.512476]  [<ffffffff81085bed>] __might_sleep+0x4d/0x90
[    2.513071]  [<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430
[    2.513683]  [<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
[    2.514385]  [<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20
[    2.515066]  [<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350
[    2.515694]  [<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70
[    2.516411]  [<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150
[    2.517070]  [<ffffffff817e484d>] __sk_free+0x1d/0x160
[    2.517607]  [<ffffffff817e49a9>] sk_free+0x19/0x20
[    2.518118]  [<ffffffff8182e020>] deferred_put_nlk_sk+0x20/0x30
[    2.518735]  [<ffffffff810d391c>] rcu_do_batch.isra.49+0x79c/0x10c0
[    2.519386]  [<ffffffff810d32e0>] ? rcu_do_batch.isra.49+0x160/0x10c0
[    2.520101]  [<ffffffff810d787b>] rcu_process_callbacks+0xdb/0x6d0
[    2.520790]  [<ffffffff8105dd52>] __do_softirq+0x152/0x630
[    2.521370]  [<ffffffff8105e3be>] irq_exit+0x8e/0xb0
[    2.521895]  [<ffffffff81936366>] smp_apic_timer_interrupt+0x46/0x60
[    2.522558]  [<ffffffff8145365f>] ? __delay+0xf/0x20
[    2.523079]  [<ffffffff81934a00>] apic_timer_interrupt+0x70/0x80
[    2.523705]  <EOI>  [<ffffffff8145365f>] ? __delay+0xf/0x20
[    2.524366]  [<ffffffff810b26ab>] ? in_lock_functions+0x1b/0x20
[    2.524995]  [<ffffffff8108ab81>] get_parent_ip+0x11/0x50
[    2.525562]  [<ffffffff8108ad1f>] preempt_count_sub+0x9f/0xf0
[    2.526179]  [<ffffffff81453778>] delay_tsc+0x68/0xc0
[    2.526706]  [<ffffffff8145365f>] __delay+0xf/0x20
[    2.527207]  [<ffffffff8145369a>] __const_udelay+0x2a/0x30
[    2.527781]  [<ffffffff8172d05a>] md_notify_reboot+0xea/0x100
[    2.528489]  [<ffffffff8107f379>] ? __blocking_notifier_call_chain+0x39/0x70
[    2.529236]  [<ffffffff8107efc6>] notifier_call_chain+0x66/0x90
[    2.529856]  [<ffffffff8107f391>] __blocking_notifier_call_chain+0x51/0x70
[    2.530570]  [<ffffffff810ae8c6>] ? __lock_acquire+0x606/0xf50
[    2.531172]  [<ffffffff8107f3c6>] blocking_notifier_call_chain+0x16/0x20
[    2.531869]  [<ffffffff8108061d>] kernel_restart_prepare+0x1d/0x40
[    2.532593]  [<ffffffff810806e6>] kernel_restart+0x16/0x60
[    2.533183]  [<ffffffff81080a07>] SyS_reboot+0x157/0x220
[    2.533738]  [<ffffffff81010778>] ? __restore_xstate_sig+0xf8/0x720
[    2.534390]  [<ffffffff81464127>] ? debug_smp_processor_id+0x17/0x20
[    2.535051]  [<ffffffff810a836e>] ? put_lock_stats.isra.19+0xe/0x30
[    2.535707]  [<ffffffff81933040>] ? _raw_spin_unlock_irq+0x30/0x60
[    2.536446]  [<ffffffff8108ad2b>] ? preempt_count_sub+0xab/0xf0
[    2.537112]  [<ffffffff81933daa>] ? syscall_return+0x11/0x54
[    2.537709]  [<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20
[    2.538399]  [<ffffffff810abf73>] ? trace_hardirqs_on_caller+0xf3/0x240
[    2.539094]  [<ffffffff81454b98>] ? trace_hardirqs_on_thunk+0x17/0x19
[    2.539764]  [<ffffffff81933b97>] system_call_fastpath+0x12/0x6f
[    2.540523] 
[    2.540695] =================================
[    2.541161] [ INFO: inconsistent lock state ]
[    2.541618] 4.1.0-00009-gbddf4c4818e0 #253 Not tainted
[    2.542154] ---------------------------------
[    2.542610] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[    2.543236] init/1 [HC0[0]:SC1[1]:HE1:SE0] takes:
[    2.543729]  (&nlk->pg_vec_lock){+.?.+.}, at: [<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350
[    2.544503] {SOFTIRQ-ON-W} state was registered at:
[    2.544503]   [<ffffffff810aedce>] __lock_acquire+0xb0e/0xf50
[    2.544503]   [<ffffffff810afb22>] lock_acquire+0xd2/0x2b0
[    2.544503]   [<ffffffff8192e9b1>] mutex_lock_nested+0x71/0x430
[    2.544503]   [<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350
[    2.544503]   [<ffffffff81832010>] netlink_setsockopt+0x1b0/0x300
[    2.544503]   [<ffffffff817e0133>] SyS_setsockopt+0x63/0xc0
[    2.544503]   [<ffffffff81933cbc>] tracesys_phase2+0x84/0x89
[    2.544503] irq event stamp: 1159264
[    2.544503] hardirqs last  enabled at (1159264): [<ffffffff810c3495>] vprintk_emit+0x2c5/0x600
[    2.544503] hardirqs last disabled at (1159263): [<ffffffff810c321e>] vprintk_emit+0x4e/0x600
[    2.544503] softirqs last  enabled at (1159014): [<ffffffff8105de2e>] __do_softirq+0x22e/0x630
[    2.544503] softirqs last disabled at (1159017): [<ffffffff8105e3be>] irq_exit+0x8e/0xb0
[    2.544503] 
[    2.544503] other info that might help us debug this:
[    2.544503]  Possible unsafe locking scenario:
[    2.544503] 
[    2.544503]        CPU0
[    2.544503]        ----
[    2.544503]   lock(&nlk->pg_vec_lock);
[    2.544503]   <Interrupt>
[    2.544503]     lock(&nlk->pg_vec_lock);
[    2.544503] 
[    2.544503]  *** DEADLOCK ***
[    2.544503] 
[    2.544503] 3 locks held by init/1:
[    2.544503]  #0:  (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220
[    2.544503]  #1:  ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70
[    2.544503]  #2:  (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0
[    2.544503] 
[    2.544503] stack backtrace:
[    2.544503] CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253
[    2.544503] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
[    2.544503]  ffff88017b3d8000 ffff88027bc03af8 ffffffff81929ceb 0000000000000103
[    2.544503]  ffffffff82ad2ed0 ffff88027bc03b38 ffffffff819279f8 0000000000000001
[    2.544503]  ffff88017b3d8810 0000000000000004 0000000000000004 0000000000000006
[    2.544503] Call Trace:
[    2.544503]  <IRQ>  [<ffffffff81929ceb>] dump_stack+0x4f/0x7b
[    2.544503]  [<ffffffff819279f8>] print_usage_bug.part.35+0x281/0x290
[    2.544503]  [<ffffffff810ab89c>] mark_lock+0x22c/0x760
[    2.544503]  [<ffffffff810a97b0>] ? check_usage_backwards+0x150/0x150
[    2.544503]  [<ffffffff810aed77>] __lock_acquire+0xab7/0xf50
[    2.544503]  [<ffffffff810afb22>] lock_acquire+0xd2/0x2b0
[    2.544503]  [<ffffffff8182fc3d>] ? netlink_set_ring+0x1ed/0x350
[    2.544503]  [<ffffffff81085a9d>] ? ___might_sleep+0x16d/0x270
[    2.544503]  [<ffffffff8182fc3d>] ? netlink_set_ring+0x1ed/0x350
[    2.544503]  [<ffffffff8192e9b1>] mutex_lock_nested+0x71/0x430
[    2.544503]  [<ffffffff8182fc3d>] ? netlink_set_ring+0x1ed/0x350
[    2.544503]  [<ffffffff8182fc3d>] ? netlink_set_ring+0x1ed/0x350
[    2.544503]  [<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20
[    2.544503]  [<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350
[    2.544503]  [<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70
[    2.544503]  [<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150
[    2.544503]  [<ffffffff817e484d>] __sk_free+0x1d/0x160
[    2.544503]  [<ffffffff817e49a9>] sk_free+0x19/0x20
[    2.544503]  [<ffffffff8182e020>] deferred_put_nlk_sk+0x20/0x30
[    2.544503]  [<ffffffff810d391c>] rcu_do_batch.isra.49+0x79c/0x10c0
[    2.544503]  [<ffffffff810d32e0>] ? rcu_do_batch.isra.49+0x160/0x10c0
[    2.544503]  [<ffffffff810d787b>] rcu_process_callbacks+0xdb/0x6d0
[    2.544503]  [<ffffffff8105dd52>] __do_softirq+0x152/0x630
[    2.544503]  [<ffffffff8105e3be>] irq_exit+0x8e/0xb0
[    2.544503]  [<ffffffff81936366>] smp_apic_timer_interrupt+0x46/0x60
[    2.544503]  [<ffffffff8145365f>] ? __delay+0xf/0x20
[    2.544503]  [<ffffffff81934a00>] apic_timer_interrupt+0x70/0x80
[    2.544503]  <EOI>  [<ffffffff8145365f>] ? __delay+0xf/0x20
[    2.544503]  [<ffffffff810b26ab>] ? in_lock_functions+0x1b/0x20
[    2.544503]  [<ffffffff8108ab81>] get_parent_ip+0x11/0x50
[    2.544503]  [<ffffffff8108ad1f>] preempt_count_sub+0x9f/0xf0
[    2.544503]  [<ffffffff81453778>] delay_tsc+0x68/0xc0
[    2.544503]  [<ffffffff8145365f>] __delay+0xf/0x20
[    2.544503]  [<ffffffff8145369a>] __const_udelay+0x2a/0x30
[    2.544503]  [<ffffffff8172d05a>] md_notify_reboot+0xea/0x100
[    2.544503]  [<ffffffff8107f379>] ? __blocking_notifier_call_chain+0x39/0x70
[    2.544503]  [<ffffffff8107efc6>] notifier_call_chain+0x66/0x90
[    2.544503]  [<ffffffff8107f391>] __blocking_notifier_call_chain+0x51/0x70
[    2.544503]  [<ffffffff810ae8c6>] ? __lock_acquire+0x606/0xf50
[    2.544503]  [<ffffffff8107f3c6>] blocking_notifier_call_chain+0x16/0x20
[    2.544503]  [<ffffffff8108061d>] kernel_restart_prepare+0x1d/0x40
[    2.544503]  [<ffffffff810806e6>] kernel_restart+0x16/0x60
[    2.544503]  [<ffffffff81080a07>] SyS_reboot+0x157/0x220
[    2.544503]  [<ffffffff81010778>] ? __restore_xstate_sig+0xf8/0x720
[    2.544503]  [<ffffffff81464127>] ? debug_smp_processor_id+0x17/0x20
[    2.544503]  [<ffffffff810a836e>] ? put_lock_stats.isra.19+0xe/0x30
[    2.544503]  [<ffffffff81933040>] ? _raw_spin_unlock_irq+0x30/0x60
[    2.544503]  [<ffffffff8108ad2b>] ? preempt_count_sub+0xab/0xf0
[    2.544503]  [<ffffffff81933daa>] ? syscall_return+0x11/0x54
[    2.544503]  [<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20
[    2.544503]  [<ffffffff810abf73>] ? trace_hardirqs_on_caller+0xf3/0x240
[    2.544503]  [<ffffffff81454b98>] ? trace_hardirqs_on_thunk+0x17/0x19
[    2.544503]  [<ffffffff81933b97>] system_call_fastpath+0x12/0x6f

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: mmap()ed AF_NETLINK: lockdep and sleep-in-atomic warnings
  2015-07-13 13:18 mmap()ed AF_NETLINK: lockdep and sleep-in-atomic warnings Kirill A. Shutemov
@ 2015-07-14  5:11 ` Cong Wang
  2015-07-14  9:38   ` Thomas Graf
  0 siblings, 1 reply; 5+ messages in thread
From: Cong Wang @ 2015-07-14  5:11 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: David S. Miller, netdev, linux-kernel@vger.kernel.org,
	Thomas Graf

On Mon, Jul 13, 2015 at 6:18 AM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
> Hi,
>
> This simple test-case trigers few locking asserts in kernel:
>
> #define _GNU_SOURCE
> #include <stdlib.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/mman.h>
> #include <sys/socket.h>
> #include <sys/types.h>
> #include <linux/netlink.h>
>
> #define SOL_NETLINK 270
>
> int main(int argc, char **argv)
> {
>         unsigned int block_size = 16 * 4096;
>         struct nl_mmap_req req = {
>                 .nm_block_size          = block_size,
>                 .nm_block_nr            = 64,
>                 .nm_frame_size          = 16384,
>                 .nm_frame_nr            = 64 * block_size / 16384,
>         };
>         unsigned int ring_size;
>         int fd;
>
>         fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
>         if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
>                 exit(1);
>         if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
>                 exit(1);
>
>         ring_size = req.nm_block_nr * req.nm_block_size;
>         mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
>         return 0;
> }
>
> +++ exited with 0 +++
> [    2.500126] BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
> [    2.501328] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
> [    2.501997] 3 locks held by init/1:
> [    2.502380]  #0:  (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220
> [    2.503328]  #1:  ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70
> [    2.504659]  #2:  (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0
> [    2.505724] Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20
> [    2.506443]
> [    2.506612] CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253
> [    2.507378] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
> [    2.508386]  ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102
> [    2.509233]  0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002
> [    2.510057]  ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98
> [    2.510882] Call Trace:
> [    2.511146]  <IRQ>  [<ffffffff81929ceb>] dump_stack+0x4f/0x7b
> [    2.511763]  [<ffffffff81085a9d>] ___might_sleep+0x16d/0x270
> [    2.512476]  [<ffffffff81085bed>] __might_sleep+0x4d/0x90
> [    2.513071]  [<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430
> [    2.513683]  [<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
> [    2.514385]  [<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20
> [    2.515066]  [<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350
> [    2.515694]  [<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70
> [    2.516411]  [<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150
> [    2.517070]  [<ffffffff817e484d>] __sk_free+0x1d/0x160
> [    2.517607]  [<ffffffff817e49a9>] sk_free+0x19/0x20
> [    2.518118]  [<ffffffff8182e020>] deferred_put_nlk_sk+0x20/0x30
> [    2.518735]  [<ffffffff810d391c>] rcu_do_batch.isra.49+0x79c/0x10c0

Caused by:

commit 21e4902aea80ef35afc00ee8d2abdea4f519b7f7
Author: Thomas Graf <tgraf@suug.ch>
Date:   Fri Jan 2 23:00:22 2015 +0100

    netlink: Lockless lookup with RCU grace period in socket release

    Defers the release of the socket reference using call_rcu() to
    allow using an RCU read-side protected call to rhashtable_lookup()

    This restores behaviour and performance gains as previously
    introduced by e341694 ("netlink: Convert netlink_lookup() to use
    RCU protected hash table") without the side effect of severely
    delayed socket destruction.

    Signed-off-by: Thomas Graf <tgraf@suug.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>


We can't hold mutex lock in a rcu callback, perhaps we could
defer the mmap ring cleanup to a workqueue.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: mmap()ed AF_NETLINK: lockdep and sleep-in-atomic warnings
  2015-07-14  5:11 ` Cong Wang
@ 2015-07-14  9:38   ` Thomas Graf
  2015-07-14  9:50     ` Florian Westphal
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Graf @ 2015-07-14  9:38 UTC (permalink / raw)
  To: Cong Wang
  Cc: Kirill A. Shutemov, David S. Miller, netdev,
	linux-kernel@vger.kernel.org

On 07/13/15 at 10:11pm, Cong Wang wrote:
> Caused by:
> 
> commit 21e4902aea80ef35afc00ee8d2abdea4f519b7f7
> Author: Thomas Graf <tgraf@suug.ch>
> Date:   Fri Jan 2 23:00:22 2015 +0100
> 
>     netlink: Lockless lookup with RCU grace period in socket release
> 
>     Defers the release of the socket reference using call_rcu() to
>     allow using an RCU read-side protected call to rhashtable_lookup()
> 
>     This restores behaviour and performance gains as previously
>     introduced by e341694 ("netlink: Convert netlink_lookup() to use
>     RCU protected hash table") without the side effect of severely
>     delayed socket destruction.
> 
>     Signed-off-by: Thomas Graf <tgraf@suug.ch>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> 
> We can't hold mutex lock in a rcu callback, perhaps we could
> defer the mmap ring cleanup to a workqueue.

The socket should be dead at this point. It might be simpler to
add a netlink_release_ring() function which doesn't require
locking at all. It would also get rid of all the special handling
for closing vs. non-closing.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: mmap()ed AF_NETLINK: lockdep and sleep-in-atomic warnings
  2015-07-14  9:38   ` Thomas Graf
@ 2015-07-14  9:50     ` Florian Westphal
  2015-07-14 18:40       ` Stephen Hemminger
  0 siblings, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2015-07-14  9:50 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Cong Wang, Kirill A. Shutemov, David S. Miller, netdev,
	linux-kernel@vger.kernel.org

Thomas Graf <tgraf@suug.ch> wrote:
> On 07/13/15 at 10:11pm, Cong Wang wrote:
> > Caused by:
> > 
> > commit 21e4902aea80ef35afc00ee8d2abdea4f519b7f7
> > Author: Thomas Graf <tgraf@suug.ch>
> > Date:   Fri Jan 2 23:00:22 2015 +0100
> > 
> >     netlink: Lockless lookup with RCU grace period in socket release
> > 
> >     Defers the release of the socket reference using call_rcu() to
> >     allow using an RCU read-side protected call to rhashtable_lookup()
> > 
> >     This restores behaviour and performance gains as previously
> >     introduced by e341694 ("netlink: Convert netlink_lookup() to use
> >     RCU protected hash table") without the side effect of severely
> >     delayed socket destruction.
> > 
> >     Signed-off-by: Thomas Graf <tgraf@suug.ch>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> > 
> > 
> > We can't hold mutex lock in a rcu callback, perhaps we could
> > defer the mmap ring cleanup to a workqueue.
> 
> The socket should be dead at this point. It might be simpler to
> add a netlink_release_ring() function which doesn't require
> locking at all. It would also get rid of all the special handling
> for closing vs. non-closing.

Do you know of any users of mmap'd netlink?
Support for this was never added to libmnl; given the fact that netlink
mmap doesn't have any advantages vs. socket interface i'd just revert it
(or mark CONFIG_NETLINK_MMAP broken -- although I don't see
 how the unconditional copy on tx could ever be avoided).

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: mmap()ed AF_NETLINK: lockdep and sleep-in-atomic warnings
  2015-07-14  9:50     ` Florian Westphal
@ 2015-07-14 18:40       ` Stephen Hemminger
  0 siblings, 0 replies; 5+ messages in thread
From: Stephen Hemminger @ 2015-07-14 18:40 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Thomas Graf, Cong Wang, Kirill A. Shutemov, David S. Miller,
	netdev, linux-kernel@vger.kernel.org

On Tue, 14 Jul 2015 11:50:37 +0200
Florian Westphal <fw@strlen.de> wrote:

> Thomas Graf <tgraf@suug.ch> wrote:
> > On 07/13/15 at 10:11pm, Cong Wang wrote:
> > > Caused by:
> > > 
> > > commit 21e4902aea80ef35afc00ee8d2abdea4f519b7f7
> > > Author: Thomas Graf <tgraf@suug.ch>
> > > Date:   Fri Jan 2 23:00:22 2015 +0100
> > > 
> > >     netlink: Lockless lookup with RCU grace period in socket release
> > > 
> > >     Defers the release of the socket reference using call_rcu() to
> > >     allow using an RCU read-side protected call to rhashtable_lookup()
> > > 
> > >     This restores behaviour and performance gains as previously
> > >     introduced by e341694 ("netlink: Convert netlink_lookup() to use
> > >     RCU protected hash table") without the side effect of severely
> > >     delayed socket destruction.
> > > 
> > >     Signed-off-by: Thomas Graf <tgraf@suug.ch>
> > >     Signed-off-by: David S. Miller <davem@davemloft.net>
> > > 
> > > 
> > > We can't hold mutex lock in a rcu callback, perhaps we could
> > > defer the mmap ring cleanup to a workqueue.
> > 
> > The socket should be dead at this point. It might be simpler to
> > add a netlink_release_ring() function which doesn't require
> > locking at all. It would also get rid of all the special handling
> > for closing vs. non-closing.
> 
> Do you know of any users of mmap'd netlink?
> Support for this was never added to libmnl; given the fact that netlink
> mmap doesn't have any advantages vs. socket interface i'd just revert it
> (or mark CONFIG_NETLINK_MMAP broken -- although I don't see
>  how the unconditional copy on tx could ever be avoided).

We use netlink mmap because it gives slightly better performance
when processing a route flap.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-07-14 18:40 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-13 13:18 mmap()ed AF_NETLINK: lockdep and sleep-in-atomic warnings Kirill A. Shutemov
2015-07-14  5:11 ` Cong Wang
2015-07-14  9:38   ` Thomas Graf
2015-07-14  9:50     ` Florian Westphal
2015-07-14 18:40       ` Stephen Hemminger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.