* "ip netns create" hangs forever, spamming console with "unregister_netdevice: waiting for lo to become free"
From: Zack Weinberg @ 2015-05-26 16:21 UTC
  To: netdev

[This was originally posted at
https://bugzilla.kernel.org/show_bug.cgi?id=97811 ]

I have an application that makes heavy use of network namespaces,
creating and destroying them on the fly during operation.  With 100%
reproducibility, the first invocation of "ip netns create" after any
"ip netns del" hangs forever in D-state; only rebooting the machine
clears the condition.  This used to be an intermittent fault, going
back at least as far as 3.16, but 4.0 seems to have upgraded it to an
every-single-time fault.  I'm running 4.0.1 right now, but another
person reports that it happens with 4.0.4 as well.

lockdep hung task detection reports:

May  5 22:48:07 laguz kernel: INFO: task ip:572431 blocked for more than 120 seconds.
May  5 22:48:07 laguz kernel:      Not tainted 4.0.1-linode55-lockdep #2
May  5 22:48:07 laguz kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  5 22:48:07 laguz kernel: ip              D ffff88000e28bda8     0 572431 572428 0x00000000
May  5 22:48:07 laguz kernel: ffff88000e28bda8 ffff880058b60000 ffff8801512c6540 ffff88000e28bda8
May  5 22:48:07 laguz kernel: ffff88000e28bfd8 ffff8800a07e6500 ffffffff81ecee88 ffff88000e28be28
May  5 22:48:07 laguz kernel: ffff8801512c6540 ffff88000e28bdc8 ffffffff81964ca9 0000000000000006
May  5 22:48:07 laguz kernel: Call Trace:
May  5 22:48:07 laguz kernel: [<ffffffff81964ca9>] schedule+0x74/0x83
May  5 22:48:07 laguz kernel: [<ffffffff81964cc6>] schedule_preempt_disabled+0xe/0x10
May  5 22:48:07 laguz kernel: [<ffffffff81965b0f>] mutex_lock_nested+0x200/0x462
May  5 22:48:07 laguz kernel: [<ffffffff8175a651>] ? copy_net_ns+0x79/0x103
May  5 22:48:07 laguz kernel: [<ffffffff8175a61c>] ? copy_net_ns+0x44/0x103
May  5 22:48:07 laguz kernel: [<ffffffff811139d5>] ? lockdep_trace_alloc+0xca/0xdc
May  5 22:48:07 laguz kernel: [<ffffffff8175a651>] copy_net_ns+0x79/0x103
May  5 22:48:07 laguz kernel: [<ffffffff810f4b40>] create_new_namespaces+0x103/0x17f
May  5 22:48:07 laguz kernel: [<ffffffff810f4d8b>] unshare_nsproxy_namespaces+0x8b/0xb0
May  5 22:48:07 laguz kernel: [<ffffffff810da7f0>] SyS_unshare+0x120/0x2aa
May  5 22:48:07 laguz kernel: [<ffffffff8196a2c9>] system_call_fastpath+0x12/0x17
May  5 22:48:07 laguz kernel: 1 lock held by ip/572431:
May  5 22:48:07 laguz kernel: #0:  (net_mutex){+.+.+.}, at: [<ffffffff8175a651>] copy_net_ns+0x79/0x103

When the bug manifests, there is invariably also a kworker process
stuck in D-state, but lockdep doesn't pick that up because it's
looping in netdev_wait_allrefs (which has been inlined into
netdev_run_todo).  The pids below don't match the ones above because
the machine was restarted in between, but the stack trace for the ip
process is identical:

# ps -e -o pid,stat,comm,wchan | grep D
   PID STAT COMMAND         WCHAN
257795 D    kworker/u12:0   msleep
290845 D+   ip              copy_net_ns

# cat /proc/257795/stack
[<ffffffff8112a9a7>] msleep+0x1e/0x22
[<ffffffff81763b13>] netdev_run_todo+0x17a/0x2a5
[<ffffffff81770284>] rtnl_unlock+0xe/0x10
[<ffffffff81764764>] default_device_exit_batch+0x12c/0x13b
[<ffffffff81759c25>] ops_exit_list+0x2a/0x55
[<ffffffff8175a532>] cleanup_net+0x14c/0x1f2
[<ffffffff810eecf1>] process_one_work+0x216/0x3fb
[<ffffffff810ef17d>] worker_thread+0x278/0x35c
[<ffffffff810f3ec5>] kthread+0x102/0x10a
[<ffffffff8196a218>] ret_from_fork+0x58/0x90

I am not aware of any way to extract the list of held locks for a
specific process, but based on code inspection, this process *should*
be repeatedly acquiring and dropping the rtnl_mutex but not holding
any other locks.
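
For reference, the loop it's sitting in (netdev_wait_allrefs in
net/core/dev.c; abridged from the 4.0 source as I read it, so treat
this as a sketch rather than an exact quote) is also where the
console spam in the subject line comes from:

static void netdev_wait_allrefs(struct net_device *dev)
{
        unsigned long rebroadcast_time, warning_time;
        int refcnt;
        ...
        rebroadcast_time = warning_time = jiffies;
        refcnt = netdev_refcnt_read(dev);

        while (refcnt != 0) {
                if (time_after(jiffies, rebroadcast_time + 1*HZ)) {
                        rtnl_lock();
                        /* rebroadcast the unregister notification */
                        call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
                        ...
                        __rtnl_unlock();
                        rebroadcast_time = jiffies;
                }

                msleep(250);
                refcnt = netdev_refcnt_read(dev);

                if (time_after(jiffies, warning_time + 10*HZ)) {
                        pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d\n",
                                 dev->name, refcnt);
                        warning_time = jiffies;
                }
        }
}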

The point where it's stuck in copy_net_ns is

        mutex_lock(&net_mutex);  /* <-- blocked here, per the disassembly below */
        rv = setup_net(net, user_ns);
        if (rv == 0) {
                rtnl_lock();
                list_add_tail_rcu(&net->list, &net_namespace_list);
                rtnl_unlock();
        }
        mutex_unlock(&net_mutex);

The assembly around copy_net_ns+0x79 is

   0xffffffff8175a643 <+107>:   xor    %esi,%esi
   0xffffffff8175a645 <+109>:   mov    $0xffffffff81ecee80,%rdi
   0xffffffff8175a64c <+116>:   callq  0xffffffff8196590f <mutex_lock_nested>
   0xffffffff8175a651 <+121>:   mov    %rbx,%rsi
   0xffffffff8175a654 <+124>:   mov    %r12,%rdi
   0xffffffff8175a657 <+127>:   callq  0xffffffff8175a0a5 <setup_net>

(note decimal vs. hex offsets), so it's blocked on an attempt to
acquire the net_mutex, which lockdep thinks it *already holds, having
acquired it at that very point*.  This doesn't make any sense to me.
I can imagine some sort of bug in the guts of the mutex
implementation that would cause it to block forever *after* having
acquired the mutex it was asked to acquire, but that strikes me as
extraordinarily unlikely.

A more plausible theory, based on code inspection, is that it's
_actually_ stuck trying to acquire the rtnl_mutex.  The kworker does
not hold that lock during its calls to msleep() AFAICT, but perhaps
the mutex implementation is so profoundly unfair (in this scenario,
anyway) that the lock never gets awarded to the ip process, while the
kworker never makes any forward progress because something on one of
the notifier chains called from netdev_wait_allrefs needs the
net_mutex (not actually _taking_ that mutex, just noticing that it's
already locked and giving up).  But this theory would entail that the
stack trace for the ip process is completely wrong.
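
One more data point from code inspection, for whatever it's worth:
the cleanup path the kworker is running (cleanup_net in
net/core/net_namespace.c; abridged from the 4.0 source as I read it)
appears to hold the net_mutex across the entire exit sequence,
including the ops_exit_list() call visible in the kworker's stack:

static void cleanup_net(struct work_struct *work)
{
        const struct pernet_operations *ops;
        LIST_HEAD(net_exit_list);
        ...
        mutex_lock(&net_mutex);
        ...
        /* Run all of the network namespace exit methods */
        list_for_each_entry_reverse(ops, &pernet_list, list)
                ops_exit_list(ops, &net_exit_list);  /* <-- kworker is in here */
        ...
        mutex_unlock(&net_mutex);
}

If that reading is right, the kworker would in fact be holding the
net_mutex the whole time it loops in netdev_wait_allrefs, which by
itself would be enough to keep the ip process blocked in copy_net_ns;
but I may well be misreading the source.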

I might try moving the mutex_unlock(&net_mutex) up to right after the
call to setup_net to see what happens, but I'm not sure that's safe,
and it may be some time before I know whether it helped.
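
Concretely, the rearrangement I have in mind (untested, and quite
possibly unsafe; shown only to make the idea precise) would be:

        mutex_lock(&net_mutex);
        rv = setup_net(net, user_ns);
        mutex_unlock(&net_mutex);  /* experimental: drop net_mutex early */
        if (rv == 0) {
                rtnl_lock();
                list_add_tail_rcu(&net->list, &net_namespace_list);
                rtnl_unlock();
        }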

Any advice or assistance anyone can provide would be much appreciated.
I'm happy to test patches.  I'm also happy to provide further
information as requested, but please give specific instructions on how
to gather that information, I have almost no experience debugging the
kernel.

The .config is attached to the bugzilla report; it is Linode's latest
stock 64-bit kernel (as of May 5) with lockdep enabled.

Note: I am not subscribed to this mailing list, so please cc: me on
replies.

thanks,
zw


* Re: "ip netns create" hangs forever, spamming console with "unregister_netdevice: waiting for lo to become free"
  2015-05-26 16:21 "ip netns create" hangs forever, spamming console with "unregister_netdevice: waiting for lo to become free" Zack Weinberg
@ 2015-05-26 18:56 ` Zack Weinberg
  2015-06-08  3:56   ` Joe Stringer
  0 siblings, 1 reply; 4+ messages in thread
From: Zack Weinberg @ 2015-05-26 18:56 UTC (permalink / raw
  To: netdev

On Tue, May 26, 2015 at 12:21 PM, Zack Weinberg <zackw@panix.com> wrote:
> I have an application that makes heavy use of network namespaces,
> creating and destroying them on the fly during operation.  With 100%
> reproducibility, the first invocation of "ip netns create" after any
> "ip netns del" hangs forever in D-state; only rebooting the machine
> clears the condition.

Following up to myself: reproduction is not as simple as
'ip netns add test; ip netns del test; ip netns add test2'.  In fact,
not even bringing the namespace (and all associated interfaces) up
and then down again _exactly_ as my production code does will trigger
the bug.  It appears to be necessary to push a significant amount of
data through interfaces attached to the namespace.

Since I had to reset the machine to attempt to create a repro recipe,
I can no longer perform diagnostics on the hung processes, but I've
restarted the application and it should reach the problem state again
in a day or two.

zw


* Re: "ip netns create" hangs forever, spamming console with "unregister_netdevice: waiting for lo to become free"
From: Joe Stringer @ 2015-06-08  3:56 UTC
  To: Zack Weinberg; +Cc: Linux Netdev List

On 26 May 2015 at 11:56, Zack Weinberg <zackw@panix.com> wrote:
> On Tue, May 26, 2015 at 12:21 PM, Zack Weinberg <zackw@panix.com> wrote:
>> I have an application that makes heavy use of network namespaces,
>> creating and destroying them on the fly during operation.  With 100%
>> reproducibility, the first invocation of "ip netns create" after any
>> "ip netns del" hangs forever in D-state; only rebooting the machine
>> clears the condition.
>
> Following up to myself: reproduction is not as simple as
> 'ip netns add test; ip netns del test; ip netns add test2'.  In fact,
> not even bringing the namespace (and all associated interfaces) up
> and then down again _exactly_ as my production code does will trigger
> the bug.  It appears to be necessary to push a significant amount of
> data through interfaces attached to the namespace.
>
> Since I had to reset the machine to attempt to create a repro recipe,
> I can no longer perform diagnostics on the hung processes, but I've
> restarted the application and it should reach the problem state again
> in a day or two.

Hi Zack, have you had any further development on this issue?

We've been running into a similar issue during OVS development: in a
fairly complex environment, running tests that use OVS with tunnels
and conntrack inside Docker containers reliably produces very similar
lockdep reports.  At face value, it seems like something grabs the
net_mutex and doesn't let go.  The problem only arises after sending
traffic between containers.  I'm trying to reduce the possibilities
in this test case to get better information on when/how it fails.


* Re: "ip netns create" hangs forever, spamming console with "unregister_netdevice: waiting for lo to become free"
From: Zack Weinberg @ 2015-06-15 17:34 UTC
  To: Joe Stringer; +Cc: Linux Netdev List

On Sun, Jun 7, 2015 at 11:56 PM, Joe Stringer <joestringer@nicira.com> wrote:
>
> Hi Zack, have you had any further development on this issue?

Sorry for the delay.  The only additional piece of information I have
for you is that
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1403152 may be
another report of the same bug, or a closely related one.

I am fairly sure this would be easy to fix if someone could just come
up with a reliable reproducer, but I've had no further luck myself.
That Ubuntu bug did give me an idea, though: it might be necessary to
make a lot of outbound connections from within a network namespace to
trigger the problem.  I am juggling several other issues right now
and haven't had time to follow up, but see the sketch below.
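
To make that idea concrete, the sort of stress loop I have in mind is
sketched below.  This is hypothetical and untested: the target
address, port, and iteration count are all placeholders.  It would be
run inside the namespace, e.g. via 'ip netns exec':

/* conn-stress.c: open and close many outbound TCP connections. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        struct sockaddr_in sa;
        int i, fd;

        memset(&sa, 0, sizeof sa);
        sa.sin_family = AF_INET;
        sa.sin_port = htons(80);                        /* placeholder */
        inet_pton(AF_INET, "192.0.2.1", &sa.sin_addr);  /* placeholder */

        for (i = 0; i < 100000; i++) {
                fd = socket(AF_INET, SOCK_STREAM, 0);
                if (fd < 0) {
                        perror("socket");
                        return 1;
                }
                /* Each connect/close cycle takes and drops references
                 * on the namespace's devices and routes. */
                if (connect(fd, (struct sockaddr *)&sa, sizeof sa) == 0)
                        (void)write(fd, "x", 1);
                close(fd);
        }
        return 0;
}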

zw

