From mboxrd@z Thu Jan  1 00:00:00 1970
From: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Subject: Re: cpuidle: Kernel panics with AMD Opteron 6300 entering C2 - clock
 related
Date: Thu, 18 Jun 2015 13:21:44 +0200
Message-ID: <5582A9C8.8050200@profitbricks.com>
References: <55814696.1050803@profitbricks.com> <55828DBD.5000109@linaro.org> <5582A2DC.7060001@profitbricks.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
In-Reply-To: <5582A2DC.7060001@profitbricks.com>
Sender: linux-pm-owner@vger.kernel.org
List-Id: linux-pm@vger.kernel.org
To: Daniel Lezcano <daniel.lezcano@linaro.org>, "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org

On 18.06.2015 12:52, Sebastian Parschauer wrote:
> On 18.06.2015 11:22, Daniel Lezcano wrote:
>> On 06/17/2015 12:06 PM, Sebastian Parschauer wrote:
>>> Hi cpuidle maintainers,
>>>
>>> we are seeing kernel panics with CPUs from the AMD Opteron 6300 series
>>> and kernel 3.12 when entering C2. In that C-state the clock is shut
>>> down, but the flag CPUIDLE_FLAG_TIMER_STOP isn't set. We use the TSC
>>> clock source for performance, as our servers host KVM VMs. At the time
>>> of the panics, interrupts have just been re-enabled and the timer
>>> interrupt corrupts the instruction pointer and/or the stack pointer.
>>>
>>> Would it help to set the flag CPUIDLE_FLAG_TIMER_STOP for C2?
>>> If not, how should we fix this?
>>
>> Did you try the flag? Does it fix it?
> 
> Thanks for your reply. Unfortunately, we can't roll out new kernels
> quickly (VMs have to be migrated). However, we have disabled the C2
> state via sysfs on all CPU cores of all servers, and we still had one
> more kernel panic with the same call trace although C2 was (or should
> have been) disabled. We use the menu governor and a v3.12.40 kernel.
> 
> It's strange to me that we end up in the same code path with state
> index 2 as the parameter again. I think I'll prepare a kernel that prints
> debug messages on each state transition and deploy it to a test system.
> 
> Is there any better method to debug the cpuidle driver?
> 
> How do you guys test it?
> 
> Is there any additional information we can provide?
> 
> Maybe something else corrupts memory in an interrupt handler, and the
> cpuidle driver just happens to be the one noticing an unrelated problem.

Sorry, I took a closer look at the most recent crash. It happened when
entering C1, with C2 disabled. So maybe our problem is not cpuidle
related after all.

> 
>>> ==========
>>> Additional debug info:
>>>
>>> BUG: unable to handle kernel NULL pointer dereference at           (null)
>>> IP: [<          (null)>]           (null)
>>> ...
>>> Call trace:
>>> [<ffffffff815af9b5>] cpuidle_idle_call+0xc5/0x150
>>> [<ffffffff8100b529>] arch_cpu_idle+0x9/0x20
>>> [<ffffffff81092e6f>] cpu_startup_entry+0xaf/0x240
>>> [<ffffffff8102df4b>] start_secondary+0x1db/0x240
>>>
>>> The CPUs provide three C-states:
>>> 0: POLL
>>> 1: C1
>>> 2: C2
>>>
>>> C2 information from the crash dump:
>>>
>>>> {
>>>>        name = "C2\000\000\000\000\000\000\000\000\000\000\000\000\000",
>>>>        desc = "ACPI IOPORT 0x815\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
>>>>        flags = 1,
>>>>        exit_latency = 100,
>>>>        power_usage = 0,
>>>>        target_residency = 200,
>>>>        disabled = false,
>>>>        enter = 0xffffffffa00ab026 <acpi_idle_enter_simple>,
>>>>        enter_dead = 0xffffffffa00aa39c <acpi_idle_play_dead>
>>>> }
>>>
>>> Assembly level analysis:
>>>
>>>> RDX: 0000000225c17d03
>>>
>>> So the upper 32 bits of RDX are 00000002, and that's the entered state C2.
>>>
>>>> RDI: ffffffff81c15540
>>>> ..
>>>> crash> info symbol 0xffffffff81c15540
>>>> clocksource_tsc in section .data
>>>>
>>>> crash> disassemble cpuidle_enter_state
>>>> ...
>>>>     0xffffffff815af5fc <+60>:    callq  0xffffffff8109b360 <ktime_get>
>>>>     0xffffffff815af601 <+65>:    sti
>>>>     0xffffffff815af602 <+66>:    sub    %r13,%rax <- here rdi still points to clocksource_tsc
>>>>     0xffffffff815af605 <+69>:    mov    %rax,%rdi <- rdi is overwritten with the ktime_get() return value (minus %r13)
>>
>>
>