Linux-Watchdog Archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] TCO Watchdog warning interrupt driver creation
       [not found] <B9C02DB17496AC4197F41A146D371B9B075AF9DE@hasmsx107.ger.corp.intel.com>
@ 2014-12-11  4:04 ` Darren Hart
  2014-12-11 14:30   ` Guenter Roeck
  0 siblings, 1 reply; 18+ messages in thread
From: Darren Hart @ 2014-12-11  4:04 UTC (permalink / raw
  To: Muller, Francois-nicolas
  Cc: 'platform-driver-x86@vger.kernel.org', Rafael Wysocki,
	Linux ACPI Mailing List, linux-watchdog, Wim Van Sebroeck

On Fri, Dec 05, 2014 at 10:08:45AM +0000, Muller, Francois-nicolas wrote:
> This driver provides support for TCO watchdog warning interrupt.
> 

Hi Francois,

Cc: Rafael and linux-acpi per his request on ACPI drivers (even if they are for
platform/drivers/x86).

> This feature is useful to root cause watchdog expiration.

Would this not be better suited as a debug option to the iTCO_wdt.c driver under
drivers/watchdog?

Cc: Wim Van Sebroeck and linux-watchdog

> Upon first expiration of the TCO watchdog, a warning interrupt is fired
> to this driver, which calls panic() function and dumps debug information
> (registers and call stacks).

Nit: Newlines between paragraphs for legibility here and in comments please.

> This implies TCO watchdog driver has enabled the interrupt (SCI) and ACPI
> tables contain GPE mapping information.
> After the interrupt has been fired, TCO watchdog reloads automatically and
> upon second expiration it trigs a reset of the platform.

triggers

> 
> Change-Id: I39b615b59dd4336bf208454f08b3e9eac9eb2880

Please run through checkpatch.pl (Gerrit change-id's should be removed).

> Signed-off-by: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
> ---
> drivers/platform/x86/Kconfig          |   13 +++++
> drivers/platform/x86/Makefile         |    1 +
> drivers/platform/x86/intel_warn_int.c |   98 +++++++++++++++++++++++++++++++++
> 3 files changed, 112 insertions(+)
> create mode 100644 drivers/platform/x86/intel_warn_int.c
> 
> diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
> index 5ae65c1..79cba16 100644
> --- a/drivers/platform/x86/Kconfig
> +++ b/drivers/platform/x86/Kconfig
> @@ -827,4 +827,17 @@ config INTEL_BAYTRAIL_MBI
>                  Interface. This is a requirement for systems that need to configure
>                  the PUNIT for power management features such as RAPL.
> +config INTEL_WARN_INT
> +             tristate "TCO Watchdog warning interrupt"
> +             depends on ITCO_WDT
> +             ---help---
> +               This driver provides support for TCO watchdog warning interrupt.
> +               Upon first expiration of the TCO watchdog, a warning interrupt is
> +               fired and the driver calls panic() function to dump debug information

s/function//

> +               (registers and call stacks).

newline

> +               At the same time, the TCO watchdog reloads with 2.4 seconds timeout
> +               value and runs till the second expiration. At the second expiration of

s/till/until/

> +               the TCO watchdog, the platform resets (the dump is supposed to last less
> +               than 2.4 seconds).
> +
> endif # X86_PLATFORM_DEVICES
> diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile
> index 32caae3..2d47b4d 100644
> --- a/drivers/platform/x86/Makefile
> +++ b/drivers/platform/x86/Makefile
> @@ -58,3 +58,4 @@ obj-$(CONFIG_PVPANIC)                               += pvpanic.o
> obj-$(CONFIG_INTEL_BAYTRAIL_MBI)                += intel_baytrail.o
> obj-$(CONFIG_INTEL_SOC_PMIC)        += dc_ti_cc.o
> obj-$(CONFIG_ACPI)                   += intel_em_config.o
> +obj-$(CONFIG_INTEL_WARN_INT)      += intel_warn_int.o
> diff --git a/drivers/platform/x86/intel_warn_int.c b/drivers/platform/x86/intel_warn_int.c
> new file mode 100644
> index 0000000..7ec8b73
> --- /dev/null
> +++ b/drivers/platform/x86/intel_warn_int.c
> @@ -0,0 +1,98 @@
> +/*
> + * intel_warn_int.c - This driver provides support for TCO watchdog warning
> + * interrupt.

Newlines between paragraphs in this section too.

> + * This feature is useful to root cause watchdog expiration.
> + * Upon first expiration of the TCO watchdog, a warning interrupt is fired
> + * to this driver, which calls panic() function and dumps debug information

s/function//

> + * (registers and call stacks).
> + * This implies TCO watchdog driver has enabled the interrupt (SCI) and ACPI

the ACPI...

> + * tables contain GPE mapping information.

the GPE...

> + * After the interrupt has been fired, TCO watchdog reloads automatically and
> + * upon second expiration it trigs a reset of the platform.

triggers

> + * Copyright (c) 2014, Intel Corporation.

All rights reserved.

> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *

extra * line, can remove.

> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/errno.h>
> +#include <linux/kernel.h>
> +#include <linux/init.h>
> +#include <linux/interrupt.h>
> +#include <linux/acpi.h>
> +#include <linux/nmi.h>
> +#include <acpi/actypes.h>
> +
> +#define DRV_NAME "warn_int"
> +#define TCO_CLASS DRV_NAME
> +
> +static const struct acpi_device_id tco_device_ids[] = {
> +             {"8086229C", 0},

What is this device ID? Is it specific to a debug ID for the WDT? If not, will
this eventually conflict with a non-debug driver for this WDT?

> +             {"", 0},
> +};
> +MODULE_DEVICE_TABLE(acpi, tco_device_ids);
> +
> +static u32 warn_irq_handler(acpi_handle gpe_device, u32 gpe, void *context)
> +{
> +             pr_warn("[SHTDWN] %s, WATCHDOG TIMEOUT HANDLER!\n", __func__);

Indenting with 13 spaces? Please review Documentation/CodingStyle and update the
patch. Rather than point out all these sorts of errors, I'm goign to stop here -
please read the doc and correct throughout.

> +
> +             trigger_all_cpu_backtrace();
> +             panic("Kernel Watchdog");
> +
> +             /* This code should not be reached */
> +             return IRQ_HANDLED;
> +}
> +
> +static int acpi_tco_add(struct acpi_device *device)
> +{
> +             acpi_status status;
> +             unsigned long long tco_gpe;

Declare variables in decreasing line length order.

> +
> +             status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &tco_gpe);
> +             if (ACPI_FAILURE(status))
> +                             return -EINVAL;
> +
> +             status = acpi_install_gpe_handler(NULL, tco_gpe,
> +                                                                              ACPI_GPE_EDGE_TRIGGERED,
> +                                                                              warn_irq_handler, NULL);
> +             if (ACPI_FAILURE(status))
> +                             return -ENODEV;
> +
> +             acpi_enable_gpe(NULL, tco_gpe);
> +
> +             pr_info("initialized. Interrupt=SCI GPE 0x%02llx", tco_gpe);
> +             return 0;
> +}
> +
> +static struct acpi_driver tco_driver = {
> +             .name = "warn_int",
> +             .class = TCO_CLASS,
> +             .ids = tco_device_ids,
> +             .ops = {
> +                             .add = acpi_tco_add,
> +             },
> +};
> +
> +static int __init warn_int_init(void)
> +{
> +             return acpi_bus_register_driver(&tco_driver);
> +}
> +
> +module_init(warn_int_init);
> +/* no module_exit, this driver shouldn't be unloaded */
> +
> +MODULE_AUTHOR("Francois-Nicolas Muller <francois-nicolas.muller@intel.com>");
> +MODULE_DESCRIPTION("Intel TCO WatchDog Warning Interrupt");
> +MODULE_LICENSE("GPL");
> +MODULE_ALIAS("platform:" DRV_NAME);
> --
> 1.7.9.5
> 

-- 
Darren Hart
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] TCO Watchdog warning interrupt driver creation
  2014-12-11  4:04 ` [PATCH] TCO Watchdog warning interrupt driver creation Darren Hart
@ 2014-12-11 14:30   ` Guenter Roeck
  2014-12-22 13:18     ` Muller, Francois-nicolas
  0 siblings, 1 reply; 18+ messages in thread
From: Guenter Roeck @ 2014-12-11 14:30 UTC (permalink / raw
  To: Darren Hart, Muller, Francois-nicolas
  Cc: 'platform-driver-x86@vger.kernel.org', Rafael Wysocki,
	Linux ACPI Mailing List, linux-watchdog, Wim Van Sebroeck

On 12/10/2014 08:04 PM, Darren Hart wrote:
> On Fri, Dec 05, 2014 at 10:08:45AM +0000, Muller, Francois-nicolas wrote:
>> This driver provides support for TCO watchdog warning interrupt.
>>
>
> Hi Francois,
>
> Cc: Rafael and linux-acpi per his request on ACPI drivers (even if they are for
> platform/drivers/x86).
>
>> This feature is useful to root cause watchdog expiration.
>
> Would this not be better suited as a debug option to the iTCO_wdt.c driver under
> drivers/watchdog?
>

I agree, this should be in the watchdog driver.

Normally a watchdog is expected to reset the system if it fires. This is its
"normal" operation. Correct, some watchdogs create an NMI or some other interrupt
before this happens, as a last warning to give the system a chance to do
something about it.

Catching this interrupt and dumping a message maybe a good idea, but letting
the system panic as response seems to be excessive and may even be
counter-productive. Maybe there should be a run-time option (eg a boot
parameter), in addition or instead of the Kconfig entry, to enable it,
if it is considered useful for debugging.

I find the headline in the Kconfig entry a bit misleading. It states "support
interrupt", while what it really does is to crash the system if that interrupt
actually happens. I don't usually call that "support".

Thanks,
Guenter

> Cc: Wim Van Sebroeck and linux-watchdog
>
>> Upon first expiration of the TCO watchdog, a warning interrupt is fired
>> to this driver, which calls panic() function and dumps debug information
>> (registers and call stacks).
>
> Nit: Newlines between paragraphs for legibility here and in comments please.
>
>> This implies TCO watchdog driver has enabled the interrupt (SCI) and ACPI
>> tables contain GPE mapping information.
>> After the interrupt has been fired, TCO watchdog reloads automatically and
>> upon second expiration it trigs a reset of the platform.
>
> triggers
>
>>
>> Change-Id: I39b615b59dd4336bf208454f08b3e9eac9eb2880
>
> Please run through checkpatch.pl (Gerrit change-id's should be removed).
>
>> Signed-off-by: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
>> ---
>> drivers/platform/x86/Kconfig          |   13 +++++
>> drivers/platform/x86/Makefile         |    1 +
>> drivers/platform/x86/intel_warn_int.c |   98 +++++++++++++++++++++++++++++++++
>> 3 files changed, 112 insertions(+)
>> create mode 100644 drivers/platform/x86/intel_warn_int.c
>>
>> diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
>> index 5ae65c1..79cba16 100644
>> --- a/drivers/platform/x86/Kconfig
>> +++ b/drivers/platform/x86/Kconfig
>> @@ -827,4 +827,17 @@ config INTEL_BAYTRAIL_MBI
>>                   Interface. This is a requirement for systems that need to configure
>>                   the PUNIT for power management features such as RAPL.
>> +config INTEL_WARN_INT
>> +             tristate "TCO Watchdog warning interrupt"
>> +             depends on ITCO_WDT
>> +             ---help---
>> +               This driver provides support for TCO watchdog warning interrupt.
>> +               Upon first expiration of the TCO watchdog, a warning interrupt is
>> +               fired and the driver calls panic() function to dump debug information
>
> s/function//
>
>> +               (registers and call stacks).
>
> newline
>
>> +               At the same time, the TCO watchdog reloads with 2.4 seconds timeout
>> +               value and runs till the second expiration. At the second expiration of
>
> s/till/until/
>
>> +               the TCO watchdog, the platform resets (the dump is supposed to last less
>> +               than 2.4 seconds).
>> +
>> endif # X86_PLATFORM_DEVICES
>> diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile
>> index 32caae3..2d47b4d 100644
>> --- a/drivers/platform/x86/Makefile
>> +++ b/drivers/platform/x86/Makefile
>> @@ -58,3 +58,4 @@ obj-$(CONFIG_PVPANIC)                               += pvpanic.o
>> obj-$(CONFIG_INTEL_BAYTRAIL_MBI)                += intel_baytrail.o
>> obj-$(CONFIG_INTEL_SOC_PMIC)        += dc_ti_cc.o
>> obj-$(CONFIG_ACPI)                   += intel_em_config.o
>> +obj-$(CONFIG_INTEL_WARN_INT)      += intel_warn_int.o
>> diff --git a/drivers/platform/x86/intel_warn_int.c b/drivers/platform/x86/intel_warn_int.c
>> new file mode 100644
>> index 0000000..7ec8b73
>> --- /dev/null
>> +++ b/drivers/platform/x86/intel_warn_int.c
>> @@ -0,0 +1,98 @@
>> +/*
>> + * intel_warn_int.c - This driver provides support for TCO watchdog warning
>> + * interrupt.
>
> Newlines between paragraphs in this section too.
>
>> + * This feature is useful to root cause watchdog expiration.
>> + * Upon first expiration of the TCO watchdog, a warning interrupt is fired
>> + * to this driver, which calls panic() function and dumps debug information
>
> s/function//
>
>> + * (registers and call stacks).
>> + * This implies TCO watchdog driver has enabled the interrupt (SCI) and ACPI
>
> the ACPI...
>
>> + * tables contain GPE mapping information.
>
> the GPE...
>
>> + * After the interrupt has been fired, TCO watchdog reloads automatically and
>> + * upon second expiration it trigs a reset of the platform.
>
> triggers
>
>> + * Copyright (c) 2014, Intel Corporation.
>
> All rights reserved.
>
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>
> extra * line, can remove.
>
>> + */
>> +
>> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>> +
>> +#include <linux/module.h>
>> +#include <linux/types.h>
>> +#include <linux/errno.h>
>> +#include <linux/kernel.h>
>> +#include <linux/init.h>
>> +#include <linux/interrupt.h>
>> +#include <linux/acpi.h>
>> +#include <linux/nmi.h>
>> +#include <acpi/actypes.h>
>> +
>> +#define DRV_NAME "warn_int"
>> +#define TCO_CLASS DRV_NAME
>> +
>> +static const struct acpi_device_id tco_device_ids[] = {
>> +             {"8086229C", 0},
>
> What is this device ID? Is it specific to a debug ID for the WDT? If not, will
> this eventually conflict with a non-debug driver for this WDT?
>
>> +             {"", 0},
>> +};
>> +MODULE_DEVICE_TABLE(acpi, tco_device_ids);
>> +
>> +static u32 warn_irq_handler(acpi_handle gpe_device, u32 gpe, void *context)
>> +{
>> +             pr_warn("[SHTDWN] %s, WATCHDOG TIMEOUT HANDLER!\n", __func__);
>
> Indenting with 13 spaces? Please review Documentation/CodingStyle and update the
> patch. Rather than point out all these sorts of errors, I'm goign to stop here -
> please read the doc and correct throughout.
>
>> +
>> +             trigger_all_cpu_backtrace();
>> +             panic("Kernel Watchdog");
>> +
>> +             /* This code should not be reached */
>> +             return IRQ_HANDLED;
>> +}
>> +
>> +static int acpi_tco_add(struct acpi_device *device)
>> +{
>> +             acpi_status status;
>> +             unsigned long long tco_gpe;
>
> Declare variables in decreasing line length order.
>
>> +
>> +             status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &tco_gpe);
>> +             if (ACPI_FAILURE(status))
>> +                             return -EINVAL;
>> +
>> +             status = acpi_install_gpe_handler(NULL, tco_gpe,
>> +                                                                              ACPI_GPE_EDGE_TRIGGERED,
>> +                                                                              warn_irq_handler, NULL);
>> +             if (ACPI_FAILURE(status))
>> +                             return -ENODEV;
>> +
>> +             acpi_enable_gpe(NULL, tco_gpe);
>> +
>> +             pr_info("initialized. Interrupt=SCI GPE 0x%02llx", tco_gpe);
>> +             return 0;
>> +}
>> +
>> +static struct acpi_driver tco_driver = {
>> +             .name = "warn_int",
>> +             .class = TCO_CLASS,
>> +             .ids = tco_device_ids,
>> +             .ops = {
>> +                             .add = acpi_tco_add,
>> +             },
>> +};
>> +
>> +static int __init warn_int_init(void)
>> +{
>> +             return acpi_bus_register_driver(&tco_driver);
>> +}
>> +
>> +module_init(warn_int_init);
>> +/* no module_exit, this driver shouldn't be unloaded */
>> +
>> +MODULE_AUTHOR("Francois-Nicolas Muller <francois-nicolas.muller@intel.com>");
>> +MODULE_DESCRIPTION("Intel TCO WatchDog Warning Interrupt");
>> +MODULE_LICENSE("GPL");
>> +MODULE_ALIAS("platform:" DRV_NAME);
>> --
>> 1.7.9.5
>>
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH] TCO Watchdog warning interrupt driver creation
  2014-12-11 14:30   ` Guenter Roeck
@ 2014-12-22 13:18     ` Muller, Francois-nicolas
  2014-12-22 15:41       ` Darren Hart
  0 siblings, 1 reply; 18+ messages in thread
From: Muller, Francois-nicolas @ 2014-12-22 13:18 UTC (permalink / raw
  To: Guenter Roeck, Darren Hart
  Cc: 'platform-driver-x86@vger.kernel.org', Rafael Wysocki,
	Linux ACPI Mailing List, linux-watchdog@vger.kernel.org,
	Wim Van Sebroeck

Thanks Darren and Guenter for your review.
Here is a new patchset, including the feature in the watchdog driver and using a boot parameter to be enabled.
François-Nicolas

Subject: [PATCH] TCO watchdog warning interrupt handler

Signed-off-by: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
---
 drivers/watchdog/iTCO_wdt.c |   66 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
index e802a54..1ce9ad8 100644
--- a/drivers/watchdog/iTCO_wdt.c
+++ b/drivers/watchdog/iTCO_wdt.c
@@ -49,6 +49,8 @@
 /* Module and version information */
 #define DRV_NAME	"iTCO_wdt"
 #define DRV_VERSION	"1.11"
+#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
+#define TCO_CLASS	DRV_NAME
 
 /* Includes */
 #include <linux/module.h>		/* For module specific items */
@@ -68,6 +70,9 @@
 #include <linux/pm.h>			/* For suspend/resume */
 #include <linux/mfd/core.h>
 #include <linux/mfd/lpc_ich.h>
+#include <linux/nmi.h>
+#include <linux/acpi.h>
+#include <acpi/actypes.h>
 
 #include "iTCO_vendor.h"
 
@@ -107,6 +112,12 @@ static struct {		/* this is private data for the iTCO_wdt device */
 	bool started;
 } iTCO_wdt_private;
 
+static const struct acpi_device_id iTCO_wdt_ids[] = {
+	{"8086229C", 0},
+	{"", 0},
+};
+MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
+
 /* module parameters */
 #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
 static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */
@@ -126,6 +137,11 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);
 MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
 	"Turn off SMI clearing watchdog (depends on TCO-version)(default=1)");
 
+static bool warn_irq;
+module_param(warn_irq, bool, 0);
+MODULE_PARM_DESC(warn_irq,
+	"Watchdog trigs a panic at first expiration (default=0)");
+
 /*
  * Some TCO specific functions
  */
@@ -200,6 +216,37 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
 	return ret; /* returns: 0 = OK, -EIO = Error */
 }
 
+static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void *context)
+{
+	pr_warn("[SHTDWN] %s, WATCHDOG TIMEOUT HANDLER!\n", __func__);
+
+	trigger_all_cpu_backtrace();
+	panic("Kernel Watchdog");
+
+	/* This code should not be reached */
+	return IRQ_HANDLED;
+}
+
+static int iTCO_wdt_acpi_add(struct acpi_device *device)
+{
+	unsigned long long gpe;
+	acpi_status status;
+
+	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
+	if (ACPI_FAILURE(status))
+		return -EINVAL;
+
+	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
+					  iTCO_wdt_wirq, NULL);
+	if (ACPI_FAILURE(status))
+		return -ENODEV;
+
+	acpi_enable_gpe(NULL, gpe);
+
+	pr_info("interrupt=SCI GPE=0x%02llx", gpe);
+	return 0;
+}
+
 static int iTCO_wdt_start(struct watchdog_device *wd_dev)
 {
 	unsigned int val;
@@ -628,6 +675,15 @@ static struct platform_driver iTCO_wdt_driver = {
 	},
 };
 
+static struct acpi_driver iTCO_wdt_acpi_driver = {
+	.name = DRV_NAME_ACPI,
+	.class = TCO_CLASS,
+	.ids = iTCO_wdt_ids,
+	.ops = {
+		.add = iTCO_wdt_acpi_add,
+	},
+};
+
 static int __init iTCO_wdt_init_module(void)
 {
 	int err;
@@ -638,12 +694,22 @@ static int __init iTCO_wdt_init_module(void)
 	if (err)
 		return err;
 
+	if (warn_irq) {
+		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
+		if (err) {
+			platform_driver_unregister(&iTCO_wdt_driver);
+			return err;
+		}
+	}
+
 	return 0;
 }
 
 static void __exit iTCO_wdt_cleanup_module(void)
 {
 	platform_driver_unregister(&iTCO_wdt_driver);
+	if (warn_irq)
+		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
 	pr_info("Watchdog Module Unloaded\n");
 }
 
-- 
1.7.9.5



-----Original Message-----
From: Guenter Roeck [mailto:linux@roeck-us.net] 
Sent: Thursday, December 11, 2014 3:30 PM
To: Darren Hart; Muller, Francois-nicolas
Cc: 'platform-driver-x86@vger.kernel.org'; Rafael Wysocki; Linux ACPI Mailing List; linux-watchdog@vger.kernel.org; Wim Van Sebroeck
Subject: Re: [PATCH] TCO Watchdog warning interrupt driver creation

On 12/10/2014 08:04 PM, Darren Hart wrote:
> On Fri, Dec 05, 2014 at 10:08:45AM +0000, Muller, Francois-nicolas wrote:
>> This driver provides support for TCO watchdog warning interrupt.
>>
>
> Hi Francois,
>
> Cc: Rafael and linux-acpi per his request on ACPI drivers (even if 
> they are for platform/drivers/x86).
>
>> This feature is useful to root cause watchdog expiration.
>
> Would this not be better suited as a debug option to the iTCO_wdt.c 
> driver under drivers/watchdog?
>

I agree, this should be in the watchdog driver.

Normally a watchdog is expected to reset the system if it fires. This is its "normal" operation. Correct, some watchdogs create an NMI or some other interrupt before this happens, as a last warning to give the system a chance to do something about it.

Catching this interrupt and dumping a message maybe a good idea, but letting the system panic as response seems to be excessive and may even be counter-productive. Maybe there should be a run-time option (eg a boot parameter), in addition or instead of the Kconfig entry, to enable it, if it is considered useful for debugging.

I find the headline in the Kconfig entry a bit misleading. It states "support interrupt", while what it really does is to crash the system if that interrupt actually happens. I don't usually call that "support".

Thanks,
Guenter

> Cc: Wim Van Sebroeck and linux-watchdog
>
>> Upon first expiration of the TCO watchdog, a warning interrupt is 
>> fired to this driver, which calls panic() function and dumps debug 
>> information (registers and call stacks).
>
> Nit: Newlines between paragraphs for legibility here and in comments please.
>
>> This implies TCO watchdog driver has enabled the interrupt (SCI) and 
>> ACPI tables contain GPE mapping information.
>> After the interrupt has been fired, TCO watchdog reloads 
>> automatically and upon second expiration it trigs a reset of the platform.
>
> triggers
>
>>
>> Change-Id: I39b615b59dd4336bf208454f08b3e9eac9eb2880
>
> Please run through checkpatch.pl (Gerrit change-id's should be removed).
>
>> Signed-off-by: Francois-Nicolas Muller 
>> <francois-nicolas.muller@intel.com>
>> ---
>> drivers/platform/x86/Kconfig          |   13 +++++
>> drivers/platform/x86/Makefile         |    1 +
>> drivers/platform/x86/intel_warn_int.c |   98 +++++++++++++++++++++++++++++++++
>> 3 files changed, 112 insertions(+)
>> create mode 100644 drivers/platform/x86/intel_warn_int.c
>>
>> diff --git a/drivers/platform/x86/Kconfig 
>> b/drivers/platform/x86/Kconfig index 5ae65c1..79cba16 100644
>> --- a/drivers/platform/x86/Kconfig
>> +++ b/drivers/platform/x86/Kconfig
>> @@ -827,4 +827,17 @@ config INTEL_BAYTRAIL_MBI
>>                   Interface. This is a requirement for systems that need to configure
>>                   the PUNIT for power management features such as RAPL.
>> +config INTEL_WARN_INT
>> +             tristate "TCO Watchdog warning interrupt"
>> +             depends on ITCO_WDT
>> +             ---help---
>> +               This driver provides support for TCO watchdog warning interrupt.
>> +               Upon first expiration of the TCO watchdog, a warning interrupt is
>> +               fired and the driver calls panic() function to dump 
>> +debug information
>
> s/function//
>
>> +               (registers and call stacks).
>
> newline
>
>> +               At the same time, the TCO watchdog reloads with 2.4 seconds timeout
>> +               value and runs till the second expiration. At the 
>> + second expiration of
>
> s/till/until/
>
>> +               the TCO watchdog, the platform resets (the dump is supposed to last less
>> +               than 2.4 seconds).
>> +
>> endif # X86_PLATFORM_DEVICES
>> diff --git a/drivers/platform/x86/Makefile 
>> b/drivers/platform/x86/Makefile index 32caae3..2d47b4d 100644
>> --- a/drivers/platform/x86/Makefile
>> +++ b/drivers/platform/x86/Makefile
>> @@ -58,3 +58,4 @@ obj-$(CONFIG_PVPANIC)                               += pvpanic.o
>> obj-$(CONFIG_INTEL_BAYTRAIL_MBI)                += intel_baytrail.o
>> obj-$(CONFIG_INTEL_SOC_PMIC)        += dc_ti_cc.o
>> obj-$(CONFIG_ACPI)                   += intel_em_config.o
>> +obj-$(CONFIG_INTEL_WARN_INT)      += intel_warn_int.o
>> diff --git a/drivers/platform/x86/intel_warn_int.c 
>> b/drivers/platform/x86/intel_warn_int.c
>> new file mode 100644
>> index 0000000..7ec8b73
>> --- /dev/null
>> +++ b/drivers/platform/x86/intel_warn_int.c
>> @@ -0,0 +1,98 @@
>> +/*
>> + * intel_warn_int.c - This driver provides support for TCO watchdog 
>> +warning
>> + * interrupt.
>
> Newlines between paragraphs in this section too.
>
>> + * This feature is useful to root cause watchdog expiration.
>> + * Upon first expiration of the TCO watchdog, a warning interrupt is 
>> + fired
>> + * to this driver, which calls panic() function and dumps debug 
>> + information
>
> s/function//
>
>> + * (registers and call stacks).
>> + * This implies TCO watchdog driver has enabled the interrupt (SCI) 
>> + and ACPI
>
> the ACPI...
>
>> + * tables contain GPE mapping information.
>
> the GPE...
>
>> + * After the interrupt has been fired, TCO watchdog reloads 
>> + automatically and
>> + * upon second expiration it trigs a reset of the platform.
>
> triggers
>
>> + * Copyright (c) 2014, Intel Corporation.
>
> All rights reserved.
>
>> + *
>> + * This program is free software; you can redistribute it and/or 
>> + modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but 
>> + WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of 
>> + MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public 
>> + License for
>> + * more details.
>> + *
>
> extra * line, can remove.
>
>> + */
>> +
>> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>> +
>> +#include <linux/module.h>
>> +#include <linux/types.h>
>> +#include <linux/errno.h>
>> +#include <linux/kernel.h>
>> +#include <linux/init.h>
>> +#include <linux/interrupt.h>
>> +#include <linux/acpi.h>
>> +#include <linux/nmi.h>
>> +#include <acpi/actypes.h>
>> +
>> +#define DRV_NAME "warn_int"
>> +#define TCO_CLASS DRV_NAME
>> +
>> +static const struct acpi_device_id tco_device_ids[] = {
>> +             {"8086229C", 0},
>
> What is this device ID? Is it specific to a debug ID for the WDT? If 
> not, will this eventually conflict with a non-debug driver for this WDT?
>
>> +             {"", 0},
>> +};
>> +MODULE_DEVICE_TABLE(acpi, tco_device_ids);
>> +
>> +static u32 warn_irq_handler(acpi_handle gpe_device, u32 gpe, void 
>> +*context) {
>> +             pr_warn("[SHTDWN] %s, WATCHDOG TIMEOUT HANDLER!\n", 
>> +__func__);
>
> Indenting with 13 spaces? Please review Documentation/CodingStyle and 
> update the patch. Rather than point out all these sorts of errors, I'm 
> goign to stop here - please read the doc and correct throughout.
>
>> +
>> +             trigger_all_cpu_backtrace();
>> +             panic("Kernel Watchdog");
>> +
>> +             /* This code should not be reached */
>> +             return IRQ_HANDLED;
>> +}
>> +
>> +static int acpi_tco_add(struct acpi_device *device) {
>> +             acpi_status status;
>> +             unsigned long long tco_gpe;
>
> Declare variables in decreasing line length order.
>
>> +
>> +             status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &tco_gpe);
>> +             if (ACPI_FAILURE(status))
>> +                             return -EINVAL;
>> +
>> +             status = acpi_install_gpe_handler(NULL, tco_gpe,
>> +                                                                              ACPI_GPE_EDGE_TRIGGERED,
>> +                                                                              warn_irq_handler, NULL);
>> +             if (ACPI_FAILURE(status))
>> +                             return -ENODEV;
>> +
>> +             acpi_enable_gpe(NULL, tco_gpe);
>> +
>> +             pr_info("initialized. Interrupt=SCI GPE 0x%02llx", tco_gpe);
>> +             return 0;
>> +}
>> +
>> +static struct acpi_driver tco_driver = {
>> +             .name = "warn_int",
>> +             .class = TCO_CLASS,
>> +             .ids = tco_device_ids,
>> +             .ops = {
>> +                             .add = acpi_tco_add,
>> +             },
>> +};
>> +
>> +static int __init warn_int_init(void) {
>> +             return acpi_bus_register_driver(&tco_driver);
>> +}
>> +
>> +module_init(warn_int_init);
>> +/* no module_exit, this driver shouldn't be unloaded */
>> +
>> +MODULE_AUTHOR("Francois-Nicolas Muller 
>> +<francois-nicolas.muller@intel.com>");
>> +MODULE_DESCRIPTION("Intel TCO WatchDog Warning Interrupt"); 
>> +MODULE_LICENSE("GPL"); MODULE_ALIAS("platform:" DRV_NAME);
>> --
>> 1.7.9.5
>>
>


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* RE: [PATCH] TCO Watchdog warning interrupt driver creation
  2014-12-22 13:18     ` Muller, Francois-nicolas
@ 2014-12-22 15:41       ` Darren Hart
  2014-12-22 16:07         ` Muller, Francois-nicolas
  0 siblings, 1 reply; 18+ messages in thread
From: Darren Hart @ 2014-12-22 15:41 UTC (permalink / raw
  To: Muller, Francois-nicolas, Guenter Roeck
  Cc: 'platform-driver-x86@vger.kernel.org', Rafael Wysocki,
	Linux ACPI Mailing List, linux-watchdog@vger.kernel.org,
	Wim Van Sebroeck



On December 22, 2014 5:18:33 AM PST, "Muller, Francois-nicolas" <francois-nicolas.muller@intel.com> wrote:
>Thanks Darren and Guenter for your review.
>Here is a new patchset, including the feature in the watchdog driver
>and using a boot parameter to be enabled.
>François-Nicolas
>
>Subject: [PATCH] TCO watchdog warning interrupt handler
>

Please provide a complete commit message.

>Signed-off-by: Francois-Nicolas Muller
><francois-nicolas.muller@intel.com>

-- 
Darren Hart
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH] TCO Watchdog warning interrupt driver creation
  2014-12-22 15:41       ` Darren Hart
@ 2014-12-22 16:07         ` Muller, Francois-nicolas
  2015-01-09  5:09           ` Guenter Roeck
  0 siblings, 1 reply; 18+ messages in thread
From: Muller, Francois-nicolas @ 2014-12-22 16:07 UTC (permalink / raw
  To: Darren Hart, Guenter Roeck
  Cc: 'platform-driver-x86@vger.kernel.org', Rafael Wysocki,
	Linux ACPI Mailing List, linux-watchdog@vger.kernel.org,
	Wim Van Sebroeck

Subject: [PATCH] TCO watchdog warning interrupt handler

Adding TCO watchdog warning interrupt handling.

This feature is useful to root cause watchdog expiration.
It is activated by boot parameter 'warn_irq' (disabled by default).

Upon first expiration of the TCO watchdog, a warning interrupt is
fired and the driver calls panic() to dump debug information (registers and
call stacks).

At the same time, the TCO watchdog reloads with 2.4 seconds timeout
value and runs until the second expiration.

Signed-off-by: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
---
 drivers/watchdog/iTCO_wdt.c |   66 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c index e802a54..1ce9ad8 100644
--- a/drivers/watchdog/iTCO_wdt.c
+++ b/drivers/watchdog/iTCO_wdt.c
@@ -49,6 +49,8 @@
 /* Module and version information */
 #define DRV_NAME	"iTCO_wdt"
 #define DRV_VERSION	"1.11"
+#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
+#define TCO_CLASS	DRV_NAME
 
 /* Includes */
 #include <linux/module.h>		/* For module specific items */
@@ -68,6 +70,9 @@
 #include <linux/pm.h>			/* For suspend/resume */
 #include <linux/mfd/core.h>
 #include <linux/mfd/lpc_ich.h>
+#include <linux/nmi.h>
+#include <linux/acpi.h>
+#include <acpi/actypes.h>
 
 #include "iTCO_vendor.h"
 
@@ -107,6 +112,12 @@ static struct {		/* this is private data for the iTCO_wdt device */
 	bool started;
 } iTCO_wdt_private;
 
+static const struct acpi_device_id iTCO_wdt_ids[] = {
+	{"8086229C", 0},
+	{"", 0},
+};
+MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
+
 /* module parameters */
 #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
 static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */ @@ -126,6 +137,11 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);  MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
 	"Turn off SMI clearing watchdog (depends on TCO-version)(default=1)");
 
+static bool warn_irq;
+module_param(warn_irq, bool, 0);
+MODULE_PARM_DESC(warn_irq,
+	"Watchdog trigs a panic at first expiration (default=0)");
+
 /*
  * Some TCO specific functions
  */
@@ -200,6 +216,37 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
 	return ret; /* returns: 0 = OK, -EIO = Error */  }
 
+static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void 
+*context) {
+	pr_warn("[SHTDWN] %s, WATCHDOG TIMEOUT HANDLER!\n", __func__);
+
+	trigger_all_cpu_backtrace();
+	panic("Kernel Watchdog");
+
+	/* This code should not be reached */
+	return IRQ_HANDLED;
+}
+
+static int iTCO_wdt_acpi_add(struct acpi_device *device) {
+	unsigned long long gpe;
+	acpi_status status;
+
+	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
+	if (ACPI_FAILURE(status))
+		return -EINVAL;
+
+	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
+					  iTCO_wdt_wirq, NULL);
+	if (ACPI_FAILURE(status))
+		return -ENODEV;
+
+	acpi_enable_gpe(NULL, gpe);
+
+	pr_info("interrupt=SCI GPE=0x%02llx", gpe);
+	return 0;
+}
+
 static int iTCO_wdt_start(struct watchdog_device *wd_dev)  {
 	unsigned int val;
@@ -628,6 +675,15 @@ static struct platform_driver iTCO_wdt_driver = {
 	},
 };
 
+static struct acpi_driver iTCO_wdt_acpi_driver = {
+	.name = DRV_NAME_ACPI,
+	.class = TCO_CLASS,
+	.ids = iTCO_wdt_ids,
+	.ops = {
+		.add = iTCO_wdt_acpi_add,
+	},
+};
+
 static int __init iTCO_wdt_init_module(void)  {
 	int err;
@@ -638,12 +694,22 @@ static int __init iTCO_wdt_init_module(void)
 	if (err)
 		return err;
 
+	if (warn_irq) {
+		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
+		if (err) {
+			platform_driver_unregister(&iTCO_wdt_driver);
+			return err;
+		}
+	}
+
 	return 0;
 }
 
 static void __exit iTCO_wdt_cleanup_module(void)  {
 	platform_driver_unregister(&iTCO_wdt_driver);
+	if (warn_irq)
+		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
 	pr_info("Watchdog Module Unloaded\n");  }
 
--
1.7.9.5


-----Original Message-----
From: Darren Hart [mailto:dvhart@infradead.org] 
Sent: Monday, December 22, 2014 4:41 PM
To: Muller, Francois-nicolas; Guenter Roeck
Cc: 'platform-driver-x86@vger.kernel.org'; Rafael Wysocki; Linux ACPI Mailing List; linux-watchdog@vger.kernel.org; Wim Van Sebroeck
Subject: RE: [PATCH] TCO Watchdog warning interrupt driver creation



On December 22, 2014 5:18:33 AM PST, "Muller, Francois-nicolas" <francois-nicolas.muller@intel.com> wrote:
>Thanks Darren and Guenter for your review.
>Here is a new patchset, including the feature in the watchdog driver 
>and using a boot parameter to be enabled.
>François-Nicolas
>
>Subject: [PATCH] TCO watchdog warning interrupt handler
>

Please provide a complete commit message.

>Signed-off-by: Francois-Nicolas Muller
><francois-nicolas.muller@intel.com>

--
Darren Hart
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] TCO Watchdog warning interrupt driver creation
  2014-12-22 16:07         ` Muller, Francois-nicolas
@ 2015-01-09  5:09           ` Guenter Roeck
  2015-01-14 16:32             ` Muller, Francois-nicolas
  2015-01-14 16:38             ` Muller, Francois-nicolas
  0 siblings, 2 replies; 18+ messages in thread
From: Guenter Roeck @ 2015-01-09  5:09 UTC (permalink / raw
  To: Muller, Francois-nicolas, Darren Hart
  Cc: 'platform-driver-x86@vger.kernel.org', Rafael Wysocki,
	Linux ACPI Mailing List, linux-watchdog@vger.kernel.org,
	Wim Van Sebroeck

On 12/22/2014 08:07 AM, Muller, Francois-nicolas wrote:
> Subject: [PATCH] TCO watchdog warning interrupt handler
>
> Adding TCO watchdog warning interrupt handling.
>
> This feature is useful to root cause watchdog expiration.
> It is activated by boot parameter 'warn_irq' (disabled by default).
>
> Upon first expiration of the TCO watchdog, a warning interrupt is
> fired and the driver calls panic() to dump debug information (registers and
> call stacks).
>
> At the same time, the TCO watchdog reloads with 2.4 seconds timeout
> value and runs until the second expiration.
>
> Signed-off-by: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
> ---
>   drivers/watchdog/iTCO_wdt.c |   66 +++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 66 insertions(+)
>
> diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c index e802a54..1ce9ad8 100644
> --- a/drivers/watchdog/iTCO_wdt.c
> +++ b/drivers/watchdog/iTCO_wdt.c
> @@ -49,6 +49,8 @@
>   /* Module and version information */
>   #define DRV_NAME	"iTCO_wdt"
>   #define DRV_VERSION	"1.11"
> +#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
> +#define TCO_CLASS	DRV_NAME
>
>   /* Includes */
>   #include <linux/module.h>		/* For module specific items */
> @@ -68,6 +70,9 @@
>   #include <linux/pm.h>			/* For suspend/resume */
>   #include <linux/mfd/core.h>
>   #include <linux/mfd/lpc_ich.h>
> +#include <linux/nmi.h>
> +#include <linux/acpi.h>
> +#include <acpi/actypes.h>
>
>   #include "iTCO_vendor.h"
>
> @@ -107,6 +112,12 @@ static struct {		/* this is private data for the iTCO_wdt device */
>   	bool started;
>   } iTCO_wdt_private;
>
> +static const struct acpi_device_id iTCO_wdt_ids[] = {
> +	{"8086229C", 0},
> +	{"", 0},
> +};
> +MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
> +

Doesn't that result in auto-loading the driver ?

If so, if that is really desirable, it should be a separate patch.

>   /* module parameters */
>   #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
>   static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */ @@ -126,6 +137,11 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);  MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
>   	"Turn off SMI clearing watchdog (depends on TCO-version)(default=1)");
>

Really looks like the patch is corrupted, with lines merged.

> +static bool warn_irq;
> +module_param(warn_irq, bool, 0);
> +MODULE_PARM_DESC(warn_irq,
> +	"Watchdog trigs a panic at first expiration (default=0)");
> +
>   /*
>    * Some TCO specific functions
>    */
> @@ -200,6 +216,37 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
>   	return ret; /* returns: 0 = OK, -EIO = Error */  }
>
> +static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void
> +*context) {

Either the patch is corrupted, or you need to look into kernel
coding style rules.

> +	pr_warn("[SHTDWN] %s, WATCHDOG TIMEOUT HANDLER!\n", __func__);
> +
That is kind of redundant with the following panic.

> +	trigger_all_cpu_backtrace();
> +	panic("Kernel Watchdog");
> +

Quite frankly, I don't see the point of this patch. Sure, this will
dump the stack. Ok, but so what ? What value would it have for the
user to see the interrupt call stack that is seen if the watchdog
times out ?

Guess I must be missing something.

> +	/* This code should not be reached */
> +	return IRQ_HANDLED;
> +}
> +
> +static int iTCO_wdt_acpi_add(struct acpi_device *device) {
> +	unsigned long long gpe;
> +	acpi_status status;
> +
> +	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
> +	if (ACPI_FAILURE(status))
> +		return -EINVAL;
> +
> +	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
> +					  iTCO_wdt_wirq, NULL);
> +	if (ACPI_FAILURE(status))
> +		return -ENODEV;
> +
> +	acpi_enable_gpe(NULL, gpe);
> +
> +	pr_info("interrupt=SCI GPE=0x%02llx", gpe);

I fail to see the value of this log message.

> +	return 0;
> +}
> +
>   static int iTCO_wdt_start(struct watchdog_device *wd_dev)  {
>   	unsigned int val;
> @@ -628,6 +675,15 @@ static struct platform_driver iTCO_wdt_driver = {
>   	},
>   };
>
> +static struct acpi_driver iTCO_wdt_acpi_driver = {
> +	.name = DRV_NAME_ACPI,
> +	.class = TCO_CLASS,
> +	.ids = iTCO_wdt_ids,
> +	.ops = {
> +		.add = iTCO_wdt_acpi_add,
> +	},
> +};
> +
>   static int __init iTCO_wdt_init_module(void)  {
>   	int err;
> @@ -638,12 +694,22 @@ static int __init iTCO_wdt_init_module(void)
>   	if (err)
>   		return err;
>
> +	if (warn_irq) {
> +		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
> +		if (err) {
> +			platform_driver_unregister(&iTCO_wdt_driver);
> +			return err;
> +		}
> +	}
> +
>   	return 0;
>   }
>
>   static void __exit iTCO_wdt_cleanup_module(void)  {

The original code is not formatted like this.

>   	platform_driver_unregister(&iTCO_wdt_driver);
> +	if (warn_irq)
> +		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
>   	pr_info("Watchdog Module Unloaded\n");  }
>
> --
> 1.7.9.5
>
>
> -----Original Message-----
> From: Darren Hart [mailto:dvhart@infradead.org]
> Sent: Monday, December 22, 2014 4:41 PM
> To: Muller, Francois-nicolas; Guenter Roeck
> Cc: 'platform-driver-x86@vger.kernel.org'; Rafael Wysocki; Linux ACPI Mailing List; linux-watchdog@vger.kernel.org; Wim Van Sebroeck
> Subject: RE: [PATCH] TCO Watchdog warning interrupt driver creation
>
>
>
> On December 22, 2014 5:18:33 AM PST, "Muller, Francois-nicolas" <francois-nicolas.muller@intel.com> wrote:
>> Thanks Darren and Guenter for your review.
>> Here is a new patchset, including the feature in the watchdog driver
>> and using a boot parameter to be enabled.
>> François-Nicolas
>>
>> Subject: [PATCH] TCO watchdog warning interrupt handler
>>
>
> Please provide a complete commit message.
>
>> Signed-off-by: Francois-Nicolas Muller
>> <francois-nicolas.muller@intel.com>
>
> --
> Darren Hart
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>

--
To unsubscribe from this list: send the line "unsubscribe linux-watchdog" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH] TCO Watchdog warning interrupt driver creation
  2015-01-09  5:09           ` Guenter Roeck
@ 2015-01-14 16:32             ` Muller, Francois-nicolas
  2015-01-14 18:16               ` Guenter Roeck
  2015-01-14 16:38             ` Muller, Francois-nicolas
  1 sibling, 1 reply; 18+ messages in thread
From: Muller, Francois-nicolas @ 2015-01-14 16:32 UTC (permalink / raw
  To: Guenter Roeck, Darren Hart
  Cc: 'platform-driver-x86@vger.kernel.org', Rafael Wysocki,
	Linux ACPI Mailing List, linux-watchdog@vger.kernel.org,
	Wim Van Sebroeck

Thanks for your review, my comments embedded below.
I'm sending the new version of the patch in the following email.
François-Nicolas

>-----Original Message-----
>From: Guenter Roeck [mailto:linux@roeck-us.net] 
>Sent: Friday, January 09, 2015 6:09 AM
>To: Muller, Francois-nicolas; Darren Hart
>Cc: 'platform-driver-x86@vger.kernel.org'; Rafael Wysocki; Linux ACPI Mailing List; linux-watchdog@vger.kernel.org; Wim Van Sebroeck
>Subject: Re: [PATCH] TCO Watchdog warning interrupt driver creation
>
>On 12/22/2014 08:07 AM, Muller, Francois-nicolas wrote:
>> Subject: [PATCH] TCO watchdog warning interrupt handler
>>
>> Adding TCO watchdog warning interrupt handling.
>>
>> This feature is useful to root cause watchdog expiration.
>> It is activated by boot parameter 'warn_irq' (disabled by default).
>>
>> Upon first expiration of the TCO watchdog, a warning interrupt is 
>> fired and the driver calls panic() to dump debug information 
>> (registers and call stacks).
>>
>> At the same time, the TCO watchdog reloads with 2.4 seconds timeout 
>> value and runs until the second expiration.
>>
>> Signed-off-by: Francois-Nicolas Muller 
>> <francois-nicolas.muller@intel.com>
>> ---
>>   drivers/watchdog/iTCO_wdt.c |   66 +++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 66 insertions(+)
>>
>> diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c 
>> index e802a54..1ce9ad8 100644
>> --- a/drivers/watchdog/iTCO_wdt.c
>> +++ b/drivers/watchdog/iTCO_wdt.c
>> @@ -49,6 +49,8 @@
>>   /* Module and version information */
>>   #define DRV_NAME	"iTCO_wdt"
>>   #define DRV_VERSION	"1.11"
>> +#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
>> +#define TCO_CLASS	DRV_NAME
>>
>>   /* Includes */
>>   #include <linux/module.h>		/* For module specific items */
>> @@ -68,6 +70,9 @@
>>   #include <linux/pm.h>			/* For suspend/resume */
>>   #include <linux/mfd/core.h>
>>   #include <linux/mfd/lpc_ich.h>
>> +#include <linux/nmi.h>
>> +#include <linux/acpi.h>
>> +#include <acpi/actypes.h>
>>
>>   #include "iTCO_vendor.h"
>>
>> @@ -107,6 +112,12 @@ static struct {		/* this is private data for the iTCO_wdt device */
>>   	bool started;
>>   } iTCO_wdt_private;
>>
>> +static const struct acpi_device_id iTCO_wdt_ids[] = {
>> +	{"8086229C", 0},
>> +	{"", 0},
>> +};
>> +MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
>> +
>
>Doesn't that result in auto-loading the driver ?

The driver is loaded only when Bios exposes the acpi device and the boot param warn_irq is true.
Could you clarify ? I don't understand the issue. 

>
>If so, if that is really desirable, it should be a separate patch.
>
>>   /* module parameters */
>>   #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
>>   static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */ @@ -126,6 +137,11 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);  MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
>>   	"Turn off SMI clearing watchdog (depends on 
>> TCO-version)(default=1)");
>>
>
>Really looks like the patch is corrupted, with lines merged.

Yes sorry for this, my mail reader doesn't handle well plain text.

>
>> +static bool warn_irq;
>> +module_param(warn_irq, bool, 0);
>> +MODULE_PARM_DESC(warn_irq,
>> +	"Watchdog trigs a panic at first expiration (default=0)");
>> +
>>   /*
>>    * Some TCO specific functions
>>    */
>> @@ -200,6 +216,37 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
>>   	return ret; /* returns: 0 = OK, -EIO = Error */  }
>>
>> +static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void
>> +*context) {
>
>Either the patch is corrupted, or you need to look into kernel coding style rules.
>
>> +	pr_warn("[SHTDWN] %s, WATCHDOG TIMEOUT HANDLER!\n", __func__);
>> +
>That is kind of redundant with the following panic.
>

Sure I will remove it.

>> +	trigger_all_cpu_backtrace();
>> +	panic("Kernel Watchdog");
>> +
>
>Quite frankly, I don't see the point of this patch. Sure, this will dump the stack. Ok, but so what ? What value would it have for the user to see the interrupt call stack that is seen if the watchdog times out ?
>
>Guess I must be missing something.
>

Trigger_all_cpu_backtrace() is useful to dump stacks of all running cpu (if any) by triggering an NMI to them.
The cause of a cpu hang may be related to another cpu and all cpu stacks are helpful information.
Panic() also dumps the interrupt call stack, which is useless here, but notifies panic handlers which could add some debug information.
I think both are useful for these reasons.

>> +	/* This code should not be reached */
>> +	return IRQ_HANDLED;
>> +}
>> +
>> +static int iTCO_wdt_acpi_add(struct acpi_device *device) {
>> +	unsigned long long gpe;
>> +	acpi_status status;
>> +
>> +	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
>> +	if (ACPI_FAILURE(status))
>> +		return -EINVAL;
>> +
>> +	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
>> +					  iTCO_wdt_wirq, NULL);
>> +	if (ACPI_FAILURE(status))
>> +		return -ENODEV;
>> +
>> +	acpi_enable_gpe(NULL, gpe);
>> +
>> +	pr_info("interrupt=SCI GPE=0x%02llx", gpe);
>
>I fail to see the value of this log message.
>

I will change it to pr_debug

>> +	return 0;
>> +}
>> +
>>   static int iTCO_wdt_start(struct watchdog_device *wd_dev)  {
>>   	unsigned int val;
>> @@ -628,6 +675,15 @@ static struct platform_driver iTCO_wdt_driver = {
>>   	},
>>   };
>>
>> +static struct acpi_driver iTCO_wdt_acpi_driver = {
>> +	.name = DRV_NAME_ACPI,
>> +	.class = TCO_CLASS,
>> +	.ids = iTCO_wdt_ids,
>> +	.ops = {
>> +		.add = iTCO_wdt_acpi_add,
>> +	},
>> +};
>> +
>>   static int __init iTCO_wdt_init_module(void)  {
>>   	int err;
>> @@ -638,12 +694,22 @@ static int __init iTCO_wdt_init_module(void)
>>   	if (err)
>>   		return err;
>>
>> +	if (warn_irq) {
>> +		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
>> +		if (err) {
>> +			platform_driver_unregister(&iTCO_wdt_driver);
>> +			return err;
>> +		}
>> +	}
>> +
>>   	return 0;
>>   }
>>
>>   static void __exit iTCO_wdt_cleanup_module(void)  {
>
>The original code is not formatted like this.
>
>>   	platform_driver_unregister(&iTCO_wdt_driver);
>> +	if (warn_irq)
>> +		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
>>   	pr_info("Watchdog Module Unloaded\n");  }
>>
>> --
>> 1.7.9.5
>>
>>
>> -----Original Message-----
>> From: Darren Hart [mailto:dvhart@infradead.org]
>> Sent: Monday, December 22, 2014 4:41 PM
>> To: Muller, Francois-nicolas; Guenter Roeck
>> Cc: 'platform-driver-x86@vger.kernel.org'; Rafael Wysocki; Linux ACPI 
>> Mailing List; linux-watchdog@vger.kernel.org; Wim Van Sebroeck
>> Subject: RE: [PATCH] TCO Watchdog warning interrupt driver creation
>>
>>
>>
>> On December 22, 2014 5:18:33 AM PST, "Muller, Francois-nicolas" <francois-nicolas.muller@intel.com> wrote:
>>> Thanks Darren and Guenter for your review.
>>> Here is a new patchset, including the feature in the watchdog driver 
>>> and using a boot parameter to be enabled.
>>> François-Nicolas
>>>
>>> Subject: [PATCH] TCO watchdog warning interrupt handler
>>>
>>
>> Please provide a complete commit message.
>>
>>> Signed-off-by: Francois-Nicolas Muller 
>>> <francois-nicolas.muller@intel.com>
>>
>> --
>> Darren Hart
>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>>

-----Original Message-----
From: Guenter Roeck [mailto:linux@roeck-us.net] 
Sent: Friday, January 09, 2015 6:09 AM
To: Muller, Francois-nicolas; Darren Hart
Cc: 'platform-driver-x86@vger.kernel.org'; Rafael Wysocki; Linux ACPI Mailing List; linux-watchdog@vger.kernel.org; Wim Van Sebroeck
Subject: Re: [PATCH] TCO Watchdog warning interrupt driver creation

On 12/22/2014 08:07 AM, Muller, Francois-nicolas wrote:
> Subject: [PATCH] TCO watchdog warning interrupt handler
>
> Adding TCO watchdog warning interrupt handling.
>
> This feature is useful to root cause watchdog expiration.
> It is activated by boot parameter 'warn_irq' (disabled by default).
>
> Upon first expiration of the TCO watchdog, a warning interrupt is 
> fired and the driver calls panic() to dump debug information 
> (registers and call stacks).
>
> At the same time, the TCO watchdog reloads with 2.4 seconds timeout 
> value and runs until the second expiration.
>
> Signed-off-by: Francois-Nicolas Muller 
> <francois-nicolas.muller@intel.com>
> ---
>   drivers/watchdog/iTCO_wdt.c |   66 +++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 66 insertions(+)
>
> diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c 
> index e802a54..1ce9ad8 100644
> --- a/drivers/watchdog/iTCO_wdt.c
> +++ b/drivers/watchdog/iTCO_wdt.c
> @@ -49,6 +49,8 @@
>   /* Module and version information */
>   #define DRV_NAME	"iTCO_wdt"
>   #define DRV_VERSION	"1.11"
> +#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
> +#define TCO_CLASS	DRV_NAME
>
>   /* Includes */
>   #include <linux/module.h>		/* For module specific items */
> @@ -68,6 +70,9 @@
>   #include <linux/pm.h>			/* For suspend/resume */
>   #include <linux/mfd/core.h>
>   #include <linux/mfd/lpc_ich.h>
> +#include <linux/nmi.h>
> +#include <linux/acpi.h>
> +#include <acpi/actypes.h>
>
>   #include "iTCO_vendor.h"
>
> @@ -107,6 +112,12 @@ static struct {		/* this is private data for the iTCO_wdt device */
>   	bool started;
>   } iTCO_wdt_private;
>
> +static const struct acpi_device_id iTCO_wdt_ids[] = {
> +	{"8086229C", 0},
> +	{"", 0},
> +};
> +MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
> +

Doesn't that result in auto-loading the driver ?

If so, if that is really desirable, it should be a separate patch.

>   /* module parameters */
>   #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
>   static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */ @@ -126,6 +137,11 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);  MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
>   	"Turn off SMI clearing watchdog (depends on 
> TCO-version)(default=1)");
>

Really looks like the patch is corrupted, with lines merged.

> +static bool warn_irq;
> +module_param(warn_irq, bool, 0);
> +MODULE_PARM_DESC(warn_irq,
> +	"Watchdog trigs a panic at first expiration (default=0)");
> +
>   /*
>    * Some TCO specific functions
>    */
> @@ -200,6 +216,37 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
>   	return ret; /* returns: 0 = OK, -EIO = Error */  }
>
> +static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void
> +*context) {

Either the patch is corrupted, or you need to look into kernel coding style rules.

> +	pr_warn("[SHTDWN] %s, WATCHDOG TIMEOUT HANDLER!\n", __func__);
> +
That is kind of redundant with the following panic.

> +	trigger_all_cpu_backtrace();
> +	panic("Kernel Watchdog");
> +

Quite frankly, I don't see the point of this patch. Sure, this will dump the stack. Ok, but so what ? What value would it have for the user to see the interrupt call stack that is seen if the watchdog times out ?

Guess I must be missing something.

> +	/* This code should not be reached */
> +	return IRQ_HANDLED;
> +}
> +
> +static int iTCO_wdt_acpi_add(struct acpi_device *device) {
> +	unsigned long long gpe;
> +	acpi_status status;
> +
> +	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
> +	if (ACPI_FAILURE(status))
> +		return -EINVAL;
> +
> +	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
> +					  iTCO_wdt_wirq, NULL);
> +	if (ACPI_FAILURE(status))
> +		return -ENODEV;
> +
> +	acpi_enable_gpe(NULL, gpe);
> +
> +	pr_info("interrupt=SCI GPE=0x%02llx", gpe);

I fail to see the value of this log message.

> +	return 0;
> +}
> +
>   static int iTCO_wdt_start(struct watchdog_device *wd_dev)  {
>   	unsigned int val;
> @@ -628,6 +675,15 @@ static struct platform_driver iTCO_wdt_driver = {
>   	},
>   };
>
> +static struct acpi_driver iTCO_wdt_acpi_driver = {
> +	.name = DRV_NAME_ACPI,
> +	.class = TCO_CLASS,
> +	.ids = iTCO_wdt_ids,
> +	.ops = {
> +		.add = iTCO_wdt_acpi_add,
> +	},
> +};
> +
>   static int __init iTCO_wdt_init_module(void)  {
>   	int err;
> @@ -638,12 +694,22 @@ static int __init iTCO_wdt_init_module(void)
>   	if (err)
>   		return err;
>
> +	if (warn_irq) {
> +		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
> +		if (err) {
> +			platform_driver_unregister(&iTCO_wdt_driver);
> +			return err;
> +		}
> +	}
> +
>   	return 0;
>   }
>
>   static void __exit iTCO_wdt_cleanup_module(void)  {

The original code is not formatted like this.

>   	platform_driver_unregister(&iTCO_wdt_driver);
> +	if (warn_irq)
> +		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
>   	pr_info("Watchdog Module Unloaded\n");  }
>
> --
> 1.7.9.5
>
>
> -----Original Message-----
> From: Darren Hart [mailto:dvhart@infradead.org]
> Sent: Monday, December 22, 2014 4:41 PM
> To: Muller, Francois-nicolas; Guenter Roeck
> Cc: 'platform-driver-x86@vger.kernel.org'; Rafael Wysocki; Linux ACPI 
> Mailing List; linux-watchdog@vger.kernel.org; Wim Van Sebroeck
> Subject: RE: [PATCH] TCO Watchdog warning interrupt driver creation
>
>
>
> On December 22, 2014 5:18:33 AM PST, "Muller, Francois-nicolas" <francois-nicolas.muller@intel.com> wrote:
>> Thanks Darren and Guenter for your review.
>> Here is a new patchset, including the feature in the watchdog driver 
>> and using a boot parameter to be enabled.
>> François-Nicolas
>>
>> Subject: [PATCH] TCO watchdog warning interrupt handler
>>
>
> Please provide a complete commit message.
>
>> Signed-off-by: Francois-Nicolas Muller 
>> <francois-nicolas.muller@intel.com>
>
> --
> Darren Hart
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH] TCO Watchdog warning interrupt driver creation
  2015-01-09  5:09           ` Guenter Roeck
  2015-01-14 16:32             ` Muller, Francois-nicolas
@ 2015-01-14 16:38             ` Muller, Francois-nicolas
  2015-01-20 15:00               ` Rafael J. Wysocki
  1 sibling, 1 reply; 18+ messages in thread
From: Muller, Francois-nicolas @ 2015-01-14 16:38 UTC (permalink / raw
  To: Guenter Roeck, Darren Hart
  Cc: 'platform-driver-x86@vger.kernel.org', Rafael Wysocki,
	Linux ACPI Mailing List, linux-watchdog@vger.kernel.org,
	Wim Van Sebroeck

From 54d2ff5e13c1b35d5019b82376dabb903ebe30d6 Mon Sep 17 00:00:00 2001
From: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
Date: Wed, 14 Jan 2015 14:27:43 +0100
Subject: [PATCH] Adding TCO watchdog warning interrupt handling.

This feature is useful to root cause watchdog expiration.
It is activated by boot parameter 'warn_irq' (disabled by default).

Upon first expiration of the TCO watchdog, a warning interrupt is fired then the
interrupt handler dumps registers and call stack of all available cores.

Finally panic() is called and notifies the panic handlers if any. At the same
time, the TCO watchdog reloads with 2.4 seconds timeout value.

When warning interrupt is enabled, platform reboot depends on
CONFIG_PANIC_TIMEOUT value :

- If CONFIG_PANIC_TIMEOUT is zero or greater than 3 seconds, TCO watchdog will
reset the platform if second expiration happens before TCO has been kicked
again.

- If CONFIG_PANIC_TIMEOUT is < 0, platform will reboot immediately (emergency
restart procedure).

- If CONFIG_PANIC_TIMEOUT is 1 or 2 seconds, platform will reboot after 1 or 2
seconds delay (emergency restart procedure).

Change-Id: I009c41f2f3dc3bd091b4d2a45b4ea0be85c8ce27
Signed-off-by: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
---
 drivers/watchdog/iTCO_wdt.c | 64 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
index e802a54..a8c16d2 100644
--- a/drivers/watchdog/iTCO_wdt.c
+++ b/drivers/watchdog/iTCO_wdt.c
@@ -49,6 +49,8 @@
 /* Module and version information */
 #define DRV_NAME	"iTCO_wdt"
 #define DRV_VERSION	"1.11"
+#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
+#define TCO_CLASS	DRV_NAME
 
 /* Includes */
 #include <linux/module.h>		/* For module specific items */
@@ -68,6 +70,9 @@
 #include <linux/pm.h>			/* For suspend/resume */
 #include <linux/mfd/core.h>
 #include <linux/mfd/lpc_ich.h>
+#include <linux/nmi.h>
+#include <linux/acpi.h>
+#include <acpi/actypes.h>
 
 #include "iTCO_vendor.h"
 
@@ -107,6 +112,12 @@ static struct {		/* this is private data for the iTCO_wdt device */
 	bool started;
 } iTCO_wdt_private;
 
+static const struct acpi_device_id iTCO_wdt_ids[] = {
+	{"8086229C", 0},
+	{"", 0},
+};
+MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
+
 /* module parameters */
 #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
 static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */
@@ -126,6 +137,11 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);
 MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
 	"Turn off SMI clearing watchdog (depends on TCO-version)(default=1)");
 
+static bool warn_irq;
+module_param(warn_irq, bool, 0);
+MODULE_PARM_DESC(warn_irq,
+	"Watchdog trigs a panic at first expiration (default=0)");
+
 /*
  * Some TCO specific functions
  */
@@ -200,6 +216,35 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
 	return ret; /* returns: 0 = OK, -EIO = Error */
 }
 
+static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void *context)
+{
+	trigger_all_cpu_backtrace();
+	panic("Kernel Watchdog");
+
+	/* This code should not be reached */
+	return IRQ_HANDLED;
+}
+
+static int iTCO_wdt_acpi_add(struct acpi_device *device)
+{
+	unsigned long long gpe;
+	acpi_status status;
+
+	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
+	if (ACPI_FAILURE(status))
+		return -EINVAL;
+
+	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
+					  iTCO_wdt_wirq, NULL);
+	if (ACPI_FAILURE(status))
+		return -ENODEV;
+
+	acpi_enable_gpe(NULL, gpe);
+
+	pr_debug("interrupt=SCI GPE=0x%02llx", gpe);
+	return 0;
+}
+
 static int iTCO_wdt_start(struct watchdog_device *wd_dev)
 {
 	unsigned int val;
@@ -628,6 +673,15 @@ static struct platform_driver iTCO_wdt_driver = {
 	},
 };
 
+static struct acpi_driver iTCO_wdt_acpi_driver = {
+	.name = DRV_NAME_ACPI,
+	.class = TCO_CLASS,
+	.ids = iTCO_wdt_ids,
+	.ops = {
+		.add = iTCO_wdt_acpi_add,
+	},
+};
+
 static int __init iTCO_wdt_init_module(void)
 {
 	int err;
@@ -638,12 +692,22 @@ static int __init iTCO_wdt_init_module(void)
 	if (err)
 		return err;
 
+	if (warn_irq) {
+		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
+		if (err) {
+			platform_driver_unregister(&iTCO_wdt_driver);
+			return err;
+		}
+	}
+
 	return 0;
 }
 
 static void __exit iTCO_wdt_cleanup_module(void)
 {
 	platform_driver_unregister(&iTCO_wdt_driver);
+	if (warn_irq)
+		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
 	pr_info("Watchdog Module Unloaded\n");
 }
 
-- 
1.9.1


-----Original Message-----
From: Muller, Francois-nicolas 
Sent: Wednesday, January 14, 2015 5:33 PM
To: 'Guenter Roeck'; Darren Hart
Cc: 'platform-driver-x86@vger.kernel.org'; Rafael Wysocki; Linux ACPI Mailing List; linux-watchdog@vger.kernel.org; Wim Van Sebroeck
Subject: RE: [PATCH] TCO Watchdog warning interrupt driver creation

Thanks for your review, my comments embedded below.
I'm sending the new version of the patch in the following email.
François-Nicolas

>-----Original Message-----
>From: Guenter Roeck [mailto:linux@roeck-us.net]
>Sent: Friday, January 09, 2015 6:09 AM
>To: Muller, Francois-nicolas; Darren Hart
>Cc: 'platform-driver-x86@vger.kernel.org'; Rafael Wysocki; Linux ACPI 
>Mailing List; linux-watchdog@vger.kernel.org; Wim Van Sebroeck
>Subject: Re: [PATCH] TCO Watchdog warning interrupt driver creation
>
>On 12/22/2014 08:07 AM, Muller, Francois-nicolas wrote:
>> Subject: [PATCH] TCO watchdog warning interrupt handler
>>
>> Adding TCO watchdog warning interrupt handling.
>>
>> This feature is useful to root cause watchdog expiration.
>> It is activated by boot parameter 'warn_irq' (disabled by default).
>>
>> Upon first expiration of the TCO watchdog, a warning interrupt is 
>> fired and the driver calls panic() to dump debug information 
>> (registers and call stacks).
>>
>> At the same time, the TCO watchdog reloads with 2.4 seconds timeout 
>> value and runs until the second expiration.
>>
>> Signed-off-by: Francois-Nicolas Muller 
>> <francois-nicolas.muller@intel.com>
>> ---
>>   drivers/watchdog/iTCO_wdt.c |   66 +++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 66 insertions(+)
>>
>> diff --git a/drivers/watchdog/iTCO_wdt.c 
>> b/drivers/watchdog/iTCO_wdt.c index e802a54..1ce9ad8 100644
>> --- a/drivers/watchdog/iTCO_wdt.c
>> +++ b/drivers/watchdog/iTCO_wdt.c
>> @@ -49,6 +49,8 @@
>>   /* Module and version information */
>>   #define DRV_NAME	"iTCO_wdt"
>>   #define DRV_VERSION	"1.11"
>> +#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
>> +#define TCO_CLASS	DRV_NAME
>>
>>   /* Includes */
>>   #include <linux/module.h>		/* For module specific items */
>> @@ -68,6 +70,9 @@
>>   #include <linux/pm.h>			/* For suspend/resume */
>>   #include <linux/mfd/core.h>
>>   #include <linux/mfd/lpc_ich.h>
>> +#include <linux/nmi.h>
>> +#include <linux/acpi.h>
>> +#include <acpi/actypes.h>
>>
>>   #include "iTCO_vendor.h"
>>
>> @@ -107,6 +112,12 @@ static struct {		/* this is private data for the iTCO_wdt device */
>>   	bool started;
>>   } iTCO_wdt_private;
>>
>> +static const struct acpi_device_id iTCO_wdt_ids[] = {
>> +	{"8086229C", 0},
>> +	{"", 0},
>> +};
>> +MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
>> +
>
>Doesn't that result in auto-loading the driver ?

The driver is loaded only when Bios exposes the acpi device and the boot param warn_irq is true.
Could you clarify ? I don't understand the issue. 

>
>If so, if that is really desirable, it should be a separate patch.
>
>>   /* module parameters */
>>   #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
>>   static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */ @@ -126,6 +137,11 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);  MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
>>   	"Turn off SMI clearing watchdog (depends on 
>> TCO-version)(default=1)");
>>
>
>Really looks like the patch is corrupted, with lines merged.

Yes sorry for this, my mail reader doesn't handle well plain text.

>
>> +static bool warn_irq;
>> +module_param(warn_irq, bool, 0);
>> +MODULE_PARM_DESC(warn_irq,
>> +	"Watchdog trigs a panic at first expiration (default=0)");
>> +
>>   /*
>>    * Some TCO specific functions
>>    */
>> @@ -200,6 +216,37 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
>>   	return ret; /* returns: 0 = OK, -EIO = Error */  }
>>
>> +static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void
>> +*context) {
>
>Either the patch is corrupted, or you need to look into kernel coding style rules.
>
>> +	pr_warn("[SHTDWN] %s, WATCHDOG TIMEOUT HANDLER!\n", __func__);
>> +
>That is kind of redundant with the following panic.
>

Sure I will remove it.

>> +	trigger_all_cpu_backtrace();
>> +	panic("Kernel Watchdog");
>> +
>
>Quite frankly, I don't see the point of this patch. Sure, this will dump the stack. Ok, but so what ? What value would it have for the user to see the interrupt call stack that is seen if the watchdog times out ?
>
>Guess I must be missing something.
>

Trigger_all_cpu_backtrace() is useful to dump stacks of all running cpu (if any) by triggering an NMI to them.
The cause of a cpu hang may be related to another cpu and all cpu stacks are helpful information.
Panic() also dumps the interrupt call stack, which is useless here, but notifies panic handlers which could add some debug information.
I think both are useful for these reasons.

>> +	/* This code should not be reached */
>> +	return IRQ_HANDLED;
>> +}
>> +
>> +static int iTCO_wdt_acpi_add(struct acpi_device *device) {
>> +	unsigned long long gpe;
>> +	acpi_status status;
>> +
>> +	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
>> +	if (ACPI_FAILURE(status))
>> +		return -EINVAL;
>> +
>> +	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
>> +					  iTCO_wdt_wirq, NULL);
>> +	if (ACPI_FAILURE(status))
>> +		return -ENODEV;
>> +
>> +	acpi_enable_gpe(NULL, gpe);
>> +
>> +	pr_info("interrupt=SCI GPE=0x%02llx", gpe);
>
>I fail to see the value of this log message.
>

I will change it to pr_debug

>> +	return 0;
>> +}
>> +
>>   static int iTCO_wdt_start(struct watchdog_device *wd_dev)  {
>>   	unsigned int val;
>> @@ -628,6 +675,15 @@ static struct platform_driver iTCO_wdt_driver = {
>>   	},
>>   };
>>
>> +static struct acpi_driver iTCO_wdt_acpi_driver = {
>> +	.name = DRV_NAME_ACPI,
>> +	.class = TCO_CLASS,
>> +	.ids = iTCO_wdt_ids,
>> +	.ops = {
>> +		.add = iTCO_wdt_acpi_add,
>> +	},
>> +};
>> +
>>   static int __init iTCO_wdt_init_module(void)  {
>>   	int err;
>> @@ -638,12 +694,22 @@ static int __init iTCO_wdt_init_module(void)
>>   	if (err)
>>   		return err;
>>
>> +	if (warn_irq) {
>> +		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
>> +		if (err) {
>> +			platform_driver_unregister(&iTCO_wdt_driver);
>> +			return err;
>> +		}
>> +	}
>> +
>>   	return 0;
>>   }
>>
>>   static void __exit iTCO_wdt_cleanup_module(void)  {
>
>The original code is not formatted like this.
>
>>   	platform_driver_unregister(&iTCO_wdt_driver);
>> +	if (warn_irq)
>> +		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
>>   	pr_info("Watchdog Module Unloaded\n");  }
>>
>> --
>> 1.7.9.5
>>
>>
>> -----Original Message-----
>> From: Darren Hart [mailto:dvhart@infradead.org]
>> Sent: Monday, December 22, 2014 4:41 PM
>> To: Muller, Francois-nicolas; Guenter Roeck
>> Cc: 'platform-driver-x86@vger.kernel.org'; Rafael Wysocki; Linux ACPI 
>> Mailing List; linux-watchdog@vger.kernel.org; Wim Van Sebroeck
>> Subject: RE: [PATCH] TCO Watchdog warning interrupt driver creation
>>
>>
>>
>> On December 22, 2014 5:18:33 AM PST, "Muller, Francois-nicolas" <francois-nicolas.muller@intel.com> wrote:
>>> Thanks Darren and Guenter for your review.
>>> Here is a new patchset, including the feature in the watchdog driver 
>>> and using a boot parameter to be enabled.
>>> François-Nicolas
>>>
>>> Subject: [PATCH] TCO watchdog warning interrupt handler
>>>
>>
>> Please provide a complete commit message.
>>
>>> Signed-off-by: Francois-Nicolas Muller 
>>> <francois-nicolas.muller@intel.com>
>>
>> --
>> Darren Hart
>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>>

-----Original Message-----
From: Guenter Roeck [mailto:linux@roeck-us.net]
Sent: Friday, January 09, 2015 6:09 AM
To: Muller, Francois-nicolas; Darren Hart
Cc: 'platform-driver-x86@vger.kernel.org'; Rafael Wysocki; Linux ACPI Mailing List; linux-watchdog@vger.kernel.org; Wim Van Sebroeck
Subject: Re: [PATCH] TCO Watchdog warning interrupt driver creation

On 12/22/2014 08:07 AM, Muller, Francois-nicolas wrote:
> Subject: [PATCH] TCO watchdog warning interrupt handler
>
> Adding TCO watchdog warning interrupt handling.
>
> This feature is useful to root cause watchdog expiration.
> It is activated by boot parameter 'warn_irq' (disabled by default).
>
> Upon first expiration of the TCO watchdog, a warning interrupt is 
> fired and the driver calls panic() to dump debug information 
> (registers and call stacks).
>
> At the same time, the TCO watchdog reloads with 2.4 seconds timeout 
> value and runs until the second expiration.
>
> Signed-off-by: Francois-Nicolas Muller 
> <francois-nicolas.muller@intel.com>
> ---
>   drivers/watchdog/iTCO_wdt.c |   66 +++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 66 insertions(+)
>
> diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c 
> index e802a54..1ce9ad8 100644
> --- a/drivers/watchdog/iTCO_wdt.c
> +++ b/drivers/watchdog/iTCO_wdt.c
> @@ -49,6 +49,8 @@
>   /* Module and version information */
>   #define DRV_NAME	"iTCO_wdt"
>   #define DRV_VERSION	"1.11"
> +#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
> +#define TCO_CLASS	DRV_NAME
>
>   /* Includes */
>   #include <linux/module.h>		/* For module specific items */
> @@ -68,6 +70,9 @@
>   #include <linux/pm.h>			/* For suspend/resume */
>   #include <linux/mfd/core.h>
>   #include <linux/mfd/lpc_ich.h>
> +#include <linux/nmi.h>
> +#include <linux/acpi.h>
> +#include <acpi/actypes.h>
>
>   #include "iTCO_vendor.h"
>
> @@ -107,6 +112,12 @@ static struct {		/* this is private data for the iTCO_wdt device */
>   	bool started;
>   } iTCO_wdt_private;
>
> +static const struct acpi_device_id iTCO_wdt_ids[] = {
> +	{"8086229C", 0},
> +	{"", 0},
> +};
> +MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
> +

Doesn't that result in auto-loading the driver ?

If so, if that is really desirable, it should be a separate patch.

>   /* module parameters */
>   #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
>   static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */ @@ -126,6 +137,11 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);  MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
>   	"Turn off SMI clearing watchdog (depends on 
> TCO-version)(default=1)");
>

Really looks like the patch is corrupted, with lines merged.

> +static bool warn_irq;
> +module_param(warn_irq, bool, 0);
> +MODULE_PARM_DESC(warn_irq,
> +	"Watchdog trigs a panic at first expiration (default=0)");
> +
>   /*
>    * Some TCO specific functions
>    */
> @@ -200,6 +216,37 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
>   	return ret; /* returns: 0 = OK, -EIO = Error */  }
>
> +static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void
> +*context) {

Either the patch is corrupted, or you need to look into kernel coding style rules.

> +	pr_warn("[SHTDWN] %s, WATCHDOG TIMEOUT HANDLER!\n", __func__);
> +
That is kind of redundant with the following panic.

> +	trigger_all_cpu_backtrace();
> +	panic("Kernel Watchdog");
> +

Quite frankly, I don't see the point of this patch. Sure, this will dump the stack. Ok, but so what ? What value would it have for the user to see the interrupt call stack that is seen if the watchdog times out ?

Guess I must be missing something.

> +	/* This code should not be reached */
> +	return IRQ_HANDLED;
> +}
> +
> +static int iTCO_wdt_acpi_add(struct acpi_device *device) {
> +	unsigned long long gpe;
> +	acpi_status status;
> +
> +	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
> +	if (ACPI_FAILURE(status))
> +		return -EINVAL;
> +
> +	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
> +					  iTCO_wdt_wirq, NULL);
> +	if (ACPI_FAILURE(status))
> +		return -ENODEV;
> +
> +	acpi_enable_gpe(NULL, gpe);
> +
> +	pr_info("interrupt=SCI GPE=0x%02llx", gpe);

I fail to see the value of this log message.

> +	return 0;
> +}
> +
>   static int iTCO_wdt_start(struct watchdog_device *wd_dev)  {
>   	unsigned int val;
> @@ -628,6 +675,15 @@ static struct platform_driver iTCO_wdt_driver = {
>   	},
>   };
>
> +static struct acpi_driver iTCO_wdt_acpi_driver = {
> +	.name = DRV_NAME_ACPI,
> +	.class = TCO_CLASS,
> +	.ids = iTCO_wdt_ids,
> +	.ops = {
> +		.add = iTCO_wdt_acpi_add,
> +	},
> +};
> +
>   static int __init iTCO_wdt_init_module(void)  {
>   	int err;
> @@ -638,12 +694,22 @@ static int __init iTCO_wdt_init_module(void)
>   	if (err)
>   		return err;
>
> +	if (warn_irq) {
> +		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
> +		if (err) {
> +			platform_driver_unregister(&iTCO_wdt_driver);
> +			return err;
> +		}
> +	}
> +
>   	return 0;
>   }
>
>   static void __exit iTCO_wdt_cleanup_module(void)  {

The original code is not formatted like this.

>   	platform_driver_unregister(&iTCO_wdt_driver);
> +	if (warn_irq)
> +		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
>   	pr_info("Watchdog Module Unloaded\n");  }
>
> --
> 1.7.9.5
>
>
> -----Original Message-----
> From: Darren Hart [mailto:dvhart@infradead.org]
> Sent: Monday, December 22, 2014 4:41 PM
> To: Muller, Francois-nicolas; Guenter Roeck
> Cc: 'platform-driver-x86@vger.kernel.org'; Rafael Wysocki; Linux ACPI 
> Mailing List; linux-watchdog@vger.kernel.org; Wim Van Sebroeck
> Subject: RE: [PATCH] TCO Watchdog warning interrupt driver creation
>
>
>
> On December 22, 2014 5:18:33 AM PST, "Muller, Francois-nicolas" <francois-nicolas.muller@intel.com> wrote:
>> Thanks Darren and Guenter for your review.
>> Here is a new patchset, including the feature in the watchdog driver 
>> and using a boot parameter to be enabled.
>> François-Nicolas
>>
>> Subject: [PATCH] TCO watchdog warning interrupt handler
>>
>
> Please provide a complete commit message.
>
>> Signed-off-by: Francois-Nicolas Muller 
>> <francois-nicolas.muller@intel.com>
>
> --
> Darren Hart
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] TCO Watchdog warning interrupt driver creation
  2015-01-14 16:32             ` Muller, Francois-nicolas
@ 2015-01-14 18:16               ` Guenter Roeck
  2015-01-15 13:27                 ` Muller, Francois-nicolas
  0 siblings, 1 reply; 18+ messages in thread
From: Guenter Roeck @ 2015-01-14 18:16 UTC (permalink / raw
  To: Muller, Francois-nicolas
  Cc: Darren Hart, 'platform-driver-x86@vger.kernel.org',
	Rafael Wysocki, Linux ACPI Mailing List,
	linux-watchdog@vger.kernel.org, Wim Van Sebroeck

On Wed, Jan 14, 2015 at 04:32:33PM +0000, Muller, Francois-nicolas wrote:
> Thanks for your review, my comments embedded below.
> I'm sending the new version of the patch in the following email.
> François-Nicolas
> 
...
> >>
> >> +static const struct acpi_device_id iTCO_wdt_ids[] = {
> >> +	{"8086229C", 0},
> >> +	{"", 0},
> >> +};
> >> +MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
> >> +
> >
> >Doesn't that result in auto-loading the driver ?
> 
> The driver is loaded only when Bios exposes the acpi device and the boot param warn_irq is true.
> Could you clarify ? I don't understand the issue. 
> 
Unless I am missing something, you are making two changes
in a single patch.

- auto-load driver if acpi device is exposed by bios
- Add interrupt handling support

I suggested to separate those changes into two patches.

Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-watchdog" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH] TCO Watchdog warning interrupt driver creation
  2015-01-14 18:16               ` Guenter Roeck
@ 2015-01-15 13:27                 ` Muller, Francois-nicolas
  2015-01-15 14:49                   ` Guenter Roeck
  0 siblings, 1 reply; 18+ messages in thread
From: Muller, Francois-nicolas @ 2015-01-15 13:27 UTC (permalink / raw
  To: Guenter Roeck
  Cc: Darren Hart, 'platform-driver-x86@vger.kernel.org',
	Rafael Wysocki, Linux ACPI Mailing List,
	linux-watchdog@vger.kernel.org, Wim Van Sebroeck

TCO driver is anyway auto-loaded by mfd driver lpc_ich as a sub-function of it.

The aim of my patch is only to add warning interrupt support in TCO driver.
For this it need the GPE number which is exposed by Bios in acpi tables.
So the patch registers also the TCO driver as an acpi driver to be able to retrieve this value.

The acpi code part is only required by the interrupt handling support, not needed for the loading of the driver itself (already done by lpc_ich).

In this context I don't see the point of dissociating the driver part from the loading, unless the loading would need to be reverted later.

Thanks,
François-Nicolas

-----Original Message-----
From: Guenter Roeck [mailto:linux@roeck-us.net] 
Sent: Wednesday, January 14, 2015 7:17 PM
To: Muller, Francois-nicolas
Cc: Darren Hart; 'platform-driver-x86@vger.kernel.org'; Rafael Wysocki; Linux ACPI Mailing List; linux-watchdog@vger.kernel.org; Wim Van Sebroeck
Subject: Re: [PATCH] TCO Watchdog warning interrupt driver creation

On Wed, Jan 14, 2015 at 04:32:33PM +0000, Muller, Francois-nicolas wrote:
> Thanks for your review, my comments embedded below.
> I'm sending the new version of the patch in the following email.
> François-Nicolas
> 
...
> >>
> >> +static const struct acpi_device_id iTCO_wdt_ids[] = {
> >> +	{"8086229C", 0},
> >> +	{"", 0},
> >> +};
> >> +MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
> >> +
> >
> >Doesn't that result in auto-loading the driver ?
> 
> The driver is loaded only when Bios exposes the acpi device and the boot param warn_irq is true.
> Could you clarify ? I don't understand the issue. 
> 
Unless I am missing something, you are making two changes in a single patch.

- auto-load driver if acpi device is exposed by bios
- Add interrupt handling support

I suggested to separate those changes into two patches.

Guenter

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] TCO Watchdog warning interrupt driver creation
  2015-01-15 13:27                 ` Muller, Francois-nicolas
@ 2015-01-15 14:49                   ` Guenter Roeck
  2015-01-20 14:25                     ` Muller, Francois-nicolas
  0 siblings, 1 reply; 18+ messages in thread
From: Guenter Roeck @ 2015-01-15 14:49 UTC (permalink / raw
  To: Muller, Francois-nicolas
  Cc: Darren Hart, 'platform-driver-x86@vger.kernel.org',
	Rafael Wysocki, Linux ACPI Mailing List,
	linux-watchdog@vger.kernel.org, Wim Van Sebroeck

On 01/15/2015 05:27 AM, Muller, Francois-nicolas wrote:
> TCO driver is anyway auto-loaded by mfd driver lpc_ich as a sub-function of it.
>
> The aim of my patch is only to add warning interrupt support in TCO driver.
> For this it need the GPE number which is exposed by Bios in acpi tables.
> So the patch registers also the TCO driver as an acpi driver to be able to retrieve this value.
>
> The acpi code part is only required by the interrupt handling support, not needed for the loading of the driver itself (already done by lpc_ich).
>
> In this context I don't see the point of dissociating the driver part from the loading, unless the loading would need to be reverted later.
>

Ok, makes sense.

Regarding the patch itself, I'll leave it up to Wim to decide what to do.
Personally I dislike the notion of panicing as response to a watchdog timeout.

Guenter


^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH] TCO Watchdog warning interrupt driver creation
  2015-01-15 14:49                   ` Guenter Roeck
@ 2015-01-20 14:25                     ` Muller, Francois-nicolas
  0 siblings, 0 replies; 18+ messages in thread
From: Muller, Francois-nicolas @ 2015-01-20 14:25 UTC (permalink / raw
  To: Guenter Roeck
  Cc: Darren Hart, 'platform-driver-x86@vger.kernel.org',
	Rafael Wysocki, Linux ACPI Mailing List,
	linux-watchdog@vger.kernel.org, Wim Van Sebroeck

The panic is only there for debug purposes at first expiry.
Watchdog timeout behavior (second expiry) is not changed.
I added the panic as a config option, enabled by default.
François-Nicolas

>From 5e1680aa0df9f49459cdc7211ba80d6934f65fbd Mon Sep 17 00:00:00 2001
From: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
Date: Tue, 20 Jan 2015 14:55:42 +0100
Subject: [PATCH] Adding TCO watchdog warning interrupt handling.

This feature is useful to root cause watchdog expiration.
It is activated by boot parameter 'warn_irq' (disabled by default).

Upon first expiration of the TCO watchdog, a warning interrupt is fired, then
the interrupt handler dumps registers and call stack of all available cpus.
TCO watchdog reloads with 2.4 seconds timeout for second expiration.

If CONFIG_ITCO_WARN_PANIC is set, the warning interrupt also calls panic()
which notifies the panic handlers then reboots the platform, depending on
CONFIG_PANIC_TIMEOUT value :

- If CONFIG_PANIC_TIMEOUT is zero or greater than 3 seconds, TCO watchdog will
reset the platform if second expiration happens before TCO has been kicked
again.

- If CONFIG_PANIC_TIMEOUT is < 0, platform will reboot immediately (emergency
restart procedure).

- If CONFIG_PANIC_TIMEOUT is 1 or 2 seconds, platform will reboot after 1 or 2
seconds delay (emergency restart procedure).

Change-Id: I48dcb9d38218c8218e35f9969f064b9d5cf316f1
Signed-off-by: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
---
 drivers/watchdog/Kconfig    |   13 +++++++++
 drivers/watchdog/iTCO_wdt.c |   66 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 79d2589..15c3807 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -674,6 +674,19 @@ config ITCO_VENDOR_SUPPORT
 	  devices. At this moment we only have additional support for some
 	  SuperMicro Inc. motherboards.
 
+config ITCO_WARNING_PANIC
+	bool "Intel TCO Timer/Watchdog panic on warning interrupt"
+	depends on ITCO_WDT
+	default y
+	---help---
+	  Force a call to panic() when TCO warning interrupt occurs.
+
+	  Warning interrupt happens if warn_irq module parameter is set and
+	  TCO timer first expires.
+
+	  If not set, only cpu backtraces are dumped, no call to panic() and
+	  no notification of panic are done.
+
 config IT8712F_WDT
 	tristate "IT8712F (Smart Guardian) Watchdog Timer"
 	depends on X86
diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
index e802a54..e7c5169 100644
--- a/drivers/watchdog/iTCO_wdt.c
+++ b/drivers/watchdog/iTCO_wdt.c
@@ -49,6 +49,8 @@
 /* Module and version information */
 #define DRV_NAME	"iTCO_wdt"
 #define DRV_VERSION	"1.11"
+#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
+#define TCO_CLASS	DRV_NAME
 
 /* Includes */
 #include <linux/module.h>		/* For module specific items */
@@ -68,6 +70,9 @@
 #include <linux/pm.h>			/* For suspend/resume */
 #include <linux/mfd/core.h>
 #include <linux/mfd/lpc_ich.h>
+#include <linux/nmi.h>
+#include <linux/acpi.h>
+#include <acpi/actypes.h>
 
 #include "iTCO_vendor.h"
 
@@ -107,6 +112,12 @@ static struct {		/* this is private data for the iTCO_wdt device */
 	bool started;
 } iTCO_wdt_private;
 
+static const struct acpi_device_id iTCO_wdt_ids[] = {
+	{"8086229C", 0},
+	{"", 0},
+};
+MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
+
 /* module parameters */
 #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
 static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */
@@ -126,6 +137,13 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);
 MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
 	"Turn off SMI clearing watchdog (depends on TCO-version)(default=1)");
 
+static bool warn_irq;
+module_param(warn_irq, bool, 0);
+MODULE_PARM_DESC(warn_irq,
+	"Dump all cpus backtraces at first watchdog timer expiration (default=0)");
+
+static bool warn_irq_panic = CONFIG_ITCO_WARNING_PANIC;
+
 /*
  * Some TCO specific functions
  */
@@ -200,6 +218,35 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
 	return ret; /* returns: 0 = OK, -EIO = Error */
 }
 
+static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void *context)
+{
+	trigger_all_cpu_backtrace();
+	if (warn_irq_panic)
+		panic("Kernel Watchdog");
+
+	return IRQ_HANDLED;
+}
+
+static int iTCO_wdt_acpi_add(struct acpi_device *device)
+{
+	unsigned long long gpe;
+	acpi_status status;
+
+	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
+	if (ACPI_FAILURE(status))
+		return -EINVAL;
+
+	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
+					  iTCO_wdt_wirq, NULL);
+	if (ACPI_FAILURE(status))
+		return -ENODEV;
+
+	acpi_enable_gpe(NULL, gpe);
+
+	pr_debug("interrupt=SCI GPE=0x%02llx", gpe);
+	return 0;
+}
+
 static int iTCO_wdt_start(struct watchdog_device *wd_dev)
 {
 	unsigned int val;
@@ -628,6 +675,15 @@ static struct platform_driver iTCO_wdt_driver = {
 	},
 };
 
+static struct acpi_driver iTCO_wdt_acpi_driver = {
+	.name = DRV_NAME_ACPI,
+	.class = TCO_CLASS,
+	.ids = iTCO_wdt_ids,
+	.ops = {
+		.add = iTCO_wdt_acpi_add,
+	},
+};
+
 static int __init iTCO_wdt_init_module(void)
 {
 	int err;
@@ -638,12 +694,22 @@ static int __init iTCO_wdt_init_module(void)
 	if (err)
 		return err;
 
+	if (warn_irq) {
+		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
+		if (err) {
+			platform_driver_unregister(&iTCO_wdt_driver);
+			return err;
+		}
+	}
+
 	return 0;
 }
 
 static void __exit iTCO_wdt_cleanup_module(void)
 {
 	platform_driver_unregister(&iTCO_wdt_driver);
+	if (warn_irq)
+		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
 	pr_info("Watchdog Module Unloaded\n");
 }
 
-- 
1.7.9.5



-----Original Message-----
From: Guenter Roeck [mailto:linux@roeck-us.net] 
Sent: Thursday, January 15, 2015 3:49 PM
To: Muller, Francois-nicolas
Cc: Darren Hart; 'platform-driver-x86@vger.kernel.org'; Rafael Wysocki; Linux ACPI Mailing List; linux-watchdog@vger.kernel.org; Wim Van Sebroeck
Subject: Re: [PATCH] TCO Watchdog warning interrupt driver creation

On 01/15/2015 05:27 AM, Muller, Francois-nicolas wrote:
> TCO driver is anyway auto-loaded by mfd driver lpc_ich as a sub-function of it.
>
> The aim of my patch is only to add warning interrupt support in TCO driver.
> For this it need the GPE number which is exposed by Bios in acpi tables.
> So the patch registers also the TCO driver as an acpi driver to be able to retrieve this value.
>
> The acpi code part is only required by the interrupt handling support, not needed for the loading of the driver itself (already done by lpc_ich).
>
> In this context I don't see the point of dissociating the driver part from the loading, unless the loading would need to be reverted later.
>

Ok, makes sense.

Regarding the patch itself, I'll leave it up to Wim to decide what to do.
Personally I dislike the notion of panicing as response to a watchdog timeout.

Guenter

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] TCO Watchdog warning interrupt driver creation
  2015-01-14 16:38             ` Muller, Francois-nicolas
@ 2015-01-20 15:00               ` Rafael J. Wysocki
  2015-02-12 10:13                 ` Muller, Francois-nicolas
  0 siblings, 1 reply; 18+ messages in thread
From: Rafael J. Wysocki @ 2015-01-20 15:00 UTC (permalink / raw
  To: Muller, Francois-nicolas
  Cc: Guenter Roeck, Darren Hart,
	'platform-driver-x86@vger.kernel.org',
	Linux ACPI Mailing List, linux-watchdog@vger.kernel.org,
	Wim Van Sebroeck

On Wednesday, January 14, 2015 04:38:49 PM Muller, Francois-nicolas wrote:
> From 54d2ff5e13c1b35d5019b82376dabb903ebe30d6 Mon Sep 17 00:00:00 2001
> From: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
> Date: Wed, 14 Jan 2015 14:27:43 +0100
> Subject: [PATCH] Adding TCO watchdog warning interrupt handling.
> 
> This feature is useful to root cause watchdog expiration.
> It is activated by boot parameter 'warn_irq' (disabled by default).
> 
> Upon first expiration of the TCO watchdog, a warning interrupt is fired then the
> interrupt handler dumps registers and call stack of all available cores.
> 
> Finally panic() is called and notifies the panic handlers if any. At the same
> time, the TCO watchdog reloads with 2.4 seconds timeout value.
> 
> When warning interrupt is enabled, platform reboot depends on
> CONFIG_PANIC_TIMEOUT value :
> 
> - If CONFIG_PANIC_TIMEOUT is zero or greater than 3 seconds, TCO watchdog will
> reset the platform if second expiration happens before TCO has been kicked
> again.
> 
> - If CONFIG_PANIC_TIMEOUT is < 0, platform will reboot immediately (emergency
> restart procedure).
> 
> - If CONFIG_PANIC_TIMEOUT is 1 or 2 seconds, platform will reboot after 1 or 2
> seconds delay (emergency restart procedure).
> 
> Change-Id: I009c41f2f3dc3bd091b4d2a45b4ea0be85c8ce27
> Signed-off-by: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
> ---
>  drivers/watchdog/iTCO_wdt.c | 64 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
> 
> diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
> index e802a54..a8c16d2 100644
> --- a/drivers/watchdog/iTCO_wdt.c
> +++ b/drivers/watchdog/iTCO_wdt.c
> @@ -49,6 +49,8 @@
>  /* Module and version information */
>  #define DRV_NAME	"iTCO_wdt"
>  #define DRV_VERSION	"1.11"
> +#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
> +#define TCO_CLASS	DRV_NAME
>  
>  /* Includes */
>  #include <linux/module.h>		/* For module specific items */
> @@ -68,6 +70,9 @@
>  #include <linux/pm.h>			/* For suspend/resume */
>  #include <linux/mfd/core.h>
>  #include <linux/mfd/lpc_ich.h>
> +#include <linux/nmi.h>
> +#include <linux/acpi.h>
> +#include <acpi/actypes.h>
>  
>  #include "iTCO_vendor.h"
>  
> @@ -107,6 +112,12 @@ static struct {		/* this is private data for the iTCO_wdt device */
>  	bool started;
>  } iTCO_wdt_private;
>  
> +static const struct acpi_device_id iTCO_wdt_ids[] = {
> +	{"8086229C", 0},

This is not a proper ACPI or PNP device name as far as I can say.

What is it?

> +	{"", 0},
> +};
> +MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
> +
>  /* module parameters */
>  #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
>  static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */
> @@ -126,6 +137,11 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);
>  MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
>  	"Turn off SMI clearing watchdog (depends on TCO-version)(default=1)");
>  
> +static bool warn_irq;
> +module_param(warn_irq, bool, 0);
> +MODULE_PARM_DESC(warn_irq,
> +	"Watchdog trigs a panic at first expiration (default=0)");
> +
>  /*
>   * Some TCO specific functions
>   */
> @@ -200,6 +216,35 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
>  	return ret; /* returns: 0 = OK, -EIO = Error */
>  }
>  
> +static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void *context)
> +{
> +	trigger_all_cpu_backtrace();
> +	panic("Kernel Watchdog");
> +
> +	/* This code should not be reached */
> +	return IRQ_HANDLED;
> +}
> +
> +static int iTCO_wdt_acpi_add(struct acpi_device *device)
> +{
> +	unsigned long long gpe;
> +	acpi_status status;
> +

The code below seems to mean: "If the device _HID returns '8086229C', there
should be a _GPE object under it which then returns the number of the GPE
to bind to within the FADT GPE blocks."

Where is this documented?

> +	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
> +	if (ACPI_FAILURE(status))
> +		return -EINVAL;
> +
> +	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
> +					  iTCO_wdt_wirq, NULL);
> +	if (ACPI_FAILURE(status))
> +		return -ENODEV;
> +
> +	acpi_enable_gpe(NULL, gpe);
> +
> +	pr_debug("interrupt=SCI GPE=0x%02llx", gpe);
> +	return 0;
> +}
> +
>  static int iTCO_wdt_start(struct watchdog_device *wd_dev)
>  {
>  	unsigned int val;
> @@ -628,6 +673,15 @@ static struct platform_driver iTCO_wdt_driver = {
>  	},
>  };
>  
> +static struct acpi_driver iTCO_wdt_acpi_driver = {
> +	.name = DRV_NAME_ACPI,
> +	.class = TCO_CLASS,
> +	.ids = iTCO_wdt_ids,
> +	.ops = {
> +		.add = iTCO_wdt_acpi_add,
> +	},
> +};
> +
>  static int __init iTCO_wdt_init_module(void)
>  {
>  	int err;
> @@ -638,12 +692,22 @@ static int __init iTCO_wdt_init_module(void)
>  	if (err)
>  		return err;
>  
> +	if (warn_irq) {
> +		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
> +		if (err) {
> +			platform_driver_unregister(&iTCO_wdt_driver);
> +			return err;
> +		}
> +	}
> +
>  	return 0;
>  }
>  
>  static void __exit iTCO_wdt_cleanup_module(void)
>  {
>  	platform_driver_unregister(&iTCO_wdt_driver);
> +	if (warn_irq)
> +		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
>  	pr_info("Watchdog Module Unloaded\n");
>  }
>  
> -- 
> 1.9.1

And does it build for CONFIG_ACPI unset?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH] TCO Watchdog warning interrupt driver creation
  2015-01-20 15:00               ` Rafael J. Wysocki
@ 2015-02-12 10:13                 ` Muller, Francois-nicolas
  2015-03-10 16:14                   ` Muller, Francois-nicolas
  2015-03-18 23:00                   ` Rafael J. Wysocki
  0 siblings, 2 replies; 18+ messages in thread
From: Muller, Francois-nicolas @ 2015-02-12 10:13 UTC (permalink / raw
  To: Rafael J. Wysocki
  Cc: Guenter Roeck, Darren Hart,
	'platform-driver-x86@vger.kernel.org',
	Linux ACPI Mailing List, linux-watchdog@vger.kernel.org,
	Wim Van Sebroeck

Please find hereafter a new version of the patch with a documentation file and that builds with CONFIG_ACPI unset.

From a7135e6b4bc7c91d6ac72a4f38157f7f2308615b Mon Sep 17 00:00:00 2001
From: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
Date: Tue, 20 Jan 2015 14:55:42 +0100
Subject: [PATCH] Adding TCO watchdog warning interrupt handling.

This feature is useful to root cause watchdog expiration.
It is activated by boot parameter 'warn_irq' (disabled by default).

Upon first expiration of the TCO watchdog, a warning interrupt is fired, then
the interrupt handler dumps registers and call stack of all available cpus.
TCO watchdog reloads with 2.4 seconds timeout for second expiration.

If CONFIG_ITCO_WARN_PANIC is set, the warning interrupt also calls panic()
which notifies the panic handlers then reboots the platform, depending on
CONFIG_PANIC_TIMEOUT value :

- If CONFIG_PANIC_TIMEOUT is zero or greater than 3 seconds, TCO watchdog will
reset the platform if second expiration happens before TCO has been kicked
again.

- If CONFIG_PANIC_TIMEOUT is < 0, platform will reboot immediately (emergency
restart procedure).

- If CONFIG_PANIC_TIMEOUT is 1 or 2 seconds, platform will reboot after 1 or 2
seconds delay (emergency restart procedure).

See Documentation/watchdog/tco-wdt-warning-interrupt.txt for more details.

Change-Id: I7314a50812529423b117cf28e4a195a356da2f57
Signed-off-by: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
---
 .../watchdog/tco-wdt-warning-interrupt.txt         |   85 ++++++++++++++++++++
 drivers/watchdog/Kconfig                           |   13 +++
 drivers/watchdog/iTCO_wdt.c                        |   80 ++++++++++++++++++
 3 files changed, 178 insertions(+)
 create mode 100644 Documentation/watchdog/tco-wdt-warning-interrupt.txt

diff --git a/Documentation/watchdog/tco-wdt-warning-interrupt.txt b/Documentation/watchdog/tco-wdt-warning-interrupt.txt
new file mode 100644
index 0000000..2e4eebf
--- /dev/null
+++ b/Documentation/watchdog/tco-wdt-warning-interrupt.txt
@@ -0,0 +1,85 @@
+Last reviewed: 02/12/2015
+
+                     TCO watchdog warning interrupt
+                 handled by drivers/watchdog/iTCO_wdt.c
+                      Documentation and code by
+       Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
+
+
+Introduction
+------------
+Intel TCO watchdog is intended to detect and recover from locks up of the
+platform. It contains a countdown timer, that should be reloaded on-time by
+software before reaching zero.
+
+If the platform locks up and is not able to reload the timer, then when it
+reaches zero:
+- the timer is automatically reloaded with 04h and starts decrementing again,
+- timeout bit is set in TCO1_STS register,
+- SMI or SCI interrupt is generated (optional).
+
+If it reaches zero a second time while timeout bit is set,
+- second_to_sts bit is set,
+- reset of the platform is initiated.
+
+At first timeout, the SMI (or SCI) can be used to provide debug information
+about the system state and help on fixing the cause of the hang. This is the
+"warning interrupt".
+
+Warning interrupt
+-----------------
+Warning interrupt handler is called when system is hung, so it is useful to
+gather maximum information about system state at this point for root-causing the
+issue.
+
+When the interrupt occurs,
+- call stacks of all running cpus are dumped,
+- panic() is called (optional)
+
+Enabling the warning interrupt
+------------------------------
+Boot parameter "warn_irq" (boolean) enables warning interrupt generation at
+first timer expiration (disabled by default).
+
+As this is a command line option, configuration can be changed easily without
+building again the code.
+
+Enabling panic upon warning interrupt
+-------------------------------------
+Warning interrupt handler can call panic() when Kconfig option
+CONFIG_ITCO_WARNING_PANIC is set.
+
+panic() call is useful in case of some panic handlers have been registered and
+need to be run at this time.
+
+When CONFIG_ITCO_WARN_PANIC is set,
+- If CONFIG_PANIC_TIMEOUT is zero or greater than 3 seconds, TCO watchdog will
+  reset the platform if second expiration happens before TCO has been kicked
+  again.
+- If CONFIG_PANIC_TIMEOUT is < 0, platform will reboot immediately (emergency
+  restart procedure).
+- If CONFIG_PANIC_TIMEOUT is 1 or 2 seconds, platform will reboot after 1 or 2
+  seconds delay (emergency restart procedure).
+
+SCI vs SMI
+----------
+For the moment, the TCO watchdog warning interrupt feature is only available for
+platforms that are able to trigger a SCI upon first expiration of TCO watchdog.
+
+There is no support of the SMI option yet.
+
+ACPI configuration
+------------------
+Bios is configuring the GPE associated to the warning interrupt. The driver uses
+acpi tables to get the GPE number.
+
+This change is intended for Intel Cherrytrail platform. As TCO watchdog is part
+of lpc_ich module, its _HID is used in the driver to retrieve GPE configuration
+from Bios.
+
+If no GPE information is provided by the Bios, the interrupt is not handled and
+appears in the dmesg log as a warning. Second timeout is still able to trigger a
+reset.
+
+-- Francois-Nicolas Muller
+   (francois-nicolas.muller@intel.com)
diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 79d2589..41f3647 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -674,6 +674,19 @@ config ITCO_VENDOR_SUPPORT
 	  devices. At this moment we only have additional support for some
 	  SuperMicro Inc. motherboards.
 
+config ITCO_WARNING_PANIC
+	bool "Intel TCO Timer/Watchdog panic on warning interrupt"
+	depends on ITCO_WDT && ACPI
+	default y
+	---help---
+	  Force a call to panic() when TCO warning interrupt occurs.
+
+	  Warning interrupt happens if warn_irq module parameter is set and
+	  TCO timer first expires.
+
+	  If not set, only cpu backtraces are dumped, no call to panic() and
+	  no notification of panic are done.
+
 config IT8712F_WDT
 	tristate "IT8712F (Smart Guardian) Watchdog Timer"
 	depends on X86
diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
index e802a54..a25794c 100644
--- a/drivers/watchdog/iTCO_wdt.c
+++ b/drivers/watchdog/iTCO_wdt.c
@@ -49,6 +49,8 @@
 /* Module and version information */
 #define DRV_NAME	"iTCO_wdt"
 #define DRV_VERSION	"1.11"
+#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
+#define TCO_CLASS	DRV_NAME
 
 /* Includes */
 #include <linux/module.h>		/* For module specific items */
@@ -68,6 +70,11 @@
 #include <linux/pm.h>			/* For suspend/resume */
 #include <linux/mfd/core.h>
 #include <linux/mfd/lpc_ich.h>
+#include <linux/nmi.h>
+#ifdef CONFIG_ACPI
+#include <linux/acpi.h>
+#include <acpi/actypes.h>
+#endif
 
 #include "iTCO_vendor.h"
 
@@ -107,6 +114,14 @@ static struct {		/* this is private data for the iTCO_wdt device */
 	bool started;
 } iTCO_wdt_private;
 
+#ifdef CONFIG_ACPI
+static const struct acpi_device_id iTCO_wdt_ids[] = {
+	{"8086229C", 0},
+	{"", 0},
+};
+MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
+#endif
+
 /* module parameters */
 #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
 static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */
@@ -126,6 +141,15 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);
 MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
 	"Turn off SMI clearing watchdog (depends on TCO-version)(default=1)");
 
+static bool warn_irq;
+module_param(warn_irq, bool, 0);
+MODULE_PARM_DESC(warn_irq,
+	"Dump all cpus backtraces at first watchdog timer expiration (default=0)");
+
+#ifdef CONFIG_ACPI
+static bool warn_irq_panic = CONFIG_ITCO_WARNING_PANIC;
+#endif
+
 /*
  * Some TCO specific functions
  */
@@ -200,6 +224,37 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
 	return ret; /* returns: 0 = OK, -EIO = Error */
 }
 
+#ifdef CONFIG_ACPI
+static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void *context)
+{
+	trigger_all_cpu_backtrace();
+	if (warn_irq_panic)
+		panic("Kernel Watchdog");
+
+	return IRQ_HANDLED;
+}
+
+static int iTCO_wdt_acpi_add(struct acpi_device *device)
+{
+	unsigned long long gpe;
+	acpi_status status;
+
+	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
+	if (ACPI_FAILURE(status))
+		return -EINVAL;
+
+	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
+					  iTCO_wdt_wirq, NULL);
+	if (ACPI_FAILURE(status))
+		return -ENODEV;
+
+	acpi_enable_gpe(NULL, gpe);
+
+	pr_debug("interrupt=SCI GPE=0x%02llx", gpe);
+	return 0;
+}
+#endif
+
 static int iTCO_wdt_start(struct watchdog_device *wd_dev)
 {
 	unsigned int val;
@@ -628,6 +683,17 @@ static struct platform_driver iTCO_wdt_driver = {
 	},
 };
 
+#ifdef CONFIG_ACPI
+static struct acpi_driver iTCO_wdt_acpi_driver = {
+	.name = DRV_NAME_ACPI,
+	.class = TCO_CLASS,
+	.ids = iTCO_wdt_ids,
+	.ops = {
+		.add = iTCO_wdt_acpi_add,
+	},
+};
+#endif
+
 static int __init iTCO_wdt_init_module(void)
 {
 	int err;
@@ -638,12 +704,26 @@ static int __init iTCO_wdt_init_module(void)
 	if (err)
 		return err;
 
+#ifdef CONFIG_ACPI
+	if (warn_irq) {
+		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
+		if (err) {
+			platform_driver_unregister(&iTCO_wdt_driver);
+			return err;
+		}
+	}
+#endif
+
 	return 0;
 }
 
 static void __exit iTCO_wdt_cleanup_module(void)
 {
 	platform_driver_unregister(&iTCO_wdt_driver);
+#ifdef CONFIG_ACPI
+	if (warn_irq)
+		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
+#endif
 	pr_info("Watchdog Module Unloaded\n");
 }
 
-- 
1.7.9.5




-----Original Message-----
From: Rafael J. Wysocki [mailto:rjw@rjwysocki.net] 
Sent: Tuesday, January 20, 2015 4:00 PM
To: Muller, Francois-nicolas
Cc: Guenter Roeck; Darren Hart; 'platform-driver-x86@vger.kernel.org'; Linux ACPI Mailing List; linux-watchdog@vger.kernel.org; Wim Van Sebroeck
Subject: Re: [PATCH] TCO Watchdog warning interrupt driver creation

On Wednesday, January 14, 2015 04:38:49 PM Muller, Francois-nicolas wrote:
> From 54d2ff5e13c1b35d5019b82376dabb903ebe30d6 Mon Sep 17 00:00:00 2001
> From: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
> Date: Wed, 14 Jan 2015 14:27:43 +0100
> Subject: [PATCH] Adding TCO watchdog warning interrupt handling.
> 
> This feature is useful to root cause watchdog expiration.
> It is activated by boot parameter 'warn_irq' (disabled by default).
> 
> Upon first expiration of the TCO watchdog, a warning interrupt is 
> fired then the interrupt handler dumps registers and call stack of all available cores.
> 
> Finally panic() is called and notifies the panic handlers if any. At 
> the same time, the TCO watchdog reloads with 2.4 seconds timeout value.
> 
> When warning interrupt is enabled, platform reboot depends on 
> CONFIG_PANIC_TIMEOUT value :
> 
> - If CONFIG_PANIC_TIMEOUT is zero or greater than 3 seconds, TCO 
> watchdog will reset the platform if second expiration happens before 
> TCO has been kicked again.
> 
> - If CONFIG_PANIC_TIMEOUT is < 0, platform will reboot immediately 
> (emergency restart procedure).
> 
> - If CONFIG_PANIC_TIMEOUT is 1 or 2 seconds, platform will reboot 
> after 1 or 2 seconds delay (emergency restart procedure).
> 
> Change-Id: I009c41f2f3dc3bd091b4d2a45b4ea0be85c8ce27
> Signed-off-by: Francois-Nicolas Muller 
> <francois-nicolas.muller@intel.com>
> ---
>  drivers/watchdog/iTCO_wdt.c | 64 
> +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
> 
> diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c 
> index e802a54..a8c16d2 100644
> --- a/drivers/watchdog/iTCO_wdt.c
> +++ b/drivers/watchdog/iTCO_wdt.c
> @@ -49,6 +49,8 @@
>  /* Module and version information */
>  #define DRV_NAME	"iTCO_wdt"
>  #define DRV_VERSION	"1.11"
> +#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
> +#define TCO_CLASS	DRV_NAME
>  
>  /* Includes */
>  #include <linux/module.h>		/* For module specific items */
> @@ -68,6 +70,9 @@
>  #include <linux/pm.h>			/* For suspend/resume */
>  #include <linux/mfd/core.h>
>  #include <linux/mfd/lpc_ich.h>
> +#include <linux/nmi.h>
> +#include <linux/acpi.h>
> +#include <acpi/actypes.h>
>  
>  #include "iTCO_vendor.h"
>  
> @@ -107,6 +112,12 @@ static struct {		/* this is private data for the iTCO_wdt device */
>  	bool started;
>  } iTCO_wdt_private;
>  
> +static const struct acpi_device_id iTCO_wdt_ids[] = {
> +	{"8086229C", 0},

This is not a proper ACPI or PNP device name as far as I can say.

What is it?

> +	{"", 0},
> +};
> +MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
> +
>  /* module parameters */
>  #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
>  static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */ @@ -126,6 
> +137,11 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);  
> MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
>  	"Turn off SMI clearing watchdog (depends on 
> TCO-version)(default=1)");
>  
> +static bool warn_irq;
> +module_param(warn_irq, bool, 0);
> +MODULE_PARM_DESC(warn_irq,
> +	"Watchdog trigs a panic at first expiration (default=0)");
> +
>  /*
>   * Some TCO specific functions
>   */
> @@ -200,6 +216,35 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
>  	return ret; /* returns: 0 = OK, -EIO = Error */  }
>  
> +static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void 
> +*context) {
> +	trigger_all_cpu_backtrace();
> +	panic("Kernel Watchdog");
> +
> +	/* This code should not be reached */
> +	return IRQ_HANDLED;
> +}
> +
> +static int iTCO_wdt_acpi_add(struct acpi_device *device) {
> +	unsigned long long gpe;
> +	acpi_status status;
> +

The code below seems to mean: "If the device _HID returns '8086229C', there should be a _GPE object under it which then returns the number of the GPE to bind to within the FADT GPE blocks."

Where is this documented?

> +	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
> +	if (ACPI_FAILURE(status))
> +		return -EINVAL;
> +
> +	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
> +					  iTCO_wdt_wirq, NULL);
> +	if (ACPI_FAILURE(status))
> +		return -ENODEV;
> +
> +	acpi_enable_gpe(NULL, gpe);
> +
> +	pr_debug("interrupt=SCI GPE=0x%02llx", gpe);
> +	return 0;
> +}
> +
>  static int iTCO_wdt_start(struct watchdog_device *wd_dev)  {
>  	unsigned int val;
> @@ -628,6 +673,15 @@ static struct platform_driver iTCO_wdt_driver = {
>  	},
>  };
>  
> +static struct acpi_driver iTCO_wdt_acpi_driver = {
> +	.name = DRV_NAME_ACPI,
> +	.class = TCO_CLASS,
> +	.ids = iTCO_wdt_ids,
> +	.ops = {
> +		.add = iTCO_wdt_acpi_add,
> +	},
> +};
> +
>  static int __init iTCO_wdt_init_module(void)  {
>  	int err;
> @@ -638,12 +692,22 @@ static int __init iTCO_wdt_init_module(void)
>  	if (err)
>  		return err;
>  
> +	if (warn_irq) {
> +		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
> +		if (err) {
> +			platform_driver_unregister(&iTCO_wdt_driver);
> +			return err;
> +		}
> +	}
> +
>  	return 0;
>  }
>  
>  static void __exit iTCO_wdt_cleanup_module(void)  {
>  	platform_driver_unregister(&iTCO_wdt_driver);
> +	if (warn_irq)
> +		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
>  	pr_info("Watchdog Module Unloaded\n");  }
>  
> --
> 1.9.1

And does it build for CONFIG_ACPI unset?


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* RE: [PATCH] TCO Watchdog warning interrupt driver creation
  2015-02-12 10:13                 ` Muller, Francois-nicolas
@ 2015-03-10 16:14                   ` Muller, Francois-nicolas
  2015-03-18 23:00                   ` Rafael J. Wysocki
  1 sibling, 0 replies; 18+ messages in thread
From: Muller, Francois-nicolas @ 2015-03-10 16:14 UTC (permalink / raw
  To: 'Rafael J. Wysocki', 'Guenter Roeck',
	'Darren Hart',
	'platform-driver-x86@vger.kernel.org',
	'Linux ACPI Mailing List',
	'linux-watchdog@vger.kernel.org',
	'Wim Van Sebroeck'

Any update?
Thanks,
François-Nicolas

-----Original Message-----
From: Muller, Francois-nicolas 
Sent: Thursday, February 12, 2015 11:14 AM
To: Rafael J. Wysocki
Cc: Guenter Roeck; Darren Hart; 'platform-driver-x86@vger.kernel.org'; Linux ACPI Mailing List; linux-watchdog@vger.kernel.org; Wim Van Sebroeck
Subject: RE: [PATCH] TCO Watchdog warning interrupt driver creation

Please find hereafter a new version of the patch with a documentation file and that builds with CONFIG_ACPI unset.

From a7135e6b4bc7c91d6ac72a4f38157f7f2308615b Mon Sep 17 00:00:00 2001
From: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
Date: Tue, 20 Jan 2015 14:55:42 +0100
Subject: [PATCH] Adding TCO watchdog warning interrupt handling.

This feature is useful to root cause watchdog expiration.
It is activated by boot parameter 'warn_irq' (disabled by default).

Upon first expiration of the TCO watchdog, a warning interrupt is fired, then the interrupt handler dumps registers and call stack of all available cpus.
TCO watchdog reloads with 2.4 seconds timeout for second expiration.

If CONFIG_ITCO_WARN_PANIC is set, the warning interrupt also calls panic() which notifies the panic handlers then reboots the platform, depending on CONFIG_PANIC_TIMEOUT value :

- If CONFIG_PANIC_TIMEOUT is zero or greater than 3 seconds, TCO watchdog will reset the platform if second expiration happens before TCO has been kicked again.

- If CONFIG_PANIC_TIMEOUT is < 0, platform will reboot immediately (emergency restart procedure).

- If CONFIG_PANIC_TIMEOUT is 1 or 2 seconds, platform will reboot after 1 or 2 seconds delay (emergency restart procedure).

See Documentation/watchdog/tco-wdt-warning-interrupt.txt for more details.

Change-Id: I7314a50812529423b117cf28e4a195a356da2f57
Signed-off-by: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
---
 .../watchdog/tco-wdt-warning-interrupt.txt         |   85 ++++++++++++++++++++
 drivers/watchdog/Kconfig                           |   13 +++
 drivers/watchdog/iTCO_wdt.c                        |   80 ++++++++++++++++++
 3 files changed, 178 insertions(+)
 create mode 100644 Documentation/watchdog/tco-wdt-warning-interrupt.txt

diff --git a/Documentation/watchdog/tco-wdt-warning-interrupt.txt b/Documentation/watchdog/tco-wdt-warning-interrupt.txt
new file mode 100644
index 0000000..2e4eebf
--- /dev/null
+++ b/Documentation/watchdog/tco-wdt-warning-interrupt.txt
@@ -0,0 +1,85 @@
+Last reviewed: 02/12/2015
+
+                     TCO watchdog warning interrupt
+                 handled by drivers/watchdog/iTCO_wdt.c
+                      Documentation and code by
+       Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
+
+
+Introduction
+------------
+Intel TCO watchdog is intended to detect and recover from locks up of 
+the platform. It contains a countdown timer, that should be reloaded 
+on-time by software before reaching zero.
+
+If the platform locks up and is not able to reload the timer, then when 
+it reaches zero:
+- the timer is automatically reloaded with 04h and starts decrementing 
+again,
+- timeout bit is set in TCO1_STS register,
+- SMI or SCI interrupt is generated (optional).
+
+If it reaches zero a second time while timeout bit is set,
+- second_to_sts bit is set,
+- reset of the platform is initiated.
+
+At first timeout, the SMI (or SCI) can be used to provide debug 
+information about the system state and help on fixing the cause of the 
+hang. This is the "warning interrupt".
+
+Warning interrupt
+-----------------
+Warning interrupt handler is called when system is hung, so it is 
+useful to gather maximum information about system state at this point 
+for root-causing the issue.
+
+When the interrupt occurs,
+- call stacks of all running cpus are dumped,
+- panic() is called (optional)
+
+Enabling the warning interrupt
+------------------------------
+Boot parameter "warn_irq" (boolean) enables warning interrupt 
+generation at first timer expiration (disabled by default).
+
+As this is a command line option, configuration can be changed easily 
+without building again the code.
+
+Enabling panic upon warning interrupt
+-------------------------------------
+Warning interrupt handler can call panic() when Kconfig option 
+CONFIG_ITCO_WARNING_PANIC is set.
+
+panic() call is useful in case of some panic handlers have been 
+registered and need to be run at this time.
+
+When CONFIG_ITCO_WARN_PANIC is set,
+- If CONFIG_PANIC_TIMEOUT is zero or greater than 3 seconds, TCO 
+watchdog will
+  reset the platform if second expiration happens before TCO has been 
+kicked
+  again.
+- If CONFIG_PANIC_TIMEOUT is < 0, platform will reboot immediately 
+(emergency
+  restart procedure).
+- If CONFIG_PANIC_TIMEOUT is 1 or 2 seconds, platform will reboot after 
+1 or 2
+  seconds delay (emergency restart procedure).
+
+SCI vs SMI
+----------
+For the moment, the TCO watchdog warning interrupt feature is only 
+available for platforms that are able to trigger a SCI upon first expiration of TCO watchdog.
+
+There is no support of the SMI option yet.
+
+ACPI configuration
+------------------
+Bios is configuring the GPE associated to the warning interrupt. The 
+driver uses acpi tables to get the GPE number.
+
+This change is intended for Intel Cherrytrail platform. As TCO watchdog 
+is part of lpc_ich module, its _HID is used in the driver to retrieve 
+GPE configuration from Bios.
+
+If no GPE information is provided by the Bios, the interrupt is not 
+handled and appears in the dmesg log as a warning. Second timeout is 
+still able to trigger a reset.
+
+-- Francois-Nicolas Muller
+   (francois-nicolas.muller@intel.com)
diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig index 79d2589..41f3647 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -674,6 +674,19 @@ config ITCO_VENDOR_SUPPORT
 	  devices. At this moment we only have additional support for some
 	  SuperMicro Inc. motherboards.
 
+config ITCO_WARNING_PANIC
+	bool "Intel TCO Timer/Watchdog panic on warning interrupt"
+	depends on ITCO_WDT && ACPI
+	default y
+	---help---
+	  Force a call to panic() when TCO warning interrupt occurs.
+
+	  Warning interrupt happens if warn_irq module parameter is set and
+	  TCO timer first expires.
+
+	  If not set, only cpu backtraces are dumped, no call to panic() and
+	  no notification of panic are done.
+
 config IT8712F_WDT
 	tristate "IT8712F (Smart Guardian) Watchdog Timer"
 	depends on X86
diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c index e802a54..a25794c 100644
--- a/drivers/watchdog/iTCO_wdt.c
+++ b/drivers/watchdog/iTCO_wdt.c
@@ -49,6 +49,8 @@
 /* Module and version information */
 #define DRV_NAME	"iTCO_wdt"
 #define DRV_VERSION	"1.11"
+#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
+#define TCO_CLASS	DRV_NAME
 
 /* Includes */
 #include <linux/module.h>		/* For module specific items */
@@ -68,6 +70,11 @@
 #include <linux/pm.h>			/* For suspend/resume */
 #include <linux/mfd/core.h>
 #include <linux/mfd/lpc_ich.h>
+#include <linux/nmi.h>
+#ifdef CONFIG_ACPI
+#include <linux/acpi.h>
+#include <acpi/actypes.h>
+#endif
 
 #include "iTCO_vendor.h"
 
@@ -107,6 +114,14 @@ static struct {		/* this is private data for the iTCO_wdt device */
 	bool started;
 } iTCO_wdt_private;
 
+#ifdef CONFIG_ACPI
+static const struct acpi_device_id iTCO_wdt_ids[] = {
+	{"8086229C", 0},
+	{"", 0},
+};
+MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids); #endif
+
 /* module parameters */
 #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
 static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */ @@ -126,6 +141,15 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);  MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
 	"Turn off SMI clearing watchdog (depends on TCO-version)(default=1)");
 
+static bool warn_irq;
+module_param(warn_irq, bool, 0);
+MODULE_PARM_DESC(warn_irq,
+	"Dump all cpus backtraces at first watchdog timer expiration 
+(default=0)");
+
+#ifdef CONFIG_ACPI
+static bool warn_irq_panic = CONFIG_ITCO_WARNING_PANIC; #endif
+
 /*
  * Some TCO specific functions
  */
@@ -200,6 +224,37 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
 	return ret; /* returns: 0 = OK, -EIO = Error */  }
 
+#ifdef CONFIG_ACPI
+static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void 
+*context) {
+	trigger_all_cpu_backtrace();
+	if (warn_irq_panic)
+		panic("Kernel Watchdog");
+
+	return IRQ_HANDLED;
+}
+
+static int iTCO_wdt_acpi_add(struct acpi_device *device) {
+	unsigned long long gpe;
+	acpi_status status;
+
+	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
+	if (ACPI_FAILURE(status))
+		return -EINVAL;
+
+	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
+					  iTCO_wdt_wirq, NULL);
+	if (ACPI_FAILURE(status))
+		return -ENODEV;
+
+	acpi_enable_gpe(NULL, gpe);
+
+	pr_debug("interrupt=SCI GPE=0x%02llx", gpe);
+	return 0;
+}
+#endif
+
 static int iTCO_wdt_start(struct watchdog_device *wd_dev)  {
 	unsigned int val;
@@ -628,6 +683,17 @@ static struct platform_driver iTCO_wdt_driver = {
 	},
 };
 
+#ifdef CONFIG_ACPI
+static struct acpi_driver iTCO_wdt_acpi_driver = {
+	.name = DRV_NAME_ACPI,
+	.class = TCO_CLASS,
+	.ids = iTCO_wdt_ids,
+	.ops = {
+		.add = iTCO_wdt_acpi_add,
+	},
+};
+#endif
+
 static int __init iTCO_wdt_init_module(void)  {
 	int err;
@@ -638,12 +704,26 @@ static int __init iTCO_wdt_init_module(void)
 	if (err)
 		return err;
 
+#ifdef CONFIG_ACPI
+	if (warn_irq) {
+		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
+		if (err) {
+			platform_driver_unregister(&iTCO_wdt_driver);
+			return err;
+		}
+	}
+#endif
+
 	return 0;
 }
 
 static void __exit iTCO_wdt_cleanup_module(void)  {
 	platform_driver_unregister(&iTCO_wdt_driver);
+#ifdef CONFIG_ACPI
+	if (warn_irq)
+		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
+#endif
 	pr_info("Watchdog Module Unloaded\n");  }
 
--
1.7.9.5




-----Original Message-----
From: Rafael J. Wysocki [mailto:rjw@rjwysocki.net] 
Sent: Tuesday, January 20, 2015 4:00 PM
To: Muller, Francois-nicolas
Cc: Guenter Roeck; Darren Hart; 'platform-driver-x86@vger.kernel.org'; Linux ACPI Mailing List; linux-watchdog@vger.kernel.org; Wim Van Sebroeck
Subject: Re: [PATCH] TCO Watchdog warning interrupt driver creation

On Wednesday, January 14, 2015 04:38:49 PM Muller, Francois-nicolas wrote:
> From 54d2ff5e13c1b35d5019b82376dabb903ebe30d6 Mon Sep 17 00:00:00 2001
> From: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
> Date: Wed, 14 Jan 2015 14:27:43 +0100
> Subject: [PATCH] Adding TCO watchdog warning interrupt handling.
> 
> This feature is useful to root cause watchdog expiration.
> It is activated by boot parameter 'warn_irq' (disabled by default).
> 
> Upon first expiration of the TCO watchdog, a warning interrupt is 
> fired then the interrupt handler dumps registers and call stack of all available cores.
> 
> Finally panic() is called and notifies the panic handlers if any. At 
> the same time, the TCO watchdog reloads with 2.4 seconds timeout value.
> 
> When warning interrupt is enabled, platform reboot depends on 
> CONFIG_PANIC_TIMEOUT value :
> 
> - If CONFIG_PANIC_TIMEOUT is zero or greater than 3 seconds, TCO 
> watchdog will reset the platform if second expiration happens before 
> TCO has been kicked again.
> 
> - If CONFIG_PANIC_TIMEOUT is < 0, platform will reboot immediately 
> (emergency restart procedure).
> 
> - If CONFIG_PANIC_TIMEOUT is 1 or 2 seconds, platform will reboot 
> after 1 or 2 seconds delay (emergency restart procedure).
> 
> Change-Id: I009c41f2f3dc3bd091b4d2a45b4ea0be85c8ce27
> Signed-off-by: Francois-Nicolas Muller 
> <francois-nicolas.muller@intel.com>
> ---
>  drivers/watchdog/iTCO_wdt.c | 64 
> +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
> 
> diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c 
> index e802a54..a8c16d2 100644
> --- a/drivers/watchdog/iTCO_wdt.c
> +++ b/drivers/watchdog/iTCO_wdt.c
> @@ -49,6 +49,8 @@
>  /* Module and version information */
>  #define DRV_NAME	"iTCO_wdt"
>  #define DRV_VERSION	"1.11"
> +#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
> +#define TCO_CLASS	DRV_NAME
>  
>  /* Includes */
>  #include <linux/module.h>		/* For module specific items */
> @@ -68,6 +70,9 @@
>  #include <linux/pm.h>			/* For suspend/resume */
>  #include <linux/mfd/core.h>
>  #include <linux/mfd/lpc_ich.h>
> +#include <linux/nmi.h>
> +#include <linux/acpi.h>
> +#include <acpi/actypes.h>
>  
>  #include "iTCO_vendor.h"
>  
> @@ -107,6 +112,12 @@ static struct {		/* this is private data for the iTCO_wdt device */
>  	bool started;
>  } iTCO_wdt_private;
>  
> +static const struct acpi_device_id iTCO_wdt_ids[] = {
> +	{"8086229C", 0},

This is not a proper ACPI or PNP device name as far as I can say.

What is it?

> +	{"", 0},
> +};
> +MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
> +
>  /* module parameters */
>  #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
>  static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */ @@ -126,6 
> +137,11 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);  
> MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
>  	"Turn off SMI clearing watchdog (depends on 
> TCO-version)(default=1)");
>  
> +static bool warn_irq;
> +module_param(warn_irq, bool, 0);
> +MODULE_PARM_DESC(warn_irq,
> +	"Watchdog trigs a panic at first expiration (default=0)");
> +
>  /*
>   * Some TCO specific functions
>   */
> @@ -200,6 +216,35 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
>  	return ret; /* returns: 0 = OK, -EIO = Error */  }
>  
> +static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void 
> +*context) {
> +	trigger_all_cpu_backtrace();
> +	panic("Kernel Watchdog");
> +
> +	/* This code should not be reached */
> +	return IRQ_HANDLED;
> +}
> +
> +static int iTCO_wdt_acpi_add(struct acpi_device *device) {
> +	unsigned long long gpe;
> +	acpi_status status;
> +

The code below seems to mean: "If the device _HID returns '8086229C', there should be a _GPE object under it which then returns the number of the GPE to bind to within the FADT GPE blocks."

Where is this documented?

> +	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
> +	if (ACPI_FAILURE(status))
> +		return -EINVAL;
> +
> +	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
> +					  iTCO_wdt_wirq, NULL);
> +	if (ACPI_FAILURE(status))
> +		return -ENODEV;
> +
> +	acpi_enable_gpe(NULL, gpe);
> +
> +	pr_debug("interrupt=SCI GPE=0x%02llx", gpe);
> +	return 0;
> +}
> +
>  static int iTCO_wdt_start(struct watchdog_device *wd_dev)  {
>  	unsigned int val;
> @@ -628,6 +673,15 @@ static struct platform_driver iTCO_wdt_driver = {
>  	},
>  };
>  
> +static struct acpi_driver iTCO_wdt_acpi_driver = {
> +	.name = DRV_NAME_ACPI,
> +	.class = TCO_CLASS,
> +	.ids = iTCO_wdt_ids,
> +	.ops = {
> +		.add = iTCO_wdt_acpi_add,
> +	},
> +};
> +
>  static int __init iTCO_wdt_init_module(void)  {
>  	int err;
> @@ -638,12 +692,22 @@ static int __init iTCO_wdt_init_module(void)
>  	if (err)
>  		return err;
>  
> +	if (warn_irq) {
> +		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
> +		if (err) {
> +			platform_driver_unregister(&iTCO_wdt_driver);
> +			return err;
> +		}
> +	}
> +
>  	return 0;
>  }
>  
>  static void __exit iTCO_wdt_cleanup_module(void)  {
>  	platform_driver_unregister(&iTCO_wdt_driver);
> +	if (warn_irq)
> +		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
>  	pr_info("Watchdog Module Unloaded\n");  }
>  
> --
> 1.9.1

And does it build for CONFIG_ACPI unset?


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] TCO Watchdog warning interrupt driver creation
  2015-02-12 10:13                 ` Muller, Francois-nicolas
  2015-03-10 16:14                   ` Muller, Francois-nicolas
@ 2015-03-18 23:00                   ` Rafael J. Wysocki
  2015-03-30 13:45                     ` Muller, Francois-nicolas
  1 sibling, 1 reply; 18+ messages in thread
From: Rafael J. Wysocki @ 2015-03-18 23:00 UTC (permalink / raw
  To: Muller, Francois-nicolas
  Cc: Guenter Roeck, Darren Hart,
	'platform-driver-x86@vger.kernel.org',
	Linux ACPI Mailing List, linux-watchdog@vger.kernel.org,
	Wim Van Sebroeck

On Thursday, February 12, 2015 10:13:43 AM Muller, Francois-nicolas wrote:
> Please find hereafter a new version of the patch with a documentation file and that builds with CONFIG_ACPI unset.

First of all, if I'm supposed to apply this, please note that I'm not the
maintainer of drivers/watchdog/iTCO_wdt.c (or anything under drivers/watchdog/
for that matter).

> From a7135e6b4bc7c91d6ac72a4f38157f7f2308615b Mon Sep 17 00:00:00 2001
> From: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
> Date: Tue, 20 Jan 2015 14:55:42 +0100
> Subject: [PATCH] Adding TCO watchdog warning interrupt handling.
> 
> This feature is useful to root cause watchdog expiration.
> It is activated by boot parameter 'warn_irq' (disabled by default).
> 
> Upon first expiration of the TCO watchdog, a warning interrupt is fired, then
> the interrupt handler dumps registers and call stack of all available cpus.
> TCO watchdog reloads with 2.4 seconds timeout for second expiration.
> 
> If CONFIG_ITCO_WARN_PANIC is set, the warning interrupt also calls panic()
> which notifies the panic handlers then reboots the platform, depending on
> CONFIG_PANIC_TIMEOUT value :
> 
> - If CONFIG_PANIC_TIMEOUT is zero or greater than 3 seconds, TCO watchdog will
> reset the platform if second expiration happens before TCO has been kicked
> again.
> 
> - If CONFIG_PANIC_TIMEOUT is < 0, platform will reboot immediately (emergency
> restart procedure).
> 
> - If CONFIG_PANIC_TIMEOUT is 1 or 2 seconds, platform will reboot after 1 or 2
> seconds delay (emergency restart procedure).
> 
> See Documentation/watchdog/tco-wdt-warning-interrupt.txt for more details.
> 
> Change-Id: I7314a50812529423b117cf28e4a195a356da2f57
> Signed-off-by: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
> ---
>  .../watchdog/tco-wdt-warning-interrupt.txt         |   85 ++++++++++++++++++++
>  drivers/watchdog/Kconfig                           |   13 +++
>  drivers/watchdog/iTCO_wdt.c                        |   80 ++++++++++++++++++
>  3 files changed, 178 insertions(+)
>  create mode 100644 Documentation/watchdog/tco-wdt-warning-interrupt.txt
> 
> diff --git a/Documentation/watchdog/tco-wdt-warning-interrupt.txt b/Documentation/watchdog/tco-wdt-warning-interrupt.txt
> new file mode 100644
> index 0000000..2e4eebf
> --- /dev/null
> +++ b/Documentation/watchdog/tco-wdt-warning-interrupt.txt
> @@ -0,0 +1,85 @@
> +Last reviewed: 02/12/2015
> +
> +                     TCO watchdog warning interrupt
> +                 handled by drivers/watchdog/iTCO_wdt.c
> +                      Documentation and code by
> +       Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
> +
> +
> +Introduction
> +------------
> +Intel TCO watchdog is intended to detect and recover from locks up of the
> +platform. It contains a countdown timer, that should be reloaded on-time by
> +software before reaching zero.
> +
> +If the platform locks up and is not able to reload the timer, then when it
> +reaches zero:
> +- the timer is automatically reloaded with 04h and starts decrementing again,
> +- timeout bit is set in TCO1_STS register,
> +- SMI or SCI interrupt is generated (optional).

Is the timeout configurable?  If not, what's the hard-coded  value?

Is there any documentation of that feature you can point people to?

> +
> +If it reaches zero a second time while timeout bit is set,
> +- second_to_sts bit is set,
> +- reset of the platform is initiated.
> +
> +At first timeout, the SMI (or SCI) can be used to provide debug information
> +about the system state and help on fixing the cause of the hang. This is the
> +"warning interrupt".

I'm not sure how SMI would help here.

> +
> +Warning interrupt
> +-----------------
> +Warning interrupt handler is called when system is hung, so it is useful to
> +gather maximum information about system state at this point for root-causing the
> +issue.
> +
> +When the interrupt occurs,
> +- call stacks of all running cpus are dumped,
> +- panic() is called (optional)

What is the panic() useful for given that the second timeout will reset the
system anyway?

> +
> +Enabling the warning interrupt
> +------------------------------
> +Boot parameter "warn_irq" (boolean) enables warning interrupt generation at
> +first timer expiration (disabled by default).

Why is it disabled by default?

> +
> +As this is a command line option, configuration can be changed easily without
> +building again the code.
> +
> +Enabling panic upon warning interrupt
> +-------------------------------------
> +Warning interrupt handler can call panic() when Kconfig option
> +CONFIG_ITCO_WARNING_PANIC is set.

This isn't helpful.  Distribution ship binary kernel images, so they can't realy
on Kconfig options to disable/enable things.  They have to decide whether or not
to include the feature upfront.

> +
> +panic() call is useful in case of some panic handlers have been registered and
> +need to be run at this time.
> +
> +When CONFIG_ITCO_WARN_PANIC is set,
> +- If CONFIG_PANIC_TIMEOUT is zero or greater than 3 seconds, TCO watchdog will
> +  reset the platform if second expiration happens before TCO has been kicked
> +  again.

OK, so the timeout is configurable.  Why isn't the timeout configurable via
command line or sysfs?

> +- If CONFIG_PANIC_TIMEOUT is < 0, platform will reboot immediately (emergency
> +  restart procedure).
> +- If CONFIG_PANIC_TIMEOUT is 1 or 2 seconds, platform will reboot after 1 or 2
> +  seconds delay (emergency restart procedure).
> +
> +SCI vs SMI
> +----------
> +For the moment, the TCO watchdog warning interrupt feature is only available for
> +platforms that are able to trigger a SCI upon first expiration of TCO watchdog.
> +
> +There is no support of the SMI option yet.

So what's the reason to mention SMI at all?

> +
> +ACPI configuration
> +------------------
> +Bios is configuring the GPE associated to the warning interrupt. The driver uses
> +acpi tables to get the GPE number.

What documentation is describing how it is supposed to do that?

> +
> +This change is intended for Intel Cherrytrail platform. As TCO watchdog is part
> +of lpc_ich module, its _HID is used in the driver to retrieve GPE configuration
> +from Bios.
> +
> +If no GPE information is provided by the Bios, the interrupt is not handled and
> +appears in the dmesg log as a warning. Second timeout is still able to trigger a
> +reset.
> +
> +-- Francois-Nicolas Muller
> +   (francois-nicolas.muller@intel.com)
> diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
> index 79d2589..41f3647 100644
> --- a/drivers/watchdog/Kconfig
> +++ b/drivers/watchdog/Kconfig
> @@ -674,6 +674,19 @@ config ITCO_VENDOR_SUPPORT
>  	  devices. At this moment we only have additional support for some
>  	  SuperMicro Inc. motherboards.
>  
> +config ITCO_WARNING_PANIC
> +	bool "Intel TCO Timer/Watchdog panic on warning interrupt"
> +	depends on ITCO_WDT && ACPI

Do we really need a new Kconfig option for that?

> +	default y
> +	---help---
> +	  Force a call to panic() when TCO warning interrupt occurs.
> +
> +	  Warning interrupt happens if warn_irq module parameter is set and
> +	  TCO timer first expires.
> +
> +	  If not set, only cpu backtraces are dumped, no call to panic() and
> +	  no notification of panic are done.
> +
>  config IT8712F_WDT
>  	tristate "IT8712F (Smart Guardian) Watchdog Timer"
>  	depends on X86
> diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
> index e802a54..a25794c 100644
> --- a/drivers/watchdog/iTCO_wdt.c
> +++ b/drivers/watchdog/iTCO_wdt.c
> @@ -49,6 +49,8 @@
>  /* Module and version information */
>  #define DRV_NAME	"iTCO_wdt"
>  #define DRV_VERSION	"1.11"
> +#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
> +#define TCO_CLASS	DRV_NAME
>  
>  /* Includes */
>  #include <linux/module.h>		/* For module specific items */
> @@ -68,6 +70,11 @@
>  #include <linux/pm.h>			/* For suspend/resume */
>  #include <linux/mfd/core.h>
>  #include <linux/mfd/lpc_ich.h>
> +#include <linux/nmi.h>
> +#ifdef CONFIG_ACPI
> +#include <linux/acpi.h>
> +#include <acpi/actypes.h>
> +#endif

No, this isn't how you're supposed to do it.  Please include <linux/acpi.h>
only, it contains the necessary CONFIG_ACPI checks.

>  
>  #include "iTCO_vendor.h"
>  
> @@ -107,6 +114,14 @@ static struct {		/* this is private data for the iTCO_wdt device */
>  	bool started;
>  } iTCO_wdt_private;
>  
> +#ifdef CONFIG_ACPI
> +static const struct acpi_device_id iTCO_wdt_ids[] = {
> +	{"8086229C", 0},
> +	{"", 0},
> +};
> +MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
> +#endif

This is for module auto-loading, right?

You seem to have too many #ifdef CONFIG_ACPI blocks in this code.  Any chance
to combine them all into one?

> +
>  /* module parameters */
>  #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
>  static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */
> @@ -126,6 +141,15 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);
>  MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
>  	"Turn off SMI clearing watchdog (depends on TCO-version)(default=1)");
>  
> +static bool warn_irq;
> +module_param(warn_irq, bool, 0);
> +MODULE_PARM_DESC(warn_irq,
> +	"Dump all cpus backtraces at first watchdog timer expiration (default=0)");
> +
> +#ifdef CONFIG_ACPI
> +static bool warn_irq_panic = CONFIG_ITCO_WARNING_PANIC;
> +#endif
> +
>  /*
>   * Some TCO specific functions
>   */
> @@ -200,6 +224,37 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
>  	return ret; /* returns: 0 = OK, -EIO = Error */
>  }
>  
> +#ifdef CONFIG_ACPI
> +static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void *context)
> +{
> +	trigger_all_cpu_backtrace();
> +	if (warn_irq_panic)
> +		panic("Kernel Watchdog");

Is the panic() useful at all?

> +
> +	return IRQ_HANDLED;

That should return ACPI_INTERRUPT_HANDLED or
ACPI_INTERRUPT_HANDLED | ACPI_REENABLE_GPE depending on what you want to
achieve here.

> +}
> +
> +static int iTCO_wdt_acpi_add(struct acpi_device *device)
> +{
> +	unsigned long long gpe;
> +	acpi_status status;
> +
> +	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
> +	if (ACPI_FAILURE(status))
> +		return -EINVAL;
> +
> +	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
> +					  iTCO_wdt_wirq, NULL);
> +	if (ACPI_FAILURE(status))
> +		return -ENODEV;
> +
> +	acpi_enable_gpe(NULL, gpe);
> +
> +	pr_debug("interrupt=SCI GPE=0x%02llx", gpe);
> +	return 0;
> +}
> +#endif
> +
>  static int iTCO_wdt_start(struct watchdog_device *wd_dev)
>  {
>  	unsigned int val;
> @@ -628,6 +683,17 @@ static struct platform_driver iTCO_wdt_driver = {
>  	},
>  };
>  
> +#ifdef CONFIG_ACPI
> +static struct acpi_driver iTCO_wdt_acpi_driver = {
> +	.name = DRV_NAME_ACPI,
> +	.class = TCO_CLASS,
> +	.ids = iTCO_wdt_ids,
> +	.ops = {
> +		.add = iTCO_wdt_acpi_add,
> +	},
> +};
> +#endif
> +
>  static int __init iTCO_wdt_init_module(void)
>  {
>  	int err;
> @@ -638,12 +704,26 @@ static int __init iTCO_wdt_init_module(void)
>  	if (err)
>  		return err;
>  
> +#ifdef CONFIG_ACPI
> +	if (warn_irq) {
> +		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);

OK, so what's the guarantee that the ACPI core won't create a platform
device for _HID "8086229C" and if it does, why is it correct to register
an ACPI driver for that device object?

Can you possibly use acpi_get_devices() instead and install the GPE
handler if _HID == "8086229C" is found?

> +		if (err) {
> +			platform_driver_unregister(&iTCO_wdt_driver);
> +			return err;
> +		}
> +	}
> +#endif
> +
>  	return 0;
>  }
>  
>  static void __exit iTCO_wdt_cleanup_module(void)
>  {
>  	platform_driver_unregister(&iTCO_wdt_driver);
> +#ifdef CONFIG_ACPI
> +	if (warn_irq)
> +		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
> +#endif
>  	pr_info("Watchdog Module Unloaded\n");
>  }
>  
> -- 

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH] TCO Watchdog warning interrupt driver creation
  2015-03-18 23:00                   ` Rafael J. Wysocki
@ 2015-03-30 13:45                     ` Muller, Francois-nicolas
  2015-05-12  1:28                       ` Guenter Roeck
  0 siblings, 1 reply; 18+ messages in thread
From: Muller, Francois-nicolas @ 2015-03-30 13:45 UTC (permalink / raw
  To: Rafael J. Wysocki, Linux ACPI Mailing List,
	'platform-driver-x86@vger.kernel.org', Darren Hart,
	Guenter Roeck, linux-watchdog@vger.kernel.org, Wim Van Sebroeck

Thanks Rafael for your comments.
Please find my answers embedded below and a new version of the patch.
François-Nicolas

>-----Original Message-----
>From: Rafael J. Wysocki [mailto:rjw@rjwysocki.net] 
>Sent: Thursday, March 19, 2015 12:01 AM
>To: Muller, Francois-nicolas
>Cc: Guenter Roeck; Darren Hart; 'platform-driver-x86@vger.kernel.org'; Linux ACPI Mailing List; linux-watchdog@vger.kernel.org; Wim Van Sebroeck
>Subject: Re: [PATCH] TCO Watchdog warning interrupt driver creation
>
>On Thursday, February 12, 2015 10:13:43 AM Muller, Francois-nicolas wrote:
>> Please find hereafter a new version of the patch with a documentation file and that builds with CONFIG_ACPI unset.
>
>First of all, if I'm supposed to apply this, please note that I'm not the maintainer of drivers/watchdog/iTCO_wdt.c (or anything under drivers/watchdog/ for that matter).
>
>> From a7135e6b4bc7c91d6ac72a4f38157f7f2308615b Mon Sep 17 00:00:00 2001
>> From: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
>> Date: Tue, 20 Jan 2015 14:55:42 +0100
>> Subject: [PATCH] Adding TCO watchdog warning interrupt handling.
>> 
>> This feature is useful to root cause watchdog expiration.
>> It is activated by boot parameter 'warn_irq' (disabled by default).
>> 
>> Upon first expiration of the TCO watchdog, a warning interrupt is 
>> fired, then the interrupt handler dumps registers and call stack of all available cpus.
>> TCO watchdog reloads with 2.4 seconds timeout for second expiration.
>> 
>> If CONFIG_ITCO_WARN_PANIC is set, the warning interrupt also calls 
>> panic() which notifies the panic handlers then reboots the platform, 
>> depending on CONFIG_PANIC_TIMEOUT value :
>> 
>> - If CONFIG_PANIC_TIMEOUT is zero or greater than 3 seconds, TCO 
>> watchdog will reset the platform if second expiration happens before 
>> TCO has been kicked again.
>> 
>> - If CONFIG_PANIC_TIMEOUT is < 0, platform will reboot immediately 
>> (emergency restart procedure).
>> 
>> - If CONFIG_PANIC_TIMEOUT is 1 or 2 seconds, platform will reboot 
>> after 1 or 2 seconds delay (emergency restart procedure).
>> 
>> See Documentation/watchdog/tco-wdt-warning-interrupt.txt for more details.
>> 
>> Change-Id: I7314a50812529423b117cf28e4a195a356da2f57
>> Signed-off-by: Francois-Nicolas Muller 
>> <francois-nicolas.muller@intel.com>
>> ---
>>  .../watchdog/tco-wdt-warning-interrupt.txt         |   85 ++++++++++++++++++++
>>  drivers/watchdog/Kconfig                           |   13 +++
>>  drivers/watchdog/iTCO_wdt.c                        |   80 ++++++++++++++++++
>>  3 files changed, 178 insertions(+)
>>  create mode 100644 
>> Documentation/watchdog/tco-wdt-warning-interrupt.txt
>> 
>> diff --git a/Documentation/watchdog/tco-wdt-warning-interrupt.txt 
>> b/Documentation/watchdog/tco-wdt-warning-interrupt.txt
>> new file mode 100644
>> index 0000000..2e4eebf
>> --- /dev/null
>> +++ b/Documentation/watchdog/tco-wdt-warning-interrupt.txt
>> @@ -0,0 +1,85 @@
>> +Last reviewed: 02/12/2015
>> +
>> +                     TCO watchdog warning interrupt
>> +                 handled by drivers/watchdog/iTCO_wdt.c
>> +                      Documentation and code by
>> +       Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
>> +
>> +
>> +Introduction
>> +------------
>> +Intel TCO watchdog is intended to detect and recover from locks up of 
>> +the platform. It contains a countdown timer, that should be reloaded 
>> +on-time by software before reaching zero.
>> +
>> +If the platform locks up and is not able to reload the timer, then 
>> +when it reaches zero:
>> +- the timer is automatically reloaded with 04h and starts 
>> +decrementing again,
>> +- timeout bit is set in TCO1_STS register,
>> +- SMI or SCI interrupt is generated (optional).
>
>Is the timeout configurable?  If not, what's the hard-coded  value?
>

First timeout is configurable (see TCO specification : http://download.intel.com/design/chipsets/applnots/29227301.pdf)
Second one is fixed (04h) and cannot be changed.

>Is there any documentation of that feature you can point people to?
>

TCO specification : http://download.intel.com/design/chipsets/applnots/29227301.pdf

>> +
>> +If it reaches zero a second time while timeout bit is set,
>> +- second_to_sts bit is set,
>> +- reset of the platform is initiated.
>> +
>> +At first timeout, the SMI (or SCI) can be used to provide debug 
>> +information about the system state and help on fixing the cause of 
>> +the hang. This is the "warning interrupt".
>
>I'm not sure how SMI would help here.
>

See TCO specification, TCO is able to generate a SMI.

>> +
>> +Warning interrupt
>> +-----------------
>> +Warning interrupt handler is called when system is hung, so it is 
>> +useful to gather maximum information about system state at this point 
>> +for root-causing the issue.
>> +
>> +When the interrupt occurs,
>> +- call stacks of all running cpus are dumped,
>> +- panic() is called (optional)
>
>What is the panic() useful for given that the second timeout will reset the system anyway?
>

This is useful for debugging the cause of system hang that lead to the second timeout. This has already been discussed earlier in this thread.

>> +
>> +Enabling the warning interrupt
>> +------------------------------
>> +Boot parameter "warn_irq" (boolean) enables warning interrupt 
>> +generation at first timer expiration (disabled by default).
>
>Why is it disabled by default?
>
>> +
>> +As this is a command line option, configuration can be changed easily 
>> +without building again the code.
>> +
>> +Enabling panic upon warning interrupt
>> +-------------------------------------
>> +Warning interrupt handler can call panic() when Kconfig option 
>> +CONFIG_ITCO_WARNING_PANIC is set.
>
>This isn't helpful.  Distribution ship binary kernel images, so they can't realy on Kconfig options to disable/enable things.  They have to decide whether or not to include the feature upfront.
>

This is a debug feature, and the other debug features I've seen in the Kernel were also dependant on Kconfig options. I don't understand your point, could you elaborate ?

>> +
>> +panic() call is useful in case of some panic handlers have been 
>> +registered and need to be run at this time.
>> +
>> +When CONFIG_ITCO_WARN_PANIC is set,
>> +- If CONFIG_PANIC_TIMEOUT is zero or greater than 3 seconds, TCO 
>> +watchdog will
>> +  reset the platform if second expiration happens before TCO has been 
>> +kicked
>> +  again.
>
>OK, so the timeout is configurable.  Why isn't the timeout configurable via command line or sysfs?
>

Only first timeout is configurable, my goal is to add a debug facility of watchdog timeout, it is not about watchdog timeout configuration.

>> +- If CONFIG_PANIC_TIMEOUT is < 0, platform will reboot immediately 
>> +(emergency
>> +  restart procedure).
>> +- If CONFIG_PANIC_TIMEOUT is 1 or 2 seconds, platform will reboot 
>> +after 1 or 2
>> +  seconds delay (emergency restart procedure).
>> +
>> +SCI vs SMI
>> +----------
>> +For the moment, the TCO watchdog warning interrupt feature is only 
>> +available for platforms that are able to trigger a SCI upon first expiration of TCO watchdog.
>> +
>> +There is no support of the SMI option yet.
>
>So what's the reason to mention SMI at all?
>

Because SMI is part of the TCO specification.

>> +
>> +ACPI configuration
>> +------------------
>> +Bios is configuring the GPE associated to the warning interrupt. The 
>> +driver uses acpi tables to get the GPE number.
>
>What documentation is describing how it is supposed to do that?
>

I don't think this needs documentation, Kernel only sees a GPE and retrieves its number with _GPE acpi variable.

>> +
>> +This change is intended for Intel Cherrytrail platform. As TCO 
>> +watchdog is part of lpc_ich module, its _HID is used in the driver to 
>> +retrieve GPE configuration from Bios.
>> +
>> +If no GPE information is provided by the Bios, the interrupt is not 
>> +handled and appears in the dmesg log as a warning. Second timeout is 
>> +still able to trigger a reset.
>> +
>> +-- Francois-Nicolas Muller
>> +   (francois-nicolas.muller@intel.com)
>> diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig index 
>> 79d2589..41f3647 100644
>> --- a/drivers/watchdog/Kconfig
>> +++ b/drivers/watchdog/Kconfig
>> @@ -674,6 +674,19 @@ config ITCO_VENDOR_SUPPORT
>>  	  devices. At this moment we only have additional support for some
>>  	  SuperMicro Inc. motherboards.
>>  
>> +config ITCO_WARNING_PANIC
>> +	bool "Intel TCO Timer/Watchdog panic on warning interrupt"
>> +	depends on ITCO_WDT && ACPI
>
>Do we really need a new Kconfig option for that?
>

I need to be able to enable the panic or not after the warning interrupt.
I could also use module parameters instead of Kconfig, I don't know which one
is preferred in this case. Any advice ?


>> +	default y
>> +	---help---
>> +	  Force a call to panic() when TCO warning interrupt occurs.
>> +
>> +	  Warning interrupt happens if warn_irq module parameter is set and
>> +	  TCO timer first expires.
>> +
>> +	  If not set, only cpu backtraces are dumped, no call to panic() and
>> +	  no notification of panic are done.
>> +
>>  config IT8712F_WDT
>>  	tristate "IT8712F (Smart Guardian) Watchdog Timer"
>>  	depends on X86
>> diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c 
>> index e802a54..a25794c 100644
>> --- a/drivers/watchdog/iTCO_wdt.c
>> +++ b/drivers/watchdog/iTCO_wdt.c
>> @@ -49,6 +49,8 @@
>>  /* Module and version information */
>>  #define DRV_NAME	"iTCO_wdt"
>>  #define DRV_VERSION	"1.11"
>> +#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
>> +#define TCO_CLASS	DRV_NAME
>>  
>>  /* Includes */
>>  #include <linux/module.h>		/* For module specific items */
>> @@ -68,6 +70,11 @@
>>  #include <linux/pm.h>			/* For suspend/resume */
>>  #include <linux/mfd/core.h>
>>  #include <linux/mfd/lpc_ich.h>
>> +#include <linux/nmi.h>
>> +#ifdef CONFIG_ACPI
>> +#include <linux/acpi.h>
>> +#include <acpi/actypes.h>
>> +#endif
>
>No, this isn't how you're supposed to do it.  Please include <linux/acpi.h> only, it contains the necessary CONFIG_ACPI checks.
>

Change done in new patch version.

>>  
>>  #include "iTCO_vendor.h"
>>  
>> @@ -107,6 +114,14 @@ static struct {		/* this is private data for the iTCO_wdt device */
>>  	bool started;
>>  } iTCO_wdt_private;
>>  
>> +#ifdef CONFIG_ACPI
>> +static const struct acpi_device_id iTCO_wdt_ids[] = {
>> +	{"8086229C", 0},
>> +	{"", 0},
>> +};
>> +MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids); #endif
>
>This is for module auto-loading, right?
>
>You seem to have too many #ifdef CONFIG_ACPI blocks in this code.  Any chance to combine them all into one?
>

Change done in new patch version.

>> +
>>  /* module parameters */
>>  #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
>>  static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */ @@ -126,6 
>> +141,15 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);  
>> MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
>>  	"Turn off SMI clearing watchdog (depends on 
>> TCO-version)(default=1)");
>>  
>> +static bool warn_irq;
>> +module_param(warn_irq, bool, 0);
>> +MODULE_PARM_DESC(warn_irq,
>> +	"Dump all cpus backtraces at first watchdog timer expiration 
>> +(default=0)");
>> +
>> +#ifdef CONFIG_ACPI
>> +static bool warn_irq_panic = CONFIG_ITCO_WARNING_PANIC; #endif
>> +
>>  /*
>>   * Some TCO specific functions
>>   */
>> @@ -200,6 +224,37 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
>>  	return ret; /* returns: 0 = OK, -EIO = Error */  }
>>  
>> +#ifdef CONFIG_ACPI
>> +static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void 
>> +*context) {
>> +	trigger_all_cpu_backtrace();
>> +	if (warn_irq_panic)
>> +		panic("Kernel Watchdog");
>
>Is the panic() useful at all?
>

Already discussed earlier in this thread, please refer to it.

>> +
>> +	return IRQ_HANDLED;
>
>That should return ACPI_INTERRUPT_HANDLED or ACPI_INTERRUPT_HANDLED | ACPI_REENABLE_GPE depending on what you want to achieve here.
>
>> +}
>> +
>> +static int iTCO_wdt_acpi_add(struct acpi_device *device) {
>> +	unsigned long long gpe;
>> +	acpi_status status;
>> +
>> +	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
>> +	if (ACPI_FAILURE(status))
>> +		return -EINVAL;
>> +
>> +	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
>> +					  iTCO_wdt_wirq, NULL);
>> +	if (ACPI_FAILURE(status))
>> +		return -ENODEV;
>> +
>> +	acpi_enable_gpe(NULL, gpe);
>> +
>> +	pr_debug("interrupt=SCI GPE=0x%02llx", gpe);
>> +	return 0;
>> +}
>> +#endif
>> +
>>  static int iTCO_wdt_start(struct watchdog_device *wd_dev)  {
>>  	unsigned int val;
>> @@ -628,6 +683,17 @@ static struct platform_driver iTCO_wdt_driver = {
>>  	},
>>  };
>>  
>> +#ifdef CONFIG_ACPI
>> +static struct acpi_driver iTCO_wdt_acpi_driver = {
>> +	.name = DRV_NAME_ACPI,
>> +	.class = TCO_CLASS,
>> +	.ids = iTCO_wdt_ids,
>> +	.ops = {
>> +		.add = iTCO_wdt_acpi_add,
>> +	},
>> +};
>> +#endif
>> +
>>  static int __init iTCO_wdt_init_module(void)  {
>>  	int err;
>> @@ -638,12 +704,26 @@ static int __init iTCO_wdt_init_module(void)
>>  	if (err)
>>  		return err;
>>  
>> +#ifdef CONFIG_ACPI
>> +	if (warn_irq) {
>> +		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
>
>OK, so what's the guarantee that the ACPI core won't create a platform device for _HID "8086229C" and if it does, why is it correct to register an ACPI driver for that device object?
>
>Can you possibly use acpi_get_devices() instead and install the GPE handler if _HID == "8086229C" is found?
>

Change done in new patch version.

>> +		if (err) {
>> +			platform_driver_unregister(&iTCO_wdt_driver);
>> +			return err;
>> +		}
>> +	}
>> +#endif
>> +
>>  	return 0;
>>  }
>>  
>>  static void __exit iTCO_wdt_cleanup_module(void)  {
>>  	platform_driver_unregister(&iTCO_wdt_driver);
>> +#ifdef CONFIG_ACPI
>> +	if (warn_irq)
>> +		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
>> +#endif
>>  	pr_info("Watchdog Module Unloaded\n");  }
>>  
>> --

##################################################################

From 255184705e94f2983d965ad711f30281a1eed816 Mon Sep 17 00:00:00 2001
From: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
Date: Mon, 30 Mar 2015 14:56:39 +0200
Subject: [PATCH] Adding TCO watchdog warning interrupt handling.

This feature is useful to root cause watchdog expiration.
It is activated by boot parameter 'warn_irq' (disabled by default).

Upon first expiration of the TCO watchdog, a warning interrupt is fired,
then the interrupt handler dumps registers and call stack of all available
cpus.
TCO watchdog reloads with 2.4 seconds timeout for second expiration.

If CONFIG_ITCO_WARN_PANIC is set, the warning interrupt also calls panic()
which notifies the panic handlers then reboots the platform, depending on
CONFIG_PANIC_TIMEOUT value :

- If CONFIG_PANIC_TIMEOUT is zero or greater than 3 seconds, TCO watchdog will
reset the platform if second expiration happens before TCO has been kicked again.

- If CONFIG_PANIC_TIMEOUT is < 0, platform will reboot immediately (emergency
restart procedure).

- If CONFIG_PANIC_TIMEOUT is 1 or 2 seconds, platform will reboot after 1 or 2
seconds delay (emergency restart procedure).

See Documentation/watchdog/tco-wdt-warning-interrupt.txt for more details.

Signed-off-by: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
---
 drivers/watchdog/Kconfig    | 13 ++++++++++++
 drivers/watchdog/iTCO_wdt.c | 48 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+)

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 79d2589..41f3647 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -674,6 +674,19 @@ config ITCO_VENDOR_SUPPORT
 	  devices. At this moment we only have additional support for some
 	  SuperMicro Inc. motherboards.
 
+config ITCO_WARNING_PANIC
+	bool "Intel TCO Timer/Watchdog panic on warning interrupt"
+	depends on ITCO_WDT && ACPI
+	default y
+	---help---
+	  Force a call to panic() when TCO warning interrupt occurs.
+
+	  Warning interrupt happens if warn_irq module parameter is set and
+	  TCO timer first expires.
+
+	  If not set, only cpu backtraces are dumped, no call to panic() and
+	  no notification of panic are done.
+
 config IT8712F_WDT
 	tristate "IT8712F (Smart Guardian) Watchdog Timer"
 	depends on X86
diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
index e802a54..8cfa5e7 100644
--- a/drivers/watchdog/iTCO_wdt.c
+++ b/drivers/watchdog/iTCO_wdt.c
@@ -68,6 +68,8 @@
 #include <linux/pm.h>			/* For suspend/resume */
 #include <linux/mfd/core.h>
 #include <linux/mfd/lpc_ich.h>
+#include <linux/nmi.h>
+#include <linux/acpi.h>
 
 #include "iTCO_vendor.h"
 
@@ -126,6 +128,11 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);
 MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
 	"Turn off SMI clearing watchdog (depends on TCO-version)(default=1)");
 
+static bool warn_irq;
+module_param(warn_irq, bool, 0);
+MODULE_PARM_DESC(warn_irq,
+	"Dump all cpus backtraces at first watchdog timer expiration (default=0)");
+
 /*
  * Some TCO specific functions
  */
@@ -200,6 +207,41 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
 	return ret; /* returns: 0 = OK, -EIO = Error */
 }
 
+#ifdef CONFIG_ACPI
+static unsigned char *tco_hid = "8086229C";
+static bool warn_irq_panic = CONFIG_ITCO_WARNING_PANIC;
+
+static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void *context)
+{
+	trigger_all_cpu_backtrace();
+	if (warn_irq_panic)
+		panic("Kernel Watchdog");
+
+	return ACPI_INTERRUPT_HANDLED;
+}
+
+static acpi_status __init iTCO_wdt_register_gpe(acpi_handle handle,
+					u32 lvl, void *context, void **rv)
+{
+	unsigned long long gpe;
+	acpi_status status;
+
+	status = acpi_evaluate_integer(handle, "_GPE", NULL, &gpe);
+	if (ACPI_FAILURE(status))
+		return status;
+
+	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
+					  iTCO_wdt_wirq, NULL);
+	if (ACPI_FAILURE(status))
+		return status;
+
+	acpi_enable_gpe(NULL, gpe);
+
+	pr_debug("interrupt=SCI GPE=0x%02llx", gpe);
+	return AE_OK;
+}
+#endif
+
 static int iTCO_wdt_start(struct watchdog_device *wd_dev)
 {
 	unsigned int val;
@@ -638,9 +680,15 @@ static int __init iTCO_wdt_init_module(void)
 	if (err)
 		return err;
 
+#ifdef CONFIG_ACPI
+	if (warn_irq)
+		acpi_get_devices(tco_hid, iTCO_wdt_register_gpe, NULL, NULL);
+#endif
+
 	return 0;
 }
 
+
 static void __exit iTCO_wdt_cleanup_module(void)
 {
 	platform_driver_unregister(&iTCO_wdt_driver);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: TCO Watchdog warning interrupt driver creation
  2015-03-30 13:45                     ` Muller, Francois-nicolas
@ 2015-05-12  1:28                       ` Guenter Roeck
  0 siblings, 0 replies; 18+ messages in thread
From: Guenter Roeck @ 2015-05-12  1:28 UTC (permalink / raw
  To: Muller, Francois-nicolas
  Cc: Rafael J. Wysocki, Linux ACPI Mailing List,
	'platform-driver-x86@vger.kernel.org', Darren Hart,
	linux-watchdog@vger.kernel.org, Wim Van Sebroeck

On Mon, Mar 30, 2015 at 01:45:08PM +0000, Muller, Francois-nicolas wrote:
> Thanks Rafael for your comments.
> Please find my answers embedded below and a new version of the patch.

This makes it quite difficult to find your patch.
Please follow the guidelines in Documentation/SubmittingPatches
when (re)submitting kernel patches.

[ ... ]
> 
> I need to be able to enable the panic or not after the warning interrupt.
> I could also use module parameters instead of Kconfig, I don't know which one
> is preferred in this case. Any advice ?
> 
There should be one and only one option, and that should trigger a panic.
Pretty much follow the "pretimeout" semantics in
Documentation/watchdog/watchdog-api.txt.

> 
> ##################################################################
> 
> >From 255184705e94f2983d965ad711f30281a1eed816 Mon Sep 17 00:00:00 2001
> From: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
> Date: Mon, 30 Mar 2015 14:56:39 +0200
> Subject: [PATCH] Adding TCO watchdog warning interrupt handling.
> 
> This feature is useful to root cause watchdog expiration.
> It is activated by boot parameter 'warn_irq' (disabled by default).
> 
> Upon first expiration of the TCO watchdog, a warning interrupt is fired,
> then the interrupt handler dumps registers and call stack of all available
> cpus.
> TCO watchdog reloads with 2.4 seconds timeout for second expiration.
> 
> If CONFIG_ITCO_WARN_PANIC is set, the warning interrupt also calls panic()
> which notifies the panic handlers then reboots the platform, depending on
> CONFIG_PANIC_TIMEOUT value :
> 
> - If CONFIG_PANIC_TIMEOUT is zero or greater than 3 seconds, TCO watchdog will
> reset the platform if second expiration happens before TCO has been kicked again.
> 
> - If CONFIG_PANIC_TIMEOUT is < 0, platform will reboot immediately (emergency
> restart procedure).
> 
> - If CONFIG_PANIC_TIMEOUT is 1 or 2 seconds, platform will reboot after 1 or 2
> seconds delay (emergency restart procedure).
> 
> See Documentation/watchdog/tco-wdt-warning-interrupt.txt for more details.
> 
> Signed-off-by: Francois-Nicolas Muller <francois-nicolas.muller@intel.com>
> ---
>  drivers/watchdog/Kconfig    | 13 ++++++++++++
>  drivers/watchdog/iTCO_wdt.c | 48 +++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 61 insertions(+)
> 
> -- 
> 1.9.1
> 
> diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
> index 79d2589..41f3647 100644
> --- a/drivers/watchdog/Kconfig
> +++ b/drivers/watchdog/Kconfig
> @@ -674,6 +674,19 @@ config ITCO_VENDOR_SUPPORT
>  	  devices. At this moment we only have additional support for some
>  	  SuperMicro Inc. motherboards.
>  
> +config ITCO_WARNING_PANIC
> +	bool "Intel TCO Timer/Watchdog panic on warning interrupt"
> +	depends on ITCO_WDT && ACPI
> +	default y
> +	---help---
> +	  Force a call to panic() when TCO warning interrupt occurs.
> +
> +	  Warning interrupt happens if warn_irq module parameter is set and
> +	  TCO timer first expires.
> +
> +	  If not set, only cpu backtraces are dumped, no call to panic() and
> +	  no notification of panic are done.
> +

This option should go away.

>  config IT8712F_WDT
>  	tristate "IT8712F (Smart Guardian) Watchdog Timer"
>  	depends on X86
> diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
> index e802a54..8cfa5e7 100644
> --- a/drivers/watchdog/iTCO_wdt.c
> +++ b/drivers/watchdog/iTCO_wdt.c
> @@ -68,6 +68,8 @@
>  #include <linux/pm.h>			/* For suspend/resume */
>  #include <linux/mfd/core.h>
>  #include <linux/mfd/lpc_ich.h>
> +#include <linux/nmi.h>
> +#include <linux/acpi.h>
>  
>  #include "iTCO_vendor.h"
>  
> @@ -126,6 +128,11 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);
>  MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
>  	"Turn off SMI clearing watchdog (depends on TCO-version)(default=1)");
>  
> +static bool warn_irq;
> +module_param(warn_irq, bool, 0);
> +MODULE_PARM_DESC(warn_irq,
> +	"Dump all cpus backtraces at first watchdog timer expiration (default=0)");
> +
This option (possibly with a better name) should be the only one
needed to trigger the new functionality.

>  /*
>   * Some TCO specific functions
>   */
> @@ -200,6 +207,41 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
>  	return ret; /* returns: 0 = OK, -EIO = Error */
>  }
>  
> +#ifdef CONFIG_ACPI

Question is if this ifdef is needed. Most if not all ACPI functions
are provided as dummies if ACPI is disabled.

> +static unsigned char *tco_hid = "8086229C";
> +static bool warn_irq_panic = CONFIG_ITCO_WARNING_PANIC;
> +
> +static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void *context)
> +{
> +	trigger_all_cpu_backtrace();
> +	if (warn_irq_panic)
> +		panic("Kernel Watchdog");

panic already implies a backtrace, doesn't it ?

> +
> +	return ACPI_INTERRUPT_HANDLED;
> +}
> +
> +static acpi_status __init iTCO_wdt_register_gpe(acpi_handle handle,
> +					u32 lvl, void *context, void **rv)
> +{
> +	unsigned long long gpe;
> +	acpi_status status;
> +
> +	status = acpi_evaluate_integer(handle, "_GPE", NULL, &gpe);
> +	if (ACPI_FAILURE(status))
> +		return status;
> +
> +	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
> +					  iTCO_wdt_wirq, NULL);
> +	if (ACPI_FAILURE(status))
> +		return status;
> +
> +	acpi_enable_gpe(NULL, gpe);
> +
> +	pr_debug("interrupt=SCI GPE=0x%02llx", gpe);

Is this message really useful ?

> +	return AE_OK;
> +}
> +#endif
> +
>  static int iTCO_wdt_start(struct watchdog_device *wd_dev)
>  {
>  	unsigned int val;
> @@ -638,9 +680,15 @@ static int __init iTCO_wdt_init_module(void)
>  	if (err)
>  		return err;
>  
> +#ifdef CONFIG_ACPI
> +	if (warn_irq)
> +		acpi_get_devices(tco_hid, iTCO_wdt_register_gpe, NULL, NULL);

acpi_get_devices provides a dummy function if ACPI is not configured,
so the #ifdef above is not needed.

> +#endif
> +
>  	return 0;
>  }
>  
> +

Please no whitespace changes (much less whitespace changes introducing
checkpatch problems).

>  static void __exit iTCO_wdt_cleanup_module(void)
>  {
>  	platform_driver_unregister(&iTCO_wdt_driver);

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-05-12  1:29 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <B9C02DB17496AC4197F41A146D371B9B075AF9DE@hasmsx107.ger.corp.intel.com>
2014-12-11  4:04 ` [PATCH] TCO Watchdog warning interrupt driver creation Darren Hart
2014-12-11 14:30   ` Guenter Roeck
2014-12-22 13:18     ` Muller, Francois-nicolas
2014-12-22 15:41       ` Darren Hart
2014-12-22 16:07         ` Muller, Francois-nicolas
2015-01-09  5:09           ` Guenter Roeck
2015-01-14 16:32             ` Muller, Francois-nicolas
2015-01-14 18:16               ` Guenter Roeck
2015-01-15 13:27                 ` Muller, Francois-nicolas
2015-01-15 14:49                   ` Guenter Roeck
2015-01-20 14:25                     ` Muller, Francois-nicolas
2015-01-14 16:38             ` Muller, Francois-nicolas
2015-01-20 15:00               ` Rafael J. Wysocki
2015-02-12 10:13                 ` Muller, Francois-nicolas
2015-03-10 16:14                   ` Muller, Francois-nicolas
2015-03-18 23:00                   ` Rafael J. Wysocki
2015-03-30 13:45                     ` Muller, Francois-nicolas
2015-05-12  1:28                       ` Guenter Roeck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).