All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Matt Redfearn <matt.redfearn@mips.com>
To: Stafford Horne <shorne@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Jonas Bonn <jonas@southpole.se>,
	Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>,
	Jan Henrik Weinstock <jan.weinstock@ice.rwth-aachen.de>,
	Matt Redfearn <matt.redfearn@imgtec.com>,
	James Hogan <james.hogan@imgtec.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	<openrisc@lists.librecores.org>,
	Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nokia.com>,
	James Hogan <james.hogan@mips.com>,
	Linux MIPS Mailing List <linux-mips@linux-mips.org>
Subject: Re: [PATCH v4 13/13] openrisc: add tick timer multi-core sync logic
Date: Wed, 1 Nov 2017 09:26:43 +0000	[thread overview]
Message-ID: <132de3e6-52b5-b5fe-8199-9da427a1baf4@mips.com> (raw)
In-Reply-To: <20171101003447.GC29237@lianli.shorne-pla.net>



On 01/11/17 00:34, Stafford Horne wrote:
> On Wed, Nov 01, 2017 at 08:17:59AM +0900, Stafford Horne wrote:
>> On Tue, Oct 31, 2017 at 02:06:21PM +0000, Matt Redfearn wrote:
>>> Hi,
>>>
>>>
>>> On 29/10/17 23:11, Stafford Horne wrote:
>>>> In case timers are not in sync when cpus start (i.e. hot plug / offset
>>>> resets) we need to synchronize the secondary cpus internal timer with
>>>> the main cpu.  This is needed as in OpenRISC SMP there is only one
>>>> clocksource registered which reads from the same ttcr register on each
>>>> cpu.
>>>>
>>>> This synchronization routine heavily borrows from mips implementation that
>>>> does something similar.
>> [..]
>>>> diff --git a/arch/openrisc/kernel/smp.c b/arch/openrisc/kernel/smp.c
>>>> index 4763b8b9161e..4d80ce6fa045 100644
>>>> --- a/arch/openrisc/kernel/smp.c
>>>> +++ b/arch/openrisc/kernel/smp.c
>>>> @@ -100,6 +100,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>>>>    		pr_crit("CPU%u: failed to start\n", cpu);
>>>>    		return -EIO;
>>>>    	}
>>>> +	synchronise_count_master(cpu);
>>>>    	return 0;
>>>>    }
>>>> @@ -129,6 +130,8 @@ asmlinkage __init void secondary_start_kernel(void)
>>>>    	set_cpu_online(cpu, true);
>>>>    	complete(&cpu_running);
>>>> +	synchronise_count_slave(cpu);
>>>> +
>>>
>>> Note that until 8f46cca1e6c06a058374816887059bcc017b382f, the MIPS timer
>>> synchronization code contained the possibility of deadlock. If you mark a
>>> CPU online before it goes into the synchronize loop, then the boot CPU can
>>> schedule a different thread and send IPIs to all "online" CPUs. It gets
>>> stuck waiting for the secondary to ack it's IPI, since this secondary CPU
>>> has not enabled IRQs yet, and is stuck waiting for the master to synchronise
>>> with it. The system then deadlocks.
>>> Commit 8f46cca1e6c06a058374816887059bcc017b382f fixed this for MIPS and you
>>> might want to similarly move the
>>>
>>> set_cpu_online(cpu, true);
>>>
>>> after counters are synchronized.
>> Thank you for the heads up.  I do remember having interim issues with the timer
>> syncing but I havent seen it for a while.  I think I fixed it by also moving
>> synchronise_count_slave.
>>
>> Let me double check.  Also, I see your patch 8f46cca1e6c06a0583748168 was merged
>> last year?
> Hello,
>
> I should have read a bit more closely, definitely this could be an issue if the
> boot cpu has other things running.
>
> However, looking at mainline I can see the clock sync comes after set_cpu_online
> again after this patch in mips.
>
>    6f542ebeaee0 MIPS: Fix race on setting and getting cpu_online_mask
>    Author: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nokia.com>
>
> Is this deadlock an issue in mips again?
>
> -Stafford

Hi Stafford,

Yes - the deadlock is an issue again, it was re-introduced by 
6f542ebeaee0. That patch was based on testing with 4.4, where the core 
CPU hotplug code did not contain it's own completion event. Since commit 
8df3e07e7f21f ("cpu/hotplug: Let upcoming cpu bring itself fully up"), 
which was added in 4.6, this is no longer the case and there is no race 
condition. I have https://patchwork.linux-mips.org/patch/17376/ pending 
which fixes this race in pre-4.6 stable kernels, and guards against the 
deadlock as well. Unfortunately because of the backport to stable this 
gets a little more complex.

Unless your patches to add SMP are going to be applied to pre-4.6 
kernels, then you will not suffer the race condition. The potential 
deadlock is the only pitfall you need to guard against - which will be 
OK if you put the clock sync before marking the CPU online.

Thanks,
Matt

WARNING: multiple messages have this Message-ID (diff)
From: Matt Redfearn <matt.redfearn@mips.com>
To: Stafford Horne <shorne@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Jonas Bonn <jonas@southpole.se>,
	Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>,
	Jan Henrik Weinstock <jan.weinstock@ice.rwth-aachen.de>,
	Matt Redfearn <matt.redfearn@imgtec.com>,
	James Hogan <james.hogan@imgtec.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	openrisc@lists.librecores.org,
	Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nokia.com>,
	James Hogan <james.hogan@mips.com>,
	Linux MIPS Mailing List <linux-mips@linux-mips.org>
Subject: Re: [PATCH v4 13/13] openrisc: add tick timer multi-core sync logic
Date: Wed, 1 Nov 2017 09:26:43 +0000	[thread overview]
Message-ID: <132de3e6-52b5-b5fe-8199-9da427a1baf4@mips.com> (raw)
Message-ID: <20171101092643.JibHZDDRVtXRk8ojYqAg_8J2z3h-ZQDRZRcoSLV42TA@z> (raw)
In-Reply-To: <20171101003447.GC29237@lianli.shorne-pla.net>



On 01/11/17 00:34, Stafford Horne wrote:
> On Wed, Nov 01, 2017 at 08:17:59AM +0900, Stafford Horne wrote:
>> On Tue, Oct 31, 2017 at 02:06:21PM +0000, Matt Redfearn wrote:
>>> Hi,
>>>
>>>
>>> On 29/10/17 23:11, Stafford Horne wrote:
>>>> In case timers are not in sync when cpus start (i.e. hot plug / offset
>>>> resets) we need to synchronize the secondary cpus internal timer with
>>>> the main cpu.  This is needed as in OpenRISC SMP there is only one
>>>> clocksource registered which reads from the same ttcr register on each
>>>> cpu.
>>>>
>>>> This synchronization routine heavily borrows from mips implementation that
>>>> does something similar.
>> [..]
>>>> diff --git a/arch/openrisc/kernel/smp.c b/arch/openrisc/kernel/smp.c
>>>> index 4763b8b9161e..4d80ce6fa045 100644
>>>> --- a/arch/openrisc/kernel/smp.c
>>>> +++ b/arch/openrisc/kernel/smp.c
>>>> @@ -100,6 +100,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>>>>    		pr_crit("CPU%u: failed to start\n", cpu);
>>>>    		return -EIO;
>>>>    	}
>>>> +	synchronise_count_master(cpu);
>>>>    	return 0;
>>>>    }
>>>> @@ -129,6 +130,8 @@ asmlinkage __init void secondary_start_kernel(void)
>>>>    	set_cpu_online(cpu, true);
>>>>    	complete(&cpu_running);
>>>> +	synchronise_count_slave(cpu);
>>>> +
>>>
>>> Note that until 8f46cca1e6c06a058374816887059bcc017b382f, the MIPS timer
>>> synchronization code contained the possibility of deadlock. If you mark a
>>> CPU online before it goes into the synchronize loop, then the boot CPU can
>>> schedule a different thread and send IPIs to all "online" CPUs. It gets
>>> stuck waiting for the secondary to ack it's IPI, since this secondary CPU
>>> has not enabled IRQs yet, and is stuck waiting for the master to synchronise
>>> with it. The system then deadlocks.
>>> Commit 8f46cca1e6c06a058374816887059bcc017b382f fixed this for MIPS and you
>>> might want to similarly move the
>>>
>>> set_cpu_online(cpu, true);
>>>
>>> after counters are synchronized.
>> Thank you for the heads up.  I do remember having interim issues with the timer
>> syncing but I havent seen it for a while.  I think I fixed it by also moving
>> synchronise_count_slave.
>>
>> Let me double check.  Also, I see your patch 8f46cca1e6c06a0583748168 was merged
>> last year?
> Hello,
>
> I should have read a bit more closely, definitely this could be an issue if the
> boot cpu has other things running.
>
> However, looking at mainline I can see the clock sync comes after set_cpu_online
> again after this patch in mips.
>
>    6f542ebeaee0 MIPS: Fix race on setting and getting cpu_online_mask
>    Author: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nokia.com>
>
> Is this deadlock an issue in mips again?
>
> -Stafford

Hi Stafford,

Yes - the deadlock is an issue again, it was re-introduced by 
6f542ebeaee0. That patch was based on testing with 4.4, where the core 
CPU hotplug code did not contain it's own completion event. Since commit 
8df3e07e7f21f ("cpu/hotplug: Let upcoming cpu bring itself fully up"), 
which was added in 4.6, this is no longer the case and there is no race 
condition. I have https://patchwork.linux-mips.org/patch/17376/ pending 
which fixes this race in pre-4.6 stable kernels, and guards against the 
deadlock as well. Unfortunately because of the backport to stable this 
gets a little more complex.

Unless your patches to add SMP are going to be applied to pre-4.6 
kernels, then you will not suffer the race condition. The potential 
deadlock is the only pitfall you need to guard against - which will be 
OK if you put the clock sync before marking the CPU online.

Thanks,
Matt

WARNING: multiple messages have this Message-ID (diff)
From: Matt Redfearn <matt.redfearn@mips.com>
To: openrisc@lists.librecores.org
Subject: [OpenRISC] [PATCH v4 13/13] openrisc: add tick timer multi-core sync logic
Date: Wed, 1 Nov 2017 09:26:43 +0000	[thread overview]
Message-ID: <132de3e6-52b5-b5fe-8199-9da427a1baf4@mips.com> (raw)
In-Reply-To: <20171101003447.GC29237@lianli.shorne-pla.net>



On 01/11/17 00:34, Stafford Horne wrote:
> On Wed, Nov 01, 2017 at 08:17:59AM +0900, Stafford Horne wrote:
>> On Tue, Oct 31, 2017 at 02:06:21PM +0000, Matt Redfearn wrote:
>>> Hi,
>>>
>>>
>>> On 29/10/17 23:11, Stafford Horne wrote:
>>>> In case timers are not in sync when cpus start (i.e. hot plug / offset
>>>> resets) we need to synchronize the secondary cpus internal timer with
>>>> the main cpu.  This is needed as in OpenRISC SMP there is only one
>>>> clocksource registered which reads from the same ttcr register on each
>>>> cpu.
>>>>
>>>> This synchronization routine heavily borrows from mips implementation that
>>>> does something similar.
>> [..]
>>>> diff --git a/arch/openrisc/kernel/smp.c b/arch/openrisc/kernel/smp.c
>>>> index 4763b8b9161e..4d80ce6fa045 100644
>>>> --- a/arch/openrisc/kernel/smp.c
>>>> +++ b/arch/openrisc/kernel/smp.c
>>>> @@ -100,6 +100,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>>>>    		pr_crit("CPU%u: failed to start\n", cpu);
>>>>    		return -EIO;
>>>>    	}
>>>> +	synchronise_count_master(cpu);
>>>>    	return 0;
>>>>    }
>>>> @@ -129,6 +130,8 @@ asmlinkage __init void secondary_start_kernel(void)
>>>>    	set_cpu_online(cpu, true);
>>>>    	complete(&cpu_running);
>>>> +	synchronise_count_slave(cpu);
>>>> +
>>>
>>> Note that until 8f46cca1e6c06a058374816887059bcc017b382f, the MIPS timer
>>> synchronization code contained the possibility of deadlock. If you mark a
>>> CPU online before it goes into the synchronize loop, then the boot CPU can
>>> schedule a different thread and send IPIs to all "online" CPUs. It gets
>>> stuck waiting for the secondary to ack it's IPI, since this secondary CPU
>>> has not enabled IRQs yet, and is stuck waiting for the master to synchronise
>>> with it. The system then deadlocks.
>>> Commit 8f46cca1e6c06a058374816887059bcc017b382f fixed this for MIPS and you
>>> might want to similarly move the
>>>
>>> set_cpu_online(cpu, true);
>>>
>>> after counters are synchronized.
>> Thank you for the heads up.  I do remember having interim issues with the timer
>> syncing but I havent seen it for a while.  I think I fixed it by also moving
>> synchronise_count_slave.
>>
>> Let me double check.  Also, I see your patch 8f46cca1e6c06a0583748168 was merged
>> last year?
> Hello,
>
> I should have read a bit more closely, definitely this could be an issue if the
> boot cpu has other things running.
>
> However, looking at mainline I can see the clock sync comes after set_cpu_online
> again after this patch in mips.
>
>    6f542ebeaee0 MIPS: Fix race on setting and getting cpu_online_mask
>    Author: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nokia.com>
>
> Is this deadlock an issue in mips again?
>
> -Stafford

Hi Stafford,

Yes - the deadlock is an issue again, it was re-introduced by 
6f542ebeaee0. That patch was based on testing with 4.4, where the core 
CPU hotplug code did not contain it's own completion event. Since commit 
8df3e07e7f21f ("cpu/hotplug: Let upcoming cpu bring itself fully up"), 
which was added in 4.6, this is no longer the case and there is no race 
condition. I have https://patchwork.linux-mips.org/patch/17376/ pending 
which fixes this race in pre-4.6 stable kernels, and guards against the 
deadlock as well. Unfortunately because of the backport to stable this 
gets a little more complex.

Unless your patches to add SMP are going to be applied to pre-4.6 
kernels, then you will not suffer the race condition. The potential 
deadlock is the only pitfall you need to guard against - which will be 
OK if you put the clock sync before marking the CPU online.

Thanks,
Matt



  reply	other threads:[~2017-11-01  9:28 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-29 23:11 [PATCH v4 00/13] OpenRISC SMP Support Stafford Horne
2017-10-29 23:11 ` [PATCH v4 01/13] openrisc: use shadow registers to save regs on exception Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 02/13] openrisc: add 1 and 2 byte cmpxchg support Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 03/13] openrisc: use qspinlocks and qrwlocks Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 04/13] dt-bindings: add openrisc to vendor prefixes list Stafford Horne
2017-10-29 23:11 ` [PATCH v4 05/13] irqchip: add initial support for ompic Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-30  2:29   ` Marc Zyngier
2017-10-30  2:29     ` [OpenRISC] " Marc Zyngier
2017-10-30  2:29     ` Marc Zyngier
2017-10-30  4:18     ` Stafford Horne
2017-10-30  4:18       ` [OpenRISC] " Stafford Horne
2017-10-30  4:18       ` Stafford Horne
2017-10-30  6:11       ` Marc Zyngier
2017-10-30  6:11         ` [OpenRISC] " Marc Zyngier
2017-11-01 12:17         ` Stafford Horne
2017-11-01 12:17           ` [OpenRISC] " Stafford Horne
2017-11-01 12:17           ` Stafford Horne
2017-10-29 23:11 ` [PATCH v4 06/13] openrisc: initial SMP support Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 07/13] openrisc: fix initial preempt state for secondary cpu tasks Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 08/13] openrisc: sleep instead of spin on secondary wait Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 09/13] openrisc: add cacheflush support to fix icache aliasing Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 10/13] openrisc: add simple_smp dts and defconfig for simulators Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 11/13] openrisc: support framepointers and STACKTRACE_SUPPORT Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 12/13] openrisc: enable LOCKDEP_SUPPORT and irqflags tracing Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 13/13] openrisc: add tick timer multi-core sync logic Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-31 14:06   ` Matt Redfearn
2017-10-31 14:06     ` [OpenRISC] " Matt Redfearn
2017-10-31 23:17     ` Stafford Horne
2017-10-31 23:17       ` [OpenRISC] " Stafford Horne
2017-11-01  0:34       ` Stafford Horne
2017-11-01  0:34         ` [OpenRISC] " Stafford Horne
2017-11-01  9:26         ` Matt Redfearn [this message]
2017-11-01  9:26           ` Matt Redfearn
2017-11-01  9:26           ` Matt Redfearn
2017-11-01 12:15           ` Stafford Horne
2017-11-01 12:15             ` [OpenRISC] " Stafford Horne

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=132de3e6-52b5-b5fe-8199-9da427a1baf4@mips.com \
    --to=matt.redfearn@mips.com \
    --cc=james.hogan@imgtec.com \
    --cc=james.hogan@mips.com \
    --cc=jan.weinstock@ice.rwth-aachen.de \
    --cc=jonas@southpole.se \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@linux-mips.org \
    --cc=matija.glavinic-pecotic.ext@nokia.com \
    --cc=matt.redfearn@imgtec.com \
    --cc=openrisc@lists.librecores.org \
    --cc=shorne@gmail.com \
    --cc=stefan.kristiansson@saunalahti.fi \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.