From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S932674AbdJaXSE (ORCPT ); Tue, 31 Oct 2017 19:18:04 -0400
Received: from mail-pg0-f68.google.com ([74.125.83.68]:55942
    "EHLO mail-pg0-f68.google.com" rhost-flags-OK-OK-OK-OK)
    by vger.kernel.org with ESMTP id S1752319AbdJaXSC (ORCPT );
    Tue, 31 Oct 2017 19:18:02 -0400
X-Google-Smtp-Source: ABhQp+RqQDnKAWyVQdwycToFtayGcX29VDaYJGlOwFwzWxoYpAdavgwtFayBHuXkGnWKc1nytsyv6w==
Date: Wed, 1 Nov 2017 08:17:59 +0900
From: Stafford Horne
To: Matt Redfearn
Cc: LKML, Jonas Bonn, Stefan Kristiansson, Jan Henrik Weinstock,
    Matt Redfearn, James Hogan, Thomas Gleixner,
    openrisc@lists.librecores.org
Subject: Re: [PATCH v4 13/13] openrisc: add tick timer multi-core sync logic
Message-ID: <20171031231759.GB29237@lianli.shorne-pla.net>
References: <20171029231123.27281-1-shorne@gmail.com>
    <20171029231123.27281-14-shorne@gmail.com>
    <05333dd1-f8df-c96e-03df-1623ff67ab39@mips.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <05333dd1-f8df-c96e-03df-1623ff67ab39@mips.com>
User-Agent: Mutt/1.9.1 (2017-09-22)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 31, 2017 at 02:06:21PM +0000, Matt Redfearn wrote:
> Hi,
>
>
> On 29/10/17 23:11, Stafford Horne wrote:
> > In case timers are not in sync when cpus start (i.e. hot plug / offset
> > resets) we need to synchronize the secondary cpus internal timer with
> > the main cpu. This is needed as in OpenRISC SMP there is only one
> > clocksource registered which reads from the same ttcr register on each
> > cpu.
> >
> > This synchronization routine heavily borrows from mips implementation that
> > does something similar.

[..]

> > diff --git a/arch/openrisc/kernel/smp.c b/arch/openrisc/kernel/smp.c
> > index 4763b8b9161e..4d80ce6fa045 100644
> > --- a/arch/openrisc/kernel/smp.c
> > +++ b/arch/openrisc/kernel/smp.c
> > @@ -100,6 +100,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
> >          pr_crit("CPU%u: failed to start\n", cpu);
> >          return -EIO;
> >      }
> > +    synchronise_count_master(cpu);
> >      return 0;
> >  }
> > @@ -129,6 +130,8 @@ asmlinkage __init void secondary_start_kernel(void)
> >      set_cpu_online(cpu, true);
> >      complete(&cpu_running);
> > +    synchronise_count_slave(cpu);
> > +
>
>
> Note that until 8f46cca1e6c06a058374816887059bcc017b382f, the MIPS timer
> synchronization code contained the possibility of deadlock. If you mark a
> CPU online before it goes into the synchronize loop, then the boot CPU can
> schedule a different thread and send IPIs to all "online" CPUs. It gets
> stuck waiting for the secondary to ack its IPI, since this secondary CPU
> has not enabled IRQs yet, and is stuck waiting for the master to synchronise
> with it. The system then deadlocks.
> Commit 8f46cca1e6c06a058374816887059bcc017b382f fixed this for MIPS and you
> might want to similarly move the
>
> set_cpu_online(cpu, true);
>
> after counters are synchronized.

Thank you for the heads-up. I do remember having intermittent issues with the
timer syncing, but I haven't seen them for a while. I think I fixed it by also
moving synchronise_count_slave. Let me double check.

Also, I see your patch 8f46cca1e6c06a0583748168 was merged last year?

-Stafford
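
The ordering problem Matt describes can be sketched with a toy, single-threaded
model. This is not the kernel code and does not reproduce the MIPS fix; the
names smp_model, bringup_order and bringup_deadlocks are hypothetical and exist
only for illustration. The model assumes the worst-case schedule from the
report: before the boot CPU reaches the master side of the counter sync,
another thread on the boot CPU sends an IPI to every CPU in the online mask and
waits for each ack.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum bringup_order {
    ONLINE_BEFORE_SYNC,  /* set_cpu_online() first, then the slave sync loop */
    SYNC_BEFORE_ONLINE   /* slave sync loop first, then set_cpu_online() */
};

struct smp_model {
    bool online;        /* secondary is visible in the online mask */
    bool irqs_enabled;  /* secondary can take and ack an IPI */
    bool synced;        /* counter handshake with the master has finished */
};

/* Returns true if the modelled bring-up can never complete. */
bool bringup_deadlocks(enum bringup_order order)
{
    struct smp_model s = { false, false, false };

    /* Secondary CPU: IRQs are still off at this point in bring-up. */
    if (order == ONLINE_BEFORE_SYNC)
        s.online = true;
    /* Secondary now spins in the slave sync loop, waiting for the master. */

    /*
     * Boot CPU, worst case: another thread runs before the master side of
     * the sync and IPIs every online CPU, then waits for the acks. An online
     * secondary with IRQs off never acks, so the master sync never runs.
     */
    if (s.online && !s.irqs_enabled)
        return true;

    /* The secondary was not an IPI target, so the handshake completes. */
    s.synced = true;
    s.online = true;        /* marked online only after the sync */
    s.irqs_enabled = true;  /* bring-up continues and enables IRQs */
    return !(s.synced && s.online && s.irqs_enabled);
}
```

In this model only the ONLINE_BEFORE_SYNC ordering can hang, which is the
reason for moving set_cpu_online(cpu, true) after the counters are
synchronized.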