All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Stafford Horne <shorne@gmail.com>
To: Matt Redfearn <matt.redfearn@mips.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Jonas Bonn <jonas@southpole.se>,
	Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>,
	Jan Henrik Weinstock <jan.weinstock@ice.rwth-aachen.de>,
	Matt Redfearn <matt.redfearn@imgtec.com>,
	James Hogan <james.hogan@imgtec.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	openrisc@lists.librecores.org,
	Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nokia.com>,
	James Hogan <james.hogan@mips.com>,
	Linux MIPS Mailing List <linux-mips@linux-mips.org>
Subject: Re: [PATCH v4 13/13] openrisc: add tick timer multi-core sync logic
Date: Wed, 1 Nov 2017 21:15:49 +0900	[thread overview]
Message-ID: <20171101121549.GG29237@lianli.shorne-pla.net> (raw)
In-Reply-To: <132de3e6-52b5-b5fe-8199-9da427a1baf4@mips.com>

On Wed, Nov 01, 2017 at 09:26:43AM +0000, Matt Redfearn wrote:
> 
> 
> On 01/11/17 00:34, Stafford Horne wrote:
> > On Wed, Nov 01, 2017 at 08:17:59AM +0900, Stafford Horne wrote:
> > > On Tue, Oct 31, 2017 at 02:06:21PM +0000, Matt Redfearn wrote:
> > > > Hi,
> > > > 
> > > > 
> > > > On 29/10/17 23:11, Stafford Horne wrote:
> > > > > In case timers are not in sync when cpus start (i.e. hot plug / offset
> > > > > resets) we need to synchronize the secondary cpus internal timer with
> > > > > the main cpu.  This is needed as in OpenRISC SMP there is only one
> > > > > clocksource registered which reads from the same ttcr register on each
> > > > > cpu.
> > > > > 
> > > > > This synchronization routine heavily borrows from mips implementation that
> > > > > does something similar.
> > > [..]
> > > > > diff --git a/arch/openrisc/kernel/smp.c b/arch/openrisc/kernel/smp.c
> > > > > index 4763b8b9161e..4d80ce6fa045 100644
> > > > > --- a/arch/openrisc/kernel/smp.c
> > > > > +++ b/arch/openrisc/kernel/smp.c
> > > > > @@ -100,6 +100,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
> > > > >    		pr_crit("CPU%u: failed to start\n", cpu);
> > > > >    		return -EIO;
> > > > >    	}
> > > > > +	synchronise_count_master(cpu);
> > > > >    	return 0;
> > > > >    }
> > > > > @@ -129,6 +130,8 @@ asmlinkage __init void secondary_start_kernel(void)
> > > > >    	set_cpu_online(cpu, true);
> > > > >    	complete(&cpu_running);
> > > > > +	synchronise_count_slave(cpu);
> > > > > +
> > > > 
> > > > Note that until 8f46cca1e6c06a058374816887059bcc017b382f, the MIPS timer
> > > > synchronization code contained the possibility of deadlock. If you mark a
> > > > CPU online before it goes into the synchronize loop, then the boot CPU can
> > > > schedule a different thread and send IPIs to all "online" CPUs. It gets
> > > > stuck waiting for the secondary to ack it's IPI, since this secondary CPU
> > > > has not enabled IRQs yet, and is stuck waiting for the master to synchronise
> > > > with it. The system then deadlocks.
> > > > Commit 8f46cca1e6c06a058374816887059bcc017b382f fixed this for MIPS and you
> > > > might want to similarly move the
> > > > 
> > > > set_cpu_online(cpu, true);
> > > > 
> > > > after counters are synchronized.
> > > Thank you for the heads up.  I do remember having interim issues with the timer
> > > syncing but I havent seen it for a while.  I think I fixed it by also moving
> > > synchronise_count_slave.
> > > 
> > > Let me double check.  Also, I see your patch 8f46cca1e6c06a0583748168 was merged
> > > last year?
> > Hello,
> > 
> > I should have read a bit more closely, definitely this could be an issue if the
> > boot cpu has other things running.
> > 
> > However, looking at mainline I can see the clock sync comes after set_cpu_online
> > again after this patch in mips.
> > 
> >    6f542ebeaee0 MIPS: Fix race on setting and getting cpu_online_mask
> >    Author: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nokia.com>
> > 
> > Is this deadlock an issue in mips again?
> > 
> > -Stafford
> 
> Hi Stafford,
> 
> Yes - the deadlock is an issue again, it was re-introduced by 6f542ebeaee0.
> That patch was based on testing with 4.4, where the core CPU hotplug code
> did not contain it's own completion event. Since commit 8df3e07e7f21f
> ("cpu/hotplug: Let upcoming cpu bring itself fully up"), which was added in
> 4.6, this is no longer the case and there is no race condition. I have
> https://patchwork.linux-mips.org/patch/17376/ pending which fixes this race
> in pre-4.6 stable kernels, and guards against the deadlock as well.
> Unfortunately because of the backport to stable this gets a little more
> complex.
> 
> Unless your patches to add SMP are going to be applied to pre-4.6 kernels,
> then you will not suffer the race condition. The potential deadlock is the
> only pitfall you need to guard against - which will be OK if you put the
> clock sync before marking the CPU online.
> 
> Thanks,
> Matt

Hi Matt,

Thanks for the details.  I will make the patch and test it out.

On a side note, I was thinking to pull the sync code out into asm-generic in
case any other architectures want to use it, it seems generic enough to work for
other architectures.  Any thoughts?

-Stafford

WARNING: multiple messages have this Message-ID (diff)
From: Stafford Horne <shorne@gmail.com>
To: openrisc@lists.librecores.org
Subject: [OpenRISC] [PATCH v4 13/13] openrisc: add tick timer multi-core sync logic
Date: Wed, 1 Nov 2017 21:15:49 +0900	[thread overview]
Message-ID: <20171101121549.GG29237@lianli.shorne-pla.net> (raw)
In-Reply-To: <132de3e6-52b5-b5fe-8199-9da427a1baf4@mips.com>

On Wed, Nov 01, 2017 at 09:26:43AM +0000, Matt Redfearn wrote:
> 
> 
> On 01/11/17 00:34, Stafford Horne wrote:
> > On Wed, Nov 01, 2017 at 08:17:59AM +0900, Stafford Horne wrote:
> > > On Tue, Oct 31, 2017 at 02:06:21PM +0000, Matt Redfearn wrote:
> > > > Hi,
> > > > 
> > > > 
> > > > On 29/10/17 23:11, Stafford Horne wrote:
> > > > > In case timers are not in sync when cpus start (i.e. hot plug / offset
> > > > > resets) we need to synchronize the secondary cpus internal timer with
> > > > > the main cpu.  This is needed as in OpenRISC SMP there is only one
> > > > > clocksource registered which reads from the same ttcr register on each
> > > > > cpu.
> > > > > 
> > > > > This synchronization routine heavily borrows from mips implementation that
> > > > > does something similar.
> > > [..]
> > > > > diff --git a/arch/openrisc/kernel/smp.c b/arch/openrisc/kernel/smp.c
> > > > > index 4763b8b9161e..4d80ce6fa045 100644
> > > > > --- a/arch/openrisc/kernel/smp.c
> > > > > +++ b/arch/openrisc/kernel/smp.c
> > > > > @@ -100,6 +100,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
> > > > >    		pr_crit("CPU%u: failed to start\n", cpu);
> > > > >    		return -EIO;
> > > > >    	}
> > > > > +	synchronise_count_master(cpu);
> > > > >    	return 0;
> > > > >    }
> > > > > @@ -129,6 +130,8 @@ asmlinkage __init void secondary_start_kernel(void)
> > > > >    	set_cpu_online(cpu, true);
> > > > >    	complete(&cpu_running);
> > > > > +	synchronise_count_slave(cpu);
> > > > > +
> > > > 
> > > > Note that until 8f46cca1e6c06a058374816887059bcc017b382f, the MIPS timer
> > > > synchronization code contained the possibility of deadlock. If you mark a
> > > > CPU online before it goes into the synchronize loop, then the boot CPU can
> > > > schedule a different thread and send IPIs to all "online" CPUs. It gets
> > > > stuck waiting for the secondary to ack it's IPI, since this secondary CPU
> > > > has not enabled IRQs yet, and is stuck waiting for the master to synchronise
> > > > with it. The system then deadlocks.
> > > > Commit 8f46cca1e6c06a058374816887059bcc017b382f fixed this for MIPS and you
> > > > might want to similarly move the
> > > > 
> > > > set_cpu_online(cpu, true);
> > > > 
> > > > after counters are synchronized.
> > > Thank you for the heads up.  I do remember having interim issues with the timer
> > > syncing but I havent seen it for a while.  I think I fixed it by also moving
> > > synchronise_count_slave.
> > > 
> > > Let me double check.  Also, I see your patch 8f46cca1e6c06a0583748168 was merged
> > > last year?
> > Hello,
> > 
> > I should have read a bit more closely, definitely this could be an issue if the
> > boot cpu has other things running.
> > 
> > However, looking at mainline I can see the clock sync comes after set_cpu_online
> > again after this patch in mips.
> > 
> >    6f542ebeaee0 MIPS: Fix race on setting and getting cpu_online_mask
> >    Author: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nokia.com>
> > 
> > Is this deadlock an issue in mips again?
> > 
> > -Stafford
> 
> Hi Stafford,
> 
> Yes - the deadlock is an issue again, it was re-introduced by 6f542ebeaee0.
> That patch was based on testing with 4.4, where the core CPU hotplug code
> did not contain it's own completion event. Since commit 8df3e07e7f21f
> ("cpu/hotplug: Let upcoming cpu bring itself fully up"), which was added in
> 4.6, this is no longer the case and there is no race condition. I have
> https://patchwork.linux-mips.org/patch/17376/ pending which fixes this race
> in pre-4.6 stable kernels, and guards against the deadlock as well.
> Unfortunately because of the backport to stable this gets a little more
> complex.
> 
> Unless your patches to add SMP are going to be applied to pre-4.6 kernels,
> then you will not suffer the race condition. The potential deadlock is the
> only pitfall you need to guard against - which will be OK if you put the
> clock sync before marking the CPU online.
> 
> Thanks,
> Matt

Hi Matt,

Thanks for the details.  I will make the patch and test it out.

On a side note, I was thinking to pull the sync code out into asm-generic in
case any other architectures want to use it, it seems generic enough to work for
other architectures.  Any thoughts?

-Stafford


  reply	other threads:[~2017-11-01 12:15 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-29 23:11 [PATCH v4 00/13] OpenRISC SMP Support Stafford Horne
2017-10-29 23:11 ` [PATCH v4 01/13] openrisc: use shadow registers to save regs on exception Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 02/13] openrisc: add 1 and 2 byte cmpxchg support Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 03/13] openrisc: use qspinlocks and qrwlocks Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 04/13] dt-bindings: add openrisc to vendor prefixes list Stafford Horne
2017-10-29 23:11 ` [PATCH v4 05/13] irqchip: add initial support for ompic Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-30  2:29   ` Marc Zyngier
2017-10-30  2:29     ` [OpenRISC] " Marc Zyngier
2017-10-30  2:29     ` Marc Zyngier
2017-10-30  4:18     ` Stafford Horne
2017-10-30  4:18       ` [OpenRISC] " Stafford Horne
2017-10-30  4:18       ` Stafford Horne
2017-10-30  6:11       ` Marc Zyngier
2017-10-30  6:11         ` [OpenRISC] " Marc Zyngier
2017-11-01 12:17         ` Stafford Horne
2017-11-01 12:17           ` [OpenRISC] " Stafford Horne
2017-11-01 12:17           ` Stafford Horne
2017-10-29 23:11 ` [PATCH v4 06/13] openrisc: initial SMP support Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 07/13] openrisc: fix initial preempt state for secondary cpu tasks Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 08/13] openrisc: sleep instead of spin on secondary wait Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 09/13] openrisc: add cacheflush support to fix icache aliasing Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 10/13] openrisc: add simple_smp dts and defconfig for simulators Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 11/13] openrisc: support framepointers and STACKTRACE_SUPPORT Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 12/13] openrisc: enable LOCKDEP_SUPPORT and irqflags tracing Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-29 23:11 ` [PATCH v4 13/13] openrisc: add tick timer multi-core sync logic Stafford Horne
2017-10-29 23:11   ` [OpenRISC] " Stafford Horne
2017-10-31 14:06   ` Matt Redfearn
2017-10-31 14:06     ` [OpenRISC] " Matt Redfearn
2017-10-31 23:17     ` Stafford Horne
2017-10-31 23:17       ` [OpenRISC] " Stafford Horne
2017-11-01  0:34       ` Stafford Horne
2017-11-01  0:34         ` [OpenRISC] " Stafford Horne
2017-11-01  9:26         ` Matt Redfearn
2017-11-01  9:26           ` [OpenRISC] " Matt Redfearn
2017-11-01  9:26           ` Matt Redfearn
2017-11-01 12:15           ` Stafford Horne [this message]
2017-11-01 12:15             ` [OpenRISC] " Stafford Horne

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171101121549.GG29237@lianli.shorne-pla.net \
    --to=shorne@gmail.com \
    --cc=james.hogan@imgtec.com \
    --cc=james.hogan@mips.com \
    --cc=jan.weinstock@ice.rwth-aachen.de \
    --cc=jonas@southpole.se \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@linux-mips.org \
    --cc=matija.glavinic-pecotic.ext@nokia.com \
    --cc=matt.redfearn@imgtec.com \
    --cc=matt.redfearn@mips.com \
    --cc=openrisc@lists.librecores.org \
    --cc=stefan.kristiansson@saunalahti.fi \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.