From: Alex Kogan <alex.kogan@oracle.com> To: Peter Zijlstra <peterz@infradead.org> Cc: linux@armlinux.org.uk, Ingo Molnar <mingo@redhat.com>, Will Deacon <will.deacon@arm.com>, Arnd Bergmann <arnd@arndb.de>, longman@redhat.com, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, bp@alien8.de, hpa@zytor.com, x86@kernel.org, guohanjun@huawei.com, jglauber@marvell.com, steven.sistare@oracle.com, daniel.m.jordan@oracle.com, dave.dice@oracle.com Subject: Re: [External] : Re: [PATCH v14 4/6] locking/qspinlock: Introduce starvation avoidance into CNA Date: Thu, 15 Apr 2021 22:52:57 -0400 [thread overview] Message-ID: <02D4688A-FB4C-4100-8B85-C915F130BB99@oracle.com> (raw) In-Reply-To: <YHWIezK9pbmbWxsu@hirez.programming.kicks-ass.net> > On Apr 13, 2021, at 8:03 AM, Peter Zijlstra <peterz@infradead.org> wrote: > > On Thu, Apr 01, 2021 at 11:31:54AM -0400, Alex Kogan wrote: > >> @@ -49,13 +55,33 @@ struct cna_node { >> u16 real_numa_node; >> u32 encoded_tail; /* self */ >> u32 partial_order; /* enum val */ >> + s32 start_time; >> }; > >> +/* >> + * Controls the threshold time in ms (default = 10) for intra-node lock >> + * hand-offs before the NUMA-aware variant of spinlock is forced to be >> + * passed to a thread on another NUMA node. The default setting can be >> + * changed with the "numa_spinlock_threshold" boot option. >> + */ >> +#define MSECS_TO_JIFFIES(m) \ >> + (((m) + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ)) >> +static int intra_node_handoff_threshold __ro_after_init = MSECS_TO_JIFFIES(10); >> + >> +static inline bool intra_node_threshold_reached(struct cna_node *cn) >> +{ >> + s32 current_time = (s32)jiffies; >> + s32 threshold = cn->start_time + intra_node_handoff_threshold; >> + >> + return current_time - threshold > 0; >> +} > > None of this makes any sense: > > - why do you track time elapsed as a signed entity? > - why are you using jiffies; that's terrible granularity. Good points. I will address that (see below). I will just mention that those suggestions came from senior folks on this mailing list, and it seemed prudent to take their counsel. > > As Andi already said, 10ms is silly large. You've just inflated the > lock-acquire time for every contended lock to stupid land just because > NUMA. I just ran a few quick tests — local_clock() (a wrapper around sched_clock()) works well, so I will switch to using that. I also took a few numbers with different thresholds. Looks like we can drop the threshold to 1ms with a minor penalty to performance. However, pushing the threshold to 100us has a more significant cost. Here are the numbers for reference: will-it-scale/lock2_threads: threshold: 10ms 1ms 100us speedup at 142 threads: 2.184 1.974 1.1418 will-it-scale/open1_threads: threshold: 10ms 1ms 100us speedup at 142 threads: 2.146 1.974 1.291 Would you be more comfortable with setting the default at 1ms? > And this also brings me to the whole premise of this series; *why* are > we optimizing this? What locks are so contended that this actually helps > and shouldn't you be spending your time breaking those locks? That would > improve throughput more than this ever can. I think for the same reason the kernel switched from ticket locks to queue locks several years back. There always will be applications with contended locks. Sometimes the workarounds are easy, but many times they are not, like with legacy applications or when the workload is skewed (e.g., every client tries to update the metadata of the same file protected by the same lock). The results show that for those cases we leave > 2x performance on the table. Those are not only our numbers — LKP reports show similar or even better results, on a wide range of benchmarks, e.g.: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/HGVOCYDEE5KTLYPTAFBD2RXDQOCDPFUJ/ https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/OUPS7MZ3GJA2XYWM52GMU7H7EI25IT37/ https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/DNMEQPXJRQY2IKHZ3ERGRY6TUPWDTFUN/ Regards, — Alex
WARNING: multiple messages have this Message-ID (diff)
From: Alex Kogan <alex.kogan@oracle.com> To: Peter Zijlstra <peterz@infradead.org> Cc: linux@armlinux.org.uk, Ingo Molnar <mingo@redhat.com>, Will Deacon <will.deacon@arm.com>, Arnd Bergmann <arnd@arndb.de>, longman@redhat.com, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, bp@alien8.de, hpa@zytor.com, x86@kernel.org, guohanjun@huawei.com, jglauber@marvell.com, steven.sistare@oracle.com, daniel.m.jordan@oracle.com, dave.dice@oracle.com Subject: Re: [External] : Re: [PATCH v14 4/6] locking/qspinlock: Introduce starvation avoidance into CNA Date: Thu, 15 Apr 2021 22:52:57 -0400 [thread overview] Message-ID: <02D4688A-FB4C-4100-8B85-C915F130BB99@oracle.com> (raw) In-Reply-To: <YHWIezK9pbmbWxsu@hirez.programming.kicks-ass.net> > On Apr 13, 2021, at 8:03 AM, Peter Zijlstra <peterz@infradead.org> wrote: > > On Thu, Apr 01, 2021 at 11:31:54AM -0400, Alex Kogan wrote: > >> @@ -49,13 +55,33 @@ struct cna_node { >> u16 real_numa_node; >> u32 encoded_tail; /* self */ >> u32 partial_order; /* enum val */ >> + s32 start_time; >> }; > >> +/* >> + * Controls the threshold time in ms (default = 10) for intra-node lock >> + * hand-offs before the NUMA-aware variant of spinlock is forced to be >> + * passed to a thread on another NUMA node. The default setting can be >> + * changed with the "numa_spinlock_threshold" boot option. >> + */ >> +#define MSECS_TO_JIFFIES(m) \ >> + (((m) + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ)) >> +static int intra_node_handoff_threshold __ro_after_init = MSECS_TO_JIFFIES(10); >> + >> +static inline bool intra_node_threshold_reached(struct cna_node *cn) >> +{ >> + s32 current_time = (s32)jiffies; >> + s32 threshold = cn->start_time + intra_node_handoff_threshold; >> + >> + return current_time - threshold > 0; >> +} > > None of this makes any sense: > > - why do you track time elapsed as a signed entity? > - why are you using jiffies; that's terrible granularity. Good points. I will address that (see below). I will just mention that those suggestions came from senior folks on this mailing list, and it seemed prudent to take their counsel. > > As Andi already said, 10ms is silly large. You've just inflated the > lock-acquire time for every contended lock to stupid land just because > NUMA. I just ran a few quick tests — local_clock() (a wrapper around sched_clock()) works well, so I will switch to using that. I also took a few numbers with different thresholds. Looks like we can drop the threshold to 1ms with a minor penalty to performance. However, pushing the threshold to 100us has a more significant cost. Here are the numbers for reference: will-it-scale/lock2_threads: threshold: 10ms 1ms 100us speedup at 142 threads: 2.184 1.974 1.1418 will-it-scale/open1_threads: threshold: 10ms 1ms 100us speedup at 142 threads: 2.146 1.974 1.291 Would you be more comfortable with setting the default at 1ms? > And this also brings me to the whole premise of this series; *why* are > we optimizing this? What locks are so contended that this actually helps > and shouldn't you be spending your time breaking those locks? That would > improve throughput more than this ever can. I think for the same reason the kernel switched from ticket locks to queue locks several years back. There always will be applications with contended locks. Sometimes the workarounds are easy, but many times they are not, like with legacy applications or when the workload is skewed (e.g., every client tries to update the metadata of the same file protected by the same lock). The results show that for those cases we leave > 2x performance on the table. Those are not only our numbers — LKP reports show similar or even better results, on a wide range of benchmarks, e.g.: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/HGVOCYDEE5KTLYPTAFBD2RXDQOCDPFUJ/ https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/OUPS7MZ3GJA2XYWM52GMU7H7EI25IT37/ https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/DNMEQPXJRQY2IKHZ3ERGRY6TUPWDTFUN/ Regards, — Alex _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2021-04-16 2:54 UTC|newest] Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-04-01 15:31 [PATCH v14 0/6] Add NUMA-awareness to qspinlock Alex Kogan 2021-04-01 15:31 ` Alex Kogan 2021-04-01 15:31 ` [PATCH v14 1/6] locking/qspinlock: Rename mcs lock/unlock macros and make them more generic Alex Kogan 2021-04-01 15:31 ` Alex Kogan 2021-04-01 15:31 ` [PATCH v14 2/6] locking/qspinlock: Refactor the qspinlock slow path Alex Kogan 2021-04-01 15:31 ` Alex Kogan 2021-04-01 15:31 ` [PATCH v14 3/6] locking/qspinlock: Introduce CNA into the slow path of qspinlock Alex Kogan 2021-04-01 15:31 ` Alex Kogan 2021-04-13 11:30 ` Peter Zijlstra 2021-04-13 11:30 ` Peter Zijlstra 2021-04-14 2:29 ` [External] : " Alex Kogan 2021-04-14 2:29 ` Alex Kogan 2021-04-01 15:31 ` [PATCH v14 4/6] locking/qspinlock: Introduce starvation avoidance into CNA Alex Kogan 2021-04-01 15:31 ` Alex Kogan 2021-04-13 6:03 ` Andi Kleen 2021-04-13 6:03 ` Andi Kleen 2021-04-13 6:12 ` Andi Kleen 2021-04-13 6:12 ` Andi Kleen 2021-04-13 21:01 ` [External] : " Alex Kogan 2021-04-13 21:01 ` Alex Kogan 2021-04-13 21:22 ` Andi Kleen 2021-04-13 21:22 ` Andi Kleen 2021-04-14 2:30 ` Alex Kogan 2021-04-14 2:30 ` Alex Kogan 2021-04-14 16:41 ` Waiman Long 2021-04-14 16:41 ` Waiman Long 2021-04-14 17:26 ` Andi Kleen 2021-04-14 17:26 ` Andi Kleen 2021-04-14 17:31 ` Waiman Long 2021-04-14 17:31 ` Waiman Long 2021-04-13 12:03 ` Peter Zijlstra 2021-04-13 12:03 ` Peter Zijlstra 2021-04-16 2:52 ` Alex Kogan [this message] 2021-04-16 2:52 ` [External] : " Alex Kogan 2021-04-01 15:31 ` [PATCH v14 5/6] locking/qspinlock: Avoid moving certain threads between waiting queues in CNA Alex Kogan 2021-04-01 15:31 ` Alex Kogan 2021-04-01 15:31 ` [PATCH v14 6/6] locking/qspinlock: Introduce the shuffle reduction optimization into CNA Alex Kogan 2021-04-01 15:31 ` Alex Kogan 2021-04-14 7:47 ` Andreas Herrmann 2021-04-14 7:47 ` Andreas Herrmann 2021-04-14 18:18 ` [External] : " Alex Kogan 2021-04-14 18:18 ` Alex Kogan
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=02D4688A-FB4C-4100-8B85-C915F130BB99@oracle.com \ --to=alex.kogan@oracle.com \ --cc=arnd@arndb.de \ --cc=bp@alien8.de \ --cc=daniel.m.jordan@oracle.com \ --cc=dave.dice@oracle.com \ --cc=guohanjun@huawei.com \ --cc=hpa@zytor.com \ --cc=jglauber@marvell.com \ --cc=linux-arch@vger.kernel.org \ --cc=linux-arm-kernel@lists.infradead.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux@armlinux.org.uk \ --cc=longman@redhat.com \ --cc=mingo@redhat.com \ --cc=peterz@infradead.org \ --cc=steven.sistare@oracle.com \ --cc=tglx@linutronix.de \ --cc=will.deacon@arm.com \ --cc=x86@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.