BPF Archive mirror
From: Chris Mason <clm@meta.com>
To: Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>
Cc: torvalds@linux-foundation.org, mingo@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
	vschneid@redhat.com, ast@kernel.org, daniel@iogearbox.net,
	andrii@kernel.org, martin.lau@kernel.org, joshdon@google.com,
	brho@google.com, pjt@google.com, derkling@google.com,
	haoluo@google.com, dvernet@meta.com, dschatzberg@meta.com,
	dskarlat@cs.cmu.edu, riel@surriel.com, changwoo@igalia.com,
	himadrics@inria.fr, memxor@gmail.com, andrea.righi@canonical.com,
	joel@joelfernandes.org, linux-kernel@vger.kernel.org,
	bpf@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCHSET v6] sched: Implement BPF extensible scheduler class
Date: Tue, 14 May 2024 16:22:25 -0400
Message-ID: <6eb9302e-59c9-4242-bfb1-e473d3e5380e@meta.com>
In-Reply-To: <20240513080359.GI30852@noisy.programming.kicks-ass.net>

On 5/13/24 4:03 AM, Peter Zijlstra wrote:
> On Sun, May 05, 2024 at 01:31:26PM -1000, Tejun Heo wrote:
> 
>>> You Google/Facebook are touting collaboration, collaborate on fixing it.
>>> Instead of re-posting this over and over. After all, your main
>>> motivation for starting this was the cpu-cgroup overhead.
>>
>> The hierarchical scheduling overhead isn't the main motivation for us. We
>> can't use the CPU controller for all workloads and while it'd be nice to
>> improve that,
> 
> Hurmph, I had the impression from the earlier threads that this ~5%
> cgroup overhead was most definitely a problem and a motivator for all
> this.
> 
> The overhead was prohibitive, it was claimed, and you needed a solution.
> Did not previous versions use this very argument in order to push for
> all this?
> 
> By improving the cgroup mess -- I very much agree that the cgroup thing
> is not very nice. This whole argument goes away and we all get a better
> cgroup implementation.
> 
>> This view works only if you assume that the entire world contains only a
>> handful of developers who can work on schedulers. The only way that would be
>> the case is if the barrier of entry is raised unreasonably high. Sometimes a
>> high barrier of entry can't be avoided or is beneficial. However, if it's
>> pushed up high enough to leave only a handful of people to work on an area
>> as large as scheduling, something probably is wrong.
> 
> I've never really felt there were too few sched patches to stare at on
> any one day (quite the opposite on many days in fact).
> 
> There have also always been plenty out of tree scheduler patches --
> although I rarely if ever have time to look at them.
> 
> Writing a custom scheduler isn't that hard, simply ripping out
> fair_sched_class and replacing it with something simple really isn't
> *that* hard.
> 
> The only really hard requirement is respecting affinities, you'll crash
> and burn real hard if you get that wrong (think of all the per-cpu
> kthreads that hard rely on the per-cpu-ness of them).
> 
> But you can easily ignore cgroups, uclamp and a ton of other stuff and
> still boot and play around.
> 
>> I believe we agree that we want more people contributing to the scheduling
>> area. 
> 
> I think therein lies the rub -- contribution. If we were to do this
> thing, random loadable BPF schedulers, then how do we ensure people will
> contribute back?
> 
> That is, from where I am sitting I see $vendor mandate their $enterprise
> product needs their $BPF scheduler. At which point $vendor will have no
> incentive to ever contribute back.

Especially in the scheduler space, the incentive to contribute back
today is somewhat inverted. As you mention above, it's relatively easy
to make custom things, but it's very difficult to get features and
patches included. The cost of maintaining patches out of tree is
relatively low compared with the cost of working through inclusion,
and the scheduler stands out in terms of how hard it is to land changes.

I think the scheduler balances the needs of a wide variety of workloads
exceptionally well, but based on the volume of out of tree scheduler
infrastructure, it feels like the community is struggling to meet its
collaboration needs in the upstream tree.

Just like I can't imagine one filesystem working for everything, I think
we need to open up the field a little on schedulers.  As we develop for
new variations in workloads, power management, and hardware types, I
think sched_ext gives us a way to do more collaboration in the upstream
tree, and while I'm not pretending it's perfect, it's definitely ready
for expansion and broader use.

I do think that sched_ext developers will keep participating upstream,
and I agree with a lot of the points that Steve makes in his reply.
People are going to keep sending patches in because the kernel community
is just the best place to build and maintain this functionality.

-chris


Thread overview: 138+ messages
2024-05-01 15:09 [PATCHSET v6] sched: Implement BPF extensible scheduler class Tejun Heo
2024-05-01 15:09 ` [PATCH 01/39] cgroup: Implement cgroup_show_cftypes() Tejun Heo
2024-05-01 15:09 ` [PATCH 02/39] sched: Restructure sched_class order sanity checks in sched_init() Tejun Heo
2024-05-01 15:09 ` [PATCH 03/39] sched: Allow sched_cgroup_fork() to fail and introduce sched_cancel_fork() Tejun Heo
2024-05-01 15:09 ` [PATCH 04/39] sched: Add sched_class->reweight_task() Tejun Heo
2024-06-24 10:23   ` Peter Zijlstra
2024-06-24 10:31     ` Peter Zijlstra
2024-06-24 23:59     ` Tejun Heo
2024-06-25  7:29       ` Peter Zijlstra
2024-06-25 23:57         ` Tejun Heo
2024-06-26  1:29           ` [PATCH sched/urgent] sched/fair: set_load_weight() must also call reweight_task() for SCHED_IDLE tasks Tejun Heo
2024-06-26  2:19           ` [PATCH sched_ext/for-6.11] sched_ext: Account for idle policy when setting p->scx.weight in scx_ops_enable_task() Tejun Heo
2024-05-01 15:09 ` [PATCH 05/39] sched: Add sched_class->switching_to() and expose check_class_changing/changed() Tejun Heo
2024-06-24 11:06   ` Peter Zijlstra
2024-06-24 22:18     ` Tejun Heo
2024-06-25  8:16       ` Peter Zijlstra
2024-05-01 15:09 ` [PATCH 06/39] sched: Factor out cgroup weight conversion functions Tejun Heo
2024-05-01 15:09 ` [PATCH 07/39] sched: Expose css_tg() and __setscheduler_prio() Tejun Heo
2024-06-24 11:19   ` Peter Zijlstra
2024-06-24 18:56     ` Tejun Heo
2024-05-01 15:09 ` [PATCH 08/39] sched: Enumerate CPU cgroup file types Tejun Heo
2024-05-01 15:09 ` [PATCH 09/39] sched: Add @reason to sched_class->rq_{on|off}line() Tejun Heo
2024-06-24 11:32   ` Peter Zijlstra
2024-06-24 21:18     ` Tejun Heo
2024-06-25  8:29       ` Peter Zijlstra
2024-06-25 23:41         ` Tejun Heo
2024-06-26  8:23           ` Peter Zijlstra
2024-06-26 18:01             ` Tejun Heo
2024-06-27  1:27               ` [PATCH sched_ext/for-6.11] sched_ext: Disallow loading BPF scheduler if isolcpus= domain isolation is in effect Tejun Heo
2024-05-01 15:09 ` [PATCH 10/39] sched: Factor out update_other_load_avgs() from __update_blocked_others() Tejun Heo
2024-06-24 12:35   ` Peter Zijlstra
2024-06-24 16:15     ` Vincent Guittot
2024-06-24 19:24       ` Tejun Heo
2024-06-25  9:13         ` Vincent Guittot
2024-06-26 20:49           ` Tejun Heo
2024-05-01 15:09 ` [PATCH 11/39] cpufreq_schedutil: Refactor sugov_cpu_is_busy() Tejun Heo
2024-05-01 15:09 ` [PATCH 12/39] sched: Add normal_policy() Tejun Heo
2024-05-01 15:09 ` [PATCH 13/39] sched_ext: Add boilerplate for extensible scheduler class Tejun Heo
2024-05-01 15:09 ` [PATCH 14/39] sched_ext: Implement BPF " Tejun Heo
2024-05-01 15:09 ` [PATCH 15/39] sched_ext: Add scx_simple and scx_example_qmap example schedulers Tejun Heo
2024-05-01 15:09 ` [PATCH 16/39] sched_ext: Add sysrq-S which disables the BPF scheduler Tejun Heo
2024-05-01 15:09 ` [PATCH 17/39] sched_ext: Implement runnable task stall watchdog Tejun Heo
2024-05-01 15:09 ` [PATCH 18/39] sched_ext: Allow BPF schedulers to disallow specific tasks from joining SCHED_EXT Tejun Heo
2024-06-24 12:40   ` Peter Zijlstra
2024-06-24 19:06     ` Tejun Heo
2024-05-01 15:09 ` [PATCH 19/39] sched_ext: Print sched_ext info when dumping stack Tejun Heo
2024-06-24 12:46   ` Peter Zijlstra
2024-06-24 14:25     ` Linus Torvalds
2024-06-24 18:34     ` Tejun Heo
2024-05-01 15:09 ` [PATCH 20/39] sched_ext: Print debug dump after an error exit Tejun Heo
2024-05-01 15:09 ` [PATCH 21/39] tools/sched_ext: Add scx_show_state.py Tejun Heo
2024-05-01 15:09 ` [PATCH 22/39] sched_ext: Implement scx_bpf_kick_cpu() and task preemption support Tejun Heo
2024-05-01 15:09 ` [PATCH 23/39] sched_ext: Add a central scheduler which makes all scheduling decisions on one CPU Tejun Heo
2024-05-01 15:09 ` [PATCH 24/39] sched_ext: Make watchdog handle ops.dispatch() looping stall Tejun Heo
2024-05-01 15:10 ` [PATCH 25/39] sched_ext: Add task state tracking operations Tejun Heo
2024-05-01 15:10 ` [PATCH 26/39] sched_ext: Implement tickless support Tejun Heo
2024-05-01 15:10 ` [PATCH 27/39] sched_ext: Track tasks that are subjects of the in-flight SCX operation Tejun Heo
2024-05-01 15:10 ` [PATCH 28/39] sched_ext: Add cgroup support Tejun Heo
2024-05-01 15:10 ` [PATCH 29/39] sched_ext: Add a cgroup scheduler which uses flattened hierarchy Tejun Heo
2024-05-01 15:10 ` [PATCH 30/39] sched_ext: Implement SCX_KICK_WAIT Tejun Heo
2024-05-01 15:10 ` [PATCH 31/39] sched_ext: Implement sched_ext_ops.cpu_acquire/release() Tejun Heo
2024-05-01 15:10 ` [PATCH 32/39] sched_ext: Implement sched_ext_ops.cpu_online/offline() Tejun Heo
2024-05-01 15:10 ` [PATCH 33/39] sched_ext: Bypass BPF scheduler while PM events are in progress Tejun Heo
2024-05-01 15:10 ` [PATCH 34/39] sched_ext: Implement core-sched support Tejun Heo
2024-05-01 15:10 ` [PATCH 35/39] sched_ext: Add vtime-ordered priority queue to dispatch_q's Tejun Heo
2024-05-01 15:10 ` [PATCH 36/39] sched_ext: Implement DSQ iterator Tejun Heo
2024-05-01 15:10 ` [PATCH 37/39] sched_ext: Add cpuperf support Tejun Heo
2024-05-01 15:10 ` [PATCH 38/39] sched_ext: Documentation: scheduler: Document extensible scheduler class Tejun Heo
2024-05-02  2:24   ` Bagas Sanjaya
2024-05-01 15:10 ` [PATCH 39/39] sched_ext: Add selftests Tejun Heo
2024-05-02  8:48 ` [PATCHSET v6] sched: Implement BPF extensible scheduler class Peter Zijlstra
2024-05-02 19:20   ` Tejun Heo
2024-05-03  8:52     ` Peter Zijlstra
2024-05-05 23:31       ` Tejun Heo
2024-05-13  8:03         ` Peter Zijlstra
2024-05-13 18:26           ` Steven Rostedt
2024-05-14  0:07             ` Qais Yousef
2024-05-14 21:34               ` David Vernet
2024-05-27 21:25                 ` Qais Yousef
2024-05-28 23:46                   ` Tejun Heo
2024-05-29 22:09                     ` Qais Yousef
2024-05-17  9:58               ` Peter Zijlstra
2024-05-27 20:29                 ` Qais Yousef
2024-05-14 20:22           ` Chris Mason [this message]
2024-05-14 22:06           ` Josh Don
2024-05-15 20:41           ` Tejun Heo
2024-05-21  0:19             ` Tejun Heo
2024-05-30 16:49               ` Tejun Heo
2024-05-06 18:47       ` Rik van Riel
2024-05-07 19:33         ` Tejun Heo
2024-05-07 19:47           ` Rik van Riel
2024-05-09  7:38       ` Changwoo Min
2024-05-10 18:24 ` Peter Jung
2024-05-13 20:36 ` Andrea Righi
2024-06-11 21:34 ` Linus Torvalds
2024-06-13 23:38   ` Tejun Heo
2024-06-19 20:56   ` Thomas Gleixner
2024-06-19 22:10     ` Linus Torvalds
2024-06-19 22:27       ` Thomas Gleixner
2024-06-19 22:55         ` Linus Torvalds
2024-06-20  2:35           ` Thomas Gleixner
2024-06-20  5:07             ` Linus Torvalds
2024-06-20 17:11               ` Linus Torvalds
2024-06-20 17:41                 ` Tejun Heo
2024-06-20 22:15                   ` [PATCH sched_ext/for-6.11] sched, sched_ext: Replace scx_next_task_picked() with sched_class->switch_class() Tejun Heo
2024-06-20 22:42                     ` Linus Torvalds
2024-06-21 19:46                       ` Tejun Heo
2024-06-24  9:04                         ` Peter Zijlstra
2024-06-24 18:41                           ` Tejun Heo
2024-06-24  9:02                       ` Peter Zijlstra
2024-06-21 19:52                     ` Tejun Heo
2024-06-24  8:59                     ` Peter Zijlstra
2024-06-24 21:01                       ` Tejun Heo
2024-06-25  7:49                         ` Peter Zijlstra
2024-06-25 23:30                           ` Tejun Heo
2024-06-26  8:28                             ` Peter Zijlstra
2024-06-26 17:56                               ` Tejun Heo
2024-06-20 18:47               ` [PATCHSET v6] sched: Implement BPF extensible scheduler class Thomas Gleixner
2024-06-20 19:20                 ` Linus Torvalds
2024-06-21  9:35                   ` Thomas Gleixner
2024-06-21 16:34                     ` Linus Torvalds
2024-06-23  2:00                       ` Tejun Heo
2024-06-23 10:31                       ` Thomas Gleixner
2024-06-23 10:33                       ` Thomas Gleixner
2024-06-24 14:23                         ` Jason Gunthorpe
2024-06-20 19:58                 ` Tejun Heo
2024-06-24  9:34                   ` Peter Zijlstra
2024-06-24 20:17                     ` Tejun Heo
2024-06-24 20:51                       ` [PATCH sched_ext/for-6.11] sched, sched_ext: Simplify dl_prio() case handling in sched_fork() Tejun Heo
2024-06-20 19:35             ` [PATCHSET v6] sched: Implement BPF extensible scheduler class Tejun Heo
2024-06-21 10:46               ` Thomas Gleixner
2024-06-21 21:14                 ` Chris Mason
2024-06-23  8:14                   ` Thomas Gleixner
2024-06-24 16:42                     ` Chris Mason
2024-06-24 18:11                       ` Tejun Heo
2024-06-24 22:01                         ` Peter Oskolkov
2024-06-24 22:17                     ` David Vernet
2024-06-24 21:54             ` Peter Oskolkov
