From: "Rafael J. Wysocki" <rjw@rjwysocki.net>
To: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Linux PM list <linux-pm@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
Juri Lelli <juri.lelli@arm.com>,
Steve Muckle <steve.muckle@linaro.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [Update][PATCH 3/3] cpufreq: governor: Replace timers with utilization update callbacks
Date: Thu, 04 Feb 2016 11:54:56 +0100
Message-ID: <2948252.khkHOicXLP@vostro.rjw.lan>
In-Reply-To: <20160204044959.GS3469@vireshk>
On Thursday, February 04, 2016 10:19:59 AM Viresh Kumar wrote:
> On 03-02-16, 02:16, Rafael J. Wysocki wrote:
> > Index: linux-pm/drivers/cpufreq/cpufreq_governor.c
> > -void gov_add_timers(struct cpufreq_policy *policy, unsigned int delay)
> > +void gov_set_update_util(struct cpu_common_dbs_info *shared,
> > + unsigned int delay_us)
> > {
> > + struct cpufreq_policy *policy = shared->policy;
> > struct dbs_data *dbs_data = policy->governor_data;
> > - struct cpu_dbs_info *cdbs;
> > int cpu;
> >
> > + shared->sample_delay_ns = delay_us * NSEC_PER_USEC;
> > + shared->time_stamp = ktime_get();
> > +
> > for_each_cpu(cpu, policy->cpus) {
> > - cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
> > - cdbs->timer.expires = jiffies + delay;
> > - add_timer_on(&cdbs->timer, cpu);
> > + struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
> > +
> > + cdbs->last_sample_time = 0;
> > + cpufreq_set_update_util_data(cpu, &cdbs->update_util);
>
> Why no synchronize_rcu() here?
Because it is not needed. This always changes a NULL pointer into a non-NULL.
> This can be called from ondemand governor on sampling-rate updates ..
But that calls gov_cancel_work() before, right?
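
For illustration only, the asymmetry between setting and clearing the pointer can be modeled in userspace C11 with a single atomic pointer slot. The names struct hook, hook_set(), hook_clear() and hook_run() are hypothetical stand-ins for struct update_util_data, cpufreq_set_update_util_data() and the scheduler-side caller; this is a sketch of the reasoning under those assumptions, not the kernel code.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct update_util_data. */
struct hook {
	void (*func)(struct hook *h);
	int calls;
};

/* The kernel has one slot per CPU; one slot is enough for the model. */
static _Atomic(struct hook *) hook_slot;

/* Example callback: just counts how often it was invoked. */
static void count_call(struct hook *h)
{
	h->calls++;
}

/*
 * NULL -> non-NULL: a reader sees either NULL (and does nothing) or the
 * fully initialized hook, so there is nothing to wait for afterwards.
 */
static void hook_set(struct hook *h)
{
	atomic_store_explicit(&hook_slot, h, memory_order_release);
}

/*
 * non-NULL -> NULL: a reader may still be running the old callback.
 * The kernel waits for such readers with synchronize_rcu() before the
 * data behind the pointer is reused; this model has no concurrent
 * readers, so that wait is only noted here.
 */
static void hook_clear(void)
{
	atomic_store_explicit(&hook_slot, (struct hook *)NULL,
			      memory_order_release);
}

/* Reader side: returns 1 if a callback was invoked, 0 otherwise. */
static int hook_run(void)
{
	struct hook *h = atomic_load_explicit(&hook_slot,
					      memory_order_acquire);

	if (!h)
		return 0;

	h->func(h);
	return 1;
}
```

In this model, calling hook_run() before hook_set() is harmless, which is the case the question was about: gov_cancel_work() has already cleared the pointers by the time they are set again.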
>
> > }
> > }
> > -EXPORT_SYMBOL_GPL(gov_add_timers);
> > +EXPORT_SYMBOL_GPL(gov_set_update_util);
> >
> > -static inline void gov_cancel_timers(struct cpufreq_policy *policy)
> > +static inline void gov_clear_update_util(struct cpufreq_policy *policy)
> > {
> > - struct dbs_data *dbs_data = policy->governor_data;
> > - struct cpu_dbs_info *cdbs;
> > int i;
> >
> > - for_each_cpu(i, policy->cpus) {
> > - cdbs = dbs_data->cdata->get_cpu_cdbs(i);
> > - del_timer_sync(&cdbs->timer);
> > - }
> > + for_each_cpu(i, policy->cpus)
> > + cpufreq_set_update_util_data(i, NULL);
> > +
> > + synchronize_rcu();
> > }
> >
> > void gov_cancel_work(struct cpu_common_dbs_info *shared)
> > {
> > - /* Tell dbs_timer_handler() to skip queuing up work items. */
> > + /* Tell dbs_update_util_handler() to skip queuing up work items. */
> > atomic_inc(&shared->skip_work);
> > /*
> > - * If dbs_timer_handler() is already running, it may not notice the
> > - * incremented skip_work, so wait for it to complete to prevent its work
> > - * item from being queued up after the cancel_work_sync() below.
> > - */
> > - gov_cancel_timers(shared->policy);
> > - /*
> > - * In case dbs_timer_handler() managed to run and spawn a work item
> > - * before the timers have been canceled, wait for that work item to
> > - * complete and then cancel all of the timers set up by it. If
> > - * dbs_timer_handler() runs again at that point, it will see the
> > - * positive value of skip_work and won't spawn any more work items.
> > + * If dbs_update_util_handler() is already running, it may not notice
> > + * the incremented skip_work, so wait for it to complete to prevent its
> > + * work item from being queued up after the cancel_work_sync() below.
> > */
> > + gov_clear_update_util(shared->policy);
> > cancel_work_sync(&shared->work);
>
> How are we sure that the irq-work can't be pending at this point of
> time, which will queue the above works again ?
Good point. The irq_work has to be waited for here too.
> > - gov_cancel_timers(shared->policy);
> > atomic_set(&shared->skip_work, 0);
> > }
> > EXPORT_SYMBOL_GPL(gov_cancel_work);
> >
> > -/* Will return if we need to evaluate cpu load again or not */
> > -static bool need_load_eval(struct cpu_common_dbs_info *shared,
> > - unsigned int sampling_rate)
> > -{
> > - if (policy_is_shared(shared->policy)) {
> > - ktime_t time_now = ktime_get();
> > - s64 delta_us = ktime_us_delta(time_now, shared->time_stamp);
> > -
> > - /* Do nothing if we recently have sampled */
> > - if (delta_us < (s64)(sampling_rate / 2))
> > - return false;
> > - else
> > - shared->time_stamp = time_now;
> > - }
> > -
> > - return true;
> > -}
> > -
> > static void dbs_work_handler(struct work_struct *work)
> > {
> > struct cpu_common_dbs_info *shared = container_of(work, struct
> > @@ -235,14 +212,10 @@ static void dbs_work_handler(struct work
> > struct cpufreq_policy *policy;
> > struct dbs_data *dbs_data;
> > unsigned int sampling_rate, delay;
> > - bool eval_load;
> >
> > policy = shared->policy;
> > dbs_data = policy->governor_data;
> >
> > - /* Kill all timers */
> > - gov_cancel_timers(policy);
> > -
> > if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
> > struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
> >
> > @@ -253,37 +226,53 @@ static void dbs_work_handler(struct work
> > sampling_rate = od_tuners->sampling_rate;
> > }
> >
> > - eval_load = need_load_eval(shared, sampling_rate);
> > -
> > /*
> > - * Make sure cpufreq_governor_limits() isn't evaluating load in
> > + * Make sure cpufreq_governor_limits() isn't evaluating load or the
> > + * ondemand governor isn't reading the time stamp and sampling rate in
> > * parallel.
> > */
> > mutex_lock(&shared->timer_mutex);
> > - delay = dbs_data->cdata->gov_dbs_timer(policy, eval_load);
> > + delay = dbs_data->cdata->gov_dbs_timer(policy);
> > + shared->sample_delay_ns = jiffies_to_nsecs(delay);
> > + shared->time_stamp = ktime_get();
> > mutex_unlock(&shared->timer_mutex);
> >
> > + smp_mb__before_atomic();
>
> And why is this required exactly ? Maybe a comment as well to clarify
> this as this isn't obvious ?
OK, you have a point.
This relies on the atomic_dec() below to happen after sample_delay_ns has
been updated, to prevent dbs_update_util_handler() from using a stale
value.
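
A rough userspace model of that ordering, using C11 atomics, might look as follows. It assumes that a release decrement paired with an acquire increment is an acceptable stand-in for smp_mb__before_atomic() followed by atomic_dec(); struct shared_model, work_done() and handler_should_sample() are hypothetical names, not anything from the patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, reduced version of struct cpu_common_dbs_info. */
struct shared_model {
	atomic_int skip_work;
	int64_t sample_delay_ns;
};

/*
 * Tail of the work handler: publish the new delay first, then drop
 * skip_work. The release ordering keeps the store to sample_delay_ns
 * from being reordered after the decrement.
 */
static void work_done(struct shared_model *s, int64_t new_delay_ns)
{
	s->sample_delay_ns = new_delay_ns;
	atomic_fetch_sub_explicit(&s->skip_work, 1, memory_order_release);
}

/*
 * Utilization update handler: returns 1 if a sample should be taken.
 * In that case skip_work stays elevated until work_done() runs.
 */
static int handler_should_sample(struct shared_model *s, uint64_t now,
				 uint64_t *last_sample_time)
{
	/* fetch_add() returning 0 matches atomic_inc_return() == 1. */
	if (atomic_fetch_add_explicit(&s->skip_work, 1,
				      memory_order_acquire) == 0) {
		uint64_t delta_ns = now - *last_sample_time;

		if ((int64_t)delta_ns >= s->sample_delay_ns) {
			*last_sample_time = now;
			return 1;
		}
	}
	atomic_fetch_sub_explicit(&s->skip_work, 1, memory_order_relaxed);
	return 0;
}
```

A handler that observes skip_work going back to zero is then guaranteed to read the sample_delay_ns value stored before the decrement, which is the property the barrier is there to provide.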
> > atomic_dec(&shared->skip_work);
> > +}
> >
> > - gov_add_timers(policy, delay);
> > +static void dbs_irq_work(struct irq_work *irq_work)
> > +{
> > + struct cpu_common_dbs_info *shared;
> > +
> > + shared = container_of(irq_work, struct cpu_common_dbs_info, irq_work);
> > + schedule_work(&shared->work);
> > }
> >
> > -static void dbs_timer_handler(unsigned long data)
> > +static void dbs_update_util_handler(struct update_util_data *data, u64 time,
> > + unsigned long util, unsigned long max)
> > {
> > - struct cpu_dbs_info *cdbs = (struct cpu_dbs_info *)data;
> > + struct cpu_dbs_info *cdbs = container_of(data, struct cpu_dbs_info, update_util);
> > struct cpu_common_dbs_info *shared = cdbs->shared;
> >
> > /*
> > - * Timer handler may not be allowed to queue the work at the moment,
> > - * because:
> > - * - Another timer handler has done that
> > - * - We are stopping the governor
> > - * - Or we are updating the sampling rate of the ondemand governor
> > + * The work may not be allowed to be queued up right now.
> > + * Possible reasons:
> > + * - Work has already been queued up or is in progress.
> > + * - The governor is being stopped.
> > + * - It is too early (too little time from the previous sample).
> > */
> > - if (atomic_inc_return(&shared->skip_work) > 1)
> > - atomic_dec(&shared->skip_work);
> > - else
> > - queue_work(system_wq, &shared->work);
> > + if (atomic_inc_return(&shared->skip_work) == 1) {
> > + u64 delta_ns;
> > +
> > + delta_ns = time - cdbs->last_sample_time;
> > + if ((s64)delta_ns >= shared->sample_delay_ns) {
> > + cdbs->last_sample_time = time;
> > + irq_work_queue_on(&shared->irq_work, smp_processor_id());
> > + return;
> > + }
> > + }
> > + atomic_dec(&shared->skip_work);
> > }
> >
> > static void set_sampling_rate(struct dbs_data *dbs_data,
> > @@ -467,9 +456,6 @@ static int cpufreq_governor_start(struct
> > io_busy = od_tuners->io_is_busy;
> > }
> >
> > - shared->policy = policy;
> > - shared->time_stamp = ktime_get();
> > -
> > for_each_cpu(j, policy->cpus) {
> > struct cpu_dbs_info *j_cdbs = cdata->get_cpu_cdbs(j);
> > unsigned int prev_load;
> > @@ -485,10 +471,10 @@ static int cpufreq_governor_start(struct
> > if (ignore_nice)
> > j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
> >
> > - __setup_timer(&j_cdbs->timer, dbs_timer_handler,
> > - (unsigned long)j_cdbs,
> > - TIMER_DEFERRABLE | TIMER_IRQSAFE);
> > + j_cdbs->update_util.func = dbs_update_util_handler;
> > }
> > + shared->policy = policy;
> > + init_irq_work(&shared->irq_work, dbs_irq_work);
> >
> > if (cdata->governor == GOV_CONSERVATIVE) {
> > struct cs_cpu_dbs_info_s *cs_dbs_info =
> > @@ -505,7 +491,7 @@ static int cpufreq_governor_start(struct
> > od_ops->powersave_bias_init_cpu(cpu);
> > }
> >
> > - gov_add_timers(policy, delay_for_sampling_rate(sampling_rate));
> > + gov_set_update_util(shared, sampling_rate);
> > return 0;
> > }
> >
> > Index: linux-pm/drivers/cpufreq/cpufreq_ondemand.c
> > ===================================================================
> > --- linux-pm.orig/drivers/cpufreq/cpufreq_ondemand.c
> > +++ linux-pm/drivers/cpufreq/cpufreq_ondemand.c
> > @@ -191,7 +191,7 @@ static void od_check_cpu(int cpu, unsign
> > }
> > }
> >
> > -static unsigned int od_dbs_timer(struct cpufreq_policy *policy, bool modify_all)
> > +static unsigned int od_dbs_timer(struct cpufreq_policy *policy)
> > {
> > struct dbs_data *dbs_data = policy->governor_data;
> > unsigned int cpu = policy->cpu;
> > @@ -200,9 +200,6 @@ static unsigned int od_dbs_timer(struct
> > struct od_dbs_tuners *od_tuners = dbs_data->tuners;
> > int delay = 0, sample_type = dbs_info->sample_type;
>
> Perhaps, the delay = 0 can be dropped now and ...
>
> >
> > - if (!modify_all)
> > - goto max_delay;
> > -
> > /* Common NORMAL_SAMPLE setup */
> > dbs_info->sample_type = OD_NORMAL_SAMPLE;
> > if (sample_type == OD_SUB_SAMPLE) {
> > @@ -218,7 +215,6 @@ static unsigned int od_dbs_timer(struct
> > }
> > }
> >
> > -max_delay:
> > if (!delay)
> > delay = delay_for_sampling_rate(od_tuners->sampling_rate
> > * dbs_info->rate_mult);
>
> ^^ can be moved to the else part of above block ..
Both this and the above are valid observations, but those changes should be
made in a follow-up patch IMO.
> > @@ -264,7 +260,7 @@ static void update_sampling_rate(struct
> > struct od_cpu_dbs_info_s *dbs_info;
> > struct cpu_dbs_info *cdbs;
> > struct cpu_common_dbs_info *shared;
> > - unsigned long next_sampling, appointed_at;
> > + ktime_t next_sampling, appointed_at;
> >
> > dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
> > cdbs = &dbs_info->cdbs;
> > @@ -292,16 +288,19 @@ static void update_sampling_rate(struct
> > continue;
> >
> > /*
> > - * Checking this for any CPU should be fine, timers for all of
> > - * them are scheduled together.
> > + * Checking this for any CPU sharing the policy should be fine,
> > + * they are all scheduled to sample at the same time.
> > */
> > - next_sampling = jiffies + usecs_to_jiffies(new_rate);
> > - appointed_at = dbs_info->cdbs.timer.expires;
> > + next_sampling = ktime_add_us(ktime_get(), new_rate);
> >
> > - if (time_before(next_sampling, appointed_at)) {
> > - gov_cancel_work(shared);
> > - gov_add_timers(policy, usecs_to_jiffies(new_rate));
> > + mutex_lock(&shared->timer_mutex);
>
> Why is taking this lock important here ?
Because this reads both time_stamp and sample_delay_ns and uses them in
a computation. If they happen to be out of sync, this surely isn't right.
> > + appointed_at = ktime_add_ns(shared->time_stamp,
>
> Also I failed to understand why we need time_stamp variable at all?
> Why can't we use last_sample_time ?
Because the time base for last_sample_time may be different, so comparing it
to the return value of ktime_get() may not lead to correct decisions, so to
speak.
> > + shared->sample_delay_ns);
> > + mutex_unlock(&shared->timer_mutex);
> >
> > + if (ktime_before(next_sampling, appointed_at)) {
> > + gov_cancel_work(shared);
> > + gov_set_update_util(shared, new_rate);
>
> You don't need to do a complete update here, the pointers are all fine.
I do, but that's not because of the pointers.
Effectively, I need to change sample_delay_ns and that's the most straightforward
way to do that safely.
It may not be the most efficient, but this is not a fast path anyway.
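
To make the update_sampling_rate() reasoning concrete, here is a userspace sketch of the decision. It assumes a pthread mutex in place of timer_mutex and plain nanosecond counters in place of ktime_t; struct shared_model, appointed_at_ns() and update_rate() are hypothetical names. In the patch the new delay is written by gov_set_update_util() after gov_cancel_work(), while the model simply takes the lock for that write.

```c
#include <pthread.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000LL

/* Hypothetical, reduced version of struct cpu_common_dbs_info. */
struct shared_model {
	pthread_mutex_t timer_mutex;
	int64_t time_stamp_ns;
	int64_t sample_delay_ns;
};

/*
 * Read time_stamp and sample_delay_ns as a consistent pair. Without
 * the lock, the work handler could update one of them between the two
 * reads and the sum would mix old and new values.
 */
static int64_t appointed_at_ns(struct shared_model *s)
{
	int64_t t;

	pthread_mutex_lock(&s->timer_mutex);
	t = s->time_stamp_ns + s->sample_delay_ns;
	pthread_mutex_unlock(&s->timer_mutex);
	return t;
}

/*
 * Returns 1 if the next sample was moved earlier, 0 if the currently
 * scheduled one already comes soon enough.
 */
static int update_rate(struct shared_model *s, int64_t now_ns,
		       unsigned int new_rate_us)
{
	int64_t next_sampling = now_ns + new_rate_us * NSEC_PER_USEC;

	if (next_sampling >= appointed_at_ns(s))
		return 0;

	/* Kernel: gov_cancel_work(), then gov_set_update_util(). */
	pthread_mutex_lock(&s->timer_mutex);
	s->sample_delay_ns = new_rate_us * NSEC_PER_USEC;
	s->time_stamp_ns = now_ns;
	pthread_mutex_unlock(&s->timer_mutex);
	return 1;
}
```

Both now_ns and time_stamp_ns come from the same clock in this model, which mirrors why time_stamp is kept separately from last_sample_time: the comparison only makes sense when both sides share a time base.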
> > }
> > }
>
>
Thanks,
Rafael