LKML Archive mirror
 help / color / mirror / Atom feed
From: "Rafael J. Wysocki" <rafael@kernel.org>
To: Viresh Kumar <viresh.kumar@linaro.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Linux PM list <linux-pm@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	Juri Lelli <juri.lelli@arm.com>,
	Steve Muckle <steve.muckle@linaro.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 3/3 v3] cpufreq: governor: Replace timers with utilization update callbacks
Date: Fri, 5 Feb 2016 14:36:54 +0100	[thread overview]
Message-ID: <CAJZ5v0im=K5zVQMgOTKDUnXUTZ-ursAOQ+cu_YYAZisUbvx++g@mail.gmail.com> (raw)
In-Reply-To: <20160205065037.GD21792@vireshk>

On Fri, Feb 5, 2016 at 7:50 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> Will suck some more blood, sorry about that :)
>
> On 05-02-16, 02:28, Rafael J. Wysocki wrote:
>> The v3 addresses some review comments from Viresh and a couple of issues found
>> by me.  Changes from the previous version:
>> - Synchronize gov_cancel_work() with the (new) irq_work properly.
>> - Add a comment about the (new) memory barrier.
>> - Move samle_delay_ns to "shared" (struct cpu_common_dbs_info) so it is the
>
> sample_delay_ns was already there, you moved last_sample_time instead :)
>
>> @@ -139,7 +141,11 @@ struct cpu_common_dbs_info {
>>       struct mutex timer_mutex;
>>
>>       ktime_t time_stamp;
>> +     u64 last_sample_time;
>> +     s64 sample_delay_ns;
>>       atomic_t skip_work;
>> +     struct irq_work irq_work;
>
> Just for my understanding, why can't we schedule a normal work directly? Is it
> because of scheduler's hotpath and queue_work() is slow?

No, that's not the reason.

That path can't call wake_up_process() as it may be holding the locks
this would have attempted to grab.

That said it is hot too.  For example, ktime_get() may be too slow to
be called from it on some systems.

>> Index: linux-pm/drivers/cpufreq/cpufreq_governor.c
>> +void gov_set_update_util(struct cpu_common_dbs_info *shared,
>> +                      unsigned int delay_us)
>>  {
>> +     struct cpufreq_policy *policy = shared->policy;
>>       struct dbs_data *dbs_data = policy->governor_data;
>> -     struct cpu_dbs_info *cdbs;
>>       int cpu;
>>
>> +     shared->sample_delay_ns = delay_us * NSEC_PER_USEC;
>> +     shared->time_stamp = ktime_get();
>> +     shared->last_sample_time = 0;
>
> Calling this routine from update_sampling_rate() is still wrong. Because that
> will also make last_sample_time = 0, which means that we will schedule the
> irq-work on the next util update.

That isn't a problem, though.

This is the case when the new rate is smaller than the old one and we
want it to take effect immediately.  Taking the next sample
immediately in that case is not going to hurt anyone.

And this observation actually leads to some interesting realization
about update_sampling_rate() (see below).

> We surely didn't wanted that to happen, isn't it ?

No, it isn't. :-)

>
>>       for_each_cpu(cpu, policy->cpus) {
>> -             cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
>> -             cdbs->timer.expires = jiffies + delay;
>> -             add_timer_on(&cdbs->timer, cpu);
>> +             struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
>> +
>> +             cpufreq_set_update_util_data(cpu, &cdbs->update_util);
>>       }
>>  }
>> -EXPORT_SYMBOL_GPL(gov_add_timers);
>> +EXPORT_SYMBOL_GPL(gov_set_update_util);
>
>>  void gov_cancel_work(struct cpu_common_dbs_info *shared)
>>  {
>> -     /* Tell dbs_timer_handler() to skip queuing up work items. */
>> +     /* Tell dbs_update_util_handler() to skip queuing up work items. */
>>       atomic_inc(&shared->skip_work);
>>       /*
>> -      * If dbs_timer_handler() is already running, it may not notice the
>> -      * incremented skip_work, so wait for it to complete to prevent its work
>> -      * item from being queued up after the cancel_work_sync() below.
>> -      */
>> -     gov_cancel_timers(shared->policy);
>> -     /*
>> -      * In case dbs_timer_handler() managed to run and spawn a work item
>> -      * before the timers have been canceled, wait for that work item to
>> -      * complete and then cancel all of the timers set up by it.  If
>> -      * dbs_timer_handler() runs again at that point, it will see the
>> -      * positive value of skip_work and won't spawn any more work items.
>> +      * If dbs_update_util_handler() is already running, it may not notice
>> +      * the incremented skip_work, so wait for it to complete to prevent its
>> +      * work item from being queued up after the cancel_work_sync() below.
>>        */
>> +     gov_clear_update_util(shared->policy);
>> +     wait_for_completion(&shared->irq_work_done);
>
> I may be wrong, but isn't running irq_work_sync() enough here instead ?

Yes, it is.

I assumed that it would only check if the irq_work is running at the
moment for some reason, but that's not the case.

>>       cancel_work_sync(&shared->work);
>> -     gov_cancel_timers(shared->policy);
>>       atomic_set(&shared->skip_work, 0);
>>  }
>>  EXPORT_SYMBOL_GPL(gov_cancel_work);
>
>> Index: linux-pm/drivers/cpufreq/cpufreq_ondemand.c
>> @@ -264,7 +260,7 @@ static void update_sampling_rate(struct
>>               struct od_cpu_dbs_info_s *dbs_info;
>>               struct cpu_dbs_info *cdbs;
>>               struct cpu_common_dbs_info *shared;
>> -             unsigned long next_sampling, appointed_at;
>> +             ktime_t next_sampling, appointed_at;
>>
>>               dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
>>               cdbs = &dbs_info->cdbs;
>> @@ -292,16 +288,19 @@ static void update_sampling_rate(struct
>>                       continue;
>>
>>               /*
>> -              * Checking this for any CPU should be fine, timers for all of
>> -              * them are scheduled together.
>> +              * Checking this for any CPU sharing the policy should be fine,
>> +              * they are all scheduled to sample at the same time.
>>                */
>> -             next_sampling = jiffies + usecs_to_jiffies(new_rate);
>> -             appointed_at = dbs_info->cdbs.timer.expires;
>> +             next_sampling = ktime_add_us(ktime_get(), new_rate);
>>
>> -             if (time_before(next_sampling, appointed_at)) {
>> -                     gov_cancel_work(shared);
>> -                     gov_add_timers(policy, usecs_to_jiffies(new_rate));
>> +             mutex_lock(&shared->timer_mutex);
>> +             appointed_at = ktime_add_ns(shared->time_stamp,
>> +                                         shared->sample_delay_ns);
>> +             mutex_unlock(&shared->timer_mutex);
>>
>> +             if (ktime_before(next_sampling, appointed_at)) {
>> +                     gov_cancel_work(shared);
>> +                     gov_set_update_util(shared, new_rate);
>
> So, I don't think we need to call these heavy routines at all here. Just use the
> above timer_mutex to update time_stamp and sample_delay_ns.

Well, the concern was that sample_delay_ns might not be updated
atomically on 32-bit architectures and that might be a problem for
dbs_update_util_handler().  However, this really isn't a problem,
because dbs_update_util_handler() only decides whether or not to take
a sample *this* time.  If it sees a semi-update value of
sample_delay_ns, that value will be either too small or too big, so it
will either skip the sample unnecessarily or take it immediately and
none of these is a real problem.  It doesn't hurt to take the sample
immediately at this point (as stated earlier) and if it is skipped, it
will be taken on the next attempt when the update has been completed
(which would have happened anyway had the update been atomic).

> Over that, that particular change might turn out to be a big big bonus for us.
> Why would we be taking the od_dbs_cdata.mutex in this routine anymore ? We
> aren't removing/adding timers anymore, just update the sample_delay_ns and there
> shouldn't be any races.

That's a very good point.

The only concern is that this function walks the entire collection of
cpu_dbs_infos and that's potentially racing with anything that updates
those.

> Of course you need to use the same timer_mutex in util's
> handler as well around sample_delay_ns, I believe.

That can't take any mutexes.  It might only take a raw spinlock if
really needed.

> And that will also kill the circular dependency lockdep we have been chasing
> badly :)
>
> Or am I being over excited here ? :(

Not really.  I think you're on the right track.

Before we drop the lock from here, though, we need to audit the code
for any possible races carefully.

Anyway, I'll send an update of the $subject patch later today when I
have a chance to run it through some test.

Thanks,
Rafael

  reply	other threads:[~2016-02-05 13:36 UTC|newest]

Thread overview: 134+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-29 22:52 [PATCH 0/3] cpufreq: Replace timers with utilization update callbacks Rafael J. Wysocki
2016-01-29 22:53 ` [PATCH 1/3] cpufreq: Add a mechanism for registering " Rafael J. Wysocki
2016-02-04  3:31   ` Viresh Kumar
2016-01-29 22:56 ` [PATCH 2/3] cpufreq: intel_pstate: Replace timers with " Rafael J. Wysocki
2016-01-29 22:59 ` [PATCH 3/3] cpufreq: governor: " Rafael J. Wysocki
2016-02-03  1:16   ` [Update][PATCH " Rafael J. Wysocki
2016-02-04  4:49     ` Viresh Kumar
2016-02-04 10:54       ` Rafael J. Wysocki
2016-02-05  1:28     ` [PATCH 3/3 v3] " Rafael J. Wysocki
2016-02-05  6:50       ` Viresh Kumar
2016-02-05 13:36         ` Rafael J. Wysocki [this message]
2016-02-05 14:47           ` Viresh Kumar
2016-02-05 23:10             ` Rafael J. Wysocki
2016-02-07  9:10               ` Viresh Kumar
2016-02-07 14:43                 ` Rafael J. Wysocki
2016-02-08  2:08                   ` Rafael J. Wysocki
2016-02-08 11:52                     ` Viresh Kumar
2016-02-08 12:52                       ` Rafael J. Wysocki
2016-02-08 13:40                         ` Rafael J. Wysocki
2016-02-05 23:01           ` Rafael J. Wysocki
2016-02-06  3:40       ` [PATCH 3/3 v4] " Rafael J. Wysocki
2016-02-07  9:20         ` Viresh Kumar
2016-02-07 14:36           ` Rafael J. Wysocki
2016-02-07 14:50         ` [PATCH 3/3 v5] " Rafael J. Wysocki
2016-02-07 15:36           ` Viresh Kumar
2016-02-09 10:01           ` Gautham R Shenoy
2016-02-09 18:49             ` Rafael J. Wysocki
2016-02-03 22:20 ` [PATCH 0/3] cpufreq: " Rafael J. Wysocki
2016-02-04  0:08   ` Srinivas Pandruvada
2016-02-04 17:16     ` Rafael J. Wysocki
2016-02-04 10:51   ` Juri Lelli
2016-02-04 17:19     ` Rafael J. Wysocki
2016-02-08 23:06   ` Rafael J. Wysocki
2016-02-09  0:39     ` Steve Muckle
2016-02-09  1:01       ` Rafael J. Wysocki
2016-02-09 20:05         ` Rafael J. Wysocki
2016-02-10  1:02           ` Steve Muckle
2016-02-10  1:57             ` Rafael J. Wysocki
2016-02-10  3:09               ` Rafael J. Wysocki
2016-02-10 19:47                 ` Steve Muckle
2016-02-10 21:49                   ` Rafael J. Wysocki
2016-02-10 22:07                     ` Steve Muckle
2016-02-10 22:12                       ` Rafael J. Wysocki
2016-02-11 11:59             ` Peter Zijlstra
2016-02-11 12:24               ` Juri Lelli
2016-02-11 15:26                 ` Peter Zijlstra
2016-02-11 18:23                   ` Vincent Guittot
2016-02-12 14:04                     ` Peter Zijlstra
2016-02-12 14:48                       ` Vincent Guittot
2016-03-01 13:58                         ` Peter Zijlstra
2016-03-01 14:17                           ` Juri Lelli
2016-03-01 14:24                             ` Peter Zijlstra
2016-03-01 14:26                               ` Peter Zijlstra
2016-03-01 14:42                                 ` Juri Lelli
2016-03-01 15:04                                   ` Peter Zijlstra
2016-03-01 19:49                                     ` Rafael J. Wysocki
2016-03-01 14:58                           ` Vincent Guittot
2016-02-11 17:06               ` Steve Muckle
2016-02-11 17:30                 ` Peter Zijlstra
2016-02-11 17:34                   ` Rafael J. Wysocki
2016-02-11 17:38                     ` Peter Zijlstra
2016-02-11 18:52                   ` Steve Muckle
2016-02-11 19:04                     ` Rafael J. Wysocki
2016-02-12 13:43                       ` Rafael J. Wysocki
2016-02-12 14:10                     ` Peter Zijlstra
2016-02-12 16:01                       ` Rafael J. Wysocki
2016-02-12 16:15                         ` Rafael J. Wysocki
2016-02-12 16:53                           ` Ashwin Chaugule
2016-02-12 23:14                             ` Rafael J. Wysocki
2016-02-12 17:02                         ` Doug Smythies
2016-02-12 23:17                           ` Rafael J. Wysocki
2016-02-10 12:33           ` Juri Lelli
2016-02-10 13:23             ` Rafael J. Wysocki
2016-02-10 14:03               ` Juri Lelli
2016-02-10 14:26                 ` Rafael J. Wysocki
2016-02-10 14:46                   ` Juri Lelli
2016-02-10 15:46                     ` Rafael J. Wysocki
2016-02-10 16:05                       ` Juri Lelli
2016-02-11 11:51           ` Peter Zijlstra
2016-02-11 12:08             ` Rafael J. Wysocki
2016-02-11 15:29               ` Peter Zijlstra
2016-02-11 15:58                 ` Rafael J. Wysocki
2016-02-11 20:47               ` Rafael J. Wysocki
2016-02-10 15:17 ` [PATCH v6 " Rafael J. Wysocki
2016-02-10 15:21   ` [PATCH v6 1/3] cpufreq: Add mechanism for registering " Rafael J. Wysocki
2016-02-10 23:01     ` [PATCH v7 " Rafael J. Wysocki
2016-02-11 17:30       ` [PATCH v8 " Rafael J. Wysocki
2016-02-12 13:16         ` [PATCH v9 " Rafael J. Wysocki
2016-02-15 21:47           ` [PATCH v10 " Rafael J. Wysocki
2016-02-18 20:22             ` Rafael J. Wysocki
2016-02-19  8:09               ` Juri Lelli
2016-02-19 16:42                 ` Srinivas Pandruvada
2016-02-19 17:26                   ` Juri Lelli
2016-02-19 22:26                     ` Rafael J. Wysocki
2016-02-22  9:42                       ` Juri Lelli
2016-02-22 21:41                         ` Rafael J. Wysocki
2016-02-23 11:10                           ` Juri Lelli
2016-02-24  1:52                             ` Rafael J. Wysocki
2016-02-22 10:45                       ` Viresh Kumar
2016-02-19 17:28                   ` Steve Muckle
2016-02-19 22:35                     ` Rafael J. Wysocki
2016-02-23  3:58                       ` Steve Muckle
2016-02-22 10:52                     ` Peter Zijlstra
2016-02-22 14:33                       ` Vincent Guittot
2016-02-22 15:31                         ` Peter Zijlstra
2016-02-22 14:40                       ` Juri Lelli
2016-02-22 15:42                         ` Peter Zijlstra
2016-02-22 21:46                       ` Rafael J. Wysocki
2016-02-19 22:14                 ` Rafael J. Wysocki
2016-02-22  9:32                   ` Juri Lelli
2016-02-22 21:26                     ` Rafael J. Wysocki
2016-02-23 11:01                       ` Juri Lelli
2016-02-24  2:01                         ` Rafael J. Wysocki
2016-03-08 19:24                           ` Michael Turquette
2016-03-08 20:40                             ` Rafael J. Wysocki
     [not found]                               ` <20160308220632.4103.13377@quark.deferred.io>
2016-03-08 22:43                                 ` Rafael J. Wysocki
2016-03-09 12:35             ` Peter Zijlstra
2016-03-09 13:22               ` Rafael J. Wysocki
2016-03-09 13:32               ` Ingo Molnar
2016-03-09 13:39                 ` Rafael J. Wysocki
2016-03-10  2:12               ` Vincent Guittot
2016-02-10 15:25   ` [PATCH v6 2/3] cpufreq: intel_pstate: Replace timers with " Rafael J. Wysocki
2016-02-10 15:36   ` [PATCH v6 3/3] cpufreq: governor: " Rafael J. Wysocki
2016-02-10 23:11   ` [PATCH v6 0/3] cpufreq: " Doug Smythies
2016-02-10 23:17     ` Rafael J. Wysocki
2016-02-11 22:50       ` Doug Smythies
2016-02-11 23:28         ` Rafael J. Wysocki
2016-02-12  1:02           ` Doug Smythies
2016-02-12  1:20             ` Rafael J. Wysocki
2016-02-12  7:25         ` Doug Smythies
2016-02-12 13:39           ` Rafael J. Wysocki
2016-02-12 17:33             ` Doug Smythies
2016-02-12 23:21               ` Rafael J. Wysocki
2016-02-11  6:02     ` Srinivas Pandruvada

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJZ5v0im=K5zVQMgOTKDUnXUTZ-ursAOQ+cu_YYAZisUbvx++g@mail.gmail.com' \
    --to=rafael@kernel.org \
    --cc=juri.lelli@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rjw@rjwysocki.net \
    --cc=srinivas.pandruvada@linux.intel.com \
    --cc=steve.muckle@linaro.org \
    --cc=tglx@linutronix.de \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).