DM-Devel Archive mirror
 help / color / mirror / Atom feed
From: Damien Le Moal <dlemoal@kernel.org>
To: Hannes Reinecke <hare@suse.de>,
	linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	dm-devel@lists.linux.dev, Mike Snitzer <snitzer@redhat.com>
Subject: Re: [PATCH v3 04/14] block: Fix reference counting for zone write plugs in error state
Date: Thu, 2 May 2024 18:37:21 +0900	[thread overview]
Message-ID: <8116c2dc-dd47-4678-9974-100006ffd0f7@kernel.org> (raw)
In-Reply-To: <d4f71b64-b2d3-4350-b502-bbcfcc9614ce@suse.de>

On 5/2/24 15:01, Hannes Reinecke wrote:
>> +static inline void disk_zone_wplug_set_error(struct gendisk *disk,
>> +					     struct blk_zone_wplug *zwplug)
>> +{
>> +	unsigned long flags;
>> +
>> +	if (zwplug->flags & BLK_ZONE_WPLUG_ERROR)
>> +		return;
>> +
> 
> I still get nervous when I see an unprotected flag being set.
> Especially in code which is known to race with error handling.
> Wouldn't it be better to check the flag under the lock or at
> least use 'test_and_set_bit' here?

It is protected: this is always called with the zone write plug spinlock being
locked.

> 
>> +	/*
>> +	 * At this point, we already have a reference on the zone write plug.
>> +	 * However, since we are going to add the plug to the disk zone write
>> +	 * plugs work list, increase its reference count. This reference will
>> +	 * be dropped in disk_zone_wplugs_work() once the error state is
>> +	 * handled, or in disk_zone_wplug_clear_error() if the zone is reset or
>> +	 * finished.
>> +	 */
>> +	zwplug->flags |= BLK_ZONE_WPLUG_ERROR;
> 
> And that is even worse. We might have been interrupted between these
> two lines, invalidating the first check.

Nope: zone write plug spinlock is locked.

> 
> Please consider using 'test_and_set_bit()' here.
> 
>> +	atomic_inc(&zwplug->ref);
>> +
>> +	spin_lock_irqsave(&disk->zone_wplugs_lock, flags);
>> +	list_add_tail(&zwplug->link, &disk->zone_wplugs_err_list);
>> +	spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
>> +}
>> +
>> +static inline void disk_zone_wplug_clear_error(struct gendisk *disk,
>> +					       struct blk_zone_wplug *zwplug)
>> +{
>> +	unsigned long flags;
>> +
>> +	if (!(zwplug->flags & BLK_ZONE_WPLUG_ERROR))
>> +		return;
>> +
>> +	/*
>> +	 * We are racing with the error handling work which drops the reference
>> +	 * on the zone write plug after handling the error state. So remove the
>> +	 * plug from the error list and drop its reference count only if the
>> +	 * error handling has not yet started, that is, if the zone write plug
>> +	 * is still listed.
>> +	 */
>> +	spin_lock_irqsave(&disk->zone_wplugs_lock, flags);
>> +	if (!list_empty(&zwplug->link)) {
>> +		list_del_init(&zwplug->link);
>> +		zwplug->flags &= ~BLK_ZONE_WPLUG_ERROR;
>> +		disk_put_zone_wplug(zwplug);
>> +	}
>> +	spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
>> +}
>> +
> 
> Similar comments to above: you are clearing the flag under the lock,
> but don't check or set under the lock...

And similar comment: this is called with the zone write plug spinlock held. So
no race with the flag handling. What is racy is the error handling because we
can only hold disk->zone_wplugs_lock at first, and have to release that lock
before we take the zone write plug spinlock (otherwise we would have lock order
inversion). And the error hanlding needs to do a report zone, so no zone write
plug spinlock either, and in the meantime, the user may do a zone reset or reset
all... Hence the trickery here to look at if the error handling work already
took the plug out of the list for processing or not. If it has, then the error
handling will do what is needed with the error flag.

-- 
Damien Le Moal
Western Digital Research


  reply	other threads:[~2024-05-02  9:37 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-01 11:08 [PATCH v3 00/14] Zone write plugging fixes and cleanup Damien Le Moal
2024-05-01 11:08 ` [PATCH v3 01/14] dm: Check that a zoned table leads to a valid mapped device Damien Le Moal
2024-05-01 11:08 ` [PATCH v3 02/14] block: Exclude conventional zones when faking max open limit Damien Le Moal
2024-05-02  5:52   ` Hannes Reinecke
2024-05-01 11:08 ` [PATCH v3 03/14] block: Fix zone write plug initialization from blk_revalidate_zone_cb() Damien Le Moal
2024-05-02  5:53   ` Hannes Reinecke
2024-05-01 11:08 ` [PATCH v3 04/14] block: Fix reference counting for zone write plugs in error state Damien Le Moal
2024-05-02  6:01   ` Hannes Reinecke
2024-05-02  9:37     ` Damien Le Moal [this message]
2024-05-01 11:08 ` [PATCH v3 05/14] block: Hold a reference on zone write plugs to schedule submission Damien Le Moal
2024-05-02  6:04   ` Hannes Reinecke
2024-05-01 11:08 ` [PATCH v3 06/14] block: Unhash a zone write plug only if needed Damien Le Moal
2024-05-02  6:05   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 07/14] block: Do not remove zone write plugs still in use Damien Le Moal
2024-05-02  5:38   ` Christoph Hellwig
2024-05-02  6:09   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 08/14] block: Fix flush request sector restore Damien Le Moal
2024-05-02  6:10   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 09/14] block: Fix handling of non-empty flush write requests to zones Damien Le Moal
2024-05-02  6:11   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 10/14] block: Improve blk_zone_write_plug_bio_merged() Damien Le Moal
2024-05-02  6:13   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 11/14] block: Improve zone write request completion handling Damien Le Moal
2024-05-02  6:14   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 12/14] block: Simplify blk_zone_write_plug_bio_endio() Damien Le Moal
2024-05-02  6:15   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 13/14] block: Simplify zone write plug BIO abort Damien Le Moal
2024-05-02  6:15   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 14/14] block: Cleanup blk_revalidate_zone_cb() Damien Le Moal
2024-05-02  6:17   ` Hannes Reinecke
2024-05-01 14:08 ` [PATCH v3 00/14] Zone write plugging fixes and cleanup Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8116c2dc-dd47-4678-9974-100006ffd0f7@kernel.org \
    --to=dlemoal@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@lists.linux.dev \
    --cc=hare@suse.de \
    --cc=linux-block@vger.kernel.org \
    --cc=snitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).