All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: David Sterba <dsterba@suse.cz>
To: Naohiro Aota <naohiro.aota@wdc.com>
Cc: linux-btrfs@vger.kernel.org, dsterba@suse.com
Subject: Re: [PATCH v2] btrfs: zoned: move superblock logging zone location
Date: Fri, 9 Apr 2021 13:07:19 +0200	[thread overview]
Message-ID: <20210409110719.GR7604@twin.jikos.cz> (raw)
In-Reply-To: <2f58edb74695825632c77349b000d31f16cb3226.1617870145.git.naohiro.aota@wdc.com>

On Thu, Apr 08, 2021 at 05:25:28PM +0900, Naohiro Aota wrote:
> This commit moves the location of the superblock logging zones. The new
> locations of the logging zones are now determined based on fixed block
> addresses instead of on fixed zone numbers.
> 
> The old placement method based on fixed zone numbers causes problems when
> one needs to inspect a file system image without access to the drive zone
> information. In such case, the super block locations cannot be reliably
> determined as the zone size is unknown. By locating the superblock logging
> zones using fixed addresses, we can scan a dumped file system image without
> the zone information since a super block copy will always be present at or
> after the fixed location.
> 
> This commit introduces the following three pairs of zones containing fixed
> offset locations, regardless of the device zone size.
> 
>   - Primary superblock: zone starting at offset 0 and the following zone
>   - First copy: zone containing offset 64GB and the following zone
>   - Second copy: zone containing offset 256GB and the following zone
> 
> If a logging zone is outside of the disk capacity, we do not record the
> superblock copy.
> 
> The first copy position is much larger than for a regular btrfs volume
> (64M).  This increase is to avoid overlapping with the log zones for the
> primary superblock. This higher location is arbitrary but allows supporting
> devices with very large zone sizes, up to 32GB. Such large zone size is
> unrealistic and very unlikely to ever be seen in real devices. Currently,
> SMR disks have a zone size of 256MB, and we are expecting ZNS drives to be
> in the 1-4GB range, so this 32GB limit gives us room to breathe. For now,
> we only allow zone sizes up to 8GB, below this hard limit of 32GB.
> 
> The fixed location addresses are somewhat arbitrary, but with the intent of
> maintaining superblock reliability even for smaller devices. For this
> reason, the superblock fixed locations do not exceed 1TB.

Yeah, so how much should we adjust the offsets in favor of smaller
devices instead of larger ones? I think this is going the wrong
direction, the capacity will grow, the zones are supposed to be larger.

> The superblock logging zones are reserved for superblock logging and never
> used for data or metadata blocks. Note that we only reserve the two zones
> per primary/copy actually used for superblock logging. We do not reserve
> the ranges of zones possibly containing superblocks with the largest
> supported zone size (0-16GB, 64G-80GB, 256G-272GB).
> 
> The zones containing the fixed location offsets used to store superblocks
> in a regular btrfs volume (no zoned case) are also reserved to avoid
> confusion.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  fs/btrfs/zoned.c | 43 +++++++++++++++++++++++++++++++++++--------
>  1 file changed, 35 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 1f972b75a9ab..a4b195fe08a0 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -21,9 +21,28 @@
>  /* Pseudo write pointer value for conventional zone */
>  #define WP_CONVENTIONAL ((u64)-2)
>  
> +/*
> + * Location of the first zone of superblock logging zone pairs.
> + * - Primary superblock: the zone containing offset 0 (zone 0)
> + * - First superblock copy: the zone containing offset 64G
> + * - Second superblock copy: the zone containing offset 256G
> + */
> +#define BTRFS_PRIMARY_SB_LOG_ZONE 0ULL
> +#define BTRFS_FIRST_SB_LOG_ZONE (64ULL * SZ_1G)
> +#define BTRFS_SECOND_SB_LOG_ZONE (256ULL * SZ_1G)
> +#define BTRFS_FIRST_SB_LOG_ZONE_SHIFT const_ilog2(BTRFS_FIRST_SB_LOG_ZONE)
> +#define BTRFS_SECOND_SB_LOG_ZONE_SHIFT const_ilog2(BTRFS_SECOND_SB_LOG_ZONE)

This should be named like BTRFS_SB_LOG_* ie. "namespaces" or common
prefixes for the same class of valuesd.

> +
>  /* Number of superblock log zones */
>  #define BTRFS_NR_SB_LOG_ZONES 2
>  
> +/*
> + * Maximum size of zones. Currently, SMR disks have a zone size of 256MB,
> + * and we are expecting ZNS drives to be in the 1-4GB range. We do not
> + * expect the zone size to become larger than 8GB in the near future.
> + */
> +#define BTRFS_MAX_ZONE_SIZE SZ_8G
> +
>  static int copy_zone_info_cb(struct blk_zone *zone, unsigned int idx, void *data)
>  {
>  	struct blk_zone *zones = data;
> @@ -111,11 +130,8 @@ static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
>  }
>  
>  /*
> - * The following zones are reserved as the circular buffer on ZONED btrfs.
> - *  - The primary superblock: zones 0 and 1
> - *  - The first copy: zones 16 and 17
> - *  - The second copy: zones 1024 or zone at 256GB which is minimum, and
> - *                     the following one
> + * Get the zone number of the first zone of a pair of contiguous zones used
> + * for superblock logging.
>   */
>  static inline u32 sb_zone_number(int shift, int mirror)
>  {
> @@ -123,8 +139,8 @@ static inline u32 sb_zone_number(int shift, int mirror)
>  
>  	switch (mirror) {
>  	case 0: return 0;
> -	case 1: return 16;
> -	case 2: return min_t(u64, btrfs_sb_offset(mirror) >> shift, 1024);
> +	case 1: return 1 << (BTRFS_FIRST_SB_LOG_ZONE_SHIFT - shift);
> +	case 2: return 1 << (BTRFS_SECOND_SB_LOG_ZONE_SHIFT - shift);
>  	}
>  
>  	return 0;
> @@ -300,10 +316,21 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
>  		zone_sectors = bdev_zone_sectors(bdev);
>  	}
>  
> -	nr_sectors = bdev_nr_sectors(bdev);
>  	/* Check if it's power of 2 (see is_power_of_2) */
>  	ASSERT(zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0);
>  	zone_info->zone_size = zone_sectors << SECTOR_SHIFT;
> +
> +	/* We reject devices with a zone size larger than 8GB. */
> +	if (zone_info->zone_size > BTRFS_MAX_ZONE_SIZE) {
> +		btrfs_err_in_rcu(fs_info,
> +				 "zoned: %s: zone size %llu is too large",
> +				 rcu_str_deref(device->name),
> +				 zone_info->zone_size);
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	nr_sectors = bdev_nr_sectors(bdev);
>  	zone_info->zone_size_shift = ilog2(zone_info->zone_size);
>  	zone_info->max_zone_append_size =
>  		(u64)queue_max_zone_append_sectors(queue) << SECTOR_SHIFT;
> -- 
> 2.31.1

  parent reply	other threads:[~2021-04-09 11:09 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-08  8:25 [PATCH v2] btrfs: zoned: move superblock logging zone location Naohiro Aota
2021-04-08 14:57 ` Josef Bacik
2021-04-09 10:48   ` David Sterba
2021-04-09 11:07 ` David Sterba [this message]
2021-04-10  9:29   ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210409110719.GR7604@twin.jikos.cz \
    --to=dsterba@suse.cz \
    --cc=dsterba@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=naohiro.aota@wdc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.