All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/7] convert: rollback rework
@ 2017-03-14  8:05 Qu Wenruo
  2017-03-14  8:05 ` [PATCH v3 1/7] btrfs-progs: convert: Add comment for the overall convert workflow Qu Wenruo
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Qu Wenruo @ 2017-03-14  8:05 UTC (permalink / raw
  To: linux-btrfs; +Cc: dsterba

The patchset can be fetched from my github, based on v4.10 now.
https://github.com/adam900710/btrfs-progs/tree/rollback_rework_v4.10

Rework rollback to provide a easier to understand new mechnisim, which can
handle both old convert behavior and new convert behavior.

And with these patchset, we can pass convert test again.

The idea itself is to make full use of convert image, which is a valid fs
image, and most of them are at the same position on disk.

In fact, to rollback, we just need to read out the whole convert
image and write it into disk.
However we don't really need to read/write the whole image, but only the
relocated ranges of convert image.

Convert image layout:

|<------------------------ valid old fs ----------------------------->|
| Relocated |   | Direct |  | Direct |   | Direct | | Relocated |
                    ||          ||           ||
                    \/          \/           \/
|<------------------------  new btrfs  ------------------------------>|
| Reserved  |   | Direct |  | Direct |   | Direct | | Reserved  |

Where "Direct" is directly mapped old fs data, whose file offset is the
same as its physical position on disk.
For "Direct" old fs data, we don't need to do anything.

While "Relocated" data is new data, which content is completely the same
as old fs, but they are relocated to btrfs data chunk.
Such "relocated" data is all in btrfs reserved ranges.

If we copy such "Relocated" data from convert image, to their original
physical position, then the old fs is back online.

So the new rollback work flow is:
1) Open btrfs read-only
2) Check convert image
   All file extents except ones in btrfs_reserved_ranges[] must be
   mapped at the same position in convert image and on disk.
3) Read out relocated file extents of convert image
   Use a newly introduced function, btrfs_read_file() to read data at
   specified offset of convert image, and keep them in memory.
4) Close btrfs
5) Write relocated data into its original position

Now old fs is back online, what happened in btrfs is just another dream.

In theory, current behavior can handle the rollback even if the whole fs
is balanced, just ensure we have enough memory, and enlarge the
@btrfs_reserved_ranges to cover the whole convert image.
But that's too crazy and the convert/rollback doesn't see much usage in
recent days.

Along such rework, add more comment to explain both convert and rework.

Convert test will all pass now.

v2:
  Abstract the original code to read out data in one btrfs file to
  btrfs_read_file().
  Use simple_range and btrfs_reserved_ranges[] to cleanup open code.
v3:
  Rebased to v4.10.
  Squash modification in later commits to their previous owner.
  Fix a converity report, which doesn't exit when an error is found in
  check_convert_image()
  Fix a lot of check scripts warning

Qu Wenruo (7):
  btrfs-progs: convert: Add comment for the overall convert workflow
  btrfs-progs: convert: Introduce simple range structure for convert
    reserved ranges
  btrfs-progs: convert: Use reserved ranges array to cleanup open code
  btrfs-progs: file: Introduce function to read out file content
  btrfs-progs: convert: Introduce function to read out btrfs reserved
    range
  btrfs-progs: convert: Introduce function to check if convert image is
    able to be rolled back
  btrfs-progs: convert: Rework rollback

 convert/common.h    |  27 ++
 convert/main.c      | 929 ++++++++++++++++++++++++----------------------------
 convert/source-fs.c |   3 +
 ctree.h             |   2 +
 file.c              | 169 ++++++++++
 5 files changed, 636 insertions(+), 494 deletions(-)

-- 
2.12.0




^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 1/7] btrfs-progs: convert: Add comment for the overall convert workflow
  2017-03-14  8:05 [PATCH v3 0/7] convert: rollback rework Qu Wenruo
@ 2017-03-14  8:05 ` Qu Wenruo
  2017-03-14  8:05 ` [PATCH v3 2/7] btrfs-progs: convert: Introduce simple range structure for convert reserved ranges Qu Wenruo
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2017-03-14  8:05 UTC (permalink / raw
  To: linux-btrfs; +Cc: dsterba

Convert is now a little complex due to that fact we need to separate
meta and data chunks for different profiles.

Add a comment with ascii art explaining the whole workflow and points
out the really complex part, so any newcomers interested in convert can
get a quick overview of it before digging into the hard to read code.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 convert/main.c      | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 convert/source-fs.c |  3 +++
 2 files changed, 60 insertions(+)

diff --git a/convert/main.c b/convert/main.c
index dc6047e6..079c1ebd 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -16,6 +16,63 @@
  * Boston, MA 021110-1307, USA.
  */
 
+/*
+ * btrfs convert design:
+ * The overall design of btrfs convert is like the following:
+ * |<------------------Old fs----------------------------->|
+ * |<- used ->| |<- used ->|                    |<- used ->|
+ *                           ||
+ *                           \/
+ * |<---------------Btrfs fs------------------------------>|
+ * |<-   Old data chunk  ->|< new chunk (D/M/S)>|<- ODC  ->|
+ * |<-Old-FE->| |<-Old-FE->|<- Btrfs extents  ->|<-Old-FE->|
+ *
+ * ODC    = Old data chunk, btrfs chunks containing old fs data
+ *          Mapped 1:1 (logical address == device offset)
+ * Old-FE = file extents points to old fs.
+ *
+ * So old fs used space is (mostly) kept as is, while btrfs will insert
+ * its chunk(Data/Meta/Sys) into large enough free space.
+ * In this way, we can create different profiles for meta/data for converted fs.
+ *
+ * The DIRTY work is, we must reserve and relocate 3 ranges for btrfs:
+ * [0, 1M), [btrfs_sb_offset(1), +64K), [btrfs_sb_offset(2), +64K)
+ *
+ * Most work are spent handling corner cases around these reserved ranges.
+ *
+ * Detailed workflow is:
+ * 1)   Scan old fs used space and calculate data chunk layout
+ * 1.1) Scan old fs
+ *      We can a map of used space of old fs
+ *
+ * 1.2) Calculate data chunk layout <<< Complex work
+ *      New data chunks must meet 3 conditions using result from 1.1)
+ *      a. Large enough to be a chunk
+ *      b. Doesn't cover reserved ranges
+ *      c. Covers all the remaining old fs used space
+ *
+ *      XXX: This can be simplified if we don't need to handle backup supers
+ *
+ * 1.3) Calculate usable space for btrfs new chunks
+ *      Btrfs chunk usable space must meet 3 conditions using result from 1.2)
+ *      a. Large enough to be a chunk
+ *      b. Doesn't cover reserved ranges
+ *      c. Doesn't cover any data chunks in 1.1)
+ *
+ * 2)   Create basic btrfs
+ *      Initial metadata and sys chunks are inserted in the first available
+ *      space of result of 1.3)
+ *      Then insert all data chunks into the basic btrfs
+ *
+ * 3)   Create convert image
+ *      We need to relocate reserved ranges here.
+ *      After this step, the convert image is done, and we can use the image
+ *      as reflink source to create old files
+ *
+ * 4)   Iterate old fs to create files
+ *      We just reflink any offset in old fs to new file extents
+ */
+
 #include "kerncompat.h"
 
 #include <stdio.h>
diff --git a/convert/source-fs.c b/convert/source-fs.c
index 8217c893..9268639d 100644
--- a/convert/source-fs.c
+++ b/convert/source-fs.c
@@ -181,6 +181,9 @@ fail:
  * The special point is, old disk_block can point to a reserved range.
  * So here, we don't use disk_block directly but search convert_root
  * to get the real disk_bytenr.
+ *
+ * TODO: Better extract this function to btrfs_reflink(), in fact we are just
+ * reflinking from convert_image of convert_root.
  */
 int record_file_blocks(struct blk_iterate_data *data,
 			      u64 file_block, u64 disk_block, u64 num_blocks)
-- 
2.12.0




^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 2/7] btrfs-progs: convert: Introduce simple range structure for convert reserved ranges
  2017-03-14  8:05 [PATCH v3 0/7] convert: rollback rework Qu Wenruo
  2017-03-14  8:05 ` [PATCH v3 1/7] btrfs-progs: convert: Add comment for the overall convert workflow Qu Wenruo
@ 2017-03-14  8:05 ` Qu Wenruo
  2017-03-14  8:05 ` [PATCH v3 3/7] btrfs-progs: convert: Use reserved ranges array to cleanup open code Qu Wenruo
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2017-03-14  8:05 UTC (permalink / raw
  To: linux-btrfs; +Cc: dsterba

Introduce a new strucutre, simple_range, to present one continuous
range.

Also, use such structure to define btrfs_reserved_ranges(), which
convert and rollback will use.

Suggested-by: David Sterba <dsterba@suse.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 convert/common.h | 27 +++++++++++++++++++++++++++
 convert/main.c   |  1 +
 2 files changed, 28 insertions(+)

diff --git a/convert/common.h b/convert/common.h
index 0d3adeaa..28ca1461 100644
--- a/convert/common.h
+++ b/convert/common.h
@@ -26,6 +26,33 @@
 #include "common-defs.h"
 #include "extent-cache.h"
 
+/*
+ * Presents one simple continuous range.
+ *
+ * For multiple or non-continuous ranges, use extent_cache_tree from
+ * extent-cache.c
+ */
+struct simple_range {
+	u64 start;
+	u64 len;
+};
+
+const static struct simple_range btrfs_reserved_ranges[] = {
+	{0, SZ_1M},
+	{BTRFS_SB_MIRROR_OFFSET(1), SZ_64K},
+	{BTRFS_SB_MIRROR_OFFSET(2), SZ_64K}
+};
+
+/*
+ * Simple range functions
+ *
+ * Get range end (exclusive)
+ */
+static inline u64 range_end(const struct simple_range *range)
+{
+	return (range->start + range->len);
+}
+
 struct btrfs_mkfs_config;
 
 struct btrfs_convert_context {
diff --git a/convert/main.c b/convert/main.c
index 079c1ebd..41309645 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -81,6 +81,7 @@
 #include <fcntl.h>
 #include <unistd.h>
 #include <getopt.h>
+#include <stdbool.h>
 
 #include "ctree.h"
 #include "disk-io.h"
-- 
2.12.0




^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 3/7] btrfs-progs: convert: Use reserved ranges array to cleanup open code
  2017-03-14  8:05 [PATCH v3 0/7] convert: rollback rework Qu Wenruo
  2017-03-14  8:05 ` [PATCH v3 1/7] btrfs-progs: convert: Add comment for the overall convert workflow Qu Wenruo
  2017-03-14  8:05 ` [PATCH v3 2/7] btrfs-progs: convert: Introduce simple range structure for convert reserved ranges Qu Wenruo
@ 2017-03-14  8:05 ` Qu Wenruo
  2017-03-14  8:05 ` [PATCH v3 4/7] btrfs-progs: file: Introduce function to read out file content Qu Wenruo
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2017-03-14  8:05 UTC (permalink / raw
  To: linux-btrfs; +Cc: dsterba

Since we have reserved ranges array now, we can use them to skip all
these open codes.

And add some comment and asciidoc art for related part.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 convert/main.c | 138 ++++++++++++++++++++++++++-------------------------------
 1 file changed, 63 insertions(+), 75 deletions(-)

diff --git a/convert/main.c b/convert/main.c
index 41309645..e834c6f9 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -205,47 +205,40 @@ static int create_image_file_range(struct btrfs_trans_handle *trans,
 	len = min_t(u64, len, BTRFS_MAX_EXTENT_SIZE);
 
 	/*
-	 * Skip sb ranges first
-	 * [0, 1M), [sb_offset(1), +64K), [sb_offset(2), +64K].
+	 * Skip reserved ranges first
 	 *
 	 * Or we will insert a hole into current image file, and later
 	 * migrate block will fail as there is already a file extent.
 	 */
-	if (bytenr < 1024 * 1024) {
-		*ret_len = 1024 * 1024 - bytenr;
-		return 0;
-	}
-	for (i = 1; i < BTRFS_SUPER_MIRROR_MAX; i++) {
-		u64 cur = btrfs_sb_offset(i);
+	for (i = 0; i < ARRAY_SIZE(btrfs_reserved_ranges); i++) {
+		const struct simple_range *reserved = &btrfs_reserved_ranges[i];
 
-		if (bytenr >= cur && bytenr < cur + BTRFS_STRIPE_LEN) {
-			*ret_len = cur + BTRFS_STRIPE_LEN - bytenr;
+		/*
+		 * |-- reserved --|
+		 *         |--range---|
+		 * or
+		 * |---- reserved ----|
+		 *    |-- range --|
+		 * Skip to reserved range end
+		 */
+		if (bytenr >= reserved->start && bytenr < range_end(reserved)) {
+			*ret_len = range_end(reserved) - bytenr;
 			return 0;
 		}
-	}
-	for (i = 1; i < BTRFS_SUPER_MIRROR_MAX; i++) {
-		u64 cur = btrfs_sb_offset(i);
 
 		/*
-		 *      |--reserved--|
+		 *      |---reserved---|
 		 * |----range-------|
-		 * May still need to go through file extent inserts
+		 * Leading part may still create a file extent
 		 */
-		if (bytenr < cur && bytenr + len >= cur) {
-			len = min_t(u64, len, cur - bytenr);
+		if (bytenr < reserved->start &&
+		    bytenr + len >= range_end(reserved)) {
+			len = min_t(u64, len, reserved->start - bytenr);
 			break;
 		}
-		/*
-		 * |--reserved--|
-		 *      |---range---|
-		 * Drop out, no need to insert anything
-		 */
-		if (bytenr >= cur && bytenr < cur + BTRFS_STRIPE_LEN) {
-			*ret_len = cur + BTRFS_STRIPE_LEN - bytenr;
-			return 0;
-		}
 	}
 
+	/* Check if we are going to insert regular file extent or hole */
 	cache = search_cache_extent(used, bytenr);
 	if (cache) {
 		if (cache->start <= bytenr) {
@@ -253,6 +246,7 @@ static int create_image_file_range(struct btrfs_trans_handle *trans,
 			 * |///////Used///////|
 			 *	|<--insert--->|
 			 *	bytenr
+			 * Regular file extent
 			 */
 			len = min_t(u64, len, cache->start + cache->size -
 				    bytenr);
@@ -262,6 +256,7 @@ static int create_image_file_range(struct btrfs_trans_handle *trans,
 			 *		|//Used//|
 			 *  |<-insert-->|
 			 *  bytenr
+			 * Hole
 			 */
 			len = min(len, cache->start - bytenr);
 			disk_bytenr = 0;
@@ -272,6 +267,7 @@ static int create_image_file_range(struct btrfs_trans_handle *trans,
 		 * |//Used//|		|EOF
 		 *	    |<-insert-->|
 		 *	    bytenr
+		 * Hole
 		 */
 		disk_bytenr = 0;
 		datacsum = 0;
@@ -317,24 +313,31 @@ static int migrate_one_reserved_range(struct btrfs_trans_handle *trans,
 				      struct btrfs_root *root,
 				      struct cache_tree *used,
 				      struct btrfs_inode_item *inode, int fd,
-				      u64 ino, u64 start, u64 len,
+				      u64 ino, const struct simple_range *range,
 				      u32 convert_flags)
 {
-	u64 cur_off = start;
-	u64 cur_len = len;
-	u64 hole_start = start;
+	u64 cur_off = range->start;
+	u64 cur_len = range->len;
+	u64 hole_start = range->start;
 	u64 hole_len;
 	struct cache_extent *cache;
 	struct btrfs_key key;
 	struct extent_buffer *eb;
 	int ret = 0;
 
-	while (cur_off < start + len) {
+	/*
+	 * It's possible that there are holes in reserved range:
+	 * |<---------------- Reserved range ---------------------->|
+	 *      |<- Old fs data ->|   |<- Old fs data ->|
+	 * So here we need to iterate through old fs used space and only
+	 * migrate ranges that covered by old fs data.
+	 */
+	while (cur_off < range_end(range)) {
 		cache = lookup_cache_extent(used, cur_off, cur_len);
 		if (!cache)
 			break;
 		cur_off = max(cache->start, cur_off);
-		cur_len = min(cache->start + cache->size, start + len) -
+		cur_len = min(cache->start + cache->size, range_end(range)) -
 			  cur_off;
 		BUG_ON(cur_len < root->sectorsize);
 
@@ -386,20 +389,22 @@ static int migrate_one_reserved_range(struct btrfs_trans_handle *trans,
 
 		cur_off += key.offset;
 		hole_start = cur_off;
-		cur_len = start + len - cur_off;
+		cur_len = range_end(range) - cur_off;
 	}
-	/* Last hole */
-	if (start + len - hole_start > 0)
+	/*
+	 * Last hole
+	 * |<---- reserved -------->|
+	 * |<- Old fs data ->|      |
+	 *                   | Hole |
+	 */
+	if (range_end(range) - hole_start > 0)
 		ret = btrfs_record_file_extent(trans, root, ino, inode,
-				hole_start, 0, start + len - hole_start);
+				hole_start, 0, range_end(range) - hole_start);
 	return ret;
 }
 
 /*
  * Relocate the used ext2 data in reserved ranges
- * [0,1M)
- * [btrfs_sb_offset(1), +BTRFS_STRIPE_LEN)
- * [btrfs_sb_offset(2), +BTRFS_STRIPE_LEN)
  */
 static int migrate_reserved_ranges(struct btrfs_trans_handle *trans,
 				   struct btrfs_root *root,
@@ -407,35 +412,19 @@ static int migrate_reserved_ranges(struct btrfs_trans_handle *trans,
 				   struct btrfs_inode_item *inode, int fd,
 				   u64 ino, u64 total_bytes, u32 convert_flags)
 {
-	u64 cur_off;
-	u64 cur_len;
+	int i;
 	int ret = 0;
 
-	/* 0 ~ 1M */
-	cur_off = 0;
-	cur_len = 1024 * 1024;
-	ret = migrate_one_reserved_range(trans, root, used, inode, fd, ino,
-					 cur_off, cur_len, convert_flags);
-	if (ret < 0)
-		return ret;
+	for (i = 0; i < ARRAY_SIZE(btrfs_reserved_ranges); i++) {
+		const struct simple_range *range = &btrfs_reserved_ranges[i];
 
-	/* second sb(fisrt sb is included in 0~1M) */
-	cur_off = btrfs_sb_offset(1);
-	cur_len = min(total_bytes, cur_off + BTRFS_STRIPE_LEN) - cur_off;
-	if (cur_off > total_bytes)
-		return ret;
-	ret = migrate_one_reserved_range(trans, root, used, inode, fd, ino,
-					 cur_off, cur_len, convert_flags);
-	if (ret < 0)
-		return ret;
-
-	/* Last sb */
-	cur_off = btrfs_sb_offset(2);
-	cur_len = min(total_bytes, cur_off + BTRFS_STRIPE_LEN) - cur_off;
-	if (cur_off > total_bytes)
-		return ret;
-	ret = migrate_one_reserved_range(trans, root, used, inode, fd, ino,
-					 cur_off, cur_len, convert_flags);
+		if (range->start > total_bytes)
+			return ret;
+		ret = migrate_one_reserved_range(trans, root, used, inode, fd,
+						 ino, range, convert_flags);
+		if (ret < 0)
+			return ret;
+	}
 	return ret;
 }
 
@@ -608,18 +597,17 @@ static int wipe_one_reserved_range(struct cache_tree *tree,
 static int wipe_reserved_ranges(struct cache_tree *tree, u64 min_stripe_size,
 				int ensure_size)
 {
+	int i;
 	int ret;
 
-	ret = wipe_one_reserved_range(tree, 0, 1024 * 1024, min_stripe_size,
-				      ensure_size);
-	if (ret < 0)
-		return ret;
-	ret = wipe_one_reserved_range(tree, btrfs_sb_offset(1),
-			BTRFS_STRIPE_LEN, min_stripe_size, ensure_size);
-	if (ret < 0)
-		return ret;
-	ret = wipe_one_reserved_range(tree, btrfs_sb_offset(2),
-			BTRFS_STRIPE_LEN, min_stripe_size, ensure_size);
+	for (i = 0; i < ARRAY_SIZE(btrfs_reserved_ranges); i++) {
+		const struct simple_range *range = &btrfs_reserved_ranges[i];
+
+		ret = wipe_one_reserved_range(tree, range->start, range->len,
+					      min_stripe_size, ensure_size);
+		if (ret < 0)
+			return ret;
+	}
 	return ret;
 }
 
-- 
2.12.0




^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 4/7] btrfs-progs: file: Introduce function to read out file content
  2017-03-14  8:05 [PATCH v3 0/7] convert: rollback rework Qu Wenruo
                   ` (2 preceding siblings ...)
  2017-03-14  8:05 ` [PATCH v3 3/7] btrfs-progs: convert: Use reserved ranges array to cleanup open code Qu Wenruo
@ 2017-03-14  8:05 ` Qu Wenruo
  2017-03-14  8:05 ` [PATCH v3 5/7] btrfs-progs: convert: Introduce function to read out btrfs reserved range Qu Wenruo
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2017-03-14  8:05 UTC (permalink / raw
  To: linux-btrfs; +Cc: dsterba

Introduce a new function, btrfs_read_file(), to read out data of a file
inode.

This function will iterate through EXTENT_DATA items and handle
inline/prealloc/hole file extents.

Compression is not supported yet.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 ctree.h |   2 +
 file.c  | 169 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 171 insertions(+)

diff --git a/ctree.h b/ctree.h
index 1d0622d9..63d6d047 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2776,4 +2776,6 @@ int btrfs_get_extent(struct btrfs_trans_handle *trans,
 int btrfs_punch_hole(struct btrfs_trans_handle *trans,
 		     struct btrfs_root *root,
 		     u64 ino, u64 offset, u64 len);
+int btrfs_read_file(struct btrfs_root *root, u64 ino, u64 start, int len,
+		    char *dest);
 #endif
diff --git a/file.c b/file.c
index dd0b04b8..06c1e2dd 100644
--- a/file.c
+++ b/file.c
@@ -18,6 +18,8 @@
 
 #include <sys/stat.h>
 #include "ctree.h"
+#include "utils.h"
+#include "disk-io.h"
 #include "transaction.h"
 #include "kerncompat.h"
 
@@ -160,3 +162,170 @@ out:
 	btrfs_free_path(path);
 	return ret;
 }
+
+/*
+ * Read out content of one inode.
+ *
+ * @root:  fs/subvolume root containing the inode
+ * @ino:   inode number
+ * @start: offset inside the file, aligned to sectorsize
+ * @len:   length to read, aligned to sectorisize
+ * @dest:  where data will be stored
+ *
+ * NOTE:
+ * 1) compression data is not supported yet.
+ * 2) @start and @len must be aligned to sectorsize
+ * 3) data read out is also aligned to sectorsize, not truncated to inode size
+ *
+ * Return < 0 for fatal error during read.
+ * Otherwise return the number of suceeful read out data.
+ */
+int btrfs_read_file(struct btrfs_root *root, u64 ino, u64 start, int len,
+		    char *dest)
+{
+	struct btrfs_key key;
+	struct btrfs_path path;
+	struct extent_buffer *leaf;
+	struct btrfs_inode_item *ii;
+	u64 isize;
+	int no_holes = btrfs_fs_incompat(root->fs_info, NO_HOLES);
+	int slot;
+	int read = 0;
+	int ret;
+
+	if (!IS_ALIGNED(start, root->sectorsize) ||
+	    !IS_ALIGNED(len, root->sectorsize)) {
+		warning("@start and @len must be aligned to %u for function %s",
+			root->sectorsize, __func__);
+		return -EINVAL;
+	}
+
+	btrfs_init_path(&path);
+	key.objectid = ino;
+	key.offset = start;
+	key.type = BTRFS_EXTENT_DATA_KEY;
+
+	ret = btrfs_search_slot(NULL, root, &key, &path, 0, 0);
+	if (ret < 0)
+		goto out;
+
+	if (ret > 0) {
+		ret = btrfs_previous_item(root, &path, ino,
+					  BTRFS_EXTENT_DATA_KEY);
+		if (ret > 0) {
+			ret = -ENOENT;
+			goto out;
+		}
+	}
+
+	/*
+	 * Reset @dest to all 0, so we don't need to care about holes in
+	 * no_hole mode, but focus on reading non-hole part.
+	 */
+	memset(dest, 0, len);
+	while (1) {
+		struct btrfs_file_extent_item *fi;
+		u64 extent_start;
+		u64 extent_len;
+		u64 read_start;
+		u64 read_len;
+		u64 read_len_ret;
+		u64 disk_bytenr;
+
+		leaf = path.nodes[0];
+		slot = path.slots[0];
+
+		btrfs_item_key_to_cpu(leaf, &key, slot);
+		if (key.objectid > ino)
+			break;
+		if (key.type != BTRFS_EXTENT_DATA_KEY || key.objectid != ino)
+			goto next;
+
+		extent_start = key.offset;
+		if (extent_start >= start + len)
+			break;
+
+		fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
+		if (btrfs_file_extent_compression(leaf, fi) !=
+		    BTRFS_COMPRESS_NONE) {
+			ret = -ENOTTY;
+			break;
+		}
+
+		/* Inline extent, one inode should only one inline extent */
+		if (btrfs_file_extent_type(leaf, fi) ==
+		    BTRFS_FILE_EXTENT_INLINE) {
+			extent_len = btrfs_file_extent_inline_len(leaf, slot,
+								  fi);
+			if (extent_start + extent_len <= start)
+				goto next;
+			read_extent_buffer(leaf, dest,
+				btrfs_file_extent_inline_start(fi), extent_len);
+			read += round_up(extent_len, root->sectorsize);
+			break;
+		}
+
+		extent_len = btrfs_file_extent_num_bytes(leaf, fi);
+		if (extent_start + extent_len <= start)
+			goto next;
+
+		read_start = max(start, extent_start);
+		read_len = min(start + len, extent_start + extent_len) -
+			   read_start;
+
+		/* We have already zeroed @dest, nothing to do */
+		if (btrfs_file_extent_type(leaf, fi) ==
+		    BTRFS_FILE_EXTENT_PREALLOC ||
+		    btrfs_file_extent_disk_num_bytes(leaf, fi) == 0) {
+			read += read_len;
+			goto next;
+		}
+
+		disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi) +
+			      btrfs_file_extent_offset(leaf, fi);
+		read_len_ret = read_len;
+		ret = read_extent_data(root, dest + read_start - start,
+					disk_bytenr, &read_len_ret, 0);
+		if (ret < 0)
+			break;
+		/* Short read, something went wrong */
+		if (read_len_ret != read_len)
+			return -EIO;
+		read += read_len;
+next:
+		ret = btrfs_next_item(root, &path);
+		if (ret > 0) {
+			ret = 0;
+			break;
+		}
+	}
+
+	/*
+	 * Special trick for no_holes, since for no_holes we don't have good
+	 * method to account skipped and tailling holes, we used
+	 * min(inode size, len) as return value
+	 */
+	if (no_holes) {
+		btrfs_release_path(&path);
+		key.objectid = ino;
+		key.offset = 0;
+		key.type = BTRFS_INODE_ITEM_KEY;
+		ret = btrfs_lookup_inode(NULL, root, &path, &key, 0);
+		if (ret < 0)
+			goto out;
+		if (ret > 0) {
+			ret = -ENOENT;
+			goto out;
+		}
+		ii = btrfs_item_ptr(path.nodes[0], path.slots[0],
+				    struct btrfs_inode_item);
+		isize = round_up(btrfs_inode_size(path.nodes[0], ii),
+				 root->sectorsize);
+		read = min_t(u64, isize - start, len);
+	}
+out:
+	btrfs_release_path(&path);
+	if (!ret)
+		ret = read;
+	return ret;
+}
-- 
2.12.0




^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 5/7] btrfs-progs: convert: Introduce function to read out btrfs reserved range
  2017-03-14  8:05 [PATCH v3 0/7] convert: rollback rework Qu Wenruo
                   ` (3 preceding siblings ...)
  2017-03-14  8:05 ` [PATCH v3 4/7] btrfs-progs: file: Introduce function to read out file content Qu Wenruo
@ 2017-03-14  8:05 ` Qu Wenruo
  2017-03-14  8:05 ` [PATCH v3 6/7] btrfs-progs: convert: Introduce function to check if convert image is able to be rolled back Qu Wenruo
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2017-03-14  8:05 UTC (permalink / raw
  To: linux-btrfs; +Cc: dsterba

Introduce a new function, read_reserved_ranges(), to allow later
rollback to use these data to do rollback.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 convert/main.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/convert/main.c b/convert/main.c
index e834c6f9..505b4a70 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -1506,6 +1506,37 @@ fail:
 	return -1;
 }
 
+/*
+ * Read out data of convert image which is in btrfs reserved range.
+ *
+ * So rollback can just use these data to overwrite these ranges of btrfs
+ */
+static int read_reserved_ranges(struct btrfs_root *root, u64 ino,
+				u64 total_bytes, char *reserved_ranges[])
+{
+	int i;
+	int ret = 0;
+
+	for (i = 0; i < ARRAY_SIZE(btrfs_reserved_ranges); i++) {
+		const struct simple_range *range = &btrfs_reserved_ranges[i];
+
+		if (range->start + range->len >= total_bytes)
+			break;
+		ret = btrfs_read_file(root, ino, range->start, range->len,
+				      reserved_ranges[i]);
+		if (ret < range->len) {
+			error(
+	"failed to read out data of convert image, offset=%llu len=%llu ret=%d",
+			      range->start, range->len, ret);
+			if (ret >= 0)
+				ret = -EIO;
+			break;
+		}
+		ret = 0;
+	}
+	return ret;
+}
+
 static int do_rollback(const char *devname)
 {
 	int fd = -1;
-- 
2.12.0




^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 6/7] btrfs-progs: convert: Introduce function to check if convert image is able to be rolled back
  2017-03-14  8:05 [PATCH v3 0/7] convert: rollback rework Qu Wenruo
                   ` (4 preceding siblings ...)
  2017-03-14  8:05 ` [PATCH v3 5/7] btrfs-progs: convert: Introduce function to read out btrfs reserved range Qu Wenruo
@ 2017-03-14  8:05 ` Qu Wenruo
  2017-03-14  8:05 ` [PATCH v3 7/7] btrfs-progs: convert: Rework rollback Qu Wenruo
  2017-03-14 11:55 ` [PATCH v3 0/7] convert: rollback rework David Sterba
  7 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2017-03-14  8:05 UTC (permalink / raw
  To: linux-btrfs; +Cc: dsterba

Introduce a function, check_convert_image() to check if that image is
rollback-able.

This means all file extents excepts one in btrfs reserved ranges, must
be mapped 1:1 on disk.

1:1 mapped file extents must match the following condition:
1) Their file_offset(key.offset) matches its disk_bytenr
2) The corresponding chunk must be mapped 1:1 on disk
   That's to say, it's a SINGLE chunk, and chunk logical matches with
   stripe physical.

Above 2 conditions ensured that file extent lies the exactly the same
position as old filesystem.

For data in reserved ranges of btrfs, they are relocated to new places,
and in that case, we use btrfs_read_file() to read out the content for
later rollback use.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 convert/main.c | 179 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 179 insertions(+)

diff --git a/convert/main.c b/convert/main.c
index 505b4a70..3ad7b2fb 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -1537,6 +1537,185 @@ static int read_reserved_ranges(struct btrfs_root *root, u64 ino,
 	return ret;
 }
 
+static bool is_subset_of_reserved_ranges(u64 start, u64 len)
+{
+	int i;
+	bool ret = false;
+
+	for (i = 0; i < ARRAY_SIZE(btrfs_reserved_ranges); i++) {
+		const struct simple_range *range = &btrfs_reserved_ranges[i];
+
+		if (start >= range->start && start + len <= range_end(range)) {
+			ret = true;
+			break;
+		}
+	}
+	return ret;
+}
+
+static bool is_chunk_direct_mapped(struct btrfs_fs_info *fs_info, u64 start)
+{
+	struct cache_extent *ce;
+	struct map_lookup *map;
+	bool ret = false;
+
+	ce = search_cache_extent(&fs_info->mapping_tree.cache_tree, start);
+	if (!ce)
+		goto out;
+	if (ce->start > start || ce->start + ce->size < start)
+		goto out;
+
+	map = container_of(ce, struct map_lookup, ce);
+
+	/* Not SINGLE chunk */
+	if (map->num_stripes != 1)
+		goto out;
+
+	/* Chunk's logical doesn't match with phisical, not 1:1 mapped */
+	if (map->ce.start != map->stripes[0].physical)
+		goto out;
+	ret = true;
+out:
+	return ret;
+}
+
+/*
+ * Iterate all file extents of the convert image.
+ *
+ * All file extents except ones in btrfs_reserved_ranges[] must be mapped 1:1
+ * on disk. (Means their file_offset must match their on disk bytenr)
+ *
+ * File extents in reserved ranges can be relocated to other place, and in
+ * that case we will read them out for later use.
+ */
+static int check_convert_image(struct btrfs_root *image_root, u64 ino,
+			       u64 total_size, char *reserved_ranges[])
+{
+	struct btrfs_key key;
+	struct btrfs_path path;
+	struct btrfs_fs_info *fs_info = image_root->fs_info;
+	u64 checked_bytes = 0;
+	int ret;
+
+	key.objectid = ino;
+	key.offset = 0;
+	key.type = BTRFS_EXTENT_DATA_KEY;
+
+	btrfs_init_path(&path);
+	ret = btrfs_search_slot(NULL, image_root, &key, &path, 0, 0);
+	/*
+	 * It's possible that some fs doesn't store any(including sb)
+	 * data into 0~1M range, and NO_HOLES is enabled.
+	 *
+	 * So only needs to check ret < 0 case
+	 */
+	if (ret < 0) {
+		error("failed to iterate file extents at offset 0: %s",
+			strerror(-ret));
+		btrfs_release_path(&path);
+		return ret;
+	}
+
+	/* Loop from the first file extents */
+	while (1) {
+		struct btrfs_file_extent_item *fi;
+		struct extent_buffer *leaf = path.nodes[0];
+		u64 disk_bytenr;
+		u64 file_offset;
+		u64 ram_bytes;
+		int slot = path.slots[0];
+
+		if (slot >= btrfs_header_nritems(leaf))
+			goto next;
+		btrfs_item_key_to_cpu(leaf, &key, slot);
+
+		/*
+		 * Iteration is done, exit normally, we have extra check out of
+		 * the loop
+		 */
+		if (key.objectid != ino || key.type != BTRFS_EXTENT_DATA_KEY) {
+			ret = 0;
+			break;
+		}
+		file_offset = key.offset;
+		fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
+		if (btrfs_file_extent_type(leaf, fi) != BTRFS_FILE_EXTENT_REG) {
+			ret = -EINVAL;
+			error(
+		"ino %llu offset %llu doesn't have a regular file extent",
+				ino, file_offset);
+			break;
+		}
+		if (btrfs_file_extent_compression(leaf, fi) ||
+		    btrfs_file_extent_encryption(leaf, fi) ||
+		    btrfs_file_extent_other_encoding(leaf, fi)) {
+			ret = -EINVAL;
+			error(
+			"ino %llu offset %llu doesn't have a plain file extent",
+				ino, file_offset);
+			break;
+		}
+
+		disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
+		ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
+
+		checked_bytes += ram_bytes;
+		/* Skip hole */
+		if (disk_bytenr == 0)
+			goto next;
+
+		/*
+		 * Most file extent must be 1:1 mapped, which means 2 things:
+		 * 1) File extent file offset == disk_bytenr
+		 * 2) That data chunk's logical == chunk's physical
+		 *
+		 * So file extent's file offset == physical position on disk.
+		 *
+		 * And after rolling back btrfs reserved range, other part
+		 * remains what old fs used to be.
+		 */
+		if (file_offset != disk_bytenr ||
+		    !is_chunk_direct_mapped(fs_info, disk_bytenr)) {
+			/*
+			 * Only file extent in btrfs reserved ranges are allow
+			 * non-1:1 mapped
+			 */
+			if (!is_subset_of_reserved_ranges(file_offset,
+							ram_bytes)) {
+				ret = -EINVAL;
+				error(
+		"ino %llu offset %llu file extent should not be relocated",
+					ino, file_offset);
+				break;
+			}
+		}
+next:
+		ret = btrfs_next_item(image_root, &path);
+		if (ret) {
+			if (ret > 0)
+				ret = 0;
+			break;
+		}
+	}
+	btrfs_release_path(&path);
+	/*
+	 * For HOLES mode (without NO_HOLES), we must ensure file extents
+	 * cover the whole range of the image
+	 */
+	if (!ret && !btrfs_fs_incompat(fs_info, NO_HOLES)) {
+		if (checked_bytes != total_size) {
+			error("inode %llu has some file extents not checked",
+				ino);
+			return -EINVAL;
+		}
+	}
+
+	/* So far so good, read out old data located in btrfs reserved ranges */
+	ret = read_reserved_ranges(image_root, ino, total_size,
+				   reserved_ranges);
+	return ret;
+}
+
 static int do_rollback(const char *devname)
 {
 	int fd = -1;
-- 
2.12.0




^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 7/7] btrfs-progs: convert: Rework rollback
  2017-03-14  8:05 [PATCH v3 0/7] convert: rollback rework Qu Wenruo
                   ` (5 preceding siblings ...)
  2017-03-14  8:05 ` [PATCH v3 6/7] btrfs-progs: convert: Introduce function to check if convert image is able to be rolled back Qu Wenruo
@ 2017-03-14  8:05 ` Qu Wenruo
  2017-03-14 11:55 ` [PATCH v3 0/7] convert: rollback rework David Sterba
  7 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2017-03-14  8:05 UTC (permalink / raw
  To: linux-btrfs; +Cc: dsterba

Rework rollback to a more easy to understand way.

New convert behavior makes us to have a more flex chunk layout, which
only data chunk containing old fs data will be at the same physical
location, while new chunks (data/meta/sys) can be mapped anywhere else.

This behavior makes old rollback behavior can't handle it.
As old behavior assumes all data/meta is mapped in a large chunk, which is
mapped 1:1 on disk.

So rework rollback to handle new convert behavior, enhance the check by
only checking all file extents of convert image, only to check if these
file extents and therir chunks are mapped 1:1.

This new rollback check behavior can handle both new and old convert
behavior, as the new behavior is a superset of old behavior.

Further more, introduce a simple rollback mechanisim:
1) Read reserved data (offset = file offset) from convert image
2) Write reserved data into disk (offset = physical offset)

Since old fs image is a valid fs, and we only need to rollback
superblocks (btrfs reserved ranges), then we just read out data in
reserved range, and write it back.

Due to the fact that all other file extents of converted image is mapped
1:1 on disk, we put the missing piece back, then the fs is as good as
old one.

Then what we do in btrfs is just another dream.

With this new rollback mechanisim, we can open btrfs read-only, so we
won't cause any damage to current btrfs, until the final piece (0~1M,
containing 1st super block) is put back.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 convert/main.c | 591 ++++++++++++++-------------------------------------------
 1 file changed, 138 insertions(+), 453 deletions(-)

diff --git a/convert/main.c b/convert/main.c
index 3ad7b2fb..e81a1a29 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -1164,36 +1164,6 @@ fail:
 	return ret;
 }
 
-static int prepare_system_chunk_sb(struct btrfs_super_block *super)
-{
-	struct btrfs_chunk *chunk;
-	struct btrfs_disk_key *key;
-	u32 sectorsize = btrfs_super_sectorsize(super);
-
-	key = (struct btrfs_disk_key *)(super->sys_chunk_array);
-	chunk = (struct btrfs_chunk *)(super->sys_chunk_array +
-				       sizeof(struct btrfs_disk_key));
-
-	btrfs_set_disk_key_objectid(key, BTRFS_FIRST_CHUNK_TREE_OBJECTID);
-	btrfs_set_disk_key_type(key, BTRFS_CHUNK_ITEM_KEY);
-	btrfs_set_disk_key_offset(key, 0);
-
-	btrfs_set_stack_chunk_length(chunk, btrfs_super_total_bytes(super));
-	btrfs_set_stack_chunk_owner(chunk, BTRFS_EXTENT_TREE_OBJECTID);
-	btrfs_set_stack_chunk_stripe_len(chunk, BTRFS_STRIPE_LEN);
-	btrfs_set_stack_chunk_type(chunk, BTRFS_BLOCK_GROUP_SYSTEM);
-	btrfs_set_stack_chunk_io_align(chunk, sectorsize);
-	btrfs_set_stack_chunk_io_width(chunk, sectorsize);
-	btrfs_set_stack_chunk_sector_size(chunk, sectorsize);
-	btrfs_set_stack_chunk_num_stripes(chunk, 1);
-	btrfs_set_stack_chunk_sub_stripes(chunk, 0);
-	chunk->stripe.devid = super->dev_item.devid;
-	btrfs_set_stack_stripe_offset(&chunk->stripe, 0);
-	memcpy(chunk->stripe.dev_uuid, super->dev_item.uuid, BTRFS_UUID_SIZE);
-	btrfs_set_super_sys_array_size(super, sizeof(*key) + sizeof(*chunk));
-	return 0;
-}
-
 static int convert_open_fs(const char *devname,
 			   struct btrfs_convert_context *cctx)
 {
@@ -1382,131 +1352,6 @@ fail:
 }
 
 /*
- * Check if a non 1:1 mapped chunk can be rolled back.
- * For new convert, it's OK while for old convert it's not.
- */
-static int may_rollback_chunk(struct btrfs_fs_info *fs_info, u64 bytenr)
-{
-	struct btrfs_block_group_cache *bg;
-	struct btrfs_key key;
-	struct btrfs_path path;
-	struct btrfs_root *extent_root = fs_info->extent_root;
-	u64 bg_start;
-	u64 bg_end;
-	int ret;
-
-	bg = btrfs_lookup_first_block_group(fs_info, bytenr);
-	if (!bg)
-		return -ENOENT;
-	bg_start = bg->key.objectid;
-	bg_end = bg->key.objectid + bg->key.offset;
-
-	key.objectid = bg_end;
-	key.type = BTRFS_METADATA_ITEM_KEY;
-	key.offset = 0;
-	btrfs_init_path(&path);
-
-	ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0);
-	if (ret < 0)
-		return ret;
-
-	while (1) {
-		struct btrfs_extent_item *ei;
-
-		ret = btrfs_previous_extent_item(extent_root, &path, bg_start);
-		if (ret > 0) {
-			ret = 0;
-			break;
-		}
-		if (ret < 0)
-			break;
-
-		btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
-		if (key.type == BTRFS_METADATA_ITEM_KEY)
-			continue;
-		/* Now it's EXTENT_ITEM_KEY only */
-		ei = btrfs_item_ptr(path.nodes[0], path.slots[0],
-				    struct btrfs_extent_item);
-		/*
-		 * Found data extent, means this is old convert must follow 1:1
-		 * mapping.
-		 */
-		if (btrfs_extent_flags(path.nodes[0], ei)
-				& BTRFS_EXTENT_FLAG_DATA) {
-			ret = -EINVAL;
-			break;
-		}
-	}
-	btrfs_release_path(&path);
-	return ret;
-}
-
-static int may_rollback(struct btrfs_root *root)
-{
-	struct btrfs_fs_info *info = root->fs_info;
-	struct btrfs_multi_bio *multi = NULL;
-	u64 bytenr;
-	u64 length;
-	u64 physical;
-	u64 total_bytes;
-	int num_stripes;
-	int ret;
-
-	if (btrfs_super_num_devices(info->super_copy) != 1)
-		goto fail;
-
-	bytenr = BTRFS_SUPER_INFO_OFFSET;
-	total_bytes = btrfs_super_total_bytes(root->fs_info->super_copy);
-
-	while (1) {
-		ret = btrfs_map_block(&info->mapping_tree, WRITE, bytenr,
-				      &length, &multi, 0, NULL);
-		if (ret) {
-			if (ret == -ENOENT) {
-				/* removed block group at the tail */
-				if (length == (u64)-1)
-					break;
-
-				/* removed block group in the middle */
-				goto next;
-			}
-			goto fail;
-		}
-
-		num_stripes = multi->num_stripes;
-		physical = multi->stripes[0].physical;
-		free(multi);
-
-		if (num_stripes != 1) {
-			error("num stripes for bytenr %llu is not 1", bytenr);
-			goto fail;
-		}
-
-		/*
-		 * Extra check for new convert, as metadata chunk from new
-		 * convert is much more free than old convert, it doesn't need
-		 * to do 1:1 mapping.
-		 */
-		if (physical != bytenr) {
-			/*
-			 * Check if it's a metadata chunk and has only metadata
-			 * extent.
-			 */
-			ret = may_rollback_chunk(info, bytenr);
-			if (ret < 0)
-				goto fail;
-		}
-next:
-		bytenr += length;
-		if (bytenr >= total_bytes)
-			break;
-	}
-	return 0;
-fail:
-	return -1;
-}
-
-/*
  * Read out data of convert image which is in btrfs reserved range.
  *
  * So rollback can just use these data to overwrite these ranges of btrfs
@@ -1716,354 +1561,194 @@ next:
 	return ret;
 }
 
+/*
+ * btrfs rollback is just reverted convert:
+ * |<---------------Btrfs fs------------------------------>|
+ * |<-   Old data chunk  ->|< new chunk (D/M/S)>|<- ODC  ->|
+ * |<-Old-FE->| |<-Old-FE->|<- Btrfs extents  ->|<-Old-FE->|
+ *                           ||
+ *                           \/
+ * |<------------------Old fs----------------------------->|
+ * |<- used ->| |<- used ->|                    |<- used ->|
+ *
+ * However things are much easier than convert, we don't really need to
+ * do the complex space calculation, but only to handle btrfs reserved space
+ *
+ * |<---------------------------Btrfs fs----------------------------->|
+ * |   RSV 1   |  | Old  |   |    RSV 2  | | Old  | |   RSV 3   |
+ * |   0~1M    |  | Fs   |   | SB2 + 64K | | Fs   | | SB3 + 64K |
+ *
+ * On the other hande, the converted fs image in btrfs is a completely
+ * valid old fs.
+ *
+ * |<-----------------Converted fs image in btrfs-------------------->|
+ * |   RSV 1   |  | Old  |   |    RSV 2  | | Old  | |   RSV 3   |
+ * | Relocated |  | Fs   |   | Relocated | | Fs   | | Relocated |
+ *
+ * Used space in fs image should be at the same physical position on disk.
+ * We only need to recover the data in reserved ranges, so the whole
+ * old fs is back.
+ *
+ * The idea to rollback is also straightforward, we just "read" out the data
+ * of reserved ranges, and write them back to there they should be.
+ * Then the old fs is back.
+ */
 static int do_rollback(const char *devname)
 {
-	int fd = -1;
-	int ret;
-	int i;
 	struct btrfs_root *root;
 	struct btrfs_root *image_root;
-	struct btrfs_root *chunk_root;
-	struct btrfs_dir_item *dir;
-	struct btrfs_inode_item *inode;
-	struct btrfs_file_extent_item *fi;
-	struct btrfs_trans_handle *trans;
-	struct extent_buffer *leaf;
-	struct btrfs_block_group_cache *cache1;
-	struct btrfs_block_group_cache *cache2;
+	struct btrfs_fs_info *fs_info;
 	struct btrfs_key key;
 	struct btrfs_path path;
-	struct extent_io_tree io_tree;
-	char *buf = NULL;
-	char *name;
-	u64 bytenr;
-	u64 num_bytes;
-	u64 root_dir;
-	u64 objectid;
-	u64 offset;
-	u64 start;
-	u64 end;
-	u64 sb_bytenr;
-	u64 first_free;
+	struct btrfs_dir_item *dir;
+	struct btrfs_inode_item *inode_item;
+	char *image_name = "image";
+	char *reserved_ranges[ARRAY_SIZE(btrfs_reserved_ranges)] = { NULL };
 	u64 total_bytes;
-	u32 sectorsize;
+	u64 fsize;
+	u64 root_dir;
+	u64 ino;
+	int fd = -1;
+	int ret;
+	int i;
 
-	extent_io_tree_init(&io_tree);
+	for (i = 0; i < ARRAY_SIZE(btrfs_reserved_ranges); i++) {
+		const struct simple_range *range = &btrfs_reserved_ranges[i];
 
+		reserved_ranges[i] = calloc(1, range->len);
+		if (!reserved_ranges[i]) {
+			ret = -ENOMEM;
+			goto free_mem;
+		}
+	}
 	fd = open(devname, O_RDWR);
 	if (fd < 0) {
 		error("unable to open %s: %s", devname, strerror(errno));
-		goto fail;
+		ret = -EIO;
+		goto free_mem;
 	}
+	fsize = lseek(fd, 0, SEEK_END);
 	root = open_ctree_fd(fd, devname, 0, OPEN_CTREE_WRITES);
 	if (!root) {
 		error("unable to open ctree");
-		goto fail;
-	}
-	ret = may_rollback(root);
-	if (ret < 0) {
-		error("unable to do rollback: %d", ret);
-		goto fail;
+		ret = -EIO;
+		goto free_mem;
 	}
+	fs_info = root->fs_info;
 
-	sectorsize = root->sectorsize;
-	buf = malloc(sectorsize);
-	if (!buf) {
-		error("unable to allocate memory");
-		goto fail;
-	}
 
+	/*
+	 * Search root backref first, or after subvolume deletion (orphan),
+	 * we can still rollback the image.
+	 */
 	btrfs_init_path(&path);
 
 	key.objectid = CONV_IMAGE_SUBVOL_OBJECTID;
 	key.type = BTRFS_ROOT_BACKREF_KEY;
 	key.offset = BTRFS_FS_TREE_OBJECTID;
-	ret = btrfs_search_slot(NULL, root->fs_info->tree_root, &key, &path, 0,
-				0);
+	ret = btrfs_search_slot(NULL, fs_info->tree_root, &key, &path, 0, 0);
 	btrfs_release_path(&path);
 	if (ret > 0) {
-		error("unable to convert ext2 image subvolume, is it deleted?");
-		goto fail;
+		error("unable to find ext2 image subvolume, is it deleted?");
+		ret = -ENOENT;
+		goto close_fs;
 	} else if (ret < 0) {
-		error("unable to open ext2_saved, id %llu: %s",
-			(unsigned long long)key.objectid, strerror(-ret));
-		goto fail;
+		error("failed to find ext2 image subvolume: %s",
+			strerror(-ret));
+		goto close_fs;
 	}
 
+	/* Search convert subvolume */
 	key.objectid = CONV_IMAGE_SUBVOL_OBJECTID;
 	key.type = BTRFS_ROOT_ITEM_KEY;
 	key.offset = (u64)-1;
-	image_root = btrfs_read_fs_root(root->fs_info, &key);
-	if (!image_root || IS_ERR(image_root)) {
-		error("unable to open subvolume %llu: %ld",
-			(unsigned long long)key.objectid, PTR_ERR(image_root));
-		goto fail;
+
+	image_root = btrfs_read_fs_root(fs_info, &key);
+	if (IS_ERR(image_root)) {
+		ret = PTR_ERR(image_root);
+		error("failed to open convert image subvolume: %s",
+			strerror(-ret));
+		goto close_fs;
 	}
 
-	name = "image";
-	root_dir = btrfs_root_dirid(&root->root_item);
-	dir = btrfs_lookup_dir_item(NULL, image_root, &path,
-				   root_dir, name, strlen(name), 0);
+	/* Search the image file */
+	root_dir = btrfs_root_dirid(&image_root->root_item);
+	dir = btrfs_lookup_dir_item(NULL, image_root, &path, root_dir,
+			image_name, strlen(image_name), 0);
 	if (!dir || IS_ERR(dir)) {
-		error("unable to find file %s: %ld", name, PTR_ERR(dir));
-		goto fail;
+		btrfs_release_path(&path);
+		if (dir)
+			ret = PTR_ERR(dir);
+		else
+			ret = -ENOENT;
+		error("failed to locate file %s: %s", image_name,
+			strerror(-ret));
+		goto close_fs;
 	}
-	leaf = path.nodes[0];
-	btrfs_dir_item_key_to_cpu(leaf, dir, &key);
+	btrfs_dir_item_key_to_cpu(path.nodes[0], dir, &key);
 	btrfs_release_path(&path);
 
-	objectid = key.objectid;
+	/* Get total size of the original image */
+	ino = key.objectid;
 
 	ret = btrfs_lookup_inode(NULL, image_root, &path, &key, 0);
-	if (ret) {
-		error("unable to find inode item: %d", ret);
-		goto fail;
-	}
-	leaf = path.nodes[0];
-	inode = btrfs_item_ptr(leaf, path.slots[0], struct btrfs_inode_item);
-	total_bytes = btrfs_inode_size(leaf, inode);
-	btrfs_release_path(&path);
-
-	key.objectid = objectid;
-	key.offset = 0;
-	key.type = BTRFS_EXTENT_DATA_KEY;
-	ret = btrfs_search_slot(NULL, image_root, &key, &path, 0, 0);
-	if (ret != 0) {
-		error("unable to find first file extent: %d", ret);
+	if (ret < 0) {
 		btrfs_release_path(&path);
-		goto fail;
-	}
-
-	/* build mapping tree for the relocated blocks */
-	for (offset = 0; offset < total_bytes; ) {
-		leaf = path.nodes[0];
-		if (path.slots[0] >= btrfs_header_nritems(leaf)) {
-			ret = btrfs_next_leaf(root, &path);
-			if (ret != 0)
-				break;	
-			continue;
-		}
-
-		btrfs_item_key_to_cpu(leaf, &key, path.slots[0]);
-		if (key.objectid != objectid || key.offset != offset ||
-		    key.type != BTRFS_EXTENT_DATA_KEY)
-			break;
-
-		fi = btrfs_item_ptr(leaf, path.slots[0],
-				    struct btrfs_file_extent_item);
-		if (btrfs_file_extent_type(leaf, fi) != BTRFS_FILE_EXTENT_REG)
-			break;
-		if (btrfs_file_extent_compression(leaf, fi) ||
-		    btrfs_file_extent_encryption(leaf, fi) ||
-		    btrfs_file_extent_other_encoding(leaf, fi))
-			break;
-
-		bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
-		/* skip holes and direct mapped extents */
-		if (bytenr == 0 || bytenr == offset)
-			goto next_extent;
-
-		bytenr += btrfs_file_extent_offset(leaf, fi);
-		num_bytes = btrfs_file_extent_num_bytes(leaf, fi);
-
-		cache1 = btrfs_lookup_block_group(root->fs_info, offset);
-		cache2 = btrfs_lookup_block_group(root->fs_info,
-						  offset + num_bytes - 1);
-		/*
-		 * Here we must take consideration of old and new convert
-		 * behavior.
-		 * For old convert case, sign, there is no consist chunk type
-		 * that will cover the extent. META/DATA/SYS are all possible.
-		 * Just ensure relocate one is in SYS chunk.
-		 * For new convert case, they are all covered by DATA chunk.
-		 *
-		 * So, there is not valid chunk type check for it now.
-		 */
-		if (cache1 != cache2)
-			break;
-
-		set_extent_bits(&io_tree, offset, offset + num_bytes - 1,
-				EXTENT_LOCKED);
-		set_state_private(&io_tree, offset, bytenr);
-next_extent:
-		offset += btrfs_file_extent_num_bytes(leaf, fi);
-		path.slots[0]++;
+		error("unable to find inode %llu: %s", ino, strerror(-ret));
+		goto close_fs;
 	}
+	inode_item = btrfs_item_ptr(path.nodes[0], path.slots[0],
+				    struct btrfs_inode_item);
+	total_bytes = btrfs_inode_size(path.nodes[0], inode_item);
 	btrfs_release_path(&path);
 
-	if (offset < total_bytes) {
-		error("unable to build extent mapping (offset %llu, total_bytes %llu)",
-				(unsigned long long)offset,
-				(unsigned long long)total_bytes);
-		error("converted filesystem after balance is unable to rollback");
-		goto fail;
-	}
-
-	first_free = BTRFS_SUPER_INFO_OFFSET + 2 * sectorsize - 1;
-	first_free &= ~((u64)sectorsize - 1);
-	/* backup for extent #0 should exist */
-	if(!test_range_bit(&io_tree, 0, first_free - 1, EXTENT_LOCKED, 1)) {
-		error("no backup for the first extent");
-		goto fail;
-	}
-	/* force no allocation from system block group */
-	root->fs_info->system_allocs = -1;
-	trans = btrfs_start_transaction(root, 1);
-	if (!trans) {
-		error("unable to start transaction");
-		goto fail;
-	}
-	/*
-	 * recow the whole chunk tree, this will remove all chunk tree blocks
-	 * from system block group
-	 */
-	chunk_root = root->fs_info->chunk_root;
-	memset(&key, 0, sizeof(key));
-	while (1) {
-		ret = btrfs_search_slot(trans, chunk_root, &key, &path, 0, 1);
-		if (ret < 0)
-			break;
-
-		ret = btrfs_next_leaf(chunk_root, &path);
-		if (ret)
-			break;
-
-		btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
-		btrfs_release_path(&path);
+	/* Check if we can rollback the image */
+	ret = check_convert_image(image_root, ino, total_bytes,
+				  reserved_ranges);
+	if (ret < 0) {
+		error("old fs image can't be rolled back");
+		goto close_fs;
 	}
+close_fs:
 	btrfs_release_path(&path);
+	close_ctree_fs_info(fs_info);
+	if (ret)
+		goto free_mem;
 
-	offset = 0;
-	num_bytes = 0;
-	while(1) {
-		cache1 = btrfs_lookup_block_group(root->fs_info, offset);
-		if (!cache1)
-			break;
-
-		if (cache1->flags & BTRFS_BLOCK_GROUP_SYSTEM)
-			num_bytes += btrfs_block_group_used(&cache1->item);
-
-		offset = cache1->key.objectid + cache1->key.offset;
-	}
-	/* only extent #0 left in system block group? */
-	if (num_bytes > first_free) {
-		error(
-	"unable to empty system block group (num_bytes %llu, first_free %llu",
-				(unsigned long long)num_bytes,
-				(unsigned long long)first_free);
-		goto fail;
-	}
-	/* create a system chunk that maps the whole device */
-	ret = prepare_system_chunk_sb(root->fs_info->super_copy);
-	if (ret) {
-		error("unable to update system chunk: %d", ret);
-		goto fail;
-	}
-
-	ret = btrfs_commit_transaction(trans, root);
-	if (ret) {
-		error("transaction commit failed: %d", ret);
-		goto fail;
-	}
-
-	ret = close_ctree(root);
-	if (ret) {
-		error("close_ctree failed: %d", ret);
-		goto fail;
-	}
-
-	/* zero btrfs super block mirrors */
-	memset(buf, 0, sectorsize);
-	for (i = 1 ; i < BTRFS_SUPER_MIRROR_MAX; i++) {
-		bytenr = btrfs_sb_offset(i);
-		if (bytenr >= total_bytes)
-			break;
-		ret = pwrite(fd, buf, sectorsize, bytenr);
-		if (ret != sectorsize) {
-			error("zeroing superblock mirror %d failed: %d",
-					i, ret);
-			goto fail;
-		}
-	}
-
-	sb_bytenr = (u64)-1;
-	/* copy all relocated blocks back */
-	while(1) {
-		ret = find_first_extent_bit(&io_tree, 0, &start, &end,
-					    EXTENT_LOCKED);
-		if (ret)
-			break;
-
-		ret = get_state_private(&io_tree, start, &bytenr);
-		BUG_ON(ret);
-
-		clear_extent_bits(&io_tree, start, end, EXTENT_LOCKED);
+ 	/*
+	 * Everything is OK, just write back old fs data into btrfs reserved
+	 * ranges
+	 *
+	 * Here, we starts from the backup blocks first, so if something goes
+	 * wrong, the fs is still mountable
+	 */
+	for (i = ARRAY_SIZE(btrfs_reserved_ranges) - 1; i >= 0; i--) {
+		u64 real_size;
+		const struct simple_range *range = &btrfs_reserved_ranges[i];
 
-		while (start <= end) {
-			if (start == BTRFS_SUPER_INFO_OFFSET) {
-				sb_bytenr = bytenr;
-				goto next_sector;
-			}
-			ret = pread(fd, buf, sectorsize, bytenr);
-			if (ret < 0) {
-				error("reading superblock at %llu failed: %d",
-						(unsigned long long)bytenr, ret);
-				goto fail;
-			}
-			BUG_ON(ret != sectorsize);
-			ret = pwrite(fd, buf, sectorsize, start);
-			if (ret < 0) {
-				error("writing superblock at %llu failed: %d",
-						(unsigned long long)start, ret);
-				goto fail;
-			}
-			BUG_ON(ret != sectorsize);
-next_sector:
-			start += sectorsize;
-			bytenr += sectorsize;
+		if (range_end(range) >= fsize)
+			continue;
+		real_size = min(range_end(range), fsize) - range->start;
+		ret = pwrite(fd, reserved_ranges[i], real_size, range->start);
+		if (ret < real_size) {
+			if (ret < 0)
+				ret = -errno;
+			else
+				ret = -EIO;
+			error("failed to recover range [%llu, %llu): %s",
+			      range->start, real_size, strerror(-ret));
+			goto free_mem;
 		}
+		ret = 0;
 	}
-
-	ret = fsync(fd);
-	if (ret < 0) {
-		error("fsync failed: %s", strerror(errno));
-		goto fail;
-	}
-	/*
-	 * finally, overwrite btrfs super block.
-	 */
-	ret = pread(fd, buf, sectorsize, sb_bytenr);
-	if (ret < 0) {
-		error("reading primary superblock failed: %s",
-				strerror(errno));
-		goto fail;
-	}
-	BUG_ON(ret != sectorsize);
-	ret = pwrite(fd, buf, sectorsize, BTRFS_SUPER_INFO_OFFSET);
-	if (ret < 0) {
-		error("writing primary superblock failed: %s",
-				strerror(errno));
-		goto fail;
-	}
-	BUG_ON(ret != sectorsize);
-	ret = fsync(fd);
-	if (ret < 0) {
-		error("fsync failed: %s", strerror(errno));
-		goto fail;
-	}
-
-	close(fd);
-	free(buf);
-	extent_io_tree_cleanup(&io_tree);
-	printf("rollback complete\n");
-	return 0;
-
-fail:
-	if (fd != -1)
-		close(fd);
-	free(buf);
-	error("rollback aborted");
-	return -1;
+free_mem:
+	for (i = 0; i < ARRAY_SIZE(btrfs_reserved_ranges); i++)
+		free(reserved_ranges[i]);
+	if (ret)
+		error("rollback failed");
+	else
+		printf("rollback succeeded\n");
+	return ret;
 }
 
 static void print_usage(void)
-- 
2.12.0




^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 0/7] convert: rollback rework
  2017-03-14  8:05 [PATCH v3 0/7] convert: rollback rework Qu Wenruo
                   ` (6 preceding siblings ...)
  2017-03-14  8:05 ` [PATCH v3 7/7] btrfs-progs: convert: Rework rollback Qu Wenruo
@ 2017-03-14 11:55 ` David Sterba
  2017-03-14 11:55   ` David Sterba
  7 siblings, 1 reply; 12+ messages in thread
From: David Sterba @ 2017-03-14 11:55 UTC (permalink / raw
  To: Qu Wenruo; +Cc: linux-btrfs, dsterba

On Tue, Mar 14, 2017 at 04:05:02PM +0800, Qu Wenruo wrote:
> v2:
>   Abstract the original code to read out data in one btrfs file to
>   btrfs_read_file().
>   Use simple_range and btrfs_reserved_ranges[] to cleanup open code.
> v3:
>   Rebased to v4.10.
>   Squash modification in later commits to their previous owner.
>   Fix a converity report, which doesn't exit when an error is found in
>   check_convert_image()
>   Fix a lot of check scripts warning

I did not mention that explicitly under v2, but the series has been
merged to devel so incremental changes should be sent from now on. I
went through the patches almost line by line and fixed things here and
there. The patchset-to-patchset diff is not large (attached). If you
think there are still issues to fix, please send them on top of the
devel branch.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 0/7] convert: rollback rework
  2017-03-14 11:55 ` [PATCH v3 0/7] convert: rollback rework David Sterba
@ 2017-03-14 11:55   ` David Sterba
  2017-03-15  0:35     ` Qu Wenruo
  0 siblings, 1 reply; 12+ messages in thread
From: David Sterba @ 2017-03-14 11:55 UTC (permalink / raw
  To: David Sterba; +Cc: Qu Wenruo, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 840 bytes --]

On Tue, Mar 14, 2017 at 12:55:06PM +0100, David Sterba wrote:
> On Tue, Mar 14, 2017 at 04:05:02PM +0800, Qu Wenruo wrote:
> > v2:
> >   Abstract the original code to read out data in one btrfs file to
> >   btrfs_read_file().
> >   Use simple_range and btrfs_reserved_ranges[] to cleanup open code.
> > v3:
> >   Rebased to v4.10.
> >   Squash modification in later commits to their previous owner.
> >   Fix a converity report, which doesn't exit when an error is found in
> >   check_convert_image()
> >   Fix a lot of check scripts warning
> 
> I did not mention that explicitly under v2, but the series has been
> merged to devel so incremental changes should be sent from now on. I
> went through the patches almost line by line and fixed things here and
> there. The patchset-to-patchset diff is not large (attached).

Now attached.

[-- Attachment #2: devel-to-v3.diff --]
[-- Type: text/plain, Size: 18478 bytes --]

diff --git a/convert/common.h b/convert/common.h
index 28ca14617303..0d3adeaa96cc 100644
--- a/convert/common.h
+++ b/convert/common.h
@@ -26,33 +26,6 @@
 #include "common-defs.h"
 #include "extent-cache.h"
 
-/*
- * Presents one simple continuous range.
- *
- * For multiple or non-continuous ranges, use extent_cache_tree from
- * extent-cache.c
- */
-struct simple_range {
-	u64 start;
-	u64 len;
-};
-
-const static struct simple_range btrfs_reserved_ranges[] = {
-	{0, SZ_1M},
-	{BTRFS_SB_MIRROR_OFFSET(1), SZ_64K},
-	{BTRFS_SB_MIRROR_OFFSET(2), SZ_64K}
-};
-
-/*
- * Simple range functions
- *
- * Get range end (exclusive)
- */
-static inline u64 range_end(const struct simple_range *range)
-{
-	return (range->start + range->len);
-}
-
 struct btrfs_mkfs_config;
 
 struct btrfs_convert_context {
diff --git a/convert/main.c b/convert/main.c
index e81a1a29a755..73c9d889ac27 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -17,8 +17,10 @@
  */
 
 /*
- * btrfs convert design:
+ * Btrfs convert design:
+ *
  * The overall design of btrfs convert is like the following:
+ *
  * |<------------------Old fs----------------------------->|
  * |<- used ->| |<- used ->|                    |<- used ->|
  *                           ||
@@ -29,39 +31,43 @@
  *
  * ODC    = Old data chunk, btrfs chunks containing old fs data
  *          Mapped 1:1 (logical address == device offset)
- * Old-FE = file extents points to old fs.
+ * Old-FE = file extents pointing to old fs.
  *
  * So old fs used space is (mostly) kept as is, while btrfs will insert
- * its chunk(Data/Meta/Sys) into large enough free space.
- * In this way, we can create different profiles for meta/data for converted fs.
+ * its chunk (Data/Meta/Sys) into large enough free space.
+ * In this way, we can create different profiles for metadata/data for
+ * converted fs.
  *
- * The DIRTY work is, we must reserve and relocate 3 ranges for btrfs:
- * [0, 1M), [btrfs_sb_offset(1), +64K), [btrfs_sb_offset(2), +64K)
+ * We must reserve and relocate 3 ranges for btrfs:
+ * * [0, 1M)                    - area never used for any data except the first
+ *                                superblock
+ * * [btrfs_sb_offset(1), +64K) - 1st superblock backup copy
+ * * [btrfs_sb_offset(2), +64K) - 2nd, dtto
  *
- * Most work are spent handling corner cases around these reserved ranges.
+ * Most work is spent handling corner cases around these reserved ranges.
  *
  * Detailed workflow is:
  * 1)   Scan old fs used space and calculate data chunk layout
  * 1.1) Scan old fs
- *      We can a map of used space of old fs
+ *      We can a map used space of old fs
  *
- * 1.2) Calculate data chunk layout <<< Complex work
- *      New data chunks must meet 3 conditions using result from 1.1)
+ * 1.2) Calculate data chunk layout - this is the hard part
+ *      New data chunks must meet 3 conditions using result fomr 1.1
  *      a. Large enough to be a chunk
- *      b. Doesn't cover reserved ranges
+ *      b. Doesn't intersect reserved ranges
  *      c. Covers all the remaining old fs used space
  *
- *      XXX: This can be simplified if we don't need to handle backup supers
+ *      NOTE: This can be simplified if we don't need to handle backup supers
  *
- * 1.3) Calculate usable space for btrfs new chunks
- *      Btrfs chunk usable space must meet 3 conditions using result from 1.2)
+ * 1.3) Calculate usable space for new btrfs chunks
+ *      Btrfs chunk usable space must meet 3 conditions using result from 1.2
  *      a. Large enough to be a chunk
- *      b. Doesn't cover reserved ranges
- *      c. Doesn't cover any data chunks in 1.1)
+ *      b. Doesn't intersect reserved ranges
+ *      c. Doesn't cover any data chunks in 1.1
  *
- * 2)   Create basic btrfs
- *      Initial metadata and sys chunks are inserted in the first available
- *      space of result of 1.3)
+ * 2)   Create basic btrfs filesystem structure
+ *      Initial metadata and sys chunks are inserted in the first availabe
+ *      space found in step 1.3
  *      Then insert all data chunks into the basic btrfs
  *
  * 3)   Create convert image
@@ -70,7 +76,8 @@
  *      as reflink source to create old files
  *
  * 4)   Iterate old fs to create files
- *      We just reflink any offset in old fs to new file extents
+ *      We just reflink file extents from old fs to newly created files on
+ *      btrfs.
  */
 
 #include "kerncompat.h"
@@ -211,7 +218,7 @@ static int create_image_file_range(struct btrfs_trans_handle *trans,
 	 * migrate block will fail as there is already a file extent.
 	 */
 	for (i = 0; i < ARRAY_SIZE(btrfs_reserved_ranges); i++) {
-		const struct simple_range *reserved = &btrfs_reserved_ranges[i];
+		struct simple_range *reserved = &btrfs_reserved_ranges[i];
 
 		/*
 		 * |-- reserved --|
@@ -238,7 +245,7 @@ static int create_image_file_range(struct btrfs_trans_handle *trans,
 		}
 	}
 
-	/* Check if we are going to insert regular file extent or hole */
+	/* Check if we are going to insert regular file extent, or hole */
 	cache = search_cache_extent(used, bytenr);
 	if (cache) {
 		if (cache->start <= bytenr) {
@@ -246,7 +253,7 @@ static int create_image_file_range(struct btrfs_trans_handle *trans,
 			 * |///////Used///////|
 			 *	|<--insert--->|
 			 *	bytenr
-			 * Regular file extent
+			 * Insert one real file extent
 			 */
 			len = min_t(u64, len, cache->start + cache->size -
 				    bytenr);
@@ -256,7 +263,7 @@ static int create_image_file_range(struct btrfs_trans_handle *trans,
 			 *		|//Used//|
 			 *  |<-insert-->|
 			 *  bytenr
-			 * Hole
+			 *  Insert one hole
 			 */
 			len = min(len, cache->start - bytenr);
 			disk_bytenr = 0;
@@ -267,7 +274,7 @@ static int create_image_file_range(struct btrfs_trans_handle *trans,
 		 * |//Used//|		|EOF
 		 *	    |<-insert-->|
 		 *	    bytenr
-		 * Hole
+		 * Insert one hole
 		 */
 		disk_bytenr = 0;
 		datacsum = 0;
@@ -313,7 +320,7 @@ static int migrate_one_reserved_range(struct btrfs_trans_handle *trans,
 				      struct btrfs_root *root,
 				      struct cache_tree *used,
 				      struct btrfs_inode_item *inode, int fd,
-				      u64 ino, const struct simple_range *range,
+				      u64 ino, struct simple_range *range,
 				      u32 convert_flags)
 {
 	u64 cur_off = range->start;
@@ -416,7 +423,7 @@ static int migrate_reserved_ranges(struct btrfs_trans_handle *trans,
 	int ret = 0;
 
 	for (i = 0; i < ARRAY_SIZE(btrfs_reserved_ranges); i++) {
-		const struct simple_range *range = &btrfs_reserved_ranges[i];
+		struct simple_range *range = &btrfs_reserved_ranges[i];
 
 		if (range->start > total_bytes)
 			return ret;
@@ -425,6 +432,7 @@ static int migrate_reserved_ranges(struct btrfs_trans_handle *trans,
 		if (ret < 0)
 			return ret;
 	}
+
 	return ret;
 }
 
@@ -601,7 +609,7 @@ static int wipe_reserved_ranges(struct cache_tree *tree, u64 min_stripe_size,
 	int ret;
 
 	for (i = 0; i < ARRAY_SIZE(btrfs_reserved_ranges); i++) {
-		const struct simple_range *range = &btrfs_reserved_ranges[i];
+		struct simple_range *range = &btrfs_reserved_ranges[i];
 
 		ret = wipe_one_reserved_range(tree, range->start, range->len,
 					      min_stripe_size, ensure_size);
@@ -1352,9 +1360,8 @@ static int do_convert(const char *devname, u32 convert_flags, u32 nodesize,
 }
 
 /*
- * Read out data of convert image which is in btrfs reserved range.
- *
- * So rollback can just use these data to overwrite these ranges of btrfs
+ * Read out data of convert image which is in btrfs reserved ranges so we can
+ * use them to overwrite the ranges during rollback.
  */
 static int read_reserved_ranges(struct btrfs_root *root, u64 ino,
 				u64 total_bytes, char *reserved_ranges[])
@@ -1363,7 +1370,7 @@ static int read_reserved_ranges(struct btrfs_root *root, u64 ino,
 	int ret = 0;
 
 	for (i = 0; i < ARRAY_SIZE(btrfs_reserved_ranges); i++) {
-		const struct simple_range *range = &btrfs_reserved_ranges[i];
+		struct simple_range *range = &btrfs_reserved_ranges[i];
 
 		if (range->start + range->len >= total_bytes)
 			break;
@@ -1371,7 +1378,7 @@ static int read_reserved_ranges(struct btrfs_root *root, u64 ino,
 				      reserved_ranges[i]);
 		if (ret < range->len) {
 			error(
-	"failed to read out data of convert image, offset=%llu len=%llu ret=%d",
+	"failed to read data of convert image, offset=%llu len=%llu ret=%d",
 			      range->start, range->len, ret);
 			if (ret >= 0)
 				ret = -EIO;
@@ -1388,7 +1395,7 @@ static bool is_subset_of_reserved_ranges(u64 start, u64 len)
 	bool ret = false;
 
 	for (i = 0; i < ARRAY_SIZE(btrfs_reserved_ranges); i++) {
-		const struct simple_range *range = &btrfs_reserved_ranges[i];
+		struct simple_range *range = &btrfs_reserved_ranges[i];
 
 		if (start >= range->start && start + len <= range_end(range)) {
 			ret = true;
@@ -1427,8 +1434,8 @@ static bool is_chunk_direct_mapped(struct btrfs_fs_info *fs_info, u64 start)
 /*
  * Iterate all file extents of the convert image.
  *
- * All file extents except ones in btrfs_reserved_ranges[] must be mapped 1:1
- * on disk. (Means their file_offset must match their on disk bytenr)
+ * All file extents except ones in btrfs_reserved_ranges must be mapped 1:1
+ * on disk. (Means thier file_offset must match their on disk bytenr)
  *
  * File extents in reserved ranges can be relocated to other place, and in
  * that case we will read them out for later use.
@@ -1449,10 +1456,10 @@ static int check_convert_image(struct btrfs_root *image_root, u64 ino,
 	btrfs_init_path(&path);
 	ret = btrfs_search_slot(NULL, image_root, &key, &path, 0, 0);
 	/*
-	 * It's possible that some fs doesn't store any(including sb)
+	 * It's possible that some fs doesn't store any (including sb)
 	 * data into 0~1M range, and NO_HOLES is enabled.
 	 *
-	 * So only needs to check ret < 0 case
+	 * So we only need to check if ret < 0
 	 */
 	if (ret < 0) {
 		error("failed to iterate file extents at offset 0: %s",
@@ -1510,7 +1517,7 @@ static int check_convert_image(struct btrfs_root *image_root, u64 ino,
 			goto next;
 
 		/*
-		 * Most file extent must be 1:1 mapped, which means 2 things:
+		 * Most file extents must be 1:1 mapped, which means 2 things:
 		 * 1) File extent file offset == disk_bytenr
 		 * 2) That data chunk's logical == chunk's physical
 		 *
@@ -1522,8 +1529,8 @@ static int check_convert_image(struct btrfs_root *image_root, u64 ino,
 		if (file_offset != disk_bytenr ||
 		    !is_chunk_direct_mapped(fs_info, disk_bytenr)) {
 			/*
-			 * Only file extent in btrfs reserved ranges are allow
-			 * non-1:1 mapped
+			 * Only file extent in btrfs reserved ranges are
+			 * allowed to be non-1:1 mapped
 			 */
 			if (!is_subset_of_reserved_ranges(file_offset,
 							ram_bytes)) {
@@ -1549,13 +1556,13 @@ static int check_convert_image(struct btrfs_root *image_root, u64 ino,
 	 */
 	if (!ret && !btrfs_fs_incompat(fs_info, NO_HOLES)) {
 		if (checked_bytes != total_size) {
+			ret = -EINVAL;
 			error("inode %llu has some file extents not checked",
 				ino);
-			return -EINVAL;
 		}
 	}
 
-	/* So far so good, read out old data located in btrfs reserved ranges */
+	/* So far so good, read old data located in btrfs reserved ranges */
 	ret = read_reserved_ranges(image_root, ino, total_size,
 				   reserved_ranges);
 	return ret;
@@ -1578,7 +1585,7 @@ static int check_convert_image(struct btrfs_root *image_root, u64 ino,
  * |   RSV 1   |  | Old  |   |    RSV 2  | | Old  | |   RSV 3   |
  * |   0~1M    |  | Fs   |   | SB2 + 64K | | Fs   | | SB3 + 64K |
  *
- * On the other hande, the converted fs image in btrfs is a completely
+ * On the other hande, the converted fs image in btrfs is a completely 
  * valid old fs.
  *
  * |<-----------------Converted fs image in btrfs-------------------->|
@@ -1613,7 +1620,7 @@ static int do_rollback(const char *devname)
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(btrfs_reserved_ranges); i++) {
-		const struct simple_range *range = &btrfs_reserved_ranges[i];
+		struct simple_range *range = &btrfs_reserved_ranges[i];
 
 		reserved_ranges[i] = calloc(1, range->len);
 		if (!reserved_ranges[i]) {
@@ -1636,16 +1643,14 @@ static int do_rollback(const char *devname)
 	}
 	fs_info = root->fs_info;
 
-
 	/*
 	 * Search root backref first, or after subvolume deletion (orphan),
 	 * we can still rollback the image.
 	 */
-	btrfs_init_path(&path);
-
 	key.objectid = CONV_IMAGE_SUBVOL_OBJECTID;
 	key.type = BTRFS_ROOT_BACKREF_KEY;
 	key.offset = BTRFS_FS_TREE_OBJECTID;
+	btrfs_init_path(&path);
 	ret = btrfs_search_slot(NULL, fs_info->tree_root, &key, &path, 0, 0);
 	btrfs_release_path(&path);
 	if (ret > 0) {
@@ -1662,7 +1667,6 @@ static int do_rollback(const char *devname)
 	key.objectid = CONV_IMAGE_SUBVOL_OBJECTID;
 	key.type = BTRFS_ROOT_ITEM_KEY;
 	key.offset = (u64)-1;
-
 	image_root = btrfs_read_fs_root(fs_info, &key);
 	if (IS_ERR(image_root)) {
 		ret = PTR_ERR(image_root);
@@ -1675,6 +1679,7 @@ static int do_rollback(const char *devname)
 	root_dir = btrfs_root_dirid(&image_root->root_item);
 	dir = btrfs_lookup_dir_item(NULL, image_root, &path, root_dir,
 			image_name, strlen(image_name), 0);
+
 	if (!dir || IS_ERR(dir)) {
 		btrfs_release_path(&path);
 		if (dir)
@@ -1692,6 +1697,7 @@ static int do_rollback(const char *devname)
 	ino = key.objectid;
 
 	ret = btrfs_lookup_inode(NULL, image_root, &path, &key, 0);
+
 	if (ret < 0) {
 		btrfs_release_path(&path);
 		error("unable to find inode %llu: %s", ino, strerror(-ret));
@@ -1703,8 +1709,7 @@ static int do_rollback(const char *devname)
 	btrfs_release_path(&path);
 
 	/* Check if we can rollback the image */
-	ret = check_convert_image(image_root, ino, total_bytes,
-				  reserved_ranges);
+	ret = check_convert_image(image_root, ino, total_bytes, reserved_ranges);
 	if (ret < 0) {
 		error("old fs image can't be rolled back");
 		goto close_fs;
@@ -1715,19 +1720,21 @@ static int do_rollback(const char *devname)
 	if (ret)
 		goto free_mem;
 
- 	/*
+	/*
 	 * Everything is OK, just write back old fs data into btrfs reserved
 	 * ranges
 	 *
 	 * Here, we starts from the backup blocks first, so if something goes
 	 * wrong, the fs is still mountable
 	 */
+
 	for (i = ARRAY_SIZE(btrfs_reserved_ranges) - 1; i >= 0; i--) {
 		u64 real_size;
-		const struct simple_range *range = &btrfs_reserved_ranges[i];
+		struct simple_range *range = &btrfs_reserved_ranges[i];
 
 		if (range_end(range) >= fsize)
 			continue;
+
 		real_size = min(range_end(range), fsize) - range->start;
 		ret = pwrite(fd, reserved_ranges[i], real_size, range->start);
 		if (ret < real_size) {
@@ -1741,6 +1748,7 @@ static int do_rollback(const char *devname)
 		}
 		ret = 0;
 	}
+
 free_mem:
 	for (i = 0; i < ARRAY_SIZE(btrfs_reserved_ranges); i++)
 		free(reserved_ranges[i]);
diff --git a/convert/source-fs.c b/convert/source-fs.c
index 9268639d5d7e..7cf515b0325d 100644
--- a/convert/source-fs.c
+++ b/convert/source-fs.c
@@ -22,6 +22,12 @@
 #include "convert/common.h"
 #include "convert/source-fs.h"
 
+struct simple_range btrfs_reserved_ranges[3] = {
+	{ 0,			     SZ_1M },
+	{ BTRFS_SB_MIRROR_OFFSET(1), SZ_64K },
+	{ BTRFS_SB_MIRROR_OFFSET(2), SZ_64K }
+};
+
 static int intersect_with_sb(u64 bytenr, u64 num_bytes)
 {
 	int i;
@@ -181,9 +187,6 @@ int read_disk_extent(struct btrfs_root *root, u64 bytenr,
  * The special point is, old disk_block can point to a reserved range.
  * So here, we don't use disk_block directly but search convert_root
  * to get the real disk_bytenr.
- *
- * TODO: Better extract this function to btrfs_reflink(), in fact we are just
- * reflinking from convert_image of convert_root.
  */
 int record_file_blocks(struct blk_iterate_data *data,
 			      u64 file_block, u64 disk_block, u64 num_blocks)
diff --git a/convert/source-fs.h b/convert/source-fs.h
index f3f96d07b9b8..9f611150e504 100644
--- a/convert/source-fs.h
+++ b/convert/source-fs.h
@@ -21,6 +21,19 @@
 
 #define CONV_IMAGE_SUBVOL_OBJECTID BTRFS_FIRST_FREE_OBJECTID
 
+/*
+ * Reresents a simple contiguous range.
+ *
+ * For multiple or non-contiguous ranges, use extent_cache_tree from
+ * extent-cache.c
+ */
+struct simple_range {
+	u64 start;
+	u64 len;
+};
+
+extern struct simple_range btrfs_reserved_ranges[3];
+
 struct task_info;
 
 struct task_ctx {
@@ -89,4 +102,14 @@ int read_disk_extent(struct btrfs_root *root, u64 bytenr,
 int record_file_blocks(struct blk_iterate_data *data,
 			      u64 file_block, u64 disk_block, u64 num_blocks);
 
+/*
+ * Simple range functions
+ *
+ * Get range end (exclusive)
+ */
+static inline u64 range_end(struct simple_range *range)
+{
+	return (range->start + range->len);
+}
+
 #endif
diff --git a/ctree.h b/ctree.h
index 63d6d04704e9..91d55557f91c 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2778,4 +2778,5 @@ int btrfs_punch_hole(struct btrfs_trans_handle *trans,
 		     u64 ino, u64 offset, u64 len);
 int btrfs_read_file(struct btrfs_root *root, u64 ino, u64 start, int len,
 		    char *dest);
+
 #endif
diff --git a/file.c b/file.c
index 06c1e2dde49e..bf31cceffd97 100644
--- a/file.c
+++ b/file.c
@@ -173,12 +173,12 @@ int btrfs_punch_hole(struct btrfs_trans_handle *trans,
  * @dest:  where data will be stored
  *
  * NOTE:
- * 1) compression data is not supported yet.
+ * 1) compression data is not supported yet
  * 2) @start and @len must be aligned to sectorsize
  * 3) data read out is also aligned to sectorsize, not truncated to inode size
  *
  * Return < 0 for fatal error during read.
- * Otherwise return the number of suceeful read out data.
+ * Otherwise return the number of succesfully read data in bytes.
  */
 int btrfs_read_file(struct btrfs_root *root, u64 ino, u64 start, int len,
 		    char *dest)
@@ -210,8 +210,7 @@ int btrfs_read_file(struct btrfs_root *root, u64 ino, u64 start, int len,
 		goto out;
 
 	if (ret > 0) {
-		ret = btrfs_previous_item(root, &path, ino,
-					  BTRFS_EXTENT_DATA_KEY);
+		ret = btrfs_previous_item(root, &path, ino, BTRFS_EXTENT_DATA_KEY);
 		if (ret > 0) {
 			ret = -ENOENT;
 			goto out;
@@ -284,8 +283,8 @@ int btrfs_read_file(struct btrfs_root *root, u64 ino, u64 start, int len,
 		disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi) +
 			      btrfs_file_extent_offset(leaf, fi);
 		read_len_ret = read_len;
-		ret = read_extent_data(root, dest + read_start - start,
-					disk_bytenr, &read_len_ret, 0);
+		ret = read_extent_data(root, dest + read_start - start, disk_bytenr,
+				       &read_len_ret, 0);
 		if (ret < 0)
 			break;
 		/* Short read, something went wrong */

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 0/7] convert: rollback rework
  2017-03-14 11:55   ` David Sterba
@ 2017-03-15  0:35     ` Qu Wenruo
  2017-03-27 17:55       ` David Sterba
  0 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2017-03-15  0:35 UTC (permalink / raw
  To: dsterba, linux-btrfs



At 03/14/2017 07:55 PM, David Sterba wrote:
> On Tue, Mar 14, 2017 at 12:55:06PM +0100, David Sterba wrote:
>> On Tue, Mar 14, 2017 at 04:05:02PM +0800, Qu Wenruo wrote:
>>> v2:
>>>   Abstract the original code to read out data in one btrfs file to
>>>   btrfs_read_file().
>>>   Use simple_range and btrfs_reserved_ranges[] to cleanup open code.
>>> v3:
>>>   Rebased to v4.10.
>>>   Squash modification in later commits to their previous owner.
>>>   Fix a converity report, which doesn't exit when an error is found in
>>>   check_convert_image()
>>>   Fix a lot of check scripts warning
>>
>> I did not mention that explicitly under v2, but the series has been
>> merged to devel so incremental changes should be sent from now on.

Sorry, but IIRC it's recommended to base the code to latest 
stable/release branch, so I rebase them all to v4.10 code base.
(Last time you said basing code upon devel is quite a bad behavior and 
makes merging painful)

And yes, it's quite a pain to rebase them. I totally understand the pain 
during the v4.9.1->v4.10 rebase.

>> I
>> went through the patches almost line by line and fixed things here and
>> there. The patchset-to-patchset diff is not large (attached).

Sorry for the trouble and thanks for the effort.

But I'm afraid it may still happen time by time.
As you already see, we have several pending big patchset for btrfs-progs.
(Lowmem mode repair, offline scrub, and maybe some other things)

So to save your time, could you info us before merging large patchset 
and provide a base commit for us to rebase for you?

I think this would save time for both.

Thanks,
Qu
>
> Now attached.
>
>



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 0/7] convert: rollback rework
  2017-03-15  0:35     ` Qu Wenruo
@ 2017-03-27 17:55       ` David Sterba
  0 siblings, 0 replies; 12+ messages in thread
From: David Sterba @ 2017-03-27 17:55 UTC (permalink / raw
  To: Qu Wenruo; +Cc: dsterba, linux-btrfs

On Wed, Mar 15, 2017 at 08:35:06AM +0800, Qu Wenruo wrote:
> At 03/14/2017 07:55 PM, David Sterba wrote:
> > On Tue, Mar 14, 2017 at 12:55:06PM +0100, David Sterba wrote:
> >> On Tue, Mar 14, 2017 at 04:05:02PM +0800, Qu Wenruo wrote:
> >>> v2:
> >>>   Abstract the original code to read out data in one btrfs file to
> >>>   btrfs_read_file().
> >>>   Use simple_range and btrfs_reserved_ranges[] to cleanup open code.
> >>> v3:
> >>>   Rebased to v4.10.
> >>>   Squash modification in later commits to their previous owner.
> >>>   Fix a converity report, which doesn't exit when an error is found in
> >>>   check_convert_image()
> >>>   Fix a lot of check scripts warning
> >>
> >> I did not mention that explicitly under v2, but the series has been
> >> merged to devel so incremental changes should be sent from now on.
> 
> Sorry, but IIRC it's recommended to base the code to latest 
> stable/release branch, so I rebase them all to v4.10 code base.

This applies to unmerged patchsets. Once it's in devel I do not intend
to remove it (unless there's something serious).

> (Last time you said basing code upon devel is quite a bad behavior and 
> makes merging painful)

This depends if your patchset builds on top of any changes in devel or
not. If not, any base branch should do (master, or devel). Moving a
patchset based on devel to a new rebased devel head is something I do
regularly. I do occasinally want to rebase devel, so the patch history
stays clean and we don't pile up fixups on fixups. What is not a good
practice is basing patchset on a 'next' branch of any sort. This usually
serves as a preview and any patch can disappear between two next
snapshots.

> And yes, it's quite a pain to rebase them. I totally understand the pain 
> during the v4.9.1->v4.10 rebase.
> 
> >> I
> >> went through the patches almost line by line and fixed things here and
> >> there. The patchset-to-patchset diff is not large (attached).
> 
> Sorry for the trouble and thanks for the effort.
> 
> But I'm afraid it may still happen time by time.

Of course, that's expected.

> As you already see, we have several pending big patchset for btrfs-progs.
> (Lowmem mode repair, offline scrub, and maybe some other things)
> 
> So to save your time, could you info us before merging large patchset 

If only I knew that myself :) but I'll try.

> and provide a base commit for us to rebase for you?

For big or intrusive patchsets it should be better to start on top of
master, as it's never rebased. Sometimes rebasing on top of devel
succeeds with minor conflicts, that's more like a bonus for the next
cycle if the patchset does not get merged.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-03-27 17:56 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-03-14  8:05 [PATCH v3 0/7] convert: rollback rework Qu Wenruo
2017-03-14  8:05 ` [PATCH v3 1/7] btrfs-progs: convert: Add comment for the overall convert workflow Qu Wenruo
2017-03-14  8:05 ` [PATCH v3 2/7] btrfs-progs: convert: Introduce simple range structure for convert reserved ranges Qu Wenruo
2017-03-14  8:05 ` [PATCH v3 3/7] btrfs-progs: convert: Use reserved ranges array to cleanup open code Qu Wenruo
2017-03-14  8:05 ` [PATCH v3 4/7] btrfs-progs: file: Introduce function to read out file content Qu Wenruo
2017-03-14  8:05 ` [PATCH v3 5/7] btrfs-progs: convert: Introduce function to read out btrfs reserved range Qu Wenruo
2017-03-14  8:05 ` [PATCH v3 6/7] btrfs-progs: convert: Introduce function to check if convert image is able to be rolled back Qu Wenruo
2017-03-14  8:05 ` [PATCH v3 7/7] btrfs-progs: convert: Rework rollback Qu Wenruo
2017-03-14 11:55 ` [PATCH v3 0/7] convert: rollback rework David Sterba
2017-03-14 11:55   ` David Sterba
2017-03-15  0:35     ` Qu Wenruo
2017-03-27 17:55       ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.