All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] Btrfs incremental send fix serval case for rename and rm directory
@ 2015-06-04 11:18 Robbie Ko
  2015-06-04 11:18 ` [PATCH 1/5] Btrfs: incremental send, avoid circular waiting and descendant overwrite ancestor need to update path Robbie Ko
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Robbie Ko @ 2015-06-04 11:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Robbie Ko

Patch for fix btrfs send receive. These patches base on v4.1-rc6-49-g8a7deb3
plus following patches.
[PATCH] Btrfs: incremental send, don't delay directory renames unnecessarily
[PATCH] Btrfs: incremental send, check if orphanized dir inode needs delayed rename

Thanks.

Robbie Ko (5):
  Btrfs: incremental send, avoid circular waiting and descendant
    overwrite ancestor need to update path
  Btrfs: incremental send, avoid ancestor rename to descendant
  Btrfs: incremental send, fix orphan_dir_info not completely cleared
  Btrfs: incremental send, fix rmdir but dir have a unprocess item
  Btrfs: incremental send, fix rmdir not send utimes

 fs/btrfs/send.c | 163 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 149 insertions(+), 14 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/5] Btrfs: incremental send, avoid circular waiting and descendant overwrite ancestor need to update path
  2015-06-04 11:18 [PATCH 0/5] Btrfs incremental send fix serval case for rename and rm directory Robbie Ko
@ 2015-06-04 11:18 ` Robbie Ko
  2015-06-04 13:50   ` Filipe David Manana
  2015-06-04 11:18 ` [PATCH 2/5] Btrfs: incremental send, avoid ancestor rename to descendant Robbie Ko
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Robbie Ko @ 2015-06-04 11:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Robbie Ko

Base on [PATCH] Btrfs: incremental send, check if orphanized dir inode needs delayed rename

Example1:
There's one case where we can't issue a rename operation for a directory
immediately when we process it. 

Parent snapshot:
|---- d/ (ino 257)
  |---- p1 (ino 258)
|---- p1/ (ino 259)

Send snapshot:
|---- d/ (ino 257)
  |---- p1 (ino 259)
    |---- p1/ (ino 258)

Here we can not rename 258 from d/p1 to p1/p1 without the rename of inode 259.
p1 258 is put into wait_parent_move. 259 can't be rename to d/p1, so it is put into
circular waiting happens. This is fix by rename destination directory and set 
it as orphanized for this case.

Example2:
There's one case where we can't issue a rename operation for a directory
immediately we process it.
After moving 262 outside, path of 265 is stored in the name_cache_entry.
When 263 try to overwrite 265, its ancestor, 265 is moved to orphanized. Path of 263
is still the original path, however. This causes error.

Parent snapshot:
|---- a/ (ino 259)
  |---- c (ino 266)
|---- d/ (ino 260)
  |---- ance (ino 265)
    |---- e (ino 261)
    |---- f (ino 262)
    |---- ance (ino 263)

Send snapshot:
|---- a/ (ino 259)
|---- c/ (ino 266)
  |---- ance (ino 265)
|---- d/ (ino 260)
  |---- ance (ino 263)
|---- f/ (ino 262)
  |---- e (ino 261)

Signed-off-by: Robbie Ko <robbieko@synology.com>
---
 fs/btrfs/send.c | 45 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 40 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 1c1f161..fbfbb8b 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -230,7 +230,6 @@ struct pending_dir_move {
 	u64 parent_ino;
 	u64 ino;
 	u64 gen;
-	bool is_orphan;
 	struct list_head update_refs;
 };
 
@@ -1840,7 +1839,7 @@ static int will_overwrite_ref(struct send_ctx *sctx, u64 dir, u64 dir_gen,
 	 * was already unlinked/moved, so we can safely assume that we will not
 	 * overwrite anything at this point in time.
 	 */
-	if (other_inode > sctx->send_progress) {
+	if (other_inode > sctx->send_progress || is_waiting_for_move(sctx, other_inode)) {
 		ret = get_inode_info(sctx->parent_root, other_inode, NULL,
 				who_gen, NULL, NULL, NULL, NULL);
 		if (ret < 0)
@@ -3014,7 +3013,6 @@ static int add_pending_dir_move(struct send_ctx *sctx,
 	pm->parent_ino = parent_ino;
 	pm->ino = ino;
 	pm->gen = ino_gen;
-	pm->is_orphan = is_orphan;
 	INIT_LIST_HEAD(&pm->list);
 	INIT_LIST_HEAD(&pm->update_refs);
 	RB_CLEAR_NODE(&pm->node);
@@ -3091,6 +3089,7 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
 	struct waiting_dir_move *dm = NULL;
 	u64 rmdir_ino = 0;
 	int ret;
+	bool is_orphan;
 
 	name = fs_path_alloc();
 	from_path = fs_path_alloc();
@@ -3102,9 +3101,10 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
 	dm = get_waiting_dir_move(sctx, pm->ino);
 	ASSERT(dm);
 	rmdir_ino = dm->rmdir_ino;
+	is_orphan = dm->orphanized;
 	free_waiting_dir_move(sctx, dm);
 
-	if (pm->is_orphan) {
+	if (is_orphan) {
 		ret = gen_unique_name(sctx, pm->ino,
 				      pm->gen, from_path);
 	} else {
@@ -3292,6 +3292,7 @@ static int wait_for_dest_dir_move(struct send_ctx *sctx,
 	u64 left_gen;
 	u64 right_gen;
 	int ret = 0;
+	struct waiting_dir_move *wdm;
 
 	if (RB_EMPTY_ROOT(&sctx->waiting_dir_moves))
 		return 0;
@@ -3350,7 +3351,8 @@ static int wait_for_dest_dir_move(struct send_ctx *sctx,
 		goto out;
 	}
 
-	if (is_waiting_for_move(sctx, di_key.objectid)) {
+	wdm = get_waiting_dir_move(sctx, di_key.objectid);
+	if (wdm && !wdm->orphanized) {
 		ret = add_pending_dir_move(sctx,
 					   sctx->cur_ino,
 					   sctx->cur_inode_gen,
@@ -3610,11 +3612,33 @@ verbose_printk("btrfs: process_recorded_refs %llu\n", sctx->cur_ino);
 				goto out;
 			if (ret) {
 				struct name_cache_entry *nce;
+				struct waiting_dir_move *wdm;
+				bool cur_is_ancestor = false;
+
+				/*
+				 * check is dset path is ancestor src path
+				 * if yes, need to update cur_ino path
+				 */
+				if (strncmp(cur->full_path->start, valid_path->start, fs_path_len(cur->full_path)) == 0 &&
+							fs_path_len(valid_path) > fs_path_len(cur->full_path) && valid_path->start[fs_path_len(cur->full_path)] == '/') {
+					cur_is_ancestor = true;
+				}
 
 				ret = orphanize_inode(sctx, ow_inode, ow_gen,
 						cur->full_path);
 				if (ret < 0)
 					goto out;
+
+				/*
+				 * check is waiting dir, if yes change the ino
+				 * to orphanized in the waiting tree.
+				 */
+				if (is_waiting_for_move(sctx, ow_inode)) {
+					wdm = get_waiting_dir_move(sctx, ow_inode);
+					ASSERT(wdm);
+					wdm->orphanized = true;
+				}
+
 				/*
 				 * Make sure we clear our orphanized inode's
 				 * name from the name cache. This is because the
@@ -3630,6 +3654,17 @@ verbose_printk("btrfs: process_recorded_refs %llu\n", sctx->cur_ino);
 					name_cache_delete(sctx, nce);
 					kfree(nce);
 				}
+
+				/*
+				 * if ow_inode is ancestor cur_ino, need to update
+				 * update cur_ino path.
+				 */
+				if (cur_is_ancestor) {
+					fs_path_reset(valid_path);
+					ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, valid_path);
+					if (ret < 0)
+						goto out;
+				}
 			} else {
 				ret = send_unlink(sctx, cur->full_path);
 				if (ret < 0)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/5] Btrfs: incremental send, avoid ancestor rename to descendant
  2015-06-04 11:18 [PATCH 0/5] Btrfs incremental send fix serval case for rename and rm directory Robbie Ko
  2015-06-04 11:18 ` [PATCH 1/5] Btrfs: incremental send, avoid circular waiting and descendant overwrite ancestor need to update path Robbie Ko
@ 2015-06-04 11:18 ` Robbie Ko
  2015-06-04 15:43   ` Filipe David Manana
  2015-06-04 11:18 ` [PATCH 3/5] Btrfs: incremental send, fix orphan_dir_info not completely cleared Robbie Ko
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Robbie Ko @ 2015-06-04 11:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Robbie Ko

Base on [PATCH] Btrfs: incremental send, don't delay directory renames unnecessarily

There's one case where we can't issue a rename operation for a directory
as soon as we process it. We move a directory from ancestor to descendant,
if not wait_parent_move or wait_for_dest_dir_move so that happen path loop.

Example:

Parent snapshot:
|---- @tmp/ (ino 257)
|---- pre/ (ino 259)
  |---- wait_dir (ino 260)
    |---- finish_dir2/ (ino 261)
|---- ance/ (ino 263)
  |---- finish_dir1/ (ino 258)
|---- desc/ (ino 262)
|---- other_dir/ (ino 264)

Send snapshot:
|---- @tmp/ (ino 257)
  |---- other_dir/ (ino 264)
    |---- wait_dir/ (ino 260)
      |---- finish_dir2/ (ino 261)
        |---- desc/ (ino 262)
          |---- ance/ (ino 263)
            |---- finish_dir1/ (ino 258)
              |---- pre/ (ino 259)
Here we can not rename 263 from ance to ance/finish_dir1/pre/wait_dir/finish_dir2/desc/ance
without the rename of inode 260, so that happen path loop.
So fix this by delaying directory renames for this case.

Signed-off-by: Robbie Ko <robbieko@synology.com>
---
 fs/btrfs/send.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index fbfbb8b..596b9dc 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3078,6 +3078,56 @@ static struct pending_dir_move *get_pending_dir_moves(struct send_ctx *sctx,
 	return NULL;
 }
 
+static int path_loop(struct send_ctx *sctx, struct fs_path *name,
+		     u64 ino, u64 gen, u64 *ancestor_ino)
+{
+	int ret = 0;
+	u64 parent_inode = 0;
+	u64 parent_gen = 0;
+	u64 start_ino = ino;
+
+	*ancestor_ino = 0;
+	while (ino != BTRFS_FIRST_FREE_OBJECTID) {
+		struct waiting_dir_move *wdm;
+		fs_path_reset(name);
+
+		if (is_waiting_for_rm(sctx, ino))
+			break;
+
+		wdm = get_waiting_dir_move(sctx, ino);
+		if (wdm) {
+			if (*ancestor_ino == 0)
+				*ancestor_ino = ino;
+			if (wdm->orphanized) {
+				ret = gen_unique_name(sctx, ino, gen, name);
+				break;
+			} else {
+				ret = get_first_ref(sctx->parent_root, ino,
+									&parent_inode, &parent_gen, name);
+			}
+		} else {
+			ret = __get_cur_name_and_parent(sctx, ino, gen,
+							&parent_inode,
+							&parent_gen, name);
+			if (ret > 0) {
+				ret = 0;
+				break;
+			}
+		}
+		if (ret < 0)
+			break;
+		if (parent_inode == start_ino) {
+			ret = 1;
+			if (*ancestor_ino == 0)
+				*ancestor_ino = ino;
+			break;
+		}
+		ino = parent_inode;
+		gen = parent_gen;
+	}
+	return ret;
+}
+
 static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
 {
 	struct fs_path *from_path = NULL;
@@ -3089,6 +3139,7 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
 	struct waiting_dir_move *dm = NULL;
 	u64 rmdir_ino = 0;
 	int ret;
+	u64 ancestor = 0;
 	bool is_orphan;
 
 	name = fs_path_alloc();
@@ -3122,6 +3173,22 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
 		goto out;
 
 	sctx->send_progress = sctx->cur_ino + 1;
+	ret = path_loop(sctx, name, pm->ino, pm->gen, &ancestor);
+	if (ret) {
+		LIST_HEAD(deleted_refs);
+		ASSERT(ancestor > BTRFS_FIRST_FREE_OBJECTID);
+		ret = add_pending_dir_move(sctx, pm->ino, pm->gen, ancestor,
+									&pm->update_refs, &deleted_refs,
+									is_orphan);
+		if (ret < 0)
+			goto out;
+		if (rmdir_ino) {
+			dm = get_waiting_dir_move(sctx, pm->ino);
+			ASSERT(dm);
+			dm->rmdir_ino = rmdir_ino;
+		}
+		goto out;
+	}
 	fs_path_reset(name);
 	to_path = name;
 	name = NULL;
@@ -3693,6 +3760,34 @@ verbose_printk("btrfs: process_recorded_refs %llu\n", sctx->cur_ino);
 		}
 
 		/*
+		 * if cur_ino is cur ancestor, can't move now,
+		 * find descendant who is waiting, waiting it.
+		 */
+		if (can_rename && (strncmp(valid_path->start, cur->full_path->start, fs_path_len(valid_path)) == 0) &&
+							fs_path_len(cur->full_path) > fs_path_len(valid_path) && cur->full_path->start[fs_path_len(valid_path)] == '/') {
+			struct fs_path *name = NULL;
+			u64 ancestor;
+			u64 old_send_progress = sctx->send_progress;
+
+			name = fs_path_alloc();
+			sctx->send_progress = sctx->cur_ino + 1;
+			ret = path_loop(sctx, name, sctx->cur_ino, sctx->cur_inode_gen, &ancestor);
+			if (ret) {
+				ret = add_pending_dir_move(sctx, sctx->cur_ino, sctx->cur_inode_gen,
+							ancestor, &sctx->new_refs, &sctx->deleted_refs, is_orphan);
+				if (ret < 0) {
+					sctx->send_progress = old_send_progress;
+					fs_path_free(name);
+					goto out;
+				}
+				can_rename = false;
+				*pending_move = 1;
+			}
+			sctx->send_progress = old_send_progress;
+			fs_path_free(name);
+		}
+
+		/*
 		 * link/move the ref to the new place. If we have an orphan
 		 * inode, move it and update valid_path. If not, link or move
 		 * it depending on the inode mode.
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/5] Btrfs: incremental send, fix orphan_dir_info not completely cleared
  2015-06-04 11:18 [PATCH 0/5] Btrfs incremental send fix serval case for rename and rm directory Robbie Ko
  2015-06-04 11:18 ` [PATCH 1/5] Btrfs: incremental send, avoid circular waiting and descendant overwrite ancestor need to update path Robbie Ko
  2015-06-04 11:18 ` [PATCH 2/5] Btrfs: incremental send, avoid ancestor rename to descendant Robbie Ko
@ 2015-06-04 11:18 ` Robbie Ko
  2015-06-04 16:24   ` Filipe David Manana
  2015-06-05 13:58   ` David Sterba
  2015-06-04 11:18 ` [PATCH 4/5] Btrfs: incremental send, fix rmdir but dir have a unprocess item Robbie Ko
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Robbie Ko @ 2015-06-04 11:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Robbie Ko

There's one case where we not clear orphan_dir_info issue.

Example:

Parent snapshot:
|---- a/ (ino 279)
  |---- c (ino 282)
|---- del/ (ino 281)
  |---- tmp/ (ino 280)
  |---- long/ (ino 283)
  |---- longlong/ (ino 284)

Send snapshot:
|---- a/ (ino 279)
  |---- long (ino 283)
  |---- longlong (ino 284)
|---- c/ (ino 282)
  |---- tmp/ (ino 280)

Here we process 281 use can_rmdir check, but 280 is waiting, so create orphan_dir_info
and when 282 is move to dest, so 280 can move to c/tmp, and now run can_rmdir check again.
Return is false, because 283 and 284 is unprocess, but now not release orphan_dir_info.
When 283 and 284 is processd, 281 be delete, but not delete orphan_dir_info.
So fix this by release orphan_dir_info for this case.

Signed-off-by: Robbie Ko <robbieko@synology.com>
---
 fs/btrfs/send.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 596b9dc..ff9d052 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -2785,12 +2785,6 @@ add_orphan_dir_info(struct send_ctx *sctx, u64 dir_ino)
 	struct rb_node *parent = NULL;
 	struct orphan_dir_info *entry, *odi;
 
-	odi = kmalloc(sizeof(*odi), GFP_NOFS);
-	if (!odi)
-		return ERR_PTR(-ENOMEM);
-	odi->ino = dir_ino;
-	odi->gen = 0;
-
 	while (*p) {
 		parent = *p;
 		entry = rb_entry(parent, struct orphan_dir_info, node);
@@ -2799,11 +2793,16 @@ add_orphan_dir_info(struct send_ctx *sctx, u64 dir_ino)
 		} else if (dir_ino > entry->ino) {
 			p = &(*p)->rb_right;
 		} else {
-			kfree(odi);
 			return entry;
 		}
 	}
 
+	odi = kmalloc(sizeof(*odi), GFP_NOFS);
+	if (!odi)
+		return ERR_PTR(-ENOMEM);
+	odi->ino = dir_ino;
+	odi->gen = 0;
+
 	rb_link_node(&odi->node, parent, p);
 	rb_insert_color(&odi->node, &sctx->orphan_dirs);
 	return odi;
@@ -2913,6 +2912,12 @@ static int can_rmdir(struct send_ctx *sctx, u64 dir, u64 dir_gen,
 		}
 
 		if (loc.objectid > send_progress) {
+			struct orphan_dir_info *odi;
+
+			odi = get_orphan_dir_info(sctx, dir);
+			if (odi) {
+				free_orphan_dir_info(sctx, odi);
+			}
 			ret = 0;
 			goto out;
 		}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/5] Btrfs: incremental send, fix rmdir but dir have a unprocess item
  2015-06-04 11:18 [PATCH 0/5] Btrfs incremental send fix serval case for rename and rm directory Robbie Ko
                   ` (2 preceding siblings ...)
  2015-06-04 11:18 ` [PATCH 3/5] Btrfs: incremental send, fix orphan_dir_info not completely cleared Robbie Ko
@ 2015-06-04 11:18 ` Robbie Ko
  2015-06-04 16:40   ` Filipe David Manana
  2015-06-04 11:18 ` [PATCH 5/5] Btrfs: incremental send, fix rmdir not send utimes Robbie Ko
  2015-06-04 13:04 ` [PATCH 0/5] Btrfs incremental send fix serval case for rename and rm directory Filipe David Manana
  5 siblings, 1 reply; 25+ messages in thread
From: Robbie Ko @ 2015-06-04 11:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Robbie Ko

There's one case where we can't rmdir issue.

Example:

Parent snapshot:
|---- a/ (ino 279)
  |---- c (ino 282)
|---- del/ (ino 281)
  |---- tmp/ (ino 280)
  |---- long/ (ino 283)

Send snapshot:
|---- a/ (ino 279)
  |---- long (ino 283)
|---- c/ (ino 282)
  |---- tmp/ (ino 280)

Here we process 281 use can_rmdir check, but 280 is waiting, so create orphan_dir_info
and when 282 is move to dest, so 280 can move to c/tmp, and now run can_rmdir check again.
Return is true, because sctx->cur_ino is 282 , and call can_rmdir(, sctx->cur_ino + 1)
so 283 is equal or lesser than (sctx->cur_ino + 1), not anything unprocess.
So fix this rmdir for this case.

Signed-off-by: Robbie Ko <robbieko@synology.com>
---
 fs/btrfs/send.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index ff9d052..e8eb3ab 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3213,7 +3213,7 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
 			/* already deleted */
 			goto finish;
 		}
-		ret = can_rmdir(sctx, rmdir_ino, odi->gen, sctx->cur_ino + 1);
+		ret = can_rmdir(sctx, rmdir_ino, odi->gen, sctx->cur_ino);
 		if (ret < 0)
 			goto out;
 		if (!ret)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/5] Btrfs: incremental send, fix rmdir not send utimes
  2015-06-04 11:18 [PATCH 0/5] Btrfs incremental send fix serval case for rename and rm directory Robbie Ko
                   ` (3 preceding siblings ...)
  2015-06-04 11:18 ` [PATCH 4/5] Btrfs: incremental send, fix rmdir but dir have a unprocess item Robbie Ko
@ 2015-06-04 11:18 ` Robbie Ko
  2015-06-04 16:14   ` Filipe David Manana
  2015-06-04 13:04 ` [PATCH 0/5] Btrfs incremental send fix serval case for rename and rm directory Filipe David Manana
  5 siblings, 1 reply; 25+ messages in thread
From: Robbie Ko @ 2015-06-04 11:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Robbie Ko

There's one case where we can't issue a utimes operation for a directory.
When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier
processed and update uime between two parent directory.
So fix this by not update non exist utimes for this case.

Example:

Parent snapshot:
|---- a/ (ino 259)
  |---- c (ino 264)
|---- b/ (ino 260)
  |---- d (ino 265)
|---- del/ (ino 263)
  |---- item1/ (ino 261)
  |---- item2/ (ino 262)

Send snapshot:
|---- a/ (ino 259)
|---- b/ (ino 260)
|---- c/ (ino 2)
  |---- item2 (ino 259)
|---- d/ (ino 257)
  |---- item1/ (ino 258)

Signed-off-by: Robbie Ko <robbieko@synology.com>
---
 fs/btrfs/send.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index e8eb3ab..46f954c 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino);
 	key.type = BTRFS_INODE_ITEM_KEY;
 	key.offset = 0;
 	ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0);
-	if (ret < 0)
+	if (ret != 0)
 		goto out;
 
 	eb = path->nodes[0];
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] Btrfs incremental send fix serval case for rename and rm directory
  2015-06-04 11:18 [PATCH 0/5] Btrfs incremental send fix serval case for rename and rm directory Robbie Ko
                   ` (4 preceding siblings ...)
  2015-06-04 11:18 ` [PATCH 5/5] Btrfs: incremental send, fix rmdir not send utimes Robbie Ko
@ 2015-06-04 13:04 ` Filipe David Manana
  5 siblings, 0 replies; 25+ messages in thread
From: Filipe David Manana @ 2015-06-04 13:04 UTC (permalink / raw)
  To: Robbie Ko; +Cc: linux-btrfs@vger.kernel.org

On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote:
> Patch for fix btrfs send receive. These patches base on v4.1-rc6-49-g8a7deb3
> plus following patches.
> [PATCH] Btrfs: incremental send, don't delay directory renames unnecessarily
> [PATCH] Btrfs: incremental send, check if orphanized dir inode needs delayed rename
>
> Thanks.
>
> Robbie Ko (5):
>   Btrfs: incremental send, avoid circular waiting and descendant
>     overwrite ancestor need to update path
>   Btrfs: incremental send, avoid ancestor rename to descendant
>   Btrfs: incremental send, fix orphan_dir_info not completely cleared
>   Btrfs: incremental send, fix rmdir but dir have a unprocess item
>   Btrfs: incremental send, fix rmdir not send utimes
>
>  fs/btrfs/send.c | 163 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 149 insertions(+), 14 deletions(-)

Thanks for doing this, a quick look over all the patches and they seem
ok, just some minor comments later.

Would you be willing to submit test cases for xfstests that cover all
these cases?
We don't want to get regressions in the future, and this particular
part of send is complex and messy, being easy to break it for some use
cases without noticing it, and xfstests [1] is the test suite
developers (and some QA people) use to validate their changes and
verify they aren't introducing regressions.

[1] https://git.kernel.org/cgit/fs/xfs/xfstests-dev.git/log/ and
mailing list: fstests@vger.kernel.org

>
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] Btrfs: incremental send, avoid circular waiting and descendant overwrite ancestor need to update path
  2015-06-04 11:18 ` [PATCH 1/5] Btrfs: incremental send, avoid circular waiting and descendant overwrite ancestor need to update path Robbie Ko
@ 2015-06-04 13:50   ` Filipe David Manana
  2015-06-04 19:19     ` Filipe David Manana
  0 siblings, 1 reply; 25+ messages in thread
From: Filipe David Manana @ 2015-06-04 13:50 UTC (permalink / raw)
  To: Robbie Ko; +Cc: linux-btrfs@vger.kernel.org

On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote:
> Base on [PATCH] Btrfs: incremental send, check if orphanized dir inode needs delayed rename
>
> Example1:
> There's one case where we can't issue a rename operation for a directory
> immediately when we process it.
>
> Parent snapshot:
> |---- d/ (ino 257)
>   |---- p1 (ino 258)
> |---- p1/ (ino 259)
>
> Send snapshot:
> |---- d/ (ino 257)
>   |---- p1 (ino 259)
>     |---- p1/ (ino 258)
>
> Here we can not rename 258 from d/p1 to p1/p1 without the rename of inode 259.
> p1 258 is put into wait_parent_move. 259 can't be rename to d/p1, so it is put into
> circular waiting happens.

"... into circular waiting happens" -> so 259's rename is delayed to
happen after 258's rename, which creates a circular dependency (258 ->
259 -> 258).

> This is fix by rename destination directory and set
> it as orphanized for this case.
>
> Example2:
> There's one case where we can't issue a rename operation for a directory
> immediately we process it.
> After moving 262 outside, path of 265 is stored in the name_cache_entry.
> When 263 try to overwrite 265, its ancestor, 265 is moved to orphanized. Path of 263
> is still the original path, however. This causes error.

For the sake of a more complete/informative change log, can you
mention what's the error?

>
> Parent snapshot:
> |---- a/ (ino 259)
>   |---- c (ino 266)
> |---- d/ (ino 260)
>   |---- ance (ino 265)
>     |---- e (ino 261)
>     |---- f (ino 262)
>     |---- ance (ino 263)
>
> Send snapshot:
> |---- a/ (ino 259)
> |---- c/ (ino 266)
>   |---- ance (ino 265)
> |---- d/ (ino 260)
>   |---- ance (ino 263)
> |---- f/ (ino 262)
>   |---- e (ino 261)
>
> Signed-off-by: Robbie Ko <robbieko@synology.com>
> ---
>  fs/btrfs/send.c | 45 ++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 40 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 1c1f161..fbfbb8b 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -230,7 +230,6 @@ struct pending_dir_move {
>         u64 parent_ino;
>         u64 ino;
>         u64 gen;
> -       bool is_orphan;
>         struct list_head update_refs;
>  };
>
> @@ -1840,7 +1839,7 @@ static int will_overwrite_ref(struct send_ctx *sctx, u64 dir, u64 dir_gen,
>          * was already unlinked/moved, so we can safely assume that we will not
>          * overwrite anything at this point in time.
>          */
> -       if (other_inode > sctx->send_progress) {
> +       if (other_inode > sctx->send_progress || is_waiting_for_move(sctx, other_inode)) {
>                 ret = get_inode_info(sctx->parent_root, other_inode, NULL,
>                                 who_gen, NULL, NULL, NULL, NULL);
>                 if (ret < 0)
> @@ -3014,7 +3013,6 @@ static int add_pending_dir_move(struct send_ctx *sctx,
>         pm->parent_ino = parent_ino;
>         pm->ino = ino;
>         pm->gen = ino_gen;
> -       pm->is_orphan = is_orphan;
>         INIT_LIST_HEAD(&pm->list);
>         INIT_LIST_HEAD(&pm->update_refs);
>         RB_CLEAR_NODE(&pm->node);
> @@ -3091,6 +3089,7 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
>         struct waiting_dir_move *dm = NULL;
>         u64 rmdir_ino = 0;
>         int ret;
> +       bool is_orphan;
>
>         name = fs_path_alloc();
>         from_path = fs_path_alloc();
> @@ -3102,9 +3101,10 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
>         dm = get_waiting_dir_move(sctx, pm->ino);
>         ASSERT(dm);
>         rmdir_ino = dm->rmdir_ino;
> +       is_orphan = dm->orphanized;
>         free_waiting_dir_move(sctx, dm);
>
> -       if (pm->is_orphan) {
> +       if (is_orphan) {
>                 ret = gen_unique_name(sctx, pm->ino,
>                                       pm->gen, from_path);
>         } else {
> @@ -3292,6 +3292,7 @@ static int wait_for_dest_dir_move(struct send_ctx *sctx,
>         u64 left_gen;
>         u64 right_gen;
>         int ret = 0;
> +       struct waiting_dir_move *wdm;
>
>         if (RB_EMPTY_ROOT(&sctx->waiting_dir_moves))
>                 return 0;
> @@ -3350,7 +3351,8 @@ static int wait_for_dest_dir_move(struct send_ctx *sctx,
>                 goto out;
>         }
>
> -       if (is_waiting_for_move(sctx, di_key.objectid)) {
> +       wdm = get_waiting_dir_move(sctx, di_key.objectid);
> +       if (wdm && !wdm->orphanized) {
>                 ret = add_pending_dir_move(sctx,
>                                            sctx->cur_ino,
>                                            sctx->cur_inode_gen,
> @@ -3610,11 +3612,33 @@ verbose_printk("btrfs: process_recorded_refs %llu\n", sctx->cur_ino);
>                                 goto out;
>                         if (ret) {
>                                 struct name_cache_entry *nce;
> +                               struct waiting_dir_move *wdm;
> +                               bool cur_is_ancestor = false;
> +
> +                               /*
> +                                * check is dset path is ancestor src path
> +                                * if yes, need to update cur_ino path
> +                                */

Typos/confusing comment and doesn't explain why the following check is
being done.

> +                               if (strncmp(cur->full_path->start, valid_path->start, fs_path_len(cur->full_path)) == 0 &&
> +                                                       fs_path_len(valid_path) > fs_path_len(cur->full_path) && valid_path->start[fs_path_len(cur->full_path)] == '/') {

At a first glance it seems confusing why we are comparing substrings
of an entire path instead of just the old and new names for the
current and the conflicting (ow_inode) inodes and their parent inode
numbers and generation. I think the comment should explain why.

Also please try to keep lines up to 80 characters (that line is 169
characters long).
You can run ./scripts/checkpatch.pl to validate your patch files and
warn you if the code doesn't comply to the coding standard.

> +                                       cur_is_ancestor = true;
> +                               }
>
>                                 ret = orphanize_inode(sctx, ow_inode, ow_gen,
>                                                 cur->full_path);
>                                 if (ret < 0)
>                                         goto out;
> +
> +                               /*
> +                                * check is waiting dir, if yes change the ino
> +                                * to orphanized in the waiting tree.
> +                                */
> +                               if (is_waiting_for_move(sctx, ow_inode)) {
> +                                       wdm = get_waiting_dir_move(sctx, ow_inode);
> +                                       ASSERT(wdm);
> +                                       wdm->orphanized = true;
> +                               }
> +
>                                 /*
>                                  * Make sure we clear our orphanized inode's
>                                  * name from the name cache. This is because the
> @@ -3630,6 +3654,17 @@ verbose_printk("btrfs: process_recorded_refs %llu\n", sctx->cur_ino);
>                                         name_cache_delete(sctx, nce);
>                                         kfree(nce);
>                                 }
> +
> +                               /*
> +                                * if ow_inode is ancestor cur_ino, need to update
> +                                * update cur_ino path.
> +                                */

"If ow_inode is an ancestor of cur_ino in the send snapshot, update
valid_path because ow_inode was orphanized and valid_path contains its
pre-orphanization name, which is not valid anymore".

> +                               if (cur_is_ancestor) {
> +                                       fs_path_reset(valid_path);
> +                                       ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, valid_path);
> +                                       if (ret < 0)
> +                                               goto out;
> +                               }
>                         } else {
>                                 ret = send_unlink(sctx, cur->full_path);
>                                 if (ret < 0)
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] Btrfs: incremental send, avoid ancestor rename to descendant
  2015-06-04 11:18 ` [PATCH 2/5] Btrfs: incremental send, avoid ancestor rename to descendant Robbie Ko
@ 2015-06-04 15:43   ` Filipe David Manana
  2015-06-05 11:18     ` Robbie Ko
  0 siblings, 1 reply; 25+ messages in thread
From: Filipe David Manana @ 2015-06-04 15:43 UTC (permalink / raw)
  To: Robbie Ko; +Cc: linux-btrfs@vger.kernel.org

On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote:
> Base on [PATCH] Btrfs: incremental send, don't delay directory renames unnecessarily
>
> There's one case where we can't issue a rename operation for a directory

Should be "There's one more case where..." - we already have at least
a handful of cases where we can't rename a directory as soon as we
finish processing it :)

> as soon as we process it. We move a directory from ancestor to descendant,
> if not wait_parent_move or wait_for_dest_dir_move so that happen path loop.

Can you elaborate here? I think this doesn't explain what's the
problem and why it happens.

What does "move a directory from ancestor to descendant" means? Does
it mean renaming it from the name in the parent snapshot to its name
in the send snapshot? Or something else? Ancestor of which inode and
descendant of which inode?

Mentioning "if not wait_parent_mobe or wait_for_dest_dir" isn't
helpful either, it doesn't tells us what cases those functions cover
and which case they are missing, nor does it explain why we get into a
path building loop problem (I assume that's what you mean with "so
that happen path loop").

>
> Example:
>
> Parent snapshot:
> |---- @tmp/ (ino 257)
> |---- pre/ (ino 259)
>   |---- wait_dir (ino 260)
>     |---- finish_dir2/ (ino 261)
> |---- ance/ (ino 263)
>   |---- finish_dir1/ (ino 258)
> |---- desc/ (ino 262)
> |---- other_dir/ (ino 264)
>
> Send snapshot:
> |---- @tmp/ (ino 257)
>   |---- other_dir/ (ino 264)
>     |---- wait_dir/ (ino 260)
>       |---- finish_dir2/ (ino 261)
>         |---- desc/ (ino 262)
>           |---- ance/ (ino 263)
>             |---- finish_dir1/ (ino 258)
>               |---- pre/ (ino 259)
> Here we can not rename 263 from ance to ance/finish_dir1/pre/wait_dir/finish_dir2/desc/ance
> without the rename of inode 260, so that happen path loop.
> So fix this by delaying directory renames for this case.

By delaying directory renames for which directories? And why exactly?

You should explain why the path loop happens and why a rename of inode
263 from "ance" to
"ance/finish_dir1/pre/wait_dir/finish_dir2/desc/ance" is attempted,
what is the bug in the logic of an incremental send that causes such
rename to be performed.

I haven't tried it here, but it seems weird you get 2 problems here:
the path building loop and an incorrect rename operation - for all the
previously solved bugs we had either one or the other happening, but
not both. Unless the "path loop" issue you mention is not the infinite
path build loop that leads to an -ENOMEM error when the path we're
building exceeds PATH_MAX, but instead some other new problem.

The commit message should really explain what causes the problem and
the fix in detail.

>
> Signed-off-by: Robbie Ko <robbieko@synology.com>
> ---
>  fs/btrfs/send.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 95 insertions(+)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index fbfbb8b..596b9dc 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -3078,6 +3078,56 @@ static struct pending_dir_move *get_pending_dir_moves(struct send_ctx *sctx,
>         return NULL;
>  }
>
> +static int path_loop(struct send_ctx *sctx, struct fs_path *name,
> +                    u64 ino, u64 gen, u64 *ancestor_ino)
> +{
> +       int ret = 0;
> +       u64 parent_inode = 0;
> +       u64 parent_gen = 0;
> +       u64 start_ino = ino;
> +
> +       *ancestor_ino = 0;
> +       while (ino != BTRFS_FIRST_FREE_OBJECTID) {
> +               struct waiting_dir_move *wdm;
> +               fs_path_reset(name);
> +
> +               if (is_waiting_for_rm(sctx, ino))
> +                       break;
> +
> +               wdm = get_waiting_dir_move(sctx, ino);
> +               if (wdm) {
> +                       if (*ancestor_ino == 0)
> +                               *ancestor_ino = ino;
> +                       if (wdm->orphanized) {
> +                               ret = gen_unique_name(sctx, ino, gen, name);
> +                               break;
> +                       } else {
> +                               ret = get_first_ref(sctx->parent_root, ino,
> +                                                                       &parent_inode, &parent_gen, name);
> +                       }
> +               } else {
> +                       ret = __get_cur_name_and_parent(sctx, ino, gen,
> +                                                       &parent_inode,
> +                                                       &parent_gen, name);
> +                       if (ret > 0) {
> +                               ret = 0;
> +                               break;
> +                       }
> +               }
> +               if (ret < 0)
> +                       break;
> +               if (parent_inode == start_ino) {
> +                       ret = 1;
> +                       if (*ancestor_ino == 0)
> +                               *ancestor_ino = ino;
> +                       break;
> +               }
> +               ino = parent_inode;
> +               gen = parent_gen;
> +       }
> +       return ret;
> +}
> +
>  static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
>  {
>         struct fs_path *from_path = NULL;
> @@ -3089,6 +3139,7 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
>         struct waiting_dir_move *dm = NULL;
>         u64 rmdir_ino = 0;
>         int ret;
> +       u64 ancestor = 0;
>         bool is_orphan;
>
>         name = fs_path_alloc();
> @@ -3122,6 +3173,22 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
>                 goto out;
>
>         sctx->send_progress = sctx->cur_ino + 1;
> +       ret = path_loop(sctx, name, pm->ino, pm->gen, &ancestor);
> +       if (ret) {
> +               LIST_HEAD(deleted_refs);
> +               ASSERT(ancestor > BTRFS_FIRST_FREE_OBJECTID);
> +               ret = add_pending_dir_move(sctx, pm->ino, pm->gen, ancestor,
> +                                                                       &pm->update_refs, &deleted_refs,
> +                                                                       is_orphan);
> +               if (ret < 0)
> +                       goto out;
> +               if (rmdir_ino) {
> +                       dm = get_waiting_dir_move(sctx, pm->ino);
> +                       ASSERT(dm);
> +                       dm->rmdir_ino = rmdir_ino;
> +               }
> +               goto out;
> +       }

So far you're basically reverting this change:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5f806c3ae2ff6263a10a6901f97abb74dac03d36

That should be a separate 'revert' patch in the series.


>         fs_path_reset(name);
>         to_path = name;
>         name = NULL;
> @@ -3693,6 +3760,34 @@ verbose_printk("btrfs: process_recorded_refs %llu\n", sctx->cur_ino);
>                 }
>
>                 /*
> +                * if cur_ino is cur ancestor, can't move now,
> +                * find descendant who is waiting, waiting it.
> +                */

This comment is confusing too. cur_ino is ancestor of itself, or
ancestor of which inode? Plus the code below is looking for an
ancestor while the comment mentions finding a descendant.

> +               if (can_rename && (strncmp(valid_path->start, cur->full_path->start, fs_path_len(valid_path)) == 0) &&
> +                                                       fs_path_len(cur->full_path) > fs_path_len(valid_path) && cur->full_path->start[fs_path_len(valid_path)] == '/') {

Same comment as in the first patch, too long line.
Also given that this check (second condition) is being repeated in 2
different places, it should be encapsulated in a helper function.

> +                       struct fs_path *name = NULL;
> +                       u64 ancestor;
> +                       u64 old_send_progress = sctx->send_progress;
> +
> +                       name = fs_path_alloc();

Allocation can fail, it can return NULL, need to return -ENOMEM in such case.

thanks

> +                       sctx->send_progress = sctx->cur_ino + 1;
> +                       ret = path_loop(sctx, name, sctx->cur_ino, sctx->cur_inode_gen, &ancestor);
> +                       if (ret) {
> +                               ret = add_pending_dir_move(sctx, sctx->cur_ino, sctx->cur_inode_gen,
> +                                                       ancestor, &sctx->new_refs, &sctx->deleted_refs, is_orphan);
> +                               if (ret < 0) {
> +                                       sctx->send_progress = old_send_progress;
> +                                       fs_path_free(name);
> +                                       goto out;
> +                               }
> +                               can_rename = false;
> +                               *pending_move = 1;
> +                       }
> +                       sctx->send_progress = old_send_progress;
> +                       fs_path_free(name);
> +               }
> +
> +               /*
>                  * link/move the ref to the new place. If we have an orphan
>                  * inode, move it and update valid_path. If not, link or move
>                  * it depending on the inode mode.
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] Btrfs: incremental send, fix rmdir not send utimes
  2015-06-04 11:18 ` [PATCH 5/5] Btrfs: incremental send, fix rmdir not send utimes Robbie Ko
@ 2015-06-04 16:14   ` Filipe David Manana
  2015-06-08  3:44     ` Robbie Ko
  0 siblings, 1 reply; 25+ messages in thread
From: Filipe David Manana @ 2015-06-04 16:14 UTC (permalink / raw)
  To: Robbie Ko; +Cc: linux-btrfs@vger.kernel.org

On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote:
> There's one case where we can't issue a utimes operation for a directory.
> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier
> processed and update uime between two parent directory.
> So fix this by not update non exist utimes for this case.

So you mean that we attempt to update utimes for an inode,
corresponding to a directory, that exists in the parent snapshot but
not in the send snapshot.

So the subject should be something like "Btrfs: incremental send,
don't send utimes for non-existing directory" instead of "Btrfs:
incremental send, fix rmdir not send utimes"

>
> Example:
>
> Parent snapshot:
> |---- a/ (ino 259)
>   |---- c (ino 264)
> |---- b/ (ino 260)
>   |---- d (ino 265)
> |---- del/ (ino 263)
>   |---- item1/ (ino 261)
>   |---- item2/ (ino 262)
>
> Send snapshot:
> |---- a/ (ino 259)
> |---- b/ (ino 260)
> |---- c/ (ino 2)
>   |---- item2 (ino 259)
> |---- d/ (ino 257)
>   |---- item1/ (ino 258)
>
> Signed-off-by: Robbie Ko <robbieko@synology.com>
> ---
>  fs/btrfs/send.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index e8eb3ab..46f954c 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino);
>         key.type = BTRFS_INODE_ITEM_KEY;
>         key.offset = 0;
>         ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0);
> -       if (ret < 0)
> +       if (ret != 0)
>                 goto out;

So I don't think this is a good fix. The problem is in some code that
calls this function (send_utimes) against the directory that doesn't
exist - it just shouldn't do that, its logic should be fixed.
Following this approach, while it works, it's just hiding logic errors
in one or more code paths, and none of its callers checks for a return
value of 1 - they only react to values < 0 and introduces the
possibility of propagating a return value of 1 to user space.

thanks

>
>         eb = path->nodes[0];
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] Btrfs: incremental send, fix orphan_dir_info not completely cleared
  2015-06-04 11:18 ` [PATCH 3/5] Btrfs: incremental send, fix orphan_dir_info not completely cleared Robbie Ko
@ 2015-06-04 16:24   ` Filipe David Manana
  2015-06-05 13:58   ` David Sterba
  1 sibling, 0 replies; 25+ messages in thread
From: Filipe David Manana @ 2015-06-04 16:24 UTC (permalink / raw)
  To: Robbie Ko; +Cc: linux-btrfs@vger.kernel.org

On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote:
> There's one case where we not clear orphan_dir_info issue.

You mean where we leak a orphan_dir_info structure.

>
> Example:
>
> Parent snapshot:
> |---- a/ (ino 279)
>   |---- c (ino 282)
> |---- del/ (ino 281)
>   |---- tmp/ (ino 280)
>   |---- long/ (ino 283)
>   |---- longlong/ (ino 284)
>
> Send snapshot:
> |---- a/ (ino 279)
>   |---- long (ino 283)
>   |---- longlong (ino 284)
> |---- c/ (ino 282)
>   |---- tmp/ (ino 280)
>
> Here we process 281 use can_rmdir check, but 280 is waiting, so create orphan_dir_info
> and when 282 is move to dest, so 280 can move to c/tmp, and now run can_rmdir check again.
> Return is false, because 283 and 284 is unprocess, but now not release orphan_dir_info.
> When 283 and 284 is processd, 281 be delete, but not delete orphan_dir_info.
> So fix this by release orphan_dir_info for this case.

Could be described more generically as freeing an existing
orphan_dir_info for a directory, when we realize we can't rmdir the
directory because it has a descendant that wasn't yet processed, and
the orphan_dir_info was created because it had a descendant that had
its rename operation delayed.

>
> Signed-off-by: Robbie Ko <robbieko@synology.com>
> ---
>  fs/btrfs/send.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 596b9dc..ff9d052 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -2785,12 +2785,6 @@ add_orphan_dir_info(struct send_ctx *sctx, u64 dir_ino)
>         struct rb_node *parent = NULL;
>         struct orphan_dir_info *entry, *odi;
>
> -       odi = kmalloc(sizeof(*odi), GFP_NOFS);
> -       if (!odi)
> -               return ERR_PTR(-ENOMEM);
> -       odi->ino = dir_ino;
> -       odi->gen = 0;
> -
>         while (*p) {
>                 parent = *p;
>                 entry = rb_entry(parent, struct orphan_dir_info, node);
> @@ -2799,11 +2793,16 @@ add_orphan_dir_info(struct send_ctx *sctx, u64 dir_ino)
>                 } else if (dir_ino > entry->ino) {
>                         p = &(*p)->rb_right;
>                 } else {
> -                       kfree(odi);
>                         return entry;
>                 }
>         }
>
> +       odi = kmalloc(sizeof(*odi), GFP_NOFS);
> +       if (!odi)
> +               return ERR_PTR(-ENOMEM);
> +       odi->ino = dir_ino;
> +       odi->gen = 0;
> +

All the above changes don't fix the issue described in this change -
the memory leak - they just avoid the overhead of allocating an
orphan_dir_info object unnecessarily.

The change is ok, but should be a separate patch in the series that
does only that.

>         rb_link_node(&odi->node, parent, p);
>         rb_insert_color(&odi->node, &sctx->orphan_dirs);
>         return odi;
> @@ -2913,6 +2912,12 @@ static int can_rmdir(struct send_ctx *sctx, u64 dir, u64 dir_gen,
>                 }
>
>                 if (loc.objectid > send_progress) {
> +                       struct orphan_dir_info *odi;
> +
> +                       odi = get_orphan_dir_info(sctx, dir);
> +                       if (odi) {
> +                               free_orphan_dir_info(sctx, odi);
> +                       }

Looks correct, great catch.

Thanks.

>                         ret = 0;
>                         goto out;
>                 }
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] Btrfs: incremental send, fix rmdir but dir have a unprocess item
  2015-06-04 11:18 ` [PATCH 4/5] Btrfs: incremental send, fix rmdir but dir have a unprocess item Robbie Ko
@ 2015-06-04 16:40   ` Filipe David Manana
  0 siblings, 0 replies; 25+ messages in thread
From: Filipe David Manana @ 2015-06-04 16:40 UTC (permalink / raw)
  To: Robbie Ko; +Cc: linux-btrfs@vger.kernel.org

On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote:
> There's one case where we can't rmdir issue.

"There's one case where we attempt to rmdir a directory prematurely."

>
> Example:
>
> Parent snapshot:
> |---- a/ (ino 279)
>   |---- c (ino 282)
> |---- del/ (ino 281)
>   |---- tmp/ (ino 280)
>   |---- long/ (ino 283)
>
> Send snapshot:
> |---- a/ (ino 279)
>   |---- long (ino 283)
> |---- c/ (ino 282)
>   |---- tmp/ (ino 280)
>
> Here we process 281 use can_rmdir check, but 280 is waiting, so create orphan_dir_info
> and when 282 is move to dest, so 280 can move to c/tmp, and now run can_rmdir check again.

> Return is true, because sctx->cur_ino is 282 , and call can_rmdir(, sctx->cur_ino + 1)
> so 283 is equal or lesser than (sctx->cur_ino + 1), not anything unprocess.

We pass 283 (sctx->cur_ino + 1) as the send_progress to the
can_rmdir() function and that makes it return true when it shouldn't,
because the inode 283 wasn't processed yet and it's still a child of
the directory with inode number 281, which makes the receiver run into
an ENOTEMPTY error when attempting to remove the directory.

> So fix this rmdir for this case.
>
> Signed-off-by: Robbie Ko <robbieko@synology.com>
> ---
>  fs/btrfs/send.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index ff9d052..e8eb3ab 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -3213,7 +3213,7 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
>                         /* already deleted */
>                         goto finish;
>                 }
> -               ret = can_rmdir(sctx, rmdir_ino, odi->gen, sctx->cur_ino + 1);
> +               ret = can_rmdir(sctx, rmdir_ino, odi->gen, sctx->cur_ino);

Looks good, great catch.

Thanks.

>                 if (ret < 0)
>                         goto out;
>                 if (!ret)
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] Btrfs: incremental send, avoid circular waiting and descendant overwrite ancestor need to update path
  2015-06-04 13:50   ` Filipe David Manana
@ 2015-06-04 19:19     ` Filipe David Manana
  2015-06-05  3:55       ` Robbie Ko
  0 siblings, 1 reply; 25+ messages in thread
From: Filipe David Manana @ 2015-06-04 19:19 UTC (permalink / raw)
  To: Robbie Ko; +Cc: linux-btrfs@vger.kernel.org

On Thu, Jun 4, 2015 at 2:50 PM, Filipe David Manana <fdmanana@gmail.com> wrote:
> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote:
>> Base on [PATCH] Btrfs: incremental send, check if orphanized dir inode needs delayed rename
>>
>> Example1:
>> There's one case where we can't issue a rename operation for a directory
>> immediately when we process it.
>>
>> Parent snapshot:
>> |---- d/ (ino 257)
>>   |---- p1 (ino 258)
>> |---- p1/ (ino 259)
>>
>> Send snapshot:
>> |---- d/ (ino 257)
>>   |---- p1 (ino 259)
>>     |---- p1/ (ino 258)
>>
>> Here we can not rename 258 from d/p1 to p1/p1 without the rename of inode 259.
>> p1 258 is put into wait_parent_move. 259 can't be rename to d/p1, so it is put into
>> circular waiting happens.
>
> "... into circular waiting happens" -> so 259's rename is delayed to
> happen after 258's rename, which creates a circular dependency (258 ->
> 259 -> 258).
>
>> This is fix by rename destination directory and set
>> it as orphanized for this case.
>>
>> Example2:
>> There's one case where we can't issue a rename operation for a directory
>> immediately we process it.
>> After moving 262 outside, path of 265 is stored in the name_cache_entry.
>> When 263 try to overwrite 265, its ancestor, 265 is moved to orphanized. Path of 263
>> is still the original path, however. This causes error.
>
> For the sake of a more complete/informative change log, can you
> mention what's the error?
>
>>
>> Parent snapshot:
>> |---- a/ (ino 259)
>>   |---- c (ino 266)
>> |---- d/ (ino 260)
>>   |---- ance (ino 265)
>>     |---- e (ino 261)
>>     |---- f (ino 262)
>>     |---- ance (ino 263)
>>
>> Send snapshot:
>> |---- a/ (ino 259)
>> |---- c/ (ino 266)
>>   |---- ance (ino 265)
>> |---- d/ (ino 260)
>>   |---- ance (ino 263)
>> |---- f/ (ino 262)
>>   |---- e (ino 261)
>>
>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>> ---
>>  fs/btrfs/send.c | 45 ++++++++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 40 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>> index 1c1f161..fbfbb8b 100644
>> --- a/fs/btrfs/send.c
>> +++ b/fs/btrfs/send.c
>> @@ -230,7 +230,6 @@ struct pending_dir_move {
>>         u64 parent_ino;
>>         u64 ino;
>>         u64 gen;
>> -       bool is_orphan;
>>         struct list_head update_refs;
>>  };
>>
>> @@ -1840,7 +1839,7 @@ static int will_overwrite_ref(struct send_ctx *sctx, u64 dir, u64 dir_gen,
>>          * was already unlinked/moved, so we can safely assume that we will not
>>          * overwrite anything at this point in time.
>>          */
>> -       if (other_inode > sctx->send_progress) {
>> +       if (other_inode > sctx->send_progress || is_waiting_for_move(sctx, other_inode)) {
>>                 ret = get_inode_info(sctx->parent_root, other_inode, NULL,
>>                                 who_gen, NULL, NULL, NULL, NULL);
>>                 if (ret < 0)
>> @@ -3014,7 +3013,6 @@ static int add_pending_dir_move(struct send_ctx *sctx,
>>         pm->parent_ino = parent_ino;
>>         pm->ino = ino;
>>         pm->gen = ino_gen;
>> -       pm->is_orphan = is_orphan;
>>         INIT_LIST_HEAD(&pm->list);
>>         INIT_LIST_HEAD(&pm->update_refs);
>>         RB_CLEAR_NODE(&pm->node);
>> @@ -3091,6 +3089,7 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
>>         struct waiting_dir_move *dm = NULL;
>>         u64 rmdir_ino = 0;
>>         int ret;
>> +       bool is_orphan;
>>
>>         name = fs_path_alloc();
>>         from_path = fs_path_alloc();
>> @@ -3102,9 +3101,10 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
>>         dm = get_waiting_dir_move(sctx, pm->ino);
>>         ASSERT(dm);
>>         rmdir_ino = dm->rmdir_ino;
>> +       is_orphan = dm->orphanized;
>>         free_waiting_dir_move(sctx, dm);
>>
>> -       if (pm->is_orphan) {
>> +       if (is_orphan) {
>>                 ret = gen_unique_name(sctx, pm->ino,
>>                                       pm->gen, from_path);
>>         } else {
>> @@ -3292,6 +3292,7 @@ static int wait_for_dest_dir_move(struct send_ctx *sctx,
>>         u64 left_gen;
>>         u64 right_gen;
>>         int ret = 0;
>> +       struct waiting_dir_move *wdm;
>>
>>         if (RB_EMPTY_ROOT(&sctx->waiting_dir_moves))
>>                 return 0;
>> @@ -3350,7 +3351,8 @@ static int wait_for_dest_dir_move(struct send_ctx *sctx,
>>                 goto out;
>>         }
>>
>> -       if (is_waiting_for_move(sctx, di_key.objectid)) {
>> +       wdm = get_waiting_dir_move(sctx, di_key.objectid);
>> +       if (wdm && !wdm->orphanized) {
>>                 ret = add_pending_dir_move(sctx,
>>                                            sctx->cur_ino,
>>                                            sctx->cur_inode_gen,
>> @@ -3610,11 +3612,33 @@ verbose_printk("btrfs: process_recorded_refs %llu\n", sctx->cur_ino);
>>                                 goto out;
>>                         if (ret) {
>>                                 struct name_cache_entry *nce;
>> +                               struct waiting_dir_move *wdm;
>> +                               bool cur_is_ancestor = false;
>> +
>> +                               /*
>> +                                * check is dset path is ancestor src path
>> +                                * if yes, need to update cur_ino path
>> +                                */
>
> Typos/confusing comment and doesn't explain why the following check is
> being done.
>
>> +                               if (strncmp(cur->full_path->start, valid_path->start, fs_path_len(cur->full_path)) == 0 &&
>> +                                                       fs_path_len(valid_path) > fs_path_len(cur->full_path) && valid_path->start[fs_path_len(cur->full_path)] == '/') {
>
> At a first glance it seems confusing why we are comparing substrings
> of an entire path instead of just the old and new names for the
> current and the conflicting (ow_inode) inodes and their parent inode
> numbers and generation. I think the comment should explain why.

So you can determine if ow_inode is an ancestor of cur_ino in the
parent snapshot using is_ancestor(), which is less error prone than
comparing partial path strings and consistent with the existing code
base, as it takes into account inode and generation numbers for each
path component.

E.g. the following patch on top of your patch makes at least the 2nd
scenario in the commit message work as well (I've fixed long lines and
the comment as well, also pasted at
https://friendpaste.com/KEIIBZ8H1F1t3trdjv0bH). What do you think?

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 3c38879..d909892 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3629,16 +3629,7 @@ verbose_printk("btrfs: process_recorded_refs
%llu\n", sctx->cur_ino);
                        if (ret) {
                                struct name_cache_entry *nce;
                                struct waiting_dir_move *wdm;
-                               bool cur_is_ancestor = false;
-
-                               /*
-                                * check is dset path is ancestor src path
-                                * if yes, need to update cur_ino path
-                                */
-                               if (strncmp(cur->full_path->start,
valid_path->start, fs_path_len(cur->full_path)) == 0 &&
-
fs_path_len(valid_path) > fs_path_len(cur->full_path) &&
valid_path->start[fs_path_len(cur->full_path)] == '/') {
-                                       cur_is_ancestor = true;
-                               }
+                               struct fs_path *tmp_path;

                                ret = orphanize_inode(sctx, ow_inode, ow_gen,
                                                cur->full_path);
@@ -3672,12 +3663,27 @@ verbose_printk("btrfs: process_recorded_refs
%llu\n", sctx->cur_ino);
                                }

                                /*
-                                * if ow_inode is ancestor cur_ino,
need to update
-                                * update cur_ino path.
+                                * If ow_inode is an ancestor of cur_ino in the
+                                * parent root, compute valid_path again because
+                                * it contains the pre-orphanization name of
+                                * ow_inode, which is no longer valid.
                                 */
-                               if (cur_is_ancestor) {
+                               tmp_path = fs_path_alloc();
+                               if (!tmp_path) {
+                                       ret = -ENOMEM;
+                                       goto out;
+                               }
+                               ret = is_ancestor(sctx->parent_root,
+                                                 ow_inode, ow_gen,
+                                                 sctx->cur_ino, tmp_path);
+                               fs_path_free(tmp_path);
+                               if (ret < 0)
+                                       goto out;
+                               if (ret == 1) {
                                        fs_path_reset(valid_path);
-                                       ret = get_cur_path(sctx,
sctx->cur_ino, sctx->cur_inode_gen, valid_path);
+                                       ret = get_cur_path(sctx, sctx->cur_ino,
+                                                          sctx->cur_inode_gen,
+                                                          valid_path);
                                        if (ret < 0)
                                                goto out;
                                }

>
> Also please try to keep lines up to 80 characters (that line is 169
> characters long).
> You can run ./scripts/checkpatch.pl to validate your patch files and
> warn you if the code doesn't comply to the coding standard.
>
>> +                                       cur_is_ancestor = true;
>> +                               }
>>
>>                                 ret = orphanize_inode(sctx, ow_inode, ow_gen,
>>                                                 cur->full_path);
>>                                 if (ret < 0)
>>                                         goto out;
>> +
>> +                               /*
>> +                                * check is waiting dir, if yes change the ino
>> +                                * to orphanized in the waiting tree.
>> +                                */
>> +                               if (is_waiting_for_move(sctx, ow_inode)) {
>> +                                       wdm = get_waiting_dir_move(sctx, ow_inode);
>> +                                       ASSERT(wdm);
>> +                                       wdm->orphanized = true;
>> +                               }
>> +
>>                                 /*
>>                                  * Make sure we clear our orphanized inode's
>>                                  * name from the name cache. This is because the
>> @@ -3630,6 +3654,17 @@ verbose_printk("btrfs: process_recorded_refs %llu\n", sctx->cur_ino);
>>                                         name_cache_delete(sctx, nce);
>>                                         kfree(nce);
>>                                 }
>> +
>> +                               /*
>> +                                * if ow_inode is ancestor cur_ino, need to update
>> +                                * update cur_ino path.
>> +                                */
>
> "If ow_inode is an ancestor of cur_ino in the send snapshot, update
> valid_path because ow_inode was orphanized and valid_path contains its
> pre-orphanization name, which is not valid anymore".
>
>> +                               if (cur_is_ancestor) {
>> +                                       fs_path_reset(valid_path);
>> +                                       ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, valid_path);
>> +                                       if (ret < 0)
>> +                                               goto out;
>> +                               }
>>                         } else {
>>                                 ret = send_unlink(sctx, cur->full_path);
>>                                 if (ret < 0)
>> --
>> 1.9.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] Btrfs: incremental send, avoid circular waiting and descendant overwrite ancestor need to update path
  2015-06-04 19:19     ` Filipe David Manana
@ 2015-06-05  3:55       ` Robbie Ko
  2015-06-05  8:46         ` Filipe David Manana
  0 siblings, 1 reply; 25+ messages in thread
From: Robbie Ko @ 2015-06-05  3:55 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs@vger.kernel.org

Hi Filipe,

There is another case for  2nd scenario where is_ancestor() can't be used.

Parent snapshot:
|---- a/ (ino 261)
  |---- c (ino 267)
|---- d/ (ino 259)
  |---- ance/ (ino 266)
    |---- waiting_dir/ (ino 262)
|---- pre/ (ino 264)
  |---- ance/ (ino 265)

Send snapshot:
|---- a/ (ino 261)
  |---- ance/ (ino 266)
|---- c (ino 267)
  |---- waiting_dir/ (ino 262)
    |---- pre/ (ino 264)
|---- d/ (ino 259)
  |---- ance/ (ino 265)

First, 262 can't move to c/waiting_dir without the rename of inode 267.
Second, 264 can move into dir 262. Although 262 is waiting, 264 is not
parent of 262 in the parent root.
(The second behavior will happen after applying "[PATCH] Btrfs:
incremental send, don't delay directory renames unnecessarily")
Finally, 265 will overwrite 266 and path for 265 should be updated
since 266 is not the ancestor of 265.
Here we need to check the current state of tree rather than parent
root which  is_ancestor function does.

Thanks
Robbie Ko

2015-06-05 3:19 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
> On Thu, Jun 4, 2015 at 2:50 PM, Filipe David Manana <fdmanana@gmail.com> wrote:
>> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote:
>>> Base on [PATCH] Btrfs: incremental send, check if orphanized dir inode needs delayed rename
>>>
>>> Example1:
>>> There's one case where we can't issue a rename operation for a directory
>>> immediately when we process it.
>>>
>>> Parent snapshot:
>>> |---- d/ (ino 257)
>>>   |---- p1 (ino 258)
>>> |---- p1/ (ino 259)
>>>
>>> Send snapshot:
>>> |---- d/ (ino 257)
>>>   |---- p1 (ino 259)
>>>     |---- p1/ (ino 258)
>>>
>>> Here we can not rename 258 from d/p1 to p1/p1 without the rename of inode 259.
>>> p1 258 is put into wait_parent_move. 259 can't be rename to d/p1, so it is put into
>>> circular waiting happens.
>>
>> "... into circular waiting happens" -> so 259's rename is delayed to
>> happen after 258's rename, which creates a circular dependency (258 ->
>> 259 -> 258).
>>
>>> This is fix by rename destination directory and set
>>> it as orphanized for this case.
>>>
>>> Example2:
>>> There's one case where we can't issue a rename operation for a directory
>>> immediately we process it.
>>> After moving 262 outside, path of 265 is stored in the name_cache_entry.
>>> When 263 try to overwrite 265, its ancestor, 265 is moved to orphanized. Path of 263
>>> is still the original path, however. This causes error.
>>
>> For the sake of a more complete/informative change log, can you
>> mention what's the error?
>>
>>>
>>> Parent snapshot:
>>> |---- a/ (ino 259)
>>>   |---- c (ino 266)
>>> |---- d/ (ino 260)
>>>   |---- ance (ino 265)
>>>     |---- e (ino 261)
>>>     |---- f (ino 262)
>>>     |---- ance (ino 263)
>>>
>>> Send snapshot:
>>> |---- a/ (ino 259)
>>> |---- c/ (ino 266)
>>>   |---- ance (ino 265)
>>> |---- d/ (ino 260)
>>>   |---- ance (ino 263)
>>> |---- f/ (ino 262)
>>>   |---- e (ino 261)
>>>
>>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>>> ---
>>>  fs/btrfs/send.c | 45 ++++++++++++++++++++++++++++++++++++++++-----
>>>  1 file changed, 40 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>>> index 1c1f161..fbfbb8b 100644
>>> --- a/fs/btrfs/send.c
>>> +++ b/fs/btrfs/send.c
>>> @@ -230,7 +230,6 @@ struct pending_dir_move {
>>>         u64 parent_ino;
>>>         u64 ino;
>>>         u64 gen;
>>> -       bool is_orphan;
>>>         struct list_head update_refs;
>>>  };
>>>
>>> @@ -1840,7 +1839,7 @@ static int will_overwrite_ref(struct send_ctx *sctx, u64 dir, u64 dir_gen,
>>>          * was already unlinked/moved, so we can safely assume that we will not
>>>          * overwrite anything at this point in time.
>>>          */
>>> -       if (other_inode > sctx->send_progress) {
>>> +       if (other_inode > sctx->send_progress || is_waiting_for_move(sctx, other_inode)) {
>>>                 ret = get_inode_info(sctx->parent_root, other_inode, NULL,
>>>                                 who_gen, NULL, NULL, NULL, NULL);
>>>                 if (ret < 0)
>>> @@ -3014,7 +3013,6 @@ static int add_pending_dir_move(struct send_ctx *sctx,
>>>         pm->parent_ino = parent_ino;
>>>         pm->ino = ino;
>>>         pm->gen = ino_gen;
>>> -       pm->is_orphan = is_orphan;
>>>         INIT_LIST_HEAD(&pm->list);
>>>         INIT_LIST_HEAD(&pm->update_refs);
>>>         RB_CLEAR_NODE(&pm->node);
>>> @@ -3091,6 +3089,7 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
>>>         struct waiting_dir_move *dm = NULL;
>>>         u64 rmdir_ino = 0;
>>>         int ret;
>>> +       bool is_orphan;
>>>
>>>         name = fs_path_alloc();
>>>         from_path = fs_path_alloc();
>>> @@ -3102,9 +3101,10 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
>>>         dm = get_waiting_dir_move(sctx, pm->ino);
>>>         ASSERT(dm);
>>>         rmdir_ino = dm->rmdir_ino;
>>> +       is_orphan = dm->orphanized;
>>>         free_waiting_dir_move(sctx, dm);
>>>
>>> -       if (pm->is_orphan) {
>>> +       if (is_orphan) {
>>>                 ret = gen_unique_name(sctx, pm->ino,
>>>                                       pm->gen, from_path);
>>>         } else {
>>> @@ -3292,6 +3292,7 @@ static int wait_for_dest_dir_move(struct send_ctx *sctx,
>>>         u64 left_gen;
>>>         u64 right_gen;
>>>         int ret = 0;
>>> +       struct waiting_dir_move *wdm;
>>>
>>>         if (RB_EMPTY_ROOT(&sctx->waiting_dir_moves))
>>>                 return 0;
>>> @@ -3350,7 +3351,8 @@ static int wait_for_dest_dir_move(struct send_ctx *sctx,
>>>                 goto out;
>>>         }
>>>
>>> -       if (is_waiting_for_move(sctx, di_key.objectid)) {
>>> +       wdm = get_waiting_dir_move(sctx, di_key.objectid);
>>> +       if (wdm && !wdm->orphanized) {
>>>                 ret = add_pending_dir_move(sctx,
>>>                                            sctx->cur_ino,
>>>                                            sctx->cur_inode_gen,
>>> @@ -3610,11 +3612,33 @@ verbose_printk("btrfs: process_recorded_refs %llu\n", sctx->cur_ino);
>>>                                 goto out;
>>>                         if (ret) {
>>>                                 struct name_cache_entry *nce;
>>> +                               struct waiting_dir_move *wdm;
>>> +                               bool cur_is_ancestor = false;
>>> +
>>> +                               /*
>>> +                                * check is dset path is ancestor src path
>>> +                                * if yes, need to update cur_ino path
>>> +                                */
>>
>> Typos/confusing comment and doesn't explain why the following check is
>> being done.
>>
>>> +                               if (strncmp(cur->full_path->start, valid_path->start, fs_path_len(cur->full_path)) == 0 &&
>>> +                                                       fs_path_len(valid_path) > fs_path_len(cur->full_path) && valid_path->start[fs_path_len(cur->full_path)] == '/') {
>>
>> At a first glance it seems confusing why we are comparing substrings
>> of an entire path instead of just the old and new names for the
>> current and the conflicting (ow_inode) inodes and their parent inode
>> numbers and generation. I think the comment should explain why.
>
> So you can determine if ow_inode is an ancestor of cur_ino in the
> parent snapshot using is_ancestor(), which is less error prone than
> comparing partial path strings and consistent with the existing code
> base, as it takes into account inode and generation numbers for each
> path component.
>
> E.g. the following patch on top of your patch makes at least the 2nd
> scenario in the commit message work as well (I've fixed long lines and
> the comment as well, also pasted at
> https://friendpaste.com/KEIIBZ8H1F1t3trdjv0bH). What do you think?
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 3c38879..d909892 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -3629,16 +3629,7 @@ verbose_printk("btrfs: process_recorded_refs
> %llu\n", sctx->cur_ino);
>                         if (ret) {
>                                 struct name_cache_entry *nce;
>                                 struct waiting_dir_move *wdm;
> -                               bool cur_is_ancestor = false;
> -
> -                               /*
> -                                * check is dset path is ancestor src path
> -                                * if yes, need to update cur_ino path
> -                                */
> -                               if (strncmp(cur->full_path->start,
> valid_path->start, fs_path_len(cur->full_path)) == 0 &&
> -
> fs_path_len(valid_path) > fs_path_len(cur->full_path) &&
> valid_path->start[fs_path_len(cur->full_path)] == '/') {
> -                                       cur_is_ancestor = true;
> -                               }
> +                               struct fs_path *tmp_path;
>
>                                 ret = orphanize_inode(sctx, ow_inode, ow_gen,
>                                                 cur->full_path);
> @@ -3672,12 +3663,27 @@ verbose_printk("btrfs: process_recorded_refs
> %llu\n", sctx->cur_ino);
>                                 }
>
>                                 /*
> -                                * if ow_inode is ancestor cur_ino,
> need to update
> -                                * update cur_ino path.
> +                                * If ow_inode is an ancestor of cur_ino in the
> +                                * parent root, compute valid_path again because
> +                                * it contains the pre-orphanization name of
> +                                * ow_inode, which is no longer valid.
>                                  */
> -                               if (cur_is_ancestor) {
> +                               tmp_path = fs_path_alloc();
> +                               if (!tmp_path) {
> +                                       ret = -ENOMEM;
> +                                       goto out;
> +                               }
> +                               ret = is_ancestor(sctx->parent_root,
> +                                                 ow_inode, ow_gen,
> +                                                 sctx->cur_ino, tmp_path);
> +                               fs_path_free(tmp_path);
> +                               if (ret < 0)
> +                                       goto out;
> +                               if (ret == 1) {
>                                         fs_path_reset(valid_path);
> -                                       ret = get_cur_path(sctx,
> sctx->cur_ino, sctx->cur_inode_gen, valid_path);
> +                                       ret = get_cur_path(sctx, sctx->cur_ino,
> +                                                          sctx->cur_inode_gen,
> +                                                          valid_path);
>                                         if (ret < 0)
>                                                 goto out;
>                                 }
>
>>
>> Also please try to keep lines up to 80 characters (that line is 169
>> characters long).
>> You can run ./scripts/checkpatch.pl to validate your patch files and
>> warn you if the code doesn't comply to the coding standard.
>>
>>> +                                       cur_is_ancestor = true;
>>> +                               }
>>>
>>>                                 ret = orphanize_inode(sctx, ow_inode, ow_gen,
>>>                                                 cur->full_path);
>>>                                 if (ret < 0)
>>>                                         goto out;
>>> +
>>> +                               /*
>>> +                                * check is waiting dir, if yes change the ino
>>> +                                * to orphanized in the waiting tree.
>>> +                                */
>>> +                               if (is_waiting_for_move(sctx, ow_inode)) {
>>> +                                       wdm = get_waiting_dir_move(sctx, ow_inode);
>>> +                                       ASSERT(wdm);
>>> +                                       wdm->orphanized = true;
>>> +                               }
>>> +
>>>                                 /*
>>>                                  * Make sure we clear our orphanized inode's
>>>                                  * name from the name cache. This is because the
>>> @@ -3630,6 +3654,17 @@ verbose_printk("btrfs: process_recorded_refs %llu\n", sctx->cur_ino);
>>>                                         name_cache_delete(sctx, nce);
>>>                                         kfree(nce);
>>>                                 }
>>> +
>>> +                               /*
>>> +                                * if ow_inode is ancestor cur_ino, need to update
>>> +                                * update cur_ino path.
>>> +                                */
>>
>> "If ow_inode is an ancestor of cur_ino in the send snapshot, update
>> valid_path because ow_inode was orphanized and valid_path contains its
>> pre-orphanization name, which is not valid anymore".
>>
>>> +                               if (cur_is_ancestor) {
>>> +                                       fs_path_reset(valid_path);
>>> +                                       ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, valid_path);
>>> +                                       if (ret < 0)
>>> +                                               goto out;
>>> +                               }
>>>                         } else {
>>>                                 ret = send_unlink(sctx, cur->full_path);
>>>                                 if (ret < 0)
>>> --
>>> 1.9.1
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>> --
>> Filipe David Manana,
>>
>> "Reasonable men adapt themselves to the world.
>>  Unreasonable men adapt the world to themselves.
>>  That's why all progress depends on unreasonable men."
>
>
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] Btrfs: incremental send, avoid circular waiting and descendant overwrite ancestor need to update path
  2015-06-05  3:55       ` Robbie Ko
@ 2015-06-05  8:46         ` Filipe David Manana
  0 siblings, 0 replies; 25+ messages in thread
From: Filipe David Manana @ 2015-06-05  8:46 UTC (permalink / raw)
  To: Robbie Ko; +Cc: linux-btrfs@vger.kernel.org

On Fri, Jun 5, 2015 at 4:55 AM, Robbie Ko <robbieko@synology.com> wrote:
> Hi Filipe,
>
> There is another case for  2nd scenario where is_ancestor() can't be used.

So it's a 3rd case and not the 2nd one anymore. We must have an
xfstest for this case too.

>
> Parent snapshot:
> |---- a/ (ino 261)
>   |---- c (ino 267)
> |---- d/ (ino 259)
>   |---- ance/ (ino 266)
>     |---- waiting_dir/ (ino 262)
> |---- pre/ (ino 264)
>   |---- ance/ (ino 265)
>
> Send snapshot:
> |---- a/ (ino 261)
>   |---- ance/ (ino 266)
> |---- c (ino 267)
>   |---- waiting_dir/ (ino 262)
>     |---- pre/ (ino 264)
> |---- d/ (ino 259)
>   |---- ance/ (ino 265)
>
> First, 262 can't move to c/waiting_dir without the rename of inode 267.
> Second, 264 can move into dir 262. Although 262 is waiting, 264 is not
> parent of 262 in the parent root.
> (The second behavior will happen after applying "[PATCH] Btrfs:
> incremental send, don't delay directory renames unnecessarily")
> Finally, 265 will overwrite 266 and path for 265 should be updated
> since 266 is not the ancestor of 265.
> Here we need to check the current state of tree rather than parent
> root which  is_ancestor function does.

Right. But comparing full paths is not the way the go for the reasons
mentioned previously. So get_cur_path() gives us the full path of an
inode based on the current state (i.e. the state of directory
hierarchy on the receiving side after applying all operations issued
in the send stream so far). That means we can use that code (write a
new function similar to it) to determine if some inode is currently an
ancestor of some other inode by walking up hierarchy and comparing
inode numbers and generation numbers - that's the only correct way.

But we can make it more simple than writing such a new function that
would be similar to get_cur_path()... Just reset valid_path and
compute again after orphanizing a conflicting entry - i.e. don't
bother checking for ancestry.

So that the previous patch would be (also at
https://friendpaste.com/6jdXdYPdC6YFffwNL6V563):

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 3c38879..d34df19 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3629,16 +3629,6 @@ verbose_printk("btrfs: process_recorded_refs
%llu\n", sctx->cur_ino);
  if (ret) {
  struct name_cache_entry *nce;
  struct waiting_dir_move *wdm;
- bool cur_is_ancestor = false;
-
- /*
- * check is dset path is ancestor src path
- * if yes, need to update cur_ino path
- */
- if (strncmp(cur->full_path->start, valid_path->start,
fs_path_len(cur->full_path)) == 0 &&
- fs_path_len(valid_path) > fs_path_len(cur->full_path) &&
valid_path->start[fs_path_len(cur->full_path)] == '/') {
- cur_is_ancestor = true;
- }

  ret = orphanize_inode(sctx, ow_inode, ow_gen,
  cur->full_path);
@@ -3672,15 +3662,18 @@ verbose_printk("btrfs: process_recorded_refs
%llu\n", sctx->cur_ino);
  }

  /*
- * if ow_inode is ancestor cur_ino, need to update
- * update cur_ino path.
+ * ow_inode might currently be an ancestor of
+ * cur_ino, therefore compute valid_path (the
+ * current path of cur_ino) again because it
+ * might contain the pre-orphanization name of
+ * ow_inode, which is no longer valid.
  */
- if (cur_is_ancestor) {
- fs_path_reset(valid_path);
- ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, valid_path);
- if (ret < 0)
- goto out;
- }
+ fs_path_reset(valid_path);
+ ret = get_cur_path(sctx, sctx->cur_ino,
+   sctx->cur_inode_gen,
+   valid_path);
+ if (ret < 0)
+ goto out;
  } else {
  ret = send_unlink(sctx, cur->full_path);
  if (ret < 0)


>
> Thanks
> Robbie Ko
>
> 2015-06-05 3:19 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
>> On Thu, Jun 4, 2015 at 2:50 PM, Filipe David Manana <fdmanana@gmail.com> wrote:
>>> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote:
>>>> Base on [PATCH] Btrfs: incremental send, check if orphanized dir inode needs delayed rename
>>>>
>>>> Example1:
>>>> There's one case where we can't issue a rename operation for a directory
>>>> immediately when we process it.
>>>>
>>>> Parent snapshot:
>>>> |---- d/ (ino 257)
>>>>   |---- p1 (ino 258)
>>>> |---- p1/ (ino 259)
>>>>
>>>> Send snapshot:
>>>> |---- d/ (ino 257)
>>>>   |---- p1 (ino 259)
>>>>     |---- p1/ (ino 258)
>>>>
>>>> Here we can not rename 258 from d/p1 to p1/p1 without the rename of inode 259.
>>>> p1 258 is put into wait_parent_move. 259 can't be rename to d/p1, so it is put into
>>>> circular waiting happens.
>>>
>>> "... into circular waiting happens" -> so 259's rename is delayed to
>>> happen after 258's rename, which creates a circular dependency (258 ->
>>> 259 -> 258).
>>>
>>>> This is fix by rename destination directory and set
>>>> it as orphanized for this case.
>>>>
>>>> Example2:
>>>> There's one case where we can't issue a rename operation for a directory
>>>> immediately we process it.
>>>> After moving 262 outside, path of 265 is stored in the name_cache_entry.
>>>> When 263 try to overwrite 265, its ancestor, 265 is moved to orphanized. Path of 263
>>>> is still the original path, however. This causes error.
>>>
>>> For the sake of a more complete/informative change log, can you
>>> mention what's the error?
>>>
>>>>
>>>> Parent snapshot:
>>>> |---- a/ (ino 259)
>>>>   |---- c (ino 266)
>>>> |---- d/ (ino 260)
>>>>   |---- ance (ino 265)
>>>>     |---- e (ino 261)
>>>>     |---- f (ino 262)
>>>>     |---- ance (ino 263)
>>>>
>>>> Send snapshot:
>>>> |---- a/ (ino 259)
>>>> |---- c/ (ino 266)
>>>>   |---- ance (ino 265)
>>>> |---- d/ (ino 260)
>>>>   |---- ance (ino 263)
>>>> |---- f/ (ino 262)
>>>>   |---- e (ino 261)
>>>>
>>>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>>>> ---
>>>>  fs/btrfs/send.c | 45 ++++++++++++++++++++++++++++++++++++++++-----
>>>>  1 file changed, 40 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>>>> index 1c1f161..fbfbb8b 100644
>>>> --- a/fs/btrfs/send.c
>>>> +++ b/fs/btrfs/send.c
>>>> @@ -230,7 +230,6 @@ struct pending_dir_move {
>>>>         u64 parent_ino;
>>>>         u64 ino;
>>>>         u64 gen;
>>>> -       bool is_orphan;
>>>>         struct list_head update_refs;
>>>>  };
>>>>
>>>> @@ -1840,7 +1839,7 @@ static int will_overwrite_ref(struct send_ctx *sctx, u64 dir, u64 dir_gen,
>>>>          * was already unlinked/moved, so we can safely assume that we will not
>>>>          * overwrite anything at this point in time.
>>>>          */
>>>> -       if (other_inode > sctx->send_progress) {
>>>> +       if (other_inode > sctx->send_progress || is_waiting_for_move(sctx, other_inode)) {
>>>>                 ret = get_inode_info(sctx->parent_root, other_inode, NULL,
>>>>                                 who_gen, NULL, NULL, NULL, NULL);
>>>>                 if (ret < 0)
>>>> @@ -3014,7 +3013,6 @@ static int add_pending_dir_move(struct send_ctx *sctx,
>>>>         pm->parent_ino = parent_ino;
>>>>         pm->ino = ino;
>>>>         pm->gen = ino_gen;
>>>> -       pm->is_orphan = is_orphan;
>>>>         INIT_LIST_HEAD(&pm->list);
>>>>         INIT_LIST_HEAD(&pm->update_refs);
>>>>         RB_CLEAR_NODE(&pm->node);
>>>> @@ -3091,6 +3089,7 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
>>>>         struct waiting_dir_move *dm = NULL;
>>>>         u64 rmdir_ino = 0;
>>>>         int ret;
>>>> +       bool is_orphan;
>>>>
>>>>         name = fs_path_alloc();
>>>>         from_path = fs_path_alloc();
>>>> @@ -3102,9 +3101,10 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
>>>>         dm = get_waiting_dir_move(sctx, pm->ino);
>>>>         ASSERT(dm);
>>>>         rmdir_ino = dm->rmdir_ino;
>>>> +       is_orphan = dm->orphanized;
>>>>         free_waiting_dir_move(sctx, dm);
>>>>
>>>> -       if (pm->is_orphan) {
>>>> +       if (is_orphan) {
>>>>                 ret = gen_unique_name(sctx, pm->ino,
>>>>                                       pm->gen, from_path);
>>>>         } else {
>>>> @@ -3292,6 +3292,7 @@ static int wait_for_dest_dir_move(struct send_ctx *sctx,
>>>>         u64 left_gen;
>>>>         u64 right_gen;
>>>>         int ret = 0;
>>>> +       struct waiting_dir_move *wdm;
>>>>
>>>>         if (RB_EMPTY_ROOT(&sctx->waiting_dir_moves))
>>>>                 return 0;
>>>> @@ -3350,7 +3351,8 @@ static int wait_for_dest_dir_move(struct send_ctx *sctx,
>>>>                 goto out;
>>>>         }
>>>>
>>>> -       if (is_waiting_for_move(sctx, di_key.objectid)) {
>>>> +       wdm = get_waiting_dir_move(sctx, di_key.objectid);
>>>> +       if (wdm && !wdm->orphanized) {
>>>>                 ret = add_pending_dir_move(sctx,
>>>>                                            sctx->cur_ino,
>>>>                                            sctx->cur_inode_gen,
>>>> @@ -3610,11 +3612,33 @@ verbose_printk("btrfs: process_recorded_refs %llu\n", sctx->cur_ino);
>>>>                                 goto out;
>>>>                         if (ret) {
>>>>                                 struct name_cache_entry *nce;
>>>> +                               struct waiting_dir_move *wdm;
>>>> +                               bool cur_is_ancestor = false;
>>>> +
>>>> +                               /*
>>>> +                                * check is dset path is ancestor src path
>>>> +                                * if yes, need to update cur_ino path
>>>> +                                */
>>>
>>> Typos/confusing comment and doesn't explain why the following check is
>>> being done.
>>>
>>>> +                               if (strncmp(cur->full_path->start, valid_path->start, fs_path_len(cur->full_path)) == 0 &&
>>>> +                                                       fs_path_len(valid_path) > fs_path_len(cur->full_path) && valid_path->start[fs_path_len(cur->full_path)] == '/') {
>>>
>>> At a first glance it seems confusing why we are comparing substrings
>>> of an entire path instead of just the old and new names for the
>>> current and the conflicting (ow_inode) inodes and their parent inode
>>> numbers and generation. I think the comment should explain why.
>>
>> So you can determine if ow_inode is an ancestor of cur_ino in the
>> parent snapshot using is_ancestor(), which is less error prone than
>> comparing partial path strings and consistent with the existing code
>> base, as it takes into account inode and generation numbers for each
>> path component.
>>
>> E.g. the following patch on top of your patch makes at least the 2nd
>> scenario in the commit message work as well (I've fixed long lines and
>> the comment as well, also pasted at
>> https://friendpaste.com/KEIIBZ8H1F1t3trdjv0bH). What do you think?
>>
>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>> index 3c38879..d909892 100644
>> --- a/fs/btrfs/send.c
>> +++ b/fs/btrfs/send.c
>> @@ -3629,16 +3629,7 @@ verbose_printk("btrfs: process_recorded_refs
>> %llu\n", sctx->cur_ino);
>>                         if (ret) {
>>                                 struct name_cache_entry *nce;
>>                                 struct waiting_dir_move *wdm;
>> -                               bool cur_is_ancestor = false;
>> -
>> -                               /*
>> -                                * check is dset path is ancestor src path
>> -                                * if yes, need to update cur_ino path
>> -                                */
>> -                               if (strncmp(cur->full_path->start,
>> valid_path->start, fs_path_len(cur->full_path)) == 0 &&
>> -
>> fs_path_len(valid_path) > fs_path_len(cur->full_path) &&
>> valid_path->start[fs_path_len(cur->full_path)] == '/') {
>> -                                       cur_is_ancestor = true;
>> -                               }
>> +                               struct fs_path *tmp_path;
>>
>>                                 ret = orphanize_inode(sctx, ow_inode, ow_gen,
>>                                                 cur->full_path);
>> @@ -3672,12 +3663,27 @@ verbose_printk("btrfs: process_recorded_refs
>> %llu\n", sctx->cur_ino);
>>                                 }
>>
>>                                 /*
>> -                                * if ow_inode is ancestor cur_ino,
>> need to update
>> -                                * update cur_ino path.
>> +                                * If ow_inode is an ancestor of cur_ino in the
>> +                                * parent root, compute valid_path again because
>> +                                * it contains the pre-orphanization name of
>> +                                * ow_inode, which is no longer valid.
>>                                  */
>> -                               if (cur_is_ancestor) {
>> +                               tmp_path = fs_path_alloc();
>> +                               if (!tmp_path) {
>> +                                       ret = -ENOMEM;
>> +                                       goto out;
>> +                               }
>> +                               ret = is_ancestor(sctx->parent_root,
>> +                                                 ow_inode, ow_gen,
>> +                                                 sctx->cur_ino, tmp_path);
>> +                               fs_path_free(tmp_path);
>> +                               if (ret < 0)
>> +                                       goto out;
>> +                               if (ret == 1) {
>>                                         fs_path_reset(valid_path);
>> -                                       ret = get_cur_path(sctx,
>> sctx->cur_ino, sctx->cur_inode_gen, valid_path);
>> +                                       ret = get_cur_path(sctx, sctx->cur_ino,
>> +                                                          sctx->cur_inode_gen,
>> +                                                          valid_path);
>>                                         if (ret < 0)
>>                                                 goto out;
>>                                 }
>>
>>>
>>> Also please try to keep lines up to 80 characters (that line is 169
>>> characters long).
>>> You can run ./scripts/checkpatch.pl to validate your patch files and
>>> warn you if the code doesn't comply to the coding standard.
>>>
>>>> +                                       cur_is_ancestor = true;
>>>> +                               }
>>>>
>>>>                                 ret = orphanize_inode(sctx, ow_inode, ow_gen,
>>>>                                                 cur->full_path);
>>>>                                 if (ret < 0)
>>>>                                         goto out;
>>>> +
>>>> +                               /*
>>>> +                                * check is waiting dir, if yes change the ino
>>>> +                                * to orphanized in the waiting tree.
>>>> +                                */
>>>> +                               if (is_waiting_for_move(sctx, ow_inode)) {
>>>> +                                       wdm = get_waiting_dir_move(sctx, ow_inode);
>>>> +                                       ASSERT(wdm);
>>>> +                                       wdm->orphanized = true;
>>>> +                               }
>>>> +
>>>>                                 /*
>>>>                                  * Make sure we clear our orphanized inode's
>>>>                                  * name from the name cache. This is because the
>>>> @@ -3630,6 +3654,17 @@ verbose_printk("btrfs: process_recorded_refs %llu\n", sctx->cur_ino);
>>>>                                         name_cache_delete(sctx, nce);
>>>>                                         kfree(nce);
>>>>                                 }
>>>> +
>>>> +                               /*
>>>> +                                * if ow_inode is ancestor cur_ino, need to update
>>>> +                                * update cur_ino path.
>>>> +                                */
>>>
>>> "If ow_inode is an ancestor of cur_ino in the send snapshot, update
>>> valid_path because ow_inode was orphanized and valid_path contains its
>>> pre-orphanization name, which is not valid anymore".
>>>
>>>> +                               if (cur_is_ancestor) {
>>>> +                                       fs_path_reset(valid_path);
>>>> +                                       ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, valid_path);
>>>> +                                       if (ret < 0)
>>>> +                                               goto out;
>>>> +                               }
>>>>                         } else {
>>>>                                 ret = send_unlink(sctx, cur->full_path);
>>>>                                 if (ret < 0)
>>>> --
>>>> 1.9.1
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>>
>>> --
>>> Filipe David Manana,
>>>
>>> "Reasonable men adapt themselves to the world.
>>>  Unreasonable men adapt the world to themselves.
>>>  That's why all progress depends on unreasonable men."
>>
>>
>>
>> --
>> Filipe David Manana,
>>
>> "Reasonable men adapt themselves to the world.
>>  Unreasonable men adapt the world to themselves.
>>  That's why all progress depends on unreasonable men."



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] Btrfs: incremental send, avoid ancestor rename to descendant
  2015-06-04 15:43   ` Filipe David Manana
@ 2015-06-05 11:18     ` Robbie Ko
  0 siblings, 0 replies; 25+ messages in thread
From: Robbie Ko @ 2015-06-05 11:18 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs@vger.kernel.org

Hi Filipe,

This case will happen after applying "[PATCH] Btrfs: incremental send,
don't delay directory renames unnecessarily".
Because, that patch changes behavior of wait_for_parent_move function.

|---- a
  |---- b
    |---- c
       |---- d
"Move a directory from ancestor to descendant" means moving dir. a into dir. c


Let's identify the problem by the following example.

Parent snapshot:
|---- @tmp/ (ino 257)
|---- pre/ (ino 259)
  |---- wait_dir (ino 260)
    |---- finish_dir2/ (ino 261)
|---- ance/ (ino 263)
  |---- finish_dir1/ (ino 258)
|---- desc/ (ino 262)
|---- other_dir/ (ino 264)

Send snapshot:
|---- @tmp/ (ino 257)
  |---- other_dir/ (ino 264)
    |---- wait_dir/ (ino 260)
      |---- finish_dir2/ (ino 261)
        |---- desc/ (ino 262)
          |---- ance/ (ino 263)
            |---- finish_dir1/ (ino 258)
              |---- pre/ (ino 259)

1. 259 can not move under 258 because 263 needs to move to 263 first.
So 259 is waiting on ance(263).

2. 260 must move to @tmp/other_dir, so it is waiting on other_dir(264).

3. 262 is able to rename as pre/wait_dir/finish_dir2(261)/desc since
wait_dir(260) is waiting and 262 is not the ancestor of wait_dir(260).

4.263 is able to rename as pre/wait_dir/finish_dir2(261)/ance since
wait_dir(260) is waiting and 263 is not the ancestor of wait_dir(260).

5. After wait_dir(263) is finished, all pending dirs. start to run.
    /pre(259) in apply_dir_move() renames /pre as
pre/wait_dir/finish_dir2/desc/ance/finish_dir1/pre
  At the same time, receiving side will encounter error.
  If anyone calls get_cur_path() to any element in
pre/wait_dir/finish_dir2/desc/ance/finish_dir1/pre like wait_dir(260)
,
  there will cause path building loop like this : 260 -> 259 -> 258 ->
263 -> 262 -> 261 -> 260

So fix the problem by check path_loop for this case.

Thanks.
Robbie Ko

2015-06-04 23:43 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote:
>> Base on [PATCH] Btrfs: incremental send, don't delay directory renames unnecessarily
>>
>> There's one case where we can't issue a rename operation for a directory
>
> Should be "There's one more case where..." - we already have at least
> a handful of cases where we can't rename a directory as soon as we
> finish processing it :)
>
>> as soon as we process it. We move a directory from ancestor to descendant,
>> if not wait_parent_move or wait_for_dest_dir_move so that happen path loop.
>
> Can you elaborate here? I think this doesn't explain what's the
> problem and why it happens.
>
> What does "move a directory from ancestor to descendant" means? Does
> it mean renaming it from the name in the parent snapshot to its name
> in the send snapshot? Or something else? Ancestor of which inode and
> descendant of which inode?
>
> Mentioning "if not wait_parent_mobe or wait_for_dest_dir" isn't
> helpful either, it doesn't tells us what cases those functions cover
> and which case they are missing, nor does it explain why we get into a
> path building loop problem (I assume that's what you mean with "so
> that happen path loop").
>
>>
>> Example:
>>
>> Parent snapshot:
>> |---- @tmp/ (ino 257)
>> |---- pre/ (ino 259)
>>   |---- wait_dir (ino 260)
>>     |---- finish_dir2/ (ino 261)
>> |---- ance/ (ino 263)
>>   |---- finish_dir1/ (ino 258)
>> |---- desc/ (ino 262)
>> |---- other_dir/ (ino 264)
>>
>> Send snapshot:
>> |---- @tmp/ (ino 257)
>>   |---- other_dir/ (ino 264)
>>     |---- wait_dir/ (ino 260)
>>       |---- finish_dir2/ (ino 261)
>>         |---- desc/ (ino 262)
>>           |---- ance/ (ino 263)
>>             |---- finish_dir1/ (ino 258)
>>               |---- pre/ (ino 259)
>> Here we can not rename 263 from ance to ance/finish_dir1/pre/wait_dir/finish_dir2/desc/ance
>> without the rename of inode 260, so that happen path loop.
>> So fix this by delaying directory renames for this case.
>
> By delaying directory renames for which directories? And why exactly?
>
> You should explain why the path loop happens and why a rename of inode
> 263 from "ance" to
> "ance/finish_dir1/pre/wait_dir/finish_dir2/desc/ance" is attempted,
> what is the bug in the logic of an incremental send that causes such
> rename to be performed.
>
> I haven't tried it here, but it seems weird you get 2 problems here:
> the path building loop and an incorrect rename operation - for all the
> previously solved bugs we had either one or the other happening, but
> not both. Unless the "path loop" issue you mention is not the infinite
> path build loop that leads to an -ENOMEM error when the path we're
> building exceeds PATH_MAX, but instead some other new problem.
>
> The commit message should really explain what causes the problem and
> the fix in detail.
>
>>
>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>> ---
>>  fs/btrfs/send.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 95 insertions(+)
>>
>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>> index fbfbb8b..596b9dc 100644
>> --- a/fs/btrfs/send.c
>> +++ b/fs/btrfs/send.c
>> @@ -3078,6 +3078,56 @@ static struct pending_dir_move *get_pending_dir_moves(struct send_ctx *sctx,
>>         return NULL;
>>  }
>>
>> +static int path_loop(struct send_ctx *sctx, struct fs_path *name,
>> +                    u64 ino, u64 gen, u64 *ancestor_ino)
>> +{
>> +       int ret = 0;
>> +       u64 parent_inode = 0;
>> +       u64 parent_gen = 0;
>> +       u64 start_ino = ino;
>> +
>> +       *ancestor_ino = 0;
>> +       while (ino != BTRFS_FIRST_FREE_OBJECTID) {
>> +               struct waiting_dir_move *wdm;
>> +               fs_path_reset(name);
>> +
>> +               if (is_waiting_for_rm(sctx, ino))
>> +                       break;
>> +
>> +               wdm = get_waiting_dir_move(sctx, ino);
>> +               if (wdm) {
>> +                       if (*ancestor_ino == 0)
>> +                               *ancestor_ino = ino;
>> +                       if (wdm->orphanized) {
>> +                               ret = gen_unique_name(sctx, ino, gen, name);
>> +                               break;
>> +                       } else {
>> +                               ret = get_first_ref(sctx->parent_root, ino,
>> +                                                                       &parent_inode, &parent_gen, name);
>> +                       }
>> +               } else {
>> +                       ret = __get_cur_name_and_parent(sctx, ino, gen,
>> +                                                       &parent_inode,
>> +                                                       &parent_gen, name);
>> +                       if (ret > 0) {
>> +                               ret = 0;
>> +                               break;
>> +                       }
>> +               }
>> +               if (ret < 0)
>> +                       break;
>> +               if (parent_inode == start_ino) {
>> +                       ret = 1;
>> +                       if (*ancestor_ino == 0)
>> +                               *ancestor_ino = ino;
>> +                       break;
>> +               }
>> +               ino = parent_inode;
>> +               gen = parent_gen;
>> +       }
>> +       return ret;
>> +}
>> +
>>  static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
>>  {
>>         struct fs_path *from_path = NULL;
>> @@ -3089,6 +3139,7 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
>>         struct waiting_dir_move *dm = NULL;
>>         u64 rmdir_ino = 0;
>>         int ret;
>> +       u64 ancestor = 0;
>>         bool is_orphan;
>>
>>         name = fs_path_alloc();
>> @@ -3122,6 +3173,22 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
>>                 goto out;
>>
>>         sctx->send_progress = sctx->cur_ino + 1;
>> +       ret = path_loop(sctx, name, pm->ino, pm->gen, &ancestor);
>> +       if (ret) {
>> +               LIST_HEAD(deleted_refs);
>> +               ASSERT(ancestor > BTRFS_FIRST_FREE_OBJECTID);
>> +               ret = add_pending_dir_move(sctx, pm->ino, pm->gen, ancestor,
>> +                                                                       &pm->update_refs, &deleted_refs,
>> +                                                                       is_orphan);
>> +               if (ret < 0)
>> +                       goto out;
>> +               if (rmdir_ino) {
>> +                       dm = get_waiting_dir_move(sctx, pm->ino);
>> +                       ASSERT(dm);
>> +                       dm->rmdir_ino = rmdir_ino;
>> +               }
>> +               goto out;
>> +       }
>
> So far you're basically reverting this change:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5f806c3ae2ff6263a10a6901f97abb74dac03d36
>
> That should be a separate 'revert' patch in the series.
>
>
>>         fs_path_reset(name);
>>         to_path = name;
>>         name = NULL;
>> @@ -3693,6 +3760,34 @@ verbose_printk("btrfs: process_recorded_refs %llu\n", sctx->cur_ino);
>>                 }
>>
>>                 /*
>> +                * if cur_ino is cur ancestor, can't move now,
>> +                * find descendant who is waiting, waiting it.
>> +                */
>
> This comment is confusing too. cur_ino is ancestor of itself, or
> ancestor of which inode? Plus the code below is looking for an
> ancestor while the comment mentions finding a descendant.
>
>> +               if (can_rename && (strncmp(valid_path->start, cur->full_path->start, fs_path_len(valid_path)) == 0) &&
>> +                                                       fs_path_len(cur->full_path) > fs_path_len(valid_path) && cur->full_path->start[fs_path_len(valid_path)] == '/') {
>
> Same comment as in the first patch, too long line.
> Also given that this check (second condition) is being repeated in 2
> different places, it should be encapsulated in a helper function.
>
>> +                       struct fs_path *name = NULL;
>> +                       u64 ancestor;
>> +                       u64 old_send_progress = sctx->send_progress;
>> +
>> +                       name = fs_path_alloc();
>
> Allocation can fail, it can return NULL, need to return -ENOMEM in such case.
>
> thanks
>
>> +                       sctx->send_progress = sctx->cur_ino + 1;
>> +                       ret = path_loop(sctx, name, sctx->cur_ino, sctx->cur_inode_gen, &ancestor);
>> +                       if (ret) {
>> +                               ret = add_pending_dir_move(sctx, sctx->cur_ino, sctx->cur_inode_gen,
>> +                                                       ancestor, &sctx->new_refs, &sctx->deleted_refs, is_orphan);
>> +                               if (ret < 0) {
>> +                                       sctx->send_progress = old_send_progress;
>> +                                       fs_path_free(name);
>> +                                       goto out;
>> +                               }
>> +                               can_rename = false;
>> +                               *pending_move = 1;
>> +                       }
>> +                       sctx->send_progress = old_send_progress;
>> +                       fs_path_free(name);
>> +               }
>> +
>> +               /*
>>                  * link/move the ref to the new place. If we have an orphan
>>                  * inode, move it and update valid_path. If not, link or move
>>                  * it depending on the inode mode.
>> --
>> 1.9.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] Btrfs: incremental send, fix orphan_dir_info not completely cleared
  2015-06-04 11:18 ` [PATCH 3/5] Btrfs: incremental send, fix orphan_dir_info not completely cleared Robbie Ko
  2015-06-04 16:24   ` Filipe David Manana
@ 2015-06-05 13:58   ` David Sterba
  1 sibling, 0 replies; 25+ messages in thread
From: David Sterba @ 2015-06-05 13:58 UTC (permalink / raw)
  To: Robbie Ko; +Cc: linux-btrfs

On Thu, Jun 04, 2015 at 07:18:08PM +0800, Robbie Ko wrote:
> @@ -2913,6 +2912,12 @@ static int can_rmdir(struct send_ctx *sctx, u64 dir, u64 dir_gen,
>  		}
>  
>  		if (loc.objectid > send_progress) {
> +			struct orphan_dir_info *odi;
> +
> +			odi = get_orphan_dir_info(sctx, dir);
> +			if (odi) {
> +				free_orphan_dir_info(sctx, odi);
> +			}

As you're going to resend, please drop the { .. } around the call.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] Btrfs: incremental send, fix rmdir not send utimes
  2015-06-04 16:14   ` Filipe David Manana
@ 2015-06-08  3:44     ` Robbie Ko
  2015-06-08 14:00       ` Filipe David Manana
  0 siblings, 1 reply; 25+ messages in thread
From: Robbie Ko @ 2015-06-08  3:44 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs@vger.kernel.org

Hi Filipe,

I've fixed "don't send utimes for non-existing directory" with another solution.

 In apply_dir_move(), the old parent dir. and new parent dir. will be
updated after the current dir. has moved.

And there's only one entry in old parent dir. (e.g. entry with
smallest ino) will be tagged with rmdir_ino to prevent its parent dir.
is deleted but updated.

However, if we process rename for another entry not tagged with
rmdir_ino first, its old parent dir. which is deleted  will be updated
according to apply_dir_move().

Therefore, I think we should check the existence of  the dir. before
we're going to update it's utime.

The patch is pasted in the following link, could you give me some comment?

https://friendpaste.com/h8tZqOS9iAUpp2DvgGI2k

Thans!

Robbie Ko

2015-06-05 0:14 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote:
>> There's one case where we can't issue a utimes operation for a directory.
>> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier
>> processed and update uime between two parent directory.
>> So fix this by not update non exist utimes for this case.
>
> So you mean that we attempt to update utimes for an inode,
> corresponding to a directory, that exists in the parent snapshot but
> not in the send snapshot.
>
> So the subject should be something like "Btrfs: incremental send,
> don't send utimes for non-existing directory" instead of "Btrfs:
> incremental send, fix rmdir not send utimes"
>
>>
>> Example:
>>
>> Parent snapshot:
>> |---- a/ (ino 259)
>>   |---- c (ino 264)
>> |---- b/ (ino 260)
>>   |---- d (ino 265)
>> |---- del/ (ino 263)
>>   |---- item1/ (ino 261)
>>   |---- item2/ (ino 262)
>>
>> Send snapshot:
>> |---- a/ (ino 259)
>> |---- b/ (ino 260)
>> |---- c/ (ino 2)
>>   |---- item2 (ino 259)
>> |---- d/ (ino 257)
>>   |---- item1/ (ino 258)
>>
>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>> ---
>>  fs/btrfs/send.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>> index e8eb3ab..46f954c 100644
>> --- a/fs/btrfs/send.c
>> +++ b/fs/btrfs/send.c
>> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino);
>>         key.type = BTRFS_INODE_ITEM_KEY;
>>         key.offset = 0;
>>         ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0);
>> -       if (ret < 0)
>> +       if (ret != 0)
>>                 goto out;
>
> So I don't think this is a good fix. The problem is in some code that
> calls this function (send_utimes) against the directory that doesn't
> exist - it just shouldn't do that, its logic should be fixed.
> Following this approach, while it works, it's just hiding logic errors
> in one or more code paths, and none of its callers checks for a return
> value of 1 - they only react to values < 0 and introduces the
> possibility of propagating a return value of 1 to user space.
>
> thanks
>
>>
>>         eb = path->nodes[0];
>> --
>> 1.9.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] Btrfs: incremental send, fix rmdir not send utimes
  2015-06-08  3:44     ` Robbie Ko
@ 2015-06-08 14:00       ` Filipe David Manana
  2015-06-09 10:04         ` Robbie Ko
  0 siblings, 1 reply; 25+ messages in thread
From: Filipe David Manana @ 2015-06-08 14:00 UTC (permalink / raw)
  To: Robbie Ko; +Cc: linux-btrfs@vger.kernel.org

On Mon, Jun 8, 2015 at 4:44 AM, Robbie Ko <robbieko@synology.com> wrote:
> Hi Filipe,

Hi Robbie,

>
> I've fixed "don't send utimes for non-existing directory" with another solution.
>
>  In apply_dir_move(), the old parent dir. and new parent dir. will be
> updated after the current dir. has moved.
>
> And there's only one entry in old parent dir. (e.g. entry with
> smallest ino) will be tagged with rmdir_ino to prevent its parent dir.
> is deleted but updated.

Can't parse this phrase. What do you mean by tagging an entry with rmdir_ino?
rmdir_ino corresponds to the number of a inode that wasn't deleted
when it was processed because there was some inode with a lower number
that is a child of the directory in the parent snapshot and had its
rename/move operation delayed (it happens after the directory we want
to delete is processed).

>
> However, if we process rename for another entry not tagged with
> rmdir_ino first, its old parent dir. which is deleted  will be updated
> according to apply_dir_move().
>
> Therefore, I think we should check the existence of  the dir. before
> we're going to update it's utime.
>
> The patch is pasted in the following link, could you give me some comment?
>
> https://friendpaste.com/h8tZqOS9iAUpp2DvgGI2k

Looks better.
However I still don't understand your explanation, and just tried the
example in your commit message:

"Parent snapshot:

|---- a/ (ino 259)
  |---- c (ino 264)
|---- b/ (ino 260)
  |---- d (ino 265)
|---- del/ (ino 263)
  |---- item1/ (ino 261)
  |---- item2/ (ino 262)

Send snapshot:
|---- a/ (ino 259)
|---- b/ (ino 260)
|---- c/ (ino 2)
  |---- item2 (ino 259)
|---- d/ (ino 257)
  |---- item1/ (ino 258)"

So it's confusing after looking at it.
First the send snapshot mentions inode number 2, which doesn't exist
in the parent snapshot - I assume you meant inode number 264.
Then, the send snapshot has two inodes with number 259. Is "item2" in
the send snapshot supposed to be inode 262?

Anyway, assuming those 2 fixes to the example are correct guesses, I
tried the following and it didn't fail without your patches (i.e. no
attempts to send utimes to a non-existing directory):

# Parent snapshot:
#
# |---- a/            (ino 259)
# |     |---- c       (ino 264)
# |
# |---- b/            (ino 260)
# |     |---- d       (ino 265)
# |
# |---- del/          (ino 263)
#        |---- item1/ (ino 261)
#        |---- item2/ (ino 262)

# Send snapshot:
#
# |---- a/            (ino 259)
# |---- b/            (ino 260)
# |---- c/            (ino 264)
# |     |---- item2/  (ino 262)
# |
# |---- d/            (ino 265)
#       |---- item1/  (ino 258)

mkdir $SCRATCH_MNT/0
mkdir $SCRATCH_MNT/1

mkdir $SCRATCH_MNT/a # 259
mkdir $SCRATCH_MNT/b # 260
mkdir $SCRATCH_MNT/item1 # 261
mkdir $SCRATCH_MNT/item2 # 262
mkdir $SCRATCH_MNT/del # 263
mv $SCRATCH_MNT/item1 $SCRATCH_MNT/del/item1
mv $SCRATCH_MNT/item2 $SCRATCH_MNT/del/item2
mkdir $SCRATCH_MNT/a/c # 264
mkdir $SCRATCH_MNT/b/d # 265

_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1

mv $SCRATCH_MNT/a/c $SCRATCH_MNT/c
mv $SCRATCH_MNT/b/d $SCRATCH_MNT/d
mv $SCRATCH_MNT/del/item2 $SCRATCH_MNT/c
mv $SCRATCH_MNT/del/item1 $SCRATCH_MNT/d
rmdir $SCRATCH_MNT/del

_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2

run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
$SCRATCH_MNT/mysnap2

_run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
_run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
-f $tmp/2.snap

_check_scratch_fs

_scratch_unmount
_scratch_mkfs >/dev/null 2>&1
_scratch_mount

_run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/1.snap
run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1

_run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/2.snap
run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2



I would suggest making those hiearachy diagrams more readable - pipes
right below the name of their parent, continuation pipes like and
align all inode numbers in the same column, like the following:

# Parent snapshot:
#
# |---- a/            (ino 259)
# |     |---- c       (ino 264)
# |
# |---- b/            (ino 260)
# |     |---- d       (ino 265)
# |
# |---- del/          (ino 263)
#        |---- item1/ (ino 261)
#        |---- item2/ (ino 262)

# Send snapshot:
#
# |---- a/            (ino 259)
# |---- b/            (ino 260)
# |---- c/            (ino 264)
# |     |---- item2/  (ino 262)
# |
# |---- d/            (ino 265)
#       |---- item1/  (ino 258)

(pasted here in case gmail screws up the indentation/formatting:
https://friendpaste.com/12wzqdcfFrlDdd1AiKX0bU)

thanks

>
> Thans!
>
> Robbie Ko
>
> 2015-06-05 0:14 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
>> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote:
>>> There's one case where we can't issue a utimes operation for a directory.
>>> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier
>>> processed and update uime between two parent directory.
>>> So fix this by not update non exist utimes for this case.
>>
>> So you mean that we attempt to update utimes for an inode,
>> corresponding to a directory, that exists in the parent snapshot but
>> not in the send snapshot.
>>
>> So the subject should be something like "Btrfs: incremental send,
>> don't send utimes for non-existing directory" instead of "Btrfs:
>> incremental send, fix rmdir not send utimes"
>>
>>>
>>> Example:
>>>
>>> Parent snapshot:
>>> |---- a/ (ino 259)
>>>   |---- c (ino 264)
>>> |---- b/ (ino 260)
>>>   |---- d (ino 265)
>>> |---- del/ (ino 263)
>>>   |---- item1/ (ino 261)
>>>   |---- item2/ (ino 262)
>>>
>>> Send snapshot:
>>> |---- a/ (ino 259)
>>> |---- b/ (ino 260)
>>> |---- c/ (ino 2)
>>>   |---- item2 (ino 259)
>>> |---- d/ (ino 257)
>>>   |---- item1/ (ino 258)
>>>
>>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>>> ---
>>>  fs/btrfs/send.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>>> index e8eb3ab..46f954c 100644
>>> --- a/fs/btrfs/send.c
>>> +++ b/fs/btrfs/send.c
>>> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino);
>>>         key.type = BTRFS_INODE_ITEM_KEY;
>>>         key.offset = 0;
>>>         ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0);
>>> -       if (ret < 0)
>>> +       if (ret != 0)
>>>                 goto out;
>>
>> So I don't think this is a good fix. The problem is in some code that
>> calls this function (send_utimes) against the directory that doesn't
>> exist - it just shouldn't do that, its logic should be fixed.
>> Following this approach, while it works, it's just hiding logic errors
>> in one or more code paths, and none of its callers checks for a return
>> value of 1 - they only react to values < 0 and introduces the
>> possibility of propagating a return value of 1 to user space.
>>
>> thanks
>>
>>>
>>>         eb = path->nodes[0];
>>> --
>>> 1.9.1
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>> --
>> Filipe David Manana,
>>
>> "Reasonable men adapt themselves to the world.
>>  Unreasonable men adapt the world to themselves.
>>  That's why all progress depends on unreasonable men."



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] Btrfs: incremental send, fix rmdir not send utimes
  2015-06-08 14:00       ` Filipe David Manana
@ 2015-06-09 10:04         ` Robbie Ko
  2015-06-09 10:36           ` Filipe David Manana
  0 siblings, 1 reply; 25+ messages in thread
From: Robbie Ko @ 2015-06-09 10:04 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs@vger.kernel.org

Hi Filipe,

2015-06-08 22:00 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
> On Mon, Jun 8, 2015 at 4:44 AM, Robbie Ko <robbieko@synology.com> wrote:
>> Hi Filipe,
>
> Hi Robbie,
>
>>
>> I've fixed "don't send utimes for non-existing directory" with another solution.
>>
>>  In apply_dir_move(), the old parent dir. and new parent dir. will be
>> updated after the current dir. has moved.
>>
>> And there's only one entry in old parent dir. (e.g. entry with
>> smallest ino) will be tagged with rmdir_ino to prevent its parent dir.
>> is deleted but updated.
>
> Can't parse this phrase. What do you mean by tagging an entry with rmdir_ino?
> rmdir_ino corresponds to the number of a inode that wasn't deleted
> when it was processed because there was some inode with a lower number
> that is a child of the directory in the parent snapshot and had its
> rename/move operation delayed (it happens after the directory we want
> to delete is processed).
>

Right , my "tagged with rmdir_ino" is same meaning as you explained here.

>>
>> However, if we process rename for another entry not tagged with
>> rmdir_ino first, its old parent dir. which is deleted  will be updated
>> according to apply_dir_move().
>>
>> Therefore, I think we should check the existence of  the dir. before
>> we're going to update it's utime.
>>
>> The patch is pasted in the following link, could you give me some comment?
>>
>> https://friendpaste.com/h8tZqOS9iAUpp2DvgGI2k
>
> Looks better.
> However I still don't understand your explanation, and just tried the
> example in your commit message:
>
> "Parent snapshot:
>
> |---- a/ (ino 259)
>   |---- c (ino 264)
> |---- b/ (ino 260)
>   |---- d (ino 265)
> |---- del/ (ino 263)
>   |---- item1/ (ino 261)
>   |---- item2/ (ino 262)
>
> Send snapshot:
> |---- a/ (ino 259)
> |---- b/ (ino 260)
> |---- c/ (ino 2)
>   |---- item2 (ino 259)
> |---- d/ (ino 257)
>   |---- item1/ (ino 258)"
>
> So it's confusing after looking at it.
> First the send snapshot mentions inode number 2, which doesn't exist
> in the parent snapshot - I assume you meant inode number 264.
> Then, the send snapshot has two inodes with number 259. Is "item2" in
> the send snapshot supposed to be inode 262?
>

Your guess is right. And I correct it as follow.

 # Parent snapshot:
 #
 # |---- a/            (ino 259)
 # |     |---- c       (ino 264)
 # |
 # |---- b/            (ino 260)
 # |     |---- d       (ino 265)
 # |
 # |---- del/          (ino 263)
 #        |---- item1/ (ino 261)
 #        |---- item2/ (ino 262)

 # Send snapshot:
 #
 # |---- a/            (ino 259)
 # |---- b/            (ino 260)
 # |---- c/            (ino 264)
 # |     |---- item2/  (ino 262)
 # |
 # |---- d/            (ino 265)
 #       |---- item1/  (ino 261)

> Anyway, assuming those 2 fixes to the example are correct guesses, I
> tried the following and it didn't fail without your patches (i.e. no
> attempts to send utimes to a non-existing directory):
>

Here my mean is :  btrfs tries to get utime from non-existing directory and
apply it on the existing directory. And my patch is attempted to avoid
this case.
However, this case is not guaranteed to cause error anytime but it may
fails somehow
which is depending on the data on the disk.
The following are the incremental procedures to send the snapshot.

utimes
utimes a
utimes b
rename del -> o263-259-0
utimes
rename a/c -> c
utimes
utimes a
rename o263-259-0/item2 -> c/item2
utimes c/item2
utimes o263-259-0          <<---------------------- this step may cause error
utimes c
utimes c
rename b/d -> d
utimes
utimes b
rename o263-259-0/item1 -> d/item1
rmdir o263-259-0
utimes d/item1
utimes d
utimes d

As the above pointed procedure, o263-259-0 is not appeared in the send root.
When utime got from o263-259-0 is invalid (i.e. out of range of time
format), it will fail.
I saw the error occurs at utimensat() and got EINVAL. Here's
explanation from linux man page.

EINVAL Invalid value in one of the tv_nsec fields (value outside
              range 0 to 999,999,999, and not UTIME_NOW or UTIME_OMIT); or
              an invalid value in one of the tv_sec fields.

So if o263-259-0 is not the send root, invalid format of utime may be got.

Thanks.
Robbie Ko

> # Parent snapshot:
> #
> # |---- a/            (ino 259)
> # |     |---- c       (ino 264)
> # |
> # |---- b/            (ino 260)
> # |     |---- d       (ino 265)
> # |
> # |---- del/          (ino 263)
> #        |---- item1/ (ino 261)
> #        |---- item2/ (ino 262)
>
> # Send snapshot:
> #
> # |---- a/            (ino 259)
> # |---- b/            (ino 260)
> # |---- c/            (ino 264)
> # |     |---- item2/  (ino 262)
> # |
> # |---- d/            (ino 265)
> #       |---- item1/  (ino 258)
>
> mkdir $SCRATCH_MNT/0
> mkdir $SCRATCH_MNT/1
>
> mkdir $SCRATCH_MNT/a # 259
> mkdir $SCRATCH_MNT/b # 260
> mkdir $SCRATCH_MNT/item1 # 261
> mkdir $SCRATCH_MNT/item2 # 262
> mkdir $SCRATCH_MNT/del # 263
> mv $SCRATCH_MNT/item1 $SCRATCH_MNT/del/item1
> mv $SCRATCH_MNT/item2 $SCRATCH_MNT/del/item2
> mkdir $SCRATCH_MNT/a/c # 264
> mkdir $SCRATCH_MNT/b/d # 265
>
> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
>
> mv $SCRATCH_MNT/a/c $SCRATCH_MNT/c
> mv $SCRATCH_MNT/b/d $SCRATCH_MNT/d
> mv $SCRATCH_MNT/del/item2 $SCRATCH_MNT/c
> mv $SCRATCH_MNT/del/item1 $SCRATCH_MNT/d
> rmdir $SCRATCH_MNT/del
>
> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
>
> run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
> run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
> $SCRATCH_MNT/mysnap2
>
> _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
> _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
> -f $tmp/2.snap
>
> _check_scratch_fs
>
> _scratch_unmount
> _scratch_mkfs >/dev/null 2>&1
> _scratch_mount
>
> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/1.snap
> run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1
>
> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/2.snap
> run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2
>
>
>
> I would suggest making those hiearachy diagrams more readable - pipes
> right below the name of their parent, continuation pipes like and
> align all inode numbers in the same column, like the following:
>
> # Parent snapshot:
> #
> # |---- a/            (ino 259)
> # |     |---- c       (ino 264)
> # |
> # |---- b/            (ino 260)
> # |     |---- d       (ino 265)
> # |
> # |---- del/          (ino 263)
> #        |---- item1/ (ino 261)
> #        |---- item2/ (ino 262)
>
> # Send snapshot:
> #
> # |---- a/            (ino 259)
> # |---- b/            (ino 260)
> # |---- c/            (ino 264)
> # |     |---- item2/  (ino 262)
> # |
> # |---- d/            (ino 265)
> #       |---- item1/  (ino 258)
>
> (pasted here in case gmail screws up the indentation/formatting:
> https://friendpaste.com/12wzqdcfFrlDdd1AiKX0bU)
>
> thanks
>
>>
>> Thans!
>>
>> Robbie Ko
>>
>> 2015-06-05 0:14 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
>>> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote:
>>>> There's one case where we can't issue a utimes operation for a directory.
>>>> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier
>>>> processed and update uime between two parent directory.
>>>> So fix this by not update non exist utimes for this case.
>>>
>>> So you mean that we attempt to update utimes for an inode,
>>> corresponding to a directory, that exists in the parent snapshot but
>>> not in the send snapshot.
>>>
>>> So the subject should be something like "Btrfs: incremental send,
>>> don't send utimes for non-existing directory" instead of "Btrfs:
>>> incremental send, fix rmdir not send utimes"
>>>
>>>>
>>>> Example:
>>>>
>>>> Parent snapshot:
>>>> |---- a/ (ino 259)
>>>>   |---- c (ino 264)
>>>> |---- b/ (ino 260)
>>>>   |---- d (ino 265)
>>>> |---- del/ (ino 263)
>>>>   |---- item1/ (ino 261)
>>>>   |---- item2/ (ino 262)
>>>>
>>>> Send snapshot:
>>>> |---- a/ (ino 259)
>>>> |---- b/ (ino 260)
>>>> |---- c/ (ino 2)
>>>>   |---- item2 (ino 259)
>>>> |---- d/ (ino 257)
>>>>   |---- item1/ (ino 258)
>>>>
>>>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>>>> ---
>>>>  fs/btrfs/send.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>>>> index e8eb3ab..46f954c 100644
>>>> --- a/fs/btrfs/send.c
>>>> +++ b/fs/btrfs/send.c
>>>> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino);
>>>>         key.type = BTRFS_INODE_ITEM_KEY;
>>>>         key.offset = 0;
>>>>         ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0);
>>>> -       if (ret < 0)
>>>> +       if (ret != 0)
>>>>                 goto out;
>>>
>>> So I don't think this is a good fix. The problem is in some code that
>>> calls this function (send_utimes) against the directory that doesn't
>>> exist - it just shouldn't do that, its logic should be fixed.
>>> Following this approach, while it works, it's just hiding logic errors
>>> in one or more code paths, and none of its callers checks for a return
>>> value of 1 - they only react to values < 0 and introduces the
>>> possibility of propagating a return value of 1 to user space.
>>>
>>> thanks
>>>
>>>>
>>>>         eb = path->nodes[0];
>>>> --
>>>> 1.9.1
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>>
>>> --
>>> Filipe David Manana,
>>>
>>> "Reasonable men adapt themselves to the world.
>>>  Unreasonable men adapt the world to themselves.
>>>  That's why all progress depends on unreasonable men."
>
>
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] Btrfs: incremental send, fix rmdir not send utimes
  2015-06-09 10:04         ` Robbie Ko
@ 2015-06-09 10:36           ` Filipe David Manana
  2015-06-10 10:06             ` Robbie Ko
  0 siblings, 1 reply; 25+ messages in thread
From: Filipe David Manana @ 2015-06-09 10:36 UTC (permalink / raw)
  To: Robbie Ko; +Cc: linux-btrfs@vger.kernel.org

On Tue, Jun 9, 2015 at 11:04 AM, Robbie Ko <robbieko@synology.com> wrote:
> Hi Filipe,
>
> 2015-06-08 22:00 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
>> On Mon, Jun 8, 2015 at 4:44 AM, Robbie Ko <robbieko@synology.com> wrote:
>>> Hi Filipe,
>>
>> Hi Robbie,
>>
>>>
>>> I've fixed "don't send utimes for non-existing directory" with another solution.
>>>
>>>  In apply_dir_move(), the old parent dir. and new parent dir. will be
>>> updated after the current dir. has moved.
>>>
>>> And there's only one entry in old parent dir. (e.g. entry with
>>> smallest ino) will be tagged with rmdir_ino to prevent its parent dir.
>>> is deleted but updated.
>>
>> Can't parse this phrase. What do you mean by tagging an entry with rmdir_ino?
>> rmdir_ino corresponds to the number of a inode that wasn't deleted
>> when it was processed because there was some inode with a lower number
>> that is a child of the directory in the parent snapshot and had its
>> rename/move operation delayed (it happens after the directory we want
>> to delete is processed).
>>
>
> Right , my "tagged with rmdir_ino" is same meaning as you explained here.
>
>>>
>>> However, if we process rename for another entry not tagged with
>>> rmdir_ino first, its old parent dir. which is deleted  will be updated
>>> according to apply_dir_move().
>>>
>>> Therefore, I think we should check the existence of  the dir. before
>>> we're going to update it's utime.
>>>
>>> The patch is pasted in the following link, could you give me some comment?
>>>
>>> https://friendpaste.com/h8tZqOS9iAUpp2DvgGI2k
>>
>> Looks better.
>> However I still don't understand your explanation, and just tried the
>> example in your commit message:
>>
>> "Parent snapshot:
>>
>> |---- a/ (ino 259)
>>   |---- c (ino 264)
>> |---- b/ (ino 260)
>>   |---- d (ino 265)
>> |---- del/ (ino 263)
>>   |---- item1/ (ino 261)
>>   |---- item2/ (ino 262)
>>
>> Send snapshot:
>> |---- a/ (ino 259)
>> |---- b/ (ino 260)
>> |---- c/ (ino 2)
>>   |---- item2 (ino 259)
>> |---- d/ (ino 257)
>>   |---- item1/ (ino 258)"
>>
>> So it's confusing after looking at it.
>> First the send snapshot mentions inode number 2, which doesn't exist
>> in the parent snapshot - I assume you meant inode number 264.
>> Then, the send snapshot has two inodes with number 259. Is "item2" in
>> the send snapshot supposed to be inode 262?
>>
>
> Your guess is right. And I correct it as follow.
>
>  # Parent snapshot:
>  #
>  # |---- a/            (ino 259)
>  # |     |---- c       (ino 264)
>  # |
>  # |---- b/            (ino 260)
>  # |     |---- d       (ino 265)
>  # |
>  # |---- del/          (ino 263)
>  #        |---- item1/ (ino 261)
>  #        |---- item2/ (ino 262)
>
>  # Send snapshot:
>  #
>  # |---- a/            (ino 259)
>  # |---- b/            (ino 260)
>  # |---- c/            (ino 264)
>  # |     |---- item2/  (ino 262)
>  # |
>  # |---- d/            (ino 265)
>  #       |---- item1/  (ino 261)
>
>> Anyway, assuming those 2 fixes to the example are correct guesses, I
>> tried the following and it didn't fail without your patches (i.e. no
>> attempts to send utimes to a non-existing directory):
>>
>
> Here my mean is :  btrfs tries to get utime from non-existing directory and
> apply it on the existing directory. And my patch is attempted to avoid
> this case.
> However, this case is not guaranteed to cause error anytime but it may
> fails somehow
> which is depending on the data on the disk.
> The following are the incremental procedures to send the snapshot.
>
> utimes
> utimes a
> utimes b
> rename del -> o263-259-0
> utimes
> rename a/c -> c
> utimes
> utimes a
> rename o263-259-0/item2 -> c/item2
> utimes c/item2
> utimes o263-259-0          <<---------------------- this step may cause error

Why may it cause an error?
At that moment the name/path o263-259-0 exists at the destination
(i.e. the receiver, as it applies commands from the send stream
serially).

> utimes c
> utimes c
> rename b/d -> d
> utimes
> utimes b
> rename o263-259-0/item1 -> d/item1
> rmdir o263-259-0
> utimes d/item1
> utimes d
> utimes d
>
> As the above pointed procedure, o263-259-0 is not appeared in the send root.

Well yes, but that doesn't matter.
The oXXX-YYY-ZZZ names are never in any of the roots (send or parent),
they're just temporary names to allow for rename/move operations when
collisions are detected, and exist only in the receiver's filesystem.

> When utime got from o263-259-0 is invalid (i.e. out of range of time
> format), it will fail.
> I saw the error occurs at utimensat() and got EINVAL. Here's
> explanation from linux man page.
>
> EINVAL Invalid value in one of the tv_nsec fields (value outside
>               range 0 to 999,999,999, and not UTIME_NOW or UTIME_OMIT); or
>               an invalid value in one of the tv_sec fields.
>
> So if o263-259-0 is not the send root, invalid format of utime may be got.

Doubly confused now. So the whole change log (and the code changes)
mentions only attempts to send utimes for a directory/path that
doesn't exist in the receiver's fs - inode 263 exists in both the send
and parent roots, so we can always get its utimes values. Now you're
saying that somewhere in the send code we're getting incorrect values
for a utimes operation and then sending such operation to the
receiver?

Getting the values for utimes is done looking up the inode item, by
its number, in the send root - for this the current name of the inode
in the receiver fs doesn't matter - it matters only for building the
path for the utimes operation. But what I'm understanding from your
reply is that we're getting a wrong utimes value form the inode item
for inode number 263, which would be a totally different issue from
generating incorrect paths.


So this still doesn't explain me why, without any of your patches
applied, the reproducer doesn't fail - which is really an important
aspect.
This is the xfstest I made with your reproducer:
https://friendpaste.com/2WyDxPe2FtVhOfECBk1VKF (and
tests/btrfs/999.out is just contains the single line "QA output
created by 999").

If you're able to give me a correct reproducer, I can make sense of it
and help getting a better change log.
Or this issue happens only after applying some of the other patches in
the series?

thanks




>
> Thanks.
> Robbie Ko
>
>> # Parent snapshot:
>> #
>> # |---- a/            (ino 259)
>> # |     |---- c       (ino 264)
>> # |
>> # |---- b/            (ino 260)
>> # |     |---- d       (ino 265)
>> # |
>> # |---- del/          (ino 263)
>> #        |---- item1/ (ino 261)
>> #        |---- item2/ (ino 262)
>>
>> # Send snapshot:
>> #
>> # |---- a/            (ino 259)
>> # |---- b/            (ino 260)
>> # |---- c/            (ino 264)
>> # |     |---- item2/  (ino 262)
>> # |
>> # |---- d/            (ino 265)
>> #       |---- item1/  (ino 258)
>>
>> mkdir $SCRATCH_MNT/0
>> mkdir $SCRATCH_MNT/1
>>
>> mkdir $SCRATCH_MNT/a # 259
>> mkdir $SCRATCH_MNT/b # 260
>> mkdir $SCRATCH_MNT/item1 # 261
>> mkdir $SCRATCH_MNT/item2 # 262
>> mkdir $SCRATCH_MNT/del # 263
>> mv $SCRATCH_MNT/item1 $SCRATCH_MNT/del/item1
>> mv $SCRATCH_MNT/item2 $SCRATCH_MNT/del/item2
>> mkdir $SCRATCH_MNT/a/c # 264
>> mkdir $SCRATCH_MNT/b/d # 265
>>
>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
>>
>> mv $SCRATCH_MNT/a/c $SCRATCH_MNT/c
>> mv $SCRATCH_MNT/b/d $SCRATCH_MNT/d
>> mv $SCRATCH_MNT/del/item2 $SCRATCH_MNT/c
>> mv $SCRATCH_MNT/del/item1 $SCRATCH_MNT/d
>> rmdir $SCRATCH_MNT/del
>>
>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
>>
>> run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
>> run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
>> $SCRATCH_MNT/mysnap2
>>
>> _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
>> _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
>> -f $tmp/2.snap
>>
>> _check_scratch_fs
>>
>> _scratch_unmount
>> _scratch_mkfs >/dev/null 2>&1
>> _scratch_mount
>>
>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/1.snap
>> run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1
>>
>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/2.snap
>> run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2
>>
>>
>>
>> I would suggest making those hiearachy diagrams more readable - pipes
>> right below the name of their parent, continuation pipes like and
>> align all inode numbers in the same column, like the following:
>>
>> # Parent snapshot:
>> #
>> # |---- a/            (ino 259)
>> # |     |---- c       (ino 264)
>> # |
>> # |---- b/            (ino 260)
>> # |     |---- d       (ino 265)
>> # |
>> # |---- del/          (ino 263)
>> #        |---- item1/ (ino 261)
>> #        |---- item2/ (ino 262)
>>
>> # Send snapshot:
>> #
>> # |---- a/            (ino 259)
>> # |---- b/            (ino 260)
>> # |---- c/            (ino 264)
>> # |     |---- item2/  (ino 262)
>> # |
>> # |---- d/            (ino 265)
>> #       |---- item1/  (ino 258)
>>
>> (pasted here in case gmail screws up the indentation/formatting:
>> https://friendpaste.com/12wzqdcfFrlDdd1AiKX0bU)
>>
>> thanks
>>
>>>
>>> Thans!
>>>
>>> Robbie Ko
>>>
>>> 2015-06-05 0:14 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
>>>> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote:
>>>>> There's one case where we can't issue a utimes operation for a directory.
>>>>> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier
>>>>> processed and update uime between two parent directory.
>>>>> So fix this by not update non exist utimes for this case.
>>>>
>>>> So you mean that we attempt to update utimes for an inode,
>>>> corresponding to a directory, that exists in the parent snapshot but
>>>> not in the send snapshot.
>>>>
>>>> So the subject should be something like "Btrfs: incremental send,
>>>> don't send utimes for non-existing directory" instead of "Btrfs:
>>>> incremental send, fix rmdir not send utimes"
>>>>
>>>>>
>>>>> Example:
>>>>>
>>>>> Parent snapshot:
>>>>> |---- a/ (ino 259)
>>>>>   |---- c (ino 264)
>>>>> |---- b/ (ino 260)
>>>>>   |---- d (ino 265)
>>>>> |---- del/ (ino 263)
>>>>>   |---- item1/ (ino 261)
>>>>>   |---- item2/ (ino 262)
>>>>>
>>>>> Send snapshot:
>>>>> |---- a/ (ino 259)
>>>>> |---- b/ (ino 260)
>>>>> |---- c/ (ino 2)
>>>>>   |---- item2 (ino 259)
>>>>> |---- d/ (ino 257)
>>>>>   |---- item1/ (ino 258)
>>>>>
>>>>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>>>>> ---
>>>>>  fs/btrfs/send.c | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>>>>> index e8eb3ab..46f954c 100644
>>>>> --- a/fs/btrfs/send.c
>>>>> +++ b/fs/btrfs/send.c
>>>>> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino);
>>>>>         key.type = BTRFS_INODE_ITEM_KEY;
>>>>>         key.offset = 0;
>>>>>         ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0);
>>>>> -       if (ret < 0)
>>>>> +       if (ret != 0)
>>>>>                 goto out;
>>>>
>>>> So I don't think this is a good fix. The problem is in some code that
>>>> calls this function (send_utimes) against the directory that doesn't
>>>> exist - it just shouldn't do that, its logic should be fixed.
>>>> Following this approach, while it works, it's just hiding logic errors
>>>> in one or more code paths, and none of its callers checks for a return
>>>> value of 1 - they only react to values < 0 and introduces the
>>>> possibility of propagating a return value of 1 to user space.
>>>>
>>>> thanks
>>>>
>>>>>
>>>>>         eb = path->nodes[0];
>>>>> --
>>>>> 1.9.1
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>>>>
>>>> --
>>>> Filipe David Manana,
>>>>
>>>> "Reasonable men adapt themselves to the world.
>>>>  Unreasonable men adapt the world to themselves.
>>>>  That's why all progress depends on unreasonable men."
>>
>>
>>
>> --
>> Filipe David Manana,
>>
>> "Reasonable men adapt themselves to the world.
>>  Unreasonable men adapt the world to themselves.
>>  That's why all progress depends on unreasonable men."



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] Btrfs: incremental send, fix rmdir not send utimes
  2015-06-09 10:36           ` Filipe David Manana
@ 2015-06-10 10:06             ` Robbie Ko
  2015-06-18  3:21               ` Robbie Ko
  0 siblings, 1 reply; 25+ messages in thread
From: Robbie Ko @ 2015-06-10 10:06 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs@vger.kernel.org

Hi Filipi,

2015-06-09 18:36 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
> On Tue, Jun 9, 2015 at 11:04 AM, Robbie Ko <robbieko@synology.com> wrote:
>> Hi Filipe,
>>
>> 2015-06-08 22:00 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
>>> On Mon, Jun 8, 2015 at 4:44 AM, Robbie Ko <robbieko@synology.com> wrote:
>>>> Hi Filipe,
>>>
>>> Hi Robbie,
>>>
>>>>
>>>> I've fixed "don't send utimes for non-existing directory" with another solution.
>>>>
>>>>  In apply_dir_move(), the old parent dir. and new parent dir. will be
>>>> updated after the current dir. has moved.
>>>>
>>>> And there's only one entry in old parent dir. (e.g. entry with
>>>> smallest ino) will be tagged with rmdir_ino to prevent its parent dir.
>>>> is deleted but updated.
>>>
>>> Can't parse this phrase. What do you mean by tagging an entry with rmdir_ino?
>>> rmdir_ino corresponds to the number of a inode that wasn't deleted
>>> when it was processed because there was some inode with a lower number
>>> that is a child of the directory in the parent snapshot and had its
>>> rename/move operation delayed (it happens after the directory we want
>>> to delete is processed).
>>>
>>
>> Right , my "tagged with rmdir_ino" is same meaning as you explained here.
>>
>>>>
>>>> However, if we process rename for another entry not tagged with
>>>> rmdir_ino first, its old parent dir. which is deleted  will be updated
>>>> according to apply_dir_move().
>>>>
>>>> Therefore, I think we should check the existence of  the dir. before
>>>> we're going to update it's utime.
>>>>
>>>> The patch is pasted in the following link, could you give me some comment?
>>>>
>>>> https://friendpaste.com/h8tZqOS9iAUpp2DvgGI2k
>>>
>>> Looks better.
>>> However I still don't understand your explanation, and just tried the
>>> example in your commit message:
>>>
>>> "Parent snapshot:
>>>
>>> |---- a/ (ino 259)
>>>   |---- c (ino 264)
>>> |---- b/ (ino 260)
>>>   |---- d (ino 265)
>>> |---- del/ (ino 263)
>>>   |---- item1/ (ino 261)
>>>   |---- item2/ (ino 262)
>>>
>>> Send snapshot:
>>> |---- a/ (ino 259)
>>> |---- b/ (ino 260)
>>> |---- c/ (ino 2)
>>>   |---- item2 (ino 259)
>>> |---- d/ (ino 257)
>>>   |---- item1/ (ino 258)"
>>>
>>> So it's confusing after looking at it.
>>> First the send snapshot mentions inode number 2, which doesn't exist
>>> in the parent snapshot - I assume you meant inode number 264.
>>> Then, the send snapshot has two inodes with number 259. Is "item2" in
>>> the send snapshot supposed to be inode 262?
>>>
>>
>> Your guess is right. And I correct it as follow.
>>
>>  # Parent snapshot:
>>  #
>>  # |---- a/            (ino 259)
>>  # |     |---- c       (ino 264)
>>  # |
>>  # |---- b/            (ino 260)
>>  # |     |---- d       (ino 265)
>>  # |
>>  # |---- del/          (ino 263)
>>  #        |---- item1/ (ino 261)
>>  #        |---- item2/ (ino 262)
>>
>>  # Send snapshot:
>>  #
>>  # |---- a/            (ino 259)
>>  # |---- b/            (ino 260)
>>  # |---- c/            (ino 264)
>>  # |     |---- item2/  (ino 262)
>>  # |
>>  # |---- d/            (ino 265)
>>  #       |---- item1/  (ino 261)
>>
>>> Anyway, assuming those 2 fixes to the example are correct guesses, I
>>> tried the following and it didn't fail without your patches (i.e. no
>>> attempts to send utimes to a non-existing directory):
>>>
>>
>> Here my mean is :  btrfs tries to get utime from non-existing directory and
>> apply it on the existing directory. And my patch is attempted to avoid
>> this case.
>> However, this case is not guaranteed to cause error anytime but it may
>> fails somehow
>> which is depending on the data on the disk.
>> The following are the incremental procedures to send the snapshot.
>>
>> utimes
>> utimes a
>> utimes b
>> rename del -> o263-259-0
>> utimes
>> rename a/c -> c
>> utimes
>> utimes a
>> rename o263-259-0/item2 -> c/item2
>> utimes c/item2
>> utimes o263-259-0          <<---------------------- this step may cause error
>
> Why may it cause an error?
> At that moment the name/path o263-259-0 exists at the destination
> (i.e. the receiver, as it applies commands from the send stream
> serially).

Following is the error message when do receive.
ERROR: utimes o263-259-0 failed. Invalid argument

The argument of utimes for o263-259-0 is got from inode 263 in send
root, But inode 263 is not exist in send root
In send_utimes(), we didn't check if btrfs_search_slot returns 1,
therefore may encounters this problem.

I will try to make a stable reproducer.

Thanks.
Robbie Ko

>
>> utimes c
>> utimes c
>> rename b/d -> d
>> utimes
>> utimes b
>> rename o263-259-0/item1 -> d/item1
>> rmdir o263-259-0
>> utimes d/item1
>> utimes d
>> utimes d
>>
>> As the above pointed procedure, o263-259-0 is not appeared in the send root.
>
> Well yes, but that doesn't matter.
> The oXXX-YYY-ZZZ names are never in any of the roots (send or parent),
> they're just temporary names to allow for rename/move operations when
> collisions are detected, and exist only in the receiver's filesystem.
>
>> When utime got from o263-259-0 is invalid (i.e. out of range of time
>> format), it will fail.
>> I saw the error occurs at utimensat() and got EINVAL. Here's
>> explanation from linux man page.
>>
>> EINVAL Invalid value in one of the tv_nsec fields (value outside
>>               range 0 to 999,999,999, and not UTIME_NOW or UTIME_OMIT); or
>>               an invalid value in one of the tv_sec fields.
>>
>> So if o263-259-0 is not the send root, invalid format of utime may be got.
>
> Doubly confused now. So the whole change log (and the code changes)
> mentions only attempts to send utimes for a directory/path that
> doesn't exist in the receiver's fs - inode 263 exists in both the send
> and parent roots, so we can always get its utimes values. Now you're
> saying that somewhere in the send code we're getting incorrect values
> for a utimes operation and then sending such operation to the
> receiver?
>
> Getting the values for utimes is done looking up the inode item, by
> its number, in the send root - for this the current name of the inode
> in the receiver fs doesn't matter - it matters only for building the
> path for the utimes operation. But what I'm understanding from your
> reply is that we're getting a wrong utimes value form the inode item
> for inode number 263, which would be a totally different issue from
> generating incorrect paths.
>
>
> So this still doesn't explain me why, without any of your patches
> applied, the reproducer doesn't fail - which is really an important
> aspect.
> This is the xfstest I made with your reproducer:
> https://friendpaste.com/2WyDxPe2FtVhOfECBk1VKF (and
> tests/btrfs/999.out is just contains the single line "QA output
> created by 999").
>
> If you're able to give me a correct reproducer, I can make sense of it
> and help getting a better change log.
> Or this issue happens only after applying some of the other patches in
> the series?
>
> thanks
>
>
>
>
>>
>> Thanks.
>> Robbie Ko
>>
>>> # Parent snapshot:
>>> #
>>> # |---- a/            (ino 259)
>>> # |     |---- c       (ino 264)
>>> # |
>>> # |---- b/            (ino 260)
>>> # |     |---- d       (ino 265)
>>> # |
>>> # |---- del/          (ino 263)
>>> #        |---- item1/ (ino 261)
>>> #        |---- item2/ (ino 262)
>>>
>>> # Send snapshot:
>>> #
>>> # |---- a/            (ino 259)
>>> # |---- b/            (ino 260)
>>> # |---- c/            (ino 264)
>>> # |     |---- item2/  (ino 262)
>>> # |
>>> # |---- d/            (ino 265)
>>> #       |---- item1/  (ino 258)
>>>
>>> mkdir $SCRATCH_MNT/0
>>> mkdir $SCRATCH_MNT/1
>>>
>>> mkdir $SCRATCH_MNT/a # 259
>>> mkdir $SCRATCH_MNT/b # 260
>>> mkdir $SCRATCH_MNT/item1 # 261
>>> mkdir $SCRATCH_MNT/item2 # 262
>>> mkdir $SCRATCH_MNT/del # 263
>>> mv $SCRATCH_MNT/item1 $SCRATCH_MNT/del/item1
>>> mv $SCRATCH_MNT/item2 $SCRATCH_MNT/del/item2
>>> mkdir $SCRATCH_MNT/a/c # 264
>>> mkdir $SCRATCH_MNT/b/d # 265
>>>
>>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
>>>
>>> mv $SCRATCH_MNT/a/c $SCRATCH_MNT/c
>>> mv $SCRATCH_MNT/b/d $SCRATCH_MNT/d
>>> mv $SCRATCH_MNT/del/item2 $SCRATCH_MNT/c
>>> mv $SCRATCH_MNT/del/item1 $SCRATCH_MNT/d
>>> rmdir $SCRATCH_MNT/del
>>>
>>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
>>>
>>> run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
>>> run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
>>> $SCRATCH_MNT/mysnap2
>>>
>>> _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
>>> _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
>>> -f $tmp/2.snap
>>>
>>> _check_scratch_fs
>>>
>>> _scratch_unmount
>>> _scratch_mkfs >/dev/null 2>&1
>>> _scratch_mount
>>>
>>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/1.snap
>>> run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1
>>>
>>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/2.snap
>>> run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2
>>>
>>>
>>>
>>> I would suggest making those hiearachy diagrams more readable - pipes
>>> right below the name of their parent, continuation pipes like and
>>> align all inode numbers in the same column, like the following:
>>>
>>> # Parent snapshot:
>>> #
>>> # |---- a/            (ino 259)
>>> # |     |---- c       (ino 264)
>>> # |
>>> # |---- b/            (ino 260)
>>> # |     |---- d       (ino 265)
>>> # |
>>> # |---- del/          (ino 263)
>>> #        |---- item1/ (ino 261)
>>> #        |---- item2/ (ino 262)
>>>
>>> # Send snapshot:
>>> #
>>> # |---- a/            (ino 259)
>>> # |---- b/            (ino 260)
>>> # |---- c/            (ino 264)
>>> # |     |---- item2/  (ino 262)
>>> # |
>>> # |---- d/            (ino 265)
>>> #       |---- item1/  (ino 258)
>>>
>>> (pasted here in case gmail screws up the indentation/formatting:
>>> https://friendpaste.com/12wzqdcfFrlDdd1AiKX0bU)
>>>
>>> thanks
>>>
>>>>
>>>> Thans!
>>>>
>>>> Robbie Ko
>>>>
>>>> 2015-06-05 0:14 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
>>>>> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote:
>>>>>> There's one case where we can't issue a utimes operation for a directory.
>>>>>> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier
>>>>>> processed and update uime between two parent directory.
>>>>>> So fix this by not update non exist utimes for this case.
>>>>>
>>>>> So you mean that we attempt to update utimes for an inode,
>>>>> corresponding to a directory, that exists in the parent snapshot but
>>>>> not in the send snapshot.
>>>>>
>>>>> So the subject should be something like "Btrfs: incremental send,
>>>>> don't send utimes for non-existing directory" instead of "Btrfs:
>>>>> incremental send, fix rmdir not send utimes"
>>>>>
>>>>>>
>>>>>> Example:
>>>>>>
>>>>>> Parent snapshot:
>>>>>> |---- a/ (ino 259)
>>>>>>   |---- c (ino 264)
>>>>>> |---- b/ (ino 260)
>>>>>>   |---- d (ino 265)
>>>>>> |---- del/ (ino 263)
>>>>>>   |---- item1/ (ino 261)
>>>>>>   |---- item2/ (ino 262)
>>>>>>
>>>>>> Send snapshot:
>>>>>> |---- a/ (ino 259)
>>>>>> |---- b/ (ino 260)
>>>>>> |---- c/ (ino 2)
>>>>>>   |---- item2 (ino 259)
>>>>>> |---- d/ (ino 257)
>>>>>>   |---- item1/ (ino 258)
>>>>>>
>>>>>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>>>>>> ---
>>>>>>  fs/btrfs/send.c | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>>>>>> index e8eb3ab..46f954c 100644
>>>>>> --- a/fs/btrfs/send.c
>>>>>> +++ b/fs/btrfs/send.c
>>>>>> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino);
>>>>>>         key.type = BTRFS_INODE_ITEM_KEY;
>>>>>>         key.offset = 0;
>>>>>>         ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0);
>>>>>> -       if (ret < 0)
>>>>>> +       if (ret != 0)
>>>>>>                 goto out;
>>>>>
>>>>> So I don't think this is a good fix. The problem is in some code that
>>>>> calls this function (send_utimes) against the directory that doesn't
>>>>> exist - it just shouldn't do that, its logic should be fixed.
>>>>> Following this approach, while it works, it's just hiding logic errors
>>>>> in one or more code paths, and none of its callers checks for a return
>>>>> value of 1 - they only react to values < 0 and introduces the
>>>>> possibility of propagating a return value of 1 to user space.
>>>>>
>>>>> thanks
>>>>>
>>>>>>
>>>>>>         eb = path->nodes[0];
>>>>>> --
>>>>>> 1.9.1
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Filipe David Manana,
>>>>>
>>>>> "Reasonable men adapt themselves to the world.
>>>>>  Unreasonable men adapt the world to themselves.
>>>>>  That's why all progress depends on unreasonable men."
>>>
>>>
>>>
>>> --
>>> Filipe David Manana,
>>>
>>> "Reasonable men adapt themselves to the world.
>>>  Unreasonable men adapt the world to themselves.
>>>  That's why all progress depends on unreasonable men."
>
>
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] Btrfs: incremental send, fix rmdir not send utimes
  2015-06-10 10:06             ` Robbie Ko
@ 2015-06-18  3:21               ` Robbie Ko
  2015-06-18 18:11                 ` Filipe David Manana
  0 siblings, 1 reply; 25+ messages in thread
From: Robbie Ko @ 2015-06-18  3:21 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs@vger.kernel.org

Hi Filipe,

I've found that the following case is the main cause of such error
and it's fs tree is shown via btrfs-debug-tress as below.

file tree key (459 ROOT_ITEM 20487)
node 132988928 level 1 items 3 free 490 generation 20487 owner 459
fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
        key (256 INODE_ITEM 0) block 132710400 (8100) gen 20486
        key (264 INODE_ITEM 0) block 130695168 (7977) gen 20480
        key (266 XATTR_ITEM 952319794) block 126042112 (7693) gen 20464
leaf 132710400 items 166 free space 3639 generation 20486 owner 455
fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
        item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
                inode generation 20425 transid 20442 size 32 block
group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0
        item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
                inode ref index 0 namelen 2 name: ..
...
        item 165 key (262 XATTR_ITEM 1100961104) itemoff 7789 itemsize 39
                location key (0 UNKNOWN.0 0) type XATTR
                namelen 8 datalen 1 name: user.a78
                data a
                binary 61
leaf 130695168 items 133 free space 7332 generation 20480 owner 455
fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
        item 0 key (264 INODE_ITEM 0) itemoff 16123 itemsize 160
                inode generation 20428 transid 20434 size 10 block
group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0
        item 1 key (264 INODE_REF 256) itemoff 16112 itemsize 11
                inode ref index 11 namelen 1 name: c
...

We can see that inode 262 is right at the end of leaf. Then send_utime() will
use btrfs_search_slot() to find a appropriate place to put 262 where is at the
back of 262. However, that place is uninitialized on disk. Suppose we read
atime tv_sec:576469548413222912, tv_nsec:1919251317 and then send it out.
Receiving side will  got EINVAL since tv_nsec:1919251317 is greater
than   999,999,999.

Thanks.
Robbie Ko

2015-06-10 18:06 GMT+08:00 Robbie Ko <robbieko@synology.com>:
> Hi Filipi,
>
> 2015-06-09 18:36 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
>> On Tue, Jun 9, 2015 at 11:04 AM, Robbie Ko <robbieko@synology.com> wrote:
>>> Hi Filipe,
>>>
>>> 2015-06-08 22:00 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
>>>> On Mon, Jun 8, 2015 at 4:44 AM, Robbie Ko <robbieko@synology.com> wrote:
>>>>> Hi Filipe,
>>>>
>>>> Hi Robbie,
>>>>
>>>>>
>>>>> I've fixed "don't send utimes for non-existing directory" with another solution.
>>>>>
>>>>>  In apply_dir_move(), the old parent dir. and new parent dir. will be
>>>>> updated after the current dir. has moved.
>>>>>
>>>>> And there's only one entry in old parent dir. (e.g. entry with
>>>>> smallest ino) will be tagged with rmdir_ino to prevent its parent dir.
>>>>> is deleted but updated.
>>>>
>>>> Can't parse this phrase. What do you mean by tagging an entry with rmdir_ino?
>>>> rmdir_ino corresponds to the number of a inode that wasn't deleted
>>>> when it was processed because there was some inode with a lower number
>>>> that is a child of the directory in the parent snapshot and had its
>>>> rename/move operation delayed (it happens after the directory we want
>>>> to delete is processed).
>>>>
>>>
>>> Right , my "tagged with rmdir_ino" is same meaning as you explained here.
>>>
>>>>>
>>>>> However, if we process rename for another entry not tagged with
>>>>> rmdir_ino first, its old parent dir. which is deleted  will be updated
>>>>> according to apply_dir_move().
>>>>>
>>>>> Therefore, I think we should check the existence of  the dir. before
>>>>> we're going to update it's utime.
>>>>>
>>>>> The patch is pasted in the following link, could you give me some comment?
>>>>>
>>>>> https://friendpaste.com/h8tZqOS9iAUpp2DvgGI2k
>>>>
>>>> Looks better.
>>>> However I still don't understand your explanation, and just tried the
>>>> example in your commit message:
>>>>
>>>> "Parent snapshot:
>>>>
>>>> |---- a/ (ino 259)
>>>>   |---- c (ino 264)
>>>> |---- b/ (ino 260)
>>>>   |---- d (ino 265)
>>>> |---- del/ (ino 263)
>>>>   |---- item1/ (ino 261)
>>>>   |---- item2/ (ino 262)
>>>>
>>>> Send snapshot:
>>>> |---- a/ (ino 259)
>>>> |---- b/ (ino 260)
>>>> |---- c/ (ino 2)
>>>>   |---- item2 (ino 259)
>>>> |---- d/ (ino 257)
>>>>   |---- item1/ (ino 258)"
>>>>
>>>> So it's confusing after looking at it.
>>>> First the send snapshot mentions inode number 2, which doesn't exist
>>>> in the parent snapshot - I assume you meant inode number 264.
>>>> Then, the send snapshot has two inodes with number 259. Is "item2" in
>>>> the send snapshot supposed to be inode 262?
>>>>
>>>
>>> Your guess is right. And I correct it as follow.
>>>
>>>  # Parent snapshot:
>>>  #
>>>  # |---- a/            (ino 259)
>>>  # |     |---- c       (ino 264)
>>>  # |
>>>  # |---- b/            (ino 260)
>>>  # |     |---- d       (ino 265)
>>>  # |
>>>  # |---- del/          (ino 263)
>>>  #        |---- item1/ (ino 261)
>>>  #        |---- item2/ (ino 262)
>>>
>>>  # Send snapshot:
>>>  #
>>>  # |---- a/            (ino 259)
>>>  # |---- b/            (ino 260)
>>>  # |---- c/            (ino 264)
>>>  # |     |---- item2/  (ino 262)
>>>  # |
>>>  # |---- d/            (ino 265)
>>>  #       |---- item1/  (ino 261)
>>>
>>>> Anyway, assuming those 2 fixes to the example are correct guesses, I
>>>> tried the following and it didn't fail without your patches (i.e. no
>>>> attempts to send utimes to a non-existing directory):
>>>>
>>>
>>> Here my mean is :  btrfs tries to get utime from non-existing directory and
>>> apply it on the existing directory. And my patch is attempted to avoid
>>> this case.
>>> However, this case is not guaranteed to cause error anytime but it may
>>> fails somehow
>>> which is depending on the data on the disk.
>>> The following are the incremental procedures to send the snapshot.
>>>
>>> utimes
>>> utimes a
>>> utimes b
>>> rename del -> o263-259-0
>>> utimes
>>> rename a/c -> c
>>> utimes
>>> utimes a
>>> rename o263-259-0/item2 -> c/item2
>>> utimes c/item2
>>> utimes o263-259-0          <<---------------------- this step may cause error
>>
>> Why may it cause an error?
>> At that moment the name/path o263-259-0 exists at the destination
>> (i.e. the receiver, as it applies commands from the send stream
>> serially).
>
> Following is the error message when do receive.
> ERROR: utimes o263-259-0 failed. Invalid argument
>
> The argument of utimes for o263-259-0 is got from inode 263 in send
> root, But inode 263 is not exist in send root
> In send_utimes(), we didn't check if btrfs_search_slot returns 1,
> therefore may encounters this problem.
>
> I will try to make a stable reproducer.
>
> Thanks.
> Robbie Ko
>
>>
>>> utimes c
>>> utimes c
>>> rename b/d -> d
>>> utimes
>>> utimes b
>>> rename o263-259-0/item1 -> d/item1
>>> rmdir o263-259-0
>>> utimes d/item1
>>> utimes d
>>> utimes d
>>>
>>> As the above pointed procedure, o263-259-0 is not appeared in the send root.
>>
>> Well yes, but that doesn't matter.
>> The oXXX-YYY-ZZZ names are never in any of the roots (send or parent),
>> they're just temporary names to allow for rename/move operations when
>> collisions are detected, and exist only in the receiver's filesystem.
>>
>>> When utime got from o263-259-0 is invalid (i.e. out of range of time
>>> format), it will fail.
>>> I saw the error occurs at utimensat() and got EINVAL. Here's
>>> explanation from linux man page.
>>>
>>> EINVAL Invalid value in one of the tv_nsec fields (value outside
>>>               range 0 to 999,999,999, and not UTIME_NOW or UTIME_OMIT); or
>>>               an invalid value in one of the tv_sec fields.
>>>
>>> So if o263-259-0 is not the send root, invalid format of utime may be got.
>>
>> Doubly confused now. So the whole change log (and the code changes)
>> mentions only attempts to send utimes for a directory/path that
>> doesn't exist in the receiver's fs - inode 263 exists in both the send
>> and parent roots, so we can always get its utimes values. Now you're
>> saying that somewhere in the send code we're getting incorrect values
>> for a utimes operation and then sending such operation to the
>> receiver?
>>
>> Getting the values for utimes is done looking up the inode item, by
>> its number, in the send root - for this the current name of the inode
>> in the receiver fs doesn't matter - it matters only for building the
>> path for the utimes operation. But what I'm understanding from your
>> reply is that we're getting a wrong utimes value form the inode item
>> for inode number 263, which would be a totally different issue from
>> generating incorrect paths.
>>
>>
>> So this still doesn't explain me why, without any of your patches
>> applied, the reproducer doesn't fail - which is really an important
>> aspect.
>> This is the xfstest I made with your reproducer:
>> https://friendpaste.com/2WyDxPe2FtVhOfECBk1VKF (and
>> tests/btrfs/999.out is just contains the single line "QA output
>> created by 999").
>>
>> If you're able to give me a correct reproducer, I can make sense of it
>> and help getting a better change log.
>> Or this issue happens only after applying some of the other patches in
>> the series?
>>
>> thanks
>>
>>
>>
>>
>>>
>>> Thanks.
>>> Robbie Ko
>>>
>>>> # Parent snapshot:
>>>> #
>>>> # |---- a/            (ino 259)
>>>> # |     |---- c       (ino 264)
>>>> # |
>>>> # |---- b/            (ino 260)
>>>> # |     |---- d       (ino 265)
>>>> # |
>>>> # |---- del/          (ino 263)
>>>> #        |---- item1/ (ino 261)
>>>> #        |---- item2/ (ino 262)
>>>>
>>>> # Send snapshot:
>>>> #
>>>> # |---- a/            (ino 259)
>>>> # |---- b/            (ino 260)
>>>> # |---- c/            (ino 264)
>>>> # |     |---- item2/  (ino 262)
>>>> # |
>>>> # |---- d/            (ino 265)
>>>> #       |---- item1/  (ino 258)
>>>>
>>>> mkdir $SCRATCH_MNT/0
>>>> mkdir $SCRATCH_MNT/1
>>>>
>>>> mkdir $SCRATCH_MNT/a # 259
>>>> mkdir $SCRATCH_MNT/b # 260
>>>> mkdir $SCRATCH_MNT/item1 # 261
>>>> mkdir $SCRATCH_MNT/item2 # 262
>>>> mkdir $SCRATCH_MNT/del # 263
>>>> mv $SCRATCH_MNT/item1 $SCRATCH_MNT/del/item1
>>>> mv $SCRATCH_MNT/item2 $SCRATCH_MNT/del/item2
>>>> mkdir $SCRATCH_MNT/a/c # 264
>>>> mkdir $SCRATCH_MNT/b/d # 265
>>>>
>>>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
>>>>
>>>> mv $SCRATCH_MNT/a/c $SCRATCH_MNT/c
>>>> mv $SCRATCH_MNT/b/d $SCRATCH_MNT/d
>>>> mv $SCRATCH_MNT/del/item2 $SCRATCH_MNT/c
>>>> mv $SCRATCH_MNT/del/item1 $SCRATCH_MNT/d
>>>> rmdir $SCRATCH_MNT/del
>>>>
>>>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
>>>>
>>>> run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
>>>> run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
>>>> $SCRATCH_MNT/mysnap2
>>>>
>>>> _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
>>>> _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
>>>> -f $tmp/2.snap
>>>>
>>>> _check_scratch_fs
>>>>
>>>> _scratch_unmount
>>>> _scratch_mkfs >/dev/null 2>&1
>>>> _scratch_mount
>>>>
>>>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/1.snap
>>>> run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1
>>>>
>>>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/2.snap
>>>> run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2
>>>>
>>>>
>>>>
>>>> I would suggest making those hiearachy diagrams more readable - pipes
>>>> right below the name of their parent, continuation pipes like and
>>>> align all inode numbers in the same column, like the following:
>>>>
>>>> # Parent snapshot:
>>>> #
>>>> # |---- a/            (ino 259)
>>>> # |     |---- c       (ino 264)
>>>> # |
>>>> # |---- b/            (ino 260)
>>>> # |     |---- d       (ino 265)
>>>> # |
>>>> # |---- del/          (ino 263)
>>>> #        |---- item1/ (ino 261)
>>>> #        |---- item2/ (ino 262)
>>>>
>>>> # Send snapshot:
>>>> #
>>>> # |---- a/            (ino 259)
>>>> # |---- b/            (ino 260)
>>>> # |---- c/            (ino 264)
>>>> # |     |---- item2/  (ino 262)
>>>> # |
>>>> # |---- d/            (ino 265)
>>>> #       |---- item1/  (ino 258)
>>>>
>>>> (pasted here in case gmail screws up the indentation/formatting:
>>>> https://friendpaste.com/12wzqdcfFrlDdd1AiKX0bU)
>>>>
>>>> thanks
>>>>
>>>>>
>>>>> Thans!
>>>>>
>>>>> Robbie Ko
>>>>>
>>>>> 2015-06-05 0:14 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
>>>>>> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote:
>>>>>>> There's one case where we can't issue a utimes operation for a directory.
>>>>>>> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier
>>>>>>> processed and update uime between two parent directory.
>>>>>>> So fix this by not update non exist utimes for this case.
>>>>>>
>>>>>> So you mean that we attempt to update utimes for an inode,
>>>>>> corresponding to a directory, that exists in the parent snapshot but
>>>>>> not in the send snapshot.
>>>>>>
>>>>>> So the subject should be something like "Btrfs: incremental send,
>>>>>> don't send utimes for non-existing directory" instead of "Btrfs:
>>>>>> incremental send, fix rmdir not send utimes"
>>>>>>
>>>>>>>
>>>>>>> Example:
>>>>>>>
>>>>>>> Parent snapshot:
>>>>>>> |---- a/ (ino 259)
>>>>>>>   |---- c (ino 264)
>>>>>>> |---- b/ (ino 260)
>>>>>>>   |---- d (ino 265)
>>>>>>> |---- del/ (ino 263)
>>>>>>>   |---- item1/ (ino 261)
>>>>>>>   |---- item2/ (ino 262)
>>>>>>>
>>>>>>> Send snapshot:
>>>>>>> |---- a/ (ino 259)
>>>>>>> |---- b/ (ino 260)
>>>>>>> |---- c/ (ino 2)
>>>>>>>   |---- item2 (ino 259)
>>>>>>> |---- d/ (ino 257)
>>>>>>>   |---- item1/ (ino 258)
>>>>>>>
>>>>>>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>>>>>>> ---
>>>>>>>  fs/btrfs/send.c | 2 +-
>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>>>>>>> index e8eb3ab..46f954c 100644
>>>>>>> --- a/fs/btrfs/send.c
>>>>>>> +++ b/fs/btrfs/send.c
>>>>>>> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino);
>>>>>>>         key.type = BTRFS_INODE_ITEM_KEY;
>>>>>>>         key.offset = 0;
>>>>>>>         ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0);
>>>>>>> -       if (ret < 0)
>>>>>>> +       if (ret != 0)
>>>>>>>                 goto out;
>>>>>>
>>>>>> So I don't think this is a good fix. The problem is in some code that
>>>>>> calls this function (send_utimes) against the directory that doesn't
>>>>>> exist - it just shouldn't do that, its logic should be fixed.
>>>>>> Following this approach, while it works, it's just hiding logic errors
>>>>>> in one or more code paths, and none of its callers checks for a return
>>>>>> value of 1 - they only react to values < 0 and introduces the
>>>>>> possibility of propagating a return value of 1 to user space.
>>>>>>
>>>>>> thanks
>>>>>>
>>>>>>>
>>>>>>>         eb = path->nodes[0];
>>>>>>> --
>>>>>>> 1.9.1
>>>>>>>
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Filipe David Manana,
>>>>>>
>>>>>> "Reasonable men adapt themselves to the world.
>>>>>>  Unreasonable men adapt the world to themselves.
>>>>>>  That's why all progress depends on unreasonable men."
>>>>
>>>>
>>>>
>>>> --
>>>> Filipe David Manana,
>>>>
>>>> "Reasonable men adapt themselves to the world.
>>>>  Unreasonable men adapt the world to themselves.
>>>>  That's why all progress depends on unreasonable men."
>>
>>
>>
>> --
>> Filipe David Manana,
>>
>> "Reasonable men adapt themselves to the world.
>>  Unreasonable men adapt the world to themselves.
>>  That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] Btrfs: incremental send, fix rmdir not send utimes
  2015-06-18  3:21               ` Robbie Ko
@ 2015-06-18 18:11                 ` Filipe David Manana
  2015-06-22  5:35                   ` Robbie Ko
  0 siblings, 1 reply; 25+ messages in thread
From: Filipe David Manana @ 2015-06-18 18:11 UTC (permalink / raw)
  To: Robbie Ko; +Cc: linux-btrfs@vger.kernel.org

On Thu, Jun 18, 2015 at 4:21 AM, Robbie Ko <robbieko@synology.com> wrote:
> Hi Filipe,
>
> I've found that the following case is the main cause of such error
> and it's fs tree is shown via btrfs-debug-tress as below.
>
> file tree key (459 ROOT_ITEM 20487)
> node 132988928 level 1 items 3 free 490 generation 20487 owner 459
> fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
> chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
>         key (256 INODE_ITEM 0) block 132710400 (8100) gen 20486
>         key (264 INODE_ITEM 0) block 130695168 (7977) gen 20480
>         key (266 XATTR_ITEM 952319794) block 126042112 (7693) gen 20464
> leaf 132710400 items 166 free space 3639 generation 20486 owner 455
> fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
> chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
>         item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
>                 inode generation 20425 transid 20442 size 32 block
> group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0
>         item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
>                 inode ref index 0 namelen 2 name: ..
> ...
>         item 165 key (262 XATTR_ITEM 1100961104) itemoff 7789 itemsize 39
>                 location key (0 UNKNOWN.0 0) type XATTR
>                 namelen 8 datalen 1 name: user.a78
>                 data a
>                 binary 61
> leaf 130695168 items 133 free space 7332 generation 20480 owner 455
> fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
> chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
>         item 0 key (264 INODE_ITEM 0) itemoff 16123 itemsize 160
>                 inode generation 20428 transid 20434 size 10 block
> group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0
>         item 1 key (264 INODE_REF 256) itemoff 16112 itemsize 11
>                 inode ref index 11 namelen 1 name: c
> ...
>
> We can see that inode 262 is right at the end of leaf. Then send_utime() will
> use btrfs_search_slot() to find a appropriate place to put 262 where is at the
>  back of 262. However, that place is uninitialized on disk.
> Suppose we read
> atime tv_sec:576469548413222912, tv_nsec:1919251317 and then send it out.
> Receiving side will  got EINVAL since tv_nsec:1919251317 is greater
> than   999,999,999.

I see.
So in apply_dir_move, instead of searching the btree of the send
snapshot, we can search the rbtree of orphan dir infos for an entry
with a key == cur->dir. Searching that rbtree makes it clear what the
intention is and more efficient (fully in memory structure, and much
smaller than the btree). Should work, but I haven't tested it.

thanks

>
> Thanks.
> Robbie Ko
>
> 2015-06-10 18:06 GMT+08:00 Robbie Ko <robbieko@synology.com>:
>> Hi Filipi,
>>
>> 2015-06-09 18:36 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
>>> On Tue, Jun 9, 2015 at 11:04 AM, Robbie Ko <robbieko@synology.com> wrote:
>>>> Hi Filipe,
>>>>
>>>> 2015-06-08 22:00 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
>>>>> On Mon, Jun 8, 2015 at 4:44 AM, Robbie Ko <robbieko@synology.com> wrote:
>>>>>> Hi Filipe,
>>>>>
>>>>> Hi Robbie,
>>>>>
>>>>>>
>>>>>> I've fixed "don't send utimes for non-existing directory" with another solution.
>>>>>>
>>>>>>  In apply_dir_move(), the old parent dir. and new parent dir. will be
>>>>>> updated after the current dir. has moved.
>>>>>>
>>>>>> And there's only one entry in old parent dir. (e.g. entry with
>>>>>> smallest ino) will be tagged with rmdir_ino to prevent its parent dir.
>>>>>> is deleted but updated.
>>>>>
>>>>> Can't parse this phrase. What do you mean by tagging an entry with rmdir_ino?
>>>>> rmdir_ino corresponds to the number of a inode that wasn't deleted
>>>>> when it was processed because there was some inode with a lower number
>>>>> that is a child of the directory in the parent snapshot and had its
>>>>> rename/move operation delayed (it happens after the directory we want
>>>>> to delete is processed).
>>>>>
>>>>
>>>> Right , my "tagged with rmdir_ino" is same meaning as you explained here.
>>>>
>>>>>>
>>>>>> However, if we process rename for another entry not tagged with
>>>>>> rmdir_ino first, its old parent dir. which is deleted  will be updated
>>>>>> according to apply_dir_move().
>>>>>>
>>>>>> Therefore, I think we should check the existence of  the dir. before
>>>>>> we're going to update it's utime.
>>>>>>
>>>>>> The patch is pasted in the following link, could you give me some comment?
>>>>>>
>>>>>> https://friendpaste.com/h8tZqOS9iAUpp2DvgGI2k
>>>>>
>>>>> Looks better.
>>>>> However I still don't understand your explanation, and just tried the
>>>>> example in your commit message:
>>>>>
>>>>> "Parent snapshot:
>>>>>
>>>>> |---- a/ (ino 259)
>>>>>   |---- c (ino 264)
>>>>> |---- b/ (ino 260)
>>>>>   |---- d (ino 265)
>>>>> |---- del/ (ino 263)
>>>>>   |---- item1/ (ino 261)
>>>>>   |---- item2/ (ino 262)
>>>>>
>>>>> Send snapshot:
>>>>> |---- a/ (ino 259)
>>>>> |---- b/ (ino 260)
>>>>> |---- c/ (ino 2)
>>>>>   |---- item2 (ino 259)
>>>>> |---- d/ (ino 257)
>>>>>   |---- item1/ (ino 258)"
>>>>>
>>>>> So it's confusing after looking at it.
>>>>> First the send snapshot mentions inode number 2, which doesn't exist
>>>>> in the parent snapshot - I assume you meant inode number 264.
>>>>> Then, the send snapshot has two inodes with number 259. Is "item2" in
>>>>> the send snapshot supposed to be inode 262?
>>>>>
>>>>
>>>> Your guess is right. And I correct it as follow.
>>>>
>>>>  # Parent snapshot:
>>>>  #
>>>>  # |---- a/            (ino 259)
>>>>  # |     |---- c       (ino 264)
>>>>  # |
>>>>  # |---- b/            (ino 260)
>>>>  # |     |---- d       (ino 265)
>>>>  # |
>>>>  # |---- del/          (ino 263)
>>>>  #        |---- item1/ (ino 261)
>>>>  #        |---- item2/ (ino 262)
>>>>
>>>>  # Send snapshot:
>>>>  #
>>>>  # |---- a/            (ino 259)
>>>>  # |---- b/            (ino 260)
>>>>  # |---- c/            (ino 264)
>>>>  # |     |---- item2/  (ino 262)
>>>>  # |
>>>>  # |---- d/            (ino 265)
>>>>  #       |---- item1/  (ino 261)
>>>>
>>>>> Anyway, assuming those 2 fixes to the example are correct guesses, I
>>>>> tried the following and it didn't fail without your patches (i.e. no
>>>>> attempts to send utimes to a non-existing directory):
>>>>>
>>>>
>>>> Here my mean is :  btrfs tries to get utime from non-existing directory and
>>>> apply it on the existing directory. And my patch is attempted to avoid
>>>> this case.
>>>> However, this case is not guaranteed to cause error anytime but it may
>>>> fails somehow
>>>> which is depending on the data on the disk.
>>>> The following are the incremental procedures to send the snapshot.
>>>>
>>>> utimes
>>>> utimes a
>>>> utimes b
>>>> rename del -> o263-259-0
>>>> utimes
>>>> rename a/c -> c
>>>> utimes
>>>> utimes a
>>>> rename o263-259-0/item2 -> c/item2
>>>> utimes c/item2
>>>> utimes o263-259-0          <<---------------------- this step may cause error
>>>
>>> Why may it cause an error?
>>> At that moment the name/path o263-259-0 exists at the destination
>>> (i.e. the receiver, as it applies commands from the send stream
>>> serially).
>>
>> Following is the error message when do receive.
>> ERROR: utimes o263-259-0 failed. Invalid argument
>>
>> The argument of utimes for o263-259-0 is got from inode 263 in send
>> root, But inode 263 is not exist in send root
>> In send_utimes(), we didn't check if btrfs_search_slot returns 1,
>> therefore may encounters this problem.
>>
>> I will try to make a stable reproducer.
>>
>> Thanks.
>> Robbie Ko
>>
>>>
>>>> utimes c
>>>> utimes c
>>>> rename b/d -> d
>>>> utimes
>>>> utimes b
>>>> rename o263-259-0/item1 -> d/item1
>>>> rmdir o263-259-0
>>>> utimes d/item1
>>>> utimes d
>>>> utimes d
>>>>
>>>> As the above pointed procedure, o263-259-0 is not appeared in the send root.
>>>
>>> Well yes, but that doesn't matter.
>>> The oXXX-YYY-ZZZ names are never in any of the roots (send or parent),
>>> they're just temporary names to allow for rename/move operations when
>>> collisions are detected, and exist only in the receiver's filesystem.
>>>
>>>> When utime got from o263-259-0 is invalid (i.e. out of range of time
>>>> format), it will fail.
>>>> I saw the error occurs at utimensat() and got EINVAL. Here's
>>>> explanation from linux man page.
>>>>
>>>> EINVAL Invalid value in one of the tv_nsec fields (value outside
>>>>               range 0 to 999,999,999, and not UTIME_NOW or UTIME_OMIT); or
>>>>               an invalid value in one of the tv_sec fields.
>>>>
>>>> So if o263-259-0 is not the send root, invalid format of utime may be got.
>>>
>>> Doubly confused now. So the whole change log (and the code changes)
>>> mentions only attempts to send utimes for a directory/path that
>>> doesn't exist in the receiver's fs - inode 263 exists in both the send
>>> and parent roots, so we can always get its utimes values. Now you're
>>> saying that somewhere in the send code we're getting incorrect values
>>> for a utimes operation and then sending such operation to the
>>> receiver?
>>>
>>> Getting the values for utimes is done looking up the inode item, by
>>> its number, in the send root - for this the current name of the inode
>>> in the receiver fs doesn't matter - it matters only for building the
>>> path for the utimes operation. But what I'm understanding from your
>>> reply is that we're getting a wrong utimes value form the inode item
>>> for inode number 263, which would be a totally different issue from
>>> generating incorrect paths.
>>>
>>>
>>> So this still doesn't explain me why, without any of your patches
>>> applied, the reproducer doesn't fail - which is really an important
>>> aspect.
>>> This is the xfstest I made with your reproducer:
>>> https://friendpaste.com/2WyDxPe2FtVhOfECBk1VKF (and
>>> tests/btrfs/999.out is just contains the single line "QA output
>>> created by 999").
>>>
>>> If you're able to give me a correct reproducer, I can make sense of it
>>> and help getting a better change log.
>>> Or this issue happens only after applying some of the other patches in
>>> the series?
>>>
>>> thanks
>>>
>>>
>>>
>>>
>>>>
>>>> Thanks.
>>>> Robbie Ko
>>>>
>>>>> # Parent snapshot:
>>>>> #
>>>>> # |---- a/            (ino 259)
>>>>> # |     |---- c       (ino 264)
>>>>> # |
>>>>> # |---- b/            (ino 260)
>>>>> # |     |---- d       (ino 265)
>>>>> # |
>>>>> # |---- del/          (ino 263)
>>>>> #        |---- item1/ (ino 261)
>>>>> #        |---- item2/ (ino 262)
>>>>>
>>>>> # Send snapshot:
>>>>> #
>>>>> # |---- a/            (ino 259)
>>>>> # |---- b/            (ino 260)
>>>>> # |---- c/            (ino 264)
>>>>> # |     |---- item2/  (ino 262)
>>>>> # |
>>>>> # |---- d/            (ino 265)
>>>>> #       |---- item1/  (ino 258)
>>>>>
>>>>> mkdir $SCRATCH_MNT/0
>>>>> mkdir $SCRATCH_MNT/1
>>>>>
>>>>> mkdir $SCRATCH_MNT/a # 259
>>>>> mkdir $SCRATCH_MNT/b # 260
>>>>> mkdir $SCRATCH_MNT/item1 # 261
>>>>> mkdir $SCRATCH_MNT/item2 # 262
>>>>> mkdir $SCRATCH_MNT/del # 263
>>>>> mv $SCRATCH_MNT/item1 $SCRATCH_MNT/del/item1
>>>>> mv $SCRATCH_MNT/item2 $SCRATCH_MNT/del/item2
>>>>> mkdir $SCRATCH_MNT/a/c # 264
>>>>> mkdir $SCRATCH_MNT/b/d # 265
>>>>>
>>>>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
>>>>>
>>>>> mv $SCRATCH_MNT/a/c $SCRATCH_MNT/c
>>>>> mv $SCRATCH_MNT/b/d $SCRATCH_MNT/d
>>>>> mv $SCRATCH_MNT/del/item2 $SCRATCH_MNT/c
>>>>> mv $SCRATCH_MNT/del/item1 $SCRATCH_MNT/d
>>>>> rmdir $SCRATCH_MNT/del
>>>>>
>>>>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
>>>>>
>>>>> run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
>>>>> run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
>>>>> $SCRATCH_MNT/mysnap2
>>>>>
>>>>> _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
>>>>> _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
>>>>> -f $tmp/2.snap
>>>>>
>>>>> _check_scratch_fs
>>>>>
>>>>> _scratch_unmount
>>>>> _scratch_mkfs >/dev/null 2>&1
>>>>> _scratch_mount
>>>>>
>>>>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/1.snap
>>>>> run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1
>>>>>
>>>>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/2.snap
>>>>> run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2
>>>>>
>>>>>
>>>>>
>>>>> I would suggest making those hiearachy diagrams more readable - pipes
>>>>> right below the name of their parent, continuation pipes like and
>>>>> align all inode numbers in the same column, like the following:
>>>>>
>>>>> # Parent snapshot:
>>>>> #
>>>>> # |---- a/            (ino 259)
>>>>> # |     |---- c       (ino 264)
>>>>> # |
>>>>> # |---- b/            (ino 260)
>>>>> # |     |---- d       (ino 265)
>>>>> # |
>>>>> # |---- del/          (ino 263)
>>>>> #        |---- item1/ (ino 261)
>>>>> #        |---- item2/ (ino 262)
>>>>>
>>>>> # Send snapshot:
>>>>> #
>>>>> # |---- a/            (ino 259)
>>>>> # |---- b/            (ino 260)
>>>>> # |---- c/            (ino 264)
>>>>> # |     |---- item2/  (ino 262)
>>>>> # |
>>>>> # |---- d/            (ino 265)
>>>>> #       |---- item1/  (ino 258)
>>>>>
>>>>> (pasted here in case gmail screws up the indentation/formatting:
>>>>> https://friendpaste.com/12wzqdcfFrlDdd1AiKX0bU)
>>>>>
>>>>> thanks
>>>>>
>>>>>>
>>>>>> Thans!
>>>>>>
>>>>>> Robbie Ko
>>>>>>
>>>>>> 2015-06-05 0:14 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
>>>>>>> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote:
>>>>>>>> There's one case where we can't issue a utimes operation for a directory.
>>>>>>>> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier
>>>>>>>> processed and update uime between two parent directory.
>>>>>>>> So fix this by not update non exist utimes for this case.
>>>>>>>
>>>>>>> So you mean that we attempt to update utimes for an inode,
>>>>>>> corresponding to a directory, that exists in the parent snapshot but
>>>>>>> not in the send snapshot.
>>>>>>>
>>>>>>> So the subject should be something like "Btrfs: incremental send,
>>>>>>> don't send utimes for non-existing directory" instead of "Btrfs:
>>>>>>> incremental send, fix rmdir not send utimes"
>>>>>>>
>>>>>>>>
>>>>>>>> Example:
>>>>>>>>
>>>>>>>> Parent snapshot:
>>>>>>>> |---- a/ (ino 259)
>>>>>>>>   |---- c (ino 264)
>>>>>>>> |---- b/ (ino 260)
>>>>>>>>   |---- d (ino 265)
>>>>>>>> |---- del/ (ino 263)
>>>>>>>>   |---- item1/ (ino 261)
>>>>>>>>   |---- item2/ (ino 262)
>>>>>>>>
>>>>>>>> Send snapshot:
>>>>>>>> |---- a/ (ino 259)
>>>>>>>> |---- b/ (ino 260)
>>>>>>>> |---- c/ (ino 2)
>>>>>>>>   |---- item2 (ino 259)
>>>>>>>> |---- d/ (ino 257)
>>>>>>>>   |---- item1/ (ino 258)
>>>>>>>>
>>>>>>>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>>>>>>>> ---
>>>>>>>>  fs/btrfs/send.c | 2 +-
>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>>>>>>>> index e8eb3ab..46f954c 100644
>>>>>>>> --- a/fs/btrfs/send.c
>>>>>>>> +++ b/fs/btrfs/send.c
>>>>>>>> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino);
>>>>>>>>         key.type = BTRFS_INODE_ITEM_KEY;
>>>>>>>>         key.offset = 0;
>>>>>>>>         ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0);
>>>>>>>> -       if (ret < 0)
>>>>>>>> +       if (ret != 0)
>>>>>>>>                 goto out;
>>>>>>>
>>>>>>> So I don't think this is a good fix. The problem is in some code that
>>>>>>> calls this function (send_utimes) against the directory that doesn't
>>>>>>> exist - it just shouldn't do that, its logic should be fixed.
>>>>>>> Following this approach, while it works, it's just hiding logic errors
>>>>>>> in one or more code paths, and none of its callers checks for a return
>>>>>>> value of 1 - they only react to values < 0 and introduces the
>>>>>>> possibility of propagating a return value of 1 to user space.
>>>>>>>
>>>>>>> thanks
>>>>>>>
>>>>>>>>
>>>>>>>>         eb = path->nodes[0];
>>>>>>>> --
>>>>>>>> 1.9.1
>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Filipe David Manana,
>>>>>>>
>>>>>>> "Reasonable men adapt themselves to the world.
>>>>>>>  Unreasonable men adapt the world to themselves.
>>>>>>>  That's why all progress depends on unreasonable men."
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Filipe David Manana,
>>>>>
>>>>> "Reasonable men adapt themselves to the world.
>>>>>  Unreasonable men adapt the world to themselves.
>>>>>  That's why all progress depends on unreasonable men."
>>>
>>>
>>>
>>> --
>>> Filipe David Manana,
>>>
>>> "Reasonable men adapt themselves to the world.
>>>  Unreasonable men adapt the world to themselves.
>>>  That's why all progress depends on unreasonable men."



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] Btrfs: incremental send, fix rmdir not send utimes
  2015-06-18 18:11                 ` Filipe David Manana
@ 2015-06-22  5:35                   ` Robbie Ko
  0 siblings, 0 replies; 25+ messages in thread
From: Robbie Ko @ 2015-06-22  5:35 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs@vger.kernel.org

Hi Filipe,

We can not only search rbtree of orphan dir infos for an entry with a
key == cur->dir.
Because in apply_dir_move, it needs to need to update the utimes of
both new parent(s) and old parent(s) after rename/move.
If the old parent have not been processed yet, it  can't  find
cur->dir in rbtree of orphan dir infos.

Example,

Parent snapshot:
|---- a/ (ino 259)
    |---- c (ino 261)
|---- del/ (ino 262)
    |---- item1/ (ino 260)

Send snapshot:
|---- a/ (ino 259)
|---- c/ (ino 261)
    |---- item1 (ino 260)

receiving snapshot
utimes
utimes a
rename a/c -> c
utimes
utimes a
rename del/dir_item1 -> c/dir_item1
utimes c/dir_item1
utimes del                <----------- the same problem.
utimes c
utimes c
rmdir del
utimes

Thanks.
Robbieko

2015-06-19 2:11 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
> On Thu, Jun 18, 2015 at 4:21 AM, Robbie Ko <robbieko@synology.com> wrote:
>> Hi Filipe,
>>
>> I've found that the following case is the main cause of such error
>> and it's fs tree is shown via btrfs-debug-tress as below.
>>
>> file tree key (459 ROOT_ITEM 20487)
>> node 132988928 level 1 items 3 free 490 generation 20487 owner 459
>> fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
>> chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
>>         key (256 INODE_ITEM 0) block 132710400 (8100) gen 20486
>>         key (264 INODE_ITEM 0) block 130695168 (7977) gen 20480
>>         key (266 XATTR_ITEM 952319794) block 126042112 (7693) gen 20464
>> leaf 132710400 items 166 free space 3639 generation 20486 owner 455
>> fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
>> chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
>>         item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
>>                 inode generation 20425 transid 20442 size 32 block
>> group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0
>>         item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
>>                 inode ref index 0 namelen 2 name: ..
>> ...
>>         item 165 key (262 XATTR_ITEM 1100961104) itemoff 7789 itemsize 39
>>                 location key (0 UNKNOWN.0 0) type XATTR
>>                 namelen 8 datalen 1 name: user.a78
>>                 data a
>>                 binary 61
>> leaf 130695168 items 133 free space 7332 generation 20480 owner 455
>> fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
>> chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
>>         item 0 key (264 INODE_ITEM 0) itemoff 16123 itemsize 160
>>                 inode generation 20428 transid 20434 size 10 block
>> group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0
>>         item 1 key (264 INODE_REF 256) itemoff 16112 itemsize 11
>>                 inode ref index 11 namelen 1 name: c
>> ...
>>
>> We can see that inode 262 is right at the end of leaf. Then send_utime() will
>> use btrfs_search_slot() to find a appropriate place to put 262 where is at the
>>  back of 262. However, that place is uninitialized on disk.
>> Suppose we read
>> atime tv_sec:576469548413222912, tv_nsec:1919251317 and then send it out.
>> Receiving side will  got EINVAL since tv_nsec:1919251317 is greater
>> than   999,999,999.
>
> I see.
> So in apply_dir_move, instead of searching the btree of the send
> snapshot, we can search the rbtree of orphan dir infos for an entry
> with a key == cur->dir. Searching that rbtree makes it clear what the
> intention is and more efficient (fully in memory structure, and much
> smaller than the btree). Should work, but I haven't tested it.
>
> thanks
>
>>
>> Thanks.
>> Robbie Ko
>>
>> 2015-06-10 18:06 GMT+08:00 Robbie Ko <robbieko@synology.com>:
>>> Hi Filipi,
>>>
>>> 2015-06-09 18:36 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
>>>> On Tue, Jun 9, 2015 at 11:04 AM, Robbie Ko <robbieko@synology.com> wrote:
>>>>> Hi Filipe,
>>>>>
>>>>> 2015-06-08 22:00 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
>>>>>> On Mon, Jun 8, 2015 at 4:44 AM, Robbie Ko <robbieko@synology.com> wrote:
>>>>>>> Hi Filipe,
>>>>>>
>>>>>> Hi Robbie,
>>>>>>
>>>>>>>
>>>>>>> I've fixed "don't send utimes for non-existing directory" with another solution.
>>>>>>>
>>>>>>>  In apply_dir_move(), the old parent dir. and new parent dir. will be
>>>>>>> updated after the current dir. has moved.
>>>>>>>
>>>>>>> And there's only one entry in old parent dir. (e.g. entry with
>>>>>>> smallest ino) will be tagged with rmdir_ino to prevent its parent dir.
>>>>>>> is deleted but updated.
>>>>>>
>>>>>> Can't parse this phrase. What do you mean by tagging an entry with rmdir_ino?
>>>>>> rmdir_ino corresponds to the number of a inode that wasn't deleted
>>>>>> when it was processed because there was some inode with a lower number
>>>>>> that is a child of the directory in the parent snapshot and had its
>>>>>> rename/move operation delayed (it happens after the directory we want
>>>>>> to delete is processed).
>>>>>>
>>>>>
>>>>> Right , my "tagged with rmdir_ino" is same meaning as you explained here.
>>>>>
>>>>>>>
>>>>>>> However, if we process rename for another entry not tagged with
>>>>>>> rmdir_ino first, its old parent dir. which is deleted  will be updated
>>>>>>> according to apply_dir_move().
>>>>>>>
>>>>>>> Therefore, I think we should check the existence of  the dir. before
>>>>>>> we're going to update it's utime.
>>>>>>>
>>>>>>> The patch is pasted in the following link, could you give me some comment?
>>>>>>>
>>>>>>> https://friendpaste.com/h8tZqOS9iAUpp2DvgGI2k
>>>>>>
>>>>>> Looks better.
>>>>>> However I still don't understand your explanation, and just tried the
>>>>>> example in your commit message:
>>>>>>
>>>>>> "Parent snapshot:
>>>>>>
>>>>>> |---- a/ (ino 259)
>>>>>>   |---- c (ino 264)
>>>>>> |---- b/ (ino 260)
>>>>>>   |---- d (ino 265)
>>>>>> |---- del/ (ino 263)
>>>>>>   |---- item1/ (ino 261)
>>>>>>   |---- item2/ (ino 262)
>>>>>>
>>>>>> Send snapshot:
>>>>>> |---- a/ (ino 259)
>>>>>> |---- b/ (ino 260)
>>>>>> |---- c/ (ino 2)
>>>>>>   |---- item2 (ino 259)
>>>>>> |---- d/ (ino 257)
>>>>>>   |---- item1/ (ino 258)"
>>>>>>
>>>>>> So it's confusing after looking at it.
>>>>>> First the send snapshot mentions inode number 2, which doesn't exist
>>>>>> in the parent snapshot - I assume you meant inode number 264.
>>>>>> Then, the send snapshot has two inodes with number 259. Is "item2" in
>>>>>> the send snapshot supposed to be inode 262?
>>>>>>
>>>>>
>>>>> Your guess is right. And I correct it as follow.
>>>>>
>>>>>  # Parent snapshot:
>>>>>  #
>>>>>  # |---- a/            (ino 259)
>>>>>  # |     |---- c       (ino 264)
>>>>>  # |
>>>>>  # |---- b/            (ino 260)
>>>>>  # |     |---- d       (ino 265)
>>>>>  # |
>>>>>  # |---- del/          (ino 263)
>>>>>  #        |---- item1/ (ino 261)
>>>>>  #        |---- item2/ (ino 262)
>>>>>
>>>>>  # Send snapshot:
>>>>>  #
>>>>>  # |---- a/            (ino 259)
>>>>>  # |---- b/            (ino 260)
>>>>>  # |---- c/            (ino 264)
>>>>>  # |     |---- item2/  (ino 262)
>>>>>  # |
>>>>>  # |---- d/            (ino 265)
>>>>>  #       |---- item1/  (ino 261)
>>>>>
>>>>>> Anyway, assuming those 2 fixes to the example are correct guesses, I
>>>>>> tried the following and it didn't fail without your patches (i.e. no
>>>>>> attempts to send utimes to a non-existing directory):
>>>>>>
>>>>>
>>>>> Here my mean is :  btrfs tries to get utime from non-existing directory and
>>>>> apply it on the existing directory. And my patch is attempted to avoid
>>>>> this case.
>>>>> However, this case is not guaranteed to cause error anytime but it may
>>>>> fails somehow
>>>>> which is depending on the data on the disk.
>>>>> The following are the incremental procedures to send the snapshot.
>>>>>
>>>>> utimes
>>>>> utimes a
>>>>> utimes b
>>>>> rename del -> o263-259-0
>>>>> utimes
>>>>> rename a/c -> c
>>>>> utimes
>>>>> utimes a
>>>>> rename o263-259-0/item2 -> c/item2
>>>>> utimes c/item2
>>>>> utimes o263-259-0          <<---------------------- this step may cause error
>>>>
>>>> Why may it cause an error?
>>>> At that moment the name/path o263-259-0 exists at the destination
>>>> (i.e. the receiver, as it applies commands from the send stream
>>>> serially).
>>>
>>> Following is the error message when do receive.
>>> ERROR: utimes o263-259-0 failed. Invalid argument
>>>
>>> The argument of utimes for o263-259-0 is got from inode 263 in send
>>> root, But inode 263 is not exist in send root
>>> In send_utimes(), we didn't check if btrfs_search_slot returns 1,
>>> therefore may encounters this problem.
>>>
>>> I will try to make a stable reproducer.
>>>
>>> Thanks.
>>> Robbie Ko
>>>
>>>>
>>>>> utimes c
>>>>> utimes c
>>>>> rename b/d -> d
>>>>> utimes
>>>>> utimes b
>>>>> rename o263-259-0/item1 -> d/item1
>>>>> rmdir o263-259-0
>>>>> utimes d/item1
>>>>> utimes d
>>>>> utimes d
>>>>>
>>>>> As the above pointed procedure, o263-259-0 is not appeared in the send root.
>>>>
>>>> Well yes, but that doesn't matter.
>>>> The oXXX-YYY-ZZZ names are never in any of the roots (send or parent),
>>>> they're just temporary names to allow for rename/move operations when
>>>> collisions are detected, and exist only in the receiver's filesystem.
>>>>
>>>>> When utime got from o263-259-0 is invalid (i.e. out of range of time
>>>>> format), it will fail.
>>>>> I saw the error occurs at utimensat() and got EINVAL. Here's
>>>>> explanation from linux man page.
>>>>>
>>>>> EINVAL Invalid value in one of the tv_nsec fields (value outside
>>>>>               range 0 to 999,999,999, and not UTIME_NOW or UTIME_OMIT); or
>>>>>               an invalid value in one of the tv_sec fields.
>>>>>
>>>>> So if o263-259-0 is not the send root, invalid format of utime may be got.
>>>>
>>>> Doubly confused now. So the whole change log (and the code changes)
>>>> mentions only attempts to send utimes for a directory/path that
>>>> doesn't exist in the receiver's fs - inode 263 exists in both the send
>>>> and parent roots, so we can always get its utimes values. Now you're
>>>> saying that somewhere in the send code we're getting incorrect values
>>>> for a utimes operation and then sending such operation to the
>>>> receiver?
>>>>
>>>> Getting the values for utimes is done looking up the inode item, by
>>>> its number, in the send root - for this the current name of the inode
>>>> in the receiver fs doesn't matter - it matters only for building the
>>>> path for the utimes operation. But what I'm understanding from your
>>>> reply is that we're getting a wrong utimes value form the inode item
>>>> for inode number 263, which would be a totally different issue from
>>>> generating incorrect paths.
>>>>
>>>>
>>>> So this still doesn't explain me why, without any of your patches
>>>> applied, the reproducer doesn't fail - which is really an important
>>>> aspect.
>>>> This is the xfstest I made with your reproducer:
>>>> https://friendpaste.com/2WyDxPe2FtVhOfECBk1VKF (and
>>>> tests/btrfs/999.out is just contains the single line "QA output
>>>> created by 999").
>>>>
>>>> If you're able to give me a correct reproducer, I can make sense of it
>>>> and help getting a better change log.
>>>> Or this issue happens only after applying some of the other patches in
>>>> the series?
>>>>
>>>> thanks
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> Thanks.
>>>>> Robbie Ko
>>>>>
>>>>>> # Parent snapshot:
>>>>>> #
>>>>>> # |---- a/            (ino 259)
>>>>>> # |     |---- c       (ino 264)
>>>>>> # |
>>>>>> # |---- b/            (ino 260)
>>>>>> # |     |---- d       (ino 265)
>>>>>> # |
>>>>>> # |---- del/          (ino 263)
>>>>>> #        |---- item1/ (ino 261)
>>>>>> #        |---- item2/ (ino 262)
>>>>>>
>>>>>> # Send snapshot:
>>>>>> #
>>>>>> # |---- a/            (ino 259)
>>>>>> # |---- b/            (ino 260)
>>>>>> # |---- c/            (ino 264)
>>>>>> # |     |---- item2/  (ino 262)
>>>>>> # |
>>>>>> # |---- d/            (ino 265)
>>>>>> #       |---- item1/  (ino 258)
>>>>>>
>>>>>> mkdir $SCRATCH_MNT/0
>>>>>> mkdir $SCRATCH_MNT/1
>>>>>>
>>>>>> mkdir $SCRATCH_MNT/a # 259
>>>>>> mkdir $SCRATCH_MNT/b # 260
>>>>>> mkdir $SCRATCH_MNT/item1 # 261
>>>>>> mkdir $SCRATCH_MNT/item2 # 262
>>>>>> mkdir $SCRATCH_MNT/del # 263
>>>>>> mv $SCRATCH_MNT/item1 $SCRATCH_MNT/del/item1
>>>>>> mv $SCRATCH_MNT/item2 $SCRATCH_MNT/del/item2
>>>>>> mkdir $SCRATCH_MNT/a/c # 264
>>>>>> mkdir $SCRATCH_MNT/b/d # 265
>>>>>>
>>>>>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
>>>>>>
>>>>>> mv $SCRATCH_MNT/a/c $SCRATCH_MNT/c
>>>>>> mv $SCRATCH_MNT/b/d $SCRATCH_MNT/d
>>>>>> mv $SCRATCH_MNT/del/item2 $SCRATCH_MNT/c
>>>>>> mv $SCRATCH_MNT/del/item1 $SCRATCH_MNT/d
>>>>>> rmdir $SCRATCH_MNT/del
>>>>>>
>>>>>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
>>>>>>
>>>>>> run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
>>>>>> run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
>>>>>> $SCRATCH_MNT/mysnap2
>>>>>>
>>>>>> _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
>>>>>> _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
>>>>>> -f $tmp/2.snap
>>>>>>
>>>>>> _check_scratch_fs
>>>>>>
>>>>>> _scratch_unmount
>>>>>> _scratch_mkfs >/dev/null 2>&1
>>>>>> _scratch_mount
>>>>>>
>>>>>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/1.snap
>>>>>> run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1
>>>>>>
>>>>>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/2.snap
>>>>>> run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2
>>>>>>
>>>>>>
>>>>>>
>>>>>> I would suggest making those hiearachy diagrams more readable - pipes
>>>>>> right below the name of their parent, continuation pipes like and
>>>>>> align all inode numbers in the same column, like the following:
>>>>>>
>>>>>> # Parent snapshot:
>>>>>> #
>>>>>> # |---- a/            (ino 259)
>>>>>> # |     |---- c       (ino 264)
>>>>>> # |
>>>>>> # |---- b/            (ino 260)
>>>>>> # |     |---- d       (ino 265)
>>>>>> # |
>>>>>> # |---- del/          (ino 263)
>>>>>> #        |---- item1/ (ino 261)
>>>>>> #        |---- item2/ (ino 262)
>>>>>>
>>>>>> # Send snapshot:
>>>>>> #
>>>>>> # |---- a/            (ino 259)
>>>>>> # |---- b/            (ino 260)
>>>>>> # |---- c/            (ino 264)
>>>>>> # |     |---- item2/  (ino 262)
>>>>>> # |
>>>>>> # |---- d/            (ino 265)
>>>>>> #       |---- item1/  (ino 258)
>>>>>>
>>>>>> (pasted here in case gmail screws up the indentation/formatting:
>>>>>> https://friendpaste.com/12wzqdcfFrlDdd1AiKX0bU)
>>>>>>
>>>>>> thanks
>>>>>>
>>>>>>>
>>>>>>> Thans!
>>>>>>>
>>>>>>> Robbie Ko
>>>>>>>
>>>>>>> 2015-06-05 0:14 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>:
>>>>>>>> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote:
>>>>>>>>> There's one case where we can't issue a utimes operation for a directory.
>>>>>>>>> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier
>>>>>>>>> processed and update uime between two parent directory.
>>>>>>>>> So fix this by not update non exist utimes for this case.
>>>>>>>>
>>>>>>>> So you mean that we attempt to update utimes for an inode,
>>>>>>>> corresponding to a directory, that exists in the parent snapshot but
>>>>>>>> not in the send snapshot.
>>>>>>>>
>>>>>>>> So the subject should be something like "Btrfs: incremental send,
>>>>>>>> don't send utimes for non-existing directory" instead of "Btrfs:
>>>>>>>> incremental send, fix rmdir not send utimes"
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Example:
>>>>>>>>>
>>>>>>>>> Parent snapshot:
>>>>>>>>> |---- a/ (ino 259)
>>>>>>>>>   |---- c (ino 264)
>>>>>>>>> |---- b/ (ino 260)
>>>>>>>>>   |---- d (ino 265)
>>>>>>>>> |---- del/ (ino 263)
>>>>>>>>>   |---- item1/ (ino 261)
>>>>>>>>>   |---- item2/ (ino 262)
>>>>>>>>>
>>>>>>>>> Send snapshot:
>>>>>>>>> |---- a/ (ino 259)
>>>>>>>>> |---- b/ (ino 260)
>>>>>>>>> |---- c/ (ino 2)
>>>>>>>>>   |---- item2 (ino 259)
>>>>>>>>> |---- d/ (ino 257)
>>>>>>>>>   |---- item1/ (ino 258)
>>>>>>>>>
>>>>>>>>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>>>>>>>>> ---
>>>>>>>>>  fs/btrfs/send.c | 2 +-
>>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>>>>>>>>> index e8eb3ab..46f954c 100644
>>>>>>>>> --- a/fs/btrfs/send.c
>>>>>>>>> +++ b/fs/btrfs/send.c
>>>>>>>>> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino);
>>>>>>>>>         key.type = BTRFS_INODE_ITEM_KEY;
>>>>>>>>>         key.offset = 0;
>>>>>>>>>         ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0);
>>>>>>>>> -       if (ret < 0)
>>>>>>>>> +       if (ret != 0)
>>>>>>>>>                 goto out;
>>>>>>>>
>>>>>>>> So I don't think this is a good fix. The problem is in some code that
>>>>>>>> calls this function (send_utimes) against the directory that doesn't
>>>>>>>> exist - it just shouldn't do that, its logic should be fixed.
>>>>>>>> Following this approach, while it works, it's just hiding logic errors
>>>>>>>> in one or more code paths, and none of its callers checks for a return
>>>>>>>> value of 1 - they only react to values < 0 and introduces the
>>>>>>>> possibility of propagating a return value of 1 to user space.
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>>>
>>>>>>>>>         eb = path->nodes[0];
>>>>>>>>> --
>>>>>>>>> 1.9.1
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Filipe David Manana,
>>>>>>>>
>>>>>>>> "Reasonable men adapt themselves to the world.
>>>>>>>>  Unreasonable men adapt the world to themselves.
>>>>>>>>  That's why all progress depends on unreasonable men."
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Filipe David Manana,
>>>>>>
>>>>>> "Reasonable men adapt themselves to the world.
>>>>>>  Unreasonable men adapt the world to themselves.
>>>>>>  That's why all progress depends on unreasonable men."
>>>>
>>>>
>>>>
>>>> --
>>>> Filipe David Manana,
>>>>
>>>> "Reasonable men adapt themselves to the world.
>>>>  Unreasonable men adapt the world to themselves.
>>>>  That's why all progress depends on unreasonable men."
>
>
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2015-06-22  5:35 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-04 11:18 [PATCH 0/5] Btrfs incremental send fix serval case for rename and rm directory Robbie Ko
2015-06-04 11:18 ` [PATCH 1/5] Btrfs: incremental send, avoid circular waiting and descendant overwrite ancestor need to update path Robbie Ko
2015-06-04 13:50   ` Filipe David Manana
2015-06-04 19:19     ` Filipe David Manana
2015-06-05  3:55       ` Robbie Ko
2015-06-05  8:46         ` Filipe David Manana
2015-06-04 11:18 ` [PATCH 2/5] Btrfs: incremental send, avoid ancestor rename to descendant Robbie Ko
2015-06-04 15:43   ` Filipe David Manana
2015-06-05 11:18     ` Robbie Ko
2015-06-04 11:18 ` [PATCH 3/5] Btrfs: incremental send, fix orphan_dir_info not completely cleared Robbie Ko
2015-06-04 16:24   ` Filipe David Manana
2015-06-05 13:58   ` David Sterba
2015-06-04 11:18 ` [PATCH 4/5] Btrfs: incremental send, fix rmdir but dir have a unprocess item Robbie Ko
2015-06-04 16:40   ` Filipe David Manana
2015-06-04 11:18 ` [PATCH 5/5] Btrfs: incremental send, fix rmdir not send utimes Robbie Ko
2015-06-04 16:14   ` Filipe David Manana
2015-06-08  3:44     ` Robbie Ko
2015-06-08 14:00       ` Filipe David Manana
2015-06-09 10:04         ` Robbie Ko
2015-06-09 10:36           ` Filipe David Manana
2015-06-10 10:06             ` Robbie Ko
2015-06-18  3:21               ` Robbie Ko
2015-06-18 18:11                 ` Filipe David Manana
2015-06-22  5:35                   ` Robbie Ko
2015-06-04 13:04 ` [PATCH 0/5] Btrfs incremental send fix serval case for rename and rm directory Filipe David Manana

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.