All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFCv2] new ioctl TREE_SEARCH_V2
@ 2014-01-27 13:28 Gerhard Heift
  2014-01-27 13:28 ` [PATCH RFCv2 1/6] btrfs: search_ioctl accepts varying buffer Gerhard Heift
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: Gerhard Heift @ 2014-01-27 13:28 UTC (permalink / raw
  To: linux-btrfs

This patch series adds a new ioctl TREE_SEARCH_V2 with which we could store the
results in a varying buffer. Now even items larger than 3992 bytes or a large
amount of items can be returned.

I have a few questions:
  Which value should I assign to TREE_SEARCH_V2?
  Should we limit the buffer size?
  What about documentation?

Changelog

RFCv2
  * fixed a build bug caused by using a wrong patch
  * added a patch to expand buffer lifetime

Gerhard


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH RFCv2 1/6] btrfs: search_ioctl accepts varying buffer
  2014-01-27 13:28 [PATCH RFCv2] new ioctl TREE_SEARCH_V2 Gerhard Heift
@ 2014-01-27 13:28 ` Gerhard Heift
  2014-01-27 17:19   ` David Sterba
  2014-01-27 13:28 ` [PATCH RFCv2 2/6] btrfs: search_ioctl rejects unused setted values Gerhard Heift
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Gerhard Heift @ 2014-01-27 13:28 UTC (permalink / raw
  To: linux-btrfs

rewrite search_ioctl to accept a buffer with varying size

Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
---
 fs/btrfs/ioctl.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 3970f32..be4c780 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1850,6 +1850,7 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 			       struct btrfs_path *path,
 			       struct btrfs_key *key,
 			       struct btrfs_ioctl_search_key *sk,
+			       size_t buf_size,
 			       char *buf,
 			       unsigned long *sk_offset,
 			       int *num_found)
@@ -1882,11 +1883,10 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 		if (!key_in_sk(key, sk))
 			continue;
 
-		if (sizeof(sh) + item_len > BTRFS_SEARCH_ARGS_BUFSIZE)
+		if (sizeof(sh) + item_len > buf_size)
 			item_len = 0;
 
-		if (sizeof(sh) + item_len + *sk_offset >
-		    BTRFS_SEARCH_ARGS_BUFSIZE) {
+		if (sizeof(sh) + item_len + *sk_offset > buf_size) {
 			ret = 1;
 			goto overflow;
 		}
@@ -1931,17 +1931,22 @@ overflow:
 }
 
 static noinline int search_ioctl(struct inode *inode,
-				 struct btrfs_ioctl_search_args *args)
+				 struct btrfs_ioctl_search_key *sk,
+				 size_t buf_size,
+				 char *buf
+				 )
 {
 	struct btrfs_root *root;
 	struct btrfs_key key;
 	struct btrfs_path *path;
-	struct btrfs_ioctl_search_key *sk = &args->key;
 	struct btrfs_fs_info *info = BTRFS_I(inode)->root->fs_info;
 	int ret;
 	int num_found = 0;
 	unsigned long sk_offset = 0;
 
+	if (buf_size < sizeof(struct btrfs_ioctl_search_header))
+		return -EOVERFLOW;
+
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
@@ -1975,7 +1980,7 @@ static noinline int search_ioctl(struct inode *inode,
 				ret = 0;
 			goto err;
 		}
-		ret = copy_to_sk(root, path, &key, sk, args->buf,
+		ret = copy_to_sk(root, path, &key, sk, buf_size, buf,
 				 &sk_offset, &num_found);
 		btrfs_release_path(path);
 		if (ret || num_found >= sk->nr_items)
@@ -2004,7 +2009,7 @@ static noinline int btrfs_ioctl_tree_search(struct file *file,
 		return PTR_ERR(args);
 
 	inode = file_inode(file);
-	ret = search_ioctl(inode, args);
+	ret = search_ioctl(inode, &args->key, sizeof(args->buf), args->buf);
 	if (ret == 0 && copy_to_user(argp, args, sizeof(*args)))
 		ret = -EFAULT;
 	kfree(args);
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH RFCv2 2/6] btrfs: search_ioctl rejects unused setted values
  2014-01-27 13:28 [PATCH RFCv2] new ioctl TREE_SEARCH_V2 Gerhard Heift
  2014-01-27 13:28 ` [PATCH RFCv2 1/6] btrfs: search_ioctl accepts varying buffer Gerhard Heift
@ 2014-01-27 13:28 ` Gerhard Heift
  2014-01-27 17:28   ` David Sterba
  2014-01-27 19:06   ` Martin Steigerwald
  2014-01-27 13:28 ` [PATCH RFCv2 3/6] btrfs: copy_to_sk returns EOVERFLOW for too small buffer Gerhard Heift
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 22+ messages in thread
From: Gerhard Heift @ 2014-01-27 13:28 UTC (permalink / raw
  To: linux-btrfs

To prevent unexpectet values in the unused fields of the search key fail early.
Otherwise future extensions would break the behavior of the search if current
implementations in userspace set them to values other than zero.

Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
---
 fs/btrfs/ioctl.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index be4c780..919d928 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1947,6 +1947,9 @@ static noinline int search_ioctl(struct inode *inode,
 	if (buf_size < sizeof(struct btrfs_ioctl_search_header))
 		return -EOVERFLOW;
 
+	if (sk->unused || sk->unused1 | sk->unused2 | sk->unused3 | sk->unused4)
+		return -EINVAL;
+
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH RFCv2 3/6] btrfs: copy_to_sk returns EOVERFLOW for too small buffer
  2014-01-27 13:28 [PATCH RFCv2] new ioctl TREE_SEARCH_V2 Gerhard Heift
  2014-01-27 13:28 ` [PATCH RFCv2 1/6] btrfs: search_ioctl accepts varying buffer Gerhard Heift
  2014-01-27 13:28 ` [PATCH RFCv2 2/6] btrfs: search_ioctl rejects unused setted values Gerhard Heift
@ 2014-01-27 13:28 ` Gerhard Heift
  2014-01-27 13:28 ` [PATCH RFCv2 4/6] btrfs: new ioctl TREE_SEARCH_V2 Gerhard Heift
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Gerhard Heift @ 2014-01-27 13:28 UTC (permalink / raw
  To: linux-btrfs

In copy_to_sk, if an item is too large for the given buffer, it now returns
-EOVERFLOW instead of copying a search_header with len = 0. For backward
compatibility for the first item it still copies such a header to the buffer,
but not any other following items, which could have fitted.

tree_search changes -EOVERFLOW back to 0 to behave similiar to the way it
behaved before this patch.

Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
---
 fs/btrfs/ioctl.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 919d928..9fa222d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1883,8 +1883,15 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 		if (!key_in_sk(key, sk))
 			continue;
 
-		if (sizeof(sh) + item_len > buf_size)
+		if (sizeof(sh) + item_len > buf_size) {
+			if (*num_found) {
+				ret = 1;
+				goto overflow;
+			}
+
 			item_len = 0;
+			ret = -EOVERFLOW;
+		}
 
 		if (sizeof(sh) + item_len + *sk_offset > buf_size) {
 			ret = 1;
@@ -1910,6 +1917,9 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 		}
 		(*num_found)++;
 
+		if (ret) /* -EOVERFLOW from above */
+			goto overflow;
+
 		if (*num_found >= sk->nr_items)
 			break;
 	}
@@ -1990,7 +2000,8 @@ static noinline int search_ioctl(struct inode *inode,
 			break;
 
 	}
-	ret = 0;
+	if (ret > 0)
+		ret = 0;
 err:
 	sk->nr_items = num_found;
 	btrfs_free_path(path);
@@ -2013,6 +2024,8 @@ static noinline int btrfs_ioctl_tree_search(struct file *file,
 
 	inode = file_inode(file);
 	ret = search_ioctl(inode, &args->key, sizeof(args->buf), args->buf);
+	if (ret == -EOVERFLOW)
+		ret = 0;
 	if (ret == 0 && copy_to_user(argp, args, sizeof(*args)))
 		ret = -EFAULT;
 	kfree(args);
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH RFCv2 4/6] btrfs: new ioctl TREE_SEARCH_V2
  2014-01-27 13:28 [PATCH RFCv2] new ioctl TREE_SEARCH_V2 Gerhard Heift
                   ` (2 preceding siblings ...)
  2014-01-27 13:28 ` [PATCH RFCv2 3/6] btrfs: copy_to_sk returns EOVERFLOW for too small buffer Gerhard Heift
@ 2014-01-27 13:28 ` Gerhard Heift
  2014-01-27 17:41   ` David Sterba
  2014-01-27 13:28 ` [PATCH RFCv2 5/6] btrfs: search_ioctl: direct copy to userspace Gerhard Heift
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Gerhard Heift @ 2014-01-27 13:28 UTC (permalink / raw
  To: linux-btrfs

This new ioctl call allows the user to supply a buffer of varying size in which
a tree search can store its results. This is much more flexible if you want to
receive items which are larger than the current fixed buffer of 3992 bytes or
if you want to fetch mor item at once.

Currently the buffer is limited to 32 pages.

Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
---
 fs/btrfs/ioctl.c           | 48 ++++++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/btrfs.h |  8 ++++++++
 2 files changed, 56 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9fa222d..c44fcdd 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2032,6 +2032,52 @@ static noinline int btrfs_ioctl_tree_search(struct file *file,
 	return ret;
 }
 
+static noinline int btrfs_ioctl_tree_search_v2(struct file *file,
+					   void __user *argp)
+{
+	struct btrfs_ioctl_search_args_v2 *args;
+	struct inode *inode;
+	int ret;
+	char *buf;
+	size_t buf_size;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	/* copy search header and buffer size */
+	args = memdup_user(argp, sizeof(*args));
+	if (IS_ERR(args))
+		return PTR_ERR(args);
+
+	buf_size = args->buf_size;
+
+	if (buf_size < sizeof(struct btrfs_ioctl_search_header)) {
+		kfree(args);
+		return -ENOMEM;
+	}
+
+	/* limit memory */
+	if (buf_size > PAGE_SIZE * 32)
+		buf_size = PAGE_SIZE * 32;
+
+	buf = memdup_user(argp->buf, buf_size);
+	if (IS_ERR(buf)) {
+		kfree(args);
+		return PTR_ERR(buf);
+	}
+
+	inode = file_inode(file);
+	ret = search_ioctl(inode, &args->key, buf_size, buf);
+	if (ret == 0 && (
+		copy_to_user(argp, args, sizeof(*args)) ||
+		copy_to_user(argp->buf, buf, buf_size)
+		))
+		ret = -EFAULT;
+	kfree(buf);
+	kfree(args);
+	return ret;
+}
+
 /*
  * Search INODE_REFs to identify path name of 'dirid' directory
  * in a 'tree_id' tree. and sets path name to 'name'.
@@ -4777,6 +4823,8 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_ioctl_trans_end(file);
 	case BTRFS_IOC_TREE_SEARCH:
 		return btrfs_ioctl_tree_search(file, argp);
+	case BTRFS_IOC_TREE_SEARCH_V2:
+		return btrfs_ioctl_tree_search_v2(file, argp);
 	case BTRFS_IOC_INO_LOOKUP:
 		return btrfs_ioctl_ino_lookup(file, argp);
 	case BTRFS_IOC_INO_PATHS:
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 1b8a0f4..6ba0d0f 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -301,6 +301,12 @@ struct btrfs_ioctl_search_args {
 	char buf[BTRFS_SEARCH_ARGS_BUFSIZE];
 };
 
+struct btrfs_ioctl_search_args_v2 {
+	struct btrfs_ioctl_search_key key;
+	size_t buf_size;
+	char buf[0];
+};
+
 struct btrfs_ioctl_clone_range_args {
   __s64 src_fd;
   __u64 src_offset, src_length;
@@ -553,6 +559,8 @@ static inline char *btrfs_err_str(enum btrfs_err_code err_code)
 				struct btrfs_ioctl_defrag_range_args)
 #define BTRFS_IOC_TREE_SEARCH _IOWR(BTRFS_IOCTL_MAGIC, 17, \
 				   struct btrfs_ioctl_search_args)
+#define BTRFS_IOC_TREE_SEARCH_V2 _IOWR(BTRFS_IOCTL_MAGIC, 17, \
+					   struct btrfs_ioctl_search_args_v2)
 #define BTRFS_IOC_INO_LOOKUP _IOWR(BTRFS_IOCTL_MAGIC, 18, \
 				   struct btrfs_ioctl_ino_lookup_args)
 #define BTRFS_IOC_DEFAULT_SUBVOL _IOW(BTRFS_IOCTL_MAGIC, 19, __u64)
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH RFCv2 5/6] btrfs: search_ioctl: direct copy to userspace
  2014-01-27 13:28 [PATCH RFCv2] new ioctl TREE_SEARCH_V2 Gerhard Heift
                   ` (3 preceding siblings ...)
  2014-01-27 13:28 ` [PATCH RFCv2 4/6] btrfs: new ioctl TREE_SEARCH_V2 Gerhard Heift
@ 2014-01-27 13:28 ` Gerhard Heift
  2014-01-27 18:11   ` David Sterba
  2014-01-27 13:28 ` [PATCH RFCv2 6/6] btrfs: in tree_search extent buffer lifetime Gerhard Heift
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Gerhard Heift @ 2014-01-27 13:28 UTC (permalink / raw
  To: linux-btrfs

By copying each found item seperatly to userspace, we only need a small buffer
in the kernel. This allows to run a large search inside of a single call.

Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
---
 fs/btrfs/ioctl.c | 107 ++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 71 insertions(+), 36 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index c44fcdd..38403e6 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1851,7 +1851,7 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 			       struct btrfs_key *key,
 			       struct btrfs_ioctl_search_key *sk,
 			       size_t buf_size,
-			       char *buf,
+			       char __user *buf,
 			       unsigned long *sk_offset,
 			       int *num_found)
 {
@@ -1865,6 +1865,9 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 	int slot;
 	int ret = 0;
 
+	char *content_buffer = NULL;
+	unsigned long content_buffer_size = 0;
+
 	leaf = path->nodes[0];
 	slot = path->slots[0];
 	nritems = btrfs_header_nritems(leaf);
@@ -1886,7 +1889,7 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 		if (sizeof(sh) + item_len > buf_size) {
 			if (*num_found) {
 				ret = 1;
-				goto overflow;
+				goto err;
 			}
 
 			item_len = 0;
@@ -1895,7 +1898,7 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 
 		if (sizeof(sh) + item_len + *sk_offset > buf_size) {
 			ret = 1;
-			goto overflow;
+			goto err;
 		}
 
 		sh.objectid = key->objectid;
@@ -1905,20 +1908,47 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 		sh.transid = found_transid;
 
 		/* copy search result header */
-		memcpy(buf + *sk_offset, &sh, sizeof(sh));
+		if (copy_to_user(buf + *sk_offset, &sh, sizeof(sh))) {
+			ret = -EFAULT;
+			goto err;
+		}
+
 		*sk_offset += sizeof(sh);
 
 		if (item_len) {
-			char *p = buf + *sk_offset;
+			/* resize internal buffer if needed */
+			if (content_buffer_size < item_len) {
+				kfree(content_buffer);
+
+				content_buffer_size =
+					ALIGN(item_len, PAGE_SIZE);
+
+				content_buffer = kmalloc_track_caller(
+					content_buffer_size, GFP_KERNEL);
+
+				if (!content_buffer) {
+					content_buffer_size = 0;
+					ret = -ENOMEM;
+					goto err;
+				}
+			}
+
 			/* copy the item */
-			read_extent_buffer(leaf, p,
+			read_extent_buffer(leaf, content_buffer,
 					   item_off, item_len);
+
+			if (copy_to_user(buf + *sk_offset,
+					 content_buffer, item_len)) {
+				ret = -EFAULT;
+				goto err;
+			}
+
 			*sk_offset += item_len;
 		}
 		(*num_found)++;
 
 		if (ret) /* -EOVERFLOW from above */
-			goto overflow;
+			goto err;
 
 		if (*num_found >= sk->nr_items)
 			break;
@@ -1936,14 +1966,25 @@ advance_key:
 		key->objectid++;
 	} else
 		ret = 1;
-overflow:
+err:
+	/*
+	    0: all items from this leaf copied, continue with next
+	    1: more items can be copied, but buffer is too small or all items
+	       were found. Either way, it will stops the loop which iterates to
+	       the next leaf
+	    -EOVERFLOW: item was to large for buffer
+	    -ENOMEM: could not allocate memory for a temporary extent buffer
+	    -EFAULT: could not copy extent buffer back to userspace
+	*/
+	kfree(content_buffer);
+
 	return ret;
 }
 
 static noinline int search_ioctl(struct inode *inode,
 				 struct btrfs_ioctl_search_key *sk,
 				 size_t buf_size,
-				 char *buf
+				 char __user *buf
 				 )
 {
 	struct btrfs_root *root;
@@ -1996,6 +2037,7 @@ static noinline int search_ioctl(struct inode *inode,
 		ret = copy_to_sk(root, path, &key, sk, buf_size, buf,
 				 &sk_offset, &num_found);
 		btrfs_release_path(path);
+
 		if (ret || num_found >= sk->nr_items)
 			break;
 
@@ -2011,34 +2053,39 @@ err:
 static noinline int btrfs_ioctl_tree_search(struct file *file,
 					   void __user *argp)
 {
-	 struct btrfs_ioctl_search_args *args;
-	 struct inode *inode;
-	 int ret;
+	struct btrfs_ioctl_search_args __user *usarg;
+	struct btrfs_ioctl_search_key *sk;
+	struct inode *inode;
+	int ret;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
-	args = memdup_user(argp, sizeof(*args));
-	if (IS_ERR(args))
-		return PTR_ERR(args);
+	usarg = (struct btrfs_ioctl_search_args __user *)argp;
+
+	sk = memdup_user(&usarg->key, sizeof(*sk));
+	if (IS_ERR(sk))
+		return PTR_ERR(sk);
 
 	inode = file_inode(file);
-	ret = search_ioctl(inode, &args->key, sizeof(args->buf), args->buf);
+	ret = search_ioctl(inode, sk, sizeof(usarg->buf), usarg->buf);
+	/* the origin ioctl does handle too large results by returning an item
+	* with a len of 0 */
 	if (ret == -EOVERFLOW)
 		ret = 0;
-	if (ret == 0 && copy_to_user(argp, args, sizeof(*args)))
+	if (ret == 0 && copy_to_user(argp, sk, sizeof(*sk)))
 		ret = -EFAULT;
-	kfree(args);
+	kfree(sk);
 	return ret;
 }
 
 static noinline int btrfs_ioctl_tree_search_v2(struct file *file,
 					   void __user *argp)
 {
+	struct btrfs_ioctl_search_args_v2 __user *usarg;
 	struct btrfs_ioctl_search_args_v2 *args;
 	struct inode *inode;
 	int ret;
-	char *buf;
 	size_t buf_size;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -2049,31 +2096,19 @@ static noinline int btrfs_ioctl_tree_search_v2(struct file *file,
 	if (IS_ERR(args))
 		return PTR_ERR(args);
 
+	usarg = (struct btrfs_ioctl_search_args_v2 __user *)argp;
+
 	buf_size = args->buf_size;
 
 	if (buf_size < sizeof(struct btrfs_ioctl_search_header)) {
 		kfree(args);
-		return -ENOMEM;
-	}
-
-	/* limit memory */
-	if (buf_size > PAGE_SIZE * 32)
-		buf_size = PAGE_SIZE * 32;
-
-	buf = memdup_user(argp->buf, buf_size);
-	if (IS_ERR(buf)) {
-		kfree(args);
-		return PTR_ERR(buf);
+		return -EOVERFLOW;
 	}
 
 	inode = file_inode(file);
-	ret = search_ioctl(inode, &args->key, buf_size, buf);
-	if (ret == 0 && (
-		copy_to_user(argp, args, sizeof(*args)) ||
-		copy_to_user(argp->buf, buf, buf_size)
-		))
+	ret = search_ioctl(inode, &args->key, buf_size, usarg->buf);
+	if (ret == 0 && copy_to_user(argp, args, sizeof(*args)))
 		ret = -EFAULT;
-	kfree(buf);
 	kfree(args);
 	return ret;
 }
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH RFCv2 6/6] btrfs: in tree_search extent buffer lifetime
  2014-01-27 13:28 [PATCH RFCv2] new ioctl TREE_SEARCH_V2 Gerhard Heift
                   ` (4 preceding siblings ...)
  2014-01-27 13:28 ` [PATCH RFCv2 5/6] btrfs: search_ioctl: direct copy to userspace Gerhard Heift
@ 2014-01-27 13:28 ` Gerhard Heift
  2014-01-27 17:15 ` [PATCH RFCv2] new ioctl TREE_SEARCH_V2 David Sterba
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Gerhard Heift @ 2014-01-27 13:28 UTC (permalink / raw
  To: linux-btrfs

Instead of allocting and releasing a buffer for each leaf, which will be
visited during a search, allocate (and expand) a buffer for the whole search.
This saves a few allocations and deallocations during larger searchs, which
visits more leafs.

Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
---
 fs/btrfs/ioctl.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 38403e6..ec7d759 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1852,6 +1852,8 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 			       struct btrfs_ioctl_search_key *sk,
 			       size_t buf_size,
 			       char __user *buf,
+			       size_t *content_buffer_size,
+			       char **content_buffer,
 			       unsigned long *sk_offset,
 			       int *num_found)
 {
@@ -1865,9 +1867,6 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 	int slot;
 	int ret = 0;
 
-	char *content_buffer = NULL;
-	unsigned long content_buffer_size = 0;
-
 	leaf = path->nodes[0];
 	slot = path->slots[0];
 	nritems = btrfs_header_nritems(leaf);
@@ -1917,28 +1916,28 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 
 		if (item_len) {
 			/* resize internal buffer if needed */
-			if (content_buffer_size < item_len) {
-				kfree(content_buffer);
+			if (*content_buffer_size < item_len) {
+				kfree(*content_buffer);
 
-				content_buffer_size =
+				*content_buffer_size =
 					ALIGN(item_len, PAGE_SIZE);
 
-				content_buffer = kmalloc_track_caller(
-					content_buffer_size, GFP_KERNEL);
+				*content_buffer = kmalloc_track_caller(
+					*content_buffer_size, GFP_KERNEL);
 
-				if (!content_buffer) {
-					content_buffer_size = 0;
+				if (!*content_buffer) {
+					*content_buffer_size = 0;
 					ret = -ENOMEM;
 					goto err;
 				}
 			}
 
 			/* copy the item */
-			read_extent_buffer(leaf, content_buffer,
+			read_extent_buffer(leaf, *content_buffer,
 					   item_off, item_len);
 
 			if (copy_to_user(buf + *sk_offset,
-					 content_buffer, item_len)) {
+					 *content_buffer, item_len)) {
 				ret = -EFAULT;
 				goto err;
 			}
@@ -1976,8 +1975,6 @@ err:
 	    -ENOMEM: could not allocate memory for a temporary extent buffer
 	    -EFAULT: could not copy extent buffer back to userspace
 	*/
-	kfree(content_buffer);
-
 	return ret;
 }
 
@@ -1995,6 +1992,9 @@ static noinline int search_ioctl(struct inode *inode,
 	int num_found = 0;
 	unsigned long sk_offset = 0;
 
+	char *content_buffer = NULL;
+	size_t content_buffer_size = 0;
+
 	if (buf_size < sizeof(struct btrfs_ioctl_search_header))
 		return -EOVERFLOW;
 
@@ -2035,6 +2035,7 @@ static noinline int search_ioctl(struct inode *inode,
 			goto err;
 		}
 		ret = copy_to_sk(root, path, &key, sk, buf_size, buf,
+				 &content_buffer_size, &content_buffer,
 				 &sk_offset, &num_found);
 		btrfs_release_path(path);
 
@@ -2045,6 +2046,8 @@ static noinline int search_ioctl(struct inode *inode,
 	if (ret > 0)
 		ret = 0;
 err:
+	kfree(content_buffer);
+
 	sk->nr_items = num_found;
 	btrfs_free_path(path);
 	return ret;
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv2] new ioctl TREE_SEARCH_V2
  2014-01-27 13:28 [PATCH RFCv2] new ioctl TREE_SEARCH_V2 Gerhard Heift
                   ` (5 preceding siblings ...)
  2014-01-27 13:28 ` [PATCH RFCv2 6/6] btrfs: in tree_search extent buffer lifetime Gerhard Heift
@ 2014-01-27 17:15 ` David Sterba
  2014-01-27 19:10 ` Goffredo Baroncelli
  2014-01-28  9:29 ` Anand Jain
  8 siblings, 0 replies; 22+ messages in thread
From: David Sterba @ 2014-01-27 17:15 UTC (permalink / raw
  To: Gerhard Heift; +Cc: linux-btrfs

On Mon, Jan 27, 2014 at 02:28:26PM +0100, Gerhard Heift wrote:
> This patch series adds a new ioctl TREE_SEARCH_V2 with which we could store the
> results in a varying buffer. Now even items larger than 3992 bytes or a large
> amount of items can be returned.
> 
> I have a few questions:
>   Which value should I assign to TREE_SEARCH_V2?

As you did it, it's ok.

>   Should we limit the buffer size?

A sane limit would be ok, and as it's not hardcoded to the data
structure, we can change it later if needed. The only requirement I see
now is to fit the maximum node/leaf size, ie at least 64k.

Big buffer allows to pack lots of items and save the overhead of ioctl
call, but if this is an issue, a separate ioctl should be dedicated to
do that.

>   What about documentation?

I think it's good to document the ioctls in ioctl.h itself, (arguments
and examples if it's not obvious) probably will hit the target audience.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv2 1/6] btrfs: search_ioctl accepts varying buffer
  2014-01-27 13:28 ` [PATCH RFCv2 1/6] btrfs: search_ioctl accepts varying buffer Gerhard Heift
@ 2014-01-27 17:19   ` David Sterba
  0 siblings, 0 replies; 22+ messages in thread
From: David Sterba @ 2014-01-27 17:19 UTC (permalink / raw
  To: Gerhard Heift; +Cc: linux-btrfs

On Mon, Jan 27, 2014 at 02:28:27PM +0100, Gerhard Heift wrote:
> @@ -1931,17 +1931,22 @@ overflow:
>  }
>  
>  static noinline int search_ioctl(struct inode *inode,
> -				 struct btrfs_ioctl_search_args *args)
> +				 struct btrfs_ioctl_search_key *sk,
> +				 size_t buf_size,
> +				 char *buf
> +				 )

Minor style issue, don't put the ) on a separate line,
otherwise ok.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv2 2/6] btrfs: search_ioctl rejects unused setted values
  2014-01-27 13:28 ` [PATCH RFCv2 2/6] btrfs: search_ioctl rejects unused setted values Gerhard Heift
@ 2014-01-27 17:28   ` David Sterba
  2014-01-28  0:32     ` Gerhard Heift
  2014-01-27 19:06   ` Martin Steigerwald
  1 sibling, 1 reply; 22+ messages in thread
From: David Sterba @ 2014-01-27 17:28 UTC (permalink / raw
  To: Gerhard Heift; +Cc: linux-btrfs

On Mon, Jan 27, 2014 at 02:28:28PM +0100, Gerhard Heift wrote:
> To prevent unexpectet values in the unused fields of the search key fail early.
> Otherwise future extensions would break the behavior of the search if current
> implementations in userspace set them to values other than zero.
> 
> Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1947,6 +1947,9 @@ static noinline int search_ioctl(struct inode *inode,
>  	if (buf_size < sizeof(struct btrfs_ioctl_search_header))
>  		return -EOVERFLOW;
>  
> +	if (sk->unused || sk->unused1 | sk->unused2 | sk->unused3 | sk->unused4)
> +		return -EINVAL;

The pattern that's been used for forward/backward compatibility is to
zero the unused or reserved fields on the userspace side and ignore them
completely in kernel.

If any future version of the ioctl uses the now unused fields, it also
has to increase the version.

> +
>  	path = btrfs_alloc_path();
>  	if (!path)
>  		return -ENOMEM;

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv2 4/6] btrfs: new ioctl TREE_SEARCH_V2
  2014-01-27 13:28 ` [PATCH RFCv2 4/6] btrfs: new ioctl TREE_SEARCH_V2 Gerhard Heift
@ 2014-01-27 17:41   ` David Sterba
  2014-01-28  0:33     ` Gerhard Heift
  0 siblings, 1 reply; 22+ messages in thread
From: David Sterba @ 2014-01-27 17:41 UTC (permalink / raw
  To: Gerhard Heift; +Cc: linux-btrfs

On Mon, Jan 27, 2014 at 02:28:30PM +0100, Gerhard Heift wrote:
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2032,6 +2032,52 @@ static noinline int btrfs_ioctl_tree_search(struct file *file,
>  	return ret;
>  }
>  
> +static noinline int btrfs_ioctl_tree_search_v2(struct file *file,
> +					   void __user *argp)
> +{
> +	struct btrfs_ioctl_search_args_v2 *args;
> +	struct inode *inode;
> +	int ret;
> +	char *buf;
> +	size_t buf_size;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	/* copy search header and buffer size */
> +	args = memdup_user(argp, sizeof(*args));
> +	if (IS_ERR(args))
> +		return PTR_ERR(args);
> +
> +	buf_size = args->buf_size;
> +
> +	if (buf_size < sizeof(struct btrfs_ioctl_search_header)) {
> +		kfree(args);
> +		return -ENOMEM;

ENOMEM does not seem correct here, it's not a memory allocation failure
but rather an underflow of the buffer size, possibly EINVAL or EOVERFLOW
as the other checks return.

> +	}
> +
> +	/* limit memory */
> +	if (buf_size > PAGE_SIZE * 32)
> +		buf_size = PAGE_SIZE * 32;
> +
> +	buf = memdup_user(argp->buf, buf_size);

Memory allocations are not that easy, getting a contiguous 32 * 4k = 128k
buffer may often fail. And the point of the V2 ioctl was to avoid
allocating the buffer, and do copy_to_user directly.

Also, you remove this code in the next patch, I don't think this level
of patch granularity is necessary.

> +	if (IS_ERR(buf)) {
> +		kfree(args);
> +		return PTR_ERR(buf);
> +	}
> +
> +	inode = file_inode(file);
> +	ret = search_ioctl(inode, &args->key, buf_size, buf);
> +	if (ret == 0 && (
> +		copy_to_user(argp, args, sizeof(*args)) ||
> +		copy_to_user(argp->buf, buf, buf_size)
> +		))
> +		ret = -EFAULT;
> +	kfree(buf);
> +	kfree(args);
> +	return ret;
> +}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv2 5/6] btrfs: search_ioctl: direct copy to userspace
  2014-01-27 13:28 ` [PATCH RFCv2 5/6] btrfs: search_ioctl: direct copy to userspace Gerhard Heift
@ 2014-01-27 18:11   ` David Sterba
  2014-01-28  0:35     ` Gerhard Heift
  0 siblings, 1 reply; 22+ messages in thread
From: David Sterba @ 2014-01-27 18:11 UTC (permalink / raw
  To: Gerhard Heift; +Cc: linux-btrfs

On Mon, Jan 27, 2014 at 02:28:31PM +0100, Gerhard Heift wrote:
> By copying each found item seperatly to userspace, we only need a small buffer
> in the kernel. This allows to run a large search inside of a single call.
> 
> Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
> ---
>  fs/btrfs/ioctl.c | 107 ++++++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 71 insertions(+), 36 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index c44fcdd..38403e6 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1905,20 +1908,47 @@ static noinline int copy_to_sk(struct btrfs_root *root,
>  		sh.transid = found_transid;
>  
>  		/* copy search result header */
> -		memcpy(buf + *sk_offset, &sh, sizeof(sh));
> +		if (copy_to_user(buf + *sk_offset, &sh, sizeof(sh))) {
> +			ret = -EFAULT;
> +			goto err;
> +		}
> +
>  		*sk_offset += sizeof(sh);
>  
>  		if (item_len) {
> -			char *p = buf + *sk_offset;
> +			/* resize internal buffer if needed */
> +			if (content_buffer_size < item_len) {
> +				kfree(content_buffer);
> +
> +				content_buffer_size =
> +					ALIGN(item_len, PAGE_SIZE);
> +
> +				content_buffer = kmalloc_track_caller(
> +					content_buffer_size, GFP_KERNEL);

So that's another allocation. I see it's due to read_extent_buffer()
that wants a kernel memory, and the buffer size is again up to the item
size, potentially spanning multiple pages.

> +				if (!content_buffer) {
> +					content_buffer_size = 0;
> +					ret = -ENOMEM;
> +					goto err;
> +				}
> +			}
> +
>  			/* copy the item */
> -			read_extent_buffer(leaf, p,
> +			read_extent_buffer(leaf, content_buffer,
>  					   item_off, item_len);

My suggestion is to introduce read_extent_buffer_user that does
copy_to_user instead of the memcpy. It is code duplication, but we can
clean it in a separate patch.

> +
> +			if (copy_to_user(buf + *sk_offset,
> +					 content_buffer, item_len)) {
> +				ret = -EFAULT;
> +				goto err;
> +			}
> +
>  			*sk_offset += item_len;
>  		}
>  		(*num_found)++;
>  
>  		if (ret) /* -EOVERFLOW from above */
> -			goto overflow;
> +			goto err;
>  
>  		if (*num_found >= sk->nr_items)
>  			break;
> @@ -1936,14 +1966,25 @@ advance_key:
>  		key->objectid++;
>  	} else
>  		ret = 1;
> -overflow:
> +err:
> +	/*
> +	    0: all items from this leaf copied, continue with next
> +	    1: more items can be copied, but buffer is too small or all items
> +	       were found. Either way, it will stops the loop which iterates to
> +	       the next leaf
> +	    -EOVERFLOW: item was to large for buffer
> +	    -ENOMEM: could not allocate memory for a temporary extent buffer
> +	    -EFAULT: could not copy extent buffer back to userspace
> +	*/

Please use the kernel style for comments.

> +	kfree(content_buffer);
> +
>  	return ret;
>  }
>  
> @@ -2011,34 +2053,39 @@ err:
>  static noinline int btrfs_ioctl_tree_search(struct file *file,
>  					   void __user *argp)
>  {
> -	 struct btrfs_ioctl_search_args *args;
> -	 struct inode *inode;
> -	 int ret;
> +	struct btrfs_ioctl_search_args __user *usarg;
> +	struct btrfs_ioctl_search_key *sk;
> +	struct inode *inode;
> +	int ret;
>  
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  
> -	args = memdup_user(argp, sizeof(*args));
> -	if (IS_ERR(args))
> -		return PTR_ERR(args);
> +	usarg = (struct btrfs_ioctl_search_args __user *)argp;
> +
> +	sk = memdup_user(&usarg->key, sizeof(*sk));

Search key is 13x u64 = 104 bytes, you can use a local variable. This is
usually possible in ioctl functions, the stack is not deep at this
moment and will usually not grow too much. Removing the allocation makes
it a bit more resilient against ENOMEM conditions.

> +	if (IS_ERR(sk))
> +		return PTR_ERR(sk);
>  
>  	inode = file_inode(file);
> -	ret = search_ioctl(inode, &args->key, sizeof(args->buf), args->buf);
> +	ret = search_ioctl(inode, sk, sizeof(usarg->buf), usarg->buf);

Ok, v1 ioctl uses the same code in the end.

> +	/* the origin ioctl does handle too large results by returning an item
> +	* with a len of 0 */
>  	if (ret == -EOVERFLOW)
>  		ret = 0;
> -	if (ret == 0 && copy_to_user(argp, args, sizeof(*args)))
> +	if (ret == 0 && copy_to_user(argp, sk, sizeof(*sk)))
>  		ret = -EFAULT;
> -	kfree(args);
> +	kfree(sk);
>  	return ret;
>  }

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv2 2/6] btrfs: search_ioctl rejects unused setted values
  2014-01-27 13:28 ` [PATCH RFCv2 2/6] btrfs: search_ioctl rejects unused setted values Gerhard Heift
  2014-01-27 17:28   ` David Sterba
@ 2014-01-27 19:06   ` Martin Steigerwald
  1 sibling, 0 replies; 22+ messages in thread
From: Martin Steigerwald @ 2014-01-27 19:06 UTC (permalink / raw
  To: Gerhard Heift; +Cc: linux-btrfs

Am Montag, 27. Januar 2014, 14:28:28 schrieb Gerhard Heift:
> To prevent unexpectet values in the unused fields of the search key fail

unexpected

> early. Otherwise future extensions would break the behavior of the search
> if current implementations in userspace set them to values other than zero.
> 
> Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
> ---
>  fs/btrfs/ioctl.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index be4c780..919d928 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1947,6 +1947,9 @@ static noinline int search_ioctl(struct inode *inode,
>  	if (buf_size < sizeof(struct btrfs_ioctl_search_header))
>  		return -EOVERFLOW;
> 
> +	if (sk->unused || sk->unused1 | sk->unused2 | sk->unused3 | sk->unused4)
> +		return -EINVAL;
> +
>  	path = btrfs_alloc_path();
>  	if (!path)
>  		return -ENOMEM;

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv2] new ioctl TREE_SEARCH_V2
  2014-01-27 13:28 [PATCH RFCv2] new ioctl TREE_SEARCH_V2 Gerhard Heift
                   ` (6 preceding siblings ...)
  2014-01-27 17:15 ` [PATCH RFCv2] new ioctl TREE_SEARCH_V2 David Sterba
@ 2014-01-27 19:10 ` Goffredo Baroncelli
  2014-01-27 19:31   ` Hugo Mills
  2014-01-28  9:29 ` Anand Jain
  8 siblings, 1 reply; 22+ messages in thread
From: Goffredo Baroncelli @ 2014-01-27 19:10 UTC (permalink / raw
  To: Gerhard Heift, linux-btrfs

Hi,

On 2014-01-27 14:28, Gerhard Heift wrote:
> This patch series adds a new ioctl TREE_SEARCH_V2 with which we could store the
> results in a varying buffer. Now even items larger than 3992 bytes or a large
> amount of items can be returned.


One of the main strangeness of the current TREE_SEARCH ioctl, which I
found is the fact that the search was not in a "rectangular" region, but
in a "linear" region.

This is due the fact that after a ioctl call, we have to set the min_*
fields with the sh->{objectid,offset,type} +1 rounding to 0 if case of
overflow.

This because the min_* fields are both the "lower bound" of the search
and the "starting point" of the next search. Adding a new set of fields
named start_* could solve this issue.

I discussed this topic few years ago in [1].

Because we are introducing a new ioctl, is it possible to solve this
issue ? We could avoid some userspace<->kernelspace transition, which
seems be one of the goal of your patch.

BR
G.Baroncelli



[1] http://www.spinics.net/lists/linux-btrfs/msg07617.html



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv2] new ioctl TREE_SEARCH_V2
  2014-01-27 19:10 ` Goffredo Baroncelli
@ 2014-01-27 19:31   ` Hugo Mills
  2014-01-27 21:33     ` Goffredo Baroncelli
  0 siblings, 1 reply; 22+ messages in thread
From: Hugo Mills @ 2014-01-27 19:31 UTC (permalink / raw
  To: kreijack; +Cc: Gerhard Heift, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2344 bytes --]

On Mon, Jan 27, 2014 at 08:10:52PM +0100, Goffredo Baroncelli wrote:
> Hi,
> 
> On 2014-01-27 14:28, Gerhard Heift wrote:
> > This patch series adds a new ioctl TREE_SEARCH_V2 with which we could store the
> > results in a varying buffer. Now even items larger than 3992 bytes or a large
> > amount of items can be returned.
> 
> 
> One of the main strangeness of the current TREE_SEARCH ioctl, which I
> found is the fact that the search was not in a "rectangular" region, but
> in a "linear" region.

> This is due the fact that after a ioctl call, we have to set the min_*
> fields with the sh->{objectid,offset,type} +1 rounding to 0 if case of
> overflow.
> 
> This because the min_* fields are both the "lower bound" of the search
> and the "starting point" of the next search. Adding a new set of fields
> named start_* could solve this issue.
> 
> I discussed this topic few years ago in [1].
> 
> Because we are introducing a new ioctl, is it possible to solve this
> issue ? We could avoid some userspace<->kernelspace transition, which
> seems be one of the goal of your patch.

   Aargh, not this again. :(

   The keyspace is *linear*, consisting of a concatenation of three
data fields, sorted lexically. This is how they're stored in the
trees, and it's how they're returned by TREE_SEARCH. You can't
traverse the keyspace with an O(1) "next" operation in anything other
than that order. Each tree represents a linear keyspace (because
that's the fundamental nature of the data structure).

   If you want to treat the keyspace as a 3-dimensional tuple space
(in a subspace of |N³), then you have to search separately in the tree
each time you come to the end of an (objectid, type, _) range. The
other alternative is to retrieve all of the linear range, and filter
out the unwanted keys whilst traversing the sequence.

   Bottom line: it's a linear data structure, not a 3-dimensional one.
Please treat it as such, otherwise things are going to get complicated
and confusing.

   Hugo.

> [1] http://www.spinics.net/lists/linux-btrfs/msg07617.html

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
      --- Great oxymorons of the world, no. 7: The Simple Truth ---      

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv2] new ioctl TREE_SEARCH_V2
  2014-01-27 19:31   ` Hugo Mills
@ 2014-01-27 21:33     ` Goffredo Baroncelli
  0 siblings, 0 replies; 22+ messages in thread
From: Goffredo Baroncelli @ 2014-01-27 21:33 UTC (permalink / raw
  To: Hugo Mills, Gerhard Heift, linux-btrfs

Hi Hugo,

On 2014-01-27 20:31, Hugo Mills wrote:
>> > One of the main strangeness of the current TREE_SEARCH ioctl, which I
>> > found is the fact that the search was not in a "rectangular" region, but
>> > in a "linear" region.
>> > This is due the fact that after a ioctl call, we have to set the min_*
>> > fields with the sh->{objectid,offset,type} +1 rounding to 0 if case of
>> > overflow.
>> > 
>> > This because the min_* fields are both the "lower bound" of the search
>> > and the "starting point" of the next search. Adding a new set of fields
>> > named start_* could solve this issue.
>> > 
>> > I discussed this topic few years ago in [1].
>> > 
>> > Because we are introducing a new ioctl, is it possible to solve this
>> > issue ? We could avoid some userspace<->kernelspace transition, which
>> > seems be one of the goal of your patch.
>    Aargh, not this again. :(
> 
>    The keyspace is *linear*, consisting of a concatenation of three
> data fields, sorted lexically. This is how they're stored in the
> trees, and it's how they're returned by TREE_SEARCH. 

I never told anything different

> You can't
> traverse the keyspace with an O(1) "next" operation in anything other
> than that order. 

I didn't requested that. I am suggesting to filter out in the kernel
space the "out of range" values, to avoid return them in user space. The
search is still performed in "linear" mode.

Now, the min_* values are updated with the last key found+1 after each
ioctl. I am only suggesting to never reset the min_* values and to add
other three fields to store the "cursor" position.

When a key is found, it is checked in kernel space if it is valid or
none (if it is in the range min_* ... max_*). If it is valid, it is
returned in user space; in *any case* the search starts from the last+1 key.

Today this is not possible, because the min_* values are overridden.

I repeat, the search is still performed linearity (ok, some values could
be skipped but this is a further optimization); I want only to avoid
kernel space returning "out of range" data.

BR
Goffredo


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv2 2/6] btrfs: search_ioctl rejects unused setted values
  2014-01-27 17:28   ` David Sterba
@ 2014-01-28  0:32     ` Gerhard Heift
  2014-01-29 17:12       ` David Sterba
  0 siblings, 1 reply; 22+ messages in thread
From: Gerhard Heift @ 2014-01-28  0:32 UTC (permalink / raw
  To: David Sterba, Gerhard Heift, linux-btrfs

2014-01-27 David Sterba <dsterba@suse.cz>:
> On Mon, Jan 27, 2014 at 02:28:28PM +0100, Gerhard Heift wrote:
>> To prevent unexpectet values in the unused fields of the search key fail early.
>> Otherwise future extensions would break the behavior of the search if current
>> implementations in userspace set them to values other than zero.
>>
>> Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -1947,6 +1947,9 @@ static noinline int search_ioctl(struct inode *inode,
>>       if (buf_size < sizeof(struct btrfs_ioctl_search_header))
>>               return -EOVERFLOW;
>>
>> +     if (sk->unused || sk->unused1 | sk->unused2 | sk->unused3 | sk->unused4)
>> +             return -EINVAL;
>
> The pattern that's been used for forward/backward compatibility is to
> zero the unused or reserved fields on the userspace side and ignore them
> completely in kernel.

OK, I will dismiss this patch.

> If any future version of the ioctl uses the now unused fields, it also
> has to increase the version.

Just for me to learn: If we have to increase the version, why do we
have the unused fields anyway? We could expand the struct on demand if
we need new fields. Do I miss something?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv2 4/6] btrfs: new ioctl TREE_SEARCH_V2
  2014-01-27 17:41   ` David Sterba
@ 2014-01-28  0:33     ` Gerhard Heift
  0 siblings, 0 replies; 22+ messages in thread
From: Gerhard Heift @ 2014-01-28  0:33 UTC (permalink / raw
  To: David Sterba, Gerhard Heift, linux-btrfs

2014-01-27 David Sterba <dsterba@suse.cz>:
> On Mon, Jan 27, 2014 at 02:28:30PM +0100, Gerhard Heift wrote:
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -2032,6 +2032,52 @@ static noinline int btrfs_ioctl_tree_search(struct file *file,
>>       return ret;
>>  }
>>
>> +static noinline int btrfs_ioctl_tree_search_v2(struct file *file,
>> +                                        void __user *argp)
>> +{
>> +     struct btrfs_ioctl_search_args_v2 *args;
>> +     struct inode *inode;
>> +     int ret;
>> +     char *buf;
>> +     size_t buf_size;
>> +
>> +     if (!capable(CAP_SYS_ADMIN))
>> +             return -EPERM;
>> +
>> +     /* copy search header and buffer size */
>> +     args = memdup_user(argp, sizeof(*args));
>> +     if (IS_ERR(args))
>> +             return PTR_ERR(args);
>> +
>> +     buf_size = args->buf_size;
>> +
>> +     if (buf_size < sizeof(struct btrfs_ioctl_search_header)) {
>> +             kfree(args);
>> +             return -ENOMEM;
>
> ENOMEM does not seem correct here, it's not a memory allocation failure
> but rather an underflow of the buffer size, possibly EINVAL or EOVERFLOW
> as the other checks return.

Saw this to during the rewrite, fixed it for the new version.

>> +     }
>> +
>> +     /* limit memory */
>> +     if (buf_size > PAGE_SIZE * 32)
>> +             buf_size = PAGE_SIZE * 32;
>> +
>> +     buf = memdup_user(argp->buf, buf_size);
>
> Memory allocations are not that easy, getting a contiguous 32 * 4k = 128k
> buffer may often fail. And the point of the V2 ioctl was to avoid
> allocating the buffer, and do copy_to_user directly.
>
> Also, you remove this code in the next patch, I don't think this level
> of patch granularity is necessary.

In the new version I do it the other way around, first direct copy,
then the new ioctl, so this is not necessary any more.

>> +     if (IS_ERR(buf)) {
>> +             kfree(args);
>> +             return PTR_ERR(buf);
>> +     }
>> +
>> +     inode = file_inode(file);
>> +     ret = search_ioctl(inode, &args->key, buf_size, buf);
>> +     if (ret == 0 && (
>> +             copy_to_user(argp, args, sizeof(*args)) ||
>> +             copy_to_user(argp->buf, buf, buf_size)
>> +             ))
>> +             ret = -EFAULT;
>> +     kfree(buf);
>> +     kfree(args);
>> +     return ret;
>> +}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv2 5/6] btrfs: search_ioctl: direct copy to userspace
  2014-01-27 18:11   ` David Sterba
@ 2014-01-28  0:35     ` Gerhard Heift
  0 siblings, 0 replies; 22+ messages in thread
From: Gerhard Heift @ 2014-01-28  0:35 UTC (permalink / raw
  To: David Sterba, Gerhard Heift, linux-btrfs

2014-01-27 David Sterba <dsterba@suse.cz>:
> On Mon, Jan 27, 2014 at 02:28:31PM +0100, Gerhard Heift wrote:
>> By copying each found item seperatly to userspace, we only need a small buffer
>> in the kernel. This allows to run a large search inside of a single call.
>>
>> Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
>> ---
>>  fs/btrfs/ioctl.c | 107 ++++++++++++++++++++++++++++++++++++-------------------
>>  1 file changed, 71 insertions(+), 36 deletions(-)
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index c44fcdd..38403e6 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -1905,20 +1908,47 @@ static noinline int copy_to_sk(struct btrfs_root *root,
>>               sh.transid = found_transid;
>>
>>               /* copy search result header */
>> -             memcpy(buf + *sk_offset, &sh, sizeof(sh));
>> +             if (copy_to_user(buf + *sk_offset, &sh, sizeof(sh))) {
>> +                     ret = -EFAULT;
>> +                     goto err;
>> +             }
>> +
>>               *sk_offset += sizeof(sh);
>>
>>               if (item_len) {
>> -                     char *p = buf + *sk_offset;
>> +                     /* resize internal buffer if needed */
>> +                     if (content_buffer_size < item_len) {
>> +                             kfree(content_buffer);
>> +
>> +                             content_buffer_size =
>> +                                     ALIGN(item_len, PAGE_SIZE);
>> +
>> +                             content_buffer = kmalloc_track_caller(
>> +                                     content_buffer_size, GFP_KERNEL);
>
> So that's another allocation. I see it's due to read_extent_buffer()
> that wants a kernel memory, and the buffer size is again up to the item
> size, potentially spanning multiple pages.
>
>> +                             if (!content_buffer) {
>> +                                     content_buffer_size = 0;
>> +                                     ret = -ENOMEM;
>> +                                     goto err;
>> +                             }
>> +                     }
>> +
>>                       /* copy the item */
>> -                     read_extent_buffer(leaf, p,
>> +                     read_extent_buffer(leaf, content_buffer,
>>                                          item_off, item_len);
>
> My suggestion is to introduce read_extent_buffer_user that does
> copy_to_user instead of the memcpy. It is code duplication, but we can
> clean it in a separate patch.

This was an approach I have already tried, but it did not work for me.
After a few calls the kernel was frozen. I did not know, what caused
the freeze, perhaps it was a wrong change I made while testing.

I check, if it works now.

>> +
>> +                     if (copy_to_user(buf + *sk_offset,
>> +                                      content_buffer, item_len)) {
>> +                             ret = -EFAULT;
>> +                             goto err;
>> +                     }
>> +
>>                       *sk_offset += item_len;
>>               }
>>               (*num_found)++;
>>
>>               if (ret) /* -EOVERFLOW from above */
>> -                     goto overflow;
>> +                     goto err;
>>
>>               if (*num_found >= sk->nr_items)
>>                       break;
>> @@ -1936,14 +1966,25 @@ advance_key:
>>               key->objectid++;
>>       } else
>>               ret = 1;
>> -overflow:
>> +err:
>> +     /*
>> +         0: all items from this leaf copied, continue with next
>> +         1: more items can be copied, but buffer is too small or all items
>> +            were found. Either way, it will stops the loop which iterates to
>> +            the next leaf
>> +         -EOVERFLOW: item was to large for buffer
>> +         -ENOMEM: could not allocate memory for a temporary extent buffer
>> +         -EFAULT: could not copy extent buffer back to userspace
>> +     */
>
> Please use the kernel style for comments.

Fixed.

>> +     kfree(content_buffer);
>> +
>>       return ret;
>>  }
>>
>> @@ -2011,34 +2053,39 @@ err:
>>  static noinline int btrfs_ioctl_tree_search(struct file *file,
>>                                          void __user *argp)
>>  {
>> -      struct btrfs_ioctl_search_args *args;
>> -      struct inode *inode;
>> -      int ret;
>> +     struct btrfs_ioctl_search_args __user *usarg;
>> +     struct btrfs_ioctl_search_key *sk;
>> +     struct inode *inode;
>> +     int ret;
>>
>>       if (!capable(CAP_SYS_ADMIN))
>>               return -EPERM;
>>
>> -     args = memdup_user(argp, sizeof(*args));
>> -     if (IS_ERR(args))
>> -             return PTR_ERR(args);
>> +     usarg = (struct btrfs_ioctl_search_args __user *)argp;
>> +
>> +     sk = memdup_user(&usarg->key, sizeof(*sk));
>
> Search key is 13x u64 = 104 bytes, you can use a local variable. This is
> usually possible in ioctl functions, the stack is not deep at this
> moment and will usually not grow too much. Removing the allocation makes
> it a bit more resilient against ENOMEM conditions.

Thanks for the hint. In the new version I will use local variables for
the search argument in both versions.

>> +     if (IS_ERR(sk))
>> +             return PTR_ERR(sk);
>>
>>       inode = file_inode(file);
>> -     ret = search_ioctl(inode, &args->key, sizeof(args->buf), args->buf);
>> +     ret = search_ioctl(inode, sk, sizeof(usarg->buf), usarg->buf);
>
> Ok, v1 ioctl uses the same code in the end.
>
>> +     /* the origin ioctl does handle too large results by returning an item
>> +     * with a len of 0 */
>>       if (ret == -EOVERFLOW)
>>               ret = 0;
>> -     if (ret == 0 && copy_to_user(argp, args, sizeof(*args)))
>> +     if (ret == 0 && copy_to_user(argp, sk, sizeof(*sk)))
>>               ret = -EFAULT;
>> -     kfree(args);
>> +     kfree(sk);
>>       return ret;
>>  }

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv2] new ioctl TREE_SEARCH_V2
  2014-01-27 13:28 [PATCH RFCv2] new ioctl TREE_SEARCH_V2 Gerhard Heift
                   ` (7 preceding siblings ...)
  2014-01-27 19:10 ` Goffredo Baroncelli
@ 2014-01-28  9:29 ` Anand Jain
  2014-01-28 12:51   ` Gerhard Heift
  8 siblings, 1 reply; 22+ messages in thread
From: Anand Jain @ 2014-01-28  9:29 UTC (permalink / raw
  To: Gerhard Heift, linux-btrfs



  You may have covered this but its not explicit.
  Could you write few lines on whats wrong with the current
  TREE SEARCH and how V2 is helping.

Thanks, Anand



On 01/27/2014 09:28 PM, Gerhard Heift wrote:
> This patch series adds a new ioctl TREE_SEARCH_V2 with which we could store the
> results in a varying buffer. Now even items larger than 3992 bytes or a large
> amount of items can be returned.
>
> I have a few questions:
>    Which value should I assign to TREE_SEARCH_V2?
>    Should we limit the buffer size?
>    What about documentation?
>
> Changelog
>
> RFCv2
>    * fixed a build bug caused by using a wrong patch
>    * added a patch to expand buffer lifetime
>
> Gerhard
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv2] new ioctl TREE_SEARCH_V2
  2014-01-28  9:29 ` Anand Jain
@ 2014-01-28 12:51   ` Gerhard Heift
  0 siblings, 0 replies; 22+ messages in thread
From: Gerhard Heift @ 2014-01-28 12:51 UTC (permalink / raw
  To: Anand Jain; +Cc: linux-btrfs

2014-01-28 Anand Jain <Anand.Jain@oracle.com>:
>  You may have covered this but its not explicit.
>  Could you write few lines on whats wrong with the current
>  TREE SEARCH and how V2 is helping.

I wrote a program, which basically calculates a checksum over a files
content checksums. First I got the fiemap of the file and then
searched for the csum items with

  objectid: BTRFS_EXTENT_CSUM_OBJECTID;
  type: BTRFS_EXTENT_CSUM_KEY;
  offset: [physical pages]

This worked good but sometimes resulted in a empty items, because they
did not fit in the given buffer of 3992 bytes, specially for large
files. As David worte in [1], EXTENT_CSUM items seem to cap at 16k. A
simple solution would have been to just expand the static buffer size,
but with this solution, the user can decide how much buffer is needed.
Either way, directly copying the data to userspace with
read_extent_buffer_to_user (as suggested in [2]) will eliminate the
double-copy of data and needs less memory in the kernel.

[1] http://article.gmane.org/gmane.comp.file-systems.btrfs/31765
[2] http://article.gmane.org/gmane.comp.file-systems.btrfs/32064

> Thanks, Anand
>
>
>
>
> On 01/27/2014 09:28 PM, Gerhard Heift wrote:
>>
>> This patch series adds a new ioctl TREE_SEARCH_V2 with which we could
>> store the
>> results in a varying buffer. Now even items larger than 3992 bytes or a
>> large
>> amount of items can be returned.
>>
>> I have a few questions:
>>    Which value should I assign to TREE_SEARCH_V2?
>>    Should we limit the buffer size?
>>    What about documentation?
>>
>> Changelog
>>
>> RFCv2
>>    * fixed a build bug caused by using a wrong patch
>>    * added a patch to expand buffer lifetime
>>
>> Gerhard
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv2 2/6] btrfs: search_ioctl rejects unused setted values
  2014-01-28  0:32     ` Gerhard Heift
@ 2014-01-29 17:12       ` David Sterba
  0 siblings, 0 replies; 22+ messages in thread
From: David Sterba @ 2014-01-29 17:12 UTC (permalink / raw
  To: Gerhard Heift; +Cc: David Sterba, linux-btrfs

On Tue, Jan 28, 2014 at 01:32:36AM +0100, Gerhard Heift wrote:
> > If any future version of the ioctl uses the now unused fields, it also
> > has to increase the version.
> 
> Just for me to learn: If we have to increase the version, why do we
> have the unused fields anyway? We could expand the struct on demand if
> we need new fields. Do I miss something?

The spare fields can be used for minor updates, eg. returning a simple
number or status information, or passing some flags down to the ioctl.

Your patch does a major change to the crucial member of the ioctl
structure, the existing unused fileds don't help here and will stay
unused.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2014-01-29 17:12 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-27 13:28 [PATCH RFCv2] new ioctl TREE_SEARCH_V2 Gerhard Heift
2014-01-27 13:28 ` [PATCH RFCv2 1/6] btrfs: search_ioctl accepts varying buffer Gerhard Heift
2014-01-27 17:19   ` David Sterba
2014-01-27 13:28 ` [PATCH RFCv2 2/6] btrfs: search_ioctl rejects unused setted values Gerhard Heift
2014-01-27 17:28   ` David Sterba
2014-01-28  0:32     ` Gerhard Heift
2014-01-29 17:12       ` David Sterba
2014-01-27 19:06   ` Martin Steigerwald
2014-01-27 13:28 ` [PATCH RFCv2 3/6] btrfs: copy_to_sk returns EOVERFLOW for too small buffer Gerhard Heift
2014-01-27 13:28 ` [PATCH RFCv2 4/6] btrfs: new ioctl TREE_SEARCH_V2 Gerhard Heift
2014-01-27 17:41   ` David Sterba
2014-01-28  0:33     ` Gerhard Heift
2014-01-27 13:28 ` [PATCH RFCv2 5/6] btrfs: search_ioctl: direct copy to userspace Gerhard Heift
2014-01-27 18:11   ` David Sterba
2014-01-28  0:35     ` Gerhard Heift
2014-01-27 13:28 ` [PATCH RFCv2 6/6] btrfs: in tree_search extent buffer lifetime Gerhard Heift
2014-01-27 17:15 ` [PATCH RFCv2] new ioctl TREE_SEARCH_V2 David Sterba
2014-01-27 19:10 ` Goffredo Baroncelli
2014-01-27 19:31   ` Hugo Mills
2014-01-27 21:33     ` Goffredo Baroncelli
2014-01-28  9:29 ` Anand Jain
2014-01-28 12:51   ` Gerhard Heift

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.