From: juncheng bai <baijuncheng@unitedstack.com>
To: Ilya Dryomov <idryomov@gmail.com>
Cc: idryomov@redhat.com, Alex Elder <elder@linaro.org>,
	Josh Durgin <josh.durgin@inktank.com>,
	Guangliang Zhao <lucienchao@gmail.com>,
	jeff@garzik.org, yehuda@hq.newdream.net,
	Sage Weil <sage@newdream.net>,
	elder@inktank.com,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: [PATCH RFC] storage:rbd: make the size of request equal to the size of the object
Date: Tue, 16 Jun 2015 22:14:30 +0800
Message-ID: <55802F46.7020804@unitedstack.com>
In-Reply-To: <CAOi1vP-jnE+1-TA5P2=rGx9Bdoc4fTyp7u0g+QDpksm9RTxAqA@mail.gmail.com>



On 2015/6/16 21:30, Ilya Dryomov wrote:
> On Tue, Jun 16, 2015 at 2:57 PM, juncheng bai
> <baijuncheng@unitedstack.com> wrote:
>>
>>
>> On 2015/6/16 16:37, Ilya Dryomov wrote:
>>>
>>> On Tue, Jun 16, 2015 at 6:28 AM, juncheng bai
>>> <baijuncheng@unitedstack.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2015/6/15 22:27, Ilya Dryomov wrote:
>>>>>
>>>>>
>>>>> On Mon, Jun 15, 2015 at 4:23 PM, juncheng bai
>>>>> <baijuncheng@unitedstack.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2015/6/15 21:03, Ilya Dryomov wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jun 15, 2015 at 2:18 PM, juncheng bai
>>>>>>> <baijuncheng@unitedstack.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> From 6213215bd19926d1063d4e01a248107dab8a899b Mon Sep 17 00:00:00 2001
>>>>>>>> From: juncheng bai <baijuncheng@unitedstack.com>
>>>>>>>> Date: Mon, 15 Jun 2015 18:34:00 +0800
>>>>>>>> Subject: [PATCH] storage:rbd: make the size of request equal to the
>>>>>>>>  size of the object
>>>>>>>>
>>>>>>>> This ensures that merged requests can reach the size of the
>>>>>>>> object. When merging a bio into a request, or a request into
>>>>>>>> another request, the sum of the segment count of the current
>>>>>>>> request and the segment count of the bio must not exceed the
>>>>>>>> queue's max segments, so the maximum request size is 512k when
>>>>>>>> max segments is BLK_MAX_SEGMENTS.
>>>>>>>>
>>>>>>>> Signed-off-by: juncheng bai <baijuncheng@unitedstack.com>
>>>>>>>> ---
>>>>>>>>      drivers/block/rbd.c | 2 ++
>>>>>>>>      1 file changed, 2 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>>>>>>>> index 0a54c58..dec6045 100644
>>>>>>>> --- a/drivers/block/rbd.c
>>>>>>>> +++ b/drivers/block/rbd.c
>>>>>>>> @@ -3757,6 +3757,8 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
>>>>>>>>             segment_size = rbd_obj_bytes(&rbd_dev->header);
>>>>>>>>             blk_queue_max_hw_sectors(q, segment_size / SECTOR_SIZE);
>>>>>>>>             blk_queue_max_segment_size(q, segment_size);
>>>>>>>> +       if (segment_size > BLK_MAX_SEGMENTS * PAGE_SIZE)
>>>>>>>> +               blk_queue_max_segments(q, segment_size / PAGE_SIZE);
>>>>>>>>             blk_queue_io_min(q, segment_size);
>>>>>>>>             blk_queue_io_opt(q, segment_size);
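(For reference, a worked example of the arithmetic in the commit message
above, assuming 4 KiB pages:)

  /*
   * Without the patch, a request is capped at
   *   BLK_MAX_SEGMENTS * PAGE_SIZE = 128 * 4096 = 512 KiB.
   * With the default 4 MiB RBD object, the patch raises max_segments to
   *   segment_size / PAGE_SIZE = 4194304 / 4096 = 1024,
   * so a single request can span a whole object.
   */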
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I made a similar patch on Friday, investigating a blk-mq plugging
>>>>>>> issue reported by Nick.  My patch sets it to BIO_MAX_PAGES
>>>>>>> unconditionally - AFAIU there is no point in setting it to anything
>>>>>>> bigger since the bios will be clipped to that number of vecs.  Given
>>>>>>> that BIO_MAX_PAGES is 256, this gives us 1M direct I/Os.
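(A minimal sketch of the alternative Ilya describes, assuming the same
rbd_init_disk() context as the patch above:)

  blk_queue_max_segments(q, BIO_MAX_PAGES);  /* 256 vecs * 4 KiB = 1 MiB */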
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi. For a single bio, the max number of bio_vecs is BIO_MAX_PAGES,
>>>>>> but a request can be merged from multiple bios; see the functions
>>>>>> ll_back_merge_fn, ll_front_merge_fn, etc.
>>>>>> I tested this patch on kernel 3.18 and did:
>>>>>> echo 4096 > /sys/block/rbd0/queue/max_sectors_kb
>>>>>> Using systemtap to trace the request size, it goes up to 4M.
>>>>>
>>>>>
>>>>>
>>>>> Kernel 3.18 predates the rbd blk-mq transition, which happened in
>>>>> 4.0.  You should test whatever patches you have with at least 4.0.
>>>>>
>>>>> Putting that aside, I must be missing something.  You'll get 4M
>>>>> requests on 3.18 both with your patch and without it, the only
>>>>> difference would be the size of bios being merged - 512k vs 1M.  Can
>>>>> you describe your test workload and provide before and after traces?
>>>>>
>>>> Hi. I updated the kernel to version 4.0.5. The test information is
>>>> shown below:
>>>> The base information:
>>>> 03:28:13-root@server-186:~$uname -r
>>>> 4.0.5
>>>>
>>>> My simple systemtap script:
>>>> probe module("rbd").function("rbd_img_request_create")
>>>> {
>>>>       printf("offset:%lu length:%lu\n", ulong_arg(2), ulong_arg(3));
>>>> }
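(Here ulong_arg(2) and ulong_arg(3) are the offset and length arguments
of rbd_img_request_create(), so each printed line should correspond to
one block-layer request reaching rbd.)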
>>>>
>>>> I use dd to execute the test case:
>>>> dd if=/dev/zero  of=/dev/rbd0 bs=4M count=1 oflag=direct
>>>>
>>>> Case one: Without patch
>>>> 03:30:23-root@server-186:~$cat /sys/block/rbd0/queue/max_sectors_kb
>>>> 4096
>>>> 03:30:35-root@server-186:~$cat /sys/block/rbd0/queue/max_segments
>>>> 128
>>>>
>>>> The output of systemtap for normal data:
>>>> offset:0 length:524288
>>>> offset:524288 length:524288
>>>> offset:1048576 length:524288
>>>> offset:1572864 length:524288
>>>> offset:2097152 length:524288
>>>> offset:2621440 length:524288
>>>> offset:3145728 length:524288
>>>> offset:3670016 length:524288
>>>>
>>>> Case two: With patch
>>>> cat /sys/block/rbd0/queue/max_sectors_kb
>>>> 4096
>>>> 03:49:14-root@server-186:linux-4.0.5$cat /sys/block/rbd0/queue/max_segments
>>>> 1024
>>>> The output of systemtap for normal data:
>>>> offset:0 length:1048576
>>>> offset:1048576 length:1048576
>>>> offset:2097152 length:1048576
>>>> offset:3145728 length:1048576
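(Note that 1048576 bytes is exactly BIO_MAX_PAGES * PAGE_SIZE = 256 *
4096, matching the bio clipping Ilya described: without request merging,
the I/O stays at 1 MiB instead of reaching the 4 MiB object size.)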
>>>>
>>>> According to the test, you are right,
>>>> because blk-mq doesn't use any scheduling policy:
>>>> 03:52:13-root@server-186:linux-4.0.5$cat /sys/block/rbd0/queue/scheduler
>>>> none
>>>>
>>>> Before kernel 4.0, rbd used the default scheduler: cfq.
>>>>
>>>> So, I think that blk-mq needs to do more?
>>>
>>>
>>> There is no scheduler support in blk-mq as of now but your numbers
>>> don't have anything to do with that.  The current behaviour is a result
>>> of a bug in blk-mq.  It's fixed by [1], if you apply it you should see
>>> 4M requests with your stap script.
>>>
>>> [1] http://article.gmane.org/gmane.linux.kernel/1941750
>>>
>> Hi.
>> First, let's look at the result on kernel 3.18.
>> The function blk_limits_max_hw_sectors is implemented differently in
>> 3.18 than in 4.0+, so we need to do:
>> echo 4094 >/sys/block/rbd0/queue/max_sectors_kb
>>
>> The rbd device information:
>> 11:13:18-root@server-186:~$cat /sys/block/rbd0/queue/max_sectors_kb
>> 4094
>> 11:15:28-root@server-186:~$cat /sys/block/rbd0/queue/max_segments
>> 1024
>>
>> The test command:
>> dd if=/dev/zero of=/dev/rbd0 bs=4M count=1
>>
>> The simple stap script:
>> probe module("rbd").function("rbd_img_request_create")
>> {
>>      printf("offset:%lu length:%lu\n", ulong_arg(2), ulong_arg(3));
>> }
>>
>> The output from stap:
>> offset:0 length:4190208
>> offset:21474770944 length:4096
>>
>> Second, thanks for your patch [1].
>> I applied patch [1] and recompiled the kernel.
>> The test information is shown below:
>> 12:26:12-root@server-186:$cat /sys/block/rbd0/queue/max_segments
>> 1024
>> 12:26:23-root@server-186:$cat /sys/block/rbd0/queue/max_sectors_kb
>> 4096
>>
>> The test command:
>> dd if=/dev/zero  of=/dev/rbd0 bs=4M count=2 oflag=direct
>>
>> The simple systemtap script:
>> probe module("rbd").function("rbd_img_request_create")
>> {
>>      printf("offset:%lu length:%lu\n", ulong_arg(2), ulong_arg(3));
>> }
>>
>> The output of systemtap for normal data:
>> offset:0 length:4194304
>> offset:4194304 length:4194304
>> offset:21474770944 length:4096
>
> Sorry, I fail to see the purpose of the above tests.  The test commands
> differ, the kernels differ and it looks like you had your patch applied
> for both tests.  What I'm trying to get you to do is to show me some
> data that will back your claim (which your patch is based on):
>
>>
>> So, I think that the max_segments of the queue limits should be the
>> object size divided by PAGE_SIZE.
>
> For that you need to use the same kernel and run the same workload.
> The only difference should be whether your patch is applied or not.
> I still think that setting rbd max_segments to anything above
> BIO_MAX_PAGES is bogus, but I'd be happy to be shown wrong on that
> since that would mean better performance, at least in some
> workloads.
>
Hi.
For a cloned image, a copyup can be avoided when the request size is
equal to the object size; I think that is the key effect of this
patch.
On the other hand, a big request could lead to timeouts if the Ceph
backend is busy or the network bandwidth is too low, so I suggest
adding a module parameter so that the value can be controlled by user
settings, as sketched below.
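(A hypothetical sketch of such a knob; the parameter name, type, and
default are my assumptions, not an existing rbd option:)

  /* 0 = derive from the object size, as the patch above does */
  static unsigned int rbd_max_segments;
  module_param(rbd_max_segments, uint, 0444);
  MODULE_PARM_DESC(rbd_max_segments,
                   "max segments per request (0 = object size / PAGE_SIZE)");

  /* in rbd_init_disk(): */
  if (rbd_max_segments)
          blk_queue_max_segments(q, rbd_max_segments);
  else if (segment_size > BLK_MAX_SEGMENTS * PAGE_SIZE)
          blk_queue_max_segments(q, segment_size / PAGE_SIZE);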

Thanks.
----
juncheng bai

> Thanks,
>
>                  Ilya
>
