From: Haomai Wang <haomaiwang@gmail.com>
To: Gregory Farnum <gfarnum@redhat.com>
Cc: Sage Weil <sweil@redhat.com>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: [NewStore]About PGLog Workload With RocksDB
Date: Tue, 8 Sep 2015 22:18:18 +0800
Message-ID: <CACJqLyb2fTDTev5_gd33ZxSR9hYYZ7QsSxORaHTu31X8FW8P0Q@mail.gmail.com>
In-Reply-To: <CAJ4mKGZFoScqE3t-fawrkCEPZe2t_BmbbV+tpUF3HBkMUF+8=w@mail.gmail.com>

On Tue, Sep 8, 2015 at 10:12 PM, Gregory Farnum <gfarnum@redhat.com> wrote:
> On Tue, Sep 8, 2015 at 3:06 PM, Haomai Wang <haomaiwang@gmail.com> wrote:
>> Hit "Send" by accident for previous mail. :-(
>>
>> Some points about pglog:
>> 1. short-lived but written at high frequency
>
> Is this really true? The default length of the log is 1000 entries,
> and most OSDs have ~100 PGs, so on a hard drive running at 80
> writes/second that's about 100000 seconds (~27 hours) before we delete

I had SSDs in mind... Yep, on HDDs pglog entries are not short-lived.

The main point, I think, is that pglog, journal data, and omap keys are
three different types of data.

> an entry. In reality most deployments aren't writing that
> quickly....and if something goes wrong with the PG we increase to
> 10000 log entries!
> -Greg
>
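
The retention estimate quoted above can be worked through as a small calculation. This is a rough sketch that assumes each write appends exactly one entry to one PG's log and that writes are spread evenly across PGs; neither assumption comes from the thread. Under those assumptions, 1000 entries x 100 PGs is 100000 entries held in total, which at 80 writes/second turns over in about 1250 seconds (roughly 21 minutes); a figure of 100000 seconds (~27 hours) would correspond to about one write per second.

```python
# Back-of-the-envelope pglog retention, using the figures quoted above.
# Assumptions (not from the thread): each write appends exactly one entry
# to one PG's log, and writes are spread evenly across PGs.

def retention_seconds(log_entries_per_pg: int, pgs_per_osd: int,
                      writes_per_second: float) -> float:
    """Seconds until a newly written entry is trimmed from its PG's log."""
    total_entries = log_entries_per_pg * pgs_per_osd
    return total_entries / writes_per_second

default = retention_seconds(1000, 100, 80)     # default log length
extended = retention_seconds(10000, 100, 80)   # longer log after PG trouble

print(f"default:  {default:.0f} s (~{default / 60:.0f} min)")
print(f"extended: {extended:.0f} s (~{extended / 3600:.1f} h)")
```

Either way, the lifetime depends on the write rate: a fast SSD shortens it further, while a lightly loaded HDD stretches it.
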
>> 2. small, with total size proportional to the number of PGs
>> 3. a typical sequential read/write workload
>> 4. doesn't need a rich structure like an LSM tree or B-tree to support
>> its APIs, and is obviously different from user-side/other omap keys
>> 5. a simple loopback implementation would be efficient
>>
>>
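
Point 5 could be sketched as a fixed-capacity log that drops its oldest entry when full. Reading "loopback" as a circular (ring) log is an assumption, and `RingLog`, `append`, `tail`, and `entries_since` are hypothetical names for illustration, not Ceph APIs. The sketch is in-memory only and ignores persistence.

```python
from collections import deque
from typing import Deque, Iterator, Tuple


class RingLog:
    """Hypothetical fixed-capacity, append-only log for one PG.

    Not Ceph code: it only illustrates a log whose oldest entry is
    dropped automatically once the capacity is reached.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._entries: Deque[Tuple[int, bytes]] = deque(maxlen=capacity)
        self._next_version = 1

    def append(self, payload: bytes) -> int:
        """Append one entry and return its version number."""
        version = self._next_version
        self._entries.append((version, payload))  # evicts the oldest if full
        self._next_version += 1
        return version

    def tail(self) -> int:
        """Version of the oldest entry still held, or 0 if empty."""
        return self._entries[0][0] if self._entries else 0

    def entries_since(self, version: int) -> Iterator[Tuple[int, bytes]]:
        """Yield entries newer than `version`, oldest first."""
        return (e for e in self._entries if e[0] > version)


log = RingLog(capacity=3)
for op in (b"write a", b"write b", b"write c", b"write d"):
    log.append(op)

print(log.tail())                   # oldest retained version
print(list(log.entries_since(2)))   # entries newer than version 2
```

Appends and trims touch only the two ends of the log, and reads are a sequential scan, so no ordered index such as an LSM tree or B-tree is needed for this access pattern.
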
>> On Tue, Sep 8, 2015 at 9:58 PM, Haomai Wang <haomaiwang@gmail.com> wrote:
>>> Hi Sage,
>>>
>>> I noticed your post on the RocksDB page about making RocksDB aware
>>> of short-lived key/value pairs.
>>>
>>> I think it would be great if one key/value DB implementation could
>>> support different key types with different storage behaviors. But
>>> adding this feature to an existing DB looks difficult to me.
>>>
>>> So, drawing on my experience with FileStore, I think making
>>> NewStore/FileStore aware of these short-lived keys (or just PGLog
>>> keys) could be easy and effective. The PGLog is owned by the PG and
>>> maintains the history of ops. It's like journal data, but each entry
>>> is only several hundred bytes. We need only several hundred MB at
>>> most to store the pglogs of all PGs. For FileStore, FileJournal
>>> already holds a copy of the PGLog; I have long thought about
>>> dropping the second copy in LevelDB to cut LevelDB calls, which
>>> consume lots of CPU cycles. But making FileJournal aware of pglog
>>> entries would take a lot of work. NewStore doesn't use FileJournal,
>>> so it should be easier to implement my idea there(?).
>>>
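
The "several hundred MB at most" estimate can be sanity-checked with a rough calculation. This is only a sketch: the 300-byte entry size is an assumed stand-in for "several hundred bytes", and the ~100 PGs per OSD and the 1000/10000 log lengths are the figures quoted from Greg earlier in this mail. With those inputs the footprint per OSD comes to about 30 MB at the default log length and about 300 MB with 10000-entry logs.

```python
# Rough pglog footprint per OSD.
# entry_bytes = 300 is an assumed stand-in for "several hundred bytes";
# the PG count and log lengths are the figures quoted in this mail.

def pglog_bytes(pgs: int, entries_per_pg: int, entry_bytes: int) -> int:
    """Total bytes needed to hold every PG's log on one OSD."""
    return pgs * entries_per_pg * entry_bytes

MB = 1000 * 1000
normal = pglog_bytes(100, 1000, 300)      # default log length
extended = pglog_bytes(100, 10000, 300)   # longer log after PG trouble

print(f"{normal / MB:.0f} MB, {extended / MB:.0f} MB")
```
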
>>> Actually, I think that in the current ObjectStore implementation the
>>> omap key/value pairs written by a RADOS write op hurt performance
>>> hugely. Lots of CPU cycles are consumed on short-lived keys (pglog),
>>> so this should be an obvious optimization point. On the other hand,
>>> pglog is plain and doesn't need rich key/value API support. Maybe a
>>> lightweight FileJournal-style log for pglog keys is also worth
>>> trying.
>>>
>>> In short, I think implementing a pglog-optimized structure to store
>>> these keys would be cleaner and easier than improving RocksDB.
>>>
>>> PS (off topic): a key/value DB benchmark:
>>> http://sphia.org/benchmarks.html
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>>
>>> Wheat



-- 
Best Regards,

Wheat
