From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00 shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 3ACC91F953; Fri, 26 Nov 2021 18:34:01 +0000 (UTC) Date: Fri, 26 Nov 2021 18:34:01 +0000 From: Eric Wong To: KJ Tsanaktsidis Cc: raindrops-public@yhbt.net Subject: Re: [PATCH v2] Allow Raindrops objects to be backed by a file Message-ID: <20211126183400.GA9579@dcvr> References: <20211125065618.3432-1-ktsanaktsidis@zendesk.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20211125065618.3432-1-ktsanaktsidis@zendesk.com> List-Id: KJ Tsanaktsidis wrote: > Thanks for your feedback on my previous patch Eric, I think backing > Raindrops by an arbitrary file (which can actually be on tmpfs) is much > neater than harcdoding some stuff in there for memfd. > > Looking forward to hearing your thoughts on this one! Thanks. A few minor things, more inline, but it looks good and I'm inclined to squash the changes in the --range-diff (below). > --------------------------------------- > > Currently, all memory used by Raindrops is mapped as MAP_ANONYMOUS. This Note that applies to other projects, too: instead of a solid "-------" line, using a "----8<----" line as below (without indentation) for --scissors: ----8<---- Subject: [PATCH] Allow Raindrops objects to be backed by a file The above allows maintainers to use `git am --scissors' to apply a patch while ignoring any text above the "----8<----" line. > Note theat the provided IO object _must_ implement #truncate; this is s/theat/that/ (I can squash this in on my end) > On Linux, counter values can easily be shared between processes > in-memory only without touching the disk by passing in a File on a > tmpfs as the io object. AFAIK this applies to any OS with mmap support, and raindrops already requires mmap. I'll remove the "On Linux," part before pushing. > - * Raindrops.new(size) -> raindrops object > - * > - * Initializes a Raindrops object to hold +size+ counters. +size+ is > - * only a hint and the actual number of counters the object has is > - * dependent on the CPU model, number of cores, and page size of > - * the machine. The actual size of the object will always be equal > - * or greater than the specified +size+. > + * This is the _acutal_ implementation of #initialize - the Ruby wrapper s/acutal/actual/ > + * handles keyword-argument handling then calls this method. > */ > -static VALUE init(VALUE self, VALUE size) > +static VALUE init_cimpl(VALUE self, VALUE size, VALUE io, VALUE zero) > { > struct raindrops *r = DATA_PTR(self); > int tries = 1; > @@ -113,9 +116,18 @@ static VALUE init(VALUE self, VALUE size) > r->capa = tmp / raindrop_size; > assert(PAGE_ALIGN(raindrop_size * r->capa) == tmp && "not aligned"); > > + r->io = io; > + > retry: > - r->drops = mmap(NULL, tmp, > - PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0); > + if (RTEST(r->io)) { > + rb_funcall(r->io, rb_intern("truncate"), 1, SIZET2NUM(tmp)); > + int fd = NUM2INT(rb_funcall(r->io, rb_intern("fileno"), 0)); I'll swap the lines so "int fd = ..." is first in the block to avoid declaration-after-statement problems with older compilers. > + r->drops = mmap(NULL, tmp, > + PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); > + } else { > + r->drops = mmap(NULL, tmp, > + PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0); And I'll wrap this at 80-columns wrap (I need giant fonts). > @@ -127,6 +139,10 @@ retry: > } > r->pid = getpid(); > > + if (RTEST(zero)) { > + memset(r->drops, 0, tmp); > + } No need for {} to waste vertical space. Again, I need giant fonts :x Omitting unnecessary {} also matches Linux kernel and git.git coding style. > - * Duplicates and snapshots the current state of a Raindrops object. > + * Duplicates and snapshots the current state of a Raindrops object. Even > + * if the given Raindrops object is backed by a file, the copy will be backed > + * by independent, anonymously mapped memory. > */ > static VALUE init_copy(VALUE dest, VALUE source) > { > struct raindrops *dst = DATA_PTR(dest); > struct raindrops *src = get(source); > > - init(dest, SIZET2NUM(src->size)); > + init_cimpl(dest, SIZET2NUM(src->size), Qnil, Qfalse); > memcpy(dst->drops, src->drops, raindrop_size * src->size); Switching from file-backed to anonymous seems strange to me, here, but I'm not sure of a better solution that's compatible with Object#dup ... I also don't know if anybody uses Raindrops#dup, either. So if you're satisfied with this behavior, that's good enough for me :> > + * to_io -> IO > + * > + * Returns the IO object backing the memory for this raindrop, if > + * one was specified when constructing this Raindrop. If this > + * Raindrop is backed by anonymous memory, this method returns nil. > + */ > +static VALUE to_io(VALUE self) > +{ > + struct raindrops *r = get(self); > + return r->io; > +} OK. There may be a need to call "to_io" on r->io itself (because Tempfile delegates, but isn't actually an IO object itself). I'm not sure, as examples found in the Ruby stdlib seem to contradict each other: * Zlib::GzipFile returns the ->io as-is: https://public-inbox.org/ruby-core/be5f148bcd9f/s/?b=ext/zlib/zlib.c#n3233 * csv calls #to_io: https://public-inbox.org/ruby-core/42e99435cbf2/s/?b=lib/csv.rb#n2102 I'm not sure which is correct, here, actually :x I'm inclined to just leave it as-is, for now, following zlib's example. Any future changes here can be done separately. Everything else looked good. Below is the range-diff of the minor things I'll squash in before pushing (probably in a few days). No need to resend. Also, was there anything else you had planned? (e.g. for memfd) Otherwise, I'm thinking of tagging and releasing a new version in a week or so (at least before Ruby 3.1 on Dec 25) Thanks again. Range-diff: 1: 79be82a ! 1: 18ee38a Allow Raindrops objects to be backed by a file @@ Commit message either by mmap'ing the file itself (or using Raindrops to do so), or by making ordinary seek()/read() calls if performance is not a concern. - Note theat the provided IO object _must_ implement #truncate; this is + Note that the provided IO object _must_ implement #truncate; this is used to set the size of the file to be right-sized for the memory mapping that is created. @@ Commit message attempt to msync the values, so they are not durable to e.g. system crashes). - On Linux, counter values can easily be shared between processes + Counter values can easily be shared between processes in-memory only without touching the disk by passing in a File on a tmpfs as the io object. @@ ext/raindrops/raindrops.c: static struct raindrops *get(VALUE self) - * dependent on the CPU model, number of cores, and page size of - * the machine. The actual size of the object will always be equal - * or greater than the specified +size+. -+ * This is the _acutal_ implementation of #initialize - the Ruby wrapper ++ * This is the _actual_ implementation of #initialize - the Ruby wrapper + * handles keyword-argument handling then calls this method. */ -static VALUE init(VALUE self, VALUE size) @@ ext/raindrops/raindrops.c: static VALUE init(VALUE self, VALUE size) - r->drops = mmap(NULL, tmp, - PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0); + if (RTEST(r->io)) { -+ rb_funcall(r->io, rb_intern("truncate"), 1, SIZET2NUM(tmp)); + int fd = NUM2INT(rb_funcall(r->io, rb_intern("fileno"), 0)); ++ rb_funcall(r->io, rb_intern("truncate"), 1, SIZET2NUM(tmp)); + r->drops = mmap(NULL, tmp, + PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); + } else { + r->drops = mmap(NULL, tmp, -+ PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0); ++ PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, ++ -1, 0); + } if (r->drops == MAP_FAILED) { int err = errno; @@ ext/raindrops/raindrops.c: retry: } r->pid = getpid(); -+ if (RTEST(zero)) { ++ if (RTEST(zero)) + memset(r->drops, 0, tmp); -+ } + return self; }