raindrops RubyGem user+dev discussion/patches/pulls/bugs/help
 help / color / mirror / code / Atom feed
From: Eric Wong <e@80x24.org>
To: KJ Tsanaktsidis <ktsanaktsidis@zendesk.com>
Cc: raindrops-public@yhbt.net
Subject: Re: [PATCH v2] Allow Raindrops objects to be backed by a file
Date: Fri, 26 Nov 2021 18:34:01 +0000	[thread overview]
Message-ID: <20211126183400.GA9579@dcvr> (raw)
In-Reply-To: <20211125065618.3432-1-ktsanaktsidis@zendesk.com>

KJ Tsanaktsidis <ktsanaktsidis@zendesk.com> wrote:
> Thanks for your feedback on my previous patch Eric, I think backing
> Raindrops by an arbitrary file (which can actually be on tmpfs) is much
> neater than harcdoding some stuff in there for memfd.
> 
> Looking forward to hearing your thoughts on this one!

Thanks.  A few minor things, more inline, but it looks good
and I'm inclined to squash the changes in the --range-diff
(below).

> ---------------------------------------
> 
> Currently, all memory used by Raindrops is mapped as MAP_ANONYMOUS. This

<snip>

Note that applies to other projects, too:  instead of a solid
"-------" line, using a "----8<----" line as below
(without indentation) for --scissors:

	<prose not intended for a git commit message>

	----8<----
	Subject: [PATCH] Allow Raindrops objects to be backed by a file

	<rest of the commit message + patch...>

The above allows maintainers to use `git am --scissors' to apply
a patch while ignoring any text above the "----8<----" line.

<snip>

> Note theat the provided IO object _must_ implement #truncate; this is

s/theat/that/  (I can squash this in on my end)

<snip>

> On Linux, counter values can easily be shared between processes
> in-memory only without touching the disk by passing in a File on a
> tmpfs as the io object.

AFAIK this applies to any OS with mmap support, and raindrops
already requires mmap.  I'll remove the "On Linux," part before
pushing.

<snip>

> - *	Raindrops.new(size)	-> raindrops object
> - *
> - * Initializes a Raindrops object to hold +size+ counters.  +size+ is
> - * only a hint and the actual number of counters the object has is
> - * dependent on the CPU model, number of cores, and page size of
> - * the machine.  The actual size of the object will always be equal
> - * or greater than the specified +size+.
> + * This is the _acutal_ implementation of #initialize - the Ruby wrapper

s/acutal/actual/

> + * handles keyword-argument handling then calls this method.
>   */
> -static VALUE init(VALUE self, VALUE size)
> +static VALUE init_cimpl(VALUE self, VALUE size, VALUE io, VALUE zero)
>  {
>  	struct raindrops *r = DATA_PTR(self);
>  	int tries = 1;
> @@ -113,9 +116,18 @@ static VALUE init(VALUE self, VALUE size)
>  	r->capa = tmp / raindrop_size;
>  	assert(PAGE_ALIGN(raindrop_size * r->capa) == tmp && "not aligned");
>  
> +	r->io = io;
> +
>  retry:
> -	r->drops = mmap(NULL, tmp,
> -	                PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0);
> +	if (RTEST(r->io)) {
> +		rb_funcall(r->io, rb_intern("truncate"), 1, SIZET2NUM(tmp));
> +		int fd = NUM2INT(rb_funcall(r->io, rb_intern("fileno"), 0));

I'll swap the lines so "int fd = ..." is first in the block
to avoid declaration-after-statement problems with older compilers.

> +		r->drops = mmap(NULL, tmp,
> +				PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> +	} else {
> +		r->drops = mmap(NULL, tmp,
> +				PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0);

And I'll wrap this at 80-columns wrap (I need giant fonts).

> @@ -127,6 +139,10 @@ retry:
>  	}
>  	r->pid = getpid();
>  
> +	if (RTEST(zero)) {
> +		memset(r->drops, 0, tmp);
> +	}

No need for {} to waste vertical space.  Again, I need giant
fonts :x   Omitting unnecessary {} also matches Linux kernel
and git.git coding style.

> - * Duplicates and snapshots the current state of a Raindrops object.
> + * Duplicates and snapshots the current state of a Raindrops object. Even
> + * if the given Raindrops object is backed by a file, the copy will be backed
> + * by independent, anonymously mapped memory.
>   */
>  static VALUE init_copy(VALUE dest, VALUE source)
>  {
>  	struct raindrops *dst = DATA_PTR(dest);
>  	struct raindrops *src = get(source);
>  
> -	init(dest, SIZET2NUM(src->size));
> +	init_cimpl(dest, SIZET2NUM(src->size), Qnil, Qfalse);
>  	memcpy(dst->drops, src->drops, raindrop_size * src->size);

Switching from file-backed to anonymous seems strange to me,
here, but I'm not sure of a better solution that's compatible
with Object#dup ...  I also don't know if anybody uses
Raindrops#dup, either.  So if you're satisfied with this
behavior, that's good enough for me :>

> + * 	to_io	-> IO
> + *
> + * Returns the IO object backing the memory for this raindrop, if
> + * one was specified when constructing this Raindrop. If this
> + * Raindrop is backed by anonymous memory, this method returns nil.
> + */
> +static VALUE to_io(VALUE self)
> +{
> +	struct raindrops *r = get(self);
> +	return r->io;
> +}

OK.  There may be a need to call "to_io" on r->io itself
(because Tempfile delegates, but isn't actually an IO object
itself).  I'm not sure, as examples found in the Ruby stdlib
seem to contradict each other:

* Zlib::GzipFile returns the ->io as-is:
  https://public-inbox.org/ruby-core/be5f148bcd9f/s/?b=ext/zlib/zlib.c#n3233

* csv calls #to_io:
  https://public-inbox.org/ruby-core/42e99435cbf2/s/?b=lib/csv.rb#n2102

I'm not sure which is correct, here, actually :x  I'm inclined to
just leave it as-is, for now, following zlib's example.  Any
future changes here can be done separately.

Everything else looked good.  Below is the range-diff of the
minor things I'll squash in before pushing (probably in a few
days).  No need to resend.

Also, was there anything else you had planned? (e.g. for memfd)
Otherwise, I'm thinking of tagging and releasing a new version
in a week or so (at least before Ruby 3.1 on Dec 25)

Thanks again.

Range-diff:
1:  79be82a ! 1:  18ee38a Allow Raindrops objects to be backed by a file
    @@ Commit message
         either by mmap'ing the file itself (or using Raindrops to do so), or by
         making ordinary seek()/read() calls if performance is not a concern.
     
    -    Note theat the provided IO object _must_ implement #truncate; this is
    +    Note that the provided IO object _must_ implement #truncate; this is
         used to set the size of the file to be right-sized for the memory
         mapping that is created.
     
    @@ Commit message
         attempt to msync the values, so they are not durable to e.g. system
         crashes).
     
    -    On Linux, counter values can easily be shared between processes
    +    Counter values can easily be shared between processes
         in-memory only without touching the disk by passing in a File on a
         tmpfs as the io object.
     
    @@ ext/raindrops/raindrops.c: static struct raindrops *get(VALUE self)
     - * dependent on the CPU model, number of cores, and page size of
     - * the machine.  The actual size of the object will always be equal
     - * or greater than the specified +size+.
    -+ * This is the _acutal_ implementation of #initialize - the Ruby wrapper
    ++ * This is the _actual_ implementation of #initialize - the Ruby wrapper
     + * handles keyword-argument handling then calls this method.
       */
     -static VALUE init(VALUE self, VALUE size)
    @@ ext/raindrops/raindrops.c: static VALUE init(VALUE self, VALUE size)
     -	r->drops = mmap(NULL, tmp,
     -	                PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0);
     +	if (RTEST(r->io)) {
    -+		rb_funcall(r->io, rb_intern("truncate"), 1, SIZET2NUM(tmp));
     +		int fd = NUM2INT(rb_funcall(r->io, rb_intern("fileno"), 0));
    ++		rb_funcall(r->io, rb_intern("truncate"), 1, SIZET2NUM(tmp));
     +		r->drops = mmap(NULL, tmp,
     +				PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
     +	} else {
     +		r->drops = mmap(NULL, tmp,
    -+				PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0);
    ++				PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED,
    ++				-1, 0);
     +	}
      	if (r->drops == MAP_FAILED) {
      		int err = errno;
    @@ ext/raindrops/raindrops.c: retry:
      	}
      	r->pid = getpid();
      
    -+	if (RTEST(zero)) {
    ++	if (RTEST(zero))
     +		memset(r->drops, 0, tmp);
    -+	}
     +
      	return self;
      }

  reply	other threads:[~2021-11-26 18:34 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-25  6:56 [PATCH v2] Allow Raindrops objects to be backed by a file KJ Tsanaktsidis
2021-11-26 18:34 ` Eric Wong [this message]
2021-11-29  0:13   ` KJ Tsanaktsidis
2021-11-30  6:49     ` Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://yhbt.net/raindrops/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211126183400.GA9579@dcvr \
    --to=e@80x24.org \
    --cc=ktsanaktsidis@zendesk.com \
    --cc=raindrops-public@yhbt.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://yhbt.net/raindrops.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).