raindrops RubyGem user+dev discussion/patches/pulls/bugs/help
 help / color / mirror / code / Atom feed
* [PATCH] Allow Raindrops objects to be backed by a memfd file
@ 2021-11-22  1:03 KJ Tsanaktsidis
  2021-11-22  8:42 ` Eric Wong
  0 siblings, 1 reply; 4+ messages in thread
From: KJ Tsanaktsidis @ 2021-11-22  1:03 UTC (permalink / raw)
  To: raindrops-public; +Cc: KJ Tsanaktsidis

Currently, all memory used by Raindrops is mapped as MAP_ANONYMOUS. This
means that although Raindrops counters can be shared between processes
that have forked from each other, it is not possible to share the
counter values with another, unrelated process.

This patch adds support for backing the Raindrops mapping with a file
descriptor created from memfd_create. The API for doing this is simply:

    Raindrops.new(size, name: "name_of_raindrop")

This will cause Raindrops to call memfd_create("name_of_raindrop") and
use that file descriptor to back the mapping. An unrelated process can
then obtain a copy of this file descriptor (via a Unix domain socket, or
even just by looking for the name in /proc/$pid/fd) and read out the
counter values.

My use-case for this feature is that I want to collect memory statistics
of a Unicorn master process in a way that does not itself cause any
allocations. This is both because that would bias the measurement, but
more importantly because we very tightly control when the GC runs in our
Unicorn masters and any garbage created by high-frequency polling of
GC.stat would potentially live for a long time.

With this solution, we can simply store values from rb_objspace into a
Raindrops counter directly, and read the values out-of-process for
submission to our metrics collection system.
---
 ext/raindrops/extconf.rb  |  1 +
 ext/raindrops/raindrops.c | 81 ++++++++++++++++++++++++++++++++++++---
 test/test_linux.rb        | 15 ++++++++
 3 files changed, 91 insertions(+), 6 deletions(-)

diff --git a/ext/raindrops/extconf.rb b/ext/raindrops/extconf.rb
index 792e509..1ddcdf2 100644
--- a/ext/raindrops/extconf.rb
+++ b/ext/raindrops/extconf.rb
@@ -7,6 +7,7 @@
 
 $CPPFLAGS += " -D_GNU_SOURCE "
 have_func('mremap', 'sys/mman.h')
+have_func('memfd_create', 'sys/mman.h')
 headers = %w(sys/types.h netdb.h string.h sys/socket.h netinet/in.h)
 if have_header('linux/tcp.h')
   headers << 'linux/tcp.h'
diff --git a/ext/raindrops/raindrops.c b/ext/raindrops/raindrops.c
index 837084c..c9bf8c8 100644
--- a/ext/raindrops/raindrops.c
+++ b/ext/raindrops/raindrops.c
@@ -34,6 +34,7 @@ struct raindrops {
 	size_t size;
 	size_t capa;
 	pid_t pid;
+	int fd;
 	struct raindrop *drops;
 };
 
@@ -47,6 +48,11 @@ static void rd_free(void *ptr)
 		if (rv != 0)
 			rb_bug("munmap failed in gc: %s", strerror(errno));
 	}
+	if (r->fd != -1) {
+	  int rv = close(r->fd);
+	  if (rv != 0)
+	    rb_bug("close failed in gc: %s", strerror(errno));
+	}
 
 	xfree(ptr);
 }
@@ -88,34 +94,74 @@ static struct raindrops *get(VALUE self)
 
 /*
  * call-seq:
- *	Raindrops.new(size)	-> raindrops object
+ *	Raindrops.new(size, name: nil)	-> raindrops object
  *
  * Initializes a Raindrops object to hold +size+ counters.  +size+ is
  * only a hint and the actual number of counters the object has is
  * dependent on the CPU model, number of cores, and page size of
  * the machine.  The actual size of the object will always be equal
  * or greater than the specified +size+.
+ * If +name+ is provided, and the platform is supported, the raindrop
+ * memory region will be backed by a memfd object with the provided
+ * name, so that it can be shared with other, non-child processes.
  */
-static VALUE init(VALUE self, VALUE size)
+static VALUE init(int argc, VALUE *argv, VALUE self)
 {
 	struct raindrops *r = DATA_PTR(self);
 	int tries = 1;
 	size_t tmp;
+  VALUE size;
+  VALUE kwargs_hash;
+  ID kwargs_ids[1] = { rb_intern_const("name") };
+  VALUE kwargs[1];
+  VALUE name;
 
 	if (r->drops != MAP_FAILED)
 		rb_raise(rb_eRuntimeError, "already initialized");
 
+  rb_scan_args(argc, argv, "1:", &size, &kwargs_hash);
+  rb_get_kwargs(kwargs_hash, kwargs_ids, 0, 1, kwargs);
+  name = kwargs[0];
+
 	r->size = NUM2SIZET(size);
 	if (r->size < 1)
 		rb_raise(rb_eArgError, "size must be >= 1");
 
+  if (name != Qundef && name != Qnil) {
+#ifdef HAVE_MEMFD_CREATE
+    r->fd = memfd_create(StringValueCStr(name), MFD_CLOEXEC);
+    if (r->fd == -1) {
+      int err = errno;
+      if (err == ENOSYS) {
+        rb_raise(rb_eRuntimeError, "system does not support memfd_create");
+      } else {
+        rb_raise(rb_eRuntimeError, "error calling memfd_create: %s", strerror(err));
+      }
+    }
+#else
+    rb_raise(rb_eRuntimeError, "extension not compiled with memfd_create");
+#endif
+  } else {
+    r->fd = -1;
+  }
+
 	tmp = PAGE_ALIGN(raindrop_size * r->size);
 	r->capa = tmp / raindrop_size;
 	assert(PAGE_ALIGN(raindrop_size * r->capa) == tmp && "not aligned");
 
 retry:
-	r->drops = mmap(NULL, tmp,
-	                PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0);
+  if (r->fd != -1) {
+    if (ftruncate(r->fd, tmp) == -1) {
+      r->drops = MAP_FAILED;
+    } else {
+    r->drops = mmap(NULL, tmp,
+                    PROT_READ|PROT_WRITE, MAP_SHARED, r->fd, 0);
+    }
+  } else {
+  	r->drops = mmap(NULL, tmp,
+  	                PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0);
+  }
+
 	if (r->drops == MAP_FAILED) {
 		int err = errno;
 
@@ -153,6 +199,10 @@ static void resize(struct raindrops *r, size_t new_rd_size)
 	if (r->pid != getpid())
 		rb_raise(rb_eRuntimeError, "cannot mremap() from child");
 
+	if (r->fd != -1) {
+	  rb_raise(rb_eRuntimeError, "resize not implemented with named raindrops");
+	}
+
 	rv = mremap(old_address, old_size, new_size, MREMAP_MAYMOVE);
 	if (rv == MAP_FAILED) {
 		int err = errno;
@@ -213,6 +263,18 @@ static VALUE capa(VALUE self)
 	return SIZET2NUM(get(self)->capa);
 }
 
+/*
+ * call-seq:
+ *   rd.fd    -> Integer
+ *
+ * Returns the file descriptor number associated with this Raindrop, if
+ * it was created with a name.
+ */
+static VALUE fd(VALUE self)
+{
+  return INT2NUM(get(self)->fd);
+}
+
 /*
  * call-seq:
  *	rd.dup		-> rd_copy
@@ -223,8 +285,9 @@ static VALUE init_copy(VALUE dest, VALUE source)
 {
 	struct raindrops *dst = DATA_PTR(dest);
 	struct raindrops *src = get(source);
+	VALUE init_argv[1] = { SIZET2NUM(src->size) };
 
-	init(dest, SIZET2NUM(src->size));
+	init(1, init_argv, dest);
 	memcpy(dst->drops, src->drops, raindrop_size * src->size);
 
 	return dest;
@@ -372,6 +435,11 @@ static VALUE evaporate_bang(VALUE self)
 	r->drops = MAP_FAILED;
 	if (munmap(addr, raindrop_size * r->capa) != 0)
 		rb_sys_fail("munmap");
+	if (r->fd != -1) {
+	  if (close(r->fd) != 0)
+	    rb_sys_fail("close");
+	  r->fd = -1;
+	}
 	return Qnil;
 }
 
@@ -433,7 +501,7 @@ void Init_raindrops_ext(void)
 
 	rb_define_alloc_func(cRaindrops, alloc);
 
-	rb_define_method(cRaindrops, "initialize", init, 1);
+	rb_define_method(cRaindrops, "initialize", init, -1);
 	rb_define_method(cRaindrops, "incr", incr, -1);
 	rb_define_method(cRaindrops, "decr", decr, -1);
 	rb_define_method(cRaindrops, "to_ary", to_ary, 0);
@@ -444,6 +512,7 @@ void Init_raindrops_ext(void)
 	rb_define_method(cRaindrops, "capa", capa, 0);
 	rb_define_method(cRaindrops, "initialize_copy", init_copy, 1);
 	rb_define_method(cRaindrops, "evaporate!", evaporate_bang, 0);
+	rb_define_method(cRaindrops, "fd", fd, 0);
 
 #ifdef __linux__
 	Init_raindrops_linux_inet_diag();
diff --git a/test/test_linux.rb b/test/test_linux.rb
index 7808469..b9dc757 100644
--- a/test/test_linux.rb
+++ b/test/test_linux.rb
@@ -278,4 +278,19 @@ def test_tcp_stress_test
     statuses = Process.waitall
     statuses.each { |(_,status)| assert status.success?, status.inspect }
   end if ENV["STRESS"].to_i != 0
+
+  def test_memfd
+    rd = Raindrops.new(1, name: "test_memfd_raindrop")
+    assert rd.fd != -1
+
+    rd.incr(0, 5)
+    assert_equal 5, rd[0]
+
+    raw_data = File.read "/proc/self/fd/#{rd.fd}"
+    assert raw_data.size > 8
+    counter_value = raw_data.unpack("Q")[0]
+    assert_equal 5, counter_value
+
+    rd.evaporate!
+  end
 end if RUBY_PLATFORM =~ /linux/
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] Allow Raindrops objects to be backed by a memfd file
  2021-11-22  1:03 [PATCH] Allow Raindrops objects to be backed by a memfd file KJ Tsanaktsidis
@ 2021-11-22  8:42 ` Eric Wong
  2021-11-22 10:13   ` KJ Tsanaktsidis
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Wong @ 2021-11-22  8:42 UTC (permalink / raw)
  To: KJ Tsanaktsidis; +Cc: raindrops-public

KJ Tsanaktsidis <ktsanaktsidis@zendesk.com> wrote:
> My use-case for this feature is that I want to collect memory statistics
> of a Unicorn master process in a way that does not itself cause any
> allocations. This is both because that would bias the measurement, but
> more importantly because we very tightly control when the GC runs in our
> Unicorn masters and any garbage created by high-frequency polling of
> GC.stat would potentially live for a long time.
> 
> With this solution, we can simply store values from rb_objspace into a
> Raindrops counter directly, and read the values out-of-process for
> submission to our metrics collection system.

Thanks for the description and justification.  I will accept
this feature, but some tweaks to the patch will be needed.
If this is too much for you, I'll try to take care of
adjustments (but still give you credit) sometime this week.

See comments inline below...

> diff --git a/ext/raindrops/extconf.rb b/ext/raindrops/extconf.rb
> index 792e509..1ddcdf2 100644
> --- a/ext/raindrops/extconf.rb
> +++ b/ext/raindrops/extconf.rb
> @@ -7,6 +7,7 @@
>  
>  $CPPFLAGS += " -D_GNU_SOURCE "
>  have_func('mremap', 'sys/mman.h')
> +have_func('memfd_create', 'sys/mman.h')
>  headers = %w(sys/types.h netdb.h string.h sys/socket.h netinet/in.h)
>  if have_header('linux/tcp.h')
>    headers << 'linux/tcp.h'
> diff --git a/ext/raindrops/raindrops.c b/ext/raindrops/raindrops.c
> index 837084c..c9bf8c8 100644
> --- a/ext/raindrops/raindrops.c
> +++ b/ext/raindrops/raindrops.c
> @@ -34,6 +34,7 @@ struct raindrops {
>  	size_t size;
>  	size_t capa;
>  	pid_t pid;
> +	int fd;
>  	struct raindrop *drops;
>  };

I suggest using "VALUE io" here.  I think it'll make some
code below easier-to-follow and maintain while offering
some flexibility for portability.

The downside is higher memory use (few hundred for the IO object
if the IO is used, 4 bytes if not on 64-bit).  Adding an rd_mark
callback to call rb_gc_mark will be required, too.

> @@ -47,6 +48,11 @@ static void rd_free(void *ptr)
>  		if (rv != 0)
>  			rb_bug("munmap failed in gc: %s", strerror(errno));
>  	}
> +	if (r->fd != -1) {
> +	  int rv = close(r->fd);
> +	  if (rv != 0)
> +	    rb_bug("close failed in gc: %s", strerror(errno));
> +	}

The using an IO here would eliminate this block.

Regardless, whenever possible and reasonable, favor comparisons
against zero instead of -1 since compilers can generate smaller
code.  Pedantically it's not POSIX in some cases, but glibc does
it, at least.

Also, follow existing indentation conventions (this applies to
any project(*)).  The C code here is mostly formatted the same way
as git.git or the Linux kernel, so it uses hard tabs.

(*) I have deal with indentation styles which annoy me for other
    projects, too.

>  	xfree(ptr);
>  }
> @@ -88,34 +94,74 @@ static struct raindrops *get(VALUE self)
>  
>  /*
>   * call-seq:
> - *	Raindrops.new(size)	-> raindrops object
> + *	Raindrops.new(size, name: nil)	-> raindrops object
>   *
>   * Initializes a Raindrops object to hold +size+ counters.  +size+ is
>   * only a hint and the actual number of counters the object has is
>   * dependent on the CPU model, number of cores, and page size of
>   * the machine.  The actual size of the object will always be equal
>   * or greater than the specified +size+.
> + * If +name+ is provided, and the platform is supported, the raindrop
> + * memory region will be backed by a memfd object with the provided
> + * name, so that it can be shared with other, non-child processes.
>   */
> -static VALUE init(VALUE self, VALUE size)
> +static VALUE init(int argc, VALUE *argv, VALUE self)
>  {
>  	struct raindrops *r = DATA_PTR(self);
>  	int tries = 1;
>  	size_t tmp;
> +  VALUE size;
> +  VALUE kwargs_hash;
> +  ID kwargs_ids[1] = { rb_intern_const("name") };

That may be C99-specific, and I think Ruby itself only went to
C99 in 2.7 or 3.x?  In any case, I don't think it's appropriate
to introduce C99 at this time.

(I like using C99 in other projects, too, but I don't think it's
 appropriate for us, just yet).

> +  VALUE kwargs[1];
> +  VALUE name;
> 
>  	if (r->drops != MAP_FAILED)
>  		rb_raise(rb_eRuntimeError, "already initialized");
>  
> +  rb_scan_args(argc, argv, "1:", &size, &kwargs_hash);
> +  rb_get_kwargs(kwargs_hash, kwargs_ids, 0, 1, kwargs);

rb_get_kwargs is 2.2+ only, isn't it?

I suggest using a Ruby wrapper and rename the C method.

Since you mentioned you were using this for unicorn, that's still
Ruby 1.9.3-compatible and will be 2.0+ for the next release.

I'm OK with bumping raindrops to Ruby 2.0+, for now; but the
requirement should be <= the Ruby version used by its users
(e.g. unicorn).

Fwiw, some other popular gems I keep an eye on (e.g. `sequel',
`mail') tend to be conservative when it comes to Ruby version
compatibility, too.

For portability to non-Linux systems, I wonder if:

	Raindrops.new(size, io: Tempfile.new('foo'))

can also be supported.  I can give this a shot on a FreeBSD VM
I sometimes have access to...

And then maybe drop support for `name:' entirely:

	Raindrops.new(size, io: Raindrops::Memfd.new('foo'))

> +  name = kwargs[0];
> +
>  	r->size = NUM2SIZET(size);
>  	if (r->size < 1)
>  		rb_raise(rb_eArgError, "size must be >= 1");
>  
> +  if (name != Qundef && name != Qnil) {

RTEST() reads more easily, and covers Qfalse.

> +#ifdef HAVE_MEMFD_CREATE
> +    r->fd = memfd_create(StringValueCStr(name), MFD_CLOEXEC);
> +    if (r->fd == -1) {
> +      int err = errno;
> +      if (err == ENOSYS) {
> +        rb_raise(rb_eRuntimeError, "system does not support memfd_create");
> +      } else {
> +        rb_raise(rb_eRuntimeError, "error calling memfd_create: %s", strerror(err));
> +      }

I don't think the special case for ENOSYS is necessary.  And
rb_sys_fail("memfd_create") is sufficient (it matches the
rb_sys_fail(funcname) pattern used in Ruby code.

Something like:

	int fd = memfd_create(StringValueCStr(name), MFD_CLOEXEC);
	if (fd < 0)
		rb_sys_fail("memfd_create");
	rd->io = rb_funcall(klass, rb_intern("for_fd"), 1, INT2NUM(fd));

> +    }
> +#else
> +    rb_raise(rb_eRuntimeError, "extension not compiled with memfd_create");
> +#endif
> +  } else {
> +    r->fd = -1;
> +  }
> +
>  	tmp = PAGE_ALIGN(raindrop_size * r->size);
>  	r->capa = tmp / raindrop_size;
>  	assert(PAGE_ALIGN(raindrop_size * r->capa) == tmp && "not aligned");
>  
>  retry:
> -	r->drops = mmap(NULL, tmp,
> -	                PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0);
> +  if (r->fd != -1) {
> +    if (ftruncate(r->fd, tmp) == -1) {
> +      r->drops = MAP_FAILED;
> +    } else {
> +    r->drops = mmap(NULL, tmp,
> +                    PROT_READ|PROT_WRITE, MAP_SHARED, r->fd, 0);
> +    }
> +  } else {
> +  	r->drops = mmap(NULL, tmp,
> +  	                PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0);
> +  }
> +
>  	if (r->drops == MAP_FAILED) {
>  		int err = errno;
>  
> @@ -153,6 +199,10 @@ static void resize(struct raindrops *r, size_t new_rd_size)
>  	if (r->pid != getpid())
>  		rb_raise(rb_eRuntimeError, "cannot mremap() from child");
>  
> +	if (r->fd != -1) {
> +	  rb_raise(rb_eRuntimeError, "resize not implemented with named raindrops");
> +	}
> +
>  	rv = mremap(old_address, old_size, new_size, MREMAP_MAYMOVE);
>  	if (rv == MAP_FAILED) {
>  		int err = errno;
> @@ -213,6 +263,18 @@ static VALUE capa(VALUE self)
>  	return SIZET2NUM(get(self)->capa);
>  }
>  
> +/*
> + * call-seq:
> + *   rd.fd    -> Integer
> + *
> + * Returns the file descriptor number associated with this Raindrop, if
> + * it was created with a name.
> + */
> +static VALUE fd(VALUE self)
> +{
> +  return INT2NUM(get(self)->fd);
> +}

If we go with "VALUE io" as I suggested above, I suggest a
"to_io" method.  That would allow compatibility with other APIs
(e.g. IO.select), and access to IO#stat, IO#close_on_exec=,
IO#ioctl, IO#fileno, and any number of potentially useful
methods.

I've never used memfd before, but the mmap-ed regions are usable
after a regular file FD is closed.  So an FD-starved process
could use rd.to_io.close and continue using the mmap-ed section
as normal.

Regardless, naming a method "fd" is unnecessary, the
function/method called "fileno" means the same thing across
C stdio.h, Ruby, Perl, and likely other languages.

>  /*
>   * call-seq:
>   *	rd.dup		-> rd_copy
> @@ -223,8 +285,9 @@ static VALUE init_copy(VALUE dest, VALUE source)
>  {
>  	struct raindrops *dst = DATA_PTR(dest);
>  	struct raindrops *src = get(source);
> +	VALUE init_argv[1] = { SIZET2NUM(src->size) };
>  
> -	init(dest, SIZET2NUM(src->size));
> +	init(1, init_argv, dest);
>  	memcpy(dst->drops, src->drops, raindrop_size * src->size);

Same comments as before wrt. C compiler and Ruby compatibility.

>  	return dest;
> @@ -372,6 +435,11 @@ static VALUE evaporate_bang(VALUE self)
>  	r->drops = MAP_FAILED;
>  	if (munmap(addr, raindrop_size * r->capa) != 0)
>  		rb_sys_fail("munmap");
> +	if (r->fd != -1) {
> +	  if (close(r->fd) != 0)
> +	    rb_sys_fail("close");
> +	  r->fd = -1;
> +	}

This could just use rb_io_close.  No need to comment on the
rest... (or I'm too sleepy atm and missed something :x)

Anyways thanks again for bringing this up.  Let us know if you
want to tackle handling my suggestions or if you want to wait
for me to do it.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Allow Raindrops objects to be backed by a memfd file
  2021-11-22  8:42 ` Eric Wong
@ 2021-11-22 10:13   ` KJ Tsanaktsidis
  2021-11-22 17:56     ` Eric Wong
  0 siblings, 1 reply; 4+ messages in thread
From: KJ Tsanaktsidis @ 2021-11-22 10:13 UTC (permalink / raw)
  To: Eric Wong; +Cc: raindrops-public

On Mon, Nov 22, 2021 at 7:48 PM Eric Wong <e@80x24.org> wrote:
>
> KJ Tsanaktsidis <ktsanaktsidis@zendesk.com> wrote:
> > My use-case for this feature is that I want to collect memory statistics
> > of a Unicorn master process in a way that does not itself cause any
> > allocations. This is both because that would bias the measurement, but
> > more importantly because we very tightly control when the GC runs in our
> > Unicorn masters and any garbage created by high-frequency polling of
> > GC.stat would potentially live for a long time.
> >
> > With this solution, we can simply store values from rb_objspace into a
> > Raindrops counter directly, and read the values out-of-process for
> > submission to our metrics collection system.
>
> Thanks for the description and justification.  I will accept
> this feature, but some tweaks to the patch will be needed.
> If this is too much for you, I'll try to take care of
> adjustments (but still give you credit) sometime this week.
>
> See comments inline below...
>
> > diff --git a/ext/raindrops/extconf.rb b/ext/raindrops/extconf.rb
> > index 792e509..1ddcdf2 100644
> > --- a/ext/raindrops/extconf.rb
> > +++ b/ext/raindrops/extconf.rb
> > @@ -7,6 +7,7 @@
> >
> >  $CPPFLAGS += " -D_GNU_SOURCE "
> >  have_func('mremap', 'sys/mman.h')
> > +have_func('memfd_create', 'sys/mman.h')
> >  headers = %w(sys/types.h netdb.h string.h sys/socket.h netinet/in.h)
> >  if have_header('linux/tcp.h')
> >    headers << 'linux/tcp.h'
> > diff --git a/ext/raindrops/raindrops.c b/ext/raindrops/raindrops.c
> > index 837084c..c9bf8c8 100644
> > --- a/ext/raindrops/raindrops.c
> > +++ b/ext/raindrops/raindrops.c
> > @@ -34,6 +34,7 @@ struct raindrops {
> >       size_t size;
> >       size_t capa;
> >       pid_t pid;
> > +     int fd;
> >       struct raindrop *drops;
> >  };
>
> I suggest using "VALUE io" here.  I think it'll make some
> code below easier-to-follow and maintain while offering
> some flexibility for portability.
>
> The downside is higher memory use (few hundred for the IO object
> if the IO is used, 4 bytes if not on 64-bit).  Adding an rd_mark
> callback to call rb_gc_mark will be required, too.
>
> > @@ -47,6 +48,11 @@ static void rd_free(void *ptr)
> >               if (rv != 0)
> >                       rb_bug("munmap failed in gc: %s", strerror(errno));
> >       }
> > +     if (r->fd != -1) {
> > +       int rv = close(r->fd);
> > +       if (rv != 0)
> > +         rb_bug("close failed in gc: %s", strerror(errno));
> > +     }
>
> The using an IO here would eliminate this block.
>
> Regardless, whenever possible and reasonable, favor comparisons
> against zero instead of -1 since compilers can generate smaller
> code.  Pedantically it's not POSIX in some cases, but glibc does
> it, at least.
>
> Also, follow existing indentation conventions (this applies to
> any project(*)).  The C code here is mostly formatted the same way
> as git.git or the Linux kernel, so it uses hard tabs.
>
> (*) I have deal with indentation styles which annoy me for other
>     projects, too.
>
> >       xfree(ptr);
> >  }
> > @@ -88,34 +94,74 @@ static struct raindrops *get(VALUE self)
> >
> >  /*
> >   * call-seq:
> > - *   Raindrops.new(size)     -> raindrops object
> > + *   Raindrops.new(size, name: nil)  -> raindrops object
> >   *
> >   * Initializes a Raindrops object to hold +size+ counters.  +size+ is
> >   * only a hint and the actual number of counters the object has is
> >   * dependent on the CPU model, number of cores, and page size of
> >   * the machine.  The actual size of the object will always be equal
> >   * or greater than the specified +size+.
> > + * If +name+ is provided, and the platform is supported, the raindrop
> > + * memory region will be backed by a memfd object with the provided
> > + * name, so that it can be shared with other, non-child processes.
> >   */
> > -static VALUE init(VALUE self, VALUE size)
> > +static VALUE init(int argc, VALUE *argv, VALUE self)
> >  {
> >       struct raindrops *r = DATA_PTR(self);
> >       int tries = 1;
> >       size_t tmp;
> > +  VALUE size;
> > +  VALUE kwargs_hash;
> > +  ID kwargs_ids[1] = { rb_intern_const("name") };
>
> That may be C99-specific, and I think Ruby itself only went to
> C99 in 2.7 or 3.x?  In any case, I don't think it's appropriate
> to introduce C99 at this time.
>
> (I like using C99 in other projects, too, but I don't think it's
>  appropriate for us, just yet).
>
> > +  VALUE kwargs[1];
> > +  VALUE name;
> >
> >       if (r->drops != MAP_FAILED)
> >               rb_raise(rb_eRuntimeError, "already initialized");
> >
> > +  rb_scan_args(argc, argv, "1:", &size, &kwargs_hash);
> > +  rb_get_kwargs(kwargs_hash, kwargs_ids, 0, 1, kwargs);
>
> rb_get_kwargs is 2.2+ only, isn't it?
>
> I suggest using a Ruby wrapper and rename the C method.
>
> Since you mentioned you were using this for unicorn, that's still
> Ruby 1.9.3-compatible and will be 2.0+ for the next release.
>
> I'm OK with bumping raindrops to Ruby 2.0+, for now; but the
> requirement should be <= the Ruby version used by its users
> (e.g. unicorn).
>
> Fwiw, some other popular gems I keep an eye on (e.g. `sequel',
> `mail') tend to be conservative when it comes to Ruby version
> compatibility, too.
>
> For portability to non-Linux systems, I wonder if:
>
>         Raindrops.new(size, io: Tempfile.new('foo'))
>
> can also be supported.  I can give this a shot on a FreeBSD VM
> I sometimes have access to...
>
> And then maybe drop support for `name:' entirely:
>
>         Raindrops.new(size, io: Raindrops::Memfd.new('foo'))
>
> > +  name = kwargs[0];
> > +
> >       r->size = NUM2SIZET(size);
> >       if (r->size < 1)
> >               rb_raise(rb_eArgError, "size must be >= 1");
> >
> > +  if (name != Qundef && name != Qnil) {
>
> RTEST() reads more easily, and covers Qfalse.
>
> > +#ifdef HAVE_MEMFD_CREATE
> > +    r->fd = memfd_create(StringValueCStr(name), MFD_CLOEXEC);
> > +    if (r->fd == -1) {
> > +      int err = errno;
> > +      if (err == ENOSYS) {
> > +        rb_raise(rb_eRuntimeError, "system does not support memfd_create");
> > +      } else {
> > +        rb_raise(rb_eRuntimeError, "error calling memfd_create: %s", strerror(err));
> > +      }
>
> I don't think the special case for ENOSYS is necessary.  And
> rb_sys_fail("memfd_create") is sufficient (it matches the
> rb_sys_fail(funcname) pattern used in Ruby code.
>
> Something like:
>
>         int fd = memfd_create(StringValueCStr(name), MFD_CLOEXEC);
>         if (fd < 0)
>                 rb_sys_fail("memfd_create");
>         rd->io = rb_funcall(klass, rb_intern("for_fd"), 1, INT2NUM(fd));
>
> > +    }
> > +#else
> > +    rb_raise(rb_eRuntimeError, "extension not compiled with memfd_create");
> > +#endif
> > +  } else {
> > +    r->fd = -1;
> > +  }
> > +
> >       tmp = PAGE_ALIGN(raindrop_size * r->size);
> >       r->capa = tmp / raindrop_size;
> >       assert(PAGE_ALIGN(raindrop_size * r->capa) == tmp && "not aligned");
> >
> >  retry:
> > -     r->drops = mmap(NULL, tmp,
> > -                     PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0);
> > +  if (r->fd != -1) {
> > +    if (ftruncate(r->fd, tmp) == -1) {
> > +      r->drops = MAP_FAILED;
> > +    } else {
> > +    r->drops = mmap(NULL, tmp,
> > +                    PROT_READ|PROT_WRITE, MAP_SHARED, r->fd, 0);
> > +    }
> > +  } else {
> > +     r->drops = mmap(NULL, tmp,
> > +                     PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0);
> > +  }
> > +
> >       if (r->drops == MAP_FAILED) {
> >               int err = errno;
> >
> > @@ -153,6 +199,10 @@ static void resize(struct raindrops *r, size_t new_rd_size)
> >       if (r->pid != getpid())
> >               rb_raise(rb_eRuntimeError, "cannot mremap() from child");
> >
> > +     if (r->fd != -1) {
> > +       rb_raise(rb_eRuntimeError, "resize not implemented with named raindrops");
> > +     }
> > +
> >       rv = mremap(old_address, old_size, new_size, MREMAP_MAYMOVE);
> >       if (rv == MAP_FAILED) {
> >               int err = errno;
> > @@ -213,6 +263,18 @@ static VALUE capa(VALUE self)
> >       return SIZET2NUM(get(self)->capa);
> >  }
> >
> > +/*
> > + * call-seq:
> > + *   rd.fd    -> Integer
> > + *
> > + * Returns the file descriptor number associated with this Raindrop, if
> > + * it was created with a name.
> > + */
> > +static VALUE fd(VALUE self)
> > +{
> > +  return INT2NUM(get(self)->fd);
> > +}
>
> If we go with "VALUE io" as I suggested above, I suggest a
> "to_io" method.  That would allow compatibility with other APIs
> (e.g. IO.select), and access to IO#stat, IO#close_on_exec=,
> IO#ioctl, IO#fileno, and any number of potentially useful
> methods.
>
> I've never used memfd before, but the mmap-ed regions are usable
> after a regular file FD is closed.  So an FD-starved process
> could use rd.to_io.close and continue using the mmap-ed section
> as normal.
>
> Regardless, naming a method "fd" is unnecessary, the
> function/method called "fileno" means the same thing across
> C stdio.h, Ruby, Perl, and likely other languages.
>
> >  /*
> >   * call-seq:
> >   *   rd.dup          -> rd_copy
> > @@ -223,8 +285,9 @@ static VALUE init_copy(VALUE dest, VALUE source)
> >  {
> >       struct raindrops *dst = DATA_PTR(dest);
> >       struct raindrops *src = get(source);
> > +     VALUE init_argv[1] = { SIZET2NUM(src->size) };
> >
> > -     init(dest, SIZET2NUM(src->size));
> > +     init(1, init_argv, dest);
> >       memcpy(dst->drops, src->drops, raindrop_size * src->size);
>
> Same comments as before wrt. C compiler and Ruby compatibility.
>
> >       return dest;
> > @@ -372,6 +435,11 @@ static VALUE evaporate_bang(VALUE self)
> >       r->drops = MAP_FAILED;
> >       if (munmap(addr, raindrop_size * r->capa) != 0)
> >               rb_sys_fail("munmap");
> > +     if (r->fd != -1) {
> > +       if (close(r->fd) != 0)
> > +         rb_sys_fail("close");
> > +       r->fd = -1;
> > +     }
>
> This could just use rb_io_close.  No need to comment on the
> rest... (or I'm too sleepy atm and missed something :x)
>
> Anyways thanks again for bringing this up.  Let us know if you
> want to tackle handling my suggestions or if you want to wait
> for me to do it.
>

I think I committed the two cardinal sins of mailing list etiquette at
once - replied off list and in HTML /facepalm.

Copying my reply in here again.... sorry!

--------------------------------------------------------------

Hi Eric,

Thank you for the really detailed feedback on this patch! I haven’t really
written much in the way of Ruby C extensions before, so it’s really
valuable to me to get your thoughts.

I’m definitely happy to send another version of this later this week.

At a high level, I think your idea of allowing any file backing, rather than
just memfd, makes sense - both for portability and also because it’s easier
to share an actually linked file between processes in eg Kubernetes.

Anyway - I’ll be in touch soon with another patch, thanks!


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Allow Raindrops objects to be backed by a memfd file
  2021-11-22 10:13   ` KJ Tsanaktsidis
@ 2021-11-22 17:56     ` Eric Wong
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2021-11-22 17:56 UTC (permalink / raw)
  To: KJ Tsanaktsidis; +Cc: raindrops-public

KJ Tsanaktsidis <ktsanaktsidis@zendesk.com> wrote:
> On Mon, Nov 22, 2021 at 7:48 PM Eric Wong <e@80x24.org> wrote:
> > Anyways thanks again for bringing this up.  Let us know if you
> > want to tackle handling my suggestions or if you want to wait
> > for me to do it.
> 
> I think I committed the two cardinal sins of mailing list etiquette at
> once - replied off list and in HTML /facepalm.

Also, no need to quote excessively :)  Anyways, thanks for
correcting yourself.

Fwiw, any attachment costs around 3x more memory for my spam
filter, and decoding HTML is probably 10x or more in terms of
CPU usage compared to plain-text.

I'm also storing and indexing around 15 million emails across
many projects while trying to get as many independently-run
mirrors up as possible; so yes, these bytes matter :)

> Copying my reply in here again.... sorry!
> 
> --------------------------------------------------------------
> 
> Hi Eric,
> 
> Thank you for the really detailed feedback on this patch! I haven’t really
> written much in the way of Ruby C extensions before, so it’s really
> valuable to me to get your thoughts.

No problem.  While I'm mostly retired from Ruby, it's still a
useful break from my other projects for me to review code.

> I’m definitely happy to send another version of this later this week.

Yeah, no rush; holidays and all.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-11-22 17:56 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-22  1:03 [PATCH] Allow Raindrops objects to be backed by a memfd file KJ Tsanaktsidis
2021-11-22  8:42 ` Eric Wong
2021-11-22 10:13   ` KJ Tsanaktsidis
2021-11-22 17:56     ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://yhbt.net/raindrops.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).