From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00 shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 2E7451F9F4; Mon, 22 Nov 2021 08:42:00 +0000 (UTC) Date: Mon, 22 Nov 2021 08:42:00 +0000 From: Eric Wong To: KJ Tsanaktsidis Cc: raindrops-public@yhbt.net Subject: Re: [PATCH] Allow Raindrops objects to be backed by a memfd file Message-ID: <20211122084200.GA10182@dcvr> References: <20211122010336.43463-1-ktsanaktsidis@zendesk.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20211122010336.43463-1-ktsanaktsidis@zendesk.com> List-Id: KJ Tsanaktsidis wrote: > My use-case for this feature is that I want to collect memory statistics > of a Unicorn master process in a way that does not itself cause any > allocations. This is both because that would bias the measurement, but > more importantly because we very tightly control when the GC runs in our > Unicorn masters and any garbage created by high-frequency polling of > GC.stat would potentially live for a long time. > > With this solution, we can simply store values from rb_objspace into a > Raindrops counter directly, and read the values out-of-process for > submission to our metrics collection system. Thanks for the description and justification. I will accept this feature, but some tweaks to the patch will be needed. If this is too much for you, I'll try to take care of adjustments (but still give you credit) sometime this week. See comments inline below... > diff --git a/ext/raindrops/extconf.rb b/ext/raindrops/extconf.rb > index 792e509..1ddcdf2 100644 > --- a/ext/raindrops/extconf.rb > +++ b/ext/raindrops/extconf.rb > @@ -7,6 +7,7 @@ > > $CPPFLAGS += " -D_GNU_SOURCE " > have_func('mremap', 'sys/mman.h') > +have_func('memfd_create', 'sys/mman.h') > headers = %w(sys/types.h netdb.h string.h sys/socket.h netinet/in.h) > if have_header('linux/tcp.h') > headers << 'linux/tcp.h' > diff --git a/ext/raindrops/raindrops.c b/ext/raindrops/raindrops.c > index 837084c..c9bf8c8 100644 > --- a/ext/raindrops/raindrops.c > +++ b/ext/raindrops/raindrops.c > @@ -34,6 +34,7 @@ struct raindrops { > size_t size; > size_t capa; > pid_t pid; > + int fd; > struct raindrop *drops; > }; I suggest using "VALUE io" here. I think it'll make some code below easier-to-follow and maintain while offering some flexibility for portability. The downside is higher memory use (few hundred for the IO object if the IO is used, 4 bytes if not on 64-bit). Adding an rd_mark callback to call rb_gc_mark will be required, too. > @@ -47,6 +48,11 @@ static void rd_free(void *ptr) > if (rv != 0) > rb_bug("munmap failed in gc: %s", strerror(errno)); > } > + if (r->fd != -1) { > + int rv = close(r->fd); > + if (rv != 0) > + rb_bug("close failed in gc: %s", strerror(errno)); > + } The using an IO here would eliminate this block. Regardless, whenever possible and reasonable, favor comparisons against zero instead of -1 since compilers can generate smaller code. Pedantically it's not POSIX in some cases, but glibc does it, at least. Also, follow existing indentation conventions (this applies to any project(*)). The C code here is mostly formatted the same way as git.git or the Linux kernel, so it uses hard tabs. (*) I have deal with indentation styles which annoy me for other projects, too. > xfree(ptr); > } > @@ -88,34 +94,74 @@ static struct raindrops *get(VALUE self) > > /* > * call-seq: > - * Raindrops.new(size) -> raindrops object > + * Raindrops.new(size, name: nil) -> raindrops object > * > * Initializes a Raindrops object to hold +size+ counters. +size+ is > * only a hint and the actual number of counters the object has is > * dependent on the CPU model, number of cores, and page size of > * the machine. The actual size of the object will always be equal > * or greater than the specified +size+. > + * If +name+ is provided, and the platform is supported, the raindrop > + * memory region will be backed by a memfd object with the provided > + * name, so that it can be shared with other, non-child processes. > */ > -static VALUE init(VALUE self, VALUE size) > +static VALUE init(int argc, VALUE *argv, VALUE self) > { > struct raindrops *r = DATA_PTR(self); > int tries = 1; > size_t tmp; > + VALUE size; > + VALUE kwargs_hash; > + ID kwargs_ids[1] = { rb_intern_const("name") }; That may be C99-specific, and I think Ruby itself only went to C99 in 2.7 or 3.x? In any case, I don't think it's appropriate to introduce C99 at this time. (I like using C99 in other projects, too, but I don't think it's appropriate for us, just yet). > + VALUE kwargs[1]; > + VALUE name; > > if (r->drops != MAP_FAILED) > rb_raise(rb_eRuntimeError, "already initialized"); > > + rb_scan_args(argc, argv, "1:", &size, &kwargs_hash); > + rb_get_kwargs(kwargs_hash, kwargs_ids, 0, 1, kwargs); rb_get_kwargs is 2.2+ only, isn't it? I suggest using a Ruby wrapper and rename the C method. Since you mentioned you were using this for unicorn, that's still Ruby 1.9.3-compatible and will be 2.0+ for the next release. I'm OK with bumping raindrops to Ruby 2.0+, for now; but the requirement should be <= the Ruby version used by its users (e.g. unicorn). Fwiw, some other popular gems I keep an eye on (e.g. `sequel', `mail') tend to be conservative when it comes to Ruby version compatibility, too. For portability to non-Linux systems, I wonder if: Raindrops.new(size, io: Tempfile.new('foo')) can also be supported. I can give this a shot on a FreeBSD VM I sometimes have access to... And then maybe drop support for `name:' entirely: Raindrops.new(size, io: Raindrops::Memfd.new('foo')) > + name = kwargs[0]; > + > r->size = NUM2SIZET(size); > if (r->size < 1) > rb_raise(rb_eArgError, "size must be >= 1"); > > + if (name != Qundef && name != Qnil) { RTEST() reads more easily, and covers Qfalse. > +#ifdef HAVE_MEMFD_CREATE > + r->fd = memfd_create(StringValueCStr(name), MFD_CLOEXEC); > + if (r->fd == -1) { > + int err = errno; > + if (err == ENOSYS) { > + rb_raise(rb_eRuntimeError, "system does not support memfd_create"); > + } else { > + rb_raise(rb_eRuntimeError, "error calling memfd_create: %s", strerror(err)); > + } I don't think the special case for ENOSYS is necessary. And rb_sys_fail("memfd_create") is sufficient (it matches the rb_sys_fail(funcname) pattern used in Ruby code. Something like: int fd = memfd_create(StringValueCStr(name), MFD_CLOEXEC); if (fd < 0) rb_sys_fail("memfd_create"); rd->io = rb_funcall(klass, rb_intern("for_fd"), 1, INT2NUM(fd)); > + } > +#else > + rb_raise(rb_eRuntimeError, "extension not compiled with memfd_create"); > +#endif > + } else { > + r->fd = -1; > + } > + > tmp = PAGE_ALIGN(raindrop_size * r->size); > r->capa = tmp / raindrop_size; > assert(PAGE_ALIGN(raindrop_size * r->capa) == tmp && "not aligned"); > > retry: > - r->drops = mmap(NULL, tmp, > - PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0); > + if (r->fd != -1) { > + if (ftruncate(r->fd, tmp) == -1) { > + r->drops = MAP_FAILED; > + } else { > + r->drops = mmap(NULL, tmp, > + PROT_READ|PROT_WRITE, MAP_SHARED, r->fd, 0); > + } > + } else { > + r->drops = mmap(NULL, tmp, > + PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0); > + } > + > if (r->drops == MAP_FAILED) { > int err = errno; > > @@ -153,6 +199,10 @@ static void resize(struct raindrops *r, size_t new_rd_size) > if (r->pid != getpid()) > rb_raise(rb_eRuntimeError, "cannot mremap() from child"); > > + if (r->fd != -1) { > + rb_raise(rb_eRuntimeError, "resize not implemented with named raindrops"); > + } > + > rv = mremap(old_address, old_size, new_size, MREMAP_MAYMOVE); > if (rv == MAP_FAILED) { > int err = errno; > @@ -213,6 +263,18 @@ static VALUE capa(VALUE self) > return SIZET2NUM(get(self)->capa); > } > > +/* > + * call-seq: > + * rd.fd -> Integer > + * > + * Returns the file descriptor number associated with this Raindrop, if > + * it was created with a name. > + */ > +static VALUE fd(VALUE self) > +{ > + return INT2NUM(get(self)->fd); > +} If we go with "VALUE io" as I suggested above, I suggest a "to_io" method. That would allow compatibility with other APIs (e.g. IO.select), and access to IO#stat, IO#close_on_exec=, IO#ioctl, IO#fileno, and any number of potentially useful methods. I've never used memfd before, but the mmap-ed regions are usable after a regular file FD is closed. So an FD-starved process could use rd.to_io.close and continue using the mmap-ed section as normal. Regardless, naming a method "fd" is unnecessary, the function/method called "fileno" means the same thing across C stdio.h, Ruby, Perl, and likely other languages. > /* > * call-seq: > * rd.dup -> rd_copy > @@ -223,8 +285,9 @@ static VALUE init_copy(VALUE dest, VALUE source) > { > struct raindrops *dst = DATA_PTR(dest); > struct raindrops *src = get(source); > + VALUE init_argv[1] = { SIZET2NUM(src->size) }; > > - init(dest, SIZET2NUM(src->size)); > + init(1, init_argv, dest); > memcpy(dst->drops, src->drops, raindrop_size * src->size); Same comments as before wrt. C compiler and Ruby compatibility. > return dest; > @@ -372,6 +435,11 @@ static VALUE evaporate_bang(VALUE self) > r->drops = MAP_FAILED; > if (munmap(addr, raindrop_size * r->capa) != 0) > rb_sys_fail("munmap"); > + if (r->fd != -1) { > + if (close(r->fd) != 0) > + rb_sys_fail("close"); > + r->fd = -1; > + } This could just use rb_io_close. No need to comment on the rest... (or I'm too sleepy atm and missed something :x) Anyways thanks again for bringing this up. Let us know if you want to tackle handling my suggestions or if you want to wait for me to do it.