From: KJ Tsanaktsidis <ktsanaktsidis@zendesk.com>
To: Eric Wong <e@80x24.org>
Cc: raindrops-public@yhbt.net
Subject: Re: [PATCH] Allow Raindrops objects to be backed by a memfd file
Date: Mon, 22 Nov 2021 21:13:13 +1100 [thread overview]
Message-ID: <CAJ7wOOt20nZaeT-UM=cB5MXb_nogCZSOuNEYRdR73N8AFqfBng@mail.gmail.com> (raw)
In-Reply-To: <20211122084200.GA10182@dcvr>
On Mon, Nov 22, 2021 at 7:48 PM Eric Wong <e@80x24.org> wrote:
>
> KJ Tsanaktsidis <ktsanaktsidis@zendesk.com> wrote:
> > My use-case for this feature is that I want to collect memory statistics
> > of a Unicorn master process in a way that does not itself cause any
> > allocations. This is both because that would bias the measurement, but
> > more importantly because we very tightly control when the GC runs in our
> > Unicorn masters and any garbage created by high-frequency polling of
> > GC.stat would potentially live for a long time.
> >
> > With this solution, we can simply store values from rb_objspace into a
> > Raindrops counter directly, and read the values out-of-process for
> > submission to our metrics collection system.
>
> Thanks for the description and justification. I will accept
> this feature, but some tweaks to the patch will be needed.
> If this is too much for you, I'll try to take care of
> adjustments (but still give you credit) sometime this week.
>
> See comments inline below...
>
> > diff --git a/ext/raindrops/extconf.rb b/ext/raindrops/extconf.rb
> > index 792e509..1ddcdf2 100644
> > --- a/ext/raindrops/extconf.rb
> > +++ b/ext/raindrops/extconf.rb
> > @@ -7,6 +7,7 @@
> >
> > $CPPFLAGS += " -D_GNU_SOURCE "
> > have_func('mremap', 'sys/mman.h')
> > +have_func('memfd_create', 'sys/mman.h')
> > headers = %w(sys/types.h netdb.h string.h sys/socket.h netinet/in.h)
> > if have_header('linux/tcp.h')
> > headers << 'linux/tcp.h'
> > diff --git a/ext/raindrops/raindrops.c b/ext/raindrops/raindrops.c
> > index 837084c..c9bf8c8 100644
> > --- a/ext/raindrops/raindrops.c
> > +++ b/ext/raindrops/raindrops.c
> > @@ -34,6 +34,7 @@ struct raindrops {
> > size_t size;
> > size_t capa;
> > pid_t pid;
> > + int fd;
> > struct raindrop *drops;
> > };
>
> I suggest using "VALUE io" here. I think it'll make some
> code below easier-to-follow and maintain while offering
> some flexibility for portability.
>
> The downside is higher memory use (few hundred for the IO object
> if the IO is used, 4 bytes if not on 64-bit). Adding an rd_mark
> callback to call rb_gc_mark will be required, too.
>
> > @@ -47,6 +48,11 @@ static void rd_free(void *ptr)
> > if (rv != 0)
> > rb_bug("munmap failed in gc: %s", strerror(errno));
> > }
> > + if (r->fd != -1) {
> > + int rv = close(r->fd);
> > + if (rv != 0)
> > + rb_bug("close failed in gc: %s", strerror(errno));
> > + }
>
> The using an IO here would eliminate this block.
>
> Regardless, whenever possible and reasonable, favor comparisons
> against zero instead of -1 since compilers can generate smaller
> code. Pedantically it's not POSIX in some cases, but glibc does
> it, at least.
>
> Also, follow existing indentation conventions (this applies to
> any project(*)). The C code here is mostly formatted the same way
> as git.git or the Linux kernel, so it uses hard tabs.
>
> (*) I have deal with indentation styles which annoy me for other
> projects, too.
>
> > xfree(ptr);
> > }
> > @@ -88,34 +94,74 @@ static struct raindrops *get(VALUE self)
> >
> > /*
> > * call-seq:
> > - * Raindrops.new(size) -> raindrops object
> > + * Raindrops.new(size, name: nil) -> raindrops object
> > *
> > * Initializes a Raindrops object to hold +size+ counters. +size+ is
> > * only a hint and the actual number of counters the object has is
> > * dependent on the CPU model, number of cores, and page size of
> > * the machine. The actual size of the object will always be equal
> > * or greater than the specified +size+.
> > + * If +name+ is provided, and the platform is supported, the raindrop
> > + * memory region will be backed by a memfd object with the provided
> > + * name, so that it can be shared with other, non-child processes.
> > */
> > -static VALUE init(VALUE self, VALUE size)
> > +static VALUE init(int argc, VALUE *argv, VALUE self)
> > {
> > struct raindrops *r = DATA_PTR(self);
> > int tries = 1;
> > size_t tmp;
> > + VALUE size;
> > + VALUE kwargs_hash;
> > + ID kwargs_ids[1] = { rb_intern_const("name") };
>
> That may be C99-specific, and I think Ruby itself only went to
> C99 in 2.7 or 3.x? In any case, I don't think it's appropriate
> to introduce C99 at this time.
>
> (I like using C99 in other projects, too, but I don't think it's
> appropriate for us, just yet).
>
> > + VALUE kwargs[1];
> > + VALUE name;
> >
> > if (r->drops != MAP_FAILED)
> > rb_raise(rb_eRuntimeError, "already initialized");
> >
> > + rb_scan_args(argc, argv, "1:", &size, &kwargs_hash);
> > + rb_get_kwargs(kwargs_hash, kwargs_ids, 0, 1, kwargs);
>
> rb_get_kwargs is 2.2+ only, isn't it?
>
> I suggest using a Ruby wrapper and rename the C method.
>
> Since you mentioned you were using this for unicorn, that's still
> Ruby 1.9.3-compatible and will be 2.0+ for the next release.
>
> I'm OK with bumping raindrops to Ruby 2.0+, for now; but the
> requirement should be <= the Ruby version used by its users
> (e.g. unicorn).
>
> Fwiw, some other popular gems I keep an eye on (e.g. `sequel',
> `mail') tend to be conservative when it comes to Ruby version
> compatibility, too.
>
> For portability to non-Linux systems, I wonder if:
>
> Raindrops.new(size, io: Tempfile.new('foo'))
>
> can also be supported. I can give this a shot on a FreeBSD VM
> I sometimes have access to...
>
> And then maybe drop support for `name:' entirely:
>
> Raindrops.new(size, io: Raindrops::Memfd.new('foo'))
>
> > + name = kwargs[0];
> > +
> > r->size = NUM2SIZET(size);
> > if (r->size < 1)
> > rb_raise(rb_eArgError, "size must be >= 1");
> >
> > + if (name != Qundef && name != Qnil) {
>
> RTEST() reads more easily, and covers Qfalse.
>
> > +#ifdef HAVE_MEMFD_CREATE
> > + r->fd = memfd_create(StringValueCStr(name), MFD_CLOEXEC);
> > + if (r->fd == -1) {
> > + int err = errno;
> > + if (err == ENOSYS) {
> > + rb_raise(rb_eRuntimeError, "system does not support memfd_create");
> > + } else {
> > + rb_raise(rb_eRuntimeError, "error calling memfd_create: %s", strerror(err));
> > + }
>
> I don't think the special case for ENOSYS is necessary. And
> rb_sys_fail("memfd_create") is sufficient (it matches the
> rb_sys_fail(funcname) pattern used in Ruby code.
>
> Something like:
>
> int fd = memfd_create(StringValueCStr(name), MFD_CLOEXEC);
> if (fd < 0)
> rb_sys_fail("memfd_create");
> rd->io = rb_funcall(klass, rb_intern("for_fd"), 1, INT2NUM(fd));
>
> > + }
> > +#else
> > + rb_raise(rb_eRuntimeError, "extension not compiled with memfd_create");
> > +#endif
> > + } else {
> > + r->fd = -1;
> > + }
> > +
> > tmp = PAGE_ALIGN(raindrop_size * r->size);
> > r->capa = tmp / raindrop_size;
> > assert(PAGE_ALIGN(raindrop_size * r->capa) == tmp && "not aligned");
> >
> > retry:
> > - r->drops = mmap(NULL, tmp,
> > - PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0);
> > + if (r->fd != -1) {
> > + if (ftruncate(r->fd, tmp) == -1) {
> > + r->drops = MAP_FAILED;
> > + } else {
> > + r->drops = mmap(NULL, tmp,
> > + PROT_READ|PROT_WRITE, MAP_SHARED, r->fd, 0);
> > + }
> > + } else {
> > + r->drops = mmap(NULL, tmp,
> > + PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0);
> > + }
> > +
> > if (r->drops == MAP_FAILED) {
> > int err = errno;
> >
> > @@ -153,6 +199,10 @@ static void resize(struct raindrops *r, size_t new_rd_size)
> > if (r->pid != getpid())
> > rb_raise(rb_eRuntimeError, "cannot mremap() from child");
> >
> > + if (r->fd != -1) {
> > + rb_raise(rb_eRuntimeError, "resize not implemented with named raindrops");
> > + }
> > +
> > rv = mremap(old_address, old_size, new_size, MREMAP_MAYMOVE);
> > if (rv == MAP_FAILED) {
> > int err = errno;
> > @@ -213,6 +263,18 @@ static VALUE capa(VALUE self)
> > return SIZET2NUM(get(self)->capa);
> > }
> >
> > +/*
> > + * call-seq:
> > + * rd.fd -> Integer
> > + *
> > + * Returns the file descriptor number associated with this Raindrop, if
> > + * it was created with a name.
> > + */
> > +static VALUE fd(VALUE self)
> > +{
> > + return INT2NUM(get(self)->fd);
> > +}
>
> If we go with "VALUE io" as I suggested above, I suggest a
> "to_io" method. That would allow compatibility with other APIs
> (e.g. IO.select), and access to IO#stat, IO#close_on_exec=,
> IO#ioctl, IO#fileno, and any number of potentially useful
> methods.
>
> I've never used memfd before, but the mmap-ed regions are usable
> after a regular file FD is closed. So an FD-starved process
> could use rd.to_io.close and continue using the mmap-ed section
> as normal.
>
> Regardless, naming a method "fd" is unnecessary, the
> function/method called "fileno" means the same thing across
> C stdio.h, Ruby, Perl, and likely other languages.
>
> > /*
> > * call-seq:
> > * rd.dup -> rd_copy
> > @@ -223,8 +285,9 @@ static VALUE init_copy(VALUE dest, VALUE source)
> > {
> > struct raindrops *dst = DATA_PTR(dest);
> > struct raindrops *src = get(source);
> > + VALUE init_argv[1] = { SIZET2NUM(src->size) };
> >
> > - init(dest, SIZET2NUM(src->size));
> > + init(1, init_argv, dest);
> > memcpy(dst->drops, src->drops, raindrop_size * src->size);
>
> Same comments as before wrt. C compiler and Ruby compatibility.
>
> > return dest;
> > @@ -372,6 +435,11 @@ static VALUE evaporate_bang(VALUE self)
> > r->drops = MAP_FAILED;
> > if (munmap(addr, raindrop_size * r->capa) != 0)
> > rb_sys_fail("munmap");
> > + if (r->fd != -1) {
> > + if (close(r->fd) != 0)
> > + rb_sys_fail("close");
> > + r->fd = -1;
> > + }
>
> This could just use rb_io_close. No need to comment on the
> rest... (or I'm too sleepy atm and missed something :x)
>
> Anyways thanks again for bringing this up. Let us know if you
> want to tackle handling my suggestions or if you want to wait
> for me to do it.
>
I think I committed the two cardinal sins of mailing list etiquette at
once - replied off list and in HTML /facepalm.
Copying my reply in here again.... sorry!
--------------------------------------------------------------
Hi Eric,
Thank you for the really detailed feedback on this patch! I haven’t really
written much in the way of Ruby C extensions before, so it’s really
valuable to me to get your thoughts.
I’m definitely happy to send another version of this later this week.
At a high level, I think your idea of allowing any file backing, rather than
just memfd, makes sense - both for portability and also because it’s easier
to share an actually linked file between processes in eg Kubernetes.
Anyway - I’ll be in touch soon with another patch, thanks!
next prev parent reply other threads:[~2021-11-22 10:13 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-22 1:03 [PATCH] Allow Raindrops objects to be backed by a memfd file KJ Tsanaktsidis
2021-11-22 8:42 ` Eric Wong
2021-11-22 10:13 ` KJ Tsanaktsidis [this message]
2021-11-22 17:56 ` Eric Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://yhbt.net/raindrops/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAJ7wOOt20nZaeT-UM=cB5MXb_nogCZSOuNEYRdR73N8AFqfBng@mail.gmail.com' \
--to=ktsanaktsidis@zendesk.com \
--cc=e@80x24.org \
--cc=raindrops-public@yhbt.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://yhbt.net/raindrops.git/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).