From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS30031 216.205.24.0/24 X-Spam-Status: No, score=-4.6 required=3.0 tests=AWL,BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from us-smtp-delivery-110.mimecast.com (us-smtp-delivery-110.mimecast.com [216.205.24.110]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id B15D91F9F4 for ; Mon, 22 Nov 2021 10:13:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=zendesk.com; s=mimecast20150210; t=1637576007; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cDxzw73XaG+5fkDqLE95WopO3X8w8yVMSnLbGMGVquU=; b=JeHKCZKhy/Rc1VPwVfaTJOKbcyacUvY21/uL6ijKslyw5Jt7HmjY9Dv6C3MAof8Y1upv8Y zkclfUMTpJUzgU1ylmtJ2FtfdiESxNVS6F6MC0BurKXS995CqwkYIgqdRRASMBVbFLrfBa H4/CFk9OwQtXQn+c2rTZ86rO29ppc9Q= Received: from mail-pf1-f200.google.com (mail-pf1-f200.google.com [209.85.210.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-503-2Hu9f0iKOi-RjSXhPBQ1eQ-1; Mon, 22 Nov 2021 05:13:26 -0500 X-MC-Unique: 2Hu9f0iKOi-RjSXhPBQ1eQ-1 Received: by mail-pf1-f200.google.com with SMTP id 184-20020a6217c1000000b0049f9aad0040so9602010pfx.21 for ; Mon, 22 Nov 2021 02:13:26 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=cDxzw73XaG+5fkDqLE95WopO3X8w8yVMSnLbGMGVquU=; b=dEu2ZNO3sfSWQK29JHy9HuPdOku203+v5axvplhvLdrHUWgjYw/VyemDDlN3NpNZXz mXwjCzXrM6pNFlG1zEpeTgcc6P3CL5FE2zwA1E8bObGBrjpL4J2rrNmahYGppFAGoB1M RgHRggSTJiAS1CxV4VMdkfD3l86g/AnmWGM6adDz2GAhhDH15KV5Y2JOK0JWKWWulnXQ uAYflQlj1oAbrlJFuReApeFR4Uj58jtRw+PRu47NAjrkWODLrRNzhVMz9mspTm8z5K+P b09vQUM4uVkDlZjYt04QLSAoxFt1L60jsigij4547mrYKMQBJi2sR6nFhwnIKtNhJjwn 5MjQ== X-Gm-Message-State: AOAM531Isq/na97iSK2xl8RYhg33pOs6CSJmj4lh3Ye3Qnt4tK78b5W4 59wf/6jpMEDS2g4D9Ly9yQHlI5MkLxHHdq4x12HxEv6RNJzLHIWBN2cSmHL3KA7LG+fkEVSzWaa tWfhFgDjV+y4lU3V9YsvEktk4SDC/+O0a45Y= X-Received: by 2002:a17:902:9303:b0:143:d6c7:babc with SMTP id bc3-20020a170902930300b00143d6c7babcmr61239177plb.58.1637576005102; Mon, 22 Nov 2021 02:13:25 -0800 (PST) X-Google-Smtp-Source: ABdhPJyIrdO0B93bRXde+oMPmk1bEoj31q3VjhpBQTs3DBvXPrM5adiourn75TDaGodQCnCRJ5FbX7Kn/xGiNQGqr0E= X-Received: by 2002:a17:902:9303:b0:143:d6c7:babc with SMTP id bc3-20020a170902930300b00143d6c7babcmr61239115plb.58.1637576004598; Mon, 22 Nov 2021 02:13:24 -0800 (PST) MIME-Version: 1.0 References: <20211122010336.43463-1-ktsanaktsidis@zendesk.com> <20211122084200.GA10182@dcvr> In-Reply-To: <20211122084200.GA10182@dcvr> From: KJ Tsanaktsidis Date: Mon, 22 Nov 2021 21:13:13 +1100 Message-ID: Subject: Re: [PATCH] Allow Raindrops objects to be backed by a memfd file To: Eric Wong Cc: raindrops-public@yhbt.net Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA10A63 smtp.mailfrom=ktsanaktsidis@zendesk.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: zendesk.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: On Mon, Nov 22, 2021 at 7:48 PM Eric Wong wrote: > > KJ Tsanaktsidis wrote: > > My use-case for this feature is that I want to collect memory statistic= s > > of a Unicorn master process in a way that does not itself cause any > > allocations. This is both because that would bias the measurement, but > > more importantly because we very tightly control when the GC runs in ou= r > > Unicorn masters and any garbage created by high-frequency polling of > > GC.stat would potentially live for a long time. > > > > With this solution, we can simply store values from rb_objspace into a > > Raindrops counter directly, and read the values out-of-process for > > submission to our metrics collection system. > > Thanks for the description and justification. I will accept > this feature, but some tweaks to the patch will be needed. > If this is too much for you, I'll try to take care of > adjustments (but still give you credit) sometime this week. > > See comments inline below... > > > diff --git a/ext/raindrops/extconf.rb b/ext/raindrops/extconf.rb > > index 792e509..1ddcdf2 100644 > > --- a/ext/raindrops/extconf.rb > > +++ b/ext/raindrops/extconf.rb > > @@ -7,6 +7,7 @@ > > > > $CPPFLAGS +=3D " -D_GNU_SOURCE " > > have_func('mremap', 'sys/mman.h') > > +have_func('memfd_create', 'sys/mman.h') > > headers =3D %w(sys/types.h netdb.h string.h sys/socket.h netinet/in.h) > > if have_header('linux/tcp.h') > > headers << 'linux/tcp.h' > > diff --git a/ext/raindrops/raindrops.c b/ext/raindrops/raindrops.c > > index 837084c..c9bf8c8 100644 > > --- a/ext/raindrops/raindrops.c > > +++ b/ext/raindrops/raindrops.c > > @@ -34,6 +34,7 @@ struct raindrops { > > size_t size; > > size_t capa; > > pid_t pid; > > + int fd; > > struct raindrop *drops; > > }; > > I suggest using "VALUE io" here. I think it'll make some > code below easier-to-follow and maintain while offering > some flexibility for portability. > > The downside is higher memory use (few hundred for the IO object > if the IO is used, 4 bytes if not on 64-bit). Adding an rd_mark > callback to call rb_gc_mark will be required, too. > > > @@ -47,6 +48,11 @@ static void rd_free(void *ptr) > > if (rv !=3D 0) > > rb_bug("munmap failed in gc: %s", strerror(errno)= ); > > } > > + if (r->fd !=3D -1) { > > + int rv =3D close(r->fd); > > + if (rv !=3D 0) > > + rb_bug("close failed in gc: %s", strerror(errno)); > > + } > > The using an IO here would eliminate this block. > > Regardless, whenever possible and reasonable, favor comparisons > against zero instead of -1 since compilers can generate smaller > code. Pedantically it's not POSIX in some cases, but glibc does > it, at least. > > Also, follow existing indentation conventions (this applies to > any project(*)). The C code here is mostly formatted the same way > as git.git or the Linux kernel, so it uses hard tabs. > > (*) I have deal with indentation styles which annoy me for other > projects, too. > > > xfree(ptr); > > } > > @@ -88,34 +94,74 @@ static struct raindrops *get(VALUE self) > > > > /* > > * call-seq: > > - * Raindrops.new(size) -> raindrops object > > + * Raindrops.new(size, name: nil) -> raindrops object > > * > > * Initializes a Raindrops object to hold +size+ counters. +size+ is > > * only a hint and the actual number of counters the object has is > > * dependent on the CPU model, number of cores, and page size of > > * the machine. The actual size of the object will always be equal > > * or greater than the specified +size+. > > + * If +name+ is provided, and the platform is supported, the raindrop > > + * memory region will be backed by a memfd object with the provided > > + * name, so that it can be shared with other, non-child processes. > > */ > > -static VALUE init(VALUE self, VALUE size) > > +static VALUE init(int argc, VALUE *argv, VALUE self) > > { > > struct raindrops *r =3D DATA_PTR(self); > > int tries =3D 1; > > size_t tmp; > > + VALUE size; > > + VALUE kwargs_hash; > > + ID kwargs_ids[1] =3D { rb_intern_const("name") }; > > That may be C99-specific, and I think Ruby itself only went to > C99 in 2.7 or 3.x? In any case, I don't think it's appropriate > to introduce C99 at this time. > > (I like using C99 in other projects, too, but I don't think it's > appropriate for us, just yet). > > > + VALUE kwargs[1]; > > + VALUE name; > > > > if (r->drops !=3D MAP_FAILED) > > rb_raise(rb_eRuntimeError, "already initialized"); > > > > + rb_scan_args(argc, argv, "1:", &size, &kwargs_hash); > > + rb_get_kwargs(kwargs_hash, kwargs_ids, 0, 1, kwargs); > > rb_get_kwargs is 2.2+ only, isn't it? > > I suggest using a Ruby wrapper and rename the C method. > > Since you mentioned you were using this for unicorn, that's still > Ruby 1.9.3-compatible and will be 2.0+ for the next release. > > I'm OK with bumping raindrops to Ruby 2.0+, for now; but the > requirement should be <=3D the Ruby version used by its users > (e.g. unicorn). > > Fwiw, some other popular gems I keep an eye on (e.g. `sequel', > `mail') tend to be conservative when it comes to Ruby version > compatibility, too. > > For portability to non-Linux systems, I wonder if: > > Raindrops.new(size, io: Tempfile.new('foo')) > > can also be supported. I can give this a shot on a FreeBSD VM > I sometimes have access to... > > And then maybe drop support for `name:' entirely: > > Raindrops.new(size, io: Raindrops::Memfd.new('foo')) > > > + name =3D kwargs[0]; > > + > > r->size =3D NUM2SIZET(size); > > if (r->size < 1) > > rb_raise(rb_eArgError, "size must be >=3D 1"); > > > > + if (name !=3D Qundef && name !=3D Qnil) { > > RTEST() reads more easily, and covers Qfalse. > > > +#ifdef HAVE_MEMFD_CREATE > > + r->fd =3D memfd_create(StringValueCStr(name), MFD_CLOEXEC); > > + if (r->fd =3D=3D -1) { > > + int err =3D errno; > > + if (err =3D=3D ENOSYS) { > > + rb_raise(rb_eRuntimeError, "system does not support memfd_crea= te"); > > + } else { > > + rb_raise(rb_eRuntimeError, "error calling memfd_create: %s", s= trerror(err)); > > + } > > I don't think the special case for ENOSYS is necessary. And > rb_sys_fail("memfd_create") is sufficient (it matches the > rb_sys_fail(funcname) pattern used in Ruby code. > > Something like: > > int fd =3D memfd_create(StringValueCStr(name), MFD_CLOEXEC); > if (fd < 0) > rb_sys_fail("memfd_create"); > rd->io =3D rb_funcall(klass, rb_intern("for_fd"), 1, INT2NUM(fd))= ; > > > + } > > +#else > > + rb_raise(rb_eRuntimeError, "extension not compiled with memfd_crea= te"); > > +#endif > > + } else { > > + r->fd =3D -1; > > + } > > + > > tmp =3D PAGE_ALIGN(raindrop_size * r->size); > > r->capa =3D tmp / raindrop_size; > > assert(PAGE_ALIGN(raindrop_size * r->capa) =3D=3D tmp && "not ali= gned"); > > > > retry: > > - r->drops =3D mmap(NULL, tmp, > > - PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0)= ; > > + if (r->fd !=3D -1) { > > + if (ftruncate(r->fd, tmp) =3D=3D -1) { > > + r->drops =3D MAP_FAILED; > > + } else { > > + r->drops =3D mmap(NULL, tmp, > > + PROT_READ|PROT_WRITE, MAP_SHARED, r->fd, 0); > > + } > > + } else { > > + r->drops =3D mmap(NULL, tmp, > > + PROT_READ|PROT_WRITE, MAP_ANON|MAP_SHARED, -1, 0)= ; > > + } > > + > > if (r->drops =3D=3D MAP_FAILED) { > > int err =3D errno; > > > > @@ -153,6 +199,10 @@ static void resize(struct raindrops *r, size_t new= _rd_size) > > if (r->pid !=3D getpid()) > > rb_raise(rb_eRuntimeError, "cannot mremap() from child"); > > > > + if (r->fd !=3D -1) { > > + rb_raise(rb_eRuntimeError, "resize not implemented with named r= aindrops"); > > + } > > + > > rv =3D mremap(old_address, old_size, new_size, MREMAP_MAYMOVE); > > if (rv =3D=3D MAP_FAILED) { > > int err =3D errno; > > @@ -213,6 +263,18 @@ static VALUE capa(VALUE self) > > return SIZET2NUM(get(self)->capa); > > } > > > > +/* > > + * call-seq: > > + * rd.fd -> Integer > > + * > > + * Returns the file descriptor number associated with this Raindrop, i= f > > + * it was created with a name. > > + */ > > +static VALUE fd(VALUE self) > > +{ > > + return INT2NUM(get(self)->fd); > > +} > > If we go with "VALUE io" as I suggested above, I suggest a > "to_io" method. That would allow compatibility with other APIs > (e.g. IO.select), and access to IO#stat, IO#close_on_exec=3D, > IO#ioctl, IO#fileno, and any number of potentially useful > methods. > > I've never used memfd before, but the mmap-ed regions are usable > after a regular file FD is closed. So an FD-starved process > could use rd.to_io.close and continue using the mmap-ed section > as normal. > > Regardless, naming a method "fd" is unnecessary, the > function/method called "fileno" means the same thing across > C stdio.h, Ruby, Perl, and likely other languages. > > > /* > > * call-seq: > > * rd.dup -> rd_copy > > @@ -223,8 +285,9 @@ static VALUE init_copy(VALUE dest, VALUE source) > > { > > struct raindrops *dst =3D DATA_PTR(dest); > > struct raindrops *src =3D get(source); > > + VALUE init_argv[1] =3D { SIZET2NUM(src->size) }; > > > > - init(dest, SIZET2NUM(src->size)); > > + init(1, init_argv, dest); > > memcpy(dst->drops, src->drops, raindrop_size * src->size); > > Same comments as before wrt. C compiler and Ruby compatibility. > > > return dest; > > @@ -372,6 +435,11 @@ static VALUE evaporate_bang(VALUE self) > > r->drops =3D MAP_FAILED; > > if (munmap(addr, raindrop_size * r->capa) !=3D 0) > > rb_sys_fail("munmap"); > > + if (r->fd !=3D -1) { > > + if (close(r->fd) !=3D 0) > > + rb_sys_fail("close"); > > + r->fd =3D -1; > > + } > > This could just use rb_io_close. No need to comment on the > rest... (or I'm too sleepy atm and missed something :x) > > Anyways thanks again for bringing this up. Let us know if you > want to tackle handling my suggestions or if you want to wait > for me to do it. > I think I committed the two cardinal sins of mailing list etiquette at once - replied off list and in HTML /facepalm. Copying my reply in here again.... sorry! -------------------------------------------------------------- Hi Eric, Thank you for the really detailed feedback on this patch! I haven=E2=80=99t= really written much in the way of Ruby C extensions before, so it=E2=80=99s really valuable to me to get your thoughts. I=E2=80=99m definitely happy to send another version of this later this wee= k. At a high level, I think your idea of allowing any file backing, rather tha= n just memfd, makes sense - both for portability and also because it=E2=80=99= s easier to share an actually linked file between processes in eg Kubernetes. Anyway - I=E2=80=99ll be in touch soon with another patch, thanks!