linux-um.lists.infradead.org archive mirror
From: Rob Landley <rob@landley.net>
To: Andy Lutomirski <luto@amacapital.net>
Cc: "Askar Safin" <safinaskar@zohomail.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Christian Brauner" <brauner@kernel.org>,
	"Al Viro" <viro@zeniv.linux.org.uk>, "Jan Kara" <jack@suse.cz>,
	"Christoph Hellwig" <hch@lst.de>, "Jens Axboe" <axboe@kernel.dk>,
	"Andy Shevchenko" <andy.shevchenko@gmail.com>,
	"Aleksa Sarai" <cyphar@cyphar.com>,
	"Thomas Weißschuh" <thomas.weissschuh@linutronix.de>,
	"Julian Stecklina" <julian.stecklina@cyberus-technology.de>,
	"Gao Xiang" <hsiangkao@linux.alibaba.com>,
	"Art Nikpal" <email2tema@gmail.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Eric Curtin" <ecurtin@redhat.com>,
	"Alexander Graf" <graf@amazon.com>,
	"Lennart Poettering" <mzxreary@0pointer.de>,
	linux-arch@vger.kernel.org, linux-alpha@vger.kernel.org,
	linux-snps-arc@lists.infradead.org,
	linux-arm-kernel@lists.infradead.org, linux-csky@vger.kernel.org,
	linux-hexagon@vger.kernel.org, loongarch@lists.linux.dev,
	linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org,
	linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org,
	linux-s390@vger.kernel.org, linux-sh@vger.kernel.org,
	sparclinux@vger.kernel.org, linux-um@lists.infradead.org,
	x86@kernel.org, "Ingo Molnar" <mingo@redhat.com>,
	linux-block@vger.kernel.org, initramfs@vger.kernel.org,
	linux-api@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-efi@vger.kernel.org, linux-ext4@vger.kernel.org,
	"Theodore Y . Ts'o" <tytso@mit.edu>,
	linux-acpi@vger.kernel.org, "Michal Simek" <monstr@monstr.eu>,
	devicetree@vger.kernel.org,
	"Luis Chamberlain" <mcgrof@kernel.org>,
	"Kees Cook" <kees@kernel.org>,
	"Thorsten Blum" <thorsten.blum@linux.dev>,
	"Heiko Carstens" <hca@linux.ibm.com>,
	patches@lists.linux.dev
Subject: Re: [PATCH 00/62] initrd: remove classic initrd support
Date: Thu, 18 Sep 2025 13:10:22 -0500
Message-ID: <94023988-8498-4070-bdb7-6758dbe4b91d@landley.net>
In-Reply-To: <CALCETrXHxOkHoS+0zhvc4cfpZqJ0wpfQUDnXW-A-qyQkqur-DQ@mail.gmail.com>

On 9/17/25 13:00, Andy Lutomirski wrote:
> On Mon, Sep 15, 2025 at 10:09 AM Rob Landley <rob@landley.net> wrote:
> 
>> While you're at it, could you fix static/builtin initramfs so PID 1 has
>> a valid stdin/stdout/stderr?
>>
>> A static initramfs won't create /dev/console if the embedded initramfs
>> image doesn't contain it, which a non-root build can't mknod, so the
>> kernel plumbing won't see it dev in the directory we point it at unless
>> we build with root access.
> 
> I have no current insight as to whether there's a kernel issue here,

They fixed the behavior in one codepath. They left it broken in the 
other codepath. The kernel's behavior is inconsistent.

Look:

$ mkdir sub; cc --static -xc - <<<'int main() {puts("hello\n");if (fork()) reboot(0x01234567); for(;;);}' -o sub/init
$ (cd sub; cpio -o -H newc <<<init | gzip) > sub.cpio.gz
$ make allnoconfig KCONFIG_ALLCONFIG=<(tr ' ' \\n <<<'PANIC_TIMEOUT=1 RD_GZIP BINFMT_ELF BLK_DEV_INITRD EARLY_PRINTK 64BIT SERIAL_8250 SERIAL_8250_CONSOLE UNWINDER_FRAME_POINTER' | sed 's/^/CONFIG_/;/=/!s/$/=y/')
$ make -j $(nproc)
$ qemu-system-x86_64 -kernel arch/x86/boot/bzImage -nographic -no-reboot -append console=ttyS0 -initrd sub.cpio.gz

You get a "hello" output near the end there. (You can add "quiet" to the 
-append but given that qemu can't NOT output its bios spam there's not 
much point.)

Now add INITRAMFS_SOURCE="sub" to the config and remove -initrd 
sub.cpio.gz from the qemu invocation:

$ make clean allnoconfig KCONFIG_ALLCONFIG=<(tr ' ' \\n <<<'PANIC_TIMEOUT=1 RD_GZIP BINFMT_ELF BLK_DEV_INITRD EARLY_PRINTK 64BIT SERIAL_8250 SERIAL_8250_CONSOLE UNWINDER_FRAME_POINTER INITRAMFS_SOURCE="sub"' | sed 's/^/CONFIG_/;/=/!s/$/=y/')
$ make -j $(nproc)
$ qemu-system-x86_64 -kernel arch/x86/boot/bzImage -nographic -no-reboot -append 'console=ttyS0'

No "hello" output, but it DOES shut down cleanly instead of giving you a 
panic trace so you know it ran the init binary.

All that changed was statically linking the initramfs instead of feeding 
it in through the initrd mechanism: the kernel behaves differently in 
those two codepaths, as I explained in the message you replied to.

(The above instructions assume an x86-64 host toolchain, poke me if you 
want arm64 instead...)

> but why are you trying to put actual device nodes in an actual
> filesystem as part of a build process?

I'm not. Doing that would require root access on the build machine to 
mknod in the "sub" directory above. I build new images WITHOUT root access 
on the host.

There used to be a way to feed the kernel config a text file listing 
what to make in the cpio file instead of just pointing it at a 
directory, and my old Aboriginal Linux build used that mechanism 
(generating such a file by hand, borrowing the kernel infrastructure but 
driving it manually) 15 years ago:

https://landley.net/aboriginal/about.html

https://github.com/landley/aboriginal/blob/master/sources/functions.sh#L403
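
For readers who haven't seen that mechanism, here is a minimal sketch of such a list file. It uses the dir/nod/file line syntax read by the kernel's usr/gen_init_cpio helper. The file name initramfs.list and the modes and ownership shown are illustrative assumptions, not copied from the Aboriginal scripts.

```shell
# Describe the archive contents as text, so no mknod (and no root) is needed.
# Line syntax understood by usr/gen_init_cpio:
#   dir  <name> <mode> <uid> <gid>
#   nod  <name> <mode> <uid> <gid> <dev_type> <major> <minor>
#   file <name> <source path> <mode> <uid> <gid>
cat > initramfs.list <<'EOF'
dir /dev 0755 0 0
nod /dev/console 0600 0 0 c 5 1
file /init sub/init 0755 0 0
EOF

# In a built kernel tree the helper writes a newc archive to stdout, so
# (as an assumption about your tree layout) something like this packs it:
#   usr/gen_init_cpio initramfs.list | gzip > sub.cpio.gz
```

The point of the text format is that the device node exists only as a line of text at build time. Nothing has to create a real character device on the build host.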

But kernel commit 469e87e89fd6 broke that mechanism because somebody 
dunning-krugered it away ("I don't understand why we need this therefore 
nobody needs it"). I had a patch to unbreak it for a while:

https://landley.net/bin/mkroot/0.8.10/linux-patches/0011-gen_init_cpio-regression.patch

But as with so many patches, lkml wasn't interested. (I mostly post them 
so when copyright trolls try to rattle sabers I can point to an lkml web 
archive entry that got ignored, and explain precisely HOW much bad PR 
they're in for when they proceed.)

And again: you ONLY need this for static initramfs. Dynamic initramfs 
has code to create /dev/console (at boot time, not build time):

https://github.com/torvalds/linux/blob/v6.16/init/noinitramfs.c#L27

That code ONLY gets called for the external initrd loader, it does NOT 
get called when a static initramfs image built into the kernel has a 
runnable /init. This is an inconsistency in the kernel behavior, which 
is what I'm objecting to.

> It's extremely straightforward
> to emit devices nodes in cpio format, and IMO it's far *more*
> straightforward to do that than to make a whole directory, try to get
> all the modes right, and cpio it up.

You mean like commit 595a22acee26 from 2017?

> I wrote an absolutely trivial tool for this several years ago:
> 
> https://github.com/amluto/virtme/blob/master/virtme/cpiowriter.py

Let's see, I wrote the initramfs documentation in 2005:

https://lwn.net/Articles/157676/

Was already correcting kernel developers on how it actually worked 
(rather than theoretically worked) in 2006:

https://lkml.iu.edu/hypermail//linux/kernel/0603.2/2760.html

I added tmpfs support to it in 2013 (because nobody else had bothered 
for EIGHT YEARS):

https://lkml.iu.edu/hypermail/linux/kernel/1306.3/04204.html

I've maintained my own cpio implementation in toybox for over a decade:

https://github.com/landley/toybox/commit/a2d558151a63

The successor to aboriginal (above) is a 400 line bash script that 
builds a dozen architectures that each boot to a shell prompt in qemu:

https://github.com/landley/toybox/blob/master/mkroot/mkroot.sh
https://landley.net/bin/mkroot/latest/

With automated regression test infrastructure to boot them all under 
qemu and confirm that it runs, the clocks are set right, the network 
works, and it can read from -hda:

https://github.com/landley/toybox/blob/master/mkroot/testroot.sh

So yes I _can_ create my own bespoke C program to modify the file in 
arbitrary ways, I have my reasons not to do that, and have thought about 
them for a while now.

> it would be barely more complicated to strip the trailer off an cpio
> file from some other source, add some device nodes, and stick the
> trailer back on.

So you're unaware that the kernel accepts concatenated archives, and you 
can just cat together two cpio.gz files and they'll extract. (In gzip 
anyway, I haven't tested the other compression formats. That's why I 
needed to do https://github.com/landley/toybox/commit/dafb9211c777 and 
95a15d238120 by the way.)
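
To make the concatenation point concrete, here is a rough sketch of emitting a tiny second archive with plain printf and no real device node. It assumes the newc layout: a "070701" magic followed by 13 eight-digit hex fields, then the NUL-terminated name, padded to a 4-byte boundary. The function name newc_entry and the file name console.cpio.gz are made up for illustration, and the resulting archive has not been boot-tested here.

```shell
# Emit one data-less newc entry: <name> <octal mode> <rdev major> <rdev minor>
# (assumes an ASCII name, so ${#name} equals its length in bytes)
newc_entry() {
  name=$1
  namesize=$(( ${#name} + 1 ))   # name length including the trailing NUL
  # magic, then: ino mode uid gid nlink mtime filesize
  #              devmajor devminor rdevmajor rdevminor namesize check
  printf '070701%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%s\0' \
    0 "$2" 0 0 1 0 0 0 0 "$3" "$4" "$namesize" 0 "$name"
  # pad header (110 bytes) plus name to a multiple of 4
  pad=$(( (4 - (110 + namesize) % 4) % 4 ))
  while [ "$pad" -gt 0 ]; do printf '\0'; pad=$((pad - 1)); done
}

{
  newc_entry dev 040755 0 0           # parent directory comes first
  newc_entry dev/console 020600 5 1   # character device 5:1, mode 0600
  newc_entry 'TRAILER!!!' 0 0 0       # end-of-archive marker
} | gzip > console.cpio.gz

# Appending it to an existing archive is then just concatenation, e.g.:
#   cat sub.cpio.gz console.cpio.gz > combined.cpio.gz
```

The leading zero on the mode arguments makes printf read them as octal, so 020600 is S_IFCHR plus 0600. This only helps the external -initrd path, since a built-in image is generated by the kernel build itself.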

The problem is there's no portable existing userspace tool to create a 
cpio archive from non-filesystem data. Partly because there WAS a 
mechanism built into the kernel... until that guy broke it in 2020. When 
I'm making a squashfs I've got the -p option (presumably modeled on what 
the kernel used to do before it broke), but the host cpio hasn't got a 
way to specify that and adding my own bespoke format to toybox... I'm 
still trying to get 
https://lists.gnu.org/archive/html/coreutils/2023-08/msg00009.html into 
coreutils. (Alas lkml isn't the only 30 year old community that's gotten 
stiff and hard of hearing.)

I could emit cpio contents with xxd -r from a HERE document hexdump or 
something to append to the generated file, but xxd isn't installed by 
default on debian and echo \x is WAY ugly, and "here's a giant hex dump 
you're not expected to understand" isn't really something I want to add 
to an otherwise understandable build. Writing, building, and running my 
own bespoke tool in C to do it isn't really an improvement over the hexdump.

The kernel ALMOST already does this. The code just needs to be 
refactored a bit, preferably so there aren't two codepaths each with 
half the testing.

> But it's also really, really, really easy to emit an
> entire, functioning cpio-formatted initramfs from plain user code with
> no filesystem manipulation at all.  This also makes that portion of
> the build reproducible, which is worth quite a bit IMO.

Sigh. When I started working on reproducible builds they weren't called 
that yet, but I don't think digging for more links would help here. I 
did do a rollup of what I'm trying to accomplish 5 years ago though 
http://lists.landley.net/pipermail/toybox-landley.net/2020-July/011898.html 
and long long ago, there was https://landley.net/aboriginal/history.html 
and...

Query: is your "plain user code" built with "cc"? Do you reliably have a 
"cc" link, or do you need to explicitly say "gcc" or "clang"? The kernel 
needs to do the latter for some reason, and my patch to GET the kernel 
to at least _try_ "cc" before falling back to the others was 
explicitly rejected...

> --Andy

Rob

