gfs2.lists.linux.dev archive mirror
From: Salvatore Bonaccorso <carnil@debian.org>
To: Alexander Aring <aahringo@redhat.com>
Cc: Jordan Rife <jrife@google.com>,
	Valentin Kleibel <valentin@vrvis.at>,
	David Teigland <teigland@redhat.com>,
	1063338@bugs.debian.org, gfs2@lists.linux.dev,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	gregkh@linuxfoundation.org, regressions@lists.linux.dev
Subject: Re: [regression 6.1.76] dlm: cannot start dlm midcomms -97 after backport of e9cdebbe23f1 ("dlm: use kernel_connect() and kernel_bind()")
Date: Wed, 7 Feb 2024 22:39:09 +0100
Message-ID: <ZcP4fXEllcCDHyE6@eldamar.lan>
In-Reply-To: <CAK-6q+hza9yXb5KpBS2VJMNHJa805nXqiYPTovnf9G-JFadBsg@mail.gmail.com>

Hi Alexander,

On Wed, Feb 07, 2024 at 04:27:48PM -0500, Alexander Aring wrote:
> Hi,
> 
> On Wed, Feb 7, 2024 at 1:33 PM Jordan Rife <jrife@google.com> wrote:
> >
> > On Wed, Feb 7, 2024 at 2:39 AM Salvatore Bonaccorso <carnil@debian.org> wrote:
> > >
> > > Hi Valentin, hi all
> > >
> > > [This is about a regression reported in Debian for 6.1.67]
> > >
> > > On Tue, Feb 06, 2024 at 01:00:11PM +0100, Valentin Kleibel wrote:
> > > > Package: linux-image-amd64
> > > > Version: 6.1.76+1
> > > > Source: linux
> > > > Source-Version: 6.1.76+1
> > > > Severity: important
> > > > Control: notfound -1 6.6.15-2
> > > >
> > > > Dear Maintainers,
> > > >
> > > > We discovered a bug affecting dlm that prevents any tcp communications by
> > > > dlm when booted with debian kernel 6.1.76-1.
> > > >
> > > > Dlm startup works (corosync-cpgtool shows the dlm:controld group with all
> > > > expected nodes) but as soon as we try to add a lockspace dmesg shows:
> > > > ```
> > > > dlm: Using TCP for communications
> > > > dlm: cannot start dlm midcomms -97
> > > > ```
> > > >
> > > > It seems that commit "dlm: use kernel_connect() and kernel_bind()"
> > > > (e9cdebbe) was merged to 6.1.
> > > >
> > > > Checking the code, it seems that the changed function dlm_tcp_listen_bind()
> > > > fails with error code 97 (EAFNOSUPPORT).
> > > > It is called via:
> > > >
> > > > dlm/lockspace.c: threads_start() -> dlm_midcomms_start()
> > > > dlm/midcomms.c: dlm_midcomms_start() -> dlm_lowcomms_start()
> > > > dlm/lowcomms.c: dlm_lowcomms_start() -> dlm_listen_for_all() ->
> > > > dlm_proto_ops->listen_bind() = dlm_tcp_listen_bind()
> > > >
> > > > The error code is returned all the way to threads_start() where the error
> > > > message is emitted.
> > > >
> > > > Booting with the unsigned kernel from testing (6.6.15-2), which also
> > > > contains this commit, works without issues.
> > > >
> > > > I'm not sure what additional changes are required to get this working or if
> > > > rolling back this change is an option.
> > > >
> > > > We'd be happy to test patches that might fix this issue.
> > >
> > > Thanks for your report. So we have a 6.1.76 specific regression for
> > > the backport of e9cdebbe23f1 ("dlm: use kernel_connect() and
> > > kernel_bind()").
> > >
> > > Let's loop in the upstream regression list for tracking and people
> > > involved for the subsystem to see if the issue can be identified. As
> > > it is working for 6.6.15, which includes the commit backport as well, it
> > > may well be that a prerequisite is missing.
> > >
> > > # annotate regression with 6.1.y specific commit
> > > #regzbot ^introduced e11dea8f503341507018b60906c4a9e7332f3663
> > > #regzbot link: https://bugs.debian.org/1063338
> > >
> > > Any ideas?
> > >
> > > Regards,
> > > Salvatore
> >
> >
> > Just a quick look comparing dlm_tcp_listen_bind between the latest 6.1
> > and 6.6 stable branches,
> > it looks like there is a mismatch here with the dlm_local_addr[0] parameter.
> >
> > 6.1
> > ----
> >
> > static int dlm_tcp_listen_bind(struct socket *sock)
> > {
> >     int addr_len;
> >
> >     /* Bind to our port */
> >     make_sockaddr(dlm_local_addr[0], dlm_config.ci_tcp_port, &addr_len);
> >     return kernel_bind(sock, (struct sockaddr *)&dlm_local_addr[0],
> >                        addr_len);
> > }
> >
> > 6.6
> > ----
> > static int dlm_tcp_listen_bind(struct socket *sock)
> > {
> >     int addr_len;
> >
> >     /* Bind to our port */
> >     make_sockaddr(&dlm_local_addr[0], dlm_config.ci_tcp_port, &addr_len);
> >     return kernel_bind(sock, (struct sockaddr *)&dlm_local_addr[0],
> >                        addr_len);
> > }
> >
> > 6.6 contains commit c51c9cd8 (fs: dlm: don't put dlm_local_addrs on heap) which
> > changed
> >
> > static struct sockaddr_storage *dlm_local_addr[DLM_MAX_ADDR_COUNT];
> >
> > to
> >
> > static struct sockaddr_storage dlm_local_addr[DLM_MAX_ADDR_COUNT];
> >
> > It looks like the kernel_bind() call in 6.1 needs to be modified to match
> > the 6.1 declaration of dlm_local_addr.
> >
> 
> Makes sense. I tried to cherry-pick e9cdebbe23f1 ("dlm: use
> kernel_connect() and kernel_bind()") on v6.1.67 as I don't see it
> there. It failed and does not apply cleanly.
> 
> Are we talking here about a Debian-specific backport? If so,
> maybe somebody forgot to modify the parts you mentioned.

Thanks all for looking into it.

No, it's not a Debian-specific backport. e9cdebbe23f1 ("dlm: use
kernel_connect() and kernel_bind()") was in fact backported upstream
to 6.1.76, 6.6.15 and 6.7.3. The respective commits are:

v6.1.76: e11dea8f503341507018b60906c4a9e7332f3663 dlm: use kernel_connect() and kernel_bind()
v6.6.15: c018ab3e31b16ff97b9b95b69904104c9fcca95b dlm: use kernel_connect() and kernel_bind()
v6.7.3: 4ecf1864f2076872b7aea29d463e785ef6fc9909 dlm: use kernel_connect() and kernel_bind()
v6.8-rc1: e9cdebbe23f1aa9a1caea169862f479ab3fa2773 dlm: use kernel_connect() and kernel_bind()

But for the 6.1.76 case there is the above regression (while it works
for 6.6.15 as confirmed by the reporter).
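
To illustrate the mismatch Jordan points out, here is a minimal
userspace sketch (not kernel code). It assumes the 6.1 layout, where
dlm_local_addr[] holds pointers, and shows what a family check sees
when it is handed the address of the pointer slot instead of the
pointer itself. All demo_* names are hypothetical stand-ins, and
demo_bind() only imitates an address-family check; it is not the
kernel's bind path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define DEMO_MAX_ADDR 3 /* hypothetical stand-in for DLM_MAX_ADDR_COUNT */

/* 6.1-style layout: each slot is a pointer to a heap-allocated address. */
static struct sockaddr_storage *demo_local_addr[DEMO_MAX_ADDR];

/*
 * Hypothetical stand-in for an address-family check on bind.
 * The family is read with memcpy() so the demo stays well-defined even
 * when `addr` does not really point at a socket address.
 */
int demo_bind(const void *addr)
{
    sa_family_t family;

    memcpy(&family, addr, sizeof(family));
    if (family != AF_INET && family != AF_INET6)
        return -EAFNOSUPPORT;
    return 0;
}

/* Fill slot 0 with an IPv4 address, as a configured node address would be. */
void demo_setup(void)
{
    struct sockaddr_in *sin;

    if (demo_local_addr[0])
        return;
    demo_local_addr[0] = calloc(1, sizeof(*demo_local_addr[0]));
    assert(demo_local_addr[0] != NULL);
    sin = (struct sockaddr_in *)demo_local_addr[0];
    sin->sin_family = AF_INET;
}

/* Mirrors the 6.1.76 call: passes the address OF the pointer slot. */
int demo_bind_as_backported(void)
{
    demo_setup();
    return demo_bind(&demo_local_addr[0]);
}

/* Passes the pointer itself, i.e. the address data the slot points to. */
int demo_bind_pointer_value(void)
{
    demo_setup();
    return demo_bind(demo_local_addr[0]);
}
```

With the pointer layout, demo_bind_as_backported() reads the first
bytes of a heap pointer as the address family, so on a typical Linux
system it should fail with -EAFNOSUPPORT, while
demo_bind_pointer_value() sees AF_INET. By analogy, the 6.1 call would
presumably need to pass dlm_local_addr[0] rather than
&dlm_local_addr[0]; that is an inference from the comparison above, not
a tested patch.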

I'm very sorry, I see where I caused the confusion: the regression
is in 6.1.*76*, not 6.1.*67*, and I mistyped the version in two places.

Regards,
Salvatore
