gfs2.lists.linux.dev archive mirror
From: Alexander Aring <aahringo@redhat.com>
To: "Valentin Vidić" <vvidic@valentin-vidic.from.hr>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>,
	ocfs2-devel@oss.oracle.com,  David Teigland <teigland@redhat.com>,
	gfs2@lists.linux.dev, ocfs2-devel@lists.linux.dev
Subject: Re: ocfs2 mount error
Date: Mon, 11 Mar 2024 19:43:41 -0400
Message-ID: <CAK-6q+j2UDh0mi0KUTdtsRE8pj4+tuw_SU+yjzjsR3GCLURz=w@mail.gmail.com>
In-Reply-To: <Ze9bIiT3rAWTtPJp@valentin-vidic.from.hr>

Hi,

On Mon, Mar 11, 2024 at 3:27 PM Valentin Vidić
<vvidic@valentin-vidic.from.hr> wrote:
>
> On Mon, Mar 11, 2024 at 11:20:56AM -0400, Alexander Aring wrote:
> > when I try doing your steps (btw: thanks for sharing) to run
> > fsck.ocfs2 it tells me:
> >
> > fsck.ocfs2: Unable to access cluster service while initializing the DLM
> >
> > Looking into strace right before that I see a:
> >
> > connect(4, {sa_family=AF_UNIX, sun_path=@"ocfs2_controld_sock"}, 22) =
> > -1 ECONNREFUSED (Connection refused)
> >
> > it's true I don't have ocfs2_controld running, after looking into
> > ocfs2-tools source code, I declare it as not compilable with recent
> > corosync versions.
>
> Not sure about that, because I also don't have ocfs2_controld running.
> This is ocfs2-tools v1.8.8 with corosync v3.1.7. Perhaps cluster stack
> should be set with:
>
>   echo pcmk > /sys/fs/ocfs2/cluster_stack
>

That got me a step further, but I am still not able to set a cluster
name. I hardcoded it in the kernel and that seems to work. How do you
set the cluster name? Setting it in mkfs does not seem to be the way
to go.
I am now able to run fsck, and "dlm_tool ls" shows a lockspace with a
name similar to yours.

The bad news is that I can't reproduce the problem: after fsck, the
lockspace is released successfully.

I see in your debug information that fsck uses DLM user space locks,
whereas in my case it uses DLM kernel locks, and I am wondering why.
Can you tell me more about your cluster configuration?

Somehow, in your case fsck uses the DLM user space lock functionality
from libdlm (the user space DLM locking library), but in my case fsck
goes through the ocfs2_fsdlm filesystem and then takes DLM kernel
locks.
I am wondering how this is supposed to work.

> > btw: a "trace-cmd stream -e dlm" would be nice to debug this issue.
> > After I look at your debugging information it would be nice to see if
> > the device() read for ast callbacks are there.
>
> Here is a trace report for the fsck call:
>
> cpus=1
>       fsck.ocfs2-1696  [000]    54.621075: dlm_lock_start:       ls_id=2352585329 lkb_id=1 mode=PR flags=VALBLK res_name=2e39323539656662313866316162386331
>       fsck.ocfs2-1696  [000]    54.692487: dlm_lock_end:         ls_id=2352585329 lkb_id=1 mode=PR flags=VALBLK error=0 res_name=2e39323539656662313866316162386331
>       fsck.ocfs2-1696  [000]    54.695447: dlm_ast:              ls_id=2352585329 lkb_id=1 sb_flags= sb_status=0 res_name=2e39323539656662313866316162386331
>       fsck.ocfs2-1696  [000]    54.695636: dlm_lock_start:       ls_id=2352585329 lkb_id=2 mode=EX flags=NOQUEUE|VALBLK res_name=53303030303030303030303030303030303030303030323030303030303030
>       fsck.ocfs2-1696  [000]    54.766959: dlm_lock_end:         ls_id=2352585329 lkb_id=2 mode=EX flags=NOQUEUE|VALBLK error=0 res_name=53303030303030303030303030303030303030303030323030303030303030
>       fsck.ocfs2-1696  [000]    54.770765: dlm_ast:              ls_id=2352585329 lkb_id=2 sb_flags= sb_status=0 res_name=53303030303030303030303030303030303030303030323030303030303030
>       fsck.ocfs2-1696  [000]    54.770821: dlm_lock_start:       ls_id=2352585329 lkb_id=3 mode=EX flags=NOQUEUE|VALBLK res_name=4d303030303030303030303030303030303030303031383937333737316531
>       fsck.ocfs2-1696  [000]    54.845613: dlm_lock_end:         ls_id=2352585329 lkb_id=3 mode=EX flags=NOQUEUE|VALBLK error=0 res_name=4d303030303030303030303030303030303030303031383937333737316531
>       fsck.ocfs2-1696  [000]    54.848395: dlm_ast:              ls_id=2352585329 lkb_id=3 sb_flags= sb_status=0 res_name=4d303030303030303030303030303030303030303031383937333737316531
>       fsck.ocfs2-1696  [000]    54.868951: dlm_unlock_start:     ls_id=2352585329 lkb_id=3 flags=VALBLK res_name=4d303030303030303030303030303030303030303031383937333737316531
>       fsck.ocfs2-1696  [000]    54.958319: dlm_unlock_end:       ls_id=2352585329 lkb_id=3 flags=VALBLK error=0 res_name=4d303030303030303030303030303030303030303031383937333737316531
>       fsck.ocfs2-1696  [000]    54.982397: dlm_ast:              ls_id=2352585329 lkb_id=3 sb_flags= sb_status=-65538 res_name=4d303030303030303030303030303030303030303031383937333737316531
>       fsck.ocfs2-1696  [000]    54.982450: dlm_lock_start:       ls_id=2352585329 lkb_id=4 mode=EX flags=NOQUEUE|VALBLK res_name=4d303030303030303030303030303030303030303031393937333737316531
>       fsck.ocfs2-1696  [000]    55.056390: dlm_lock_end:         ls_id=2352585329 lkb_id=4 mode=EX flags=NOQUEUE|VALBLK error=0 res_name=4d303030303030303030303030303030303030303031393937333737316531
>       fsck.ocfs2-1696  [000]    55.058946: dlm_ast:              ls_id=2352585329 lkb_id=4 sb_flags= sb_status=0 res_name=4d303030303030303030303030303030303030303031393937333737316531
>       fsck.ocfs2-1696  [000]    55.082546: dlm_unlock_start:     ls_id=2352585329 lkb_id=4 flags=VALBLK res_name=4d303030303030303030303030303030303030303031393937333737316531
>       fsck.ocfs2-1696  [000]    55.182903: dlm_unlock_end:       ls_id=2352585329 lkb_id=4 flags=VALBLK error=0 res_name=4d303030303030303030303030303030303030303031393937333737316531
>       fsck.ocfs2-1696  [000]    55.210128: dlm_ast:              ls_id=2352585329 lkb_id=4 sb_flags= sb_status=-65538 res_name=4d303030303030303030303030303030303030303031393937333737316531
>       fsck.ocfs2-1696  [000]    55.530687: dlm_unlock_start:     ls_id=2352585329 lkb_id=2 flags=VALBLK res_name=53303030303030303030303030303030303030303030323030303030303030
>       fsck.ocfs2-1696  [000]    55.625015: dlm_unlock_end:       ls_id=2352585329 lkb_id=2 flags=VALBLK error=0 res_name=53303030303030303030303030303030303030303030323030303030303030
>       fsck.ocfs2-1696  [000]    55.650054: dlm_ast:              ls_id=2352585329 lkb_id=2 sb_flags= sb_status=-65538 res_name=53303030303030303030303030303030303030303030323030303030303030
>       fsck.ocfs2-1696  [000]    55.672033: dlm_unlock_start:     ls_id=2352585329 lkb_id=1 flags=VALBLK res_name=2e39323539656662313866316162386331
>       fsck.ocfs2-1696  [000]    55.766844: dlm_unlock_end:       ls_id=2352585329 lkb_id=1 flags=VALBLK error=0 res_name=2e39323539656662313866316162386331
>       fsck.ocfs2-1696  [000]    55.791829: dlm_ast:              ls_id=2352585329 lkb_id=1 sb_flags= sb_status=-65538 res_name=2e39323539656662313866316162386331
>

Looks okay to me. For DLM user locks there is currently an issue with
the lvbarea, see [0]. I am not sure whether this is the cause of your
problems; that depends on what fsck does with the LVB.
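
For anyone reading the trace above, here is a minimal Python sketch of how
such lines can be picked apart. It is not part of trace-cmd or dlm_tool;
the helper names (parse_event, decode_res_name, describe_ast_status) are
made up for illustration. It assumes that res_name is the resource name
printed as hex bytes and that the name is ASCII, which holds for the lines
quoted here. The DLM_EUNLOCK value (0x10002) is taken from the kernel's
uapi dlmconstants.h; an unlock completion reports it negated, which is
where sb_status=-65538 comes from.

```python
import re

# DLM_EUNLOCK as defined in include/uapi/linux/dlmconstants.h.
# An AST that completes an unlock carries sb_status == -DLM_EUNLOCK.
DLM_EUNLOCK = 0x10002

EVENT_RE = re.compile(r"\s(dlm_\w+):")   # e.g. " dlm_ast:"
FIELD_RE = re.compile(r"(\w+)=(\S*)")    # key=value, value may be empty


def parse_event(line):
    """Return (event_name, fields) for one trace line, or None."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    fields = dict(FIELD_RE.findall(line[match.end():]))
    return match.group(1), fields


def decode_res_name(hex_name):
    """Turn the hex dump of a resource name back into text (assumes ASCII)."""
    return bytes.fromhex(hex_name).decode("ascii", errors="replace")


def describe_ast_status(sb_status):
    """Describe the sb_status values that appear in the trace above."""
    if sb_status == 0:
        return "request completed"
    if sb_status == -DLM_EUNLOCK:
        return "unlock completed"
    return "unexpected status %d" % sb_status


if __name__ == "__main__":
    line = ("fsck.ocfs2-1696  [000]    55.791829: dlm_ast:  "
            "ls_id=2352585329 lkb_id=1 sb_flags= sb_status=-65538 "
            "res_name=2e39323539656662313866316162386331")
    event, fields = parse_event(line)
    print(event,
          decode_res_name(fields["res_name"]),
          describe_ast_status(int(fields["sb_status"])))
    # prints: dlm_ast .9259efb18f1ab8c1 unlock completed
```

Read this way, every lock in the trace is followed by an AST with status 0
and every unlock by an AST with -DLM_EUNLOCK, which is why the sequence
itself looks normal.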

- Alex

[0] https://lore.kernel.org/gfs2/CAK-6q+h-RngJqJ6Rx2h4a0Qy1j+ZRvCn5E-3Qee7_P3pWXJonQ@mail.gmail.com/T/#ma861c86ab54991a0d9a2742c6983544289f3bdd0

