All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Martin Wilck <martin.wilck@suse.com>
To: "bmarzins@redhat.com" <bmarzins@redhat.com>
Cc: "dm-devel@redhat.com" <dm-devel@redhat.com>
Subject: Re: [dm-devel] [PATCH 3/5] multipathd: make ev_remove_path return success on path removal
Date: Wed, 12 May 2021 20:36:49 +0000	[thread overview]
Message-ID: <7e6fb44f90e6088f53a41396a7e210cd3009d469.camel@suse.com> (raw)
In-Reply-To: <20210512195305.GE25887@octiron.msp.redhat.com>

On Wed, 2021-05-12 at 14:53 -0500, Benjamin Marzinski wrote:
> On Wed, May 12, 2021 at 11:38:08AM +0000, Martin Wilck wrote:
> > On Tue, 2021-05-11 at 18:22 -0500, Benjamin Marzinski wrote:
> > > When ev_remove_path() returns success, callers assume that the
> > > path
> > > (and
> > > possibly the map) has been removed.  When ev_remove_path()
> > > returns
> > > failure, callers assume that the path has not been removed.
> > > However,
> > > the
> > > path could be removed on both success or failure. This could
> > > cause
> > > callers to dereference the path after it was removed. Change
> > > ev_remove_path() to return success whenever the path is removed,
> > > even
> > > if
> > > the map was removed due to a failure when trying to reload it.
> > > Found by
> > > coverity.
> > > 
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > 
> > This looks ok, but I'd like to discuss it in some more depth.
> > 
> > We need to clarify the meaning of the return code of
> > ev_remove_path().
> > We guarantee that, when ev_remove_path() returns, either the path
> > is
> > removed from internal data structures and kernel maps, or
> > INIT_REMOVED
> > is set. We can't guarantee whether the path
> > 
> >  a) is not referenced any more by any kernel map,
> >  b) was actually removed from pathvec and other data structures in
> > multipathd.
> > 
> > We have to agree on whether it means a) or b) if we can't make
> > these
> > two cases equivalent. Assuming multipathd has correct information
> > about
> > the kernel mappings, the only difference between a) and b) is the
> > unlikely failure in setup_multipath(), where a) is true and b) is
> > not
> > (because sync_map_state() won't be called). Your patch assumes the
> > semantics of a), which is correct AFAICS, but should be more
> > clearly
> > documented.
> 
> Well, actually because of wait_for_udev and !need_do_map, a
> successful
> return can still leave the kernel maps and internal structures
> unchanged. It's just that callers have to assume that b is the case.
>  
> > Actually, I think we can fix the discrepancy between a) and b) - if
> > domap() was successful, we should be able to orphan the path, even
> > if
> > update_multipath_strings() failed for some reason.
> 
> I'm pretty sure that this is already the case.  This comment
> 
>  /*
>   * Successful map reload without this path:
>   * sync_map_state() will free it.
>   */
> 
> is a lie.

Indeed, you are right. I wasn't deliberately lying though, just failing
to understand my own code :-(  We should fix these comments.

>  If setup_multipath() succeeds, the path will get removed by
> check_removed_paths() via:
> 
> __setup_multipath -> update_path_strings -> sync_paths ->
> check_removed_paths
> 
> If setup_multipath() fails, the path will get removed by
> remove_map_and_stop_waiter() via:
> 
> __setup_multipath -> remove_map_and_stop_waiter -> remove_map ->
> orphan_paths
> 
> So AFAICS, the only way for a path not to get removed is if you
> succeed
> with wait_for_udev or !need_do_map, or if you fail in domap.

Agreed. Let's fix these comments.

> > IMO we should consequently change the retval for the cases where
> > ev_remove_path() returns without deleting the path for a non-
> > "failure"
> > case (wait_for_udev and !need_do_map).
> 
> So you think these should return failure? 

What I meant is that the return code of the function doesn't need to be
interpreted in terms of "success" or "failure". Rather as "path is now
gone" or "path is still referenced somewhere", which doesn't map 1:1 to
"success" and "failure", IMO.

> For need_do_map, I think we
> would want to distinguish between cases where everything worked
> correctly and we're just waiting to update the map, and cases where
> something went wrong.

This one is actually trivial, because it's only set to false by
uev_remove_path() if it's merging uevents.

>  Since wait_for_udev can happen in more situations,
> it's a lot harder to say what the right answer is. For cli_add_path
> and
> uev_add_path, it seems like we want to know if the path was really
> removed. So returning failure there makes sense.  For cli_del_path
> and
> uev_remove_path, it seems like we want to avoid spurious error
> messages
> when everything went alright and we're just waiting to update the
> map.
> So returning success makes sense there.
> 
> Perhaps the answer is to return symbolic values, to describe what
> actually happened, rather than success or failure.

This is what I meant. I didn't express myself clearly enough; I just
thought that 0 doesn't have to mean "success".


>  They could either be
> bitmask values or we could have helper functions to help checking
> for multiple valid return values.

I think the callers just need to know if the path is still referenced
somewhere. Acting appropriately is then up to the caller. You just
proved that my cases a) and b) are actually equivalent, which is nice.
Perhaps we need to introduce another return code indicating that the
entire map had been removed (e.g. failure in setup_multipath()).

> > Thoughts? Whatever we decide wrt the semantics of the return code,
> > we
> > should document it clearly in comments at the function header.
> > 
> > Here's a quick review of callers:
> > 
> >  - cli_add_path(): AFAICS this needs b) semantics. We shouldn't set
> > up
> > a new path unless it had been successfully removed internally.
> >  - cli_del_path(): needs a) semantics.
> >  - handle_path_wwid_change(): needs a).
> >  - uev_add_path(): needs a).
> >  - uev_remove_path(): the return code of ev_remove_path doesn't
> > matter
> > much here. This is the only caller that sets need_do_map = false.
> >  - uev_update_path(): we currently don't look at the return code.
> > uev_add_path() will make another attempt at removing the path if
> > necessary.
> > 
> > Regards
> > Martin
> > 
> > P.S.: The remaining failure cases in ev_remove_path() are the
> > failures
> > in update_mpp_paths() and setup_map(). The former can only fail
> > with
> > ENOMEM, afaics. The latter, likewise, or if the map is
> > fundamentally
> > misconfigured (which to me means that a previous call to
> > setup_map()
> > would have failed as well). I'm wondering why we remove the entire
> > map
> > in these failure cases. This goes back all the way to 45eb316 
> > ("[multipathd] DM configuration final cut") from 2005. It's true
> > that
> > both failures are pretty much fatal, but why is removing the map
> > the
> > answer here?
> 
> I don't think it has to be the answer. There are other cases where
> setup_map() fails and we don't automatically wipe the map.  I did
> consider changing it when I was looking through ev_remove_path(), but
> I've never known this code to cause any issues, and as you mention,
> it isn't wrong to do so, just not really necessary AFAICS.

Let's take care of this another time.

Regards,
Martin

> 
> > However, this goes beyond the purpose of your patch. *If* we remove
> > the
> > map, returning 0 is correct for either a) or b).
> > 
> > P.S. 2: I wonder if the logic in uev_update_path() is correct.
> > Rather
> > than calling uev_add_path() after rescan_path() directly, I think
> > we
> > should rather wait for another uevent (and possibly trigger another
> > "add" event, I don't think "rescan" automatically generates one).
> > 
> 
> Yep. You're correct. I'll fix that.
> 
> -Ben
> 
> > 
> > > ---
> > >  multipathd/main.c | 6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/multipathd/main.c b/multipathd/main.c
> > > index 6090434c..4bdf14bd 100644
> > > --- a/multipathd/main.c
> > > +++ b/multipathd/main.c
> > > @@ -1284,7 +1284,7 @@ ev_remove_path (struct path *pp, struct
> > > vectors *
> > > vecs, int need_do_map)
> > >  
> > >                         strlcpy(devt, pp->dev_t, sizeof(devt));
> > >                         if (setup_multipath(vecs, mpp))
> > > -                               return 1;
> > > +                               return 0;
> > >                         /*
> > >                          * Successful map reload without this
> > > path:
> > >                          * sync_map_state() will free it.
> > > @@ -1304,8 +1304,10 @@ out:
> > >         return retval;
> > >  
> > >  fail:
> > > +       condlog(0, "%s: error removing path. removing map %s",
> > > pp->dev,
> > > +               mpp->alias);
> > >         remove_map_and_stop_waiter(mpp, vecs);
> > > -       return 1;
> > > +       return 0;
> > >  }
> > >  
> > >  static int
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://listman.redhat.com/mailman/listinfo/dm-devel
> 


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


  reply	other threads:[~2021-05-12 20:37 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-11 23:21 [dm-devel] [PATCH 0/5] Memory issues found by coverity Benjamin Marzinski
2021-05-11 23:22 ` [dm-devel] [PATCH 1/5] multipathd: don't fail to remove path once the map is removed Benjamin Marzinski
2021-05-12  9:11   ` Martin Wilck
2021-05-12 14:17     ` Benjamin Marzinski
2021-05-11 23:22 ` [dm-devel] [PATCH 2/5] multipathd: remove duplicate orphan_paths in flush_map Benjamin Marzinski
2021-05-12  9:16   ` Martin Wilck
2021-05-11 23:22 ` [dm-devel] [PATCH 3/5] multipathd: make ev_remove_path return success on path removal Benjamin Marzinski
2021-05-12 11:38   ` Martin Wilck
2021-05-12 19:53     ` Benjamin Marzinski
2021-05-12 20:36       ` Martin Wilck [this message]
2021-05-12 21:52         ` Benjamin Marzinski
2021-05-13 19:36           ` Martin Wilck
2021-05-11 23:22 ` [dm-devel] [PATCH 4/5] multipath: free vectors in configure Benjamin Marzinski
2021-05-12 12:36   ` Martin Wilck
2021-05-12 19:53     ` Benjamin Marzinski
2021-05-11 23:22 ` [dm-devel] [PATCH 5/5] kpartx: Don't leak memory when getblock returns NULL Benjamin Marzinski
2021-05-12 12:39   ` Martin Wilck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7e6fb44f90e6088f53a41396a7e210cd3009d469.camel@suse.com \
    --to=martin.wilck@suse.com \
    --cc=bmarzins@redhat.com \
    --cc=dm-devel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.