Message ID | 1448901241-30844-1-git-send-email-sw@simonwunderlich.de (mailing list archive) |
---|---|
State | Accepted, archived |
Headers |
Return-Path: <sw@simonwunderlich.de> Received: from mail.mail.packetmixer.de (packetmixer.de [79.140.42.25]) by open-mesh.org (Postfix) with ESMTPS id AA79E8058E for <b.a.t.m.a.n@lists.open-mesh.org>; Mon, 30 Nov 2015 17:34:04 +0100 (CET) Received: from kero.packetmixer.de (unknown [IPv6:2a02:3100:2600:2700:221:ccff:fe73:b665]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.mail.packetmixer.de (Postfix) with ESMTPSA id DCBA662159; Mon, 30 Nov 2015 17:34:04 +0100 (CET) From: Simon Wunderlich <sw@simonwunderlich.de> To: b.a.t.m.a.n@lists.open-mesh.org Date: Mon, 30 Nov 2015 17:34:01 +0100 Message-Id: <1448901241-30844-1-git-send-email-sw@simonwunderlich.de> X-Mailer: git-send-email 2.6.2 Cc: Simon Wunderlich <simon@open-mesh.com> Subject: [B.A.T.M.A.N.] [PATCH-maint] batman-adv: fix lockdep splat when doing mcast_free X-BeenThere: b.a.t.m.a.n@lists.open-mesh.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: The list for a Better Approach To Mobile Ad-hoc Networking <b.a.t.m.a.n.lists.open-mesh.org> List-Unsubscribe: <https://lists.open-mesh.org/mm/options/b.a.t.m.a.n>, <mailto:b.a.t.m.a.n-request@lists.open-mesh.org?subject=unsubscribe> List-Archive: <http://lists.open-mesh.org/pipermail/b.a.t.m.a.n/> List-Post: <mailto:b.a.t.m.a.n@lists.open-mesh.org> List-Help: <mailto:b.a.t.m.a.n-request@lists.open-mesh.org?subject=help> List-Subscribe: <https://lists.open-mesh.org/mm/listinfo/b.a.t.m.a.n>, <mailto:b.a.t.m.a.n-request@lists.open-mesh.org?subject=subscribe> X-List-Received-Date: Mon, 30 Nov 2015 16:34:04 -0000 |
Commit Message
Simon Wunderlich
Nov. 30, 2015, 4:34 p.m. UTC
From: Simon Wunderlich <simon@open-mesh.com> While testing, we got something like this: WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142 batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]() [...] Call Trace: [<ffffffff815fc597>] dump_stack+0x4b/0x64 [<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120 [<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] [<ffffffff810b3705>] warn_slowpath_null+0x15/0x20 [<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] [<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv] [<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv] [<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv] [...] Signed-off-by: Simon Wunderlich <simon@open-mesh.com> --- Changes to PATCH: * rebase on maint, add actual splat --- net/batman-adv/multicast.c | 2 ++ 1 file changed, 2 insertions(+)
Comments
On Mon, Nov 30, 2015 at 05:34:01PM +0100, Simon Wunderlich wrote: > diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c > index eb76386..75fa501 100644 > --- a/net/batman-adv/multicast.c > +++ b/net/batman-adv/multicast.c > @@ -802,7 +802,9 @@ void batadv_mcast_free(struct batadv_priv *bat_priv) > batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_MCAST, 1); > batadv_tvlv_handler_unregister(bat_priv, BATADV_TVLV_MCAST, 1); > > + spin_lock_bh(&bat_priv->tt.commit_lock); > batadv_mcast_mla_tt_retract(bat_priv, NULL); > + spin_unlock_bh(&bat_priv->tt.commit_lock); > } See comments in the other thread before applying.
> From: Simon Wunderlich <simon@open-mesh.com> > > While testing, we got something like this: > > WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142 batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]() > [...] > Call Trace: > [<ffffffff815fc597>] dump_stack+0x4b/0x64 > [<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120 > [<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] > [<ffffffff810b3705>] warn_slowpath_null+0x15/0x20 > [<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] > [<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv] > [<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv] > [<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv] > [...] > > Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Fixes: 5b95c427d187 ("batman-adv: Annotate deleting functions with external lock via lockdep") Acked-by: Linus Lüssing <linus.luessing@c0d3.blue>
On Saturday 19 December 2015 09:04:05 Linus Lüssing wrote: > > From: Simon Wunderlich <simon@open-mesh.com> > > > > While testing, we got something like this: > > > > WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142 batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]() > > [...] > > Call Trace: > > [<ffffffff815fc597>] dump_stack+0x4b/0x64 > > [<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120 > > [<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] > > [<ffffffff810b3705>] warn_slowpath_null+0x15/0x20 > > [<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] > > [<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv] > > [<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv] > > [<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv] > > [...] > > > > Signed-off-by: Simon Wunderlich <simon@open-mesh.com> > > Fixes: 5b95c427d187 ("batman-adv: Annotate deleting functions with external lock via lockdep") > Acked-by: Linus Lüssing <linus.luessing@c0d3.blue> I am confused why this should fix a commit which is there to find such problems. Kind regards, Sven
On 19/12/15 17:32, Sven Eckelmann wrote: > On Saturday 19 December 2015 09:04:05 Linus Lüssing wrote: >>> From: Simon Wunderlich <simon@open-mesh.com> >>> >>> While testing, we got something like this: >>> >>> WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142 batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]() >>> [...] >>> Call Trace: >>> [<ffffffff815fc597>] dump_stack+0x4b/0x64 >>> [<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120 >>> [<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] >>> [<ffffffff810b3705>] warn_slowpath_null+0x15/0x20 >>> [<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] >>> [<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv] >>> [<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv] >>> [<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv] >>> [...] >>> >>> Signed-off-by: Simon Wunderlich <simon@open-mesh.com> >> >> Fixes: 5b95c427d187 ("batman-adv: Annotate deleting functions with external lock via lockdep") >> Acked-by: Linus Lüssing <linus.luessing@c0d3.blue> > > I am confused why this should fix a commit which is there to find such problems. Am I wrong or also the commit message is misleading? Looks like it is "fixing" a bogus lockdep check...but as far as I can say this patch is making sure that the lock is held in that particular point. If we want to "fix the lockdep" then we should remove the lockdep_assert_held(), but I don't think this is the goal of the whole discussion. Cheers,
On Sat, Dec 19, 2015 at 07:09:02PM +0800, Antonio Quartulli wrote: > Am I wrong or also the commit message is misleading? Looks like it is > "fixing" a bogus lockdep check... Yes and no. The lockdep checks placed are not bogus as is. They are fine for batadv_iv_ogm_schedule() vs. batadv_update_mtu(), so no need to remove it. > but as far as I can say this patch is > making sure that the lock is held in that particular point. There is no locking issue from batadv_mcast_free() here as it is in the freeing phase of the soft-iface. The soft-iface is supposed to be out-of-scope for anything else by now. This patch therefore can't be a fix for a race regarding mcast.mla_list. However it does fix an issue with a lockdep warning. Which makes it suitable for stable to at least remove that splat for v4.3 users. Regards, Linus
On Saturday 19 December 2015 19:09:02 Antonio Quartulli wrote: > On 19/12/15 17:32, Sven Eckelmann wrote: > > On Saturday 19 December 2015 09:04:05 Linus Lüssing wrote: > >>> From: Simon Wunderlich <simon@open-mesh.com> > >>> > >>> While testing, we got something like this: > >>> > >>> WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142 > >>> batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]() [...] > >>> Call Trace: > >>> [<ffffffff815fc597>] dump_stack+0x4b/0x64 > >>> [<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120 > >>> [<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205 > >>> [batman_adv] > >>> [<ffffffff810b3705>] warn_slowpath_null+0x15/0x20 > >>> [<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] > >>> [<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv] > >>> [<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv] > >>> [<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv] > >>> [...] > >>> > >>> Signed-off-by: Simon Wunderlich <simon@open-mesh.com> > >> > >> Fixes: 5b95c427d187 ("batman-adv: Annotate deleting functions with > >> external lock via lockdep") Acked-by: Linus Lüssing > >> <linus.luessing@c0d3.blue> > > > > I am confused why this should fix a commit which is there to find such > > problems. > Am I wrong or also the commit message is misleading? Looks like it is > "fixing" a bogus lockdep check...but as far as I can say this patch is > making sure that the lock is held in that particular point. No, it doesn't say this anywhere. It says that it fixes the "lockdep splat" > If we want to "fix the lockdep" then we should remove the > lockdep_assert_held(), but I don't think this is the goal of the whole > discussion. No, the lockdep check should not be removed because a lock is necessary for other situations when this list is modified as explained earlier [1]. You have multiple concurrent calling contexts for this function (hard_if_event with something like mtu update + something like batadv_iv_ogm_schedule) which require exclusive access to this list. Right now it is just protected by a weird "named" lock (from the translation table and not from the mcast). And no, the solution here is not to add a special codepath which (for no sane reason) requires no locks or synchronized frees - even when this seemed to have become the new trend in the batman-adv codebase. Kind regards, Sven [1] https://lists.open-mesh.org/pipermail/b.a.t.m.a.n/2015-December/013876.html
On Monday, November 30, 2015 17:34:01 Simon Wunderlich wrote: > From: Simon Wunderlich <simon@open-mesh.com> > > While testing, we got something like this: > > WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142 > batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]() [...] > Call Trace: > [<ffffffff815fc597>] dump_stack+0x4b/0x64 > [<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120 > [<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] > [<ffffffff810b3705>] warn_slowpath_null+0x15/0x20 > [<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] > [<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv] > [<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv] > [<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv] > [...] > > Signed-off-by: Simon Wunderlich <simon@open-mesh.com> > --- > Changes to PATCH: > * rebase on maint, add actual splat > --- > net/batman-adv/multicast.c | 2 ++ > 1 file changed, 2 insertions(+) Applied in revision 025b743. Thanks, Marek
diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c index eb76386..75fa501 100644 --- a/net/batman-adv/multicast.c +++ b/net/batman-adv/multicast.c @@ -802,7 +802,9 @@ void batadv_mcast_free(struct batadv_priv *bat_priv) batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_MCAST, 1); batadv_tvlv_handler_unregister(bat_priv, BATADV_TVLV_MCAST, 1); + spin_lock_bh(&bat_priv->tt.commit_lock); batadv_mcast_mla_tt_retract(bat_priv, NULL); + spin_unlock_bh(&bat_priv->tt.commit_lock); } /**