[maint] batman-adv: fix lockdep splat when doing mcast_free

Message ID 1448901241-30844-1-git-send-email-sw@simonwunderlich.de (mailing list archive)
State Accepted, archived
Headers

Commit Message

Simon Wunderlich Nov. 30, 2015, 4:34 p.m. UTC
  From: Simon Wunderlich <simon@open-mesh.com>

While testing, we got something like this:

WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142 batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]()
[...]
Call Trace:
[<ffffffff815fc597>] dump_stack+0x4b/0x64
[<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120
[<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]
[<ffffffff810b3705>] warn_slowpath_null+0x15/0x20
[<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]
[<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv]
[<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv]
[<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv]
[...]

Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
---
Changes to PATCH:
 * rebase on maint, add actual splat
---
 net/batman-adv/multicast.c | 2 ++
 1 file changed, 2 insertions(+)
  

Comments

Linus Lüssing Dec. 7, 2015, 10:39 p.m. UTC | #1
On Mon, Nov 30, 2015 at 05:34:01PM +0100, Simon Wunderlich wrote:
> diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c
> index eb76386..75fa501 100644
> --- a/net/batman-adv/multicast.c
> +++ b/net/batman-adv/multicast.c
> @@ -802,7 +802,9 @@ void batadv_mcast_free(struct batadv_priv *bat_priv)
>  	batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_MCAST, 1);
>  	batadv_tvlv_handler_unregister(bat_priv, BATADV_TVLV_MCAST, 1);
>  
> +	spin_lock_bh(&bat_priv->tt.commit_lock);
>  	batadv_mcast_mla_tt_retract(bat_priv, NULL);
> +	spin_unlock_bh(&bat_priv->tt.commit_lock);
>  }

See comments in the other thread before applying.
  
Linus Lüssing Dec. 19, 2015, 8:04 a.m. UTC | #2
> From: Simon Wunderlich <simon@open-mesh.com>
> 
> While testing, we got something like this:
> 
> WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142 batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]()
> [...]
> Call Trace:
> [<ffffffff815fc597>] dump_stack+0x4b/0x64
> [<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120
> [<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]
> [<ffffffff810b3705>] warn_slowpath_null+0x15/0x20
> [<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]
> [<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv]
> [<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv]
> [<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv]
> [...]
> 
> Signed-off-by: Simon Wunderlich <simon@open-mesh.com>

Fixes: 5b95c427d187 ("batman-adv: Annotate deleting functions with external lock via lockdep")
Acked-by: Linus Lüssing <linus.luessing@c0d3.blue>
  
Sven Eckelmann Dec. 19, 2015, 9:32 a.m. UTC | #3
On Saturday 19 December 2015 09:04:05 Linus Lüssing wrote:
> > From: Simon Wunderlich <simon@open-mesh.com>
> > 
> > While testing, we got something like this:
> > 
> > WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142 batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]()
> > [...]
> > Call Trace:
> > [<ffffffff815fc597>] dump_stack+0x4b/0x64
> > [<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120
> > [<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]
> > [<ffffffff810b3705>] warn_slowpath_null+0x15/0x20
> > [<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]
> > [<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv]
> > [<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv]
> > [<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv]
> > [...]
> > 
> > Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
> 
> Fixes: 5b95c427d187 ("batman-adv: Annotate deleting functions with external lock via lockdep")
> Acked-by: Linus Lüssing <linus.luessing@c0d3.blue>

I am confused why this should fix a commit which is there to find such problems.

Kind regards,
	Sven
  
Antonio Quartulli Dec. 19, 2015, 11:09 a.m. UTC | #4
On 19/12/15 17:32, Sven Eckelmann wrote:
> On Saturday 19 December 2015 09:04:05 Linus Lüssing wrote:
>>> From: Simon Wunderlich <simon@open-mesh.com>
>>>
>>> While testing, we got something like this:
>>>
>>> WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142 batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]()
>>> [...]
>>> Call Trace:
>>> [<ffffffff815fc597>] dump_stack+0x4b/0x64
>>> [<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120
>>> [<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]
>>> [<ffffffff810b3705>] warn_slowpath_null+0x15/0x20
>>> [<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]
>>> [<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv]
>>> [<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv]
>>> [<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv]
>>> [...]
>>>
>>> Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
>>
>> Fixes: 5b95c427d187 ("batman-adv: Annotate deleting functions with external lock via lockdep")
>> Acked-by: Linus Lüssing <linus.luessing@c0d3.blue>
> 
> I am confused why this should fix a commit which is there to find such problems.

Am I wrong or also the commit message is misleading? Looks like it is
"fixing" a bogus lockdep check...but as far as I can say this patch is
making sure that the lock is held in that particular point.

If we want to "fix the lockdep" then we should remove the
lockdep_assert_held(), but I don't think this is the goal of the whole
discussion.

Cheers,
  
Linus Lüssing Dec. 19, 2015, 10:14 p.m. UTC | #5
On Sat, Dec 19, 2015 at 07:09:02PM +0800, Antonio Quartulli wrote:
> Am I wrong or also the commit message is misleading? Looks like it is
> "fixing" a bogus lockdep check...

Yes and no. The lockdep checks placed are not bogus as is. They
are fine for batadv_iv_ogm_schedule() vs. batadv_update_mtu(), so
no need to remove it.

> but as far as I can say this patch is
> making sure that the lock is held in that particular point.

There is no locking issue from batadv_mcast_free() here as it is
in the freeing phase of the soft-iface. The soft-iface is supposed
to be out-of-scope for anything else by now. This patch therefore
can't be a fix for a race regarding mcast.mla_list.

However it does fix an issue with a lockdep warning. Which
makes it suitable for stable to at least remove that splat for
v4.3 users.

Regards, Linus
  
Sven Eckelmann Dec. 19, 2015, 10:19 p.m. UTC | #6
On Saturday 19 December 2015 19:09:02 Antonio Quartulli wrote:
> On 19/12/15 17:32, Sven Eckelmann wrote:
> > On Saturday 19 December 2015 09:04:05 Linus Lüssing wrote:
> >>> From: Simon Wunderlich <simon@open-mesh.com>
> >>> 
> >>> While testing, we got something like this:
> >>> 
> >>> WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142
> >>> batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]() [...]
> >>> Call Trace:
> >>> [<ffffffff815fc597>] dump_stack+0x4b/0x64
> >>> [<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120
> >>> [<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205
> >>> [batman_adv]
> >>> [<ffffffff810b3705>] warn_slowpath_null+0x15/0x20
> >>> [<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]
> >>> [<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv]
> >>> [<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv]
> >>> [<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv]
> >>> [...]
> >>> 
> >>> Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
> >> 
> >> Fixes: 5b95c427d187 ("batman-adv: Annotate deleting functions with
> >> external lock via lockdep") Acked-by: Linus Lüssing
> >> <linus.luessing@c0d3.blue>
> > 
> > I am confused why this should fix a commit which is there to find such
> > problems.
> Am I wrong or also the commit message is misleading? Looks like it is
> "fixing" a bogus lockdep check...but as far as I can say this patch is
> making sure that the lock is held in that particular point.

No, it doesn't say this anywhere. It says that it fixes the "lockdep splat"

> If we want to "fix the lockdep" then we should remove the
> lockdep_assert_held(), but I don't think this is the goal of the whole
> discussion.

No, the lockdep check should not be removed because a lock is necessary for 
other situations when this list is modified as explained earlier [1]. You have 
multiple concurrent calling contexts for this function (hard_if_event with 
something like mtu update + something like batadv_iv_ogm_schedule) which 
require exclusive access to this list. Right now it is just protected by a 
weird "named" lock (from the translation table and not from the mcast).

And no, the solution here is not to add a special codepath which (for no sane 
reason) requires no locks or synchronized frees - even when this seemed to 
have become the new trend in the batman-adv codebase.

Kind regards,
	Sven

[1] https://lists.open-mesh.org/pipermail/b.a.t.m.a.n/2015-December/013876.html
  
Marek Lindner Dec. 30, 2015, noon UTC | #7
On Monday, November 30, 2015 17:34:01 Simon Wunderlich wrote:
> From: Simon Wunderlich <simon@open-mesh.com>
> 
> While testing, we got something like this:
> 
> WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142
> batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]() [...]
> Call Trace:
> [<ffffffff815fc597>] dump_stack+0x4b/0x64
> [<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120
> [<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]
> [<ffffffff810b3705>] warn_slowpath_null+0x15/0x20
> [<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]
> [<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv]
> [<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv]
> [<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv]
> [...]
> 
> Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
> ---
> Changes to PATCH:
>  * rebase on maint, add actual splat
> ---
>  net/batman-adv/multicast.c | 2 ++
>  1 file changed, 2 insertions(+)

Applied in revision 025b743.

Thanks,
Marek
  

Patch

diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c
index eb76386..75fa501 100644
--- a/net/batman-adv/multicast.c
+++ b/net/batman-adv/multicast.c
@@ -802,7 +802,9 @@  void batadv_mcast_free(struct batadv_priv *bat_priv)
 	batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_MCAST, 1);
 	batadv_tvlv_handler_unregister(bat_priv, BATADV_TVLV_MCAST, 1);
 
+	spin_lock_bh(&bat_priv->tt.commit_lock);
 	batadv_mcast_mla_tt_retract(bat_priv, NULL);
+	spin_unlock_bh(&bat_priv->tt.commit_lock);
 }
 
 /**