Message ID | 1302891882-11246-1-git-send-email-linus.luessing@web.de (mailing list archive) |
---|---|
State | Superseded, archived |
Headers |
Return-Path: <linus.luessing@web.de> Received: from fmmailgate02.web.de (fmmailgate02.web.de [217.72.192.227]) by open-mesh.org (Postfix) with ESMTP id D9FBC1542F5 for <b.a.t.m.a.n@lists.open-mesh.org>; Fri, 15 Apr 2011 20:24:45 +0200 (CEST) Received: from smtp04.web.de ( [172.20.0.225]) by fmmailgate02.web.de (Postfix) with ESMTP id 350AC19C08813 for <b.a.t.m.a.n@lists.open-mesh.org>; Fri, 15 Apr 2011 20:24:44 +0200 (CEST) Received: from [77.6.4.66] (helo=localhost) by smtp04.web.de with asmtp (TLSv1:AES128-SHA:128) (WEB.DE 4.110 #2) id 1QAnhD-0006HP-00; Fri, 15 Apr 2011 20:24:43 +0200 From: =?UTF-8?q?Linus=20L=C3=BCssing?= <linus.luessing@web.de> To: b.a.t.m.a.n@lists.open-mesh.org Date: Fri, 15 Apr 2011 20:24:42 +0200 Message-Id: <1302891882-11246-1-git-send-email-linus.luessing@web.de> X-Mailer: git-send-email 1.7.4.1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linus.luessing@web.de X-Sender: linus.luessing@web.de X-Provags-ID: V01U2FsdGVkX1+1rINVWGxar0Xo91qtY8Wqz8+iAhivbt9cbzBO FSDrREQGWzm0UuVZ8XCknKhjpb4Dna5GkThAgqXNKnDWhSzPWk 3Z+fTBAjXhhX5VaW7wdw== Subject: [B.A.T.M.A.N.] [PATCH] batman-adv: Fix crash on module shutdown with multiple ifaces X-BeenThere: b.a.t.m.a.n@lists.open-mesh.org X-Mailman-Version: 2.1.13 Precedence: list Reply-To: The list for a Better Approach To Mobile Ad-hoc Networking <b.a.t.m.a.n@lists.open-mesh.org> List-Id: The list for a Better Approach To Mobile Ad-hoc Networking <b.a.t.m.a.n.lists.open-mesh.org> List-Unsubscribe: <https://lists.open-mesh.org/mm/options/b.a.t.m.a.n>, <mailto:b.a.t.m.a.n-request@lists.open-mesh.org?subject=unsubscribe> List-Archive: <http://lists.open-mesh.org/pipermail/b.a.t.m.a.n> List-Post: <mailto:b.a.t.m.a.n@lists.open-mesh.org> List-Help: <mailto:b.a.t.m.a.n-request@lists.open-mesh.org?subject=help> List-Subscribe: <https://lists.open-mesh.org/mm/listinfo/b.a.t.m.a.n>, <mailto:b.a.t.m.a.n-request@lists.open-mesh.org?subject=subscribe> X-List-Received-Date: Fri, 15 Apr 2011 18:24:46 -0000 |
Commit Message
Linus Lüssing
April 15, 2011, 6:24 p.m. UTC
hardif_remove_interfaces() removes all hard interfaces from the
hardif_list before freeing and cleaning up any device. However the clean
up procedures in orig_hash_del_if()
(hardif_remove_interface()->hardif_disable_interface()->
orig_hash_del_if())
need the other interfaces still to be present in the hardif_list.
Otherwise it won't renumber any preceding interfaces, which leads to an
unhandled kernel paging request in orig_node_del_if()'s "/* copy second
part */" due to wrong hard_if->if_num's.
With this commit the interface removal on module shutdown will be down
in the same way as removing single interfaces from batman only: One
interface will be removed and cleaned at a time.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
---
hard-interface.c | 15 ++++-----------
1 files changed, 4 insertions(+), 11 deletions(-)
Comments
Linus Lüssing wrote: > hardif_remove_interfaces() removes all hard interfaces from the > hardif_list before freeing and cleaning up any device. However the clean > up procedures in orig_hash_del_if() > (hardif_remove_interface()->hardif_disable_interface()-> > orig_hash_del_if()) > need the other interfaces still to be present in the hardif_list. > Otherwise it won't renumber any preceding interfaces, which leads to an > unhandled kernel paging request in orig_node_del_if()'s "/* copy second > part */" due to wrong hard_if->if_num's. > > With this commit the interface removal on module shutdown will be down > in the same way as removing single interfaces from batman only: One > interface will be removed and cleaned at a time. > > Signed-off-by: Linus Lüssing <linus.luessing@web.de> Please use --patience as requested in http://www.open-mesh.org/wiki/open-mesh/Contribute Please show us (as part of the commit message) why the information in http://www.open-mesh.org/projects/batman-adv/repository/revisions/132b776c22c9b71962a3ed9a33e5b6f9218dae1b isn't valid anymore and explain why it is save to use the spin_lock only inside the loop (but it would have to protect the loop in normal situations). Kind regards, Sven
Sven Eckelmann wrote: > Linus Lüssing wrote: > > hardif_remove_interfaces() removes all hard interfaces from the > > hardif_list before freeing and cleaning up any device. However the clean > > up procedures in orig_hash_del_if() > > (hardif_remove_interface()->hardif_disable_interface()-> > > orig_hash_del_if()) > > need the other interfaces still to be present in the hardif_list. > > Otherwise it won't renumber any preceding interfaces, which leads to an > > unhandled kernel paging request in orig_node_del_if()'s "/* copy second > > part */" due to wrong hard_if->if_num's. > > > > With this commit the interface removal on module shutdown will be down > > in the same way as removing single interfaces from batman only: One > > interface will be removed and cleaned at a time. > > > > Signed-off-by: Linus Lüssing <linus.luessing@web.de> > > Please use --patience as requested in > http://www.open-mesh.org/wiki/open-mesh/Contribute > > Please show us (as part of the commit message) why the information in > http://www.open-mesh.org/projects/batman-adv/repository/revisions/132b776c > 22c9b71962a3ed9a33e5b6f9218dae1b isn't valid anymore and explain why it is > save to use the spin_lock only inside the loop (but it would have to > protect the loop in normal situations). Sry, this was not the correct commit - The commit which fixed a problematic locking behaviour was 5d4b5a4d - but I didn't gave a lockdep output there. The other question must still be answered. Btw. what is the status of the sysfs_addrm_finish vs. rtnl_lock patch? Kind regards, Sven
On Sat, Apr 16, 2011 at 09:54:48AM +0200, Sven Eckelmann wrote: > Linus Lüssing wrote: > > hardif_remove_interfaces() removes all hard interfaces from the > > hardif_list before freeing and cleaning up any device. However the clean > > up procedures in orig_hash_del_if() > > (hardif_remove_interface()->hardif_disable_interface()-> > > orig_hash_del_if()) > > need the other interfaces still to be present in the hardif_list. > > Otherwise it won't renumber any preceding interfaces, which leads to an > > unhandled kernel paging request in orig_node_del_if()'s "/* copy second > > part */" due to wrong hard_if->if_num's. > > > > With this commit the interface removal on module shutdown will be down > > in the same way as removing single interfaces from batman only: One > > interface will be removed and cleaned at a time. > > > > Signed-off-by: Linus Lüssing <linus.luessing@web.de> > > > Please use --patience as requested in > http://www.open-mesh.org/wiki/open-mesh/Contribute > > Please show us (as part of the commit message) why the information in > http://www.open-mesh.org/projects/batman-adv/repository/revisions/132b776c22c9b71962a3ed9a33e5b6f9218dae1b > isn't valid anymore and explain why it is save to use the spin_lock only > inside the loop (but it would have to protect the loop in normal situations). > > Kind regards, > Sven Hi Sven, Ah, oki doki, didn't know about commit 5d4b5a4d and yes, a revert of that commit looks kind of similar to my patch. Commit 5d4b5a4d together with your statement confuse me a little. The commit message does not say anything about a locking dependancy issue, but seems to be a performance patch (which does not seem as such a severe problem to me, as removing/adding interfaces / removing the batman-adv module does not happen that frequently in common setups ;) ). Could you explain a little further which combinations of locks could introduce a problem? Hmm, for the "and explain why it is save to use the spin_lock only" part, aggreed, I think it was initially a mistake of mine. And usually this would not protect us from a new interface being added or an interface being removed from batman during a NETDEV_REGISTER/UNREGISTER event while we are trying to flush the if_list. However, just before calling hardif_remove_interfaces(), we are calling unregister_netdevice_notifier(&hard_if_notifier). So as far as I know, no hardif_add_interface() or hardif_remove_interface() and according list_add/del_rcu for the if_list should be called anymore. Cheers, Linus PS: And it's actually not an "unhandled kernel paging request" but a "Null pointer dereference". Also see this ticket: http://www.open-mesh.org/issues/147 I'm also wondering why we'd actually need the rtnl_lock() in hardif_remove_interfaces() then with that reasoning. What operation in hardif_remove_interface() (without the 's') needs to be protected with the rtnl_lock(), could be place the rtnl_lock a little tighter instead to also fix the issue from here? http://www.open-mesh.org/issues/145
Linus Lüssing wrote: > Ah, oki doki, didn't know about commit 5d4b5a4d and yes, a revert > of that commit looks kind of similar to my patch. > > Commit 5d4b5a4d together with your statement confuse me a little. The > commit message does not say anything about a locking dependancy > issue, but seems to be a performance patch (which does not seem as > such a severe problem to me, as removing/adding interfaces / > removing the batman-adv module does not happen that frequently in > common setups ;) ). Could you explain a little further which > combinations of locks could introduce a problem? No > Hmm, for the "and explain why it is save to use the spin_lock > only" part, aggreed, I think it was initially a mistake of mine. > And usually this would not protect us from a new interface being > added or an interface being removed from batman during a > NETDEV_REGISTER/UNREGISTER event while we are trying to flush the > if_list. > However, just before calling hardif_remove_interfaces(), we are > calling unregister_netdevice_notifier(&hard_if_notifier). > So as far as I know, no hardif_add_interface() or > hardif_remove_interface() and according list_add/del_rcu for the > if_list should be called anymore. Interesting assumption, but how did you ensure that everything is in a synchronous state? The network core also uses rcu - and it doesn't use the atomic notifier functions. > Cheers, Linus > > PS: And it's actually not an "unhandled kernel paging request" but > a "Null pointer dereference". Also see this ticket: > http://www.open-mesh.org/issues/147 > > I'm also wondering why we'd actually need the rtnl_lock() in > hardif_remove_interfaces() then with that reasoning. What > operation in hardif_remove_interface() (without the 's') needs to > be protected with the rtnl_lock(), could be place the rtnl_lock a > little tighter instead to also fix the issue from here? > http://www.open-mesh.org/issues/145 See 132b776c22c9b71962a3ed9a33e5b6f9218dae1b I will propose two different patches. Regards, Sven
diff --git a/hard-interface.c b/hard-interface.c index b3058e4..f039a3d 100644 --- a/hard-interface.c +++ b/hard-interface.c @@ -490,20 +490,13 @@ static void hardif_remove_interface(struct hard_iface *hard_iface) void hardif_remove_interfaces(void) { struct hard_iface *hard_iface, *hard_iface_tmp; - struct list_head if_queue; - INIT_LIST_HEAD(&if_queue); - - spin_lock(&hardif_list_lock); - list_for_each_entry_safe(hard_iface, hard_iface_tmp, - &hardif_list, list) { + rtnl_lock(); + list_for_each_entry_safe(hard_iface, hard_iface_tmp, &hardif_list, list) { + spin_lock(&hardif_list_lock); list_del_rcu(&hard_iface->list); - list_add_tail(&hard_iface->list, &if_queue); - } - spin_unlock(&hardif_list_lock); + spin_unlock(&hardif_list_lock); - rtnl_lock(); - list_for_each_entry_safe(hard_iface, hard_iface_tmp, &if_queue, list) { hardif_remove_interface(hard_iface); } rtnl_unlock();