Message ID | 1283729393-13842-1-git-send-email-linus.luessing@web.de |
---|---|
State | Superseded, archived |
Commit | f3c1780e4d8b203131fa9b758e497e9a19cb4d0c |
Headers |
Return-Path: <linus.luessing@web.de> Received: from fmmailgate03.web.de (fmmailgate03.web.de [217.72.192.234]) by open-mesh.net (Postfix) with ESMTP id DD99F154545 for <b.a.t.m.a.n@lists.open-mesh.org>; Mon, 6 Sep 2010 01:30:08 +0200 (CEST) Received: from smtp03.web.de ( [172.20.0.65]) by fmmailgate03.web.de (Postfix) with ESMTP id 251B915EE3AA9 for <b.a.t.m.a.n@lists.open-mesh.org>; Mon, 6 Sep 2010 01:30:02 +0200 (CEST) Received: from [87.170.7.131] (helo=localhost) by smtp03.web.de with asmtp (TLSv1:AES128-SHA:128) (WEB.DE 4.110 #4) id 1OsOev-0003po-00; Mon, 06 Sep 2010 01:30:01 +0200 From: =?UTF-8?q?Linus=20L=C3=BCssing?= <linus.luessing@web.de> To: b.a.t.m.a.n@lists.open-mesh.org Date: Mon, 6 Sep 2010 01:29:53 +0200 Message-Id: <1283729393-13842-1-git-send-email-linus.luessing@web.de> X-Mailer: git-send-email 1.7.1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linus.luessing@web.de X-Sender: linus.luessing@web.de X-Provags-ID: V01U2FsdGVkX1+BcH9Z0BsNTwx9XvW61Ee/A4isX0vdjeWU/gjs CAoX6QgW3FRgfqrINebsWEYH9wbk/entiG8LSjeMrUDa+FywaP WWffm1ECC+gL661CPXUQ== Subject: [B.A.T.M.A.N.] [PATCH] batman-adv: Always synchronize rcu's on module shutdown X-BeenThere: b.a.t.m.a.n@lists.open-mesh.org X-Mailman-Version: 2.1.11 Precedence: list Reply-To: The list for a Better Approach To Mobile Ad-hoc Networking <b.a.t.m.a.n@lists.open-mesh.org> List-Id: The list for a Better Approach To Mobile Ad-hoc Networking <b.a.t.m.a.n.lists.open-mesh.org> List-Unsubscribe: <https://lists.open-mesh.org/mm/options/b.a.t.m.a.n>, <mailto:b.a.t.m.a.n-request@lists.open-mesh.org?subject=unsubscribe> List-Archive: <http://lists.open-mesh.org/pipermail/b.a.t.m.a.n> List-Post: <mailto:b.a.t.m.a.n@lists.open-mesh.org> List-Help: <mailto:b.a.t.m.a.n-request@lists.open-mesh.org?subject=help> List-Subscribe: <https://lists.open-mesh.org/mm/listinfo/b.a.t.m.a.n>, <mailto:b.a.t.m.a.n-request@lists.open-mesh.org?subject=subscribe> X-List-Received-Date: Sun, 05 Sep 2010 23:30:09 -0000 |
Commit Message
Linus Lüssing
Sept. 5, 2010, 11:29 p.m. UTC
During the module shutdown procedure in batman_exit(), a rcu callback is
being scheduled (batman_exit -> hardif_remove_interfaces ->
hardif_remove_interfae -> call_rcu). However, when the kernel unloads
the module, the rcu callback might not have been executed yet, resulting
in a "unable to handle kernel paging request" in __rcu_process_callback
afterwards, causing the kernel to freeze.
Therefore, we should always flush all rcu callback functions scheduled
during the shutdown procedure.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
---
main.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
Comments
On Mon, Sep 06, 2010 at 01:29:53AM +0200, Linus Lüssing wrote: > During the module shutdown procedure in batman_exit(), a rcu callback is > being scheduled (batman_exit -> hardif_remove_interfaces -> > hardif_remove_interfae -> call_rcu). However, when the kernel unloads > the module, the rcu callback might not have been executed yet, resulting > in a "unable to handle kernel paging request" in __rcu_process_callback > afterwards, causing the kernel to freeze. > Therefore, we should always flush all rcu callback functions scheduled > during the shutdown procedure. I am really irritated by your patch. I would have expected that you add a synchronyze_rcu in batman_exit and that was it. Instead I see a synchronize_net added and a synchronize_net/-_rcu removed from mesh_free. This doesn't seem to match at all. Could you please explain further why it is implemented that way? thanks, Sven
Hi Sven, synchronize_net already contains a synchronize_rcu at its end, so the synchronize_rcu in the batman code there has always been redundant. I've removed the synchronize_rcu instead of the synchronize_net to be on the safe side. I guess usually no more packets should arrive anyway as the batman packet type is not registered anymore. But I wasn't sure if the might_sleep() of synchronize_net() might be needed for something, so I didn't dare to remove synchronize_net. If someone says it'd be ok to remove synchronize_net() instead, I could make a new patch, no problem. Cheers, Linus On Mon, Sep 06, 2010 at 09:30:46AM +0200, Sven Eckelmann wrote: > On Mon, Sep 06, 2010 at 01:29:53AM +0200, Linus Lüssing wrote: > > During the module shutdown procedure in batman_exit(), a rcu callback is > > being scheduled (batman_exit -> hardif_remove_interfaces -> > > hardif_remove_interfae -> call_rcu). However, when the kernel unloads > > the module, the rcu callback might not have been executed yet, resulting > > in a "unable to handle kernel paging request" in __rcu_process_callback > > afterwards, causing the kernel to freeze. > > Therefore, we should always flush all rcu callback functions scheduled > > during the shutdown procedure. > > I am really irritated by your patch. I would have expected that you add a > synchronyze_rcu in batman_exit and that was it. Instead I see a synchronize_net > added and a synchronize_net/-_rcu removed from mesh_free. This doesn't seem to > match at all. Could you please explain further why it is implemented that way? > > thanks, > Sven
Linus Lüssing wrote: > Hi Sven, > > synchronize_net already contains a synchronize_rcu at its end, so > the synchronize_rcu in the batman code there has always been > redundant. > > I've removed the synchronize_rcu instead of the synchronize_net to > be on the safe side. I guess usually no more packets should arrive > anyway as the batman packet type is not registered anymore. But I > wasn't sure if the might_sleep() of synchronize_net() might be > needed for something, so I didn't dare to remove synchronize_net. > > If someone says it'd be ok to remove synchronize_net() instead, > I could make a new patch, no problem. Ok, it would have been nice to state such things in the commit message (otherwise the stable@kernel.org will drop such a patch quite easily). Marek and I have ausgekaspert why it only happens in 1765 and also in 1766. So it will not be a patch for stable. And the might_sleep is only for debugging purposes. But yes, it makes sense to use synchronize_net here (for example due to the usage of dev_remove_pack before). That means that technically the patch seems to be ok, but didn't liked the explanation with the problem that we might have to justify it to the stable@kernel.org guys that way. So I would ack the patch with a minor change in the commit message. So instead of > During the module shutdown procedure in batman_exit(), a rcu callback is > being scheduled (batman_exit -> hardif_remove_interfaces -> > hardif_remove_interfae -> call_rcu). However, when the kernel unloads > the module, the rcu callback might not have been executed yet, resulting > in a "unable to handle kernel paging request" in __rcu_process_callback > afterwards, causing the kernel to freeze. > Therefore, we should always flush all rcu callback functions scheduled > during the shutdown procedure. something like > During the module shutdown procedure in batman_exit(), a rcu callback is > being scheduled (batman_exit -> hardif_remove_interfaces -> > hardif_remove_interfae -> call_rcu). However, when the kernel unloads > the module, the rcu callback might not have been executed yet, resulting > in a "unable to handle kernel paging request" in __rcu_process_callback > afterwards, causing the kernel to freeze. > The synchronize_net and synchronize_rcu in mesh_free are currently > called before the call_rcu in hardif_remove_interface and have no real > effect on it. > Therefore, we should always flush all rcu callback functions scheduled > during the shutdown procedure using synchronize_net. The call to > synchronize_rcu can be omitted because synchronize_net already calls it. thanks, Sven
> So I would ack the patch with a minor change in the commit message. So instead > of > > > During the module shutdown procedure in batman_exit(), a rcu callback is > > being scheduled (batman_exit -> hardif_remove_interfaces -> > > hardif_remove_interfae -> call_rcu). However, when the kernel unloads > > the module, the rcu callback might not have been executed yet, resulting > > in a "unable to handle kernel paging request" in __rcu_process_callback > > afterwards, causing the kernel to freeze. > > Therefore, we should always flush all rcu callback functions scheduled > > during the shutdown procedure. > > something like > > > During the module shutdown procedure in batman_exit(), a rcu callback is > > being scheduled (batman_exit -> hardif_remove_interfaces -> > > hardif_remove_interfae -> call_rcu). However, when the kernel unloads > > the module, the rcu callback might not have been executed yet, resulting > > in a "unable to handle kernel paging request" in __rcu_process_callback > > afterwards, causing the kernel to freeze. > > > The synchronize_net and synchronize_rcu in mesh_free are currently > > called before the call_rcu in hardif_remove_interface and have no real > > effect on it. > > > Therefore, we should always flush all rcu callback functions scheduled > > during the shutdown procedure using synchronize_net. The call to > > synchronize_rcu can be omitted because synchronize_net already calls it. Yep, sounds good :). Thanks for reviewing and the info about synchronize_net. Cheers, Linus > > thanks, > Sven
diff --git a/main.c b/main.c index 209a46b..e8acb46 100644 --- a/main.c +++ b/main.c @@ -73,6 +73,8 @@ static void __exit batman_exit(void) flush_workqueue(bat_event_workqueue); destroy_workqueue(bat_event_workqueue); bat_event_workqueue = NULL; + + synchronize_net(); } int mesh_init(struct net_device *soft_iface) @@ -135,9 +137,6 @@ void mesh_free(struct net_device *soft_iface) hna_local_free(bat_priv); hna_global_free(bat_priv); - synchronize_net(); - - synchronize_rcu(); atomic_set(&bat_priv->mesh_state, MESH_INACTIVE); }