[2/2] batman-adv: Fix general protection fault in batadv_tt_global_del_orig()

Message ID 1363495498-17830-2-git-send-email-linus.luessing@web.de (mailing list archive)
State Rejected, archived
Headers

Commit Message

Linus Lüssing March 17, 2013, 4:44 a.m. UTC
  On shutdown a race condition where we access a just freed global TT hash
might occure. batadv_orig_node_free_rcu() callbacks might have been
scheduled (especially during the shutdown procedure) and unfortunately
batadv_tt_global_table_free() does not wait for them to finish first
before freeing the global TT hash.

This potentially results in a general protection fault in
batadv_tt_global_del_orig(), called via a batadv_orig_node_free_rcu()
callback, which tries to access the just freed global TT hash.

This patch tries to fix this by waiting for any just scheduled
batadv_orig_node_free_rcu() to finish via an extra rcu_barrier() call
before freeing the global TT hash. And by moving the TT freeing call to
the end of the batman cleanup routines.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
  

Comments

Antonio Quartulli March 30, 2013, 1:16 p.m. UTC | #1
On Sun, Mar 17, 2013 at 05:44:58AM +0100, Linus Lüssing wrote:
> On shutdown a race condition where we access a just freed global TT hash
> might occure. batadv_orig_node_free_rcu() callbacks might have been
> scheduled (especially during the shutdown procedure) and unfortunately
> batadv_tt_global_table_free() does not wait for them to finish first
> before freeing the global TT hash.
> 
> This potentially results in a general protection fault in
> batadv_tt_global_del_orig(), called via a batadv_orig_node_free_rcu()
> callback, which tries to access the just freed global TT hash.
> 
> This patch tries to fix this by waiting for any just scheduled
> batadv_orig_node_free_rcu() to finish via an extra rcu_barrier() call
> before freeing the global TT hash. And by moving the TT freeing call to
> the end of the batman cleanup routines.
> 
> Signed-off-by: Linus Lüssing <linus.luessing@web.de>

Acked-by: Antonio Quartulli <ordex@autistici.org>

@Marek: when you will merge this commit, can you please reword "tries to fix" in
"fixes" ? :)
Actually this patch is fixing the problem :)

However, as I discussed with Linus on IRC, this is only a temporary fix, which
aims to remove the problem, but still we will need a redesign of the TT clean up
routine in order to cleanly get rid of this race condition.

Cheers,
  
Antonio Quartulli April 17, 2013, 12:58 p.m. UTC | #2
On Sat, Mar 30, 2013 at 02:16:02PM +0100, Antonio Quartulli wrote:
> On Sun, Mar 17, 2013 at 05:44:58AM +0100, Linus Lüssing wrote:
> > On shutdown a race condition where we access a just freed global TT hash
> > might occure. batadv_orig_node_free_rcu() callbacks might have been
> > scheduled (especially during the shutdown procedure) and unfortunately
> > batadv_tt_global_table_free() does not wait for them to finish first
> > before freeing the global TT hash.
> > 
> > This potentially results in a general protection fault in
> > batadv_tt_global_del_orig(), called via a batadv_orig_node_free_rcu()
> > callback, which tries to access the just freed global TT hash.
> > 
> > This patch tries to fix this by waiting for any just scheduled
> > batadv_orig_node_free_rcu() to finish via an extra rcu_barrier() call
> > before freeing the global TT hash. And by moving the TT freeing call to
> > the end of the batman cleanup routines.
> > 
> > Signed-off-by: Linus Lüssing <linus.luessing@web.de>
> 
> Acked-by: Antonio Quartulli <ordex@autistici.org>

NACK.

This patch is solving one problem but creating a new one:
by using rcu_barrier we avoid the crash but we will leak memory, because
batadv_orig_node_free_rcu()->batadv_tt_global_del_orig() will access an empty
global table and so will not be able to free the global entries.


Patch
("batman-adv: avoid race conditions on TT global table by counting references")
is fixing the problem by redesigning the TT clean up routine.

Cheers,
  

Patch

diff --git a/main.c b/main.c
index 62b1f89..8663d97 100644
--- a/main.c
+++ b/main.c
@@ -166,12 +166,13 @@  void batadv_mesh_free(struct net_device *soft_iface)
 	batadv_originator_free(bat_priv);
 	batadv_nc_free(bat_priv);
 
-	batadv_tt_free(bat_priv);
-
 	batadv_bla_free(bat_priv);
 
 	batadv_dat_free(bat_priv);
 
+	/* Don't call any batadv_orig_node_free_ref() after me */
+	batadv_tt_free(bat_priv);
+
 	free_percpu(bat_priv->bat_counters);
 
 	atomic_set(&bat_priv->mesh_state, BATADV_MESH_INACTIVE);
diff --git a/translation-table.c b/translation-table.c
index ee91cc1..279f0fd 100644
--- a/translation-table.c
+++ b/translation-table.c
@@ -1315,6 +1315,10 @@  static void batadv_tt_global_table_free(struct batadv_priv *bat_priv)
 		spin_unlock_bh(list_lock);
 	}
 
+	/* Wait for any batadv_orig_node_free_rcu() to finish,
+	 * they access the to be freed global TT hash */
+	rcu_barrier();
+
 	batadv_hash_destroy(hash);
 
 	bat_priv->tt.global_hash = NULL;