[2/2] batman-adv: Fix general protection fault in batadv_tt_global_del_orig()
Commit Message
On shutdown a race condition where we access a just freed global TT hash
might occure. batadv_orig_node_free_rcu() callbacks might have been
scheduled (especially during the shutdown procedure) and unfortunately
batadv_tt_global_table_free() does not wait for them to finish first
before freeing the global TT hash.
This potentially results in a general protection fault in
batadv_tt_global_del_orig(), called via a batadv_orig_node_free_rcu()
callback, which tries to access the just freed global TT hash.
This patch tries to fix this by waiting for any just scheduled
batadv_orig_node_free_rcu() to finish via an extra rcu_barrier() call
before freeing the global TT hash. And by moving the TT freeing call to
the end of the batman cleanup routines.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Comments
On Sun, Mar 17, 2013 at 05:44:58AM +0100, Linus Lüssing wrote:
> On shutdown a race condition where we access a just freed global TT hash
> might occure. batadv_orig_node_free_rcu() callbacks might have been
> scheduled (especially during the shutdown procedure) and unfortunately
> batadv_tt_global_table_free() does not wait for them to finish first
> before freeing the global TT hash.
>
> This potentially results in a general protection fault in
> batadv_tt_global_del_orig(), called via a batadv_orig_node_free_rcu()
> callback, which tries to access the just freed global TT hash.
>
> This patch tries to fix this by waiting for any just scheduled
> batadv_orig_node_free_rcu() to finish via an extra rcu_barrier() call
> before freeing the global TT hash. And by moving the TT freeing call to
> the end of the batman cleanup routines.
>
> Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Acked-by: Antonio Quartulli <ordex@autistici.org>
@Marek: when you will merge this commit, can you please reword "tries to fix" in
"fixes" ? :)
Actually this patch is fixing the problem :)
However, as I discussed with Linus on IRC, this is only a temporary fix, which
aims to remove the problem, but still we will need a redesign of the TT clean up
routine in order to cleanly get rid of this race condition.
Cheers,
On Sat, Mar 30, 2013 at 02:16:02PM +0100, Antonio Quartulli wrote:
> On Sun, Mar 17, 2013 at 05:44:58AM +0100, Linus Lüssing wrote:
> > On shutdown a race condition where we access a just freed global TT hash
> > might occure. batadv_orig_node_free_rcu() callbacks might have been
> > scheduled (especially during the shutdown procedure) and unfortunately
> > batadv_tt_global_table_free() does not wait for them to finish first
> > before freeing the global TT hash.
> >
> > This potentially results in a general protection fault in
> > batadv_tt_global_del_orig(), called via a batadv_orig_node_free_rcu()
> > callback, which tries to access the just freed global TT hash.
> >
> > This patch tries to fix this by waiting for any just scheduled
> > batadv_orig_node_free_rcu() to finish via an extra rcu_barrier() call
> > before freeing the global TT hash. And by moving the TT freeing call to
> > the end of the batman cleanup routines.
> >
> > Signed-off-by: Linus Lüssing <linus.luessing@web.de>
>
> Acked-by: Antonio Quartulli <ordex@autistici.org>
NACK.
This patch is solving one problem but creating a new one:
by using rcu_barrier we avoid the crash but we will leak memory, because
batadv_orig_node_free_rcu()->batadv_tt_global_del_orig() will access an empty
global table and so will not be able to free the global entries.
Patch
("batman-adv: avoid race conditions on TT global table by counting references")
is fixing the problem by redesigning the TT clean up routine.
Cheers,
@@ -166,12 +166,13 @@ void batadv_mesh_free(struct net_device *soft_iface)
batadv_originator_free(bat_priv);
batadv_nc_free(bat_priv);
- batadv_tt_free(bat_priv);
-
batadv_bla_free(bat_priv);
batadv_dat_free(bat_priv);
+ /* Don't call any batadv_orig_node_free_ref() after me */
+ batadv_tt_free(bat_priv);
+
free_percpu(bat_priv->bat_counters);
atomic_set(&bat_priv->mesh_state, BATADV_MESH_INACTIVE);
@@ -1315,6 +1315,10 @@ static void batadv_tt_global_table_free(struct batadv_priv *bat_priv)
spin_unlock_bh(list_lock);
}
+ /* Wait for any batadv_orig_node_free_rcu() to finish,
+ * they access the to be freed global TT hash */
+ rcu_barrier();
+
batadv_hash_destroy(hash);
bat_priv->tt.global_hash = NULL;