net: fix possible deadlocks in rtnl_trylock/unlock

Message ID 1354382991-31350-1-git-send-email-siwu@hrz.tu-chemnitz.de (mailing list archive)
State Not Applicable, archived
Headers

Commit Message

Simon Wunderlich Dec. 1, 2012, 5:29 p.m. UTC
  If rtnl_trylock() is used to prevent circular dependencies, rtnl_unlock()
may destroy this attempt because it first unlocks rtnl_mutex but may
lock it again later. The callgraph roughly looks like:

rtnl_unlock()
   netdev_run_todo()
      __rtnl_unlock()
      netdev_wait_allrefs()
         rtnl_lock()
         ...
         __rtnl_unlock()

There are two users which have possible deadlocks and are fixed in this
patch: batman-adv and bridge. Their problematic access pattern is the same:

notifier_call handler:
 * holds rtnl lock when called
 * calls sysfs code at some point (acquiring sysfs locks)

sysfs code:
 * holds sysfs lock when called
 * calls rtnl_trylock() and rtnl_unlock(), rtnl_unlock() calls rtnl_lock
   implicitly

Fix this by exporting __rtnl_unlock() to just unlock the mutex without
implicitly locking again.

Reported-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>

---
Of course, an alternative would be to not lock again after unlocking
within rtnl_unlock(), or put the todo handling into the locked section.
I'm not familiar enough with this code to know what would be best.

There are others using rtnl_trylock(), but I'm not sure if they
are affected.
---
 net/batman-adv/sysfs.c   |    2 +-
 net/bridge/br_sysfs_br.c |    2 +-
 net/bridge/br_sysfs_if.c |    2 +-
 net/core/rtnetlink.c     |    1 +
 4 files changed, 4 insertions(+), 3 deletions(-)
  

Comments

Sven Eckelmann Dec. 1, 2012, 6:07 p.m. UTC | #1
On Saturday 01 December 2012 18:29:51 Simon Wunderlich wrote:
> If rtnl_trylock() is used to prevent circular dependencies, rtnl_unlock()
> may destroy this attempt because it first unlocks rtnl_mutex but may
> lock it again later. The callgraph roughly looks like:
> 
> rtnl_unlock()
>    netdev_run_todo()
>       __rtnl_unlock()
>       netdev_wait_allrefs()
>          rtnl_lock()
>          ...
>          __rtnl_unlock()
> 
> There are two users which have possible deadlocks and are fixed in this
> patch: batman-adv and bridge. Their problematic access pattern is the same:
> 
> notifier_call handler:
>  * holds rtnl lock when called
>  * calls sysfs code at some point (acquiring sysfs locks)
> 
> sysfs code:
>  * holds sysfs lock when called
>  * calls rtnl_trylock() and rtnl_unlock(), rtnl_unlock() calls rtnl_lock
>    implicitly
> 
> Fix this by exporting __rtnl_unlock() to just unlock the mutex without
> implicitly locking again.
> 
> Reported-by: Sven Eckelmann <sven@narfation.org>
> Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
> 
> ---
> Of course, an alternative would be to not lock again after unlocking
> within rtnl_unlock(), or put the todo handling into the locked section.
> I'm not familiar enough with this code to know what would be best.
> 
> There are others using rtnl_trylock(), but I'm not sure if they
> are affected.

At least they look like they have a problem in a parallel user scenario 
involving another lock and locking order (most of them s_active or a device 
lock). So just to list the places and poke some other users. They can better 
decide for themself because they know the code.

drivers/infiniband/ulp/ipoib/ipoib_cm.c:  if (!rtnl_trylock())
drivers/infiniband/ulp/ipoib/ipoib_vlan.c:        if (!rtnl_trylock())
drivers/infiniband/ulp/ipoib/ipoib_vlan.c:        if (!rtnl_trylock())
drivers/net/bonding/bond_alb.c:                   if (!rtnl_trylock()) {
drivers/net/bonding/bond_main.c:          if (!rtnl_trylock()) {
drivers/net/bonding/bond_main.c:          if (!rtnl_trylock()) {
drivers/net/bonding/bond_main.c:          if (!rtnl_trylock()) {
drivers/net/bonding/bond_main.c:          if (!rtnl_trylock()) {
drivers/net/bonding/bond_sysfs.c: if (!rtnl_trylock())
drivers/net/bonding/bond_sysfs.c: if (!rtnl_trylock())
drivers/net/bonding/bond_sysfs.c: if (!rtnl_trylock())
drivers/net/bonding/bond_sysfs.c: if (!rtnl_trylock())
drivers/net/bonding/bond_sysfs.c: if (!rtnl_trylock())
drivers/net/bonding/bond_sysfs.c: if (!rtnl_trylock())
drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c:  if (!rtnl_trylock())    /* 
synchronize with ifdown */
drivers/s390/net/qeth_l2_main.c:          if (rtnl_trylock()) {
drivers/s390/net/qeth_l3_main.c:          if (rtnl_trylock()) {
net/bridge/br_sysfs_br.c: if (!rtnl_trylock())
net/bridge/br_sysfs_if.c:         if (!rtnl_trylock())
net/core/net-sysfs.c:     if (!rtnl_trylock())
net/core/net-sysfs.c:     if (!rtnl_trylock())
net/core/net-sysfs.c:     if (!rtnl_trylock())
net/core/net-sysfs.c:     if (!rtnl_trylock())
net/core/net-sysfs.c:     if (!rtnl_trylock())
net/ipv4/devinet.c:                       if (!rtnl_trylock()) {
net/ipv6/addrconf.c:      if (!rtnl_trylock())
net/ipv6/addrconf.c:      if (!rtnl_trylock())

Kind regards,
	Sven
  
Eric Dumazet Dec. 1, 2012, 6:36 p.m. UTC | #2
On Sat, 2012-12-01 at 18:29 +0100, Simon Wunderlich wrote:
> If rtnl_trylock() is used to prevent circular dependencies, rtnl_unlock()
> may destroy this attempt because it first unlocks rtnl_mutex but may
> lock it again later. The callgraph roughly looks like:
> 
> rtnl_unlock()
>    netdev_run_todo()
>       __rtnl_unlock()
>       netdev_wait_allrefs()
>          rtnl_lock()
>          ...
>          __rtnl_unlock()
> 
> There are two users which have possible deadlocks and are fixed in this
> patch: batman-adv and bridge. Their problematic access pattern is the same:
> 
> notifier_call handler:
>  * holds rtnl lock when called
>  * calls sysfs code at some point (acquiring sysfs locks)
> 
> sysfs code:
>  * holds sysfs lock when called
>  * calls rtnl_trylock() and rtnl_unlock(), rtnl_unlock() calls rtnl_lock
>    implicitly
> 
> Fix this by exporting __rtnl_unlock() to just unlock the mutex without
> implicitly locking again.
> 
> Reported-by: Sven Eckelmann <sven@narfation.org>
> Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
> 
> ---
> Of course, an alternative would be to not lock again after unlocking
> within rtnl_unlock(), or put the todo handling into the locked section.
> I'm not familiar enough with this code to know what would be best.
> 
> There are others using rtnl_trylock(), but I'm not sure if they
> are affected.
> ---
>  net/batman-adv/sysfs.c   |    2 +-
>  net/bridge/br_sysfs_br.c |    2 +-
>  net/bridge/br_sysfs_if.c |    2 +-
>  net/core/rtnetlink.c     |    1 +
>  4 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/net/batman-adv/sysfs.c b/net/batman-adv/sysfs.c
> index 66518c7..41b74aa 100644
> --- a/net/batman-adv/sysfs.c
> +++ b/net/batman-adv/sysfs.c
> @@ -635,7 +635,7 @@ static ssize_t batadv_store_mesh_iface(struct kobject *kobj,
>  	ret = batadv_hardif_enable_interface(hard_iface, buff);
>  
>  unlock:
> -	rtnl_unlock();
> +	__rtnl_unlock();
>  out:
>  	batadv_hardif_free_ref(hard_iface);
>  	return ret;
> diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
> index c5c0593..c122782 100644
> --- a/net/bridge/br_sysfs_br.c
> +++ b/net/bridge/br_sysfs_br.c
> @@ -142,7 +142,7 @@ static ssize_t store_stp_state(struct device *d,
>  	if (!rtnl_trylock())
>  		return restart_syscall();
>  	br_stp_set_enabled(br, val);
> -	rtnl_unlock();
> +	__rtnl_unlock();
>  
>  	return len;
>  }

I have no idea of why you believe there is a problem here.

Could you explain how net_todo_list could be not empty ?

As long as no device is unregistered between
rtnl_trylock()/rtnl_unlock(), there is no possible deadlock.
  
Sven Eckelmann Dec. 1, 2012, 7:04 p.m. UTC | #3
On Saturday 01 December 2012 10:36:12 Eric Dumazet wrote:
[...]
> > diff --git a/net/batman-adv/sysfs.c b/net/batman-adv/sysfs.c
> > index 66518c7..41b74aa 100644
> > --- a/net/batman-adv/sysfs.c
> > +++ b/net/batman-adv/sysfs.c
> > @@ -635,7 +635,7 @@ static ssize_t batadv_store_mesh_iface(struct kobject
> > *kobj,> 
> >  	ret = batadv_hardif_enable_interface(hard_iface, buff);
> >  
> >  unlock:
> > -	rtnl_unlock();
> > +	__rtnl_unlock();
> > 
> >  out:
> >  	batadv_hardif_free_ref(hard_iface);
> >  	return ret;
> > 
> > diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
> > index c5c0593..c122782 100644
> > --- a/net/bridge/br_sysfs_br.c
> > +++ b/net/bridge/br_sysfs_br.c
> > @@ -142,7 +142,7 @@ static ssize_t store_stp_state(struct device *d,
> > 
> >  	if (!rtnl_trylock())
> >  	
> >  		return restart_syscall();
> >  	
> >  	br_stp_set_enabled(br, val);
> > 
> > -	rtnl_unlock();
> > +	__rtnl_unlock();
> > 
> >  	return len;
> >  
> >  }
> 
> I have no idea of why you believe there is a problem here.
> 
> Could you explain how net_todo_list could be not empty ?
> 
> As long as no device is unregistered between
> rtnl_trylock()/rtnl_unlock(), there is no possible deadlock.

I am not sure what "here" means for your. At least batman-adv tries to 
unregister a device -> problem [1]. I will not make any judgements about the 
other uses in the kernel/other parts patched by Simon.

Kind regards,
	Sven

[1] http://article.gmane.org/gmane.linux.kernel/1392295
  
Eric Dumazet Dec. 1, 2012, 7:44 p.m. UTC | #4
On Sat, 2012-12-01 at 20:04 +0100, Sven Eckelmann wrote:
> On Saturday 01 December 2012 10:36:12 Eric Dumazet wrote:
>  
> > > diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
> > > index c5c0593..c122782 100644
> > > --- a/net/bridge/br_sysfs_br.c
> > > +++ b/net/bridge/br_sysfs_br.c
> > > @@ -142,7 +142,7 @@ static ssize_t store_stp_state(struct device *d,
> > > 
> > >  	if (!rtnl_trylock())
> > >  	
> > >  		return restart_syscall();
> > >  	
> > >  	br_stp_set_enabled(br, val);
> > > 
> > > -	rtnl_unlock();
> > > +	__rtnl_unlock();
> > > 
> > >  	return len;
> > >  
> > >  }
> > 
> > I have no idea of why you believe there is a problem here.
> > 
> > Could you explain how net_todo_list could be not empty ?
> > 
> > As long as no device is unregistered between
> > rtnl_trylock()/rtnl_unlock(), there is no possible deadlock.
> 
> I am not sure what "here" means for your. At least batman-adv tries to 
> unregister a device -> problem [1]. I will not make any judgements about the 
> other uses in the kernel/other parts patched by Simon.
> 

I was reacting to the change in net/bridge/br_sysfs_br.c

rtnl_trylock() could set a boolean flag to explicitly WARN_ON()
in case we try to unregister a device.
  
Sven Eckelmann Dec. 1, 2012, 7:57 p.m. UTC | #5
On Saturday 01 December 2012 11:44:34 Eric Dumazet wrote:
[...]
> > > I have no idea of why you believe there is a problem here.
> > > 
> > > Could you explain how net_todo_list could be not empty ?
> > > 
> > > As long as no device is unregistered between
> > > rtnl_trylock()/rtnl_unlock(), there is no possible deadlock.
> > 
> > I am not sure what "here" means for your. At least batman-adv tries to
> > unregister a device -> problem [1]. I will not make any judgements about
> > the other uses in the kernel/other parts patched by Simon.
> 
> I was reacting to the change in net/bridge/br_sysfs_br.c

Yes, in this context it makes more sense.

> rtnl_trylock() could set a boolean flag to explicitly WARN_ON()
> in case we try to unregister a device.

Sounds interesting.

Kind regards,
	Sven
  
Simon Wunderlich Dec. 1, 2012, 8:01 p.m. UTC | #6
On Sat, Dec 01, 2012 at 11:44:34AM -0800, Eric Dumazet wrote:
> On Sat, 2012-12-01 at 20:04 +0100, Sven Eckelmann wrote:
> > On Saturday 01 December 2012 10:36:12 Eric Dumazet wrote:
> >  
> > > > diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
> > > > index c5c0593..c122782 100644
> > > > --- a/net/bridge/br_sysfs_br.c
> > > > +++ b/net/bridge/br_sysfs_br.c
> > > > @@ -142,7 +142,7 @@ static ssize_t store_stp_state(struct device *d,
> > > > 
> > > >  	if (!rtnl_trylock())
> > > >  	
> > > >  		return restart_syscall();
> > > >  	
> > > >  	br_stp_set_enabled(br, val);
> > > > 
> > > > -	rtnl_unlock();
> > > > +	__rtnl_unlock();
> > > > 
> > > >  	return len;
> > > >  
> > > >  }
> > > 
> > > I have no idea of why you believe there is a problem here.
> > > 
> > > Could you explain how net_todo_list could be not empty ?
> > > 
> > > As long as no device is unregistered between
> > > rtnl_trylock()/rtnl_unlock(), there is no possible deadlock.
> > 
> > I am not sure what "here" means for your. At least batman-adv tries to 
> > unregister a device -> problem [1]. I will not make any judgements about the 
> > other uses in the kernel/other parts patched by Simon.
> > 
> 
> I was reacting to the change in net/bridge/br_sysfs_br.c
> 
> rtnl_trylock() could set a boolean flag to explicitly WARN_ON()
> in case we try to unregister a device.

Well, I'm not sure if this can happen in the bridge code, but from looking at the
code it doesn't appear to be impossible. It would be better to be sure that it can't
deadlock IMHO.

(Although doing unlock/lock/unlock in an unlock function is also a little "uncommon").

Cheers,
	Simon
  
Antonio Quartulli Dec. 3, 2012, 8:09 p.m. UTC | #7
On Sat, Dec 01, 2012 at 09:01:53PM +0100, Simon Wunderlich wrote:
> On Sat, Dec 01, 2012 at 11:44:34AM -0800, Eric Dumazet wrote:
> > On Sat, 2012-12-01 at 20:04 +0100, Sven Eckelmann wrote:
> > > On Saturday 01 December 2012 10:36:12 Eric Dumazet wrote:
> > >  
> > > > > diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
> > > > > index c5c0593..c122782 100644
> > > > > --- a/net/bridge/br_sysfs_br.c
> > > > > +++ b/net/bridge/br_sysfs_br.c
> > > > > @@ -142,7 +142,7 @@ static ssize_t store_stp_state(struct device *d,
> > > > > 
> > > > >  	if (!rtnl_trylock())
> > > > >  	
> > > > >  		return restart_syscall();
> > > > >  	
> > > > >  	br_stp_set_enabled(br, val);
> > > > > 
> > > > > -	rtnl_unlock();
> > > > > +	__rtnl_unlock();
> > > > > 
> > > > >  	return len;
> > > > >  
> > > > >  }
> > > > 
> > > > I have no idea of why you believe there is a problem here.
> > > > 
> > > > Could you explain how net_todo_list could be not empty ?
> > > > 
> > > > As long as no device is unregistered between
> > > > rtnl_trylock()/rtnl_unlock(), there is no possible deadlock.
> > > 
> > > I am not sure what "here" means for your. At least batman-adv tries to 
> > > unregister a device -> problem [1]. I will not make any judgements about the 
> > > other uses in the kernel/other parts patched by Simon.
> > > 
> > 
> > I was reacting to the change in net/bridge/br_sysfs_br.c
> > 
> > rtnl_trylock() could set a boolean flag to explicitly WARN_ON()
> > in case we try to unregister a device.
> 
> Well, I'm not sure if this can happen in the bridge code, but from looking at the
> code it doesn't appear to be impossible. It would be better to be sure that it can't
> deadlock IMHO.
> 
> (Although doing unlock/lock/unlock in an unlock function is also a little "uncommon").

But still we have the problem in batman-adv (as Sven pointed out in a previous
email) that tries to unregister a device in that "critical window".

Exporting __rtnl_unlock() would solve the issue in this case.

If you think the bridge code should not end up in such situation, what if Simon
resends the patch with only the __rtnl_unlock() exportation and the change in 
batman-adv?


Cheers,
  
Sven Eckelmann Dec. 3, 2012, 8:50 p.m. UTC | #8
On Monday 03 December 2012 21:09:06 Antonio Quartulli wrote:
> But still we have the problem in batman-adv (as Sven pointed out in a
> previous email) that tries to unregister a device in that "critical
> window".
> 
> Exporting __rtnl_unlock() would solve the issue in this case.
> 
> If you think the bridge code should not end up in such situation, what if
> Simon resends the patch with only the __rtnl_unlock() exportation and the
> change in batman-adv?

I personally have big doubts about the removal of the second half of the 
unregister. Doesn't sound like the best idea. This would result in side 
effects... one of them for example would be the possible deadlock scenario 
moved to the other users of rtnl_trylock which don't unregister a device 
inside their critical section.... so either you do it always this way when 
using rtnl_trylock or it will break. I don't want to imagine other problems 
caused by this change.

Kind regards,
	Sven
  

Patch

diff --git a/net/batman-adv/sysfs.c b/net/batman-adv/sysfs.c
index 66518c7..41b74aa 100644
--- a/net/batman-adv/sysfs.c
+++ b/net/batman-adv/sysfs.c
@@ -635,7 +635,7 @@  static ssize_t batadv_store_mesh_iface(struct kobject *kobj,
 	ret = batadv_hardif_enable_interface(hard_iface, buff);
 
 unlock:
-	rtnl_unlock();
+	__rtnl_unlock();
 out:
 	batadv_hardif_free_ref(hard_iface);
 	return ret;
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index c5c0593..c122782 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -142,7 +142,7 @@  static ssize_t store_stp_state(struct device *d,
 	if (!rtnl_trylock())
 		return restart_syscall();
 	br_stp_set_enabled(br, val);
-	rtnl_unlock();
+	__rtnl_unlock();
 
 	return len;
 }
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 13b36bd..d99f394 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -223,7 +223,7 @@  static ssize_t brport_store(struct kobject * kobj,
 			if (ret == 0)
 				ret = count;
 		}
-		rtnl_unlock();
+		__rtnl_unlock();
 	}
 	return ret;
 }
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index fad649a..d95ba6f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -72,6 +72,7 @@  void __rtnl_unlock(void)
 {
 	mutex_unlock(&rtnl_mutex);
 }
+EXPORT_SYMBOL(__rtnl_unlock);
 
 void rtnl_unlock(void)
 {