[RFC,maint] batman-adv: fix adding VLANs with partial state

Message ID 20180510152752.2557-1-mareklindner@neomailbox.ch (mailing list archive)
State RFC, archived
Delegated to: Simon Wunderlich
Headers
Series [RFC,maint] batman-adv: fix adding VLANs with partial state |

Commit Message

Marek Lindner May 10, 2018, 3:27 p.m. UTC
  Whenever a new VLAN is created on top of batman virtual interfaces
the batman-adv kernel module creates internal structures to track
the status of said VLAN. Amongst other things, the MAC address of
the VLAN interface itself has to be stored.

Without this change a VLAN and its infrastructure could be created
while the interface MAC address is not stored without triggering
any error, thus creating issues in other parts of the code.

Prevent the VLAN from being created if the MAC address can not
be stored.

Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
---
 net/batman-adv/soft-interface.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)
  

Comments

Sven Eckelmann May 10, 2018, 4:11 p.m. UTC | #1
On Donnerstag, 10. Mai 2018 17:27:52 CEST Marek Lindner wrote:
> Whenever a new VLAN is created on top of batman virtual interfaces
> the batman-adv kernel module creates internal structures to track
> the status of said VLAN. Amongst other things, the MAC address of
> the VLAN interface itself has to be stored.
> 
> Without this change a VLAN and its infrastructure could be created
> while the interface MAC address is not stored without triggering
> any error, thus creating issues in other parts of the code.
> 
> Prevent the VLAN from being created if the MAC address can not
> be stored.
> 
> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

Fixes: 952cebb57518 ("batman-adv: add per VLAN interface attribute framework")

??

[....]
> @@ -604,13 +605,18 @@ int batadv_softif_create_vlan(struct batadv_priv 
*bat_priv, unsigned short vid)
>  	/* add a new TT local entry. This one will be marked with the NOPURGE
>  	 * flag
>  	 */
> -	batadv_tt_local_add(bat_priv->soft_iface,
> -			    bat_priv->soft_iface->dev_addr, vid,
> -			    BATADV_NULL_IFINDEX, BATADV_NO_MARK);
> +	client_added = batadv_tt_local_add(bat_priv->soft_iface,
> +					   bat_priv->soft_iface->dev_addr, vid,
> +					   BATADV_NULL_IFINDEX, BATADV_NO_MARK);
>  
>  	/* don't return reference to new softif_vlan */
>  	batadv_softif_vlan_put(vlan);
>  
> +	if (!client_added) {
> +		batadv_softif_destroy_vlan(bat_priv, vlan);
> +		return -ENOENT;
> +	}
> +

This function (batadv_softif_destroy_vlan) is static and cannot be reached 
from here. The definition+declaration of this function follows the function
were you've added these changes.

[...]
> @@ -683,9 +690,14 @@ static int batadv_interface_add_vid(struct net_device 
*dev, __be16 proto,
>  	 * flag. This must be added again, even if the vlan object already
>  	 * exists, because the entry was deleted by kill_vid()
>  	 */
> -	batadv_tt_local_add(bat_priv->soft_iface,
> -			    bat_priv->soft_iface->dev_addr, vid,
> -			    BATADV_NULL_IFINDEX, BATADV_NO_MARK);
> +	client_added = batadv_tt_local_add(bat_priv->soft_iface,
> +					   bat_priv->soft_iface->dev_addr, vid,
> +					   BATADV_NULL_IFINDEX, BATADV_NO_MARK);
> +
> +	if (!client_added) {
> +		batadv_softif_destroy_vlan(bat_priv, vlan);
> +		return -ENOENT;
> +	}

This looks quite fragile to me. Wouldn't that mean that the vlan would also 
not be created later? When I remember it correctly, then 
batadv_interface_add_vid is the only one which would create the internal vlan 
entries in the first place (beside this odd call in batadv_hard_if_event). The 
VLAN functionality would therefore be completely broken when such a problem 
happens.

1. batadv_interface_add_vid is called
2. vlan is added
3. local entry is not added for this vlan
3.1. vlan is removed again
4. node tries to communicate over this vlan

Btw. it is really odd to see code shared between batadv_interface_add_vid and 
batadv_softif_create_vlan. Especially because batadv_interface_add_vid calls 
batadv_softif_create_vlan.

Kind regards,
	Sven
  
Antonio Quartulli May 10, 2018, 4:24 p.m. UTC | #2
On 11/05/18 00:11, Sven Eckelmann wrote:
[...]
>> @@ -683,9 +690,14 @@ static int batadv_interface_add_vid(struct net_device 
> *dev, __be16 proto,
>>  	 * flag. This must be added again, even if the vlan object already
>>  	 * exists, because the entry was deleted by kill_vid()
>>  	 */
>> -	batadv_tt_local_add(bat_priv->soft_iface,
>> -			    bat_priv->soft_iface->dev_addr, vid,
>> -			    BATADV_NULL_IFINDEX, BATADV_NO_MARK);
>> +	client_added = batadv_tt_local_add(bat_priv->soft_iface,
>> +					   bat_priv->soft_iface->dev_addr, vid,
>> +					   BATADV_NULL_IFINDEX, BATADV_NO_MARK);
>> +
>> +	if (!client_added) {
>> +		batadv_softif_destroy_vlan(bat_priv, vlan);
>> +		return -ENOENT;
>> +	}
> 
> This looks quite fragile to me. Wouldn't that mean that the vlan would also 
> not be created later? When I remember it correctly, then 
> batadv_interface_add_vid is the only one which would create the internal vlan 
> entries in the first place (beside this odd call in batadv_hard_if_event). The 
> VLAN functionality would therefore be completely broken when such a problem 
> happens.
> 
> 1. batadv_interface_add_vid is called
> 2. vlan is added
> 3. local entry is not added for this vlan
> 3.1. vlan is removed again
> 4. node tries to communicate over this vlan

A failure of this function will cause a failure of the
ndo_vlan_rx_add_vid() call made by the kernel/netlink API which in turns
will reach the userspace.

Ideally any userspace application that attempted to create the VLAN
should be in charge of re-attempting the creation again (I believe).

About your point 4, this should not be possible, because in case of
error the bat0.X interface won't be created by the kernel at all. no?


> 
> Btw. it is really odd to see code shared between batadv_interface_add_vid and 
> batadv_softif_create_vlan. Especially because batadv_interface_add_vid calls 
> batadv_softif_create_vlan.

This can probably be re-arranged a bit in the master branch.


Cheers,
  
Sven Eckelmann May 10, 2018, 5:02 p.m. UTC | #3
On Donnerstag, 10. Mai 2018 17:27:52 CEST Marek Lindner wrote:
> Amongst other things, the MAC address of
> the VLAN interface itself has to be stored.
>
> Without this change a VLAN and its infrastructure could be created
> while the interface MAC address is not stored without triggering
> any error, thus creating issues in other parts of the code.

Actually, the call to batadv_tt_local_add is using the address from the softif 
and not from the vlan interface. Just some remark because it let me think 
about the way the "-1" no-VLAN is added.

I would really like to know what exactly failed here. Does Leonardo's system 
really try to add the VLAN 0 before it is storing the info for the no-vlan 
(-1/BATADV_NO_FLAGS) and the mac address is "invalid" for VLAN 0? Or maybe 
both failed and he is just lucky that he got some non-permanent entries for 
this device on the non-vlan "vlan" - I am not sure about that.

On Donnerstag, 10. Mai 2018 18:24:50 CEST Antonio Quartulli wrote:
> A failure of this function will cause a failure of the
> ndo_vlan_rx_add_vid() call made by the kernel/netlink API which in turns
> will reach the userspace.
> 
> Ideally any userspace application that attempted to create the VLAN
> should be in charge of re-attempting the creation again (I believe).
> 
> About your point 4, this should not be possible, because in case of
> error the bat0.X interface won't be created by the kernel at all. no?

Yes, this seems to be true for batadv_interface_add_vid. Still an open 
question for the batadv_hard_if_event (which cannot return an error to the 
caller). Would be interesting to know whether Leonardo's node fails for both 
calls or only for one (see above).

Kind regards,
	Sven
  

Patch

diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index edeffcb9..f7c3a313 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -572,6 +572,7 @@  struct batadv_softif_vlan *batadv_softif_vlan_get(struct batadv_priv *bat_priv,
 int batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid)
 {
 	struct batadv_softif_vlan *vlan;
+	bool client_added;
 	int err;
 
 	vlan = batadv_softif_vlan_get(bat_priv, vid);
@@ -604,13 +605,18 @@  int batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid)
 	/* add a new TT local entry. This one will be marked with the NOPURGE
 	 * flag
 	 */
-	batadv_tt_local_add(bat_priv->soft_iface,
-			    bat_priv->soft_iface->dev_addr, vid,
-			    BATADV_NULL_IFINDEX, BATADV_NO_MARK);
+	client_added = batadv_tt_local_add(bat_priv->soft_iface,
+					   bat_priv->soft_iface->dev_addr, vid,
+					   BATADV_NULL_IFINDEX, BATADV_NO_MARK);
 
 	/* don't return reference to new softif_vlan */
 	batadv_softif_vlan_put(vlan);
 
+	if (!client_added) {
+		batadv_softif_destroy_vlan(bat_priv, vlan);
+		return -ENOENT;
+	}
+
 	return 0;
 }
 
@@ -648,6 +654,7 @@  static int batadv_interface_add_vid(struct net_device *dev, __be16 proto,
 {
 	struct batadv_priv *bat_priv = netdev_priv(dev);
 	struct batadv_softif_vlan *vlan;
+	bool client_added;
 	int ret;
 
 	/* only 802.1Q vlans are supported.
@@ -683,9 +690,14 @@  static int batadv_interface_add_vid(struct net_device *dev, __be16 proto,
 	 * flag. This must be added again, even if the vlan object already
 	 * exists, because the entry was deleted by kill_vid()
 	 */
-	batadv_tt_local_add(bat_priv->soft_iface,
-			    bat_priv->soft_iface->dev_addr, vid,
-			    BATADV_NULL_IFINDEX, BATADV_NO_MARK);
+	client_added = batadv_tt_local_add(bat_priv->soft_iface,
+					   bat_priv->soft_iface->dev_addr, vid,
+					   BATADV_NULL_IFINDEX, BATADV_NO_MARK);
+
+	if (!client_added) {
+		batadv_softif_destroy_vlan(bat_priv, vlan);
+		return -ENOENT;
+	}
 
 	return 0;
 }