batman-adv: handle race condition for claims also in batadv_bla_rx

Message ID 20170428202610.27022-1-sw@simonwunderlich.de (mailing list archive)
State Rejected, archived
Headers

Commit Message

Simon Wunderlich April 28, 2017, 8:26 p.m. UTC
  From: Andreas Pape <apape@phoenixcontact.com>

Like in the case of the patch for batadv_bla_tx to handle a race
condition when claiming a mac address for bla, a similar situation
can occur when claiming is triggered via batadv_bla_rx. This patch
solves this with a similar approach as for batadv_bla_tx.

Signed-off-by: Andreas Pape <apape@phoenixcontact.com>
---
 net/batman-adv/bridge_loop_avoidance.c |   31 ++++++++++++++++++++-----------
 net/batman-adv/translation-table.c     |   26 ++++++++++++++++++++++++++
 net/batman-adv/translation-table.h     |    3 +++
 3 files changed, 49 insertions(+), 11 deletions(-)
  

Comments

Simon Wunderlich May 9, 2017, 3:50 p.m. UTC | #1
On Friday, April 28, 2017 10:26:10 PM CEST Simon Wunderlich wrote:
> From: Andreas Pape <apape@phoenixcontact.com>
> 
> Like in the case of the patch for batadv_bla_tx to handle a race
> condition when claiming a mac address for bla, a similar situation
> can occur when claiming is triggered via batadv_bla_rx. This patch
> solves this with a similar approach as for batadv_bla_tx.
> 
> Signed-off-by: Andreas Pape <apape@phoenixcontact.com>

Hi Andreas,

thanks again for the patch - in general, I think this looks good, although I 
don't follow completely where you saw that. Can you describe the scenario a 
little more?

We usually don't process packets from the mesh sent by nodes on the same LAN 
segment - we look at the originator and check the BLA group using 
batadv_check_claim_group().

There are two things which we could improve documentation-wise:

1.) Have some kernel doc  batadv_tt_local_has_timed_out - we want to have 
kerneldoc for every new function we add.

2.) Describe the scenario in a comment in batadv_bla_rx(). I find the comment 
not too convincing, see above.

Again, if this situation is really happening then I believe this patch 
provides a good solution which I'd like to adopt. :)

Thank you,
     Simon
  
Andreas Pape May 10, 2017, 5:45 a.m. UTC | #2
Hi Simon,

> Von: Simon Wunderlich <sw@simonwunderlich.de>
> An: Andreas Pape <apape@phoenixcontact.com>
> Kopie: b.a.t.m.a.n@lists.open-mesh.org
> Datum: 09.05.2017 17:50
> Betreff: Re: [B.A.T.M.A.N.] [PATCH] batman-adv: handle race
> condition for claims also in batadv_bla_rx
>
> On Friday, April 28, 2017 10:26:10 PM CEST Simon Wunderlich wrote:
> > From: Andreas Pape <apape@phoenixcontact.com>
> >
> > Like in the case of the patch for batadv_bla_tx to handle a race
> > condition when claiming a mac address for bla, a similar situation
> > can occur when claiming is triggered via batadv_bla_rx. This patch
> > solves this with a similar approach as for batadv_bla_tx.
> >
> > Signed-off-by: Andreas Pape <apape@phoenixcontact.com>
>
> Hi Andreas,
>
> thanks again for the patch - in general, I think this looks good,
although I
> don't follow completely where you saw that. Can you describe the
scenario a
> little more?
>
> We usually don't process packets from the mesh sent by nodes on the same
LAN
> segment - we look at the originator and check the BLA group using
> batadv_check_claim_group().
>
> There are two things which we could improve documentation-wise:
>
> 1.) Have some kernel doc  batadv_tt_local_has_timed_out - we want to
have
> kerneldoc for every new function we add.
>

Sorry, I will add that if this patch really turns out to be useful (see
2.).

> 2.) Describe the scenario in a comment in batadv_bla_rx(). I find the
comment
> not too convincing, see above.
>

As in my earlier patches I use a setup which needs bla. Up to recently I
experimented with
batman-adv-2014.4 and I struggled a lot with looping packets. At least for
version 2014.4
I found out that the patches I mailed last year where not sufficient to
prevent looping
packets between the mesh and the common Ethernet backbone completely under
all conditions. By enabling the debugging
I found that the looping packets always correlated with the claiming of
devices. I got rid of the
looping packets (error message from the kernel "received packet with own
mac address as source address")almost
completely after adding this additional patch. I now only get the kernel
error message about
receiving packets with own mac address sometimes when a mesh node is added
to the network for very few packets
which is ok for my application. But without this patch in 2014.4 I got
these messages randomly during notmal operation of the network.

I have to admit that I did not retest this with the current master or
version 2017.0.1. I simply
integrated the patch and I can at least confirm that bla works as reliable
as in the 2014.4 case with
this patch. I agree that this is no proof that this patch is still really
needed. I think I'll remove it
from my test setup and come back with my results.

Thanks for the feedback and best regards,
        Andreas



..................................................................
PHOENIX CONTACT ELECTRONICS GmbH

Sitz der Gesellschaft / registered office of the company: 31812 Bad Pyrmont
USt-Id-Nr.: DE811742156
Amtsgericht Hannover HRB 100528 / district court Hannover HRB 100528
Geschäftsführer / Executive Board: Ulrich Leidecker, Christoph Leifer
  
Sven Eckelmann June 9, 2019, 11:28 a.m. UTC | #3
On Wednesday, 10 May 2017 07:45:50 CEST Andreas Pape wrote:
[...]
> I have to admit that I did not retest this with the current master or
> version 2017.0.1. I simply
> integrated the patch and I can at least confirm that bla works as reliable
> as in the 2014.4 case with
> this patch. I agree that this is no proof that this patch is still really
> needed. I think I'll remove it
> from my test setup and come back with my results.

What is the state of the tests?

Kind regards,
	Sven
  
Andreas Pape July 2, 2019, 8:39 a.m. UTC | #4
Hi Sven,

sorry for my late reply, but I finally found some time for testing. I used 
batman-adv version 2017.2
without this patch and I do not see any negative effects on the way bla 
works in my testsetup.
Therefore this patch doesn't seem to make sense anymore.

Sven Eckelmann <sven@narfation.org> schrieb am 09.06.2019 13:28:24:

> Von: Sven Eckelmann <sven@narfation.org>
> An: b.a.t.m.a.n@lists.open-mesh.org
> Kopie: andreas pape <apape@phoenixcontact.com>, simon wunderlich 
> <sw@simonwunderlich.de>
> Datum: 09.06.2019 13:28
> Betreff: Re: [B.A.T.M.A.N.] Antwort: Re: [PATCH] batman-adv: handle 
> race condition for claims also in batadv_bla_rx
> 
> On Wednesday, 10 May 2017 07:45:50 CEST Andreas Pape wrote:
> [...]
> > I have to admit that I did not retest this with the current master or
> > version 2017.0.1. I simply
> > integrated the patch and I can at least confirm that bla works as 
reliable
> > as in the 2014.4 case with
> > this patch. I agree that this is no proof that this patch is still 
really
> > needed. I think I'll remove it
> > from my test setup and come back with my results.
> 
> What is the state of the tests?
> 
> Kind regards,
>    Sven[Anhang "signature.asc" gelöscht von Andreas Pape/Pyr/DE/
> Phoenix Contact] 

Kind regards,
Andreas



..................................................................
PHOENIX CONTACT ELECTRONICS GmbH

Sitz der Gesellschaft / registered office of the company: 31812 Bad Pyrmont
USt-Id-Nr.: DE811742156
Amtsgericht Hannover HRB 100528 / district court Hannover HRB 100528
Geschäftsführer / Executive Board: Ulrich Leidecker, Christoph Leifer
  
Sven Eckelmann July 2, 2019, 8:41 a.m. UTC | #5
On Tuesday, 2 July 2019 10:39:04 CEST Andreas Pape wrote:
> Hi Sven,
> 
> sorry for my late reply, but I finally found some time for testing. I used 
> batman-adv version 2017.2
> without this patch and I do not see any negative effects on the way bla 
> works in my testsetup.
> Therefore this patch doesn't seem to make sense anymore.

Awesome. Thanks that you took the time to test this :)

Kind regards,
	Sven
  

Patch

diff --git a/net/batman-adv/bridge_loop_avoidance.c b/net/batman-adv/bridge_loop_avoidance.c
index d07e89e..cab8980 100644
--- a/net/batman-adv/bridge_loop_avoidance.c
+++ b/net/batman-adv/bridge_loop_avoidance.c
@@ -1847,19 +1847,28 @@  bool batadv_bla_rx(struct batadv_priv *bat_priv, struct sk_buff *skb,
 
 	if (!claim) {
 		/* possible optimization: race for a claim */
-		/* No claim exists yet, claim it for us!
+		/* Make sure this packet is not looping back
+		 * from our own backbone.
 		 */
 
-		batadv_dbg(BATADV_DBG_BLA, bat_priv,
-			   "bla_rx(): Unclaimed MAC %pM found. Claim it. Local: %s\n",
-			   ethhdr->h_source,
-			   batadv_is_my_client(bat_priv,
-					       ethhdr->h_source, vid) ?
-			   "yes" : "no");
-		batadv_handle_claim(bat_priv, primary_if,
-				    primary_if->net_dev->dev_addr,
-				    ethhdr->h_source, vid);
-		goto allow;
+		if (batadv_tt_local_has_timed_out(bat_priv, ethhdr->h_source,
+						  vid, 100)) {
+			/* No claim exists yet, claim it for us!
+			 */
+			batadv_dbg(BATADV_DBG_BLA, bat_priv,
+				   "bla_rx(): Unclaimed MAC %pM found. Claim it. Local: %s\n",
+				   ethhdr->h_source,
+				   batadv_is_my_client(bat_priv,
+						       ethhdr->h_source, vid) ?
+				   "yes" : "no");
+
+			batadv_handle_claim(bat_priv, primary_if,
+					    primary_if->net_dev->dev_addr,
+					    ethhdr->h_source, vid);
+			goto allow;
+		} else {
+			goto handled;
+		}
 	}
 
 	/* if it is our own claim ... */
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index e75b493..b908195 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -4380,3 +4380,29 @@  void batadv_tt_cache_destroy(void)
 	kmem_cache_destroy(batadv_tt_req_cache);
 	kmem_cache_destroy(batadv_tt_roam_cache);
 }
+
+bool batadv_tt_local_has_timed_out(struct batadv_priv *bat_priv,
+				   const u8 *addr, unsigned short vid,
+				   unsigned int timeout)
+{
+	struct batadv_tt_local_entry *tt_local_entry;
+	bool ret = true;
+
+	tt_local_entry = batadv_tt_local_hash_find(bat_priv, addr, vid);
+	if (!tt_local_entry)
+		goto out;
+	/* Check if the client has been logically deleted (but is kept for
+	 * consistency purpose)
+	 */
+	if ((tt_local_entry->common.flags & BATADV_TT_CLIENT_PENDING) ||
+	    (tt_local_entry->common.flags & BATADV_TT_CLIENT_ROAM))
+		goto out;
+	/* Check that the tt_local_entry has a certain age */
+	if (!batadv_has_timed_out(tt_local_entry->last_seen, timeout))
+		ret = false;
+
+out:
+	if (tt_local_entry)
+		batadv_tt_local_entry_put(tt_local_entry);
+	return ret;
+}
diff --git a/net/batman-adv/translation-table.h b/net/batman-adv/translation-table.h
index 411d586..b05d0d8 100644
--- a/net/batman-adv/translation-table.h
+++ b/net/batman-adv/translation-table.h
@@ -65,5 +65,8 @@  bool batadv_tt_global_is_isolated(struct batadv_priv *bat_priv,
 
 int batadv_tt_cache_init(void);
 void batadv_tt_cache_destroy(void);
+bool batadv_tt_local_has_timed_out(struct batadv_priv *bat_priv,
+				   const u8 *addr, unsigned short vid,
+				   unsigned int timeout);
 
 #endif /* _NET_BATMAN_ADV_TRANSLATION_TABLE_H_ */