[RFC] batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs

Message ID 20180509160605.23223-1-mareklindner@neomailbox.ch (mailing list archive)
State RFC, archived
Delegated to: Simon Wunderlich
Headers
Series [RFC] batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs |

Commit Message

Marek Lindner May 9, 2018, 4:06 p.m. UTC
  A translation table TVLV changset sent with an OGM consists
of a number of headers (one per VLAN) plus the changeset
itself (addition and/or deletion of entries).

The per-VLAN headers are used by OGM recipients for consistency
checks. Said consistency check might determine that a full
translation table request is needed to restore consistency. If
the TT sender adds per-VLAN headers of empty VLANs into the OGM,
recipients are led to believe to have reached an inconsistent
state and thus request a full table update. The full table does
not contain empty VLANs (due to missing entries) the cycle
restarts when the next OGM is issued.

Consequently, when the translation table TVLV headers are
composed, empty VLANs are to be excluded.

Fixes: 7ea7b4a14275 ("batman-adv: make the TT CRC logic VLAN specific")

Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
---
 net/batman-adv/translation-table.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
  

Comments

Sven Eckelmann May 9, 2018, 5:21 p.m. UTC | #1
On Mittwoch, 9. Mai 2018 18:06:05 CEST Marek Lindner wrote:
[...]
> Fixes: 7ea7b4a14275 ("batman-adv: make the TT CRC logic VLAN specific")

I know, my fault but please use 21a57f6e7a3b instead of 7ea7b4a14275. The 
latter is in the official Linux repository and not batman-adv.git.

[...]
> diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
> index 0225616d..def8d109 100644
> --- a/net/batman-adv/translation-table.c
> +++ b/net/batman-adv/translation-table.c
> @@ -855,7 +855,8 @@ batadv_tt_prepare_tvlv_global_data(struct batadv_orig_node *orig_node,

What about batadv_tt_prepare_tvlv_local_data?

Kind regards,
	Sven
  
Sven Eckelmann May 9, 2018, 6:20 p.m. UTC | #2
On Mittwoch, 9. Mai 2018 19:21:32 CEST Sven Eckelmann wrote:
[...]
> [...]
> > diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
> > index 0225616d..def8d109 100644
> > --- a/net/batman-adv/translation-table.c
> > +++ b/net/batman-adv/translation-table.c
> > @@ -855,7 +855,8 @@ batadv_tt_prepare_tvlv_global_data(struct batadv_orig_node *orig_node,
> 
> What about batadv_tt_prepare_tvlv_local_data?

Let me rephrase that: Are we sure that this is from intermediate nodes or is 
there still the possibility that it is from the actual originator? Regarding 
the latter: Should batadv_tt_prepare_tvlv_local_data be modified or will you 
wait for Leonardo to debug why it happens in the first place?

Kind regards,
	Sven
  
Sven Eckelmann May 9, 2018, 8:38 p.m. UTC | #3
On Mittwoch, 9. Mai 2018 20:20:58 CEST Sven Eckelmann wrote:
> On Mittwoch, 9. Mai 2018 19:21:32 CEST Sven Eckelmann wrote:
> [...]
> > [...]
> > > diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
> > > index 0225616d..def8d109 100644
> > > --- a/net/batman-adv/translation-table.c
> > > +++ b/net/batman-adv/translation-table.c
> > > @@ -855,7 +855,8 @@ batadv_tt_prepare_tvlv_global_data(struct batadv_orig_node *orig_node,
> > 
> > What about batadv_tt_prepare_tvlv_local_data?
> 
> Let me rephrase that: Are we sure that this is from intermediate nodes or is 
> there still the possibility that it is from the actual originator? Regarding 
> the latter: Should batadv_tt_prepare_tvlv_local_data be modified or will you 
> wait for Leonardo to debug why it happens in the first place?

Wait. When it is a problem of an intermediate node that doesn't have (for 
whatever reason) not a valid copy of the entries, shouldn't the intermediate
node just ignore the request and not answer it? Otherwise the next OGM with 
the data from batadv_tt_prepare_tvlv_local_data from the actual node will 
just trigger the next request by the receiver.

And the next thing is that your commit message is talking about OGMs all the 
time. But the OGM TVLV are not filled by the 
batadv_tt_prepare_tvlv_global_data but by the 
batadv_tt_prepare_tvlv_local_data. I am unable to find out what you actually
doing here and why you do it.

Kind regards,
	Sven
  
lemoer May 9, 2018, 9:34 p.m. UTC | #4
On 05/09/2018 10:38 PM, Sven Eckelmann wrote:
> Wait. When it is a problem of an intermediate node that doesn't have (for 
> whatever reason) not a valid copy of the entries, shouldn't the intermediate
> node just ignore the request and not answer it?

Yes. It's forwarded here

https://elixir.bootlin.com/linux/v4.16.8/source/net/batman-adv/translation-table.c#L4245

if the check here

https://elixir.bootlin.com/linux/v4.16.8/source/net/batman-adv/translation-table.c#L3199

fails. I also wrote a debug patch which adds counters which
differentiate between crc misses, ttvn misses and other misses. I can
also prepare them, when it is desired.

> And the next thing is that your commit message is talking about OGMs all the 
> time. But the OGM TVLV are not filled by the 
> batadv_tt_prepare_tvlv_global_data but by the 
> batadv_tt_prepare_tvlv_local_data. I am unable to find out what you actually
> doing here and why you do it.

Yes. IIRC the checksums in the OGMs are actually causing the issue. The
checksums in the TT responses are not evaluated for now.

Kind regards,
Leonardo Mörlein
  
Antonio Quartulli May 10, 2018, 9:12 a.m. UTC | #5
On 10/05/18 04:38, Sven Eckelmann wrote:
>> Let me rephrase that: Are we sure that this is from intermediate nodes or is 
>> there still the possibility that it is from the actual originator? Regarding 
>> the latter: Should batadv_tt_prepare_tvlv_local_data be modified or will you 
>> wait for Leonardo to debug why it happens in the first place?
> 
> Wait. When it is a problem of an intermediate node that doesn't have (for 
> whatever reason) not a valid copy of the entries, shouldn't the intermediate
> node just ignore the request and not answer it? Otherwise the next OGM with 
> the data from batadv_tt_prepare_tvlv_local_data from the actual node will 
> just trigger the next request by the receiver.
> 
> And the next thing is that your commit message is talking about OGMs all the 
> time. But the OGM TVLV are not filled by the 
> batadv_tt_prepare_tvlv_global_data but by the 
> batadv_tt_prepare_tvlv_local_data. I am unable to find out what you actually
> doing here and why you do it.

I believe batadv_tt_prepare_tvlv_local_data() was the actual function
that Marek wanted to modify. But we probably spent too much time digging
into translation-table.c these days and I managed to confuse him enough.


Thanks for pointing this out, Sven!


Cheers,
  
Marek Lindner May 10, 2018, 10:28 a.m. UTC | #6
On Thursday, May 10, 2018 2:20:58 AM HKT Sven Eckelmann wrote:
> On Mittwoch, 9. Mai 2018 19:21:32 CEST Sven Eckelmann wrote:
> [...]
> 
> > [...]
> > 
> > > diff --git a/net/batman-adv/translation-table.c
> > > b/net/batman-adv/translation-table.c index 0225616d..def8d109 100644
> > > --- a/net/batman-adv/translation-table.c
> > > +++ b/net/batman-adv/translation-table.c
> > > @@ -855,7 +855,8 @@ batadv_tt_prepare_tvlv_global_data(struct
> > > batadv_orig_node *orig_node,> 
> > What about batadv_tt_prepare_tvlv_local_data?
> 
> Let me rephrase that: Are we sure that this is from intermediate nodes or is
> there still the possibility that it is from the actual originator?
> Regarding the latter: Should batadv_tt_prepare_tvlv_local_data be modified
> or will you wait for Leonardo to debug why it happens in the first place?

We believe to have an idea why this happens in the first place and I have 
another patch in the works which Leonardo should test in his setup.

Cheers,
Marek
  

Patch

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index 0225616d..def8d109 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -855,7 +855,8 @@  batadv_tt_prepare_tvlv_global_data(struct batadv_orig_node *orig_node,
 				   s32 *tt_len)
 {
 	u16 num_vlan = 0;
-	u16 num_entries = 0;
+	u16 vlan_entries = 0;
+	u16 total_entries = 0;
 	u16 change_offset;
 	u16 tvlv_len;
 	struct batadv_tvlv_tt_vlan_data *tt_vlan;
@@ -864,8 +865,12 @@  batadv_tt_prepare_tvlv_global_data(struct batadv_orig_node *orig_node,
 
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(vlan, &orig_node->vlan_list, list) {
+		vlan_entries = atomic_read(&vlan->tt.num_entries);
+		if (vlan_entries < 1)
+			continue;
+
 		num_vlan++;
-		num_entries += atomic_read(&vlan->tt.num_entries);
+		total_entries += vlan_entries;
 	}
 
 	change_offset = sizeof(**tt_data);
@@ -873,7 +878,7 @@  batadv_tt_prepare_tvlv_global_data(struct batadv_orig_node *orig_node,
 
 	/* if tt_len is negative, allocate the space needed by the full table */
 	if (*tt_len < 0)
-		*tt_len = batadv_tt_len(num_entries);
+		*tt_len = batadv_tt_len(total_entries);
 
 	tvlv_len = *tt_len;
 	tvlv_len += change_offset;
@@ -890,6 +895,10 @@  batadv_tt_prepare_tvlv_global_data(struct batadv_orig_node *orig_node,
 
 	tt_vlan = (struct batadv_tvlv_tt_vlan_data *)(*tt_data + 1);
 	hlist_for_each_entry_rcu(vlan, &orig_node->vlan_list, list) {
+		vlan_entries = atomic_read(&vlan->tt.num_entries);
+		if (vlan_entries < 1)
+			continue;
+
 		tt_vlan->vid = htons(vlan->vid);
 		tt_vlan->crc = htonl(vlan->tt.crc);