[maint] batman-adv: fix TT sync flag inconsistencies

Message ID 20170623154826.7792-1-linus.luessing@c0d3.blue (mailing list archive)
State Superseded, archived
Delegated to: Simon Wunderlich
Commit Message

Linus Lüssing June 23, 2017, 3:48 p.m. UTC
  This patch fixes an issue in the translation table code potentially
leading to a TT Request + Response storm. The issue may occur for nodes
involving BLA and an inconsistent configuration of the batman-adv AP
isolation feature. However, since the new multicast optimizations, a
single, malformed packet may lead to a mesh-wide, persistent
Denial-of-Service, too.

The issue occurs because nodes are currently OR-ing the TT sync flags of
all originators announcing a specific MAC address via the
translation table. When an intermediate node now receives a TT Request
and wants to answer it on behalf of the destination node, then this
intermediate node responds with an altered flag field and a broken
CRC. The next OGM of the real destination will lead to a CRC mismatch
and trigger a TT Request and Response again.

Furthermore, the OR-ing is currently never undone as long as at least
one originator announcing the according MAC address remains, leading to
the potential persistency of this issue.

This patch fixes this issue by storing the flags used in the CRC
calculation on a per TT orig entry basis. For one thing, this makes it
possible to respond with the correct, original flags in an intermediate
TT Response. For another, it makes it possible to correctly unset sync
flags once all nodes announcing a sync flag vanish.

Fixes: fa614fd04692 ("batman-adv: fix tt_global_entries flags update")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
---
 net/batman-adv/translation-table.c | 62 ++++++++++++++++++++++++++++++--------
 net/batman-adv/types.h             |  2 ++
 2 files changed, 51 insertions(+), 13 deletions(-)
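
As an illustration of the problem described in the commit message, the following is a minimal userspace sketch, not the kernel code. It assumes a plain bitwise CRC32C in place of the kernel's crc32c(), omits the VID from the checksum for brevity, and uses hypothetical helper names (crc32c_sw, tt_entry_crc, merged_flags). The flag bit positions are assumptions for illustration; only the sync mask value 0x00F0 is taken from the review thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TT_SYNC_MASK    0x00F0u     /* mask value quoted in the review thread */
#define TT_CLIENT_WIFI  (1u << 4)   /* assumed bit position */
#define TT_CLIENT_ISOLA (1u << 5)   /* assumed bit position */

/* Plain bitwise CRC32C (Castagnoli), standing in for the kernel's crc32c(). */
static uint32_t crc32c_sw(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	assert(data != NULL);
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* CRC contribution of one client entry: sync flags first, then the MAC.
 * The VID, which the real code also covers, is left out here.
 */
static uint32_t tt_entry_crc(uint8_t sync_flags, const uint8_t mac[6])
{
	uint32_t tmp = crc32c_sw(0, &sync_flags, sizeof(sync_flags));

	return crc32c_sw(tmp, mac, 6);
}

/* Pre-patch behaviour: a single flag field shared by all originators,
 * built by OR-ing every originator's sync flags together.
 */
static uint8_t merged_flags(const uint8_t *per_orig, size_t n)
{
	uint8_t flags = 0;

	for (size_t i = 0; i < n; i++)
		flags |= per_orig[i] & TT_SYNC_MASK;
	return flags;
}
```

If one originator announces a client with the isolation flag and another announces the same client without it, the merged field carries the isolation flag for both. A checksum computed from the merged field then no longer matches what the second originator computed for its own table, which is the mismatch that keeps triggering TT Requests.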
  

Comments

Linus Lüssing June 23, 2017, 3:56 p.m. UTC | #1
On Fri, Jun 23, 2017 at 05:48:26PM +0200, Linus Lüssing wrote:
> Fixes: fa614fd04692 ("batman-adv: fix tt_global_entries flags update")
> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>

Reported-by: Simon Wunderlich <sw@simonwunderlich.de>
  
Linus Lüssing June 26, 2017, 1:30 a.m. UTC | #2
On Fri, Jun 23, 2017 at 05:48:26PM +0200, Linus Lüssing wrote:
> [...]
> This patch fixes this issue by storing the flags used in the CRC
> calculation on a per TT orig entry basis to be able to respond with
> the correct, original flags in an intermediate TT Response for one
> thing. And to be able to correctly unset sync flags once all nodes
> announcing a sync flag vanish for another.
> 
> Fixes: fa614fd04692 ("batman-adv: fix tt_global_entries flags update")
> [...]

By the way, I was able to reliably reproduce the issue within
network namespaces with the following two scripts (with the latter
running the former):

https://metameute.de/~tux/batman-adv/setup-batman-netns.sh
https://metameute.de/~tux/batman-adv/test-batman-bla.sh

The scripts create a simple three node topology like:

A === B === C
      |     |
      -------
         |
         c

A, B, C are connected in a line topology, B and C share the same
BLA backbone. There is only one host, c, in the upper mesh layer.

The scripts turn on the extended isolation just on B for a few
seconds via the isolation mark / ebtables and then disable it
again.

So, in the end, the local translation table for B and C looks
sane again, with no wifi or isolation flag. However A continues
sending TT Requests while receiving invalid TT Responses for C.

(I used "batctl tracedump" to check, as "batctl log" unfortunately
does not work with network namespaces yet)


After applying this patch, the endless TT Requests/Replies do
not appear for me anymore. And the global translation table on
all three nodes looks fine again, too (that is, the isolation flag
is gone again, as it is supposed to be, whereas it wrongfully
persisted for A without this patch).

Regards, Linus
  
Marek Lindner July 1, 2017, 3:34 p.m. UTC | #3
On Friday, June 23, 2017 5:48:26 PM HKT Linus Lüssing wrote:
> @@ -1946,6 +1977,7 @@ batadv_tt_global_dump_subentry(struct sk_buff *msg,
> u32 portid, u32 seq, struct batadv_tt_orig_list_entry *orig, bool best)
>  {
> +       u16 flags = (common->flags & (~BATADV_TT_SYNC_MASK)) | orig->flags;

Why do we need to output the combined global (partially masked) and the flags 
propagated by originator ? Shouldn't writing orig->flags be what we want ?

Cheers,
Marek
  
Linus Lüssing July 1, 2017, 4:47 p.m. UTC | #4
Hi Marek,

Thanks for having a first glance at this patch!

On Sat, Jul 01, 2017 at 11:34:37PM +0800, Marek Lindner wrote:
> On Friday, June 23, 2017 5:48:26 PM HKT Linus Lüssing wrote:
> > @@ -1946,6 +1977,7 @@ batadv_tt_global_dump_subentry(struct sk_buff *msg,
> > u32 portid, u32 seq, struct batadv_tt_orig_list_entry *orig, bool best)
> >  {
> > +       u16 flags = (common->flags & (~BATADV_TT_SYNC_MASK)) | orig->flags;
> 
> Why do we need to output the combined global (partially masked) and the flags 
> propagated by originator ? Shouldn't writing orig->flags be what we want ?

I thought it'd be helpful to be able to look at the sync flags
on a per originator basis. So that users can more easily spot nodes with
broken flags and can take action sooner.

Besides Simon's report we have had two more, independent reporters
stumbling over the same issue later. This would help people to more
easily identify which nodes and maybe what kind of nodes are usually
sending broken packets. Parsing the "batctl log" output for this is
also possible, but not as easy as looking at "batctl tg". And
"batctl log" isn't available in a batman-adv default build anyway.

Does that make sense?

Regards, Linus
  
Linus Lüssing July 1, 2017, 5:34 p.m. UTC | #5
On Sat, Jul 01, 2017 at 11:34:37PM +0800, Marek Lindner wrote:
> On Friday, June 23, 2017 5:48:26 PM HKT Linus Lüssing wrote:
> > @@ -1946,6 +1977,7 @@ batadv_tt_global_dump_subentry(struct sk_buff *msg,
> > u32 portid, u32 seq, struct batadv_tt_orig_list_entry *orig, bool best)
> >  {
> > +       u16 flags = (common->flags & (~BATADV_TT_SYNC_MASK)) | orig->flags;
> 
> Why do we need to output the combined global (partially masked) and the flags 
> propagated by originator ? Shouldn't writing orig->flags be what we want ?

With this patch orig->flags only stores the TT SYNC bits.
Every other flag in there is always 0. So printing just orig->flags
would only display the isolation and wireless flags and would omit
the roaming and temporary flags.

Compare with:

@@ -1723,7 +1753,8 @@ static bool batadv_tt_global_add(struct batadv_priv *bat_priv,
        }
 add_orig_entry:
        /* add the new orig_entry (if needed) or update it */
-       batadv_tt_global_orig_entry_add(tt_global_entry, orig_node, ttvn);
+       batadv_tt_global_orig_entry_add(tt_global_entry, orig_node, ttvn,
+                                       flags & BATADV_TT_SYNC_MASK);
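
The combination used in the dump path can be sketched in isolation as follows. This is a userspace illustration only: tt_dump_flags is a hypothetical name, and the mask value 0x00F0 is the one quoted for BATADV_TT_SYNC_MASK later in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define TT_SYNC_MASK 0x00F0u  /* value quoted for BATADV_TT_SYNC_MASK in this thread */

/* Flags reported for one originator: the non-sync bits come from the
 * shared global entry, the sync bits from that originator's own entry.
 */
static uint16_t tt_dump_flags(uint16_t common_flags, uint8_t orig_flags)
{
	/* the per-originator field is expected to hold sync bits only */
	assert((orig_flags & ~TT_SYNC_MASK) == 0);

	return (uint16_t)((common_flags & ~TT_SYNC_MASK) | orig_flags);
}
```

With a shared field of 0x0032 and a per-originator field of 0x10, the result is 0x0012: the low, non-sync bit is kept from the shared entry, while the sync nibble shows only what this particular originator announced.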

---

Regards, Linus
  
Marek Lindner July 3, 2017, 3:58 a.m. UTC | #6
On Saturday, July 1, 2017 6:47:11 PM HKT Linus Lüssing wrote:
> On Sat, Jul 01, 2017 at 11:34:37PM +0800, Marek Lindner wrote:
> > On Friday, June 23, 2017 5:48:26 PM HKT Linus Lüssing wrote:
> > > @@ -1946,6 +1977,7 @@ batadv_tt_global_dump_subentry(struct sk_buff
> > > *msg,
> > > u32 portid, u32 seq, struct batadv_tt_orig_list_entry *orig, bool best)
> > > {
> > > +       u16 flags = (common->flags & (~BATADV_TT_SYNC_MASK)) |
> > > orig->flags;
> > 
> > Why do we need to output the combined global (partially masked) and the
> > flags  propagated by originator ? Shouldn't writing orig->flags be what
> > we want ?
> I thought it'd be helpful to be able to look at the sync flags
> on a per originator basis. So that users can more easily spot nodes with
> broken flags and can take action sooner.

Yeah, makes sense. Needed some help from Antonio to not get lost in the TT 
flags jungle.

Cheers,
Marek
  
Antonio Quartulli July 3, 2017, 4:08 a.m. UTC | #7
On Sat, Jul 01, 2017 at 07:34:56PM +0200, Linus Lüssing wrote:
> On Sat, Jul 01, 2017 at 11:34:37PM +0800, Marek Lindner wrote:
> > On Friday, June 23, 2017 5:48:26 PM HKT Linus Lüssing wrote:
> > > @@ -1946,6 +1977,7 @@ batadv_tt_global_dump_subentry(struct sk_buff *msg,
> > > u32 portid, u32 seq, struct batadv_tt_orig_list_entry *orig, bool best)
> > >  {
> > > +       u16 flags = (common->flags & (~BATADV_TT_SYNC_MASK)) | orig->flags;
> > 
> > Why do we need to output the combined global (partially masked) and the flags 
> > propagated by originator ? Shouldn't writing orig->flags be what we want ?
> 
> With this patch orig->flags only stores the TT SYNC bits.
> Every other flag in there is always 0. So printing just orig->flags
> would only display the isolation and wireless flag and would omit
> the roaming and temporary flag.
> 
> Compare with:
> 
> @@ -1723,7 +1753,8 @@ static bool batadv_tt_global_add(struct batadv_priv *bat_priv,
>         }
>  add_orig_entry:
>         /* add the new orig_entry (if needed) or update it */
> -       batadv_tt_global_orig_entry_add(tt_global_entry, orig_node, ttvn);
> +       batadv_tt_global_orig_entry_add(tt_global_entry, orig_node, ttvn,
> +                                       flags & BATADV_TT_SYNC_MASK);

Linus,

each originator announces flags covered by BATADV_TT_REMOTE_MASK (0x00FF),
however you are extracting only those covered by BATADV_TT_SYNC_MASK (0x00F0).

Am I wrong, or is this preventing the other 4 REMOTE flags (0x000F) from being
set in tt_global_entry->common->flags (because you always filter them out when
updating the entry)?


Cheers,
  
Linus Lüssing July 5, 2017, 5:59 a.m. UTC | #8
Hi Antonio,

Thanks for looking at it, too :-).

On Mon, Jul 03, 2017 at 12:08:01PM +0800, Antonio Quartulli wrote:
> [...]
> 
> Linus,
> 
> each originator announces flags covered by BATADV_TT_REMOTE_MASK (0x00FF),
> however you are extracting only those covered by BATADV_TT_SYNC_MASK (0x00F0).
> 
> Am I wrong or this is preventing the other 4 REMOTE flags (0x000F) to be set in
> tt_global_entry->common->flags (because you always filter them out when updating
> the entry)?

Hm, as far as I can tell, the ROAM flag is still set here:
http://elixir.free-electrons.com/linux/latest/source/net/batman-adv/translation-table.c#L787

And unset here:
http://elixir.free-electrons.com/linux/latest/source/net/batman-adv/translation-table.c#L1720

And the TEMP flag is unset here:
http://elixir.free-electrons.com/linux/latest/source/net/batman-adv/translation-table.c#L1702

And is set... hm, ok, was the TEMP flag set by the now deleted
"common->flags |= flags;"?

But then I'm confused, how would the TEMP flag have worked before
patch fa614fd04692? Or was that patch not only supposed to fix the
WIFI but also the TEMP flag?

If so, what would you prefer, should I replace the
"common->flags |= flags" with something like a
"common->flags |= flags & (~BATADV_TT_SYNC_MASK)"? Or would you
prefer setting the TEMP flag somewhere else explicitly, similar
to how we seemingly handle the ROAM flag explicitly, too?

Regards, Linus
  
Linus Lüssing July 6, 2017, 5:05 a.m. UTC | #9
On Wed, Jul 05, 2017 at 07:59:40AM +0200, Linus Lüssing wrote:
> If so, what would you prefer, should I replace the
> "common->flags |= flags" with something like a
> "common->flags |= flags & (~BATADV_TT_SYNC_MASK)"? Or would you
> prefer setting the TEMP flag somewhere else explicitly, similarly
> like we seemingly handle the ROAM flag explicitly, too?

Went with the former approach to keep the change less invasive.
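
A rough userspace model of the former approach, under stated assumptions: the struct and function names (tt_global, tt_sync_flags, tt_announce) are hypothetical, the fixed-size array replaces the kernel's RCU-protected orig list, and the mask value 0x00F0 is the one quoted in this thread. Non-sync bits are OR-ed into the shared field as before, while sync bits are stored per originator and the shared sync bits are recomputed from those entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TT_SYNC_MASK 0x00F0u  /* value quoted for BATADV_TT_SYNC_MASK in this thread */

struct tt_global {
	uint16_t common_flags;   /* shared flag field of the global entry */
	uint8_t orig_flags[4];   /* sync flags per announcing originator */
	size_t n_orig;           /* number of originators in use */
};

/* Rebuild the shared sync bits as the OR of all per-originator sync bits,
 * leaving the non-sync bits of the shared field untouched.
 */
static void tt_sync_flags(struct tt_global *g)
{
	uint16_t flags = 0;

	for (size_t i = 0; i < g->n_orig; i++)
		flags |= g->orig_flags[i];

	g->common_flags = (uint16_t)((g->common_flags & ~TT_SYNC_MASK) | flags);
}

/* Apply an announcement from originator idx carrying the given flags. */
static void tt_announce(struct tt_global *g, size_t idx, uint16_t flags)
{
	assert(idx < g->n_orig);

	/* non-sync bits still accumulate in the shared field */
	g->common_flags |= (uint16_t)(flags & ~TT_SYNC_MASK);
	/* sync bits are remembered per originator */
	g->orig_flags[idx] = (uint8_t)(flags & TT_SYNC_MASK);
	tt_sync_flags(g);
}
```

In this model a sync bit disappears from the shared field as soon as no originator announces it anymore, whereas a non-sync bit set through this path stays until some other code path clears it explicitly.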

Regards, Linus
  

Patch

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index e1133bc..ef18b5b 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -1549,9 +1549,41 @@  batadv_tt_global_entry_has_orig(const struct batadv_tt_global_entry *entry,
 	return found;
 }
 
+/**
+ * batadv_tt_global_sync_flags - update TT sync flags
+ * @tt_global: the TT global entry to update sync flags in
+ *
+ * Updates the sync flag bits in the tt_global flag attribute with a logical
+ * OR of all sync flags from any of its TT orig entries.
+ */
+static void
+batadv_tt_global_sync_flags(struct batadv_tt_global_entry *tt_global)
+{
+	struct batadv_tt_orig_list_entry *orig_entry;
+	const struct hlist_head *head;
+	u16 flags = BATADV_NO_FLAGS;
+
+	rcu_read_lock();
+	head = &tt_global->orig_list;
+	hlist_for_each_entry_rcu(orig_entry, head, list)
+		flags |= orig_entry->flags;
+	rcu_read_unlock();
+
+	flags |= tt_global->common.flags & (~BATADV_TT_SYNC_MASK);
+	tt_global->common.flags = flags;
+}
+
+/**
+ * batadv_tt_global_orig_entry_add - add or update a TT orig entry
+ * @tt_global: the TT global entry to add an orig entry in
+ * @orig_node: the originator to add an orig entry for
+ * @ttvn: translation table version number of this changeset
+ * @flags: TT sync flags
+ */
 static void
 batadv_tt_global_orig_entry_add(struct batadv_tt_global_entry *tt_global,
-				struct batadv_orig_node *orig_node, int ttvn)
+				struct batadv_orig_node *orig_node, int ttvn,
+				u8 flags)
 {
 	struct batadv_tt_orig_list_entry *orig_entry;
 
@@ -1561,7 +1593,8 @@  batadv_tt_global_orig_entry_add(struct batadv_tt_global_entry *tt_global,
 		 * was added during a "temporary client detection"
 		 */
 		orig_entry->ttvn = ttvn;
-		goto out;
+		orig_entry->flags = flags;
+		goto sync_flags;
 	}
 
 	orig_entry = kmem_cache_zalloc(batadv_tt_orig_cache, GFP_ATOMIC);
@@ -1573,6 +1606,7 @@  batadv_tt_global_orig_entry_add(struct batadv_tt_global_entry *tt_global,
 	batadv_tt_global_size_inc(orig_node, tt_global->common.vid);
 	orig_entry->orig_node = orig_node;
 	orig_entry->ttvn = ttvn;
+	orig_entry->flags = flags;
 	kref_init(&orig_entry->refcount);
 
 	spin_lock_bh(&tt_global->list_lock);
@@ -1582,6 +1616,8 @@  batadv_tt_global_orig_entry_add(struct batadv_tt_global_entry *tt_global,
 	spin_unlock_bh(&tt_global->list_lock);
 	atomic_inc(&tt_global->orig_list_count);
 
+sync_flags:
+	batadv_tt_global_sync_flags(tt_global);
 out:
 	if (orig_entry)
 		batadv_tt_orig_list_entry_put(orig_entry);
@@ -1702,12 +1738,6 @@  static bool batadv_tt_global_add(struct batadv_priv *bat_priv,
 			common->flags &= ~BATADV_TT_CLIENT_TEMP;
 		}
 
-		/* the change can carry possible "attribute" flags like the
-		 * TT_CLIENT_WIFI, therefore they have to be copied in the
-		 * client entry
-		 */
-		common->flags |= flags;
-
 		/* If there is the BATADV_TT_CLIENT_ROAM flag set, there is only
 		 * one originator left in the list and we previously received a
 		 * delete + roaming change for this originator.
@@ -1723,7 +1753,8 @@  static bool batadv_tt_global_add(struct batadv_priv *bat_priv,
 	}
 add_orig_entry:
 	/* add the new orig_entry (if needed) or update it */
-	batadv_tt_global_orig_entry_add(tt_global_entry, orig_node, ttvn);
+	batadv_tt_global_orig_entry_add(tt_global_entry, orig_node, ttvn,
+					flags & BATADV_TT_SYNC_MASK);
 
 	batadv_dbg(BATADV_DBG_TT, bat_priv,
 		   "Creating new global tt entry: %pM (vid: %d, via %pM)\n",
@@ -1946,6 +1977,7 @@  batadv_tt_global_dump_subentry(struct sk_buff *msg, u32 portid, u32 seq,
 			       struct batadv_tt_orig_list_entry *orig,
 			       bool best)
 {
+	u16 flags = (common->flags & (~BATADV_TT_SYNC_MASK)) | orig->flags;
 	void *hdr;
 	struct batadv_orig_node_vlan *vlan;
 	u8 last_ttvn;
@@ -1975,7 +2007,7 @@  batadv_tt_global_dump_subentry(struct sk_buff *msg, u32 portid, u32 seq,
 	    nla_put_u8(msg, BATADV_ATTR_TT_LAST_TTVN, last_ttvn) ||
 	    nla_put_u32(msg, BATADV_ATTR_TT_CRC32, crc) ||
 	    nla_put_u16(msg, BATADV_ATTR_TT_VID, common->vid) ||
-	    nla_put_u32(msg, BATADV_ATTR_TT_FLAGS, common->flags))
+	    nla_put_u32(msg, BATADV_ATTR_TT_FLAGS, flags))
 		goto nla_put_failure;
 
 	if (best && nla_put_flag(msg, BATADV_ATTR_FLAG_BEST))
@@ -2589,6 +2621,7 @@  static u32 batadv_tt_global_crc(struct batadv_priv *bat_priv,
 				unsigned short vid)
 {
 	struct batadv_hashtable *hash = bat_priv->tt.global_hash;
+	struct batadv_tt_orig_list_entry *tt_orig;
 	struct batadv_tt_common_entry *tt_common;
 	struct batadv_tt_global_entry *tt_global;
 	struct hlist_head *head;
@@ -2627,8 +2660,9 @@  static u32 batadv_tt_global_crc(struct batadv_priv *bat_priv,
 			/* find out if this global entry is announced by this
 			 * originator
 			 */
-			if (!batadv_tt_global_entry_has_orig(tt_global,
-							     orig_node))
+			tt_orig = batadv_tt_global_orig_entry_find(tt_global,
+								   orig_node);
+			if (!tt_orig)
 				continue;
 
 			/* use network order to read the VID: this ensures that
@@ -2640,10 +2674,12 @@  static u32 batadv_tt_global_crc(struct batadv_priv *bat_priv,
 			/* compute the CRC on flags that have to be kept in sync
 			 * among nodes
 			 */
-			flags = tt_common->flags & BATADV_TT_SYNC_MASK;
+			flags = tt_orig->flags;
 			crc_tmp = crc32c(crc_tmp, &flags, sizeof(flags));
 
 			crc ^= crc32c(crc_tmp, tt_common->addr, ETH_ALEN);
+
+			batadv_tt_orig_list_entry_put(tt_orig);
 		}
 		rcu_read_unlock();
 	}
diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
index ea43a64..a627958 100644
--- a/net/batman-adv/types.h
+++ b/net/batman-adv/types.h
@@ -1260,6 +1260,7 @@  struct batadv_tt_global_entry {
  * struct batadv_tt_orig_list_entry - orig node announcing a non-mesh client
  * @orig_node: pointer to orig node announcing this non-mesh client
  * @ttvn: translation table version number which added the non-mesh client
+ * @flags: per orig entry TT sync flags
  * @list: list node for batadv_tt_global_entry::orig_list
  * @refcount: number of contexts the object is used
  * @rcu: struct used for freeing in an RCU-safe manner
@@ -1267,6 +1268,7 @@  struct batadv_tt_global_entry {
 struct batadv_tt_orig_list_entry {
 	struct batadv_orig_node *orig_node;
 	u8 ttvn;
+	u8 flags;
 	struct hlist_node list;
 	struct kref refcount;
 	struct rcu_head rcu;