Message ID | 20180506195559.32602-1-me@irrelefant.net (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Antonio Quartulli |
Headers |
Return-Path: <b.a.t.m.a.n-bounces@lists.open-mesh.org> X-Original-To: patchwork@open-mesh.org Delivered-To: patchwork@open-mesh.org Received: from open-mesh.org (localhost [IPv6:::1]) by open-mesh.org (Postfix) with ESMTP id B9CC48145A; Sun, 6 May 2018 22:32:09 +0200 (CEST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=80.67.31.31; helo=smtprelay04.ispgateway.de; envelope-from=me@irrelefant.net; receiver=<UNKNOWN> X-Greylist: delayed 770 seconds by postgrey-1.36 at open-mesh.org; Sun, 06 May 2018 22:21:52 CEST Received: from smtprelay04.ispgateway.de (smtprelay04.ispgateway.de [80.67.31.31]) by open-mesh.org (Postfix) with ESMTPS id 7CDB680107 for <b.a.t.m.a.n@lists.open-mesh.org>; Sun, 6 May 2018 22:21:52 +0200 (CEST) Received: from [79.232.139.20] (helo=orange.lan) by smtprelay04.ispgateway.de with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.90_1) (envelope-from <me@irrelefant.net>) id 1fFPlL-0000T5-Ay; Sun, 06 May 2018 21:56:03 +0200 From: =?utf-8?q?Leonardo_M=C3=B6rlein?= <me@irrelefant.net> To: b.a.t.m.a.n@lists.open-mesh.org Date: Sun, 6 May 2018 21:55:59 +0200 Message-Id: <20180506195559.32602-1-me@irrelefant.net> X-Mailer: git-send-email 2.17.0 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Df-Sender: bWVAaXJyZWxlZmFudC5uZXQ= X-Mailman-Approved-At: Sun, 06 May 2018 22:32:08 +0200 Subject: [B.A.T.M.A.N.] [RFC PATCH] batman-adv: mitigate issue when empty vlan is received X-BeenThere: b.a.t.m.a.n@lists.open-mesh.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: The list for a Better Approach To Mobile Ad-hoc Networking <b.a.t.m.a.n.lists.open-mesh.org> List-Unsubscribe: <https://lists.open-mesh.org/mm/options/b.a.t.m.a.n>, <mailto:b.a.t.m.a.n-request@lists.open-mesh.org?subject=unsubscribe> List-Archive: <http://lists.open-mesh.org/pipermail/b.a.t.m.a.n/> List-Post: <mailto:b.a.t.m.a.n@lists.open-mesh.org> List-Help: <mailto:b.a.t.m.a.n-request@lists.open-mesh.org?subject=help> List-Subscribe: <https://lists.open-mesh.org/mm/listinfo/b.a.t.m.a.n>, <mailto:b.a.t.m.a.n-request@lists.open-mesh.org?subject=subscribe> Reply-To: The list for a Better Approach To Mobile Ad-hoc Networking <b.a.t.m.a.n@lists.open-mesh.org> Cc: =?utf-8?q?Leonardo_M=C3=B6rlein?= <me@irrelefant.net> Errors-To: b.a.t.m.a.n-bounces@lists.open-mesh.org Sender: "B.A.T.M.A.N" <b.a.t.m.a.n-bounces@lists.open-mesh.org> |
Series |
[RFC] batman-adv: mitigate issue when empty vlan is received
|
|
Commit Message
Leonardo Mörlein
May 6, 2018, 7:55 p.m. UTC
This patch is not a fix for the issue itself, but a fix for the
other nodes, which are also influenced by the issue.
- Some (affected) nodes send TT announcements for an empty vlan (for
now only seen with vlan_id = 0).
- This behaviour is a bug! Batman-adv nodes must not send TT
announcements for empty vlans.
- The receiving batman-adv can not handle incoming TT announcements
with empty vlans. (The crc check routine batadv_tt_global_check_crc()
fails.)
- As the receiving node thinks, the crc is broken, it does a full
table request then.
- The announcing node sends a TT full table to the receiving node
then, which also contains the empty vlan, so
*batadv_tt_global_check_crc()* fails again on the receiver side.
- This causes a lot of unneeded TT traffic. In Freifunk Hannover we
decreased the amount of from ~20kpp/s to ~4kpp/s on our supernodes.
We have ~800 nodes, which are connected via vpn to one of six
supernodes. Those supernodes are connected with each other in a fully
connected mesh.
Signed-off-by: Leonardo Mörlein <me@irrelefant.net>
---
net/batman-adv/translation-table.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
Comments
On Sonntag, 6. Mai 2018 21:55:59 CEST Leonardo Mörlein wrote: > This patch is not a fix for the issue itself, but a fix for the > other nodes, which are also influenced by the issue. > > - Some (affected) nodes send TT announcements for an empty vlan (for > now only seen with vlan_id = 0). > - This behaviour is a bug! Batman-adv nodes must not send TT > announcements for empty vlans. > - The receiving batman-adv can not handle incoming TT announcements > with empty vlans. (The crc check routine batadv_tt_global_check_crc() > fails.) > - As the receiving node thinks, the crc is broken, it does a full > table request then. > - The announcing node sends a TT full table to the receiving node > then, which also contains the empty vlan, so > *batadv_tt_global_check_crc()* fails again on the receiver side. > - This causes a lot of unneeded TT traffic. In Freifunk Hannover we > decreased the amount of from ~20kpp/s to ~4kpp/s on our supernodes. > We have ~800 nodes, which are connected via vpn to one of six > supernodes. Those supernodes are connected with each other in a fully > connected mesh. Can you attach a (small) pcap of actual traffic which show a packet which triggers this behavior? Thanks, Sven
On 05/07/2018 08:32 AM, Sven Eckelmann wrote: > Can you attach a (small) pcap of actual traffic which show a packet which > triggers this behavior? For sure. See for the originator 5e:25:f7:13:2d:eb and vid=0x8000 in the attached pcap. I selected the interested packets (one ogm, one full table request and one full table response). Kind regards, Leonardo Mörlein
On Montag, 7. Mai 2018 10:06:20 CEST me wrote: > On 05/07/2018 08:32 AM, Sven Eckelmann wrote: > > Can you attach a (small) pcap of actual traffic which show a packet which > > triggers this behavior? > > For sure. See for the originator 5e:25:f7:13:2d:eb and vid=0x8000 in the > attached pcap. I selected the interested packets (one ogm, one full > table request and one full table response). Thanks, These are always from intermediate nodes. I was hoping to see the initial packet which "introduced" the incorrect vid=0x8000 without entry in the first place to better understand the problem which causes this. At least I would have hoped that this might help the people at WBM to have a better discussion about it. At least to understand whether this is from an intermediate node or from the sender. On Sonntag, 6. Mai 2018 21:55:59 CEST Leonardo Mörlein wrote: [...] > - The receiving batman-adv can not handle incoming TT announcements > with empty vlans. (The crc check routine batadv_tt_global_check_crc() > fails.) This part is not 100% clear in the commit message. Right now it just sounds like softif_vlan_list (see batadv_tt_prepare_tvlv_local_data) has an "empty" vlan entry (an entry without mac addresses). Which brings me to the point that batadv_softif_create_vlan initializes new vlan's with the CRC 0x0 before it adds it to the list (see tt/ batadv_vlan_tt in batadv_softif_vlan). We know now that the function batadv_tt_local_update_crc is responsible for updating the crc (holding the tt.commit_lock) on the sender site (for non-intermediate nodes). And this crc is calculated over the bat_priv->tt.local_hash - which (according to the crc 0x0) doesn't have a single (non-NEW) entry for this VID. The return (new crc) is therefore 0x0 and it will be submitted this way. The same should be true for the receiver. The batadv_tt_global_update_crc goes over the list of entries for this vid. And we just received a full table which only contains non-vlan entries and no vlan 0 entries. My assumption would therefore be that the crc on the receiver should also be 0x0 when batadv_tt_global_update_crc was done. But if it is not then it would mean that there was still some entry on the receiver which batadv_tt_global_crc used to calculate the crc for this vid or the crc was just not calculated for it (because it doesn't exist). The latter would match the content of the full table reply in your pcap, the comment in the patch and the info info I got from Antonio: The vid entry will only be created in the originator vlan_list when an entry was received (I haven't checked this). This would sound to me like the vlan should just be created instead of adding this workaround. But maybe you should discuss the details here with Antonio because at the end it is just about the different ways of handling "!vlan" case from your patch. Btw. Antonio, wasn't the VLAN 0 always created on each device in the local table by the VLAN driver? See the message [ 29.866038] 8021q: adding VLAN 0 to HW filter on device bat0 when batman-adv attaches itself to the vlan monitoring. But usually, there is at least one entry in it (the bat0 mac address). Not sure why this isn't the case here. At least batadv_interface_add_vid should add it at the end. Maybe there was an error before the entry was created? Would be interesting to know where exactly. Kind regards, Sven
n Montag, 7. Mai 2018 12:23:57 CEST Sven Eckelmann wrote: > This would sound to me like the vlan should just be > created instead of adding this workaround. But maybe you should discuss the > details here with Antonio because at the end it is just about the different > ways of handling "!vlan" case from your patch. Just had a look at the cleanup code for vlans and it seems like each vlan in vlan_list must contain at least one entry (according to vlan->tt.num_entries) and otherwise batadv_tt_global_size_mod will drop it. So it isn't as easy as just adding a simple vlan for each received vlan entry via TT response. So your patch (or a variant of it) will most likely be the way to go for receivers. Now to your commit message: On Sonntag, 6. Mai 2018 21:55:59 CEST Leonardo Mörlein wrote: > batman-adv: mitigate issue when empty vlan is received Something more specific would be nice. Maybe somebody else has a better idea but what about: batman-adv: Mitigate TT loop when receiving empty VLAN > This patch is not a fix for the issue itself, but a fix for the > other nodes, which are also influenced by the issue. I personally don't like to read "this patch" in commit messages. This part should maybe be reformulated and put at the end (before the Fixes line). Something like: While the problem is caused by the sender, the receiver can mitigate the problem by ignoring (potentially) empty VLANs during the CRC check. [...] > - This behaviour is a bug! Batman-adv nodes must not send TT > announcements for empty vlans. Where is this defined as incorrect message? But yes, the current implementation cannot handle it. > - The receiving batman-adv can not handle incoming TT announcements > with empty vlans. (The crc check routine batadv_tt_global_check_crc() > fails.) This is to vague as discussed earlier. Maybe something more like: - The receiving batman-adv can not handle incoming TT announcements with empty vlans. No new struct batadv_orig_node_vlan entry will be added to batadv_orig_node->vlan_list. This will cause the crc check routine batadv_tt_global_check_crc() to fail because it cannot find the local representation of this empty vlan. [...] > - This causes a lot of unneeded TT traffic. In Freifunk Hannover we > decreased the amount of from ~20kpp/s to ~4kpp/s on our supernodes. > We have ~800 nodes, which are connected via vpn to one of six > supernodes. Those supernodes are connected with each other in a fully > connected mesh. This part doesn't seem to fit in the list and might just be a new paragraph at the end to summarize the impact of the problem. A "Fixes: " line is missing here. Please check whether this sounds about right: Fixes: 7ea7b4a14275 ("batman-adv: make the TT CRC logic VLAN specific") Kind regards, Sven
On Monday, May 7, 2018 3:55:59 AM HKT Leonardo Mörlein wrote: > - if (!vlan) > - return false; > + if (!vlan) { > + /* Due to a bug, some originators send TT > + * announcements for empty vlans. As the receiving > + * nodes will ignore those empty vlans (they do not > + * add a batadv_orig_node_vlan into the transglobal > + * for the originating node), crc check will fail > + * here. To circumvent this issue, we skip the > + * verification for the vlan if the crc is > + * equal to 0x00000000. > + */ > + if (tt_vlan_tmp->crc == 0x00000000) > + continue; > + else > + return false; > + } There are some issues with this approach: * As you might be aware, a CRC of 0x00000000 is not an invalid or uninitialized CRC. That means it is conceivable the CRC over the translation table entries and flags results in a CRC of 0x00000000 without any bugs being present. Applying this patch would lead to introducing a bug in these conditions. * The second concern is about how to deal with 'the bug' (referring to your patch comment above). In your case 'the bug' results in missing TT entries for a given VLAN plus a CRC of 0x00000000. What if the CRC was 0x00000001 due to some other bug ? Then you're check would totally miss the mark and we're back at the beginning. If we wish to deal with such misbehavior on a scale where we assume the code should be able to deal with garbage values in some or all fields a different approach is needed. Relying on a magic number (CRC 0x00000000) is as bad as the current behavior (assuming each VLAN has an entry). Shortly, I will send a patch to prevent sending an empty VLAN around. Please note that this patch still is more of a band-aid. We still don't know how the empty VLAN came into being in the first place. Plus, that patch won't protect against garbage packets which certainly requires more effort. Cheers, Marek
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index 0225616d..d2a843fd 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -2996,8 +2996,21 @@ static bool batadv_tt_global_check_crc(struct batadv_orig_node *orig_node, vlan = batadv_orig_node_vlan_get(orig_node, ntohs(tt_vlan_tmp->vid)); - if (!vlan) - return false; + if (!vlan) { + /* Due to a bug, some originators send TT + * announcements for empty vlans. As the receiving + * nodes will ignore those empty vlans (they do not + * add a batadv_orig_node_vlan into the transglobal + * for the originating node), crc check will fail + * here. To circumvent this issue, we skip the + * verification for the vlan if the crc is + * equal to 0x00000000. + */ + if (tt_vlan_tmp->crc == 0x00000000) + continue; + else + return false; + } crc = vlan->tt.crc; batadv_orig_node_vlan_put(vlan);