Message ID | 6d1b250c047fb4cd3aac72d9e921ed33428fdaf5.1516648598.git.mschiffer@universe-factory.net (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Simon Wunderlich |
Headers |
Return-Path: <b.a.t.m.a.n-bounces@lists.open-mesh.org> X-Original-To: patchwork@open-mesh.org Delivered-To: patchwork@open-mesh.org Received: from open-mesh.org (localhost [IPv6:::1]) by open-mesh.org (Postfix) with ESMTP id 6575C80C46; Mon, 22 Jan 2018 20:31:44 +0100 (CET) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2001:19f0:6c01:100::1; helo=orthanc.universe-factory.net; envelope-from=mschiffer@universe-factory.net; receiver=<UNKNOWN> X-Greylist: delayed 380 seconds by postgrey-1.36 at open-mesh.org; Mon, 22 Jan 2018 20:31:39 CET Received: from orthanc.universe-factory.net (orthanc.universe-factory.net [IPv6:2001:19f0:6c01:100::1]) by open-mesh.org (Postfix) with ESMTPS id 5C75B80203 for <b.a.t.m.a.n@lists.open-mesh.org>; Mon, 22 Jan 2018 20:31:39 +0100 (CET) Received: from localhost.localdomain (unknown [IPv6:2001:19f0:6c01:100::2]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by orthanc.universe-factory.net (Postfix) with ESMTPSA id B53361F4E6 for <b.a.t.m.a.n@lists.open-mesh.org>; Mon, 22 Jan 2018 20:25:16 +0100 (CET) From: Matthias Schiffer <mschiffer@universe-factory.net> To: b.a.t.m.a.n@lists.open-mesh.org Date: Mon, 22 Jan 2018 20:24:50 +0100 Message-Id: <6d1b250c047fb4cd3aac72d9e921ed33428fdaf5.1516648598.git.mschiffer@universe-factory.net> X-Mailer: git-send-email 2.16.0 Subject: [B.A.T.M.A.N.] [PATCH maint] batman-adv: fix packet checksum in receive path X-BeenThere: b.a.t.m.a.n@lists.open-mesh.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: The list for a Better Approach To Mobile Ad-hoc Networking <b.a.t.m.a.n.lists.open-mesh.org> List-Unsubscribe: <https://lists.open-mesh.org/mm/options/b.a.t.m.a.n>, <mailto:b.a.t.m.a.n-request@lists.open-mesh.org?subject=unsubscribe> List-Archive: <http://lists.open-mesh.org/pipermail/b.a.t.m.a.n/> List-Post: <mailto:b.a.t.m.a.n@lists.open-mesh.org> List-Help: <mailto:b.a.t.m.a.n-request@lists.open-mesh.org?subject=help> List-Subscribe: <https://lists.open-mesh.org/mm/listinfo/b.a.t.m.a.n>, <mailto:b.a.t.m.a.n-request@lists.open-mesh.org?subject=subscribe> Reply-To: The list for a Better Approach To Mobile Ad-hoc Networking <b.a.t.m.a.n@lists.open-mesh.org> Errors-To: b.a.t.m.a.n-bounces@lists.open-mesh.org Sender: "B.A.T.M.A.N" <b.a.t.m.a.n-bounces@lists.open-mesh.org> |
Series |
[maint] batman-adv: fix packet checksum in receive path
|
|
Commit Message
Matthias Schiffer
Jan. 22, 2018, 7:24 p.m. UTC
skb_postpull_rcsum() is necessary after eth_type_trans() to adjust the
skb checksum, otherwise log spam of the form "bat0: hw csum failure" will
result when packets with CHECKSUM_COMPLETE are received (at least in some
setups, e.g. when stacking batman-adv on top of VXLAN).
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
---
I don't know what the exact circumstances are that trigger the log spam,
but it seems this was broken forever (I could also reproduce the issue with
our compat-14 legacy branch)... so please ask David to queue this up for
stable :)
net/batman-adv/soft-interface.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
Comments
On Montag, 22. Januar 2018 20:24:50 CET Matthias Schiffer wrote: > skb_postpull_rcsum() is necessary after eth_type_trans() to adjust the > skb checksum, otherwise log spam of the form "bat0: hw csum failure" will > result when packets with CHECKSUM_COMPLETE are received (at least in some > setups, e.g. when stacking batman-adv on top of VXLAN). Would be nice to have a better explanation here. The comment previously assumed that skb_pull_rcsum would be enough. But the problem here is that the skb_pull_rcsum only pulls the batman-adv headers. The actual pull of the ethernet header (with skb_pull_inline) happens inside eth_type_trans. Or did I miss anything? [...] > I don't know what the exact circumstances are that trigger the log spam, > but it seems this was broken forever (I could also reproduce the issue with > our compat-14 legacy branch)... so please ask David to queue this up for > stable :) Yes, this is broken since earliest commits. The most relevant commit in batman-adv is: Fixes: fe28a94c01e1 ("batman-adv: receive packets directly using skbs") But I would propose to use following in the kernel tree: Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") The 4.15 release will be soon(tm) and Simon is currently on vacation. So we will most likely postpone the submission to David until Simon found a way out of the snow and after 4.15 is released... But it would be nice when some people could test the patch [1] (together with vxlan?) on batman-adv or batman-adv-legacy. And please provide a "Tested-by: Full Name <email@example.org>" [2] reply when it works. Thanks, Sven [1] https://patchwork.open-mesh.org/patch/17250/ [2] https://www.kernel.org/doc/html/v4.12/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes
On 01/22/2018 09:52 PM, Sven Eckelmann wrote: > On Montag, 22. Januar 2018 20:24:50 CET Matthias Schiffer wrote: >> skb_postpull_rcsum() is necessary after eth_type_trans() to adjust the >> skb checksum, otherwise log spam of the form "bat0: hw csum failure" will >> result when packets with CHECKSUM_COMPLETE are received (at least in some >> setups, e.g. when stacking batman-adv on top of VXLAN). > > Would be nice to have a better explanation here. > > The comment previously assumed that skb_pull_rcsum would be enough. But the > problem here is that the skb_pull_rcsum only pulls the batman-adv headers. The > actual pull of the ethernet header (with skb_pull_inline) happens inside > eth_type_trans. Or did I miss anything? This is correct, eth_type_trans() contains a simple skb_pull(), so the csum must be adjusted afterwards (grepping the kernel for eth_type_trans will find a lot of this). I can send a v2 with a better commit message later. > > [...] >> I don't know what the exact circumstances are that trigger the log spam, >> but it seems this was broken forever (I could also reproduce the issue with >> our compat-14 legacy branch)... so please ask David to queue this up for >> stable :) > > Yes, this is broken since earliest commits. The most relevant commit in > batman-adv is: > > Fixes: fe28a94c01e1 ("batman-adv: receive packets directly using skbs") > > But I would propose to use following in the kernel tree: > > Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") > > The 4.15 release will be soon(tm) and Simon is currently on vacation. So we > will most likely postpone the submission to David until Simon found a way out > of the snow and after 4.15 is released... > > But it would be nice when some people could test the patch [1] (together with > vxlan?) on batman-adv or batman-adv-legacy. And please provide a > "Tested-by: Full Name <email@example.org>" [2] reply when it works. > > Thanks,> Sven I've tested this on Kernel 4.14.14 (everything working correctly now) and 4.4.110 (here, there are still checksum errors; it seems on older kernels, the checksum handling in VXLAN is broken too? Still debugging this...) Matthias > > [1] https://patchwork.open-mesh.org/patch/17250/ > [2] https://www.kernel.org/doc/html/v4.12/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes >
On 01/22/2018 10:18 PM, Matthias Schiffer wrote: > On 01/22/2018 09:52 PM, Sven Eckelmann wrote: >> On Montag, 22. Januar 2018 20:24:50 CET Matthias Schiffer wrote: >>> skb_postpull_rcsum() is necessary after eth_type_trans() to adjust the >>> skb checksum, otherwise log spam of the form "bat0: hw csum failure" will >>> result when packets with CHECKSUM_COMPLETE are received (at least in some >>> setups, e.g. when stacking batman-adv on top of VXLAN). >> >> Would be nice to have a better explanation here. >> >> The comment previously assumed that skb_pull_rcsum would be enough. But the >> problem here is that the skb_pull_rcsum only pulls the batman-adv headers. The >> actual pull of the ethernet header (with skb_pull_inline) happens inside >> eth_type_trans. Or did I miss anything? > > This is correct, eth_type_trans() contains a simple skb_pull(), so the csum > must be adjusted afterwards (grepping the kernel for eth_type_trans will > find a lot of this). I can send a v2 with a better commit message later. > >> >> [...] >>> I don't know what the exact circumstances are that trigger the log spam, >>> but it seems this was broken forever (I could also reproduce the issue with >>> our compat-14 legacy branch)... so please ask David to queue this up for >>> stable :) >> >> Yes, this is broken since earliest commits. The most relevant commit in >> batman-adv is: >> >> Fixes: fe28a94c01e1 ("batman-adv: receive packets directly using skbs") >> >> But I would propose to use following in the kernel tree: >> >> Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") >> >> The 4.15 release will be soon(tm) and Simon is currently on vacation. So we >> will most likely postpone the submission to David until Simon found a way out >> of the snow and after 4.15 is released... >> >> But it would be nice when some people could test the patch [1] (together with >> vxlan?) on batman-adv or batman-adv-legacy. And please provide a >> "Tested-by: Full Name <email@example.org>" [2] reply when it works. >> >> Thanks,> Sven > > I've tested this on Kernel 4.14.14 (everything working correctly now) and > 4.4.110 (here, there are still checksum errors; it seems on older kernels, > the checksum handling in VXLAN is broken too? Still debugging this...) I've found the issue of this other checksum problem: batman-adv fragmentation code doesn't handle the checksum on reassembly at all. I think the best option here is to simply set ip_summed to CHECKSUM_NONE on reassembly, I will send another patch for that. The IP fragmentation code does more fancy things when all fragments have CHECKSUM_COMPLETE, adding up the checksums of the fragments under certain circumstances. This only works because IP fragments are guaranteed to be split at even byte offsets (multiples of 8, actually); as far as I can tell, batman-adv allows odd fragment sizes, making it impossible to add up the 16bit checksums in the general case. Matthias > > > >> >> [1] https://patchwork.open-mesh.org/patch/17250/ >> [2] https://www.kernel.org/doc/html/v4.12/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes >> > >
Anno domini 2018 Sven Eckelmann scripsit: > On Montag, 22. Januar 2018 20:24:50 CET Matthias Schiffer wrote: [...] > > I don't know what the exact circumstances are that trigger the log spam, > > but it seems this was broken forever (I could also reproduce the issue with > > our compat-14 legacy branch)... so please ask David to queue this up for > > stable :) > > Yes, this is broken since earliest commits. The most relevant commit in > batman-adv is: > > Fixes: fe28a94c01e1 ("batman-adv: receive packets directly using skbs") > > But I would propose to use following in the kernel tree: > > Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") > > The 4.15 release will be soon(tm) and Simon is currently on vacation. So we > will most likely postpone the submission to David until Simon found a way out > of the snow and after 4.15 is released... > > But it would be nice when some people could test the patch [1] (together with > vxlan?) on batman-adv or batman-adv-legacy. And please provide a > "Tested-by: Full Name <email@example.org>" [2] reply when it works. I took a Debian Kernel package (4.14.13-1~bpo9+1), applied the patch and deployed the package on a gateway running BATMAN over VTEPs. The log messages from previous kernels don't show up anymore <3. Tested-by: Maximilian Wilhelm <max@sdn.clinic> Thanks! Best Max
Anno domini 2018 Matthias Schiffer scripsit: Hi, > On 01/22/2018 10:18 PM, Matthias Schiffer wrote: > > On 01/22/2018 09:52 PM, Sven Eckelmann wrote: > >> On Montag, 22. Januar 2018 20:24:50 CET Matthias Schiffer wrote: > >>> skb_postpull_rcsum() is necessary after eth_type_trans() to adjust the > >>> skb checksum, otherwise log spam of the form "bat0: hw csum failure" will > >>> result when packets with CHECKSUM_COMPLETE are received (at least in some > >>> setups, e.g. when stacking batman-adv on top of VXLAN). > >> > >> Would be nice to have a better explanation here. > >> > >> The comment previously assumed that skb_pull_rcsum would be enough. But the > >> problem here is that the skb_pull_rcsum only pulls the batman-adv headers. The > >> actual pull of the ethernet header (with skb_pull_inline) happens inside > >> eth_type_trans. Or did I miss anything? > > > > This is correct, eth_type_trans() contains a simple skb_pull(), so the csum > > must be adjusted afterwards (grepping the kernel for eth_type_trans will > > find a lot of this). I can send a v2 with a better commit message later. > > > >> > >> [...] > >>> I don't know what the exact circumstances are that trigger the log spam, > >>> but it seems this was broken forever (I could also reproduce the issue with > >>> our compat-14 legacy branch)... so please ask David to queue this up for > >>> stable :) > >> > >> Yes, this is broken since earliest commits. The most relevant commit in > >> batman-adv is: > >> > >> Fixes: fe28a94c01e1 ("batman-adv: receive packets directly using skbs") > >> > >> But I would propose to use following in the kernel tree: > >> > >> Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") > >> > >> The 4.15 release will be soon(tm) and Simon is currently on vacation. So we > >> will most likely postpone the submission to David until Simon found a way out > >> of the snow and after 4.15 is released... > >> > >> But it would be nice when some people could test the patch [1] (together with > >> vxlan?) on batman-adv or batman-adv-legacy. And please provide a > >> "Tested-by: Full Name <email@example.org>" [2] reply when it works. > >> > >> Thanks,> Sven > > > > I've tested this on Kernel 4.14.14 (everything working correctly now) and > > 4.4.110 (here, there are still checksum errors; it seems on older kernels, > > the checksum handling in VXLAN is broken too? Still debugging this...) > > I've found the issue of this other checksum problem: batman-adv > fragmentation code doesn't handle the checksum on reassembly at all. I > think the best option here is to simply set ip_summed to CHECKSUM_NONE on > reassembly, I will send another patch for that. > > The IP fragmentation code does more fancy things when all fragments have > CHECKSUM_COMPLETE, adding up the checksums of the fragments under certain > circumstances. This only works because IP fragments are guaranteed to be > split at even byte offsets (multiples of 8, actually); as far as I can > tell, batman-adv allows odd fragment sizes, making it impossible to add up > the 16bit checksums in the general case. And Tested-By: Maximilian Wilhelm <max@sdn.clinic> to the fix for fragmentation.c, too. Disclaimer: As MTUs are calculated accordingly in our backbone fragmentation of VXLAN packets isn't an issue and we did not see these messages before. I can confirm, that I still don't see any now, meaning the log spam from the previous fix is still fixed and no new issues have arisen as of now. Thanks a lot! <3 Best Max
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index c95e2b26..edeffcb9 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -459,13 +459,7 @@ void batadv_interface_rx(struct net_device *soft_iface, /* skb->dev & skb->pkt_type are set here */ skb->protocol = eth_type_trans(skb, soft_iface); - - /* should not be necessary anymore as we use skb_pull_rcsum() - * TODO: please verify this and remove this TODO - * -- Dec 21st 2009, Simon Wunderlich - */ - - /* skb->ip_summed = CHECKSUM_UNNECESSARY; */ + skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN); batadv_inc_counter(bat_priv, BATADV_CNT_RX); batadv_add_counter(bat_priv, BATADV_CNT_RX_BYTES,