Internet connection problem with Batman

Message ID 1e9ca23deab35c772fdfc80feb96437f@eisox.com (mailing list archive)
State Not Applicable, archived
Delegated to: Simon Wunderlich

Commit Message

faycel.benhajkhalifa@eisox.com Jan. 30, 2020, 3:24 p.m. UTC
  Hello,
I saw that you are contributing to BATMAN, may I ask you a few questions 
about my installation?
I have several boards with OpenWRT firmware:

Mips processor
Kernel version 4.14.131
OpenWRT 18.06.01
BATMAN: Batctl openwrt-2018.1-1 [batman-adv: openwrt-2018.1-8]

I have 8 connected mesh boards.
Internet connection on the boards is not always available.
When a board no longer has an internet connection, I connect to another 
board and try the following commands:

batctl o: I find the board in the table with last-seen <1 and a quality 
between 191 and 233 (good quality)
batctl ping does not succeed every time
ping with the OpenWRT ping command does not work
If I restart the board which no longer has internet, then it reconnects 
to the network and accesses the internet without problem (functional 
pings)

I tried with both a static and a dynamic IP; the result is the same. The 
internet connection works, then is interrupted, and I can't find out why. 
Sometimes the board regains internet access without being restarted.
OpenWRT Config:

/etc/config/network

config interface 'wan'
	option type 'bridge'
	option ifname 'eth0 bat0'
	option dns '8.8.8.8'
	option stp '1'
	option gateway '192.168.1.1'
	option netmask '255.255.255.0'
	option ipaddr '192.168.1.101'
	option proto 'static'

/etc/config/wireless

config wifi-iface 'wmesh'
	option device 'radio0'
	option ifname 'adhoc0'
	option mode 'adhoc'
	option network 'mesh'
	option encryption 'psk2'
	option ssid 'ssid'
	option bssid 'bssid'
	option key 'password'

/etc/config/batman-adv

config 'mesh' 'bat0'
	option 'aggregated_ogms'
	option 'ap_isolation'
	option 'bonding'
	option 'fragmentation'
	option 'gw_bandwidth'
	option 'gw_mode'
	option 'gw_sel_class'
	option 'log_level'
	option 'orig_interval'
	option 'bridge_loop_avoidance'
	option 'distributed_arp_table'
	option 'multicast_mode'
	option 'network_coding'
	option 'hop_penalty' 0
	option 'isolation_mark'

I also added a patch in batman-adv/patches.

The purpose of this patch is to make the network reconnect more quickly 
when a board is removed or added.
Thanks for your help,
I can provide more information about my network if you wish.

Regards,
  

Comments

Sven Eckelmann Jan. 30, 2020, 3:49 p.m. UTC | #1
On Thursday, 30 January 2020 16:24:55 CET faycel.benhajkhalifa@eisox.com 
wrote:
> Hello,
> I saw that you are contributing to BATMAN, may I ask you a few questions 
> about my installation?

Not everyone here contributes to batman-adv, but at least some of the people 
on the mailing list use it.

> I have several boards with OpenWRT firmware:
> I have 8 connected mesh boards.
> Internet connection on the boards is not always available.
> When a board no longer has an internet connection, I connect to another 
> board and try the following commands:
[...]

What do you ping from where? Please try to reduce the complexity of your tests 
step by step. The simpler they are, the easier it is to pinpoint the problem. 
So don't try to ping from one node to the internet but to the actual gateway. 
Or just to the next hop.

What is the topology? Are the nodes in the mesh direct neighbors or are you 
using multiple hops?

Did you check each hop with tcpdump? Are the packets arriving at each hop and 
(for intermediate nodes) forwarded correctly? The forwarding can only be 
seen on the lower interfaces (the interfaces in the batadv interface bat0). 
The arrival can be seen on the upper layer (bat0) and on the lower layers. The 
interesting part is where the packet or the answer is lost. Or maybe the peer 
is not even answering the ICMP echo request for some reason.

What protocol are you using above bat0? IPv6/IPv4? When it is IPv4, did you 
check that the MAC addresses in the ARP table are correct? Did you try to 
disable DAT in batman-adv? The disabling of DAT might be required when you 
cannot guarantee that IPv4 addresses stay the same until the DAT cache 
expires.

Did you check whether you have IPv4/IPv6 conflicts?

Did you check whether you have MAC address conflicts (either on the lower 
interfaces or on the bat0/br0 interfaces)?

Is only the internet connection not working or is something already failing in 
the path to your gateway?

Are the gateway and the node trying to get internet access using the correct 
IPv6/IPv4 routes?

Do you have multiple DHCP(v6)/RADV servers in the network which have 
conflicting configurations?

Did you manually set the gateway mode to client, and is there at least one 
node in the network which has gateway mode set to server but is not actually 
providing a valid DHCP answer?

Did you check whether you have some loops in your network (over the bat0 
interfaces, which seem to be bridged to other interfaces)?

Did you check whether the bridge is blocking access to the correct outgoing 
port? Or whether a device behind your gateway device is blocking the 
connection?

Did you check whether your ip(6)tables is blocking some relevant traffic? Did 
you check whether something is going wrong in some offloading HW?

Did you make sure that network-coding is disabled for the bat0 interface?

What makes you think that batman-adv is the reason for the internet outage? 
By default, batman-adv is not interested in layer 3/4/... traffic and thus 
does not handle your internet access. There are some optimizations like 
gw_mode and distributed_arp_table (DAT) that try to optimize the routing of 
some (usually broadcast) packets. These packets then reach the desired 
destination faster (or, in DAT's case, get an answer faster), but that is not 
the main task of batman-adv.

It seems like you provided some config. But this seems to be a config for a 
device which directly has internet access and not internet over batman-adv. So 
not a node which has the internet outage problem.

Kind regards,
	Sven
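
The checks suggested in this reply can be sketched as a small shell script. 
This is only an illustration and was not posted in the thread. The interface 
names bat0 and adhoc0 and the gateway address 192.168.1.1 are taken from the 
posted config; the `run` and `diagnose` helpers and the peer MAC argument are 
hypothetical. It assumes batctl, tcpdump and brctl are installed on the node. 
By default it only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Dry-run by default: each command is printed; set RUN=1 to execute it on a node.
# bat0, adhoc0 and 192.168.1.1 come from the posted config (assumptions elsewhere).
MESH_IF="${MESH_IF:-bat0}"
LOWER_IF="${LOWER_IF:-adhoc0}"
GATEWAY_IP="${GATEWAY_IP:-192.168.1.1}"

# Hypothetical helper: print a command and run it only when RUN=1.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@" || echo "  (command failed: $?)"
    fi
    return 0
}

# Hypothetical helper: walk through the checks, simplest first.
diagnose() {
    peer_mac="$1"   # MAC of the next hop or gateway, as listed by "batctl o"

    # 1. Layer 2: neighbors, originators, then a batman-adv level ping
    run batctl n
    run batctl o
    run batctl ping -c 5 "$peer_mac"
    run batctl traceroute "$peer_mac"

    # 2. Layer 3: ping the gateway (not an internet host), then inspect ARP
    run ping -c 3 "$GATEWAY_IP"
    run cat /proc/net/arp

    # 3. Where is the packet lost? Arrival is visible on bat0; forwarding
    #    only on the lower interface (batman-adv frames, EtherType 0x4305).
    #    Each capture waits until 20 matching packets have been seen.
    run tcpdump -n -e -c 20 -i "$MESH_IF" icmp or arp
    run tcpdump -n -e -c 20 -i "$LOWER_IF" ether proto 0x4305

    # 4. Current settings: DAT, network coding, gateway mode, bridge ports
    run batctl dat
    run batctl nc
    run batctl gw
    run brctl show
}
```

For example, `RUN=1 diagnose 02:00:00:00:00:01` (placeholder MAC) would run the 
checks on the affected node. Running the same captures on each intermediate 
hop shows where the echo request or its reply disappears.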
  

Patch

--- a/net/batman-adv/main.h
+++ b/net/batman-adv/main.h
--- 2 2019-07-15 17:26:55.717093662 +0200
+++ 1 2019-07-15 17:26:46.565093715 +0200
@@ -43,7 +43,7 @@ 
 /* purge originators after time in seconds if no valid packet comes in
  * -> TODO: check influence on BATADV_TQ_LOCAL_WINDOW_SIZE
  */
-#define BATADV_PURGE_TIMEOUT 200000 /* 200 seconds */
+#define BATADV_PURGE_TIMEOUT 10000 /* 10 seconds */
 #define BATADV_TT_LOCAL_TIMEOUT 600000 /* in milliseconds */
 #define BATADV_TT_CLIENT_ROAM_TIMEOUT 600000 /* in milliseconds */
 #define BATADV_TT_CLIENT_TEMP_TIMEOUT 600000 /* in milliseconds */