[v5,0/7] B.A.T.M.A.N. V - fallback to tp meter estimation if throughput otherwise not available

Message ID 20180907095958.30946-1-mareklindner@neomailbox.ch (mailing list archive)
Headers
Series B.A.T.M.A.N. V - fallback to tp meter estimation if throughput otherwise not available |

Message

Marek Lindner Sept. 7, 2018, 9:59 a.m. UTC
  Under normal circumstances B.A.T.M.A.N. V retrieves the neighbor
throughput values to populate its metric tables from the various
drivers such as WiFi throughput tables and Ethernet throughput..
Whenever the interface drivers do not export link throughput 
information manual overrides become necessary. To further 
automate and thus better support these setups, ELP may call the
batman-adv throughput meter to schedule a throughput estimation
to be used to populate the metric table.

v5:
 * fix tp_vars refcount on queue_work() failure
 * squash batadv_tp_start_work() into batadv_tp_start()

v4:
 * read tp measurement result only once

v3:
 * fix ELP tp meter result computation
 * use batadv_has_timed_out() instead of custom implementation
 * set ELP tp meter test duration to 1000ms in patch #6
 * add comment explaining periodic scheduling

v2:
 * added sysfs attribute to configure tp meter test duration
 * fixed null pointer dereference in TP meter packet sending routine
 * fixed storing the measured throughput in the correct variable
 * checkpatch/kerneldoc/sparse/smatch cleanup

Antonio Quartulli (3):
  batman-adv: tp_meter - prevent concurrent tp_meter sessions by using
    workqueue
  batman-adv: tp_meter - don't check for existing session
  batman-adv: tp_meter - add option to perform one-hop test

Marek Lindner (4):
  batman-adv: tp_meter - allow up to 10 queued sessions
  batman-adv: tp_meter - add caller distinction
  batman-adv: ELP - use tp meter to estimate the throughput if otherwise
    not available
  batman-adv: ELP - add throughput meter test duration attribute

 .../ABI/testing/sysfs-class-net-batman-adv    |   7 +
 include/uapi/linux/batadv_packet.h            |   2 +
 net/batman-adv/bat_v.c                        |   1 +
 net/batman-adv/bat_v_elp.c                    |  69 ++-
 net/batman-adv/bat_v_elp.h                    |  21 +
 net/batman-adv/main.c                         |  10 +-
 net/batman-adv/main.h                         |   7 +-
 net/batman-adv/netlink.c                      |   3 +-
 net/batman-adv/routing.c                      |   6 +-
 net/batman-adv/sysfs.c                        |   3 +
 net/batman-adv/tp_meter.c                     | 484 +++++++++++-------
 net/batman-adv/tp_meter.h                     |  11 +-
 net/batman-adv/types.h                        |  36 ++
 13 files changed, 453 insertions(+), 207 deletions(-)
  

Comments

Sven Eckelmann Sept. 8, 2018, 5:38 p.m. UTC | #1
On Freitag, 7. September 2018 11:59:51 CEST Marek Lindner wrote:
> Under normal circumstances B.A.T.M.A.N. V retrieves the neighbor
> throughput values to populate its metric tables from the various
> drivers such as WiFi throughput tables and Ethernet throughput..
> Whenever the interface drivers do not export link throughput 
> information manual overrides become necessary. To further 
> automate and thus better support these setups, ELP may call the
> batman-adv throughput meter to schedule a throughput estimation
> to be used to populate the metric table.

I know that this patchset is required to have some throughput estimates for 
non-wifi connections (or for wifi drivers without expected throughput). But 
since we have the other discussion with Ligang LIU, I still have to bring this 
up:

We already know that something (maybe ELP unicast probes and the related 
messages) are "breaking" the connectivity between 2 neighbors in Ligang LIU's 
test setup (with 6 not so well connected nodes). It also seems to be the case 
that the driver doesn't return the expected throughput for each neighbor all 
the time and this missing data would then trigger the single-link tpmeter 
test.

The test seem to be done every 60 seconds (or longer when there is another 
source for the throughput) for each neighbor and by default 1s long. My 
assumption would be now that this test could be even more harmful for wifi 
connections than unicast ELP probes (or whatever is causing Ligang LUI's 
problems). Let us just assume that we have these 6 nodes with 5 neighbors (I 
see a lot more in real world setups). If the tests on the different nodes 
don't overlap (which would also be bad for the test results) than all tests 
would take at least 30 seconds of the 60 seconds. So even without the problem 
in the "Performance Evaluation of BATMAN-adv Wireless Mesh Network Routing 
Algorithms" mailing list thread, the tpmeter fallback has a significant cost.

Is there anything which I missed that could ease my mind?

Kind regards,
	Sven
  
Sven Eckelmann Nov. 25, 2019, 8:41 a.m. UTC | #2
On Saturday, 8 September 2018 19:38:01 CET Sven Eckelmann wrote:
[... explanation why this test could create problems for other wifi connections...]
> Is there anything which I missed that could ease my mind?

Beside this potential problem, there is also the problem of the interaction 
with the ethtool throughput gathering (brought up by Matthias Schiffer [1] 
and Matthias Fritzsche [2]). The ethtool code basically prevents the correct 
measurement in many situations. I will try to summarize it now:

* ethtool's link_ksettings is not about speed towards a neighbor behind the 
  interface but about either

  - PHY layer speed class from local ethernet port to next peer (for direct 
    connections)
  - connection towards next (internal) switch
  - propagated value from a lower layer device (e.g for vxlan)
  - "random" value (e.g. tun/tap)
  - ...

* often connections indirect (switches, VPN, WiFi bridges, L2 firewalls, ...)
* B.A.T.M.A.N. V requires a measurement for the throughput towards a neighbor 
  and not the maximum(?) PHY layer speed of the first hop over the current 
  medium

The proposed solution is to drop the ethtool link_ksettings usage when the 
tpmeter (or similar) approach for neighbor speed measurements is integrated.

Kind regards,
	Sven

[1] https://github.com/freifunk-gluon/gluon/pull/1872#issuecomment-55767698
[2] https://patchwork.open-mesh.org/patch/17371/#31059