batman-adv: Use wifi rx/tx as fallback throughput

Message ID 20190609101922.2366-1-treffer@measite.de (mailing list archive)
State Superseded, archived
Delegated to: Simon Wunderlich
Series batman-adv: Use wifi rx/tx as fallback throughput

Commit Message

René Treffer June 9, 2019, 10:19 a.m. UTC
  From: rtreffer <treffer@measite.de>

Some wifi drivers (e.g. ath10k) provide per-station rx/tx values but no
estimated throughput. Setting a better estimate than the default 1MBit
makes these devices work well with BATMAN V.

Signed-off-by: René Treffer <treffer@measite.de>
---
 net/batman-adv/bat_v_elp.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)
  

Comments

Sven Eckelmann June 9, 2019, 10:37 a.m. UTC | #1
On Sunday, 9 June 2019 12:19:22 CEST René Treffer wrote:
> @@ -107,10 +107,25 @@ static u32 batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh)
>                 }
>                 if (ret)
>                         goto default_throughput;
> -               if (!(sinfo.filled & BIT(NL80211_STA_INFO_EXPECTED_THROUGHPUT)))
> -                       goto default_throughput;
>  
> -               return sinfo.expected_throughput / 100;
> +               if (sinfo.filled & BIT(NL80211_STA_INFO_EXPECTED_THROUGHPUT)) {
> +                       return sinfo.expected_throughput / 100;
> +               }
> +
> +               // try to estimate en expected throughput based on reported rx/tx rates
> +               // 1/3 of tx or 1/3 of the average of rx and tx, whichever is smaller
> +               if (sinfo.filled & BIT(NL80211_STA_INFO_TX_BITRATE)) {
> +                       tx = cfg80211_calculate_bitrate(&sinfo.txrate);
> +                       if (sinfo.filled & BIT(NL80211_STA_INFO_RX_BITRATE)) {
> +                               rx = cfg80211_calculate_bitrate(&sinfo.rxrate);
> +                               if (rx < tx) {
> +                                       return (rx + tx) / 6;
> +                               }
> +                       }
> +                       return tx / 3;
> +               }
> +
> +               goto default_throughput;
>         }
>  
>         /* if not a wifi interface, check if this device provides data via
> -- 
> 2.20.1

No, we are not interested in rx rate for tx throughput estimations.

Kind regards,
	Sven
  
Sven Eckelmann June 9, 2019, 10:41 a.m. UTC | #2
On Sunday, 9 June 2019 12:19:22 CEST René Treffer wrote:
> From: rtreffer <treffer@measite.de>
> 
> Some wifi drivers (e.g. ath10k) provide per-station rx/tx values but no
> estimated throughput. Setting a better estimate than the default 1MBit
> makes these devices work well with BATMAN V.
> 
> Signed-off-by: René Treffer <treffer@measite.de>
> ---

Please use checkpatch before sending a patch:

    WARNING: braces {} are not necessary for single statement blocks
    #113: FILE: net/batman-adv/bat_v_elp.c:111:
    +               if (sinfo.filled & BIT(NL80211_STA_INFO_EXPECTED_THROUGHPUT)) {
    +                       return sinfo.expected_throughput / 100;
    +               }
    
    WARNING: line over 80 characters
    #117: FILE: net/batman-adv/bat_v_elp.c:115:
    +               // try to estimate en expected throughput based on reported rx/tx rates
    
    WARNING: line over 80 characters
    #118: FILE: net/batman-adv/bat_v_elp.c:116:
    +               // 1/3 of tx or 1/3 of the average of rx and tx, whichever is smaller
    
    WARNING: braces {} are not necessary for single statement blocks
    #123: FILE: net/batman-adv/bat_v_elp.c:121:
    +                               if (rx < tx) {
    +                                       return (rx + tx) / 6;
    +                               }
    
    WARNING: Missing Signed-off-by: line by nominal patch author 'rtreffer <treffer@measite.de>'
    
    total: 0 errors, 5 warnings, 0 checks, 36 lines checked
    
    NOTE: For some of the reported defects, checkpatch may be able to
          mechanically convert to the typical style using --fix or --fix-inplace.
    
    /home/sven/[PATCH] batman-adv_Use wifi rx_tx as fallback throughput.mbox has style problems, please review.
    
    NOTE: If any of the errors are false positives, please report
          them to the maintainer, see CHECKPATCH in MAINTAINERS

Kind regards,
	Sven
  
Sven Eckelmann June 9, 2019, 11:09 a.m. UTC | #3
On Sunday, 9 June 2019 12:19:22 CEST René Treffer wrote:
> -       u32 throughput;
> +       u32 throughput, rx, tx;

Avoid adding multiple variable declarations in a single line. And prefer 
ordering declarations longest to shortest.

[...]
> +		// try to estimate en expected throughput based on reported rx/tx rates
> +		// 1/3 of tx or 1/3 of the average of rx and tx, whichever is smaller

And for the next version of the patch, please also use the same comment style 
as the rest of the code [1,2].

And please explain also where the magic 1/3 comes from.
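
For illustration only, here is a sketch of the style points raised above: one declaration per line, ordered longest to shortest, and a block comment instead of `//` comments. The `toy_rate_info` struct and `toy_fallback_throughput()` are hypothetical stand-ins so that the snippet compiles in userspace; they are not batman-adv or cfg80211 code. The tx / 3 factor is kept only because the patch under review uses it, and the 100 kbit/s unit is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the station info fields the patch reads. */
struct toy_rate_info {
	bool has_tx_bitrate;
	uint32_t tx_bitrate;	/* assumed unit: 100 kbit/s */
};

static uint32_t toy_fallback_throughput(const struct toy_rate_info *info,
					uint32_t default_throughput)
{
	uint32_t throughput;
	uint32_t tx;

	/* estimate the expected throughput from the reported tx rate;
	 * the factor 1/3 is the empirical value used in the patch
	 */
	if (!info->has_tx_bitrate)
		return default_throughput;

	tx = info->tx_bitrate;
	throughput = tx / 3;

	return throughput;
}

/* Small usage example: 130.0 Mbit/s tx rate, 1 Mbit/s default. */
void example_checks(void)
{
	struct toy_rate_info with_tx = { true, 1300 };
	struct toy_rate_info without_tx = { false, 0 };

	assert(toy_fallback_throughput(&with_tx, 10) == 433);
	assert(toy_fallback_throughput(&without_tx, 10) == 10);
}
```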

Kind regards,
	Sven

[1] https://www.kernel.org/doc/html/v5.1/process/coding-style.html
[2] https://www.kernel.org/doc/Documentation/networking/netdev-FAQ.txt
  
Marek Lindner June 9, 2019, 11:40 a.m. UTC | #4
On Sunday, 9 June 2019 18:37:54 HKT Sven Eckelmann wrote:
> No, we are not interested in rx rate for tx throughput estimations.

Before ruling rx out, can you explain your thinking behind this magic formula 
(if rx is smaller than tx, compute the sum and divide by 6):

+               if (sinfo.filled & BIT(NL80211_STA_INFO_TX_BITRATE)) {
+                       tx = cfg80211_calculate_bitrate(&sinfo.txrate);
+                       if (sinfo.filled & BIT(NL80211_STA_INFO_RX_BITRATE)) {
+                               rx = cfg80211_calculate_bitrate(&sinfo.rxrate);
+                               if (rx < tx) {
+                                       return (rx + tx) / 6;
+                               }
+                       }
+                       return tx / 3;
+               }

Thanks,
Marek
  
René Treffer June 9, 2019, 12:45 p.m. UTC | #5
On 09.06.19 13:40, Marek Lindner wrote:

> On Sunday, 9 June 2019 18:37:54 HKT Sven Eckelmann wrote:
>> No, we are not interested in rx rate for tx throughput estimations.
> Before ruling rx out, can you explain your thinking behind this magic formula 
> (if smaller compute sum and divide by 6):

Sorry, I should have provided much more context here. The formula is
min(tx/3, avg(rx,tx)/3), which reduces to tx/3 for rx > tx.

My thinking was
1. it should be lower than an expected throughput measurement (play it safe)
2. it should still be roughly in line with expected throughput as
implemented elsewhere
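
As a rough standalone sketch of the formula above (not the patch itself): `fallback_estimate()` and its `have_rx` flag are hypothetical names, and the rates are assumed to be in units of 100 kbit/s. The values in `example_checks()` are taken from rows of the table further down.

```c
#include <assert.h>
#include <stdint.h>

/* Rates in 100 kbit/s (assumed unit); have_rx == 0 means no rx rate known. */
static uint32_t fallback_estimate(uint32_t tx, uint32_t rx, int have_rx)
{
	/* avg(rx, tx) / 3 is only smaller than tx / 3 when rx < tx */
	if (have_rx && rx < tx)
		return (rx + tx) / 6;

	return tx / 3;
}

/* Usage example with three rows of the table (integer division truncates). */
void example_checks(void)
{
	/* tx 130.0, rx 117.0 -> 41.1 Mbit/s */
	assert(fallback_estimate(1300, 1170, 1) == 411);
	/* tx 13.0, rx 43.3 -> tx / 3 = 4.3 Mbit/s */
	assert(fallback_estimate(130, 433, 1) == 43);
	/* tx 43.3, rx 6.5 -> 8.3 Mbit/s */
	assert(fallback_estimate(433, 65, 1) == 83);
}
```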

I am testing this on devices with ath9k (2.4GHz) and ath10k (5GHz), so I
was looking at the estimates I get from ath9k. Here is a dump from my
home network on 2.4GHz/ath9k and what tx/3 would give us:

> signal  tx     rx     expect  tx/3    min(tx/3,(rx+tx)/2/3)
> -77     13.0   43.3   6.682   4.333
> -57     130.0  117.0  44.677  43.333  41.166
> -53     117.0  130.0  42.388  39.0
> -82     43.3   6.5    13.366  14.433  8.3      (!!!)
> -63     52.0   86.7   26.733  17.333
> -58     130.0  173.3  29.21   43.333            !!!
> -82     6.5    43.3   2.197   2.166
> -48     104.0  65.0   40.191  34.666  28.166
> -69     57.8   13.0   20.49   19.266  11.8
> -58     86.7   52.0   33.507  28.9    23.116
> -61     57.8   72.2   29.21   19.266
> -42     65.0   72.2   31.218  21.666
> -58     52.0   1.0    37.994  17.333  8.833
> -56     115.6  144.4  29.21   38.533            !!!
> -65     39.0   72.2   22.338  13.0
> -55     58.5   72.2   29.21   19.5
> -65     65.0   72.2   31.218  21.666
> -59     86.7   117.0  35.705  28.9
> -78     7.2    1.0    4.394   2.4     1.366
> -22     65.0   72.2   31.218  21.666
> -49     72.2   72.2   33.507  24.066
> -68     13.0   21.7   8.879   4.333
> -56     52.0   52.0   24.536  17.333
> -66     43.3   52.0   24.536  14.433
> -63     26.0   39.0   15.563  8.666
> -42     65.0   58.5   31.218  21.666  20.583
> -60     39.0   26.0   20.49   13.0    10.833
> -63     28.9   58.5   17.852  9.633

Cases where the rx/tx estimate would be higher are marked with !!!.

Why bother to look at rx at all? Asymmetric routing should already
work. I was a bit concerned about highly asymmetric links, especially
those where the path back might not work. It might not be worth it though.

Anyway, there are significant over-simplifications in here:
- Is what I see here even representative / does it apply universally?
- Are 5GHz and 2.4GHz and their modes even comparable like this? Across
drivers/chips?

Regards,
  René
  
Marek Lindner June 10, 2019, 3:31 a.m. UTC | #6
On Sunday, 9 June 2019 20:45:06 HKT René Treffer wrote:
> I am testing this on devices with ath9k (2.4GHz) and ath10k (5GHz), so I
> was looking at the estimates I get from ath9k. Here is a dump from my
> home network on 2.4GHz/ath9k and what tx/3 would give us:
> > signal  tx     rx     expect  tx/3    min(tx/3,(rx+tx)/2/3)
> > -77     13.0   43.3   6.682   4.333
> > -57     130.0  117.0  44.677  43.333  41.166
> > -53     117.0  130.0  42.388  39.0
> > -82     43.3   6.5    13.366  14.433  8.3      (!!!)
> > -63     52.0   86.7   26.733  17.333
> > -58     130.0  173.3  29.21   43.333            !!!
> > -82     6.5    43.3   2.197   2.166
> > -48     104.0  65.0   40.191  34.666  28.166
> > -69     57.8   13.0   20.49   19.266  11.8
> > -58     86.7   52.0   33.507  28.9    23.116
> > -58     52.0   1.0    37.994  17.333  8.833
> > -56     115.6  144.4  29.21   38.533            !!!


To confirm my understanding: What this table shows are raw tx/rx link estimated 
values ? None of these numbers compares to Minstrel HT expected throughput or 
actual throughput ?


> Cases where the rx/tx estimate would be higher are marked with !!!.

I also don't quite understand what the '!!!' thing is trying to indicate. What 
is being compared ? But it may be due to my misunderstandings above. 

In my small test setup with one ath10k device meshing with ath9k over 2.4GHz, 
your tx / 3 formula seems to be quite accurate (had removed the rx part). 

# batctl o (your magic formula)
* ac:86:74:00:38:06    0.930s (       45.7)  ac:86:74:00:38:06 [    mesh24]

# batctl tp ac:86:74:00:38:06 (actual throughput)
Test duration 10440ms.
Sent 58393512 Bytes.
Throughput: 5.33 MB/s (44.75 Mbps)

What would be interesting is how the numbers produced by 'tx / 3' compare to 
either the actual throughput (can easily be tested with the throughput meter) 
or Minstrel expected throughput. 
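
To make the comparison above reproducible, here is a small sketch that converts the `batctl tp` byte count and duration into kbit/s and expresses how far an estimate deviates from it. `throughput_kbps()` and `deviation_permille()` are hypothetical helper names, not part of batctl, and the 45.7 figure is read as Mbit/s.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a byte count over a duration in ms to kbit/s (integer math). */
static uint64_t throughput_kbps(uint64_t bytes, uint64_t duration_ms)
{
	if (duration_ms == 0)
		return 0;

	/* bits per millisecond equals kbit/s */
	return bytes * 8 / duration_ms;
}

/* Deviation of an estimate from a measurement, in tenths of a percent. */
static int64_t deviation_permille(uint64_t estimate_kbps,
				  uint64_t measured_kbps)
{
	int64_t diff;

	if (measured_kbps == 0)
		return 0;

	diff = (int64_t)estimate_kbps - (int64_t)measured_kbps;

	return diff * 1000 / (int64_t)measured_kbps;
}

/* Usage example with the numbers from the test run above. */
void example_checks(void)
{
	uint64_t measured = throughput_kbps(58393512, 10440);

	assert(measured == 44745);			/* 44.75 Mbps */
	assert(deviation_permille(45700, measured) == 21);	/* about 2.1 % */
}
```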


> Why bother and look at rx at all? Asymmetric routing should already
> work. I was bit concerned about highly asymmetric links, especially
> those where the path back might not work. It might not be worth it though.

Generally, the return path might be entirely different. Batman-adv does not 
enforce or even endorse symmetric paths. If there is a better path for the 
return route, batman-adv will choose the better path based on tx from the 
sender, and if only one return path exists, we don't care anyway.

Cheers,
Marek
  
Кирилл Луконин June 10, 2019, 7:37 a.m. UTC | #7
Hello, colleagues.

I have a working solution for this problem. It is not batman-related,
but I decided to share it with you right here.
Please let me clarify some details.

1) Some ath10k firmwares (10.2) do not export tx bitrate. So we can't
rely on it.
2) It is better to inject the throughput estimate from userspace than
to make batman estimate it from unreliable sources.
3) Here is the patch for mac80211 that we made for ath10k and other
drivers that do not export an expected throughput value.

I also think it's a better way because a lot of drivers do not work
with mac80211 (wil6210). And all driver-dependent math can be easily
changed on the fly.


Here is a simple patch that makes this expected throughput injection possible.

--- a/net/mac80211/debugfs_sta.c
+++ b/net/mac80211/debugfs_sta.c
@@ -12,6 +12,7 @@

 #include <linux/debugfs.h>
 #include <linux/ieee80211.h>
+#include <net/mac80211.h>
 #include "ieee80211_i.h"
 #include "debugfs.h"
 #include "debugfs_sta.h"
@@ -20,6 +21,8 @@

 /* sta attributtes */

+#define DEF_THR_BUFF_SIZE sizeof("4294967295")
+
 #define STA_READ(name, field, format_string)                \
 static ssize_t sta_ ##name## _read(struct file *file,            \
                    char __user *userbuf,        \
@@ -490,6 +493,60 @@ static ssize_t sta_vht_capa_read(struct
 STA_OPS(vht_capa);


+static ssize_t sta_def_thr_read(struct file *file, char __user *userbuf,
+                 size_t count, loff_t *ppos)
+{
+    int ret = 0;
+    char buf[DEF_THR_BUFF_SIZE] = { 0 };
+    struct sta_info *sta = file->private_data;
+
+    rcu_read_lock();
+
+    // Access synchronization to struct sta_info is documented in net/mac80211/sta_info.c:34
+    ret = snprintf(buf, DEF_THR_BUFF_SIZE - 1, "%u", sta->def_thr);
+
+    rcu_read_unlock();
+
+    if(ret >= DEF_THR_BUFF_SIZE)
+        return -EFAULT;
+
+    buf[DEF_THR_BUFF_SIZE - 1] = '\0';
+
+    return simple_read_from_buffer(userbuf, count, ppos, buf, ret);
+}
+
+static ssize_t sta_def_thr_write(struct file *file, const char __user *userbuf,
+                 size_t count, loff_t *ppos)
+{
+    u32 thr = 0;
+    int ret = 0;
+    char buf[DEF_THR_BUFF_SIZE] = { 0 };
+    struct sta_info *sta = file->private_data;
+
+    if(count >= DEF_THR_BUFF_SIZE)
+        return -EINVAL;
+
+    if (copy_from_user(buf, userbuf, count))
+        return -EFAULT;
+
+    buf[DEF_THR_BUFF_SIZE - 1] = '\0';
+
+    ret = sscanf(buf, "%u", &thr);
+    if(ret != 1)
+        return -EINVAL;
+
+    rcu_read_lock();
+
+    // Access synchronization to struct sta_info is documented in net/mac80211/sta_info.c:34
+    sta->def_thr = thr;
+    ieee80211_sta_set_expected_throughput(&sta->sta, thr);
+
+    rcu_read_unlock();
+
+    return count;
+}
+STA_OPS_RW(def_thr);
+
 #define DEBUGFS_ADD(name) \
     debugfs_create_file(#name, 0400, \
         sta->debugfs_dir, sta, &sta_ ##name## _ops);
@@ -534,6 +591,7 @@ void ieee80211_sta_debugfs_add(struct st
     DEBUGFS_ADD(agg_status);
     DEBUGFS_ADD(ht_capa);
     DEBUGFS_ADD(vht_capa);
+    DEBUGFS_ADD(def_thr);

     DEBUGFS_ADD_COUNTER(rx_duplicates, rx_stats.num_duplicates);
     DEBUGFS_ADD_COUNTER(rx_fragments, rx_stats.fragments);
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -2305,6 +2305,9 @@ u32 sta_get_expected_throughput(struct s
     else
         thr = drv_get_expected_throughput(local, sta);

+    if(thr == 0 && sta->def_thr != 0)
+        thr = sta->def_thr;
+
     return thr;
 }

--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -509,6 +509,7 @@
     struct work_struct drv_deliver_wk;

     u16 listen_interval;
+    u32 def_thr;

     bool dead;
     bool removed;



Best Regards,
Lukonin Kirill

  
René Treffer June 10, 2019, 10:06 a.m. UTC | #8
On 10.06.19 05:31, Marek Lindner wrote:
> On Sunday, 9 June 2019 20:45:06 HKT René Treffer wrote:
>> I am testing this on devices with ath9k (2.4GHz) and ath10k (5GHz), so I
>> was looking at the estimates I get from ath9k. Here is a dump from my
>> home network on 2.4GHz/ath9k and what tx/3 would give us:
>>> signal  tx     rx     expect  tx/3    min(tx/3,(rx+tx)/2/3)
>>> -77     13.0   43.3   6.682   4.333
>>> -57     130.0  117.0  44.677  43.333  41.166
>>> -53     117.0  130.0  42.388  39.0
>>> -82     43.3   6.5    13.366  14.433  8.3      (!!!)
>>> -63     52.0   86.7   26.733  17.333
>>> -58     130.0  173.3  29.21   43.333            !!!
>>> -82     6.5    43.3   2.197   2.166
>>> -48     104.0  65.0   40.191  34.666  28.166
>>> -69     57.8   13.0   20.49   19.266  11.8
>>> -58     86.7   52.0   33.507  28.9    23.116
>>> -58     52.0   1.0    37.994  17.333  8.833
>>> -56     115.6  144.4  29.21   38.533            !!!
>
> To confirm my understanding: What this table shows are raw tx/rx link estimated 
> values ? None of these numbers compares to Minstrel HT expected throughput or 
> actual throughput ?

Ah sorry, _expect_ is the current ath9k expected throughput and that
should be minstrel, right? I pulled data from my ath9k devices, e.g.

> # iw dev wlan1 station dump
> Station e8:de:27:70:0e:bd (on wlan1)
>         [...]
>         signal:         -56 [-59, -59, -80] dBm
>         tx bitrate:     117.0 MBit/s MCS 20
>         rx bitrate:     144.4 MBit/s MCS 15 short GI
>         expected throughput:    42.388Mbps
>         [...]
Those are the potential inputs (-56, 117, 144.4) and a desired output
(42.388), or as a table

> signal  tx     rx     expect
> -56     117.0  144.4  42.388
I then computed manually the tx/3 (39.0) which is lower than
(rx+tx)/2/3. The full line would be

> signal  tx     rx     expect  tx/3    min(tx/3,(rx+tx)/2/3)
> -56     117.0  144.4  42.388  39.0
I hope this makes sense now. I wanted to get close to the current
throughput estimation with worse inputs.
I would be happy to check more inputs, but the tx/3 turned out to be
pretty close and usually slightly lower.

>
>
>> Cases where the rx/tx estimate would be higher are marked with !!!.
> I also don't quite understand what the '!!!' thing is trying to indicate. What 
> is being compared ? But it may be due to my misunderstandings above. 

I haven't done an actual throughput test, and I would expect the outputs
of my heuristic to be worse.
So I wanted to give slightly lower values than the expected throughput.

The other way to think about it: if you were to replace the
expected_throughput input where would you over-estimate the link quality
now?

>
> In my small test setup with one ath10k device meshing with ath9k over 2.4GHz, 
> your tx / 3 formula seems to be quite accurate (had removed the rx part). 
>
> # batctl o (your magic formula)
> * ac:86:74:00:38:06    0.930s (       45.7)  ac:86:74:00:38:06 [    mesh24]
>
> # batctl tp ac:86:74:00:38:06 (actual throughput)
> Test duration 10440ms.
> Sent 58393512 Bytes.
> Throughput: 5.33 MB/s (44.75 Mbps)
>
> What would be interesting is how the numbers produced by 'tx / 3' compare to 
> either the actual throughput (can easily be tested with the throughput meter) 
> or Minstrel expected throughput. 

Comparing with actual throughput sounds like a good idea, I'll do that next.
Right now I don't know how well estimates on both radios hold and how
well they are comparable.

>
>
>> Why bother and look at rx at all? Asymmetric routing should already
>> work. I was bit concerned about highly asymmetric links, especially
>> those where the path back might not work. It might not be worth it though.
> Generally, the return path might be entirely different. Batman-adv does not 
> enforce or even endorse symetric paths. If there is better path for the return 
> route, batman-adv will choose the better path based on tx from the sender and 
> if only one return path exists, we don't care anyway ..
>
> Cheers,
> Marek
  
Marek Lindner June 12, 2019, 6:39 a.m. UTC | #9
Hi,

> I have a working solution for this problem. It is not batman-related,
> but I decided to share it with you right here.
> Please let me clarify some  details.
> 
> 1) Some ath10k firmwares (10.2) do not export tx bitrate. So we can't
> rely on it.
> 2) Throughput estimation is better to inject from userspace, rather
> than make batman estimate it from unreliable sources.
> 3) Here is the patch for mac80211 We made for ath10k and such drivers
> that do not export expected throughput value.

you are very right about those issues. However, your patch only provides a 
quick way to push an arbitrary throughput metric into the batman-adv kernel 
module. The current discussion is about how such metric could be best derived 
in an automated fashion. 

Would you mind sharing your approach to obtaining such a metric that addresses 
the problems mentioned above ?

Thanks,
Marek
  
Кирилл Луконин June 12, 2019, 8:50 p.m. UTC | #10
On Wed, 12 Jun 2019 at 11:39, Marek Lindner <mareklindner@neomailbox.ch> wrote:
>
> Hi,
>
> > I have a working solution for this problem. It is not batman-related,
> > but I decided to share it with you right here.
> > Please let me clarify some  details.
> >
> > 1) Some ath10k firmwares (10.2) do not export tx bitrate. So we can't
> > rely on it.
> > 2) Throughput estimation is better to inject from userspace, rather
> > than make batman estimate it from unreliable sources.
> > 3) Here is the patch for mac80211 We made for ath10k and such drivers
> > that do not export expected throughput value.
>
> you are very right about those issues. However, your patch only provides a
> quick way to push an arbitrary throughput metric into the batman-adv kernel
> module. The current discussion is about how such metric could be best derived
> in an automated fashion.
>
> Would you mind sharing your approach to obtaining such a metric that addresses
> the problems mentioned above ?
>
> Thanks,
> Marek
>

Yes, sure.

I can't share the source code because it partially belongs to a
commercial company.
But I want to, and can, share my ideas with empirical evidence.
I would also be very glad to help the batman-adv mesh community and become
a part of it.

So here is the algorithm. It has a structure we call matryoshka.
ET - Expected_throughput.

1) ET = TX_bitrate * Transmit_probability * Overhead_coefficient
Transmit_probability is always less than 1, so the expected throughput
can't be equal to the TX_bitrate. Overhead_coefficient should also
be less than 1.

2) Transmit_probability = 1 - Retry_probability - Error_probability

3) Retry_probability = TX_retries / TX_packets

4) Error_probability = TX_errors / TX_packets

5) An Overhead_coefficient of 0.65 is a fair value for 802.11, but it can
be changed after additional testing.


ET = TX_bitrate * (1 - (TX_retries + TX_errors) / TX_packets) * 0.65
This technique has a very large hysteresis, which helps to avoid
flapping between different nodes.
In my MESH lab this formula works quite well.
Please ask any questions and feel free to criticize.
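
A minimal integer-math sketch of the formula above, for readers who want to try it with their own counters. `expected_throughput()` is a hypothetical helper, not code from any driver. Clamping the result to 0 when retries plus errors reach the packet count is an assumption added here, since the formula would otherwise go negative, and the counters are assumed small enough that the intermediate product fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

#define OVERHEAD_PERCENT 65	/* the 0.65 coefficient from the message */

/* tx_bitrate and the result share one unit, e.g. 100 kbit/s (assumed). */
static uint32_t expected_throughput(uint32_t tx_bitrate, uint64_t tx_packets,
				    uint64_t tx_retries, uint64_t tx_errors)
{
	uint64_t failed = tx_retries + tx_errors;
	uint64_t good;

	/* no data yet, or more failures than packets: report nothing */
	if (tx_packets == 0 || failed >= tx_packets)
		return 0;

	good = tx_packets - failed;

	/* TX_bitrate * (1 - failed / TX_packets) * 0.65 */
	return (uint32_t)((uint64_t)tx_bitrate * good / tx_packets *
			  OVERHEAD_PERCENT / 100);
}

/* Usage example: 130.0 Mbit/s, 1000 packets, 100 retries, 50 errors. */
void example_checks(void)
{
	assert(expected_throughput(1300, 1000, 100, 50) == 718);
	assert(expected_throughput(1300, 1000, 0, 0) == 845);
	assert(expected_throughput(1300, 0, 0, 0) == 0);
}
```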


Also please note that the ath10k tx_bitrate can be fetched from the ath10k
firmware debugfs statistics, even with the 10.2 firmware.


Best Regards,
Lukonin Kirill
  
Marek Lindner June 25, 2019, 8:26 a.m. UTC | #11
On Thursday, 13 June 2019 04:50:44 HKT Кирилл Луконин wrote:
> So here is the algorithm. It has a structure we call matryoshka.
> ET - Expected_throughput.
> 
> 1) ET = TX_bitrate * Transmit_probability * Overhead_coefficient
> Transmit_probability is always less than 1 so Expected throughput
> can't be equal to the TX_bitrate. Overhead_coefficient is also should
> be less than 1
> 
> 2) Transmit_probability = 1 - Retry_probability - Error_probability
> 
> 3) Retry_probability = TX_retries / TX_packets
> 
> 4) Error_probability = TX_errors / TX_packets
> 
> 5) Overhead_coefficient for 802.11 is fair enough to be 0.65, but can
> be changed after additional testing.
> 
> 
> ET = TX_bitrate * (1 - (TX_retries + TX_errors) / TX_packets) * 0.65
> Such technique has very large hysteresis which is good to avoid
> flapping between different nodes.
> In my MESH lab this formula works quite well.

This looks like an interesting approach. Which chips / environments did you 
test this formula with and how did the result compare to the actual TCP 
throughput ?

Thanks,
Marek
  
Кирилл Луконин June 27, 2019, 10:41 a.m. UTC | #12
Hello, Marek.

On Thursday, 13 June 2019 04:50:44 HKT Кирилл Луконин wrote:
>> So here is the algorithm. It has a structure we call matryoshka.
>> ET - Expected_throughput.
>>
>> 1) ET = TX_bitrate * Transmit_probability * Overhead_coefficient
>> Transmit_probability is always less than 1 so Expected throughput
>> can't be equal to the TX_bitrate. Overhead_coefficient is also should
>> be less than 1
>>
>> 2) Transmit_probability = 1 - Retry_probability - Error_probability
>>
>> 3) Retry_probability = TX_retries / TX_packets
>>
>> 4) Error_probability = TX_errors / TX_packets
>>
>> 5) Overhead_coefficient for 802.11 is fair enough to be 0.65, but can
>> be changed after additional testing.
>>
>>
>> ET = TX_bitrate * (1 - (TX_retries + TX_errors) / TX_packets) * 0.65
>> Such technique has very large hysteresis which is good to avoid
>> flapping between different nodes.
>> In my MESH lab this formula works quite well.

> This looks like an interesting approach. Which chips / environments did you
> test this formula with and how did the result compare to the actual TCP
> throughput ?
>
> Thanks,
> Marek


Mostly QCA988x was tested.
Sorry, I lost my test results, so I need to run the tests again.
I have UBNT AC MESH, UBNT AP AC Lite and TP-Link RE450 in my lab.

Also, I think it's better to test UDP throughput, but I can test
both TCP and UDP.
This formula always shows a result that is close to the UDP throughput. So
maybe we can think about additional parameters/coefficients that
depend on the protocol or something else.



Best Regards,
Lukonin Kirill

  
Marek Lindner June 27, 2019, 10:57 a.m. UTC | #13
On Thursday, 27 June 2019 18:41:54 HKT Кирилл Луконин wrote:
> Mostly QCA988x was tested.
> Sorry, I lost my test results so I need to do it again.
> I have UBNT AC MESH, UBNT AP AC Lite and TP-Link RE450 in my lab.

It'd be great to see those numbers. 


> Also, as I think, it's better to test UDP throughput, But I can test
> both TCP and UDP.
> This formula always show a result that close to UDP throughput. So,
> may be we can think about additional parameters/coefficients that
> depend on protocols or something else.

Why is UDP better ? The thinking behind comparing to TCP throughput is that 
most applications use TCP and not UDP. 

Cheers,
Marek
  
Marek Lindner Aug. 11, 2019, 12:50 p.m. UTC | #14
On Monday, 10 June 2019 18:06:23 HKT René Treffer wrote:
> > I also don't quite understand what the '!!!' thing is trying to indicate.
> > What is being compared ? But it may be due to my misunderstandings above.
>
> I haven't done an actual throughput test, and I would expect the outputs
> of my heuristic to be worse.
> So I wanted to give slightly lower values than the expected throughput.
> 
> The other way to think about it: if you were to replace the
> expected_throughput input where would you over-estimate the link quality
> now?

Sorry, I don't follow that train of thought. Due to the lack of numbers 
comparing the magic formula (tx / 3) with actual throughput, we (Antonio and I) 
ran a few tests on ath10k devices using your patch. To our surprise, this 
approach worked quite well. 

I will re-post your patch as RFC to encourage others to test the resulting 
B.A.T.M.A.N. V routing decisions. If you happen to have tested more on your 
end, please share your results.

Also, it still isn't clear why rx should be considered on top of tx. Could you 
explain some more?

Thanks,
Marek
  
Marek Lindner Aug. 11, 2019, 1:17 p.m. UTC | #15
On Sunday, 11 August 2019 20:54:36 HKT Rene Treffer wrote:
> I found a high TX and (very) low RX unrealistic.
> 
> But there was another suggestion to use the "success" rate (1-retransmits
> or errors) that sounds like a better idea. I would suggest to disregard RX.

Right, retransmission rate or error rate might be helpful to add. A number of 
questions still need to be answered: 

* Is this information exported by Wifi drivers like ath9k/ath10k/others ?
* How is this information factored into the equation ?
* How does the result compare to the actual throughput ? 

Especially the last point is somewhat crucial simply because in a mesh network 
with a healthy mix of devices & drivers all these various sources of 
throughput data need to be comparable. Picture a situation in which batman-adv 
needs to choose a route involving Ethernet throughput / manual throughput / 
expected throughput / bitrate throughput.

Cheers,
Marek
  

Patch

diff --git a/net/batman-adv/bat_v_elp.c b/net/batman-adv/bat_v_elp.c
index 2614a9ca..ce3b52f1 100644
--- a/net/batman-adv/bat_v_elp.c
+++ b/net/batman-adv/bat_v_elp.c
@@ -68,7 +68,7 @@  static u32 batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh)
 	struct ethtool_link_ksettings link_settings;
 	struct net_device *real_netdev;
 	struct station_info sinfo;
-	u32 throughput;
+	u32 throughput, rx, tx;
 	int ret;
 
 	/* if the user specified a customised value for this interface, then
@@ -107,10 +107,25 @@  static u32 batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh)
 		}
 		if (ret)
 			goto default_throughput;
-		if (!(sinfo.filled & BIT(NL80211_STA_INFO_EXPECTED_THROUGHPUT)))
-			goto default_throughput;
 
-		return sinfo.expected_throughput / 100;
+		if (sinfo.filled & BIT(NL80211_STA_INFO_EXPECTED_THROUGHPUT)) {
+			return sinfo.expected_throughput / 100;
+		}
+
+		// try to estimate en expected throughput based on reported rx/tx rates
+		// 1/3 of tx or 1/3 of the average of rx and tx, whichever is smaller
+		if (sinfo.filled & BIT(NL80211_STA_INFO_TX_BITRATE)) {
+			tx = cfg80211_calculate_bitrate(&sinfo.txrate);
+			if (sinfo.filled & BIT(NL80211_STA_INFO_RX_BITRATE)) {
+				rx = cfg80211_calculate_bitrate(&sinfo.rxrate);
+				if (rx < tx) {
+					return (rx + tx) / 6;
+				}
+			}
+			return tx / 3;
+		}
+
+		goto default_throughput;
 	}
 
 	/* if not a wifi interface, check if this device provides data via