batman-adv: increase default hop penalty

Message ID 1403000163-8148-1-git-send-email-sw@simonwunderlich.de (mailing list archive)
State Accepted, archived
Commit 7644650bb766bd4c7b6be5d97e6b1c3ed93a38d7

Commit Message

Simon Wunderlich June 17, 2014, 10:16 a.m. UTC
  From: Simon Wunderlich <simon@open-mesh.com>

The default hop penalty is currently set to 15, which is applied as-is on
multi-interface devices (e.g. dual band APs). Single band devices will
still use an effective penalty of 30 (hop penalty + wifi penalty).

After receiving reports of overly long paths in mesh networks with dual
band APs, which were fixed by increasing the hop penalty, we'd like to
suggest increasing the default value as well. We've evaluated that
increase in a handful of medium-sized mesh networks (5-20 nodes) with
single and dual band devices, with changes for the better (shorter
routes, higher throughput) or no change at all.

This patch changes the hop penalty to 30, which will give an effective
penalty of 60 on single band devices (hop penalty + wifi penalty).

Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
---
 soft-interface.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
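
To make these numbers concrete: batman-adv dampens the TQ carried in a
forwarded OGM by the effective penalty, roughly
new_tq = tq * (255 - penalty) / 255, which mirrors the batadv_hop_penalty()
helper. The snippet below is a small userspace sketch, not batman-adv
source; the function name and the standalone framing are illustrative only.

#include <stdio.h>

#define TQ_MAX_VALUE 255	/* maximum transmit quality in batman-adv */

/* toy version of the per-hop TQ dampening: tq * (255 - penalty) / 255 */
static unsigned int apply_hop_penalty(unsigned int tq, unsigned int penalty)
{
	return tq * (TQ_MAX_VALUE - penalty) / TQ_MAX_VALUE;
}

int main(void)
{
	/* dual band AP: effective penalty = hop penalty only;
	 * single band device: hop penalty + wifi penalty (same value) */
	printf("old default: per-hop TQ %u (dual band) / %u (single band)\n",
	       apply_hop_penalty(TQ_MAX_VALUE, 15),
	       apply_hop_penalty(TQ_MAX_VALUE, 30));
	printf("new default: per-hop TQ %u (dual band) / %u (single band)\n",
	       apply_hop_penalty(TQ_MAX_VALUE, 30),
	       apply_hop_penalty(TQ_MAX_VALUE, 60));
	return 0;
}

With the new default, each wireless hop on a single band device costs
roughly 24% of the remaining TQ (255 -> 195) instead of about 12%
(255 -> 225), which is what nudges the routing towards shorter paths.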
  

Comments

Linus Lüssing June 17, 2014, 10:44 p.m. UTC | #1
On Tue, Jun 17, 2014 at 12:16:03PM +0200, Simon Wunderlich wrote:
> This patch changes the hop penalty to 30, which will give an effective
> penalty of 60 on single band devices (hop penalty + wifi penalty).

"batman-adv: encourage batman to take shorter routes by changing the default hop penalty"
(6a12de1939281dd7fa62a6e22dc2d2c38f82734f)

This patch changed the hop penalty for single (and back then
also dual) band devices from 10 to 30.


If 60 were always the correct value, why wasn't it changed from
10 to 60 back then?

If the reason was not having it measured thoroughly enough
back then, why would your latest measurements be thorough enough now? (For
instance, what will prevent the hop penalty from being changed again next year?)

Any data for others to check?

Cheers, Linus
  
Simon Wunderlich June 18, 2014, 9:21 a.m. UTC | #2
Hi Linus,

> On Tue, Jun 17, 2014 at 12:16:03PM +0200, Simon Wunderlich wrote:
> > This patch changes the hop penalty to 30, which will give an effective
> > penalty of 60 on single band devices (hop penalty + wifi penalty).
> 
> "batman-adv: encourage batman to take shorter routes by changing the
> default hop penalty" (6a12de1939281dd7fa62a6e22dc2d2c38f82734f)
> 
> This patch changed the hop penalty for single (and back then
> also dual) band devices from 10 to 30.

that's right. Actually, at that time I was using 50 for most of my networks,
so 30 was a compromise.

 
> 
> If 60 were always the correct value, why wasn't it changed from
> 10 to 60 back then?

There is no such thing as a correct value for that. The hop penalty is an 
empirical value derived from various experiments. The original idea was to 
introduce an artificial decrease of the metric for perfect networks (e.g. 
Ethernet) to avoid loops, but it turned out that it can also be useful to 
avoid route flapping between paths of different lengths, or to compensate 
for small changes in the measurement. For example, when we placed 10 routers 
in one place, the routes were flapping between 1 hop (which would be expected) 
and 2 hops - because of small changes in the TQ measurement. We then increased 
the hop penalty from 10 to 30 (or even 50), which solved that problem.
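
To illustrate that with toy numbers: assume, as a simplification, that every
forwarding node scales the advertised TQ by (255 - penalty) / 255 before
rebroadcasting. The sketch below is not batman-adv source and the per-link
TQ values are made up, but it shows why a small penalty lets TQ noise flip
the route while a larger one does not:

#include <stdio.h>

#define TQ_MAX 255

/* TQ of a path with `hops` identical links and one penalty per rebroadcast */
static unsigned int path_tq(unsigned int link_tq, unsigned int hops,
                            unsigned int penalty)
{
	unsigned int tq = TQ_MAX;
	unsigned int i;

	for (i = 0; i < hops; i++) {
		tq = tq * link_tq / TQ_MAX;	/* link quality of this hop */
		if (i + 1 < hops)		/* penalty on each rebroadcast */
			tq = tq * (TQ_MAX - penalty) / TQ_MAX;
	}
	return tq;
}

int main(void)
{
	/* routers in one room: a slightly noisy direct link vs. a detour
	 * over two near-perfect links */
	unsigned int penalty;

	for (penalty = 10; penalty <= 60; penalty += 50)
		printf("penalty %2u: 1 hop (tq 235) -> %3u, 2 hops (tq 250) -> %3u\n",
		       penalty, path_tq(235, 1, penalty),
		       path_tq(250, 2, penalty));
	return 0;
}

With a penalty of 10 both paths evaluate to 235, so a few points of TQ noise
are enough to flip the route; with 60 the 1-hop path wins clearly (235 vs. 187).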

> 
> If the reason was not having it measured thoroughly enough
> back then, why would your latest measurements be thorough enough now? (For
> instance, what will prevent the hop penalty from being changed again next year?)

What is "thoroughly enough"? I didn't do "scientifical research" or write any 
paper on that, and don't plan to do so. It's a default value, but anyone who 
has a better idea can change that. It's solely based on our personal 
experience. I don't guarantee that we will not change it again next year, but 
last time we kept it for quite some time too ...

I tested it on 7 networks with 10-20 nodes each, and different types of 
devices. That is certainly more than last time. If you have the time/resources 
to do a bigger / more detailed test, feel free to do so and share your 
results. :)


> 
> Any data for others to check?

Nope, unfortunately these are customer networks, and I can't reveal data from 
that in public. But I can certainly explain how I tested: We were running 
Antonio's throughput meter on these devices and saw some unusually slow 
throughput and too-long paths (4 hops where 2 were possible). We then increased 
the hop penalty to the suggested value, and both the hop count decreased and 
the throughput increased. We repeated that with the other 6 networks and had 
either a similar improvement or no change at all (since all hop counts were 
already one).

Cheers,
    Simon
  
Linus Lüssing June 18, 2014, 7:56 p.m. UTC | #3
On Wed, Jun 18, 2014 at 11:21:14AM +0200, Simon Wunderlich wrote:
> > 
> > Any data for others to check?
> 
> Nope, unfortunately these are customer networks, and I can't reveal data from 
> that in public.

That's very, very unfortunate... and made my hair stand on end. It
clashes/undermines a little with a point I love a lot about free
software... Anyways, maybe that's not something to discuss on a
mailing list.


Damn it, why don't we have the stupid hop count in the
measurements from the last WBM? Would have been very easy to
verify with that.

Maybe we could try using the WBM to transparently find better
default values in the future (again; I remember that you had made
nice graphs for the decision of having interface-alternating or
interface-bonding as the default back then at WBMv3 in Italy -
that was awesome!)?

> But I can certainly explain how I tested: We were running 
> Antonio's throughput meter on these devices and saw some unusually slow 
> throughput and too-long paths (4 hops where 2 were possible). We then increased 
> the hop penalty to the suggested value, and both the hop count decreased and 
> the throughput increased. We repeated that with the other 6 networks and had 
> either a similar improvement or no change at all (since all hop counts were 
> already one).

What mcast-rate were you using? Will this make things worse for
setups with a different mcast-rate?

> 
> Cheers,
>     Simon

Cheers, Linus
  
Simon Wunderlich June 19, 2014, 4:18 p.m. UTC | #4
> On Wed, Jun 18, 2014 at 11:21:14AM +0200, Simon Wunderlich wrote:
> > > Any data for others to check?
> > 
> > Nope, unfortunately these are customer networks, and I can't reveal data
> > from that in public.
> 
> That's very, very unfortunate... and made my hair stand on end. It
> clashes/undermines a little with a point I love a lot about free
> software... Anyways, maybe that's not something to discuss on a
> mailing list.

I don't quite get why you are so emotional about that. There are tons of other 
default settings and "heuristic" values which we determined with much less 
"scientific" effort - e.g. the wifi penalty, local window size, request 
timeout, tq global window size, broadcast number ... and nobody cried about 
setting these values or changing them.

I understand that it would be nicer to have all the data public, but open 
software is used in private and/or commercial environments as well, and we 
should respect that these people don't want their network topology revealed. 
These networks are not a public playground. Of course, if you want, you can 
repeat this kind of experiment in your community or test mesh networks 
(weren't there some EU projects that offered that kind of stuff? :] )

> 
> Damn it, why don't we have the stupid hop count in the
> measurements from the last WBM? Would have been very easy to
> verify with that.

Very easy ...? Well, if you think so, please propose/perform/evaluate these 
tests in the next battlemesh. :)

> 
> Maybe we could try using the WBM to transparently find better
> default values in the future (again; I remember that you had made
> nice graphs for the decision of having interface-alternating or
> interface-bonding as the default back then at WBMv3 in Italy -
> that was awesome!)?

Yeah, that wasn't so bad, but the tests were not very extensive either - 3 
devices with special hardware and setup. We could show the gains of 
alternating/bonding after all ... ;)

In any case, feel free to propose this kind of test for the next WBM.

> 
> > But I can certainly explain how I tested: We were running
> > Antonio's throughput meter on these devices and saw some unusually slow
> > throughput and too-long paths (4 hops where 2 were possible). We then
> > increased the hop penalty to the suggested value, and both the hop count
> > decreased and the throughput increased. We repeated that with the other 6
> > networks and had either a similar improvement or no change at all (since
> > all hop counts were already one).
> 
> What mcast-rate were you using? Will this make things worse for
> setups with a different mcast-rate?

The mcast rate was 18M. I don't know if it gets "worse" for different MCS 
rates, and it depends on what we consider "worse". In general, I'd expect the 
protocol to choose longer links / shorter paths, for all mcast rates.

Cheers,
    Simon
  
Jay Brussels June 19, 2014, 7:25 p.m. UTC | #5
I will be setting up a test network with 20-25 radios before our big
roll-out. I can make the network available for testing if that would help
at all.

  
Linus Lüssing June 22, 2014, 10:29 a.m. UTC | #6
Hi Simon,

first of all, sorry for getting so emotional about it. My bad,
I know - it's usually not very constructive to get emotional on a
mailing list.


On Thu, Jun 19, 2014 at 06:18:11PM +0200, Simon Wunderlich wrote:
> > 
> > Damn it, why don't we have the stupid hop count in the
> > measurements from the last WBM? Would have been very easy to
> > verify with that.
> 
> Very easy ...? Well, if you think so, please propose/perform/evaluate these 
> tests in the next battlemesh. :)

What I meant was that we actually sort of had these tests / the data
:). If I remember correctly, there were non-dynamic
environment tests on the slides Axel and others presented, where
you could see the number of hops each layer 3 routing protocol
took and what throughput they had. If there had been hop-count
information for batman-adv too, we could have compared it with the
other protocols and might have been able to deduce whether more or
fewer hops would have resulted in higher throughput.

I will see whether I can help set up logging of data from batctl-ping
next year :).

> 
> > 
> > Maybe we could try using the WBM to transparently find better
> > default values in the future (again; I remember that you had made
> > nice graphs for the decision of having interface-alternating or
> > interface-bonding as the default back then at WBMv3 in Italy -
> > that was awesome!)?
> 
> Yeah, that wasn't so bad, but the tests were not very extensive either - 3 
> devices with special hardware and setup. We could show the gains of 
> alternating/bonding after all ... ;)
> 
> In any case, feel free to propose this kind of test for the next WBM.
> 
> > 
> > > But I can certainly explain how I tested: We were running
> > > Antonio's throughput meter on these devices and saw some unusually slow
> > > throughput and too-long paths (4 hops where 2 were possible). We then
> > > increased the hop penalty to the suggested value, and both the hop count
> > > decreased and the throughput increased. We repeated that with the other 6
> > > networks and had either a similar improvement or no change at all (since
> > > all hop counts were already one).
> > 
> > What mcast-rate were you using? Will this make things worse for
> > setups with a different mcast-rate?
> 
> The mcast rate was 18M. I don't know if it gets "worse" for different MCS 
> rates, and it depends on what we consider "worse". In general, I'd expect the 
> protocol to choose longer links / shorter paths, for all mcast rates.

Thanks, 18MBit/s is valuable information! Hm, I would then guess
that everyone using an mcast rate lower than or equal to
18MBit/s should be good to go, right? If nodes are using a lower
mcast rate, then they mostly have lower packet loss and would
need a higher hop penalty to select the same "good" path.

On the other hand, people using an mcast rate higher than 18MBit/s
might want to have a hop penalty lower than 60. But right now, I'm
not aware of any such mesh networks, so probably it shouldn't make
things worse for other people if your measurements in your mesh
were correct (which I believe they were - I never wanted to
discredit your capabilities, you, Marek and Antonio are probably
the best people to perform such tests in a reliable way).
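
One (over-)simplified way to put numbers on that argument (purely
illustrative values, not measurements, and the same simplified TQ model as
the sketch earlier in the thread):

#include <stdio.h>

#define TQ_MAX 255

int main(void)
{
	/* low mcast rate: OGMs are nearly lossless on every link */
	int gap_low  = (250 * 250 / TQ_MAX) - 250;	/* 2-hop minus 1-hop, no penalty */
	/* higher mcast rate: the long direct link drops OGMs, short hops less so */
	int gap_high = (240 * 240 / TQ_MAX) - 190;

	printf("low rate : 2-hop minus 1-hop TQ before penalty = %d\n", gap_low);
	printf("high rate: 2-hop minus 1-hop TQ before penalty = %d\n", gap_high);
	return 0;
}

At the low rate the two paths differ by only a few TQ points (-5), so the
hop penalty is what keeps the 1-hop route stable; at the higher rate the
2-hop path is already about 35 points ahead before any penalty, and a
penalty of 60 (225 * 195 / 255 ~= 172 < 190) would flip the decision back to
the direct link - which is the case where one might want a smaller penalty.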

> 
> Cheers,
>     Simon

PS: Somebody noted on IRC that it might seem that I have a
problem with commercial, non-public mesh networks.
While I would certainly love it if everyone were setting
up their commercial network on top of a free community mesh
network (Freifunk, Ninux etc.), I don't have an issue if someone
decides not to do that - that's everyone's free choice :). And
people making a living with a commercial mesh network didn't get
me emotional at all; that wasn't it.
  
Marek Lindner June 24, 2014, 3:31 p.m. UTC | #7
On Tuesday 17 June 2014 12:16:03 Simon Wunderlich wrote:
> From: Simon Wunderlich <simon@open-mesh.com>
> 
> The default hop penalty is currently set to 15, which is applied as-is on
> multi-interface devices (e.g. dual band APs). Single band devices will
> still use an effective penalty of 30 (hop penalty + wifi penalty).
> 
> After receiving reports of overly long paths in mesh networks with dual
> band APs, which were fixed by increasing the hop penalty, we'd like to
> suggest increasing the default value as well. We've evaluated that
> increase in a handful of medium-sized mesh networks (5-20 nodes) with
> single and dual band devices, with changes for the better (shorter
> routes, higher throughput) or no change at all.
> 
> This patch changes the hop penalty to 30, which will give an effective
> penalty of 60 on single band devices (hop penalty + wifi penalty).
> 
> Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
> ---
>  soft-interface.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied in revision 7644650.

Thanks,
Marek
  

Patch

diff --git a/soft-interface.c b/soft-interface.c
index e783afb..9bf382d 100644
--- a/soft-interface.c
+++ b/soft-interface.c
@@ -757,7 +757,7 @@  static int batadv_softif_init_late(struct net_device *dev)
 	atomic_set(&bat_priv->gw.bandwidth_down, 100);
 	atomic_set(&bat_priv->gw.bandwidth_up, 20);
 	atomic_set(&bat_priv->orig_interval, 1000);
-	atomic_set(&bat_priv->hop_penalty, 15);
+	atomic_set(&bat_priv->hop_penalty, 30);
 #ifdef CONFIG_BATMAN_ADV_DEBUG
 	atomic_set(&bat_priv->log_level, 0);
 #endif