Message ID | 200908031821.22497.lindner_marek@yahoo.de |
---|---|
State | RFC, archived |
Headers |
Return-Path: <lindner_marek@yahoo.de> Received: from mo-p05-ob.rzone.de (mo-p05-ob.rzone.de [81.169.146.180]) by open-mesh.net (Postfix) with ESMTPS id 01912154399 for <b.a.t.m.a.n@lists.open-mesh.net>; Mon, 3 Aug 2009 10:49:20 +0000 (UTC) X-RZG-AUTH: :OGkHfVO9a++ASa1NN1xF8Z+yxAO4YqHmxoKm7X00LncCjhL5i1Yt3al3Gv4UR4pxi2RPNNa4 X-RZG-CLASS-ID: mo05 Received: from turgot.localnet (61-59-128-157.static.seed.net.tw [61.59.128.157]) by post.strato.de (fruni mo21) (RZmta 20.2) with ESMTP id 6020cfl739hmci for <b.a.t.m.a.n@lists.open-mesh.net>; Mon, 3 Aug 2009 12:22:46 +0200 (MEST) From: Marek Lindner <lindner_marek@yahoo.de> To: "The list for a Better Approach To Mobile Ad-hoc Networking" <b.a.t.m.a.n@lists.open-mesh.net> Date: Mon, 3 Aug 2009 18:21:12 +0800 User-Agent: KMail/1.11.4 (Linux/2.6.30-1-686; KDE/4.2.4; i686; ; ) References: <20090722105639.GH32143@ma.tech.ascom.ch> <200907280020.06026.lindner_marek@yahoo.de> <OF82FBEF87.4CF60579-ONC1257603.002FC1DB-C1257603.004FD789@CHASCOM.INT> In-Reply-To: <OF82FBEF87.4CF60579-ONC1257603.002FC1DB-C1257603.004FD789@CHASCOM.INT> MIME-Version: 1.0 Content-Type: Multipart/Mixed; boundary="Boundary-00=_iordKY7x5WtE2e+" Message-Id: <200908031821.22497.lindner_marek@yahoo.de> Subject: Re: [B.A.T.M.A.N.] batman goes looping... X-BeenThere: b.a.t.m.a.n@lists.open-mesh.net X-Mailman-Version: 2.1.11 Precedence: list Reply-To: The list for a Better Approach To Mobile Ad-hoc Networking <b.a.t.m.a.n@lists.open-mesh.net> List-Id: The list for a Better Approach To Mobile Ad-hoc Networking <b.a.t.m.a.n.lists.open-mesh.net> List-Unsubscribe: <https://lists.open-mesh.net/mm/options/b.a.t.m.a.n>, <mailto:b.a.t.m.a.n-request@lists.open-mesh.net?subject=unsubscribe> List-Archive: <http://lists.open-mesh.net/pipermail/b.a.t.m.a.n> List-Post: <mailto:b.a.t.m.a.n@lists.open-mesh.net> List-Help: <mailto:b.a.t.m.a.n-request@lists.open-mesh.net?subject=help> List-Subscribe: <https://lists.open-mesh.net/mm/listinfo/b.a.t.m.a.n>, <mailto:b.a.t.m.a.n-request@lists.open-mesh.net?subject=subscribe> X-List-Received-Date: Mon, 03 Aug 2009 10:49:21 -0000 |
Commit Message
Marek Lindner
Aug. 3, 2009, 10:21 a.m. UTC
On Thursday 30 July 2009 22:32:04 Yang Su wrote: > 1. Similar to what Marek proposed. But push it to the extreme: switch to a > neighbor only when it has the best tq AND it has the newest seqno. This > fix along seems to already solve the looping problem in the test cases. It > reduces the rerouting time from more than 1 minutes to less than 15 > seconds. Marek: I also tried the patch you sent. It didn't help in this > setup. Changing the route based on the (fastest) sequence number has some drawbacks which we experienced before (BATMAN III). It tends to discriminate longer but better routes and favors short paths. In some asymetric link (worst case) scenarios it would route against all odds, simply because receiving a fast packet (newest seqno) does not mean we can send the same way back. If possible I'd like to avoid this strict seqno check. I attached another patch which will conduct a route switch only if the TQ of the sending neighbor is better than our current best route (no negative switching anymore). I tested the patch here and it works so far but my environment is less controlled than yours. Could you perform the same test on your setup ? If you intend to experiment with other ideas watch out for the sequence number as it will overflow. A simple "greater than" check might lead to strange results. Also, updates_routes() checks for changed HNA messages even if the next hop does not change. > 2. Relaxed echo cancellation. This is based on the following observation: > the TQ value that a node puts into OGM is completely decoupled with "from > which neighbor this OGM is received". As a result, the TQ value contained > in the echoed OGM represent the real TQ value at the neighbor which echoed > this OGM. The current echo cancellation implementation just drops all the > echoed OGM. This may prevent the node from updating the information towards > the neighbor that echos the OGM. In the extreme case, the information > towards that neighbor may becomes completely stale (similar to what happens > in case 2). The change I made: Always check the TQ contained in the > echoed OGMs. When it is worse than the avg TQ towards that neighbor, we use > this TQ reading to update the avg TQ towards that neighbor. This change > didn't show any effect during the chain tests. However, I still include > this change in the patch to bring up the discussion. Right, every node will emit his currently best TQ value but I did not understand how we can use that. If we send him a better TQ he will send back that number. If we send a bad TQ he will send his good number. Furthermore, each hop will apply some asymetric / hop / wifi penalty that we pull into our routing database ? Regards, Marek
Comments
Hi, I have 3 linksys wrt54 flashed with openwrt 7.09, webif'ed and installed BATMAN 0.2-rv478.1, all OK I think. I have arranged the 3 nodes such that node 1 can only see 2 , and 2 only 3, so no direct contact is available from 1 to 3 (except via the mesh) I have not broken the br-lan configuration of these devices and assigned 172.16.1.1/24 to node 1 172.16.1.2/24 to node 2 172.16.1.3/24 to node 3, so I think that means all packets recvd on wl0 or eh0 are bridged. I have a pc hooked into the lan port of node 3 From here I can ssh into node3 (via the ethernet) and into node2 via the wifi (but not .1) Netstat -rn on node 3 indicates that 172.16.1.1 (node1) is via 172.16.1.2 and this is confirmed by the batman web if that shows .1 is reached via .2 - all great. From the ssh prompt on .3 I can ping .2 with no loss and ping .1 with ~ 6% loss. However, if I try to ping .1 from the pc connected on the ethernet port of .3 I get no route to host. Putting wireshark on the ethernet ifc shows the arp going out but nothing coming back. Interestingly enough if I ssh into .3 and ping .1, with wireshark running on the ethernet of .3 Is see the outgoing ICMP but not the reply (which is there because the ping succeeds) . It seems as if the local host traffic is not truly bridged. This is the first attempt so I might have some incorrect/missing config or done something stupid, any ideas would be welcome! Thanks very much nick
|------------> | From: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |Marek Lindner <lindner_marek@yahoo.de> | >--------------------------------------------------------------------------------------------------------------------------------------------------| |------------> | To: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |"The list for a Better Approach To Mobile Ad-hoc Networking" <b.a.t.m.a.n@lists.open-mesh.net> | >--------------------------------------------------------------------------------------------------------------------------------------------------| |------------> | Date: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |03.08.2009 12:23 | >--------------------------------------------------------------------------------------------------------------------------------------------------| |------------> | Subject: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |Re: [B.A.T.M.A.N.] batman goes looping... | >--------------------------------------------------------------------------------------------------------------------------------------------------| |------------> | Sent by: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |b.a.t.m.a.n-bounces@lists.open-mesh.net | >--------------------------------------------------------------------------------------------------------------------------------------------------| On Thursday 30 July 2009 22:32:04 Yang Su wrote: > 1. Similar to what Marek proposed. But push it to the extreme: switch to a > neighbor only when it has the best tq AND it has the newest seqno. This > fix along seems to already solve the looping problem in the test cases. It > reduces the rerouting time from more than 1 minutes to less than 15 > seconds. Marek: I also tried the patch you sent. It didn't help in this > setup. Changing the route based on the (fastest) sequence number has some drawbacks which we experienced before (BATMAN III). It tends to discriminate longer but better routes and favors short paths. In some asymetric link (worst case) scenarios it would route against all odds, simply because receiving a fast packet (newest seqno) does not mean we can send the same way back. If possible I'd like to avoid this strict seqno check. - You are right. Under certain circumstance, this approach may lead to less optimal route selection. Just to understand batman algorithm better, I try to make an example (please correct me if something's wrong here): There are two direct neighbors via which a sender communicates with a receiver. Via the good neighbor, their is a single good route with long delay. Between the bad neighbor and the receiver there are many parallel bad routes. Each bad route has high packet loss rate but very short delay. The receiver's OGMs arrive at the sender first via the bad neighbor. Although there is substantial OGM loss on each individual bad routes, by aggregation over those parallel bad routes, the bad neighbor is still able to relay most of sender's OGM to the receiver. As a result, if the sender already chooses the bad neighbor as the next hop, it will stick to it for quite long time and does not switch to the good neighbor. I attached another patch which will conduct a route switch only if the TQ of the sending neighbor is better than our current best route (no negative switching anymore). I tested the patch here and it works so far but my environment is less controlled than yours. Could you perform the same test on your setup ? - This approach achieves the similar goal as the seqno-based one: switch to a neighbor only when this neighbor is better and has more up-to-date information than the current best route. But this approach is less constrained and seems not to suffer from the above problem for the seqno-based approach. I have tested the patch on our test environment. It works pretty well on the test cases. > 2. Relaxed echo cancellation. This is based on the following observation: > the TQ value that a node puts into OGM is completely decoupled with "from > which neighbor this OGM is received". As a result, the TQ value contained > in the echoed OGM represent the real TQ value at the neighbor which echoed > this OGM. The current echo cancellation implementation just drops all the > echoed OGM. This may prevent the node from updating the information towards > the neighbor that echos the OGM. In the extreme case, the information > towards that neighbor may becomes completely stale (similar to what happens > in case 2). The change I made: Always check the TQ contained in the > echoed OGMs. When it is worse than the avg TQ towards that neighbor, we use > this TQ reading to update the avg TQ towards that neighbor. This change > didn't show any effect during the chain tests. However, I still include > this change in the patch to bring up the discussion. Right, every node will emit his currently best TQ value but I did not understand how we can use that. If we send him a better TQ he will send back that number. If we send a bad TQ he will send his good number. Furthermore, each hop will apply some asymetric / hop / wifi penalty that we pull into our routing database ? - The problem might happen when the neighbor thinks that I am his best neighbor (his good number is actually from me). In this case, my estimation of that neighbor's TQ is actually decided by my own TQ (my TQ minus one hop penalty). When my TQ drops down, the correct behavior should be that my estimation of that neighbor's TQ also drops down accordingly. However, when echo cancellation steps in, my estimation of that neighbor's TQ will not be updated and remains to be the stale value. This might cause problems (e.g. looping) in corner cases even with the new patch. Does this analysis make sense or I miss some details of the echo cancellation? Regards, Marek [attachment "switch-route-with-better-tq.patch" deleted by Yang Su/NSUYAN/CH/Ascom] _______________________________________________ B.A.T.M.A.N mailing list B.A.T.M.A.N@lists.open-mesh.net https://lists.open-mesh.net/mm/listinfo/b.a.t.m.a.n
On Thursday 06 August 2009 16:44:28 Yang Su wrote: > - You are right. Under certain circumstance, this approach may lead to less > optimal route selection. Just to understand batman algorithm better, I try > to make an example (please correct me if something's wrong here): There are > two direct neighbors via which a sender communicates with a receiver. Via > the good neighbor, their is a single good route with long delay. Between > the bad neighbor and the receiver there are many parallel bad routes. Each > bad route has high packet loss rate but very short delay. The receiver's > OGMs arrive at the sender first via the bad neighbor. Although there is > substantial OGM loss on each individual bad routes, by aggregation over > those parallel bad routes, the bad neighbor is still able to relay most of > sender's OGM to the receiver. As a result, if the sender already chooses > the bad neighbor as the next hop, it will stick to it for quite long time > and does not switch to the good neighbor. It is possble to simplify your example more to clearly illustrate the problem. Scenario: Host A wants to send data to host B. Consider the beautiful graphic I attached. ;-) The link between A and B is quite asymetric. No packet loss from B to A but heavy packet loss the other way round. Now, B periodically sends its OGMs and all OGMs will arrive at A directly and via the neighbor whereas the OGMs via the neighbor always be slower than the directly received ones. A should choose the longer path via the neighbor because it has no packet loss but the "fastest sequence number wins" algorithm won't allow that. > - This approach achieves the similar goal as the seqno-based one: switch to > a neighbor only when this neighbor is better and has more up-to-date > information than the current best route. But this approach is less > constrained and seems not to suffer from the above problem for the > seqno-based approach. I have tested the patch on our test environment. It > works pretty well on the test cases. Great! I comitted the patch immediately. Thanks again for your thorough analysis and ideas. :-) > - The problem might happen when the neighbor thinks that I am his best > neighbor (his good number is actually from me). In this case, my estimation > of that neighbor's TQ is actually decided by my own TQ (my TQ minus one hop > penalty). When my TQ drops down, the correct behavior should be that my > estimation of that neighbor's TQ also drops down accordingly. However, when > echo cancellation steps in, my estimation of that neighbor's TQ will not be > updated and remains to be the stale value. This might cause problems (e.g. > looping) in corner cases even with the new patch. Does this analysis make > sense or I miss some details of the echo cancellation? Unless you found another bug the echo cancellation should not hinder the propagation of our own TQ value. Once we (uml2) received a packet we will rebroadcast it. Our neighbor (uml1) will receive it (unless the packet got lost) and update its routing tables accordingly. Now, he will rebroadcast the packet as well which will be killed by the echo cancellation on uml1 as this packet does not contain new information. But even if the TQ received via uml2 is worse uml1 might still have a good TQ from uml3 in its routing table which might get rebroadcasted. Regards, Marek
> - This approach achieves the similar goal as the seqno-based one: switch to > a neighbor only when this neighbor is better and has more up-to-date > information than the current best route. But this approach is less > constrained and seems not to suffer from the above problem for the > seqno-based approach. I have tested the patch on our test environment. It > works pretty well on the test cases. Great! I comitted the patch immediately. Thanks again for your thorough analysis and ideas. :-) - Some updates: I did more thorough testing. It turns out that this patch does really solve the problem. This method seems not able to completely eliminate all possible forms of looping which happen in the test cases. This result in an special pattern: the rerouting is fast (~15 seconds), then (after 3~ 5 seconds) the network enters the looping state and it normally takes long time to recover. Last time when I did tests, I stop the tests right after I observed the successful rerouting. I didn't look into the cause of this problem yet. Any inputs are welcome. > - The problem might happen when the neighbor thinks that I am his best > neighbor (his good number is actually from me). In this case, my estimation > of that neighbor's TQ is actually decided by my own TQ (my TQ minus one hop > penalty). When my TQ drops down, the correct behavior should be that my > estimation of that neighbor's TQ also drops down accordingly. However, when > echo cancellation steps in, my estimation of that neighbor's TQ will not be > updated and remains to be the stale value. This might cause problems (e.g. > looping) in corner cases even with the new patch. Does this analysis make > sense or I miss some details of the echo cancellation? Unless you found another bug the echo cancellation should not hinder the propagation of our own TQ value. Once we (uml2) received a packet we will rebroadcast it. Our neighbor (uml1) will receive it (unless the packet got lost) and update its routing tables accordingly. Now, he will rebroadcast the packet as well which will be killed by the echo cancellation on uml1 as this packet does not contain new information. But even if the TQ received via uml2 is worse uml1 might still have a good TQ from uml3 in its routing table which might get rebroadcasted. - In the following example (also appended in the attachment :) ), if I understand the current echo cancellation implementation correctly, batman will enter permanent looping between A and B. In this example, A send to F, all the links are perfect and have the same delay. Only exception is link A-E. It is an asymmetric link. Regards, Yang (See attached file: looping.pdf)
On Friday 07 August 2009 23:13:46 Yang Su wrote: > - Some updates: I did more thorough testing. It turns out that this patch > does really solve the problem. This method seems not able to completely > eliminate all possible forms of looping which happen in the test cases. > This result in an special pattern: the rerouting is fast (~15 seconds), > then (after 3~ 5 seconds) the network enters the looping state and it > normally takes long time to recover. Last time when I did tests, I stop > the tests right after I observed the successful rerouting. I didn't look > into the cause of this problem yet. Any inputs are welcome. Can you provide logs from the nodes involved ? You also can send them off-list to avoid sending too big attachments. > - In the following example (also appended in the attachment :) ), if I > understand the current echo cancellation implementation correctly, batman > will enter permanent looping between A and B. In this example, A send to F, > all the links are perfect and have the same delay. Only exception is link > A-E. It is an asymmetric link. Why do you think it would loop permanently and why do you think it is the fault of the echo cancellation ? Actually, this example should[tm] be pretty easy. Every time the OGM from E or F arrive at A via the asymetric link A will apply a severe asymetric link penalty. All nodes will route via the longer path. Did you try to run this scenario in your testbed ? Regards, Marek
For the echo cancellation example: the problem happens with the interaction between A and B. Because of the asym link, A chooses B as its next neighbor towards F. Because the asym path is short, B always receives F's OGM from A first (before B receives the OGM with the same seqno from C). As a result, from A's point of view, F's OGMs rebroadcasted by B are always echoes and A will never update the TQ information towards F via B. When the quality of any links on the path B to F drops down, B might choose A as its next neighbor towards F and a loop forms between A and B. Regards, Yang On Friday 07 August 2009 23:13:46 Yang Su wrote: > - Some updates: I did more thorough testing. It turns out that this patch > does really solve the problem. This method seems not able to completely > eliminate all possible forms of looping which happen in the test cases. > This result in an special pattern: the rerouting is fast (~15 seconds), > then (after 3~ 5 seconds) the network enters the looping state and it > normally takes long time to recover. Last time when I did tests, I stop > the tests right after I observed the successful rerouting. I didn't look > into the cause of this problem yet. Any inputs are welcome. Can you provide logs from the nodes involved ? You also can send them off-list to avoid sending too big attachments. > - In the following example (also appended in the attachment :) ), if I > understand the current echo cancellation implementation correctly, batman > will enter permanent looping between A and B. In this example, A send to F, > all the links are perfect and have the same delay. Only exception is link > A-E. It is an asymmetric link. Why do you think it would loop permanently and why do you think it is the fault of the echo cancellation ? Actually, this example should[tm] be pretty easy. Every time the OGM from E or F arrive at A via the asymetric link A will apply a severe asymetric link penalty. All nodes will route via the longer path. Did you try to run this scenario in your testbed ? Regards, Marek
On Monday 10 August 2009 15:20:28 Yang Su wrote: > For the echo cancellation example: the problem happens with the interaction > between A and B. Because of the asym link, A chooses B as its next neighbor > towards F. Because the asym path is short, B always receives F's OGM from A > first (before B receives the OGM with the same seqno from C). As a result, > from A's point of view, F's OGMs rebroadcasted by B are always echoes and A > will never update the TQ information towards F via B. > > When the quality of any links on the path B to F drops down, B might choose > A as its next neighbor towards F and a loop forms between A and B. Ah, here we have the misunderstanding. Let me briefly outline how the echo cancellation works (I refer to your PDF-example): 1. Node C emits a packet. 2. Node B receives it and writes the address of Node C in the previous sender field because it received that packet via C. Now, B rebroadcasts the OGM. 3. Node C receives the packet it sent before and discards it as the packet contains its own address in the previous sender field. 4. Node A happily receives the packet (and goes back to step 2). The echo cancellation will not kill the packets coming via the longer path. Does this help ? Regards, Marek
|------------> | From: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |Marek Lindner <lindner_marek@yahoo.de> | >--------------------------------------------------------------------------------------------------------------------------------------------------| |------------> | To: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |"The list for a Better Approach To Mobile Ad-hoc Networking" <b.a.t.m.a.n@lists.open-mesh.net> | >--------------------------------------------------------------------------------------------------------------------------------------------------| |------------> | Date: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |10.08.2009 09:50 | >--------------------------------------------------------------------------------------------------------------------------------------------------| |------------> | Subject: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |Re: [B.A.T.M.A.N.] batman goes looping... | >--------------------------------------------------------------------------------------------------------------------------------------------------| |------------> | Sent by: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |b.a.t.m.a.n-bounces@lists.open-mesh.net | >--------------------------------------------------------------------------------------------------------------------------------------------------| On Monday 10 August 2009 15:20:28 Yang Su wrote: > For the echo cancellation example: the problem happens with the interaction > between A and B. Because of the asym link, A chooses B as its next neighbor > towards F. Because the asym path is short, B always receives F's OGM from A > first (before B receives the OGM with the same seqno from C). As a result, > from A's point of view, F's OGMs rebroadcasted by B are always echoes and A > will never update the TQ information towards F via B. > > When the quality of any links on the path B to F drops down, B might choose > A as its next neighbor towards F and a loop forms between A and B. Ah, here we have the misunderstanding. Let me briefly outline how the echo cancellation works (I refer to your PDF-example): 1. Node C emits a packet. 2. Node B receives it and writes the address of Node C in the previous sender field because it received that packet via C. Now, B rebroadcasts the OGM. -- B has already rebroadcasted an earlier OGM with the same sequence number (received from A). Will B rebroadcast the OGM with the same sequence number (received from C) again? 3. Node C receives the packet it sent before and discards it as the packet contains its own address in the previous sender field. 4. Node A happily receives the packet (and goes back to step 2). Cheers, Yang
On Monday 10 August 2009 15:57:29 Yang Su wrote: > -- B has already rebroadcasted an earlier OGM with the same sequence number > (received from A). Will B rebroadcast the OGM with the same sequence number > (received from C) again? Excellent question - it should. I will check the code later to verify that this is the case. However, this is not related to the echo cancellation anymore. Regards, Marek
On Monday 10 August 2009 16:16:49 Marek Lindner wrote: > Excellent question - it should. I will check the code later to verify that > this is the case. > However, this is not related to the echo cancellation anymore. I inspected the code for behaviour in the case you outlined. I think our dropping policy might be a bit too strict. I suggest the following behaviour: * New OGMs are processed and rebroadcasted as usual. * Known OGMs (the sequence number is not new but we did not hear this sequence number via that neighbor) are processed and rebroadcasted as well. * Duplicates (known sequence number via a neighbor that sent us this sequence number before are dropped. Objections / ideas ? Regards, Marek
Given the DV nature of Batman, current strict dropping policy seems to make sense, because sending known OGMs (as suggested) does introduce significant overhead but does not propagate so much new information. In addition, loosing the dropping policy may introduce new problems which might not be easily predictable at this moment, e.g., stability, looping (propagating stale information within the network might cause even more nasty looping problems). Another way (and maybe simpler ) to fix the problem: instead of loosing the OGM dropping policy, loosing the echo cancellation policy. It is a very simple fix: in addition to the usual echo cancellation process, I also check the TQ value contained in the echoed OGM. This TQ is announced by the neighbor that broadcasts this echo. If this TQ is worse than my current TQ estimation towards that neighbor, I update the estimation with this TQ. Does that make sense? Regards, Yang |------------> | From: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |Marek Lindner <lindner_marek@yahoo.de> | >--------------------------------------------------------------------------------------------------------------------------------------------------| |------------> | To: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |"The list for a Better Approach To Mobile Ad-hoc Networking" <b.a.t.m.a.n@lists.open-mesh.net> | >--------------------------------------------------------------------------------------------------------------------------------------------------| |------------> | Date: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |11.08.2009 18:44 | >--------------------------------------------------------------------------------------------------------------------------------------------------| |------------> | Subject: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |Re: [B.A.T.M.A.N.] batman goes looping... | >--------------------------------------------------------------------------------------------------------------------------------------------------| |------------> | Sent by: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |b.a.t.m.a.n-bounces@lists.open-mesh.net | >--------------------------------------------------------------------------------------------------------------------------------------------------| On Monday 10 August 2009 16:16:49 Marek Lindner wrote: > Excellent question - it should. I will check the code later to verify that > this is the case. > However, this is not related to the echo cancellation anymore. I inspected the code for behaviour in the case you outlined. I think our dropping policy might be a bit too strict. I suggest the following behaviour: * New OGMs are processed and rebroadcasted as usual. * Known OGMs (the sequence number is not new but we did not hear this sequence number via that neighbor) are processed and rebroadcasted as well. * Duplicates (known sequence number via a neighbor that sent us this sequence number before are dropped. Objections / ideas ? Regards, Marek
On Wednesday 12 August 2009 22:34:17 Yang Su wrote: > Given the DV nature of Batman, current strict dropping policy seems to make > sense, because sending known OGMs (as suggested) does introduce significant > overhead but does not propagate so much new information. > > In addition, loosing the dropping policy may introduce new problems which > might not be easily predictable at this moment, e.g., stability, looping > (propagating stale information within the network might cause even more > nasty looping problems). I agree that this might introduce new problems which is the reason for posting it here. I definitely see the overhead created by that methog. Every better idea will be accepted immediately. :-) > Another way (and maybe simpler ) to fix the problem: instead of loosing the > OGM dropping policy, loosing the echo cancellation policy. > > It is a very simple fix: in addition to the usual echo cancellation > process, I also check the TQ value contained in the echoed OGM. This TQ > is announced by the neighbor that broadcasts this echo. If this TQ is worse > than my current TQ estimation towards that neighbor, I update the > estimation with this TQ. In your testbed it fixed the issue ? Looking at the example you gave I would expect that the problem is only solved partially. You argued the OGMs from E via A would always arrive faster at B compared to the longer path. Using your method would probably avoid the loop between A and B in the event of a packet loss on the longer path but it would not let route A via the longer path. Am I mistaken ? Regards, Marek
Maybe I miss some thing here. What do you think would prevent A from choosing via the long route (via B) ? Cheers, Yang |------------> | From: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |Marek Lindner <lindner_marek@yahoo.de> | >--------------------------------------------------------------------------------------------------------------------------------------------------| |------------> | To: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |"The list for a Better Approach To Mobile Ad-hoc Networking" <b.a.t.m.a.n@lists.open-mesh.net> | >--------------------------------------------------------------------------------------------------------------------------------------------------| |------------> | Date: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |13.08.2009 11:59 | >--------------------------------------------------------------------------------------------------------------------------------------------------| |------------> | Subject: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |Re: [B.A.T.M.A.N.] batman goes looping... | >--------------------------------------------------------------------------------------------------------------------------------------------------| |------------> | Sent by: | |------------> >--------------------------------------------------------------------------------------------------------------------------------------------------| |b.a.t.m.a.n-bounces@lists.open-mesh.net | >--------------------------------------------------------------------------------------------------------------------------------------------------| On Wednesday 12 August 2009 22:34:17 Yang Su wrote: > Given the DV nature of Batman, current strict dropping policy seems to make > sense, because sending known OGMs (as suggested) does introduce significant > overhead but does not propagate so much new information. > > In addition, loosing the dropping policy may introduce new problems which > might not be easily predictable at this moment, e.g., stability, looping > (propagating stale information within the network might cause even more > nasty looping problems). I agree that this might introduce new problems which is the reason for posting it here. I definitely see the overhead created by that methog. Every better idea will be accepted immediately. :-) > Another way (and maybe simpler ) to fix the problem: instead of loosing the > OGM dropping policy, loosing the echo cancellation policy. > > It is a very simple fix: in addition to the usual echo cancellation > process, I also check the TQ value contained in the echoed OGM. This TQ > is announced by the neighbor that broadcasts this echo. If this TQ is worse > than my current TQ estimation towards that neighbor, I update the > estimation with this TQ. In your testbed it fixed the issue ? Looking at the example you gave I would expect that the problem is only solved partially. You argued the OGMs from E via A would always arrive faster at B compared to the longer path. Using your method would probably avoid the loop between A and B in the event of a packet loss on the longer path but it would not let route A via the longer path. Am I mistaken ? Regards, Marek
On Thursday 13 August 2009 23:07:06 Yang Su wrote: > Maybe I miss some thing here. What do you think would prevent A from > choosing via the long route (via B) ? The starting point (still using your example): * A points to E as it never learned about the long path. * B points to C because it learned about the long path but does not rebroadcast the packets from C because they are slower than the packets via A. * C uses the longer path. Your idea: Use the echo cancellation's answer to retrieve more information, especially when the TQ which is coming back is worse than my own I update my routing information. The situation now: * A still points to E because the echos coming from B do not contain TQ values that are worse than its own. * B will probably never route via A because the TQ values are much worse. * C is not affected. Did I overlook something ? Regards, Marek
batman-adv-kernelland/routing.c | 44 ++++++++++++++++---------------------- 1 files changed, 19 insertions(+), 25 deletions(-) diff --git a/batman-adv-kernelland/routing.c b/batman-adv-kernelland/routing.c index b576f8c..204f6a3 100644 --- a/batman-adv-kernelland/routing.c +++ b/batman-adv-kernelland/routing.c @@ -270,31 +270,22 @@ static int isBidirectionalNeigh(struct orig_node *orig_node, struct orig_node *o static void update_orig(struct orig_node *orig_node, struct ethhdr *ethhdr, struct batman_packet *batman_packet, struct batman_if *if_incoming, unsigned char *hna_buff, int hna_buff_len, char is_duplicate) { - struct neigh_node *neigh_node = NULL, *tmp_neigh_node = NULL, *best_neigh_node = NULL; - unsigned char max_tq = 0, max_bcast_own = 0; + struct neigh_node *neigh_node = NULL, *tmp_neigh_node = NULL; int tmp_hna_buff_len; debug_log(LOG_TYPE_BATMAN, "update_originator(): Searching and updating originator entry of received packet \n"); list_for_each_entry(tmp_neigh_node, &orig_node->neigh_list, list) { - if (compare_orig(tmp_neigh_node->addr, ethhdr->h_source) && (tmp_neigh_node->if_incoming == if_incoming)) { neigh_node = tmp_neigh_node; - } else { - - if (!is_duplicate) { - ring_buffer_set(tmp_neigh_node->tq_recv, &tmp_neigh_node->tq_index, 0); - tmp_neigh_node->tq_avg = ring_buffer_avg(tmp_neigh_node->tq_recv); - } + continue; + } - /* if we got have a better tq value via this neighbour or same tq value if it is currently our best neighbour (to avoid route flipping) */ - if ((tmp_neigh_node->tq_avg > max_tq) || ((tmp_neigh_node->tq_avg == max_tq) && (tmp_neigh_node->orig_node->bcast_own_sum[if_incoming->if_num] > max_bcast_own)) || ((orig_node->router == tmp_neigh_node) && (tmp_neigh_node->tq_avg == max_tq))) { + if (is_duplicate) + continue; - max_tq = tmp_neigh_node->tq_avg; - max_bcast_own = tmp_neigh_node->orig_node->bcast_own_sum[if_incoming->if_num]; - best_neigh_node = tmp_neigh_node; - } - } + ring_buffer_set(tmp_neigh_node->tq_recv, &tmp_neigh_node->tq_index, 0); + tmp_neigh_node->tq_avg = ring_buffer_avg(tmp_neigh_node->tq_recv); } if (neigh_node == NULL) @@ -313,17 +304,20 @@ static void update_orig(struct orig_node *orig_node, struct ethhdr *ethhdr, stru neigh_node->last_ttl = batman_packet->ttl; } - if ((neigh_node->tq_avg > max_tq) || ((neigh_node->tq_avg == max_tq) && (neigh_node->orig_node->bcast_own_sum[if_incoming->if_num] > max_bcast_own)) || ((orig_node->router == neigh_node) && (neigh_node->tq_avg == max_tq))) { - - max_tq = neigh_node->tq_avg; - max_bcast_own = neigh_node->orig_node->bcast_own_sum[if_incoming->if_num]; - best_neigh_node = neigh_node; - - } - tmp_hna_buff_len = (hna_buff_len > batman_packet->num_hna * ETH_ALEN ? batman_packet->num_hna * ETH_ALEN : hna_buff_len); - update_routes(orig_node, best_neigh_node, hna_buff, tmp_hna_buff_len); + /** + * if we got have a better tq value via this neighbour or + * same tq value but the link is more symetric change the next hop + * router + */ + if ((orig_node->router != neigh_node) && ((!orig_node->router) || + (neigh_node->tq_avg > orig_node->router->tq_avg) || + ((neigh_node->tq_avg == orig_node->router->tq_avg) && + (neigh_node->orig_node->bcast_own_sum[if_incoming->if_num] > orig_node->router->orig_node->bcast_own_sum[if_incoming->if_num])))) + update_routes(orig_node, neigh_node, hna_buff, tmp_hna_buff_len); + else + update_routes(orig_node, orig_node->router, hna_buff, tmp_hna_buff_len); } static char count_real_packets(struct ethhdr *ethhdr, struct batman_packet *batman_packet, struct batman_if *if_incoming)