slowpath warning

Message ID 20100220180411.GA15286@lunn.ch (mailing list archive)
State Superseded, archived

Commit Message

Andrew Lunn Feb. 20, 2010, 6:04 p.m. UTC
On Fri, Feb 19, 2010 at 06:19:05PM +0100, Linus Lüssing wrote:
> Hi Andrew,
> 
> Sorry, didn't have the time to try your patch any earlier, I'm
> right in the middle of my exams :).

Hi Linus

Marek told me. No problems. I remember what it's like studying for
exams. However, it is nice to sometimes take a break and do something
totally different. 

> Your patch already looks quite good, I couldn't reproduce any
> memory leaks or crashes here (tried that with three routers and 1
> or 2 vis servers activated, also activating/deactivating them a
> lot, no problems with that). And yes, the slow-path warning has
> gone with your patch.

Great. So we are on the right track.

> However, I'm having some weird output when connecting two routers
> over wifi _and_ over ethernet cable. The setup:
> 
> Before plugging in the cable:
> r1-ath1 <-- wifi --> r2-ath1
> ------------
> root@OpenWrt:~# batctl vd dot
> digraph {
>         "r1-ath1" -> "r2-ath1" [label="1.32"]
>         "r1-ath1" -> "r1-hna" [label="HNA"]
>         "r1-ath1" -> "5a:2e:1e:1f:64:6b" [label="HNA"]
>         subgraph "cluster_r1-ath1" {
>                 "r1-ath1" [peripheries=2]
>         }
>         "r2-ath1" -> "r1-ath1" [label="1.11"]
>         "r2-ath1" -> "r2-hna" [label="HNA"]
>         "r2-ath1" -> "82:31:95:f9:14:6f" [label="HNA"]
>         subgraph "cluster_r2-ath1" {
>                 "r2-ath1" [peripheries=2]
>         }
> }
> ------------
> After plugging in the cable:
> r1-ath1 <-- wifi --> r2-ath1 +
> r1-eth0.3 <-- cable --> r2-eth0.3
> ------------
> root@OpenWrt:~# batctl vd dot
> digraph {
>         "r1-ath1" -> "r2-ath1" [label="1.0"]
>         "r1-ath1" -> "r2-eth0.3" [label="1.66"]
>         "r1-ath1" -> "r1-hna" [label="HNA"]
>         "r1-ath1" -> "5a:2e:1e:1f:64:6b" [label="HNA"]
>         subgraph "cluster_r1-ath1" {
>                 "r1-ath1" [peripheries=2]
>                 "r1-eth0.3"
>         }
>         subgraph "cluster_r1-ath1" {
>                 "r1-ath1" [peripheries=2]
>         }
>         "r2-ath1" -> "r1-ath1" [label="1.0"]
>         "r2-ath1" -> "r1-eth0.3" [label="1.15"]
>         "r2-ath1" -> "r2-hna" [label="HNA"]
>         "r2-ath1" -> "82:31:95:f9:14:6f" [label="HNA"]
>         subgraph "cluster_r2-ath1" {
>                 "r2-ath1" [peripheries=2]
>                 "r2-eth0.3"
>         }
>         subgraph "cluster_r2-ath1" {
>                 "r2-ath1" [peripheries=2]
>         }
> }
> root@OpenWrt:~# cat /proc/net/batman-adv/vis_data
> 06:22:b0:98:87:dd,TQ 04:22:b0:98:87:fa 251, HNA 00:22:b0:98:87:dd, HNA 5a:2e:1e:1f:64:6b, PRIMARY, SEC 04:22:b0:98:87:de,
> 06:22:b0:98:87:f9,TQ 06:22:b0:98:87:dd 255, TQ 04:22:b0:98:87:de 251, HNA 00:22:b0:98:87:f9, HNA 82:31:95:f9:14:6f, SEC 04:22:b0:98:87:fa, PRIMARY,

Actually, this vis_data does not map to the dot above!  It has the
wrong number of HNA entries, the wrong order, etc.

Here is what i think your bat-host file contains:
06:22:b0:98:87:dd r1-ath1
06:22:b0:98:87:f9 r2-ath1
00:22:b0:98:87:dd r1-hna
04:22:b0:98:87:de r1-eth0.3
00:22:b0:98:87:f9 r2-hna
04:22:b0:98:87:fa r2-eth0.3

and this is what i get, assuming i got the MAC->name mapping correct:

digraph {
	"r1-ath1" -> "r2-eth0.3" [label="1.15"]
	"r1-ath1" -> "r1-hna" [label="HNA"]
	"r1-ath1" -> "5a:2e:1e:1f:64:6b" [label="HNA"]
	subgraph "cluster_r1-ath1" {
		"r1-ath1" [peripheries=2]
	}
	subgraph "cluster_r1-ath1" {
		"r1-ath1" [peripheries=2]
		"r1-eth0.3"
	}
	"r2-ath1" -> "r1-ath1" [label="1.0"]
	"r2-ath1" -> "r1-eth0.3" [label="1.15"]
	"r2-ath1" -> "r2-hna" [label="HNA"]
	"r2-ath1" -> "82:31:95:f9:14:6f" [label="HNA"]
	subgraph "cluster_r2-ath1" {
		"r2-ath1" [peripheries=2]
		"r2-eth0.3"
	}
	subgraph "cluster_r2-ath1" {
		"r2-ath1" [peripheries=2]
	}
}

batctl parses top-to-bottom, left-to-right. It does not consolidate
the PRIMARY and the SECONDARY into one cluster. It leaves DOT to do
that. Hence there are two cluster statements for each cluster actually
drawn.
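
To make the entry-by-entry translation concrete, here is a rough
sketch in C. It is not the batctl source: vis_line_to_dot() and the
APPEND macro are made-up names, MAC addresses are printed raw instead
of being looked up in the bat-hosts file, and the label arithmetic is
only inferred from the samples in this thread (TQ 255 gives "1.0",
TQ 251 gives "1.15"). It emits one cluster statement for PRIMARY and
another for each SEC entry, in whatever order they appear on the line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text to the NUL-terminated buffer 'out'; truncates when full. */
#define APPEND(out, outsz, ...) \
	snprintf((out) + strlen(out), (outsz) - strlen(out), __VA_ARGS__)

/*
 * Translate one vis_data line into DOT statements, one entry at a
 * time, left to right.  Nothing is merged: PRIMARY and SEC each emit
 * their own cluster statement.  (Hypothetical helper, not batctl code.)
 */
void vis_line_to_dot(const char *line, char *out, size_t outsz)
{
	char buf[512], mac[32];
	char *primary, *tok;
	int tq;

	assert(outsz > 0 && strlen(line) < sizeof(buf));
	strcpy(buf, line);
	out[0] = '\0';

	primary = strtok(buf, ",");	/* first field: originator MAC */
	if (!primary)
		return;

	while ((tok = strtok(NULL, ",")) != NULL) {
		while (*tok == ' ')	/* entries are separated by ", " */
			tok++;

		if (sscanf(tok, "TQ %31s %d", mac, &tq) == 2 && tq > 0) {
			/* Assumed label format, inferred from the sample output. */
			APPEND(out, outsz, "\"%s\" -> \"%s\" [label=\"%d.%d\"]\n",
			       primary, mac, 255 / tq, (255 * 1000 / tq) % 1000);
		} else if (sscanf(tok, "HNA %31s", mac) == 1) {
			APPEND(out, outsz, "\"%s\" -> \"%s\" [label=\"HNA\"]\n",
			       primary, mac);
		} else if (strncmp(tok, "PRIMARY", 7) == 0) {
			APPEND(out, outsz,
			       "subgraph \"cluster_%s\" {\n\t\"%s\" [peripheries=2]\n}\n",
			       primary, primary);
		} else if (sscanf(tok, "SEC %31s", mac) == 1) {
			APPEND(out, outsz,
			       "subgraph \"cluster_%s\" {\n\t\"%s\" [peripheries=2]\n\t\"%s\"\n}\n",
			       primary, primary, mac);
		}
	}
}
```

Feeding it the first vis_data line from above gives the TQ edge, the
two HNA edges, and then two subgraph statements with the same cluster
name, which DOT merges when drawing.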

> So the second 'subgraph "cluster_r1-ath1"' is obviously
> unnecessary.

Yes, unnecessary, but makes the batctl code easier.

> Also "r1-ath1" -> "r2-eth0.3" looks wrong, should be
> "r1-eth0.3" -> "r2-eth0.3" instead (and the same with r2 a few
> lines later).

These comments i agree with. A wireless and a wired device should not
be neighbours. We don't have any records which originate from the
secondary MAC address. That, i guess, is the major problem here.

So, did my/Marek's patch break it, or was it broken before?

First i suggest you go back to just before Simon's patch which
introduced receiving using skbufs:

http://open-mesh.org/changeset/1517

That will tell us if we need to go back further, or our patch broke
it. 

If you need to go back further, i would suggest just before:

http://open-mesh.org/changeset/1510

However, if it is our patch then we can chop the patch into two:

Use Marek's patch:

https://lists.open-mesh.org/pipermail/b.a.t.m.a.n/2010-January/002261.html

and 


This adds a race condition, which i hope is O.K. for debugging
purposes, but i hope it allows the send to happen without the slowpath
errors. If so, we can test Marek's part of the patch.
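
The idea of dropping the lock around the send can be sketched in
userspace. This is an illustration only, not batman-adv code: a
pthread mutex stands in for orig_hash_lock, and struct node,
send_to_all() and record_send() are made-up names. Unlike the debug
change, the sketch copies the destination address while the lock is
still held, so the send only touches a local copy. Walking the list
after the lock is retaken is still racy if another thread edits it,
and that is the race being accepted for debugging.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for an originator table entry. */
struct node {
	unsigned char router_addr[ETH_ALEN];
	struct node *next;
};

typedef void (*send_fn)(const unsigned char *dst);

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Demo callback: remembers the last destination it was given. */
static unsigned char last_dst[ETH_ALEN];

void record_send(const unsigned char *dst)
{
	memcpy(last_dst, dst, ETH_ALEN);
}

/* Call send() once per node, never while table_lock is held. */
int send_to_all(struct node *head, send_fn send)
{
	unsigned char dst[ETH_ALEN];
	struct node *n;
	int sent = 0;

	assert(send != NULL);

	pthread_mutex_lock(&table_lock);
	for (n = head; n; n = n->next) {
		/* Snapshot what the send needs while the lock is held. */
		memcpy(dst, n->router_addr, ETH_ALEN);

		pthread_mutex_unlock(&table_lock);
		send(dst);		/* uses only the local copy */
		pthread_mutex_lock(&table_lock);

		/*
		 * 'n' may be stale here if another thread changed the
		 * list while the lock was dropped.
		 */
		sent++;
	}
	pthread_mutex_unlock(&table_lock);

	return sent;
}
```

Calling send_to_all(head, record_send) on a two-entry list returns 2
and leaves the second entry's address in last_dst.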

I'm on vacation for a week now. I will have Internet access some time,
but not much. 

Have fun debugging.

     Andrew
  

Comments

Linus Lüssing Feb. 21, 2010, 1:10 p.m. UTC | #1
On Sat, Feb 20, 2010 at 07:04:11PM +0100, Andrew Lunn wrote:
> On Fri, Feb 19, 2010 at 06:19:05PM +0100, Linus Lüssing wrote:
> > Hi Andrew,
> > 
> > Sorry, didn't have the time to try your patch any earlier, I'm
> > right in the middle of my exams :).
> 
> Hi Linus
> 
> Marek told me. No problems. I remember what it's like studying for
> exams. However, it is nice to sometimes take a break and do something
> totally different. 
> 
> > Your patch already looks quite good, I couldn't reproduce any
> > memory leaks or crashes here (tried that with three routers and 1
> > or 2 vis servers activated, also activating/deactivating them a
> > lot, no problems with that). And yes, the slow-path warning has
> > gone with your patch.
> 
> Great. So we are on the right track.
> 
> > However, I'm having some weird output when connecting two routers
> > over wifi _and_ over ethernet cable. The setup:
> > 
> > Before plugging in the cable:
> > r1-ath1 <-- wifi --> r2-ath1
> > ------------
> > root@OpenWrt:~# batctl vd dot
> > digraph {
> >         "r1-ath1" -> "r2-ath1" [label="1.32"]
> >         "r1-ath1" -> "r1-hna" [label="HNA"]
> >         "r1-ath1" -> "5a:2e:1e:1f:64:6b" [label="HNA"]
> >         subgraph "cluster_r1-ath1" {
> >                 "r1-ath1" [peripheries=2]
> >         }
> >         "r2-ath1" -> "r1-ath1" [label="1.11"]
> >         "r2-ath1" -> "r2-hna" [label="HNA"]
> >         "r2-ath1" -> "82:31:95:f9:14:6f" [label="HNA"]
> >         subgraph "cluster_r2-ath1" {
> >                 "r2-ath1" [peripheries=2]
> >         }
> > }
> > ------------
> > After plugging in the cable:
> > r1-ath1 <-- wifi --> r2-ath1 +
> > r1-eth0.3 <-- cable --> r2-eth0.3
> > ------------
> > root@OpenWrt:~# batctl vd dot
> > digraph {
> >         "r1-ath1" -> "r2-ath1" [label="1.0"]
> >         "r1-ath1" -> "r2-eth0.3" [label="1.66"]
> >         "r1-ath1" -> "r1-hna" [label="HNA"]
> >         "r1-ath1" -> "5a:2e:1e:1f:64:6b" [label="HNA"]
> >         subgraph "cluster_r1-ath1" {
> >                 "r1-ath1" [peripheries=2]
> >                 "r1-eth0.3"
> >         }
> >         subgraph "cluster_r1-ath1" {
> >                 "r1-ath1" [peripheries=2]
> >         }
> >         "r2-ath1" -> "r1-ath1" [label="1.0"]
> >         "r2-ath1" -> "r1-eth0.3" [label="1.15"]
> >         "r2-ath1" -> "r2-hna" [label="HNA"]
> >         "r2-ath1" -> "82:31:95:f9:14:6f" [label="HNA"]
> >         subgraph "cluster_r2-ath1" {
> >                 "r2-ath1" [peripheries=2]
> >                 "r2-eth0.3"
> >         }
> >         subgraph "cluster_r2-ath1" {
> >                 "r2-ath1" [peripheries=2]
> >         }
> > }
> > root@OpenWrt:~# cat /proc/net/batman-adv/vis_data
> > 06:22:b0:98:87:dd,TQ 04:22:b0:98:87:fa 251, HNA 00:22:b0:98:87:dd, HNA 5a:2e:1e:1f:64:6b, PRIMARY, SEC 04:22:b0:98:87:de,
> > 06:22:b0:98:87:f9,TQ 06:22:b0:98:87:dd 255, TQ 04:22:b0:98:87:de 251, HNA 00:22:b0:98:87:f9, HNA 82:31:95:f9:14:6f, SEC 04:22:b0:98:87:fa, PRIMARY,
> 
> Actually, this vis_data does not map to the dot above!  It has the
> wrong number of HNA entries, the wrong order, etc.
Hmm, just noticed, the output also seems to be flapping between
those two from time to time:
------------------
root@OpenWrt:~# cat /proc/net/batman-adv/vis
06:22:b0:98:87:dd,TQ 04:22:b0:98:87:fa 251, HNA 00:22:b0:98:87:dd, HNA f6:ae:97:b3:9a:5c, PRIMARY, SEC 04:22:b0:98:87:de,
06:22:b0:98:87:f9,TQ 04:22:b0:98:87:de 251, HNA da:3e:79:2c:d3:3e, HNA 00:22:b0:98:87:f9, PRIMARY, SEC 04:22:b0:98:87:fa,
root@OpenWrt:~# cat /proc/net/batman-adv/vis
06:22:b0:98:87:dd,TQ 04:22:b0:98:87:fa 251, HNA 00:22:b0:98:87:dd, HNA f6:ae:97:b3:9a:5c, PRIMARY, SEC 04:22:b0:98:87:de,
06:22:b0:98:87:f9,TQ 06:22:b0:98:87:dd 255, TQ 04:22:b0:98:87:de 251, HNA da:3e:79:2c:d3:3e, HNA 00:22:b0:98:87:f9, SEC 04:22:b0:98:87:fa, PRIMARY,
------------------

> 
> Here is what i think your bat-host file contains:
> 06:22:b0:98:87:dd r1-ath1
> 06:22:b0:98:87:f9 r2-ath1
> 00:22:b0:98:87:dd r1-hna
> 04:22:b0:98:87:de r1-eth0.3
> 00:22:b0:98:87:f9 r2-hna
> 04:22:b0:98:87:fa r2-eth0.3
> 
> and this is what i get, assuming i got the MAC->name mapping correct:
Yes, correct mapping :).

> 
> digraph {
> 	"r1-ath1" -> "r2-eth0.3" [label="1.15"]
> 	"r1-ath1" -> "r1-hna" [label="HNA"]
> 	"r1-ath1" -> "5a:2e:1e:1f:64:6b" [label="HNA"]
> 	subgraph "cluster_r1-ath1" {
> 		"r1-ath1" [peripheries=2]
> 	}
> 	subgraph "cluster_r1-ath1" {
> 		"r1-ath1" [peripheries=2]
> 		"r1-eth0.3"
> 	}
> 	"r2-ath1" -> "r1-ath1" [label="1.0"]
> 	"r2-ath1" -> "r1-eth0.3" [label="1.15"]
> 	"r2-ath1" -> "r2-hna" [label="HNA"]
> 	"r2-ath1" -> "82:31:95:f9:14:6f" [label="HNA"]
> 	subgraph "cluster_r2-ath1" {
> 		"r2-ath1" [peripheries=2]
> 		"r2-eth0.3"
> 	}
> 	subgraph "cluster_r2-ath1" {
> 		"r2-ath1" [peripheries=2]
> 	}
> }
> 
> batctl parses top-to-bottom, left-to-right. It does not consolidate
> the PRIMARY and the SECONDARY into one cluster. It leaves DOT to do
> that. Hence there are two cluster statements for each cluster actually
> drawn.
> 
> > So the second 'subgraph "cluster_r1-ath1"' is obviously
> > unnecessary.
> 
> Yes, unnecessary, but makes the batctl code easier.
> 
> > Also "r1-ath1" -> "r2-eth0.3" looks wrong, should be
> > "r1-eth0.3" -> "r2-eth0.3" instead (and the same with r2 a few
> > lines later).
> 
> These comments i agree with. A wireless and a wired device should not
> be neighbours. We don't have any records which originate from the
> secondary MAC address. That, i guess, is the major problem here.
> 
> So, did my/Marek's patch break it, or was it broken before?
> 
> First i suggest you go back to just before Simon's patch which
> introduced receiving using skbufs:
> 
> http://open-mesh.org/changeset/1517
> 
> That will tell us if we need to go back further, or our patch broke
> it. 
> 
> If you need to go back further, i would suggest just before:
> 
> http://open-mesh.org/changeset/1510
Okay, just checked, this got introduced with 1510 already, yes. I
might have a closer look at this next week.

Cheers, Linus
  

Patch

Index: vis.c
===================================================================
--- vis.c	(revision 1575)
+++ vis.c	(working copy)
@@ -444,10 +444,15 @@ 
 			memcpy(info->packet.target_orig,
 			       orig_node->orig, ETH_ALEN);
 
+spin_unlock_irqrestore(&orig_hash_lock, flags);
+
 			send_raw_packet((unsigned char *) &info->packet,
 					packet_length,
 					orig_node->batman_if,
 					orig_node->router->addr);
+
+spin_lock_irqsave(&orig_hash_lock, flags);
+
 		}
 	}
 	memcpy(info->packet.target_orig, broadcastAddr, ETH_ALEN);