[v2,net-next,2/4] bridge: adhere to querier election mechanism specified by RFCs

Message ID 1400994205-31165-3-git-send-email-linus.luessing@web.de (mailing list archive)
State Not Applicable, archived
Headers

Commit Message

Linus Lüssing May 25, 2014, 5:03 a.m. UTC
  MLDv1 (RFC2710 section 6), MLDv2 (RFC3810 section 7.6.2), IGMPv2
(RFC2236 section 3) and IGMPv3 (RFC3376 section 6.6.2) specify that the
querier with lowest source address shall become the selected
querier.

So far the bridge stopped its querier as soon as it heard another
querier regardless of its source address. This results in the "wrong"
querier potentially becoming the active querier or a potential,
unnecessary querying delay.

With this patch the bridge memorizes the source address of the currently
selected querier and ignores queries from queriers with a higher source
address than the currently selected one. This slight optimization is
supposed to make it more RFC compliant (but is rather uncritical and
therefore probably not necessary to be queued for stable kernels).

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
---
 net/bridge/br_multicast.c |  101 +++++++++++++++++++++++++++++++++++++++------
 net/bridge/br_private.h   |    7 ++++
 2 files changed, 95 insertions(+), 13 deletions(-)
  

Comments

Linus Lüssing June 27, 2014, 12:12 a.m. UTC | #1
Hi Ajith,


On Mon, Jun 23, 2014 at 07:58:21AM +0530, Ajith Adapa wrote:
> Hi Luessing,
> 
> As per RFC 4541, snooping switches send query with source address as
> 0.0.0.0 since port in a L2 switch doesn't have ip-address configured.

Hm, I'm not quite sure whether this is true. Do you have a
section/paragraph in RFC4541 which says that?

To me, it seems that 0.0.0.0/:: should only be used when the
topology changes, on links becoming active (the RFC uses the example
of STP). As a performance optimization but not as the source
address for General Queries.

In section 2.1.1 1) b) says:

     "The 0.0.0.0 address represents a special case where the switch
      is proxying IGMP Queries for faster network convergence,
      but is not itself the Querier."

In section 2.1.1 4) it says:

     "If the switch is not the Querier, it should use the 'all-zeros' IP
      Source Address in these proxy queries (even though some hosts may
      elect to not process queries with a 0.0.0.0 IP Source Address)."

(which seems to imply that if a switch is the selected Querier, that
it should use something other than the 'all-zeros' IP Source
Address?)

Also, section 2.1.1 4) says:

"[...] even though some hosts may elect to not process queries with a
 0.0.0.0 IP Source Address"

Which seems to indicate that 0.0.0.0 might be a bad choice to get
reliable IGMP Reports and might only make sense for performance
optimizations (at least for IPv4 - for IPv6 it doesn't make sense
to me to send General Queries with '::' source address at all
since '::' is ignored by hosts per RFC).

(*)

> 
> With current changes linux bridge will always select a querier who sends a
> query message with 0.0.0.0. Right ?
> 
> +       if (ntohl(saddr) <= ntohl(br->ip4_querier.addr.u.ip4))
> +               goto update;

Yes. Which is what the querier election mechanism in the RFCs for
IGMPv2/IGMPv3 (RFC2236/RFC3376) specifies, I think. (**)

> 
> Currently linux bridge sends a query message with source address as
> 0.0.0.0. Right ?

For IPv4, this is currently the default, yes. (there's a sysfs switch
to change that, if desired) (***)

> 
> What should be the expected behaviour ?

Hm, good question.

Regarding (***): After rereading RFC4541 a few more times, I think
(*) would be the right choice: The bridge code should probably be
changed to always use an available, non-zero IPv4 source address
for General Queries. If none is available it, doesn't make sense
to me to become the selected Querier at all (should be changed for
IPv6, too: Currently it refrains from sending Query messages
then, but thinks a valid querier were in the network, enabling
the multicast snooping of the bridge).


Regarding (**): Now that you're mentioning that, you're right, my
change isn't quite straightforward in that regard, a few lines
earlier there is:

> +       if (!br->ip4_querier.addr.u.ip4)
> +               goto update;

So it would select the all-zeros Querier if it receives such
messages. But would happily switch to a non-all-zeros Querier as
well if receiving such messages...

And now, this baffles me, section 2.1.1 4) again:

     "When such proxy queries are received, they must not be included in
      the Querier election process."

If we were following that, a snooping switch and a true multicast
router could potentially have an argument about which becomes the
selected IGMP querier? (for IPv6/MLD this dilemma should not
exist, I think)

I'm currently unsure, whether we should follow
RFC2236(IGMPv2)+RFC3376(IGMPv3) or the informational (!) RFC4541.

Or am I overlooking something? What do others think?

Cheers, Linus
  

Patch

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 5ccac62..b3f17c9 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -789,6 +789,18 @@  static void br_ip6_multicast_querier_expired(unsigned long data)
 }
 #endif
 
+static void br_multicast_select_own_querier(struct net_bridge *br,
+					    struct br_ip *ip,
+					    struct sk_buff *skb)
+{
+	if (ip->proto == htons(ETH_P_IP))
+		br->ip4_querier.addr.u.ip4 = ip_hdr(skb)->saddr;
+#if IS_ENABLED(CONFIG_IPV6)
+	else
+		br->ip6_querier.addr.u.ip6 = ipv6_hdr(skb)->saddr;
+#endif
+}
+
 static void __br_multicast_send_query(struct net_bridge *br,
 				      struct net_bridge_port *port,
 				      struct br_ip *ip)
@@ -804,8 +816,10 @@  static void __br_multicast_send_query(struct net_bridge *br,
 		skb->dev = port->dev;
 		NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
 			dev_queue_xmit);
-	} else
+	} else {
+		br_multicast_select_own_querier(br, ip, skb);
 		netif_rx(skb);
+	}
 }
 
 static void br_multicast_send_query(struct net_bridge *br,
@@ -1065,6 +1079,62 @@  static int br_ip6_multicast_mld2_report(struct net_bridge *br,
 }
 #endif
 
+static bool br_ip4_multicast_select_querier(struct net_bridge *br,
+					    __be32 saddr)
+{
+	if (!timer_pending(&br->ip4_own_query.timer) &&
+	    !timer_pending(&br->ip4_other_query.timer))
+		goto update;
+
+	if (!br->ip4_querier.addr.u.ip4)
+		goto update;
+
+	if (ntohl(saddr) <= ntohl(br->ip4_querier.addr.u.ip4))
+		goto update;
+
+	return false;
+
+update:
+	br->ip4_querier.addr.u.ip4 = saddr;
+
+	return true;
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+static bool br_ip6_multicast_select_querier(struct net_bridge *br,
+					    struct in6_addr *saddr)
+{
+	if (!timer_pending(&br->ip6_own_query.timer) &&
+	    !timer_pending(&br->ip6_other_query.timer))
+		goto update;
+
+	if (ipv6_addr_cmp(saddr, &br->ip6_querier.addr.u.ip6) <= 0)
+		goto update;
+
+	return false;
+
+update:
+	br->ip6_querier.addr.u.ip6 = *saddr;
+
+	return true;
+}
+#endif
+
+static bool br_multicast_select_querier(struct net_bridge *br,
+					struct br_ip *saddr)
+{
+	switch (saddr->proto) {
+	case htons(ETH_P_IP):
+		return br_ip4_multicast_select_querier(br, saddr->u.ip4);
+#if IS_ENABLED(CONFIG_IPV6)
+	case htons(ETH_P_IPV6):
+		return br_ip6_multicast_select_querier(br, &saddr->u.ip6);
+#endif
+	}
+
+	return false;
+}
+
 static void
 br_multicast_update_query_timer(struct net_bridge *br,
 				struct bridge_mcast_other_query *query,
@@ -1127,15 +1197,13 @@  timer:
 static void br_multicast_query_received(struct net_bridge *br,
 					struct net_bridge_port *port,
 					struct bridge_mcast_other_query *query,
-					int saddr,
-					bool is_general_query,
+					struct br_ip *saddr,
 					unsigned long max_delay)
 {
-	if (saddr && is_general_query)
-		br_multicast_update_query_timer(br, query, max_delay);
-	else if (timer_pending(&query->timer))
+	if (!br_multicast_select_querier(br, saddr))
 		return;
 
+	br_multicast_update_query_timer(br, query, max_delay);
 	br_multicast_mark_router(br, port);
 }
 
@@ -1150,6 +1218,7 @@  static int br_ip4_multicast_query(struct net_bridge *br,
 	struct igmpv3_query *ih3;
 	struct net_bridge_port_group *p;
 	struct net_bridge_port_group __rcu **pp;
+	struct br_ip saddr;
 	unsigned long max_delay;
 	unsigned long now = jiffies;
 	__be32 group;
@@ -1191,11 +1260,14 @@  static int br_ip4_multicast_query(struct net_bridge *br,
 		goto out;
 	}
 
-	br_multicast_query_received(br, port, &br->ip4_other_query,
-				    !!iph->saddr, !group, max_delay);
+	if (!group) {
+		saddr.proto = htons(ETH_P_IP);
+		saddr.u.ip4 = iph->saddr;
 
-	if (!group)
+		br_multicast_query_received(br, port, &br->ip4_other_query,
+					    &saddr, max_delay);
 		goto out;
+	}
 
 	mp = br_mdb_ip4_get(mlock_dereference(br->mdb, br), group, vid);
 	if (!mp)
@@ -1235,6 +1307,7 @@  static int br_ip6_multicast_query(struct net_bridge *br,
 	struct mld2_query *mld2q;
 	struct net_bridge_port_group *p;
 	struct net_bridge_port_group __rcu **pp;
+	struct br_ip saddr;
 	unsigned long max_delay;
 	unsigned long now = jiffies;
 	const struct in6_addr *group = NULL;
@@ -1283,12 +1356,14 @@  static int br_ip6_multicast_query(struct net_bridge *br,
 		goto out;
 	}
 
-	br_multicast_query_received(br, port, &br->ip6_other_query,
-				    !ipv6_addr_any(&ip6h->saddr),
-				    is_general_query, max_delay);
+	if (is_general_query) {
+		saddr.proto = htons(ETH_P_IPV6);
+		saddr.u.ip6 = ip6h->saddr;
 
-	if (!group)
+		br_multicast_query_received(br, port, &br->ip6_other_query,
+					    &saddr, max_delay);
 		goto out;
+	}
 
 	mp = br_mdb_ip6_get(mlock_dereference(br->mdb, br), group, vid);
 	if (!mp)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index f186ce8..bc1803b 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -78,6 +78,11 @@  struct bridge_mcast_other_query {
 	struct timer_list		timer;
 	unsigned long			delay_time;
 };
+
+/* selected querier */
+struct bridge_mcast_querier {
+	struct br_ip addr;
+};
 #endif
 
 struct net_port_vlans {
@@ -284,9 +289,11 @@  struct net_bridge
 	struct timer_list		multicast_router_timer;
 	struct bridge_mcast_other_query	ip4_other_query;
 	struct bridge_mcast_own_query	ip4_own_query;
+	struct bridge_mcast_querier	ip4_querier;
 #if IS_ENABLED(CONFIG_IPV6)
 	struct bridge_mcast_other_query	ip6_other_query;
 	struct bridge_mcast_own_query	ip6_own_query;
+	struct bridge_mcast_querier	ip6_querier;
 #endif /* IS_ENABLED(CONFIG_IPV6) */
 #endif