From patchwork Wed Jan 22 20:51:21 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sven Eckelmann X-Patchwork-Id: 18795 X-Patchwork-Delegate: sw@simonwunderlich.de Return-Path: X-Original-To: patchwork@open-mesh.org Delivered-To: patchwork@open-mesh.org Received: from diktynna.open-mesh.org (localhost [IPv6:::1]) by diktynna.open-mesh.org (Postfix) with ESMTP id A8D2C83F24 for ; Wed, 22 Jan 2025 21:53:13 +0100 (CET) ARC-Seal: i=2; cv=pass; a=rsa-sha256; d=open-mesh.org; s=20121; t=1737579193; b=eN3ZKb/nQeZNv4M7jROPX2FL9Xz1bD9a+kcm1n/z0p7pEPnyBpZpZMrlvM+fruUC/BItu hRb0quxmKjShLqbhIjJBCUspTqp8zeLyPNvAXWmaNYnDpCV5hJlFDvuRwZ2kbAG8bVxdSUc wXILUmuMkDXl1k4Hwavbzy6Dl0qZ7F4= ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=open-mesh.org; s=20121; t=1737579193; h=from : sender : reply-to : subject : date : message-id : to : cc : mime-version : content-type : content-transfer-encoding : content-id : content-description : resent-date : resent-from : resent-sender : resent-to : resent-cc : resent-message-id : in-reply-to : references : list-id : list-help : list-unsubscribe : list-subscribe : list-post : list-owner : list-archive; bh=2YtDAyTtAIMl1eCEZmQ0DSMlC3jboxX0K1cFkKCwprw=; b=olm3/iLmCUYUJu8ZkiCDzp9EqE+k4yw/Ncf4UBy/3Ta5vGKvA1M1tQxryEho9SSN8ndDp Ii5sr4U634WZ8MufPeNO6+zEwyOXiZBHeLUUIFwJqe0ZZbfhpD9QOHAlu1udH+7ZqlHHQHh eEmtXjAbEKd6XU5adFM+gBNzLDTeeZg= ARC-Authentication-Results: i=2; open-mesh.org; dkim=pass header.d=narfation.org; arc=pass; dmarc=pass (Used From Domain Record) header.from=narfation.org policy.dmarc=none Authentication-Results: open-mesh.org; dkim=pass header.d=narfation.org; arc=pass; dmarc=pass (Used From Domain Record) header.from=narfation.org policy.dmarc=none Received: from dvalin.narfation.org (dvalin.narfation.org [IPv6:2a00:17d8:100::8b1]) by diktynna.open-mesh.org (Postfix) with ESMTPS id 174DB83D57 for ; Wed, 22 Jan 2025 21:51:34 +0100 (CET) ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=open-mesh.org; s=20121; t=1737579094; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=2YtDAyTtAIMl1eCEZmQ0DSMlC3jboxX0K1cFkKCwprw=; b=KjGOc4W5rp5qDaiB98R3oH71MmcSWcsL2PTK935kPpHG3bnnnNysO7w8tGEE7eKRuJEZ+O VLk0ewVwBqwO6m6jDcgRIwpIkf8wXpGvE5ZWxSYBPMJtH/x0PwoEk5DZcCD+m9qoULeqc4 6/ZJEpaUoarViLRXJtn8fCUgoTUYoRs= ARC-Authentication-Results: i=1; diktynna.open-mesh.org; dkim=pass header.d=narfation.org header.s=20121 header.b=E6E9khbm; spf=pass (diktynna.open-mesh.org: domain of sven@narfation.org designates 2a00:17d8:100::8b1 as permitted sender) smtp.mailfrom=sven@narfation.org; dmarc=pass (policy=none) header.from=narfation.org ARC-Seal: i=1; s=20121; d=open-mesh.org; t=1737579094; a=rsa-sha256; cv=none; b=d6+Ad9DPn7I/tuwl5n0srxqDNid/T/q37wflDE8rDkJOZmOpmxGo5UYUUbVLw9QQpcl4Tn wD85q+GmRzurYJmMnJZvXTobcxnm4Q+FI6K3IwUBcw75hBIEpqONIliuXR7InacXkPi/io l5p8YmbbyKBm1khSdxzw9zb9Exitehc= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=narfation.org; s=20121; t=1737579093; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2YtDAyTtAIMl1eCEZmQ0DSMlC3jboxX0K1cFkKCwprw=; b=E6E9khbmIWAtlQFrP7NAQCkWRwIdzOyR07pvFfg61T4+OSv5v6DkRfccVguGqCaBPzAdVC 5f34RMbWLNd20SvM7xZ793B3/tVwJtkBVnnbSr8E+G249WiObNC5j2TnOGa3zRedTg8zes ATGy3xf1bfbrcYyKYZFfjzol/wLUH+M= From: Sven Eckelmann Date: Wed, 22 Jan 2025 21:51:21 +0100 Subject: [PATCH 2/2] batman-adv: Drop unmanaged ELP metric worker MIME-Version: 1.0 Message-Id: <20250122-fix-metric-update-v1-2-c489b3f82f6d@narfation.org> References: <20250122-fix-metric-update-v1-0-c489b3f82f6d@narfation.org> In-Reply-To: <20250122-fix-metric-update-v1-0-c489b3f82f6d@narfation.org> To: b.a.t.m.a.n@lists.open-mesh.org Cc: Sven Eckelmann X-Mailer: b4 0.14.2 X-Developer-Signature: v=1; a=openpgp-sha256; l=9509; i=sven@narfation.org; h=from:subject:message-id; bh=O8Huo9jVw9iLJuGMf8qJbfQSUz/+zA/baHPRTo+bvk4=; b=owGbwMvMwCXmy1+ufVnk62nG02pJDOkTo7z/HxQ/cI2Vf1pWiMGHaTVlzJ27soSW8Qsanz0jM OlT6velHaUsDGJcDLJiiix7ruSf38z+Vv7ztI9HYeawMoEMYeDiFICJvC5k+F8RUVSwn2lPUEmB INf/ju/3tXY5W2craP5KS3wxbcsHsxxGhjaBReU/pu0tZ+vZ/L3xFdd3weID92bOuyRhlbcjVmH 3bH4A X-Developer-Key: i=sven@narfation.org; a=openpgp; fpr=522D7163831C73A635D12FE5EC371482956781AF Message-ID-Hash: FZ5M2EXN2XQ6Y2FGD7XGHN5IQJO2GBVP X-Message-ID-Hash: FZ5M2EXN2XQ6Y2FGD7XGHN5IQJO2GBVP X-MailFrom: sven@narfation.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-b.a.t.m.a.n.lists.open-mesh.org-0; header-match-b.a.t.m.a.n.lists.open-mesh.org-1; header-match-b.a.t.m.a.n.lists.open-mesh.org-2; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.8 Precedence: list List-Id: The list for a Better Approach To Mobile Ad-hoc Networking Archived-At: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: The ELP worker needs to calculate new metric values for all neighbors "reachable" over an interface. Some of the used metric sources require locks which might need to sleep. This sleep is incompatible with the RCU list iterator used for the recorded neighbors. The initial approach to work around of this problem was to queue another work item per neighbor and then run this in a new context. Even when this solved the RCU vs might_sleep() conflict, it has a major problems: Nothing was stopping the work item in case it is not needed anymore - for example because one of the related interfaces was removed or the batman-adv module was unloaded - resulting in potential invalid memory accesses. Directly canceling the metric worker also has various problems: * cancel_work_sync for a to-be-deactivated interface is called with rtnl_lock held. But the code in the ELP metric worker also tries to use rtnl_lock() - which will never return in this case. This also means that cancel_work_sync would never return because it is waiting for the worker to finish. * iterating over the neighbor list for the to-be-deactivated interface is currently done using the RCU specific methods. Which means that it is possible to miss items when iterating over it without the associated spinlock - a behaviour which is acceptable for a periodic metric check but not for a cleanup routine (which must "stop" all still running workers) The better approch is to get rid of the per interface neighbor metric worker and handle everything in the interface worker. The original problems are solved by: * creating a list of neighbors which require new metric information inside the RCU protected context, gathering the metric according to the new list outside the RCU protected context * only use rcu_trylock inside metric gathering code to avoid a deadlock when the cancel_delayed_work_sync is called in the interface removal code (which is called with the rtnl_lock held) Fixes: 5c3245172c01 ("batman-adv: ELP - compute the metric based on the estimated throughput") Signed-off-by: Sven Eckelmann --- net/batman-adv/bat_v.c | 2 -- net/batman-adv/bat_v_elp.c | 71 +++++++++++++++++++++++++++++++--------------- net/batman-adv/bat_v_elp.h | 2 -- net/batman-adv/types.h | 3 -- 4 files changed, 48 insertions(+), 30 deletions(-) diff --git a/net/batman-adv/bat_v.c b/net/batman-adv/bat_v.c index ac11f1f08db0f93bdaee81a2ce8822a6eda8db04..d35479c465e2c4868be4f7ec02f8626e54c1bb92 100644 --- a/net/batman-adv/bat_v.c +++ b/net/batman-adv/bat_v.c @@ -113,8 +113,6 @@ static void batadv_v_hardif_neigh_init(struct batadv_hardif_neigh_node *hardif_neigh) { ewma_throughput_init(&hardif_neigh->bat_v.throughput); - INIT_WORK(&hardif_neigh->bat_v.metric_work, - batadv_v_elp_throughput_metric_update); } /** diff --git a/net/batman-adv/bat_v_elp.c b/net/batman-adv/bat_v_elp.c index 65e52de52bcd2688e0ed33cd8605821b17f5a63b..b065578b4436ee6b173e1201b0ad68af9c250f3b 100644 --- a/net/batman-adv/bat_v_elp.c +++ b/net/batman-adv/bat_v_elp.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -26,6 +27,7 @@ #include #include #include +#include #include #include #include @@ -41,6 +43,18 @@ #include "routing.h" #include "send.h" +/** + * struct batadv_v_metric_queue_entry - list of hardif neighbors which require + * and metric update + */ +struct batadv_v_metric_queue_entry { + /** @hardif_neigh: hardif neighbor scheduled for metric update */ + struct batadv_hardif_neigh_node *hardif_neigh; + + /** @list: list node for metric_queue */ + struct list_head list; +}; + /** * batadv_v_elp_start_timer() - restart timer for ELP periodic work * @hard_iface: the interface for which the timer has to be reset @@ -137,10 +151,17 @@ static bool batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh, goto default_throughput; } + /* only use rtnl_trylock because the elp worker will be cancelled while + * the rntl_lock is held. the cancel_delayed_work_sync() would otherwise + * wait forever when the elp work_item was started and it is then also + * trying to rtnl_lock + */ + if (!rtnl_trylock()) + return false; + /* if not a wifi interface, check if this device provides data via * ethtool (e.g. an Ethernet adapter) */ - rtnl_lock(); ret = __ethtool_get_link_ksettings(hard_iface->net_dev, &link_settings); rtnl_unlock(); if (ret == 0) { @@ -175,31 +196,19 @@ static bool batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh, /** * batadv_v_elp_throughput_metric_update() - worker updating the throughput * metric of a single hop neighbour - * @work: the work queue item + * @neigh: the neighbour to probe */ -void batadv_v_elp_throughput_metric_update(struct work_struct *work) +static void +batadv_v_elp_throughput_metric_update(struct batadv_hardif_neigh_node *neigh) { - struct batadv_hardif_neigh_node_bat_v *neigh_bat_v; - struct batadv_hardif_neigh_node *neigh; u32 throughput; bool valid; - neigh_bat_v = container_of(work, struct batadv_hardif_neigh_node_bat_v, - metric_work); - neigh = container_of(neigh_bat_v, struct batadv_hardif_neigh_node, - bat_v); - valid = batadv_v_elp_get_throughput(neigh, &throughput); if (!valid) - goto put_neigh; + return; ewma_throughput_add(&neigh->bat_v.throughput, throughput); - -put_neigh: - /* decrement refcounter to balance increment performed before scheduling - * this task - */ - batadv_hardif_neigh_put(neigh); } /** @@ -273,14 +282,16 @@ batadv_v_elp_wifi_neigh_probe(struct batadv_hardif_neigh_node *neigh) */ static void batadv_v_elp_periodic_work(struct work_struct *work) { + struct batadv_v_metric_queue_entry *metric_entry; + struct batadv_v_metric_queue_entry *metric_safe; struct batadv_hardif_neigh_node *hardif_neigh; struct batadv_hard_iface *hard_iface; struct batadv_hard_iface_bat_v *bat_v; struct batadv_elp_packet *elp_packet; + struct list_head metric_queue; struct batadv_priv *bat_priv; struct sk_buff *skb; u32 elp_interval; - bool ret; bat_v = container_of(work, struct batadv_hard_iface_bat_v, elp_wq.work); hard_iface = container_of(bat_v, struct batadv_hard_iface, bat_v); @@ -316,6 +327,8 @@ static void batadv_v_elp_periodic_work(struct work_struct *work) atomic_inc(&hard_iface->bat_v.elp_seqno); + INIT_LIST_HEAD(&metric_queue); + /* The throughput metric is updated on each sent packet. This way, if a * node is dead and no longer sends packets, batman-adv is still able to * react timely to its death. @@ -340,16 +353,28 @@ static void batadv_v_elp_periodic_work(struct work_struct *work) /* Reading the estimated throughput from cfg80211 is a task that * may sleep and that is not allowed in an rcu protected - * context. Therefore schedule a task for that. + * context. Therefore add it to metric_queue and process it + * outside rcu protected context. */ - ret = queue_work(batadv_event_workqueue, - &hardif_neigh->bat_v.metric_work); - - if (!ret) + metric_entry = kzalloc(sizeof(*metric_entry), GFP_ATOMIC); + if (!metric_entry) { batadv_hardif_neigh_put(hardif_neigh); + continue; + } + + metric_entry->hardif_neigh = hardif_neigh; + list_add(&metric_entry->list, &metric_queue); } rcu_read_unlock(); + list_for_each_entry_safe(metric_entry, metric_safe, &metric_queue, list) { + batadv_v_elp_throughput_metric_update(metric_entry->hardif_neigh); + + batadv_hardif_neigh_put(metric_entry->hardif_neigh); + list_del(&metric_entry->list); + kfree(metric_entry); + } + restart_timer: batadv_v_elp_start_timer(hard_iface); out: diff --git a/net/batman-adv/bat_v_elp.h b/net/batman-adv/bat_v_elp.h index 9e2740195fa2d44471c9067838384f0801c7057a..c9cb0a30710045729a49921c96011d59ed67120f 100644 --- a/net/batman-adv/bat_v_elp.h +++ b/net/batman-adv/bat_v_elp.h @@ -10,7 +10,6 @@ #include "main.h" #include -#include int batadv_v_elp_iface_enable(struct batadv_hard_iface *hard_iface); void batadv_v_elp_iface_disable(struct batadv_hard_iface *hard_iface); @@ -19,6 +18,5 @@ void batadv_v_elp_iface_activate(struct batadv_hard_iface *primary_iface, void batadv_v_elp_primary_iface_set(struct batadv_hard_iface *primary_iface); int batadv_v_elp_packet_recv(struct sk_buff *skb, struct batadv_hard_iface *if_incoming); -void batadv_v_elp_throughput_metric_update(struct work_struct *work); #endif /* _NET_BATMAN_ADV_BAT_V_ELP_H_ */ diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h index f491bff8c51b8bf68eb11dbbeb1a434d446c25f0..fe89f08533fe694e137bf31594c60915d4783c7a 100644 --- a/net/batman-adv/types.h +++ b/net/batman-adv/types.h @@ -596,9 +596,6 @@ struct batadv_hardif_neigh_node_bat_v { * neighbor */ unsigned long last_unicast_tx; - - /** @metric_work: work queue callback item for metric update */ - struct work_struct metric_work; }; /**