[0/1] pull request for net: batman-adv 2023-06-07

Message ID 20230607155515.548120-1-sw@simonwunderlich.de (mailing list archive)
State Not Applicable, archived
Pull-request

git://git.open-mesh.org/linux-merge.git tags/batadv-net-pullrequest-20230607

Message

Simon Wunderlich June 7, 2023, 3:55 p.m. UTC
  Hi David, hi Jakub,

here is a bugfix for batman-adv which we would like to have integrated into net.

Please pull or let me know of any problem!

Thank you,
      Simon

The following changes since commit 44c026a73be8038f03dbdeef028b642880cf1511:

  Linux 6.4-rc3 (2023-05-21 14:05:48 -0700)

are available in the Git repository at:

  git://git.open-mesh.org/linux-merge.git tags/batadv-net-pullrequest-20230607

for you to fetch changes up to abac3ac97fe8734b620e7322a116450d7f90aa43:

  batman-adv: Broken sync while rescheduling delayed work (2023-05-26 23:14:49 +0200)

----------------------------------------------------------------
Here is a batman-adv bugfix:

 - fix a broken sync while rescheduling delayed work, by
   Vladislav Efanov

----------------------------------------------------------------
Vladislav Efanov (1):
      batman-adv: Broken sync while rescheduling delayed work

 net/batman-adv/distributed-arp-table.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Jakub Kicinski June 8, 2023, 5:01 a.m. UTC | #1
On Wed,  7 Jun 2023 17:55:15 +0200 Simon Wunderlich wrote:
> The reason for these issues is the lack of synchronization. Delayed
> work (batadv_dat_purge) schedules new timer/work while the device
> is being deleted. As the result new timer/delayed work is set after
> cancel_delayed_work_sync() was called. So after the device is freed
> the timer list contains pointer to already freed memory.

I guess this is better than status quo but is the fix really complete?
We're still not preventing the timer / work from getting scheduled
and staying alive after the netdev has been freed, right?
  
Keller, Jacob E June 8, 2023, 5:24 a.m. UTC | #2
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Wednesday, June 7, 2023 10:01 PM
> To: Simon Wunderlich <sw@simonwunderlich.de>
> Cc: davem@davemloft.net; netdev@vger.kernel.org; b.a.t.m.a.n@lists.open-
> mesh.org; Vladislav Efanov <VEfanov@ispras.ru>; stable@kernel.org; Sven
> Eckelmann <sven@narfation.org>
> Subject: Re: [PATCH 1/1] batman-adv: Broken sync while rescheduling delayed
> work
> 
> On Wed,  7 Jun 2023 17:55:15 +0200 Simon Wunderlich wrote:
> > The reason for these issues is the lack of synchronization. Delayed
> > work (batadv_dat_purge) schedules new timer/work while the device
> > is being deleted. As the result new timer/delayed work is set after
> > cancel_delayed_work_sync() was called. So after the device is freed
> > the timer list contains pointer to already freed memory.
> 
> I guess this is better than status quo but is the fix really complete?
> We're still not preventing the timer / work from getting scheduled
> and staying alive after the netdev has been freed, right?

Yeah, I would expect some synchronization mechanism to ensure that the work can't be queued again after cancel_delayed_work_sync().

I know that for timers there is now timer_shutdown_sync(), which can be used to guarantee that a timer can't re-arm at all; it's intended for situations where there is a cyclic dependency...

Thanks,
Jake
  
Vlad Efanov June 8, 2023, 9:01 a.m. UTC | #3
As far as I can tell, the synchronization is provided by the delayed work
subsystem. It is based on WORK_STRUCT_PENDING_BIT in the work->data field.

cancel_delayed_work_sync() atomically sets this bit, and
queue_delayed_work() checks it before scheduling new delayed work.

The problem is caused by the INIT_DELAYED_WORK() call inside
batadv_dat_start_timer(). This call happens before the
queue_delayed_work() call and clears the bit, so the work can be queued
again even while it is being cancelled.
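
To illustrate the mechanism described above, here is a minimal user-space
sketch. It is only a toy model, not kernel code: struct toy_dwork and all
toy_*/rearm_* functions are hypothetical names invented for this example, a
plain bool stands in for WORK_STRUCT_PENDING_BIT (the kernel manipulates the
real bit atomically), and the whole cancel path is reduced to "the bit is held".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct delayed_work; only the pending flag is modelled. */
struct toy_dwork {
	bool pending;     /* plays the role of WORK_STRUCT_PENDING_BIT */
	int times_armed;  /* how often a timer would have been armed */
};

/* Models INIT_DELAYED_WORK(): unconditionally clears the pending flag. */
static void toy_init(struct toy_dwork *w)
{
	assert(w != NULL);
	w->pending = false;
}

/* Models queue_delayed_work(): arms only if the flag was clear. */
static bool toy_queue(struct toy_dwork *w)
{
	assert(w != NULL);
	if (w->pending)
		return false;  /* already pending, or held by a cancel */
	w->pending = true;
	w->times_armed++;
	return true;
}

/* Models the window in which cancel_delayed_work_sync() holds the flag. */
static void toy_cancel_begin(struct toy_dwork *w)
{
	assert(w != NULL);
	w->pending = true;
}

/* Problematic pattern: the worker re-initialises before re-queueing. */
static bool rearm_with_reinit(struct toy_dwork *w)
{
	toy_init(w);
	return toy_queue(w);
}

/* Alternative pattern: the worker only re-queues. */
static bool rearm_without_reinit(struct toy_dwork *w)
{
	return toy_queue(w);
}
```

In this model, calling toy_cancel_begin() and then rearm_without_reinit()
is refused and leaves times_armed unchanged, whereas rearm_with_reinit()
wipes the flag first and arms the timer again.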


Best regards,

Vlad


  
Paolo Abeni June 8, 2023, 9:27 a.m. UTC | #4
On Wed, 2023-06-07 at 22:01 -0700, Jakub Kicinski wrote:
> On Wed,  7 Jun 2023 17:55:15 +0200 Simon Wunderlich wrote:
> > The reason for these issues is the lack of synchronization. Delayed
> > work (batadv_dat_purge) schedules new timer/work while the device
> > is being deleted. As the result new timer/delayed work is set after
> > cancel_delayed_work_sync() was called. So after the device is freed
> > the timer list contains pointer to already freed memory.
> 
> I guess this is better than status quo but is the fix really complete?
> We're still not preventing the timer / work from getting scheduled
> and staying alive after the netdev has been freed, right?

I *think* this specific use case does not expose such a problem: the
delayed work is (AFAICS) scheduled only at device creation time and by
the work itself, so it should never be re-scheduled after
cancel_delayed_work_sync().

Cheers,

Paolo
  
Sven Eckelmann June 8, 2023, 4:57 p.m. UTC | #5
On Thursday, 8 June 2023 11:27:31 CEST Paolo Abeni wrote:
[...]
> > We're still not preventing the timer / work from getting scheduled
> > and staying alive after the netdev has been freed, right?
> 
> I *think* this specific use case does not expose such problem, as the
> delayed work is (AFAICS) scheduled only at device creation time and by
> the work itself, it should never be re-scheduled after
> cancel_delayed_work_sync()

Correct.

* batadv_dat_start_timer is the only thing scheduling it
* batadv_dat_start_timer is called by:

  - batadv_dat_purge (the worker rearming itself)
  - batadv_dat_init (when the interface is created)
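
The call graph above can be sketched as a small user-space model. This is
not the batman-adv code: struct model_dat and the model_* functions are
hypothetical, a plain bool stands in for the pending bit, and the sketch
assumes that the start-timer step only queues the work while the one-time
initialisation happens in the init step (the diff itself is not shown in
this thread).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these are kernel symbols. */
struct model_dat {
	bool pending;  /* stands in for the work's pending bit */
	int armed;     /* number of times the timer was armed */
};

/* Models batadv_dat_start_timer(): queue only, no re-initialisation. */
static bool model_start_timer(struct model_dat *dat)
{
	assert(dat != NULL);
	if (dat->pending)
		return false;  /* refused: bit already set */
	dat->pending = true;
	dat->armed++;
	return true;
}

/* Models batadv_dat_init(): initialise once, then arm the first timer. */
static void model_init(struct model_dat *dat)
{
	assert(dat != NULL);
	dat->pending = false;  /* the one-time initialisation */
	dat->armed = 0;
	model_start_timer(dat);
}

/* Models batadv_dat_purge() running normally and re-arming itself. */
static bool model_purge(struct model_dat *dat)
{
	assert(dat != NULL);
	dat->pending = false;  /* bit is cleared before the callback runs */
	return model_start_timer(dat);
}

/* Models the same worker running while a cancel holds the bit. */
static bool model_purge_during_cancel(struct model_dat *dat)
{
	assert(dat != NULL);
	dat->pending = true;   /* the cancel path has grabbed the bit */
	return model_start_timer(dat);
}
```

With only these two callers, the model arms the timer once at init and once
per normal worker run; a worker run that overlaps a cancel is refused, so
nothing is left armed afterwards.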

Kind regards,
	Sven