batman-adv-kernelland: Fix memory corruption bug

Message ID DFF8A357-1AF1-4F11-BE89-46DD131FA7FF@gmail.com (mailing list archive)
State Accepted, archived

Commit Message

Scott Raynel Dec. 4, 2008, 1:14 a.m. UTC
Hi there,

I've been spending some time tracking down a bug that's been causing  
memory corruption followed by random kernel panics. Thanks to the  
kernel's slab memory debugger I tracked it down to a kfree in send.c  
that was freeing a block of memory that had been written to past the  
end of its allocation.

Turned out to be a simple typo, which I've fixed in the following  
patch. When resizing the packet_buff struct in batman_if, the new  
length was being updated but the old length was being used for the  
kmalloc(), causing something later to think it had more memory  
allocated to write to, hence writing past the end of the allocation.

Signed-off-by: Scott Raynel <scottraynel@gmail.com>



Cheers,

--
Scott Raynel
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
  

Comments

Marek Lindner Dec. 4, 2008, 2:30 a.m. UTC | #1
Hey,

> Turned out to be a simple typo, which I've fixed in the following
> patch. When resizing the packet_buff struct in batman_if, the new
> length was being updated but the old length was being used for the
> kmalloc(), causing something later to think it had more memory
> allocated to write to, hence writing past the end of the allocation.

Wow, nice catch!
I happily applied your patch (revision 1173).  :-)

Regards,
Marek
  
Simon Wunderlich Dec. 4, 2008, 11:35 a.m. UTC | #2
Hey Scott,

thank you very much for the fix! Can you confirm whether this bug is
related to https://dev.open-mesh.net/batman/ticket/86 ?
That bug was very likely caused by memory corruption, but I
couldn't find where. (I have not seen any kernel panics from it,
however.)

Thanks, best regards
	Simon

  
Scott Raynel Dec. 5, 2008, 10:40 a.m. UTC | #3
Hi Simon,

On 5/12/2008, at 12:35 AM, Simon Wunderlich wrote:

> Hey Scott,
>
> thank you very much for the fix! Can you confirm whether this bug is
> related to https://dev.open-mesh.net/batman/ticket/86 ?
> That bug was very likely caused by memory corruption, but I
> couldn't find where. (I have not seen any kernel panics from it,
> however.)


It is quite possible that they are related. The slab error states that  
a memory allocation was overwritten - the same problem as my patch  
fixed. However, I can't confirm whether it is the same memory  
allocation or a different one. The stack trace I got specifically  
mentioned the kfree() in send_own_packet(), whereas this stack trace  
does not.

Is that bug easily reproducible? It will be a couple of days before I  
can try to look at it.

Also, the stack trace is confusing as it appears to indicate a kfree()  
within hardif_min_mtu(), which I can't find :)

I'll try to do some stress testing of the module with the slab  
debugger turned on for a while and see what happens.

Cheers,

--
Scott Raynel
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
  
Simon Wunderlich Dec. 5, 2008, 7:51 p.m. UTC | #4
Hey Scott,

On Fri, Dec 05, 2008 at 11:40:30PM +1300, Scott Raynel wrote:
> Hi Simon,
> 
> On 5/12/2008, at 12:35 AM, Simon Wunderlich wrote:
> 
> >Hey Scott,
> >
> >thank you very much for the fix! Can you confirm whether this bug is
> >related to https://dev.open-mesh.net/batman/ticket/86 ?
> >That bug was very likely caused by memory corruption, but I
> >couldn't find where. (I have not seen any kernel panics from it,
> >however.)
> 
> 
> It is quite possible that they are related. The slab error states that  
> a memory allocation was overwritten - the same problem as my patch  
> fixed. However, I can't confirm whether it is the same memory  
> allocation or a different one. The stack trace I got specifically  
> mentioned the kfree() in send_own_packet(), whereas this stack trace  
> does not.
> 
> Is that bug easily reproducible? It will be a couple of days before I  
> can try to look at it.

Yep, it was quite easy: just turn it on and off a few times (echo a
device, then nothing, into /proc/net/batman-adv/interfaces). The
warning appeared after 10 times in my qemu instance. No crash, only
this warning.
> 
> Also, the stack trace is confusing as it appears to indicate a kfree()  
> within hardif_min_mtu(), which I can't find :)

That's the problem; that's what confused me as well. :/

> 
> I'll try to do some stress testing of the module with the slab  
> debugger turned on for a while and see what happens.

Sounds great. Thanks for your hard work. :)

best regards,
	Simon
  
Scott Raynel Dec. 12, 2008, 9:08 a.m. UTC | #5
Hi Simon,

On 6/12/2008, at 8:51 AM, Simon Wunderlich wrote:

> Hey Scott,
>
> On Fri, Dec 05, 2008 at 11:40:30PM +1300, Scott Raynel wrote:
>> Hi Simon,
>>
>> On 5/12/2008, at 12:35 AM, Simon Wunderlich wrote:
>>
>>> Hey Scott,
>>>
>>> thank you very much for the fix! Can you confirm whether this bug is
>>> related to https://dev.open-mesh.net/batman/ticket/86 ?
>>> That bug was very likely caused by memory corruption, but I
>>> couldn't find where. (I have not seen any kernel panics from it,
>>> however.)
>>
>>
>> It is quite possible that they are related. The slab error states  
>> that
>> a memory allocation was overwritten - the same problem as my patch
>> fixed. However, I can't confirm whether it is the same memory
>> allocation or a different one. The stack trace I got specifically
>> mentioned the kfree() in send_own_packet(), whereas this stack trace
>> does not.
>>
>> Is that bug easily reproducible? It will be a couple of days before I
>> can try to look at it.
>
> Yep, it was quite easy: just turn it on and off a few times (echo a
> device, then nothing, into /proc/net/batman-adv/interfaces). The
> warning appeared after 10 times in my qemu instance. No crash, only
> this warning.

I can't reproduce this bug before my patch is applied because the bug  
it fixes always gets in the way :)

After applying the patch I seem to be able to consistently lock up the  
system by adding and removing an interface from the batman device  
several times. The box still replies to pings, but I can't SSH in.  
This does not trigger the slab debugger. Using the magic SysRq
interface to print the current task, it appears to be hanging in the
cancel_rearming_delayed_work() call in shutdown_module(). This might
be related to the scheduling-while-atomic bugs. I'll keep looking into  
this as I get time, but things are pretty busy here at the moment.

Cheers,


--
Scott Raynel
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
  

Patch

Index: send.c
===================================================================
--- send.c	(revision 1105)
+++ send.c	(working copy)
@@ -159,7 +159,7 @@ 
  	if ((hna_local_changed) && (batman_if->if_num == 0)) {

  		new_len = sizeof(struct batman_packet) + (num_hna * ETH_ALEN);
-		new_buf = kmalloc(batman_if->pack_buff_len, GFP_ATOMIC);
+		new_buf = kmalloc(new_len, GFP_ATOMIC);

  		/* keep old buffer if kmalloc should fail */
  		if (new_buf) {
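
The size mismatch fixed by the patch above can be illustrated with a small user-space sketch. This is not the batman-adv code: `struct iface`, `HEADER_LEN` and `resize_packet_buff()` are hypothetical stand-ins, `malloc()`/`free()` replace `kmalloc()`/`kfree()`, and the copy-and-zero step is an assumption about how the new buffer might be filled. The point is only that the allocation size and the recorded length must both come from `new_len`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN   6    /* bytes in an Ethernet MAC address */
#define HEADER_LEN 16   /* hypothetical stand-in for sizeof(struct batman_packet) */

/* Hypothetical, simplified stand-in for the interface structure. */
struct iface {
    unsigned char *pack_buff;
    size_t pack_buff_len;
};

/*
 * Resize the packet buffer so it can carry num_hna announcements.
 * Returns 0 on success, -1 if the allocation fails (old buffer is kept).
 */
static int resize_packet_buff(struct iface *ifc, size_t num_hna)
{
    size_t new_len, keep;
    unsigned char *new_buf;

    assert(ifc != NULL);

    new_len = HEADER_LEN + num_hna * ETH_ALEN;

    /*
     * The buggy variant allocated ifc->pack_buff_len bytes here (the OLD
     * size) while still recording new_len below, so later writes that
     * trusted the recorded length could run past the allocation.
     */
    new_buf = malloc(new_len);
    if (!new_buf)
        return -1;      /* keep old buffer if the allocation fails */

    /* Assumption: preserve as much old content as fits, zero the rest. */
    keep = ifc->pack_buff_len < new_len ? ifc->pack_buff_len : new_len;
    if (keep > 0)
        memcpy(new_buf, ifc->pack_buff, keep);
    memset(new_buf + keep, 0, new_len - keep);

    free(ifc->pack_buff);
    ifc->pack_buff = new_buf;
    ifc->pack_buff_len = new_len;   /* matches the size actually allocated */
    return 0;
}
```

With this version, writing any index below `pack_buff_len` stays inside the allocation. With the buggy allocation size, growing the buffer would leave `pack_buff_len` larger than the block behind `pack_buff`, which is the kind of overrun a slab or heap debugger reports at free time.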