Message ID | DFF8A357-1AF1-4F11-BE89-46DD131FA7FF@gmail.com |
---|---|
State | Accepted, archived |
Headers |
Received: from warlock.cs.waikato.ac.nz (warlock.cs.waikato.ac.nz [130.217.250.15]) by open-mesh.net (8.13.4/8.13.4/Debian-3sarge3) with ESMTP id mB41KNbn010108 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NOT) for <b.a.t.m.a.n@open-mesh.net>; Thu, 4 Dec 2008 02:20:27 +0100 Received: from [192.107.171.51] (helo=dhcp-236.uni.crc.net.nz) by warlock.cs.waikato.ac.nz with esmtpsa (TLS-1.0:RSA_AES_128_CBC_SHA:16) (Exim 4.50) id 1L82nU-0005yh-69 for b.a.t.m.a.n@open-mesh.net; Thu, 04 Dec 2008 14:14:28 +1300 Message-Id: <DFF8A357-1AF1-4F11-BE89-46DD131FA7FF@gmail.com> From: Scott Raynel <scottraynel@gmail.com> To: The list for a Better Approach To Mobile Ad-hoc Networking <b.a.t.m.a.n@open-mesh.net> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v929.2) Date: Thu, 4 Dec 2008 14:14:27 +1300 X-Mailer: Apple Mail (2.929.2) Subject: [B.A.T.M.A.N.] [PATCH] batman-adv-kernelland: Fix memory corruption bug X-BeenThere: b.a.t.m.a.n@open-mesh.net X-Mailman-Version: 2.1.5 Precedence: list Reply-To: The list for a Better Approach To Mobile Ad-hoc Networking <b.a.t.m.a.n@open-mesh.net> List-Id: The list for a Better Approach To Mobile Ad-hoc Networking <b.a.t.m.a.n.open-mesh.net> List-Unsubscribe: <https://list.open-mesh.net/mm/listinfo/b.a.t.m.a.n>, <mailto:b.a.t.m.a.n-request@open-mesh.net?subject=unsubscribe> List-Archive: <http://list.open-mesh.net/pipermail/b.a.t.m.a.n> List-Post: <mailto:b.a.t.m.a.n@open-mesh.net> List-Help: <mailto:b.a.t.m.a.n-request@open-mesh.net?subject=help> List-Subscribe: <https://list.open-mesh.net/mm/listinfo/b.a.t.m.a.n>, <mailto:b.a.t.m.a.n-request@open-mesh.net?subject=subscribe> X-List-Received-Date: Thu, 04 Dec 2008 01:20:27 -0000 |
Commit Message
Scott Raynel
Dec. 4, 2008, 1:14 a.m. UTC
Hi there,
I've been spending some time tracking down a bug that's been causing
memory corruption followed by random kernel panics. Thanks to the
kernel's slab memory debugger I tracked it down to a kfree in send.c
that was freeing a block of memory that had been written to past the
end of its allocation.
Turned out to be a simple typo, which I've fixed in the following
patch. When resizing the packet_buff struct in batman_if, the new
length was being updated but the old length was being used for the
kmalloc(), causing something later to think it had more memory
allocated to write to, hence writing past the end of the allocation.
Signed-off-by: Scott Raynel <scottraynel@gmail.com>
Cheers,
--
Scott Raynel
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
Comments
Hey,

> Turned out to be a simple typo, which I've fixed in the following
> patch. When resizing the packet_buff struct in batman_if, the new
> length was being updated but the old length was being used for the
> kmalloc(), causing something later to think it had more memory
> allocated to write to, hence writing past the end of the allocation.

wow - nice catch! I happily applied your patch (revision 1173). :-)

Regards,
Marek
Hey Scott,

thank you very much for the fix! Can you confirm if this bug is related
to https://dev.open-mesh.net/batman/ticket/86 ?
This bug has very likely been caused by a memory corruption, but I
couldn't find where. (I have not experienced any kernel panics by this
however ...).

Thanks,
best regards
Simon

On Thu, Dec 04, 2008 at 02:14:27PM +1300, Scott Raynel wrote:
> Hi there,
>
> I've been spending some time tracking down a bug that's been causing
> memory corruption followed by random kernel panics. Thanks to the
> kernel's slab memory debugger I tracked it down to a kfree in send.c
> that was freeing a block of memory that had been written to past the
> end of its allocation.
>
> Turned out to be a simple typo, which I've fixed in the following
> patch. When resizing the packet_buff struct in batman_if, the new
> length was being updated but the old length was being used for the
> kmalloc(), causing something later to think it had more memory
> allocated to write to, hence writing past the end of the allocation.
>
> Signed-off-by: Scott Raynel <scottraynel@gmail.com>
>
> Index: send.c
> ===================================================================
> --- send.c	(revision 1105)
> +++ send.c	(working copy)
> @@ -159,7 +159,7 @@
>  	if ((hna_local_changed) && (batman_if->if_num == 0)) {
>
>  		new_len = sizeof(struct batman_packet) + (num_hna * ETH_ALEN);
> -		new_buf = kmalloc(batman_if->pack_buff_len, GFP_ATOMIC);
> +		new_buf = kmalloc(new_len, GFP_ATOMIC);
>
>  		/* keep old buffer if kmalloc should fail */
>  		if (new_buf) {
>
> Cheers,
>
> --
> Scott Raynel
> WAND Network Research Group
> Department of Computer Science
> University of Waikato
> New Zealand
Hi Simon,

On 5/12/2008, at 12:35 AM, Simon Wunderlich wrote:

> Hey Scott,
>
> thank you very much for the fix! Can you confirm if this bug is related
> to https://dev.open-mesh.net/batman/ticket/86 ?
> This bug has very likely been caused by a memory corruption, but i
> couldn't find where. (i have not experienced any kernel panics by this
> however ...).

It is quite possible that they are related. The slab error states that
a memory allocation was overwritten - the same problem as my patch
fixed. However, I can't confirm whether it is the same memory
allocation or a different one. The stack trace I got specifically
mentioned the kfree() in send_own_packet(), whereas this stack trace
does not.

Is that bug easily reproducible? It will be a couple of days before I
can try to look at it.

Also, the stack trace is confusing as it appears to indicate a kfree()
within hardif_min_mtu(), which I can't find :)

I'll try to do some stress testing of the module with the slab
debugger turned on for a while and see what happens.

Cheers,

--
Scott Raynel
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
Hey Scott,

On Fri, Dec 05, 2008 at 11:40:30PM +1300, Scott Raynel wrote:
> Hi Simon,
>
> On 5/12/2008, at 12:35 AM, Simon Wunderlich wrote:
>
> > Hey Scott,
> >
> > thank you very much for the fix! Can you confirm if this bug is related
> > to https://dev.open-mesh.net/batman/ticket/86 ?
> > This bug has very likely been caused by a memory corruption, but i
> > couldn't find where. (i have not experienced any kernel panics by this
> > however ...).
>
> It is quite possible that they are related. The slab error states that
> a memory allocation was overwritten - the same problem as my patch
> fixed. However, I can't confirm whether it is the same memory
> allocation or a different one. The stack trace I got specifically
> mentioned the kfree() in send_own_packet(), whereas this stack trace
> does not.
>
> Is that bug easily reproducible? It will be a couple of days before I
> can try to look at it.

Yep, it was quite easy: just turn it on and off a few times (echo
device and nothing into /proc/net/batman-adv/interfaces). The warning
appeared after 10 times in my qemu instance. No crash, only this
warning.

> Also, the stack trace is confusing as it appears to indicate a kfree()
> within hardif_min_mtu(), which I can't find :)

That's the problem, that is what confused me at this point. :/

> I'll try to do some stress testing of the module with the slab
> debugger turned on for a while and see what happens.

Sounds great. Thanks for your hard work. :)

best regards,
Simon
Hi Simon,

On 6/12/2008, at 8:51 AM, Simon Wunderlich wrote:

> Hey Scott,
>
> On Fri, Dec 05, 2008 at 11:40:30PM +1300, Scott Raynel wrote:
>> Hi Simon,
>>
>> On 5/12/2008, at 12:35 AM, Simon Wunderlich wrote:
>>
>>> Hey Scott,
>>>
>>> thank you very much for the fix! Can you confirm if this bug is related
>>> to https://dev.open-mesh.net/batman/ticket/86 ?
>>> This bug has very likely been caused by a memory corruption, but i
>>> couldn't find where. (i have not experienced any kernel panics by this
>>> however ...).
>>
>> It is quite possible that they are related. The slab error states that
>> a memory allocation was overwritten - the same problem as my patch
>> fixed. However, I can't confirm whether it is the same memory
>> allocation or a different one. The stack trace I got specifically
>> mentioned the kfree() in send_own_packet(), whereas this stack trace
>> does not.
>>
>> Is that bug easily reproducible? It will be a couple of days before I
>> can try to look at it.
>
> Yep, it was quite easy: just turn it on and off a few times. (echo
> device and nothing into /proc/net/batman-adv/interfaces). The warning
> appeared after 10 times in my qemu instance. No crash, only this
> warning.

I can't reproduce this bug before my patch is applied because the bug
it fixes always gets in the way :)

After applying the patch I seem to be able to consistently lock up the
system by adding and removing an interface from the batman device
several times. The box still replies to pings, but I can't SSH in.
This does not trigger the slab debugger.

I've looked at using the magic SysRq interface to see what's going on,
and by printing the current task it appears to be hanging during the
cancel_rearming_delayed_work() call in shutdown_module(). This might be
related to the scheduling-while-atomic bugs.

I'll keep looking into this as I get time, but things are pretty busy
here at the moment.

Cheers,

--
Scott Raynel
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
Index: send.c
===================================================================
--- send.c	(revision 1105)
+++ send.c	(working copy)
@@ -159,7 +159,7 @@
 	if ((hna_local_changed) && (batman_if->if_num == 0)) {
 
 		new_len = sizeof(struct batman_packet) + (num_hna * ETH_ALEN);
-		new_buf = kmalloc(batman_if->pack_buff_len, GFP_ATOMIC);
+		new_buf = kmalloc(new_len, GFP_ATOMIC);
 
 		/* keep old buffer if kmalloc should fail */
 		if (new_buf) {
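
For readers who want to see the resize pattern in isolation, here is a minimal userspace sketch of the corrected logic. It is not the batman-adv code: malloc() stands in for kmalloc(..., GFP_ATOMIC), and struct iface, struct packet_hdr and resize_pack_buff() are hypothetical names invented for illustration. The copy and free steps are also assumptions, since the patch context does not show what send.c does after the allocation succeeds.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6 /* bytes in an Ethernet address, as in the kernel headers */

/* Hypothetical stand-in for struct batman_packet; the real layout is not shown. */
struct packet_hdr {
	unsigned char bytes[16];
};

/* Hypothetical stand-in for the two batman_if fields involved in the resize. */
struct iface {
	unsigned char *pack_buff;
	size_t pack_buff_len;
};

/*
 * Resize the packet buffer to hold the header plus num_hna addresses.
 * Returns 0 on success, -1 if the allocation failed (old buffer is kept).
 */
static int resize_pack_buff(struct iface *ifp, size_t num_hna)
{
	size_t new_len = sizeof(struct packet_hdr) + num_hna * ETH_ALEN;
	unsigned char *new_buf;

	/* Assumption: the existing buffer always holds at least the header. */
	assert(ifp->pack_buff_len >= sizeof(struct packet_hdr));

	/*
	 * The buggy form allocated ifp->pack_buff_len bytes here (the old
	 * size) while the recorded length below was still set to new_len.
	 */
	new_buf = malloc(new_len);
	if (!new_buf)
		return -1; /* keep the old buffer if allocation fails */

	/* Zero the new buffer, then carry the header over (assumed behaviour). */
	memset(new_buf, 0, new_len);
	memcpy(new_buf, ifp->pack_buff, sizeof(struct packet_hdr));
	free(ifp->pack_buff);

	/* Allocation size and recorded length now agree. */
	ifp->pack_buff = new_buf;
	ifp->pack_buff_len = new_len;
	return 0;
}
```

With the allocation sized by new_len, a caller can safely write up to pack_buff_len bytes. In the buggy form, any write beyond the old length would land past the end of the allocation, which is the kind of overrun the slab debugger reported at kfree() time.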